Revealing Higher-Order Interactions in High-Dimensional Complex Systems: A Data-Driven Approach

Natural and manmade complex systems are comprised of different elementary units, being either system components or diverse subsystems as in the case of networked systems. These units interact with each other in a possibly nonlinear way, which results in a complex dynamics that is generally dissipative and nonstationary. One of the challenges in the modeling of such systems is the identification of not only pairwise but, more importantly, higher-order interactions, together with their directions and strengths from measured multivariate time series. Here, we propose a novel data-driven approach for characterizing interactions of different orders. Our approach is based on solving a set of linear equations constructed from Kramers-Moyal coefficients derived from statistical moments of N -dimensional multivariate time series. We demonstrate the substantial potential for applications by a data-driven reconstruction of interactions in various multidimensional and networked dynamical systems.


I. INTRODUCTION
Complex systems are characterized by nonlinear interactions between their numerous components as well as by their ability to form different coherent dynamical behaviors and transitions between them.In addition, any natural or manmade system is subject to noise, which is inevitable and results from interactions with the environment.An appropriate description of the temporal behavior of complex systems is given by a multidimensional stochastic dynamics, which contains deterministic processes together with the impact of fluctuations.All the complex systems considered in this work share the property that they can be considered as interaction networks, such as, e.g., food webs [1,2] or mutualistic plantpollinator networks in ecology [3], opinion formation processes in social science [4], spin glasses in physics [5], interactions between brain regions in neuroscience [6], spreading of diseases in epidemiology [7], or metabolic networks in biochemistry [8].
To explain the concepts, we use the example of a food web [9].Many complex networks, such as the ones mentioned above, comprise a large number of components, e.g., species in an ecosystem that are interacting in a complex manner: predator-prey interactions or competition in food webs or mutualistic interactions in plant-pollinator networks [10].The species are the nodes of the network, while interactions, which are, in general, nonlinear, make up the links to form the topology of the network.In the simplest case, interactions can be pairwise, such as, e.g., in a food web, where the direction of an interaction is given by the fact "who eats whom" and the strength of an interaction is given by the consumption rate.
In many systems, there are also higher-order interactions, like the chemical reactions that form the links in a metabolic network.These interactions can involve several components, such as different chemical substances (nodes) interacting in a single chemical reaction.Such complex interaction networks subject to noise are, in general, formulated as sets of stochastic differential equations describing the dynamics in terms of nonlinear functions Fðx; tÞ and Gðx; tÞ of the N -dimensional state vector xðtÞ [11][12][13][14][15][16] containing all the components: In theoretical modeling of complex systems with stochastic dynamical equation (1), two cases can be distinguished: Case 1: The function Fðx; tÞ usually consists of different terms that are based on first principles in physics or on appropriate assumptions about the functional forms of the physical, chemical, and/or biological processes involved.This function stands for the deterministic part of the dynamics, while Gðx; tÞ takes into account additive or multiplicative stochastic fluctuations.In addition, η j ðtÞ, j ∈ f1; …; N g represent independent Gaussian white noise with unit intensity, i.e., hη k ðt 1 Þη j ðt 2 Þi ¼ h kj δðt 2 − t 1 Þ, where h kj ¼ δ k;j .See below for the case of correlated noise.Interactions between the components x i and x j of the complex system can occur in the deterministic part Fðx; tÞ and/ or in the stochastic part Gðx; tÞ of the dynamics.
In general, the functions Fðx; tÞ depend on control parameters, which, in a specific application, can be estimated using an optimization procedure based on observational data [14][15][16][17].Other approaches to identify the function Fðx; tÞ are based on combining different nonlinear terms (with an appropriate choice of basis functions) and finding an appropriate formulation with the help of Bayesian inference or maximum likelihood methods [18][19][20][21].Case 2: For many phenomena in nature, it is not clear how an appropriate mathematical formulation of the interactions should look, but there is plenty of observational data available to reconstruct the dynamics and validate a data-driven model.The big challenge is how to estimate the functions Fðx; tÞ and Gðx; tÞ to reveal the strength, direction, and functional form of interactions between the different components from measured multivariate time series.So far, methods like Langevin modeling with an estimation of Kramers-Moyal (KM) coefficients have been employed to construct a dynamical equation for various one-and two-dimensional time series [14][15][16].However, complex systems in nature are, in general, composed of many, say N , interacting components, requiring a multivariate reconstruction to reveal not only pairwise but also higher-order interactions.
In more general terms, the components of a complex network can be understood as subsystems that are represented by a suitable observable that can be measured and that acts like a state variable x i ðtÞ.
Thus, the model equations need not be known.Examples of those kinds of networks are power grids [22] and neuronal networks [23], among others.
Different linear and nonlinear methods have been established that allow a data-driven assessment of interaction properties (e.g., strength, direction, and functional form) and inference of networks from empirical data ; see also Refs.[46][47][48][49][50] for recent developments in the field.Most of these approaches are based on pairwise interactions only and, moreover, require a prescribed threshold to decide whether the existence of an interaction (including detection of direction and characterization of interaction weight in weighted networks) is established or not.Our goal is to work out a data-driven approach that combines the knowledge of designing Fðx; tÞ and Gðx; tÞ upon observational data using an expansion of Kramers-Moyal coefficients with ideas to identify pairwise and, more importantly, higher-order interactions.In fact, our plan is twofold.
Recipe (1): We assume that the mathematical functions Fðx; tÞ and Gðx; tÞ, i.e., the functional form of interactions, are known from basic principles.We estimate their parameters within nonpolynomial or polynomial functions to determine the strength of pairwise and higher-order interactions from time series corresponding to Case 1.
Recipe (2): We assume that we do not have any knowledge about the functions Fðx; tÞ and Gðx; tÞ, and we reconstruct the pairwise and higher-order interactions from time series corresponding to Case 2. In this latter case, we have to estimate not only the strength of an interaction but also its direction.
To demonstrate our approach and to showcase its versatility for field applications, we investigate time series that we derive from simulations of various high-dimensional model systems with preset higher-order interactions.Model systems are from different disciplines of science and comprise, among others, a population growth model (ecology), a cancer growth model (medicine), and a phase oscillator network model (physics/engineering).We contaminate these time series with different types of noise to mimic observational errors.

II. PAIRWISE AND HIGHER-ORDER INTERACTIONS
The starting point of our work is the fact that pairwise interactions between different components of an N -dimensional complex system can be described by a linear set of differential equations: where xðtÞ ∈ R N and A is an N × N matrix.The state variable xðtÞ can be the amount of traffic that passes through a node on a communication network, voltage changes in measured time series of brain dynamics, particle position, concentrations of reactants, protein expression levels, and more.Each element of the interaction matrix A i←j ≡ A ij measures the effect of a slight increase of the value of the state variable x j on the state variable x i .Equation (2) can also be interpreted in the network context.In this case, the state variable x i ðtÞ would represent the dynamics of node i in the network, and the matrix A ij would then correspond to a matrix reflecting the strengths of pairwise interactions between nodes.In fact, it would be a product of an adjacency matrix identifying the topology of the network with entries "0" and "1" and a weight matrix specifying the strengths of the interactions.
Apart from the linear interaction terms in Eq. ( 2), with strengths of pairwise interaction A ij , most natural and manmade complex systems contain nonlinearities, as represented in the general formulation of Eq. (1).These nonlinearities can be expressed by higher-order interaction terms such as C ijk x j x k and E ijkl x j x k x l , etc., with strengths C ijk and E ijkl .Now, the question arises as to whether it is possible to construct these higher-order interactions and determine the strengths of interactions A ij , C ijk , and E ijkl , etc., directly from observations of the dynamics of a complex system.Here, we propose a data-driven approach to detect and quantify pairwise and, in particular, higher-order interactions.Our approach is based on disentangling the deterministic and stochastic parts of multivariate time series of measured state variables x i ðtÞ or of suitable observables that characterize each subsystem (node) of a complex network.To do so, we generalize our recently proposed approach to estimate Kramers-Moyal coefficients for onedimensional systems [51] and show that interactionsincluding pairwise, three-way, and higher-order-in the dynamical state variables of a high-dimensional complex system can be formulated in terms of unconditional, smalltime-lag correlation functions and statistical moments of multivariate time series of dimension N .Solving a set of linear (nonlinear) equations (Appendixes A, B, C, and F) constructed from Kramers-Moyal coefficients derived from the aforementioned functions and moments allows us to characterize these interactions.

III. METHOD OF CHARACTERIZING INTERACTIONS
The ultimate goal of the analysis of N -dimensional time series-measured from N interacting components of a complex system-is to extract the underlying dynamical equations from these time series in the form of a system of stochastic differential equations [11][12][13][14][15][16].Constructing nonlinear stochastic dynamical equations from measured multivariate time series enables us to reveal interactions of different orders in the deterministic and stochastic parts of the dynamics, including their strength and direction.In the following subsections, we estimate two sets of quantities from a given multivariate time series.The first set comprises the strengths of higher-order (≥2) interactions for the deterministic part FðxÞ, and the second set includes the strengths of higher-order interactions in the stochastic component G of Eq. (1).These sets also facilitate the reconstruction of the dynamics of the multivariate time series based on the estimated strengths of interactions.

A. Characterization of interactions from the deterministic part of the dynamics
Let us start with the more challenging task of Recipe (2), where we do not know the functions Fðx; tÞ but have time series based on which we reconstruct the deterministic part of the dynamics.When comparing Eqs.(1) and (2), it becomes evident that F contains only a linear term represented by AxðtÞ.In order to include a possible nonlinear property of F in Eq. (1), we write down FðxÞ up to the third order (a generalization to arbitrary orders is straightforward; cf.Appendixes B and C) and find Here, x is any point in the N -dimensional state space, and C ijk ≡ C i←ðjkÞ (and E ijkl ≡ E i←ðjklÞ ) refers to the combined effect of state variables x j and x k or observables measured in subsystems j and k on state x i , and so on.We omitted the t dependence in Eq. ( 3) to enhance readability.The constant drift for each state variable is denoted by α i .The matrix A and tensors C and E represent the strengths of pairwise, three-way, and four-way interactions in the deterministic part of the dynamics F, respectively, and are real-valued matrices or tensors.The function Fðx; tÞ is equal to the conditional moment D ð1Þ ðx; tÞ known as the first KM coefficient or N -dimensional drift vector [52] (Appendix A).Its components are given by (using the Itô interpretation of the underlying stochastic dynamical equation) [14][15][16]53] where i ¼ 1; …; N , and the bracket h…ij :: .represents conditional averaging.In the case of stationary processes, these averages correspond to time averages.For nonstationary processes, one has to consider ensemble averages for a given t in Eq. ( 4) when accounting for an explicit time dependence.It is important to note that the averaged values are dependent on the state variable x.The expression (4) demonstrates that the drift vector can be determined from time series using the conditional probability distributions pðx 0 ; t þ τjx; tÞ in the limit of a small time lag τ.
In the following, we present a general approach for obtaining the elements of matrix A in Eq. (2) by employing statistical moments and short-term (small-lag) correlation functions from a time series.Consider the N -dimensional zero-mean linear time series xðtÞ satisfying where the drift vector D ð1Þ i ðx; tÞ is given by Eq. (4).By defining ϕ i;j ðτÞ ≡ ϕ i;j ¼ τA i;j for i; j ¼ 1; …; N , and using Eq. ( 5) and the definition of a drift vector [Eq.( 4)], we have We omitted the t dependence on the right-hand side of Eq. ( 6) but kept the t and τ dependence on the left-hand side to enhance readability.The conditional average in relation (6) is defined as Here, we used Bayes' theorem and will subsequently utilize the second and third rows of Eq. ( 7).Multiplying both sides of Eq. ( 7) by x j pðx 1 ; …; x N Þdx 1 Á Á Á dx N , where pðx 1 ; …; x N Þ is the N -point joint probability distribution function, and integrating over all state variables x i results in We note that the averaging in all terms in Eq. ( 8) does not depend on any conditions regarding xðtÞ ¼ x ¼ ðx 1 ; x 2 ; …; x N Þ.Therefore, the coefficients ϕ i;j can be derived from Using the definition y i ¼ x i ðt þ τÞ − x i ðtÞ, the problem of calculating ϕ i;j ðτÞ and henceforth A i;j ¼ lim τ→0 ϕ i;j ðτÞ=τ is reduced to calculating short-term correlation functions of hx i ðt þ τÞx j ðtÞi and statistical moments hx i ðtÞx j ðtÞi.In the end, it is necessary to approach the limit as τ → 0 in order to estimate A i;j (for more information, see Appendix B).
Building upon the Kramers-Moyal coefficients described earlier, we generalize the aforementioned approach to determine the constant drift terms α i and the strength of interaction matrices or tensors A, C, and E in Eq. ( 3) from multivariate time series as outlined in Appendixes B and C. It is worth noting that higher-order interactions represented by C and E within the deterministic component of the dynamics F can be regarded as measures to test for non-Gaussianity of empirical multivariate time series.
An expression such as Eq. ( 3) can be interpreted from two different points of view: first, to consider deviations from a steady state at the origin (for α i ¼ 0), and second, for networked dynamical subsystems whose interactions can be expressed by polynomial functions, as we demonstrate with examples in the next section.For both these cases, we are able to estimate the pairwise and higher-order interactions from the time series.If we keep the expansion (3) up to the order Z, the total number of independent coefficients (entries in the matrices) would be n ¼ ½N ðN þ ZÞ!=N !Z!.The required statistical moments up to the order p to find the pairwise, three-way, etc., strengths of interactions from the time series are p ≤ 2Z, where Z ¼ 3 for expansion (3) and p ¼ 1; …; 6 (see Appendix C).
Next, we briefly discuss the simpler case of Recipe (1) for which we already consider stochastic equations that are described by Eq. (1).In addition, we assume that the deterministic component of the evolution of state vector xðtÞ is given by the flow FðxÞ; i.e., the interactions are known in terms of their functional form but not their strength.If the type of interaction is polynomial, we can immediately apply the formalism above to determine the strength of interactions corresponding to the unknown parameters in the interaction terms.In addition, our method can be generalized to nonpolynomial expressions in FðxÞ, which makes it even more versatile.It allows one to estimate, from time series, strengths of interactions and control parameters in any given functional form FðxÞ resulting from a modeling process that could generate many applications to problems in physics, where first principles already determine the deterministic form of the equations.

B. Characterization of interactions from the stochastic part of the dynamics
In a manner similar to the estimation of constant drift terms α i and the strength of interaction matrix A and tensors C and E in the deterministic component of the dynamics F from multivariate time series [see Eq. ( 3)], here, we characterize strengths of interactions in the stochastic component G of Eq. (1) [14,54,55].In Eq. (1), the elements g ij ðxÞ of the multiplicative tensor GðxÞ are linked to the diffusion matrix (the second-order Kramers-Moyal coefficients of the N -dimensional time series) as follows: D Similar to the expansion in Eq. ( 3), we can likewise expand the diffusion matrix.For a general N -dimensional time series, expanding each component of D ð2Þ ij ðxÞ (comprising N × N components) up to, for example, the third order, yields strength of interaction tensors P, Q, R, and S, each multiplied by the noise η j ðtÞ.These interaction tensors can also be expressed in terms of unconditional, small-time-lag correlation functions and statistical moments of a multivariate time series of dimension N (see Appendix D for further details).We note that adding stochasticity in Eq. (3), such as multiplicative Wiener noise, will not change the form of the drift vector in the Itô interpretation of driven dynamical equations (1) [15].The explicit form of the matrix elements g il ðxÞ obtained from the diffusion coefficients D ð2Þ ij ðxÞ is given in Appendix E, also for the case of correlated noise.
For the correlated white noise in Eq. (1), with covariance matrix h, the diffusion coefficients are given by D ð2Þ ðxÞ ¼ GhG T , and the drift term will not be affected (see Appendix E).

C. Interim summary
Our method to characterize higher-order interactions encompasses two distinct sets of quantities that we estimate from a given multivariate time series.The first set comprises constant drift terms α, the interaction matrix A, and tensors C and E for the deterministic part FðxÞ of Eq. (1), which are derived from the drift vector D ð1Þ i ðxÞ.The second set comprises tensors P, Q, R, and S for the stochastic part GðxÞ of Eq. (1), which are derived from the diffusion matrix D ð2Þ ij ðxÞ.These matrices serve different purposes: The linear term A and the constant tensors P and Q describe the linear characteristics of N -dimensional time series, while tensors C; E; … and R; S; … account for higher-order interactions in the deterministic and stochastic parts of the dynamics (see Appendix E for further details).Nonvanishing Q, R, S signify the multiplicative nature of the multivariate time series under study.Importantly, these sets enable us to reconstruct the dynamics of the multivariate time series using the estimated strengths of interactions.
The computational complexity of our method to estimate pairwise and higher-order interactions has, at most, Oðn 3 Þ, which originates from the complexity of solving an n-by-n system of linear equations to calculate α, A; C; E; Á Á Á and P; Q; R; S; Á Á Á.
We now demonstrate our approach with two types of interacting systems: For a proof of principle, we consider (1) complex systems with one to nine interacting components possessing different nonlinear interactions, and (2) a networked dynamical system that possesses nonlinear higher-order interactions.

IV. INVESTIGATING THE METHOD'S SUITABILITY
We used multidimensional nonlinear dynamical systems (Table I), for which the strengths of interactions between state variables are known in the form of preset control parameters [cf.Recipe (1)] in the respective governing equations (the phase portraits of these and other exemplary systems are presented in Appendix H).We generated synthetic time series of length N by numerically integrating these equations after adding Gaussian-independent white noise η (time derivative of the Wiener process with intensity σ).To do so, we used an adaptive, embedded, Rößler-type, stochastic, Runge-Kutta integration method [56][57][58].The integration time step dt was chosen as a trade-off between the stability of numerical integration and computational costs.Including the aforementioned noise enabled us to estimate the deterministic and stochastic components of the dynamics using only one noise realization.However, this required a noisy trajectory long enough to assume stationarity and ergodicity (large number of data points N) instead of having many realizations starting from different positions in state space.If the trajectories obtained from simulations were only short, we performed ensemble averaging.
We assessed the accuracy of our method at various stages of the estimation procedure with statistical tests (cf.Appendix G).First, we evaluated the matching between the true drift functions computed for each component F i (xðtÞ) and the estimated drift functions e F i (xðtÞ) based on the simulated xðtÞ (we report the coefficient of determination R 2 ; for a perfect match, R 2 ¼ 1).Next, we considered the entire dynamics, which arises from both the deterministic drift F(xðtÞ) and the stochastic part G(xðtÞ) in the dynamical equation (1).We evaluated the equality of the CDFs of time series generated from the true functions and parameters with the CDFs of time series generated using the estimated functions and parameters (two-sample KS test; we report the KS statistic D KS and the corresponding significance level p KS ).Eventually, we estimated and reported the differences Δ between estimated strengths of interactions and preset values.We performed these analyses for different integration times T ¼ Ndt and averaged over 50 realizations of applied noise.

A. Dynamical equations 1. Example A1
A two-dimensional dynamical system for which FðxÞ is a polynomial of third order (four-way interactions).The dynamical equations read Time series of x 1 and x 2 exhibit transitions between two stable states [Fig.1(a)].This example can be considered as a test case for which the estimated coefficients can be interpreted as strengths of interactions in the expression of FðxÞ in Eq. (3) [Recipe (1)].The results of characterizing higher-order interactions from simulated time series are presented in Fig. 1.

Example A2
A host-immune-tumor model, also known as the dynamical model of cancer growth, is given by [59] This model is used to demonstrate a test case for a nonpolynomial function FðxÞ.More details are given in Appendix H; here, we note that all control parameters in the equations are positive.Exemplary time series of x 1 , x 2 , and x 3 and results of characterizing higher-order interaction are presented in Fig. 2.

Example A3
The Malthus-Verhulst model describes population growth in ecology, where the growth rate is proportional to the population size and limited by a carrying capacity.TABLE I. Dynamical equations (DE) and networks dynamical systems (NDS) used to investigate the suitability of our proposed method to characterize higher-order interactions.Results for dynamical equations A1, A2, A3, and A7 as well as for the networks dynamical systems are reported in the main text; for an in-depth discussion of these and the other models, see Appendixes H and I. Here, N denotes the dimensionality of the system (for NDS, N equals the network size), and T 95 denotes the integration time T ¼ Ndt for which the coefficient of determination is R 2 ≳ 0.95 between estimated and actual drift functions.Introducing randomness into the capacity parameter gives rise to multiplicative stochastic dynamical equations, known as the noisy Malthus-Verhulst model [60].By incorporating stochasticity, the model provides a more realistic representation of ecological systems, accounting for the uncertainties and variability observed in real-world populations [61].The noisy Malthus-Verhulst model exhibits a noise-induced transition, in which the noise takes on an "active" (multiplicative) role.The dynamics is given by x 2 , and g 12 ¼ g 21 ¼ 0, with ffiffiffi σ p ¼ 0.2) for different integration times T. We find these differences to be smaller for higher noise intensities (see Fig. 11 in Appendix H), which results from the fact that the system explores a larger part of the phase space and, therefore, the noisy time series carry considerably more information about the dynamics.Thus, a larger noise intensity might even be advantageous for our method and can potentially yield better results.(e) Matching (R 2 -score) between true F(xðtÞ) and estimated drift functions F(xðtÞ) based on the simulated x i ðtÞ for different integration times T. Error bars represent the standard errors of the means derived from 50 different realizations of applied noise.In panels (c)-(e), lines are only to guide the eye.
where ν and Γ are real constants and η is white noise.More details are given in Appendix H, and exemplary time series of the dynamics before and after the noise-induced transition at Γ ¼ 1 are shown in Fig. 3(a).

Example A7
A nine-dimensional polynomial Lorenz-type model was proposed in Ref. [62] to study high-dimensional chaos.
The dynamical equations as well as the full statistical analysis to present the accuracy of our method, even in a highdimensional case, are shown in Appendixes H and J.We estimated 9 × 55 different coefficients, and the results are depicted in Fig. 4.
The findings achieved so far already underline the high suitability of our data-driven method to characterize the strength of higher-order interactions in nonlinear dynamical systems and in the presence of both additive and multiplicative noise.In Appendix H, we highlight the generality of our findings with a variety of other dynamical systems FIG. 2. Same as Fig. 1 but for Example A2 [Eq.( 13)].Initial conditions were chosen randomly from the interval (0, 1) for every dynamical variable.The state variables x 1 ðtÞ, x 2 ðtÞ, and x 3 ðtÞ in Eq. ( 13) have positive values [see panel (a)].To ensure positivity of time series, we included multiplicative noise terms x i ðtÞ ffiffiffi σ p η i in Eq. ( 13).The integration time step was dt ¼ 0.01 and ffiffiffi σ p was fixed at 0.001.The KS test indicated no differences between true and estimated CDFs (T ¼ 4000; for x 1 ðtÞ, D KS ¼ 0.02; for x 2 ðtÞ, D KS ¼ 0.02; and for x 1 ðtÞ, D KS ¼ 0.03; p KS < 10 −4 for all cases).After estimating the diffusion coefficient D ð2Þ ij ðxÞ, σ can be found using the method presented in Appendix E. Details for solving this nonpolynomial example and estimating the unknown parameters are given in Appendix H.
(cf.Table I), each of them containing a different challenge.Our findings, however, also indicate that the integration time T constrains the method's accuracy.In general, the larger the sample size, the higher the accuracy; however, the choice of sample size depends on the system under investigation, which would need to be taken into account when investigating systems for which one does not have access to the ground truth.
As regards analyses of empirical data, another important aspect is that the order Z of the expansion (3) is a priori not known.Since an inappropriate choice of Z may deteriorate the method's accuracy, in Appendix J, we discuss a procedure to estimate the highest order for a given integration time T [51].In this context, we also assess the quality of the estimation of higher-order statistical moments.Eventually, in Appendix K, we investigate how observational noise impacts the estimation of the strength of higher-order interactions.

B. Networked dynamical system
We now apply our approach to a networked dynamical system and reconstruct pairwise and higher-order interactions from time series [Recipe (1)].The system consists of Kuramoto-Sakaguchi phase oscillators [63][64][65] coupled onto networks of different sizes and with different preset pairwise and higher-order interactions.The phase dynamics at node (oscillator) i reads Here, θ i denotes the phase of oscillator i, ω i is its natural frequency [to be drawn from some distribution pðωÞ; for instance, the uniform distribution in an interval], and K 1 , K 2 , FIG. 3. Same as Fig. 1 but for Example A3 [Eq.( 14)] simulated with time step dt ¼ 0.01.Initial conditions were chosen randomly from the interval (0, 1) for every dynamical variable.Excerpts of exemplary time series before (ν ¼ 1, Γ ¼ 0.6; blue) and after (ν ¼ 1, Γ ¼ 1.1; light blue) the noise-induced transition are shown in panel (a).For this one-dimensional example, the applied noise is multiplicative.After estimating the diffusion coefficient D ð2Þ ðxÞ, the value of Γ can be found from the relation and K 3 are the coupling strengths of the 1-, 2-, and 3-simplex interactions, respectively.The mutually independent Gaussian white noise η i has intensity σ.The notation in Eqs.(15) follows Refs.[63,[66][67][68]; we note that, in our approach, the strengths of the interactions are the product of the adjacency matrix a and the tensors c and e with the coupling strengths K 1 , K 2 , and K 3 .
Let us consider an undirected and unweighted network, which is encoded in the 1-simplex adjacency matrix a, the 2-simplex adjacency tensor c, and the 3-simplex adjacency tensor e, where a ij ¼ 1 if nodes i and j are connected by a link (and otherwise a ij ¼ 0), c ijk ¼ 1 if nodes i, j, and k belong to a common 2-simplex (and otherwise c ijk ¼ 0), and e ijkl ¼ 1 if nodes i, j, k, and l belong to a common 3-simplex (and otherwise e ijkl ¼ 0).We represent all nonzero elements of the adjacency matrix and the respective tensors corresponding to node i in a compact form as i: ð½a ij ; c ijk ; e ijkl Þ or in a simplified form as i: ð½j; jk; jklÞ.
For oscillator networks of size N ∈ f3; 4; 6g [see upper panel of Fig. 5 for their 1-simplex (pairwise) structures], we proceed as above and integrate Eqs.(15) with time step dt ¼ 0.005 after adding Gaussian-independent white noise η with intensity σ ¼ 0.25 to each dynamical equation.We generated 50 ensembles of time series of θ i , each consisting of N ¼ 2 × 10 6 data points (T ¼ 10000), for different network configurations presented in the following.
For this network configuration, we check to what extent our reconstruction method generates spurious links for the case in which there is no direct connection between a pair of nodes.To this end, we disconnect nodes 2 and 3 (a 23 ¼ 0), integrate the dynamical equations (15) with control parameters and coupling strengths as before, and obtain K 1 a 23 ¼ −0.00010 AE 0.00007.This finding points to an estimation error of the order of 10 −4 for a link with a ij ¼ 0. These findings emphasize the accuracy of our reconstruction method also in the case of a networked dynamical system.

Configuration 2
We next consider a fully connected, undirected oscillator network with N ¼ 4 nodes that features pairwise, threeway, and four-way interactions (see Fig. 5 for nonzero elements of interaction).We fix the coupling strengths (K 1 ¼ −0.100,K 2 ¼ 0.060, K 3 ¼ 0.030) and the natural frequencies (ω 1 ¼ 0.605, ω 2 ¼ 0.885, ω 3 ¼ 0.602, ω 4 ¼ 0.950), and obtain strengths of interactions K 1 a ij , K 2 c ijk , and K 3 e ijkl as depicted in Fig. 5. Given our approach, we expected two peaks for K 1 a ij : one near the preset value of K 1 (with a ij ¼ 1) and another near zero (for the elements a ij ¼ 0).This argument is valid also for K 2 c ijk and K 3 e ijkl .FIG. 6. Clock-type representation of nonvanishing elements of the 1-simplex adjacency matrix a, the 2-simplex adjacency tensor c, and the 3-simplex adjacency tensor e for a network of size N ¼ 6, depicted in Fig. 5.The three circles around the central nodes i ¼ 1; …; 6 present the elements of the matrix a and of tensors c and e, respectively.Nonzero elements are connected with lines and with different colors from the central node to the elements in each circle.As an example, we have chosen the links from each node as i ¼ 1∶ð½5; 25 With the knowledge that the adjacency matrix a ij and the respective tensors c ijk and e ijkl take the values 0 or 1 and by discarding the three peaks at the origin (K 1 a ij ¼ K 2 c ijk ¼ K 3 e ijkl ≃ 0), we obtain K 1 ¼ −0.099 AE 0.015, K 2 ¼ 0.060 AE 0.002, and K 3 ¼ 0.030 AE 0.002.Although the higher-order strengths of interactions (K 2 c ijk and K 3 e ijkl ) are small compared to the 1-simplex interaction strength (K 1 a ij ), our approach extracts the strengths of interactions from the simulated time series with a very good precision.The same holds true for the mean natural frequencies, for which we obtain ω 1 ¼ 0.605 AE 0.003, ω 2 ¼ 0.882 AE 0.004, ω 3 ¼ 0.602 AE 0.002, and ω 4 ¼ 0.952 AE 0.004.

Configuration 3
We now consider an oscillator network of size N ¼ 6 (see Fig. 5) that undergoes an abrupt synchronization transition via hysteresis and bistability of synchronized and incoherent states due to the presence of higher-order interactions [68].This network also has the potential to feature pairwise, three-way, and four-way interactions, and the chosen nonvanishing elements of the adjacency matrix a and the respective tensors c and e are depicted in Fig. 6.For fixed natural frequencies ω i ¼ ð0.883; 0.549; 0.946; 0.948; 0.555; 0.636Þ and coupling strengths (K 1 ¼ −0.010,K 2 ¼ 0.020, K 3 ¼ 0.010), we proceed as before, obtain the strengths of interactions K 1 a, K 2 c, and K 3 e [Fig.7(a)], and yield the estimated coupling strengths as K 1 ¼ −0.010 AE 0.004, K 2 ¼ 0.019 AE 0.007, and K 3 ¼ 0.010 AE 0.001, which is again rather close to the respective preset values.Figure 7(b) illustrates the differences between estimated and preset natural frequencies.Details of identifying the nonzero elements of the adjacency matrix a, and the respective tensors c and e for the case where K 1 , K 2 , and K 3 are constants and for the more general case of general weighted networks (where and K 3 e ijkl → K 3ijkl e ijkl with arbitrary K 1ij , K 2ijk , and K 3ijkl ), are given in Appendixes I and F.
For this network configuration, we study the robustness of our method with respect to the presence of different global states in the network (incoherent, bistable, and synchronized).To this end, we estimate the natural frequencies for three 1-simplex coupling strengths K 1 ∈ f−0.1; 0.1; 1.0g [K 1;low (incoherent), K 1;mid (bistable), K 1;high (synchronized)] and for fixed K 2 ¼ 0.020 and K 3 ¼ 0.010.We observe the estimated frequencies to be rather close to the respective preset values regardless of the different global dynamical states [Fig.7(b)].

Configuration 4
Finally, we evaluate the performance of our method with respect to the presence of an indirect path and a spurious link due to a common driver.To this end, we study a directed oscillator network of size N ¼ 6 (shown in the left panel of Fig. 8), for which the preset values of matrix elements a 13 and a 16 are zero.We fix the coupling strengths (K 1 ¼ −1, K 2 ¼ 0.020, K 3 ¼ 0.010) and the natural frequencies [ω i ¼ð0.883;0.549;0.946;0.948;0.555;0.636Þ],and our reconstruction method yields strengths of interactions K 1 a 13 ¼ 0.0004 AE 0.0001 and K 1 a 16 ¼ 0.007 AE 0.002.For the nonvanishing elements a ij , we obtain K 1 a ij ¼ −1.0044 AE 0.0001, K 2 c ijk ¼ 0.019 AE 0.003, and K 3 e ijkl ¼ 0.0095 AE 0.0012 (cf.middle panel of Fig. 8); the reconstructed natural frequencies are (0.890AE0.036;0.553AE0.023;0.955AE0.030;0.986AE0;141;0.552AE0.043;0.636AE0.001).The notably small strengths of interactions for the indirect path Oð10 −4 Þ and for the spurious link Oð10 −3 Þ indicate that our approach can handle such ambiguities quite well.estimated and preset natural frequency Δω i of each oscillator for three 1-simplex coupling strengths K 1;low , K 1;mid , and K 1;high ) and for K 2 ¼ 0.020 and K 3 ¼ 0.010.Error bars denote the standard error of the mean obtained from 50 realizations with integration times T ¼ 5000 and T ¼ 15000.

V. CONCLUSIONS AND DISCUSSIONS
The study of many natural and manmade complex systems in terms of complex networks has been a tremendous success over the past decades.One research focus has been on how to model such systems based on first principles, including the design of appropriate couplings between subsystems in a network context.A second research focus has addressed the problem how to construct a network from multivariate time series.As a first approximation, one needs to estimate an "adjacency matrix," which identifies the interactions between pairs of network components or between subsystems associated with network nodes.Network links can be derived from bivariate statistical measures such as cross-correlation coefficients, mean phase coherence, mutual information, causal directionality, etc.; see, for instance, Refs.[37,38,43,70,71].The matrix then provides an averaged pairwise influence of a given time series over others.However, nowadays, it is believed that interactions in a network often take place between multiple nodes [46,49].These higher-order interactions appear, in general, when the physical, chemical, or biological interactions between components of a complex network are modeled.Consequently, they need to be considered when reconstructing the dynamics of a networked system from multivariate time series.In addition, one has to bear in mind that all natural systems are subject to noise.
The approach developed in this paper provides a novel method to characterize higher-order (≥2) interactions, which we achieved by combining the following: (i) the idea of approximating the deterministic and stochastic dynamics of a system, including interactions of higher orders based on observed time series, and (ii) the estimation of drift and diffusion terms employing the Kramers-Moyal coefficient approach.By investigating time series from various high-dimensional model systems with preset higher-order interactions and thus with access to ground truth, we could demonstrate that our approach can be successfully applied to reconstruct high-dimensional complex systems with many interacting components as well as networked systems where the dynamics of each node is represented by one state variable, based of multivariate time series.Our approach yields estimates for pairwise (matrix A) and higher-order (tensors C and E) interactions.It is important to note that the interaction matrix A in our approach is fundamentally different from the aforementioned adjacency matrix [see Figs.8(b) and 8(c)].This difference is due to the fact that, from the first step, we started from the dynamics encoded in a time series and, hence, our estimated interaction matrix A provides local dynamical properties of state variable x i .Therefore, our estimated interaction matrices and tensors in different orders originate from the dynamics of the system under consideration.
The data-driven characterization of interactions of different order (including pairwise, three-way, and higher-order) in open and adaptive high-dimensional complex systems is rapidly gaining increasing importance.Nowadays, there are many approaches to reconstruct stochastic models from data.Examples include the generalized Langevin equations [72,73], fractional Klein-Kramers equations [74] and underdamped Langevin equations [75], as well as discrete-time ARFIMA (autoregressive fractionally integrated moving average) and NARMA (nonlinear autoregressive moving average) models [76,77], compressed sensing [78,79], phase dynamics [80], event timing patterns [81], model-free inference of direct network interactions [82], detecting hidden units [83], and statistical inference of such   models from stochastic processes with long-range correlations and fractional diffusion processes [84,85].Our method goes beyond these approaches.It allows one to derive important characteristics of higher-order interactions in deterministic and stochastic components of the dynamics with high accuracy from the statistical moments and smalllag correlation functions of time series measured in subsystems of a complex system.In addition, it provides detailed insights into a subsystem's (nodal) dynamics.Extensions of our approach to dynamical systems with time-delayed interactions will be an imminent consequence of methods presented in this work.We believe that our approach provides a methodology that can help improve our understanding of the impact of higher-order interactions in many fields of science, including neuroscience, physics, chemistry, biology, informatics, finance, data science, ecology, climate dynamics, etc. Research along this line is already underway and will be published elsewhere [86].
The code [87] contains all the necessary details for calculating both pairwise and higher-order interactions from multivariate time series data.

APPENDIX A: PAWULA THEOREM
In general, the probability distributions of N -dimensional Markov processes (in practice, Markovianity of a given time series should be verified via some statistical test [15]) satisfy a first-order partial differential equation in time and an infinite order of differentiation with respect to the state variable.The governing equation is known as the Kramers-Moyal equation [15,52,53].The Pawula theorem [88] states that there are only three possible cases in the KM expansion up to order n: (i) The KM expansion is truncated at n ¼ 1, meaning that the process is deterministic; (ii) the KM expansion stops at n ¼ 2, with the resulting equation being the Fokker-Planck equation that describes diffusion processes, and finally, (iii) the KM expansion contains all the terms up to n → ∞.Any truncation of the expansion at a finite order n > 2 would produce a nonpositive probability density pðx; tÞ [15,53,88].For case (ii), the KM expansion reduces to the Fokker-Planck equation, which means that the first and second KM coefficients D ð1Þ ðx; tÞ (drift coefficient or drift vector) and D ð2Þ ðx; tÞ (diffusion coefficient) would be nonvanishing, whereby there is a possibility of vanishing D ð1Þ ðx; tÞ.Now, one can ask which dynamical equation governs the stochastic variable x itself, where its marginal and conditional PDFs satisfy the Fokker-Planck equation.The corresponding stochastic equation is known as the Langevin equation.With the Itô interpretation of a stochastic integral, it has the following form [14,15]: where η j ðtÞ is independent Gaussian white noise with unit intensity, i.e., hη k ðt 1 Þη j ðt 2 Þi ¼ δ k;j δðt 2 − t 1 Þ.In the Itô interpretation, one has F ¼ D ð1Þ ðx; tÞ in Eq. (1) in the main text.The diffusion matrix of the dynamics is given by the functions g ij ðxÞ as D ð2Þ ij ðxÞ ¼ P N l¼1 g il ðxÞg jl ðxÞ and is stated in terms of second-order conditional moments of the N -dimensional time series (see Appendixes D and E).

APPENDIX B: BASIC IDEA TO DETERMINE THE LINEAR DRIFT VECTOR
Let us describe the general idea that can be used to determine elements of the matrix A (in addition, a constant vector α) in Eq. (2), in terms of statistical moments and short-time (small-lag) correlation functions of an N -dimensional time series.Consider the N -dimensional linear process xðtÞ satisfying where the drift vector D i ðx; tÞ is given by with i ¼ 1; …; N and y i ðt; τÞ ¼ x i ðt þ τÞ − x i ðtÞ.By defining ϕ i;j ðτÞ ≡ ϕ i;j ¼ τA i;j for j > 0 and ϕ i;0 ðτÞ ≡ ϕ i;0 ¼ τα i , using Eq.(B1), we have Here, we assume that, besides the linear interactions represented by A, the dynamics could even contain a constant contribution corresponding to the term ϕ i;0 .The conditional average is defined as Multiplying both sides of Eq. (B4) by the N -point joint probability distribution function pðx 1 ; x 2 ; …; x N Þ and integrating over all state variables results in If we repeat the same procedure by multiplying x j pðx 1 ; x 2 ; …; x N Þdx 1 dx 2 Á Á Á dx N , we then obtain The following equations will give the coefficients ϕ i;j : Using the definition y i ¼ x i ðt þ τÞ − x i ðtÞ, the problem of calculating ϕ i;j ðτÞ, A i;j ¼ lim τ→0 ϕ i;j ðτÞ=τ, and α i ¼ lim τ→0 ϕ i;0 ðτÞ=τ will be reduced to calculating correlation functions of hx i ðt þ τÞx j ðtÞi and hx i ðtÞx j ðtÞi.Equation (B7) is a linear set of equations for ϕ i;j , and the values of these coefficients are a complex mixture of all two-point correlation functions of multivariate time series in time lags 0 and τ.
To meet the limit τ → 0, we can calculate ϕ i;j ðτÞ for τ ¼ dt; 2dt; …; kdt (here, dt is the inverse of the sampling rate used for data acquisition), and subsequently, a linear regression is performed for the first three time lags to find α i and A i;j [89].An alternative approximation for constants α i and matrix A is to use ϕ i;0 ðτ ¼ dtÞ=dt and A i;j ¼ ϕ i;j ðτ ¼ dtÞ=dt, respectively [15].

APPENDIX C: DRIFT VECTOR: HIGHER-ORDER INTERACTIONS
For the higher-order terms in Eq. (B1) [and Eq. (3) in the main text], the procedure to find the Kramers-Moyal coefficients is the same, and the interaction matrices can be given in terms of statistical moments and small-lag correlation functions of the N -dimensional time series.To consider higher-order interactions, e.g., up to third order in x j , the right-hand side of Eq. (B4) can be written as where ði; j; k; lÞ ¼ 1; …; N .Here, the τ-dependent αi and matrices ϕ i;j ; ψ i;jk , and γ i;jkl are related to interaction matrices A, C, and E as A i;j ¼ lim τ→0 ϕ i;j ðτÞ=τ, C i;jk ¼ lim τ→0 ψ i;jk ðτÞ=τ, and E i;jkl ¼ lim τ→0 γ i;jkl ðτÞ=τ, respectively.Additionally, α i ¼ lim τ→0 αi ðτÞ=τ.Similar to the linear case, we can multiply both sides of Eq. (C1) by the N -point joint probability distribution function pðx 1 ; x 2 ; …; x N Þ, with x p , x p x q , and x p x q x r , and find hy i ðt; τÞx p x q i ¼ αi hx p x q i þ X j ϕ i;j hx j x p x q i þ X j;k ψ i;jk hx j x k x p x q i þ X j;k;l γ i;jkl hx j x k x l x p x q i; ðC4Þ hy i ðt; τÞx p x q x r i ¼ αi hx p x q x r i þ X j ϕ i;j hx j x p x q x r i þ X j;k ψ i;jk hx j x k x p x q x r i þ X j;k;l γ i;jkl hx j x k x l x p x q x r i: ðC5Þ Therefore, we can find the unknown coefficients in the expansion (C1) as the solution of the following set of linear equations: To approach the limit as τ → 0, we can calculate α i ðτÞ ¼ αi ðτÞ=τ, A i;j ðτÞ ¼ ϕ i;j ðτÞ=τ, C i;jk ðτÞ ¼ ψ i;jk ðτÞ=τ, and E i;jkl ðτÞ ¼ γ i;jkl ðτÞ=τ, for τ ¼ dt; 2dt; …; kdt, and apply a linear regression to their estimated values for the first three time lags [89].An alternative method for approximating the coupling strengths is to estimate them at τ ¼ dt [15].The existence and uniqueness of the estimated elements of matrices or tensors α, A, C and E are presented in Appendix F.

APPENDIX D: DIFFUSION COEFFICIENTS
The diffusion matrix is defined as [15,53] We define y ij ðt;τÞ ¼ (x i ðt þ τÞ − x i ðtÞ)(x j ðt þ τÞ − x j ðtÞ), and similar to the expansion of Eq. (C1), we can expand the diffusion matrix.Therefore, we write where all matrices α ij , etc., are τ dependent and the zero limit τ → 0 should be taken.Similar to the drift vector expansion, we can multiply both sides of Eq. (D2) by the N -point joint probability distribution function pðx 1 ; x 2 ; …; x N Þ, with x p , x p x q , and x p x q x r , and find For each element D ð2Þ ij ðxÞ in the diffusion matrix expansion, we have m ¼ ½ðN þ ZÞ!=N !Z! unknown coefficients.Therefore, we find the unknown coefficients in the expansion (D2) as the solution of the following set of linear equations: where i; j ¼ 1; …; N .The interaction matrices from diffusion coefficients are  the matrix G can be produced by the lower triangular matrix G l , the upper triangular matrix G u , and the symmetric matrix G s as respectively.In the symmetric case, we need to have and kðl þ hÞ ¼ c, resulting in nine positive solutions for k, h, and l [15].The generalization of the results for a higher-dimensional diffusion matrix N ≥ 3 is readily shown for G l and G u [15].For example, the lower or upper triangular matrix may be determined by the Cholesky decomposition method, which is also applicable to higher dimensions [90,91].Therefore, using the relation between the diffusion matrix D ð2Þ and the matrix G, we may choose one of the possible forms for G via relations (E1) to compute its entries.
In all examples with dimension N ≥ 2, we use the upper triangular matrix construction for matrix G from the diffusion matrix D ð2Þ ij ðxÞ.Another method to derive the matrix elements g ij in any dimension N from the diffusion matrix D ð2Þ ij ðxÞ is as follows.Since D ð2Þ ij ðxÞ is a symmetrical, positivesemidefinite matrix, it has only real, non-negative eigenvalues λ 1 ; …; λ N [15,92].Therefore, an orthogonal transformation U can be found which diagonalizes D ð2Þ such that U T D ð2Þ U ¼ diagðλ 1 ; …; λ N Þ.When taking the positive root of the eigenvalues and transforming them back, for each element of G, we obtain For the correlated white noise specified in Eq. (1), i.e., hη i ðtÞη j ðt 0 Þi ¼ h ij δðt − t 0 Þ, where the matrix h denotes the covariance matrix, the diffusion coefficients can be expressed as D ð2Þ ðxÞ ¼ GhG T .Importantly, with the Itô interpretation, the drift term remains unaffected.This finding arises from the observation that in Eq. (1), when computing the diffusion matrix D ð2Þ ij ðxÞ, one encounters the term hGðx; tÞηðtÞGðx; t 0 Þηðt 0 Þi, leading to the relation D ð2Þ ðxÞ ¼ GhG T .

Multivariate Gaussian distribution
For an N -dimensional stochastic dynamical system with pairwise interactions and additive noise, denoted by constant A and P in Eqs. ( 3) and ( 11), the joint probability distribution of the state variables will follow an N -dimensional Gaussian distribution.
Consider an N -dimensional linear system with additive noise ηðtÞ (multivariate Ornstein-Uhlenbeck process) given by where matrices A and P are constant N × N matrices.
If the real parts of the eigenvalues of A are negative, the stationary probability density function of x i is expressed as: where the matrix Σ is determined from the Lyapunov equation AΣ þ ΣA T ¼ −PP T .This expression (E3) represents the stationary probability density function for the given N -dimensional linear system with additive Gaussian white noise.

APPENDIX F: EXISTENCE AND UNIQUENESS OF ESTIMATED COUPLING STRENGTHS AND ADJACENCY MATRICES
The main idea of our methodology for finding the coupling strengths in Eqs.(C6) and (D7) and, in addition, the adjacency matrices in Eq. ( I7) is based on solving a system of linear equations.The Rouché-Capelli theorem determines the existence and uniqueness of the solutions to a linear system [93].For a given number of unknowns, the number of solutions depends only on the rank of the coefficient matrix and the rank of the corresponding augmented matrix.According to this theorem, a system of linear equations has no solutions if the rank of the augmented matrix is greater than the rank of the coefficient matrix; on the other hand, if the ranks of the two matrices are equal, there must exist at least one solution.The solution is unique if and only if the rank is equal to the number of variables.Otherwise, the general solution has k free parameters, where k is the difference between the rank and the number of variables; in this case, there is an infinite number of solutions to the linear system.

APPENDIX G: ASSESSING THE METHOD'S ACCURACY 1. Sources of error in estimating the strengths of interactions
The core concept of our methodology for determining the strength of an interaction, outlined in Eqs.(C6) and (D7), involves solving a set of linear equations.We define the right-hand side of Eqs.(C6) and (D7) as the vector ϕ, which encompasses all strengths of interactions, and we represent the left-hand side as Y. Additionally, we denote the matrix of statistical moments as M. Consequently, we seek to find the solution to the set of linear equations expressed as The solution is considered unique if and only if the rank of the matrix of statistical moments matches the number of variables, as detailed in Appendix F. The characteristics and properties of matrix M and of the N -dimensional time series (including potential oversampling, i.e., having measured the same time series for at least two state variables) can significantly influence the convergence behavior and reliability of the estimation process.Notably, if matrix M possesses eigenvalues that are close to zero, this can result in slower convergence or even numerical instability during the estimation procedures of the strengths of interaction.In such scenarios, it becomes essential to employ regularization techniques (such as stochastic gradient descent optimization) to guard against overfitting or to consider methods for reducing dimensionality [94].
We did not use any regularization techniques for the examples presented in this paper.We relied on linear and nonlinear solvers.The code [87] accompanying this work contains all the necessary details for calculating both pairwise and higher-order interactions from multivariate time series data.An error in estimating the strengths of interactions from ϕ in Eq. (G1) originates from modifications to It is worth noting that we account for the possibility that M might not be directly invertible.Equation (G2) provides us with error propagation, showing how δϕ measures errors in M and Y, which propagate through the solution process, affecting the solution ϕ.By varying the order of interactions Z (Z ¼ 0; 1; …), we can find errors in estimating the strengths of interactions δϕ.
To determine the errors in the estimation of M and Y, we consider that the vector Y and the matrix M encompass various statistical moments, including hx k i and mixed moments.When we have a single realization of an N -dimensional time series, we can employ the following approaches to estimate the error of these statistical moments: (i) A straightforward method is to estimate their mean squared error given by , where N is the number of data points.(ii) Alternatively, we can utilize the bootstrap method to estimate the moments' confidence intervals.By utilizing Eq. (G2) and possessing knowledge of the errors associated with M and Y, we can calculate the error δϕ in the estimations of the strengths of interactions.This error is impacted by the integration time T as well as by the order of interactions Z. Finally, in cases where we have access to several realizations of N -dimensional time series, we can perform ensemble averaging of the estimated strengths of interactions from each realization.
For all examples presented in this paper, we report-for different integration times T-means and errors (mean squared error from all strengths of interactions in each realization) of the strengths of interactions from 50 realizations of applied noise.

Comparing true and estimated drift terms: Coefficient of determination
In cases where we have access to ground truth (e.g., polynomial or nonpolynomial drift functions), we can assess the similarity between two drift functions-F, which is based on known (preset) coupling strengths and parameters, and F, which we estimate using our method-by calculating the coefficient of determination (R 2 -score).In the case of a perfect match, R 2 ¼ 1.
In higher dimensions (such as for N -dimensional multivariate time series), the drift function F in the dynamical equation (3) comprises preset vectors, matrices, and tensors denoted as α i , A ij , C ijk , and E ijkl .Given an N -dimensional multivariate time series with N data points and sampling interval dt, we can estimate αi , Ãij , Cijk , and Ẽijkl (ði; j; k; lÞ ¼ 1; 2; …; N ) using the relations (C6).To assess the quality of these estimations, we examine the time dependency of F(xðtÞ) and of F(xðtÞ) in Eq. ( 3) by substituting the simulated x i ðtÞ.This method involves preserving the simulated N -dimensional multivariate time series xðtÞ and then reinserting them into the expressions for the true drift function F(xðtÞ) and the estimated drift F(xðtÞ).This process yields two sets of time series, each representing the values of the two drifts at each time point t.We perform these substitutions for each component of the two drift functions.Since the precision of the estimated drift function depends on integration time T, R 2 values will also be influenced by T. For all dynamical equations (A1)-(A7), we report R 2 values for various T for each component of the respective drift function.
A similar analysis can be conducted for the stochastic component G in Eq. (1).However, in the majority of our examples, we employed additive noise, which means that the tensor G has constant values and is independent of x.

Comparing true and reconstructed dynamics: Kolmogorov-Smirnov test
Having estimated functions and parameters for both the deterministic drift FðxðtÞÞ and the stochastic part Gðx; tÞ in the dynamical equation (1) allows us to generate time series of the dynamics and to compare them with time series generated with the preset functions and parameters.To this end, we estimate the CDFs for each of the aforementioned time series (F e and F p , each with the maximum integration time T; the subscript "e" and "p" refer to the estimated, respectively, preset CDF) and use the nonparametric two-sample KS test to determine whether the two CDFs differ.The maximum "distance" between the two CDFs, vanishes for equal CDFs.For all dynamical equations (A1)-(A7), we report D KS and the corresponding significance level p KS .In Fig. 9, we demonstrate, with the true and reconstructed dynamics from Example A1, the dependence of the KS statistic D KS on integration time T. Clearly, D KS decreases with increasing T, and we observe a similar dependence for the other examples A2-A7.We note that for some realizations of the applied noise, the reconstructed dynamics might be unstable for the same integration time step dt, and one may need to reduce dt.

APPENDIX H: EXEMPLARY NONLINEAR DYNAMICAL SYSTEMS
In the following, we provide details for the exemplary low-and high-dimensional nonlinear dynamical systems investigated in this work to demonstrate the suitability of our method.These systems comprise dynamical equations that are polynomial or nonpolynomial functions of the state variables as well as a stochastic process that undergoes a noise-induced transition by changing the intensity of the external noise; the deterministic component is a given function with constant coefficients.

Example A1
The system contains preset higher-order interactions up to third order and possesses three fixed points: one saddle and two stable spirals.The values of the estimated coefficients are given in Fig. 1; the two-dimensional phase portrait of Eq. (H1) is shown in Fig. 10.The noisy trajectory (see Fig. 1 in the main text) exhibits transitions between the two stable states.

Example A2
A dynamical model of cancer growth is given by the following three-dimensional, nonpolynomial dynamical equations containing a rational term [59]: x 1 (t) between host cells x 2 and tumor cells x 1 results in a loss term of tumor cells, which is given by a 12 x 1 x 2 .The term a 13 x 1 x 3 refers to the killing rate of tumor cells by the effector cells x 3 .In Eq. (H3), the host cells x 2 also grow logistically.The tumor cells inactivate the healthy cells at a rate of a 21 .The last equation of the model describes the change in the effector immune cell population with time t.
The first term of Eq. (H4) illustrates the stimulation of the immune system by the tumor cells with tumor-specific antigens.The rate of recognition of the tumor cells by the immune system depends on the antigenicity of the tumor cells.Since this recognition process is very complex, a simplified description based on a Michaelis-Menten kinetics [95] has been assumed, whereby the maximum stimulation rate is r 3 and the half-saturation constant is k 3 .The effector cells are inactivated by the tumor cells at a rate a 31 .Natural mortality of the effector cells is modeled proportionally to the population density with a rate d 3 .We note that all control parameters in the equations are positive.
To determine the eight parameters in Eqs.(H2)-(H4), we multiply the discretized equation (H2) by x 1 and x 2 , and after ensemble averaging, we find the linear equations Equation (H5) provides two linear equations for the coefficients a 12 and a 13 .Similarly, we can find two other equations for the unknowns r 2 and a 21 by multiplying the discretized equation (H3) by x 1 and x 2 as which is a linear set of equations for r 2 and a 21 .Likewise, for the remaining coefficients, r 3 , a 31 , k 3 and d 3 , we find four equations by multiplying which can be written as four nonlinear equations for r 3 , a 31 , k 3 , and d 3 , x 2 Þ ¼ 0 (blue) and F 2 ðx 1 ; x 2 Þ ¼ 0 (green dashed lines) and phase portrait for system (H1).The color code of trajectories represents the magnitude of the drift ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi For our investigations, we chose control parameter settings as a 21 ¼ 1.5, d 3 ¼ 0.5, r 2 ¼ 0.6, a 31 ¼ 0.2, r 3 ¼ 4.5, a 13 ¼ 2.5, k 3 ¼ 1, and a 12 ¼ 1, for which the dynamics in state space exhibits a chaotic attractor [59].The estimated values provided by our approach are given in Fig. 2.

Example A3
The Malthus-Verhulst model was originally proposed to describe the evolution of a biological population.Let n denote the number (or density) of individuals of a certain population, which will change due to growth, death, and competition.In the simplest version, birth and death rates x 1 x 1 x 1 x 2 x 2 x 2 0 10 20 are assumed to be proportional to n, while competition is proportional to n 2 [60]: The constant ν is the net growth rate or decay rate, for ν > 0 and ν < 0, respectively.The parameter β can be removed by using the scaling transformation nðtÞ → xðtÞ ¼ βnðtÞ.Suppose that the original growth and/or death rates ν fluctuate in time.Assume that νðtÞ ¼ ν þ ΓηðtÞ, with constant noise intensity Γ and constant ν, where ηðtÞ is a Gaussian white noise.The associated stochastic differential equation is then a multiplicative process, Let us find the PDF for the random variable xðtÞ to investigate possible abrupt changes of behavior as the parameters change.The PDF pðx; tÞ of x at time t satisfies the following Fokker-Planck equation: Its stationary solution in the interval 0 < x < ∞ is where c is the normalization constant.We note that there is a drastic change in the character of this stationary distribution when ν crosses the value Γ 2 : When 0 < ν < Γ 2 , p s diverges at x ¼ 0, while for ν > Γ 2 , p s ð0Þ ¼ 0 (cf.Fig. 12).We note that the drift coefficient −x 2 þ νx is independent of Γ, and, as we expected, the estimated drift coefficients for the two values of Γ ∈ f0.6; 1.1g (see Fig. 3) are independent of Γ, although the simulated time series exhibit a different type of fluctuation as demonstrated in Fig. 3(a).

Example A4
Consider the dynamical equations The flow has three fixed points: (0,0), (1,1), and ð−1; 1Þ, which are one saddle and two attractive spirals, respectively.The values of the estimated coefficients are given in Fig. 14.The corresponding two-dimensional phase portrait of Eq. (H16) is shown in Fig. 13.When noise is applied to this system, as depicted in Fig. 14, the dynamics shows a hopping dynamics between the two attractors corresponding to transitions due to the noise, where the critical noise intensity is 0.

Example A5
The FitzHugh-Nagumo model is given by the dynamical equations Here, x 1 is the membrane potential, x 2 is a recovery variable, and I is the magnitude of an external stimulus current.This model is a two-dimensional simplification of the Hodgkin-Huxley model of spike generation in squid giant axons.The estimated values of the coefficients are given in Fig. 16.The magnitude of the stimulus current is fixed at I ¼ 0.5.The parameters of the system are taken from the oscillatory regime of the FitzHugh-Nagumo 0 .0 0 .5 1 .0 1 .5 2 .0 2 .5 3 .0 x P s (x ) 12. Stationary solution p s of the Fokker-Planck equation (H14) for cases (ν ¼ 1; Γ ¼ 0.6) and (ν ¼ 1; Γ ¼ 1.1).There is a drastic change in the character of p s when ν crosses the value Γ 2 : when 0 < ν < Γ 2 , p s diverges at x ¼ 0, while for ν > Γ 2 , p s ð0Þ ¼ 0. As it can be seen from Eq. (H15), p s has diverging and regular behaviors at the origin for ν=Γ 2 − 1 < 0 and ν=Γ 2 − 1 > 0, respectively.

Example A6
Consider the three-dimensional system of dynamical equations  which, for a > 0, is a generalization of the rock-paper-scissors model [96].The system possesses a fixed point at ðx Ã 1 ; x Ã 2 ; x Ã 3 Þ ¼ 1 3 ð1; 1; 1Þ, which is attractive for a > 1.To analyze the simulated time series from Eq. (H19), we transform the variables x i → y i ¼ x i − 1 3 to shift the fixed point to the origin.The estimated values of the coefficients are given in Fig. 17.The magnitude of the parameter a is 1.1.The dynamical equations for y 1 , y 2 , y 3 are FIG. 16.Same as Fig. 1 but for Example A5, where the dynamics are given by Eq. (H17) (T ¼ 10000, dt ¼ 0.05, ffiffiffi σ p ¼ 0.2).For x 2 ðtÞ, KS test indicated no differences between true and estimated CDFs (D KS ¼ 0.06; p KS < 10 −4 ).For x 1 ðtÞ, we obtained D KS ¼ 0.15 (n.s.).

Example A7
Consider a nine-dimensional system of dynamical equations [62] given by where a ¼ 0.5, r ¼ 14.The dynamics are given by Eq. (H20).We show CDFs of time series generated from the true drift vector F(xðtÞ) and diffusion coefficient, compared to those using the estimated drift F(xðtÞ) and diffusion coefficient for T ¼ 10000.The KS test indicated no differences between true and estimated CDFs for all time series [D KS ∈ ð0.01; 0.01; 0.07; 0.03; 0.05; 0.04; 0.02; 0.05; 0.03Þ; p KS < 10 −4 in all cases].Therefore, for given i, using Eqs.(I3)-(I6), we find the unknown coefficients α i , ϕ ij , ψ ijk , and γ ijkl , as the solution of the following set of linear equations: Using Eq. (I7), we calculate the τ-dependent α i , K 1ij a ij , K 2ijk c ijk , and K 3ijkl e ijkl for τ; 2τ; Á Á Á.For instance, the natural frequencies of the system can be found using the formula ω i ¼ lim τ→0 α i =τ, where α i .The process of approaching τ to zero is akin to the approach employed in Appendixes C and D. For constant coupling strengths K 1 ¼ K 1ij , K 2 ¼ K 2ijk , and K 3 ¼ K 3ijkl , and undirected networks without any self-loops, as in the example presented in the main text in Fig. 5, we note that a ij ¼ a ji and a ii ¼ 0, and all permutations in tensors c and e, i.e., c ijk ¼ c ikj ¼ c jki ¼ Á Á Á and e ijkl ¼ e ikjl ¼ Á Á Á, will have the same value.Additionally, in all examples, we have considered the case that, for given nodes i and j with a ij ¼ 0, elements of higher-order adjacency tensors with indices c ijk and e ijkl will be zero.For instance, in Fig. 5 with N ¼ 6, we have a 12 ¼ 0, and therefore, one can write c 123 ¼ c 132 ¼ c 124 ¼ c 142 ¼ 0 and e 1234 ¼ e 1324 ¼ Á Á Á ¼ 0.

APPENDIX J: ESTIMATING THE HIGHEST-ORDER Z OF EXPANSION (3) FROM DATA
Here, we deal with the question of up to which order Z of expansion (3) we can reliably estimate the strength of interactions from a time series of length T ¼ Ndt.This problem is referred to as the stop condition in Ref. [51].As shown in relation (C6), it is necessary to calculate statistical moments of x i up to order 2Z (denoted as hx ) to set up matrix A and tensors C and E in the deterministic part, and tensors P, Q, R, and S in the stochastic part of the dynamics (see Appendixes C and D).Then, we can estimate the strengths of interactions up to order Z.
In order to assess the quality of these calculations, we first need to ensure that the tails of the joint PDF x m 1 i x m 2 j …x m o l pðx i ; x j ; …x l Þ are adequately resolved.To illustrate this, we consider Example A1 with interaction terms of order 3 for which we need to calculate statistical moments hx m 1 i x m 2 j x m 3 k Á Á Ái with ðm 1 þ m 2 þ Á Á ÁÞ ∈ f1; 2; …; 6g.In Fig. 22, we plot x 4 i pðx 1 Þ and x 6 i pðx 1 Þ for both components (i ∈ f1; 2g) and for different integration times T. We find that the tails of x 4 2 pðx 2 Þ and x 6 2 pðx 2 Þ are not resolved for T < 600, which indicates that we cannot reliably estimate the second-and third-order coefficients with the expansion Eq. ( 3).For T > 1000, the tails of x 4 2 pðx 2 Þ and x 6 2 pðx 2 Þ can be sufficiently well resolved, and we can thus obtain reliable matrices and tensors.We anticipate that the joint PDF decays more rapidly at high values of state variables than polynomial functions.
Next, we perform a more quantitative analysis to explore how the errors associated with the calculation of hx m 1 i x m 2 j x m 3 k Á Á Ái depend on the integration time T. We exemplify this analysis at the nine-dimensional dynamical system of Example A7 [Eq.(H20)], for which we would like to calculate statistical moments up to order 2k, specifically 2,4,6,8, and 10. Figure (23) shows how the i pðx i Þ and x 6 i pðx i Þ (i ∈ f1; 2g) calculated from time series of the second (top) and first (bottom) component of Example A1 with different integration times T. Probability distribution functions of variables x i were estimated using a binning method with 71 bins.

APPENDIX K: IMPACT OF MEASUREMENT NOISE ON THE ESTIMATION OF STRENGTHS OF INTERACTIONS
In order to investigate the impact of measurement noise, we derive an N -dimensional multivariate time series zðtÞ by adding independent white noise nðtÞ to the time series xðtÞ, with zðtÞ ¼ xðtÞ þ nðtÞ, where nðtÞ is assumed to have zero mean and a finite covariance matrix σ n and is independent of xðtÞ.One straightforward method to assess the presence of measurement noise in a given time series is to examine the second-order conditional moments, K ð2Þ ij ðx;τÞ ¼ h(x i ðt þ τÞ − x i ðtÞ) × (x j ðt þ τÞ − x j ðtÞ)ij xðtÞ¼x¼ðx 1 ;x 2 ;…;x N Þ ; ðK1Þ and observe their behavior with changing τ.Dividing Eq. (K1) by τ yields the diffusion tensor D ð2Þ ij ðxÞ in Eq. (10).
We focus on the diagonal elements in Eq. (K1), where i ¼ j.The theoretical values for K n ii [31].It is important to note that the term 2σ 2 n ii is independent of τ, and σ 2 n ii represent the diagonal elements of the covariance matrix σ 2 n .Therefore, when calculating the diagonal elements of the diffusion tensor in Eq. (10), it is necessary to divide K ii ðxÞ by τ and taking the limit τ → 0, we can derive the second-order Kramers-Moyal coefficients, which enables us to estimate the strength of interactions in the stochastic part for the time series xðtÞ.
Measurement noise with a nonzero mean can lead to a similar divergent behavior when estimating the drift using Eq.(C6), where we have K ð1Þ i ðzÞ → K ð1Þ i ðxÞþhn i i.Consequently, conducting an analysis akin to the one carried out for second-order conditional moments [Eq.(K1)] allows us to determine the mean value of the noise component [15].Using a similar approach to what we employed for the second-order Kramers-Moyal coefficients, we can then deduce the strength of interactions from the drift function for the time series xðtÞ.In the case of measurement noise with zero mean, the process for estimating the strength of interactions will be the same as the approach used to estimate the strength of interactions for the drift function in the absence of measurement noise.
In the case of empirical time series, it is advisable to perform this analysis to assess the potential presence of measurement noise in the collected data.
ð2Þ ij ðxÞ ¼ P N l¼1 g il ðxÞg jl ðxÞ.This expression for D ð2Þ ij ðxÞ can be represented in terms of conditional averaging (see Appendix D) as

FIG. 7 .
FIG. 7. (a) Histograms of estimated coupling strengths for the oscillator network with N ¼ 6 nodes.The preset values are indicated by vertical dashed lines.(b) Differences betweenestimated and preset natural frequency Δω i of each oscillator for three 1-simplex coupling strengths K 1;low , K 1;mid , and K 1;high ) and for K 2 ¼ 0.020 and K 3 ¼ 0.010.Error bars denote the standard error of the mean obtained from 50 realizations with integration times T ¼ 5000 and T ¼ 15000.
APPENDIX E: DERIVING THE MATRIX GðxÞ WITH ELEMENTS g ij FROM THE DIFFUSION COEFFICIENTS D ð2Þ ij ðxÞ Several possibilities exist for constructing the matrix G from the symmetric diffusion matrix D ð2Þ , i.e., D ð2Þ ðx; tÞ ¼ GG T [15].For a 2 × 2 diffusion matrix with a c c b ¼ GG T ; Equation (H2) gives the rate of change in the population of the tumor cells with time t.The first term of Eq. (H2) refers to the logistic growth of the tumor cells in the absence of any effect from other cell populations.The competition

FIG. 9 .
FIG. 9. KS statistic of a comparison between reset and reconstructed dynamics of Example A1 [Eq.(12); panel (a), x 1 ðtÞ; panel (b), x 2 ðtÞ].We show distributions of D KS (left) and p KS values (right) (blue-and orange-shaded areas) over 50 realizations of applied noise.Black lines indicate the first and the third quartile of the distributions, and white dots indicate the median values.

6 FIG. 19 .FIG. 20 . 9 FIG. 21 .
FIG.19.Differences Δ between estimated coefficients of deterministic drift terms (strengths of interactions) and preset values for different integration times T. Error bars represent the standard errors of the means derived from 50 different realizations of applied noise.

FIG. 22 .
FIG. 22. Exemplary plots of x 4i pðx i Þ and x 6 i pðx i Þ (i ∈ f1; 2g) calculated from time series of the second (top) and first (bottom) component of Example A1 with different integration times T. Probability distribution functions of variables x i were estimated using a binning method with 71 bins.
ð2Þ ii in the absence and presence of measurement noise are related as K ð2Þ ii ðzÞ → K ð2Þ ii ðxÞ þ 2σ 2 ð2Þii ðzÞ by τ.This division leads to a divergent behavior with a proportional relationship of 1=τ in the presence of measurement noise as τ varies.The divergence originates from the constant term2σ 2 n ii in Kð2Þii ðzÞ.By selecting values of τ ¼ dt; 2dt; Á Á Á and observing the slope of 1=τ in K ð2Þ ii ðzÞ=τ, one can initially estimate σ 2 n ii .Subsequently, with the knowledge of σ 2 n ii , we can subtract it from K ð2Þ ii ðzÞ to determine K ð2Þ ii ðxÞ.By dividing K ð2Þ [1] N. D. Martinez, Constant connectance in community food webs, Am.Nat.139, 1208 (1992).[2] U. Brose, P. Archambault, A. D. Barnes et al., Predator traits determine food-web architecture across ecosystems, Nat.Ecol.Evol.3, 919 (2019).

FIG. 25 .
FIG.25.Left panel: variation of mixed moments hx 2 1 x 2 2 i, hx 4 1 x 2 2 i, hx 2 1 x 4 2 i, and hx 4 1 x 4 2 i with integration time T ¼ Ndt for variables x 1 ðtÞ and x 2 ðtÞ of Example A7 [Eq.(H20)].Right panel: corresponding size of errors for the mixed moments decreases with increasing T following a scaling behavior of approximately 1=T γ , with γ ≈ 0.5 (as indicated by the violet dashed line).Lines are only to guide the eye.
The networked Kuramoto-Sakaguchi oscillators were simulated with integration time T ¼ 10000.Model Id refers to model identification number.
FIG. 4. All 9 × 55 strengths of interactions (red) estimated from a single realization (T ¼ 10000) of Example A7 [Eq.(H20)] together with preset values (blue).Estimates of vanishing preset values are shown with very small amplitudes.
Histogram of interaction strengths K 1 a ij , K 2 c ijk , and K 3 e ijkl for the oscillator network with N ¼ 4 nodes.The preset values of the coupling strength are denoted by dashed vertical lines.
kl ðτÞ=τ, and S ij;kl ¼ lim τ→0 γ ij;klm ðτÞ=τ.The process of approaching τ to zero is akin to the approach used for the drift vector.The existence and uniqueness of the estimated elements of tensors P ij , Q ij;k , R ij;kl , and S ij;klm are presented in Appendix F.