Data-driven Selection of Coarse-Grained Models of Coupled Oscillators

Systematic discovery of reduced-order closure models for multi-scale processes remains an important open problem in complex dynamical systems. Even when an effective lower-dimensional representation exists, reduced models are difficult to obtain using solely analytical methods. Rigorous methodologies for finding such coarse-grained representations of multi-scale phenomena would enable accelerated computational simulations and provide fundamental insights into the complex dynamics of interest. We focus on a heterogeneous population of oscillators of Kuramoto type as a canonical model of complex dynamics, and develop a data-driven approach for inferring its coarse-grained description. Our method is based on a numerical optimization of the coefficients in a general equation of motion informed by analytical derivations in the thermodynamic limit. We show that certain assumptions are required to obtain an autonomous coarse-grained equation of motion. However, optimizing coefficient values enables coarse-grained models with conceptually disparate functional forms, yet comparable quality of representation, to provide accurate reduced-order descriptions of the underlying system.


I. Introduction
Numerical simulations of complex multi-scale phenomena are fundamental to modern science, for which commonly sought goals involve the development of tractable yet accurate reduced-order models. There are many approaches to this sort of problem throughout many diverse domains. For instance, in turbulence modeling, this is known as the closure problem for the Reynolds-Averaged Navier-Stokes equation (RANS) [1][2][3][4]. In molecular dynamics, it is known simply as coarse-graining [5][6][7]. Mean-field approaches to analyzing stochastic dynamics on networks also fall into this description [8,9].
Investigations on model reduction of nonlinear dynamics often consider the Kuramoto model of coupled oscillators [10], which has long been studied as an example of collective behavior [11][12][13][14][15]. The key features of the Kuramoto model are its composition of many oscillatory units with distinct natural frequencies, and pairwise coupling that tends to drive phases together. Its popularity as an object of study comes from its tractability in certain special limits [16,17] and its nonetheless rich phenomenology [18][19][20]. In addition, the Kuramoto model and its variants describe a number of synchronization phenomena in domains of diverse nature such as coupled Josephson junctions [21], neuroscience [22], chemical oscillators [23,24], and the power grid [25].
Substantial work on the Kuramoto model focuses on understanding synchronization under various conditions on the natural frequencies and on the structure of the network that couples oscillators to one another. One main mode of understanding consists of finding simplified mathematical descriptions of the dynamics, and this is where the closure problem arises. Perhaps the bestknown example of this type is the seminal work by Ott and Antonsen [17], who showed that in the N → ∞ limit and under certain conditions on the distribution of natural frequencies and initial phases, the center of mass of a population of Kuramoto oscillators obeys an autonomous ODE. Other studies have focused more on complex copuling topologies, proposing techniques using spectral information to merge nodes together [26] or otherwise systematically discard irrelevant degrees of freedom [27]. Still others take a more strictly data-driven approach and seek e.g. closed equations of motion for low-order moments of the distribution of phases [28,29] or to identify good coarse-grained variables via manifold learning techniques [30]. Finally there are approaches that employ a "collective coordinate" ansatz governing the phase of each oscillator within a phase-locked cluster, and thereby arrive at a closed equation of motion [31][32][33][34][35].
Here we use a collective coordinate ansatz to derive coarse-grained equations of motion at the level of phaselocked clusters that are consistent with arbitrary distributions of natural frequency within each cluster. We focus especially Gaussian and Cauchy distributions, though our approach is generic. The resulting equations are precisely determined in the N → ∞ limit in terms of parameters of the distributions in question and the matrix of coupling strengths. Similar to other previously obtained analytical results, the emergence of an explicit closure model crucially depends on a number of simplifying assumptions, and no closed-form reduced-order equations of motion are known for finite systems that do not satisfy these assumptions. We aim to move beyond the N → ∞ limit by treating our derived coarse-grained equations as inductive biases, and allowing data from finite-N simulations to determine optimal parameter values. The result is a systematic data-driven procedure for finding physically meaningful coarse-grained models of finite systems of coupled oscillators, that provide a more accurate reduced-order description of the system. This manuscript is organized as follows. In Sec. II we review in detail prior work and precisely state the problem we address. In Sec. III A we state the set of conditions and approximations that let us derive coarse-grained equations of motion for coupled oscillator systems, and in Sec. III B we introduce our data-driven approach for finding optimal coefficient values to use in these coarse-grained equations. In Sec. IV we apply our methods to a concrete example system and evaluate the performance of both our theoretically-derived coarsegrained models, and the same models optimized to fit training data.

II. Background
The Kuramoto model is the ordinary differential equation (ODE) systeṁ where θ i ∈ S 1 is the phase of the i th oscillator, ω i ∈ R is its natural frequency, N is the total number of oscillators, and K ∈ R N ×N is the coupling matrix that defines which oscillators influence each other [10]. We can equivalently formulate the above model in terms of complex phases y i = exp(iθ i ). A straightforward calculation yields the equivalent representatioṅ where we have used the fact that y * i y i = 1. We subsequently explain why this representation of the Kuramoto dynamics is more convenient for our purposes.
Perhaps the best-studied case of the Kuramoto model is that of mean-field coupling, where K ij = K/N for all i, j. In this case it is well known that in the limit N → ∞, the system (1) exhibits a phase transition with respect to K: if ω i are sampled from a symmetric, unimodal distribution, then there exists K c such that for K < K c , the oscillators behave mostly independently, while for K > K c a subset of oscillators spontaneously locks to a single frequency [36].
Synchronization in the Kuramoto model is typically quantified by the order parameter, where R ∈ [0, 1] is the synchrony and Φ ∈ [0, 2π) is the average phase. If all phases are equal then R = 1, and if the phases are spread uniformly over the unit circle, then R ≈ 0. Thus R is a natural measure of synchronization. The dynamics of z depend on the dynamics of all θ i , but it is natural to suppose that in some limit there exists a closed equation for the dynamics of z. Indeed there is, as demonstrated by Ott and Antonsen [17]. Assuming that ω i are Cauchy-distributed, i.e., ω i ∼ g(ω) where g is the Cauchy probability density function with mode Ω and width δ, and that the coupling is mean-field, then z evolves according to a Stuart-Landau equation, The form (4) shows clearly that z = 0 is always a solution, but undergoes a pitchfork bifurcation at K c = 2δ, when a new solution with R = 1 − 2δ/K (and dΦ/dt = Ω) appears, representing partial synchrony that becomes global synchrony (i.e. R → 1) as K → ∞. Interestingly, the same analysis carries over to the case where oscillators are not coupled all-to-all, but are divided into subsets such that the strength of coupling between any two oscillators depends on the subsets to which they belong. Let Π = (P 1 , . . . , P C ) be a partition of the index set {1, . . . , N } and let K be a C × C matrix of coupling strengths. Following [17], such a modular system can be written aṡ The same mathematical machinery as before can be applied to show that if for every σ, {ω i |i ∈ P σ } are distributed according to a Cauchy distribution with mode Ω σ and width δ σ , then the cluster order parameters {z σ }, defined by obey a coupled Stuart-Landau equation of the forṁ The above representation is mathematically equivalent to the original Kuramoto model written in terms of complex phases (2), albeit with the linear term containing a nonzero real part. The consequences of this equation for the existence of mesoscale synchronization were discussed in detail in a prior study [37]. We note that Eq. (7) reduces exactly to the Kuramoto model in the case that δ σ = 0 and |z σ | = 1 for all σ. In this sense, this natural emergence of Eq. (7) implies that, in the Kuramoto model, groups of oscillators can behave collectively as a single oscillator. In other words, there is a renormalization procedure for coupled oscillator systems that remains within the same model class (in particular, the one defined by Eq. (7)).
The above analyses are valid in the limit N → ∞, and require certain regularity assumptions on the initial distribution of phases. As such, the behavior of a finite-N Kuramoto system will in general exhibit fluctuations relative to (4) or (7). On the other hand, the dynamics of any finite-N system in the absence of noise are in fact deterministic, and hence its evolution can be computed exactly from its current (N -dimensional) state. This raises the natural question: Given a Kuramoto system of the form (1), what is the coarsest partition Π = (P 1 , . . . , P C ) such that the local order parameters (6) evolve according to an autonomous ODE? And, what is that ODE?
Alternatively, we suppose that Π is a partition and z Π = (z 1 , . . . , z C ) ∈ C C is the vector of local order parameters as defined above. Does the value of z Π exactly determine the value ofż Π ? And if so, what is the functional relationshipż Π = f (z Π )? Our proposal to address this question is to learn the function f from samples {(z Π (t),ż Π (t))|t ∈ [0, T ]} obtained by coarsegraining experimental or simulated dynamics data of the full system. This learning task becomes a well-posed optimization problem when a finite set of basis functions that can be used to construct f is chosen. Finding f such that ż Π − f (z Π ) is small means that the dynamics of z Π (i.e.ż Π ) can be accurately computed from only the value of z Π , and we say that the system admits a good coarse-grained model described by f (z Π ).

III. Methods
We now propose criteria to select the coarse-graining partition Π, and present analytical calculations that suggest which finite-dimensional function spaces are likely to contain a satisfactory coarse-grained model in the sense of approximating cluster-average phase trajectories. We do this by imposing certain assumptions on the distribution of individual oscillators' phases, and arrive at expressions for the coarse-grained model in terms of static parameters of the fine-grained model. These expressions hence suggest a natural set of basis functions that can be used to inform our data-driven procedure for inferring coarse-grained dynamics directly from observed time-series data.

A. Conditions and Analytical Implications
To fix notation, we consider the Kuramoto model in complex form (2), and refer interchangeably to either real-valued phases θ i or their complex versions y i := exp(iθ i ). It is possible to write exact evolution equations for the coarse variables z σ (6) if we allow explicit timedependence through the residuals x i (t) = z σ (t) − y i (t). Explicitly, we havė where each coefficient (A)-(G) is a function of the residuals x i (t). The terms with coefficients B σ and E σσ are those that are present in the original Kuramoto equations written in complex form.
The functional form of the coefficients is given by If one could obtain an accurate expression for x i in terms of microscopic parameters ω i , K ij , and the coarse variables z σ , then the explicit time-dependence could be removed, leaving an autonomous ODE for the coarse variables {z σ }. In order to arrive to simplified expressions, we build on the work of Gottwald concerning collective coordinates [31]. That study proposes that the residual phase of each oscillator within a phase-locked cluster is directly proportional to its natural frequency relative to the cluster average frequency. The associated constant of proportionality serves as a "collective coordinate", which enables an expression involving all phase variables in terms of the cluster average phase and the collective coordinate. Accordingly, we describe how to simplify the above equations based on conditions for a collective coordinate ansatz to be appropriate.

Condition 1. Modularity of the coupling matrix.
The coupling matrix K is such that K ij = K σσ whenever i ∈ P σ and j ∈ P σ .
Through Condition 1, we require that the coupling strength between any two oscillators is a function only of their module memberships. Importantly, Condition 1 is preserved under refinement; if Π 1 refines Π 2 and Π 2 satisfies Condition 1, then Π 1 also satisfies Condition 1. It is also true that there is a unique coarsest partition Π struct that satisfies Condition 1, and any other partition satisfying Condition 1 refines Π struct ; to see this, note that Π struct can be constructed as the set of equivalence classes under the relation i ≡ j ⇐⇒ K i· = K j· ∧ K ·i = K ·j , i.e. two nodes are equivalent if and only if their in-and out-neighborhoods are identical. We refer to Π struct as the structural partition. Note that this definition is more stringent than other partitions that have been used elsewhere such as (external) equitable partitions [38] or orbit partitions induced by the automorphism group of the coupling matrix [39,40]. Condition 1 greatly simplifies several of the coefficients; we have Notice that the dependence on the fine-grained state of the system now appears only in (15) and (19), so with an appropriate ansatz for the {x i } in terms of {z σ } and parameters, a closed-form system can be obtained.

Condition 2. Phase-cohesiveness of clusters.
Each cluster is phase-cohesive, meaning that |θ i (t) − θ j (t)| remains bounded for all t, whenever oscillators i and j belong to the same cluster.
As before, Condition 2 is a property preserved under refinement, and the set of partitions that satisfy it can be defined by a coarsest partition Π dyn , the dynamical partition. Any partition satisfying Condition 2 is a refinement of Π dyn , and one can obtain Π dyn as the set of equivalence classes under the equivalence relation This enables the following approximation, which underpins the present derivation.
Approximation 1. Linear collective coordinate ansatz. We take a simple collective coordinate ansatz, namely that the phase residual of each oscillator around its cluster mean is directly proportional to its frequency residual about its cluster mean. Formally, we assume whereω i = ω i − ω i i∈Pσ is the residual of oscillator i's natural frequency relative to the mean in its cluster and α σ is a cluster-dependent constant of proportionality, to be determined.
Note that this is only one choice of many; another natural choice is that the phase residual depends not linearly on the frequency residual but on its arcisne. We also note that the Conditions 1 and 2 are necessary for such an ansatz to be appropriate, since under these Conditions the only remaining distinction between oscillators within a cluster is in their natural frequencies. Thus we may reasonably expect that the dynamics of different oscillators in a cluster differ in a regular way.
The proportionality constant α σ must be such that y i i∈Pσ = z σ , and it is this constraint that lets us finally obtain an autonomous ODE for {z σ }. This leads to a consistency condition that reads where χ σ is the characteristic function of the distribution of natural frequency residualsω i = ω i − ω i i∈Pσ within cluster σ. This condition gives the value of α σ implicitly in terms of z σ , and thereby lets us describe the entire distribution of phases within a given cluster -and hence the expectations in Eqs. (15) and (19) -in terms of the single variable z σ . The general formulae are If for, example, the natural frequency residuals are distributed according to a Gaussian distribution with variance v σ , then χ σ (α σ ) = exp(−v σ α 2 σ /2), and we have Substituting the above expressions into Eq. (8) yields a closed system of equations for the dynamics of the coarse-grained variables, whose parameters are directly computable from the parameters of the fine-grained system.
is the first class of closed reduced-order system of equations considered here. It was derived by substituting the closure approximations (25) and (26) into Eq. (8). We refer to the functional form on the right-hand side of Eq.
On the other hand, it is reasonable to suppose that the equations (7) derived by Ott and Antonsen, proven to be valid in the N → ∞ limit, are still approximately valid for small N , or at least are worthy of consideration as a candidate model class.
is the second class of closed reduced-order system of equations considered here. It is derived by applying the Ott-Antonsen ansatz to the infinite-N limit of Eq. (5) under the assumption of Cauchy-distributed natural frequencies within each module. We refer to the functional form on the right-hand side of Eq. (28) as the Cauchy Ott-Antonsen (COA) form.
It is in fact possible to formally derive Eq. (28) by applying the formulae (24) under the assumption that natural frequencies in each cluster σ are Cauchy distributed with location Ω σ and scale δ σ . In this case we have χ σ (t) = exp(−δ σ |t|), so Note, however, that strictly speaking, the collective coordinate approximation (22) is not consistent in this case because it models the phases of oscillators with large natural frequencies to be a large constant greater than the mean phase, and implies that their instantaneous frequency is the mean frequency, but this is not the case.
Since in general we consider small numbers of oscillators, it is not necessarily true that the set of natural frequencies within any given cluster is representative of a Gaussian-or Cauchy-distributed population. To circumvent this, we approximate the characteristic function χ σ as a Taylor series, and show that its behavior in the regions we care about is dominated by the first and second central moments of the empirical distribution of natural frequencies within a cluster. We have Note that the relevant values of α are necessarily such that the residual phases αω i are not too big, and so are inversely related with the largest values ofω i . Finally we must impose the consistency condition (23). Assuming that terms of O(α 3 ) are negligible and that ω i = 0, we can solve the consistency condition χ σ (α σ ) = |z σ | for α σ to get Using this approximate solution to estimate the closure terms in Eq. (24), we obtaiṅ which approximates Eq. (27) when {|z σ |} are near 1.

B. Data-driven Inference of Model Parameters
The derivations above make several simplifying assumptions and useful approximations, some of which will not be satisfied in practice. To find a coarse-grained description of a given finite-dimensional system of coupled oscillators, we take a hybrid approach, using the above derivations as inductive biases, and using data to select coefficient values.
We focus especially on the coarse-grained systems in Eq. (27), i.e., GCC, and Eq. (28), i.e., COA. For each coarse-grained system, the right-hand side is a combination of particular functions of {z σ }, each with a coefficient that depends somehow on the distribution of {ω i } and the coupling matrix K. In the absence of the N → ∞ limit, however, it is not clear that simple summary statistics will give a model that accurately reflects the coarsegrained dynamics of the finite-N system.
We now describe a procedure for inferring the parameters of a coarse-grained model, either of the form in Eq. (27) or Eq. (28) assuming that the coarse-graining partition Π is known. To illustrate what we mean explicitly, we restrict our attention momentarily to Eq. (28). Given a solution {θ i (t)} of the system (5) in a time-series form, we can obtain a coarse-grained time series {z σ (t)} according to Eq. (6). Our hypothesis is that these coarsegrained variables evolve according to an equation of the form Eq. (28). If the hypothesis holds, then it should be possible to recover the appropriate coefficients, which we can interpret as effective natural frequencies and coupling parameters, using a least-squares regression based on the coarse-grained data {z σ (t)}. Observe first that the right-hand side of (28) is a linear combination of terms that can be measured directly from data. To clarify this, we re-write Eq. (28) aṡ Let z σ = (z σ (t 1 ), . . . , z σ (t n )) T andż σ = (ż σ (t 1 ), . . . ,ż σ (t n )) T denote the time-series of observations of the coarse-grained variable z σ and its derivative, respectively.
Then the parameters p σ = (ω σ , B σ1 , . . . , B σC ) T can be inferred by solving the following least-squares optimization [41,42] argmin where the n × C + 1 matrix G σ contains the values of the terms of which the entries of p σ are coefficients, and (·) denotes the imaginary part of the argument. Explicitly, we can write the set of basis functions G σ as where complex conjugate, product, and squaring of the vectors z σ are understood to be taken element-wise. Minimizing the squared residual (38) for each σ = 1, . . . , C gives a complete set of parameters for the COA model in Eq. (37). The constraints in Eq. (39) ensure that the resulting model leaves the C-fold product of the unit disk invariant, i.e. that |z σ (t)| ≤ 1 ∀σ =⇒ |z σ (s)| ≤ 1 ∀σ for all s > t. This property is desirable since {z σ } are averages of unit complex numbers, and so should always lie inside the unit disk. We can likewise perform the same analysis using the inductive bias given by GCC in Eq. (27), by setting the set of basis functions as follows: In accordance with previous results [17,37], we expect to see a satisfactory fit if N (the number of oscillators) is very large and the coupling is as described in Eq. (5). The numerical procedure described above lets us assess the suitability of GCC in Eq. (27) and COA in Eq. (37) as coarse-grained models for finite N .

IV. Results
We now evaluate the performance of the abovedescribed coarse-grained models by applying them to a range of Kuramoto oscillator systems for which we have fine-grained trajectories. We examine their performance both with theoretically-derived coefficient values (as in Eqs. (27) and (28)) as well as those optimized to fit observations. Further, we demonstrate the necessity of both Conditions 1 and 2 by showing that coarse-graining in a way that satisfies only one or the other Condition gives a worse model than coarse-graining in a way that satisfies both.

A. Data Generation
We perform numerical experiments for a system of N = 15 Kuramoto oscillators, i.e. Eq. (1), organized into three modules of five nodes each, and we only consider coupling matrices that respect this organization. This means that we know the structural partition Π struct a priori; it is {{1, . . . , 5}, {6, . . . , 10}, {11, . . . , 15}}. The coupling between nodes is such that each pair of nodes in the same module is coupled with strength K ij = K in /N and pairs of nodes in different modules are coupled with strength K ij = K out /N . For each oscillator we sample a natural frequency ω i from a standard normal distribution (i.i.d.), and fix them through all experiments, and their values are shown in Fig. 1. The free parameters in our experiments are K in ∈ [0, 10] and K out ∈ [0, 2.5]; for each parameter we sampled 50 values evenly spaced in the respective interval (inclusive of endpoints). For each pair of (K in , K out ) values, we generate a trajectory of length T = 1000 with a time resolution of ∆t = 0.01, starting from uniformly random initial phases. For each trajectory, we derive a coarse-grained model in each of twelve (3 · 2 · 2) different ways, corresponding to three different choices: In what follows we clarify the meaning of each of these choices and examine their effect on the quality of coarsegrained model that results. Partition: For each trajectory {θ i (t)}, we must choose a partition to coarse-grain the data. According to Condition 1, the partition should respect the symmetry of the coupling matrix, i.e. it should be a refinement of Π struct , and according to Condition 2 it should be such that all oscillators within a partition element are phase locked, i.e. it should be a refinement of Π dyn . Imposing both conditions means that the partition should refine both Π struct and Π dyn , i.e. it should refine Π meet := Π struct ∧ Π dyn (see Fig. 2). In order to illuminate the importance of each Condition while obtaining the coarsest possible model, we evaluate the performance models coarse-grained according to each of Π struct , Π dyn , and Π meet .
In practice, it is prohibitively expensive to enumerate all partitions of N nodes to find Π dyn that satisfies the condition defined by Eq. (21), so we use a heuristic. Given a trajectory {θ i (t)}, we compute a long-term average frequency ω i = (θ i (t 0 + T ) − θ i (t 0 ))/(T − t 0 ), and cluster nodes initially by their value of ω. We can then 2. An example of the meet of the structural and dynamical partitions. In the structural partition, nodes are colored according to the partition that determines the coupling strengths (line thicknesses) between nodes. In the dynamical partition, nodes are colored according to their membership in phase-cohesive groups. The meet (right) is the coarsest partition that refines both the structural and dynamical partitions.
check if the resulting clusters are in fact phase cohesive, and if they are not, perform a finer clustering of ω and repeat until we have phase cohesive clusters. Other approaches to identifying phase clusters have been considered and could be used in place of the method described above [43][44][45]. Constructing Π meet from Π struct and Π dyn is straightforward; it is simply the set of all nonempty pairwise intersections of partition elements from Π struct and Π dyn .
Model class: Given a partition Π, we coarse-grain the data to obtain {z σ (t)|P σ ∈ Π}. We now choose what class of model to find, COA or GCC. The COA model is defined by Eq. (28), and the GCC model is defined by Eq. (27).
Method for finding coefficients: Finally, we have the choice of using coefficient values predicted from theory (that includes several possibly-violated assumptions and approximations), or inferring them from data as described in Sec. III B. Note that the theoretical values present in the COA model depend on parameters (Ω σ , δ σ ) of the Cauchy distributions from which natural frequencies are drawn, which do not actually exist in our case. To circumvent this difficulty we perform a maximum-likelihood inference of these parameters from the data {ω i |i ∈ P σ } assuming that they were drawn from a Cauchy distribution (as implemented by scipy.stats.cauchy.fit [46]).
Overviews of coarse-grained model quality as a function of coupling parameters (K in , K out ) are depicted in Figs. 5 and 6. The metric used is root-mean-squared error in time-derivative, i.e. f (z Π ) − dz Π /dt , wheref is the coarse-grained model in question and · is the rootmean-square with respect to time t and partition element index σ. Heatmap colors are arranged on a logarithmic scale. Compare to Fig. 3 to relate coarse-grained model performance to characteristics of the dynamical partition relative to the structural partition.
A more systematic set of comparisons illuminating the impact of the three choices (partition, model terms, and method) is given in Figures 7-10.  Kout) colored by how the dynamical partition compares to the structural partition. The relation P < Q means that partition P is a refinement of partition Q, and incomparable means that neither P < Q nor Q < P . See Fig. 4 for examples of each of these cases.

C. Numerical Results
We now present an overview of the dynamics we simulated, and the performance of each of the twelve coarsegrained models described above.
While Π struct is known a priori and fixed for all experiments, Π dyn changes depending on the values of K in and K out . Figure 3 gives an overview of how Π dyn compares to Π struct as a function of K in and K out . There are four possibilities: either Π struct < Π dyn , Π dyn < Π struct , Π struct = Π dyn , or the two partitions are incomparable (i.e. neither one is a refinement of the other, nor are they equal). As one might expect, the region of parameter space where the two partitions are equal lies where K in is large and K out is small. When both K in and K out are large, global synchronization occurs and thus Π struct < Π dyn holds. For moderate values of K in and small K out , phase-locked clusters remain restricted to structural clusters and Π dyn < Π struct , while for the remaining parameter space the two partitions are incomparable.
Theoretically determined coefficients: In Fig. 5, we give an overview of coarse-grained model quality using theoretically-derived coefficient values. We can see that the same subsets of parameter space highlighted in Fig.  3 appear in panels b, d, and f of Fig. 5, corresponding to the GCC ansatz. Note that applying the GCC ansatz to the meet partition (d) obtains reasonably good performance across the entire parameter space, while under the structural partition (b), good performance is only attained when the structural modules are in fact internally synchronized. Conversely, under the dynamical partition (f), the GCC model performs well mainly in regions of parameter space where the dynamical partition refines or is equal to the structural one (cf. Fig. 3), since in these cases the dynamical partition equals the meet partition. Notice also that when K in ≈ K out (red line, e-f), the structural partition is effectively the entire network as a single module, and in this region we also see somewhat better performance of the GCC model. The COA ansatz attains reasonable performance for small values of K in under the meet partition -in this case many dynamical modules are singletons, and are therefore trivial to coarse-grain; otherwise, the COA ansatz does not perform very well, indicating that the assumptions underlying its derivation do not hold for the system at hand.
Coefficients optimized to fit the data: In Fig. 6 we give an overview of coarse-grained model quality using coefficient values optimized to fit the data. The most notable contrast with Fig. 5 is that the left and right columns (COA vs. GCC) are much more similar to each other in Fig. 6 than in Fig. 5. This indicates that given the possibility of tuning coefficient values, the functional form of the COA ansatz is nearly equivalently expressive to the functional form of the GCC ansatz, relative to the dynamics at hand. Again, note that coarse-graining according to the dynamical (resp. structural) partition attains good performance in regions where it refines the structural (resp. dynamical) partition and hence is equal to the meet partition. However, all partitions are able to achieve good performance in the large region of global synchrony, owing to the simplicity of the resulting dynamics.
Quality of inferred models: In Figs. 7-10 we take a more quantitative view on the results just described by forming scatter plots that make pairwise comparisons between coarse-grained models that differ in one design characteristic: partition, model class, and inference method. In each figure, every dot represents one Each point corresponds to a single instance of the coarse-graining problem. In many cases (under the meet partition) the GCC theory performs nearly as well as GCC inference, indicating that the assumptions going into the GCC theory were well-supported. Note that COA theory with the structural partition never does particularly well.
instance of the coarse-graining problem that has been solved in each of twelve ways.
In Fig. 7 we examine the impact of inferring coefficients based on data vs. using their theoretical values. We see clearly that in all cases, inference affords better performance, but that in some cases theoretical coefficients perform nearly as well. This is the case for all situations except the COA anstaz using the structural partition, where the model with theoretical coefficients always performs poorly.
In Fig. 8 we examine the impact of using the COA vs. the GCC ansatz. As noted qualitatively above, when allowing for inferred coefficient values, the two ansätze attain comparable performance, while GCC is generally better when using theoretical values. This reflects the fact that the two function classes are equivalently expressive for the dynamics at hand, while the assumptions underlying the derivation of GCC are more closely reflective of the system under study than those leading to COA.
In Figure 9 we investigate the difference in coarsegrained model quality when coarse-graining according to the meet partition vs. the structural partition. This amounts to imposing Conditions 1 and 2 vs. only imposing Condition 1. Unsurprisingly, the meet partition never  Fig. 7, organized to show the effect of using model terms predicted from GCC instead of COA. Interestingly, when given the freedom to adapt coefficient values to observed data, COA and GCC models perform comparably. This indicates that relative to the behaviors considered in these numerical experiments, the COA and GCC function spaces are somehow comparably expressive.
gives poorer fit quality than the structural one, but there are cases along the diagonal when the two partitions coincide.
Finally, Fig. 10 compares the meet partition with the dynamical partition. The meet partition generally outperforms the dynamical partition, indicating that Condition 1 is indeed necessary above and beyond Condition 2 to obtain a satisfactory coarse-grained model, even if one is allowed to optimize coefficients to fit the data. This suggests that mere phase-cohesiveness is not enough to justify a collective coordinate ansatz such as Eq. (22). Rather, the cluster members should be influenced by other oscillators in the same manner -have the same inneighborhoods -rendering their natural frequencies the only remaining differences between them to explain their different phases.

D. Discussion and Path Forward
In this study, we introduced and numerically validated a data-driven procedure for discovering the reduced-order equations for the Kuramoto system in the presence of modular synchronization. In particular, we have demonstrated that in a modularly synchronized Kuramoto oscillator system, groups of mutually synchronized oscillators may be treated each as meta-oscillators, thereby yielding  Fig. 7, organized to show the effect of coarsegraining according to the meet partition vs. structural partition. Note that the meet partition is never worse than the structural one, and they agree whenever they are equal. This suggests that the added condition of phase-cohesiveness within modules (Condition 2) does make a significant difference in one's ability to find an accurate coarse-grained model.  Fig. 7, organized to show the effect of coarsegraining according to the meet partition vs. structural partition. Note that the meet partition only rarely outperforms the dynamical one, and they agree whenever they are equal. This suggests that the added condition of modularity of coupling strengths (Condition 1) does make a significant difference in one's ability to find an accurate coarse-grained model.
dynamics that belong to a family that strictly generalizes the Kuramoto model.
A central challenge of the problem considered here is to remove explicit time-dependence from the triviallytrue coarse-grained model (8). Here, we accomplished this task by imposing a set of assumptions on the coarsegraining map that justify a collective coordinate ansatz.
An alternative approach requiring less advance knowledge of the underlying system would be to search over the space of partitions, either by brute-force or greedily, seeking those for which the coefficients (9)-(14) are a single-valued function of the coarse-grained state {z σ }. If such a function exists, its substitution into Eq. (8) yields an autonomous ODE for the coarse variables.
Stripping away yet another layer of theory, we can remain largely agnostic about the functional form of the equations that should governż and obtain the set basis functions using a procedure such as SINDy [41], or use black-box representations such as neural-networks to discover a parsimonious basis. Still, given the enormous complexity of the space of functions f : C C → C C , it is prudent to impose certain domain-informed restrictions on the function dictionary we choose.
First, the OA ansatz and its associated dimension reduction shows that the reduced model exhibits on-site and pairwise terms, and no higher-order (i.e. three-ormore-particle) terms. Moreover, each coupling term is of the same form. Therefore it is reasonable to suppose that our function basis should contain a set of onsite terms and a set of coupling terms, and that these terms should be replicated for each node and edge, respectively.
Next, it is known that coarse-graining of dynamics can induce memory effects [47], so it is reasonable to expect that a coarse-grained model including memory or nonlocality (in the form of polyadic coupling terms) would be effective. Data-driven discovery of coarse-grained dynamical models including memory is discussed in [48]. Polyadic coupling terms make no conceptual difference, but combinatorial explosion may make their inclusion computationally costly. A key potential advantage to including memory is the possibility of obtaining a good coarse-grained model with fewer state variables (i.e. a coarser partition).
Finally we remark on some physical principles that inform our methods for inferring coarse-grained dynamical models. Notice that in Eq. (28), the coefficient of the linear term has negative real part (because δ σ ≥ 0 is the width of a Cauchy distribution, and so cannot be negative), and the coupling matrix K σσ is real. These conditions together imply that the C-fold product of the unit disk is invariant; in other words, if all complex phases z σ initially have magnitude not greater than 1, then they will continue to have magnitude not greater than 1 for all time. This condition breaks down if we allow the coupling matrix to have an imaginary part or the coefficient of the linear term to have positive real part, and so we impose those constraints during our inference procedure.
Constraints of this type can be thought of in a more general context. We know that the coarse-grained variables z σ should always remain bounded in the unit disk, because they are averages of quantities on the unit circle. Knowing this, we perform constrained inference of the right-hand side of an ODE governing those variables, with the constraint being that a function is feasible only if the corresponding ODE leaves the appropriate set in state space invariant. Fortunately, this constraint is convex, and is therefore straightforward to impose on the inference procedure.

V. Conclusions
In this study we present a data-driven and theoretically-grounded approach to constructing coarsegrained models for finite-dimensional Kuramoto oscillator systems. To do this, we build on prior work that leverages the infinite-N limit, as well as collective coordinate methods that offer the possibility of closing the explicit time-dependence induced by coarse-graining. We complement these theoretical approaches with a datadriven inference procedure that selects both the appropriate coarse-graining partition and optimal coefficient values for a given finite Kuramoto system. Our aim is to demonstrate the utility of combining theoretical insight with data-driven inference for finding accurate coarsegrained models of finite heterogeneous systems.
Our results reveal that partitioning nodes in a way that respects the underlying coupling network is vital to the validity of a collective coordinate approach, above and beyond phase cohesiveness. In addition, we find that given the freedom to optimize coefficient values based on data, the functional form predicted by Ott and Antonsen based on the assumption of Cauchy-distributed natural frequencies performs very comparably with the functional form predicted by the more realistic (for the case considered here) assumptions of Gaussian-distributed natural frequencies and a linear collective coordinate ansatz. This shows that there is considerable freedom in learning the right-hand side of a coarse-grained ODE, and that other criteria could be brought into play to disambiguate between comparable models. In the present work, the derivations given in Sec. III perform this function in their role as inductive biases; other criteria such as parsimony [41] or interpretability might also be applied. We anticipate that the approach introduced in this work will find applications in other types of complex dynamical systems.