Generalized Glauber dynamics for inference in biology

Large interacting systems in biology often exhibit emergent dynamics, such as the coexistence of multiple time scales, manifested by fat tails in the distribution of waiting times. While existing tools in statistical inference, such as maximum entropy models, reproduce the empirical steady-state distributions, learning dynamical models remains challenging. We present a novel inference method, called generalized Glauber dynamics. Constructed through a non-Markovian fluctuation-dissipation theorem, generalized Glauber dynamics tunes the dynamics of an interacting system while keeping the steady-state distribution fixed. We motivate the need for the method on real data from Eco-HAB, an automated habitat for testing the behavior of groups of mice under semi-naturalistic conditions, and demonstrate it on simple Ising spin systems. We show its applicability to experimental data by inferring dynamical models of social interactions in a group of mice that reproduce both its collective behavior and the long tails observed in individual dynamics.

Living systems are intrinsically dynamic and out of equilibrium, as manifested by the coexistence of multiple timescales and the breakdown of Markovianity in animal behavior [21][22][23][24] and neural activity [25,26]. However, most recent work focuses on inferring the static properties of collective behavior: analyzing the joint probability distributions of the interacting components and relating these global states to functional behavior. In many cases, inferred pairwise interaction models successfully reproduce the correlation structure of the data [27][28][29], leading to the identification of empirical rules for collective neuronal encoding [2,15,[30][31][32], the interaction structure of bird flocks [19,33], or contact maps of proteins [34]. While these approaches do not directly address the dynamics, attempts have been made to reconcile them with dynamical models of neurons [18,35] and flocks [36] by using classical rules of equilibrium dynamics.
Recent methods to learn the dynamics of interacting systems focus on extensions to second-order dynamics for continuous systems [37][38][39][40]. For discrete spiking neurons, dynamical inference focuses on reproducing pairwise correlation functions between different time points [41][42][43][44], or on inferring transition probabilities or causal dependencies [45]. The extension of maximum entropy to reproduce time-delayed cross-correlations (called maximum caliber [46]) is computationally expensive and requires a lot of data to train [41,42]. However, many possible dynamical models can generate the same steady state distribution. More fundamentally, the dynamics cannot in general be automatically related to the steady state distribution, especially if the learned models are out of equilibrium. In transition models, the transition rate of a component is given by its own history and by other inputs: stimuli and interacting components in the network. By incorporating autoregressively generated noise in the transition rate, recent developments in Generalized Master Equation (GME) models have introduced the capacity to encode latent variables [47]. The generalized linear model (GLM), a representative of this class used in neuroscience, successfully reproduces different types of neuronal dynamics and firing patterns in many brain regions [45]. However, since GLMs are constructed by definition to predict only the next time point forward, they often generate unstable trajectories and produce steady state distributions inconsistent with the training data [48][49][50][51].

* Corresponding authors. These authors contributed equally.
We propose an inference method, called generalized Glauber dynamics (GGD), that combines the power of steady state inference with dynamical inference. Constructed through a non-Markovian fluctuation-dissipation theorem, the generalized Glauber dynamics tunes the dynamics of an interacting system while keeping the equilibrium steady state distribution fixed. In practice, this method allows the inference to be separated into two parts: first, inference of the steady state distribution using maximum entropy models, and then, tuning of the dynamics to match the data. The basic idea behind the GGD is similar to generalized Langevin dynamics: coupled degrees of freedom are integrated out to generate an effective memory kernel, such that the dynamics of the system depends on its history. Interestingly, the functional form of the GGD is similar to that of the GLM but differs dramatically in its link to the steady-state distribution. We demonstrate the power of GGD by predicting the co-localization patterns of groups of socially interacting mice.

A. Collective behavior of social mice
We studied the interaction structure of groups of animals in a controlled environment. We analyzed data generated from the Eco-HAB experiment [11] and presented in [52], where a group of N = 15 freely moving male mice lives in an artificial ecological environment resembling natural burrows (Fig. 1(a)). The Eco-HAB consists of four chambers, two of them with food, connected by tunnels. The experiment lasts for 10 days, with alternating light conditions of darkness and brightness, each lasting 12 hours, to simulate the day-night cycle. Mice are able to behave and interact freely, without any experimental constraints or manipulation. Each mouse is equipped with a unique radio-frequency identification (RFID) chip that allows for the detection of its position every time it passes one of the 8 recording antennas (marked in black in Fig. 1(a)). The data consist of time points with 1-ms resolution at which each mouse passed a given recording device, which allows us to identify the location of each mouse as a function of time, σ_i(t) = 1, 2, 3, or 4 (Fig. 1(b)).
Mice are nocturnal animals with increased activity during dark periods. For the purpose of method development we focus on dark periods and restrict our analysis to a 6-hour period of stable activity, which consists of the first and larger of the two nocturnal activity peaks characterizing the strain of mice used, C57BL6/J (see SI Fig. S1(a)). The individual dynamics are characterized by a basal activity rate of moving between boxes, and by the tendency to explore the next box rather than come back to the previous one, which we term "roaming" (see SI Fig. S1(b)). Dominant mice tend to chase others more frequently [52], resulting in chaser-chased dynamics that are significant within a short time scale of a few seconds (see SI Fig. S1(c)).
To measure collective behavior at the group scale, we examine the excess frequency of finding two mice in the same box. The mean frequency of mouse i in box α reads ⟨δ_{σ_i,α}⟩, where δ_{a,b} = 1 if a = b, and 0 otherwise. The excess pairwise frequency is given by the connected pairwise correlation C_ij = ⟨δ_{σ_i,σ_j}⟩ − Σ_α ⟨δ_{σ_i,α}⟩⟨δ_{σ_j,α}⟩, where the first average is over all box combinations (Fig. 1(c)). Non-zero correlations between mice strongly decrease when the data are shuffled across time within the same day, and disappear completely when the data are shuffled across days. This suggests that the correlations are driven by interactions between mice, rather than by the environment (i.e. the day).
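As a concrete illustration, the connected co-localization matrix can be computed directly from a discretized location time series. The sketch below is a minimal Python implementation of the definition above, run on synthetic uniform data standing in for the Eco-HAB recordings (all values are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, Q = 5000, 15, 4                         # time bins, mice, boxes (illustrative)
sigma = rng.integers(1, Q + 1, size=(T, N))   # sigma[t, i] = box of mouse i at time t

def excess_colocalization(sigma, Q):
    """C_ij = <delta_{s_i,s_j}> - sum_a <delta_{s_i,a}><delta_{s_j,a}>."""
    # occ[a, i]: fraction of time mouse i spends in box a+1
    occ = np.stack([(sigma == a).mean(axis=0) for a in range(1, Q + 1)])
    # joint[i, j]: fraction of time mice i and j share a box
    joint = (sigma[:, :, None] == sigma[:, None, :]).mean(axis=0)
    return joint - occ.T @ occ
```

For independent mice, off-diagonal entries fluctuate around zero, mimicking the shuffled-data control described in the text.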
To study the dynamics of this collective behavior, one can look at the temporal auto-correlation of the total number of mice in each box, n_α(t), defined as

C_α(τ) = ⟨n_α(t) n_α(t + τ)⟩ − ⟨n_α(t)⟩⟨n_α(t + τ)⟩,

where ⟨·⟩ denotes a time average. The excess of this auto-correlation over its counterpart in the shuffled dataset (Fig. 1(d)) is a signature of group behavior, which slows down the overall dynamics of occupancy by keeping individuals in heavily occupied boxes longer. This is confirmed by the long tails in the distribution of waiting times, i.e. the durations between transition events (Fig. 1(e) and (f)).
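The waiting times themselves are straightforward to extract from a discretized trajectory. A minimal sketch, using a synthetic memoryless (Markov) switcher for illustration: such a process yields exponentially distributed waiting times, in contrast to the fat tails seen in the data.

```python
import numpy as np

def waiting_times(traj):
    """Durations (in time bins) between consecutive transition events."""
    change = np.flatnonzero(np.diff(traj) != 0) + 1  # bins where the state changes
    return np.diff(change)

# memoryless 4-state switcher: resamples its state with probability 0.05 per bin
rng = np.random.default_rng(1)
traj = np.ones(100_000, dtype=int)
for t in range(1, traj.size):
    traj[t] = rng.integers(1, 5) if rng.random() < 0.05 else traj[t - 1]
```

For this Markov process the coefficient of variation of the waiting times is close to 1, the signature of an exponential distribution; a fat-tailed distribution would push it well above 1.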

B. Modeling the steady-state distribution
These experimental observations suggest collective effects, driven by direct interactions between mice, that lead to effective phenomena spanning a broad range of timescales. Our goal is to find a set of effective equations that describe the evolution of the system and are consistent with the properties of the observables in Fig. 1. Predicting the full dynamical collective behavior requires defining both the static distribution of box occupancies and the type of dynamics that governs the transitioning of the mice between boxes. We separate the inference problem into two steps: first, we infer the steady-state distribution P_s(σ) for the macrostates σ = (σ_1, . . ., σ_N) (N = 15) using a maximum entropy approach, and then we infer the dynamics while keeping the steady state distribution fixed.
The maximum entropy approach has been applied in a wide range of biological contexts [2,15,16,19,34,53]. It generates approximations to the steady state distribution that match the expectation values of a chosen set of observables while keeping the model otherwise as random as possible. Here we constrain the co-localization probabilities of all pairs of mice, C_ij, as well as the single-mouse box occupancies ⟨δ_{σ_i,α}⟩. The maximum entropy distribution then takes the form of Boltzmann's law,

P_s(σ) = e^{−E(σ)} / Z_s,   E(σ) = − Σ_{i,α} h_{i,α} δ_{σ_i,α} − Σ_{i<j} J_{ij} δ_{σ_i,σ_j},   (1)

where h_{i,α} and J_{ij} are Lagrange multipliers that must be tuned to satisfy the constraints, E(σ) is interpreted as an energy by analogy with statistical mechanics, and Z_s enforces normalization. We fit this model to the Eco-HAB data and show that it correctly predicts collective statistics of occupancy that were not fit in the model, such as triad correlations and the probability of pairs of mice to be in a specific box (SI Fig. S2).
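For a toy-sized system, the Boltzmann probabilities of this model can be enumerated exactly (exhaustive enumeration is only feasible for small N; the parameter values below are synthetic, not inferred from data).

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
N, Q = 3, 4                                # tiny system so Z_s is computable exactly
h = rng.normal(0, 0.5, size=(N, Q))        # h[i, a]: field on mouse i in box a
J = rng.normal(0, 0.5, size=(N, N))
J = (J + J.T) / 2
np.fill_diagonal(J, 0)

def energy(s):
    """E(s) = -sum_i h[i, s_i] - sum_{i<j} J_ij delta_{s_i, s_j}."""
    e = -sum(h[i, s[i]] for i in range(N))
    e -= sum(J[i, j] for i in range(N) for j in range(i + 1, N) if s[i] == s[j])
    return e

states = list(itertools.product(range(Q), repeat=N))
weights = np.array([np.exp(-energy(s)) for s in states])
P = weights / weights.sum()                # Boltzmann's law, Eq. (1)
```

For the real N = 15, q = 4 system the 4^15 states cannot be enumerated, which is why the paper resorts to Monte Carlo sampling (see Materials and Methods).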

C. Glauber dynamics fails to capture the long-time behavior
The same steady-state distribution, Eq. (1), can be generated by many different dynamical models. The simplest assumption inspired by statistical physics is that the transition rate of a mouse from one box α to an adjacent box β (assuming transitions are instantaneous, so that two mice never transition at the same time) is a function of the difference in energies between the ending and starting states, ∆E_{i,α→β}(σ). Writing the transition rate between adjacent boxes as W_{i,α→β} = µ_i f(∆E_{i,α→β}), where µ_i is the overall activity of mouse i, a necessary and sufficient condition on f for these rates to admit Eq. (1) as steady state is given by detailed balance:

f(∆E) / f(−∆E) = e^{−∆E}.

We tested some classical forms of f(∆E) on the data (Fig. 2(a)). We found that the empirical normalized transition rates W_{i,α→β}/µ_i are well reproduced by the Glauber form, f(∆E) = 1/(1 + e^{∆E}), but not by the Metropolis-Hastings prescription, f(∆E) = min(1, e^{−∆E}). However, even the Glauber dynamics does not reproduce the long tails in the waiting time distribution, nor does a nonparametric form of the transition function f(∆E) estimated directly from the data (Fig. 2(b), SI Fig. S3). This did not have to be the case: while these dynamics are Markovian and memoryless at the group level, long time scales may nonetheless emerge from interactions, as for example during critical slowing down. The failure of the model suggests that, by themselves, these concurrent and co-localized pairwise interactions are not strong enough for such long time scales to emerge. Additionally, the transition probability conditioned on the elapsed time since the last transition exhibits tails (SI Fig. S4). Together, these results suggest that the dynamics may have a Glauber form, but that additional memory effects must be incorporated. Here, the term "memory" describes how the dynamics depends on the past location time series.
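Both candidate forms of f satisfy the detailed-balance condition above, which is easy to verify numerically:

```python
import numpy as np

def f_glauber(dE):
    return 1.0 / (1.0 + np.exp(dE))

def f_metropolis(dE):
    return np.minimum(1.0, np.exp(-dE))

dE = np.linspace(-3, 3, 13)
# detailed balance requires f(dE)/f(-dE) = exp(-dE) for every energy difference
ok_glauber = np.allclose(f_glauber(dE) / f_glauber(-dE), np.exp(-dE))
ok_metropolis = np.allclose(f_metropolis(dE) / f_metropolis(-dE), np.exp(-dE))
```

Both forms therefore yield the same steady state; they differ only in their dynamics, which is exactly the freedom the data constrain in Fig. 2(a).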

D. Generalized Glauber dynamics
Our goal is to add long-term memory effects to the equilibrium dynamics described above, while keeping the maximum entropy distribution valid. For concreteness, we start from Glauber dynamics, and call the method the generalized Glauber dynamics (GGD), although the approach can be generalized to other equilibrium dynamics.

FIG. 2: Glauber dynamics based on the inferred steady state distribution, Eq. (1), fails to predict the long-time dynamics of mice. (a) The normalized mean transition rate is well reproduced by Glauber dynamics with the parameters of the steady state model, but not by Metropolis-Hastings dynamics. (b) However, Glauber dynamics with the parameters of the steady state model do not reproduce the long tails of the waiting time distribution. The dynamics are also not reproduced by a non-parametric estimation of the transition rates (see SI Fig. S3).
Our approach is general and applies to any group of N correlated categorical variables taking q possible values, σ_i = 1, . . ., q (q = 4 in the Eco-HAB). For simplicity of exposition, here we outline the derivation for a single binary (Ising) spin (N = 1, q = 2), σ = ±1 in spin convention. In Eco-HAB terms, this is equivalent to a single mouse placed in an experimental apparatus with two connected chambers (denoted −1 and +1). The general derivation for arbitrary N and q is given in the SI.
The maximum entropy distribution is given by the energy E(σ) = −hσ, and the Glauber dynamics is defined by the transition rates W(σ → −σ) = µ e^{−hσ}/(e^{h} + e^{−h}). To include memory, we take inspiration from multi-dimensional Markov systems with equilibrium dynamics, such as hidden Markov models and generalized Langevin equations. The idea is to consider a larger equilibrium system coupling both the observed spins and some hidden degrees of freedom. While this augmented system is Markovian, the subsystem formed by the observed spins may exhibit memory.
In practice, we couple the spins to a heat bath of harmonic oscillators (see Fig. 3(a) for a schematic, Ref. [54] for the standard derivation for continuous variables, and the SI for a detailed derivation for categorical variables). After integrating out the hidden degrees of freedom, the transition rates of the GGD take a Glauber-like form, W(σ → −σ) = µ e^{−h_eff σ}/(e^{h_eff} + e^{−h_eff}), but with an effective, time-dependent field

h_eff(t) = h − Γ(t) σ(0) − ∫_0^t dt′ Γ(t − t′) σ̇(t′) + ξ(t),   (2)

where the noise correlation satisfies the generalized fluctuation-dissipation relation

⟨ξ(t) ξ(t′)⟩ = Γ(t − t′).   (3)

Γ(t) is an arbitrary function that specifies how the spectrum of oscillators couples to the spin (see SI Section 1B).
The first term of Eq. (2) is the local field learned from the maximum entropy model, already present in classical Glauber dynamics. This term generalizes readily to the case of multiple interacting spins as the local field h_i(σ) acting on spin i (defined as half the energy difference between the configurations with σ_i = −1 and σ_i = +1, the other spins being fixed). The second and third terms depend on the memory kernel and on the history of the spin σ(t); their sum, which we denote h_mem(t), adds memory to the dynamics. The last term is the colored noise that results from the coupling with the memory kernel. When extending to q states, the fields h and h_eff become vectors of length q, and Γ(t) takes the form of a q × q matrix coupling the different states together. Memory kernels G_ij(t) may also be added to couple a degree of freedom i to the memory of another one, j (see SI Section 1C). By construction, the process is reversible, as a subsystem of a larger equilibrium system including both the spins and the oscillator bath, and its steady state is still given by Boltzmann's law, Eq. (1). The memory kernel Γ(t) can be chosen from a large family of functions.

E. Range of possible dynamics of GGD
We now illustrate the range of possible dynamics generated by GGD by simulating simple toy models of Ising or Potts spins (see Materials and Methods for details on the simulations). We start by asking whether our illustrative example of a single spin variable with memory can generate non-Markovian tails of the waiting time distribution, as observed in the mice experiments (Fig. 2(b)). We define a GGD for a single spin with an exponentially decaying memory kernel, Γ(t) = A exp(−t/τ), where τ is the time scale of the self-memory, and compare it to classical Glauber dynamics (A = 0). By construction, both dynamics predict the same steady state distribution, characterized by ⟨σ⟩ = tanh(h) (see SI Fig. S5). However, the GGD predicts a long tail in the waiting time distribution, whereas the naive Glauber dynamics yields an exponential distribution of waiting times (Fig. 3(b)). Thus, even a simple form of the memory kernel can create long memory effects similar to those observed in data.
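To give a flavor of why correlated noise stretches the waiting-time distribution, here is a minimal single-spin sketch. For simplicity it includes only the colored-noise contribution ξ(t) to the effective field (the full GGD also contains the deterministic memory terms of Eq. 2), with ξ generated as an Ornstein-Uhlenbeck process so that ⟨ξ(t)ξ(t′)⟩ = A e^{−|t−t′|/τ}; all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
dt, n_steps = 0.1, 200_000     # time step (s) and number of steps
mu, h = 1.0, 0.0               # baseline flip rate and static field
A, tau = 0.8, 20.0             # kernel Gamma(t) = A exp(-t/tau)

def simulate(with_noise):
    sigma, xi, flips = 1, 0.0, []
    for t in range(n_steps):
        if with_noise:         # OU update: stationary variance A, correlation time tau
            xi += -xi * dt / tau + np.sqrt(2 * A * dt / tau) * rng.normal()
        h_eff = h + (xi if with_noise else 0.0)
        rate = mu * np.exp(-h_eff * sigma) / (np.exp(h_eff) + np.exp(-h_eff))
        if rng.random() < rate * dt:
            sigma = -sigma
            flips.append(t * dt)
    return np.diff(flips)      # waiting times between flips

cv = lambda w: w.std() / w.mean()   # = 1 for an exponential distribution
```

With A = 0 the waiting times are exponential (coefficient of variation near 1); with the slowly varying colored field they become a mixture of fast and slow episodes, producing a heavy tail.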
Second, we illustrate the model's ability to account for non-Markovian flow between states. In the Eco-HAB data, different mice have different levels of roaming: the ratio of the probabilities of moving forward versus moving backward in two consecutive transitions. This effect is non-Markovian and arises from memory effects. In the GGD, this memory can be encoded in the non-diagonal elements of the 4 × 4 matrix Γ(t). We define a GGD on a 4-state variable through a kernel Γ(t) with cyclic symmetry and exponentially decaying terms (see detailed parametrization in SI Section 1D). The model can generate a bias toward transitioning to the next state (A → B → C) rather than coming back to the previous one (A → B → A), and this tendency is tuned by the strength c of the off-diagonal elements of the memory kernel, with a maximum value of P(A → B → C) = 0.5 (Fig. 3(c), SI Section 1D).

FIG. 3: Toy models with generalized Glauber dynamics. (a) Schematic of the GGD. In addition to classical couplings J_ij, by considering coupling to an oscillator bath, the observed degrees of freedom are coupled to their own memory (through kernels Γ_i(t)), to that of their neighbours (through kernels G_ij(t)), as well as to noise sources ξ_i(t) correlated across time and variables. (b) For a single Ising spin, adding an exponentially decaying memory kernel can create long tails in the waiting time distribution. (c) A single 4-state variable with memory can acquire a bias toward continuing to transition in the same direction as the previous transition, versus going back to the previous state. (d) In a multi-spin system, the dynamics can depend on the history of other spins, illustrated by 10 Ising spins arranged in a loop.
Mice tend to chase each other in the Eco-HAB experiment (SI Fig. S1(c)), suggesting that transitions also depend on the history of other mice's behavior. This can be encoded in the GGD through cross-individual memory kernels, G_ij(t). This memory coupling makes the flipping rate of one degree of freedom (in more general terms, the follower) depend on the recent transitions of another (the leader), such that the transition rate of leader-follower pairs has a distinguished characteristic timescale that is not visible in other pairs (Fig. 3(d), see SI Section 1C for the memory kernel). The symmetry of the memory kernel enforces the memory dependence within a given pair to be symmetric.

F. Inference
How do we fit the GGD to data? Assuming that a maximum entropy distribution has already been learned, we need to solve the inverse problem of finding the memory kernel Γ that reproduces the experimentally observed dynamics. We assume an exponential form for the kernel, Γ_i(t) = A e^{−t/τ}, which allows us to rewrite ξ_i(t) as an Ornstein-Uhlenbeck process. We maximize the likelihood of the discretized data series σ(t) (with some small time bin) over the three parameters θ = (µ, A, τ), using the Expectation-Maximization (EM) algorithm [55] to deal with the hidden variables ξ_i(t) (see SI). Specifically, we adapt the EM algorithm used to infer neural firing dynamics with hidden noise [47,[56][57][58]. The key difference is that in our GGD inference problem, the non-Markovian fluctuation-dissipation relation (Eq. 3) acts as an extra constraint between the parameters generating the noise ξ(t) and the memory kernel Γ_i(t), whereas studies applying the EM algorithm to infer neural firing dynamics do not assume the fluctuation-dissipation relation. In addition, the transition dynamics are of the Glauber form (i.e. a logistic function, instead of the exponential used in models of neural spiking dynamics), which does not lead to simple closed-form expressions and requires Monte Carlo sampling in the computation (see SI).
Figure 4(a) shows an example of a single Ising spin undergoing GGD with a single exponentially decaying memory kernel. To mimic the situation of a spin within a large interacting system subject to a changing external field h_i(t), we consider a time-dependent field h(t) with sinusoidal form. With our EM algorithm, we are able to recover the parameters with high accuracy (see SI Fig. S6 and SI Table S1), as well as to estimate the hidden noise (see Fig. 4(b)). Trajectories simulated with the inferred set of parameters reproduce the properties of the waiting time distribution (Fig. 4(c)) and the autocorrelation function (Fig. 4(d)) observed in the data. The error of the EM algorithm, measured as the percentage difference between the EM-inferred parameters and the ground truth parameters, scales with data length T as ∼ T^{−1/2}. This scaling is expected from the Cramér-Rao bound (SI Fig. S8, SI Fig. S10(a), top right), and is therefore the same as one expects from maximum caliber methods and generalized linear models.
We can speed up the inference by heuristically minimizing the distance between the empirical and model-simulated distributions of dynamical variables, defined as the sum of the areas between the empirical and model-simulated waiting time cumulative distributions on a double-logarithmic scale. Although the error in the inferred parameters is larger than when using EM (SI Table S1), the waiting time distribution and the autocorrelation function are recovered correctly (SI Fig. S7).

FIG. 4: Example of GGD inference with an expectation-maximization (EM) algorithm on a single Ising spin under an oscillating field h(t) = sin(0.15t). The memory kernel is Γ(t) = A exp(−t/τ), with true parameters A = 0.8, τ = 19.5 s, and baseline transition rate µ = 4 s^{−1}. The simulation was conducted for a duration of T = 300 s, using a time step of ∆t = 0.1 s. (a) The combination of h(t), the noise ξ, and the effective local field due to the history of spin flips, h_mem (the second and third terms of Eq. 2), generates the sample spin trajectory σ(t). (b) Given the spin trajectory σ(t) and h(t), the EM algorithm recovers an ensemble of possible realizations of the hidden noise ξ(t). With parameters inferred using the EM algorithm, simulated trajectories recover the waiting time distribution (c) and the autocorrelation decay (d) of the data. The envelope of each curve is the standard deviation.
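One concrete way to implement such a distance is sketched below. This is a hypothetical minimal variant (the paper's exact binning and weighting may differ): it integrates the absolute difference of the two empirical log-CDFs over log-spaced waiting times.

```python
import numpy as np

def loglog_cdf_distance(w_a, w_b, n_grid=200, eps=1e-12):
    """Area between two empirical waiting-time CDFs on double-logarithmic axes."""
    grid = np.logspace(np.log10(min(w_a.min(), w_b.min())),
                       np.log10(max(w_a.max(), w_b.max())), n_grid)
    cdf = lambda w: np.searchsorted(np.sort(w), grid, side="right") / w.size
    diff = np.abs(np.log(cdf(w_a) + eps) - np.log(cdf(w_b) + eps))
    # left Riemann sum over log t (eps guards against log of an empty CDF)
    return float(np.sum(diff[:-1] * np.diff(np.log(grid))))
```

Working on log-log axes weights the tail of the distribution heavily, which is exactly the regime the Glauber baseline fails to capture.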
One can extend the parameterization of the memory kernel to sums of exponential decays, Γ(t) = Σ_l A_l e^{−t/τ_l}, to approximate more general memory kernels that decay at infinite time (Prony series, see also [47]). The extension of the EM algorithm is straightforward, as the noise can be written as a linear sum of Ornstein-Uhlenbeck processes. For the Eco-HAB mice data, we only used the heuristic method, because the EM algorithm becomes unreliable for categorical data with more possible states: a distortion of the optimization landscape leads to convergence problems (see SI Fig. S10 and SI Section 3D), consistent with the literature (see Chapter 3.4 in [59]).
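The equivalence used here — colored noise with a Prony-series kernel is a sum of independent Ornstein-Uhlenbeck processes, one per exponential term — can be checked directly (amplitudes and time constants below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
dt, n_steps = 0.05, 400_000
A, tau = [0.5, 0.3], [1.0, 10.0]       # Gamma(t) = sum_l A_l exp(-t / tau_l)

# one OU component per term: stationary variance A_l, correlation time tau_l
xi = np.zeros((len(A), n_steps))
for l, (a, tl) in enumerate(zip(A, tau)):
    for t in range(1, n_steps):
        xi[l, t] = xi[l, t - 1] * (1 - dt / tl) + np.sqrt(2 * a * dt / tl) * rng.normal()
noise = xi.sum(axis=0)                  # autocorrelation approx sum_l A_l exp(-t / tau_l)
```

The summed process has variance Σ_l A_l and an autocorrelation that superposes the two decay times, which is what allows the EM machinery built for a single OU process to carry over term by term.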

G. GGD of social mice
We now go back to our original problem of 15 mice living in an Eco-HAB and ask if we can distinguish properties of individual animals from emergent behavior resulting from interactions. Atop the static maximum entropy model we learned previously, we learn the GGD model to fit the waiting time distribution. Since the three non-Markovian features (self-memory, individual-specific inertia, and chaser-chased dynamics) occur on different time scales, in principle we should construct a memory kernel whose diagonal and off-diagonal terms have very different time scales. To simplify the task, in addition to the activity prefactors µ_i, we only learn the self-memory kernels Γ_i(t) = A_i e^{−t/τ_i}, as this memory occurs on the longest time scale and contributes the most to the observed fat tail in the waiting time distribution. Recall from Fig. 1(f) that for all mice the waiting time distribution collapses after we divide by its mean, so we can further reduce the number of parameters by assuming A_i = A_0 and µ_i τ_i = const. We learn this reduced set of dynamical parameters by minimizing the total distance between the observed and predicted waiting time distributions for all mice, computed independently for each mouse while fixing the trajectories of the other mice to their experimental values. With the optimized parameters, we then simulate the dynamics for all 15 mice, using the static parameters learned by the pairwise maximum entropy model (h_{i,α}, J_ij) and the dynamical parameters learned by GGD (µ_i, A_i, τ_i). Simulations correctly capture the tails of the waiting time distribution (Fig. 5(a) and (b)). By construction, the GGD model reproduces the static observables (SI Fig. S11(a)), which the GLM model fails to reproduce (SI Fig. S11(b)). Since the memory kernel consists of single exponential decays, it suggests a biologically plausible mechanism for its encoding by mice: an iterative leaky integrator of their internal state, without the need to remember their entire past behavior.
As discussed in previous paragraphs, while the GGD for Potts spins can tune the probability of consecutive forward transitions within a certain range, the "roaming" effect exhibited by the Eco-HAB mice is more pronounced than the GGD allows for in its current form. Specifically, while the GGD for a single Potts spin allows a maximum value of P(A → B → C) = 0.5, in the Eco-HAB data P(A → B → C) is almost always greater than 0.5.

III. DISCUSSION
Motivated by data from the Eco-HAB experiment monitoring the spontaneous collective behavior of mice, we developed an inference method that simplifies dynamical inference by separating the learning of the steady state distribution from the learning of the dynamics. Steady-state inference is performed with the well-developed method of maximum entropy models. Then, a family of reversible dynamics is constructed by adding both a memory kernel and a colored noise, which are related through a non-Markovian fluctuation-dissipation theorem.
GGD has several advantages compared to existing methods of dynamical inference. Unlike maximum caliber methods, it is defined by an explicit form of the transition matrix, making it easier to simulate and interpret. In addition, maximum caliber methods demand learning many more parameters. For a system of size N, for each time delay ∆t, maximum caliber needs to simultaneously fit the N(N − 1)/2 equal-time correlations and the N(N − 1) cross-time correlations. A total time lag of L thus leads to a total of N(N − 1)/2 + LN(N − 1) parameters to be fitted simultaneously. In comparison, in GGD we first fit the N(N − 1)/2 equal-time correlations in the maximum entropy learning step, and then, for each of the N components, we fit the dynamics with a parameterized memory kernel with a chosen number of parameters (3 in the case we considered). By separating the learning procedure into first learning the static distribution, and then independently the dynamics for each component, we have far fewer parameters to learn. Unlike generalized linear models, GGD is guaranteed to agree with the empirical steady-state distribution, and is immune to the blow-up divergences that plague inferred state-transition models (see [48][49][50][51], also SI Fig. S11). Thus, GGD introduces memory while guaranteeing the steady state.
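The parameter bookkeeping in this comparison is simple enough to spell out, using N = 15 as in the Eco-HAB and an illustrative number of lags L (the maximum-caliber count follows the formula quoted above, which counts only correlation constraints):

```python
# parameter bookkeeping for a system of N components and L time lags
N, L = 15, 10                                       # L chosen for illustration
max_caliber = N * (N - 1) // 2 + L * N * (N - 1)    # equal-time + cross-time correlations
ggd = N * (N - 1) // 2 + 3 * N                      # maxent couplings + 3 kernel params each
```

For these values maximum caliber requires over an order of magnitude more simultaneously fitted parameters than the two-step GGD procedure, and the gap grows linearly with L.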
GGD is a special case of the Generalized Master Equation (GME), as the transition rate is a function of the current state and past states through a memory kernel, and latent variables can be encoded in autoregressively generated noise in the transition rate. Compared to other GME models, the key ingredient of the GGD is the non-Markovian fluctuation-dissipation relation between the correlation function of the colored noise and the memory kernel, which is essential in restoring the same steady state distribution for a wide range of possible dynamics.
The choice of memory kernels is very general and only constrained by the reversibility imposed by the fluctuation-dissipation relation. The GGD can capture rich dynamics, such as the effect of individual memory, inertia in discrete dynamics, and dynamics that depend on other individuals in the system. While we have focused on continuous-time Glauber dynamics for concreteness, the approach can be extended to any continuous or discrete dynamics (e.g. Metropolis-Hastings). Possible extensions and future applications of GGD include inferring dynamical models for interacting spiking neurons.
We take note of GGD's limitations. Its dynamics are reversible by construction, precluding out-of-equilibrium effects. On the Eco-HAB data, a GLM model that does not enforce detailed balance shows asymmetry in the inferred interaction matrix (SI Fig. S12(d)), implying that the Eco-HAB mice are indeed out of equilibrium. Nonetheless, we find a strong correlation between the GLM interaction matrix and the GGD interaction matrix, with a correlation coefficient of 0.70 for the asymmetric GLM interaction matrix and 0.68 for the symmetrized GLM interaction matrix (SI Fig. S12(b, c)); the local fields h also correlate, with a coefficient of 0.69 (SI Fig. S12(a)). Because the memory kernel Γ(t) is also an autocorrelation function, it cannot take arbitrary forms. For example, an abrupt suppression simulating the refractory period of neurons would not be possible. In its current form, the constraint on the memory kernel imposes a limit on the maximum "roaming" effect possible, which is less than what the Eco-HAB mice exhibit. On the technical side, we did not manage to reliably infer the effect of memory between individuals, G_ij(t), from the data. This may be due to the large number of parameters to consider, which scales with the number of pairs. Another difficulty is that the memory of the histories of other individuals may act on a shorter time scale than self-memory (as suggested by the chaser-chased dynamics [52]), confusing the inference procedure. Interestingly, the autocorrelation of the box occupation number decays more slowly than predicted by the model, suggesting that these effects may play an important role (SI Fig. S13). In addition, because the GGD integrates all memory from the past, it is unable to describe a full memory reset. Finally, unlike the maximum entropy model, GGD is not a minimal construction that one can use to build dynamics of increasing complexity. One possible solution is to consider GGD with memory kernels built from a complete basis of functions. Such an extension is likely to be very useful, since it could capture phenomena on different timescales, which we saw is relevant for behavior in the Eco-HAB.

IV. MATERIALS AND METHODS
The Eco-HAB mice data. The Eco-HAB system and the corresponding data analysis tools are described in [11]. The particular experiment and data used for the analysis and method development in this manuscript are published in [52].
Computing the normalized chasing rate and the rate of sequential flips. In a lead-and-follow pair, e.g. the chaser-chased pairs in the Eco-HAB system or sequential spin flips among multiple interacting Ising spins, we compute the normalized following rate. For a given pair of mice (spins) (i, j) and a fixed time period, we count the number of consecutive transitions where mouse j leads mouse i to leave the same box and to enter the same box, separated by a time difference of ∆t. We then divide this count by an expected null value, computed by cyclically shuffling the time series of all mice, to obtain the normalized lead-and-follow rate.
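A simplified proxy for this computation can be sketched as follows. This is a loose illustration, not the paper's exact counting: here a "follow" event only requires the follower to enter the leader's new state within a window, and the null is built from cyclic rolls of the follower's series.

```python
import numpy as np

def follow_count(lead, follow, window):
    """Count lead transitions followed, within `window` bins, by the follower
    entering the leader's new state."""
    count = 0
    for t in np.flatnonzero(np.diff(lead) != 0) + 1:
        entered = np.any(follow[t:t + window] == lead[t])
        count += int(entered and follow[t - 1] != lead[t])
    return count

def normalized_follow_rate(lead, follow, window, n_shuffle=20):
    """Observed follow count divided by its cyclic-shuffle null expectation."""
    n = len(follow)
    shifts = np.linspace(n // 10, 9 * n // 10, n_shuffle, dtype=int)
    null = np.mean([follow_count(lead, np.roll(follow, s), window) for s in shifts])
    return follow_count(lead, follow, window) / max(null, 1.0)
```

A genuine follower (a delayed copy of the leader) scores well above 1, while an independent trajectory scores near 1, mirroring the normalization against the shuffled null described above.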

Learning the static maximum entropy model
The static maximum entropy model is learned by gradient descent: at each optimization step, the parameters J_ij and h_{i,α} are updated by the difference between the empirical observables and Monte Carlo-sampled observables at the current parameter estimates [60]. The initial condition sets h_{i,α} to that of the independent model and all J_ij to zero. The stopping condition is that the squared difference between the data and the simulation be less than the data variation, computed by bootstrapping the data. Monte Carlo sampling of the model and computation of the means and correlations are performed using the UGM MATLAB package [61]; all other steps are performed by custom MATLAB code.
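This update rule can be illustrated end-to-end on a system small enough for exact enumeration, replacing the paper's Monte Carlo estimates with exact model averages (the "empirical" statistics come from a synthetic ground-truth model):

```python
import itertools
import numpy as np

rng = np.random.default_rng(5)
N, Q = 3, 4
states = np.array(list(itertools.product(range(Q), repeat=N)))       # all Q^N configs
occ_feat = (states[:, :, None] == np.arange(Q)).astype(float)        # (S, N, Q)
co_feat = (states[:, :, None] == states[:, None, :]).astype(float)   # (S, N, N)

def stats(h, J):
    """Exact occupancies <delta_{s_i,a}> and co-localizations <delta_{s_i,s_j}>."""
    logw = (occ_feat * h).sum((1, 2)) + 0.5 * (co_feat * J).sum((1, 2))
    P = np.exp(logw - logw.max())
    P /= P.sum()
    return np.tensordot(P, occ_feat, 1), np.tensordot(P, co_feat, 1)

# "empirical" statistics generated from a ground-truth model
h_true = rng.normal(0, 0.5, (N, Q))
J_true = rng.normal(0, 0.5, (N, N)); J_true = (J_true + J_true.T) / 2
np.fill_diagonal(J_true, 0)
occ_data, co_data = stats(h_true, J_true)

# gradient ascent: move each parameter toward matching its observable
h, J = np.zeros((N, Q)), np.zeros((N, N))
for _ in range(5000):
    occ_m, co_m = stats(h, J)
    h += 0.3 * (occ_data - occ_m)
    J += 0.3 * (co_data - co_m)
```

In the real N = 15 problem the exact averages are intractable, which is where the Monte Carlo sampling step of the text comes in; the update rule itself is unchanged.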

Simulate generalized Glauber dynamics
To simulate generalized Glauber dynamics with a given memory kernel Γ(t), we first generate noise ξ(t) whose correlation is ⟨ξ(t)ξ(t′)⟩ = Γ(t − t′), using Fourier-transform methods [62]. For systems of higher dimension, the noise is first generated independently in the eigenbasis of the memory kernel, then transformed back to the standard basis. The dynamics is then simulated by discretizing time and using parallel updating, with the time step chosen small enough that at most one spin transitions in any given time step.
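As an illustration, stationary Gaussian noise with a prescribed autocovariance can be generated with the circulant-embedding FFT method; this is a sketch under our own conventions (the paper's implementation follows [62] in MATLAB):

```python
import numpy as np

def colored_noise(gamma, n_samples, rng=None):
    """Sample stationary Gaussian noise xi(t) with autocovariance
    <xi(t) xi(t')> = gamma[|t - t'|], via circulant embedding and the FFT.
    Assumes the circulant embedding of gamma is (close to) positive."""
    rng = np.random.default_rng(rng)
    T = len(gamma)
    # one row of the circulant covariance matrix that embeds gamma
    row = np.concatenate([gamma, gamma[-2:0:-1]])
    lam = np.fft.fft(row).real        # eigenvalues of the circulant matrix
    lam = np.clip(lam, 0.0, None)     # guard against tiny negative eigenvalues
    M = len(row)
    z = rng.normal(size=(n_samples, M)) + 1j * rng.normal(size=(n_samples, M))
    x = np.fft.fft(np.sqrt(lam / M) * z, axis=1)
    return x.real[:, :T]              # keep the first T points of each sample
```

For a multi-dimensional kernel, as described above, one would generate such noise independently along each eigendirection of Γ and rotate back to the standard basis.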

Inference of dynamical parameters
The inference of the dynamical parameters is performed with two methods. The first is an Expectation-Maximization (EM) algorithm, developed in detail in the SI and implemented with custom MATLAB code. In the single Ising spin example, we chose the stopping criterion that the absolute change in all three parameters (µ, a, σ²ε) must be less than a threshold value of 0.01 over the last 100 iterations of the EM algorithm.
Alternatively, the heuristic optimization uses two consecutive grid searches followed by a Nelder-Mead algorithm, provided by the built-in MATLAB function patternsearch, to find the optimum.

Learning the generalized linear model
For comparison with the generalized Glauber dynamics (GGD), we train a generalized linear model on the Eco-HAB data. The model follows [45] and writes the transition probability within ∆t in terms of a transition rate for α ≠ β, together with a normalization factor. For direct comparison with the GGD model, the memory kernel is parameterized by a single exponential, Γ i (t) = A i e^(−t/τ i ). The parameters µ i , h iα , J ij , A i , τ i are estimated by maximum likelihood.
In the main text, we present the generalized Glauber dynamics (GGD) as a method to tune the dynamics of a discrete system while keeping the steady state distribution fixed. Briefly, we build upon the maximum entropy distribution and ask how we should add an adjustable component to the energy function such that the dynamics can be tuned to fit the data. In this section, we first introduce the standard Glauber dynamics. Then, we present detailed derivations of GGD in three systems of increasing complexity: a single binary (Ising) spin with states ±1; multiple binary spins coupled to each other; and a single multi-state (Potts) spin with q states.

A. Glauber dynamics satisfies detailed balance

For a discrete system with a steady state distribution P(σ) = e^(−H(σ))/Z, where H(σ) is the Hamiltonian of the system (in the main text we used E(σ)), a natural way to write down a dynamics that generates such a steady state distribution is an equilibrium dynamics that satisfies detailed balance. Namely, the transition rate between state σα and state σβ must satisfy w(σα → σβ)/w(σβ → σα) = e^(H(σα)−H(σβ)). Glauber dynamics is one type of dynamics that obeys detailed balance, with transition rate w(σα → σβ) ∝ 1/(1 + e^(H(σβ)−H(σα))). If the Hamiltonian can be expressed by an effective local field, H(σ) = −hσ, the transition rate for a spin flip is w(σ → −σ) = (1/2)[1 − σ tanh(h)]. The effective local field h at time t determines the transition probability in the time window [t, t + ∆t).
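As a quick numerical sanity check (our own sketch, not part of the paper's code), one can verify that a logistic Glauber rate satisfies detailed balance for arbitrary state energies, and reduces to the familiar ½(1 − σ tanh h) form for a single spin in a field h:

```python
import numpy as np

def glauber_rate(Ha, Hb):
    """Glauber rate for the transition a -> b between states with
    energies Ha, Hb: a logistic function of the energy difference."""
    return 1.0 / (1.0 + np.exp(Hb - Ha))

rng = np.random.default_rng(0)
H = rng.normal(size=5)           # energies of 5 arbitrary states
p = np.exp(-H); p /= p.sum()     # Boltzmann steady state
for a in range(5):
    for b in range(5):
        # detailed balance: p_a w(a -> b) == p_b w(b -> a)
        assert np.isclose(p[a] * glauber_rate(H[a], H[b]),
                          p[b] * glauber_rate(H[b], H[a]))

# single spin in field h: H(s) = -h s, flip rate = (1/2)(1 - s tanh h)
h, s = 0.7, 1
assert np.isclose(glauber_rate(-h * s, h * s), 0.5 * (1 - s * np.tanh(h)))
```

The identity holds because p_a w(a→b) = e^(−Ha)/(e^(−Ha)·(1 + e^(Hb−Ha))) · (1/Z) = 1/(Z(e^(Ha) + e^(Hb))), which is symmetric in a and b.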
While Glauber dynamics satisfies detailed balance and has a steady state equal to the one given by the Boltzmann distribution, it is Markovian, i.e. the transition rate only depends on the current state of the spins. To include memory in the dynamics, we take inspiration from multi-dimensional Markovian systems with equilibrium dynamics, such as hidden Markov models and generalized Langevin equations: if we only observe part of such a system, the observed part behaves as if it has memory. Specifically, following the derivation of generalized Langevin equations, we consider coupling the spins to baths of harmonic oscillators [54].
For concreteness, we choose to focus on Glauber dynamics, although the method applies to all types of equilibrium dynamics.

B. GGD for a single Ising spin
Consider a system with a single Ising spin, σ ∈ {±1}, described by the Hamiltonian H(σ) = −h s σ, with the corresponding Glauber transition rate. To include memory, we couple the spin quadratically to a bath of harmonic oscillators indexed by k, with momenta p k and positions q k . The energy of the system is a sum of the spin energy H s and the oscillator bath energy H b , where the coefficient h s is the steady state local field of the spin. Among the coefficients for the oscillators, ω k is the frequency of the k-th oscillator, and γ k measures the strength of the coupling of the system to the k-th oscillator. We solve for the spin dynamics by integrating out the dynamics of the harmonic oscillators.
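For reference, a standard Zwanzig-type spin-boson form consistent with this description is (our reconstruction; the paper's normalization conventions may differ):

```latex
H = H_s + H_b, \qquad H_s = -h_s\,\sigma, \qquad
H_b = \sum_k \left[\frac{p_k^2}{2}
      + \frac{\omega_k^2}{2}\Big(q_k - \frac{\gamma_k}{\omega_k^2}\,\sigma\Big)^{2}\right].
```

Integrating out the oscillators from such a Hamiltonian produces a memory kernel of the form Γ(t) = Σ_k (γ k ²/ω k ²) cos(ω k t), the sum recognized below as a cosine Fourier series.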
Hamilton's equations of motion for the oscillators can be solved in terms of the spin trajectories. Integration by parts gives a term that only depends on the initial condition of the system, and a term that depends only on the history of the evolution of the spin. Meanwhile, assuming we know the dynamics of the harmonic oscillators, we can also write down an effective energy for the Ising spin by integrating out the possible configurations of the bath.
Because the spin obeys σ² = 1, the spin-related part of the energy can be written in terms of a local field, and in the limit of infinitesimal time discretization we can write down an effective local field for the spin. We now plug the solution of the harmonic oscillators (Eq. (S4)) into the positions q k of this effective local field. With some linear algebra, we can analyze the effective local field term by term. In the term involving the history of the spin, we recognize the sum over the harmonic oscillators as a cosine Fourier series, which defines the memory kernel Γ(t).
The boundary term comes from integration by parts, and is given by the left hand side of Eq. (S4). The term that involves the initial conditions of the oscillators, h⁰ eff ({q k (0), p k (0)}), constitutes the noise in the dynamics. We replace the initial conditions with stochastic variables drawn from their equilibrium distribution, which yields colored noise: based on the equipartition theorem at temperature k B T = 1, and after some algebra, we find the noise correlation of Eq. (S8). In summary, the effective local field consists of the static local field, as learned from the steady state model (the first term), the memory (the second and third terms), and the colored noise that results from the coupling through the memory kernel (the last term). The memory depends on the coupling of the initial value of the memory function to later values of the spin (the second term), as well as on the convolution of the memory kernel and the spin flipping rate (the third term). For clarity, in the main text we denote the second and third terms together. The transition rate of the spin then takes the same mathematical form as the naive Glauber dynamics (Eq. (S10)). Nonetheless, the effective local field h eff depends on the history of the spin σ, which adds a memory dependence to the dynamics. By construction, detailed balance is ensured in the whole system that includes both the spin and the oscillator bath.

C. GGD for multiple Ising spins
Now we increase the complexity of the generalized Glauber dynamics, and consider a system of N interacting binary (Ising) spins, σ = (σ 1 , σ 2 , . . ., σ N ). For a small enough time interval, at most one spin flips at a time, so the Glauber dynamics takes the same form as in the single Ising system. We again couple each spin σ i quadratically to a bath of harmonic oscillators with momenta p k . The energy of the system is a sum of the spin energy H s and the oscillator bath energy H b , where the coefficients g are the local fields of the spins, J the pairwise interactions among the spins, and h = g + J σ the steady state local field. Among the coefficients for the oscillators, ω k is the frequency of the k-th oscillator, and γ k measures the strength of the coupling of the system to the k-th oscillator. For convenience, we set identical coupling strengths for each spin σ i . For simplicity, we assume that the coupling matrix among the baths, M , is symmetric and positive definite, M = ODO⁻¹, where O is the matrix of eigenvectors of M . We denote the eigenvalues of the matrix M by λ α .
In the eigenbasis of M , Hamilton's equations of motion for the oscillators are analogous to Eq. (S2) and can be solved accordingly.
We solve the oscillator dynamics in terms of the spin trajectories, and recognize a term that only depends on the initial condition of the system and a term that depends only on the history of the evolution of the spins. Meanwhile, for the Ising spins, we can write down an effective energy that depends on the harmonic oscillators. In the limit of infinitesimal time discretization, at each time point there is at most a single spin flip, so we can write down an effective local field for the spins such that the difference of the Hamiltonian between two states σ, σ′ separated by a single spin flip is equal to h eff σ − h eff σ′. Here, the symbol diag(A) indicates the diagonal matrix of a matrix A, with the (i, j)-th entry being δ ij A ij . Now, we plug the solution of the harmonic oscillators into the positions q k of the effective local field, and analyze it term by term. In the term involving the history of the spins, we recognize the sum over the harmonic oscillators as a cosine Fourier series, which defines Γ(t) as a matrix that is diagonal in the eigenbasis. The boundary term comes from integration by parts; notice that similar terms appear in the first half of the effective local field (Eq. (S16)).
The term that involves the initial positions of the oscillators, h⁰ eff ({x k (0), v k (0)}), constitutes the noise in the dynamics. Based on the equipartition theorem, we replace the initial conditions with stochastic variables with colored noise; after some algebra, we find the noise correlation of Eq. (S19). In summary, the effective local field consists of the static local field, as learned from the steady state model (the first term), the memory (the second and third terms), and the colored noise that results from the coupling through the memory kernel (the last term). The memory term depends on the coupling of the initial value of the memory function to later values of the spins (the second term), as well as on the convolution of the memory kernel and the spin flipping rate (the third term). We can recognize the diagonal elements of the memory kernel, Γ i (t) ≡ Γ ii (t), as terms for self-memory, i.e. how the dynamics of a spin is coupled to its own history. The off-diagonal terms, G ij (t) ≡ Γ ij (t) for i ≠ j, couple the transition probability of a given spin to the history of the other spins. For Fig. 3(d) of the main text, we generate the dynamics of 10 Ising spins in a loop, setting the local field h s = 0 such that the equal-time correlation between different spins is zero. We diagonalize Γ(0) to obtain the orthogonal basis O and the eigenvalues λ α , and set the memory kernel at later times accordingly.

D. GGD for a single Potts spin
We now extend the construction to a single Potts spin with ρ states, where we denote the states using unit vectors in the standard basis. For example, for the Eco-HAB mice dataset, where the mice have ρ = 4 distinct states, the spin takes as possible values the four standard unit vectors. The Hamiltonian can again be expressed using a local field, H(σ) = −h · σ.

Uncoupled states
We first consider the case where these states are uncoupled, i.e. the transition rate from a given state only depends on the history of the spin in the same state. Mathematically, the Hamiltonian decouples into a sum of terms that are independent for each state, and the effective local field is identical to that of the binary spins, with a corresponding auto-correlation of the noise. One can check that for ρ = 2 states, the solution for a single Ising spin with GGD (with parameters denoted by tildes) is recovered by identifying h ₊ − h ₋ = 2h̃ and Γ(t) = 2Γ̃(t).

Coupled states
The memory kernel can also couple different Potts states. In this case, the Hamiltonian for the bath of oscillators coupled to the q states takes the same form as the Hamiltonian for multiple interacting Ising spins (Eq. (S13)); for simplicity, we again assume that the coupling matrix M is symmetric and positive definite. The spin-dependent part of the Hamiltonian (Eq. (S15)) is also identical; the only difference is the values taken by the spin σ, so the second term of H part. takes a different value. The effective local field follows, and, by procedures analogous to the multiple Ising case, can be further reduced, with a corresponding noise correlation. In the case that M = 1 ρ , we recover the expression for a single Potts spin with memories where the states are uncoupled. Now, the memories are coupled such that the transition rate from a specific state can be affected by the spin history in another state. In Fig. 3(c) of the main text, we choose the memory kernel to respect the symmetry of the Eco-HAB, parametrizing it so that it decays exponentially for t > 0. The noise is generated independently in the eigenspace of the memory kernel Γ(0), each component with a correlation that decays as a single exponential with the same correlation time τ, then projected back to the original space.
Averaged over the noise ξ, and approximating with only the most recent transition, we obtain the effective local field. Assuming the simplest case where the static local fields are the same for each state (h A = h C ), we have h C < h A . This means the more likely next state is A, i.e. P(A → B → A) > P(A → B → C), with equality in the limit c → 1.

II. FORMULATING THE NOISE USING YULE-WALKER EQUATIONS
To ensure that the memory kernel in GGD is a well-defined cross-correlation function, we choose to first generate the hidden variable ξ t , and then find the functional form of the memory kernel.
We consider the broad class of models where the hidden variable ξ t is generated by linear dynamics with residue correlation Σ = ⟨ε t ε tᵀ⟩. Multiplying both sides by ξ t−k and averaging over t, we find that the auto-covariance sequence Γ k = ⟨ξ t ξ t−k ⟩ t follows the Yule-Walker equations. The simplest form of noise correlation arises when the noise is generated with at most a one-time-step coupling to the past (VAR(1)), which gives a discrete Ornstein-Uhlenbeck process. Here, one can solve for Γ given the parameters of the noise, the coupling parameters A and the noise correlator Σ. In the example of a single Ising spin, the effective local field can then be written accordingly (with k∆ = t).
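In the scalar VAR(1) case, ξ t = a ξ t−1 + ε t , the Yule-Walker solution is Γ 0 = σ²ε/(1 − a²) and Γ k = aᵏ Γ 0 . A brief numerical check (our own sketch):

```python
import numpy as np

def ar1(a, sigma_eps, T, rng=None):
    """Simulate xi_t = a * xi_{t-1} + eps_t with eps_t ~ N(0, sigma_eps^2),
    started from the stationary distribution."""
    rng = np.random.default_rng(rng)
    eps = rng.normal(0.0, sigma_eps, T)
    xi = np.empty(T)
    xi[0] = rng.normal(0.0, sigma_eps / np.sqrt(1 - a**2))
    for t in range(1, T):
        xi[t] = a * xi[t - 1] + eps[t]
    return xi

a, sig = 0.9, 0.5
xi = ar1(a, sig, 200_000, rng=0)
gamma0 = sig**2 / (1 - a**2)            # Yule-Walker: Gamma_0 = sigma_eps^2 / (1 - a^2)
assert abs(np.var(xi) - gamma0) / gamma0 < 0.1
for k in (1, 5, 10):
    gk = np.mean(xi[:-k] * xi[k:])      # empirical Gamma_k
    assert abs(gk - a**k * gamma0) / gamma0 < 0.1   # Gamma_k = a^k Gamma_0
```

The exponential decay aᵏ of the autocovariance is what makes the VAR(1) noise equivalent to an exponential memory kernel in the GGD construction.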

III. INFERRING DYNAMICAL PARAMETERS WITH AN EXPECTATION-MAXIMIZATION ALGORITHM
For clarity of notation in later calculations, from now on we use s instead of σ for the spin. We denote by s 0,k = s 0 , s 1 , . . ., s k the spin configurations from time t = 0 to time t = k∆. Denoting functions evaluated at t = k∆ with a subscript k, we write down the discretized dynamics, in which the effective local field h eff k determines the transition probability for time t ∈ [k∆, (k + 1)∆). Assuming Γ(∆t) = Ae^(−∆t/τ), we rewrite the noise as generated by an Ornstein-Uhlenbeck process, and the discrete dynamics of the effective local field (Eq. (S28)) follow. The non-Markovian fluctuation-dissipation theorem (Eq. (S8)) imposes the relations A = σ²ε/(1 − a²) and τ = −1/ln a. These equations are the basis for the EM algorithm.
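The mapping between the discrete Ornstein-Uhlenbeck parameters and the exponential kernel can be checked directly (a sketch; `ou_to_kernel` is our own illustrative name):

```python
import numpy as np

def ou_to_kernel(a, sigma_eps2, dt=1.0):
    """Map the discrete OU (AR(1)) parameters (a, sigma_eps^2) to the
    exponential memory kernel Gamma(t) = A exp(-t / tau)."""
    A = sigma_eps2 / (1 - a**2)   # stationary variance of the OU noise
    tau = -dt / np.log(a)         # since a = exp(-dt / tau)
    return A, tau

A, tau = ou_to_kernel(a=0.8, sigma_eps2=0.36)
assert np.isclose(A, 1.0)         # 0.36 / (1 - 0.64)
for k in range(5):
    # the AR(1) autocovariance A * a^k equals the kernel at t = k * dt
    assert np.isclose(A * 0.8**k, A * np.exp(-k / tau))
```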
The coupled dynamical equations in Eq. (S29) form a state-space model, where a hidden variable, the noise ξ, stochastically determines the observed dynamics of the spin s. Inferring parameters of such state-space models with hidden variables has been studied extensively, for continuous dynamics in the context of Kalman filters [67], and for discrete dynamics in the context of point-process adaptive filters developed in neuroscience [69], typically using an expectation-maximization (EM) algorithm [55] to perform maximum likelihood estimation (MLE). Following the literature [56][57][58][68], we develop an EM algorithm to learn the parameters of GGD. The differences of the GGD model compared to previous work are that 1) the dynamics is Glauber, and 2) the parameters for generating the hidden noise ξ and for the memory kernel (a, σ²ε) need to obey the relations imposed by the non-Markovian fluctuation-dissipation theorem, Eq. (S8).
While the data likelihood is difficult to evaluate directly, the EM algorithm is an iterative method that finds its (local) maxima, by iteratively calculating the expected log-likelihood of the data, averaged over the posterior distribution of the noise given the current parameter estimates θ (l) , and updating the parameters by maximizing this estimate, Q(θ|θ (l) ). We rewrite the likelihood as two terms. The first term is given by the transition rate function, which we approximate assuming ∆ is small; here I(k∆) ≡ 1(s((k + 1)∆) = −s(k∆)) is the indicator function, i.e. 1 when there is a transition in t ∈ (k∆, (k + 1)∆] and 0 otherwise, and W (k∆) is the state-dependent transition rate for any transition between time points k∆ and (k + 1)∆, given by Eq. (S10). The second term is given by the Gaussian probability of the noise, as specified in Eq. (S29). The EM algorithm finds the optimum of the marginal likelihood by alternating between two steps. At each iteration l, the E(xpectation)-step computes Q(θ|θ (l) ) with respect to the posterior distribution of the noise, given the current estimate of the parameters, θ (l) . The posterior distribution of the noise is approximated by a Gaussian distribution, estimated with a point-process estimator and backward Kalman smoothers, which track the one- and two-point statistics. Then, in the M(aximization)-step, the complete log likelihood Q(θ|θ (l) ) is maximized, and the maximizer is used as the updated parameter θ (l+1) .
A. Computing the posterior distribution of the noise in the E-step

At each iteration, the E-step computes the expectation value of Q(θ|θ (l) ), the complete likelihood of the data s 0,K and the hidden variable (the noise ξ 0,K ), over the posterior distribution of ξ 0,K , given the current estimate of the parameters θ (l) . Following the literature [56,57], the posterior distribution of ξ 0,K is estimated as a Gaussian distribution, whose mean and covariance we denote with the subscript k|K, the point estimates for the noise ξ at the k-th time point given the entire observation, s 0,K . We denote point estimates given observations up to the j-th time point with the subscript k|j. The mean and covariances of the noise are computed in three steps: first, we use a forward point-process filter to compute ξ k|k and σ² k|k ; second, a backward Kalman smoother to compute ξ k|K and σ² k|K ; and finally, a state-space covariance algorithm to compute σ k,k+1|K .

i. Forward point process filter
The posterior distribution of the noise is computed recursively. Given the mean and variance of the noise at time point k − 1, and the entire history of spins and spin updates up to time point k, H k = {s 0,k , ds k }, the posterior distribution of the noise at time point k follows from the Gaussian continuity assumption. The mean of the posterior distribution of ξ k is estimated using importance Monte Carlo sampling: for N MC random numbers ξ MC i sampled from the Gaussian distribution N (ξ k|k−1 , σ² k|k−1 ), we form the Monte Carlo estimate of the mean. The variance is computed using the Gaussian approximation, plugging in Eq. (S31) for the posterior distribution of the noise p(ξ k |s 0,k ) and Eq. (S10) for the transition rate W (k∆). Alternatively, one can also compute the variance using importance Monte Carlo sampling, similar to how one computes the mean in Eq. (S32).
ii. Backward Kalman smoother

The estimates of the mean and the covariance of the hidden noise ξ k can be improved by using the entire observed sequence s 0,K . Because the hidden noise ξ is generated by a Gaussian Markov process, we follow [56] and use the fixed-interval smoothing algorithm to iteratively update the expected moments at time point k using the expectation values at time point k + 1. The initial condition is ξ k|K = ξ k|k at the end of the time series, k = K.
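The fixed-interval smoothing recursion can be illustrated in the simpler linear-Gaussian setting (the paper's E-step replaces the Gaussian observation update with the point-process filter above; the function and variable names here are our own):

```python
import numpy as np

def kalman_rts(y, a, q, r):
    """Forward Kalman filter + backward (RTS) fixed-interval smoother for the
    scalar model xi_t = a xi_{t-1} + eps (var q), y_t = xi_t + noise (var r)."""
    T = len(y)
    m_f, v_f = np.zeros(T), np.zeros(T)           # filtered mean / variance
    mp, vp = np.zeros(T), np.zeros(T)             # one-step predictions
    m_p, v_p = 0.0, q / (1 - a**2)                # stationary initialization
    for t in range(T):
        mp[t], vp[t] = m_p, v_p
        K = v_p / (v_p + r)                       # Kalman gain
        m_f[t] = m_p + K * (y[t] - m_p)
        v_f[t] = (1 - K) * v_p
        m_p, v_p = a * m_f[t], a**2 * v_f[t] + q  # predict the next step
    m_s, v_s = m_f.copy(), v_f.copy()             # initialize at k = K
    for t in range(T - 2, -1, -1):                # backward smoothing pass
        G = a * v_f[t] / vp[t + 1]                # smoother gain
        m_s[t] = m_f[t] + G * (m_s[t + 1] - mp[t + 1])
        v_s[t] = v_f[t] + G**2 * (v_s[t + 1] - vp[t + 1])
    return m_f, m_s

rng = np.random.default_rng(0)
T, a, q, r = 2000, 0.95, 0.1, 1.0
xi = np.zeros(T)
for t in range(1, T):
    xi[t] = a * xi[t - 1] + rng.normal(0, np.sqrt(q))
y = xi + rng.normal(0, np.sqrt(r), T)
m_f, m_s = kalman_rts(y, a, q, r)
# using the whole sequence (smoothing) improves on the causal filter
assert np.mean((m_s - xi)**2) < np.mean((m_f - xi)**2)
```

The backward pass reuses the forward predictions mp, vp, exactly as the smoother described above reuses the point-process filter output ξ k|k , σ² k|k .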

iii. Compute correlation
The cross-correlation of the hidden variable ξ is computed following the state-space covariance algorithm [70], which gives the one-step separated correlation and the corresponding correlation coefficient.

B. Maximizing data likelihood in the M-step
As described above, in the M(aximization)-step the complete log likelihood Q(θ|θ (l) ) is maximized, and the maximizer is used as the updated parameter θ (l+1) . Different methods are used to compute the two terms of the expected complete log likelihood Q, averaged over the posterior distribution of the noise. The term related to the Gaussian Markov process of the hidden noise is evaluated exactly using the mean and the covariance. The term contributed by the Glauber dynamics (the first term in Eq. (S30)) is approximated using a second-order Taylor expansion, where for simplicity of notation we define f (ξ 0,K ) ≡ log p(s 0,K |ξ 0,K , µ).

C. Detailed implementation
In the main text, we applied the EM algorithm to infer the toy model of a single Ising spin with a single-exponential memory kernel. Here we give the detailed protocol. To estimate the mean of the posterior distribution of the noise, we use N MC = 100 points in the importance Monte Carlo sampling for each time step k. To compute Q, we used 50 random samples of the Gaussian distribution of the noise ξ 0,K , given the mean and the covariance.

D. Inferring a GGD model: EM vs heuristic method
We note that the EM algorithm is not exact, due to the logistic form of the Glauber transition dynamics, the Gaussian approximation in estimating the posterior distribution of the noise, and the Monte Carlo sampling of the posterior noise distribution, which may distort the optimization landscape and lead to problems in convergence (see Chapter 3.4 in [59]). Furthermore, specific to our problem where the fluctuation-dissipation relation constrains the parameters, the posterior likelihood Q is a non-convex function of the parameters to be estimated, (µ, a, σ²), which can make it hard for the EM algorithm to learn the ground truth parameters, as the EM algorithm approaches sets of parameters where the gradient vanishes, not necessarily the global optimum.
For a single Ising spin, EM recovers the parameters much better than the heuristic method (SI Table S1 and SI Fig. S10, top panels). However, the convergence problem becomes more severe for EM applied to Potts spins with more than 2 states: the convergence is much slower, and EM does not find the ground truth (see SI Fig. S10(b) for a sample evolution of the EM-inferred parameters, which suggests the existence of multiple EM convergence points). To start from informative parameters, we use the optimal parameters identified by the heuristic method as the initial parameters for the EM algorithm. The EM-identified parameters are then compared to the heuristically identified parameters (see SI Fig. S10(a), bottom panels).

FIG. S6:
The EM algorithm is able to recover the true underlying trajectories, shown here for the same simulated Ising spin dynamics as in Fig. 4. (a) The evolution of the parameters as a function of the EM steps. The red lines represent the true underlying parameters, and the blue curves show the evolution of the inferred parameters. (b) The complete log likelihood of the data and the noise, Q, increases with the iterations. (c) The EM algorithm is able to recover the hidden noise; here we plot the mean of the estimated noise, ξ estimated, vs. the true underlying noise ξ true. The Pearson correlation coefficient is ρ = 0.5243.

FIG. S7: Comparison of GGD inference using the EM algorithm and using the heuristic algorithm that minimizes the distance between the waiting time distributions of the data and of simulations. A single Ising spin coupled to a single exponentially-decaying memory kernel is used to generate the data. Simulated trajectories with parameters inferred by the EM algorithm and by the heuristic algorithm recover both the waiting time distribution (a) and the autocorrelation decay (b) of the data. The envelope of each curve is the standard deviation. The results from the EM algorithm are plotted in gray, and the results from the heuristic algorithm in green.

FIG. 1 :
FIG. 1: Collective dynamics among social mice. (a) The Eco-HAB experimental setup (top view) with C57BL6/J male mice (N = 15) placed in four inter-connected chambers. The two chambers with food are labeled by the letter F. The location of each mouse is recorded using mouse-embedded RFID tags and antennas at the edges of the tubes (indicated by black bars). (b) Example trace of the co-localization patterns of a group of 15 mice over 10 full days, consisting of alternating dark and light cycles (12 hours each). The red vertical lines indicate the beginning of each dark cycle. (c) Pairwise connected correlation function of mice co-localization, Cij (with error bars computed by bootstrap). Cyclically shuffled data shows no correlation, while data shuffled within the same day shows a strongly reduced correlation. (d) Auto-correlation function of the number of mice in a given box, Cn(t), as a function of time difference, computed from the mice positions between 13:00 and 19:00 each day, a period of intensified activity chosen for the presented analysis. Error bars are the standard error of the mean across 10 days. (e) Mean waiting time for each mouse, defined as the time a mouse spends in a given box before exiting. Each dot indicates a different day. (f) Distribution of waiting times normalized by their mean for each mouse. The distributions collapse across all mice, and decay slower than exponentially, indicating the existence of long time scales.

FIG. 5 :
FIG. 5: The generalized Glauber dynamics on top of the static maximum entropy model on Eco-HAB data reproduces the long tail of the waiting time distributions, both in shape (panel (a)) and in mean (panel (b)). The Pearson correlation coefficient ρ is given in the plot.

FIG. S1 :
FIG. S1: Features of non-Markovian dynamics in the Eco-HAB mice data. (a) The mean activity varies throughout the day. For our analysis, we focus on the 6 hours of stable activity, indicated by the red rectangle. (b) The exploration measures how likely a mouse is to keep its direction of motion across two immediate transitions, and is consistent across days. Specifically, we define roaming by the ratio of the number of consecutive forward transitions (e.g. from box A to B and then to a different box C) to the number of forward-backward transitions (e.g. from box A to box B and back to box A). (c) Mice actively chase each other on a time scale of 5 seconds; the schematic shows the gray mouse chasing the orange one. Each colored curve represents one mouse, and the error bars are the standard error of the mean over days.

FIG. S2 :
FIG. S2: The pairwise maximum entropy model describes the statics of mice behavior, but not the dynamics. (a) The pairwise maximum entropy model fits the constraints, the mean probability of mouse i being in box r and the probability Cij that two mice are in the same box, and is able to predict higher-order structure in the data, here the probability of triads of mice in the same box (panel (b)) and the probability of K mice in the same box (panel (c)). Error bars in panels (a) and (b) estimate the variability of the data, computed by bootstrapping random halves of the data. The Pearson correlation coefficients ρ are given in the plot.
TABLE S1: True vs. inferred parameters for an example GGD dynamics with a single Ising spin and a single exponentially-decaying memory kernel. The two inference methods are the expectation-maximization algorithm and the heuristic method minimizing the distance between the cumulative distributions of the waiting times of the data and of the model.

FIG. S4 :
FIG. S4: The probability of mice transitioning decreases as the time since the last transition increases.

FIG. S8 :
FIG. S8: Accuracy of the EM algorithm as a function of data length T , tested for a single Ising spin coupled to a single exponentially-decaying memory kernel. The ground truth parameters are given in Table S1, with τ* = 19.5 as the ground truth memory timescale. Error bars represent the standard deviation across 10 randomly generated sample trajectories for each data length. (a) The EM-inferred parameters approach the ground truth. (b) The inferred memory time scale τ̂ and the strength of the memory kernel Â. (c) The mean squared difference between the true noise ξ and the estimated expectation value of the noise ξ k|K , averaged over the entire trajectory.