Data-driven model construction for anisotropic dynamics of active matter

The dynamics of cellular pattern formation is crucial for understanding embryonic development and tissue morphogenesis. Recent studies have shown that human dermal fibroblasts cultured on liquid crystal elastomers can exhibit an increase in orientational alignment over time, accompanied by cell proliferation, under the influence of the weak guidance of a molecularly aligned substrate. However, a comprehensive understanding of how this order arises remains largely unknown. This knowledge gap may be attributed, in part, to a scarcity of mechanistic models that can capture the temporal progression of the complex nonequilibrium dynamics during the cellular alignment process. The orientational alignment occurs primarily when cells reach a high density near confluence. Therefore, for accurate modeling, it is crucial to take into account both the cell-cell interaction term and the influence from the substrate, acting as a one-body external potential term. To fill in this gap, we develop a hybrid procedure that utilizes statistical learning approaches to extend the state-of-the-art physics models for quantifying both effects. We develop a more efficient way to perform feature selection that avoids testing all feature combinations through simulation. The maximum likelihood estimator of the model was derived and implemented in computationally scalable algorithms for model calibration and simulation. By including these features, such as the non-Gaussian, anisotropic fluctuations, and limiting alignment interaction only to neighboring cells with the same velocity direction, this model quantitatively reproduce the key system-level parameters--the temporal progression of the velocity orientational order parameters and the variability of velocity vectors, whereas models missing any of the features fail to capture these temporally dependent parameters.


I. INTRODUCTION
Active matter refers to systems composed of many individual agents that interact with each other and consume energy from their surroundings or an internal source to generate complex, global behaviors [1][2][3]. It encompasses a wide range of biological systems, such as flocks of birds, schools of fish, groups of humans, herds of sheep, monolayers of cells, colonies of bacteria, and others, all exhibiting intriguing out-of-equilibrium phenomena. The dynamics of active matter, which we term "active dynamics", is the key to describing the collective, spatiotemporal self-organization that arises from interactions and decision-making at the individual agent level.
Disorder-to-order transitions in biological tissues, characterized by the emergence of patterns, are a common feature of cellular systems. These transitions have been compared to phases of matter, such as disordered gas and amorphous solids [4,5]. They underlie many developmental processes [6], and are implicated in cancer invasion [7]. The resulting patterns impart tissue with form, function, and integrity, as seen in the basket-weavelike pattern of the dermis that serves as a shield for the deeper layers [8,9], the generation of forces in blood vessels [10,11], and the coordination of cell fates during embryonic development [12].
The mechanisms behind these cellular transitions are not well understood, but traditionally, collective behaviors are thought to be regulated by cell chemosensing [13], and upstream biochemical signaling pathways [14]. While biochemical regulations are on-demand and shortlived, physical guidance ensures the generation of cell phenotypes and long-term maintenance of tissue structures [15]. More recently, simple two-dimensional transitions of such nature have been realized through in vitro experiments [16][17][18][19], by subjecting a cell monolayer to an external guiding field, such as a well-aligned molecular field [20], which can lead cells to collectively orient along a predetermined axis in a density-dependent manner.
Modeling spontaneous alignment of particles in active matter systems has been the subject of numerous studies [21][22][23][24][25][26]. The two main approaches in modeling are either by constructing mechanistic models through physical laws, or by applying data-driven methods to extract information from observations to construct models [27,28]. Describing individual cell behaviors often starts from a Langevin equation of Brownian particles [29]: mv = −ζv + ϵ F , with m, v and ζ being the mass, velocity and friction coefficient, respectively, and ϵ F being a random force fluctuation, typically assumed to be a Gaussian white noise. Variants of it have been widely used for modeling persistent random motions of cells in experiments and simulation [30][31][32][33][34], whereas the complex cell-cell interaction at high density, often implicated in tissue formation and repair, is not explicitly considered in these models.
In data-driven methods, regression techniques, for instance, are widely applied to estimate linear coefficients of a set of additive basis [35][36][37], and to learn particle interaction kernel functions by piece-wise linear functions [38], neural networks [39], and Gaussian processes [40]. Learning the distance-based interaction kernel function has been successfully applied to model schools of golden shiner [41], flocks of surf scoters [42] and systems of interacting particles [38]. However, these approaches are rarely applied to estimate cell-cell interaction from experiments with a large number of cellular trajectories, partly because the large computational cost prohibits accurate interaction kernel learning approaches [40]. Furthermore, as the underlying mechanism of these interactions remains largely unknown, a principled way to identify the relevant input of interaction kernel and delineating neighboring sets is needed as adding nonrelevant inputs can dramatically reduce estimation efficiency.
Despite the prevalence of the disorder-to-order transition observed in experiments [20,43,44], capturing the temporal progression of dynamic quantities in active matter systems, such as the orientational order parameter and variability of velocity, remains a challenging task due to a few reasons. First of all, active matter systems intrinsically contain large fluctuations. Gaussian distributions of fluctuation, for instance, are often assumed for modeling the velocity progression in constructing mechanistic models [45,46]. In a data-driven approach, the model is usually optimized by minimizing a loss function. One common choice of the loss function is the sum of squared errors, and minimizing it is equivalent to finding the maximum likelihood estimator, under the assumption of having independent Gaussian noises with equal variances. However, we find that velocity distributions from our experiments significantly deviate from Gaussian. In fact, even though models assuming Gaussian distribution of the velocity fluctuation can fit the magnitude of temporally dependent velocity, they cannot capture the progression of orientational order parameters. Non-Gaussian distributions of displacements have been widely observed in biological systems, such as particles absorbed on lipid bilayers, immersed in entangled actin solutions [47], in a bath of swimming algal cells [48], liposomes in active actin solutions [49], and receptors on living cell membranes [50]. While models have been developed to explain the fluctuation of displacements independent of time [51,52] and for how these distributions arise [53], the fluctuation distributions have rarely been incorporated in simulation to reproduce the temporal progression of velocity and order parameters in nonequilibrium cell migrations.
Second, in principle, having a large number of observations containing thousands of particle trajectories over hundreds of time points can help offset the uncertainty of estimation due to large fluctuations of the active particles. Nonetheless, calibrating the simulation model with such a large number of observations is a nontrivial task. Placing a smooth Bayesian prior on the interaction ker-nel function, such as a Gaussian process prior, can partially filter the noise in estimation but also leads to large computational costs. Thus, most of the current studies of learning interaction kernels are restricted to a small number of particles and time frames [38,54]. Hence, there is a need for robust and computationally scalable algorithms for feature selection and estimation from a large number of observations. This work aims to address these issues by introducing an efficient workflow to automate model construction, promoting convergence between simulation and experiments. We take a hybrid approach that integrates physical models and statistical learning approaches: First, we start from a conventional form of the physical models (e.g. the Vicsek model [21,55]). Second, we apply statistical tests for feature selections. Then, these features are utilized to construct a data-generative model, instead of minimizing a loss function as usually adopted in other data-driven discovery approaches. The data generative model provides a better physical interpretation of the estimated quantities, and more importantly, the uncertainty of the estimation can also be quantified. We develop an automated feature selection and estimation approach, which is applied to learning physical quantities from live microscopy of fibroblasts moving on liquid crystal elastomer substrates, to discover a variety of new features. Simulation models constructed with these new features can reproduce the progression of system properties, such as orientational order parameters and velocity distributions, whereas models missing any of these features do not match experimental findings.
Our main contribution is to develop an efficient feature selection and estimation approach for cellular movement experiments on liquid crystal elastomer substrates from video microscopy. The four selected features can be classified into two groups. The first group concerns cell-substrate interactions. We found that the velocity of the cells is anisotropic and has heavier tails than a Gaussian distribution, motivating the use of a Laplace distribution or generalized Gaussian distribution of fluctuation for characterizing cell-substrate interaction. Though Laplace distribution was used to fit the probability distribution of the displacements [47,49,56] and the mechanism of exponentially distributed cellular motility was explored in [53], its effect on capturing the progression of the orientational order parameter in simulations has not been demonstrated. The second group of discoveries concern cell-cell interactions. We observe from experiments that when cells interact, cell bodies become elongated, suggesting cells pulling each other along through a tensional network. Our findings indicate that cells are not perfectly aligned with the average of their neighbors in the prior time frame, as is the case in the classic Vicsek model [21]; Instead, effects from the previous step shrink towards zero, likely due to a frictional force term in the Langevin equation [29]. Additionally, we discovered that cells traveling in the opposite direction may be excluded from the neighboring set, leading to improved predic-  1. (a) Schematics of the two types of substrates (isotropic and nematic) tested in this study. On the nematic substrate, the molecular alignment (denoted by a red double-sided arrow), is parallel to x. (b) A stitched view of a snapshot of long-term cell imaging where cell cytoplasm is labeled in red. (c) Correlation plots in exploratory data analysis, where x, y velocity components are plotted against those of the neighboring average in the previous step. The slope of the fit (red solid lines) is smaller than 1 (black dashed lines). (d) Parameter estimation of the slope parameters, ω, the slope value of the red solid line in (c), and their 95% confidence intervals (shaded area). The statistical tests yield four potential features (green blocks), representing new features added to the classical Vicsek model in fitting the current data set. (e) The grid-based system to search for the neighbors. In order to find the neighbor of the target cell i (circled in red), a radius of r will be searched. The computational procedure is simplified by only searching through neighboring squares connected to the square cell i is located in. This workflow generates various physical parameters via simulation to compare to quantities computed from the experiment.
tions of the velocity at the subsequent time point. This likely arises from the nematic nature of cell-cell interactions [43], where cells traveling in the opposite direction can simply glide past each other without influencing each other's velocity.
Furthermore, we develop scalable computational tools for analyzing and simulating dynamical processes of active matter. Figure 1 illustrates an instance of the particle-based module of our computational tools. In Fig.  1(c), we present a plot showing the correlation between cellular velocity and the mean velocity of neighboring cells in the previous time point. The slopes of the best-fit lines (red solid curve) are smaller than 1 for both x and y directions. To determine if this result applies to all time points, we plot the fitted weight parameters (equivalent to the slope of the fit in Fig. 1(c)) over time and calculate the 95% confidence interval, which is represented by the shaded area in Fig. 1(d). The upper bounds of the 95% confidence intervals are substantially smaller than 1, indicating that the weight parameters must be included in the simulation model. All the identified features are included in the simulation model, and a maximum likelihood estimator is derived for efficient parameter estimation (Appendixes A and B). For both parameter estimation and simulation, we employ a grid-based approach [57] by storing the particles in a coarse-grained grid ( Fig.  1(e)), which can improve the efficiency in searching for neighbors. With a typical microscopy video of n ≈ 2500 cells and T ≈ 100 time frames, the time it takes to perform feature selection, parameter estimation, and simulation of dynamics with particle interactions, is less than 30 seconds on a desktop computer. This provides almost immediate feedback that can be used to inversely guide the design of the experiment. The data sets and code used in the article have been made publicly available [58].

II. FEATURE SELECTION FROM EXPERIMENTAL RESULTS
We begin by discussing experimental methods for live cell imaging and the process of extracting trajectories from cells cultured on aligned (nematic) and disordered (isotropic) liquid crystal elastomer substrates. We then conduct exploratory data analysis and statistical tests to identify significant features from the experimental data. These features are used as building blocks to extend a baseline model. Lastly, we perform simulations to validate our findings in Section III.
A. Experimental setting: in vitro cell alignment experiment The experimental data were derived from [20], consisting of cell trajectories. Liquid crystal elastomer (LCE) substrates were fabricated with a mixture of reactive monomers RM82 and RM23 (SYNTHON Chemicals GmbH & Co.) in a 1:1 molar ratio with the molecular field having either isotropic or nematic (uniform along the x-direction) configurations. The monomers were crosslinked to make a solid film, which was topographically flat but molecularly aligned, previously examined by wide angle X-ray scattering and atomic force microscopy [20].
Human dermal fibroblasts (HDFs, American Type Culture Collection) were seeded onto the substrates at cell density ρ ≈ 50 mm −2 , and grown to desired ρ. Prior to imaging, cell nuclei and cytoplasm were stained with Dyes Hoechst 33342 and CellTracker Deep Red (both from ThermoFisher), respectively. Cells are maintained at 37 o C, 5% CO 2 and imaged every 20 min for about 36 hours. Tens of images were taken at every time point and stitched together to construct an image on the order of square millimeters in area. LCE nematic substrates impose a molecular direction to guide HDF growth. During the experiment, cell density ρ grows from roughly 350 to 400 mm −2 . With an increase in cell density, the monolayer undergoes a disorder-to-order transition. Initially, there were approximately 2600 cells captured within the frame. When we finished imaging, the count had increased to about 2950 cells. To obtain the trajectories as inputs, we fit an ellipse to the nuclei and track particle trajectories using ImageJ [59], a standard cellular imaging tool. On the nematic substrate, cells develop a long-range order along the x-direction, the same direction as the molecular orientation in the aligned substrate, but not on isotropic substrates.
From the experiment, we extract the 2D position vector s i (t) = (s i,x (t), s i,y (t)) T and velocity vector v i (t) = (v i,x (t), v i,y (t)) T of cell i, at time frame t, for i = 1, ..., n t , with n t being the number of cells on time frame t, for t = 1, ..., T . The velocity is computed by dividing the displacement over time intervals as follows: v i,x = si,x(t+∆t)−si,x(t) ∆t , and similarly for v i,y . We then store the data matrix, where each row contains the 2D position, velocity and a unique cell identification number resulting from trajectory linking. The goal is to construct an interpretable model and constrain the model by data, for reproducing the temporally dependent variability of the velocity and orientational order parameter. We plot the standard deviation of the velocity at x and y directions at each time frame in Fig. 2(a). Since the nucleus does not distinguish between the head and tail, its order is apolar, we cannot apply the polar order parameter to quantify the system alignment [60]. Instead, we first transform velocity angle of the particle i at time frame t, denoted by θ i (t) = arctan(v i,y (t)/v i,x (t)), to [0, π] by letting:θ The orientational order parameter at time t is an ensemble of transformed velocity angles amongst all cells: S(t) = ⟨cos(2θ i (t))⟩ i . This way, antiparallelly traveling cell pairs (θ i = θ j + π) have the same contribution to orientational order parameters. In Fig. 2(d), we show that the orientational order parameter S(t) increases with time. The distributions of the velocity angles at two different time points are plotted in Fig. 2(e) and (f), showing that over time the velocity distribution becomes more parallel along x-direction.
To account for observed migrational patterns, we start by identifying unexpected deviations of the magnitude and distributions from the classical Vicsek Model. Exploratory data analysis (EDA) [61] is a useful step to visualize patterns from complex experimental data before performing statistical tests and modeling. Here, we first perform EDA on various aspects of the data, followed by statistical tests and estimation to identify features to include in the modeling. The main text focuses on the analysis of an experiment where cells initially cover roughly 50% of the substrates. During the experiment, the order parameter increases rapidly with cell proliferation. We present results on the analysis of two additional experiments at different cell densities or on isotropic substrates in Appendix D.

B. Permutation F-test on variances of the velocities
To test whether the variances of the velocities along x and y directions are the same at all time frames, we first compute the sample standard deviation of the velocity vector at x and y coordinates:σ 2 , wherev x (t) andv y (t) are the mean of the velocity at time t computed from an ensemble of all cells in the system, and both are close to zero. As shown in Fig. 2(a), we found that the standard deviation of the velocity along the x coordinate is always larger compared to that along y. The larger magnitude of cellular velocity along the x coordinate is induced by the liquid crystal elastomer substrate, which is molecularly aligned along the x direction in this experiment. As the velocity magnitudes along x and y directions become more unequal over time, the orientational order parameter increases ( Fig. 2(d)).
Is the observed difference between σ x (t) and σ y (t) due to signal or noise? Typically, testing the equality of the variance is performed using the F-test assuming both populations are normally distributed. However, as we will see later, velocity distributions do not follow a normal (Gaussian) distribution, while the F-test is sensitive to the normality assumption [62].
Here we circumvent the normality assumption by running a permutation F-test. The permutation test is a general nonparametric test that is insensitive to the distributional assumption. At each time t, we begin by combining the velocities at x and y directions into a long vector. Subsequently, we randomly assign these combined velocity values into two groups of equal size, creating a permuted sample. We repeat this step B times to col- ,y denote the sample variances of the velocity for x and y directions, respectively, from the permuted sample b. The p-value is twice the minimum of the probabilities Pr(F (b) (t) > F (t)) and Pr(F (b) (t) < F (t)), which can be computed empirically using the permuted samples. Figure 3 shows the distribution of the F-test statistics of the permuted samples and the observed F-test statistics for velocity observations at time equal to 20 hours. Since the observed F-test statistics is larger than any of the permutation sample, the p-value is smaller than 10 −4 , indicating that the variance of the velocity along x and y directions is unequal at this time frame. We perform the permutation F-test for each of the time points and verify that the variances σ 2 x (t) ̸ = σ 2 y (t) for all time points. Thus, cellular movements are anisotropic when cells are grown on a molecularly aligned substrate.

C. Shapiro-Wilk test for normality of velocity distribution
While at first glance cell motility bears much resemblance to passive, freely diffusing particles, our study reveals that velocity distributions in our system devi- Theoretical Quantiles have spiky modes and heavy tails, which cannot be captured by the best-fit Gaussian distribution (Fig. 2, black dashed curves), for which the mean and standard deviation are specified as the sample mean and sample standard deviation, respectively. The shape of the probability density distribution suggests the use of the Laplace distribution ( Fig. 2(b) and (c), orange solid curves), which fits the velocity distributions reasonably well.
To test whether the velocity distribution follows the normal distribution at each time frame, we compute pvalues of the Shapiro-Wilk test for normality [63], plotted in Fig. 4. The p-values are all extremely small, indicating the velocity distribution at any time frame substantially deviates from Gaussian. Furthermore, we plot the sample quantile-theoretical quantile from a normal distribution (QQ-plot) of the velocities along both directions at 20 hours, both of which deviate from theoretical quantiles, shown as the black line in the inset. The p-values get smaller at later time frames when the system approaches jamming, as the network effects are larger. Furthermore, the p-values for velocities along y-direction are smaller than those from the x-direction, indicating that the deviation from normality is also bigger, signaling more confinement along y-direction.

D. Neighboring feature construction for cellular interaction
Next, we discuss the contribution of the neighbor ensemble to cell alignment. For any cell, the conventional choice of the neighboring set is to include all cells within a radial distance r (as shown in Fig. 5(a)). Evidence of cell intercalation, where cells squeeze past their neighbors and they exchange positions, is increasingly found in recent studies [64,65]. When antiparallelly aligned cells move past each other, they minimally impact each other's velocity. This motivates us to investigate a neighboring set that excludes the cells with opposite velocities, as shown in Fig. 5(d). To compare these two approaches of accounting for the neighbors, in Fig. 5(b), we plot the cellular velocity of each particle v i,x (t) at time = 20 hours versus the mean velocity of neighboring cells at the previous time frame at x coordinate, while in Fig. 5(e) the same quantities are computed when the cells with opposite velocities are excluded from the neighboring set. As the neighboring set in the previous step includes the cell itself, it is not surprising that both approaches yield a relatively high 1-step forecast accuracy, since individual cells tend to migrate with a persistent velocity [66].
The correlation in Fig. 5(e) is around 0.64, which is substantially higher than the correlation of 0.41 in Fig.  5(b), indicating that excluding the particles from the neighbors substantially improves the 1-step predictive accuracy. The same conclusion holds for velocity updates along the y coordinate, shown in Fig. 5(c) and (f). Furthermore, by evaluating the 1-step correlation over all time points (Fig. 5(g)), we find that the method including only same-direction neighbors substantially outperforms the one including all of the neighbors, as in the Vicsek model. These results indicate that the HDF cells in our experiment appear to distinguish between head and tail polarities [67], despite extensive analogy that has been drawn between weakly interacting fibroblasts and active nematics [43,68],

E. Reduced magnitude of local alignment
Unlike [69], we do not normalize the velocity to keep the velocity magnitude to be the same, as the velocities can also change due to an increase in cell density.
are the mean of the velocity of the neighboring cells at time t − 1. To further test whether the statement is statistically significant, we compute the maximum likelihood estimator of the weights when the random fluctuation follows a Laplace distribution and generalized Gaussian distribution, with a different set of parameters at x and y coordinates.
As shown in Fig. 5(h), the maximum likelihood estimators of weights w x (t) and w y (t) are both smaller than 1. Furthermore, the shaded area shows that the 95% confidence intervals of the estimation, which is calculated by the residual bootstrap estimate [70]. In all plots, we found the upper bounds of the 95% confidence intervals of w x (t) and w y (t) are smaller than 1 at any time frame, indicating that the velocity of a cell aligns with the velocity of neighboring cells, while the velocity alignment is offset by other forces, such as frictional force between substrates and cells [13,23,71,72].

F. Summary of data-driven findings and model development
Here, we summarize the discovery from the EDA and statistical tests. First, variances of the random motion along the x and y coordinates are distinct, and both decrease over time due to cell proliferation. Second, the probability density of the velocity distribution at both x and y directions has shown non-Gaussian behavior with a heavy tail and spiky mode near zero. Third, excluding cells with opposite velocities substantially improves the correlation between cellular velocity and mean velocity of the neighboring cells at the previous time step, as cells glide across each other in a manner reminiscent of intercalation [73]. Fourth, the slope of the coefficient of a linear fit between the particle density and mean of neighboring particles in the prior time frame is smaller than one, indicating velocity alignment with its neighboring particles is offset by cell-substrate forces. All of the statistical tests we have developed so far are not restricted to the context of externally guided alignment; they are generally applicable to video microscopy which contains rich spatiotemporal data. Next, we illustrate how these analyses can be utilized to develop simulation models and parameter estimation for these models.

III. MODEL CONSTRUCTED FROM SELECTED FEATURES AND MAXIMUM LIKELIHOOD ESTIMATION
Our objective is to develop a minimum physical model that can account for temporally dependent orientational order parameters and velocities. We begin our model construction based on the seminal work by Vicsek [21], consisting of agents moving at a constant speed and updating their direction of movement at each step. The "new" velocity direction is determined by the summation of the average of the velocities of neighboring agents in the prior time frame, plus a random fluctuation. Despite its simplicity, this system exhibits a wide range of phenomena, including a transition from disordered to ordered behavior by adjusting the magnitude of the fluctuation. Later modification to this model includes incorporation of distance-dependent interactions [23], which effectively reproduces observed phenomena such as local cohesion [69] and jamming transitions [74]. These phenomena are characteristic of a migrating epithelium [75], which has strong cell-cell junctions. Here we utilized a data-driven approach to make systematic improvements to the Vicsek model by incorporating the four selected features into the model. Though some of these aspects have been noted to some extent in literature, to our knowledge, there has been no work incorporating all of them and systematically testing their applicability for quantitatively reproducing the evolving orientational order observed in the experiment. The velocity vector of the ith particle at time frame t, v i (t) = (v i,x (t), v i,y (t)) T , for t = 1, ..., T and i = 1, ..., n t , is modeled by two terms. The first and second terms represent the cell-cell, and cell-substrate interactions, respectively, as follows where ϵ i,x (t) and ϵ i,y (t) are independent zero-mean random variables with variances V[ϵ i,x (t)] = τ 2 x (t) and V[ϵ i,y (t)] = τ 2 y (t), respectively,v i,x (t − 1) andv i,y (t − 1) are the mean of the x and y directional velocities of neighboring particles at time t − 1, and w x (t) and w y (t) are real-valued scalars denoting the weights. The mean of the velocity at the x and y coordinates is modeled bȳ where ne i (t − 1) is the neighboring set of cell i at time frame t − 1 and p nei(t−1) is the number of cells in the neighboring set. Let s i (t − 1) be the 2D position of cell i at time frame t − 1. The neighboring set is defined as , which contains particles within a radius distance r but excludes particles with opposite velocities. Section 2 II C provides compelling evidence that the fluctuation distribution in the model significantly deviates from Gaussian distributions. Here, we introduce two distributions to model the random fluctuations in velocity: the Laplace distribution (or the double exponential distribution) and the generalized Gaussian distribution (or the stretched exponential distribution). Both distributions have been fit to the probability distribution of displacements that go beyond Gaussian assumptions, particularly when the system approaches jamming [49,56,76,77]. Yet, finding the maximum likelihood estimator of the parameters with non-Gaussian fluctuations is not a computationally trivial task, given that the video contains 10 2 time frames and each frame has over 2000 cells. Hence, we introduce computationally-scalable approaches to compute the maximum likelihood estimator for models with these non-Gaussian fluctuations.

A. Maximum likelihood estimator with Laplace fluctuation
For the sake of simplicity, we will only demonstrate the model using velocity along the x direction as an example. The model for velocity along the y direction can be constructed in a similar manner. To model the fluctuation by the Laplace distribution ϵ i,x (t) ∼ Laplace(0, τ x (t)), entails that the residual of velocity of particle i at time t e i, The maximum likelihood estimator of w x (t) can be obtained by iterative algorithms such as the expectationmaximization algorithm and iterative reweighted least squared algorithm [78,79], while a faster alternative is through linear programming [80,81]. The likelihood function and the maximum likelihood estimator of the weight parameter,ŵ x (t), are introduced in Appendix A. The maximum likelihood estimator of the standard deviation of the random fluctuation can be obtained by maximizing the profile likelihood after substituting inŵ x (t): B. Maximum likelihood estimator with generalized Gaussian distribution In general, both the Laplace and the Gaussian distribution are special cases of the generalized Gaussian distribution (GGD), also referred to as a stretched exponential distribution [82,83]. Here, we illustrate the formulation by applying the model to fit the velocity component along the x direction. We assume that the residual e i,x (t) follows a zero-mean GGD with parameters α x (t) and β x (t) at time frame t: . (8) The variance of the GGD follows τ 2 x (t)Γ(3/β x (t))/Γ(1/β x (t)), with Γ(·) denoting the gamma function. When β x (t) = 2 and α x (t) = √ 2τ x (t), GGD reduces to a Gaussian distribution. When β x (t) = 1 and α x (t) = τ x (t)/ √ 2, GGD reduces to a Laplace distribution. Therefore, the shape parameter β x (t) controls how close the GGD resembles a Gaussian distribution. Similarly, the GGD of the residual along y can also be defined in terms of parameter α y (t) and β y (t) at any given time frame t.
It should be noted that at each time frame, the GGD has three parameters, and some of these parameters, such as the power parameter, are notoriously difficult to estimate [84]. Hence, we derive closed-form derivatives to numerically compute the maximum likelihood estimator of the parameters, introduced in Appendix B.

IV. RESULTS
Here we perform simulations to compare models with and without the selected features, and the simulation details are provided in Appendix C. We compare models using simulated order parameters and variability of velocity. The simulated order parameters is computed by S(t) = ⟨cos(2θ i (t))⟩ i withθ i (t) defined in Eq. (1). As the velocity distribution more closely resembles a Laplace distribution, we use the average absolute deviation or L 1 loss to measure the variability of velocity:σ and v y (t) ≈ 0, instead of the more commonly used variance or L 2 loss to quantify the variability of the velocity. This is because the velocity distribution is closer to a Laplace distribution than a Gaussian distribution (Fig. 2(b) and (c)), and thus the L 1 loss is more appropriate to account for the variability of the distribution.
In Fig. 6, we compare experimental observations and simulated results of the particle-based simulation, from which we computed the dynamics of the orientational order parameters and the average absolute deviation of the velocity. We apply the estimation and simulation procedure to scenarios where either all features are present or one or more of them are missing. Remarkably, we find simple models from Eqs. (2)-(3) with four features found in Section II A can reproduce the dynamical progression of order parameters (solid curves in Fig. 6(b) and (c)) and average absolute deviation of the velocity (Fig. 6(e) and (f)), even though we do not fit a loss function for these ensemble properties directly. In Fig. 12 and Fig.  13 in Appendix D, we show the simulation using the estimated parameters from two other experiments and observe that our model is sufficiently general to capture the progression of system characteristics with varying initial densities and substrate materials.
In all simulation models, the magnitude and variability of the velocity along the x coordinate are both larger than those along the y coordinate, stemming from our first finding that the velocity distribution is anisotropic. The growing anisotropy of the velocity induced by the molecularly aligned substrate leads to the increase of orientational order parameters.
Second, the simulation with neighbor ensembles that include only cells traveling in the same direction (solid curves in Fig. 6) more accurately reproduces both the progression of the orientational order and velocity than the simulation with interactions that includes all cells within a radial distance. This finding, which is derived solely from tracking the nuclei, reflects what has been observed from the video microscopy [? ]: cells elongate, pulling on one another, as they migrate in the same direction. Conversely, when a pair of cells move pass each other in opposite direction, both cells maintain similar direction and velocities before the encounter. This effect likely occurs because of the complex interplay of cell-cell interactions, which are mediated through a cascade of signaling molecules. Consequently, the direction and polarity of contact are crucial factors in determining the strength of cell-cell interaction [67].
Third, the weight parameters w x (t) and w y (t) estimated by observations are always smaller than 1 (Fig.  5(h)). In the simulation, we find that the model with estimated weights typically outperforms the model where the weights are constrained to be 1, particularly for reproducing the dynamical progression of the average absolute deviation of the velocity (Fig. 6(a-c)). If the estimated slope parameters are equal to 1, the simulated velocity magnitude is frequently larger than what is observed. In comparison, the inclusion of the estimated weights appears to resolve this issue, as the velocity is constrained. This effect likely arises due to a loss of momentum to friction. Furthermore, a unique aspect of our formulation is that the variability of velocity can change over time, applying estimated weights enables us to capture a wider array of behaviors, such as slow-downs. For instance, with increasing cell density, the average magnitude of the velocity must change. Ultimately, crowding leads to arrested dynamics [5]. Nonetheless, simulation models often oversimplify this aspect by only modeling the velocity angle, overlooking the important role of slowdowns or heterogeneous dynamics [21,25].
Lastly, we find that the simulation model validates the finding that velocities are not normally distributed (Fig.  4)-the model with fluctuations following Laplace distributions ( Fig. 6(b)) better reproduces the progression of the orientational order parameter than the model with Gaussian fluctuation (Fig. 6(a)), even if they contain the same number of fitting parameters. Besides, the model with GGD fluctuation (Fig. 6(c)) fits slightly better than the one with Laplace fluctuation, but it also contains one more parameter at each time point than both the Gaussian and Laplace distributions, and thus it has more flexibility in controlling the decay of the tail of the distribution. It is important to note that reproducing solely the variability of the velocity is inadequate to replicate the velocity orientational order parameter in our system, which is sensitive to the change in the velocity distribution.
To further explore the difference between the effect of different random fluctuation distributions, in Fig. 7, we show the distribution of the simulated residuals in x direction in (a-c): ) by the colored circles and triangles, and the solid curve denotes the distribution of the residuals in the simulations. The fitted weight is derived from the maximum likelihood estimator where the fluctuations follow Gaussian, Laplace, and GGD, from the left to the right. Simulated Gaussian fluctuation distributions along x and y directions are shown in Fig. 7(a) and (d), which underestimates the number of cells with near-zero velocities in both directions. Instead, simulation fluctuation distributions with either Laplace distribution or GGD better reproduce the experiments, as shown in Fig. 6(b), (c), (e) and (f).
Here we use a neighbor radius of r = 75 µm. Fit of orientational order parameters by models with different neighbor radii is plotted in Appendix E, and no significant difference amongst them has been observed. Thus, the radius seems to play a role in refining the fit, but to a much lesser degree than the choice of neighbor and fluctuation models. To further explore difference between models, we present a summary of the results of incorporating various modeling parameters and compare the simulation to the experimentally obtained orientational order parameter in Table I. The root mean squared error (RMSE) of the velocity orientational order parameter is where S * t and S t are the ensemble orientational order parameters from the simulation and experiments at time frame t, respectively. The RMSE forσ x andσ y can be defined similarly, and they are shown in Table II. RMSE values that are better than the baseline average absolute deviation are bolded. Values from Table I and II indicate  TABLE II. RMSE for the average absolute deviation of the velocity estimatesσx andσy. All assumed the weights parameters from the fluctuation from the x and y directions are separately estimated. The benchmark RMSE of the average absolute deviation (calculated by the standard deviation of the average absolute deviation) forσx,σy are 5.7×10 −4 and 5.3×10 −4 , respectively. An effective method (highlighted in bold) has RMSE smaller than this value. AN (= all neighbors) and SDN (= same direction neighbor) stand for methods with all neighboring cells within the radius and the same direction neighboring cells, respectively. RMSE = (table value)  that, while there are several effective methods to describe the average absolute deviation of velocity, it is more difficult to capture the progression of the order parameter. This is not surprising as the order parameter is a complex function of the distribution, which is not explicitly controlled by any parameters in the likelihood function, given v i,x and v i,y are modeled independently. When modeling the velocity fluctuation, approaches incorporating Laplace and GGD fluctuations better capture the progression of the orientational order parameter, which is the main characteristic of the system, compared to the model with the Gaussian fluctuation.

B. Sensitivity analysis of the simulation
As captured by Eqs. (2) and (3), the anisotropy of substrate materials induces asymmetric velocities in cells, which are manifested through both interactions and fluctuations. These contributions are temporally dependent, as changes in cell density due to proliferation lead to fluctuations in the velocity magnitude. For a given set of model parameters and fluctuation distribution, it will be helpful to understand their impacts on the asymptotic order parameters, such as those that can be achieved after enough time has elapsed, for refining experimental designs and controls.
Here, we perform a sensitivity analysis for simulation with Laplace fluctuations and all other selected features. We vary the ratio of the variance of the fluctuation τ x /τ y and the ratio of their weights ω x /ω y in the model (Eqs. (2) and (3)). While the variation of the order parameter is controlled by four parameters (τ x , τ y , ω x , and ω y ), we find that the estimated ω x (Fig. 5(h)) does not change drastically over time. Furthermore, we show in Appendix Bthat simulations with two different choices of τ x do not have a large impact on the variation of the order parameter so long as the ratios of ω's and τ 's remain the same. Hence, we are able to represent the asymptotic value of the order parameter on a 2D plot with τ x = 0.02 and ω x = 0.7 fixed to be the mean of estimation over all time frames. Each simulation is implemented in 70 time points, where the order parameter fully equilibrates after 20 time points (see the evolution of the order parameter for select simulation parameter in Appendix B. Due to this, the order parameter in Fig. 8 was computed by averaging the last 50 time points. We find that the asymptotic order parameter increases when either τ x /τ y or ω x /ω y increases (Fig. 8). The order parameters computed from the experiment at different time points are overlaid on top of the diagram, where the warmer colors represent a larger magnitude of the order parameter, consistent with the color scheme of the phase diagram. The ratio ω x /ω y fluctuates around a fixed value ∼ 1.2, potentially due to fixed properties of the external substrate guiding cells to migrate preferentially along x. That is to say, the substrate has a fixed modulus ratio along the x and y directions, as found in [20]. This ratio is reflected in the variance of the velocity along the corresponding directions, thereby restricting their ratio. As the system evolves over time, the caging effect becomes more prominent along y, which increases the ratio τ x /τ y . This analysis provides further evidence to support our conclusion that substrate anisotropy and cell crowding both influence alignment. However, at low cell densities, the alignment effect is likely drowned out by noise.
Our findings also open up exciting design opportunities, where τ x /τ y can potentially be controlled by varying the initial cell density or the composition of the substrate [19], so the cell alignment dynamics can be tuned as a result. Overall, good agreements between the experiment and simulation are obtained.

V. DISCUSSIONS
Cells can be induced to form aligned structures through a complex cascade of physiochemical signaling processes in both in vitro and in vivo settings. Experiments have shown that cells can migrate preferentially following the molecular orientation of the substrate. This preferential migration results from two interrelated effects: cell-cell and cell-substrate interactions. The former is due to the crowding due to the presence of other cells, and the latter is from cell polarization due to the substrate. Cell polarization occurs due to anisotropic interactions between the cell and the substrate or the extracellular matrix, including friction, damping, binding, and directional proliferation. Cell-cell and cell-substrate interactions both drive the progression of velocity and order parameters. Neither the effect from one-body external potential acting on the random fluctuation term nor cell-cell interaction can be overlooked in our system. The combination of these two effects distinguishes our system and models from many prior studies [30][31][32]85]. We model these two contributions separately and quantitatively to reproduce the temporally dependent order parameters and velocity from the nonequilibrium process of cell alignment.
Here, we extract the cellular trajectory of a few thousand interacting cells from video microscopy, and use this trajectory information to identify key factors required for reproducing the progression of velocities and alignment order parameters. We found that when fibroblasts are cultured on an anisotropic substrate, global cellular alignment typically develops at high cell density but not low cell density [20]. This distinction sets our work apart from previous studies [17,18], as alignment forces acting on individual cells alone were insufficient in in-ducing alignment. Thus, our system cannot be simply regarded as particle alignment driven by a one-body potential from an external field, and the cell-cell interaction has to be included in modeling. To model complex cellcell interactions and anisotropic random fluctuations, we develop data-driven methods to select features and expand the baseline Vicsek model, by utilizing these selected features. Our findings highlight the importance of anisotropic, non-Gaussian distributions of velocities in reproducing the progression of cell movements guided by the molecularly aligned substrates. Of equal importance is the construction of neighboring interaction, by eliminating contributions from cells with opposite velocities and applying weights different from unity. These features are reminiscent of recent works that take into account the directionality of cell-cell instantaneous velocity in modeling interaction, particularly in the context of cell-extracellular matrix interaction [86], and models that include contact-induced inhibition [13,87,88].
The shape of the probability density of the temporally dependent velocity has a significant impact on accurately reproducing orientational order parameters and velocities through simulation. We observe that the experimental velocity distribution vastly deviates from a Gaussian distribution (Fig. 4), as the velocity distributions of a large number of cells that have undergone minimal movement, along with a non-negligible group of cells that have moved a substantially long distance over a specific time interval h. Non-Gaussian distribution of displacement has previously been reported in systems with heterogeneous dynamics, as noted in [48,89]. The high concentration of immobile cells centered at zero can be attributed to cells that are confined by their neighbors. On the other hand, the presence of heavy tails of the velocity distribution can be attributed to the coordinated movements of many particles, also known as "jumps", as discussed in [56]. Empirical studies have also shown that a significant jump is often followed by other jumps [47]. Consequently, these probability densities exhibit slower rates of decay in the tail of the distribution compared to a Gaussian distribution, whereas they are more appropriately captured by the Laplace distribution [49]. Although various experiments have shown anomalous distributions of displacement probability [47-49, 51, 89], it has been less recognized that the distribution of random fluctuations plays a critical role in reproducing the progression of orientational order parameters observed in the experiments, partly because of a lack of means of efficient parameter estimation. Our work has filled in this gap by developing a maximum likelihood estimator for model parameters and a fast simulation scheme for non-Gaussian dynamics with the estimated parameters. Our findings demonstrate that the temporal progression of orientational order parameters can be more accurately captured when modeling cell-substrate interactions using a Laplace fluctuation of noise, as opposed to a Gaussian fluctuation, despite both having the same number of parameters.
Our analysis has also revealed that the distinct be-haviors of cells on isotropic or nematic substrates can be largely attributed to the presence or absence of anisotropic velocity. Our model captures this phenomenon by demonstrating that a difference in variability of the velocity along x and y directions leads to alignment (Figs. 6 and 12), while cells on isotropic substrates do not develop any order (Figs. 13). We further show that asymptotic alignment can be controlled by the ratio of the weights and variance parameters (Fig. 8), and the order tends to approach zero as the velocity becomes more isotropic. Our procedure has several distinctive features that showcase data-driven discovery. A popular approach is to include a dictionary basis [36,90] and use regression to estimate the coefficients of this basis. However, including irrelevant features is like adding unnecessary noise into the models, which can drastically reduce the efficiency of estimation. Rather than including all potential covariates, we begin by selecting the features to be included through exploratory data analysis and statistical tests. Such an approach produces more interpretable models and improves the efficiency of estimation, since only features with significant effects will be included. Second, we provide a method for the feature selection and testing for cell studies. Conventionally comparing all models with a combination of p features requires 2 p simulations, which could be very inefficient. Here the statistical test of a feature do not depend on assumptions of other features, and thus only p tests are needed for feature selection. This new hybrid approach can be utilized in other systems to aid physicists in automating the tasks of visualization and feature selection, and extending baseline physics models by incorporating the selected features. Lastly, by using the selected features we define a data generative model, instead of minimizing a loss function as conventionally adopted in other machine learning approaches. The data generative model provides a probabilistic mechanism, where the uncertainty of the estimation can be rigorously estimated and propagated throughout the analysis. We demonstrate that the selected features are key ingredients to capture the progression of alignment dynamics.
A few additional directions are of interest for future work. First, it would be helpful to quantify the effect and establish the physical mechanism of the selected features. These include, for instance, quantifying diminished influence of opposite-direction neighbors by morphological analysis of their deformations. It is also worthwhile exploring fluctuation distributions with multiplicative noise variances (i.e. the noise variance of the fluctuation depending on individual cellular velocities), as these models were found to approximate the Laplace distribution in previous studies of cellular experiments [30,31]. In addition, the role of imaging noise and tracking error will be quantified.
Second, position-dependent interaction kernels were estimated using observations of larger objects such as golden shiner [41], surf scoters [42] and simulated par-ticles [38,40]. It would be interesting to include a second interaction term with cellular positions as the inputs. Various extensions of existing computational tools would be needed to achieve this goal, such as accelerating the estimation of kernel function with Laplace fluctuation distributions, and enabling temporally dependent interaction kernel functions. Furthermore, though the current model takes into account the influence of neighboring cells on velocity changes [30,31], it is important to note that cells also exhibit persistent motion as they travel, resulting in relatively smooth, one-dimensional spatial patterns [66]. Characterizing this feature may require inferring second order statistics from observations, whereas a discretized model with first-order approximation may not be sufficient [34]. On the other hand, the effect of increasing cell count and the corresponding slow-down as the system approaches jamming must be more carefully modeled in order to effectively forecast the variance of the velocity fluctuation.
Ultimately, the velocity changes are governed by forces [91][92][93]. Direct, in-situ force measurements will help further elucidate mechanisms and refine our models. Efforts must be made to reconcile the anisotropic nature of the substrate that induces cell alignment with the isotropic assumption often made in mechanical analysis to deduce force, as in the case of traction force microscopy [94]. Other important considerations that can also be integrated are cell shapes and the restructuring of their subcellular structures such as actin filaments, which provide mechanical supports. When tracking cell migration, we largely rely on fitting an ellipse to the nuclei. In order to monitor the dynamics of aforementioned features, algorithms without particle tracking, such as Fourier-based differential dynamic microscopy [95][96][97], can be applied to extract system properties such as mean squared displacements, for inspecting the mechanical properties of the system. Finally, in the case of other types of cells and microenvironments, similar procedures can be followed to discern the presence of different cell movement features in the model construction.
where the residual of the ith particle at time frame t is defined as e i, For any time frame t, the maximum likelihood estimator of w x (t) is equivalent to the least absolute deviation (LAD) regression below: Although there is no closed-form solution for the estimator in the LAD regression, fast algorithms, such as the Barrodale and Roberts (BR) algorithm [81], that can transform the LAD regression into a linear programming problem, are available. The BR algorithm is available in standard software platforms. For instance, the package "L1pack" in the Comprehensive R Archive Network (CRAN) implements the BR algorithm to solve a LAD problem. After obtaining the estimatorŵ x (t), we substitute it into the likelihood function and maximize the profile likelihood to obtain the maximum likelihood estimator of τ x (t), which is given in Eq. (7).

APPENDIX B MAXIMUM LIKELIHOOD ESTIMATOR WITH GENERALIZED GAUSSIAN FLUCTUATION
Next, we discuss the maximum likelihood estimator for models with generalized Gaussian fluctuation. To do so, we denote three vectors of parameters w x , α x , and β x , each having T dimensions. The logarithm of the likelihood function follows We first differentiate the likelihood function with respect to α x (t) and set it to zero. For any given β x (t) and w x (t), the likelihood is maximized when . (A2) After substituting theα x (t) from Eq. (A2) into the log-likelihood function, we obtain the logarithm of the profile likelihood of (w x , β x ): Denoting the logarithm of the profile likelihood by ℓ(w x , β x ) = log(p(v x | w x , β x ,α x )) and differentiating Eq. (A3) with respect to w x (t) and α x (t), we then find where Ψ(z) = Γ ′ (z)/Γ(z) stands for the ratio between the derivative of a Gamma function and a Gamma function for any z, and sgn(e i,x (t)) denotes the sign of e i, . Thereafter, we iteratively maximize the likelihood function with respect to w x (t) and β x (t) using the profile likelihood in Eq. (A3) and closedform derivative in Eqs. (A4) and (A5) by the low-storage quasi-Newton optimization method (L-BFGS) [98]. Note that, for each t, we only need to iteratively maximize the log profile likelihood with respect to two parameters, making the computational procedure both fast and robust. The maximum likelihood estimators of parameters α x (t), α y (t), β x (t), β y (t), w x (t) and w y (t) assuming the fluctuation follows the generalized Gaussian distribution, are shown in Fig. 9. The estimated β x (t) and β y (t) are both close to 1, which means that the fluctuation is closer to the Laplace distribution. In addition, β y (t) is typically smaller than β x (t), as there is more confinement in the y direction because of the substrate-imposed directionality. Furthermore, both w x (t) and w y (t) are smaller than 1 for any t, consistent with the findings when assuming the fluctuation follows the Laplace distribution.
To demonstrate the impact of parameter selection on the order parameter, we investigate the 2D parameter space defined by the ratios ω x /ω y and τ x /τ y . Figure 8 illustrates one possible path of order development, using a selected set of parameters τ x =0.01 and ω x =0.7. Next, we show that when τ x =0.01 and τ x =0.04, encompassing the experimentally observed values range, the order parameter exhibits consistent behavior along the temporal course of the experiment for different τ x values (= 0.01, 0.02, 0.04) [ Fig. 10(a) and Fig. 10(b)]. To determine the order parameter, we average the last 50 steps in the simulation, as it reaches a plateau after the first 20 steps (Fig. 11).

APPENDIX C DETAILS ON SIMULATION
We first briefly introduce the setup of the simulation: Cells are modeled as particles, and their initial velocities and positions are imported from the data. For simplicity, we assume that the cell number remains constant. We start by partitioning the space into grids, and the spacing between gridlines is no smaller than the cell-cell interac-  10. (a-b) Change of order parameters due to the change of simulation parameters with τx = 0.01 in (a) and τx = 0.04 in (b), which cover the range of the observed τ 's. wx = 0.7 are assumed for both cases. Red and blue X's denote the beginning and the end of the trajectory. tion radius r. In our simulation, grids are populated with cells based on cells' positions s i (t) = (s i,x (t), s i,y (t)) T (Fig. 1(e)). The grid-based approach [57] improves computational efficiency for searching neighbors in simulation since as such, identifying a cell's neighbors only requires searching the nine grids surrounding it. Using this method, computing the likelihood function for parameter estimation and simulation only requires O( T t=1 n t ) operations, which is much faster than the conventional approach requiring O( T t=1 n 2 t ) operations, where n t is the number of cells at time frame t. We note that this coarse-graining strategy to search for neighbors only improves computational efficiency, but does not change the simulation results.

Burn-in Averaging
The velocities v i,x (t) and v i,y (t) are updated according to Eqs. the same direction velocity, and weights (ω x and ω y ) of the interaction fixed to be 1 or estimated from the data, which are due to cell-cell interactions. All parameters are estimated based on the maximum likelihood estimators introduced in Section III.

APPENDIX D ADDITIONAL RESULTS ON EXPERIMENTAL DATA
The four new features are used to construct a minimum physics model that can quantitatively capture the progression of the orientational order parameter and velocity distribution, for various experiments with different initial cell densities and liquid crystal elastomer substrates. The results of several additional experiments are also analyzed to validate the generality of our algorithm ( Fig.  12-13). Similar to the main text, we evaluate models by velocity orientational order parameter and the average absolute deviation of the velocity at each time point.
The experimental setup in Fig. 12 has the same substrate material as the one presented in Fig. 6. Except that at the start of imaging, cell density is higher. During the course of the experiment, the cell density ρ grows from ρ = 500 to 650 mm −2 . The simulation model with selected features can accurately capture the progression of orientational order parameter and velocity magnitude, shown in upper and lower panels in Fig. 12. Models without all selected features cannot reproduce some of the properties. In particular, if Gaussian fluctuation is used with the other three features (blue curve in panel (a)), the model substantially underestimates the orientational order parameter, while the model with Laplace or GGD fluctuations, shown by blue curves in panel (b) and (c), respectively, reproduces the progression of these parameters reasonably well. Furthermore, all approaches incorporating estimated weights and considering neighbor ensembles of only cells traveling in the same direction capture the average absolute deviation of velocities σ x andσ y reasonably well.
On the other hand, results presented in Fig. 13 are derived from cell movements on an isotropic substrate, during this time, cell density changes from ρ = 420 to 470 mm −2 . We have observed that, on the isotropic substrate, we reproduce order parameters that are around zero with any type of fluctuations ( Fig. 13(a-c)), signaling the lack of order, and the velocity variance is also isotropic ( Fig. 13(d-f)). It can be noted that when the cell moves on an isotropic substrate, choices of neighbor, fluctuation distributions, and weights become less important in fitting the order parameter ( Fig. 13(a-c)). Nonetheless, the absolute deviationσ x andσ y change over time due to proliferation. Approaches that utilize estimated weights and consider ensembles of neighboring cells traveling in the same direction effectively capture and accommodate these changes.

APPENDIX E ADDITIONAL RESULTS ON SIMULATION USING DIFFERENT NEIGHBOR RADII
Here, we explore the effect of radii in reproducing the velocity order parameter for the entire monolayer. Given the cell width ≈ 20 µm and length ≈ 100 µm in projection, we explore length scales comparable to these dimensions. Below, we plot the order parameters reproduced by using different pair-wise distances: 25, 50, and 75 µm, in simulation (Fig. 14). All simulations are reproduced by including only the same direction neighbors and applying weights less than 1, as discussed before. Our results show that the variation in r plays a much smaller role than changing the model for fitting the velocity fluctuations.