Dynamic Density Functional Theory of Multicomponent Cellular Membranes

We present a continuum model trained on molecular dynamics (MD) simulations for cellular membranes composed of an arbitrary number of lipid types. The model is constructed within the formalism of dynamic density functional theory and can be extended to include features such as the presence of proteins and membrane deformations. This framework represents a paradigm shift by enabling simulations that can access length scales on the order of microns and time scales on the order of seconds, all while maintaining near fidelity to the underlying MD models. These length and time scales are significant for accessing biological processes associated with signaling pathways within cells. Membrane interactions with RAS, a protein implicated in roughly 30% of human cancers, are considered as an application. Simulation results are presented and verified with MD simulations, and implications of this new capability are discussed.


I. INTRODUCTION
Cellular membranes are vital to a myriad of biological functions, as they not only control the structure and permeability of a cell but are also involved in processes such as cell adhesion, ion conductivity and cell signaling through the presence of membrane proteins. As such, the modeling of cellular membranes and their interaction with membrane proteins is of great interest to predict membrane dynamics at scales that are inaccessible by experiments. Unfortunately, as we discuss below, there is a significant gap between the resolution of most experiments and the scales that atomically-resolved simulations can achieve. Worse, many biological phenomena of interest, such as the migration and signaling of membrane proteins, exist in this elusive regime.
The primary simulation methods for modeling cellular membranes are molecular dynamics (MD) and Monte Carlo (MC) [1,2], in which either individual atoms or biologicallyrelevant clusters of atoms are evolved according to an effective force field [3][4][5][6][7]. As with any MD simulation, there will be limitations on the accessible time and length scales, which prevents MD from probing many biological problems. At current capabilities, meaningful MD simulations of membranes can achieve length scales of order 10 4 nm 2 and time scales of order 10 μs [8].
Continuum models are sometimes used to achieve larger scales but usually at the sacrifice of both accuracy and generality. Phase field models are a common continuum approach [9][10][11][12], which have been extended using stochastic Saffman- Delbrück hydrodynamics [13][14][15]. The primary challenge of these continuum models is that they rely on phenomenological constructions of the free energy, which must be fit to either experimental or MD simulation data. Furthermore, the lipid bilayer of biological membranes typically consists of hundreds of different lipid types [16][17][18], and a detailed description should resolve all common species. Unfortunately, phase field models are prone to an explosion of fitting parameters with an increasing number of species, and consequently, most phase field models of cellular membranes only include 2-3 species. To mitigate these challenges, multiscale methods are paramount in accessing these larger scales with continuum models while still maintaining some level of atomic-scale accuracy.
Classical density functional theory (DFT) is a formalism that not only connects macroscopic quantities with microscopic degrees of freedom without the need for empirical parameters, but it also naturally generalizes to multicomponent systems with an arbitrary number of species. Furthermore, dynamic DFT (DDFT), which is a nonequilibrium extension of classical DFT [19,20], thus presents an alternate approach to modeling cellular membranes that are fundamentally both multicomponent and dynamic. Like many other continuum models used for describing cellular membranes, DDFT evolves densities in accordance with an appropriate free energy functional where f (n) is the free energy density, n = {n i (r, t )} is a set of densities, and V is the domain of interest. DDFT is a growing field in modern statistical physics [21], with applications in both the physical sciences, such as polymers [22], colloids [23], granular media [24] and oxidation [25], as well as the life sciences, such as protein absorption [26], tumor growth [27], and epidemiology [28]. There have also been recent advances to extend the field to hydrodynamic DFT for applications like plasmas in which inertial and viscoelastic effects are relevant [29][30][31]. The primary goal of this paper is to provide the theoretical framework for modeling cellular membranes using DDFT, which has already been shown to be successful in practice [32][33][34]. The document is organized as follows. The mathematical model is described in full in Sec. II, with simulation results of this model being presented in Sec. III. Finally, conclusions and a discussion of these results are discussed in Sec. IV.

II. MODEL FORMULATION
In this section, we derive the governing equations of motion, which primarily relies on the construction of an appropriate free energy functional of the system, as detailed in Sec. II A. From this free energy, the corresponding dynamics can be inferred for the membrane lipids as well as any macromolecules embedded in the membrane such as proteins, as detailed in Sec. II B.

A. Free Energy Functional
The total free energy of the membrane will thus depend on lipid-lipid, lipid-protein, and protein-protein interactions and be decomposed in terms of these interactions as f = f + f p + f pp , respectively. To determine an approximate form for the lipid-lipid contribution, we can consider a membrane being composed of N species of lipid headgroups, where species are additionally distinguished if they are on the inner or outer leaflet of the membrane, and each species density n i = n i (x, y, t ). Unlike phase field models, which approximate the free energy density through gradient expansions of the density (or a relevant order parameter), DDFT models capture these interactions through a correlation expansion, which formally connect to the microscopic statistics of the system. Within the Ramakrishnan-Yussouff (RY) approximation [35], which truncates this expansion to second-order and assumes spatial isotropy, we can express the free energy density as Here, n i (r) = n i (r) −n i is the density fluctuation about its meann i , k B is the Boltzmann constant, T is the temperature and is the de Broglie wavelength, which will cancel out upon taking the gradients within the evolution equations. Finally, c i j (r) is the direct correlation function (DCF) between lipid species i and j, which can be calculated using atomically-resolved simulations such as MD. For example, two-dimensional radial distribution functions (RDFs) g i j (r) can be calculated between the lipid headgroups, and the corresponding DCFs can be constructed by inverting the Ornstein-Zernike (OZ) relations [36] given by where the total correlation function is simply defined as h i j (r) ≡ g i j (r) − 1, and the operation ( * ) denotes a convolution. Details of our g i j (r) calculation are discussed below. Due to the low collisionality of proteins on the membrane, a continuum description is inappropriate. Instead, each protein can be represented as one or more "beads" with position r k (t ). Given P beads, the lipid-protein energy density can be expressed as where u ik (r) is the potential of mean force (PMF) between the kth protein and the ith lipid species. Note that some of these PMFs are potentially zero in cases where the protein interacts with only one leaflet. As with the lipid-lipid interactions, the lipid-protein PMF can be calculated consistently with the RY approximation as Note that Eq. (6) is identical to the hypernetted chain (HNC) closure relation [37]. While the lipid-protein RDFs can be calculated directly from atomically-resolved MD or MC simulations, the lipid-protein DCFs must be constructed using Eq. (3) along with the additional OZ equations that include the contributions from the lipid-protein interactions as well: Similar to the lipid-protein interactions, the protein-protein energy density can be expressed as where the continuous density fields have been replaced with a distribution of point beads expressed here using Dirac delta functions δ(r), and each sum is now only over the bead indices k. Note that despite f pp being a free energy density, we have omitted contributions from the configurational entropy, as they are expected to be negligible when compared to the lipidprotein interactions. Understanding the interactions between proteins has led to a vast field of study, and even atomistic simulations have proven to be too computationally expensive to fully parametrize them. As such, a complete treatment of these interactions is beyond the scope of this work, and we simply rely on a phenomenological approach by assuming that the interactions are composed of an attractive and a repulsive component. An example of such an interaction is given by the Mie (or Kihara) potential, which is a generalization of the Lennard-Jones potential for nonspherical molecules [38,39]. This potential can be further generalized by shifting the radial dependence as r → |r − r 0 |, which introduces an excluded volume to account for the finite radius (r 0 ) of a given bead.
Equations (2), (5), and (8) thus form the leading order components to the free energy density of the lipid-membrane system in our model. An example schematic of the mathematical domain and relevant variables for a lipid bilayer with several embedded proteins is shown in Fig. 1.

B. Evolution Equations
Once the free energy functional is constructed, the evolution of the lipid density fields and protein beads can be determined. Each lipid density field, which is an inherently conserved quantity in the absence of chemical reactions, can be described using a continuity equation where v i is the associated velocity field. Rather than evolve the velocity fields as well, which will begin a hierarchy of evolution equations for each velocity moment, we can instead close this hierarchy by exploiting the disparate timescales associated with the lipid dynamics. Due to the extremely small Reynolds number at this scale (Re ∼ 10 −6 ), the velocity fields will rapidly relax when any force is applied, and thus inertial velocity fields can be neglected when compared to the much larger viscous forces at this scale. We can thus separate each velocity into dissipating and fluctuating components as where the first term will be approximated in the Stokes limit as v d i ≈ βD i F d i , and the second term will be approximated with a stochastic model v f i ≈ ξ i . Here, ξ i (r, t ) is a vector Wiener process with the statistical properties ξ i (t ) = 0 and ξ i, j (t )ξ i,k (t ) = (2k B T /m i )δ jk δ(t − t ) for each vector component, with δ i j being the Kronecker delta and m i being the lipid mass. Lastly, F d i is the lateral drag force on the lipids, β = 1/(k B T ) is the thermodynamic beta, and D i is the diffusivity of lipid type i, which can be calculated from MD. The drag force can be expressed in terms of the chemical potential (and thus the free energy) as where the δ denotes a variational derivative. Combining these expressions for the velocity field components into Eq. (9) thus gives the DDFT equation It should be noted that the final term, which encapsulates the fluctuations, will maintain conservation of the lipids by construction. These fluctuation effects are more commonly included as an additive term, which then requires a modification to the correlation properties to maintain conservation of the density [40]. Evaluating the variational derivative of the free energy with respect to n i yields Finally, the coordinates associated with the protein beads will evolve in accordance with the energetic contributions f p and f pp . These equations of motion can be generalized to include the missing degrees of freedom associated with fluctuations in the cytoplasm through a set of Langevin equations: where θ k , like ξ i , is a vector Wiener process with zero mean and the delta-correlated variance θ k,i (t )θ k, j (t ) = 2k B T γ i δ i j δ(t − t ). The relaxation rate is obtained from the fluctuation-dissipation theorem as γ k = k B T /m k D k , where D k is the diffusivity of the protein, which can be calculated from MD through the appropriate Green-Kubo relation [41,42].

C. Membrane Deformations
It should be noted that an obvious extension of the model is to allow for membrane deformations, which can be done through adding the so-called Helfrich Hamiltonian to the free energy [43]. For a single component system, the associated Helfrich energy density is given by where κ is the bulk modulus, z = h(x, y, t ) is the mean deformation field of the membrane, and R −1 is the spontaneous curvature of the membrane that represents the equilibrium curvature measured from cylindrical micelles (typical spontaneous curvatures can range up to |R −1 | ∼ 20 nm −1 ) [9,10]. Note that ∇ 2 h is twice the linearized mean curvature of the membrane and is thus the appropriate measure of deviation from the spontaneous curvature. Unfortunately, it is currently TABLE I. Definitions of lipid types used in the DDFT simulations along with their corresponding concentrations and self-diffusion coefficients (D i ) within the inner and outer leaflets. In the Martini CG representation, 4 to 1 heavy atoms per CG bead, each lipid tail represent more than one physical lipid tail length [e.g., a P CG tail maps to both a stearoyl (C18) and a palmitoyl (C16)].
unknown how to generalize Eq. (15) to multicomponent systems [44]. Two popular suggestions are given by the local summation over curvatures and the global summation over curvatures In each representation, the lipid concentrations are given by c i = n i / j n j , and R i is the spontaneous curvature of a given lipid species. The situation is further complicated when accounting for a composition-dependent bulk modulus as well as the inner and outer leaflets of the lipid bilayer, which necessitates a leaflet dependence of each spontaneous curvature and the potential for multiple deformation fields. For these reasons and the fact that the underlying MD simulations in this study only exhibit relative membrane deformations of a few percent, we will ignore curvature affects and assume the cell membrane to be a two-dimensional surface. Additionally, supported bilayers, commonly used for membrane experiments, provide a relevant example of membranes that can reach length scales of many microns while remaining essentially flat. For systems in which deformations are appreciable, further research is required for the necessary multi-component generalization.

III. SIMULATION RESULTS
We turn our general formalism to the particular application of modeling the lipid interaction and aggregation of RAS proteins on the inner leaflet of a plasma membrane (PM) mimic. In its mutated state, RAS has been implicated in ∼30% of human cancers (95% of pancreatic cancers, 45% colorectal cancers, etc.) [45,46], and a better understanding of RAS-lipid interactions and aggregate formation in oncogenic signaling pathways is sorely needed for this often-labeled "undruggable target" [47].
To model this system of interest, we take KRAS4b proteins (a frequently mutated RAS isoform in human cancers [48,49]) inserted on the inner leaflet of a PM mimic. In particular, this lipid bilayer consists of an asymmetric seven-lipid mixture tuned to represent average PM bulk lipid properties, based on the eight-component mixture defined in [50] without the inner leaflet PIP2 lipid. The outer leaflet contains POPC, PAPC, POPE, DIPE, DPSM, and CHOL with more of the saturated/ordered lipids and cholesterol, whereas the inner leaflet also contains PAPS (a negatively charged lipid), less cholesterol, and more unsaturated lipids [50]. A table of the lipid definitions, their concentrations and their diffusivities on the inner and outer leaflets are given in Table I.
The primary inputs to the DDFT model are then the 91 lipid-lipid RDFs and seven inner leaflet lipid-protein RDFs obtained from MD simulations, which are in turn converted to DCFs and PMFs using Eqs. (3), (6), and (7), and the 13 selfdiffusion coefficients for the lipids on each leaflet. To obtain these inputs, a Martini MD simulation [6] of a 30 × 30 nm 2 bilayer with one protein was run for 32 μs using similar MD parameters, setup, and input parameter extraction as in [33]. The left panel of Fig. 2 shows a subset of these DCFs and PMFs for the model.
As a test of model validity, we seek to compare the RDFs obtained from MD with RDFs predicted from the DDFT simulations. To compute lipid-protein RDFs using the DDFT model, a protein was fixed in space, while time-averaged densities of the lipids were calculated as a function of distance from the protein. To introduce a similar point source for the computation of the lipid-lipid RDFs, an external density with a narrow Gaussian profile corresponding to the area of one  Fig. 2) compared to MD (black) used to parameterize the model. Though the largest disagreement between RDFs is seen at small r, neighbor counts within a radius of r = 1.5 nm show that relative errors for this measure are at most ∼2%.
lipid molecule was added to the system. This source was then assigned the appropriate DCF as time-averaged densities were once again calculated. Examples of true (MD) and predicted (DDFT) RDFs are shown in Figs. 3 and 4, where strong agreement can be seen. Absolute errors between lipid-protein RDFs were observed to be small, and while larger disagreements for lipid-lipid RDFs were observed at small r, neighbor counts in this region revealed relative errors that were at most ∼2%. It should also be noted that these larger errors appear to be due to both the softened interactions in the DDFT model Displacements of the x and y components, which are sampled 2 μs apart, are the red and blue curves, respectively. The distribution associated with D k = 23.4 μm 2 /s, which is the input diffusivity to the Langevin model, is the dashed curve. A numerical fit to the simulation data is shown in purple, which corresponds to a diffusivity of D k = 19.6 μm 2 /s. As expected, we see that the simulated displacement is reduced due to lipid-protein interactions.
as well as the poor sampling of MD at that small-r scale. Similar agreement was found in the remaining cases but were to extensive to present in this work.
Furthermore, we measure the diffusivity of a single protein on the inner leaflet. The input diffusivity used in the Langevin model for the protein is set to D k = 23.4 μm 2 /s; however, the interactions with the lipids will alter this value. Probability distributions of the protein displacement are shown in Fig. 5, where the x and y displacements are sampled 2 μs apart. Additionally, the distribution associated with D k = 23.4 μm 2 /s is compared with a numerical fit to the data. As expected, we see that the actual displacement corresponds to a slightly slower diffusion coefficient (19.6 μm 2 /s), due to lipid-protein interactions. It should be noted that the diffusion rates obtained at the MD scale are highly susceptible to finite size effects from the simulation [51][52][53]. Additionally, diffusion will be artificially high (roughly a factor of 4) due to the smoother potentials associated with the course-grained Martini force field [6]. Rather than make these corrections to the diffusion rates, we were careful in enforcing all MD simulations to be roughly the same size, so that all of our DDFT parameters (e.g., rates and populations) would be self-consistent. For a more thorough discussion of how the RAS diffusion rates were calculated, see Sec. 2.7 of the Supplemental Material in Ref. [33].
To demonstrate this capability, we simulate a region of the cellular membrane with 20 RAS proteins, shown in Fig. 6. In particular, the RAS proteins can be seen to form in clusters despite the simplistic protein-protein interactions, which are taken to be purely repulsive in these simulations. This clustering is therefore due to an implicit attraction induced by the lipid-protein and lipid-lipid interactions and is believed to be a lipid-dependent mechanism for the colocalization of RAS, which amplifies downstream signaling, making it a target for modulating excessively active oncogenic RAS [33]. We can further investigate the possibility of protein aggregation through perturbations of the lipid-protein interactions. In particular, we choose to perturb the PAPS-RAS interaction by augmenting the potential with a term of the form −αTre −1.5r , where r is the lipid-protein distance, and α is a positive, dimensionless parameter quantifying the magnitude of the perturbation. This added contribution effectively models proteins with different levels of affinity for PAPS. Examples of the perturbed and unperturbed PAPS-RAS interactions are shown in the top panel of Fig. 7. The simulations with 20 RAS proteins in the 30 × 30 nm 2 domain were repeated with the modified PAPS-RAS interaction, and as can be seen in Fig. 8, the system exhibits a transition towards protein aggregation as the perturbation parameter α is increased. The bottom panel of Fig. 7 shows the RAS-RAS neighbor density distribution for different α and confirms the visually observed aggregation of Fig. 8.
The transition to protein aggregation can be quantified by calculating the number of protein neighbors a given protein has within a 4 nm radius (roughly the first neighbor shell) and taking the ensemble average over the simulation. Alternatively, we can calculate the height of first peak in the neighboring protein density distribution. As shown in Fig. 9, both the protein neighbor count (left panel) and the neighbor density peak (right panel) experience a sharp increase as the perturbation parameter α is varied, with an inflection point observed at roughly α∼10, which corresponds to island formation and aggregation of proteins as seen in Fig. 8. While this transition appears to be continuous, it should be noted that the system is finite and only includes 20 proteins. We expect that increasing the domain would lead to more discontinuous changes in these neighbor counts as α is varied, consistent with a true phase transition.

IV. CONCLUSIONS
In summary, we have shown that DDFT provides an ideal framework for modeling multicomponent, cellular membranes as a continuum by incorporating the underlying physics at the molecular scale in a rigorous and self-consistent way. Once DCFs and PMFs are constructed from MD (or MC) simulations, the DDFT model can be used to access time and length scales many orders of magnitude beyond those of MD simulations with a relatively low sacrifice to accuracy.
An application of this model is considered in which the aggregation of RAS proteins is explored, a potential mechanism for the signaling pathway of cancer growth. Due to the computational efficiency of the DDFT model, the parameter space associated with the empirical protein interactions could be explored with relative ease, and a phase transition associated with the strength of the PAPS-RAS interaction was observed.
In future studies, more accurate protein interactions will be included in addition to effects of membrane deformations. Furthermore, the model will be extended to account for systems in which the RDFs evolve in an appreciable way, necessitating a feedback loop between the continuum and molecular scales.