An edge-based formulation of elastic network models

We present an edge-based framework for the study of geometric elastic network models to model mechanical interactions in physical systems. We use a formulation in the edge space, instead of the usual node-centric approach, to characterise edge fluctuations of geometric networks defined in d- dimensional space and define the edge mechanical embeddedness, an edge mechanical susceptibility measuring the force felt on each edge given a force applied on the whole system. We further show that this formulation can be directly related to the infinitesimal rigidity of the network, which additionally permits three- and four-centre forces to be included in the network description. We exemplify the approach in protein systems, at both the residue and atomistic levels of description.


I. INTRODUCTION
Elastic network models (ENMs) are ubiquitous in physics and have been applied to describe properties of a wide variety of structures including glasses [1? ], biological tissue [2], supercooled liquids [3] and, recently, the design of allosteric materials [4]. A particularly useful application of ENMs, sparked by the seminal work of Tirion [5], has been in the study of protein structures, as the use of molecular dynamics (MD) simulations on biologically relevant timescales remains challenging. The principal assumption of ENMs is that we may approximate the bottom of the potential energy well of a structure by a quadratic function, by taking the Taylor series of the potential energy with respect to node displacements about the minimum r 0 . In elastic models, the forces f are thus linear in the displacements r, i.e., f = H(r 0 ) (r − r 0 ), where H(r 0 ) is the Hessian matrix obtained by differentiating twice the potential function. Typically, the analysis of (infinitesimal) motions involves diagonalisation of H to determine the normal modes of the protein. Whilst real potential energy surfaces of proteins are complex, highly nonlinear and containing many minima [6], elastic models have been surprisingly effective for the analysis of slow equilibrium motions of proteins [7,8]. Another common use of ENMs is for the calculation of node fluctuations, which have shown good agreement with crystallographic B-factors [9,10].
The focus of ENMs has thus typically been on the node variables. Here, we present an edge based formulation of ENMs, which instead puts the emphasis on the interactions between the nodes, which in a mechanical framework corresponds to extensions (or compressions) of the 'springs' associated with the edges. More formally, the edge changes are the dual of the node motions [11]. An edge-centric approach has proved highly effective in previous studies of different networks [12][13][14], and indeed the formulation presented need not be restricted to proteins * m.hodges14@imperial.ac.uk † m.barahona@imperial.ac.uk and is general for networks embedded in d dimensional space. There has been extended discussion in the literature over the use of networks with scalar node variables to model 2-and 3-dimensional mechanical structures [15]. By instead working in edge-space, we avoid this issue altogether since the scalar edge variables, which represent changes in edge length, appear naturally in the theory regardless of the dimensionality of the geometric structure. Historically, the Born-Huang model [16] has often formed the basis for the study of lattice structures but its weakness in handling disordered materials like glasses has been highlighted in the context of rigidity percolation [17] and more recently by Zaccone and Scossa-Romano [18], who extended the Born model to include non-affine responses to external stresses. In many systems such as proteins, function is often driven by changes in structure, but crucially it is the relative change in node positions that is of interest. We thus show how to obtain edge fluctuations in elastic network models and compute the edge mechanical embeddedness as a useful property of the system. Finally, we show how this formulation naturally connects with the rigidity properties of the network, viewed as a set of edge constraints. We showcase the approach with specific protein examples.

A. Edge-based Formulation of Geometric Elastic Network Models
Consider a network of N nodes, associated with points in d-dimensional space r i,0 ∈ R d , and with E interactions between nodes (due to, e.g., physico-chemical potentials). Let us denote the (small) node displacements as the ddimensional vectors Each edge has an associated scalar variable e α ∈ R, α = 1, . . . , E, which measures its extension, i.e., its change in length. The node and edge variables are related directly FIG. 1. The extension of the spring can be written in terms of the displacements of the nodes [11] through the N d × E geometric incidence matrix, B: where u T = (u T 1 , . . . , u T N ) is the N d-dimensional vector compiling the node displacements, and e is the Edimensional vector of edge extensions.
To obtain the form of the geometric incidence matrix, note that each column of B is associated with an edge. Assuming small node displacements, it can be easily shown ( Fig. 1 and Appendix A) that the extension of the edge α = (ij) between nodes i and j induced by the node displacements u (to order O(|u| 2 )) is given by where r ij = r ij,0 /|r ij,0 | =: r α is the d-dimensional unit vector along the direction of edge α = (ij). Each row of B T is a vector B T α that follows from (2)-(3): to form the geometric incidence matrix: B = B 1 · · · B E . Note that the matrix B is akin to the standard N × E incidence matrix B in graph theory [13] but it includes full directional information through the d-dimensional edge unit vectors. Invoking a mechanical description, we can use Hooke's Law and Newton's Third Law to obtain the usual linear relationship between input forces on the nodes f nodes and the induced node displacements u: where f nodes is the N d × 1 vector compiling the external forces on the nodes and K is the N d×N d stiffness matrix with G = diag(g α ) denoting the E ×E diagonal matrix of spring constants. The stiffness matrix is thus the Hessian of the system-indeed this is the only form the Hessian can take [15]. Using our formulation, we can study the input-output properties of the system in terms of edge variables, i.e., the edge extensions e out induced by external forces f in applied to the edges. Let us consider external forces applied along the edges, which we compile in an E×1 vector f in . These edge forces result in edge compressions and stretches that induce forces on the nodes given by We wish to disregard any components of the induced forces linked to rigid motions of the elastic network, since such motions do not produce edge extensions (e out = 0). This can be achieved naturally by considering the pseudoinverse of the stiffness matrix. The induced non-rigid displacements are given by where K + is the Moore-Penrose pseudo-inverse of K, and the edge extensions induced by the applied edge forces are given by: For the input force f in , the output vector e out records the induced change in length of all the edges in the network. The meaning of the E × E matrix T is clear: given a unit force (input) applied along edge α, the induced (output) extension at edge β is the corresponding entry of T: As a consequence, the induced extension at the input edge i is given by the diagonal element T αα , which, depending on the location of the spring within the network, might not necessarily be the same as if the spring was isolated. This is the mechanical analogue of the effective resistance in electrical networks [13,19], also known as the resistance distance [20], yet, in our case, it is both the connectivity and the geometry of the network in ddimensional space that determines edge responses. We exploit this concept in the following section through the definition of the edge mechanical embeddedness.

B. Edge Fluctuations and Mechanical Embeddedness
One application of the model is to identify residueresidue interactions within a protein that exhibit the highest edge fluctuations. To see this, consider the Langevin equation of a 3-dimensional elastic network (d = 3) representing protein residues undergoing dynamical motion in a heat bath modelling the aqueous environment: where M is a diagonal mass matrix, Γ is the diagonal damping matrix and η(t) is a vector of i.i.d. Gaussian noises. The damping terms arise from interactions of the protein with water and itself, and are typically large. Hence We consider the overdamped limit, where we may neglect inertial terms. Although larger damping is sometimes set for residues located deeper inside the structure [? ], for simplicity we set all damping values to be equal and we renormalise time to obtain: which has the general solution where the residue position r(t) is now a random variable.
We are again interested in the random fluctuations of the edge extensions (2). Utilising our geometric incidence matrix, one can show that the covariance matrix of the edge fluctuations is given by: In a number of papers, authors construct networks from residue-residue interactions and identify significant residues using measures of centrality, such as edge or node betweenness [21][22][23]. However, it is not clear what the physical significance of such measures is. In contrast, the mean edge fluctuations are related to a graph theoretical measure called edge embeddedness, first introduced in [13] in the context of random walks on networks and resistor networks. We may then define the equivalent mechanical embeddedness for edge α in a geometric elastic network in d-dimensions as: The mechanical embeddedness has a clearer physical meaning: the second term is the fraction of the input force applied to edge α that edge α actually feels. If an edge feels all the force applied to it, it is not well "embedded' within the network and has a low value of ε (i.e., it is not strongly coupled to the rest of the network and does not dissipate its fluctuations into the network). Conversely, edges that are more "embedded" within the network structure feel a lower force, dissipate fluctuations into the rest of the network, and have a larger ε score nearer to 1.

C. Connection to Infinitesimal Rigidity
There is also a straightforward relationship between the geometric incidence matrix B and the classicrigidity matrix R of the structure, given in Eq. D5 [24]. In Appendix D, we show that: where D is the E × E diagonal matrix containing the interaction distances. The rigidity matrix can be used to determine the rigid parts of the elastic network structure (i.e., those that allow no internal motion) and the flexible parts via the concept of infinitesimal rigidity. (The distinction between rigidity and infinitesimal rigidity is discussed in depth in Ref. [24], but here we consider only generic structures and so the two terms are equivalent.) The rigidity matrix of a three dimensional structure possesses six zero eigenvalues corresponding to three translations and three rotations, but may have additional zero eigenvalues associated with motions of the structure that lead to no change in the potential energy of springs in the network. In Appendix E, we summarise an infinitesimal rigidity algorithm developed in Ref. [25] that uses the set of eigenvectors associated with such additional zero eigenvalues (if they exist) to cluster the structure into rigid clusters. At the cost of longer running times, this infinitesimal rigidity algorithm allows greater flexibility in the choice of constraints than the popular rigid cluster decomposition based on FIRST [26,27], which is computationally efficient, yet it imposes the presence of angle constraints in the network structure. . In some systems, such as chemical bonds within molecules, we do in fact have additional constraints on the angles between edges. Indeed, inclusion of threecentre interactions in the simulation of polymer glasses has been shown to be important for the interpretation of Raman scattering spectra [28]. We therefore consider three center interactions (and indeed four center interactions, corresponding to dihedral angles). Given three nodes i, j, k with edges (ij) and (jk), we compute the change in length of edge (ik) with the constraints that the other two edges are held constant: |r ij | 2 = |r ij,0 | 2 and |r jk | 2 = |r jk,0 | 2 . Expanding these equations and substituting into the expression for the extension of edge (ik), which is opposite to node j, we obtain (see Appendix B): Note that the three-centre extension (15) relative to the 'angle' at node j is not the same as if a two-center Hooke spring was placed between nodes i and k. From expressions of the form (15), we can construct the three-centre stiffness matrix K angle . Using a similar procedure, we also find the expression for the linear changes of a four-center interaction, by keeping the three two-center and two three-center interactions constant. Such changes lead to the four-centre stiffness matrix K dihedral . See Appendix C.
The total stiffness matrix is then the sum of the stiffness matrices: K total = K bond +K angle +K dihedral , where K bond is the two-centre matrix given in (6). The extensions e out induced by input forces f in follow the same form as in (7): Below we study the effect of the different components of the stiffness matrix in the input-output properties of the system.

III. APPLICATIONS
A. A mechanical model of protein-ligand binding at the atomistic level Allostery is a biological process whereby the binding of a ligand to a protein leads to a functional change at a distant site (often the active site) of the protein [29,30]. A common explanation for allostery is that ligand binding leads to a propagation of strain across the protein structure, potentially along specific residue pathways, causing a structural change at the active site.
Here, we study this process using an atomistic elastic network model of a protein bound to an allosteric ligand. First, we measure the elastic response elicited across the protein by the application of unit forces to all weak interactions between the ligand and the protein allosteric site with negative forces corresponding to compressions of the source interactions and positive forces corresponding to extensions (although the overall sign is arbitrary). Furthermore, we apply infinitesimal rigidity analysis (Appendix E) to obtain the rigid clusters within the protein to elucidate the propagation of the strain. Since strain cannot propagate through floppy regions, we expect both the allosteric site and active sites to be within the same rigid cluster if strain is to pass from one site to the other efficiently.
Atomistic graphs are constructed from PDB files containing full 3D atomic data of protein structures, and the software FIRST [27] to determine the presence of the various bond types (covalent, hydrogen and hydrophobic interactions). We assign values to the spring constants of the edges with the correct order of magnitude, as per the Amber15fb force field [31] (Table I). We do not use exact values for each interaction since it is difficult to assign spring constants to hydrogen bonds and hydrophobic interactions from force fields used in molecular dynamics. In such fields, hydrophobic interactions emerge from the presence of implicit or explicit water that favours interactions to polar regions of the protein, whereas hydrogen bonds are derived from electrostatic contributions.  We obtain the output extension for all edges in the protein in response to inputs at source edges given by the interactions with the ligand, and exemplify the results through the allosteric protein PDK1 (PDB code: 3ORZ). In Fig. 2 we show the top 2% of bonds by absolute length change (i.e., we do not discriminate between bond stretching or compression) in three scenarios: (i) where only the two-centre bond interactions are used to construct the elastic network, as is traditionally the case with elastic networks of proteins; (ii) where angle constraints between pairs of covalent bonds are included; and (ii) where dihedral angle constraints from double bonds are also modelled. Given that the highest scoring interaction in the bonds-only network (i) (the hydrophobic interaction between Lys120 and Asn122) exhibits an extension of 0.766, we choose to represent the top 2%, which exhibit changes above 0.01, as a reasonable cutoff. The most stretched edges are all located in the area connecting the allosteric and active sites. Furthermore, the infinitesimal rigidity analysis (Fig. 3) shows that, even for the 2-centre stifness matrix, the allosteric site and the region around the active site (Val96, Lys111, Tyr161, Ala162, Thr222, Asp223) all appear in a rigid cluster, with Leu88 the only active site residue that has no atoms within the rigid cluster. When 3-centre and 4-centre constraints are included, the protein becomes strongly, due to the qualitative nature of the infinitesimal rigidity condition. It appears then plausible that propagation of strain may be emitted from binding at the allosteric site towards the active site, particularly through the rigid cluster formed by the 2-centre interactions that contains a smaller subset of the atoms in the protein.
The results of the elastic response show that the e out decrease exponentially with distance (correlation coefficient = -0.603), even when angles and dihedrals are included (see Fig. 8 in the Appendix). Such a response is similar to random networks [4] and is not suggestive of a structure exclusively optimised for directed pertur- bations. Indeed, the two largest extensions are found in the Lys120-Asn122 (0.766) and Val124-Pro125 (0.437) hydrophobic interactions, which are within 5 Å of the allosteric source site, whereas the active site is around 17 Å away. The highest scoring interactions involving active site residues are two Lys111-Phe157 hydrophobic interactions, which have extensions of 0.0122, and rank 130 th and 131 st . Of the top 2% interactions (149 out of a total of 7391 edges) by output extensions, all but 4 are hydrophobic interactions. This is unsurprising given they have the weakest spring constants, but appears to lead to those weak interactions near the allosteric site effectively acting like a sponge, absorbing the shock of input forces and preventing long-range transfer of displacement. If we change the force constant of the hydrophobic interactions to 10 (the same as the hydrogen bonds), the range of the propagation increases. However, it is difficult to rationally assign such spring constant values to the hydrophobic interactions, and there does not appear to be strong evidence that the allosteric effect exhibited by PDK1 is mediated by traversal of strain energy. We have performed the elastic response analysis on a further two proteins (h-Ras, ATCase) with similar results. Hence our examples indicate that topological notions alone (such as rigidity) do not fully determine if a mechanical explanation for allostery is plausible, as the particular values of the edge spring constants are also crucial.

B. Fluctuations of residue-residue interactions
We applied our edge-based geometric formulation to a residue-residue interaction network (RRIN), i.e., an elastic network model of a protein at the residue level. We constructed several RRINs for ADK (4AKE [33]) using different distance cutoffs (7Å, 10Å, 12Å, 15Å) and ob-tained the average displacement for each of the edge interactions (12). To decide on the appropriate cutoff, we computed Spearman's correlation coefficient (ρ) of the resulting extensions across the RRINs created with different cut-offs and found greater robustness for larger cutoff values: ρ = 0.216 between the 7Å and 10Å RRINs; increasing to ρ = 0.679, between the 10Å and 12Å RRINs; and increasing further to ρ = 0.801 between the RRINs created with 12Å and 15Å cutoffs. (Below 7Å, zero energy modes appear in the network as revealed by singular value decomposition of the rigidity matrix R.) We thus use a cutoff of 12Å here and a single force constant (arbitrarily set to 1), in line with other reports in the literature [34].   [35] from dynamic data, i.e., by comparing residue displacements across an NMR ensemble of structures to calculate local strain. Note that here, however, just a single structure is used and strain is predicted a priori, emphasising the fact that the intrinsic topology of the protein determines where strain is distributed to assist function. The highest scoring interaction with 0.701 is Gly56-Lys57; Lys57 is one of the residues that shifts more than 10Å during the open-to-closed transition [36] whilst Gly56 has been shown to display particularly high fluctuations in coarsegrained MD simulations at the residue level [37]. Since we follow standard convention and use the same force constant for all residue-residue interactions (arbitrarily set to 1), by Eq. (13) (with G equal to the identity matrix) we can see that the interactions with the greatest average extension are also those with the lowest mechanical embeddedness, demonstrating a conceptual link to the network theory interpretation of protein structure.

IV. CONCLUSIONS
In conclusion, we have presented here a framework for the study of geometric elastic network models in d-dimensional space through an alternative formulation in the edge space. The edge space is often more natural than the dual node space, as it allows the direct description of interactions and constraints, with their associated energies (or costs). By conveniently working with the internal coordinates of the network, there is no need to consider rigid motions or to arbitrarily "pin" nodes. In many systems, such as proteins, it is changes in the interactions that are of interest rather than the nodes themselves, and optimization problems involving edge variables are more naturally dealt with using an edge-based framework.

V. ACKNOWLEDGEMENTS
This work was funded by an EPSRC Centre for Doctoral Training PhD Studentship from the Institute of Chemical Biology (Imperial College London) awarded to MH. MB acknowledges funding from the EPSRC project EP/N014529/1 supporting the EPSRC Centre for Mathematics of Precision Healthcare. We thank Joao Costa for his help and insight with the infinitesimal rigidity algorithm developed in his PhD thesis.
Consider the extension of an edge written in terms of the displacements of its associated nodes: We first expand |r ij,0 |: then expand |r ij |: Using Eq.(A4), we substitute terms: We can complete the square: and make a linear approximation by dropping nonlinear terms and square rooting both sides: By referring to Eq.(A3) we now have an expression for the change in spring length in terms of the node displacements: which can be expressed in vector form as: Here we have shown the expression for an isolated spring, For a spring in a network, the elements of the vectors relating to nodes not involved with the spring would be zero so that our geometric incidence matrix B has rows of the form (4). We wish to find an expression for the change in length of the distance i − k e ik = |r ik | − |r ik,0 | (B1) in terms of the node displacements u i , u j and u k under the assumption that the edge distances are fixed (Fig. 5).
The initial distance, using the cosine rule, is: and likewise the distance after the perturbation of the three nodes is: We apply the constraints |r ij | = |r ij,0 | and |r jk,0 | = |r jk,0 | as we are interested only in the change in angle, not in any two centre changes. We can then substitute terms from (B2) into (B3): We can rewrite this expression as: so that we can complete the square: By ignoring nonlinear terms and square rooting both sides can then write our extension from Eq.(B1) in terms of the initial and final angles: However, we wish to derive the extension in terms of node displacements (in Cartesian coordinates) and so substitute using the definition of the dot product: where we have again used the fact that the two bonds have not changed length. We now define the displacements of the nodes in terms of the bond vectors before and after perturbation: We now expand out Eq.(B5): Substituting terms from (B6), we drop the nonlinear terms in the second line of (B7) to get: Now we again drop nonlinear terms that result from the expansion of each of the dot products to give: which can be written more compactly as: or, in vector form, as: As for the two-centre case, each row of B angle , the geometric incidence matrix for the three-center interactions, has the form (B9) but with zeros in the entries relating to nodes not involved in the corresponding interaction.
The stiffness matrix for the three centre interactions can then be constructed similarly: with G angle the diagonal matrix containing the spring constants for the three-centre ('angle') interactions.

Appendix D: Definition of the rigidity matrix
Here we summarise standard calculations covered in Refs. [24,25,38].
In this Appendix D and the subsequent Appendix E, we adopt the usual notation in rigidity theory, where the positions of the points are represented as p i , and the system is defined by a set M of M distance constraints. Note that in the rest of the paper, p i are denoted as r i , and the M constraints correspond to the E edges of the graph encapsulating the interactions.
The rigidity problem for N points p i ∈ R d , i = 1, . . . , N with a set M of distance constraints c α , α = 1, . . . , M is given explicitly by the following set of M equations: where p i is the 3 × 1 position vector of node i. Solving this set of M nonlinear equations directly is usually infeasible for anything but very small systems. An alternative approach is Infinitesimal Rigidity, which considers infinitesimal violations of the equilibrium conditions of (D1). Taking the derivative of both sides of (D1) with respect to time t for all constrained pairs, we get: with u i = dp i /dt. We then expand out the brackets: and rewrite in vector form: The M × N d matrix R is called the rigidity matrix and each row represents a single constraint. For example, a three node system with each pair of nodes joined by an edge would have the rigidity matrix [25]: The infinitesimal rigidity properties follow from examining the nullspace of R. Hence these properties are an intrinsic property of the structure, and are independent of the environment or the friction terms. From (D5), it follows immediately that the geometric incidence matrix is a scaled version of R.
Appendix E: Algorithm for rigid cluster decomposition using infinitesimal rigidity The following algorithm was introduced in [25] and is summarised here for completeness. We use it to obtain the rigid clusters shown in Fig. 3. Rigid cluster decomposition algorithm using infinitesimal rigidity [25]. For each trivial infinitesimal motion, such as the one in (a), the atoms are moved by a small distance along each 3N × 1 vector to a new position b). A rigid tetrahedron of atoms is selected in the new position then in c) this is moved back to its original position. Any atoms that also return to their original position at the same time (for all infinitesimal motions) are part of the same cluster. d) The process is repeated until all atoms are clustered into rigid regions or are assigned as floppy.
The steps of the algorithm are as follows (see Fig. 7):