Finite-size scaling analysis of protein droplet formation

The formation of biomolecular condensates inside cells often involve intrinsically disordered proteins (IDPs), and several of these IDPs are also capable of forming droplet-like dense assemblies on their own, through liquid-liquid phase separation. When modelings thermodynamic phase changes, it is well-known that finite-size scaling analysis can be a valuable tool. However, to our knowledge, this approach has not been applied before to the computationally challenging problem of modeling sequence-dependent biomolecular phase separation. Here, we implement finite-size scaling methods to investigate the phase behavior of two 10-bead sequences in a continuous hydrophobic/polar protein model. Combined with reversible explicit-chain Monte Carlo simulations of these sequences, finite-size scaling analysis turns out to be both feasible and rewarding, despite relying on theoretical results for asymptotically large systems. While both sequences form dense clusters at low temperature, this analysis shows that only one of them undergoes liquid-liquid phase separation. Furthermore, the transition temperature at which droplet formation sets in, is observed to converge slowly with system size, so that even for our largest systems the transition is shifted by about 8%. Using finite-size scaling analysis, this shift can be estimated and corrected for.


I. INTRODUCTION
Advances over the past decade have shown that, in addition to classical membrane-bound organelles, various membraneless liquid-like droplets of proteins and nucleic acids can be found within living cells [1,2]. The droplets form through a liquid-liquid phase separation (LLPS) process, also called coacervation, in which intrinsically disordered proteins (IDPs) often play a key role. Furthermore, it has been demonstrated in vitro that several of these IDPs are able to phase separate on their own [3][4][5], depending on the solution conditions.
Phase-separating IDPs can be rich in charged residues [3], but can also be dominated by polar and aromatic residues [5].
To rationalize the phase behavior of IDPs and its dependence on solution conditions, a variety of theoretical/computational methods have been employed. A widely used method is Flory-Huggins mean field theory [6,7], and its extension to polyelectrolytes by Voorn and Overbeek [8]. However, this approach is sensitive only to the overall composition of amino acids, but not to their ordering along the chains. One way to overcome this short-coming without resorting to explicit-chain simulation is offered by the random phase approximation [9], which has been applied to model the phase-separating ability of IDPs with different charge patterns [10].
By turning to molecular simulation with explicit chains, key approximations made in the above approaches can be removed. In addition, structural properties become readily accessible. Therefore, despite being computationally costly, recent years have seen a growing number of explicit-chain simulation studies of biomolecular LLPS [11][12][13][14][15][16][17][18]. In particular, there have been simulations based on coarse-grained lattice or continuous representations to elucidate sequence determinants of phase-separating IDPs [11][12][13][14].
Another approach, recently applied for the first time to biomolecular LLPS [19,20], is to rewrite the original polymer problem as a statistical field theory problem that can be investigated by field-theory simulation. This approach has the potential to open for studies of system sizes that are inaccessible with explicit-chain simulation.
Yet, regardless of whether explicit-chain or field-theory methods are used, the simulated systems are finite and as such there is a need to understand how measured properties depend on system size. Fortunately, tools for this purpose are available, in the form of finite-size scaling theory for droplet formation by phase separation [21][22][23][24]. These tools have previously been applied to analyze droplet formation in simpler systems such as the lattice gas and the Lennard-Jones fluid [24][25][26], but we are not aware of any prior study of biomolecular LLPS using these ideas.
In this paper, we implement finite-size scaling methods to assess the phase behavior of two short model proteins, which provide an instructive testbed for the analysis methods. While several previous computational studies of IDP phase separation focused on the role of chargecharge interactions, we here consider a hydrophobic/polar (H/P) protein model. One of the sequences we study, called A, is alternating (HPHPHPHPHP), whereas the other, called B, has a block structure (HHHHHPPPPP). Using Monte Carlo (MC) methods, we perform canonical simulations of these sequences for a range of system sizes, with up to 640 chains.
Both sequences form dense multi-chain assemblies surrounded by a dilute background at low temperatures, while only small clusters are present at high temperatures. However, the sequences differ in phase behavior. We show that their phase behavior can be assessed in a systematic fashion by finite-size scaling analysis of the simulation data. This analysis demonstrates that one of the sequences, A, undergoes LLPS, whereas the other, B, does not.

A. Biophysical model
We study systems consisting of N copies of a polypeptide enclosed in a periodic cubic box with volume V . The polypeptide is represented as a string of n hydrophobic (H) or polar (P) beads. The length of the bond between two consecutive beads, b, is kept fixed, while the polar and azimuthal bond angles are both free to vary. In the absence of interactions, the bonds have a spherically uniform distribution.
The beads interact through a pairwise additive potential, E = i<j E ij , where the sum runs over all intra-and intermolecular pairs of beads in the system. All beads have a diameter of r ev = 0.75b. When two beads i and j are at a distance r ij < r ev from each other, the pair potential E ij becomes infinite. Additionally, each HH pair interacts through a soft attractive potential with interaction range r hp = 2b. If r ev < r ij < r hp , the interaction energy is set to −ǫ (with ǫ > 0). Thus, the pair potential can be summarized as where u ij = −ǫ when beads i and j are both hydrophobic, and u ij = 0 otherwise. Throughout this article, lengths and energies are given in units of b and ǫ, respectively.

B. MC simulations
We investigate the thermodynamics of droplet formation in this model by using MC methods to generate samples from the canonical (NV T ) ensemble. Of particular interest is the behavior at the onset of droplet formation. Therefore, given N and V , the temperature T is chosen close to the maximum of the heat capacity, by an iterative procedure.
Simulations at one or several additional temperatures are performed when needed to ensure an accurate description of the heat capacity throughout the transition region. The temperature-dependence of the heat capacity is computed by reweighting techniques [27], using data from all simulated temperatures.
The efficiency of MC simulations depends strongly on the choice of move set. We use a set of six elementary moves. Two of the moves update the internal structure of individual chains.
The first of these is a single-bead move, which turns a randomly selected non-end bead about the axis through its two nearest neighbors. The second one is a pivot-type rotation, where the rotation axis goes through a randomly selected non-end bead in a random direction.
Beads on one side of the selected one are turned as a rigid body about this axis.
The other four moves are rigid-body translations and rotations of either a single chain or a cluster of chains. In the cluster moves, the clusters are constructed probabilistically using a Swendsen-Wang-type algorithm [28,29]. The construction is recursive and begins by picking a random first cluster member, i. Then, each chain j that has an interaction energy E ij < 0 with chain i is added to the cluster with probability p ij = 1 − e βE ij , where β = 1/k B T is inverse temperature (k B is Boltzmann's constant). This step is iterated until the list of potential further cluster members is empty. Finally, the resulting cluster is subject to a trial rigid-body move. The form of the statistical weight p ij is such that no Metropolis accept/reject criterion is needed; any sterically allowed move is accepted. Multiple runs are used for the largest systems, to ensure statistical significance. Statistical uncertainties are computed using a jackknife procedure [30].
C. Finite-size scaling theory Droplet formation by phase separation in finite systems is a topic that has been extensively studied over the years [21][22][23][24]. This body of research provides a general framework for finite-size scaling analysis, which has been tested on systems such as the lattice gas and the Lennard-Jones fluid [24][25][26]. This section outlines some key results that will be used in Sec. III.
We consider a d-dimensional system of N particles in a volume V at temperatures T below an assumed critical temperature T c . A schematic phase diagram can be found in Fig. 1. Under grand canonical conditions, for a given T < T c and large system size, the system can be in one of two bulk phases with respectively low (ρ L ) and high (ρ H ) density, depending on the chemical potential. At some value of the chemical potential, a first-order transition occurs between these phases. Under canonical conditions, for T < T c and densities ρ such that ρ L (T ) < ρ < ρ H (T ), the system is in a mixed two-phase regime, bounded by the binodal curve, T b (ρ) (Fig. 1).
Consider now a finite but large system under canonical conditions, for a given T < T c and ρ just above ρ L (T ) ( Fig. 1). At some ρ (N ) L (T ) > ρ L (T ), the system transitions from a supersaturated dilute state to a mixed two-phase state. It has been shown that this mixed state consists of a single large droplet of the high-density phase in a low-density background, and that the linear dimension R of the droplet scales as R ∼ N 1/(d+1) with N [21][22][23]. This result can be rigorously proven for the two-dimensional lattice gas [31]. In brief, the size of the critical droplet can be viewed as the result of two competing mechanisms for handling a particle excess, δN. One is that the particle excess is absorbed as a density fluctuation in the low-density phase, the free-energy cost of which scales as (δN) 2 /N. The other mechanism is that a finite fraction of the particle excess forms a dense droplet, the free-energy cost of which scales as the surface area of the droplet, that is (δN) (d−1)/d . Assuming that droplet formation sets in when these two costs become comparable, one finds that the linear size of Schematic temperature-density phase diagram of a system that undergoes phase separation below an upper critical temperature, T c , into two phases with respectively low (ρ L ) and high (ρ H ) densities. In other systems, phase separation may occur above a lower critical temperature.
Below the so-called binodal curve, T b (ρ), the low and high density phases coexist. At the left branch of the curve, the system transitions between the low-density phase and a mixed two-phase regime. In finite systems, the transition temperature, T the critical droplet scales as R ∼ (δN) 1/d ∼ N 1/(d+1) [21][22][23].
Using this result, it follows that the finite-size shift of the transition density scales as . Correspondingly, with ρ rather than T fixed, the transition temperature has a finite-size shift, given by Note that this relation implies that the convergence of T toward its value for infinite system size, T b , is slow. For comparison, the finite-size shift of a regular temperature-driven first-order phase transition scales as N −1 [32].
In finite systems, the transition is not only shifted but also smeared. At fixed ρ, the smearing, or width, of the transition, w T , may be estimated as the temperature interval over which |β∆F | < ∼ 1 [23,26], where ∆F is the free-energy difference between the states with and without a droplet. Since ∆F vanishes at T ) to leading order. Here, ∆E is the energy gap, which, assuming that particle interactions are negligible in the low-density phase, should scale as the droplet volume, that is It then follows that the smearing of the transition scales as When analyzing the droplet transition, a useful property is the specific heat, C V /N, which exhibits a peak at the transition and can be computed from the energy fluctuations, , and smearing, w T , may be defined as, the position and width of the specific heat peak, respectively. With increasing N, the width of the peak, w T , decreases [Eq. (4)], whereas the height of the peak, C V,max /N, increases. With a two-state approximation, one has C V,max ≈ (∆E) 2 /4k B T 2 , where ∆E, as before, is the energy gap. Using this relation along with Eq. (3), one finds that A slightly different behavior, namely C V,max /N ∼ N d/(d+1) , has been suggested [26], based on the assumption that C V,max /N scales inversely proportional to w T . However, unlike at a regular temperature-driven first-order phase transition, in the case of droplet formation, the area under the specific heat peak vanishes in the large-N limit, since ∆E/N does so.
Hence, C V,max /N should grow slower than w −1 T ∼ N d/(d+1) with N, as it does in Eq. (5).

III. RESULTS
Using the model and MC methods described in Sec. II, we conduct thermodynamic sim-

A. Overall characterization
At high temperatures, the simulated systems are in a disordered state, with only small clusters present ( < ∼ 10 chains). As the temperature is reduced, markedly larger clusters, or droplets, appear. Their formation sets in abruptly, in a narrow temperature interval, where states both with and without droplets are observed. Figure 3 shows representative snapshots of configurations with droplets for both sequences, from simulations with 640 chains. For sequence A, a single large droplet can be seen, in a dilute background with only small clusters. For sequence B, more than one droplet is often present, and the droplets are smaller than those formed by sequence A. A single large droplet is what one expects to observe if droplet formation occurs through phase separation [21][22][23].
If phase separation occurs, the onset of droplet formation is, furthermore, expected to be associated with a divergence in the specific heat (Sec. II C). Consistent with this, for sequence A, specific heat data from simulations with 10-640 chains show a peak that steadily gets higher and narrower with increasing system size [ Fig. 4(a)]. The corresponding data for sequence B follow the same trend for small systems [ Fig. 4(b)]. However, for this sequence, at some system size (around 80 chains), the specific heat stops growing higher and becomes multimodal. This behavior reflects the fact that sequence B forms more than one droplet in the larger systems (Fig. 3), and implies that this sequence does not undergo LLPS.

B. Finite-size scaling analysis
The above results indicate that, unlike sequence B, sequence A may undergo LLPS. To determine whether or not this is the case, we next compare simulation data for several quantities with predictions from finite-size scaling theory (Sec. II C), focusing on sequence A.
At the onset of droplet formation, due to the coexistence of states with and without a droplet, the probability distribution of energy should be bimodal, as it is at a regular temperature-driven first-order phase transition. In the latter case, the energy gap between the two phases scales linearly with system size, corresponding to a non-zero specific latent   is still about 8% smaller than the fitted value of T b .
To summarize the above analysis, for all properties studied, we find that the simulation data for sequence A are consistent with the theoretical predictions, which provides strong evidence that this sequence indeed undergoes LLPS.

C. Droplet size and structure
The specific heat data discussed in Secs. III A and III B show that the sequences A and B, despite sharing the same length and composition, have different phase behaviors. To understand this difference, we next examine some basic structural properties of the droplets formed by these sequences. Throughout this section, we focus on data obtained using N = 640, ρ b = 0.025b −3 and a temperature near the onset of droplet formation.
One important characteristic is the mass of the droplets, or the number of chains that they contain. It was already noted that sequence A forms more massive droplets than sequence B (Fig. 3). To quantify this assertion, Fig. 7 shows cluster mass distributions for both sequences. From this figure, it can be seen that, in these systems, a typical sequence A droplet accommodates about 200 chains, whereas the corresponding number for sequence B is less than 50. Also worth noting is the statistical suppression of intermediate-mass clusters, which is particularly pronounced for sequence A. If phase separation occurs, one expects to observe a single dominant droplet [21][22][23], as is the case for sequence A.
Another basic characteristic is the density of the droplets. counting all beads, whether or not they belong to a chain in the cluster. The total density is split into H and P densities. The calculated density profiles for sequence A are essentially flat at both small and large r cm [ Fig. 8(a)], suggesting that these densities are representative for the interior of droplets and the dilute background, respectively. Using this property, we find that the density inside droplets is more than a factor 40 higher than that of the dilute background (where the total bead density is 0.019b −3 ). Note also that the droplets are homogeneous in composition; the H to P ratio is virtually independent of r cm .
The droplets formed by sequence B exhibit, by contrast, a micellar structure, with a high-density core composed almost exclusively of H beads and a corona of mainly P beads [ Fig. 8(b)]. The formation of a hydrophobic core is possible due to the block structure of this sequence. However, as the sequence is short and contains a stretch of P beads, this core can only accommodate a small number of chains, which explains the low mass of droplets formed by this sequence (Fig. 7). The mechanisms of micelle formation by block copolymers have been extensively studied by both theory and simulation [33][34][35].
While we have seen above that sequence A phase separates, it is still not immediately clear whether the dense phase is liquid-like. Therefore, we end with a brief assessment of the mobility of the chains in droplets formed by this sequence. The analysis uses configurations stored at a time interval of 10 3 MC cycles, which is much shorter than the average droplet lifetime of about 2 · 10 5 MC cycles. As before, a droplet is a cluster with more than 75 chains. We first consider the exchange of chains between droplets and their surroundings.
To this end, whenever two consecutive snapshots both contain droplets, the chain contents of the droplets are compared. Over this timescale (10 3 MC cycles), it turns out that, on average, 44% of the chains present in the original droplet are lost, indicating a fast exchange with the surroundings compared to the lifetime of a droplet.
To get a measure of whether also the internal structure of a droplet is dynamic, we monitor changes in chain-chain contacts within droplets. To this end, given a droplet-containing snapshot, we identify all pairs of chains in the droplet that are in contact (interaction energy <0), and where each chain also interacts with at least 15 other chains. The latter condition serves to focus the analysis on chain pairs buried in the interior of the droplet. Whenever a droplet is present also in the next snapshot (10 3 MC cycles later), we check the fate of the contacts identified in the first snapshot. On average, we find that 54% of the pairs remain in contact, whereas only about 11% are broken due to at least one of the chains leaving the droplet. This leaves 34% of the pairs separating due to internal rearrangements of the droplet, showing that the internal structure is far from rigid. Thus, the droplets are dynamic with respect to both exchange with the surroundings and their internal organization.

IV. DISCUSSION AND CONCLUSIONS
It is well-known that finite-size scaling theory provides a powerful tool for analyzing phase transitions in spin models as well as vapor-to-droplet transitions in simple liquids. In this manuscript, we have applied these ideas to investigate the sequence-dependent phase behavior of a simple explicit-chain model for protein droplet formation.
Of the two specific sequences studied, the block sequence B turned out not to undergo LLPS. It is worth noting that from data for small systems, one may be led to the opposite conclusion. In particular, the observed peak in the specific heat is for small systems higher for sequence B than it is for the alternating sequence A, which does phase separate. However, For sequence B, we observed micelle formation, rather than the formation of a droplet of a dense bulk phase. Micelle formation was found to set in at a kT of about 5. Note that the system need not remain micellar in character well below this temperature. In particular, it is conceivable that the global free-energy minimum of this system contains bilayer structures at low temperatures. However, a proper investigation of the low-temperature phase structure is computationally challenging and beyond the scope of the present article.
To determine whether or not sequence A phase separates, simulation data for several properties and a range of system sizes were compared with predictions from finite-size scaling theory. In this way, the phase behavior can, in principle, be investigated in a systematic fashion, but it must be remembered that the theoretical results are leading-order predictions for large systems and therefore not necessarily valid for the system sizes amenable to simulation. It turned out, however, that a scaling behavior consistent with the predicted asymptotic one could be observed for all properties studied. Hence, taken together, the results of this analysis leave little doubt that sequence A does indeed phase separate.
It is worth noting that sequences with alternating hydrophobic and polar residues tend to have a high β-sheet propensity [36,37]. The biophysical model used in our present calculations cannot describe β-sheet formation, due to the lack of hydrogen bonding. However, it has been shown that droplet formation through LLPS sometimes is followed by maturation into a solid-like state containing amyloid fibrils [38]. In this case, LLPS represents a first step toward β-sheet formation.
Among the specific scaling relations studied, the finite-size shift of the transition temperature deserves special attention. This shift scales inversely proportional to the linear size, rather than the volume, of the critical droplet, so that T toward its value for infinite system size, T b , makes finite-size scaling analysis an important ingredient when determining the phase diagram from simulation data. This conclusion is highlighted by the magnitude of the relative shift of the transition temperature for sequence A, which was found to still be ∼8% for the largest systems with 640 chains.
Simulation methods, based on explicit-chain or field-theory representations, offer some distinct advantages over mean-field methods in the study of sequence-dependent biomolecular phase separation. However, to exploit the full potential of the simulations, the system-size dependence of the generated data needs to be understood and accounted for. The results presented here demonstrate that a systematic analysis of the system-size dependence can be both feasible and rewarding.