Grand-canonical-like molecular-dynamics simulations by using an adaptive-resolution technique

In this work, we provide a detailed theoretical analysis, supported by numerical tests, of the reliability of the adaptive resolution simulation (AdResS) technique in sampling the Grand Canonical ensemble. We demonstrate that the correct density and radial distribution functions in the hybrid region, where molecules change resolution, are two necessary conditions for considering the atomistic and coarse-grained regions in AdResS equivalent to subsystems of a full atomistic system with an accuracy up to the second order with respect to the probability distribution of the system. Moreover, we show that the work done by the thermostat and a thermodynamic force in the transition region is formally equivalent to balance the chemical potential difference between the different resolutions. From these results follows the main conclusion that the atomistic region exchanges molecules with the coarse-grained region in a Grand Canonical fashion with an accuracy up to (at least) second order. Numerical tests, for the relevant case of liquid water at ambient conditions, are carried out to strengthen the conclusions of the theoretical analysis. Finally, in order to show the computational convenience of AdResS as a Grand Canonical set up, we compare our method to the Insertion Particle Method (IMP) in its most efficient computational implementation. This fruitful combination of theoretical principles and numerical evidence candidates the adaptive resolution technique as a natural, general and efficient protocol for Grand Canonical Molecular Dynamics for the case of large systems.


I. INTRODUCTION
The Adaptive Resolution Simulation (AdResS) scheme 1,2 is a method that in a concurrent fashion simulates a molecular system treated with different resolutions in different regions of space. The word "resolution" here refers to the amount of details included in a molecular model: the higher resolution corresponds to a more precise model, but at higher computational costs. The advantage is obvious, it keeps track of both: the local fine-grained processes in the high-resolution regions and the large scale behavior, requiring a less demanding computational effort compared to the system that, as a whole, would be described by the high-resolution model. The key feature of AdResS is that it allows for an on-the-fly change of resolution when a molecule travels from a high-resolution region to a low-resolution region and vice versa. Moreover recent research 3,4 has numerically demonstrated that different regions (with different resolutions) reach a thermodynamic equilibrium as if the whole system were equilibrated under the high-resolution description. Therefore one is tempted to state that a region with a certain resolution exchanges molecules with the rest of the system in a Grand Canonical fashion. This work studies the minimal necessary conditions, such that AdResS performs effective Grand Canonical simulations. Moreover we shed light on the level accuracy of the sampling in a Grand Canonical fashion; it will be shown that indeed the sampling can be considered of the Grand Canonical type if we accept an accuracy on the probability distribution of the system up to the second order. Though the second order is numerically satisfying, higher level of accuracy can be systematically reached by requiring to match, in the transition region, higher orders of the probability distribution.
From the numerical point of view, for systems which are large enough, a (relatively) small subsystem of a full atomistic region is a natural Grand Canonical ensemble, for this reason the approach we use here is that of showing the equivalence between the atomistic region of AdResS and the same region in a full atomistic simulation. Extensive numerical tests, for the relevant case of liquid water at ambient conditions, are presented in order to show that the hypothesis done within the theoretical analysis are justified from the numerical point of view. Beyond our expectations, we have found that in the atomistic region of AdResS not only the accuracy can be tuned up to the second order, but, without any additional correction, even the three body correlation function of molecular centers of mass (COM) turns out to be the same as that of a full atomistic simulation. Finally, a further numerical test was carried out: the coarse-grained molecules are substituted by a liquid of spheres interacting via the Weeks-Chandler-Andersen (WCA) potential 5 which has no reference to the atomistic resolution. We show that also in this case in the atomistic region, due to the work of the filter of the transition region, an accuracy up to the third order in the probability distribution is achieved. The results of this work allow us to make a step further in the possibility of employing the AdResS scheme as a natural, general numerical protocol for truly Grand Canonical Molecular Dynamics simulations independently from the nature of the particles acting as a reservoir in the coarse-grained region. The word "natural" refers to the fact that strictly speaking in statistical mechanics the Grand Canonical ensemble is defined in operative terms as a subsystem of an infinitely large system. Of course in simulation systems are never infinitely large, however they can be large enough to numerically satisfy this condition. In this sense, in our approach molecules are exchanged between the atomistic domain and the coarse-grained reservoir in a straightforward (i.e."natural") dynamical way. The advantage compared to previous Grand Canonical schemes for molecular dynamics is that the proper exchange of particles requires neither the knowledge or the calculation of the chemical potential, nor additional expensive insertion or removing of particles [6][7][8][9][10][11] ; we actually compare the computational costs of our method to those of the IPM and show that for dense liquid systems AdResS is more efficient. The paper is organized as follows: first we briefly describe the essential principles on which AdResS is based, then we discuss the approximations under which one can write the probability distribution of the system in terms of Grand Canonical partition function. Next we show that the work of the thermostat plus that of the thermodynamic force balances the difference in chemical potential; this idea allows us to derive the necessary conditions of simulation to reach the accuracy in terms of orders of the probability distribution of the system. Finally the numerical results and the efficiency of the method are discussed.

II. A BRIEF DESCRIPTION OF THE ADRESS SCHEME
In this work, the higher resolution refers to the atomistic description (AT) of a molecule, while the lower resolution refers to its corresponding coarse-grained (CG) model (for a pictorial representation of a typical AdResS set up, see Fig.1 for the case of liquid water). We assume that the dynamics of the system is subject to the Langevin equation, which is the thermostat usually used in the AdResS simulations. We denote the phase space variable x = (r 1 , · · · , r N , v 1 , · · · , v N ); r i , v i are the center-of-mass degrees of freedoms (DOFs) of the molecules. For simplicity we do not explicitly write the atomistic DOFs, but one should keep in mind that the interaction between two molecules with atomistic resolution are given by the sum of all pair-wise atomic interactions. Also for the sake of simplicity, we do not explicitly distinguish between the COM and the atomistic coordinates, using a generalized formalism where the coordinates are indicated simply as r i , however the interpretation of such a notation is made clear by the specific context. If the system is conservative, then one obtains the equilibrium density distribution p(x ) ∝ e −βH(x ) , where β = 1/k B T is the inverse temperature. In this paper, we assume that the kinetic part of the Hamiltonian is decoupled from the configurational part, then it is usually more convenient to consider the configurational probability distribution, namely p(r 1 , · · · , r N ) = p(r 1 , · · · , r N , v 1 , · · · , v N ) dv 1 , · · · , dv N . Throughout the paper, when we mention the i-th order of a multi-body configurational probability distribution, e.g. p(r 1 , · · · , r N ), we mean its i-th marginal distribution, which is defined by p (i) (r 1 , · · · , r i ) = p(r 1 , · · · , r i , r i+1 , · · · , r N ) dr i+1 · · · dr N . It is obvious that whenever we have the i-th order accuracy of a probability distribution, any lower order is automatically accurate. In the AdResS scheme, different resolutions in the system are described by a weighting function w(r ). Usually the higher resolutions is denoted by w = 1, while the lower resolution is denoted by w = 0. Between the higher and lower resolutions, a hybrid region allows a molecule to have both (interpolated) resolutions. The weighting function changes smoothly from 0 to 1; one possible form of such a function is: Where χ(r ) is the distance between the molecule at r and the boundary of the higher resolution region, χ(r ) < 0 means that the molecule is in the high resolution region, d ∆ is the thickness of the hybrid region, r c is the cut-off radius of the atomistic interactions. The intermolecular force is modeled by the following interpolation formula 4 : Where w i = w(r i ) and w j = w(r j ). F AT ij , F CG ij are the intermolecular interaction of the atomistic and coarse-grained resolutions, respectively. F rdf ij is a force that corrects the COM-COM radial distribution function (RDF) in the transition region ∆ 4 . A further one body force, named thermodynamic force, F th , is applied to ensure the correct thermodynamic equilibrium of the system (for specific details, see Ref. 3): which is defined by where p AT is the pressure of the atomistic resolution, p CG that of the coarse-grained resolution, ρ 0 is the equilibrium number density, corresponding to that of a full atomistic simulation. Such a thermodynamic force has been derived by empirical considerations regarding the equilibrium of open systems and is based on forcing the equality of the grand potential of the atomistic part with the rest of the system. This is then reduced to provide a balancing force to the difference of pressure at the wished uniform density of equilibrium. Regarding this point, one of the main results of this paper is that the thermodynamic force and the thermostat perform a work in the transition region which balances the difference in chemical potential between the different regions. As a consequence (equilibrium of the chemical potential) the interpretation of the AdResS simulation as a Grand Canonical sampling is enforced even further. A crucial point of this work is the following: the AdResS scheme is not Hamiltonian 12,13 ; however, under the hypothesis of fixing DOFs in the hybrid region for a statistical analysis, the atomistic and coarse-grained regions can be considered (in good approximation) Hamiltonian; this line of thought gives rise to the basic idea of the present study as it is presented in the next sections. Moreover Refs. 12,13 also show that a space dependent interpolation of potentials, which would lead to a direct Hamiltonian approach, is neither satisfactory from the point of view of the physical consistency, nor for the computational efficiency in improving in a systematic way the accuracy of the results in the AT region; instead this latter is one of the main results of this work.

III. THEORETICAL CONSIDERATIONS
A. The outline of the basic idea Here we denote the degrees of freedoms and number of particles in the atomistic region (AT), hybrid region (∆) and the coarse-grained region (CG) by (x 1 , N 1 ), (x 2 , N 2 ) and (x 3 , N 3 ), respectively. Therefore, the target is to prove that the atomistic region is subject to the Grand Canonical statistics: where the partition function Q 1 is defined by The marginal probability of finding N 1 molecules in the AT region is: where Q N1 is the partition function for a canonical ensemble with N 1 atomistic molecules: Let us consider p(x 1 , N 1 ) = p(x 1 |N 1 ) p(N 1 ). Then, from Eq. (5) and (7), the conditional probability p(x 1 |N 1 ) for a truly Grand Canonical ensemble turns out to be The key point of our argumentation is the following: we want to compare the distributions (in the various regions) of the AdResS simulation with the corresponding ones of a full atomistic reference system, which is ideally divided in subregions corresponding to the AT, ∆ and CG regions of the AdResS set up. The first step in our procedure is to fix the number of molecules in the atomistic region and consider the conditional probability p(x 1 |N 1 ). If the AdResS set up would sample the space in a Grand Canonical fashion, then this probability should be the same as the corresponding one of a full atomistic reference system, namely Eq. (9). Then, if the probability of finding N 1 molecules in the atomistic region is also the same as the full atomistic reference system, namely Eq. (7), we can safely state that the atomistic region of AdResS samples configurations in a Grand Canonical fashion. In fact as underlined before, it must be noticed that a subsystem of full atomistic system is a natural Grand Canonical ensemble in the thermodynamic limit (see, e.g., Ref. 14). The thermodynamic limit in our case is intended in such a way that at a fixed density, the size of both the AT and CG regions tend to infinite, and, at the same time, the ratio between the extension of AT and that of the CG regions goes to zero; as underlined before, in practical terms, in numerical simulation the system size does not go to infinity, however it can be large enough so that within a certain numerical accuracy the hypothesis of thermodynamic limit holds.
B. The conditional probability density p(x1|N1) To prove the above statement about p(x 1 |N 1 ), we divide it into two parts: where p(x 1 |N 1 ; x 2 , N 2 ) is the probability density obtained by fixing the coordinates and number of particles in the region ∆ and considering the distribution of the DOFs in the AT region (see further clarifications below), while p(x 2 , N 2 |N 1 ) is the distribution in the hybrid region, conditional on the number N 1 of particles in the atomistic region. We now comment on the underlying assumptions involved in the calculation of the conditional probability density: 1. If we use the weighting function (1), then the AT region is interacting with the ∆ region as if the ∆ region were part of the AT region because all hybrid molecules interacting with the AT molecules have unit weight.
2. We assume that p(x 1 |N 1 ; x 2 , N 2 ) can be approximated by where the Hamiltonian governing the physics of the molecules in the atomistic region is defined as: which is exactly the same as for the equivalent subregion of the full atomistic system of reference. (The approximation is essentially based on the assumption that AT and ∆ regions are only short-range correlated; this is discussed in the following point.) 3. We further assume that all interactions in the system have finite range with given cut-off radius; the electrostatic interaction is treated by the reaction field method. Specifically, we suppose that the AT region does not interact in a direct way with the CG region. For the case of liquid water at ambient conditions, it is reasonable to assume that the system is only short-range correlated, i.e. the AT region is only correlated with the ∆ region. We will show numerically that this is indeed the case.) We suppose that all interactions satisfy the superstability condition of Ref. 15. 4. The hypothesis of fixed molecules in the ∆ region, must not be intended in dynamical terms. This means that during the dynamical evolution of the system one should not suppose the molecules in the ∆ region to be frozen in their positions while molecules in other regions are moving. Instead this hypothesis must be intended in the sense of statistical analysis; within a statistical framework it is intended that for a given configuration of molecules in the ∆ region, the AT region explores a large (statistical) number of configurations. In such a case, the practical statistical analysis consists of taking a given configuration in the ∆ region, then consider the whole trajectory of a simulation and sort out all the configurations in the AT region corresponding to the given configuration in the ∆ region. If one repeats the process for a large number of fixed configurations of the ∆ region, then the data of the trajectory sorted according to this criterion would lead to a statistics for the AT region equivalent to that obtained by a sampling performed according to the given Hamiltonian of the AT region (Eq.12).
In accordance with the third assumption, we ignore correlations between the molecules of ∆ and CG regions, in fact we even consider the DOFs in the ∆ region fixed while sampling configurations in the AT and CG regions. The question then is, whether the probability p(x 2 , N 2 |N 1 ) in the ∆ region is the same as that of the equivalent region in a fully atomistic reference system. In general this will not be the case, so we state necessary conditions to enforce such an equivalence to lowest order in the configurational probability of the system; the necessary conditions are 4 : The first order marginal distribution is the particle (or molecular) density ρ ∆ (r ), while the second order marginal distribution is the radial distribution function (RDF), g ∆ (r). The necessary conditions of the correct distribution p(x 1 |N 1 ) are that these two distributions should be the same as those of the fully atomistic reference system. These are the minimal necessary conditions involving the basic DOF's, that is the molecular center of mass coordinates, however one may require a more specific accuracy by imposing that also any atom-atom g(r) is matched to the corresponding function of the atomistic reference system. In general, the conditions of Eq.13 and Eq.14 would assure that at least at the first and second order p(x 2 , N 2 |N 1 ) in the AdResS is the same as that of a full atomistic simulation. One needs to go at least at the second order, so that at the interface between the atomistic and transition region, the radial distribution functions of the atomistic part are not affected by artifact due to the deviation of the RDF's of the ∆ region from the correct atomistic reference. We have shown numerically how the RDF correction, can be numerically implemented 4 . Higher accuracy can be then systematically reached by enforcing the equivalence of higher orders of the distribution in the transition region. Next, we must show that p(N 1 ) is the same in AdResS and in the reference full atomistic simulation. The basic outline of arguments will be given in a following section while the (long) explicit calculations are reported in the Appendix A, however, before proceeding further we must first treat a key ingredient of this equivalence, that is the thermodynamic force. This force, in fact, together with the thermostat, is the crucial tool for assuring the thermodynamic and statistical equilibrium of the AdResS system when compared to the reference full atomistic simulation. In the next section, we will show how this force, derived on intuitive ground to enforce the equality of some basic thermodynamic relations, as shown in Eq.4, is, together with the action of the thermostat, the key ingredient for the balance of chemical potential between the various region at the desired density of equilibrium. The balance of chemical potential is implicitly a strong argument in favor of the idea of AdResS as Grand Canonical-like scheme (i.e. molecules are exchanged between the AT and CG region in conditions of equilibrium).
C. Thermodynamic force: from empirical intuition to strict formalization within the Grand Canonical framework Intuitive thermodynamic considerations led to formula (4). In this section we show that, despite the argument used in Eq.4 from Refs. 3,4 has a strong empirical component, one can formally justify this force as a tool to balance the chemical potential of the various resolutions and, as a consequence, a key aspect in interpreting AdResS as an effective Grand Canonical set up. Here, two assumptions are made: 1. N 2 ≪ N 1 ≪ N 3 : the second inequality corresponds to the thermodynamic limit of the Grand Canonical ensemble. The first one actually assumes that the ∆ region is infinitely thin so that it can be viewed as an infinitesimal membrane that allows free exchange of molecules from the AT region to the CG region and vice versa.

We have
The latter is reasonable if the system is short-range correlated.
The equilibrium in the AT and CG region is assured by the membrane ∆ via the action of thermodynamic force and of the thermostat. Conceptually, we assume there is an infinitely thin "filter" (see Fig. 2) located at the interface between the AT and CG regions. When a molecule enters into the AT region or leaves it, the filter does some work per molecule, ω 0 , on the atomistic system in order to assure the thermodynamic equilibrium. Therefore, we add an empirical term in the Hamiltonian of the system N 1 ω 0 , related to the work done by the filter to obtain configurations of the atomistic region with N 1 molecules. Having fixed the number of molecules, that is ideally considering case by case situations at fixed N 1 , N 3 , and by following the arguments of the last section, both the AT and the CG regions are subject to the Boltzmann distribution. Thus, the fixed-number partition function (N 1 in the AT region and N 3 in the CG region) of the system reads Considering the permutations of particles, the number of possibilities of N 1 molecules being in the atomistic region and N 3 molecules being in the coarse-grained region is N ! N1!N3! . Therefore, the partition function of the whole system reads with natural relations: For convenience, we denotẽ Further letN 1 be the value at whichq N1 reaches its unique maximum, namelyqN 1 = maxq N1 . (The existence of a maximum follows from the superstability of the interactions that guarantees that particles do not cluster as N → ∞; we further assume that it is unique.) Sinceq N1 is positive definite, a basic observation is that Due to the monotonicity of the logarithm, we have If ln N ≪ lnqN 1 , Laplace's method yields (see, e.g., Ref. 16 (The validity of the antecedent will be discussed later.) Hence or equivalently Where A denotes the Helmholtz free energy. At this point, the crucial question is if the condition ln N ≪ lnqN 1 holds, or equivalently Generally this is true: which is much larger than ln N for N ≫ 1; this validates condition (26). We will see later (from Eq. (A1)) that the maximumN 1 corresponds to the maximum value of probability p(N 1 ); this is also the average molecule number in the AT region that is of statistical importance under the thermodynamic limit. As the right hand side of Eq. (25) attains its maximum atN 1 when the other thermodynamic variables are kept fixed, differentiating it with respect to N 1 entails This means that the difference in the chemical potential between the AT and CG regions is taken care by the work of the filter. Now, if the filter ensures an equilibrium that is the same as the full atomistic reference system, then: ρ 0 is the number density at which the atomistic and coarse-grained resolutions should match, is a necessary condition for the work required to the filter in order to have the correct equilibrium of the AT and CG regions.
As anticipated in the previous sections, the work of the filter has two components, one corresponding to the work of the thermodynamic force and the other corresponding to the work of the thermostat, i.e. ω 0 = ω th + ω Q . We indicate with ω th the work of thermodynamic force and with ω Q that of thermostat in the transition region. The existence of ω Q has been numerically proven in Ref. 17 and can be decomposed in two parts: . ω dof is related to the thermalization of the degrees of freedom (rotational and/or vibrational) which are reintroduced/removed, and according to the equipartition theorem, one has 1 2 k B T per degree of freedom, while ω extra Q is due to the absence of the energy conservation that is related to the different intermolecular interactions in the different regions of the system as shown in Ref. 13 . In the Appendix D we will show how this term can be numerically calculated within a standard AdResS, thus allowing the explicit calculation of the chemical potential of a system.
We remind the reader that the argument in this section is essentially based on a large deviations principle (namely, Laplace's method) for the number of particles in the AT region that entailed thatN 1 rather than N 1 can be considered as representative of the configuration realizations in the AT region. This implies that the particle fluctuations, ∆N 1 , are negligible compared toN 1 in the AT region. This hypothesis is valid if the AT region is large enough so thatN 1 is large. This point seems to be true in all numerical experiments done so far (see, e.g., Ref. 18). In this way, we have shown the formal derivation of the thermodynamic force and the thermostat as tools to balance the chemical potential between the two regions. For a Grand Canonical-like set up, the balance of chemical potential between open regions is a necessary condition to have the exchange of molecules in thermodynamic equilibrium and as a consequence we have formally justified why Eq.4 is a necessary condition for AdResS to be considered an effective Grand Canonical set up.
D. The number probability p(N1) In this section, we provide the basic arguments by which one can define at which level of approximation p(N 1 ) in AdResS is the same as in a full atomistic simulation. This is an important point in order to justify the reliability of AdResS as a Grand Canonical set up in terms of probability distribution. In fact if p(N 1 ) in AdResS is very different from the corresponding one of a full atomistic simulation, then clearly the AdResS method cannot be considered valid since the artifacts due to the different p(N 1 ) would give a non realistic description of the system. Essentially the arguments of this equivalence are two; the 1st order accuracy of p(N 1 ), requires the balance of the chemical potential: This, in the previous section, has been shown to hold because of the action of the thermodynamic force and the thermostat in the thermodynamic limit. Whether or not the size of standard and feasible molecular simulations can be considered, effectively, in the thermodynamic limit will be checked later on with numerical tests; however we can anticipate that the answer is positive for systems whose size and time of simulation are nowadays routinely done. The 2nd order of accuracy, requires the equality for the compressibility: Eq.30 implies that the COM-COM RDFs (here, again, COM stands for the center of mass, and RDF stands for the radial distribution function) of the atomistic and coarse-grained region are matched 19 . Essentially these are the physical requirements to show that up to the second order p(N 1 ) in AdResS is the same as in the equivalent subregion of the full atomistic system of reference. Because of the lengthy arguments, specific details are reported in Appendix A.

IV. A NUMERICAL TEST: LIQUID WATER
The arguments we have used so far are not sufficient to infer that the probability p(x 1 , N 1 ), as presented in the theoretical analysis, corresponds to that of a Grand Canonical distribution (Eq. (5)). One must be very careful about any statement on p(x 1 , N 1 ) due to the following three reasons: 1. There is no hope whatsoever that the assumed Boltzmann form of the distribution p(x 1 |N 1 ; x 2 , N 2 ) can be numerically verified for any realistic test system. (13) and (14) are only necessary, and are in general not sufficient to guarantee the correct distribution p(x 2 , N 2 |N 1 ) in the hybrid region ∆.

Conditions
3. Formally, the particle number density p(N 1 ) is only a second-order approximation of the correct density p AT (N 1 ).
However, the number distribution p(N 1 ) has already been proved to be numerically correct in Ref. 3 and 4 for the relevant case of liquid water. In this section, we want to test, with a numerical simulation, to what extent, other quantities, e.g. the configurational distribution p(x 1 ) = N1 p(x 1 , N 1 ), are correct. Obviously it is a prohibitive task for any complex molecular system to determine the high-dimensional configurational distribution, for this reason, instead, we study its marginal distributions up to the third order. Here we report the H-O and H-H RDFs (see Fig. 3), and the 3-body COM correlation function C (3) (s 1 , s 2 , s 2 ) (see Fig. 4). For simulation protocols and the definition of C (3) , see the Appendix B and C. The first order marginal distribution, that is the molecular density profile and the second order marginal distribution, that is the COM RDF are not presented here, we address the readers to Ref. 4. Here we focus on the more delicate RDFs which involve the atomistic accuracy, and on the third  (r) is calculated, and the value of weighting function w(x) there. From left to right, the panels correspond to the AT region, the HY I subregion of ∆, the HY II subregion of ∆, and the HY III subregion of ∆, respectively. It can be seen that beyond 0.5 nm the functions go to one, that is particle beyond this distance are uncorrelated; this is fully consistent with our hypothesis of Eq.11. order COM three-body correlation function (C (3) ). From Fig. 3 one can see that the AdResS H-O and H-H RDFs are identical to those of the full atomistic reference system in the AT region. Interestingly, despite the corrective force of the RDF is applied only to the COM RDF, also in the region HY I (that is close to the atomistic region) the H-O and H-H RDFs are the same as in the full atomistic case. This implies that the AT region is embedded in an environment where not only the COM-COM structural properties but also finer structural properties are the same as for the atomistic resolution. In the HY II and III regions, the AdResS results obviously deviate from those of the full atomistic reference system, because the atomistic nature is decreasing as a molecule travels from HY I to HY III, so it must be expected that the orientational order is also fading away. In HY III, the H-O and H-H RDFs are almost structureless. Furthermore, we test C (3) in Fig. 4   C (3) see Appendix C). Interestingly, the three-body correlation is correctly reproduced by the AdResS in the AT and HY I regions. This is a further strong argument in favor of the fact that in AdResS the AT region is embedded in an environment that, at least up to the three-body COM correlation is the same as the full atomistic reference system. In the HY II and HY III regions, the three-body correlation deviates from the full atomistic reference, in fact since the various g(r)'s already deviates, one cannot expect a higher order correlation function to match (if not by chance) the full atomistic function of reference. Another important information of Fig. 3 and 4, is that the hypothesis of short-range influence of a region over another employed in Eq.11 is numerically justified and thus we can state that for the case of liquid water at ambient conditions the AT region is only short-range correlated with the rest of the system.

V. WCA POTENTIAL INSTEAD OF ATOMISTIC-BASED COARSE-GRAINED POTENTIAL: COUPLING THE ATOMISTIC REGION TO A GENERIC SOURCE OF ENERGY AND PARTICLES
The numerical test presented in the previous section shows how an atomistic model and its corresponding coarsegrained model, which resemble basic structural properties of the original full atomistic system, can be interfaced in an adaptive way and exchange molecules in a Grand Canonical fashion. However one may be tempted to think that the request of having a coarse-grained model which resembles as much as possible the structural and thermodynamic properties of a full atomistic model would be a necessary condition to have a proper exchange of particles between the different regions. If it was so, the idea of the adaptive scheme as a Grand Canonical tool would not be very general; our theoretical results instead suggest that it must be sufficient to impose some given conditions in the transition region to have a proper exchange of energy and particles between the small atomistic region and the large coarse-grained reservoir. This means that if the theoretical considerations are correct, one should be able to show numerically that the proper exchange of energy and molecules is independent from the molecular model used in the coarse-grained region; that is, the filter of the transition region makes the atomistic region a Grand Canonical ensemble. For this reason we have carried out a further numerical test where the molecules in the coarse-grained region interact with a generic WCA potential which assures that only the density, but not the radial distribution function, of a liquid governed by this potential is the same as that of the liquid water (for the simulation set up see the Appendix B).
The left plot of Fig. 5 shows that the density profile of the WCA AdResS is perfectly matching the all-atom reference system; satisfying agreement is found also in the AT and HY I regions of the right plot of Fig. 5 which shows the molecular number fluctuation over the whole system. Fig. 6 shows the COM-COM, H-O and H-H RDFs in the AT, HY I, HY II and HY III regions. All the RDFs are compared with full atomistic reference system (EX). Finally, Fig. 7 compares the 3-body correlation C (3) with the atomistic reference system (this latter reported in the first row of Fig. 4), and plots the difference. It must be noticed that in this case, on purpose, we want to investigate a "worst case scenario" and thus we did not apply the RDF corrective force in the transition region. In this way we can understand whether or not the corrective action of the thermodynamic force only, is sufficient from the numerical point of view to reproduce the Grand Canonical-like environment for the AT region. According to our theoretical framework, in absence RDF corrective force only the first order of the distribution (i.e. the density profile) is corrected by the thermodynamic force in the ∆ region. However, numerically, Fig. 6 shows a second order accuracy (correct RDFs) in the HY I region as well. Instead, the 3-body correlation in HY I deviates from the full atomistic reference; this is the price we pay for this "worst case scenario" set up. Nevertheless, the accuracy reached in the transition region is sufficient in order to have, at least up to the second order, the AT region of the WCA AdResS equivalent to the corresponding subregion in the full atomistic reference. Furthermore, numerical results (Fig. 7) show that, in the AT region, even the 3-body correlation C (3) is nonetheless very accurate. This means that, from the numerical point of view, even only the work of the thermodynamic force and that of the thermostat, by allowing an exchange of particles under conditions of equilibrium of the chemical potential, is sufficient to assure the correct equilibrium of the atomistic region. The possible application of the corrective force on the RDF's in ∆ in the case of liquid water is, in this case, numerically not required, but in case of necessity it would anyway represent a systematic tool for improving accuracy. Summarizing, results reported in Figs. 5, 6 and 7 show that in the atomistic region the accuracy at the third order and particle number fluctuations, for which the filter in the transition region is designed, are well preserved; that is the atomistic region exchange particle and energy with the reservoir in a proper Grand Canonical fashion (up to the level of approximation decided a priori). Being the molecules in the coarse-grained region a generic liquid model, we have proved numerically that indeed the adaptive technique employed here can be considered as a tool for general Grand Canonical Molecular Dynamics simulations.

VI. COMPUTATIONAL EFFICIENCY OF ADRESS AS A GRAND CANONICAL SET UP
In order to show the computational efficiency of the AdResS method as a Grand Canonical set up, we have carried out a numerical experiment to compare the performance of our method in inserting and removing molecules from the atomistic region and that of a highly popular approach used in Molecular Dynamics, the insertion particle method (IPM) 20 . The IPM is very efficiently implemented into the GROMACS code 21 and thus it is a natural choice as a Grand Canonical-like set up for a very large portion of molecular dynamics simulators. The experiment consists of the following: we have considered a large full atomistic system and applied the highly optimized IPM to insert molecules. From this study we extract the efficiency of the IPM in terms of computational resources required. Next, we consider Sys. size (nm 3 ) Traj. length (ns) CPU time (hours) AdResS 29.9 × 3.7 × 3.7 1 3.1 EX water 3.7 × 3.7 × 3.7 8 4.5 (traj.) + 36.7 (IPM) is even larger than that of the EX water simulation box, so that we are actually in a "worst scenario" situation. Moreover simulations with smaller reservoirs gave essentially the same results reported for this system. "Sys. size" means the size of the box used in the simulation. "Traj. length" means the equilibrium trajectory used for the particle insertion in the IPM approach. Along the trajectories, frames of configurations were recorded every 0.2 ps. "CPU time" the is wall clock time spend on the simulation. For the particle insertion simulation, the CPU time is counted by two parts: the time of generating the equilibrium trajectories (traj.) (not expensive) and the time required by the IPM procedure (very expensive). Moreover, not only the insertion procedure is expensive on itself, but even after 8ns the insertion process has not actually converged. Simulations on longer time scales show that the full convergence is actually never reached, which suggests that the insertion procedure for a molecule as simple as water is not fully rigorous from the physical point of view.
an adaptive system whose number of molecules, in the atomistic plus hybrid region, is at least the same as the full atomistic system considered (actually the for results presented here the size considered is even larger than that of the EX simulation); to this we add a rather large reservoir of coarse-grained particles (tests were carried out also with smaller reservoirs but the results were essentially the same as for the large reservoir). Also in this case the efficiency in measured terms of computational resources required. The results show that the IPM procedure itself requires a rather high computational cost, to which it must be added long trajectories so that the insertion of one molecule per time converges and data can be then collected for real statistics. Instead, the adaptive approach performs, in less time and at a much lower computational cost, on-the-fly dynamical multiple insertions keeping the instantaneous equilibrium intact, which implies the additional advantage that data can be collected on-the-fly for real statistics at any time. Specifically, we used an equilibrium trajectory of length 8 ns where the coordinates of water molecules were recorded every 0.2 ps. Using the IPM approach, in each frame, 10 5 test particles were inserted at random position of the system. For each insertion, 10 5 randomly chosen orientations were tried. The convergence is very slow, even at 8 ns the insertion process is not satisfactorily converged (see Fig 8), this means that up to this point data cannot be collected for analysis of physical properties. Compared to IPM, AdResS leads a very efficient insertion of particles: in each 1 ps time interval, (on average) 23 water molecules entered into the atomistic region. Some of them came immediately back, because of a unfavorable entering configuration. To investigate the proper insertion, we consider the time interval of 1 ns, and find 832 out of 13824 molecules were firstly in either the HY or CG region, and then travelled to the AT region and stayed there for longer than 60 ps, which indicates a mean square root displacement of roughly 1 nm in the AT region. Moreover, while for the EX simulation one must avoid size effects by choosing reasonable large systems, in AdResS the AT region can actually be chosen to be much smaller than the one used here, thus enhancing the amount of efficiency. Finally, even if one can afford a rather large atomistic system, thus a faster convergence, the cost of the insertion procedure will increase enormously because it would require a larger sampling due to the increased number of possibility to allocate a particle in the liquid. These results in terms of computational resources are summarized in Table I. Moreover, in previous work, Praprotnik et al. 22 have shown that the coarsegrained region of AdResS can be easily coupled to the continuum; this has the non trivial advantage that the insertion of a sphere in a liquid of spheres by IPM is by far much simpler than inserting an atomistically structured molecule in a liquid of atomistic molecules. Fig.8 shows the extremely fast convergence for inserting a particle in our "reservoir" of spherical particles compared to that of a full atomistic system. This means that in the long term perspective the part of the algorithm of Praprotnik et al. could be merged with our current approach and thus allow for even smaller sizes of the reservoir region. However, this extension, though conceptually straightforward goes beyond the scope of this paper. In general, there have been a large number of developments for the GC idea based on the "grow-in" and "shrink-out" MD approach (see e.g. Lynch and Petitt 10 and Eslami and Müller-Plathe 11 ) however, all of them share the same basic principle and face the same problem of the IPM. This numerical experiment shows that despite the need of a reservoir, our computation is by far more efficient than any other technique so far used. The comparison between the performance of AdResS and IPM as a Grand Canonical set up, leads to the natural question of whether or not our method can be employed to calculate the excess chemical potential which is the primary purpose for which IPM is used. If this is possible, then AdResS may represent a more efficient computational tool than IPM also for calculating such a quantity. In Appendix D we provide a practical scheme for using AdResS as a tool to calculate the chemical potential and show that the results are in rather satisfying agreement with those obtained by IPM. The plot shows the result that for EX water the IPM procedure does not converge within 8 ns; in fact the convergence of the chemical potential means that the insertion of the molecule is properly done and subsequent part of the trajectory (with the associated IPM) can be used for "statistical" average in a Grand Canonical framework.

VII. CONCLUSIONS
We have shown a detailed theoretical analysis about the validity, the limitations and meaning of the AdResS method within the framework of a Grand Canonical ensemble. Using strict formal arguments, we have demonstrated the role of the thermodynamic force and of the thermostat in balancing the difference in chemical potential due to the different resolutions in space. The theoretical results, derived under the assumption of thermodynamic limit, are then checked with several numerical tests. These latter represent a "worst scenario" case since the conditions of simulation are not ideal as in the analytical procedure. Next, we have shown that, under given hypothesis, p(N 1 ), the probability distribution in the atomistic region of AdResS, is equivalent to that in the same region in a full atomistic simulation up to the accuracy of second order. We have further strengthen our hypothesis and conclusions by carrying out a numerical experiment for the case of liquid water at ambient conditions where we could even go beyond the COM RDF. A further numerical experiment was made where the atomistic-based coarse-grained model was substituted by a generic liquid of spheres interacting via WCA potential. Results show that the accuracy in the atomistic region is the same as that of AdResS based on a coarse-grained model derived from the full atomistic reference system; this strengthen the idea of AdResS as a tool for coupling the atomistic region to a generic source of energy and particles in a Grand Canonical manner. Finally we validate the efficiency of our method as a Grand Canonical set up comparing its computational performance to that of well established and largely used alternative methods. In conclusion, these results provide a solid theoretical basis to explain the numerical reliability of the AdResS as an effective Grand Canonical set up and provide basis for further formal and numerical development of the adaptive idea in terms of matching probability distributions.
where n 1 is the summation variable that goes from 0 to N . In any case, when n 1 is not much smaller than N (i.e. out of the range of the thermodynamic limit), the statistic is not relevant, and can be safely neglected. The marginal number distribution of the full atomistic simulation, which the AdResS simulation should reproduce is We want to calculate the difference between p(N 1 ) and p AT (N 1 ); that is the difference between the particle probability in the atomistic region of AdResS (p(N 1 )), and the particle probability in a full atomistic simulation in the region equivalent to the AT region of AdResS. In order to do so we proceed with the following technical operation: multiply the numerator of p(N 1 ) by the denominator of p AT (N 1 ) (denoted by T 1 ), and multiply the numerator of p AT (N 1 ) by the denominator of p(N 1 ) (denoted by T 2 ). It follows: The difference between T 1 and T 2 is basically the difference between p(N 1 ) and p AT (N 1 ). Calculating T 1 : Notice that the free energy is extensive. By applying Euler's theorem 23 we thus find whereN 1 is the number of molecules in the atomistic region at the wished density ρ 0 of equilibrium, namelyN 1 = ρ 0 V 1 = N − ρ 0 V 3 . As we work under the hypothesis that the atomistic region is much smaller than the whole system, it is reasonable to assume N 1 ≪ N . Moreover we have found that, in the thermodynamic limit, N 1 ∼N 1 ∼N 1 which carries over to the running index n 1 in the grand canonical partition function that is non-negligible only if can be regarded as the free energies per unit volume, with the density perturbed from ρ 0 by (N 1 − n 1 )/V 3 and (N 1 − N 1 )/V 3 , respectively. These perturbations originate from the fluctuating number of molecules in the atomistic region: when the number deviates fromN 1 , molecules enter into the coarse-grained region and the density is perturbed. In the thermodynamic limit,N 1 is comparable to n 1 and N 1 , and all of them are supposed to be much much smaller than N while V 3 is of the same order of magnitude of V (the large CG region compared to the small AT region). As a consequence the perturbations (N 1 − n 1 )/V 3 and (N 1 − N 1 )/V 3 are small, compared to ρ 0 = N/V and Taylor expanding the free energies (A6) and (A7) about ρ 0 yields (We use the Landau notation o(1) to collect all terms that are asymptotically negligible, bearing in mind that in the thermodynamic limit N 1 ∼N 1 ; see also Ref. 4). By using the conditions (29) and (30), we have T 1 Similarly, we calculate T 2 : with similar expansions and by the conditions Eq. (29) and (30), we have: Since the difference between the probabilities p(N 1 ) and p AT (N 1 ) is basically the difference between T 1 and T 2 renormalized by the partition functions of the distributions, by invoking Laplace's method again, we see p(N 1 ) and p AT (N 1 ) agree up to the second leading order with respect to the density perturbations (N 1 −n 1 )/V 3 and (N 1 −N 1 )/V 3 , in the limit V 3 → ∞.
Since we assume the system to be homogeneous, the visualization of this high-dimensional distribution is simplified as follows. We first fix the distance between molecule 1 and 2 (Mol 1 and 2), then plot the correlation function with respect to the position of the third molecule (Mol 3, see Fig. 9). The position of Mol 3 can be described by the variables h 1 and h 2 , which are the projection of Mol 3's position on the to the axis s 12 defined by Mol 1 and 2.
Appendix D: Estimate of the excess chemical potential by AdResS In this paper we have proven that the work of a thermodynamic force and the work of the thermostat in the transition region, which are the ingredients needed to provide density equilibrium across the system, correspond to the difference of the total (kinetic and excess) chemical potential between the atomistic and reservoir region. Thus in principle the calculation of the total work done by the thermodynamic force and that of the thermostat would allow for the determination of, at least, differences of total chemical potential, and, when the kinetic part is properly removed, of differences in the excess chemical potential. While the calculation of the work of the thermodynamic force, ω th implies simply the straightforward calculation of the integral of such a force over the transition region, the work of the thermostat , ω Q , for allowing the proper change of resolution of the molecules across the transition region is somewhat a much more delicate issue. In essence, the calculation of the chemical potential, would require the accurate determination of ω extra Q , which unfortunately cannot be done in a direct and efficient way. However, we have derived a procedure by which such a quantity can be calculate in an indirect way without additional computational efforts in a standard AdResS simulation, in the following part we illustrate such a procedure.

Auxiliary Hamiltonian Approach
We define a system characterized by an atomistic region, a transition region and a coarse-grained (reservoir) region, as in a standard AdResS. We couple the different resolutions of the system via a Hamiltonian where the total potential is given by the interpolation of atomistic and coarse-grained potential in the same fashion of the force interpolation of standard AdResS: Next we determine the thermodynamic force for this system which leads to the same equilibrium density as that of the standard AdResS. We indicate such as force as F H th , and the superscript H is used to distinguish it from the thermodynamic force calculated with the standard AdResS, which we have indicated as F th . We have now the following situation; in the approach based on the Hamiltonian, the changing of resolution in the transition region at equilibrium is provided by the work of F H th (here indicated as ω H th ) and by the work of the thermostat to thermalize the reintroduced/removed degrees of freedom ω dof . Instead, in the standard AdResS, we have the same equilibrium density of the Hamiltonian approach, but the change of resolution is provided by F th , ω dof and ω extra Q . Since we have the same equilibrium and ω dof is the same in both cases, it follows that: The limitation of this procedure would be the fact that ω extra Q (and thus the chemical potential) could not be calculated within a standard AdResS simulation and one would require an auxiliary simulation, although the calculation is anyway ∆ F th < w ∇w ∆U> < w ∇w ∆U> H FIG. 10. The difference between the thermodynamic force of the potential interpolation-based scheme and force interpolation of the standard AdResS. The green solid line shows the difference of the thermodynamic forces, i.e. ∆F th = F H th − F th . The blue and red symbols indicate the quantity w∇w(U AT − U CG ) calculated in the two different approaches (blue corresponds to the standard AdResS while red to the Hamiltonian-based approach). The equivalence between the two approaches is suggested by the fact that ∆F th corresponds to w∇w(U AT − U CG ) , and that this latter is actually the same independently from the approach used to calculate it. not computationally demanding. However, based on an extended number of numerical tests, we could develop this idea further and calculate it by the standard AdResS simulation. To do so we have numerically studied the quantity ∆F th = F H th − F th for an extended number of systems, and proved that this is equal to w∇w(U AT − U CG ) (in Fig.10, is reported the case of liquid water). This quantity can be easily calculated without additional costs in a standard AdResS simulation, so ω extra Q is calculated by The quantity − w∇w(U AT − U CG ) is the average of the additional force (with respect to the standard AdResS) one would have if the intermolecular force is calculated from Eq.D1 as it happens in the Hamiltonian approach. This provided some physical arguments for the equality with ∆F th (besides the numerical evidence employed by us in first instance). In fact, if one writes down the average total force acting in the Hamiltonian approach and that in the standard AdResS, the difference between them is: F th −F H th + w∇w(U AT −U CG ) . Since ∆F th = w∇w(U AT −U CG ) , this implies that the total average force acting in the two approaches is the same, which further implies that the two approaches are essentially equivalent from a numerical point of view as one would expect since they have the same equilibrium. However, this procedure is based mainly on practical considerations and numerical experiments, thus at this stage it must be considered by the reader as a preliminary procedure although the essential ingredients seems to be already well defined. Instead, an elegant procedure for adaptive simulation based on the interpolation of potentials and rigorous thermodynamics considerations has been recently presented by Potestio et al. 26 . Finally, as already stated before in this work, the approach based on the Hamiltonian shall not be considered a proper Hamiltonian approach to adaptive resolution, but only an auxiliary tool to determine in simple way a specific quantity of interest. In fact, if this is considered a proper Hamiltonian approach, one would have the physical inconsistency underlined in Ref. 12 , that is the inevitable fact that the total potential (i.e. interpolated potential plus the potential corresponding to the thermodynamic force) in either the coarse-grained region or the atomistic region is not that of the original system (as it should be in a proper adaptive Hamiltonian procedure). However, in our view, the most important limitation is that, as shown in the current work, in this case the numerical accuracy can be assured only at the first order and there would not be a systematic way to improve it. This is the difference with the force interpolation-based method proposed here, where corrective forces in the transition region impose numerical correctness of the probability distribution of the system at higher orders. At this point, according to the procedure presented above, we have all the ingredients to calculate the excess chemical potential using AdResS and compare it with the same quantity calculated with the IPM, this is done in the next section.

Calculation of µexc and numerical experiments
For the difference of the total chemical potential between coarse-grained and atomistic region we have: The excess chemical potential is defined by removing the kinetic contribution from the total chemical potential, i.e. µ exc = µ − µ kin , (µ kin is the kinetic part of the chemical potential, which can be calculated analytically). It follows that µ AT exc − µ CG exc = F th dx + ω extra Q + ω dof − ∆µ kin = [F th + w∇w(U AT − U CG ) ]dx + ω dof − ∆µ kin . Next, by knowing µ CG exc , which can be calculated a rather low cost with IPM, one obtains µ AT exc , whose calculation is instead very expensive with IPM.
We have carried out several numerical tests interfacing spherical systems with different representation; we have found always a very satisfactory agreement between the chemical potential calculated with AdResS and that calculated with IPM. More importantly, we have applied this idea to liquid water and found a value of −22.8 kJ/mol with AdresS and a value of −24.6 kJ/mol with IPM (still not fully converged after about 400 ns), which must be compared with the value from literature of −23.5 kJ/mol 11 . The difference between the value from our approach and that from IPM is about 8%, which is not significant, but the computational costs of the IMP is much larger than that based on the use of AdResS.