Vector-borne epidemics driven by human mobility

Vector-borne epidemics are the result of the combination of different factors such as the crossed contagions between humans and vectors, their demographic distribution and human mobility among others. The current availability of information about the former ingredients demands their incorporation to current mathematical models for vector-borne disease transmission. Here, relying on metapopulation dynamics, we propose a framework whose results are in fair agreement with those obtained from mechanistic simulations. This framework allows us to derive an expression of the epidemic threshold capturing with high accuracy the conditions leading to the onset of epidemics. Driven by these insights, we obtain a prevalence indicator to rank the patches according to the risk of being affected by a vector-borne disease. We illustrate the utility of this epidemic risk indicator by reproducing the spatial distribution Dengue cases reported in the city of Santiago de Cali (Colombia) from 2015 to 2016.


Introduction
The explosive dissemination of Zika virus across the Americas has been one of the major concerns of public health organizations across the world in the recent years 1 . Zika's global threat is, unfortunately, the last example of the extremely rapid dissemination of mosquito-borne flaviviruses over the past two decades. From Dengue to Zika, through West Nile and Chikungunya viruses, more than one billion people are infected and more than one million people die from vector-borne diseases (VBD) every year 2 . According to the World Health Organization (WHO), VBD are responsible of one sixth of the illness worldwide and more than half of human population live in risk areas for these diseases 3 . Moreover, the threat of new emergent VBD in tropical and ecuatorial regions progressively span across more temperate areas as a byproduct of climate change. As temperature rises, the areas that are conducive to mosquitoes expand, meaning more opportunities for VBD to spread 4 . For the case of Malaria, an increase of global temperature of 2-3 degrees would rise the population at risk by several hundred million people while, in the case of Dengue, the population at risk would pass from 1-2 billion people in 1990 to 5-6 billion in 2085 [5][6][7] .
The lack of efficient vaccines constitutes another significant bump on the road of facing flaviviruses epidemics. Despite the efforts for finding effective immunization means to slow down the advance of Aedes-borne viruses, the most common way for preventing outbreaks is the use of pesticides, larvivorous fishes or Wolbachia bacteria, all of them directly acting over the vector 8 . The use of geolocalized control strategies, however, seems not to be an effective prevention strategy when facing the threat of a global-scale pandemic. On the contrary, the fast transcontinental movement of VBD demands a coordinated action of the involved actors for the efficient use of local control means. This implies taking into account that those populations at risk are not isolated and, as for human-human transmission diseases 9 , human mobility plays a key role in the spread of VBD across different populations 10 . In this sense, over the last decades, the increasing use and efficiency of transportation means have led to an explosion of human mobility, including urban, regional and long-range displacements [11][12][13][14][15] . Despite the social and economical benefits, the expansion of human mobility networks produced, as a byproduct, the speed up of epidemic waves and the emergence of correlated outbreaks in far away regions. For this reason, incorporating human mobility into disease transmission models has become a must when proposing mathematical frameworks aimed at capturing the contagions patterns observed in actual epidemic scenarios.
Metapopulation models, originally proposed in the field of ecology [16][17][18] , enable the mixing of mobility and contagion dynamics into a single formulation. These models can be described as networks in which nodes account for geographic locations (such as neighborhoods in cities, cities within countries, etc), i.e. subpopulations where large collectivities live and interact. In addition, the links of the network represent (and quantify) trips made by individuals between different subpopulations.
From the first studies making use of metapopulations for the study of infectious diseases transmission 19,20 , the field has advanced both in its theoretical grounds [21][22][23][24][25][26] and its use for large-scale agent-based simulations [27][28][29] . The latter approach incorporates many realistic aspects of human interactions with the goal of being useful for making epidemic forecast and the design of efficient prevention policies. On the other hand, the theoretical part has been spurred by the increasing spatio-temporal resolution of current data gathering techniques and many efforts have been devoted to bridge the gap between theory and realistic models during the last years 30 . First attempts in this direction involved displacement kernels 31,32 to model a local range of movement around an area that, in the last years, lead the way to the inclusion of more sophisticated mobility patterns, such as the commuting nature of human mobility [33][34][35][36] , the high order memory of human displacements 37 , or the coexistence of different transportation behaviors 38 . These sophisticated theories allow to capture the temporal and geographical spread of diseases while providing insights about the mechanisms driving the observed patterns.
In the case of VBD the use of metapopulation models has been recently fostered due to recent outbreaks such as Zika and Chikungunya. This way, epidemic models for VBD transmission have abandoned mean-field and well mixed hypothesis to consider patchy environments subjected to human flows. On the theoretical side, and pretty much as for metapopulations models of human-human transmission diseases, the frameworks rely on important assumptions that allow its analytical study. One of these assumptions is to consider the random diffusion of humans across patches [39][40][41] or displacement kernels 42 instead of realistic mobility patterns. On the other hand, when actual ingredients of human mobility, such as its recurrent nature, are taken into account both random diffusion 43,44 and displacement kernels models 45 fail in providing insights about the role that real mobility networks play on the transmission of VBD. Thus, metapopulation theories are still far from incorporating the many aspects influencing the onset of VBD outbreaks and lack the predictive power provided by data-driven agent-based simulations 46,47 .
The main goal of this work is to provide a benchmark that allows the study of large-scale vector-borne epidemics in a unified way and, more importantly, enabling the test of coordinated control strategies at the light of available data about vector incidence together with human demography and mobility datasets. To this aim, we first elaborate a metapopulation model for the transmission of VBD that allows us to derive the conditions under which epidemics take place. The analytical expression of the epidemic threshold is revealed by a matrix encoding the probability of crossed infections between humans and vectors of different subpopulations. Importantly, the spectral analysis of this matrix reveals the risk associated to each patch, pointing out those subpopulations triggering the epidemic onset. We confirm these results by testing synthetic metapopulations and a real case, the city of Cali (Colombia). To round off, taking advantage of the theoretical insights and analytical expressions provided by the formalism, we propose a metric capturing the risk associated to each patch. We implement this metric in the city of Cali obtaining a very good agreement between the estimated risk and the actual distribution of Dengue incidence across districts, highlighting the important role of recurrent human mobility patterns for explaining the spatial dissemination of VBD.

Metapopulation model for VBD transmission
In the following, we will focus on the description of a vector-borne contagion dynamics in a complex metapopulation. To this aim, we consider a set of N populations or patches in which contagion processes occur. In particular, we consider that the dynamic inside each patch is governed by the Ross-Macdonald (RM) model, as it captures the essential ingredients involved in VBD that do not confer immunity to reinfections such as Dengue. Other models including different disease-specific contagion patterns, such as human-human contagions in Zika 48,49 , can be easily accommodated in the introduced framework. The relevant variables of the RM dynamics are: (i) the fraction of infected humans at time t, ρ H (t), and (ii) the fraction of vectors infected at time t, ρ M (t). The evolution of these two variables is given as a product of the elementary processes described in Fig. S1a. Namely, susceptible humans become infected with probability λ MH after being bitten by an infected vector whereas healthy vectors become infectious with probability λ MH when interacting with an infected human. In addition, we assume that each vector makes a number of β contacts with (healthy or infected) humans. This way, no human-human or vector-vector direct infections are allowed. Finally, infected humans become susceptible with probability µ H , while (healthy or infected) vectors die with probability µ M , being replaced by newborn healthy ones.
Although the RM dynamics captures the elementary contagion processes taking place inside each population, the dynamical evolution of each patch depends strongly on the others, since they are not isolated. On the contrary, many individuals with residence in one subpopulation may visit others during, for instance, their daily commutes to other geographical locations. On the other hand, the mobility of vectors is rather limited, hence they are assumed not to move from their original population. This assumption is valid for many VBD such as Dengue, Zika or Chikungunya since their carriers, Aedes mosquitoes, typically fly an average of 400 meters 50 . Thus, it is the mobility of infectious individuals (who pass the diseases to healthy vectors living in distant subpopulations) what triggers the propagation of local disease outbreaks across the whole system. To characterize the mobility of individuals we denote each population as a node of a graph. Each node i has a population of n i individuals and m i vectors and, importantly, they may be different from one population to the other, as they are derived from the demographic (i) the probability that an infected vector transmits the disease to a healthy individual, λ MH , (ii) the probability that an infected human transmits the disease to a healthy vector, λ HM , (iii) the feeding rate of vectors β , (iv) the probability that an infected human recovers, µ H , and (v) the mortality rate of vectors, µ H . In (b) and (c) we show schematically the two basic stages of each Monte Carlo step in our metapopulation approach. As shown in (b), individuals are associated to one of the 3 nodes of a network. Namely, starting from the subpopulation in the top (1) and following clockwise we have populations composed of 8, 10 and 4 humans with 5, 3 and 9 vectors respectively. Once the movement has been done [see panel (c)] individuals mix and, consequently, the instant populations of humans at each node change to 6, 8 and 8 . The RM dynamics then takes place among the individuals and vectors coexisting at that moment in the same subpopulation. Finally, after these interactions, the individuals go back to their respective associated nodes and the configuration is again the one of panel (b). Note that we consider that vectors do not move from their corresponding node. partition of the population and the observed vector prevalence in each patch. It is important to recall that each individual is associated to one subpopulation, say i, considered as her residence. This way, the population n i of the patch i is the number of individuals whose residence is node i. In its turn, nodes are connected in pairs forming a complex weighted and directed network encoded in an adjacency matrix R, whose entries R i j account for the the probability that a trip departing from patch i has as destination population j (see Figure S1b). Matrix R can be computed from the observed number of trips between each pair of nodes (i, j), W i j , as: Thus, matrix R encodes the information provided by mobility datasets . The former two dynamical processes at work -RM dynamics and human movements-interplay in each time step of the metapopulation dynamics as follows. We start with a small fraction of infected humans and/or vectors. The initial quantity, being small, can be homogeneously distributed across the populations or localized in one or few nodes in case of being interested in determining those patches boosting epidemic spreading in its early stage. Once the initial infectious seed has been placed, the simulation steps per unit time are the following: • At each time step, t, healthy agents decide to move from their residence with probability p or remaining in it with probability (1 − p). Moreover, as symptoms associated to some VBD are severe, we also include the possibility of re-scaling the infected agents' mobility to α p with α ∈ [0, 1].
• If an agent leaves her residence, say i, she goes to a different subpopulation chosen among those connected to i. The choice is dictated by matrix R in Eq. (1), being R i j the probability of moving from i to subpopulation j.
• Once all the individuals have been placed in the patches (see Fig. S1.c), humans and vectors that are currently at the same patch interact as dictated by the RM model. This way, both humans and vector update their dynamical states (Susceptible or Infected as shown in Fig. S1.a) as a result of the contagion and recovery processes.
• Once the epidemic state of the agents have been updated, each individual moves back to her residence and the process starts again for time t + 1.

Markovian Formulation
Once defined the basic steps of the mechanistic simulations, we now tackle the mathematical formulation of the processes described above. To this aim, for each patch i (i = 1, ..., N) we have 2 variables: the probabilities that humans with residence in i, ρ H i (t), and vectors associated to i, ρ M i (t), are infectious at time t respectively. These 2N variables evolve according to the following Markovian equations: where I H i (t) and I M i (t) account for the probability that a healthy human with residence in subpopulation i and a healthy vector associated to i are infected at time t respectively. The former infection probability reads: where P H i (t) is the probability that an agent placed in population i at time t is infected. This probability can be written as: Finally, n e f f i (ρ h (t), α, p), which is the number of humans placed in (but not necessarily resident in) population i can be expressed as: In the same fashion, the expression for I M i (t) in Eq. (3) reads: where i e f f i (t) is the number of infected humans placed in population i at time t: The above equations describe the time evolution for the VBD incidence, ρ H (t) = {ρ H i (t)} and ρ M (t) = {ρ M i (t)}, in a collection of connected patches with arbitrary demographic, n, and vector, m, distribution. They enable, by integrating the equations starting from a given initial condition ρ H (0) and ρ M (0), to monitor the geographical spread of VBD and to analyze the stationary incidence of the diseases across patches, while the global incidence of the disease can be calculated as the fraction of infected humans in the entire system ρ H =

Estimation of the epidemic threshold
The validation of Eqs. (2)-(3) offers the possibility of saving computational costs by integrating the 2 × N Markovian equations instead of performing lengthy agent-based simulations. However, the advance behind Eqs. (2)-(3) is that they also allow for deriving metrics and analytical results about the dynamical behavior of VBD in large metapopulations. A relevant quantity that can be analyzed is the epidemic threshold, i.e., those conditions that turns the epidemic state into a stable solution of Eqs. (2)-(3). To address it, we suppose that the infection probabilities of humans and vectors in the stationary state are very small: This assumption allows us to linearize Eqs. (2)-(3) and, after some algebra (see SI), the stationary solutions for the infection probabilities of humans, ε H read: where the entries of matrices M and M (see SI for their derivation) take the following form: whereñ e f f i is defined asñ e f f i = n e f f i (0, α, p). Note that the form of the elements of these two matrices depends on both the mobility properties (p, R) and the demographic distribution of both agents and vectors ( n, m).
From Eq, (9) it is clear that nontrivial solutions for ε H correspond to the eigenvectors of matrix M M. Specifically, given a metapopulation defined by n, m, R and p, the stationary solutions with infinitesimal incidence correspond to eigenvectors of M M whose eigenvalues fulfil:

5/27
Under these conditions, the maximum eigenvalue Λ max (M M) encodes the combination of the RM parameters that corresponds to the epidemic threshold, namely: The former equation reveals the minimum infectivities, either λ HM or λ MH , that trigger the epidemic outbreak. To derive a simple critical infectivity one can set λ MH = δ λ HM , so that: To test the validity of Eq. (14) we have carried out extensive numerical simulations in synthetic metapopulations (see the SI for a detailed description) considering λ MH = λ HM = λ . The top panels in Fig. 2 show the epidemic diagrams ρ H (p, λ ) by computing the fractions of humans infected, ρ H , as a function of λ and p. From these diagrams it becomes clear that, for each value of p, there exist a critical value λ c so that for λ > λ c the epidemic phase appears. The border of this region (solid curves in Fig. 2.a-c)) is the function λ c (p) calculated with Eq. (14), showing an excellent agreement with the results from numerical simulations.

Abrupt transitions of leading pacthes
Apart from the agreement between Eq. (14) and the numerical simulations, the evolution of the epidemic threshold, λ c (p), reported in the three upper panels points out a non-trivial dependence with the degree of human mobility. Contrary to what naively expected, human mobility can be detrimental to epidemics, as clearly illustrated in the panels for α = 0.5 and α = 1.0. This counterintuitive effect of mobility was already found for SIR and SIS diseases in networked metapopulations 36 as a result of the redistribution of the effective populations across patches due to mobility. In the case of VBD, this process corresponds to an homogeneization of the effective ratios between vectors and humans so that a high risk patch with large γ i = m i /n i tends to decrease its effective value due to the increase of n e f f i , Eq.(6), caused by the mobility. A more striking phenomenon reported in the epidemic diagrams of Fig. 2 is revealed by the sharp variations in the slope of the curves λ c (p). These abrupt changes are the product of collisions between the two maximum eigenvalues of matrix M M as p varies. This way, the two maximum eigenvalues interchange their order at some critical mobility value p c . These collisions do not have an strong impact in the epidemic threshold since the function λ c (p) is continuous. However, they are the fingerprint of a sudden change in the form of the eigenvector corresponding to the maximum eigenvalue, v max , of matrix M M. This abrupt transition is of utmost importance since the components of v max encode the most important patches driving the unfolding of the epidemics.
Let us recall that matrix M M incorporates the demographic information, n, the mobility patterns, R and the vector distribution, m, having as unique parameter the degree of mobility p. Thus, for each value of p the spectral analysis of M M gives us the epidemic threshold λ c and the distribution of patches triggering the epidemic onset in the components of v max . The evolution of the components of v max as a function of p is shown in the bottom panels, (d-f), of Fig. 2. From these plots it becomes clear that the discontinuities of the slope of λ c (p) correspond to abrupt changes in the form of v max . Namely, in the three cases patch number 26 is the one causing the epidemic onset for p = 0 and p 1. This is obvious since patch 26 is the one with largest ratio γ i = m i /n i in the synthetic metapopulation. However, as p increases, the leading patch changes, being replaced by patch 18 in the case of α = 0 (d) and α = 0.5 (e) while for α = 1 (f) the leading patch is replaced by a collection of them. Remarkably, the case α = 0 shows a second abrupt transition at p c 0.92. These abrupt changes point out that containment strategies targeting a certain neighborhood can change sharply from efficient to useless due to changes in human mobility, as confirmed in the SI.
Assessing the epidemic risk of patches with M M Spurred by the ability of the Markovian formalism to capture the dynamics of VBD and the insights provided by the spectral properties of matrix M M for identifying the areas triggering the onset of epidemics, we move one step further and evaluate the epidemic risk associated to each patch. To this aim, we propose a theory-driven prevalence indicator which serves as a proxy to determine the most exposed areas to the spread of VBD.
For this purpose, let us analyze the elements of the matrices M and M, defined in Eqs. (10)- (11). From these equations we realize that the elements M i j ( M i j ) contain all the possible microscopic contagion processes from vectors (humans) associated to patch j to humans (vectors) associated to i. Therefore, it is possible to estimate the effective number of human-human contagions mediated by vectors that an individual from subpopulation i receives from those with residence in j. This quantity, denoted as C i j , can be obtained taking into account all the possible infection pathways connecting humans in patch j to those of 6/27 i as described in Eq. 15. Those infections may take place in three possible ways: (i) an infected individual from patch j visits patch i and infect a vector that will later pass the disease to a resident of patch i; (ii) a resident of patch j infects a vector and a healthy human traveling to j from i gets infected; (iii) both the infected individual from j and the healthy individual from i travel to a contiguous third patch k where the infection takes place mediated by a vector.
Finally, to make predictions about the impact of the disease on a geographical area, i, we must account for all the possible infections from each patch of the metapopulation and to weight the resulting number by the population of i. This way, the epidemic risk indicator for each patch i, in the following denoted as ER i , can be defined as: The evaluation of ER i , as defined above, can be done directly without the need of neither numerical simulations nor making the integration of the Markovian equations. In fact, once data about demography, vector distribution and mobility patterns are available, one can estimate the epidemic risk of each subpopulation. In the SI we show the validation of the risk indicator for the synthetic metapopulations by comparing the values ER i with the disease incidence as obtained from the mechanistic simulations. The fair agreement (coefficient of determination R 2 = 0.95) between our indicator and the results from numerical simulations allows us to rank the patches according to their exposure to VBD without the need of performing computationally expensive simulations or integrating an large set of Markovian equations.

A real metapopulation: Santiago de Cali (Colombia)
To validate further the Epidemic Risk measure we now move to a real metapopulation, the city of Santiago de Cali (Colombia). With a population of more than 2 millions of inhabitants, it offers the possibility of comparing our predictions in a scenario for which severe epidemic outbreaks of VBD are recurrently found. In particular, due to its location and climate, Cali is a Dengue endemic area in which records of the historical incidence of this disease are available for comparison. To this aim, we collected demographic and mobility datasets 10 whereas vectors abundance across districts was obtained from entomological reports made yearly by the local authorities 52 .
With this information at hand, and using Eq. (16), we assign the epidemic risk of each of the 22 districts in which the city is divided. These values are compared to the observed Dengue incidence across the 22 patches during the period 2015-2016 53 (details in the SI). In Fig. (3) we show this comparison by normalizing the values of both epidemic risk and Dengue incidence by their maximum observed value (in both cases that of district 13). In particular, we find a coefficient of determination of R 2 = 0.81, indicating that the proposed prevalence indicator is able to capture the spatial distribution of Dengue cases across the city. On more general grounds, this agreement points out that given the demography, the commuting patterns and the spatial distribution of vectors across a given population, one can use Eq. (16) to identify areas where containment measures should be promoted to reduce the impact of possible outbreaks. The strong dependence of R 2 on human mobility (see sensitivity analysis in the SI) points out the prominent role that human mobility plays in the dissemination of VBD and, therefore, its relevance for the design of efficient policies to prevent local outbreaks from spreading across populations.

Discussion
The control of infectious diseases represents one of the major societal challenges. Understanding the complex interdependency between human activity and contagion processes is key to explain the onset and development of large-scale epidemics. Here, focusing on VBD, we have integrated information from urban daily commutes and the geographic distribution of humans and vectors to estimate the epidemic risk associated to different connected regions. In particular, we have provided a metapopulation formalism to assess the role that the former ingredients play on the propagation of VBD. We have proved that this formalism constitutes a very reliable and time-saving platform, since its Markovian equations enable to reproduce very accurately not only the global incidence of VBD but also the spatio-temporal spreading patterns observed in Monte Carlo simulations.
Based on this agreement, we have derived an analytical expression of the epidemic threshold that captures the critical conditions which leads to the onset of epidemics. Apart from the detrimental effect that mobility may have on the spread of diseases, the study of the epidemic threshold has revealed interesting phenomena such the existence of abrupt changes in the way epidemics unfold. In particular, we have shown that the subset of patches leading the epidemic onset can suddenly change as human mobility varies. This phenomenon highlights the need of incorporating real human mobility patterns into the design of containment policies targeting specific geographical areas, for efficient policies can turn useless due to a small variation of human mobility habits.
Finally, relying on the matrix containing the information about the effective number of human-human contagions, we have derived an epidemic risk indicator that allows us to classify the patches according to their exposure to VBD. By computing this epidemic indicator, we have reproduced with great accuracy the geographical distribution of Dengue incidence in the city of Santiago de Cali (Colombia), where Dengue in an endemic disease, thus being able to identify the most vulnerable areas where prevention measures should be promoted.
In a nutshell, our results point out that the spread of VBD is the result of a delicate interplay between commuting flows, human census and vector distribution. This interplay is captured both in the analytical expression of the epidemic threshold and in the epidemic risk indicator. As a result of it, we have shown that small variations of the former ingredients, such as the degree of mobility, can lead to abrupt changes in the way epidemics unfold. Our framework, although containing several simplifying assumptions to allow the analytical treatment, has shown useful to integrate human and contagion dynamics and it can be readily implemented to identify those regions where immunization policies should be reinforced and to forecast the consequences of control strategies focused on mobility restrictions.

Mechanistic numerical simulations
With the aim of assessing the accuracy of the Markovian formulation we compared the predictions obtained using Eqs. 2-8 -for both the epidemic incidence and the spatio-temporal evolution of the disease-with numerical results from extensive mechanistic simulations.

8/27
In the simulations with synthetic networks we start by setting the human population of each node (n i ) to a constant value (n i = 10 3 ), while vectors population m i is set to be proportional to n i as (m i = γ i n i ) with γ i varying from one population to another and extracted from a uniform distribution within the range γ i ∈ [0.3, 1.7]. For the mobility network of the city of Cali instead, we set n i and m i according to census and mosquitos proxy data publicly available from the municipality of Cali (see the SI for details). All the populations are initially composed only by healthy individuals until a seed of the disease is introduced in the system. For the seed we consider two different options: one in which the 1% of the agents in each sub-population is initially infected and a second one where the seed is localized in a single sub-population.
After the seed is introduced, the reactive and diffusive processes take place with the same time-scale. At each time step, agents decide whether to move to a neighboring patch or to remain in their home patch. To do so, each agent generates an independent and identically distributed random variable (i.i.d.) r between [0,1]. If r is smaller than the moving probability p the agent will move otherwise, she will stay in her node. In addition, to model the possible impairment produced by the disease, we also assume that infected individuals could be less prone to move by rescaling their mobility to α p, with 0 ≤ α ≤ 1. In any case, if the agents moves, another i.i.d. r determines the destination patch. r is extracted between [0,∑ j W i j ], where W i j is the weight of the link between populations i and j. Then, the destination is chosen as the first node k that satisfies ∑ k W ik ≥ r , assuring that the destination is selected proportionally to the mobility flux between sub-populations i and k.
Once the agents reached their new location, an iteration of the contagion and recovery process starts. To model the infection dynamics of VBD inside each patch we rely on the Ross-Macdonald model on well mixed populations. Inside each population, each vector, regardless of its state, bites β randomly selected individuals. In the case that the vector is infected and the human is healthy the disease will spread with probability λ MH . In the opposite case, the vector gets the disease with probability λ HM . Finally, if the vector and the human are both healthy or both infected, nothing happens. Regarding the recovery phase, each infected human recovers with probability µ H while mosquitos do not recover but are replaced by healthy new individuals at rate µ M .
Simulations finish when the epidemic reaches a stationary state: i.e. or the disease dies out or the fluctuations in the total number of infected humans in the system ρ H (t) are lower than 10 −5 during the last 100 time steps. Results are then averaged over the different realizations and compared with the theoretical predictions given by the Markovian model.

Synthetic Metapopulations
Synthetic metapopulations used in Fig. 2 are composed of 50 patches that correspond to the nodes of a network constructed following the preferential attachment model of Barabási where N H and N M are the number of humans and vectors in the population respectively while I H (t) and I M (t) are the probabilities that a healthy human and a healthy vector get infected at time t, and they read as: The first term in the r.h.s. of Eq. (S1) ((S2)) describes the fraction of infected humans (vectors) at time t who remains infected at time step t + 1 while the second term accounts for the fraction of healthy humans (vectors) that gets infected at time t + 1.
Note also that Eq.(S3) includes an extra term with respect to Eq.(S4), that represents the probability for a human to receive a contact from each vector. We will demonstrate later that this element plays a crucial role on the unfolding of VBD.

13/27
Equations (S1) and (S2) can be reduced to the classical formulation of the RM model by taking into account that the terms λ MH ρ M (t) and λ HM ρ H (t) are pretty small so that, by using the approximated expression 1 − (1 − ε) x xε, the continuous-time versions of Eqs. S1 and S2 turn into: where γ is the ratio between the population of humans and vectors, γ = N M N H , that can be considered as another parameter of the model.
Equations (S5) and (S6) represent the usual way the RM model is presented. As shown, they correspond to an approximation that turns to be valid close to the epidemic threshold, i.e., when the fraction of infected vectors and humans are small but nonzero, while the formulation in Eqs. (S1-S4) is accurate in the entire epidemic diagram. The mathematical analysis of the model yields to an analytical estimation for the epidemic threshold as a function of the parameters λ MH , λ HM , µ H , µ M , β and γ. In particular, the first step to compute the threshold is to calculate the nullclines of the system of equations (S5) and (S6): where A = µ H β λ MH γ ≥ 0 and B = µ M β λ HM ≥ 0. The above system is trivially satisfied by the ρ H = ρ M = 0, the case in which no prevalence of the virus is found in the system. However, we are interested in the stability of the epidemic phase.
From Eq. (S7) it becomes clear that ρ M diverges as ρ H → 1 − whereas Eq.(S8) reveal that ρ M will continuously grow reaching ρ M = 1 asymptotically. In Fig. S2 we plot the two nullclines (S7) and (S8) with solid (blue) and dashed (red) lines respectively. From the plot, it can be easily seen that the two curves cross for ρ H > 0 only when the derivative of the dashed nullcline (S8) at ρ H is larger than the one for the solid nullcline (S7). This condition is satisfied whenever In this sense, the limit of this inequality determines the boundary between the epidemic and disease-free regimes. This boundary is given by the following equation:

S2 Description of the Cali dataset
One of the most important contribution of this work is the formulation of a new framework which can easily incorporate mobility data of real cities to address real epidemic scenarios. In the main text, we tackle the spread of VBD in the city of Cali (Colombia), whose geographical and meteorological features make it an endemic region for several VBD such as Dengue, Chikungunya or Zika. In particular, here we focus on the spread of Dengue.
To assess the effect of mobility on the spread of Dengue in Cali, it is necessary to reconstruct the mobility network of its inhabitants from data. For this purpose, we divide the city into 22 neighborhoods, which correspond to the official administrative divisions called comunas. Regarding demography, the population distribution across comunas has been extracted from census data that the municipality facilitates 10 . Mobility flows connecting comunas are extracted from urban commuting surveys 8 . As a result, more than 10 5 trajectories were recorded, which suppose a representative sample of Cali's commuting flows. Once data have been gathered, an origin-destination matrix, encoded in our formalism by matrix R, is computed as: where the numerator corresponds to the number of trips between patches i and j while the denominator counts all the reported trips departing from patch i. The result is a weighted directed network encoding the probability that an agent visits other neighborhoods different from its residence.

15/27
Apart from the mobility network, the distribution of vectors across the city also plays a crucial role on the outcome of the disease. The number of vectors inside a geographical region is strongly linked to environmental features such as altitude, temperature and humidity but also to human-dependent factors like health and economic conditions. To model vectors' distribution across patches, we use as a proxy the so-called recipient index 4 . This quantity encodes the probability of finding vector pupae in different recipients which have been previously distributed across the city. A high value of the index means a higher probability of finding vectors. For this reason, we assume that the ratio between the number of vectors and humans inside each patch in our model is directly proportional to its recipient index, which is extracted from a report published in 2014 4 .

S3 Validation of the Markovian equations
Here we aim at assessing the ability of our model to predict the impact of VBD in both synthetic and real world metapopulations.
To do so, we compare theoretical predictions from our model with results from extensive mechanistic simulations.  Figure S3 reveals the great agreement between theory and simulations for both the cases in which infected agents mobility is totally restrained, i.e. α = 0 (Fig. S3a), and when there is no influence of the disease on agents' mobility, α = 1 (Fig. S3b).
We now move to a real case, the city of Cali, whose metapopulation architecture is described in Sec. S2. Following the same procedure, we can notice in Fig. S3c-d that our formalism still captures very accurately the epidemic size, despite the higher complexity of Cali's mobility network. Apart from the agreement between theory and simulations, interesting physical phenomena arise from accounting for the mobility constraints associated with contracting severe VBD. In particular, we notice that the qualitative behaviour of the epidemic threshold with human mobility can drastically change when modifying the restriction parameter. These phenomena will be analyzed in more detail in the next section where we compute the epidemic threshold as a function of (p, α).
Finally, we also want our theory to reproduce the spatio-temporal unfolding patterns of VBD across both synthetic and real networks. To check this, we start by setting a seed localized in a single patch, and then we monitor the temporal evolution of the population affected by the disease inside each area. Figure S4

S4 Linearization of the Markovian Equations
Here we derive an analytical estimation of the epidemic threshold for the Ross-Macdonald model with metapopulations. In the main manuscript we have defined this quantity as the minimum infection rate from vectors to humans which leads to an epidemic regime in the stationary state. We start considering the steady state of the Markovian equations. Assuming Eqs. (2,3) of the main text turn into:

18/27
and the probabilities I H and I M that a human or a vector associated with node i becomes infected are respectively: (S14) (S15) Close to the boundary between the epidemic and the disease-free phases, these steady values are small but not zero, which mathematically is reflected by considering The latter allows us to linearize the equations by neglecting non-linear terms in ε. Thus, Eqs.(S14,S15) turn into: . (S17) We now introduce the former expressions into Eqs. (S12,S13) and keep only linear terms in ε, yielding: Equation S20 makes evident the bipartite nature of the processes involved in the spread of VBD with matrices M and M responsible for vector-human and human-vector infections, respectively. Thus, if we want to quantify indirect infections between humans mediated by vectors and vice versa we should iterate Eq. S20 obtaining:

19/27
From Eq. S21 it becomes clear that calculating the epidemic threshold involves solving an eigenvalue problem. In order to obtain a closed expression for the epidemic threshold and without loss of generality, we assume λ MH = δ λ HM . Finally, as we are interested in the minimum vale of λ HM for which the former equation holds, the epidemic threshold can estimated as: where Λ max is the spectral radius of matrix M M.

S5 Heuristic approximation of the epidemic threshold
In the main text, we demonstrate that Eq. S22 accurately predicts the epidemic threshold. However, despite the accuracy of this estimation, its calculation involves computing the spectral radius of an arbitrarily large matrix which, for very extensive systems, can be a hard computational task. Here we derive a heuristic indicator that solves this problem and clarifies the intuition behind the different behaviors emerging as a result of human mobility.
Let us first discuss the physical meaning of Eqs. S16 and S17. Eq. S16 accounts for all the possible contagions processes affecting an agent with residence in i. Specifically, one agent associated with node i can become infected in two possible ways: staying at node i and getting bitten by a vector or moving to another sub-population and getting infected there. On the other hand, Eq. S17 counts the infectious interactions of a vector from i with agents that live there or belong to other patches and have decided to visit i. Thus, the number of indirect contagion processes C 2 i received by a susceptible individual associated to a patch i from infected agents can be quantified by composing these cross contagion rates across all the patches of the metapopulation, i.e.: Therefore, the larger this quantity for patch i is, the higher the probability that an agent from i contracts the disease. Following this line, we can thus approximate the epidemic threshold as the minimum value of λ MH needed to assure that an agent from the most exposed patch is infected at least once. This value, here defined as r 2 , is given by: . To explain this discordance, we must take into account that, as infected individuals mobility is restrained, contagions between agents from different residences require more indirect contacts to happen. A very illustrative example is to consider two patches, say A and B, that are not mutually connected but both have connections to a third one, say C. If α = 1 an infected agent from A can pass the disease to a healthy individual from B in two steps: 1. The infected agent moves to C and infects a vector there.
2. The infected vector bites a visitor from B, who becomes infected.
However, for α = 0, this cannot happen since infected individuals are not allowed to move. In fact, now the shortest contagion path contains 4 steps: 1. The infected agent stays at A and a vector contracts the disease there.

A susceptible agent from C moves to A and becomes infected after being bitten.
3. This infected agent stays at C and transmits the disease to a vector there.
4. Finally, the infected vector at C passes the disease to the susceptible coming from B.
The latter example makes evident the necessity of considering contagion processes beyond two time steps to estimate the epidemic threshold. Following the same procedure as in the 2-step case, the number of indirect contagion processes that an agent from i receives in 4 steps can be calculated by: Therefore, the minimum λ MH value to hold the epidemic, denoted as r 4 , is given by: Finally, Fig. S5 demonstrates that r 4 (white dashed line) provides a better estimation for the threshold than r 2 , especially for hihg mobility values. As anticipated, this is mainly caused by the large number of contagious paths which are not captured by

S6 Consequences of the abrupt changes in the patches driving VBD
The study of the critical properties of VBD in the main manuscript has revealed the existence of some mobility values, denoted as p c , for which the components of the leading eigenvector change abruptly. Translated into epidemiological words, this striking phenomenon reflects the change in the most affected patches, which is of great relevance since targeted policies in specific areas can pass from useful to useless as human mobility varies. To prove it, we now study the effects of applying prevention measures in specific locations selected according to the largest components of the leading eigenvector of the critical matrix MM. We consider the synthetic network used in the main text and set α = 0.5, for which the change in the leading patch happens at p = p c = 0.38 (see Fig.2 in main manuscript). Finally, we target two different sub-populations for the immunization: patch 26 that is the leading patch for p < p c , and patch 18 that sustains the outbreak for p > p c . Figure S6 confirms that the effectiveness of targeted policies against VBD is strongly influenced by aspects concerning human mobility. Thus, for example, immunizing agents from patch 18 leads to the extinction of the disease for p > p c , whereas it is almost ineffective for mobility values below this threshold. This result, along with the others presented in the main text, highlights the importance of promoting containment policies not only based on demography or entomological information but also taking into account the complex interplay between human movement, census data and vector abundance.

S7 Epidemic risk indicators: Synthetic and real metapopulations
In the previous section, we proposed two heuristic indicators -C 2 i and C 4 i -able to capture, not only the dependence of the epidemic threshold on the mobility, but also changes in the patches driving the onset of epidemics. Here we go one step further and provide a prevalence indicator able to reproduce the spatial distribution of the disease across sub-populations. Our aim is to classify the different geographical areas according to the risk associated to VBD. To do so, let us recall that C 2 i encodes all the possible indirect infectious contacts that an agent from i receives during an outbreak. Therefore, to obtain an estimate of the disease incidence in each patch, we must also account for the population of each geographical area. This way, we can define an epidemic risk indicator ER i for each patch i as: At this point, it is important to notice that Eq. S27 does not depend on the epidemiological traits of the disease but only on human census, mobility and mosquitos abundance. In fact, as we assume all the epidemiological parameters to be constant across all the areas, dealing with one or another disease only leads to a rescaling of the epidemic risk of all the regions without producing any change in the ranking of affected populations.

Synthetic metapopulations
We start testing the goodness of the predictions obtained via Eq.S27 in a controlled environment using a synthetic metapopulation.
We compare its value for each patch with the incidence of the disease from lenghty mechanistic simulations on the synthetic network used in the main text. We then fix the epidemiological parameters (namely: the cross-contagion rates λ MH , λ HM , the vectors' biting rate β and the recovery rates µ H and µ M ) as well as human mobility p and α while human population inside each patch is constant (n i = 10 3 ) and vectors abundance is randomly assigned in the range m i ∈ [300, 1700]. Figure S7 demonstrates the accuracy of our formalism in reproducing the unfolding of an outbreak in this controlled scenario, leading to a coefficient of determination R 2 0.95.

Real metapopulations
We now check the applicability of our indicator to real scenarios. In particular, we focus on the spread of Dengue in the city of Cali in Colombia. Dengue constitutes an important threat for Cali inhabitants so local authorities monitor the situation and publish yearly reports 5,6 with the spatial evolution of the disease across comunas and surveys about commuting inside the city 24/27 Figure S8. Steps needed to compute the epidemic risk indicator for each geographical area i. First, it is necessary to construct a metapopulation network from the demographical distribution, the vectors abundance and the observed origin destination trips. Second, by fixing the value of the mobility p as well as the restriction to the mobility of infected individuals α, we can estimate the critical matrices M and M. Finally, we compute the epidemic risk indicators ER I using Eq. S27 are openly available 8 .
From the raw data only few steps are needed to compute the epidemic risk (see Fig S8). First, we should build the metapopulation systems. The inputs required are the demographical distribution n, the vectors' abundance m and the recurrent mobility patterns encoded in the matrix R. Demographical distributions can be easily extracted from census published by local authorities -in the case of Cali we extracted them from reports issued by the Municipality 10 -while vectors' abundance can be inferred from entomological indexes associated to each area such as the recipient index or the pupae index. Finally, different methods have been proposed to estimate the origin-destination matrix R, which range from theoretical models 7 to the realization of surveys 8 or the use of mobile phone call records 9 .
With the structure and mobility of the metapopulation system fixed, we can compute the entries of matrices M and M, which depend also on the parameters related to human mobility α, p. To estimate α we recall that it represents the restriction in human mobility of infected individuals due to the effects of the disease. As symptoms associated with Dengue are usually severe, we can assume that all the symptomatic patients will not move, so α can be estimated as the percentage of asymptomatic infections that are ∼ 75% of the infected population in case of Dengue disease. In its turn, the mobility p can be estimated as the fraction of individuals departing from their residence every day. Once defined the mobility, we can compute M and M by using Eqs. S16 and S17 respectively. Finally, the epidemic risk is easily obtained by including these matrices in Eq. S27.
The results of the comparison have been shown in Fig. 3  incomplete information, and its applicability to real world scenarios with the need of only few, and usually available, data.

S8 Range of validity
In the previous section, we have shown the accuracy of ER i while considering both real and synthetic metapopulations. However, as mentioned above, its value depends on features related to human movements such as the degree of mobility p or the disease induced constraint of the mobility α. In this section we aim to test the robustness of ER i under variations of these parameters and, therefore, to determine its range of validity.
To do so, we focus on the case of Cali and analyze the agreement between theory and the epidemiological data (quantified by R 2 ) as a function of both p and α.