Inferring Spatial Source of Disease Outbreaks using Maximum Entropy

Mathematical modeling of disease outbreaks can infer the future trajectory of an epidemic, which can inform policy decisions. Another task is inferring the origin of a disease, which is relatively difficult with current mathematical models. Such frameworks -- across varying levels of complexity -- are typically sensitive to input data on epidemic parameters, case-counts and mortality rates, which are generally noisy and incomplete. To alleviate these limitations, we propose a maximum entropy framework that fits epidemiological models, provides a calibrated infection origin probabilities, and is robust to noise due to a prior belief model. Maximum entropy is agnostic to the parameters or model structure used and allows for flexible use when faced with sparse data conditions and incomplete knowledge in the dynamical phase of disease-spread, providing for more reliable modeling at early stages of outbreaks. We evaluate the performance of our model by predicting future disease trajectories in synthetic graph networks and the real mobility network of New York state. In addition, unlike existing approaches, we demonstrate that the method can be used to infer the origin of the outbreak with accurate confidence. Indeed, despite the prevalent belief on the feasibility of contact-tracing being limited to the initial stages of an outbreak, we report the possibility of reconstructing early disease dynamics, including the epidemic seed, at advanced stages.


I. INTRODUCTION
The spread of SARS-CoV-2 virus constitutes the most recent example of the vulnerability of modern society to the spread of communicable diseases [1][2][3]. In particular, the combination of features such as extensive transand intra-national transportation networks, shortening travel-time between faraway regions [4][5][6], the existence of important socioeconomic inequities [7][8][9] and the phenomenon of rapid urbanization [10,11] have conspired to give rise to the unprecedented speed at which SARS-CoV-2 has advanced, becoming a global threat within a few months of the (reported) initial outbreak.
The risk of significant harm to society from an epidemic is increased when there is an initial lack of knowledge about the epidemiological features of a novel pathogen, limiting the use effective of medical treatments or vaccines to slow down progression at the early stages of the outbreak. Indeed, early attempts at mitigation resorted to non-pharmaceutical interventions such as recommending hand-washing, hygienic measures, social distancing, travel restrictions, and population confinement via stay-at-home orders [12][13][14]. A key tool for devising and assessing the effectiveness of such measures is mathematical modeling of the epidemic trajectories under various scenarios. The advantage of such models are twofold: on the one hand, epidemic models provide shortterm forecasts on the evolution of an outbreak, providing useful information to assess the potential harmfulness of the pathogen and act accordingly to reduce their impact. * andrew.white@rochester.edu On the other hand, the different layers of complexity introduced in the epidemic models has boosted their use as benchmarks to devise cost-effective non-pharmaceutical interventions aimed at hindering the spread of the disease [15,16].
Regardless of their stochastic or deterministic nature [17][18][19], the successful application of epidemic models to provide reliable forecasts is tightly linked with the correct estimation of their relevant parameters. Early on in an epidemic, the key parameters describing the spread of the infection are highly uncertain and this uncertainty can severely impact the predicted outcomes [20]. This becomes particularly relevant in the context of highly complex compartmental models that produce wildly-varying degenerate trajectories in the short-term dynamics, even for small changes in the parameter-estimates [21,22]. While, this degeneracy dissipates in the long-term dynamics due to exponential growth encoded in the equations, even minor inaccuracies in the epidemic parameters limits reliable predictions to at most a few weeks in the future [23]. Given this, the practical efficacy of epidemiological models is in providing a range of possible outcomes, rather than producing precise quantitative predictions [24].
Multiple ways to infer epidemiological parameters have been proposed in the literature. One typical method is to use maximum likelihood approaches, where parameter values are chosen to maximize the likelihood of observing the experimentally-measured data (observations), given some prior distribution on the parameters [25,26]. A disadvantage of this method is that the functional form of the likelihood function must be known or approximated to perform maximization. Another approach is least-squares fitting, which employs various optimization Figure 1. High-level model overview. a) Model inputs: an SEAIR compartmental epidemiological model, prior belief of the epidemiological parameters, and a set of sparse observations that come from disease screening tests. The contact network in a metapopulation can be represented as a network graph. The infection starts at an unknown origin and spreads through the network. We generate a large set of trajectories and explore the epidemic trajectory space over a high variance prior belief for the epidemiology parameters. The large variance is represented as the shaded areas with 80% confidence intervals. The infections starts in a single node in each trajectory series but that node varies over the next trajectories. b) Model outputs: MaxEnt re-weighted ensemble of trajectories given the observations, posterior distributions of the parameters and predicted infection origin. The re-weighted trajectories allow us to predict how the disease spreads through the network and infer the location for the source of infection. methods, including but not limited to: Markov chain Monte Carlo [27][28][29][30], sequential Monte Carlo [31][32][33], trajectory matching [34][35][36][37][38] and machine learning methods like support vector machines [39]. Other approaches include generalized profiling [40], approximate Bayesian computation [41][42][43], derivative-free optimization [44,45] and Bayesian inference [46][47][48][49][50]. Furthermore, most of the epidemiological models in the literature focus on forward dynamics of the diffusion of the pathogen through the network, while the backward-dynamics problem of identifying the diffusion source has been comparatively less studied [51][52][53]. Such an analysis bears significant importance in guiding systematic contact-tracing and increasing the chance of early containment of an outbreak.
An approach that circumvents these difficulties is a well-known method from statistical mechanics, maximum entropy (MaxEnt) biasing. MaxEnt has been proven to be successful in various settings such as molecular dynamics simulations [54][55][56], ecology [57][58][59][60], nuclear magnetic resonance spectroscopy [61,62], x-ray diffraction [63,64], electron microscopy [65,66], economics [67] and neuroscience [68][69][70][71]. This method uses the principle of entropy to measure the difference between two distributions or trajectories and applies a change using Lagrange multipliers to alter a given distribution to match a target one, while maximizing the entropy (and thus, effecting minimal change) [72]. This approach is highly promising in the context of epidemic modeling, as it mitigates the need for designing complex compartmental models and having to make a lot of simplifying assumptions. As remarked in [73]: "What has been produced the day before often must be completely revised the day after because a new piece of information has arrived". This approach relies more on daily (weekly) evidence, rather than relying on uncertain early estimates of disease parameters, especially at the early stages of an epidemic outbreak. A few instances of applying MaxEnt to characterize epidemic spreading exist in the literature. In [74] MaxEnt is used to bias the epidemic curves generated by mean-field SIS and SIR compartmental models to reproduce a set of empirical observations and uncover probability distributions used for contagion and recovery events. Harding et al. [75] propose a MaxEnt approach to modify a SIS framework running on a contact network to model the time-varying nature of human mobility in response to the diffusion of an epidemic outbreak.
Here, we explore the use of MaxEnt biasing when more layers of complexity are added to the dynamic equations governing the advance of an epidemic. To do so, we consider a more elaborated compartmental scheme, the SEAIR model, running on metapopulations [76,77] to accommodate different realistic features such as human mobility, the relevance of the incubation period of one pathogen or the existence of asymptomatic infectious individuals [1]. We show that MaxEnt biasing allows for both predicting future trajectories as well as inferring the source of infection. In Fig. 1 we represent a high-level overview of the framework. Graphs in this work were generated using NetworkX [78]. Model inputs include a compartmental epidemiology model, prior belief for its parameters and a set of sparse observations. The prior belief on the model parameters can include a relatively large variance, making our approach highly applicable to risk assessment analysis at the early stages of the outbreak, where the true parameters are unknown. The observations are weekly average data obtained by disease test screenings that contain random noise. This noise accounts for the uncertainty associated with the number of infected individuals due to the variance of testing policies across a metapopulation. The output is the MaxEnt re-weighted trajectories that are used for inference on the epidemic spread and the source of infection. Using this method applies minimal change to the model's original output, without altering the parameters directly. The premise of this change is that the original model is treated as well-trusted but only slightly incorrect, with the intent of improving predictive accuracy for future events by matching the model's output to experimental data (observations). However, experimental data is known to contain systematic error, so we include a formulation of MaxEnt that accounts for some bias. This method is agnostic to the functional form of the original model; given that it re-weights paths produced by sampling model parameters, which can be done a priori, it can be treated as a black box. This also has the advantage that the method's computational complexity scales with only the number of paths sampled and number of target functions, rather than the number of model parameters [72].
The manuscript is organized as follows. In Sec. II A, we describe the theory of MaxEnt applied to a general model function, P ( θ) with parameters θ, and describe the procedure for MaxEnt path biasing. In Sec. II B we describe the underlying equations of the SEAIR model occurring on a metapopulation framework. In Sec. III we present results on both synthetic and real-world metapopulation mobility networks and demonstrate how the method can predict infection spread, make a high certainty inference on the source of an epidemic using the posterior reweighted trajectory from the MaxEnt approach. In particular, we demonstrate that this inference can be done even in late stages of the disease dynamics. In Sec. IV we end with a discussion of the implications of our findings.

A. Maximum Entropy with Uncertainty
Consider for a given simulator f ( θ) with a set of parameters θ, we have a prior distribution of parameters P ( θ). For example, the function f ( θ) can be a system of ODEs in a compartmental epidemiology model. Given a set of N observations with uncertainty k , where {ḡ} k , k ∈ [1, . . . , N ], we constrain our prior model P ( θ) such that: This means that we want the average over the posterior distribution P ( θ) to match the observation data with some allowable disagreement based on { k }. Note that unlike in Bayesian frameworks, the mentioned average disagreement with the data is optional (i.e P 0 ( k ) = δ( k = 0)). However, in our settings, the Laplace distribution prior P 0 ( ) is used to account for this error with a given standard deviation σ 0 , thus: The posterior distribution P (θ) that satisfies N constraints is given by [56,[79][80][81]: where Z is a normalization constant and λ k values are iteratively updated using gradient descent to satisfy the constraint E[g k + k ] =ḡ k . The MaxEnt framework suggests a strong belief in our prior distribution of parameters in this setting, which reflects the use of approximately correct parameters. Consider health emergencies like COVID-19 global pandemic. At the initial phase of the outbreak, little to no information is available on the pathogen, its transmissibility and the general parameters that describe how the infection spreads. However, one can make an educated guess for the average values of these parameters and make reliable predictions by taking advantage of the ensemble of outcomes from MaxEnt, whose means agree with observed data. In this setting, the observations can be the number of confirmed disease, given some random noise to account for uncertainty. More information on the MaxEnt model implemented in this study can be found in the work of Barrett et al. [72].

B. Epidemic Model
Epidemic spreading can be represented as a reactiondiffusion process where the reaction term refers to the contagion events triggered by the interaction between infected and susceptible hosts whereas the diffusion phase corresponds to the spatial dissemination of the population across the system under study. In this sense, metapopulations, originally introduced in the field of ecology, represent a convenient framework, balancing complexity with analytical tractability, to account for the impact of mobility on epidemic spreading [82][83][84]. Metapopulations are comprised of spatial patches (nodes) where local populations interact in a mean-field manner, connected via flows (edges) corresponding to movement of individuals between patches. The spatial resolution of the spatial patch may vary (neighborhoods, zip-codes, districts, cities etc.) depending upon the granularity of the input data, or the scale at which the dynamics are being modeled. In what is to follow, we assume that our metapopulation is composed of N P patches and that each patch i is populated by n i residents.
To model the disease spread, we consider a variant of the Susceptible-Exposed-Infected-Removed (SEIR) model to account for the existence of (A)symptomatic individuals. With the addition of compartment A, our model is denoted as the SEAIR model. The choice for this particular flavor of compartments was inspired by its relevance in modeling the evolution of the current COVID-19 pandemic [85,86]. The schematic of the model is detailed in Fig. 2. Susceptible individuals become exposed by having contacts with asymptomatic and infectious agents with probability of Π. Let β and β be infectivity rates for I-S and A-S contacts, respectively. Once exposed, susceptible agents turn into asymptomatic or infected at rate η. The fraction of infected (symptomatic) individuals is denoted with . Finally, they recover or die at escape rate µ and become resolved. Note that once resolved, the individuals have lifelong immunity and can no longer be infected.
Considering mobility, we follow the movementinteraction-return scheme introduced in [87] to reflect the impact of commuting mobility on epidemic spreading. At the movement stage, the individuals decide whether to move or not with a probability p, which is identified as the degree of mobility of the population. If they move, they choose their destination according to the flows encoded in the links of the metapopulation. Following the redistribution of the population, contagion and recovery processes take place at the interaction stage, modifying the epidemic state of the population accordingly. Finally, to reflect the recurrent nature of daily human movements, all the agents come back to their associated residential areas.
The spreading process is represented through a temporally discretized ODE that includes the spatial distribution of the population as well as their mobility patterns [88]. Here we aim at characterizing the evolu-tion of the fraction of agents in state m (where m ∈ {S, E, A, I, R}) associated with each node i, denoted in the following by ρ m i (t). The temporal evolution of these quantities are given by: Π i (t) denotes the probability that a susceptible agent associated with node i contracts the disease by making contacts with an asymptomatic or infected individual. Under our assumptions regarding human mobility, it can be expressed as: The first term in Eq. 10 accounts for the probability of contracting the disease within the residential node, while the second term contains the contractions from neighboring nodes. Therefore, note that p = 1 corresponds to a scenario where all the agents follow their usual commuting patterns whereas p = 0 represents a controlled scenario where mobility is fully suppressed and every agent remains in its associated node. In this work, we work in the uncontrolled scenario and fix p = 1 throughout the entire manuscript. In this case, the movements are dictated by the entries of the origin-destination (OD) matrix R, whose elements R ij denote the probability for one individual residing in patch i moving to j. Assuming that the number of trips recorded between both locations in a real dataset is given by T ij , these probabilities are easily computed as is the probability of getting the disease in node i at time t and p accounts for the degree of mobility of individuals. Under Figure 2. SEAIR compartmental scheme. Populations in each patch can be any of Susceptible , Exposed , Asymptomatic, Infected and Resolved. Susceptible (S) individuals can get exposed (E) to the disease through I-S and A-S contacts with infectivity rates β and β . Once exposed, they become asymptomatic (A) or infected (I) at rate η. They finally recover or die at rate µ and become resolved (R). accounts for the fraction of infected (symptomatic) individuals.
the well-mixed assumption, P i (t) is written as: where n m j→i is the number of infectious agents going from j to i belonging to the compartment m and a i denotes the area of node i. In turn, n ef f i encodes the effective population of patch i after population movements. In particular: Note that the product in Eq. 11 accounts for the probability for an individual not getting infected while staying in node i and the exponent represents the number of contacts made with the infectious individuals from compartments A and I. Function f accounts for the dependence of the number of contacts on the population density (x) of each node. Our choice for this function is: where ξ is a constant, accounting for how the number of contacts depend on the population density of one area. Throughout the manuscript, we fix ξ = 5 · 10 −3 square miles. Finally, z is a normalization function to ensure that the average number of contacts across the whole population is k . Therefore: where N T OT is the total number of individuals across the metapopulation, i.e., N T OT = N P j=1 n j and s i denotes the area for node i.

III. RESULTS
In what it is to follow, with an initial guess on the epidemiological parameters and a set of observations, we apply our method to address two fundamental problems in epidemiology modeling: 1. Early assessment of the potential spread, 2. Identifying the origin of the outbreak. For observations, we consider weekly averages for fraction of the population in compartments I and R. We choose these two compartments, given that these are the most likely for which somewhat reliable estimates can be made from real-world data. Nevertheless, it is well documented [89] that such estimates are noisy and their fidelity varies from region to region and therefore to account for this, To account for the some degree of uncertainty about the data, we add multiplicative noise with a mean 1 and standard deviation 0.05 to the observations obtained from the ground truth trajectory. The sampling process tries to explore the trajectory space by adjusting the epidemiological parameters such as β, β , , η and µ from normal or truncated normal distributions,while varying the infection seed across different spatial patches, as well as accounting for a small variance in the mobility flows. Finally, Maxent re-weights the ensemble trajectories, maximizing entropy subject to the observations and determining the most probable state of the network. We consider a Laplace distribution prior (Eq. 2) with standard deviation of 1 to allow some disagreement between the MaxEnt fit and the observations. The Max-Ent implementation is done using Adam optimizer [90] with starting learning rate of 10 −2 and reduced learning rate on plateau callback (factor of 0.9, patience of 10 and minimum learning rate of 10 −4 ) for 1000 epochs. To assess the model's performance we compare the predictions against a ground truth trajectory derived from known pre-selected parameters. Knowledge of the ground-truth enables a proof-of-concept analysis to assess model performance under different scenarios. The ones we consider are density of the network, temporal window of observations, the number of observations and variations in mobility flow of observations with respect to the infection seeded origin. As performance metrics we consider: • Forward dynamics: To compare the predicted trajectory against the known ground truth trajectory we measure the KL-divergence, defined as .
Here T is the total time in the epidemic trajectory and m is the label for the compartments. The term ρ m i (t) is the model's prediction for the probability of an individual associated with patch i to belong to a compartment m at time t and ρ m i (t) is the corresponding value for the ground truth trajectory.
• Backward dynamics: The accuracy of the model in making the correct prediction with respect to the ground-truth source of infection (P 0 ). This can be treated as a binary multi-class classification problem, where the correct prediction of the true origin node is regarded as the true positive (TP) class and every other prediction falls into the false positive (FP) class. Given this, the accuracy (α) is defined as The posterior probabilities P 0 for nodes are obtained by summing over the MaxEnt posterior weights for each node seeded as the infection source-compartment E-in the sampled trajectories ensemble at t = 0, and the largest value among the set corresponds to P 0 probability. To assess performance, we use the top-k posterior probabilities P 0 , and the frequency of true positive predictions as our metric. For instance, for k = 5, the model's prediction for P 0 is classified as a true positive if the infection-source is among the top five values of P 0 probabilities and a false positive otherwise.
We employ our method on two systems: a synthetic metapopulation network, and the mobility network of New York state at the resolution of counties.

A. Synthetic Contact Networks
The 10-node metapopulation (N P = 10) is represented as a directed graph in Fig. 3a, where each node (patch) in the network represents a town or city in the metapopulation and the directed edges account for mobility flows between them. The nodes are connected at random with a connection probability τ = 0.4, such that on average each node is connected to four other patches (considering both in-and out-flows). The area of each node, the population, and entries of the mobility matrix are sampled from normal distributions with parameters listed in Table I .
The infection is initially seeded in patch 1 (node with the yellow edge in Fig. 3a) Table  S1 and Fig. S1 in supporting material. For all 8,192 sampled trajectories, we assume a uniform probability of infection, and randomly choose a patch, and an individual in that patch as the infection seed (see Fig. S2 in supporting material). As observations we consider a total of 50 data points (weekly-averages) from the I and R compartments within an observation window of (50, 140) days. The highlighted panels and blue circles in Fig. 3b mark the five randomly chosen patches and the observations, respectively.
We use the MaxEnt framework, to re-weight the ensemble of trajectories to agree best with the observed data-points and obtain the P 0 probability by summing all the weights for each exposed node in the sampled trajectories at t = 0. The re-weighted average over the sampled trajectories are shown as solid curves in Fig. 3b, and the shaded area marks the ±33% and ±67% quantiles. The calculated D traj KL of 8×10 −3 indicates close agreement between model predictions and the ground truth trajectory. In Fig. 3a we also show nodes colored by their value of P 0 probability, indicating that the algorithm predicts node 1 (the true-origin of infection) as the most probable

Effect of network density
Next we check the accuracy of the model as a function of the density of connections between nodes. We tune the connection probability in the range 0.25 ≤ τ ≤ 1 to sample the spectrum between a sparse and fully-connected network. We redo our simulations over 8,000 different networks in this range, and for each trajectory choose a random node from which to seed the infection. All other relevant parameters are kept the same. In Fig.4a we plot D traj KL as a function of τ , where the solid lines indicate the mode over 200 samples for a given τ , and the shaded areas mark the 30% confidence interval. The region marked in green corresponds to the True positives (TP) where the algorithm correctly identifies the true infection-seed as the most probable source, whereas the region marked in blue corresponds to False Positives (FP) when the true source was not identified as the most probable. Here we use a k = 1 acceptance criteria, a rather stringent condition, as even when the true source is identified as the second most probable, it is still marked FP. The low values of D traj KL indicates that irrespective of the correct identification of the infection-seed, the predicted and ground truth trajectories match well, independent of network density. Note that this is true for the chosen observations obtained in the (50, 140) day temporal window and will be further discussed later.
Additionally, we find high values of P 0 for TP, that is (mostly) independent of the connection probability τ , while for FP, we find low values of P 0 that get progressively worse with increasing τ (Fig. 4b). The model's calibration is assessed in the reliability diagram shown in Fig. 4c, where we plot the accuracy α as a function of P 0 . The case of a perfectly calibrated model, where α changes linearly with certainty is shown as the orange dashed line. The figure indicates that the model is more accurate than it believes, in a conservative manner. Finally, in Fig. 4d we plot α as a function of τ finding that the model's performance degrades in high-density networks, which is to be expected given that dense networks have more complexity in their mobility flows. Nevertheless, at worst, the model shows ≈ 60% accuracy in a fully-connected graph. Indeed, for a wide-range of connection probabilities (corresponding to realistic settings) we find an accuracy in the range of 80 − 90%.

Effect of temporal window of observations
Next we evaluate the model's performance as a function of the temporal window in which observations are made. Current understanding of epidemic dynamics, suggests that contact-tracing is effective only in the initial stages of the outbreak, and any information on the infection source is lost at later times. Indeed, in [51] an approximation to this temporal horizon, t hor , was derived for the SIR model. Adapting the formulation to the SEAIR model, leads to an expression of the form: where λ max corresponds to the leading eigenvalue of the linearized system of ODEs governing the evolution of the dynamics and c max a constant needed to fix the infectious seeds at the beginning of the outbreak (see Appendix A for a complete derivation). We consider a sparse (τ = 0.4) and dense (τ = 1) network and check for the presence of such a temporal horizon, by shifting the 5-week observation period within the range T = 250, collecting 50 data points (5 points from each of compartments I and R for 5 random nodes). As a robustness check, we exclude the true-infection source from our observed samples. In Figs. 5a,b, we plot D traj KL and P 0 as a function of the mid-point of observations for each 5-week window (200 sample runs in each bin), where curves indicate the mode and shapes refer to dense (circles) and sparse (inverted triangles) networks. Curves are split into TP (green) and FP (blue). In the figure, we show the k = 3 acceptance criteria, and in Fig. S3 in the SI we show the case for a k = 1 acceptance criteria. In Fig. 5c, we plot the accuracy α as a function of the mid-point of observations. As expected, the figure indicates high accuracy at the early stages of the outbreak (marked Region A), and decreases as the epidemic progresses. Considering the set of parameters (β, β , µ, k , , η) = (0.05, 0.025, 1 7 , 10, 0.6, 1 1.2 ) and a seed composed of a single exposed individual at the beginning of the outbreak, we obtain c max = 0.372 and t hor = 90.9 days marked as a red vertical dashed line.
Surprisingly, as one crosses t hor a non-monotonic trend is observed and a new peak in the accuracy is observed at later times (t ≈ 150) in both sparse and dense networks, marked as Region B. To the best of our knowledge, this peak in accuracy at advanced stages of the epidemic evolution, where information can be recovered on the infection source, has not been reported before. Indeed, this region also corresponds to the lowest values of D traj KL indicating the closest match to the ground truth trajectory, and thus an optimal window in which to simultaneously infer the most accurate information in forwardand backward-dynamics (panels a and b in Fig. 5, respectively). A possible explanation for this phenomenon is that it corresponds to region with the highest gradients in epidemic curves (Fig. 3b), whereas the low-gradients of the trajectories at other values of t provides the model with insufficient information to perform a reliable inference.

B. Mobility Network of New York State
In this section, we apply our formalism to characterize the spread of infectious diseases across a real metapopulation, the network of commuters across New York state at the spatial resolution of counties, of which there are 62. The mobility flows between counties, as well as their respective areas and populations are obtained from the United States LODES commuting database [91]. Our fo-cus here is on assessing the performance of the method in detecting the spatial location of the infection-seed given more complex and realistic mobility patterns. We first generate the ground truth trajectory according to the following epidemic parameters: β = 0.029, β = 0.052, = 0.586, η = 1 2.493 , k = 10 and µ = 1 1.49 , and then collect observations corresponding to weekly averages of populations in compartments I and R. Observations are collected from specific counties and are drawn from the (60, 140) day temporal window.

Effect of the number of observations
Given that the number of observations is directly linked to epidemic-surveillance efforts, we first check the performance of our model as a function of the number of counties from which data is collected. Specifically, we test the accuracy of identifying the correct spatial origin of the infection-seed as we increase the number of counties observed. We choose three counties with different population densities in which to seed the infection: Hamilton (2.74 per square mile), Monroe (1.14 × 10 3 per square mile), and Kings (3.72 × 10 4 per square mile). We collect 10 samples from each county (randomly chosen) and vary the number of counties observed from 1, 5 and 25. We do not necessarily exclude the seed counties from our randomly chosen observations. In Fig. 6 we plot the counties colored according to their values of the posterior probability P 0 . The top row represents observations from a single county, the middle row from 5 counties and the bottom row 25 counties. The true-origin is marked as a downward yellow triangle, and the observations by blue circles. The three columns correspond to the different infection seeds. In each case, we show D traj KL , P 0 for the true-origin and how the model ranks it as a likely source of infection, as well as the models prediction for the top-ranked county in terms of the posterior probability P 0 . For all three infection-sources, observations from a single county yields poor results for D traj KL , and the model ranks the true-origin quite low as a probable source (16 for Hamilton, 58 for Monroe and 6 for Kings). Sampling from 5 counties results in a considerable increase in performance for the first two counties (6 for Hamilton, 5 for Monroe) while for Kings the model correctly identifies it at the most likely origin. We also note about an order of magnitude decrease in D traj KL for all three counties indicating good agreement with the forward dynamics. Finally, sampling from 25 counties results in the best performance where in addition to Kings, the model correctly identifies Hamilton as a true infection source, while for the case of Monroe the model ranks it as the third most likely origin. We see further improvements in matching the forward dynamics with further decreases in D traj KL (about two orders of magnitude as compared to observing as single county). As an illustrative example we show the full trajectory-set for Monroe county trueorigin with 250 observations in Fig. S4 in the supporting material.
We note the difference in accuracy of the model when assessing Hamilton and Monroe counties. Hamilton despite being a much more sparsely populated area than Monroe, was correctly identified as the true source, whereas Monroe was ranked third. The reason for this discrepancy is that Hamilton was also included in the sample of 25 counties as an input to the model, whereas Monroe was excluded from its observation set. The likelihood of the model to correctly guess the true source increases greatly when the source itself is included as an observation, a feature also seen in our synthetic metapopulation networks. On the other hand, the ability of the model to identify Monroe as the third most likely source is notable given that no information on Monroe was available to the model. Indeed, Erie county, adjacent to Monroe was marked as the most likely source of infection. Kings county is an outlier compared to the other two, in that already with a single observed county the model marked it amongst the upper 10% of posterior probabilities P 0 . Certainly there are more people in Kings (it has the highest population density by far among the three counties) but also it is coterminous with Brooklyn, and a popular destination for residents of other counties. Therefore there is a higher likelihood of mixing of populations from different parts of the state.

Dependence of accuracy on effective proximity
Given the latter observation, we next check whether the strength of mobility flows (both in and out) between counties plays a role in the model's accuracy. Two locations are strongly connected if there are many people traveling between them, and therefore we define an effective proximity matrix φ with elements given by where R is the OD matrix, and we take into account both in-and out-flows. In this setting counties that are strongly connected by mobility flows have low values of φ ij and are therefore more proximal in mobility space. We next seed the infection in location i and sample from a single county j (including the source), ranked in increasing order according to their value of φ ij with the rank of i corresponding to 1. We then generate 8,000 trajectories with a randomly sampled true origin, and plot the accuracy α as a function of effective proximity to the origin county in Fig. 7. Each point in the figure corresponds to the average over 180 realizations. We clearly see a monotonically decreasing trend; sampling from counties further away from the origin-county leads to a sharp decline in accuracy saturating at around the 7th furthest county. The trend is expected given that locations further away from the source in mobility space, experience delays in arrivals of infectious cases. This lag results in the observation of degenerate epidemic trajectories, thus making the inference less accurate.

IV. CONCLUSIONS
This paper has provided, to the best of the authors' knowledge, the first systematic study of both backward and forward dynamics inference on contagion process in contact networks. We have applied the statistical mechanics principle of maximum entropy to the conventional SEAIR epidemiology models to re-weight disease trajectories and obtain the best fit to a set of observations, while making reliable predictions on the true source of the outbreak. The novelty of this work lies within working well under the sparse-data regime and highly uncertain initial parameter priors, making our method highly suitable for studying disease dynamics. Finally, the method proposed here is independent of the underlying compartmental model. While we presented our work in the context of epidemics, the approach is easily generalizable to similar classes of spreading processes. For example, a single computer virus can infect millions of other computers through the Internet. An isolated failure in an electrical power grid network can result a citywide blackout. Misinformation or a baleful rumor can spread through social networks and cause terror and inconvenience. In all these scenarios, the contagion process [92,93] could identify the source of the risk on the network and quarantine its harmful effects [94][95][96][97].  e Innovación (projects FIS2017-87519-P and PID2020-113582GB-I00), from the Departamento de Industria e  Innovación del Gobierno de Aragón y Fondo Social Europeo (FENOL group E-19), and from Fundación Ibercaja and Universidad de Zaragoza (grant No. 224220).

CODE AVAILABILITY STATEMENT
The MaxEnt implementation is publicly available on Github (https://github.com/ur-whitelab/maxent) as a python package called maxent and it can be applied to any simulator. The SEAIR model used in this work is publicly available as python package called py0 on Github (https://github.com/ur-whitelab/py0).

Appendix A: Derivation for Time Horizon
For the sake of comparison, we now compute the t hor value for our compartmental model, according to the rationale followed in [51]. Mathematically, the authors define the time horizon as the time at which the number of infectious individuals, whose evolution is assumed to follow the early stage dynamics of the outbreak, scales to the entire population. To simplify the analysis, we make a mean-field approach and neglect the contact heterogeneities existing across the different patches of the metapopulation. At this limit, the dynamics is completely characterized by the fraction of the population in each compartment m at each time step t, denoted in the following by ρ m (t). Specifically: with At the early stages of the outbreak, the number of affected individuals is negligible compared with the size of the population. Therefore, we can assume that ρ m 1, with m = {E, A, I, R}. This turns the latter expression into: where we have considered that β, β 1 as well. Introducing the latter expression into Eq. (A2) and neglecting O(ρ 2 ) terms lead to For the sake of simplicity, it is convenient at this point to rewrite the equations in terms of the occupation of each compartment m, denoted by m(t). In particular, restricting ourselves to the infectious or potentially infectious individuals, we have thaṫ where we have definedṁ = m(t + 1) − m(t). Consequently, the evolution of the system is given by: being λ i and v i each of the eigenvalues and their associated eigenvectors respectively and c i the integration constants needed to fix the initial conditions to run the dynamics. Albeit the latter expression constitutes the exact evolution of the system, the long-term dynamics is completely determined by the largest eigenvalue λ max and its associated eigenvector v max . Therefore, we can assume that: with λ max = (η − µ) 2 + 4 k η ((1 − )β + β) − (η + µ) 2 .
(A14) Without loss of generality, we set the component of the eigenvector associated with the symptomatic infectious compartment to v I max = 1. Finally, equating the number of symptomatic infectious individuals to the population size, we derive the time horizon t hor which reads as: I. SUPPORTING TABLES     initially seeded in Monroe county by introducing a single exposed agent at time zero. Yellow highlighted panels show the observed patches and the red markers represent the noisy observations. Note that y axis is in log scale. Solid lines are the mean and shaded area shows ±33% and ±67% quantiles in the MaxEnt reweighted ensemble of trajectories.