Epidemic spreading and digital contact tracing: Effects of heterogeneous mixing and quarantine failures

Contact tracing via digital tracking applications installed on mobile phones is an important tool for controlling epidemic spreading. Its effectiveness can be quantified by modifying the standard methodology for analyzing percolation and connectivity of contact networks. We apply this framework to networks with varying degree distributions, numbers of application users, and probabilities of quarantine failures. Further, we study structured populations with homophily and heterophily and the possibility of degree-targeted application distribution. Our results are based on a combination of explicit simulations and mean-field analysis. They indicate that there can be major differences between the epidemic size and the epidemic probability, which are equivalent in ordinary susceptible-infectious-recovered (SIR) processes. Further, degree heterogeneity is seen to be especially important for the epidemic threshold but not as much for the epidemic size. The probability that tracing leads to quarantines is not as important as the application adoption rate. Finally, both strong homophily and especially heterophily with regard to application adoption can be detrimental. Overall, epidemic dynamics are very sensitive to all of the parameter values we tested, which makes estimating the effect of digital contact tracing an inherently multidimensional problem.

Until effective vaccines are widely deployed in a pandemic era, carefully timed non-pharmaceutical interventions [1] such as wearing face masks [2], school closures, travel restrictions and contact tracing [3-7] are the best tools we have for curbing the pandemic. Contact tracing is an attempt to discover and isolate asymptomatic or pre-symptomatic (exposed) individuals. In the absence of herd immunity, contact tracing is a potent low-cost intervention method since it puts people into quarantine where and when the disease spreads. Therefore, it can have a significant role in containing a pandemic by relaxing social-distancing interventions [8], providing an acceptable trade-off between public health and economic objectives [9,10], developing sustainable exit strategies [11,12], identifying future outbreaks [13], and reaching the 'source' of infection [14].
Thanks to the emergence of low-cost wearable health devices [15-22] and mobile software applications, digital contact tracing can now be deployed with higher precision without the problems of manual contact tracing, such as the tracing being slow and labor-intensive or people's hesitation to give identifying data about their contacts due to blame, fear, confusion, or politics. On the other hand, smartphones and wearable devices also offer continuous access to real-time physiological data, which can be used to tune other non-pharmaceutical or pharmaceutical strategies. Modern apps enable us to monitor COVID-19 symptoms [23-25], identify its hotspots [26], track mosquito-borne diseases such as Malaria, Zika and Dengue [27,28], and detect microscopic pathogens.
A potentially important factor in the effectiveness of contact tracing apps is how the app-using and non-app-using populations are mixed. Several studies have shown that people with similar features are more likely to be in contact with each other than with people with different features. This phenomenon is known as homophily [65-67]. It has been reported in app adoption directly [40], and indirectly through correlations between app adoption and other features exhibiting homophily, such as jobs, age, income and nationality [68-70]. Therefore, the fraction of the population that adopts the app is not the only important factor for reducing the peak and total size of the epidemic; the amount of homophily in app adoption can potentially have a significant role as well.
Since the World Health Organization declared the COVID-19 outbreak a Public Health Emergency of International Concern, network scientists have developed different approaches towards analyzing epidemic tracing and mitigation with apps. Using the toolbox of network science, different groups have investigated the effectiveness of contact tracing based on the topology and directionality of contact networks [14,44,71-77]. Recently, a mathematical framework aimed at understanding how homophily in health behavior shapes the dynamics of epidemics has been introduced by Burgio et al. [78]. This study expanded the model of Bianconi et al. [71] and computed the reproduction number and attack rate in a homophilic population using mean-field equations.
Our study investigates the effect of varying app coverage on the epidemic's threshold, probability and expected size in homogeneous and heterogeneous contact networks with and without homophily or heterophily in app adoption. Further, we explore the effect of distributing the apps randomly and preferentially to high-degree nodes [71] in these scenarios. Our main focus is on the epidemic threshold and the final size of the epidemics. Therefore, we assume the dynamics of the epidemic to be governed by the simple SIR model [50]. This model can be easily mapped to a static bond-percolation problem [79,80] so that the epidemiological properties can be measured based on the topological structure of the underlying network [50,73,81-84]. Note that more complex disease transmission models, such as SEIR models in which there is an infected-but-not-contagious period E, are also covered by this formalism [79,85]. The difference between the spreading framework with the app and the normal one is that the infection cannot spread further if it passes a link between two app-users (app-adopters). That is, the infection process model needs to include the memory of the type of node it is coming from. We therefore extend the percolation framework such that we can add memory [86,87] to it in order to keep track of the infection path. This leads to the observation that the epidemic size is not the same as the epidemic probability, as it would be in this model without the app-users [88].
Our results are largely based on mean-field-type calculations of the percolation problem, which are confirmed by explicit simulations of the SIR epidemic process and measurements of component sizes in finite networks. Our findings show that: 1) the number of app-users has a direct effect on the epidemic size and epidemic probability, and the difference between these two observables is larger in the high-degree targeting strategy; 2) epidemics can be controlled to a much better degree in the high-degree targeting strategy; 3) even though degree heterogeneity can strongly affect or even eliminate the epidemic threshold, the high-degree targeting strategy can compensate for this effect and increase the threshold significantly; 4) increasing heterophily from random mixing always increases the outbreak size and lowers the epidemic threshold; 5) increasing homophily does the opposite until an optimum, which is below the maximum homophily case, is reached; and finally 6) the probability of contact tracing succeeding in preventing further infections is not as crucial as the fraction of app-users, but can still have significant effects on the epidemic size and epidemic threshold. The only exception is when the apps are distributed to heterogeneous networks with the high-degree targeting strategy.

A. Disease model and connection to percolation
We employ an SIR disease model on networks with additional dynamics given by the disease interactions in the presence of the disease tracking application. In the model, without the tracking application, an infected (I) node will eventually infect a neighboring susceptible (S) node with a transmission probability $p$ independently of other infections. The simulations are performed with a model where the infected nodes try to infect their susceptible neighbors with independent Poisson processes with rate $\beta$ and go to the removed state (R) after a fixed time $\tau$. The fixed recovery time ensures that every infected individual, regardless of app adoption, can infect a susceptible neighbor independently with a transmission probability $p = 1 - e^{-\beta\tau}$ [79,89]. These assumptions allow us to study the SIR processes using component size distributions of undirected networks where parts of the links are randomly removed [79,85,88-90]: an epidemic starting from a single node can reach any other node exactly when there is a path of such transmitting links connecting them, i.e., they are in the same component in a network where each potential contact link is kept with probability $p$ and removed with probability $1 - p$. Thus, the epidemic threshold, epidemic probability and epidemic size can be read from percolation simulations [79,85,88-90] (see Section I B). Note that without a fixed recovery time, the presence of spreading paths through neighboring links would not be independent, and this would not be a bond percolation problem in an undirected network where edges are removed independently. However, the epidemic threshold, final epidemic size, and the expected outbreak size below the epidemic threshold would still be correctly predicted by this methodology [88,89].
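The mapping from the SIR process to bond percolation can be illustrated with a minimal sketch. The function names, the edge-list representation, and the union-find bookkeeping below are illustrative assumptions and not part of the original study; the sketch only shows how $p$ follows from $\beta$ and $\tau$ and how components are read off after keeping each link with probability $p$.

```python
import math
import random


def transmission_probability(beta, tau):
    """p = 1 - exp(-beta * tau): Poisson infection attempts at rate beta
    during a fixed infectious period tau."""
    return 1.0 - math.exp(-beta * tau)


def percolation_components(n, edges, p, rng):
    """Keep each undirected edge with probability p and return the
    connected components of the remaining network as lists of nodes."""
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        if rng.random() < p:  # the link transmits
            parent[find(u)] = find(v)

    components = {}
    for node in range(n):
        components.setdefault(find(node), []).append(node)
    return list(components.values())
```

In this picture, an outbreak seeded at a node can reach exactly the nodes that share its component.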
We model the effect of applications on the disease spreading as follows: if an app-user infects another app-user, that second node will get infected but will quarantine themselves with probability $p_{\mathrm{app}}$. The quarantined user will have no further connections that would spread the infection they received from the other app-user. A substantial deviation from a realistic spreading case in our model is that the quarantine does not prevent the disease spreading to the quarantined node through a third node. That is, we only model the primary infection path from the other app-user causing the alarm but do not stop the possible concurrent secondary infection paths from a third node. Strictly speaking, this simplification in the modeling returns a lower bound on the effectiveness of the app-based contact tracing, but given that our contact network models are sparse random graphs (see Section I C) that do not contain local loops, the difference can only be observed if a large enough fraction of the population is infected at the same time. Critically, this does not affect the epidemic threshold but could have implications for parameter regions where the epidemic size is large, depending on the quarantine durations.
The SIR spreading process can be mapped to a slightly more complicated percolation problem in the presence of apps [44,71]. To model app-user quarantines, one needs to delete the links between two app-users with the probability of successful quarantine due to the app, $p_{\mathrm{app}}$. This ensures that we ignore the infection paths through two app-users when one of them is successfully quarantined. However, removing these links also removes the second app-user from the component, even though they are infected. To correct this, we need first to find the network components and then extend them by including all app-users outside of the component connected to another app-user (and considering the probability $p$ that the link is kept). See Fig. 1 for an illustration of this process, which leads us to two definitions of components: normal and extended.

B. Components, epidemic size and epidemic probability
In the SIR model without apps, the component size distribution can be used to approximately describe the late stages of the epidemics. Given an initially infected node, the size of the component it belongs to determines the size of the outbreak. In an infinitely large population, we say that an outbreak is an epidemic if it spans a nonzero fraction of the population. The relationship between percolation and the final epidemic size is straightforward if the population is large enough that it can be approximated with an infinite undirected transmission network [79,88]. In this case, the percolation threshold gives the epidemic threshold, and below it an outbreak always spans only a zero fraction of the population because all the components are of finite size. Above the percolation threshold, there is a single giant component that spans a fraction $s_{\max} = S_{\max}/N$ of the nodes. This is equivalent to both the size of the epidemic, given that there is one, and the probability that there is an epidemic starting from a single initially infected node [79,88]; $s_{\max}$ is the fraction of nodes that can be reached from the giant component (epidemic size) and the probability that a randomly chosen node belongs to the giant component (probability of the epidemic). The expected epidemic size, as a fraction of eventually infected nodes, is in this case given by $s_{\max}^2$. When we introduce apps to the spreading process, the equivalence of the epidemic size and epidemic probability breaks down. Both the normal component and the extended component become important. The normal component size $s_{\max}$ still gives the probability that there is an epidemic, as is the case without the apps. However, the epidemic size, given that there is one, is now given by the extended component size $s'_{\max}$. The expected epidemic size is then given by $s_{\max} s'_{\max}$.
Similar relationships hold for finite-size systems. For example, the expected size of the epidemic from a single source becomes

$$\langle S' \rangle = \sum_c \frac{S_c}{N}\, S'_c, \tag{1}$$

where $S_c$ is the normal size and $S'_c$ is the extended size of the component $c$ and $N$ is the total number of nodes. In this formula, $S_c/N$ gives the probability that the initially infected node is in the component $c$ and $S'_c$ gives the size of the epidemic if a node in component $c$ is chosen.
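As a small worked illustration of the finite-size relation, the sketch below weights the extended size of each component by the probability that a uniformly chosen seed node falls in that component. The function name and the list-based inputs are assumptions made for this example; it relies on the normal components partitioning the nodes, so that their sizes sum to $N$.

```python
def expected_epidemic_size(normal_sizes, extended_sizes):
    """Expected outbreak size from a uniformly random seed node.

    normal_sizes[i] and extended_sizes[i] are the normal and extended
    sizes of component i. The normal sizes are assumed to sum to N.
    """
    n_total = sum(normal_sizes)
    return sum(s * s_ext for s, s_ext in zip(normal_sizes, extended_sizes)) / n_total
```

For two components with normal sizes 3 and 1 and extended sizes 4 and 1, the expected size is $(3 \cdot 4 + 1 \cdot 1)/4 = 3.25$ nodes.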

C. Network models
We aim to study how the network topology and the amount and distribution of app-users over the network affect the epidemics. We study networks with degree distribution $P(k)$ and average degree $\langle k \rangle$ such that each node is an app-user with probability $\pi_a$ and not an app-user with probability $1 - \pi_a$. We distribute the app-users with one of two strategies: 1) uniformly at random or 2) by distributing the apps in the order of the nodes' degrees such that the high-degree nodes get the apps first.
We use three different models to generate the network topology. We use i) Poisson (ER) random graphs [91] to model homogeneous contact patterns and ii) scale-free networks generated with the Chung-Lu model (CL) [92,93] to model heterogeneous networks. In the generation of CL networks, the expected degree of each node is drawn from a continuous power-law distribution $P(k) \propto k^{-3}$ such that the minimum expected degree is set to a value that gives us the expected average degree $\langle k \rangle$ of our choice. Given a sequence of expected degrees $W = \{w_1, w_2, \ldots, w_n\}$, the Miller algorithm [94] assigns a link between node $u$ and node $v$ with probability $p_{uv} \propto w_u w_v$. This algorithm returns a network without multiple links with almost the same power-law degree distribution.
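The Chung-Lu construction can be sketched as follows. This is a naive quadratic-time illustration and not the efficient algorithm of Ref. [94]; the function names, the inverse-transform sampling, and the normalization $p_{uv} = \min(1, w_u w_v / \sum_i w_i)$ are assumptions chosen for clarity. For a continuous power law with exponent 3, the mean is twice the minimum, so the minimum expected degree is set to $\langle k \rangle / 2$.

```python
import random


def powerlaw_expected_degrees(n, mean_degree, rng, exponent=3.0):
    """Draw n expected degrees from a continuous power law P(w) ~ w^-exponent
    whose lower cutoff w_min is chosen to give the requested mean."""
    w_min = mean_degree * (exponent - 2.0) / (exponent - 1.0)
    # Inverse-transform sampling; 1 - random() lies in (0, 1].
    return [w_min * (1.0 - rng.random()) ** (-1.0 / (exponent - 1.0))
            for _ in range(n)]


def chung_lu_edges(weights, rng):
    """Link each pair (u, v) independently with probability
    min(1, w_u * w_v / sum(w)). Simple O(n^2) loop for illustration."""
    total = sum(weights)
    n = len(weights)
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < min(1.0, weights[u] * weights[v] / total):
                edges.append((u, v))
    return edges
```

Because every pair is visited once and self-pairs are skipped, the resulting edge list contains no multi-links or self-links.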
We model homophily (and heterophily) with regard to app usage with iii) a modular network model (MN) introduced in Refs. [95,96] with two groups of nodes: app-users and non-app-users. This model starts with a degree sequence produced either by the ER or CL models and connects the nodes depending on which groups they belong to with probabilities $\pi_{aa}$ (app-user to app-user), $\pi_{an}$ (app-user to non-app-user), $\pi_{na}$ (non-app-user to app-user), and $\pi_{nn}$ (non-app-user to non-app-user). We only need to fix one of these probabilities, $\pi_{aa}$, and the other types of links are formed with probabilities

$$\pi_{an} = 1 - \pi_{aa}, \qquad \pi_{na} = \frac{\pi_a}{1 - \pi_a}\,(1 - \pi_{aa}), \qquad \pi_{nn} = 1 - \pi_{na},$$

where $\pi_a$ is the probability that a person is an app-user and the second relation comes from the balance between the number of links from app-users to non-app-users and from non-app-users to app-users, that is, $\pi_a N \pi_{an} \langle k \rangle = (1 - \pi_a) N \pi_{na} \langle k \rangle$. The numerical simulations of the MN work by randomly choosing a group for half edges with the given probabilities and matching them to each other uniformly at random. This can lead to self-links and multi-links, which are discarded after the randomization procedure.
While there is no correlation between the app adoption status of neighboring nodes in the homogeneous (ER) or heterogeneous (CL) networks above, in the third model (MN) the existence of homophily or heterophily in the network structure is determined by comparing $\pi_{aa}$ to its value for the neutral case with no homophily or heterophily. In the absence of homophily or heterophily, $\pi_{aa} = \eta_a$, where $\eta_a$ is the ratio of the number of links that emerge from app-users to the total degree; this is because if the nodes were connected purely at random, the probability that a link from an app-user connects it to another app-user equals the ratio of the number of stubs that app-users have to the total number of stubs, i.e., $\eta_a$. In the case of a random selection of app-users, $\eta_a = \pi_a$, since both app-users and non-app-users have on average the same number of stubs and the fraction of stubs that app-users have equals the fraction of app-users in the system, i.e., $\pi_a$. In a high-degree targeting strategy, the number of stubs that app-users have on average is larger than that of non-app-users. In that case, $\eta_a$ can be calculated from the degree distribution (see Sec. II A). When $\pi_{aa} > \eta_a$, app-users are more likely to be connected to each other than in a network in which they are placed uniformly at random (homophilic network). On the other hand, when $\pi_{aa} < \eta_a$, nodes are more likely to be connected to the nodes of the other type (heterophilic network). In the heterophilic regime, for some pairs of ($\pi_{aa}$, $\pi_a$), networks are not realizable because of the constraints explained in Sec. II A. The white region in Fig. 6 shows the part of the $\pi_{aa}$-$\pi_a$ plane in which networks cannot be created.
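The relation between the mixing probabilities, and the reason some parameter pairs cannot be realized, can be checked with a short sketch. It assumes uniformly random app distribution (so that $\eta_a = \pi_a$ and the link balance $\pi_a \pi_{an} = (1 - \pi_a)\pi_{na}$ holds); the function name and the use of an exception for unrealizable pairs are choices made for this illustration.

```python
def mixing_probabilities(pi_a, pi_aa):
    """Return (pi_an, pi_na, pi_nn) for a given app-user fraction pi_a
    (0 < pi_a < 1) and app-to-app link probability pi_aa.

    Assumes app-users and non-app-users have the same average degree,
    so that link balance reads pi_a * pi_an = (1 - pi_a) * pi_na.
    """
    pi_an = 1.0 - pi_aa
    pi_na = pi_a * pi_an / (1.0 - pi_a)
    if not 0.0 <= pi_na <= 1.0:
        # Non-app-users cannot absorb this many links from app-users.
        raise ValueError("network not realizable for this (pi_aa, pi_a) pair")
    return pi_an, pi_na, 1.0 - pi_na
```

For example, $\pi_a = 0.8$ with $\pi_{aa} = 0.1$ would require $\pi_{na} = 3.6$, which is not a probability, so strongly heterophilic networks with many app-users cannot be built.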

II. ANALYTIC AND SIMULATION METHODS
The epidemics are studied here with various methods of approximation. We employ analytical computations based on mean-field-type approximations to efficiently analyze our models' wide parameter space and provide explicit formulas for our main observable quantities. Here an approximation based on branching processes [97] can be used to determine the critical point. Following Ref. [44], a more detailed calculation based on percolation arguments will give us the component sizes, which can be related to the final epidemic size and epidemic probability. Simulations of the network connectivity then complement these mean-field approximations. Finally, we explore the accuracy of the mean-field approximations via explicit simulations of the SIR model.

A. Giant component size from consistency equations
To study the behavior of the epidemic dynamics, we form consistency equations for the giant component size. In Ref. [44] the governing equations for the size of the epidemic and the transition point were obtained for the case of random networks in the absence of homophily. Here we derive the analytical results for the more general case of the spectrum of heterophilic to homophilic networks, a special case of which is the non-homophilic networks of Ref. [44]. We consider that app-users and non-app-users might be connected together with a pattern different from pure random chance using the modular network model (MN).
We aim to write the self-consistent equations for the probability $u_n$ that following a link to a non-app-user does not lead to the giant component and the probability $u_a$ that following a link to an app-user does not lead to the giant component. Using these probabilities, the relative size of the giant component $s$ and the relative size of the extended giant component $s'$ can be obtained, where $s$ is, in fact, the fraction of nodes infected through non-app-users, while $s'$ also includes individuals who caught the infection through an app-user before they could quarantine themselves (see Sec. II C 1).
We need to know the probability $u_n$ ($u_a$) that a randomly chosen link leading to a non-app-user (app-user) is not in the giant component. The probability that a node is not connected to the giant component via a particular neighboring non-app-user (app-user) is equal to the probability that this neighbor is not connected to the giant component via any of its other neighbors. A non-app-user is connected to another non-app-user with probability $\pi_{nn} = 1 - \pi_{na}$ and to an app-user with probability $\pi_{na}$. So, a link leading out from a non-app-user does not lead to the giant component if it leads to another non-app-user that is not in the giant component (which happens with probability $(1 - \pi_{na}) u_n$) or an app-user that is not in the giant component (which happens with probability $\pi_{na} u_a$). That is, the total probability for following a link out from a non-app-user not leading to the giant component is $u_{n\to} = (1 - \pi_{na}) u_n + \pi_{na} u_a$. Since the degree of neighboring nodes is distributed according to the excess degree distribution $q_k$, the probability that a non-app-user that is encountered by following a link to it is not connected to the giant component via any of its $k$ other neighbors is $\sum_k q_k u_{n\to}^k$. This probability is, by definition, $u_n$, leading to the self-consistent equation below for $u_n$:

$$u_n = g_1\big((1 - \pi_{na})\, u_n + \pi_{na}\, u_a\big), \tag{2}$$

where $g_1$ is the generating function for the excess degree distribution [50]. To find $u_a$, we can use the same treatment, except that we should consider how app-app connections depend on the probability of success in contact tracing [44]. If $p_{\mathrm{app}}$ is the probability that the apps work as expected, then $1 - p_{\mathrm{app}}$ is the probability that the app-user does not effectively quarantine after having been in contact with an infectious app-user. Therefore, $u_a$ can be expressed as the self-consistent equation below:

$$u_a = g_1\big((1 - \pi_{aa})\, u_n + \pi_{aa}\,[\,p_{\mathrm{app}} + (1 - p_{\mathrm{app}})\, u_a\,]\big). \tag{3}$$

Note that $\pi_{na}$ is determined by the free parameters $\pi_a$ and $\pi_{aa}$, as we already showed that $\pi_{na} = \frac{\pi_a}{1 - \pi_a}(1 - \pi_{aa})$. Given $u_n$ and $u_a$, the average probability that a node belongs to the giant component, or equivalently the fraction of the network occupied by the giant component, is now given by:

$$s = 1 - (1 - \pi_a)\, g_0\big((1 - \pi_{na})\, u_n + \pi_{na}\, u_a\big) - \pi_a\, g_0\big((1 - \pi_{aa})\, u_n + \pi_{aa}\,[\,p_{\mathrm{app}} + (1 - p_{\mathrm{app}})\, u_a\,]\big), \tag{4}$$

where $g_0$ is the generating function for the degree distribution. We can approximate $s'$ by writing:

$$s' \approx 1 - (1 - \pi_a)\, g_0\big((1 - \pi_{na})\, u_n + \pi_{na}\, u_a\big) - \pi_a\, g_0\big((1 - \pi_{aa})\, u_n + \pi_{aa}\, u_a\big), \tag{5}$$

where, as opposed to Eq. 4, the third term is not a function of $p_{\mathrm{app}}$. The reason is that Eq. 4 assumes that if the app works (which happens with probability $p_{\mathrm{app}}$) then the probability that a link connected to an app-user does not lead to the giant component is 1 (while if the app does not work it is $u_a$). However, whether the app works or not, the probability that an app-user does not get infected from another app-user is $u_a$. (When apps work, if the second app-user is infected, she quarantines herself and does not infect any other node.)
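A numerical solution of such consistency equations can be sketched by fixed-point iteration. The example below is an illustration under stated assumptions rather than the procedure used in the study: it takes a Poisson degree distribution, for which $g_0(x) = g_1(x) = e^{\langle k \rangle (x - 1)}$, uniformly random app distribution, transmission probability 1, and consistency equations of the form $u_n = g_1((1 - \pi_{na}) u_n + \pi_{na} u_a)$ and $u_a = g_1((1 - \pi_{aa}) u_n + \pi_{aa}[p_{\mathrm{app}} + (1 - p_{\mathrm{app}}) u_a])$. The function name and iteration count are hypothetical.

```python
import math


def giant_component_sizes(mean_degree, pi_a, pi_aa, p_app, iterations=2000):
    """Return (s, s_ext): normal and extended giant component fractions
    for a Poisson network with uniformly distributed apps (0 < pi_a < 1)."""

    def g(x):
        # For a Poisson degree distribution g0 and g1 coincide.
        return math.exp(mean_degree * (x - 1.0))

    pi_an = 1.0 - pi_aa
    pi_na = pi_a * pi_an / (1.0 - pi_a)

    # Iterating upward from 0 converges to the smallest fixed point,
    # which is the physically relevant solution (u = 1 is always a solution).
    u_n = u_a = 0.0
    for _ in range(iterations):
        x_n = (1.0 - pi_na) * u_n + pi_na * u_a
        x_a = pi_an * u_n + pi_aa * (p_app + (1.0 - p_app) * u_a)
        u_n, u_a = g(x_n), g(x_a)

    x_n = (1.0 - pi_na) * u_n + pi_na * u_a
    x_a = pi_an * u_n + pi_aa * (p_app + (1.0 - p_app) * u_a)
    s = 1.0 - (1.0 - pi_a) * g(x_n) - pi_a * g(x_a)
    # The extended size ignores quarantines when counting who gets infected.
    s_ext = 1.0 - (1.0 - pi_a) * g(x_n) - pi_a * g(pi_an * u_n + pi_aa * u_a)
    return s, s_ext
```

With $p_{\mathrm{app}} = 0$ the two sizes coincide and reduce to the ordinary giant component of a Poisson graph; with $p_{\mathrm{app}} > 0$ the extended size is never smaller than the normal one.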
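When the transmission probability $p$ is less than 1 (the equations above assume that links transmit with probability 1), Eqs. 2 and 3 change to:

$$u_n = 1 - p + p\, g_1\big((1 - \pi_{na})\, u_n + \pi_{na}\, u_a\big), \tag{6}$$

$$u_a = 1 - p + p\, g_1\big((1 - \pi_{aa})\, u_n + \pi_{aa}\,[\,p_{\mathrm{app}} + (1 - p_{\mathrm{app}})\, u_a\,]\big). \tag{7}$$

When the fraction $\pi_a$ of nodes selected to adopt the app are all the highest-degree nodes in the network, these nodes all have a degree higher than $k_a - 1$: they include some of the nodes with degree $k_a$, and the rest comprise all nodes with degree larger than $k_a$. Then for the fraction $\eta_a$ of the links protruding from the app-users (which are the top $\pi_a$ fraction of nodes) we can write:

$$\eta_a = \frac{1}{\langle k \rangle}\Big( r^* k_a\, p_{k_a} + \sum_{k = k_a + 1}^{\infty} k\, p_k \Big) \tag{8}$$

$$\phantom{\eta_a} = \frac{1}{\langle k \rangle} \sum_{k = k_{a,\mathrm{right}}}^{\infty} k\, p_k, \tag{9}$$

where $r^*$ is the fraction of degree-$k_a$ nodes that are app-users, and in Eq. 9 we absorbed $r^*$ into $p_k$ so that $p_{k_{a,\mathrm{right}}} = r^* p_{k_a}$ represents the fraction of nodes in the network that have degree $k_a$ and are app-users (so in Eq. 9, $k_{a,\mathrm{right}}$ takes the value $k_a$).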
Then for a network with homo/heterophily: and A special case of which are networks with neutral (nonexistent) homophily, where $\pi_{aa}$ is obtained to be equal to $\eta_a$ and accordingly $\pi_{na} = \eta_a$, therefore, and These results predict the behavior of the epidemic dynamics in the thermodynamic limit. Therefore, they describe the dynamics very well when the network size is large enough.
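The link fraction $\eta_a$ under high-degree targeting can be computed directly from a degree distribution, as in the following sketch. The function name and the dictionary representation of $p_k$ are assumptions for this example. Apps are handed out from the highest degree downwards until the fraction $\pi_a$ is used up; the last degree class reached is $k_a$, and only a fraction $r^*$ of it receives the app.

```python
def targeted_link_fraction(degree_probs, pi_a):
    """Return (k_a, r_star, eta_a) for high-degree targeting.

    degree_probs maps degree k to its probability p_k; pi_a is the
    fraction of nodes that receive the app, highest degrees first.
    """
    mean_degree = sum(k * pk for k, pk in degree_probs.items())
    remaining = pi_a
    app_stub_share = 0.0
    k_a, r_star = None, 0.0
    for k in sorted(degree_probs, reverse=True):
        pk = degree_probs[k]
        if remaining <= 1e-12 or pk == 0.0:
            continue
        take = min(pk, remaining)      # fraction of all nodes taken from class k
        app_stub_share += k * take
        k_a, r_star = k, take / pk     # lowest degree class holding app-users
        remaining -= take
    return k_a, r_star, app_stub_share / mean_degree
```

For $p_1 = 0.5$, $p_2 = 0.25$, $p_4 = 0.25$ and $\pi_a = 0.375$, all degree-4 nodes and half of the degree-2 nodes get the app, giving $k_a = 2$, $r^* = 0.5$ and $\eta_a = 0.625$, well above $\pi_a$.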

B. Mean-field approximation for the branching process
An alternative to writing the consistency equations for the giant component size is to assume that a branching process governs the epidemic dynamics. Then, a straightforward way of finding the epidemic threshold in the SIR model is to find the critical point of a branching process, where the branching factor is given by the expected excess degree $q$. In the epidemic setting, the branching factor $k_e = pq$ gives the expected number of people one infected person infects during the epidemic process. Note that the branching factor has been used as the definition of the basic reproduction number $R_0$ [88], but it is different from the basic reproduction number when the latter is defined in networks as $R_0 = (\beta/\gamma)\langle k \rangle$ [80]. In the SIR model with the app, we need to duplicate the populations so that we separately track the ones without the app ($S_n$, $I_n$ and $R_n$) and with the app ($S_a$, $I_a$ and $R_a$).
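Given that the apps are uniformly distributed to a fraction $\pi_a$ of the nodes and $k_e$ is the branching factor, we can write a mean-field approximation based on the branching process as follows:

$$I_n^{(t+1)} = k_e\big(\pi_{nn}\, I_n^{(t)} + \pi_{an}\, I_a^{(t)}\big), \qquad I_a^{(t+1)} = k_e\big(\pi_{na}\, I_n^{(t)} + \pi_{aa}(1 - p_{\mathrm{app}})\, I_a^{(t)}\big).$$

By defining $a = \pi_{nn} k_e$, $b = \pi_{an} k_e$, $c = \pi_{na} k_e$ and $d = \pi_{aa} k_e (1 - p_{\mathrm{app}})$, the difference equations can be written as:

$$X_{t+1} = A X_t,$$

where $X_t = \begin{pmatrix} I_n^{(t)} \\ I_a^{(t)} \end{pmatrix}$ and $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$.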
The steady state $X_{t+1} = X_t$ is possible if all the eigenvalues $\lambda$ of the transition matrix $A$ (whether real or complex) have an absolute value which is less than 1; Without contact tracing, there is a chance of an epidemic, given that the initial reproductive number is $k_e > 1$. In the case of app adoption, the critical fraction of app-users $\pi_a^c$ that is needed for reducing the reproductive number can be derived by setting $\lambda = 1$, which leads to: When apps work perfectly, the epidemic threshold is given by: For each value of $\pi_a$ there is a non-trivial optimum value $\pi_{aa}^{\mathrm{opt}}$ that leads to the largest epidemic threshold in terms of the branching factor, which is: The critical app adoption can also be calculated as: In the absence of homo/heterophily, $\pi_{aa} = \pi_a$, Eq. 20 gives the same result as Ref. [44], such that: Vazquez [97] also provides a clear way of combining different intervention strategies and shows how our specific results about application homophily are affected by other interventions.
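The threshold condition on the matrix $A$ can also be evaluated numerically. The sketch below is an illustration with hypothetical function names, assuming uniformly distributed apps so that $\pi_{na} = \pi_a (1 - \pi_{aa})/(1 - \pi_a)$. It computes the largest eigenvalue of the $2 \times 2$ matrix in closed form; because every entry of $A$ is proportional to $k_e$, the critical branching factor is the inverse of the largest eigenvalue obtained with $k_e = 1$.

```python
import math


def spectral_radius(k_e, pi_a, pi_aa, p_app):
    """Largest eigenvalue of A = [[a, b], [c, d]] from the mean-field
    branching approximation (0 < pi_a < 1)."""
    pi_an = 1.0 - pi_aa
    pi_na = pi_a * pi_an / (1.0 - pi_a)
    pi_nn = 1.0 - pi_na
    a, b = pi_nn * k_e, pi_an * k_e
    c, d = pi_na * k_e, pi_aa * k_e * (1.0 - p_app)
    # Eigenvalues of a 2x2 matrix; the discriminant is non-negative
    # because b and c are non-negative, so both eigenvalues are real.
    discriminant = (a - d) ** 2 + 4.0 * b * c
    return 0.5 * (a + d + math.sqrt(discriminant))


def critical_branching_factor(pi_a, pi_aa, p_app):
    """Value of k_e at which the largest eigenvalue of A equals 1."""
    return 1.0 / spectral_radius(1.0, pi_a, pi_aa, p_app)
```

Without tracing ($p_{\mathrm{app}} = 0$) the largest eigenvalue equals $k_e$, recovering the usual threshold $k_e = 1$; with $\pi_a = \pi_{aa} = 0.5$ and perfect apps the threshold moves up to $k_e = \sqrt{5} - 1 \approx 1.24$.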

C. Component size simulations
Next, we describe how to extract the giant component in simulated networks and how these simulation results can be used to find the critical points of the disease spreading process. The component sizes can also be used to find the epidemic size distributions as described in Section I B.

Component Extension
In each simulation run, we simulate one network structure $G$ and distribute the apps to the nodes according to one of the models described in Section I C. From the original network $G$, we keep each link with probability $p = 1 - e^{-\beta\tau}$, which is the probability of infection going through a link without apps. We also remove each link between two app-users with probability $p_{\mathrm{app}}$ and call the resulting network $G_a$. The components of graph $G_a$ are the normal components.
The extended components can be reached by going through every normal component and extending it. For every app-user $\alpha$ in the component $C$, we go through its neighbors $n_\alpha = \{\alpha_1, \alpha_2, \ldots, \alpha_k\}$ in the original network $G$. If $\alpha_i$ is an app-user and not in the component, $\alpha_i \notin C$, we add it to the component extension $C'$ with probability $p$. The total set of infected nodes, if starting from a node in $C$, will be $C \cup C'$. As these are disjoint sets, we can compute the sizes as $S'_C = |C| + |C'|$ and $S_C = |C|$.
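The component extension procedure can be sketched in a few lines. The function name, the edge-list input, and the union-find structure are assumptions for this illustration; the logic follows the two steps described above: build $G_a$ by thinning links, then extend each normal component with app-using neighbors reached over app-app links of the original network.

```python
import random


def normal_and_extended_sizes(n, edges, app_users, p, p_app, rng):
    """Return a list of (normal size, extended size), one per component of G_a."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    neighbors = [[] for _ in range(n)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
        keep = rng.random() < p                      # link transmits
        if keep and u in app_users and v in app_users:
            keep = rng.random() >= p_app             # quarantine cuts app-app link
        if keep:
            parent[find(u)] = find(v)

    components = {}
    for node in range(n):
        components.setdefault(find(node), set()).add(node)

    sizes = []
    for comp in components.values():
        extension = set()
        for alpha in comp:
            if alpha not in app_users:
                continue
            for nb in neighbors[alpha]:              # neighbors in the original G
                if nb in app_users and nb not in comp and rng.random() < p:
                    extension.add(nb)
        sizes.append((len(comp), len(comp) + len(extension)))
    return sizes
```

On the path 0-1-2 with app-users 1 and 2 and $p = p_{\mathrm{app}} = 1$, the normal components are $\{0, 1\}$ and $\{2\}$ with extended sizes 3 and 2, so the extended sizes sum to more than the number of nodes.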

Susceptibility
In numerical simulations of finite-size systems, we can use the peak of a susceptibility measure to find the critical transition point. Theoretically, susceptibility [84] is a measure of fluctuation in the component sizes, which is singular at the epidemic threshold (the critical point). In network percolation studies, it is defined as the expected growth in the size of the giant component when a random link is added to the network. Therefore, susceptibility in an ordinary percolation problem can be written as: where $S_c$ is the size of the component $c$ and $c_{\max} = \operatorname{argmax}_c S_c$ is the largest component.
Here, we are dealing with two types of components, and as is shown in Fig. 2D, the ratio of the sum of extended component sizes to the network size, $\sum_c S'_c / N$, can be larger than one. Susceptibility should be a monotonically decreasing function in the supercritical regime. However, plugging the extended component sizes into Eq. 25 results in a growth in the tail of susceptibility, turning it into a non-monotonic function in the supercritical regime. Therefore, this formulation of susceptibility is not suitable in the current case, since the maximum of Eq. 25 could lead to estimates of critical points that are very far from the actual one. Instead, we can use the expected growth in the extended giant component, which can be computed as: where $S_c$ and $S'_c$ are the size and the extended size of the component $c$ and $c_{\max} = \operatorname{argmax}_c S'_c$ is the largest component measured in the extended size.
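To make the two susceptibility measures concrete, the sketch below computes them from lists of component sizes. The normalization by $N$ and the exact functional forms (a sum of $S_c^2$ over non-giant components, and a sum of $S_c S'_c$ excluding the component with the largest extended size) are assumptions made for illustration, motivated by the "expected growth when a random link is added" description; they should not be read as the exact definitions of Eqs. 25 and 26. The function names are hypothetical.

```python
def susceptibility(sizes, n):
    """Ordinary percolation susceptibility: sum of squared component sizes,
    excluding the largest component, divided by n (assumed normalization)."""
    largest = max(range(len(sizes)), key=lambda i: sizes[i])
    return sum(s * s for i, s in enumerate(sizes) if i != largest) / n


def extended_susceptibility(normal_sizes, extended_sizes, n):
    """Variant for the app model: each non-giant component is hit with
    probability proportional to its normal size and contributes its
    extended size. The giant component is the one with the largest
    extended size."""
    largest = max(range(len(extended_sizes)), key=lambda i: extended_sizes[i])
    return sum(s * s_ext
               for i, (s, s_ext) in enumerate(zip(normal_sizes, extended_sizes))
               if i != largest) / n
```

In a parameter sweep, the critical point would then be estimated as the parameter value at which the chosen measure peaks.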

D. Explicit compartment model simulations
Finally, we will perform explicit simulations of the spreading processes to confirm the theoretical results we arrived at via the approximations we presented above. The effect of tracking applications can be integrated into compartment model simulation by introducing separate susceptible and infected compartments for people with and without the app. The interactions between people with no app installed are similar to those of the normal SIR process, namely, susceptible individuals with no app ($S_n$) can become infected ($I_n$) by being in contact with infected people that either do not have the app installed ($I_n$) or have it installed ($I_a$). However, if a susceptible individual with the app ($S_a$) comes into contact with an infected individual with the app ($I_a$), they will become infected but they will also receive an infection notification from the app, which means they will be quarantined ($I_q$). Quarantined individuals cannot infect anyone else. Eventually, all the infected individuals will move to the recovered compartment after a constant predetermined amount of time ($1/\gamma$) has passed from the beginning of their infection. The recovered compartment is divided into three compartments $R_n$, $R_a$, and $R_q$ to track which infected compartment the node is originating from.
The set of all reactions can be written as follows: Note that while edge reactions are governed by Poisson processes happening at a constant rate $\beta$, unlike in most common SIR models, node reactions are governed by a constant cutoff time $1/\gamma$ and happen exactly $1/\gamma$ units of time after the infection of the node. As interactions in the simulation are bound to take place over edges of a static network, with nodes belonging to each of the compartments, as shown in Sec. III, the results are similar to a component size simulation (described in Sec. II C) on a network with effective connectivity of $k_e = \langle k \rangle (1 - e^{-\beta/\gamma})$. As only the ratio between $\beta$ and $\gamma$ acts as a parameter in the model, we set the value of $\gamma$ to 1.
In each simulation, starting from a single infected node and running the simulation in discrete time steps of $10^{-4}$ units until no further reaction is possible, the final number of nodes that end up in $R_q$, $R_a$ and $R_n$ determines the total size of infection, corresponding to the extended component size $S'$ of the component that the initial seed node belongs to. The final combined size of the $R_n$ and $R_a$ compartments, however, represents the size of the component $S_n$ that the seed node (index case) would belong to, had we removed app-app links. By adding $I_a$ and $I_q$ compartments, as compared to normal SIR processes, and linking them to the state of the source of infection and the internal state of each node, we include information about the history of the spreading agent more than one step back in the simulation of the spreading process.
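A compact version of such a compartment simulation is sketched below. It is not the discrete-time scheme described above: as an assumption made for brevity, it uses an event queue, drawing for each infected node and each neighbor an exponential waiting time with rate $\beta$ and keeping the attempt only if it falls within the infectious period $1/\gamma$. The function name, the adjacency-dictionary input, and the `p_app` argument for quarantine success are likewise illustrative choices.

```python
import heapq
import random


def simulate_sir_with_app(neighbors, app_users, beta, gamma, p_app, seed_node, rng):
    """Run one outbreak and return the final sizes of R_n, R_a and R_q."""
    infected_as = {}   # node -> 'n', 'a' or 'q'; a node is infected at most once
    events = []        # heap of (time, target node, source is an app-user)

    def infect(node, time, quarantined):
        if quarantined:
            infected_as[node] = "q"   # quarantined nodes infect nobody
            return
        infected_as[node] = "a" if node in app_users else "n"
        for nb in neighbors[node]:
            delay = rng.expovariate(beta)
            if delay < 1.0 / gamma:   # attempt happens before recovery
                heapq.heappush(events, (time + delay, nb, node in app_users))

    infect(seed_node, 0.0, False)
    while events:
        time, target, source_has_app = heapq.heappop(events)
        if target in infected_as:
            continue                   # only susceptible nodes can be infected
        quarantined = (source_has_app and target in app_users
                       and rng.random() < p_app)
        infect(target, time, quarantined)

    counts = {"R_n": 0, "R_a": 0, "R_q": 0}
    for kind in infected_as.values():
        counts["R_" + kind] += 1
    return counts
```

The sum of all three counts plays the role of the extended size of the seed's component, while `R_n + R_a` corresponds to the normal size.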

III. NUMERICAL RESULTS
Next, we use the theory and simulations introduced in Sec. II to illustrate how the various parameters affect the epidemic sizes and epidemic probabilities. The simulation studies are done in networks of $10^4$ nodes and averaged over 10 realizations. We use two network topologies: homogeneous networks (Erdős-Rényi networks) with expected degree $\langle k \rangle = 10$ and random networks with an expected degree sequence drawn from a power-law degree distribution $p(k) \propto k^{-3}$, with a minimum degree cutoff adjusted such that the average degree is set to 10 [94].

A. Differences in normal and extended components
The difference between the epidemic probability (normal component size) and the epidemic size (extended component size), as given by Eqs. (4) and (5), is a phenomenon specific to epidemics in the presence of app adopters. Breaking the equivalence of these two measures can have practical consequences, as illustrated in Fig. 2A. The difference between the two grows with the fraction of app-users π_a. For example, when π_a = 0.8 and the epidemic probability (the normal component size) is s_max ≈ 0.5, the epidemic size (the extended component size) reaches s_max ≈ 0.8. This is also reflected in the expected epidemic sizes (see Fig. 2B and Eq. (1)). Although the two component definitions differ from each other, they display the transition at the same point, and this point can be measured numerically using the susceptibilities defined in Eqs. (25)-(26) (see Fig. 2C).
The extended component size is not a conserved quantity like the normal component size, for which the component sizes always sum to the number of nodes N. Instead, the sum of extended component sizes S can be significantly larger than the number of nodes (see Fig. 2D), and the maximum value it can reach grows with the fraction of application users π_a. The deviation from S/N = 1 reaches its maximum for disease parameters above the threshold values, but when the disease reaches a large enough part of the population, the fraction S/N starts to decay, reaching S/N = 1 when everybody belongs to the normal giant component.
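The non-conservation of the extended component sizes can be illustrated with a small percolation sketch. It is not the simulation code used for the figures, and it rests on assumptions labeled here and in the comments: each link transmits with probability T; a transmitting link between two app-users is "traced" with probability p_app, so it is cut from the normal component while its far end still counts as one secondary infection in the extended component. All function names are hypothetical.

```python
import random
from collections import Counter

def random_graph_edges(n, mean_degree, rng):
    """Sample n*mean_degree/2 distinct edges uniformly (a G(n, M) random graph)."""
    target = int(n * mean_degree / 2)
    edges = set()
    while len(edges) < target:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:
            edges.add((min(u, v), max(u, v)))
    return sorted(edges)

def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def component_sizes(n, edges, has_app, T, p_app, rng):
    """Return (normal, extended) component sizes keyed by component root."""
    parent = list(range(n))
    traced = []  # transmitting app-app links whose infected end is quarantined
    for u, v in edges:
        if rng.random() >= T:
            continue  # link does not transmit at all
        if has_app[u] and has_app[v] and rng.random() < p_app:
            traced.append((u, v))  # assumption: infects once, then spreading stops
            continue
        parent[find(parent, u)] = find(parent, v)
    roots = [find(parent, x) for x in range(n)]
    normal = Counter(roots)
    # Extended component = normal component plus app-users one traced link away.
    extra = {r: set() for r in normal}
    for u, v in traced:
        if roots[u] != roots[v]:
            extra[roots[u]].add(v)
            extra[roots[v]].add(u)
    extended = {r: normal[r] + len(extra[r]) for r in normal}
    return normal, extended

if __name__ == "__main__":
    rng = random.Random(1)
    n, pi_a = 2000, 0.8
    edges = random_graph_edges(n, 10, rng)
    has_app = [rng.random() < pi_a for _ in range(n)]
    normal, extended = component_sizes(n, edges, has_app, T=0.18, p_app=1.0, rng=rng)
    print(sum(normal.values()) / n, sum(extended.values()) / n)
```

In this sketch the normal component sizes always sum to N, whereas an app-user reached over traced links can be counted in several extended components, so the extended sizes can sum to more than N.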

B. Quarantine failures
The assumption in Sec. III A is that (i) apps work perfectly and (ii) an app-user always self-isolates before having a chance to spread the infection, meaning that there are no quarantine failures, p_app = 1. It is of practical significance to investigate the effects of quarantine failures [45] on the epidemic threshold and epidemic size. Fig. 3 shows that in the absence of major quarantine failures, epidemic tracing and mitigation with apps can still be a valid strategy if the app adoption level in a society is high enough. The app adoption rate π_a has a larger effect than the rate at which apps function, but both need to be relatively high for the apps to have a significant impact.
Even above the epidemic threshold, the apps can be useful. Especially when the application adoption π_a is high, the quarantines can be very unreliable while the outbreak size (Fig. 3B-C) and the epidemic probability (Fig. 3D) both remain small. Again, overall, app adoption and quarantine reliability are both essential, with the app adoption rate being more important. Note that k_e = 1.8 is chosen as an illustrative example of a parameter region with interesting behavior in the various component sizes: it is large enough that, without any intervention, there is wide epidemic spreading, but small enough that the spread can be controlled without extreme measures.

C. Degree heterogeneity and high-degree app targeting
Real networks are degree-heterogeneous, and this heterogeneity has a strong effect on the final outbreak size and the epidemic threshold. Fig. 4 shows the expected epidemic sizes with two different strategies in app adoption, random and high-degree targeting, for different fractions of app-users π_a in the network. In homogeneous networks, Fig. 4A, contact tracing decreases the expected epidemic size and pushes the epidemic threshold forward. These effects can be further amplified by shifting to high-degree targeting in app adoption. With 80% app-users, the epidemic threshold can move from k_e = 1 to k_e = 4, which means that at that point the expected epidemic size is zero, while without contact tracing it would be almost 1. Note that in homogeneous networks, the effective average degree of the contact network k_e corresponds well to the reproduction number of the infection.
In networks with degree heterogeneity, the epidemic threshold vanishes in normal SIR processes. This effect holds in contact-traced epidemics if we distribute the apps uniformly at random. However, from Fig. 4B it is clear that contact tracing can significantly reduce the expected epidemic size even when the apps are randomly distributed and the epidemic threshold remains unchanged. With the high-degree targeting strategy, it is possible to move the epidemic threshold. Comparing the expected epidemic size at different values of k_e < 3 shows that in real-world situations, app adoption by superspreaders is of significant importance. Since hubs become the app-users, this strategy has drastic effects on the size and threshold of the epidemic, such that the threshold is pushed from somewhere near zero to a value k_e > 5 with the app adoption rate π_a = 0.8. Therefore, the reproduction number can be controlled much better with the high-degree targeting strategy.
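The two adoption strategies can be sketched as follows. This assumes that high-degree targeting means giving the app to the π_a N nodes of highest degree, with ties broken at random; the exact selection rule is an assumption of this sketch, and the function names are illustrative.

```python
import random

def assign_apps_random(degrees, pi_a, rng):
    """Give the app to a uniformly random subset containing a fraction pi_a of nodes."""
    n = len(degrees)
    chosen = set(rng.sample(range(n), round(pi_a * n)))
    return [i in chosen for i in range(n)]

def assign_apps_high_degree(degrees, pi_a, rng):
    """Give the app to the round(pi_a * n) highest-degree nodes (assumed rule)."""
    n = len(degrees)
    order = list(range(n))
    rng.shuffle(order)  # random tie-breaking among nodes of equal degree
    order.sort(key=lambda i: degrees[i], reverse=True)  # stable sort keeps the shuffle
    chosen = set(order[:round(pi_a * n)])
    return [i in chosen for i in range(n)]

def mean_app_degree(degrees, has_app):
    """Average degree among app-users."""
    selected = [d for d, a in zip(degrees, has_app) if a]
    return sum(selected) / len(selected)

if __name__ == "__main__":
    rng = random.Random(0)
    # Toy heavy-tailed degree sequence, only for illustration.
    degrees = [int(rng.paretovariate(2.0) * 5) for _ in range(10_000)]
    for strategy in (assign_apps_random, assign_apps_high_degree):
        has_app = strategy(degrees, 0.2, rng)
        print(strategy.__name__, mean_app_degree(degrees, has_app))
```

Under targeting, the hubs are the first nodes to receive the app, which is why the same adoption fraction removes far more potential transmission paths in a heterogeneous network.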

D. The effect of homophily and heterophily
In previous sections, we assumed that app-users are distributed with random mixing patterns: the fact that one of the connections of a node is an app-user has no effect on the probability of that node being an app adopter. Next, we explore how homophily/heterophily based on app usage affects epidemics, using the modular network model (MN). A Swiss experiment has reported that while only a small fraction π_a = 0.2 of people used the app, the connections among them were dense enough that π_aa = 0.7 [40].

FIG. 5. The effect of homophily/heterophily in app adoption in homogeneous networks as described in Sec. III D. Homophily (heterophily) region is below (above) the diagonal π_a = π_aa. Expected epidemic size at k_e = 1.8 for (A) random app adoption and for (C) high-degree targeting strategy. The epidemic threshold for (B) random app adoption and for (D) the high-degree targeting strategy. Thresholds are from theoretical results given by Eq. 21 and expected epidemic sizes are from percolation simulations. The empty white region marks the parameter range in which such a homophilic/heterophilic population cannot exist.
Fig. 5 illustrates that increasing heterophily leads to a lower epidemic threshold and a larger epidemic size for a fixed k_e. Increasing homophily from random mixing is initially preferable, but the optimum lies between random mixing and full homophily. For the expected epidemic size, strong heterophily is especially detrimental (see Fig. 5A for the homogeneous network with random app adoption and Fig. 5C for the high-degree targeting strategy). The optimum value of heterophily/homophily is evident for the epidemic thresholds in Fig. 5B and Fig. 5D for the random and high-degree targeting strategies, respectively. Fig. 6B gives a clearer picture of the existence of an optimum value for the epidemic threshold in the case of homophily. According to Eq. 21, for each fraction of app-users π_a in the network, the epidemic threshold k_c(π_a, π_aa) can be maximised by controlling the homophily in app adoption π_aa. The pattern in Fig. 6B is very similar to the convex pattern in Fig. 5B, even though they are calculated using different approximations and approaches (see Sec. II A and Sec. II B).
Another view on the effect of homophily and heterophily is given by finding the critical fraction of app-users π_a^c that is needed to go beyond the epidemic threshold as a function of π_aa and k_e. Fig. 6A depicts this relationship based on Eq. 23 and shows that π_a^c is not a monotonic function of π_aa; instead, there is an optimal value of π_aa giving the lowest fraction of apps needed to stop the epidemic. Note that in a network without homophily or heterophily, π_a^c increases monotonically as a function of the effective connectivity k_e (see the inset of Fig. 6A).

(FIG. 6 caption, continued) for each π_a, which is given by Eq. 22. The pattern here is consistent with another approximation shown in Fig. 5B, while the epidemic threshold values are slightly different due to different levels of approximation. Note that here we display the epidemic threshold for all values of π_aa and π_a such that 0 ≤ π_na ≤ 1, so that networks with these parameters can be created in practice [95].
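The constraint 0 ≤ π_na ≤ 1 that delimits the admissible region can be made concrete with a short sketch. It rests on assumptions that are not spelled out in this section: π_aa is read as the probability that a neighbor of an app-user is an app-user, π_na as the probability that a neighbor of a non-user is an app-user, and both groups are taken to have the same mean degree (random adoption). Counting the links between the two groups from either side then gives π_a (1 − π_aa) = (1 − π_a) π_na. The function names are hypothetical.

```python
def pi_na(pi_a: float, pi_aa: float) -> float:
    """Probability that a neighbor of a non-user is an app-user (assumed link balance)."""
    if not 0.0 < pi_a < 1.0:
        raise ValueError("pi_a must lie strictly between 0 and 1")
    return pi_a * (1.0 - pi_aa) / (1.0 - pi_a)

def is_feasible(pi_a: float, pi_aa: float) -> bool:
    """True if (pi_a, pi_aa) corresponds to a network that can be constructed."""
    return 0.0 <= pi_aa <= 1.0 and 0.0 <= pi_na(pi_a, pi_aa) <= 1.0

def mixing_type(pi_a: float, pi_aa: float) -> str:
    """Classify mixing relative to the random-mixing case pi_aa = pi_a."""
    if pi_aa > pi_a:
        return "homophily"
    if pi_aa < pi_a:
        return "heterophily"
    return "random mixing"

if __name__ == "__main__":
    # Values reported for the Swiss experiment cited in the text [40].
    print(mixing_type(0.2, 0.7), is_feasible(0.2, 0.7), pi_na(0.2, 0.7))
    # A strongly heterophilic choice at high adoption violates the constraint.
    print(mixing_type(0.8, 0.1), is_feasible(0.8, 0.1))
```

Under these assumptions, random mixing is recovered as π_na = π_a when π_aa = π_a, and strong heterophily becomes infeasible once app-users form the majority.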

IV. DISCUSSIONS
In this article, we have developed two flexible analytic approximations to SIR epidemics in the presence of contact tracing apps. First, we use a branching process to derive explicit analytical solutions for the epidemic thresholds. Second, we expand the framework of using self-consistent equations to analyze digital contact tracing [44], which is an alternative to other approaches [71]. Contrary to conventional SIR spreading, a full picture of late-stage epidemics in the presence of digital contact tracing is not given by a single observable (the component size); instead, one needs two variables (the normal and extended component sizes). These correspond to the probability of the epidemic and the epidemic size, which are equivalent in the SIR process. Here we see that the two quantities can be significantly different if the number of application users is high.
Our numerical results illustrate that the effects of digital contact tracing can be very sensitive to the network structure, how applications are distributed among the population, and how well the tracing works. Realistic estimates of the effects of digital contact tracing can only be achieved if one can choose correct parameter ranges in a high-dimensional parameter space. In this study, we had six such parameters: the shape of the degree distribution, average degree, amount of heterophily/homophily, application prevalence, quarantine probability and targeting strategy. While we were able to establish and confirm basic laws governing individual parameters and some combinations of parameters, exploring such a parameter space fully for possible compound effects is out of reach for simulations. However, these effects can be largely revealed by inspecting the analytic equations we derived.
There are several open questions for which this study and other studies only hint at the results. There are types of network structures we ignore here. For example, the heterophily and homophily could be constructed in the network in slightly different ways. A case study using a realistic agent-based model [69] has recently considered, among many other modeling choices aimed at precise calibration on the French population, the contributions of individuals of different ages. One could also develop a more realistic version of our stylized model to systematically analyze the effects of homophily caused by an age-based contact structure and different scenarios of app adoption within that structure. The age-based approach would also allow one to estimate the benefits of applications relative to the risk groups in this model.
Overall, the problem of digital contact tracing offers not only a practical problem to solve but also an interesting theoretical puzzle because it introduces memory to the epidemic process. This memory is limited to one step within the tracing model we use here, but one could also use multi-step tracing, where the second neighbors of infected nodes are also quarantined in case the first neighbors have already passed on the infection. Further, here we ignore the effects of quarantines that do not directly stop the infection passing from one application user to another from spreading further. However, in the case of a strong group structure in the network, there could be situations where a non-application user A infects application user B, who alerts another application user C, who actually gets infected by A and stops the spreading because of the quarantine. Analyzing such more complicated phenomena can provide challenges for network scientists for years to come.

FIG. 11. The effect of homophily/heterophily in app adoption on the epidemic threshold and optimum pattern for homophily. Epidemic thresholds for homogeneous networks with (a) random app adoption and (b) high-degree targeting strategy, and for heterogeneous networks with a power-law degree distribution with (c) random app adoption and (d) high-degree targeting strategy. The empty white region marks the parameter range in which such a homophilic/heterophilic population cannot exist.

FIG. 1. (A) Original contact network with app-users marked with the oval symbol. (B) The normal largest component, after the dotted links have been removed in the percolation process at random. When apps are working perfectly, links between a pair of app-users are removed with probability p_app = 1 and other links are removed with probability p. (C) An example of a path of infection: the second app-user can be infected; therefore, it must be included in the outbreak size. (D) Extending the giant component to include the secondary app infections. The second infected app-adopter is added to the giant component with transmission probability p.

FIG. 2. Disease spreading statistics in an Erdős-Rényi network as a function of the effective connectivity k_e when there are π_a N perfect applications (p_app = 1) that are distributed uniformly at random. Results are normalised to the network size N and shown for π_a ∈ [0, 0.2, 0.4, 0.6, 0.8] with different markers. (A) The normal component size, i.e., the epidemic probability (dashed lines and markers following them) and the extended components, i.e., the epidemic size (solid lines and markers following them). Dashed and solid lines indicate the results from theory introduced in Sec. II A by Eq. 4 and the markers are results computed from component sizes of simulated networks as described in Sec. II C. (B) The expected epidemic size as given by Eq. 1 computed with theoretical results introduced in Sec. II A (solid lines), simulated component sizes introduced in Sec. II C (filled markers), and explicit SIR simulations introduced in Sec. II D (empty markers). (C) Susceptibility of the normal giant component χ (dots) and the extended component χ (solid lines) as defined in Eqs. (25)-(26). Since susceptibility is a divergent quantity at the epidemic threshold, as explained in Sec. II C 2, it is a good proxy for finding the critical point. Notice that peaks are at the same positions for both curves, normal and extended components. (D) The ratio S/N of the sum of component sizes to the network size.

FIG. 3. The effect of quarantine failures as described in Sec. III B in homogeneous networks when app adoption is done uniformly at random. Results are from percolation simulations. (A) The epidemic threshold as a function of quarantine probability p_app and app adoption rate π_a. All threshold values larger than 4 are shown with the same color. With the effective connectivity of the network set to k_e = 1.8, (B) the expected epidemic size, (C) the extended giant component size and (D) the normal giant component size are shown as a function of p_app and π_a. Note that k_e = 1.8 is chosen as an illustrative example of a parameter region with interesting behavior in the various component sizes: it is large enough that, without any intervention, there is wide epidemic spreading, but small enough that the spread can be controlled without extreme measures.

FIG. 4. Expected epidemic size E and epidemic threshold k_c for two network topologies with different strategies; E as a function of effective connectivity k_e for (A) homogeneous networks with Poisson degree distribution and for (C) heterogeneous networks with a power-law degree distribution P(k) ∝ k^−3. Results are shown for different values of π_a using different markers: 0 (stars), 0.2 (triangles), 0.4 (discs), 0.6 (diamonds), and 0.8 (crosses). The solid lines with markers indicate the high-degree targeting strategy, while single markers indicate the random app adoption. Epidemic threshold k_c as a function of app-adoption rate π_a (such that the upper markers represent the high-degree targeting strategy) for (B) homogeneous networks and for (D) heterogeneous networks. Differences between the threshold values in the presence of homophily are explained in Fig. 5B,D.

FIG. 6. Existence of an optimum value for homophily based on the branching process approximation as described in Sec. II B. (A) The critical fraction of app-users π_a^c that is needed for reducing the reproductive number as a function of effective connectivity and homophily probability π_aa. The value of π_a^c is constant along each black curve. The inset is the graph of π_a^c as a function of k_e in the absence of homophily, π_aa = π_a, given by Eq. 24. (B) The epidemic threshold k_e as a function of π_aa and π_a. The red symbols show the π_aa^opt

FIG. 7. The expected epidemic size computed with theoretical results introduced in Sec. II A for heterogeneous networks with degree distribution P(k) ∝ k^−3 (solid lines) compared with ones with P(k) ∝ k^−2.5 (dotted lines) as a function of the effective connectivity k_e when apps are distributed uniformly at random. Results are normalised to the network size N and shown for π_a ∈ [0, 0.2, 0.4, 0.6, 0.8] with different colors. Note that by lowering the exponent, epidemic thresholds get closer to zero and the expected epidemic sizes decrease since there are more low-degree nodes in the network. Therefore, while lowering the exponent adds more degree heterogeneity to the network, the physics of the phenomenon does not change.

FIG. 9. Expected epidemic size in the case of quarantine failures. Expected epidemic size at k_e = 1.8 for homogeneous networks with (a) random app adoption and (b) high-degree targeting strategy, and for heterogeneous networks with a power-law degree distribution with (c) random app adoption and (d) high-degree targeting strategy. In (b) and (d) the pattern is different due to the effects of hubs. With a high-degree targeting strategy, quarantine failures are more significant since the infected nodes are highly influential on the spreading dynamics.

FIG. 10. The effect of homophily/heterophily in app adoption on the expected epidemic size. Expected epidemic size at k_e = 1.8 from percolation simulations for homogeneous networks with (a) random app adoption and (b) high-degree targeting strategy, and for heterogeneous networks with a power-law degree distribution with (c) random app adoption and (d) high-degree targeting strategy. The empty white region marks the parameter range in which such a homophilic/heterophilic population cannot exist.