Hidden Connectivity in Networks with Vulnerable Classes of Nodes

Sebastian M. Krause, Michael M. Danziger, and Vinko Zlatić Theoretical Physics Division, Rudjer Bošković Institute, Bijenicka c. 54, 10000 Zagreb, Croatia Faculty of Physics, University of Duisburg-Essen, Lotharstr. 1, 47048 Duisburg, Germany Department of Physics, Bar-Ilan University, Ramat Gan 5290002, Israel (Received 15 April 2016; revised manuscript received 20 July 2016; published 27 October 2016)


I. INTRODUCTION
In studies of security and robustness on complex networks, the nodes are typically assumed to be homogeneous with respect to their vulnerabilities to eavesdropping or failure. However, many real-world networks are heterogeneous [1], with large sets of nodes vulnerable to particular adversaries or failures. These vulnerabilities make networks far less secure and less robust than they seem. By developing tools based on statistical physics of networks, we show how this problem can be overcome, obtaining secure communication and optimal redundancy in networks with vulnerable classes of nodes.
A single vulnerability can affect many nodes in a number of different scenarios. For instance, in communication networks, servers under the same jurisdiction are vulnerable to eavesdropping by their controlling entity (company or government). A similar vulnerability structure emerges when communication software is vulnerable to version-specific bugs, like the Heartbleed bug of 2014 [2,3]. In each of these cases, secure communication can be obtained by splitting the message into pieces via secret sharing (so that it cannot be reconstructed without all of its fragments) [4-6] and transmitting them on different paths, each avoiding a particular vulnerable class of nodes (controlling entities or software versions). Depending on the topology of the network and the distribution of the vulnerable node classes, this approach may or may not be possible. We illustrate this problem in Fig. 1, where we compare two schematic communication networks: one in which both adversaries can be avoided with multiple paths and one in which one of the adversaries cannot be avoided.
Clearly, knowledge of the network topology is needed in order to obtain secure communication in this way. Unfortunately, network topology has generally not been considered when assessing the security of communication networks. Those examples that do include topology [8-12] do not consider the possibility of multiple simultaneous adversaries, nor do they consider the new connectivity structure that can emerge: Though connected to the network, certain nodes are unable to communicate securely. Percolation is the field of statistical physics that is generally used to analyze connectivity in complex networks, but in its existing forms, it cannot treat the heterogeneity of multiple vulnerable classes. Here, we develop a new type of percolation theory, which provides a general framework for assessing secure connectivity in systems with multiple adversaries or multiple classes of vulnerable nodes.
Beyond the application to secure messaging in networks with multiple adversaries, our framework can discover optimal redundancy in cases where there are correlated failures or other vulnerabilities that affect specific classes of nodes. Though redundant connections via multiple paths are assumed to improve robustness [13-15], if a given vulnerability affects a large set of nodes, the redundant links may not improve the robustness. Indeed, in any case where node failure is more likely to occur in groups instead of individually, effective robustness requires optimally redundant paths, each avoiding one (or more) of the vulnerable classes.
Here, we present a new framework for analyzing networks with vulnerable classes of nodes and show the conditions under which, even if every node is vulnerable, robust and secure connectivity can be maintained. We represent the heterogeneity of the network by assigning every node in the network exactly one color, representing a particular vulnerability. The color may represent ownership, geographical location, reliance on a critical material, or some other vulnerability. We then develop a "color-avoiding percolation" theory, which allows us to determine the maximal set of nodes that are mutually connectable under the removal of any color, i.e., the nodes that are connectable via a collection of paths that avoids every color. The existence of this giant color-avoiding component indicates whether or not secure and robust connectivity can be obtained by avoiding the vulnerable node classes.

II. COLOR-AVOIDING PERCOLATION
On a noncolored network, if node or link failures occur with a given probability, percolation theory can be used to determine overall connectivity [16,17]. Percolation on complex networks has a rich history [16-20]. It has been used to study the resilience of the Internet [21,22] and its susceptibility to virus spreading [23,24], and even in probabilistic routing algorithms [25]. It has also been used to understand word-of-mouth processes in social networks [26,27] and the robustness of many biological networks, including neural networks [28], metabolic networks [29], and mitochondrial networks [30]. Here, we develop a new framework based on percolation theory but not equivalent to any previous percolation problem. In this framework, connectivity corresponds to the ability to avoid vulnerable sets of nodes via multiple paths.
We begin with an undirected unweighted network $G$ with $N$ nodes and adjacency matrix $A_{ij}$. Every vertex $i$ is assigned a color $c_i \in \{1, 2, \ldots, C\}$, where $C$ denotes the total number of colors. In noncolored graphs, a single path provides connectivity; in $k$-core percolation, any $k$ paths are sufficient [31,32]; and in $k$-connectivity, $k$ independent paths are required [33]. Faced with the possible vulnerability or insecurity of all nodes of a single color, we seek a set of paths between two nodes such that no color is required for all paths. In general, splitting the message into more pieces and sending them along more paths would improve the possibility of secure communication between the source and receiver, provided a secret-sharing algorithm was used [4-6]. However, if the source and target become disconnected when all the nodes of a certain color are removed, then that color cannot be avoided no matter how many paths are utilized. If we knew that only one color $c^{*}$ was problematic, we could determine the securely connectable nodes by simply removing all nodes of color $c^{*}$ and checking which components remain, essentially equivalent to standard percolation. Since we do not know which colors are problematic, we need to examine the connectivity upon the removal of every color.
We now define a pair of nodes as color-avoiding connected (CAC) if, for every color $c$, there exists a path connecting this pair and avoiding all nodes of color $c$ [in Fig. 2(a), nodes R and S are CAC]. The color-avoiding paths are not necessarily unique: Often, one path can avoid multiple colors. We do not include the colors of the source and target nodes in the definition of CAC (see Fig. 2). This assumption captures a number of realistic scenarios: The source and receiver may have special knowledge of the security of their own nodes, the vulnerability may affect the nodes probabilistically so the nodes seek to minimize their exposure, or the vulnerability may affect the nodes as retransmitters but not as end points of the path. After analyzing this assumption in detail, we show how our theory can be extended to other trust models, in particular, where the sender and receiver trust one another's colors. We return to this distinction in our discussion of the AS-level Internet.
Formally, we define a "color-avoiding connected component" as a maximal set of nodes, where every node pair in the set is color-avoiding connected. Several examples of CAC components are shown in Fig. 2(b4) and in Fig. 6 in the Appendix. In this way, we transform the path-finding problem into a percolation problem, which allows us to study it in a fundamental way, analytically and numerically. Note that there are nodes that are not themselves part of the CAC component but are necessary for the color-avoiding connectivity of nodes that are in the component. This occurs, for example, when all of the neighbors that lead from a node to the CAC component are of the same color, as for node A in Figs. 2(a) and 2(b). In such a case, the node itself is not CAC to the system as a whole because it must pass through nodes of a certain color before it can reach elsewhere. However, in general, this node will still be necessary to form paths that avoid other colors. The fact that non-CAC nodes may be needed to create overall system color-avoiding connectivity is one indication that a new kind of percolation theory is needed to uncover this hidden structure.

FIG. 1. Avoidable and nonavoidable adversaries. In this scenario, the sender must transmit their message through nodes that are controlled by either Lancaster (red rose) or York (white rose). In diagram (a), a message can be split in two and transmitted on two different paths: The upper path avoids York's nodes, while the lower path avoids Lancaster's nodes. In diagram (b), both of the paths pass through York's nodes, and York cannot be avoided. Lancaster and York were opponents in the Wars of the Roses in England in the 15th century. Artwork after Ref. [7].
By studying the largest CAC component, we obtain a clear quantitative measure of the feasibility of using multiple paths to avoid vulnerable classes of nodes and information on where those paths should be routed. Furthermore, this gives us a way to measure the effect of changes in network topology, link density, and color distribution.
FIG. 2. Illustration of color-avoiding connectivity. (a) In this network, the sender S and the receiver R are color-avoiding connected (CAC), as the green path avoids blue nodes, the purple path avoids white nodes, and the yellow path avoids black nodes. (b) Finding $L_{\text{color}}$, the largest color-avoiding connected component. (b1) The largest component without white nodes ($L_{\overline{\text{white}}}$) and its direct white neighbors (links to white neighbors are dashed) form $L_{\overline{\text{white}}}^{+}$. Each pair of nodes in this component ($L_{\overline{\text{white}}}^{+}$) can communicate, avoiding white nodes on the path between sender and receiver. We repeat the process in (b2) for blue and in (b3) for black nodes. (b4) Nodes that belong to all three components (in red) constitute the largest CAC component, $L_{\text{color}}$. Note that some nodes are not CAC but are necessary to form the largest color-avoiding component, for example, node A. This node is not in $L_{\overline{\text{black}}}^{+}$ [see (b3)]. However, it is still required to form a path avoiding blue nodes between nodes S and R [see (a)]. (c) Total CAC pairs $p_{\text{pair}}$ compared to the size of the giant CAC component ($S_{\text{color}}$) for quenched graphs. Red squares show Poisson graphs with $N = 10^{5}$ nodes, average degrees $\bar k = 1.6$, 1.7, 1.9, 4.0 (from left to right), and $C = 3$ colors; the green circle shows the AS-level Internet [34,35], with colors representing countries as described in Fig. 4. The black line $S_{\text{color}}^{2} = p_{\text{pair}}$ represents all CAC occurring within the giant CAC component. Note that $p_{\text{pair}}$ was approximated with samples of up to $5 \times 10^{5}$ pairs; error bars are smaller than the visible range.

To find the largest set of color-avoiding connected nodes in any network with any color distribution, we propose the following algorithm. First, for every color $c$, we remove all nodes with color $c$ and find the largest component in the remaining graph, $L_{\bar c}$. This component represents the largest set of nodes that are connected without requiring any nodes of color $c$. Then, with $L_{\bar c}^{+}$, we define the set of all nodes that have a direct neighbor in $L_{\bar c}$, trivially including $L_{\bar c}$ and additionally nodes of color $c$ that are directly connected to $L_{\bar c}$. These "dangling" $c$-colored nodes represent the nodes that can communicate via $L_{\bar c}$ without requiring any $c$-colored nodes aside from themselves. In Fig. 2(b1), we see $L_{\overline{\text{white}}}^{+}$, including $L_{\overline{\text{white}}}$ and some white "dangling" nodes such as node A, which also have access to paths avoiding further white nodes. Every pair of nodes in $L_{\bar c}^{+}$ is connected by a path avoiding color $c$. As the second step of the algorithm, we define $L_{\text{color}} = \bigcap_{c} L_{\bar c}^{+}$. This set consists of all the nodes that belong to $L_{\bar c}^{+}$ for all colors $c$ at the same time. This algorithm generates an implicit multilayer structure, which we explore in Sec. III B and compare with other multilayer network formalisms [36-42]. In Fig. 2(b), we illustrate this method, and further technical details are discussed in Appendix A.
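The two-step procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for the results in this paper: the function names, the dictionary-based graph representation, and the six-node example graph are all assumptions introduced here for illustration.

```python
from collections import deque


def largest_component(adj, allowed):
    """Largest connected component of the subgraph induced by `allowed`."""
    seen, best = set(), set()
    for start in adj:                      # dict order keeps the result deterministic
        if start not in allowed or start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w in allowed and w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        if len(comp) > len(best):
            best = comp
    return best


def largest_cac_component(adj, color):
    """Step 1: build L_c^+ for every color c. Step 2: intersect them."""
    result = set(adj)
    for c in set(color.values()):
        allowed = {v for v in adj if color[v] != c}
        core = largest_component(adj, allowed)            # L_c-bar
        # "dangling" nodes: color c, but with a direct neighbor in the core
        dangling = {v for v in adj
                    if color[v] == c and any(w in core for w in adj[v])}
        result &= core | dangling                          # intersect with L_c-bar^+
    return result


# Hypothetical example: a 4-clique (nodes 0-3) plus two pendant nodes.
adj = {0: {1, 2, 3, 4}, 1: {0, 2, 3, 5}, 2: {0, 1, 3},
       3: {0, 1, 2}, 4: {0}, 5: {1}}
color = {0: "r", 1: "g", 2: "b", 3: "r", 4: "g", 5: "r"}
print(sorted(largest_cac_component(adj, color)))
```

In this example, the pendant nodes 4 and 5 each reach the rest of the graph through a single neighbor, so the color of that neighbor cannot be avoided and they drop out of the intersection. If two components tie for the largest size, this sketch keeps the first one found; a production version would need an explicit tie-breaking rule.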
It is conceivable that $L_{\text{color}}$ does not represent the overall color-avoiding connectivity of the system. For example, the bulk of the CAC nodes might be in smaller components, without a single dominant set. However, if $L_{\text{color}}$ scales with system size and the size of the smaller color-avoiding connected components does not, then in the limit of large systems, the overall color-avoiding connectivity is determined by $L_{\text{color}}$, just as the overall connectivity is determined by the size of the giant component in noncolored graphs. With $S_{\text{color}}$ defined as the fraction of nodes that are in $L_{\text{color}}$ among all $N$ nodes, and $p_{\text{pair}}$ defined as the total fraction of color-avoiding connected pairs among all node pairs, we can test if $L_{\text{color}}$ accounts for the bulk of color-avoiding connectivity [$p_{\text{pair}} = 2 n_{\text{pair}} / N(N-1)$, where $n_{\text{pair}}$ is the number of CAC pairs and $N$ is the size of the network]. In Fig. 2(c), we see, for random networks and a real-world network, that color-avoiding connectivity is indeed dominated by $L_{\text{color}}$. When $S_{\text{color}}$ is small, nongiant clusters and the trivial color-avoiding connectivity that accompanies individual links lead to deviations between $p_{\text{pair}}$ and the connectivity predicted with $S_{\text{color}}$, but these deviations rapidly disappear as the system size increases. This validates the treatment of $L_{\text{color}}$ as a proxy for color-avoiding connectivity.
By transforming the problem of connectivity via multiple color-avoiding paths into a percolation problem, we have provided a method to study the hidden connectivity that emerges in networks with vulnerable classes of nodes. This algorithm can be used to analyze any network with vulnerable classes of nodes, as we show below with the AS-level Internet. We proceed to develop analytical results based on percolation theory for random networks, where we will study the effects of the network topology and color distribution on the color-avoiding connectivity.
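For small graphs, $p_{\text{pair}}$ can be computed exhaustively from the pairwise definition of CAC, which makes the comparison with $S_{\text{color}}^{2}$ concrete. The sketch below is an assumption-laden illustration: the function names and the example graph are hypothetical, and the exhaustive loop over all pairs is a swapped-in brute-force approach that would be replaced by pair sampling for networks of realistic size.

```python
from collections import deque
from itertools import combinations


def avoids(adj, color, s, t, c):
    """True if some s-t path has no intermediate node of color c.

    The colors of the end points s and t are ignored, following the
    definition of CAC used in the text.
    """
    seen, queue = {s}, deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w == t:
                return True
            if w not in seen and color[w] != c:
                seen.add(w)
                queue.append(w)
    return False


def p_pair(adj, color):
    """Fraction of node pairs that are CAC: 2 n_pair / (N (N - 1))."""
    colors = set(color.values())
    nodes = list(adj)
    n = len(nodes)
    n_pair = sum(
        all(avoids(adj, color, s, t, c) for c in colors)
        for s, t in combinations(nodes, 2)
    )
    return 2 * n_pair / (n * (n - 1))


# Hypothetical example: a 4-clique (nodes 0-3) plus two pendant nodes.
adj = {0: {1, 2, 3, 4}, 1: {0, 2, 3, 5}, 2: {0, 1, 3},
       3: {0, 1, 2}, 4: {0}, 5: {1}}
color = {0: "r", 1: "g", 2: "b", 3: "r", 4: "g", 5: "r"}
print(p_pair(adj, color))
```

In this tiny example, all eight linked pairs are trivially CAC, including the two pendant links, whereas only the six pairs inside the 4-clique lie in the largest CAC component. This is the small-system deviation between $p_{\text{pair}}$ and $S_{\text{color}}^{2}$ mentioned above.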

III. ANALYTIC THEORY FOR RANDOM NETWORKS
For the analytical treatment, we use the annealed approximation of networks of size $N$ described through the configuration model [17], in which a degree distribution $p(k)$ is a conserved quantity from which an ensemble of network realizations is drawn. For a more comprehensive treatment, see Appendix B.
Every node $i$ is assigned a color $c_i \in \{1, 2, \ldots, C\}$. The color sequence $\{c_i\}$ has probability $\prod_i r_{c_i}$. If we assume the colors are distributed uniformly at random, we get $r_{c_i} = r_c = 1/C$ for all $c_i$.
We calculate $S_{\text{color}}$ in the limit $N \to \infty$ as the probability that a single node belongs to $L_{\text{color}}$. Because $L_{\text{color}}$ is a subset of the regular giant component by construction, we begin by obtaining the solution for standard percolation on random graphs [17,43,44]. The size of the giant component in a noncolored random graph is $S = 1 - g_0(u)$, where $g_0(z) = \sum_k p_k z^k$ is the generating function of the probability distribution $p_k$. Here, $u$ is the probability that a node is not connected to the giant component over one particular link, and it is computed as the solution of $u = g_1(u)$, where $g_1(z) = g_0'(z)/g_0'(1)$ is the generating function of the excess degree [17]. Second, we let $\kappa_c$ be the expected number of a randomly chosen node's neighbors of color $c$ that are connected to the giant component of standard percolation. Considering $\kappa_c$ for all colors, we obtain the vector $\vec\kappa = (\kappa_1, \ldots, \kappa_C)$, with $k' = \sum_c \kappa_c$ being the total number of links to the normal giant component. Third, the conditional probability $P_{\vec\kappa}$ that the links suffice to connect to $L_{\text{color}}$, given that they have the color distribution $\vec\kappa$ and that they already belong to the normal giant component, is

$$P_{\vec\kappa} = \prod_{c=1}^{C} \left[ 1 - U_{\bar c}^{\,k' - \kappa_c} \right], \tag{1}$$

in which $U_{\bar c}$ denotes the conditional probability that a link fails to connect to $L_{\bar c}$ given that it does connect to the normal giant component via a node having a color $c' \neq c$. We define

$$U_{\bar c} = 1 - \frac{1 - u_{\bar c}}{(1 - r_c)(1 - u)}. \tag{2}$$

The probability $u_{\bar c}$ that a single link does not connect to a giant $L_{\bar c}$ is calculated with $u_{\bar c} = r_c + (1 - r_c)\, g_1(u_{\bar c})$ (site percolation with a surviving fraction of nodes of $1 - r_c$ [17]). Notice that $1 - U_{\bar c}^{\,k' - \kappa_c}$ is the probability that the randomly chosen node is in $L_{\bar c}^{+}$, given that it has exactly $k' - \kappa_c$ links to the normal giant component over nodes without color $c$. Combining these terms, we obtain a formula for $S_{\text{color}}$:

$$S_{\text{color}} = \sum_{k=0}^{\infty} p_k \sum_{k'=0}^{k} B_{k,k'} \sum_{\vec\kappa} M_{k',\vec\kappa}\, P_{\vec\kappa}, \tag{3}$$

where the sum over $\vec\kappa$ runs over all color distributions with $\sum_c \kappa_c = k'$, and the binomial factor $B_{k,k'}$ [Appendix B, Eq. (B7)] accounts for the probability that, out of $k$ links, $k'$ links connect to the normal giant component. The multinomial factor $M_{k',\vec\kappa}$ [Appendix B, Eq. (B8)] gives the multinomial probability of having the color distribution $\vec\kappa$ among the neighbors belonging to the normal giant component.
To obtain a closed-form solution for $S_{\text{color}}$, Eq. (4), we now assume that every color occurs with equal probability, $r_c = 1/C$. An alternative derivation of this equation based on combinatorics and set theory is presented in Appendix D. This approach is equivalent to the theory presented here and may be more tractable to some readers, depending on their background.
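The step from the general sum to a closed form can be sketched by expanding the product over colors. The block below is a reconstruction under stated assumptions rather than a quotation of Eq. (4): it assumes that $P_{\vec\kappa}$ is the product over colors of $1 - U_{\bar c}^{\,k'-\kappa_c}$, that $B_{k,k'}$ and $M_{k',\vec\kappa}$ are the binomial and multinomial weights described in the text, and it introduces the shorthand $U \equiv U_{\bar 1}$ and the auxiliary quantity $x_j$, neither of which is named in the original.

```latex
\begin{align*}
P_{\vec\kappa}
  &= \prod_{c=1}^{C}\left(1-U^{k'-\kappa_c}\right)
   = \sum_{A\subseteq\{1,\dots,C\}} (-1)^{|A|}\,
     U^{\,|A|k'-\sum_{c\in A}\kappa_c},\\[4pt]
\sum_{\vec\kappa} M_{k',\vec\kappa}\, U^{\,jk'-\sum_{c\in A}\kappa_c}
  &= \left[\left(1-\tfrac{j}{C}\right)U^{j}+\tfrac{j}{C}\,U^{j-1}\right]^{k'}
   \equiv x_j^{\,k'}, \qquad |A| = j,\\[4pt]
\sum_{k'=0}^{k} B_{k,k'}\, x_j^{\,k'}
  &= \left[u+(1-u)\,x_j\right]^{k},\\[4pt]
S_{\mathrm{color}}
  &= \sum_{j=0}^{C} (-1)^{j}\binom{C}{j}\,
     g_0\!\bigl(u+(1-u)\,x_j\bigr), \qquad x_0 = 1 .
\end{align*}
```

The second line uses the fact that, for equal color frequencies, the number of neighbors whose color lies in a set $A$ of $j$ colors is binomially distributed with parameter $j/C$. For $C = 2$ the auxiliary quantities are $x_1 = (1+U)/2$ and $x_2 = U$, so only three terms survive.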
We now discuss the limiting cases $C = 2$ and $C \to \infty$. The result for two colors can be simplified to [Appendix C, Eq. (C8)]

$$S_{\text{color},2} = 1 - 2 g_0(u_{\bar 1}) + g_0(2 u_{\bar 1} - 1), \tag{5}$$

which directly depends on $u_{\bar 1}$ only. As the number of colors tends to infinity, standard percolation is not recovered. $S_{\text{color}}$ remains smaller than the relative size of the giant component $S$, and in fact, $S_{\text{color},\infty}$ is identical to the giant component in $k$-core percolation with $k = 2$ [31,32]. The reason that $S_{\text{color},\infty}$ is equivalent to two-core percolation is that, even if every node is a different color, a node connected via only one link would not be able to avoid the color of its sole neighbor. We demonstrate this directly by deriving an asymptotic form for $S_{\text{color}}$ as $C \to \infty$ [Appendix C, Eq. (C14)]:

$$S_{\text{color},\infty} = 1 - g_0(u) - (1 - u)\, g_0'(u), \tag{6}$$

which is the same result as in two-core percolation. In Fig. 3(a), we see that $S_{\text{color},C}$ comes close to $S_{\text{color},\infty}$ even for $C = 10$, indicating that even moderate color diversity comes close to the infinite color case. We can thus use two-core percolation as an envelope for maximal color-avoiding connectivity: If a node is not in the two-core giant component, it will not be in the CAC giant component, regardless of the specific coloring.
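The limiting cases can be explored numerically for Poisson (Erdős-Rényi) graphs, for which $g_0(z) = g_1(z) = e^{\bar k (z-1)}$. The following sketch rests on several assumptions that are not taken verbatim from the text: the conditional probability is taken as $U_{\bar c} = 1 - (1-u_{\bar c})/[(1-r_c)(1-u)]$, the general-$C$ expression is the inclusion-exclusion expansion of the product over colors for equal color frequencies, and the two-color and two-core expressions are $1 - 2g_0(u_{\bar 1}) + g_0(2u_{\bar 1}-1)$ and $1 - g_0(u) - (1-u)g_0'(u)$. All function names are hypothetical.

```python
import math


def smallest_fixed_point(f, tol=1e-13, max_iter=200000):
    """Iterate x -> f(x) from x = 0; for increasing f this climbs
    monotonically to the smallest fixed point in [0, 1]."""
    x = 0.0
    for _ in range(max_iter):
        x_new = f(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


def link_probabilities(k_mean, C):
    """Return (u, u_bar) for a Poisson graph with C equally frequent colors."""
    g1 = lambda z: math.exp(k_mean * (z - 1.0))   # Poisson: g0 = g1
    r = 1.0 / C
    u = smallest_fixed_point(g1)
    u_bar = smallest_fixed_point(lambda x: r + (1.0 - r) * g1(x))
    return u, u_bar


def s_color(k_mean, C):
    """Size of the giant CAC component for C equally frequent colors."""
    g0 = lambda z: math.exp(k_mean * (z - 1.0))
    u, u_bar = link_probabilities(k_mean, C)
    if 1.0 - u < 1e-9:                 # no ordinary giant component
        return 0.0
    # ASSUMPTION: form of the conditional failure probability U
    U = 1.0 - (1.0 - u_bar) / ((1.0 - 1.0 / C) * (1.0 - u))
    total = 0.0
    for j in range(C + 1):             # inclusion-exclusion over sets of j colors
        x_j = 1.0 if j == 0 else (1.0 - j / C) * U**j + (j / C) * U**(j - 1)
        total += (-1) ** j * math.comb(C, j) * g0(u + (1.0 - u) * x_j)
    return total


def s_color_two(k_mean):
    """Two-color special case, expressed through u_bar only."""
    g0 = lambda z: math.exp(k_mean * (z - 1.0))
    _, u_bar = link_probabilities(k_mean, 2)
    return 1.0 - 2.0 * g0(u_bar) + g0(2.0 * u_bar - 1.0)


def s_two_core(k_mean):
    """Two-core envelope; for Poisson graphs g0'(z) = k_mean * g0(z)."""
    g0 = lambda z: math.exp(k_mean * (z - 1.0))
    u, _ = link_probabilities(k_mean, 2)
    return 1.0 - g0(u) - (1.0 - u) * k_mean * g0(u)


for C in (2, 3, 10):
    print(C, s_color(4.0, C))
print("two-core envelope", s_two_core(4.0))
```

Under these assumptions, the general expression evaluated at $C = 2$ coincides with the two-color form, and the values for increasing $C$ stay below the two-core envelope, in line with the discussion above. The alternating sum becomes numerically delicate for very large $C$, so the sketch is intended for moderate numbers of colors only.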
We now discuss graphs with broad degree distributions with $p_k \sim k^{-\alpha}$ ($k > 0$) and generating functions $g_0(z) = \mathrm{Li}_{\alpha}(z)/\zeta(\alpha)$ and $g_1(z) = \mathrm{Li}_{\alpha-1}(z)/[z\,\zeta(\alpha-1)]$, with $\mathrm{Li}_{\alpha}(z)$ the polylogarithm function. In Fig. 3(b), we see results for $C = 2$ and $C = 10$ depending on the average degree $\bar k = \zeta(\alpha-1)/\zeta(\alpha)$ [17]. The limiting cases are a diverging $\bar k$ for $\alpha \to 2$ and $\bar k \to 1$ for $\alpha \to \infty$. We see that $\bar k_{\text{crit}}$ is not strongly affected by the number of colors but that the size of the giant CAC component is substantially smaller than in the case of Erdős-Rényi networks (see Fig. 3). The critical connectivity can be calculated using Cohen's criterion for site percolation [21], with the fraction $1 - r_c$ of nodes surviving random removal. We find that Erdős-Rényi (ER) networks are more color-avoiding connected than scale-free networks of equal average degree, the opposite of the results for resilience to random failures [17,22,45]. This follows from the fact that if the average degree is conserved while the degree distribution is widened, there is a larger proportion of very low degree nodes, and we see the same effect in the two-core envelopes (compare the panels of Fig. 3).

We now turn to the critical behavior of $S_{\text{color}}$ in Erdős-Rényi networks with $C$ uniformly distributed colors. Similar to standard percolation, we find that the size of the largest color-avoiding connected component $S_{\text{color}}$ undergoes a phase transition at a specific $\bar k_{\text{crit}}$, which is now determined by the number of colors (see Fig. 3). For $\bar k < \bar k_{\text{crit}}$, color-avoiding connectivity is confined to clusters of finite size (with vanishing relative size in the limit of large $N$), and for $\bar k > \bar k_{\text{crit}}$, we have a largest color-avoiding connected component of relative size $S_{\text{color}}$, which scales with system size. We find that the value of $\bar k_{\text{crit}}$ decreases as $C$ increases and approaches the standard percolation threshold as $C \to \infty$. Since color-avoiding connectivity requires that the giant component not be destroyed after the removal of any single color, we require that $\bar k^{\text{ER}}_{\text{crit}} = \bar k_{\text{crit}}\,[(C-1)/C]$, where $\bar k^{\text{ER}}_{\text{crit}} = 1$ is the percolation threshold for ER graphs and $(C-1)/C$ is the fraction of links remaining after the removal of a fraction $1/C$ of the nodes. Therefore, $\bar k_{\text{crit}} = C/(C-1)$.
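For concreteness, the threshold formula can be evaluated for a few values of $C$. The numbers below follow directly from $\bar k_{\text{crit}} = C/(C-1)$ and involve no further assumptions.

```latex
\begin{align*}
C = 2:&\quad \bar k_{\mathrm{crit}} = 2, \\
C = 3:&\quad \bar k_{\mathrm{crit}} = 3/2 = 1.5, \\
C = 10:&\quad \bar k_{\mathrm{crit}} = 10/9 \approx 1.11, \\
C \to \infty:&\quad \bar k_{\mathrm{crit}} \to 1 .
\end{align*}
```

The threshold therefore drops quickly with the number of colors: Going from two to ten colors already recovers most of the gap to the standard percolation threshold.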
To discuss the scaling and critical exponents, we return to the definition of $P_{\vec\kappa}$, Eq. (1). We consider the region just above $\bar k_{\text{crit}}$ by defining $\varepsilon = 1 - U_{\bar 1}$. We analyze the behavior of $P_{\vec\kappa}$ for small $\varepsilon$ by expanding $1 - U_{\bar 1}^{\,k'-\kappa_c} \approx (k' - \kappa_c)\,\varepsilon$. Plugging this approximation into Eqs. (1) and (3), we obtain Eq. (7), in which $S_{\text{color}} \propto \varepsilon^{C}$, corresponding to a critical exponent $\beta = C$. We confirm the value of $\bar k_{\text{crit}}$ and the scaling of $S_{\text{color}}$ numerically in Fig. 3(c) for $C = 3$ colors. As $C \to \infty$, we need to resolve the seeming contradiction of a divergent critical exponent $\beta = C$ and convergence towards $S_{\text{color},\infty}$ as it appears in Eq. (6). For ER networks, we show [Eq. (8)] that $S_{\text{color},\infty}$ scales with the exponent $\beta = 2$. The reason that we do not observe $\beta \to \infty$ is that the approximation used to obtain Eq. (7) is only valid in a critical region defined as $(\bar k - \bar k_{\text{crit}}) \ll 1/C$. As $C \to \infty$, $S_{\text{color}}$ increases with the high exponent $\beta = C$. However, the shrinking critical region overpowers the diverging critical exponent, and $S_{\text{color}} \sim 0$ takes on unobservably small values and crosses over to $\beta = 2$ scaling outside the critical region.

As mentioned above, in color-avoiding percolation, there are transmission nodes that are needed to maintain connectivity of the largest CAC component but are themselves excluded from it. Near criticality, the number of these nodes can be much larger than the number of nodes they are connecting. We test this by performing numerical simulations for Poisson graphs with $C = 3$ colors and $r_c = 1/3$ for all colors. We take networks of size $N = 5 \times 10^{6}$ and connectivity $\bar k = \bar k_{\text{crit}} + 0.02 = 1.52$. All nodes in $\bigcup_c L_{\bar c}$ might be needed for other nodes in order to connect. Averaging over 50 network realizations, we find, for the relative size of this component, $|\bigcup_c L_{\bar c}|/N = 5 \times 10^{-2}$, while $S_{\text{color}} = 7 \times 10^{-5}$ is more than 2 orders of magnitude smaller. We note that transmission nodes are not present in other percolation concepts such as $k$-core percolation [31] or percolation in multiplex and interdependent networks [36-42,46], though a similar feature arises in $k$-connectivity [33].

A. Generalization of the theory
While we presented closed-form results only for homogeneous color frequencies [see Eq. (4)], Eq. (3) holds for any set of color frequencies $r_c$. For heterogeneous color frequencies $r_c$, Eq. (3) can be evaluated directly. Because of the sum on $k$, this becomes cumbersome when $k$ is large. For broad degree distributions $p_k$, the full summation over $\vec\kappa$ can be replaced with a sampling method. The critical behavior can be determined for heterogeneous color distributions $r_c$, as demonstrated above, using an expansion in Eq. (1).
An important generalization of color-avoiding percolation is to allow for trusted colors. In particular, in many scenarios, it is reasonable to trust the colors of the sender and receiver nodes, as they cannot be avoided anyhow. For this, we define $A$ as the set of colors that have to be avoided. This includes all colors $1, \ldots, C$, except for trusted colors. We assume $A$ not to be empty. For the algorithm, we have $L_{\text{color}} = \bigcap_{c \in A} L_{\bar c}^{+}$. For analytical results, we replace Eq. (1) with $P_{\vec\kappa} = \prod_{c \in A} [1 - U_{\bar c}^{\,k'-\kappa_c}]$ and use it together with Eqs. (2) and (3). $S_{\text{color}}$ can be compared to the standard percolation result if only one color has to be avoided and all other colors are trusted, i.e., in the limit $A = \{c\}$. We have $S_{\text{color}} = 1 - g_0(u_{\bar c})$, while the standard percolation result reads $(1 - r_c)[1 - g_0(u_{\bar c})]$. $S_{\text{color}}$ is larger than standard percolation in this case because nodes of the avoided color $c$ are formally allowed as sender or receiver nodes. However, restricting to nodes of colors other than the avoided color $c$, standard percolation is recovered. Notice that with trusted colors, results for $S_{\text{color}}$ can exceed the two-core. This is because a single link connecting to a node with a trusted color can be enough to avoid all colors in $A$.
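The algorithmic side of this generalization amounts to restricting the intersection to the colors in $A$. The following sketch illustrates this under the assumption that the set of avoided colors is global (the same for all node pairs); the function names and the example graph are hypothetical and chosen only for illustration.

```python
from collections import deque


def largest_component(adj, allowed):
    """Largest connected component of the subgraph induced by `allowed`."""
    seen, best = set(), set()
    for start in adj:
        if start not in allowed or start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w in allowed and w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        if len(comp) > len(best):
            best = comp
    return best


def cac_component_with_trust(adj, color, avoid):
    """Intersect L_c^+ only over the colors in `avoid` (the set A)."""
    if not avoid:
        raise ValueError("the set of avoided colors must not be empty")
    result = set(adj)
    for c in avoid:
        allowed = {v for v in adj if color[v] != c}
        core = largest_component(adj, allowed)
        dangling = {v for v in adj
                    if color[v] == c and any(w in core for w in adj[v])}
        result &= core | dangling
    return result


# Hypothetical example: a 4-clique (nodes 0-3) plus two pendant nodes.
adj = {0: {1, 2, 3, 4}, 1: {0, 2, 3, 5}, 2: {0, 1, 3},
       3: {0, 1, 2}, 4: {0}, 5: {1}}
color = {0: "r", 1: "g", 2: "b", 3: "r", 4: "g", 5: "r"}

print(sorted(cac_component_with_trust(adj, color, {"r", "g", "b"})))
print(sorted(cac_component_with_trust(adj, color, {"g", "b"})))   # color "r" trusted
```

When color "r" is trusted, the pendant node 4, whose only neighbor is an "r" node, joins the component. This is a small instance of a node outside the two-core becoming securely connectable through a single link to a trusted color.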

B. Comparison with other percolation types
Color-avoiding percolation includes two properties of heterogeneous complex networks: the topological connectivity and the metadata of node coloring, which we combine with a multipath concept. Our theory for random graph ensembles treats these two properties in two distinct steps, using conditional probabilities. Probabilities for colors to be avoidable are conditioned on the overall connectivity (via the giant component), which is treated separately. The comparison with numerical results in Fig. 3 demonstrates that this procedure works: Topology and coloring can indeed be treated separately, and conditional probabilities for avoiding colors are independent even when requiring multiple color-avoiding paths. Furthermore, the interaction of topology and color distribution implies a rich critical behavior, with critical values and critical exponents depending both on the topology and on the color distribution. To our knowledge, the only reported use of colorings in percolation is for polychromatic percolation [47,48], which was developed to model polymerization via simultaneous nonoverlapping largest components in colored lattices. Polychromatic percolation does not consider effects of network topology or multiple paths, and it is designed to answer different questions compared to our study, such as conductivity in heterogeneous lattices.
Multiplex networks are characterized by several layers of links between the same nodes. In this context, there are two main percolation approaches: one that assumes that connections can move from layer to layer via the shared nodes, and one that treats the connectivity of each layer separately and requires the connectedness of the node in all of the layers. The first approach is associated with interconnected networks and is well suited to model multimodal transportation networks, while the second approach is associated with interdependent networks and is better suited to model critical infrastructure. The first approach determines overall connectivity as the union of the giant components in each layer, while the second approach uses the intersection of giant components, with the additional requirement that the reduced set of nodes continue to define a giant component in each layer (this gives rise to the iterative solution and cascade process).
Our percolation process can be understood as a transformation from a single-layer node-colored network with $C$ colors to a $C$-layered multiplex noncolored network. Each color defines a layer through its removal: After removing all the nodes in the network of color $c$, we determine the largest connected component and add to it nodes of color $c$ that are directly linked to it. This set $L_{\bar c}^{+}$ of nodes and links constitutes the layer corresponding to color $c$. Notice that layers for two different avoidable colors $c$ and $c'$ can have many links in common, which connect nodes of colors other than $c$ or $c'$. Such overlapping links have strong implications for multiplex connectivity [46,49].
Even after this transformation, the percolation process is distinct from other multilayer percolation processes. After obtaining layers for all colors, we take the intersection of the giant components of all the layers. This collapsed giant component corresponds to the maximal set of nodes that are color-avoiding connected. This is not the same as the mutual giant connected component of interdependent networks because we do not require that the nodes in the intersection of the giant components remain connected in each layer. Allowing the final set of nodes to not be directly connected to one another in each layer leads to two qualitative differences: There are no cascades because the giant component is not recalculated repeatedly [50,51], and there are transmission nodes that are required for color-avoiding connectivity but are not color-avoiding connected themselves. This result is not merely a technical difference; it reflects a qualitative difference in the reality that we are modeling: Instead of looking at the set of nodes that remain in the giant components of each layer (as in interdependent networks) or that can be connected via paths that bridge the layers (in interconnected networks), color-avoiding connectivity looks for the pairs of nodes that have a path in every layer, even if the nodes that compose the path do not share this property. To our knowledge, this kind of analysis has never been performed in a single-layer or a multilayer context. The nearest analog is in studies of $k$-components [33], where connectivity is limited to nodes that are connectable via $k$ independent paths.

IV. APPLICATIONS AND DISCUSSION
One immediate application of our framework is to secure communication in a network where every node is controlled by some entity and thereby subject to eavesdropping. Assuming $C$ node owners, each of which eavesdrops on its routers' traffic, we can securely communicate if messages are split with a secret-sharing protocol [4,5,52] and transmitted along multiple color-avoiding paths [53,54]. The nodes that can take advantage of this method are exactly the elements of the largest CAC component ($L_{\text{color}}$).
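The message-splitting step can be illustrated with the simplest all-or-nothing scheme, in which a message is split into $n$ shares that must all be combined to recover it. The sketch below uses XOR-based splitting, which is a swapped-in technique chosen for brevity and not necessarily the protocol of Refs. [4,5,52]; the function names are hypothetical. Each share would be sent along a different color-avoiding path, so that no single color sees every share.

```python
import secrets


def split_secret(message, n):
    """Split `message` (bytes) into n shares; all n are needed to recover it."""
    if n < 2:
        raise ValueError("need at least two shares")
    # n - 1 shares are uniformly random ...
    shares = [secrets.token_bytes(len(message)) for _ in range(n - 1)]
    # ... and the last share is the message XORed with all of them.
    last = bytearray(message)
    for share in shares:
        for i, b in enumerate(share):
            last[i] ^= b
    return shares + [bytes(last)]


def combine_shares(shares):
    """XOR all shares together to recover the original message."""
    out = bytearray(len(shares[0]))
    for share in shares:
        for i, b in enumerate(share):
            out[i] ^= b
    return bytes(out)


shares = split_secret(b"meet at dawn", 3)      # e.g., one share per avoided color
print(combine_shares(shares))
```

Any proper subset of the shares is statistically independent of the message, which is why an eavesdropper who is avoided by at least one path learns nothing. Threshold schemes, which tolerate the loss of some shares, would be the natural choice when robustness to failures matters in addition to secrecy.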
To study the hidden CAC structure of the Internet, we use a symmetrized version of the AS-level Internet prepared by Ref. [34], which was generated using data from the CAIDA project [35] up to December 2013.We then color every router according to the country to which the router is registered, reflecting the assumption that every country is eavesdropping on its traffic but that no countries share information [Fig.4(a)].If two countries do share information, we would simply recolor the collaborating node colors to have the same color.For simplicity, we confine our analysis to the case where no countries share information.Using the algorithm for finding the largest CAC component, we can determine which nodes are securely connectable and which are not [Fig.4(b)].
We find that 26 228 out of 49 743 (≈52.73%) of the routers are in the largest CAC component and that this accounts for the vast majority of CAC connected nodes [Fig.2(c)].However, we also find that these results vary greatly from country to country.For instance, only 25% of the routers registered to the USA are in the largest CAC component compared to 89% of routers registered to Russia [Fig.4(c)].This is partly because of the density of routers in the USA, which is much higher than Russia, and indicates that US eavesdroppers have far greater capacity to intercept communication than their Russian counterparts.
Since the analytic calculations we presented in Sec.III assumed that there was no correlation between the topology and the node coloring, it is of interest to understand how the deviations from that assumption affect color-avoiding connectivity in a real-world network.Details for the countries with the most nodes, together with a comparison with theoretical results, are presented in Fig. 5(a).For theoretical results, the degree distribution was used as measured from the network, and the nodes were given uniform color frequencies, where the frequencies r c are determined from the shares of AS of each color.We find three main reasons for the breakdown of color connectivity in the real-world network compared to the equivalent random system: (i) Their own country is not avoidable (USA, Brazil, Poland, Sweden), (ii) USA is not avoidable (Australia, Canada, Korea, India), or (iii) the nodes are under-represented in the two-core largest component (Netherlands).
We further consider the possibility that a pair of nodes trust their own colors in Figs. 5(b) and 5(c). We find that for type (i) countries, trusting their own color substantially improves their connectivity, while for type (ii) countries, it only helps when communicating with the USA; for type (iii) countries, it does not make a substantial difference at all.
A similar analysis can be made of technological communication networks, where the nodes are colored by software version, or of human communication networks, such as spy networks with agents operating under different flags.
If classes of nodes are vulnerable to failure (as opposed to eavesdropping), avoiding them with multiple paths as determined by color-avoiding percolation leads to optimal redundancy: each backup path is optimized for the failure affecting a different node class. When percolation is used to describe spreading phenomena or robustness in networks, it is commonly assumed that the nodes fail at random or preferentially according to a topological property like degree or betweenness [21,22]. However, in many systems, certain sets of nodes are liable to fail all at once: pipe networks with above-ground water pipes freezing over all at once, smart grids where multiple power stations depend on one communication station [54,55], logistical networks where one contractor is responsible for several connections, metabolic networks with different metabolic pathways [56,57] depending on particular biochemicals, and any other case where a specific set of nodes is liable to fail at once [58,59]. Color-avoiding percolation provides a general and extensible tool for analyzing effective robustness and optimal redundancy in any of these realistic scenarios.
In economics and operations research, it has been argued that global market competition today is essentially between supply chains instead of between individual companies [60,61]. Network studies about risks to supply chains have shown that supply chains are often highly vulnerable, in general [62,63]. With the exception of case studies [64], risks associated with correlated failures due to geographical proximity [65][66][67] or shared ownership have not been studied. However, the effect of correlated failures can cause serious supply-chain disruptions, such as those that occurred with the hard-drive industry following the 2011 Thailand floods. Color-avoiding percolation can be directly used to compute the portion of the world trade network between entities that is CAC secure with respect to correlated failures. Calculating the fraction of a supply chain that is in the giant CAC component provides an estimate of the resilience of different economic sectors to correlated failures.

FIG. 4. Color-avoiding connectivity of the AS-level Internet. (a) Here, we show the routers of the AS-level Internet in the Iberian peninsula as a disjointly vulnerable network, with the colors determined by the country to which the router is registered. (b) The green nodes are members of this set, while the red are not. This means that these routers can take advantage of multiple paths to maintain security, as described in the main text. Panel (c) shows the number of CAC routers (nodes in $L_{\mathrm{color}}$) compared to the total number of routers for the top 20 countries worldwide, in terms of total number of AS routers registered to that country in $L_{\mathrm{color}}$. Data for the USA have been truncated for visibility; the total number of AS routers is 17 690. We use a symmetrized version of the network of Ref. [34], which was generated using data from the CAIDA project [35] up to December 2013.
Another application that fits neither the rubric of correlated failures nor that of security against eavesdroppers is in determining the maximal infectable set of individuals in a multistrain epidemic [68]. Coloring nodes by strain immunity, color-avoiding percolation can be used to evaluate the population's susceptibility to a multistrain infection. For example, let us consider the case in which $C$ different strain immunities are present in the population. Let us assume that we do not know a priori which of the strains will emerge next and that the probability of infection for an immunized individual is very small compared to the probability of infection of nonimmunized individuals. Using color-avoiding percolation, we can compute the largest set of infection-prone individuals without knowledge of the exact type of strain that will emerge. This component represents the lower bound of the component affected by the epidemic and could be used as a lower bound of the amount of resources needed to contain the spread of disease, whatever strain emerges.
Color-avoiding percolation can also be applied to ecology. In Ref. [69], Barter and Gross propose a model for an ecological system in which species are spread over a network of spatial patches (islands). If different species cannot survive in certain patches, we can color the patches depending on which species they do not support. The model presented here could estimate the portion of islands where it is possible to find all the species, assuming that species can occasionally visit neighboring islands even if they are uninhabitable for that species.
Vulnerability colorings in real-world networks have not been measured systematically in the past. In light of our findings that this heterogeneity of vulnerabilities can be used for improving security and robustness of complex systems, we hope that the collection of vulnerability-colored network data will be pursued in the future.
Here, we have presented the first systematic study of complex networks with vulnerabilities affecting classes of nodes and a way to maintain network robustness by utilizing multiple paths. This led us to develop color-avoiding percolation, a new type of percolation with a number of unique properties. We have shown that even a small diversity of colors can enable color-avoiding connectivity to a large fraction of nodes in a random network but that, in real-world networks, uneven distribution of vulnerabilities can undermine this effect. The framework and metrics uncover a hidden structure that underlies any complex network with classes of vulnerable nodes and can be used to devise new network design principles and protocols for improving robustness and security in real-world networks.

ACKNOWLEDGMENTS
We acknowledge financial support from the European Commission FET-Proactive project MULTIPLEX (Grant No. 317532) and the Italy-Israel NECST project. M. D. is grateful to the Azrieli Foundation for support. M. D. thanks Alan Danziger for first suggesting router software versions as a percolation problem. V. Z. acknowledges support by the H2020 CSA Twinning Project No. 692194, RBI-T-WINNING, and Croatian centers of excellence Quantix and Center of Research Excellence for Data Science and Cooperative Systems. We also express gratitude to Shlomo Havlin, Damir Vukičević, Marko Popović, Hrvoje Štefančić, and Damir Korančić for helpful comments in the preparation of this manuscript.
All authors contributed to the idea, discussion of results, and writing of the paper.S. K. and M. D. performed simulations, and S. K. developed the analytical treatment.

APPENDIX A: COLOR-AVOIDING CONNECTED COMPONENTS AND THE MAXIMALITY OF $L_{\mathrm{color}}$
As shown in Fig. 6, CAC components can have a broad variety of forms, and they can overlap. Compared to standard component analysis, this complicates the identification of CAC components. However, it is possible to show that $L_{\mathrm{color}}$ is always a CAC component (if it is not empty). For this, it has to be shown that $L_{\mathrm{color}}$ is maximal.
To show that $L_{\mathrm{color}}$ is a maximal component, in the sense that there can be no other nodes that are CAC with all the nodes in $L_{\mathrm{color}}$ and are not members of $L_{\mathrm{color}}$, we provide the following argument. First, let us assume that $L_{\mathrm{color}}$ is not maximal. That means that there exists a node $v$ in the network which is (a) CAC to every node in $L_{\mathrm{color}}$ and (b) excluded from $L_{\mathrm{color}}$. Condition (b) means that there exists a color $c_0$ for which the node $v$ is not a member of $L^{+}_{\bar c_0}$. Consequently, the node $v$ cannot connect on a path avoiding color $c_0$ to the nodes in $L_{\mathrm{color}}$, which is by construction a subset of $L^{+}_{\bar c_0}$. This contradicts (a).

FIG. 6. Variety of color-avoiding components. Color-avoiding components may overlap, as shown in diagrams (b) and (c). Color-avoiding components can assume diverse forms. In a chain (a), paths between nodes of one color exist and can be reached by connections between nodes of different colors. In diagram (b), the black node serves as an alternative path provider for the blue nodes. Graph (d) does not need any connection among nodes of the same color, but there is a massive overhead of nodes and connections to achieve color-avoiding connectivity of the blue nodes. Graph (e) shows that a clique is a color-avoiding component.

We can find analytical results for $S_{\mathrm{color}}$ for random graph ensembles with randomly distributed colors in the limit of infinite graphs. These results can be used to estimate the situation in finite quenched networks. We are able to gain a general understanding including phase transitions. This knowledge can guide our understanding of real-world networks.

Networks
We use the generalized-configuration-model graph ensemble with $N$ nodes, where each degree sequence $\{k_i\}$ occurs with probability $\prod_i p_{k_i}$, with the degree distribution $p_k$. Additionally, we assign to every node $i$ a color $c_i \in \{1, 2, \ldots, C\}$. The color sequence $\{c_i\}$ has probability $\prod_i r_{c_i}$ with the color distribution $r_c$. For a graph $G_N$ out of the graph ensemble, $L_{\mathrm{color}}$ has a certain size $N_{\mathrm{color}}(G_N)$. For the whole graph ensemble, we have to use the average value. By considering only giant contributions growing with network size, we have

$$S_{\mathrm{color}} = \lim_{N \to \infty} \sum_{G_N} P(G_N)\, \frac{N_{\mathrm{color}}(G_N)}{N}, \tag{B1}$$

where

$$P(G_N) = \omega(G_N) \prod_i p_{k_i}\, r_{c_i}$$

is the probability of having the graph $G_N$ of size $N$, including $\omega$, the probability of the connection scheme of $G_N$ as a matching of half edges.
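One member of this ensemble can be drawn by matching half edges at random and coloring nodes independently. The sketch below is illustrative only: the function name, the explicit degree-sequence input, and the decision to keep self-loops and multi-edges (which a plain half-edge matching can produce) are assumptions, not details taken from the text.

```python
import random

def colored_configuration_model(degrees, r, seed=None):
    """Draw one graph by uniform half-edge matching and assign each node a
    color index sampled independently from the color distribution r."""
    rng = random.Random(seed)
    # One stub (half edge) per unit of degree.
    stubs = [node for node, k in enumerate(degrees) for _ in range(k)]
    if len(stubs) % 2:
        raise ValueError("the degree sequence must have an even sum")
    rng.shuffle(stubs)
    # Pair consecutive stubs; self-loops and multi-edges are not removed.
    edges = list(zip(stubs[::2], stubs[1::2]))
    colors = rng.choices(range(len(r)), weights=r, k=len(degrees))
    return edges, colors

# Hypothetical degree sequence and a three-color distribution.
degrees = [2, 1, 1, 3, 1]
edges, colors = colored_configuration_model(degrees, [0.5, 0.3, 0.2], seed=1)
```

To sample from the full ensemble, the degree sequence itself would first be drawn from the degree distribution; that step is omitted here.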

Question and connection to percolation theory
For calculating $S_{\mathrm{color}}$ in the random graph ensemble, we follow the ideas of Erdős and Rényi [43] and Newman [44]. For calculating the size of the giant component, they used probabilities of connections for a single node in the graph ensemble. As we have to extend the method to a gradual procedure with conditional probabilities, it is useful to introduce the original method in detail with a shifted viewpoint.
Let us denote with $L$ the set of all nodes belonging to the largest component. In the top panels of Fig. 7, a possible situation is illustrated. The largest component contains a large part of the network, and the remaining nodes belong to smaller components. We have to calculate the size $S$ of the giant component, meaning the average relative size of $L$ in the network ensemble in the limit of infinite network size. For this calculation, we can define the average probability $u$ that a node fails to connect to $L$ over one particular link. This is illustrated in the left part of the figure. Again, the thermodynamic limit $N \to \infty$ is implied. With the definition of $u$ at hand, we can calculate $S$ in two steps. First, using a self-consistency equation, $u$ is calculated. The probability $u$ is identical to the probability that the neighbor does not connect to the giant component over any of the remaining links,

$$u = g_1(u) = \sum_k q_k u^k. \tag{B2}$$

In this equation, $g_1$ is the generating function of the excess degree distribution $q_k = (k+1)\,p_{k+1}/\bar k$. For important degree distributions, e.g., Poisson or scale-free, the equation for $u$ can only be solved numerically. The second step is an averaging over nodes with various degrees $k$. The probability to connect to the giant component over any of $k$ links is $(1 - u^k)$, meaning that not all links fail at the same time. This is illustrated in the bottom part of the figure. As a node that connects to the giant component belongs to it,

$$S = \sum_k p_k \left(1 - u^k\right) = 1 - g_0(u), \tag{B3}$$

where $g_0(z) = \sum_k p_k z^k$ is the generating function of the degree distribution.

In analogy to the procedure described above, we calculate $S_{\mathrm{color}}$ as the probability that a randomly chosen node belongs to $L_{\mathrm{color}}$. This has to be evaluated in the graph ensemble of infinite size. As we perform an averaging over nodes with various degrees $k$, the following question has to be answered: What is the probability that a node with $k$ links connects to a giant $L_{\bar c}$ for all colors $c$ at the same time? This is illustrated in Fig. 8.
On the left, the situation for a graph with colors on the nodes is illustrated. Nodes of all colors might be in the largest component. After deleting all nodes of one color $c$, the remaining largest component $L_{\bar c}$ might still contain a large part of all nodes in $L$. The condition for the node belonging to $L_{\mathrm{color}}$ is illustrated in the right part of the figure.
We use the same two steps to attack this problem, as described for calculating the giant component above. First, we provide some single link probabilities that can be used as primitives for further calculations. Second, we combine the single link probabilities to calculate $S_{\mathrm{color}}$.
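The two-step calculation of the ordinary giant component can be sketched numerically. The example assumes a Poisson degree distribution, for which both generating functions equal exp(mean_degree * (z - 1)); the function names and the plain fixed-point iteration are illustrative choices rather than the authors' code.

```python
import math

def solve_u_poisson(mean_degree, tol=1e-12, max_iter=100_000):
    """Step 1: iterate u -> g1(u) from u = 0, which converges to the
    smallest solution of the self-consistency equation u = g1(u)."""
    u = 0.0
    for _ in range(max_iter):
        u_next = math.exp(mean_degree * (u - 1.0))
        if abs(u_next - u) < tol:
            return u_next
        u = u_next
    return u

def giant_component_size(mean_degree):
    """Step 2: average 1 - u**k over the degree distribution, S = 1 - g0(u)."""
    u = solve_u_poisson(mean_degree)
    return 1.0 - math.exp(mean_degree * (u - 1.0))

S_above = giant_component_size(2.0)   # above the percolation threshold
S_below = giant_component_size(0.5)   # below the threshold, no giant component
```

Convergence of the iteration becomes slow close to the threshold at mean degree 1, so a root finder may be preferable there.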

Single link probabilities
We have already shown Eq. (B2) for calculating the probability $u$. In the case of colors on the nodes, as illustrated in the left part of Fig. 9, the colors can simply be ignored. We further need the probability to connect to a node of color $c$, which is simply $r_c$. This is illustrated in the second column of the figure. We further introduce $u_{\bar c}$, the probability that a single link does not connect to a giant $L_{\bar c}$. See the third column of the figure for an illustration. This can be calculated using percolation theory for random attack by solving

$$u_{\bar c} = r_c + (1 - r_c)\, g_1(u_{\bar c}). \tag{B4}$$

Unfortunately, $u_{\bar c}$ cannot be used directly for calculating $S_{\mathrm{color}}$. If we look at the same link, the probabilities $u_{\bar c}$ are dependent for different colors. The most obvious argument is that we always have $\prod_c (1 - u_{\bar c}) = 0$, as a link must miss at least one of the $L_{\bar c}$. Instead, we use the conditional probability $U_{\bar c}$, as illustrated with the outer right column of the figure. The precondition is that a link connects to the giant component, and the node it connects to does not have color $c$. Here, $U_{\bar c}$ is the probability that such a link fails to connect to $L_{\bar c}$. For calculating it, we use the primitives introduced so far, as illustrated in Fig. 10. Assuming independence of the probabilities $(1 - u)$ for connecting to the giant component and $(1 - r_c)$ for not connecting to a node of color $c$, the precondition of $U_{\bar c}$ can be constructed. In this way, we can construct $(1 - u_{\bar c})$ using the probability we are searching for: $(1 - u_{\bar c}) = (1 - u)(1 - r_c)(1 - U_{\bar c})$. With this result, we find

$$U_{\bar c} = 1 - \frac{1 - u_{\bar c}}{(1 - u)(1 - r_c)}. \tag{B5}$$

Notice that the additional information of the explicit color, instead of only stating that the color is not $c$, does not alter the results, as a further restriction of the colors would decrease the numerator and the denominator by the same factor.
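The single link probabilities can be evaluated numerically as follows. This is a sketch under stated assumptions: a Poisson degree distribution; the site-percolation self-consistency u_c = r_c + (1 - r_c) g1(u_c) for the color-deleted graph, which agrees with the Poisson relation quoted in the section on critical behavior; and U_c read as the probability that a qualifying link fails to reach the color-deleted giant component, as implied by the equality (1 - u_c) = (1 - u)(1 - r_c)(1 - U_c). All names, and the cutoff `eps` for the empty-precondition case, are hypothetical.

```python
import math

def fixed_point(f, x0=0.0, tol=1e-13, max_iter=100_000):
    """Iterate x -> f(x) until successive values differ by less than tol."""
    x = x0
    for _ in range(max_iter):
        x_new = f(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

def link_probabilities(mean_degree, r_c, eps=1e-9):
    """Return (u, u_c, U_c) for a Poisson graph and one color of frequency r_c."""
    g1 = lambda z: math.exp(mean_degree * (z - 1.0))
    u = fixed_point(g1)                                   # ordinary giant component
    u_c = fixed_point(lambda z: r_c + (1.0 - r_c) * g1(z))  # color c deleted
    denom = (1.0 - u) * (1.0 - r_c)
    # If the precondition can never hold, U_c is set to 1 by convention.
    U_c = 1.0 if denom < eps else 1.0 - (1.0 - u_c) / denom
    return u, u_c, U_c

u, u_c, U_c = link_probabilities(4.0, 1.0 / 3.0)
```

For mean degree 4 and three equally frequent colors, the conditional failure probability comes out small, reflecting that most links into the giant component also reach the giant component that survives the deletion of a single color.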

Averaging over link distributions
As in Eq. (B3) for the giant component, we want to get an analytical result for $S_{\mathrm{color}}$ by averaging over possible link constellations of a randomly chosen node. Let us give the whole result and then explain it step by step afterwards:

$$S_{\mathrm{color}} = \sum_{k=0}^{\infty} p_k \sum_{k'=0}^{k} B_{k,k'} \sum_{\kappa_1 + \cdots + \kappa_C = k'} M_{k',\kappa}\, P_{\kappa}, \tag{B6}$$

$$B_{k,k'} = \binom{k}{k'} (1 - u)^{k'} u^{k - k'}, \qquad M_{k',\kappa} = \frac{k'!}{\kappa_1! \cdots \kappa_C!}\, r_1^{\kappa_1} \cdots r_C^{\kappa_C}, \qquad P_{\kappa} = \prod_{c=1}^{C} \left[ 1 - (U_{\bar c})^{k' - \kappa_c} \right].$$

The formulas include the single link probabilities $r_c$, $u$ [Eq. (B2)], and $U_{\bar c}$ [Eq. (B5) with (B4)]. An illustration of the procedure can be seen in Fig. 11. Here, $B_{k,k'}$ is the binomial probability that, out of the $k$ links, $k'$ links connect to the giant component. Note that $M_{k',\kappa}$ gives the multinomial probability for a certain color distribution among the $k'$ links connecting to the giant component. We assume that this second step is independent of the first step, which is confirmed with the final results. The numbers $\kappa_c$ count the links that connect to a node of color $c$ in the giant component. Finally, $P_{\kappa}$ gives the joint probability that for the color distribution given by $\kappa$, all giant $L_{\bar c}$ are connected at the same time. There is at least one link connecting to $L_{\bar c}$ with probability $1 - (U_{\bar c})^{k' - \kappa_c}$. If this is successful, the randomly chosen node belongs to $L^{+}_{\bar c}$. The success probabilities for different colors have to be multiplied, as all $L_{\bar c}$ have to be simultaneously reached.

FIG. 9. Probabilities for a single link to connect to different parts of the network. We use these probabilities as primitives to calculate the probability for many links. While $u$, $r_c$, and $u_{\bar c}$ can be calculated with standard methods invented for the configuration model, the conditional probability $U_{\bar c}$ can be calculated as a combination of the others.

FIG. 10. Calculation of $U_{\bar g}$ using the equality $(1 - u)(1 - r_g)(1 - U_{\bar g}) = 1 - u_{\bar g}$. For this calculation, we have assumed independence of the qualities of the link under consideration, especially of the color it connects to and if it connects to the giant component.
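The averaging described above can be carried out by brute force for small numbers of colors. The sketch below assumes a Poisson degree distribution truncated at `k_max`, a mean degree for which every color-deleted graph still has a giant component, and the single link probabilities obtained from the site-percolation self-consistency u_c = r_c + (1 - r_c) g1(u_c). The function names and the truncation are illustrative assumptions.

```python
import math

def fixed_point(f, x0=0.0, tol=1e-13, max_iter=100_000):
    """Iterate x -> f(x) until successive values differ by less than tol."""
    x = x0
    for _ in range(max_iter):
        x_new = f(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

def compositions(total, parts):
    """Yield all tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def sigma(k_prime, r, U):
    """Multinomial average over the colors reached by k' links of the joint
    probability prod_c (1 - U_c ** (k' - kappa_c))."""
    total = 0.0
    for kappa in compositions(k_prime, len(r)):
        weight = math.factorial(k_prime)
        joint = 1.0
        for kc, rc, Uc in zip(kappa, r, U):
            weight = weight / math.factorial(kc) * rc ** kc
            joint *= 1.0 - Uc ** (k_prime - kc)
        total += weight * joint
    return total

def s_color_poisson(mean_degree, r, k_max=60):
    """Return (S_color, u, list of u_c) by summing over k, k' and kappa."""
    g1 = lambda z: math.exp(mean_degree * (z - 1.0))
    u = fixed_point(g1)
    u_bar = [fixed_point(lambda z, rc=rc: rc + (1.0 - rc) * g1(z)) for rc in r]
    U = [1.0 - (1.0 - ub) / ((1.0 - u) * (1.0 - rc)) for ub, rc in zip(u_bar, r)]
    sigmas = [sigma(kp, r, U) for kp in range(k_max + 1)]
    total = 0.0
    for k in range(k_max + 1):
        p_k = math.exp(-mean_degree) * mean_degree ** k / math.factorial(k)
        for kp in range(k + 1):
            b = math.comb(k, kp) * (1.0 - u) ** kp * u ** (k - kp)
            total += p_k * b * sigmas[kp]
    return total, u, u_bar

S_two, u_two, u_bar_two = s_color_poisson(3.0, [0.4, 0.6])
```

The number of color distributions grows quickly with the number of colors, which is why the text resorts to closed forms or to sampling of the values of kappa in larger cases.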

Closed-form solutions
We now calculate closed-form solutions for $S_{\mathrm{color}}$ for special cases. This calculation is done to demonstrate how the extensive summations over $k'$, $k$, and $\kappa$ can be performed analytically. In cases where this is not possible, a sampling of values $\kappa$ has to be performed. The results can be tested against the analytically tractable situations and by comparing with numerical results. The closed-form solutions presented here were used to calculate analytical results for the main article as well.
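For evaluating Eq. (B6) with two colors, we first rewrite the sum over the color distributions of the $k'$ links,

$$\sum_{\kappa_1 = 0}^{k'} \binom{k'}{\kappa_1} r_1^{\kappa_1} r_2^{k' - \kappa_1} \left[ 1 - (U_{\bar 1})^{k' - \kappa_1} \right] \left[ 1 - (U_{\bar 2})^{\kappa_1} \right] = 1 - (r_1 + r_2 U_{\bar 1})^{k'} - (r_1 U_{\bar 2} + r_2)^{k'} + (r_1 U_{\bar 2} + r_2 U_{\bar 1})^{k'}.$$

In the last step, the binomial formula was used backward. We can use this procedure once more, and with Eq. (B5) and $r_1 + r_2 = 1$, we find

$$S_{\mathrm{color}} = \sum_k p_k \left[ 1 - u_{\bar 1}^{k} - u_{\bar 2}^{k} + (u_{\bar 1} + u_{\bar 2} - 1)^{k} \right] = 1 - g_0(u_{\bar 1}) - g_0(u_{\bar 2}) + g_0(u_{\bar 1} + u_{\bar 2} - 1).$$

FIG. 11. For calculating the probability of a node with $k$ links to belong to $L_{\mathrm{color}}$, we have to average over different link constellations that this node might show. First, $B_{k,k'}$ is the probability that, out of the $k$ links, $k'$ connect to the giant component. It is calculated using $u$ (compare the left side of Fig. 9). Second, $M_{k',\kappa}$ gives the probability for a certain color distribution among the links. It is calculated using $r_g$, etc. (compare Fig. 9, second from left). We assume that this second step is independent of the first step, which is confirmed with the final results. Third, $P_{\kappa}$ gives the joint probability that for this color distribution, $L_{\bar r}$, $L_{\bar b}$, and $L_{\bar g}$ are connected at the same time. This is calculated using $U_{\bar r}$, etc. (compare the right side of Fig. 9).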
This result holds for any degree distribution and color distribution. Notice that $r_c \le u_{\bar c} \le 1$. The result for two colors only depends on the probabilities $u_{\bar c}$, while conditional probabilities such as $U_{\bar c}$ were eliminated. This was possible as $L_{\bar 1}$ and $L_{\bar 2}$ are not overlapping for two colors.
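For Poisson graphs, we find, with the according generating function,

$$g_0(z) = g_1(z) = e^{\bar k (z - 1)}. \tag{C9}$$

For more than two colors, the $L_{\bar c}$ overlap. For homogeneous color distributions $r_c = 1/C$, where all $U_{\bar c}$ are equal to $U_{\bar 1}$, a closed-form solution can be found in the same way as for two colors with the binomial formula. We find

$$S_{\mathrm{color}} = 1 + \sum_{j=1}^{C} (-1)^j \binom{C}{j}\, g_0\!\left( u + (1 - u)\, U_{\bar 1}^{\,j-1} \left[ \frac{j}{C} + \left( 1 - \frac{j}{C} \right) U_{\bar 1} \right] \right).$$

Let us finally discuss the behavior for $C \to \infty$. This can be done by utilizing the term $\sigma_{k'} = \sum_{\kappa} M_{k',\kappa} P_{\kappa}$, the probability that a node connecting over $k'$ links to the giant component belongs to $L_{\mathrm{color}}$. As can be seen with Eq. (B9), $\sigma_0 = \sigma_1 = 0$. On the other hand, with $r_c \to 0$, Eq. (B4) converges to Eq. (B2), and therefore $U_{\bar 1} \to 0$. This means that $\sigma_{k' > 1} \to 1$. We finally find, with Eq. (B6), that

$$S_{\mathrm{color},\infty} = \sum_k p_k \sum_{k'=2}^{k} B_{k,k'} = 1 - g_0(u) - (1 - u) \left. \frac{d g_0(z)}{dz} \right|_{z = u}.$$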
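The limit of infinitely many colors can be checked numerically. The sketch below assumes a Poisson degree distribution and evaluates 1 - g0(u) - (1 - u) g0'(u), the expression that the caption of Fig. 3 identifies with the giant component in two-core percolation. It then compares this with the direct statement that at least two of a node's links must reach the giant component. Function names and the truncation `k_max` are hypothetical.

```python
import math

def s_color_infinity_poisson(mean_degree, tol=1e-13, max_iter=100_000):
    """Return (S_color_infinity, u) for a Poisson graph."""
    u = 0.0
    for _ in range(max_iter):  # fixed-point iteration for u = g1(u)
        u_next = math.exp(mean_degree * (u - 1.0))
        converged = abs(u_next - u) < tol
        u = u_next
        if converged:
            break
    g0 = math.exp(mean_degree * (u - 1.0))  # g0(u)
    dg0 = mean_degree * g0                  # derivative of g0 at z = u
    return 1.0 - g0 - (1.0 - u) * dg0, u

def at_least_two_links(mean_degree, u, k_max=80):
    """Direct average of P(at least two of k links reach the giant component)."""
    total = 0.0
    for k in range(k_max + 1):
        p_k = math.exp(-mean_degree) * mean_degree ** k / math.factorial(k)
        none = u ** k
        exactly_one = k * u ** (k - 1) * (1.0 - u) if k >= 1 else 0.0
        total += p_k * (1.0 - none - exactly_one)
    return total

S_inf, u_inf = s_color_infinity_poisson(2.5)
S_direct = at_least_two_links(2.5, u_inf)
```

The limiting value stays below the ordinary giant component size 1 - u, since nodes attached to the giant component by a single link can never be color-avoiding connected.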

Critical behavior for Poisson graphs
With Eq. (B6), vanishing $\sigma_{k'}$ causes $S_{\mathrm{color}} = 0$. This is the case if $U_{\bar c} = 1$ for any color $c$. With Eq. (B5), we find that $U_{\bar c} = 1$ whenever $u_{\bar c} = 1$. Examining Eq. (B4) for $u_{\bar c}$, we can relate it to site percolation (random removal of nodes). For Poisson graphs, we have $r_{\mathrm{crit}} = (\bar k - 1)/\bar k$.
With homogeneous color distribution $r_c = 1/C$, we can resolve the critical connectivity given the number of colors,

$$\bar k_{\mathrm{crit}} = \frac{C}{C - 1}.$$

The normal giant component size $S$ shows a special critical behavior just above the transition point; it scales linearly with $\bar k - 1$. Here, we are interested in the behavior of $S_{\mathrm{color}}$, which is a function of $1 - u_{\bar 1}$, which itself can be related to $1 - u = S$. By inserting into Eq. (B4), it can be shown that $u_{\bar c}(\bar k) = r_c + (1 - r_c)\, u((1 - r_c)\bar k)$. For small arguments $(\bar k - \bar k_{\mathrm{crit}})$, [equation missing in source]. Inserting this into Eq. (B5), we find, using $1 - u(\bar k > 1) \approx [d(1 - u)/d\bar k]_{\bar k = 1 + 0}\,(\bar k - 1)$, [equation missing in source] if additionally $\bar k - \bar k_{\mathrm{crit}} \ll \bar k - 1$ holds ($1 - u_{\bar 1}$ small compared to $1 - u$).

FIG. 3. Size of the giant color-avoiding component $S_{\mathrm{color}}$ in random networks with uniformly distributed colors. We show the dependence of $S_{\mathrm{color}}$ on average degree $\bar k$ (a) for Erdős-Rényi networks and (b) scale-free networks with different numbers of colors. Standard-deviation error bars are shown but barely visible for networks of size $N = 10^6$. The blue lines show the corresponding analytical results. For comparison, we include the giant component size of standard percolation $S$ (black solid line) and the limiting case of a system with an infinite number of colors, $S_{\mathrm{color},\infty}$ (black dashed line). As mentioned in the text, $S_{\mathrm{color},\infty}$ is the same as the giant component in two-core percolation. (c) Critical exponent and finite-size scaling for Erdős-Rényi networks with $C = 3$. Note that in the critical region, the theory and simulations show a slope of almost exactly 3 as predicted by Eq. (8). Finite-size scaling is shown, with results of > 150 realizations per size plotted individually and averaged.

FIG. 5. Details for the AS-level Internet. (a) For comparing with theoretical results $S_{\mathrm{color}} = |L_{\mathrm{color}}|/N$ (black line), we calculate shares $|L_{\mathrm{color}} \cap \{c_i = c\}| / |\{c_i = c\}|$ for 20 countries with the most AS. Here, $\{c_i = c\}$ is the set of all AS assigned to country $c$, and $|\cdot|$ is the number of nodes in a set or component. Results are shown with black bars and compare well with theoretical results for the group of eleven countries shown to the left of the vertical dashed line. To understand why color connectivity breaks down for the other countries, we also plot the shares of two-core AS in the respective countries (white bars). The fraction $|L^{+}_{\bar c} \cap \{c_i = c\}| / |\{c_i = c\}|$ (shown with red bars) is the fraction of AS in a country $c$ which can communicate while avoiding further AS of that country $c$. The fraction $|L^{+}_{\overline{\mathrm{USA}}} \cap \{c_i = c\}| / |\{c_i = c\}|$ (shown with cyan bars) is the fraction of AS in a country $c$ which can communicate while avoiding AS registered to the USA. Color-avoiding connectivity when sender and receiver colors are trusted: (b) The fraction of nodes of row color that are in the largest CAC component when trusting all nodes of row color and column color. The value indicates the probability that a node of color $i$ can communicate with the largest CAC set of nodes of color $j$. In panel (c), we show the relative improvement compared to not trusting the sender-receiver colors. For USA nodes, this makes them 4 times more likely to be color-avoiding connected, while for most countries, it makes a less substantial difference, unless they are communicating with the USA.

$$\cdots = 1 - g_0(u) - (1 - u) \left. \frac{d g_0(z)}{dz} \right|_{z = u}.$$

$L^{+}_{\bar c}$: Union of $L_{\bar c}$ with all nodes of color $c$ being direct neighbors of nodes in $L_{\bar c}$.

FIG. 7. We base our theory on the method to calculate the size of normal giant components, as illustrated in this figure. Using a self-consistency equation, the probability $u$ can be calculated. This is the probability that a node is not connected to the giant component over a single link (see top). On the bottom, the probability for a node with $k$ links to have at least one link connecting to the giant component is illustrated. Note that $u^k$ is the probability that all links fail.

FIG. 8. We have to calculate the probability that a node with $k$ links is, for every color $c$, connected to the giant component $L_{\bar c}$ with deleted color $c$. All connections over at least one link have to exist at the same time. We illustrate this question with red ($c = r$), green ($c = g$), and blue ($c = b$). If a link connects to $L_{\bar g}$, it definitely does not connect to $L_{\bar c}$ for one of the other colors. This kind of dependence forces us to use a stepwise calculation with conditional probabilities.

… the precondition holds for an empty set of nodes. In this case, we define $U_{\bar c} = 1$. Notice that the additional information of the explicit color, instead of only stating that the color is not $c$, does not alter the results.