Robustness Assessment of Complex Networks using the Idle Network

Network robustness is an essential system property for sustaining functionality in the face of failures or targeted attacks. Currently, only the connectivity of the nodes unaffected by an attack is used to assess robustness. We propose to incorporate the properties of the emerging connectivity of the nodes affected by the attack (the Idle Network), which we show contains pertinent information about network robustness and improves the accuracy of its assessment. Idle Network information also offers the potential to generalize models, enabling them to estimate robustness under unseen attacks.

The representation of complex systems as networks, where system components are abstracted as nodes and their interactions as links, has furthered our understanding of system structure and dynamics in fields as diverse as biology, engineering, economics, and geosciences [1-9]. In particular, network theory has been instrumental in developing methodologies to assess the robustness of interconnected systems such as power grids, the Internet, and airport networks in the face of random failures or targeted attacks [10-14]. The robustness of a network can be defined as its ability to maintain functionality while undergoing an attack (sequential node removal). In a world where critical infrastructures and their connectivity are potential targets of malicious attacks, it is paramount to identify the key network properties that determine robustness for a given attack. Since the pioneering work by Albert et al. [15], a vast literature has presented methodologies and metrics to quantify network robustness [11, 12, 15-20]. However, current methodologies focus mainly on the connectivity of the nodes unaffected by the attack (Active Network), while the connectivity of the affected nodes (Idle Network) has received minimal attention [21]. In this study, we demonstrate the benefit of including information about the Idle Network in assessing network robustness. Let us formally define the Active and Idle Networks, which naturally emerge from an attack process acting on a network [21]. Attacking a network is synonymous with a process of sequential node removal. Consider an initial network $\mathcal{N}$ that consists of $N$ nodes, denoted $\{n_i\}$, $i = 1, \dots, N$, connected by a set of links $\{(n_i, n_j)\}$. The sequential node removal process starts at $t = 0$ with the original network $\mathcal{N}$ and an attack strategy $D$ that is a function of the properties of $\mathcal{N}$.
For every discrete time step $t > 0$, the attack removes a chosen node $n_i$ and all its links $(n_i, \cdot)$, resulting in a new network formed by the set of nodes and links unaffected by the attack; we call this the Active Network $\mathcal{N}_A(t)$. The attack process also gives rise to the Idle Network $\mathcal{N}_I(t)$, which consists of all nodes removed from $\mathcal{N}$ up to time $t$ and the links originally existing among them (see Fig. 1). A given attack strategy $D$ acting on $\mathcal{N}$ can thus be expressed mathematically as the decomposition of $\mathcal{N}$ into the Active $\mathcal{N}_A(t)$ and Idle $\mathcal{N}_I(t)$ networks.
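The decomposition can be made concrete with a short Python sketch. It assumes an undirected network stored as a dictionary mapping each node to the set of its neighbours, and a removal order produced by some attack strategy; the helper names `decompose` and `count_links` are hypothetical and are not taken from the authors' implementation.

```python
def decompose(adj, removed):
    """Split an undirected network into its Active and Idle networks.

    adj: dict mapping each node to the set of its neighbours in the initial network.
    removed: nodes removed by the attack up to time t (in removal order).
    Returns (active, idle), both in the same adjacency-dict format.
    """
    gone = set(removed)
    # Active Network: surviving nodes and the links among them.
    active = {u: {v for v in nbrs if v not in gone}
              for u, nbrs in adj.items() if u not in gone}
    # Idle Network: removed nodes and the links originally existing among them.
    idle = {u: {v for v in adj[u] if v in gone} for u in gone}
    return active, idle


def count_links(adj):
    """Number of undirected links in an adjacency dict."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2


# Path 0-1-2-3 with nodes 1 and 2 removed by the attack.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
active, idle = decompose(path, [1, 2])
print(active)  # {0: set(), 3: set()}
print(idle)    # {1: {2}, 2: {1}}
print(count_links(path), count_links(active), count_links(idle))  # 3 0 1
```

Of the three original links, none survives in the Active Network and only (1, 2) appears in the Idle Network; the links (0, 1) and (2, 3) belong to neither network.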

[Fig. 1 panels: Idle Network, Active Network]
With respect to the nodes, the Active and Idle networks are complementary: their node sets are disjoint, and their union is the node set of $\mathcal{N}$. This is not the case for the links, which are neither complementary nor symmetric. When a node is removed, all its links are removed from the Active Network, yet only the subset of removed links that connect two affected (removed) nodes is included in the Idle Network; links between an affected and an unaffected node belong to neither network. We argue that the connectivity of the nodes affected by an attack, which is available only in the Idle Network, provides important information on the effectiveness of the attack and, therefore, on the robustness of the attacked network. Thus, our research hypothesis can be stated as follows: the structure of the Idle Network contains non-redundant information on the robustness of a network undergoing an attack.
To test this hypothesis, we extract indicators from the Active and Idle Networks and benchmark our capacity to assess network robustness using only Active indicators (traditional approach) against that obtained when Idle indicators are also included. In particular, we choose two simple indicators to model robustness: (i) The largest cluster size $C$, defined as the number of nodes in the largest cluster (set of connected nodes) divided by the number of nodes $N$ in the initial network, quantifies the effect of the attack in breaking down (building up) the Active (Idle) network in terms of its size. Note that this metric does not capture how well connected the nodes within these networks are. (ii) The link fraction $L$ is the number of links in the Active (Idle) network normalized by the total number of links in the initial network $\mathcal{N}$. This indicator describes how the attack removes (adds) links and thus provides information about how well connected the nodes in the Active (Idle) network are. Both indicators are normalized to lie in $[0, 1]$ and are monotonically non-increasing (non-decreasing) for the Active (Idle) network as the attack proceeds. These indicators were chosen because, together, they carry information on the overall functionality of the network and therefore on its robustness.
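A minimal Python sketch of the two indicators follows, assuming the same adjacency-dict representation (node mapped to a set of neighbours) for either the Active or the Idle network. $C$ is obtained with a depth-first search over connected clusters and $L$ by counting links; `n_initial` and `m_initial` are the node and link counts of the initial network. The function names are illustrative only.

```python
def largest_cluster_size(adj, n_initial):
    """C: nodes in the largest connected cluster divided by the initial node count N."""
    seen, largest = set(), 0
    for start in adj:
        if start in seen:
            continue
        # Iterative depth-first search over one connected cluster.
        stack, size = [start], 0
        seen.add(start)
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        largest = max(largest, size)
    return largest / n_initial


def link_fraction(adj, m_initial):
    """L: links in the (Active or Idle) network divided by the initial link count."""
    return sum(len(nbrs) for nbrs in adj.values()) / 2 / m_initial


# Active and Idle networks of the path 0-1-2-3 (N = 4, 3 links) after removing nodes 1 and 2.
active = {0: set(), 3: set()}
idle = {1: {2}, 2: {1}}
print(largest_cluster_size(active, 4), link_fraction(active, 3))  # 0.25 0.0
print(largest_cluster_size(idle, 4), link_fraction(idle, 3))      # 0.5 0.3333333333333333
```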
Following previous studies [22-25], we use the efficiency $E$ as a proxy for robustness. Recall that the efficiency of a network with $N$ nodes is defined as the normalized sum of the reciprocals of the shortest-path lengths $d_{i,j}$ between all pairs of nodes $i$ and $j$:

$$E = \frac{1}{N(N-1)} \sum_{i \neq j} \frac{1}{d_{i,j}}.$$

Note that if two nodes $i$ and $j$ are disconnected, then $1/d_{i,j} = 0$, since the distance between them is infinite. $E$ is normalized to start at 1 by dividing all values of $E$ along a single attack evolution by the efficiency of the intact network. We underline that $E$ is a property of the Active Network, as it is solely a function of the adjacency matrix of the Active Network.
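The normalized efficiency curve can be sketched in Python as follows, assuming unweighted, undirected links (so breadth-first search yields shortest-path lengths) and the adjacency-dict representation used above. We assume $N$ in the prefactor is the node count of the initial network; with this choice the prefactor cancels when the curve is divided by the intact-network efficiency. The helper names are hypothetical.

```python
from collections import deque


def sum_inverse_distances(adj):
    """Sum of 1/d_ij over all ordered pairs i != j; disconnected pairs contribute 0."""
    total = 0.0
    for source in adj:
        # Breadth-first search gives shortest-path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(1.0 / d for node, d in dist.items() if node != source)
    return total


def efficiency(adj, n_initial):
    """E = (1 / (N (N - 1))) * sum_{i != j} 1 / d_ij, with N the initial node count."""
    return sum_inverse_distances(adj) / (n_initial * (n_initial - 1))


def normalized_efficiency_curve(adj, removal_order):
    """E(t) / E(0) along an attack; assumes the intact network has at least one link."""
    n = len(adj)
    e0 = efficiency(adj, n)
    gone, curve = set(), [1.0]
    for node in removal_order:
        gone.add(node)
        # Efficiency is computed on the Active Network only.
        active = {u: {v for v in nbrs if v not in gone}
                  for u, nbrs in adj.items() if u not in gone}
        curve.append(efficiency(active, n) / e0)
    return curve


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(normalized_efficiency_curve(path, [1, 2, 3, 0]))
```

For the path 0-1-2-3, removing node 1 leaves only the link (2, 3), so the second value is $3/13 \approx 0.23$, and the curve drops to 0 once no links remain.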
Given the two indicators and the proxy for robustness, we transform our hypothesis into a regression problem: we evaluate the difference in estimation accuracy achieved by a neural network when only Active indicators are included in the training set and when Idle indicators are also included. More specifically, we use a feedforward artificial neural network trained by backpropagation, with 3 hidden layers of 10 neurons each, all with ReLU activation functions, optimized with respect to the squared-residual loss on the validation set. Each neural network was implemented with a dataset of 200 attack sequences, with a 3/4 train, 1/8 test, and 1/8 validation split. The output of the neural network is the estimated efficiency, our proxy for robustness. For our hypothesis to hold, the estimation accuracy must increase when the neural network is given both the Active and Idle indicators, compared to the estimation produced using the Active indicators alone.
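One possible way to set up this regression is sketched below with scikit-learn's `MLPRegressor` (three hidden layers of 10 ReLU units). Several details are our assumptions rather than the authors' stated procedure: each sample is one attack stage with feature columns $[C_A, L_A]$ (Active only) or $[C_A, L_A, C_I, L_I]$ (Active and Idle); the 3/4, 1/8, 1/8 split is made over whole attack sequences so that stages of one sequence never appear in two sets; and "optimizing the validation loss" is read as keeping the training epoch with the lowest validation SSR. The helper names are hypothetical.

```python
import copy

import numpy as np
from sklearn.neural_network import MLPRegressor


def split_sequences(n_sequences=200, seed=0):
    """Split attack-sequence indices into 3/4 train, 1/8 test, 1/8 validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_sequences)
    n_train = 3 * n_sequences // 4
    n_test = n_sequences // 8
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]


def stack(features, targets, seq_ids):
    """Concatenate the chosen sequences into stage-level samples and targets."""
    X = np.concatenate([features[s] for s in seq_ids])
    y = np.concatenate([targets[s] for s in seq_ids])
    return X, y


def train_best_on_validation(X_tr, y_tr, X_val, y_val, epochs=100, seed=0):
    """Train 3 x 10 ReLU layers; keep the epoch with the lowest validation SSR."""
    model = MLPRegressor(hidden_layer_sizes=(10, 10, 10), activation="relu",
                         random_state=seed)
    best_model, best_ssr = None, np.inf
    for _ in range(epochs):
        model.partial_fit(X_tr, y_tr)  # one pass over the training samples
        val_ssr = float(np.sum((model.predict(X_val) - y_val) ** 2))
        if val_ssr < best_ssr:
            best_model, best_ssr = copy.deepcopy(model), val_ssr
    return best_model, best_ssr
```

Comparing the Active-only and Active-plus-Idle models then amounts to calling `train_best_on_validation` twice on different feature columns and comparing their SSR on the test sequences.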
Our study investigates different stochastically generated synthetic network topologies and attack strategies to test our hypothesis systematically. Namely, we test robustness estimation for random (Erdős-Rényi [26]), scale-free (configuration model [27]), and small-world (Watts-Strogatz [1]) topologies undergoing three different attack strategies: targeted (degree), random failure, and random spreading [21]. Furthermore, each topology was explored for varying initial link densities, characterized by $\bar{k}$ (the average degree of the initial network $\mathcal{N}$). The tested link densities for all synthetic topologies are $\bar{k} \in \{3, 6, 12, 24\}$. Thus, we have explored 36 combinations of topologies, attacks, and link densities. For each combination, 200 different stochastic topologies were generated and exposed to a full attack evolution, and the indicators and efficiency were calculated at the different stages of the attack (see Supplemental Material (SM)).

Fig. 2d. Most importantly, as hypothesized, by combining the indicators of the Active and Idle networks, we obtained a more accurate estimation of network robustness (SSR = 0.01 ± 0.006) (Fig. 2e). Additionally, the increased accuracy is consistent throughout all stages of the attack (Fig. 2f). Our results for the whole data set of network topologies and attacks demonstrate systematically that combining Active indicators with Idle indicators increases the accuracy of robustness estimation by 20% to 900%, depending on the topology and attack, verifying our hypothesis (see SM).
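The three attack strategies can be sketched as functions that return a full node-removal order, again assuming the adjacency-dict representation. These are illustrative assumptions, not the definitions of [21]: the targeted attack here recomputes degrees on the Active network after each removal (ties broken by node order), random failure is a uniform random permutation, and random spreading is read hypothetically as an attack that starts at a random node and then removes a random surviving neighbour of already removed nodes, jumping to a random surviving node when no such neighbour exists. The topologies themselves could be produced with standard generators such as NetworkX's `erdos_renyi_graph`, `configuration_model`, and `watts_strogatz_graph` (not shown).

```python
import random


def targeted_degree_order(adj):
    """Repeatedly remove the highest-degree node of the current Active network."""
    active = {u: set(nbrs) for u, nbrs in adj.items()}
    order = []
    while active:
        u = max(active, key=lambda x: len(active[x]))  # ties: first in dict order
        order.append(u)
        for v in active.pop(u):
            active[v].discard(u)
    return order


def random_failure_order(adj, rng):
    """Remove nodes in a uniformly random order."""
    order = list(adj)
    rng.shuffle(order)
    return order


def random_spreading_order(adj, rng):
    """Hypothetical spreading attack: propagate from a random seed through neighbours."""
    remaining = set(adj)
    frontier = set()  # surviving nodes adjacent to at least one removed node
    order = []
    while remaining:
        pool = sorted(frontier) if frontier else sorted(remaining)
        u = rng.choice(pool)
        order.append(u)
        remaining.discard(u)
        frontier.discard(u)
        frontier.update(v for v in adj[u] if v in remaining)
    return order


star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(targeted_degree_order(star))  # [0, 1, 2, 3, 4]
print(random_spreading_order(star, random.Random(42)))
```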

[Fig. 2 axis labels: largest cluster size, link fraction]
Guided by an observation from the data set of topologies and attacks analyzed in this work (see SM), namely that "the more complex (i.e., the more variable at different scales) the efficiency curves are, the greater the improvement in robustness assessment accuracy obtained by including Idle indicators", we investigate the potential role of Idle information in distilling variability in the data set to improve network robustness estimation. To this end, we systematically explore the effect of training-set variability on the estimation of robustness. More specifically, we trained neural networks on training sets of increasing variability, obtained by combining different topologies, attacks, and link densities (including a data set consisting of all combinations), and compared the estimation accuracy when only Active indicators are considered with that obtained when both Active and Idle indicators are included. Fig. 3 shows the model outputs for the most generalized case, in which data for all three topologies, four densities, and three attacks are included in the training set. The results are clear: including Idle indicators (Fig. 3c and g) produces far better predictions than using Active indicators alone (Fig. 3a and b). When the sum of squared residuals (SSR) between the model output and the true value of robustness is computed as a function of the attack stage (Fig. 3d and h), a consistent pattern emerges: the Active and Idle indicators combined outperform the Active indicators alone during the most significant part of the attack sequence.
As expected, a general trend is also observed (see SM): the more heterogeneous the training set, the less accurate the estimation of network robustness by all three neural networks (trained with Active indicators only, Idle indicators only, and both Active and Idle indicators). However, the rates of performance deterioration differ markedly. As soon as variability is introduced into the training set, the neural network using Active indicators exclusively cannot estimate even the general trend, let alone the variability, whereas the neural network trained with both Active and Idle indicators estimates the general trend very well and captures most of the variability. This trend is consistent for all topologies tested (see Fig. 3 and SM).
Two further remarks are noteworthy from this part of the work: (i) Surprisingly often, the robustness estimations obtained by neural networks trained exclusively with Idle indicators are significantly more accurate than those produced by neural networks trained only with Active indicators, highlighting the non-redundant, relevant information content of the Idle network. (ii) In select cases, the neural network trained with all topologies, densities, and attacks estimates robustness more accurately than a neural network trained purely for a specific topology, attack, and link density, highlighting the value of Idle indicators in interpreting the overall variability in the dataset to improve estimations for specific cases. Therefore, we claim that the intrinsic information in the Idle Network, jointly with the Active indicators, allows the neural networks to navigate the variability in the training set and maintain enhanced accuracy in assessing network robustness.
FIG. 4. Illustration of the four real network topologies tested (top panels), along with the results for the Little Rock Lake Food Web. An artificial neural network for the Little Rock Lake was trained by attacking the topology with 200 full stochastic degree evolutions, and attempts to predict the response to a previously unseen attack scheme (stochastic betweenness attack). In these attack schemes, the probability of removing a specific node is proportional to its degree or betweenness centrality. The sum of squared residuals is displayed for a single evolution.

Our previous results have clearly demonstrated that the Idle network contains relevant information, useful for improving the assessment of network robustness. However, the degree of improvement varies depending on the attack and network topology. Acknowledging that the synthetic networks used lack some properties often exhibited by real-world networks (e.g., modularity), we further test the relevance of Idle network information in assessing the robustness of real networks. To do so, we simulate stochastic targeted degree attacks on real-world topologies, where the probability of removing a given node is proportional to its original degree. We also evaluate the role of Idle information in generalizing robustness estimation to an unseen attack (e.g., one based on betweenness centrality). In particular, we first train a neural network using only Active indicators resulting from 200 node removal sequences obtained by following a stochastic targeted degree attack strategy. Our results show a fairly good estimation of our proxy of robustness (see Fig. 4a, Little Rock Lake Food Web [28]). However, when this trained neural network is used to estimate the robustness of the same network topology under a stochastic targeted betweenness attack (unseen attack), the estimation fails to reproduce the evolution of the true value during the vast majority of the attack sequence (see Fig. 4c). On the other hand, if a neural network is trained with the Active and Idle indicators of the same 200 node removal sequences (stochastic targeted degree attacks), not only do we obtain better accuracy in estimating network robustness under stochastic targeted degree attacks (see Fig. 4b, Little Rock Lake Food Web), but that neural network also maintains exceptionally good accuracy in estimating network robustness for a previously unseen attack (stochastic targeted betweenness attack) over the vast majority (and most relevant part) of the attack sequence (see Fig. 4d, Little Rock Lake Food Web). These results have been tested for several real-world networks (Little Rock Lake Food Web [28], Budapest Connectome [29], and US airports [30]; see SM), corroborating our two previous findings, namely, (i) Idle network information systematically improves our capacity to estimate network robustness, and (ii) Idle information allows us to retain accuracy in network robustness estimation under scenarios of enhanced variability, both in the training set and out-of-sample (e.g., altered attack strategies).
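The stochastic attacks used on the real-world topologies can be sketched as weighted sampling without replacement: at each step, a surviving node is removed with probability proportional to its score. In this Python sketch the score is the node's original degree; for the betweenness attack, scores could instead come from a routine such as NetworkX's `betweenness_centrality` evaluated on the original network, which is our assumption since the text does not say whether scores are recomputed during the attack. The uniform fallback for zero-score nodes and the function names are also illustrative choices.

```python
import random


def original_degrees(adj):
    """Degree of every node in the initial network (adjacency dict of neighbour sets)."""
    return {u: len(nbrs) for u, nbrs in adj.items()}


def stochastic_attack_order(scores, rng):
    """Sample a full removal order without replacement, with each surviving node
    chosen with probability proportional to its (fixed, original) score."""
    remaining = list(scores)
    order = []
    while remaining:
        weights = [scores[u] for u in remaining]
        if sum(weights) > 0:
            u = rng.choices(remaining, weights=weights, k=1)[0]
        else:  # only zero-score nodes left: fall back to a uniform choice
            u = rng.choice(remaining)
        order.append(u)
        remaining.remove(u)
    return order


star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
rng = random.Random(3)
print(stochastic_attack_order(original_degrees(star), rng))
```

Because the hub of the star has weight 4 against 1 for each leaf, it is removed first in half of the realizations on average, rather than always, which is what distinguishes these stochastic attacks from the deterministic targeted attack.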
Our results indicate that the key role of Idle indicators is to partially harness the information contained in the internal variability of the training set to gain estimation power (i) in the face of variability in the training set (arising either from its intrinsic stochastic variability or from the inclusion of different topologies and attacks), and (ii) for unforeseen attacks and topological features that generate variability compatible with that observed in the training set. Thus, the Idle network information is instrumental for our model (neural network) to interpret variability and improve the robustness assessment. However, if the variability in the data set is minimal (e.g., a targeted attack on a sparse scale-free network), the gain achieved by including Idle indicators is likely to be only incremental. Furthermore, the indicators chosen in this study (size of the largest cluster and link fraction) may be poorly suited to encoding information on network robustness complementary to that encoded by the Active indicators for certain network topologies (e.g., spatial networks such as the power grid [1]); in those cases, these Idle indicators might be ineffective in enhancing network robustness assessment.
Finally, we remark that this study uses a neural network as a tool to turn our hypothesis into a regression problem. The chosen neural network architecture and type is not intended to be optimal for estimating our proxy of robustness, but to demonstrate the information content and role of the Idle network in the assessment of network robustness. For example, we anticipate that convolutional neural networks may improve the accuracy of robustness estimation. Such improvements in the accuracy of estimating efficiency would carry important practical implications, since neural networks trained on generalized data sets would offer a lightweight way to estimate network efficiency, which is otherwise computationally very demanding to calculate because it requires the shortest paths between all pairs of nodes.
Assessing network robustness accurately is essential to ensure the correct and sustained functionality of many natural and engineered systems. Our study shows that there is non-redundant and pertinent information on the robustness of a network in the so-called Idle network. Including Idle information in models to assess network robustness improves the accuracy of our estimations for a specific network topology and attack, and equips models with the capability to interpret in-sample and out-of-sample variability, preserving estimation power amid noise and unseen variability. Thus, evaluating network robustness in the light of the Idle Network constitutes a conceptual paradigm shift that could improve the quality and accuracy of its assessment and might lead to new strategies to enhance network resilience.

ACKNOWLEDGMENTS

A.T. acknowledges financial support from the NSF Earth Sciences Directorate Grant EAR-1811909. Y.M. acknowledges support by the Government of Aragón and "ERDF A way of making Europe" funds through grant E36-20R, by Ministerio de Ciencia e Innovación, Agencia Española de Investigación (MCIN/AEI/10.13039/501100011033) through grant PID2020-115800GB-I00, and by Soremartec S.A. and Soremartec Italia, Ferrero Group. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
This supplemental material provides the extended results of all the experiments performed in this work, demonstrating the validity of the central hypothesis tested, namely, that the Idle network contains pertinent and non-redundant information about network robustness that improves its assessment accuracy. Moreover, the results systematically support that including both Active and Idle indicators in a model to estimate network robustness allows the model to better assimilate the variability in the training set, reducing the decline in robustness assessment accuracy as that variability increases. Finally, our experiments using real-world network topologies shed light on the potential of models informed by both Active and Idle indicators to estimate network robustness even for attacks not included in the training set.

This document is structured as follows:
- In Section A, we show the estimation accuracy obtained by artificial neural networks with three different types of inputs: (i) only Active indicators, (ii) only Idle indicators, and (iii) both Active and Idle indicators. This triad of models is trained for each of the topologies (Scale-free, Small-World, and Random), attacks (targeted degree, random spreading, and random), and link densities (initial average degree $\bar{k}$ = 3, 6, 12, and 24). We call these specifically trained neural networks, since each model is trained on a set consisting of a single specific topology, attack, and link density.

- In Section B, we present the accuracy of the triad of neural networks (using only Active indicators, only Idle indicators, and both Active and Idle indicators) when the training data set includes larger variability, introduced by generalizing the attack strategies (targeted degree, random spreading, and random) and the network topology (Scale-free, Small-World, and Random) for a specific link density.

- Section C shows the extended results of our study with a fully generalized training set, which includes all the topologies, attacks, and densities considered in this study. The accuracy of network robustness estimation by neural networks trained on this generalized set is evaluated systematically for the different cases, depending on whether only Active, only Idle, or both types of indicators were used as input.

- In Section D, we compare the performance of the different neural networks trained with data sets of increasing variability (specifically trained, trained with generalized topologies and attacks, and trained with the fully generalized dataset). This comparison is used to examine how the variability of the training set degrades the accuracy of robustness estimation.

- In Section E, we present results complementary to those shown in Fig. 4 of the main manuscript, providing the network robustness estimation accuracy for three other real-world network topologies (Budapest Connectome, Top 500 US airports, and a power grid). These estimations were obtained with neural networks whose training and validation sets consist of stochastic degree attacks, evaluated on two different testing sets, stochastic degree attacks and stochastic betweenness attacks, to show the ability of the different neural networks to estimate robustness for previously seen (degree) and unseen (betweenness) attacks.
Note that all the neural networks used are feedforward artificial neural networks trained by backpropagation, with three hidden layers of ten neurons each, all with ReLU activation functions, optimized with respect to the squared-residual loss on the validation set. Each neural network used in the different sections was implemented with a dataset of 200 attack sequences, with a 3/4 train, 1/8 test, and 1/8 validation split. The neural network's output is the estimated efficiency, the proxy for robustness.

Section A: Specifically Trained Neural Networks
This section presents the accuracy of network robustness estimation for specifically trained neural networks, which are trained with a dataset of stochastically generated instances of a specific topology undergoing a single attack strategy at a particular link density. Table S1 displays the sum of squared residuals (SSR) for all specific neural networks applied to all 36 combinations of topology, attack, and link density.
The results in Table S1 demonstrate that neural networks informed by both Active and Idle indicators outperform those fed only Active or only Idle indicators in every single case, systematically confirming our hypothesis. Moreover, the gain from including the Idle indicators varies widely. The lowest gain, 20%, comes from the scale-free topology undergoing degree attack with $\bar{k}$ = 24, and the largest gain, around 900%, comes from the scale-free topology undergoing random attack with $\bar{k}$ = 3. The gain from including the Idle indicators increases as the variability in the dataset grows. The scale-free topology undergoing random attack has the most internal variability in the evolution of the efficiency, because node degrees follow a power-law distribution, so a few hubs dominate the connectivity. Therefore, when a series of random attacks is performed, the evolution of the efficiency differs substantially depending on when the central hubs are removed during the attack sequence.
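The SSR reported here, and the cumulative SSR shown as a function of attack stage in the rightmost columns of the figures, can be sketched for a single evolution as follows. How the per-evolution values are aggregated across test sequences for the tables is not stated in this section, so the sketch stops at one evolution; the function names and example numbers are illustrative only.

```python
from itertools import accumulate


def cumulative_ssr(true_efficiency, estimated_efficiency):
    """Running sum of squared residuals over the attack stages of one evolution."""
    if len(true_efficiency) != len(estimated_efficiency):
        raise ValueError("both curves must cover the same attack stages")
    squared = ((t - e) ** 2 for t, e in zip(true_efficiency, estimated_efficiency))
    return list(accumulate(squared))


def ssr(true_efficiency, estimated_efficiency):
    """Total SSR of one evolution: the last value of the cumulative curve."""
    curve = cumulative_ssr(true_efficiency, estimated_efficiency)
    return curve[-1] if curve else 0.0


true_e = [1.0, 0.6, 0.3, 0.0]   # illustrative normalized efficiency curve
nn_e = [1.0, 0.5, 0.35, 0.05]   # illustrative neural network estimate
print(cumulative_ssr(true_e, nn_e))
print(ssr(true_e, nn_e))
```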

Section B: Neural Network trained with generalized attack strategies and topologies for a fixed link density.
For completeness, we include the performance of neural networks trained with augmented datasets consisting of a mix of all the attack strategies (degree, spreading, and random) applied to all the topologies (Scale-Free, Small-World, and Random) with a common link density. The performance of these models is examined depending on the type of indicators used (only Active, only Idle, or both Active and Idle) when applied to estimate network robustness for each of the 36 combinations of attack, topology, and link density. The results shown in Table S2 not only further support our hypothesis but also show, when compared with those in Table S1, that including both Active and Idle indicators reduces the deterioration in the accuracy of efficiency estimation as a function of the variability introduced in the training set (for more details, see Section D).

Table S3 shows the performance of the neural networks trained with the augmented dataset consisting of all the attack strategies (degree, spreading, and random) applied to all the topologies (Scale-Free, Small-World, and Random) with all the different link densities ($\bar{k}$ = 3, 6, 12, 24). The performance of these models is examined depending on the type of indicators used (only Active, only Idle, or both Active and Idle) when applied to estimate network robustness for each of the 36 combinations of attack, topology, and link density.
The values of the sum of squared residuals (SSR) of the efficiency (proxy of robustness) estimation by the models trained with different sets of indicators, shown in Table S3, highlight the remarkable capacity of the model trained with both Active and Idle indicators to estimate robustness, particularly when compared with the model given only Active information. The outcome of this experiment confirms our hypothesis once more, demonstrating that together the Active and Idle indicators can leverage the variability in the training set to disentangle the mixture of topologies, link densities, and attacks it contains and produce fairly accurate estimations for individual combinations. Two further important remarks follow from Table S3: (i) the accuracy of the estimation of network efficiency (a property of the Active network) by the model given only Idle information is comparable to or better than that of the model trained with only Active indicators in more than 60% of the combinations, showing the particular role of Idle information in informing the estimation in the face of enhanced variability; (ii) in particular cases, this effect allows the neural network trained with Active and Idle indicators on the fully generalized dataset to perform comparably to or better than the specifically trained model (e.g., compare with Table S1 for Scale-Free, $\bar{k}$ = 3). The results in Table S3, when compared with those in Tables S1 and S2, show that including both Active and Idle indicators reduces the deterioration in the accuracy of efficiency estimation as a function of the variability introduced in the training set (for more details, see Section D).
Fig. S1 - True values (computed) of network efficiency, E, and efficiency estimations (NN estimation) obtained from neural networks for different topologies, attacks, and link density combinations. The neural networks were trained with the specific attack (Degree - D, Spreading - S, or Random - R), topology (Scale-free - SF, Small-World - SW, or Random - RD), and link density corresponding to those of the training set. In the rightmost column, the panels display the value of the cumulative SSR as a function of the attack stage for each of the cases.
Fig. S2 - True values (computed) of network efficiency, E, and efficiency estimations (NN estimation) obtained from neural networks for different topologies, attacks, and link density combinations. The neural networks were trained with the augmented dataset consisting of all attacks (Degree - D, Spreading - S, and Random - R) and all topologies (Scale-free - SF, Small-World - SW, and Random - RD) for a given link density. In the rightmost column, the panels display the value of the cumulative SSR as a function of the attack stage for each of the cases.

Section E: Neural Network for Real Complex Networks.
This section includes the extended results of the analysis for three other real-world network topologies, namely, the Budapest Connectome [29], the network of flights among the 500 busiest commercial airports in the United States in 2002 [30], and a power grid [1], to test the validity of our hypothesis on more realistic topologies. These results were also used to evaluate the relevance of Idle information for the model to assess the robustness of the three topologies when they undergo an attack not included in the training set used to fit the model. Fig. S4 displays the true values and neural network estimations of network efficiency at the different stages of the attacks. In particular, for each network topology, the top panels show the performance of the neural networks trained with the different sets of indicators (Active, Idle, and Active and Idle) when tested with data corresponding to the same attack type as that used to create the training set (stochastic degree attack). The results clearly support the central hypothesis of this work, demonstrating that providing Idle information to the model increases the accuracy of the efficiency estimation in all cases and for the most significant part of the attack sequence. The bottom panels for each network topology display the results of the neural networks trained using stochastic degree attacks but estimating the decline in efficiency as a different attack type (stochastic betweenness attack) proceeds with the node removal. For both the US airport network and the Budapest Connectome, including Idle information in the model increases its estimation power even when evaluated on a previously unseen attack.
The performance of the model trained with both Active and Idle indicators for the US airport network is particularly remarkable when compared with that obtained using either type of indicator alone. However, for the power grid topology, the robustness assessment for a stochastic betweenness attack (previously unseen attack) is worse when estimated with both Active and Idle indicators than with the Active indicators only. Interestingly, Idle indicators alone yield the most accurate robustness assessment, especially if only the most relevant part of the attack is considered. To understand why the results for the power grid differ from those for all the other network topologies used, we need to consider the particularities of the power grid, which is significantly different from the other real and synthetic networks used in this work. In particular, the power grid is a low-density spatial network with no clearly distinguishable hubs. As a stochastic degree attack proceeds on this topology, the Active largest cluster size and link fraction decline very quickly. However, the Idle indicators change almost negligibly, because the nodes removed at early stages of the attack are dispersed throughout the network (i.e., they are disconnected nodes in the Idle network). This mismatch between the intrinsic variability of the Active and Idle indicators during the most relevant part of the attack process hinders the neural network in extracting the important information in the Idle indicators, as the range of the Active indicators dominates. In other words, for low-density spatial networks, the chosen Idle indicators do not suitably encode complementary information that the neural network can use to increase estimation accuracy.
Nevertheless, we want to highlight that these results do not dispute the information content of the Idle network: recall that the neural network that uses only the Idle indicators is the most accurate. In this case, the neural network is forced to place all its weights and biases on the Idle indicators, and this results in exceedingly good predictions for both the seen and the unseen attack.

Fig. S4 - True values (computed) of network efficiency, E, and estimations (NN estimation) of efficiency obtained from neural networks for three real network topologies (US airports, Budapest connectome, power grid). The neural networks were trained only on stochastic targeted attacks for each topology, but they were tested on estimating efficiency as a function of the attack stage both for stochastic targeted attacks and for an unseen attack type, namely, the stochastic betweenness attack. In the rightmost column, the panels display the value of the cumulative SSR as a function of the attack stage for each of the cases.