Finding multiple core-periphery pairs in networks

In a network with core-periphery structure, core nodes are densely interconnected, peripheral nodes are connected to core nodes to different extents, and peripheral nodes are sparsely interconnected. Core-periphery structure composed of a single core and periphery has been identified for various networks. However, analogous to the observation that many empirical networks are composed of densely interconnected groups of nodes, i.e., communities, a network may be better regarded as a collection of multiple cores and peripheries. We propose a scalable algorithm to detect multiple non-overlapping groups of core-periphery structure in a network. We illustrate our algorithm using synthesised and empirical networks. For example, we find distinct core-periphery pairs with different political leanings in a network of political blogs, and a separation between the international and domestic subnetworks of airports within individual countries in a world-wide airport network.


Introduction
Many complex systems can be expressed as networks in which a node represents an object (e.g., person, web page, protein) and an edge represents the relationship between two objects (e.g., friendship, hyperlink, physical interaction). A network can be characterised by microscale, mesoscale and macroscale structural patterns such as the degree (i.e., the number of edges that a node has), clustering coefficient, and diameter [1,2]. Among various structural properties of networks, community structure is a representative mesoscale structure of networks [3]. A community is a group of densely interconnected nodes, while different communities are sparsely interconnected. A community often corresponds to a group of nodes sharing a role, and identifying communities aids classification of nodes and visualisation of networks [3].
Core-periphery structure is another mesoscale structure of networks, with which we view a network as being composed of two groups of nodes called the core and periphery. Although the definition varies, a core is often defined as a group of densely interconnected nodes, and a periphery as a group of nodes that are densely connected (i.e., adjacent) to the core nodes but not to other peripheral nodes [4-16]. Although a core and a community are both groups of densely interconnected nodes, they are different; a core connects densely to its periphery, whereas a community is not densely connected to the nodes outside it. Core-periphery structure has been found in various networks including brain networks [17], metabolic networks [18], protein interaction networks [19], social networks [4,9,10,15], international trade networks [6,11,20], financial networks [8,21,22] and transportation networks [5,8,9]. For example, in a coauthorship network among researchers, notable researchers often publish papers with other notable researchers, forming a core, while other researchers tend to publish papers with particular notable researchers such as those in the same research group, forming a periphery [9].
Borgatti and Everett introduced the first quantitative formulation of core-periphery structure [4]. In the discrete version of core-periphery structure [4,9], which we will focus on in the present study, Borgatti and Everett introduced an idealised core-periphery structure in which core nodes are adjacent to all other nodes and peripheral nodes are adjacent to all core nodes but not to any peripheral node. Although it is also realistic to assume that the core-periphery connectivity is sparser than the core-core connectivity [4], we use the idealised core-periphery structure in the present study. Borgatti and Everett then sought the assignment of all nodes in a given network to a core or periphery such that the network is as close as possible to an idealised core-periphery structure. Since then, many core-periphery detection algorithms have been developed [4-6, 8-11, 13-15, 18, 20]. These algorithms aim to identify a single core-periphery pair embedded in a network. However, a network may be better regarded as a collection of multiple core-periphery pairs [4,7,9,12-14], which is the focus of the present study. Previous studies in this direction have not provided a tailored scalable algorithm to this end. A study focussed on a related but different type of multiple core-periphery structure [23]. Other algorithms aim to detect multiple cores but do not assume that peripheral nodes are sparsely connected to each other [10,24,25]. A network can also have multiple disjoint cores in the form of k-cores [26], k-trusses [27] or dense subgraphs [28,29]. However, the corresponding algorithms do not tell how densely peripheral nodes are connected to each other or to which core a peripheral node belongs. Another algorithm to find various mesoscale structures of networks including multiple core-periphery pairs [12] is computationally costly and only feasible for small networks (Supplementary Note 1).
We present a scalable algorithm to detect multiple non-overlapping core-periphery pairs in networks, each of which is as close as possible to an idealised core-periphery structure. Our algorithm automatically determines the number and the size of the core-periphery pairs. Various algorithms to detect core-periphery structure in networks are classified into density-based and transport-based algorithms [8,14,18]. Density-based algorithms posit that a core is a densely connected group of nodes, whereas transport-based algorithms posit that a core is a group of nodes from which other nodes can be reached along short paths. In the present study, we focus on density-based algorithms.

Algorithm
We extend the idealised core-periphery structure introduced by Borgatti and Everett [4] to the case of multiple pairs of a core and a periphery. In the Borgatti-Everett (BE) algorithm, one considers a graph (i.e., network) composed of N nodes and M edges. Let A = (A_ij) be the adjacency matrix, i.e., A_ij = 1 if nodes i and j are adjacent, and A_ij = 0 otherwise. We assume an undirected and unweighted network without self-loops, i.e., A_ij = A_ji and A_ii = 0 for all i and j. Let x = (x_1, x_2, ..., x_N) be a vector of length N, where x_i = 0 if node i is a peripheral node, and x_i = 1 if node i is a core node. We define the idealised core-periphery structure as the network in which each core node is adjacent to all other core nodes and all peripheral nodes, and each peripheral node is adjacent to all core nodes but to no peripheral node. The corresponding adjacency matrix, B(x) = (B_ij(x)), is given by

$$ B_{ij}(x) = \begin{cases} 1 & (x_i = 1 \text{ or } x_j = 1, \text{ and } i \neq j), \\ 0 & (\text{otherwise}). \end{cases} \tag{1} $$

The discrete version of the BE algorithm, which we consider in the present study, seeks x that maximises the similarity between A and B(x). We will explain the similarity measure in Section 2.3.
We extend the idealised core-periphery structure to the case of multiple core-periphery pairs. Let C be the number of core-periphery pairs and c = (c_1, c_2, ..., c_N) be a vector of length N, where c_i ∈ {1, 2, ..., C} is the index of the core-periphery pair to which node i belongs. We exclude overlaps between core-periphery pairs, and between the core and periphery in each core-periphery pair. The corresponding adjacency matrix, B(c, x) = (B_ij(c, x)), is given by

$$ B_{ij}(c, x) = \begin{cases} \delta_{c_i, c_j} & (x_i = 1 \text{ or } x_j = 1, \text{ and } i \neq j), \\ 0 & (\text{otherwise}), \end{cases} \tag{2} $$

where δ is the Kronecker delta.
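As a small illustration of the definition above, the following sketch builds the idealised adjacency matrix for a given labelling. The function name `idealised_adjacency` and the five-node labelling are hypothetical choices made for this example, not part of the original study; setting all entries of `c` to the same value recovers the single-pair matrix B(x).

```python
def idealised_adjacency(c, x):
    """Return the idealised matrix B(c, x) as a list of lists.

    B[i][j] = 1 if i != j, nodes i and j belong to the same core-periphery
    pair (c[i] == c[j]) and at least one of them is a core node; 0 otherwise.
    """
    n = len(c)
    return [
        [int(i != j and c[i] == c[j] and (x[i] == 1 or x[j] == 1)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical labelling: nodes 0-2 form pair 1 (core: node 0),
# nodes 3-4 form pair 2 (core: node 3).
c = [1, 1, 1, 2, 2]
x = [1, 0, 0, 1, 0]
B = idealised_adjacency(c, x)
for row in B:
    print(row)
```

Peripheral nodes 1 and 2 are not adjacent to each other in B, and no edge crosses the two pairs, which is exactly the structure that the algorithm tries to match against A.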
We seek (c, x) that makes B(c, x) the closest to A by maximising

$$ Q(c, x) = \sum_{i=1}^{N} \sum_{j=1}^{i-1} A_{ij} B_{ij}(c, x) - \sum_{i=1}^{N} \sum_{j=1}^{i-1} p B_{ij}(c, x) = \sum_{i=1}^{N} \sum_{j=1}^{i-1} (A_{ij} - p)(x_i + x_j - x_i x_j)\, \delta_{c_i, c_j}, \tag{3} $$

where p = M/[N(N − 1)/2] is the density of edges in the network. The first term, \sum_{i=1}^{N} \sum_{j=1}^{i-1} A_{ij} B_{ij}(c, x), represents the number of edges that are present in both the given network and the idealised core-periphery structure. The null-model term is the expected number of edges that are present in both the idealised core-periphery structure and an Erdős-Rényi random graph [30], in which each pair of nodes is adjacent with probability p. A large Q value indicates that the given network and the idealised core-periphery structure share more edges than expected by chance.
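The quality function can be evaluated directly from its verbal definition: for every pair of nodes that the idealised structure connects, add 1 if the edge exists in the network and subtract p. The sketch below does this for a four-node star. The function name `quality` and the edge-list input format are assumptions made for this example, and the quadratic loop is meant for checking small cases rather than for large networks.

```python
def quality(n, edges, c, x):
    """Evaluate Q(c, x) directly from its definition in O(n^2) time.

    n     : number of nodes, labelled 0..n-1
    edges : list of undirected edges (i, j) without duplicates or self-loops
    c, x  : pair index and core (1) / periphery (0) indicator of each node
    """
    p = len(edges) / (n * (n - 1) / 2)   # edge density of the network
    adj = {frozenset(e) for e in edges}
    q = 0.0
    for i in range(n):
        for j in range(i):
            # B_ij(c, x) = 1: same pair and at least one core node.
            if c[i] == c[j] and (x[i] == 1 or x[j] == 1):
                q += (1 if frozenset((i, j)) in adj else 0) - p
    return q


# A star with centre 0 and three leaves; here p = 3 / 6 = 0.5.
star = [(0, 1), (0, 2), (0, 3)]
print(quality(4, star, [1, 1, 1, 1], [1, 0, 0, 0]))  # centre as the only core
print(quality(4, star, [1, 1, 1, 1], [1, 1, 1, 1]))  # every node in the core
```

With the centre as the only core, all three idealised edges are present and Q = 3 × (1 − 0.5) = 1.5; placing every node in the core adds three absent leaf-leaf pairs and lowers Q to 0.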

Maximisation of Q
We use a label switching heuristic [31,32] to maximise Q. First, we assign each node to a different core by setting (c_i, x_i) = (i, 1) (1 ≤ i ≤ N). Then, we scan all nodes in a random order. For each scanned node i, we calculate the increment in Q when we tentatively update (c_i, x_i) to the core of the core-periphery pair that a neighbour of node i, denoted by j, belongs to, i.e., (c_j, 1). We also calculate the increment in Q when we tentatively update (c_i, x_i) to (c_j, 0). Note that we examine these two cases regardless of whether x_j = 0 or x_j = 1. We carry out this procedure for all neighbours of i to measure the increment in Q in each case. Finally, we update (c_i, x_i) to the label that has yielded the largest tentative increment in Q (i.e., (c_j, 0) or (c_j, 1) for a neighbour j). If no relabelling increases Q, we do not update (c_i, x_i). When we have scanned all nodes, we stop the entire procedure if no node has changed its label in the present round. Otherwise, we draw a new random order of nodes and scan all nodes again according to the new random order.
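The increment in Q by changing node i's label from (c, x) to (c', x') ≠ (c, x) is given by

$$ \Delta Q = \tilde{d}_{i,(c',1)} + x' \tilde{d}_{i,(c',0)} - p \left( \tilde{N}_{(c',1)} + x' \tilde{N}_{(c',0)} - \delta_{c,c'} \right) - \tilde{d}_{i,(c,1)} - x \tilde{d}_{i,(c,0)} + p \left( \tilde{N}_{(c,1)} + x \tilde{N}_{(c,0)} - x \right), \tag{4} $$

where \tilde{d}_{i,(c,x)} is the number of neighbours of node i that have label (c, x), and \tilde{N}_{(c,x)} is the number of nodes with label (c, x) before the update, node i included. The terms −δ_{c,c'} and −x remove node i itself from the node counts. For each scanned node i, we calculate equation (4) at most 2d_i times, where d_i is the degree of node i. Therefore, the time complexity for scanning all nodes in one round is O(\sum_{i=1}^{N} d_i) = O(M), and that of the entire algorithm is O(rM), where r is the number of rounds. We run this algorithm 20 times starting from the same initial condition stated above and adopt the node labelling that produces the largest value of Q.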
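The procedure described above can be sketched in a few dozen lines. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the names `label_switching` and `quality`, the edge-list input, the tie-breaking rule (the first best candidate wins), the `max_rounds` cap and the use of a different random seed for each of the 20 runs are all choices made for this example. The increment in Q is computed from the terms of Q that involve the scanned node, with block sizes counted without that node.

```python
import random
from collections import defaultdict


def quality(n, edges, c, x):
    """Q(c, x) evaluated directly from its definition (O(n^2), for checking)."""
    p = len(edges) / (n * (n - 1) / 2)
    adj = {frozenset(e) for e in edges}
    return sum(
        (frozenset((i, j)) in adj) - p
        for i in range(n) for j in range(i)
        if c[i] == c[j] and (x[i] == 1 or x[j] == 1)
    )


def label_switching(n, edges, seed=0, max_rounds=1000):
    """One run of the label switching heuristic; returns (c, x, Q)."""
    rng = random.Random(seed)
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    p = len(edges) / (n * (n - 1) / 2)
    c, x = list(range(n)), [1] * n   # every node starts as the core of its own pair
    size = defaultdict(int)          # size[(c, x)]: number of nodes with label (c, x)
    for i in range(n):
        size[(c[i], x[i])] += 1
    q = 0.0                          # Q of the initial labelling is zero

    for _ in range(max_rounds):
        order = list(range(n))
        rng.shuffle(order)
        changed = False
        for i in order:
            deg = defaultdict(int)   # deg[(c, x)]: neighbours of i with label (c, x)
            for j in nbrs[i]:
                deg[(c[j], x[j])] += 1

            def contribution(ci, xi):
                # Terms of Q involving node i if it had label (ci, xi);
                # block sizes are counted without node i itself.
                n_core = size[(ci, 1)] - (c[i] == ci and x[i] == 1)
                n_peri = size[(ci, 0)] - (c[i] == ci and x[i] == 0)
                return deg[(ci, 1)] + xi * deg[(ci, 0)] - p * (n_core + xi * n_peri)

            old = contribution(c[i], x[i])
            best_gain, best_label = 1e-12, None   # accept strict improvements only
            for cj in {c[j] for j in nbrs[i]}:
                for xi in (0, 1):
                    gain = contribution(cj, xi) - old
                    if gain > best_gain:
                        best_gain, best_label = gain, (cj, xi)
            if best_label is not None:
                size[(c[i], x[i])] -= 1
                c[i], x[i] = best_label
                size[best_label] += 1
                q += best_gain
                changed = True
        if not changed:
            break
    return c, x, q


# Two five-node stars whose centres (nodes 0 and 5) are joined by one edge.
edges = [(0, k) for k in range(1, 5)] + [(5, k) for k in range(6, 10)] + [(0, 5)]
runs = [label_switching(10, edges, seed=s) for s in range(20)]
c, x, q = max(runs, key=lambda run: run[2])
print(c, x, round(q, 3))
```

Counting the neighbours of node i by label once per scan keeps the cost of each candidate label constant, which is what gives the O(M) cost per round. The result of a single run depends on the scan order, which is why the best of several runs is kept.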

Significance of the core-periphery structure
A detected core-periphery structure may be statistically insignificant [4,33]. Therefore, we adapt a statistical test in the case of a single core-periphery pair [33] to the case of multiple core-periphery pairs. We first describe the statistical test for a single core-periphery pair [33]. We measure the significance of a core-periphery pair by a quality function based on the Pearson correlation coefficient [4], which is defined by

$$ Q_{\mathrm{BE}}(x) = \frac{\sum_{i=1}^{N} \sum_{j=1}^{i-1} (A_{ij} - p)(B_{ij}(x) - p_B)}{\sqrt{\sum_{i=1}^{N} \sum_{j=1}^{i-1} (A_{ij} - p)^2}\, \sqrt{\sum_{i=1}^{N} \sum_{j=1}^{i-1} (B_{ij}(x) - p_B)^2}}, $$

where p_B = \sum_{i=1}^{N} \sum_{j=1}^{i-1} B_{ij}(x) / [N(N − 1)/2] is the density of edges in the idealised core-periphery structure. A core-periphery pair detected for the given network is deemed to be significant if Q_BE is statistically larger than the Q_BE values calculated for a null model. One uses the Erdős-Rényi random graph as the null model, in which the number of edges is the same as that of the original network. One generates many networks using the Erdős-Rényi random graph and maximises Q_BE for each network. The Kernighan-Lin (KL) algorithm [34] is used for maximising Q_BE. The core-periphery pair detected for the original network is significant at a significance level of α ∈ (0, 1] if the Q_BE value for the original network is larger than a fraction 1 − α of the Q_BE values for the randomised networks.
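Because the test statistic is a Pearson correlation coefficient between the entries of A and those of B(x) over all node pairs, it can be checked by hand on a small graph. The following sketch assumes an edge-list input and a function name `q_be` chosen for this example; returning 0 when either matrix has no variance (e.g., when no node is in the core) is a convention assumed here, since the correlation is undefined in that case.

```python
import math


def q_be(n, edges, x):
    """Pearson correlation between A and the idealised single-pair matrix B(x)."""
    adj = {frozenset(e) for e in edges}
    a, b = [], []
    for i in range(n):
        for j in range(i):
            a.append(1.0 if frozenset((i, j)) in adj else 0.0)
            b.append(1.0 if (x[i] == 1 or x[j] == 1) else 0.0)
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # correlation undefined; 0 is a convention assumed here
    return cov / math.sqrt(var_a * var_b)


star = [(0, 1), (0, 2), (0, 3)]
print(q_be(4, star, [1, 0, 0, 0]))  # centre as the only core
print(q_be(4, star, [0, 1, 0, 0]))  # a leaf as the only core
```

With the centre as the core, A and B(x) coincide and the correlation is 1. With a leaf as the core, only one of the three idealised edges exists in the star and the correlation drops to −1/3.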
In the case of multiple core-periphery pairs, we apply essentially the same statistical test to each core-periphery pair detected in the original network. For each detected core-periphery pair, we first calculate Q_BE. Second, we generate 3,000 networks using the Erdős-Rényi random graph, which have the same number of nodes and edges as those of the core-periphery pair. In counting the number of edges, we only consider the edges connecting nodes within the core-periphery pair. Third, we detect a single core-periphery pair in each randomised network by maximising Q_BE using the KL algorithm. Fourth, we compare the obtained Q_BE values between the original and randomised networks. If a core-periphery pair is judged to be insignificant, we call the corresponding nodes the residual nodes, i.e., those not belonging to any significant core-periphery pair.
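The four steps above can be sketched for a single detected pair as follows. This is only an illustration under loud assumptions: the KL algorithm is replaced by an exhaustive search over all 2^n core/periphery assignments, which is feasible only for very small subnetworks; the function names (`q_be`, `max_q_be`, `random_graph`, `fraction_below`), the default of 200 random graphs and the numerical tolerance are hypothetical choices for this example.

```python
import itertools
import math
import random


def q_be(n, edges, x):
    """Pearson correlation between A and the idealised single-pair matrix B(x)."""
    adj = {frozenset(e) for e in edges}
    pairs = [(i, j) for i in range(n) for j in range(i)]
    a = [1.0 if frozenset(pr) in adj else 0.0 for pr in pairs]
    b = [1.0 if (x[i] == 1 or x[j] == 1) else 0.0 for i, j in pairs]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b) if var_a > 0 and var_b > 0 else 0.0


def max_q_be(n, edges):
    """Stand-in for the KL algorithm: exhaustive search (tiny graphs only)."""
    return max(q_be(n, edges, x) for x in itertools.product((0, 1), repeat=n))


def random_graph(n, m, rng):
    """Erdős-Rényi random graph with n nodes and exactly m edges."""
    return rng.sample(list(itertools.combinations(range(n), 2)), m)


def fraction_below(n, edges, num_random=200, seed=0):
    """Return Q_BE of the given subnetwork and the fraction of randomised
    networks whose maximised Q_BE is smaller."""
    rng = random.Random(seed)
    q_orig = max_q_be(n, edges)
    q_rand = [max_q_be(n, random_graph(n, len(edges), rng)) for _ in range(num_random)]
    below = sum(q < q_orig - 1e-9 for q in q_rand)  # tolerance for rounding errors
    return q_orig, below / num_random


# Subnetwork induced by one detected pair: a six-node star.
star = [(0, k) for k in range(1, 6)]
q_orig, frac = fraction_below(6, star, num_random=50, seed=1)
print(q_orig, frac)
```

The pair would then be judged significant when `frac` is at least one minus the significance level used for that pair.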
If we test C core-periphery pairs at a significance level of α, the probability of making at least one false positive (i.e., an insignificant core-periphery pair is judged to be significant) is 1 − (1 − α)^C, which increases as C increases. To remedy this multiple comparison problem, we adopt the Šidák correction, with which we test each core-periphery pair at a significance level of 1 − (1 − α)^{1/C} [35]. We set α = 0.01.
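The corrected per-pair level is easy to compute and check numerically. In the sketch below, the function names `sidak_level` and `is_significant` and the toy lists of Q_BE values are hypothetical; the decision rule follows the description of the test, i.e., the original value must exceed at least a fraction 1 − (corrected level) of the randomised values.

```python
def sidak_level(alpha, num_pairs):
    """Per-test significance level under the Šidák correction."""
    return 1.0 - (1.0 - alpha) ** (1.0 / num_pairs)


def is_significant(q_original, q_random, alpha, num_pairs):
    """Compare the original Q_BE with the Q_BE values of randomised networks."""
    level = sidak_level(alpha, num_pairs)
    frac_below = sum(q < q_original for q in q_random) / len(q_random)
    return frac_below >= 1.0 - level


level = sidak_level(0.01, 10)
print(level)                        # slightly above alpha / C = 0.001
print(1.0 - (1.0 - level) ** 10)    # family-wise error rate, back to about 0.01
```

With ten pairs and α = 0.01, each pair is tested at a level close to 0.001, so with 3,000 randomised networks almost all of them must have a smaller Q_BE than the original pair.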
We have decided to use Q_BE maximised by the KL algorithm as the test statistic to compare the original and randomised networks. However, we can also use different algorithms to maximise Q_BE. We can also use a different test statistic including Q restricted to the case of the one core-periphery pair (i.e.,

Variation of Information
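For the synthetic networks with planted core-periphery structure, we measure the difference between the true core-periphery structure (c, x) and the inferred core-periphery structure (ĉ, x̂) by the variation of information (VI) [36]. The VI is given by

$$ \mathrm{VI} = - \sum_{(c,x)} \sum_{(\hat{c},\hat{x})} P(c, x; \hat{c}, \hat{x}) \left[ \log \frac{P(c, x; \hat{c}, \hat{x})}{P(c, x)} + \log \frac{P(c, x; \hat{c}, \hat{x})}{\hat{P}(\hat{c}, \hat{x})} \right], $$

where P(c, x; ĉ, x̂) is the fraction of nodes that have the true label (c, x) and inferred label (ĉ, x̂), and P(c, x) and P̂(ĉ, x̂) are the fractions of nodes that have the true label (c, x) and the inferred label (ĉ, x̂), respectively. The VI value is equal to zero if and only if the inferred core-periphery structure is the same as the true one. We measure the performance of an algorithm by averaging VI values over the 100 generated networks.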
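The VI between two labellings can be computed from label counts alone, as in the following sketch. It uses the standard form of the VI (the sum of the two conditional entropies); the function name `variation_of_information`, the use of natural logarithms and the four-node example are assumptions made for this illustration.

```python
import math
from collections import Counter


def variation_of_information(true_labels, inferred_labels):
    """Variation of information between two labellings of the same nodes.

    Each label can be any hashable object, e.g. a tuple (c, x).
    Natural logarithms are used (an assumption of this sketch).
    """
    n = len(true_labels)
    joint = Counter(zip(true_labels, inferred_labels))
    p_true = Counter(true_labels)
    p_inf = Counter(inferred_labels)
    vi = 0.0
    for (t, s), count in joint.items():
        p_ts = count / n
        vi -= p_ts * (math.log(p_ts / (p_true[t] / n)) + math.log(p_ts / (p_inf[s] / n)))
    return vi


# One core node and three peripheral nodes in a single pair.
truth = [(1, 1), (1, 0), (1, 0), (1, 0)]
print(variation_of_information(truth, truth))             # identical labellings
print(variation_of_information(truth, [(1, 0)] * 4))      # core node missed
```

Identical labellings give zero, and the value depends only on how the nodes are grouped, not on the names of the groups. Labelling every node as peripheral gives the entropy of the true labelling, −(1/4) log(1/4) − (3/4) log(3/4).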

Results
We compare the proposed algorithm with a previous algorithm, which detects a single core-periphery pair by maximising Q_BE using the KL algorithm [33,34]. We refer to the latter algorithm as the BE-KL algorithm. We also compare our algorithm with an ad hoc two-step algorithm, in which we divide a network into a core and a periphery, and also divide the same network into non-overlapping communities by maximising modularity using the Louvain algorithm [32]. Then, we regard the core and peripheral nodes in each detected community as a core-periphery pair. To divide the network into a core and a periphery, we use the BE-KL algorithm. We apply the statistical test (Section 2.3) to the core-periphery pairs detected by the three algorithms. We do not compare these algorithms with the algorithm introduced by Tunç and Verma [12] because their algorithm is slow and does not perform sufficiently well on model networks with planted core-periphery structure (Supplementary Note 1).

Synthetic networks
We compare the performance of the three algorithms on four different types of synthetic networks with a planted core-periphery structure schematically shown in Fig. 1. We generate the synthetic networks using stochastic block models [9,13,14,37,38]. We draw label (c_i, x_i) for the ith node (1 ≤ i ≤ N) with probability π_(c_i,x_i). Then, we place an edge between each pair of nodes with labels (c, x) and (c', x') with probability Θ_(c,x),(c',x'). For each type of the stochastic block model, we generate 100 networks and detect core-periphery pairs by the three algorithms.
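The generation procedure can be sketched as follows: each node independently draws a label from a label distribution, and each pair of nodes is then connected with a probability that depends only on the two labels. The function names `sample_sbm` and `two_pair_prob`, the network size of 200 nodes and the value θ = 0.5 are hypothetical choices for this illustration; the label probabilities and the background edge probability of 0.01 are those stated for the model with two core-periphery pairs.

```python
import random


def sample_sbm(n, pi, edge_prob, seed=0):
    """Sample node labels and edges from a stochastic block model.

    pi        : dict mapping each label (c, x) to its probability
    edge_prob : function (label_a, label_b) -> probability of an edge
    """
    rng = random.Random(seed)
    blocks = list(pi)
    labels = rng.choices(blocks, weights=[pi[b] for b in blocks], k=n)
    edges = [
        (j, i)
        for i in range(n) for j in range(i)
        if rng.random() < edge_prob(labels[i], labels[j])
    ]
    return labels, edges


def two_pair_prob(theta):
    """Edge probabilities for two planted core-periphery pairs."""
    def edge_prob(a, b):
        same_pair = a[0] == b[0]
        has_core = a[1] == 1 or b[1] == 1
        return theta if (same_pair and has_core) else 0.01
    return edge_prob


pi = {(1, 1): 1 / 8, (1, 0): 3 / 8, (2, 1): 1 / 8, (2, 0): 3 / 8}
labels, edges = sample_sbm(200, pi, two_pair_prob(0.5), seed=1)
print(len(edges), labels[:5])
```

Passing the edge probabilities as a function keeps the sampler independent of the particular planted structure, so the other synthetic models can reuse it with a different `pi` and `edge_prob`.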
As a first example, we consider a network composed of a single core-periphery pair (Fig. 1a). We set Θ_(1,1),(1,1) = Θ_(1,1),(1,0) = θ and Θ_(1,0),(1,0) = 0.01, where θ ∈ {0.01, 0.05, 0.1, 0.15, ..., 1}. We note that the core-periphery structure is evident for a large θ value. For this network model, the VI value, quantifying the discrepancy between the true and inferred core-periphery structure, is compared for the three algorithms in Fig. 2a. The VI values for the BE-KL algorithm are approximately equal to zero for most θ values (θ ≥ 0.1). The VI values for the two-step algorithm are large even for a large θ value because the two-step algorithm divides the single core-periphery pair into communities. In contrast, the VI values for the proposed algorithm are close to zero when θ ≥ 0.2. Therefore, the performance of the proposed algorithm on this network model is comparable to that of the BE-KL algorithm when θ is not too small.
As a second example, we examine networks composed of two core-periphery pairs (Fig. 1b). We set π_(c,1) = 1/8, π_(c,0) = 3/8, Θ_(c,1),(c,1) = Θ_(c,1),(c,0) = θ, and Θ_(c,0),(c,0) = Θ_(1,x),(2,x') = 0.01 for c ∈ {1, 2} and x, x' ∈ {0, 1}. The VI values for this network are shown in Fig. 2b. The VI for the BE-KL algorithm is large for all θ values because the BE-KL algorithm is not designed for multiple core-periphery pairs. The VI values for the two-step algorithm and those for the proposed algorithm are similar and close to zero for θ ≥ 0.4.
In empirical networks, there may be nodes that are better regarded as not belonging to any core or periphery. Therefore, as a third example, we consider a network composed of a single core-periphery pair and residual nodes (Fig. 1c).
We regard the block of the residual nodes as a single group of nodes, like a core or periphery, when calculating the VI value.Let R be the index for the block of the residual nodes.We set The VI values for this model are shown in Fig. 2c.The VI for the BE-KL algorithm is large because the residual nodes are classified as peripheral nodes.The VI for the two-step algorithm is large for the entire range of θ.In contrast, the VI for our algorithm is close to zero for θ ≥ 0.3.
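As a fourth example, we consider networks composed of two core-periphery pairs and residual nodes (Fig. 1d). We set Θ_(c,1),(c,1) = Θ_(c,1),(c,0) = θ for c ∈ {1, 2} and all other Θ values to 0.01. The VI values for this network model are shown in Fig. 2d. The VI for the BE-KL algorithm is large for all θ. The VI for the two-step and proposed algorithms decreases as θ increases. The proposed algorithm attains smaller VI values than the two-step algorithm in almost the entire range of θ.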

Empirical networks
We apply the three algorithms to three empirical networks. For directed and weighted networks, we disregard the direction and the weight of edges.

Karate club network
Consider the karate club network [39], which has N = 34 nodes and M = 78 edges (edge density p = 0.139). A node represents a member of a karate club at a university. Two members are adjacent if they have socially interacted outside club activities during the observation period. During the study, a conflict occurred between the instructor (node 1) and the president (node 34), which fissured the club. Based on self-reports, each member was labelled as being on the instructor side (15 members), on the president side (16 members) or neutral (3 members) [39].
The core-periphery structure detected by the three algorithms is shown in Fig. 3. The BE-KL algorithm detects a single core-periphery pair such that both the instructor and president are core nodes (Fig. 3a), neglecting the fissure of the club. The two-step algorithm detects two core-periphery pairs, each of which consists mostly of the members with the same leaning (Fig. 3b). In particular, the instructor and the president belong to the cores of different core-periphery pairs. Two neutral members, nodes 10 and 19, are assigned to the president's core-periphery pair, which does not agree with the self-reports by the members. The residual nodes consist of the members on the instructor side, those on the president side and a neutral member. Our algorithm detects almost the same two core-periphery pairs as those detected by the two-step algorithm (Fig. 3c).

Political blog network
The second example is a political blog network [40], which has N = 1,222 nodes and M = 16,714 edges (edge density p = 0.0224). A node is a blog on the United States presidential election in 2004, and two blogs are adjacent if one blog cites the other blog on its front page. Each blog was labelled with one of the political leanings, liberal (586 blogs) or conservative (636 blogs), determined by automated categorisations by several weblog directories [40]. If a blog was uncategorised or classified into conflicting categories, the authors of Ref. [40] manually judged the political leaning.
The core-periphery structure detected by the three algorithms is shown in Fig. 4. The unique core detected by the BE-KL algorithm is a mixture of liberal and conservative blogs (Fig. 4a). The peripheral blogs are mostly adjacent to blogs with the same political leaning. However, the structure detected by the BE-KL algorithm alone does not tell this unless we refer to the political leaning of the individual blogs. A different algorithm for a single core-periphery structure yielded similar results for the same network [13]. The two-step algorithm detects three core-periphery pairs, each of which mostly comprises the blogs with the same political leaning (Fig. 4b). Two core-periphery pairs are much larger than the third one and have the opposite political leanings. The third small core-periphery pair is mainly composed of liberal blogs. In each core-periphery pair, a majority of the peripheral nodes are densely interconnected, which is against the idealised core-periphery structure. This is due to the community detection step that partitions a network into communities with dense intra-community edges. Our algorithm detects two core-periphery pairs, each of which mostly comprises the blogs with the same political leaning (Fig. 4c). The two detected core-periphery pairs are smaller than those detected by the two-step algorithm. More nodes are classified as residual nodes than by the two-step algorithm. With our algorithm, the edges between peripheral nodes within each core-periphery pair are sparser than with the two-step algorithm, respecting the notion of the idealised core-periphery structure.

Airport network
Our third example is a network of airports, which has N = 2,939 nodes and M = 15,677 edges (edge density p = 0.0036) [41,42]. A node represents an airport. Two airports are adjacent if there is a direct commercial flight between them.
Figure 5 shows the core-periphery structure detected by the three algorithms. The BE-KL algorithm detects a dense core composed of 89 airports scattered in different geographical regions (Fig. 5a). The peripheral airports are rarely adjacent to the core airports in other regions. Furthermore, the peripheral airports tend to be adjacent to other peripheral airports in the same region, which is inconsistent with the notion of the periphery. The two-step algorithm detects 16 geographically concentrated core-periphery pairs (Fig. 5b). Our algorithm detects ten geographically concentrated core-periphery pairs (Fig. 5c). The partition of the world-wide airport network into geographically distinct groups of airports found here is consistent with the previous results derived with community detection algorithms [43,44]. In contrast to those detected by the two-step algorithm, the peripheral airports detected by our algorithm are not densely interconnected.
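We further analyse the core-periphery structure obtained by our algorithm. Figure 6 maps the locations of the core and peripheral airports. The three largest core-periphery pairs labelled 1, 2 and 3 are mainly based in Europe, East Asia and the United States, respectively. The core-periphery pairs 1, 2 and 3 consist of the airports in 125, 35 and 47 countries, respectively. Each of the other core-periphery pairs labelled 4-10 consists of the airports in one country. The locations of the airports and metropolises in Europe, East Asia, the United States and their surroundings are shown in Fig. 7. Here a metropolis is defined as the capital city of a country, a provincial capital of China or a state capital of the United States; we include the latter two because China and the United States have many airports. Core-periphery pair 1 contains 333 core airports and 378 peripheral airports, of which 405 (57%) airports are located in Europe (Fig. 7a).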
However, this core-periphery pair excludes most airports in the Nordic countries (84 airports; 68%). There are 89 airports within 20 miles of a metropolis in Europe, among which there are 51 core airports (57%), 28 peripheral airports (31%) and ten residual airports (11%). As a comparison, if we select the European airports with the largest degrees, taking as many airports as there are European core airports, 64 airports (72%) in the set of 89 airports within 20 miles of a metropolis are selected, which is more than the number of the core airports (51 airports; see above) contained in the same set of airports. This result indicates that hub metropolitan airports, which are common, are not necessarily core airports.
The second core-periphery pair contains 161 core airports and 240 peripheral airports, among which 217 (54%) airports are located in East Asia (Fig. 7b). In this core-periphery pair, 31 airports are located within 20 miles of a metropolis in East Asia, among which there are 23 core airports (74%), eight peripheral airports and no residual airport (Fig. 7b).
The third core-periphery pair contains 150 core airports and 312 peripheral airports, among which 210 (45%) airports are located in the United States (Fig. 7c). In this core-periphery pair, 71 airports are located within 20 miles of a metropolis in the United States, among which there are 29 core airports (41%), 30 peripheral airports (42%) and 12 residual airports (17%) (Fig. 7c). We have not found a partitioning of airports into core-periphery pairs corresponding to different major airline groups (e.g., American Airlines, Delta Airlines, Southwest Airlines and United Airlines in the United States).
Table 1 lists the size of core-periphery pairs and the fractions of different types of edges. The airports in a large core are not densely interconnected compared to those in small core-periphery pairs, probably due to the limited capacity of the airports (e.g., a small number of runways). Core-periphery pairs 1, 2 and 3 contain hub airports in each region. The other small core-periphery pairs consist of a small number of core airports, i.e., at most 20% of the airports in each core-periphery pair. In these core-periphery pairs, most of the peripheral airports are adjacent to the core airports but not to other peripheral airports in the same core-periphery pair. This observation suggests that a small number of core airports relay most of the flights into these regions as gateway airports. For example, the representative core airport (i.e., the core airport that has the largest number of neighbours in the core-periphery pair) in pair 4, MNL (Philippines), and that in pair 8, LOS (Nigeria), serve most of the domestic airports in the respective countries. Such a structure is evident in small core-periphery pairs such as core-periphery pairs 4-10.
The subnetwork within the Philippines is shown in Fig. 8a; see Supplementary Table 1 for properties of all airports in the Philippines. Most of the airports (34 airports; 92%) in core-periphery pair 4 (shown in orange in Figs. 6, 7b and 8a) only serve domestic flights. Core airport 1 (labelled in Fig. 8a) has most of the edges (41 edges; 84%) between core-periphery pair 4 and the rest of the network. Therefore, core airport 1 functions as a gateway airport in the Philippines. Core airport 2 also functions as a gateway airport, but to a lesser extent than core airport 1 does. Core-periphery pairs located in Alaska (core-periphery pair 6 in Table 1), Russia (pair 7) and Ecuador (pair 9) also contain a few core airports serving as gateway airports in the respective regions (Supplementary Note 2).
Core airports 8 and 21 in the Philippines (Fig. 8a) have a small degree, which is counterintuitive. Core nodes having degree one or two are also found in core-periphery pair 6 (Supplementary Fig. 3c). Airports 8 and 21 in the Philippines are adjacent to peripheral airports 7 and 4, respectively. If we assign airport 8 to the periphery, the two peripheral airports 7 and 8 would be adjacent. Similarly, if we assign airport 21 to the periphery, the two peripheral airports 4 and 21 would be adjacent. To avoid edges between peripheral nodes, our algorithm has identified airports 8 and 21 as core nodes. However, airports 8 and 21 may be better regarded as peripheral airports given that they are not densely interconnected to other core airports. Previous studies provided remedies for this issue [4,9] (see Section 5 for further discussion).
The subnetwork within Thailand is shown in Fig. 8b; see Supplementary Table 2 for properties of all airports in Thailand. Two major airports, 1 and 14, are located in the capital city, Bangkok, and belong to different core-periphery pairs. All international airports in Thailand belong to core-periphery pair 2 (shown in blue in Figs. 6, 7b and 8b), including core airport 14. Most of the domestic airports (13 airports; 59%) belong to core-periphery pair 10 (shown in magenta), including core airport 1. The subnetwork composed of core-periphery pair 10 is largely separated from the other airports in Thailand, which belong to core-periphery pair 2, and the rest of the world. The separation of the domestic and international airports and their respective subnetworks is also observed in the Philippines (Fig. 8a), Iran and Nigeria (Supplementary Note 2).

Computation time
We compare the speed of the three algorithms on the synthetic and empirical networks. We implement all the algorithms in MATLAB and run simulations on a computer with Intel 2.6 GHz Sandy Bridge processors and 4 GB of memory. We measure the speed of an algorithm by averaging the CPU time over 100 runs. We do not run the statistical test because it is common to all the algorithms.
The average CPU time of the three algorithms is compared in Table 2. The BE-KL algorithm is the fastest on all the synthetic networks and the karate club network. However, it is slower than our algorithm on the political blog and airport networks. The two-step algorithm is the slowest on all the networks. Our algorithm is approximately two times slower than the BE-KL algorithm on the synthetic and karate club networks. However, on the political blog and airport networks, it runs much faster than the BE-KL algorithm. Our algorithm runs in O(rM) time, where r is the number of rounds (Section 2.2). Therefore, our algorithm is expected to be fast on large sparse networks.

Discussion
We proposed a scalable algorithm to detect multiple core-periphery pairs in networks by maximising a novel quality function Q. The quality function Q compares the number of edges of different types in the given network with the expected number for an Erdős-Rényi random graph.
In the airport network, we have found several core nodes having degree one or two (e.g., airports 8 and 21 in Fig. 8a), which contradicts the intuition that core nodes would have a large degree. Our algorithm assigned these nodes to a core to suppress the edges between peripheral nodes. However, these nodes may be better regarded as peripheral nodes because each of them is adjacent to at most one core node. One remedy is to weaken the suppression of the edges between peripheral nodes [4,9]. Adapting this idea to the case of multiple core-periphery structure warrants future research.
We used the Erdős-Rényi random graph as the null model to define Q. The Erdős-Rényi random graph is also used as a null model for detecting communities in networks. In the inference of a stochastic block model without degree corrections (i.e., without assuming a heterogeneous degree distribution), one compares the distribution of edges within and across different blocks, between the given network and an Erdős-Rényi random graph [37]. In a community detection algorithm based on the Potts model, one compares the number of edges within proposed communities in the given network and that for an Erdős-Rényi random graph [45,46]. Our Q is equivalent to the quality function used in the latter algorithm if we enforce that all nodes are core nodes, i.e.,

$$ Q(c, (1, 1, \ldots, 1)) = \sum_{i=1}^{N} \sum_{j=1}^{i-1} (A_{ij} - p)\, \delta_{c_i, c_j}. $$

The configuration model, with which we randomly rewire edges conserving the degree of each node, is a more commonly used null model for community detection [3]. If we adopt the configuration model as the null model in our algorithm, the quality function will be given by

$$ Q_{\mathrm{config}}(c, x) = \sum_{i=1}^{N} \sum_{j=1}^{i-1} \left( A_{ij} - \frac{d_i d_j}{2M} \right) (x_i + x_j - x_i x_j)\, \delta_{c_i, c_j}. $$

We have used the Erdős-Rényi random graph because the maximisation of Q_config does not yield the true core-periphery structure for synthetic networks having a single core-periphery pair. This is because Q_config involves d_i d_j/2M, which makes Q_config small when nodes i and j have a large degree. Because core nodes tend to have a large degree, maximisation of Q_config tends to assign core nodes that should belong to the same core-periphery pair to different core-periphery pairs (i.e., δ_{c_i,c_j} = 0) or a periphery (i.e., x_i + x_j − x_i x_j = 0). To support this argument, we maximise Q_config using a label switching heuristic (Section 2.2) for the synthetic networks with a single core-periphery pair (Fig. 1a). The obtained VI values are larger than 0.5 in the entire range of θ (Fig. 9). This result contrasts with that obtained from our algorithm (with the Erdős-Rényi random graph null model), which is shown by the rectangles in Fig. 2.
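The effect of the term d_i d_j/2M on pairs of high-degree nodes can be seen on a toy network. The sketch below evaluates the quality function under either null model; the function name `quality`, its `null` argument and the six-node network (two mutually adjacent core nodes, each adjacent to the same four peripheral nodes) are hypothetical choices for this illustration and are not taken from the experiments.

```python
def quality(n, edges, c, x, null="er"):
    """Quality of labelling (c, x) under the Erdős-Rényi ("er") or the
    configuration-model ("config") null model, evaluated in O(n^2) time."""
    m = len(edges)
    p = m / (n * (n - 1) / 2)
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    adj = {frozenset(e) for e in edges}
    total = 0.0
    for i in range(n):
        for j in range(i):
            if c[i] == c[j] and (x[i] == 1 or x[j] == 1):
                expected = p if null == "er" else deg[i] * deg[j] / (2 * m)
                total += (1 if frozenset((i, j)) in adj else 0) - expected
    return total


# Core nodes 0 and 1 are adjacent to each other and to peripheral nodes 2-5.
edges = [(0, 1)] + [(u, k) for u in (0, 1) for k in range(2, 6)]

# Labelling that groups only the two core nodes; all others are singletons.
c_cores, x_cores = [1, 1, 2, 3, 4, 5], [1, 1, 1, 1, 1, 1]
print(quality(6, edges, c_cores, x_cores, null="er"))
print(quality(6, edges, c_cores, x_cores, null="config"))
```

Here M = 9 and both core nodes have degree 5, so the core-core edge contributes 1 − 0.6 = 0.4 under the Erdős-Rényi null model but 1 − 25/18 < 0 under the configuration model. Grouping the two core nodes together is therefore penalised under the configuration model even though they are adjacent.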
Previous studies provided algorithms to detect multiple core-periphery pairs based on community detection algorithms.Yang and Leskovec used an algorithm for detecting overlapping communities in networks [10,47,48].They regarded the nodes belonging to many communities as core nodes and nodes belonging to few communities as peripheral nodes.These algorithms may detect densely interconnected peripheral nodes because the detected peripheral nodes in a single core-periphery pair belong to the same community.In addition, a periphery may belong to multiple cores in these algorithms.These properties are shared by the algorithms proposed in Refs.[24,25].In contrast, our algorithm detects disjoint core-periphery pairs such that peripheral nodes are interconnected sparsely within each core-periphery pair and across different core-periphery pairs.Yan and Luo focussed on a different type of structure consisting of multiple cores and a single periphery [23].In contrast, a core detected by our algorithm owns its exclusive periphery, including the case of an empty periphery.1 and 2. We only show the IDs of all core airports, some peripheral airports and all residual airports.Fig. 2c).In addition, only core airport 1 has an edge to the rest of the network (Supplementary Table 5) and therefore is the unique gateway airport for this core-periphery pair.In Ecuador, most of the airports (ten airports 83%) are adjacent to airport 1, which is the unique core airport in core-periphery pair 9 (Supplementary Fig. 2d).This core airport has most of the edges (ten edges; 77%) between core-periphery pair 9 and the rest of the network (Supplementary Table 6).Therefore, core airport 1 serves as a gateway airport in Ecuador.
Airport 11 also functions as a gateway airport in Ecuador. The Russian airports belong to core-periphery pair 1, 2 or 7 (Supplementary Fig. 2e). Most of the airports in core-periphery pair 7 are located in the Russian Far East. In core-periphery pair 7, all the peripheral airports are adjacent to core airport 1. Core airport 1 has most of the edges (eight edges; 67%) between core-periphery pair 7 and the rest of the network (Supplementary Table 7). Therefore, core airport 1 serves as a gateway airport for this core-periphery pair. In Russia, the domestic and international airports are not clearly separated into different core-periphery pairs.

Supplementary Table 1: Properties of the airports in the Philippines. The airports are sorted in descending order of the total number of edges. An internal edge is defined as an edge between two airports within the same core-periphery pair. An external edge is defined as an edge between an airport in the focal core-periphery pair and either an airport in a different core-periphery pair or a residual airport.
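The internal and external edge counts defined above, and the share of a pair's external edges held by each airport (the quantity used to identify gateway airports), can be tallied as in the following sketch. The function names, node names and toy edge list are hypothetical and are not taken from the airport data; residual nodes are marked with `None`.

```python
from collections import Counter

def edge_counts(edges, pair_of):
    """Count internal and external edges per node.

    pair_of maps each node to its core-periphery pair label,
    or to None for a residual node.
    """
    internal, external = Counter(), Counter()
    for u, v in edges:
        same_pair = pair_of[u] is not None and pair_of[u] == pair_of[v]
        for node in (u, v):
            if pair_of[node] is None:
                continue  # the focal node must belong to a pair
            if same_pair:
                internal[node] += 1
            else:
                external[node] += 1
    return internal, external

def external_share(external, pair_of, pair):
    """Fraction of the pair's external edge ends held by each member."""
    members = [n for n, p in pair_of.items() if p == pair]
    total = sum(external[n] for n in members)
    return {n: external[n] / total for n in members} if total else {}

# Hypothetical toy data: pair 9 has one hub and two peripheral nodes.
pair_of = {"hub": 9, "p1": 9, "p2": 9, "far1": 1, "far2": 1, "res": None}
edges = [("hub", "p1"), ("hub", "p2"), ("hub", "far1"),
         ("hub", "far2"), ("hub", "res"), ("p1", "far1")]

internal, external = edge_counts(edges, pair_of)
share = external_share(external, pair_of, 9)
```

Here the hub holds three of the four external edges of pair 9, so it would be read as the gateway node of this toy pair.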

We set $\Theta_{(c,1),(c,1)} = \Theta_{(c,1),(c,0)} = \theta$ for $c \in \{1, 2\}$ and all other $\Theta$ values to 0.01. The VI values for this network model are shown in Fig. 2d. The VI for the BE-KL algorithm is large for all θ. The VI for the two-step algorithm and the proposed algorithm decreases as θ increases. The proposed algorithm attains smaller VI values than the two-step algorithm over almost the entire range of θ.
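For reference, the variation of information (VI) between a true and an inferred labelling can be computed as in the sketch below. It treats each node's label (for example, the tuple $(c, x)$) as an opaque category and uses the standard definition $\mathrm{VI} = H(X \mid Y) + H(Y \mid X)$. The natural logarithm is an assumption, because the base of the logarithm is not stated in this section, and the function name is hypothetical.

```python
import math
from collections import Counter

def variation_of_information(labels_a, labels_b):
    """VI between two labellings of the same nodes (natural log assumed)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labellings must cover the same nodes")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    vi = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        # -p_ab [log(p_ab / p_a) + log(p_ab / p_b)] sums to H(B|A) + H(A|B)
        vi -= p_ab * (math.log(p_ab / (count_a[a] / n))
                      + math.log(p_ab / (count_b[b] / n)))
    return vi

# Labels are (c, x) tuples; relabelling the pair index does not change VI.
true_labels = [(1, 1), (1, 1), (1, 0), (1, 0)]
relabelled = [(7, 1), (7, 1), (7, 0), (7, 0)]
all_core = [(1, 1), (1, 1), (1, 1), (1, 1)]

vi_same = variation_of_information(true_labels, relabelled)
vi_diff = variation_of_information(true_labels, all_core)
```

A VI of zero means the two labellings agree up to a renaming of the labels. In the second comparison, merging the core and periphery into one group loses one binary split, so the VI equals $\ln 2$.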

Figure 6 maps the locations of the core and peripheral airports. The three largest core-periphery pairs, labelled 1, 2 and 3, are mainly based in Europe, East Asia and the United States, respectively. Core-periphery pairs 1, 2 and 3 consist of airports in 125, 35 and 47 countries, respectively. Each of the other core-periphery pairs, labelled 4-10, consists of airports in one country. The locations of the airports and metropolises in Europe, East Asia, the United States and their surroundings are shown in Fig. 7. Here, we define a metropolis as the capital city of each country; we also include the provincial capitals of China and the state capitals of the United States because China and the United States have many airports. Core-periphery pair 1 contains 333 core airports and 378

Figure 1: Schematic of the adjacency matrices of the networks generated by stochastic block models. The filled blocks correspond to the entries that are equal to one with probability θ and zero otherwise. The empty blocks correspond to the entries that are equal to one with probability 0.01 and zero otherwise. The diagonal entries are always set to zero and shown as empty entries in the figure for the sake of simplicity. The dashed lines indicate the borders separating different blocks. The labels (c, x) are also indicated at the top and left of the adjacency matrices. Label R represents a block of residual nodes. The networks are composed of (a) a single core-periphery pair, (b) two core-periphery pairs, (c) a single core-periphery pair and residual nodes, and (d) two core-periphery pairs and residual nodes. In all cases, we set N = 400.
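The block structure described in this caption can be sampled as in the following sketch. It is an illustration, not the authors' generator: the function name, the convention of marking residual nodes with a pair label of -1, and the block sizes (four blocks of 100 nodes, chosen only so that N = 400) are assumptions. An entry is drawn with probability θ when the two nodes belong to the same core-periphery pair and at least one of them is a core node, and with probability 0.01 otherwise.

```python
import numpy as np

def generate_network(blocks, theta, p_background=0.01, seed=None):
    """Sample a symmetric 0/1 adjacency matrix with a zero diagonal.

    blocks is a list of (c, x, size) tuples: c is the pair label
    (-1 for residual nodes) and x is 1 for core, 0 for periphery.
    """
    rng = np.random.default_rng(seed)
    c = np.concatenate([np.full(size, pair) for pair, _, size in blocks])
    x = np.concatenate([np.full(size, core) for _, core, size in blocks])
    same_pair = (c[:, None] == c[None, :]) & (c[:, None] >= 0)
    touches_core = (x[:, None] + x[None, :] - x[:, None] * x[None, :]) > 0
    # Filled blocks: core-core and core-periphery within one pair
    prob = np.where(same_pair & touches_core, theta, p_background)
    # Draw each pair i < j once, then mirror to keep the matrix symmetric
    upper = np.triu(rng.random(prob.shape) < prob, k=1)
    A = (upper | upper.T).astype(int)
    return A, c, x

# Two core-periphery pairs as in panel (b); block sizes are assumed.
blocks = [(1, 1, 100), (1, 0, 100), (2, 1, 100), (2, 0, 100)]
A, c, x = generate_network(blocks, theta=0.9, seed=0)
```

Appending a block such as `(-1, 0, size)` would add residual nodes, which connect to every other node with the background probability only.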

Figure 6: Location of the airports. The large and small filled circles represent the core and peripheral airports, respectively. Each colour represents a core-periphery pair. The open squares represent residual airports.

Figure 8: Airport network within (a) the Philippines and (b) Thailand. The line colour indicates the core-periphery pair to which the two airports belong. The edges connecting two airports in different core-periphery pairs are shown in grey. The numbers attached to some airports indicate the IDs of the airports listed in Supplementary Tables 1 and 2. We only show the IDs of all core airports, some peripheral airports and all residual airports.

Figure 9: The VI values between the true and inferred core-periphery structure obtained by maximising $Q_{\text{config}}$. The error bars indicate ±1 standard deviation.

Supplementary Figure 2: Airport network within (a) Iran, (b) Nigeria, (c) core-periphery pair 6 based in Alaska, (d) Ecuador and (e) Russia. The line colour indicates the core-periphery pair to which the two airports belong. The edges connecting two airports in different core-periphery pairs are shown in grey. The numbers attached to some airports indicate the IDs of the airports listed in Supplementary Tables 3-7. We only show the IDs of the core airports, some peripheral airports and some residual airports.

Table 1: Properties of the core-periphery pairs in the airport network. The core-periphery pairs are ordered according to the number of core nodes. The core nodes that have the largest number of neighbours within the same core-periphery pair are shown as representative core nodes.

Table 2: The average CPU time of the three algorithms on different networks. We generate synthetic networks 1-4 using the stochastic block models schematically shown in Figs. 1a, 1b, 1c and 1d, respectively. For each of them, we set θ = 0.9 and measure the CPU time for one generated network.

Supplementary Table 2: Properties of the airports in Thailand.

Supplementary Table 4: Properties of the airports in Nigeria.

Supplementary Table 7: Properties of the airports in Russia. Only the airports in core-periphery pair 7 and those in other core-periphery pairs that are adjacent to core-periphery pair 7 are shown.