Mapping Flows in Bipartite Networks

Mapping network flows provides insight into the organization of networks, but even though many real networks are bipartite, no method for mapping flows takes advantage of the bipartite structure. What do we miss by discarding this information and how can we use it to understand the structure of bipartite networks better? The map equation models network flows with a random walk and exploits the information-theoretic duality between compression and finding regularities to detect communities in networks. However, it does not use the fact that random walks in bipartite networks alternate between node types, information worth 1 bit. To make some or all of this information available to the map equation, we developed a coding scheme that remembers node types at different rates. We explored the community landscape of bipartite real-world networks from no node-type information to full node-type information and found that using node types at a higher rate generally leads to deeper community hierarchies and a higher resolution. The corresponding compression of network flows exceeds the amount of extra information provided. Consequently, taking advantage of the bipartite structure increases the resolution and reveals more network regularities.


I. INTRODUCTION
Many networks are bipartite [1-3]. Their main application is to model interactions between entities of different types: users watching movies, documents containing words, animals eating plants. Studying those networks with the naked eye is often infeasible because of their size and complexity. Therefore, to carry out further analysis, we must simplify them. We need to find coarse-grained descriptions that highlight their community structure [4].
Most community-detection methods are developed for unipartite networks, but can be used for bipartite networks as they are, either by running them on unipartite projections or by applying them directly to bipartite networks [5,6]. However, both these approaches have limitations. First, unipartite projections of bipartite networks cannot preserve all the information that is encoded in the bipartite network such that significant structure is lost [2]. Second, applying unipartite methods directly to bipartite networks ignores the regularities of bipartite networks and does not take into account the fact that links only connect nodes of different types [7]. What do we miss by discarding this node-type information? And how can we use it to understand the structure of bipartite networks better?
To explore the value of using bipartite information in community detection, we study the flow-based community-detection method Infomap [8], which uses an information-theoretic objective function, known as the map equation [9], to exploit the duality between compression and finding regularities in data. The map equation models network flows with random walks and relates the quality of a network partition to how well it compresses a modular description of the random walks. Modules with long flow persistence, such as cliques or clique-like groups, achieve the best compression. To derive a coding scheme, the map equation uses a hierarchical code that reflects the structure of the network partition. However, this coding scheme is designed for unipartite networks and assumes that any pair of nodes can be connected and visited one after the other; it does not take advantage of the structural constraints in bipartite networks where links only connect nodes of different types and random walks must alternate between them. Consequently, the map equation disregards bipartite information and provides suboptimal compression.
To address these issues, we developed a coding scheme that uses node-type information at different and adjustable rates. For a node-type remembering rate of zero, we recover the standard map equation; a remembering rate of one leads to a fully bipartite map equation and higher compression. Through intermediate rates, we can analyze how the community landscape changes with available node-type information. We implemented the coding scheme in Infomap and explored the community landscape of real-world networks from different domains.
In networks with community structure, we can compress flows beyond the extra information we make available through the coding scheme. When we describe a network with all its nodes in one module, our coding scheme improves the compression by an amount equal to the entropy of the rate at which node types are used. In hierarchical partitions, the compression improves proportionally to the available node-type information. Generally, exploiting node types at higher rates increases the resolution and leads to deeper community structures with more and smaller modules, thus revealing more network regularities.

II. THE MAP EQUATION FRAMEWORK
To illustrate the duality between compression and finding regularities in network data, consider a communication game where the sender uses code words to update the receiver about the position of a random walker in a network. We assume that the sender and receiver remember the current module but not the current node of the random walker. The question is: how can we devise a modular coding scheme to minimize the average per-step description length, which we refer to as the code length?
We start with all the nodes in one module and assign unique code words to the nodes based on their ergodic visit rates. With this one-level approach, the sender communicates exactly one code word per random-walker step to the receiver. According to Shannon's source-coding theorem [10], the lower bound for the code length is the entropy of the node visit rates.
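As a minimal numerical sketch of this lower bound, we can compute the entropy of the visit rates directly. The network below is hypothetical; for unweighted, undirected networks, the ergodic visit rates are proportional to node degrees.

```python
from math import log2

def entropy(rates):
    """Shannon entropy (in bits) of a probability distribution."""
    return -sum(p * log2(p) for p in rates if p > 0)

# Hypothetical 4-node star network: ergodic visit rates are
# proportional to node degree in unweighted, undirected networks.
degrees = [3, 1, 1, 1]
total = sum(degrees)
visit_rates = [d / total for d in degrees]

# Shannon's source-coding theorem: the one-level code length is
# bounded below by the entropy of the node visit rates.
print(round(entropy(visit_rates), 3))
```

With these rates, no one-level code can use fewer than about 1.79 bits per step on average.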
If the network has a community structure, we can achieve a lower code length with a two-level coding scheme: we partition the nodes into modules and define a separate codebook for each module. This coding scheme uses unique code words within modules, allowing nodes in different modules to reuse short code words. To describe transitions between modules for a uniquely decodable code, we introduce an index level codebook that assigns code words to modules, and add exit code words to each module codebook. We can generalize this approach and reduce the code length further with a recursive code structure in multiple levels.
With a two-level approach, the sender communicates either one or three code words per random walker step. For steps within a module, the sender uses one code word from the current module codebook. For transitions between modules, the sender communicates three code words from three different codebooks: (i) the exit code word of the current module codebook, (ii) the entry code word of the new module from the index level codebook, and (iii) a node visit code word from the new module codebook.
For a small example network (Fig. 1a), we illustrate the codebook structure for a two-level partition according to the map equation (Fig. 1b). The map equation calculates the code length L for a given partition M as the average of the module and index level code lengths, weighted by the fraction of time a random walker uses each of the corresponding codebooks in the limit,

L(M) = q H(Q) + \sum_{m \in M} p_m H(P_m).   (1)

Here, p_m = q_m + \sum_{n \in m} p_n is the fraction of time the random walker uses the codebook for module m, where n ∈ m are the nodes in m, p_n is the ergodic visit rate of node n, and q_m is the entry and exit rate of m; q = \sum_{m \in M} q_m is the rate at which the index level codebook is used; Q = {q_m | m ∈ M} is the set of module entry rates; P_m = {q_m} ∪ {p_n | n ∈ m} is the set of node visit rates in module m, including module exit; and H is the Shannon entropy, computed over the normalized rates in each set. We assume undirected networks and, therefore, entry and exit rates are the same.
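The two-level code length of Eq. 1 can be sketched in a few lines, assuming the visit and entry rates have already been computed. The modules, rates, and partition below are hypothetical inputs, not taken from the paper's example network.

```python
from math import log2

def entropy(rates):
    """Entropy (bits) of a set of rates, normalized to a distribution."""
    total = sum(rates)
    return -sum((p / total) * log2(p / total) for p in rates if p > 0)

def map_equation(entry_rates, node_rates):
    """Two-level map equation (Eq. 1), given precomputed rates.
    entry_rates: {module: q_m}, module entry/exit rate
    node_rates:  {module: [p_n for each node n in the module]}
    """
    q = sum(entry_rates.values())  # index codebook usage rate
    L = q * entropy(entry_rates.values()) if q > 0 else 0.0
    for m, rates in node_rates.items():
        q_m = entry_rates[m]
        p_m = q_m + sum(rates)  # module codebook usage rate
        L += p_m * entropy([q_m] + list(rates))
    return L

# With all nodes in one module, Eq. 1 reduces to the entropy of the
# node visit rates; a balanced two-module split compresses below that.
print(map_equation({0: 0.0}, {0: [0.25] * 4}))  # 2.0 bits
print(map_equation({0: 0.05, 1: 0.05},
                   {0: [0.25, 0.25], 1: [0.25, 0.25]}))
```

The second partition is cheaper than the one-level baseline because the rarely used index codebook lets nodes in different modules reuse short code words.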
To minimize the map equation, we need to make a tradeoff. On the one hand, we want to keep modules small for short code words within modules. On the other hand, we want to limit the number of modules for short code words at the index level. Further, modules should have long flow persistence and cannot be too small; otherwise a random walker changes modules at a high rate and the sender is required to use the index level codebook frequently. Under these restrictions, partitions with many links within modules and few links between modules give the best compression.

III. THE BIPARTITE MAP EQUATION
Since the map equation was developed for unipartite networks, its coding scheme can describe transitions between any pair of nodes. However, directly applying the map equation to bipartite networks leads to higher-than-necessary code lengths because, in bipartite networks, transitions only happen between nodes of different types. For a more efficient coding scheme in bipartite networks, we consider the communication game again. As before, the sender updates the receiver about the position of a random walker, but now both are aware of the bipartite network structure.
In a food web, for example, where herbivores are connected to plant species, random walks alternate between animal and plant nodes. If the current node is an animal node, the random walker must step to a plant node next, and vice versa. Therefore, we can use a bipartite coding scheme with two types of codebooks per module: one for animal-to-plant and one for plant-to-animal transitions. Since both these codebooks only address half of the nodes on average, code words can be shorter.
To derive the code length of a bipartite coding scheme, we apply Bayes' rule to the standard map equation and obtain the bipartite map equation. Let M_1 be a partition with all nodes in one module and P_1 be the set of ergodic node visit rates over two steps, that is, the visit rates we would obtain assuming a unipartite network. The standard map equation calculates the entropy of the random process X: current node, drawn from P_1. However, random walks on bipartite networks also provide information about a second process, namely, Y: current node type. In the bipartite map equation, we combine these two processes into one, X|Y: current node, given current node type, and determine its entropy with Bayes' rule, H(X|Y) = H(X) − H(Y) + H(Y|X). We know that H(Y) = 1 bit because the random walk alternates between nodes of different types, and H(Y|X) = 0 bits since the node fully determines the node type. Let P_L and P_R be the sets of visit rates for left and right nodes, respectively, given that the current node type is known. Then, we can express L(M_1) in terms of P_L and P_R,

L(M_1) = \frac{1}{2} H(P_L) + \frac{1}{2} H(P_R) = H(P_1) − 1 bit,   (2)

to show that providing the node type reduces the description of one-level partitions by 1 bit. To generalize to two-level partitions, we plug this decomposition into Eq. 1 and obtain the code length

L(M) = \frac{q}{2} [H(Q_L) + H(Q_R)] + \sum_{m \in M} \frac{p_m}{2} [H(P^L_m) + H(P^R_m)],   (3)

where Q_L = {q^L_m | m ∈ M} and Q_R = {q^R_m | m ∈ M} are the sets of left and right module entry rates; P^L_m = {q^L_m} ∪ {p_u | u ∈ m_L} and P^R_m = {q^R_m} ∪ {p_v | v ∈ m_R} are the sets of left and right node visit rates in module m, including module exits; m_L and m_R are the subsets of left and right nodes in m; and p_u ∈ P_L and p_v ∈ P_R are the visit rates for left nodes u and right nodes v, respectively.
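The 1-bit saving follows from the chain rule of entropy and can be verified numerically. The visit rates below are hypothetical; the only structural assumption is that, because the walk alternates, left and right nodes each carry half of the total visit rate.

```python
from math import log2

def entropy(rates):
    """Entropy (bits) of rates normalized to a distribution."""
    total = sum(rates)
    return -sum((p / total) * log2(p / total) for p in rates if p > 0)

# Hypothetical two-step visit rates on a small bipartite network:
p_left = [0.30, 0.20]            # left nodes, sums to 1/2
p_right = [0.25, 0.15, 0.10]     # right nodes, sums to 1/2

H_X = entropy(p_left + p_right)  # standard one-level code length, H(X)
# Conditional code length: each type-specific codebook is used half
# the time, and entropy() normalizes the rates within each codebook.
H_X_given_Y = 0.5 * entropy(p_left) + 0.5 * entropy(p_right)

# H(X|Y) = H(X) - H(Y) + H(Y|X) with H(Y) = 1 bit and H(Y|X) = 0 bits:
print(round(H_X - H_X_given_Y, 10))  # 1.0
```

The difference is exactly 1 bit for any such rates, matching the claimed reduction for one-level partitions.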
By separating the left and right visit rates in Eq. 3, we define the bipartite map equation,

L(M) = q_L H(Q_L) + q_R H(Q_R) + \sum_{m \in M} [p^L_m H(P^L_m) + p^R_m H(P^R_m)],   (4)

where q_L = \sum_{m \in M} q^L_m and q_R = \sum_{m \in M} q^R_m are the usage rates for the left-to-right and right-to-left codebooks at the index level, and p^L_m = q^L_m + \sum_{u \in m_L} p_u and p^R_m = q^R_m + \sum_{v \in m_R} p_v are the usage rates for the left-to-right and right-to-left codebooks at the module level. Thus, the bipartite map equation calculates the code length for a given partition that describes a joint clustering of left and right nodes in a bipartite network (detailed derivations in Appx. A).
The bipartite map equation changes the communication game. As before, the sender uses one code word to encode transitions within modules and three code words for transitions between modules. But now, both sender and receiver keep track of the current node type to choose the correct codebook, left-to-right or right-to-left, for their communication.

IV. THE BIPARTITE MAP EQUATION WITH VARYING NODE-TYPE MEMORY
The map equation is about compression with constraints: compression is not the only goal. The more we use the regularities in a network, the more we can compress its description. But higher compression does not necessarily mean that we find network structures that allow us to understand the network better.
For example, consider a version of the coding game where sender and receiver remember the location of the random walker. In this case, we would use a coding scheme with separate codebooks for each node, with code words only for neighboring nodes. This would allow us to encode the walker's path at the entropy rate of the corresponding Markov process [10] and provide better compression than the map equation. But then nodes would no longer have unique code words and, even though the code is efficient, it would not capture the modular structure of the network.
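The gap between per-node codebooks and a single global codebook can be made concrete. In this sketch, on a hypothetical toy graph, the entropy rate of the walk (average cost of picking a neighbor) is compared with the entropy of the node visit rates; the former is never larger.

```python
from math import log2

def entropy(dist):
    """Shannon entropy (bits) of a probability distribution."""
    return -sum(p * log2(p) for p in dist if p > 0)

# Unweighted, undirected toy graph as adjacency lists; the stationary
# visit rate of a node is its degree over twice the number of edges.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
two_m = sum(len(nbrs) for nbrs in adj.values())
pi = {i: len(nbrs) / two_m for i, nbrs in adj.items()}

# Entropy rate of the walk: with per-node codebooks, each step only
# needs to distinguish the current node's neighbors.
rate = sum(pi[i] * log2(len(adj[i])) for i in adj)
# One-level map equation baseline: unique code words for all nodes.
node_entropy = entropy(pi.values())
print(round(rate, 3), round(node_entropy, 3))
```

The per-node scheme is cheaper, but, as the text notes, it sacrifices unique code words and with them any modular description.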
The key is that the map equation forgets at which exact node a random walker is and only remembers the current module. With the bipartite map equation, we relax this constraint by remembering node types. However, in sparse bipartite networks, this comes close to remembering nodes and moves us towards encoding at the entropy rate of the Markov process without identifying modular structure. Therefore, it is useful to look at using node-type information at intermediate rates.
In the bipartite map equation with varying node-type memory, node types are fuzzy. While each node has a true type, either left or right, and the random walker alternates between types, we assume that we cannot determine types reliably. We model this uncertainty by introducing a node-type flipping rate α. When we inspect a node, we observe its true type with probability 1 − α and the opposite type with probability α. Then, on average, nodes appear both left and right to a degree determined by α. Node visit rates change accordingly and become mixed; we describe them as pairs of left and right flow: left nodes u with visit rate p_u have a mixed visit rate p^α_u = ((1 − α) p_u, α p_u), and right nodes v with visit rate p_v have a mixed visit rate p^α_v = (α p_v, (1 − α) p_v). Using Bayes' rule again, we calculate the level of compression we can achieve when node types are fuzzy. Let M_1 be a partition with all the nodes in one module, P_1 be the set of ergodic node visit rates, and P^α_1 = {p^α_n | n ∈ M_1} be the set of mixed node visit rates. The entropy of Y: current node type is, as before, 1 bit because we observe left and right nodes with probability 1/2 each. However, the entropy of Y|X: node type, given node, is now the entropy of the node-type flipping rate, H(Y|X) = H(1 − α, α) = H_α. Overall, compared with the standard map equation, we can improve the compression by 1 bit, but node-type fuzziness increases the code length by H_α, the entropy of the flipping rate,

L_α(M_1) = H(P^α_1) = H(P_1) − 1 bit + H_α,   (5)

where H(P^α_1) is shorthand for the average of the component-wise entropies of the mixed node visit rates.
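The one-level trade-off between the 1-bit saving and the H_α penalty is easy to check. This sketch uses hypothetical visit rates and evaluates the right-hand side of the one-level relation L_α(M_1) = H(P_1) − 1 bit + H_α at the three reference values of α.

```python
from math import log2

def entropy(dist):
    """Shannon entropy (bits) of a probability distribution."""
    return -sum(p * log2(p) for p in dist if p > 0)

def H_alpha(alpha):
    """Entropy of the node-type flipping rate, in bits."""
    return entropy([1 - alpha, alpha])

def one_level_code_length(visit_rates, alpha):
    """One-level code length with fuzzy node types:
    node types save 1 bit but fuzziness costs H_alpha back."""
    return entropy(visit_rates) - 1 + H_alpha(alpha)

P1 = [0.30, 0.20, 0.25, 0.15, 0.10]  # hypothetical visit rates
# alpha = 1/2 recovers the standard map equation's code length ...
print(one_level_code_length(P1, 0.5) - entropy(P1))  # 0.0
# ... while alpha = 0 saves the full 1 bit of node-type information.
print(one_level_code_length(P1, 0.0) - entropy(P1))  # -1.0
```

At α = 1 the saving is also a full bit, since flipping every type is as informative as flipping none.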
Plugging Eq. 5 into the standard map equation gives us the generalization to two-level partitions,

L_α(M) = \frac{q}{2} [H(Q^α) + H(Q^{1−α})] + \sum_{m \in M} \frac{p_m}{2} [H(P^α_m) + H(P^{1−α}_m)].   (6)

We define the bipartite map equation with varying node-type memory,

L_α(M) = q H(Q^α) + \sum_{m \in M} p_m H(P^α_m),   (7)

which measures the code length for a partition M and node-type flipping rate α; here, Q^α is the set of mixed module entry rates, P^α_m is the set of mixed node visit rates in module m, including module exits, and H applied to mixed rates is shorthand for the average of the component-wise entropies. Figure 1c illustrates how the codebook structure changes compared to the standard map equation (Fig. 1b) for a fixed value of α in the same example network as before (Fig. 1a). When node types are flipped at a rate of α = 1/2, nodes become left and right in equal parts. With H_α = 1 bit, this means that there is maximum uncertainty about node types. Ignoring node types in this way is equivalent to using the standard map equation. The bipartite map equation is recovered for α = 0 and α = 1 because both values lead to H_α = 0. However, they have different interpretations. For α = 0, node types never flip and we can determine the true type of the nodes. Under a flipping rate of α = 1, node types always flip and we determine the opposite of the true node type. This has no effect on the code length because it simply swaps the left and right entropy terms of the bipartite map equation.
Using the bipartite map equation with varying node-type memory, we are ready to answer the initial question: what more can we learn about a network by using node types in whole or in part? Because it is more intuitive to think about how much we know about node types than the probability of flipping them, we use entropy to connect these two quantities. Flipping node types at rate α leads to an uncertainty of H_α about them. Consequently, I(α) = 1 − H_α is the available amount of information about node types, given that they are flipped at rate α. This formulation suggests an alternative interpretation of Eq. 5: we can reduce the code length of one-level partitions exactly by the amount of information that we have about node types. To investigate by how much we can reduce the code length of two-level and hierarchical partitions, we have applied the bipartite map equation to real-world networks.

V. APPLYING THE BIPARTITE MAP EQUATION TO REAL-WORLD NETWORKS
We have implemented the bipartite map equation for two-level and hierarchical partitions in Infomap [12] and used it to analyze the community landscape of 21 bipartite networks from different domains. Our results show that the bipartite map equation uses node-type information effectively and improves the compression beyond the provided information. This improved compression increases the resolution and lets us discover more regularities.

Figure 2: Community structure at different scales in the weighted Fonseca-Ganade plant-ant web [11]. By providing more node-type information, we increase the resolution and detect finer modules on lower and coarser modules on higher levels in the community hierarchy.

Figure 3(d): Arroyo Goye pollinator-plant web.

A. Networks
We selected 21 bipartite networks from different domains from the KONECT [13] and ICON [14] databases and other sources [15,16]. We preprocessed the networks with the Python package NetworkX [17] and only kept their largest connected components. The resulting networks ranged in size from a few dozen to millions of nodes and edges; their domains, numbers of left nodes n_L, numbers of right nodes n_R, and numbers of edges m are listed in Table I. In all networks, left nodes represent subjects, such as users, documents, and animals, while right nodes represent objects that are acted upon, such as movies, words, and plants.
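The preprocessing step can be sketched with NetworkX. The edge list below is hypothetical; the only operation taken from the text is restricting each network to its largest connected component.

```python
import networkx as nx

# Hypothetical edge list of a small bipartite network; left nodes are
# strings (subjects), right nodes are integers (objects).
edges = [("u1", 0), ("u1", 1), ("u2", 1), ("u3", 2)]
G = nx.Graph(edges)

# Keep only the largest connected component, as in the preprocessing.
largest_cc = max(nx.connected_components(G), key=len)
G = G.subgraph(largest_cc).copy()

n_left = sum(1 for n in G if isinstance(n, str))
n_right = G.number_of_nodes() - n_left
print(n_left, n_right, G.number_of_edges())  # 2 2 3
```

Here the component {u3, 2} is discarded, leaving n_L = 2 subjects, n_R = 2 objects, and m = 3 edges.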

B. Setup
We explored the community landscape of our test networks from no information at I = 0 bits to full information at I = 1 bit with a step size of 0.05 bits. For each value of node-type information I, we calculated the corresponding node-type flipping rate α numerically. Because Infomap's search is stochastic, we ran it 100 times for each network and value of α, both with the flag --two-level, to search for two-level partitions, and without it, to search for hierarchical partitions. Finally, for each α, we selected the partitions with the shortest code length for further analysis.

C. Structure and compression
We measured the extra compression provided by a partition M by using the corresponding one-level partition M_1 as a baseline. The one-level code length decreases by the amount of node-type information that is available (Eq. 5), specifically L_α(M_1) = L_{1/2}(M_1) − I(α), where I(α) = 1 − H_α is the node-type information when node types are flipped at rate α. We define the extra compression of M as L_α(M_1) − L_α(M) ≥ 0; it is always at least 0 because Infomap returns the one-level partition when it does not find any partition with a lower code length. In partitions with more than one level, the extra compression depends on the codebook usage rates, that is, the total coding rate q + \sum_{m \in M} p_m, and on the amount of node-type information (Eq. 6).
To measure the resolution of the community detection, we use the effective module size as a proxy. By only considering leaf modules, that is, modules that contain nodes but have no sub-modules, we can use the same measure for two-level and hierarchical solutions. Let S be the set of leaf-module sizes in partition M, where size refers to the number of nodes in a module. Then the perplexity of the module sizes, 2^{H(S)}, tells us the effective number of leaf modules. Combining the effective number of modules with the number of nodes N in the network, we calculate the effective module size as N / 2^{H(S)}. The effective module size and extra compression capture two significant patterns in the analyzed networks.
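A minimal sketch of this measure, with hypothetical leaf-module sizes:

```python
from math import log2

def effective_module_size(leaf_sizes):
    """N / 2^H(S): nodes divided by the perplexity of leaf-module sizes."""
    N = sum(leaf_sizes)
    H = -sum((s / N) * log2(s / N) for s in leaf_sizes)
    return N / 2 ** H

# Four equal leaf modules: perplexity 4, so the effective size is N/4.
print(effective_module_size([5, 5, 5, 5]))  # 5.0
# A skewed partition has a smaller effective number of modules,
# and hence a larger effective module size than the plain mean.
print(round(effective_module_size([17, 1, 1, 1]), 2))
```

Using perplexity instead of a plain module count keeps a few tiny singleton modules from dominating the measure.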
First, the resolution increases and we detect more communities on different scales when we use node types. At lower levels in the community hierarchy, modules become more fine-grained, while on higher levels, they become coarser (Fig. 2). With more node-type information, some nodes are assigned to singleton modules that form bridges between other modules. The flow-persistence time is not long enough to include them in either of the other modules and, therefore, it is better to assign them to their own modules (Fig. 2). When we approach full node-type information at I = 1 bit, this can lead to so many small modules that no useful structure is detected anymore. For example, leaf modules in the Las Vegas Hikers network (LVHK) contain only 1.5 nodes on average (Fig. 3a). In the IMDb actor-movie network, the effective module size decreases approximately linearly from 23 at I = 0 bits to 3.5 at I = 1 bit (Fig. 3b). In the Last.fm user-song network, the effective module size is around 2,000 between I = 0 bits and I = 0.85 bits but then drops sharply to 91 at I = 1 bit (Fig. 3c). We see similar behavior in all the networks we analyzed (Table I), both for hierarchical and two-level partitions, with the difference that sharp drops in module size are less common in two-level partitions (Fig. 3, Appx. B). However, as leaf modules become smaller, the community hierarchy becomes deeper such that higher levels still contain significant structures.
Second, the compression improves by more than the amount of node-type information we provide. Through the duality between compression and finding regularities in data, this means that the bipartite map equation detects more structure in the bipartite networks. As we approach I = 1 bit, the improvement in compression becomes steeper. For example, in the IMDb actor-movie network, the extra compression improves from 5.7 bits at I = 0 bits to 6.3 bits at I = 1 bit, with the rate of improvement increasing closer to full node-type information (Fig. 3b). In the Arroyo Goye pollinator-plant web and the LVHK network, the compression improves slowly at first, but faster once more regularities can be detected above I = 0.5 bits (Fig. 3a, Fig. 3d). With more node-type information, the regularizing effect of the standard map equation decreases. The non-linearity of the entropy function explains why the compression increases faster the more node-type information we use.

VI. CONCLUSION
We have extended the map equation framework for finding modules in network flows to use the node-type information encoded in bipartite networks. Applied to 21 real-world networks, the bipartite map equation implemented in the search algorithm Infomap detects more, smaller communities at lower levels of the community hierarchy and fewer, larger modules at higher levels. The community-detection resolution increases because the bipartite map equation's coding scheme exploits the alternating trajectories of random walks and compresses the description of network flows beyond the provided node-type information. Between ignoring and making full use of the node-type information, the bipartite map equation can use node types at intermediate rates, offering a principled way to explore communities at higher resolution in bipartite networks.

ACKNOWLEDGMENTS
This work was partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation. We would like to thank Leto Peel, Vincenzo Nicosia, and Jelena Smiljanić for discussions that helped to improve this paper. Martin Rosvall was supported by the Swedish Research Council, Grant No. 2016-00796.


APPENDIX A: DETAILED DERIVATIONS

Based on Eq. A4, we define the bipartite map equation,

L(M) = q_L H(Q_L) + q_R H(Q_R) + \sum_{m \in M} [p^L_m H(P^L_m) + p^R_m H(P^R_m)].   (A5)

Here, q_L = \sum_{m \in M} q^L_m and q_R = \sum_{m \in M} q^R_m are the usage rates for the left-to-right and right-to-left codebooks at the index level; p^L_m = q^L_m + \sum_{u \in m_L} p_u and p^R_m = q^R_m + \sum_{v \in m_R} p_v are the usage rates for the left-to-right and right-to-left codebooks at the module level, respectively. As the total weight of edges incident to left nodes is equal to the total weight of edges incident to right nodes, we have q_L = q_R = q/2 and p^L_m = p^R_m = p_m/2 for all m.

Consider again P, the set of ergodic node visit rates over two steps, and let α ∈ [0, 1] ⊂ R. For better readability, and because specific nodes are not important, we refer to the visit rates over two steps simply as p in the following. Further, we use H_α = H(1 − α, α) as shorthand for the entropy of α. We can then rewrite H(P),

H(P) = 1 bit − H_α + \frac{1}{2} H(R^α) + \frac{1}{2} H(R^{1−α}).   (A6)

With Eq. A3 and Eq. A6, we rewrite the code length of the one-level partition,

L_α(M_1) = \frac{1}{2} H(R^α) + \frac{1}{2} H(R^{1−α}) = H(P) − 1 bit + H_α,   (A7)

where we define mixed node visit rates,

R^α = {(1 − α) p_u | u ∈ L} ∪ {α p_v | v ∈ R}.   (A8)

For values of α = 0, α = 1, and α = 1/2, we retrieve the original definitions of P_L, P_R, and P, respectively, from Eq. A8. Again, to generalize, we plug Eq. A7 into Eq. A2,

L_α(M) = \frac{q}{2} [H(Q^α) + H(Q^{1−α})] + \sum_{m \in M} \frac{p_m}{2} [H(R^α_m) + H(R^{1−α}_m)],   (A9)

where Q^α = {q^α_m | m ∈ M} is the set of mixed module entry rates and R^α_m is the set of mixed node visit rates in module m, as defined in Eq. A8. Based on Eq. A9, we define a first version of the bipartite map equation with varying node-type memory,

L_α(M) = q H(Q^α) + \sum_{m \in M} p_m H(R^α_m),   (A10)

where H applied to mixed rates denotes the average of the entropies for flipping rates α and 1 − α. Finally, we assume that node types are fuzzy and are flipped at rate α. A node that is in fact a left node appears to be a right node an α-fraction of the time. Similarly, a right node appears to be a left node an α-fraction of the time.
This means that, on average, node types are mixed and have both left and right components. We model this with pairs: left nodes u with visit rate p_u ∈ P_L have a mixed visit rate p^α_u = ((1 − α) p_u, α p_u), and right nodes v with visit rate p_v ∈ P_R have a mixed visit rate p^α_v = (α p_v, (1 − α) p_v). Using mixed node visit rates, we refine our earlier definition from Eq. A8 and combine R^α and R^{1−α},

P^α = {p^α_u | u ∈ L} ∪ {p^α_v | v ∈ R}.   (A11)

Further, all codebook usage rates become pairs, q^α = \sum_{m \in M} q^α_m and p^α_m = q^α_m + \sum_{u \in m_L} p^α_u + \sum_{v \in m_R} p^α_v, where q^α_m is the mixed module entry and exit rate of m and addition works component-wise. Since the network is bipartite and random walks alternate between left and right nodes, we have q^α = (q/2, q/2) and p^α_m = (p_m/2, p_m/2). Combining Eqs. A8-A11, we define the bipartite map equation with varying node-type memory,

L_α(M) = q H(Q^α) + \sum_{m \in M} p_m H(P^α_m),   (A12)

where Q^α = {q^α_m | m ∈ M} is the set of mixed module entry rates and P^α_m = {q^α_m} ∪ {p^α_u | u ∈ m_L} ∪ {p^α_v | v ∈ m_R} is the set of mixed node visit rates in module m, including module exits.