Two types of densification scaling in the evolution of temporal networks.

Many real-world social networks constantly change their global properties over time, such as the number of edges, size, and density. While temporal and local properties of social networks have been extensively studied, the origin of their dynamical nature is not yet well understood. Networks may grow or shrink if (a) the total population of nodes changes and/or (b) the chance of two nodes being connected varies over time. Here, we develop a method that allows us to classify the source of time-varying nature of temporal networks. In doing so, we first show empirical evidence that real-world dynamical systems could be categorized into two classes, the difference of which is characterized by the way the number of edges grows with the number of active nodes, i.e., densification scaling. We develop a dynamic hidden-variable model to formally characterize the two dynamical classes. The model is fitted to the empirical data to identify whether the origin of scaling comes from a changing population in the system or shifts in the connecting probabilities.


I. INTRODUCTION
Along with the increasing availability of high-resolution data sets, dynamics of human social communication have been extensively studied over the past decades [1][2][3][4][5][6]. Many of these studies are based on data sets of online interactions, such as emails [7], text messages [8,9], and mobile phones [2,3,10,11], but the recent development of sensor devices has also enabled us to collect time-stamped data from face-to-face interactions in physical space [1,12,13,14,15]. Those data therefore cover a wide range of social contexts in which dynamic interactions among individuals form temporal social networks [6,16].
These real-world social networks very often exhibit nonstationarity: their structure constantly changes over time not only in shape but also in size. Generally, these dynamics are present because the studied system is not closed: it is in fact a common property of real-world social and economic networks that agents are free to enter and exit. In online social networks like Twitter and Facebook, anyone can basically join or quit the existing communication at any time. In financial markets, a bank becomes a part of an interbank network if it borrows from or lends to other banks and exits the network when the loan is repaid [17,18]. Another nonconservative aspect of real-world networks is the fact that, even if the population is constant, the networking activity might vary due to external factors such as schedule and diurnal rhythm: coffee breaks in a conference [13], pauses between classes in a school [5,19], lunch breaks in a company [15,20], etc.
In the present work, we focus on the evolution of two fundamental quantities that condition the global property of networks: the number of active nodes N and the number of edges M. The scaling relationship M ∝ N^γ, known as the "densification power law," has been found in many real-world systems [21,22], where the scaling exponent γ is constant and 1 < γ < 2. In this work, we present further empirical evidence that there exist two types of scaling in the evolution of networks. In addition to the well-known densification power law, we also show that some systems exhibit an accelerating growth of M in which the scaling exponent itself is increasing.
We consider two key factors that would lead networks to be time varying: N and M will vary over time if (a) the size of population (i.e., potential number of active nodes) changes and/or (b) the chance of two nodes being connected changes. Clearly, the size of the population in a system constrains the number of active nodes that form a network: at constant probability for links to appear, more nodes implies more links. Similarly, bilateral matching probability determines the number of edges in the network and thereby its density: at constant population size, the higher the probability for a link to exist, the higher the number of links. The question is then: given an empirical temporal network exhibiting time-varying global quantities, is it possible to identify the source of the dynamics?
Here, we develop a method to perform such a task by exploiting a scaling relationship between the numbers of active nodes and edges. To model the behavior of nodes, we use a dynamical version of a hidden-variable model in which the temporal probability of two nodes being connected is given by a product of "fitness" parameters [23,24]. The fitness parameters are considered to be intrinsic and constant features of the nodes. In the present model, the time-evolving aspect arises from two distinct channels. First, we introduce a parameter that modulates the average activity level of nodes. This modulation parameter allows the size of generated networks to vary through a change in the connecting probabilities while keeping the population size, including resting nodes, constant. Second, we allow the population size to vary with time. In the original fitness model [23,24], there is no distinction between population and the number of active nodes, because the population size is assumed to be large enough so that virtually all nodes in the system are active [18]. However, if the population in a system is not sufficiently large, a certain fraction of existing nodes may not be active [18], and thereby a change in the population size affects the rate at which the number of edges grows with the number of active nodes.
In the following, we first present empirical evidence for the existence of two different types of scaling relationships between N and M. We then present a dynamical hidden-variable model with which we investigate the emergence of the two types of scaling patterns. Specifically, we define two classes of theoretical equations that connect N and M under different specifications on the average activity of nodes and the population size. By identifying a class of equations that better fits the observed data, the proposed method allows us to estimate the actual average activity and the (unobservable) population size. From this we are then able to identify, for each empirical data set, which key factor drives the dynamics of the temporal network. We also briefly mention a variation of the model for cases where the population is fixed and known, allowing us to fit the empirical distribution of node fitnesses to a beta distribution. We conclude with a discussion of our results and the limitations of the model.

A. Data
We consider six data sets of social and economic interest, taken from contexts of very different nature (see Appendix A for a full description of the data sets): (1) Interbank (bilateral transactions in the online interbank market in Italy); (2) Enron (email communication network from the Enron Corporation [7,25]); (3) CollegeMsg (online social network at the University of California, Irvine [8,9]); (4) RealityMining (phone call data from the Reality Commons project [2]); (5) LondonBike (bike trips from the London Bicycle Sharing Scheme [26]); (6) Highschool (face-to-face contacts network in a French high school [19]).
All data sets are converted to temporal networks with undirected and unweighted edges. Bidirectional edges (i.e., edges in both directions) are regarded as undirected edges with weight 1. From each data set, we construct a sequence of network snapshots by defining particular time intervals in each of which all the interactions between nodes are regarded as the edges of the corresponding network. We define N to be the number of nodes that have at least one edge in a given snapshot, and M denotes the corresponding number of edges. This implies that N/2 ≤ M ≤ N(N − 1)/2, where N/2 corresponds to the minimum number of edges that can exist between N active nodes (when all the nodes are connected to exactly one edge), while N(N − 1)/2 corresponds to the maximum number of edges that can exist between N active nodes (i.e., a complete graph).
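The snapshot construction described above can be illustrated with a minimal sketch. It assumes a hypothetical input format of time-stamped pairs `(t, i, j)` and non-overlapping windows of equal length; the data sets in this paper use fixed daily windows or sliding windows instead, so the function name `snapshot_counts` and the windowing rule are illustrative only.

```python
from collections import defaultdict

def snapshot_counts(events, window):
    """Return {snapshot index: (N, M)} from time-stamped interactions.

    events: iterable of (t, i, j) tuples (hypothetical format).
    window: snapshot length, in the same units as t.
    """
    edges = defaultdict(set)
    for t, i, j in events:
        if i == j:
            continue  # self-loops do not define an edge between two nodes
        # frozenset makes (i, j) and (j, i) the same undirected, unweighted edge
        edges[int(t // window)].add(frozenset((i, j)))
    counts = {}
    for s, snapshot_edges in edges.items():
        active_nodes = set().union(*snapshot_edges)  # nodes with >= 1 edge
        counts[s] = (len(active_nodes), len(snapshot_edges))
    return counts

events = [(0, "a", "b"), (1, "b", "a"), (2, "c", "d"), (12, "a", "c")]
counts = snapshot_counts(events, window=10)
print(counts)
```

In the toy input, the repeated and reversed interaction between "a" and "b" collapses into a single edge, so every snapshot satisfies N/2 ≤ M ≤ N(N − 1)/2.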

B. Evidence from empirical data
We investigate the dynamical relationship between N and M. Figure 1 shows scatter plots of M against N for each social context. Two important features appear. First and foremost, there is a strong positive correlation between N and M in all the data sets we examine. In particular, we observe superlinear scaling, i.e., the rate at which M rises with N is larger than that expected by a linear growth, as is occasionally reported for many real-world systems [21,22]. This phenomenon is also known as the "densification power law" or "densification scaling" [21,22,27].
Second, there are two different patterns as to how M grows with N. One is the densification scaling we mentioned, in which the scaling exponent is constant (>1), showing as a straight line on a log-log scale plot. In Fig. 1, Interbank, Enron, CollegeMsg, and RealityMining appear to belong to this category. Contrarily, for LondonBike and Highschool the growth of M for large values of N is accelerating: the slope itself increases in log-log space as N grows [28].
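One simple way to tell the two patterns apart numerically is to compare log-log slopes on the lower and upper halves of the N range. The sketch below is only an illustration on synthetic numbers, not the classification method developed in this paper; the helper names `loglog_slope` and `half_slopes` are hypothetical.

```python
import math

def loglog_slope(ns, ms):
    """Ordinary least-squares slope of log M against log N."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(m) for m in ms]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def half_slopes(ns, ms):
    """Slopes fitted separately on the lower and upper halves of the N range."""
    pairs = sorted(zip(ns, ms))
    mid = len(pairs) // 2
    lower, upper = pairs[: mid + 1], pairs[mid:]
    return loglog_slope(*zip(*lower)), loglog_slope(*zip(*upper))

ns = [10, 20, 40, 80, 160]
power_law = [n ** 1.5 for n in ns]                    # constant exponent 1.5
accelerating = [n * math.exp(0.01 * n) for n in ns]   # exponent grows with N

print(half_slopes(ns, power_law))
print(half_slopes(ns, accelerating))
```

For a pure power law the two slopes coincide, whereas an accelerating curve gives a visibly larger slope on the upper half.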
Both behaviors are striking, as they suggest the existence of simple mechanisms for the dynamics of global activity in temporal networks. However, the empirical dynamical relationship we observe in Fig. 1 cannot be reproduced by a class of common growing network models in which a new node joins the network with a given number of edges [29][30][31]. While these models are intended to explain the emergence of scaling in empirical degree distributions [29], the number of edges has a linear correlation with the number of active nodes asymptotically, i.e., M ∝ N, which is not consistent with our finding. In the following, we present a model which explains how these different types of global behaviors can emerge from temporal social interactions.

A. Model
To explain the two types of scaling in a unified model, we consider a dynamical version of the hidden-variable model in which the probability of two nodes being connected at time interval t is given by

$$p_{ij,t} = \kappa_t\, a_i a_j, \tag{1}$$

where $a_i$ is the "fitness" of node i that represents the activity level of the node [23,24,32,33]. In the baseline model we assume that $a_i$ is uniformly distributed on [0, 1] because in general we do not have any prior information about the distribution of activity levels. We will also consider a beta distribution as an alternative case in Sec. IV D. There are two time-varying parameters in the model. One is $N_{p,t}$, which represents the potential number of active nodes in the system at time t, i.e., the total of active and inactive nodes that are in the system at time t. The number of active nodes having at least one edge at time t is denoted by $N_t$. We note that the number of active nodes $N_t$ is always observable, but the potential number of nodes $N_{p,t}$ is not. In social networks, for instance, we do not usually know how many people are ready to interact with other people and what fraction of them actually created at least one edge. In many cases, what we can observe from data is the number of active nodes that appear in the record of interaction history, while there is no record of nodes without interactions. Since the observed active nodes may account for only a fraction of the potential nodes, it is generally written as $N_t = (1 - q_{0,t}) N_{p,t}$, where $q_{0,t}$ denotes the fraction of inactive nodes that have no edge at time t, or equivalently, the probability of a randomly chosen node being isolated. To take an example of social networks, changes in $N_p$ may represent a situation in which the number of students in the classroom changes over time according to the class schedule, leading to a variation in the maximum possible size of face-to-face contact networks.
The potential number of nodes that are ready to interact with others is the first key parameter of the model, as it physically constrains the size of networks to be observed.
The second time-varying parameter of the model is $\kappa_t > 0$, which modulates the global activity level of nodes. In the financial system, for instance, the chance that two banks trade during the lunch time would be intrinsically lower than that in the morning [18], in which case the banks' activity levels may have a certain diurnal pattern. In social networks where individuals communicate with each other, κ would vary according to the time schedule of the school, workplace, academic conferences, or the circadian rhythm of humans [4,34,35,36].
With this specification, the observed network size N and the number of edges M coevolve as either $N_p$ or κ or both change over time. A change in N due to a shift in $N_p$ represents an extensive-margin effect, while a shift in κ leads to an intensive-margin effect. Parameters κ and $N_p$ can thus explain two different origins of the time-varying nature of networks.
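The two channels can be illustrated by a small Monte Carlo sketch. It assumes that each pair of nodes is connected independently with probability κ a_i a_j and that fitness values are drawn once from a uniform distribution; the function name `simulate_snapshot`, the pool of 300 nodes, and the parameter grids are illustrative choices, not values taken from the paper.

```python
import random

def simulate_snapshot(activities, kappa, rng):
    """Draw one snapshot and return (N, M): active nodes and edges."""
    n_p = len(activities)
    active = set()
    m = 0
    for i in range(n_p):
        for j in range(i + 1, n_p):
            # Connection probability kappa * a_i * a_j; with a in [0, 1]
            # and kappa <= 1 this always lies in [0, 1].
            if rng.random() < kappa * activities[i] * activities[j]:
                m += 1
                active.update((i, j))
    return len(active), m

rng = random.Random(0)
pool = [rng.random() for _ in range(300)]  # constant fitness values a_i ~ U[0, 1]

# Channel (a): the population N_p changes while kappa stays fixed.
by_population = [simulate_snapshot(pool[:n_p], 0.05, rng) for n_p in (50, 100, 200, 300)]
# Channel (b): kappa changes while the population stays fixed at 100 nodes.
by_activity = [simulate_snapshot(pool[:100], k, rng) for k in (0.01, 0.05, 0.2, 0.8)]

print(by_population)
print(by_activity)
```

Both lists contain (N, M) pairs of the kind plotted in an N-M scatter plot; only the source of the variation differs.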

B. Analytical expression for N and M
In this section we show an analytical solution for the dynamic hidden-variable model when the network size is not necessarily large enough [18]. Suppose that node i ($1 \le i \le N_p$) is assigned activity $a_i \in [0, 1]$, which is drawn from density ρ(a) [23]. The numbers of active nodes N and edges M can be expressed as functions of parameters κ and $N_p$ (we drop time subscript t for notational convenience):

$$N = \left[1 - q_0(\kappa, N_p)\right] N_p, \qquad M = \frac{\bar{k}(\kappa, N_p)\, N_p}{2}, \tag{2}$$

where $\bar{k}(\kappa, N_p)$ denotes the average degree over all the existing nodes including isolated ones. To obtain the functional forms of N and M, we need to find the functional forms of $q_0(\kappa, N_p)$ and $\bar{k}(\kappa, N_p)$. Let $u(a, a')$ be the probability that there is an edge between two nodes having activity levels a and a′, respectively. As is shown in Appendix B, the average degree $\bar{k}(\kappa, N_p)$ is given by

$$\bar{k}(\kappa, N_p) = (N_p - 1) \int\!\!\int \rho(a)\, \rho(a')\, u(a, a')\, da\, da', \tag{3}$$

which simply states that the average degree is equal to the number of nodes (excluding the focal node itself) times the expected connecting probability. It should be noted that Eq. (3) is equivalent to Eq. (21) of Ref. [24] if $N_p - 1$ is replaced with N, which is asymptotically true as will be shown below. From (B11) in Appendix B, the probability of a randomly chosen node being isolated is given by

$$q_0(\kappa, N_p) = \int \rho(a) \left[1 - \int \rho(a')\, u(a, a')\, da'\right]^{N_p - 1} da. \tag{4}$$

Substituting ρ(a) = 1 (i.e., uniform distribution on [0, 1]) and $u(a, a') = \kappa a a'$ into Eq. (3) gives

$$\bar{k}(\kappa, N_p) = \frac{\kappa (N_p - 1)}{4}. \tag{5}$$

Similarly, $q_0$ is given by

$$q_0(\kappa, N_p) = \int_0^1 \left(1 - \frac{\kappa a}{2}\right)^{N_p - 1} da. \tag{6}$$

By defining a variable $x \equiv 1 - \kappa a / 2$, we have

$$q_0(\kappa, N_p) = \frac{2}{\kappa} \int_{1 - \kappa/2}^{1} x^{N_p - 1}\, dx = \frac{2}{\kappa N_p} \left[1 - \left(1 - \frac{\kappa}{2}\right)^{N_p}\right]. \tag{7}$$

Combining these results with Eq. (2), we have

$$N = N_p - \frac{2}{\kappa} \left[1 - \left(1 - \frac{\kappa}{2}\right)^{N_p}\right], \tag{8}$$

$$M = \frac{\kappa N_p (N_p - 1)}{8}. \tag{9}$$

This leads to interesting limit behaviors: if $|1 - \kappa/2| < 1$ and $N_p$ is sufficiently large, then $q_0(\kappa, N_p) \simeq 0$ and thereby $N \simeq N_p$ and $M \propto N^2$, as is shown in the study of the static fitness model [23,24,32]. In contrast, if $N_p$ is not large enough, then $q_0(\kappa, N_p) > 0$ and $N < N_p$, in which case M is not of order $N^2$ and the scaling exponent will take a value between 1 and 2 as is observed in empirical data (Fig. 1) [18]. Note that κ is not per se a probability, and that its value does not have any a priori upper bound (as it depends on the activity distribution).
Clearly, the larger the population $N_p$ and the overall activity κ, the lower the share of resting nodes $q_0$ in the population [Fig. 2(a)].
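For uniform activities and u(a, a′) = κaa′, the integrals for the isolation probability and the average degree can be evaluated in closed form: q_0 = 2[1 − (1 − κ/2)^{N_p}]/(κ N_p) and k̄ = κ(N_p − 1)/4. The following sketch evaluates the resulting expressions for N and M; the function names are hypothetical, and the closed forms are stated here as derived under those two assumptions.

```python
def q0(kappa, n_p):
    """Probability that a randomly chosen node is isolated (uniform activities)."""
    return 2.0 / (kappa * n_p) * (1.0 - (1.0 - kappa / 2.0) ** n_p)

def expected_n(kappa, n_p):
    """Expected number of active nodes, N = (1 - q0) * N_p."""
    return (1.0 - q0(kappa, n_p)) * n_p

def expected_m(kappa, n_p):
    """Expected number of edges, M = k_bar * N_p / 2 with k_bar = kappa * (N_p - 1) / 4."""
    return kappa * n_p * (n_p - 1) / 8.0

for n_p in (30, 100, 300):
    for kappa in (0.01, 0.1):
        print(n_p, kappa, round(q0(kappa, n_p), 3),
              round(expected_n(kappa, n_p), 1), round(expected_m(kappa, n_p), 1))
```

As a sanity check, a population of a single node gives q_0 = 1 for any κ, and q_0 falls as either N_p or κ grows.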

C. Role of κ and N p in the emergence of scaling
Using Eqs. (8) and (9), we are now able to analyze numerically how M scales with N. First, we observe that if the value of κ is kept constant while $N_p$ varies, the dynamical relationship between N and M is close to a straight line in a log-log plot [gray solid line in Fig. 2(b)], as seen in some empirical data.
If κ is small enough, the scaling is close to linear, approaching the lower bound of M indicated by the dashed line in Fig. 2(b). However, as κ increases, the scaling becomes more and more superlinear, which can be seen in Fig. 2(b) by following the same symbols in different colors.
By contrast, if we vary κ for a given value of $N_p$, the slope will bend upward. This can be seen in Fig. 2(b) by following different symbols in the same color (colored line). This reproduces the accelerating growth behavior observed in the empirical data, namely LondonBike and Highschool. We note that although the scaling relationships appear to be quite regular, it proves to be very difficult (if not impossible) to extract from Eqs. (8) and (9) an analytical expression for them, because of the complicated dependencies of N on $N_p$.
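The contrast between the two regimes can be checked numerically by computing local log-log slopes between consecutive points along each path. The sketch below assumes the closed forms for uniform activities, N = N_p − (2/κ)[1 − (1 − κ/2)^{N_p}] and M = κ N_p(N_p − 1)/8; the parameter grids are arbitrary illustrative choices.

```python
import math

def expected_n_m(kappa, n_p):
    """Expected (N, M) for uniform activities and u(a, a') = kappa * a * a'."""
    one_minus_pow = -math.expm1(n_p * math.log1p(-kappa / 2.0))  # 1 - (1 - kappa/2)^N_p
    n = n_p - 2.0 * one_minus_pow / kappa
    m = kappa * n_p * (n_p - 1) / 8.0
    return n, m

def local_slopes(points):
    """Log-log slope of M against N between consecutive points."""
    return [math.log(m2 / m1) / math.log(n2 / n1)
            for (n1, m1), (n2, m2) in zip(points, points[1:])]

# Constant kappa, varying population.
vary_population = [expected_n_m(0.05, n_p) for n_p in (20, 40, 80, 160, 320)]
# Constant population, varying kappa.
vary_activity = [expected_n_m(k, 100) for k in (0.001, 0.01, 0.1, 0.99)]

slopes_population = local_slopes(vary_population)
slopes_activity = local_slopes(vary_activity)
print([round(s, 2) for s in slopes_population])
print([round(s, 2) for s in slopes_activity])
```

On these grids the slopes along the constant-κ path stay between 1 and 2, while along the constant-population path they keep rising and eventually exceed 2, because N saturates at N_p while M continues to grow with κ.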

A. Estimation of model parameters from empirical data
We now propose a method to identify the dynamical class of a system and at the same time estimate κ and $N_p$ from the empirical data. In fact, the two model parameters may be estimated in two ways. One is to directly solve the two nonlinear equations, Eqs. (8) and (9), with respect to (κ, $N_p$) for a given observation of (N, M). This direct calculation gives us a one-to-one mapping of (N, M) to $(\kappa^*, N_p^*)$, where an asterisk denotes the solution of the system of two equations. However, such a method proves to be unable to accurately estimate the parameters when the network is small, where there is a large degree of overlap of (N, M) generated under multiple combinations of (κ, $N_p$) [Figs. 2(b) and 4(a)].
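The direct calculation can be sketched as a one-dimensional root search. The sketch assumes uniform activities, so that M = κ N_p(N_p − 1)/8 can be inverted for N_p at a trial κ, and it assumes that the implied N decreases monotonically in κ at fixed M, which is what makes bisection applicable. The bracket [1e-6, 1] for κ and the function names are illustrative assumptions.

```python
import math

def n_given_m(m, kappa):
    """Return (N, N_p) implied by M and a trial kappa (uniform activities)."""
    n_p = (1.0 + math.sqrt(1.0 + 32.0 * m / kappa)) / 2.0  # from M = kappa*N_p*(N_p-1)/8
    # N = N_p - (2/kappa) * [1 - (1 - kappa/2)^N_p], written with expm1/log1p
    n = n_p + 2.0 * math.expm1(n_p * math.log1p(-kappa / 2.0)) / kappa
    return n, n_p

def solve_direct(n_obs, m_obs, lo=1e-6, hi=1.0, iters=100):
    """Bisection on kappa so that the implied N matches the observed N."""
    if not (n_given_m(m_obs, hi)[0] < n_obs < n_given_m(m_obs, lo)[0]):
        raise ValueError("observation is not bracketed by the kappa search interval")
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if n_given_m(m_obs, mid)[0] > n_obs:
            lo = mid  # too many active nodes for this M: kappa must be larger
        else:
            hi = mid
    kappa = (lo + hi) / 2.0
    return kappa, n_given_m(m_obs, kappa)[1]

# Round trip on noise-free values generated with kappa = 0.05 and N_p = 100.
true_kappa, true_np = 0.05, 100
m_obs = true_kappa * true_np * (true_np - 1) / 8.0
n_obs = true_np + 2.0 * math.expm1(true_np * math.log1p(-true_kappa / 2.0)) / true_kappa
kappa_hat, np_hat = solve_direct(n_obs, m_obs)
print(kappa_hat, np_hat)
```

The round trip recovers the parameters only because the input is noise-free; with a stochastic realization of (N, M), the same inversion inherits the overlap problem described above.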

Two classes of models for the two types of scaling
The other method is to use the dynamical relationship between N and M. This method is based on the idea that the estimation bias due to the overlap of (N, M) could be avoided if we exploit the dynamical relationship between N and M rather than a particular observation of (N, M) in a given snapshot. In this method, we fit the empirical N-M relationship to theoretical equations, which will give us nonlinear least squares estimators of κ and $N_p$.
Since the observable variables N and M appear separately in Eqs. (8) and (9), respectively, we formulate a regression equation by relating N with M through the substitution of κ or $N_p$. By doing so, we essentially categorize the empirical dynamic networks into two classes. In the regression equation for the first class, we express N as a function of M and parameter κ to endogenize the time variation of $N_p$. Hereafter we call this type of formulation "Model I." This corresponds to a situation in which the connecting probability $p_{ij}$, $\forall i \neq j$, is constant while the potential size of networks $N_p$ is time varying.
In "Model II," on the other hand, we specify N as a function of M and parameter $N_p$ to endogenize the time variation of κ. This type of model would be appropriate when the set of nodes is fixed while the connecting probabilities are affected by diurnal or circadian rhythms.
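The regression equations in the two models are respectively given as follows:

Model I: $N_p$ is time varying and κ is constant.

$$N = \left[1 - q_0\big(\kappa, N_p(M, \kappa)\big)\right] N_p(M, \kappa) + \varepsilon^{\mathrm{I}}, \tag{10}$$

where $N_p$ is expressed as a function of M and κ:

$$N_p(M, \kappa) \equiv \frac{1 + \sqrt{1 + 32M/\kappa}}{2}. \tag{11}$$

Model II: $N_p$ is constant and κ is time varying.

$$N = \left[1 - q_0\big(\kappa(M, N_p), N_p\big)\right] N_p + \varepsilon^{\mathrm{II}}, \tag{12}$$

where κ is expressed as a function of M and $N_p$:

$$\kappa(M, N_p) \equiv \frac{8M}{N_p (N_p - 1)}. \tag{13}$$

After estimating the parameters in both specifications, we select one that attains the lower sum of squared errors: Model I is selected if $\varepsilon^{\mathrm{I}\top} \varepsilon^{\mathrm{I}} < \varepsilon^{\mathrm{II}\top} \varepsilon^{\mathrm{II}}$, and Model II is selected otherwise.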
A schematic of the model selection is illustrated in Fig. 3. Note that the criterion of model selection is effectively the same as that of the Akaike information criterion (AIC) and Bayesian information criterion (BIC) because we have only one parameter in both models.
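The selection rule can be sketched with two one-parameter least-squares fits. The sketch assumes uniform activities, so that N_p(M, κ) = [1 + sqrt(1 + 32M/κ)]/2 in Model I and κ(M, N_p) = 8M/[N_p(N_p − 1)] in Model II, and it uses a plain golden-section search in place of a full nonlinear least-squares routine. The search intervals and all function names are illustrative assumptions.

```python
import math

def q0(kappa, n_p):
    """Isolation probability for uniform activities."""
    return -2.0 * math.expm1(n_p * math.log1p(-kappa / 2.0)) / (kappa * n_p)

def n_model_1(m, kappa):
    """Model I: N implied by M when kappa is constant and N_p is endogenous."""
    n_p = (1.0 + math.sqrt(1.0 + 32.0 * m / kappa)) / 2.0
    return (1.0 - q0(kappa, n_p)) * n_p

def n_model_2(m, n_p):
    """Model II: N implied by M when N_p is constant and kappa is endogenous."""
    kappa = 8.0 * m / (n_p * (n_p - 1.0))
    return (1.0 - q0(kappa, n_p)) * n_p

def sse(predict, theta, data):
    """Sum of squared errors over observed (N, M) pairs."""
    return sum((n - predict(m, theta)) ** 2 for n, m in data)

def golden_min(f, lo, hi, iters=100):
    """Golden-section search for the minimizer of f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc < fd:
            hi, d, fd = d, c, fc
            c = hi - g * (hi - lo)
            fc = f(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + g * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2.0

def select_model(data):
    """Return (label, estimate, sse) for the model with the lower SSE."""
    n_max = max(n for n, _ in data)
    m_max = max(m for _, m in data)
    kappa_hat = golden_min(lambda k: sse(n_model_1, k, data), 1e-4, 1.0)
    # Lower bound keeps N_p >= observed N and the implied kappa <= 1.
    np_lo = max(n_max, (1.0 + math.sqrt(1.0 + 32.0 * m_max)) / 2.0)
    np_hat = golden_min(lambda p: sse(n_model_2, p, data), np_lo, 10.0 * np_lo)
    sse_1, sse_2 = sse(n_model_1, kappa_hat, data), sse(n_model_2, np_hat, data)
    return ("I", kappa_hat, sse_1) if sse_1 < sse_2 else ("II", np_hat, sse_2)

def synth(kappa, n_p):
    """Noise-free (N, M) pair used only for this demonstration."""
    return (1.0 - q0(kappa, n_p)) * n_p, kappa * n_p * (n_p - 1) / 8.0

data_1 = [synth(0.05, n_p) for n_p in (50, 100, 200, 400)]           # N_p varies
data_2 = [synth(k, 100) for k in (0.01, 0.02, 0.05, 0.1, 0.2, 0.5)]  # kappa varies
label_1, theta_1, _ = select_model(data_1)
label_2, theta_2, _ = select_model(data_2)
print(label_1, theta_1)
print(label_2, theta_2)
```

On these noise-free inputs the rule recovers the generating model and its constant parameter. Golden-section search assumes a single minimum in the interval, so with noisy data a grid search or a dedicated least-squares solver would be a safer choice.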

Validation
We check the accuracy of the proposed estimation method by using synthetic networks. For Model I (resp. Model II), we generate 500 synthetic networks under various $N_p$ ranging from 20 to 300 (resp. κ ranging from 0.001 to 0.99) for a given κ (resp. $N_p$). While solving the system of two nonlinear equations is straightforward in principle, the question is whether the obtained solution matches the true values of κ and $N_p$. Obviously, the network generating mechanism is in reality not deterministic but stochastic, which means the same parameter combination (κ, $N_p$) may yield different observations of (N, M). Using a particular pair of (N, M) is therefore not sufficient to infer the true model parameters. Indeed, the solution of Eqs. (8) and (9) leads to a biased estimate of $N_p$, especially when the true values of κ and $N_p$ are small [Fig. 4(a)]. This is expected from Fig. 2(b), in which there is a large amount of data overlap in the lower left area of the cone. In fact, κ tends to take small values (e.g., <0.1) in real-world networks, in which case the biased estimation can become a serious problem. Figures 4(b) and 4(c) show the error bars of the estimated parameters for the second method over 1000 runs. The estimated values of $N_p$ and κ nicely match the true values even when the network size is fairly small and thereby multiple combinations of (κ, $N_p$) can yield the same (N, M). This is an advantage of this method: we do not rely on a particular realization of (N, M), but rather exploit the whole dynamical relationship. Furthermore, in the case where $N_p$ is fixed and κ varies in time, Model II also gives a better estimate than the direct calculation.
It should be noted that the observed N can be much lower than its potential value $N_p$, which suggests that the potential number of active nodes cannot necessarily be inferred directly from the observed number of nodes. This is particularly true when κ is so small that the network is fairly sparse (see Fig. S1 in the Supplemental Material [37]). Figure 5 shows the empirical results for the CollegeMsg and LondonBike data sets (see Figs. S2 and S3 in the Supplemental Material [37] for the other data sets). Our results illustrate the fact that scaling relations in social and economic temporal networks may be driven by the two previously described factors. For Interbank, Enron, CollegeMsg, and RealityMining, Model I is selected, which means the time-varying nature of the global network properties comes from shifts in the potential number of nodes, i.e., the population in the system changes over time. On the other hand, for LondonBike and Highschool, Model II is selected, which means the population remains almost unchanged, and the changes in the numbers of edges and active nodes are due to time-varying connecting probabilities.

Empirical results
Since all we need for the model classification is a variety of combinations of (N, M ), one can implement the method for any timescale. For instance, if we see intraday activity in the Interbank data set, the scaling behaviors on the vast majority of days are still better explained by Model I (Fig. S4). For the LondonBike dataset, the scaling relationship over different days is identified as being driven by a time-varying κ for each time interval (Fig. S5), again indicating that the population (i.e., the number of bike stations) is essentially fixed throughout the data period.
For the data sets for which Model I is selected, we note that the estimated values of κ are fairly small, ranging from 0.011 (RealityMining) to 0.078 (Interbank). κ is time varying for LondonBike and Highschool, but their values are still small with the maximum value being no larger than 0.02. This suggests that the direct calculation discussed above would not work well for empirical networks [ Fig. 4(a)].

C. Network density
Another global quantity that might be of interest is network density. From the estimates of κ and $N_p$ we can write the theoretical network density as

$$\frac{2M}{N(N - 1)} = \frac{\kappa N_p (N_p - 1)}{4 N (N - 1)}, \qquad N = \left[1 - q_0(\kappa, N_p)\right] N_p.$$

As discussed above, the parameter $q_0$ approaches 0 as $N_p \to \infty$. This suggests that in Model I, in which κ is constant, the network density converges to κ/4 as $N_p$ (and N) grows. We compare the theoretical and the empirical network density in Fig. 5 (right panels) for CollegeMsg and LondonBike (see Fig. S6 for the other data sets). For the data sets for which Model I is selected (Interbank, Enron, CollegeMsg, and RealityMining), the density monotonically decreases as N increases, approaching the asymptotic value κ/4 (dashed line).
For the other data sets (LondonBike and Highschool), on the other hand, the relationship is nonmonotonic; density increases with N when N is sufficiently large. In Model II, where $N_p$ is constant, the density can be regarded as a function of κ, and a shift in κ has two effects on the density. First, an increase in κ leads the network to be denser because it has a positive impact on the probability of two nodes being connected. Second, an increase in κ would cause the number of active nodes N to rise, which has a negative impact on the density. Since there is a finite fraction of inactive nodes when the network is not large enough (i.e., $q_0 > 0$), the number of active nodes can increase in accordance with a rise in κ. This increases the denominator of the density by definition, which would lead to a reduction in the theoretical density. Indeed, we find that there exists a threshold of N above which the former effect dominates the latter [Fig. 5(b), right].
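The two density patterns can be reproduced from the expected values of N and M. The sketch below assumes uniform activities, takes the density to be M divided by N(N − 1)/2, and uses arbitrary illustrative parameter grids; the function names are hypothetical.

```python
import math

def expected_n_m(kappa, n_p):
    """Expected (N, M) for uniform activities and u(a, a') = kappa * a * a'."""
    n = n_p + 2.0 * math.expm1(n_p * math.log1p(-kappa / 2.0)) / kappa
    m = kappa * n_p * (n_p - 1) / 8.0
    return n, m

def density(kappa, n_p):
    """Edges divided by the number of node pairs among the active nodes."""
    n, m = expected_n_m(kappa, n_p)
    return m / (n * (n - 1) / 2.0)

# Model I setting: kappa constant, population grows.
kappa_fixed = 0.05
dens_1 = [density(kappa_fixed, n_p) for n_p in (50, 100, 200, 400, 800, 1600)]
# Model II setting: population constant, kappa grows.
dens_2 = [density(k, 100) for k in (0.001, 0.01, 0.1, 0.99)]

print([round(d, 4) for d in dens_1], "limit:", kappa_fixed / 4)
print([round(d, 4) for d in dens_2])
```

With κ fixed, the density falls toward κ/4 as the population grows; with the population fixed, it first falls and then rises with κ, matching the nonmonotonic pattern described above.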

D. A more general activity distribution
The estimation methods we proposed above assume that activity parameters $\{a_i\}$ are distributed uniformly, because in many real-world systems we have no prior knowledge about the activity level of (unobservable) resting nodes. Nevertheless, if we could have further information about the system (in addition to N and M), we could also obtain an estimate of the empirical activity distribution that covers the entire set of nodes.
In this section, we propose a method to estimate activity distribution when the total number of potentially active nodes in the system (i.e., $N_p$) is known. We focus on the systems in which $N_p$ is considered to be constant (i.e., systems for which Model II is selected), namely LondonBike and Highschool, and assume that the true $N_p$ is given by the total number of active nodes of a day. The implicit assumption here is that nodes that are ready to be active would have at least one temporal edge during a day. We choose a beta distribution,

$$\rho(a) = \frac{a^{\alpha - 1} (1 - a)^{\beta - 1}}{B(\alpha, \beta)}$$

for a ∈ [0, 1], where B(α, β) is the beta function, as a general form for the activity distribution. Parameters α and β are estimated such that the estimated $N_p$ matches the empirical counterpart.
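A generalized version of the nonlinear regression equation [Eq. (12)] is given by [see Eq. (B11) in Appendix B]

$$N = N_p \left[1 - \int_0^1 \rho(a) \left(1 - \frac{\alpha}{\alpha + \beta}\, \kappa(M; N_p, \alpha, \beta)\, a\right)^{N_p - 1} da\right] + \varepsilon. \tag{15}$$

Note that endogenous variable κ is now expressed as a function of M, taking parameters $N_p$, α, and β as given:

$$\kappa(M; N_p, \alpha, \beta) \equiv \frac{2M (\alpha + \beta)^2}{\alpha^2 N_p (N_p - 1)}.$$

The estimation procedure under a generalized activity distribution is then given by the following four steps: (1) For a given combination of (α, β), obtain the estimate of $N_p$, denoted by $\hat{N}_p(\alpha, \beta)$, by implementing the nonlinear least squares on Eq. (15).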
The estimation results suggest that the activity distribution is skewed to the left in both data sets (Fig. 6, insets), and the generalized regression equation still well fits the empirical N-M curve (Fig. 6). We note that while the goodness of fit generally improves due to the introduction of additional parameters (i.e., α and β), the fitted curve is little affected by the specification of activity distribution [see Figs. 5(b) and S2].
This suggests that the dynamic hidden variable model well explains the macroscopic fluctuations of empirical networks for alternative specifications of activity distribution.
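With beta-distributed activities the isolation probability no longer has a simple closed form, but it can be evaluated by numerical quadrature. The sketch below assumes u(a, a′) = κaa′, so that the inner integral reduces to κa times the mean activity α/(α + β), and uses a midpoint rule; the function names and the number of quadrature steps are illustrative assumptions rather than the procedure used for Fig. 6.

```python
import math

def beta_pdf(a, alpha, beta):
    """Beta density on (0, 1), computed in log space for stability."""
    log_b = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return math.exp((alpha - 1) * math.log(a) + (beta - 1) * math.log(1 - a) - log_b)

def q0_beta(kappa, n_p, alpha, beta, steps=20000):
    """Isolation probability: integral of rho(a) * (1 - kappa * a * mean)^(N_p - 1)."""
    mean = alpha / (alpha + beta)
    h = 1.0 / steps
    total = 0.0
    for s in range(steps):
        a = (s + 0.5) * h  # midpoints avoid the endpoints a = 0 and a = 1
        total += beta_pdf(a, alpha, beta) * (1.0 - kappa * a * mean) ** (n_p - 1)
    return total * h

def kappa_from_m(m, n_p, alpha, beta):
    """Invert M = kappa * mean^2 * N_p * (N_p - 1) / 2 for kappa."""
    mean = alpha / (alpha + beta)
    return 2.0 * m / (mean ** 2 * n_p * (n_p - 1))

# With alpha = beta = 1 the beta density is uniform, so the uniform closed form applies.
kappa, n_p = 0.1, 50
numeric = q0_beta(kappa, n_p, 1.0, 1.0)
closed = 2.0 / (kappa * n_p) * (1.0 - (1.0 - kappa / 2.0) ** n_p)
print(numeric, closed)
```

The check against the uniform case is the only validation performed here; for α < 1 or β < 1 the density diverges at an endpoint, and the midpoint rule would need many more steps or a dedicated quadrature scheme.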

V. DISCUSSION
We proposed a method to identify the source of scaling in temporal networks, namely the dynamical relationship between the numbers of active nodes and edges. Building on a model including both population and activity dynamics, we showed that these two mechanisms are sufficient to explain the two types of scaling observed in real-world systems. The estimating method we developed enables us to compute the parameters for the activity rhythm κ and the population size $N_p$ (and thereby the number of resting nodes $N_p - N$). While an observation of (N, M) in a particular snapshot is not sufficient to identify the source of dynamics, a sequence of N and M allows for such an estimation. We apply the method to six empirical data sets, and identify for each the main driving factor responsible for the emergence of scaling.
It should be noted that our proposed framework does not depend on whether the network under study is growing or shrinking. As we already pointed out, the only information needed for the method is a time variation of N and M. Indeed, in many real networks such as the six networks we examined, the size of networks does not exhibit a monotonic behavior but rather nonmonotonic shifts, depending on external factors that affect the activity rhythm and/or the population. Thus, the method can reveal the key factor that may lead a network to grow or shrink.
While our framework is useful for understanding the evolution of temporal networks in any context, there remain some issues that need to be addressed in future research. First, our method assumes that there are two types of systems, which are described as Model I (i.e., activity rhythm κ is constant and population size $N_p$ is time varying) and Model II (i.e., population size $N_p$ is constant and activity rhythm κ is time varying). In real-world systems, there may exist an intermediate state in which both the activity rhythm and the population size are evolving with similar time scales. To study those systems, one would need to include additional information other than N and M to inform the model, in order to be able to separate the effects of both mechanisms. Second, one key parameter of the model is the distribution of node fitnesses. Currently, we specified this distribution to be either a uniform or a beta distribution, which gives satisfactory estimates of the dynamical parameters. The method would of course yield more accurate estimates if we could incorporate an empirical distribution of fitnesses. However, measuring those is a complicated task: to do so, one needs to observe the activity levels of totally inactive nodes (i.e., nodes without edges), which is paradoxical. The fitness of a node in the model is indeed a rather abstract property, which integrates many realistic characteristics that depend on the context. Such characteristics can also be time dependent. Third, in the hidden-variable model, the structure of the generated network is basically the same as that in a configuration model. Therefore, the model is not sufficient to replicate the empirical structural properties, while the aggregate properties are well explained by the model. Explaining the structural and local properties, however, is beyond the scope of our paper and should be left for future research.

ACKNOWLEDGMENTS
T.K. acknowledges financial support from JSPS KAKENHI Grants No. 15H05729 and No. 19H01506. M.G. acknowledges that this work was partially supported by the ANR project DATAREDUX (ANR-19-CE46-0008).
T.K. conceived and directed the study. T.K. and M.G. defined the model. T.K. performed the analytical calculations and the data analyses. T.K. and M.G. drafted the final manuscript.
The authors declare no competing financial interests.

APPENDIX A: DATA SETS
The Interbank data set is constructed from bilateral transactions in the online interbank market in Italy between September 4, 2000 and December 31, 2015 (i.e., 3922 business days). The data are commercially available from e-MID SIM S.p.A. based in Milan, Italy [38]. From the data we build a temporal network where nodes are banks, with one snapshot per day. For each day, two banks are connected by an edge if a loan is made from one bank to another between 11:00 and 12:00.
The Enron data set is an email-based communication network from the Enron Corporation [7,25] collected from May 11, 1999 to June 21, 2002. From the data we build a temporal network where nodes are employees, with one snapshot per day. For each day, two employees are connected by an edge if at least one e-mail has been sent from one employee to the other between 14:00 and 16:00. The data are taken from [39].
The CollegeMsg data set is an online social network at the University of California, Irvine collected from March 23, 2004 to October 26, 2004 [8,9]. From the data we build a temporal network where nodes are users, with one snapshot per day. For each day, two users are connected if one has sent a private message to the other between 14:00 and 16:00. The data are taken from [40].
The RealityMining data set is built from the call data from the Reality Commons project [2] collected from September 24, 2004 to January 7, 2005. From the data we build a temporal network where nodes are individuals, with one snapshot per day. For each day, two individuals are connected if there has been a phone call between them or a voicemail has been left during the 8:00-12:00 time window. The data are taken from [39].
The LondonBike data set describes the trips taken by customers of London Bicycle Sharing Scheme [26] collected on January 12, 2016. From the data we build a temporal network where nodes are bike sharing stations, with snapshots every 20 seconds aggregating the data from a 10-minute sliding time window. For each 10-minute time interval, two stations are connected if there has been at least one trip between them.
The Highschool data set is a face-to-face contact network recorded in a high school in France on December 6, 2013, using wearable sensors by the SocioPatterns collaboration [1,19]. As in LondonBike, from the data we build a temporal network where nodes are individuals, with snapshots constructed every 20 seconds with a 10-minute sliding time window. For each 10-minute time interval, two individuals are connected if they have been in contact at least once.