Maximal dispersion of adaptive random walks

Maximum entropy random walks (MERWs) are maximally dispersing and play a key role in optimizing information spreading in various contexts. However, building MERWs comes at the cost of knowing beforehand the global structure of the network, a requirement that makes them totally inadequate in real case scenarios. Here, we propose an adaptive random walk (ARW), which instead maximizes dispersion by updating its transition rule on the local information collected while exploring the network. We show how to derive ARW via a large-deviation representation of MERW and study its dynamics on synthetic and real world networks.

In the last decade we have assisted to a wave of new works studying extreme (rare) events associated to dynamical processes evolving on complex networks [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17]. The necessity to understand unlikely events comes from the fact that, although rare, their appearance determines the future of the system under study, which may be potentially catastrophic, e.g., earthquakes [1]. In this context, researchers have focused on random walks and their load and flow properties [2,3,15,17] for models of traffic in transportation [11,16] and communication networks [4], or on epidemic models and extinction events [6], or again on general orderdisorder [8] and percolation transitions [9,10] to corroborate the robustness of networks. In these settings, rare events are often driven by internal noise, and their understanding could provide us with control mechanisms to keep away from harmful scenarios [5,6,8,10].
Among all dynamical processes evolving on networks, discrete-time biased and unbiased random walks (URWs) stand out as simple and insightful models of diffusion processes on discrete topologies [18]. A fundamental property of random walks is their ability to homogeneously spread over the whole network. Mathematically, the spreading capability of a random walk can be characterized by measuring the entropy production rate. In many scenarios, it is indeed of uttermost importance to design random walks that maximise such entropy production rate in order to spread the most homogeneously. To picture this, imagine to have equal-size groups of random walks with different colors running on a network; at each time, the most homogeneous spreading is obtained with an equal proportion of colors on every node. Such a well mixing, or maximal dispersion, turns out to be particularly useful when information about a node state (e.g., its healthy or infected condition, its availability, etc.) needs to be homogeneously spread to all other nodes in the network [19][20][21], a sought-after property for transportation and ad-hoc networks [22].
It is known that a random walk achieves maximal dispersion when it travels all trajectories of the same length with uniform probability. Such a property char- * francesco.coghi@su.se acterises the so-called maximum entropy random walk (MERW) [20,23] and is used in a myriad of practical cases: testing network robustness [24,25] and navigability [26][27][28][29], predicting links and communities [30][31][32][33] as well as disease associations [34], or assessing neutral quasispecies evolution in biology [35], to name a few. However, in order to build MERW, it is required to know the topology of the whole network before even exploring it [20,21,23]. Apart from the expensive computational cost of defining the stepping rule of MERW (based on calculating dominant eigenvalue and eigenvector of a N ×N matrix, with N the number of nodes), having global knowledge of the network beforehand is an heavy drawback that makes MERW totally inadequate on networks whose structure cannot be entirely determined a priori or changes over time (e.g., growing networks [36] or temporal networks [37,38]). It is therefore fundamental to optimally design dispersive random walks that just make use of local information while exploring the network. Several attempts have been made in this direction, although so far finding only approximate solutions (see for example [21]).
In this Letter, we solve this longstanding problem by proposing an adaptive random walk (ARW) that locally updates its stepping rule based on the structure of the explored network. Without requiring any prior knowledge of the whole topology, ARW outperforms MERW, as it is maximally dispersive on every portion of the network visited and not only on the whole graph. Via a bridge between large deviation theory and network science, MERW can be seen as a rare event of URW. We exploit this to construct ARW as a single-trajectory rare-event sampling algorithm [39,40] that only makes use of local information available to adapt itself-changing its transition probabilities step by step-in order to best spread on the network. Shaping the random walk by only gathering local information is an outstanding property that makes ARW the only sensible choice for optimising the spreading on networks with time-varying topology and on heterogeneous networks. Indeed, imagine to have a network composed by two main modules, connected to each other and topologically very different. MERW initialised on one of the modules is not able to maximise dispersion while exploring it as its stepping rules arXiv:2202.13923v2 [cond-mat.stat-mech] 21 Jan 2023 are based on the 'averaged' structure of the whole network. On the contrary, ARW is optimally dispersive in every exploration phase. In the following, this is made evident by showing that ARW has an entropy production rate closer to the maximal one on the visited portion of the graph if compared with URW and MERW. Moreover, when the network is fully explored, ARW and MERW have similar mixing properties and, in the thermodynamic limit, they become the same.
We start by considering an URW X = (X 1 , X 2 , · · · , X n ) on a finite connected and undirected graph G = (V, E) characterized by a set of nodes V and a set of links E. The topology of the graph is encoded in the adjacency matrix A = {a ij }, where a ij = 1 if the nodes i and j are connected, and a ij = 0 otherwise. We also define the degree of node i as k i = j∈V a ij . URW dynamics is determined by the stochastic transition matrix Π U , with components describing the probability of URW to move from X = i at time to X +1 = j at time + 1. We focus on a particular dynamical observable that characterises the entropic content of a random walk trajectory, Indeed, apart from boundary terms that do not influence our discussion, C n is the logarithm of the probability of URW trajectory divided by the number of time steps n. Noticeably, by taking the long-time limit of the average over all paths of C n , we get the so-called Kolmogorov-Sinai entropy production rate h U [20,21,41], i.e., h U = lim n→∞ C n , interpreted as the mean information generated per time step. For a generic ergodic random walk, it can be written as where ρ = {ρ i } and π = {π ij } are the stationary distribution and the transition probability matrix of the random walk. Eventually, the observable C n is a random variable of the random walk process that represents the fluctuating version of h U , viz. the fluctuating trajectory entropy [42,43]. The finite-time fluctuating nature of C n around its typical value h U = i∈V ρ i ln k i is of interest here. A complete understanding of the fluctuations is given by the probability density P n (c) := P (C n = c), which is known to have the large deviation form with the non-negative large deviation rate function I(c) characterizing the leading behavior of P n (c) and o(n) denoting corrections smaller than linear in n. The focus thus moves onto studying I in Eq. (4), which has a unique zero at c * = h U . The rate function can be calculated by means of the so-called Gärtner-Ellis theorem [43][44][45], which states that I is given by the Legendre-Fenchel transform of the scaled cumulant generating function (SCGF) as this last is differentiable for a finite graph [45]. In particular, as URW is an ergodic Markov process, the SCGF can be obtained as where ζ s is the dominant eigenvalue of the so-called tilted matrixΠ s = (π s ) ij , with components Hence, the likelihood of fluctuations can be studied using the SCGF Ψ rather than the rate function I. However, calculating the probability of fluctuations is only a first step towards the prediction and control of rare events. It is indeed important to also understand how these extreme events are created in time. In this context, we construct the driven process [46][47][48] associated with a given fluctuation C n = c. This process is a locally-biased version of URW [12,47] and its transition probability matrix is given by where r s is the right eigenvector associated with ζ s . The driven process is still Markovian and ergodic and can be interpreted as the effective dynamics of the subset of paths of URW leading to a fluctuation C n = c [12,13,47]; to match such a fluctuation [49], the Laplace parameter s must satisfy c = Ψ (s) .
Eventually, the entropy rate of the driven process can be obtained taking Eq. (8) and plugging it into Eq. (3) and can be expressed in terms of the SCGF [12] as In Sect. II of the Supplemental Material (SM) we show that h(s) in Eq. (10) has a global maximum for s = 1, i.e., where ζ 1 is the dominant eigenvalue of the adjacency matrix A. Replacing s = 1 in the driven process, Eq.
allowing us to interpret MERW on a network as a biased random walk creating a rare event fluctuationgiven by replacing s = 1 in Eq. (9)-of URW [12]. This result shows, on the one hand, that we can sample a particular rare event of URW by simulating MERW and, on the other hand, that MERW can in principle be obtained from URW by opportunely conditioning on a certain rare event of the observable C n in Eq. (18). The latter observation is key to introduce our adaptive random walk (ARW). As we will show in the following, such ARW, through successive local adaptations of URW, reaches maximum entropy while exploring the network, i.e., much before the entire graph has been visited, and eventually converges to MERW on the whole network. To construct ARW, we develop here an algorithm based on a rare-event sampling scheme [39,40,[50][51][52][53]. According to this, the random walk updates its transition probability matrix at each time step, in order to typically visit a specific rare event of URW, obtained fixing s = 1 in Eq. (9). We refer the reader to Sect. I of the SM for a general formulation of the sampling algorithm valid for all additive observables and s ∈ R.
Formally, ARW is a discrete-time process Y = (Y 1 , Y 2 , · · · , Y n ), where Y n ∈ V is the position of ARW on the graph at time n. The core of ARW is based on an adaptive power method to solve the following dominant eigenvalue equatioñ which is known to be cardinal to construct MERW in Eq. (12). More in detail, our adaptive power method simulates single Markov chain transitions with importance sampling [54]. In particular, supposing that ARW is located on the node i at time n, i.e., Y n = i, the next step is proposed according to an estimate of MERW in Eq. (12) that reads where Z is the normalisation factor, i 0 is an a-priori fixed node, and r (n) 1 is the n-th time estimate of the eigenvector centrality [39,40]. This last is given by the stochastic-approximation [55] formula based on an asynchronous update via the indicator function 1 Yn=i and on the learning rate λ(n). The indicator function selects the i-th component of the eigenvector centrality to be updated only when the process Y , at the n-th time step, has visited node i. Additionally, the learning rate λ(n)-commonly used in stochastic approximation protocols [50,51,55,56]expresses how much of the information that has been learnt up to time n is used to update r 1 in the next time step. Note that, in the following, all ARW simulations are obtained by using a learning rate λ(n) = 1/((n + 1) β ), with β = 0.1, in Eq. (15). Although there is no theory to a-priori determine the learning rate, there are mathematical conditions that λ needs to satisfy, and one can carry out numerical simulations on benchmark networks to finely tune the value of β. We show how to do so in Sections III and VI of the SM (see Fig.  S2-S6, Fig. S9, and Table S1), further noticing that our approach, in order to set off an 'optimal' value of β, does not require to compare the performance of ARW with MERW, or with any other spreading process that requires global knowledge of the network.
ARW is randomly initialized on a node of the network with a normalized random right eigenvector r (0) 1 and evolves according to the fully local rules in Eq. (14) and Eq. (15). At each time step it tends to optimize the spreading-aiming at maximum entropy production-on the portion of the network visited. In the long-time limit, it will eventually converge to MERW of Eq. (12) as r (n) 1 → r 1 , r (n) 1 (i 0 ) → ζ 1 , and Z → 1 . We insist on the fact that differently from MERW, ARW does not need to know the full topology of the network since the beginning, as it learns it on the run. Thanks to this, its entropy production rate stays always close to the maximum rate on the visited portion of the graph. This is drastically different from MERW which does not maximize the entropy while it explores the graph, but reaches optimal spreading only when the whole network has been visited.
We show this in Fig. 1(a)-(c) where we compare the spreading properties of single trajectories of ARW, MERW, and URW on two network models, namely Erdös-Rényi, and Barabasi-Albert, and on a realworld man-made network. The last network describes an air transportation system: each node is a city and two nodes are connected if at least one airplane flew between the two cities in the time window [11 Jan 2000 -10 Jan 2001] [57]. In this last context, maximizing entropy production rate allows manufacturers, for example, to homogeneously spread goods around their factories. Each of the three processes is initialized on a randomly selected node of the network, and evolves according to its transition probability matrix. As the walker moves hopping through previously-unvisited links, we calculate its entropy production rate h (solid line) and compare it with the optimalh (dashed line) given by the logarithm of the dominant eigenvalue of the adjacency matrix associated with the portion of graph made by all (and only) previously visited links. The entropy production rate h of each process is calculated via a modified version of Eq. (3), that is This takes into account the number of visited links , while 0 otherwise, and the stationary distribution ρ(M ) is calculated as the left eigenvector of Π(M ) = {π ij (M )}. As the solid green line is indistinguishable from the corresponding dashed one, ARW has always-while exploring the networkoptimal spreading performances. On the contrary, by comparing the corresponding solid and dashed lines, the performances of MERW and URW are always suboptimal. In particular, MERW reaches maximum entropy production-comparable to that of ARW-only when the whole graph has been visited.
Moreover, in Fig. 1(d)-(f) we plot the median (solid line) of the relative difference between the entropy production rate h and the optimalh, together with first and third quartiles (shaded area). The median and the quartiles are calculated over an ensemble of 1000 trajectories and are used in place of the mean and standard deviation because of the unknown distribution of h aroundh. This gives further evidence of the fact that ARW performs better than MERW (and URW) at maximizing the spreading while discovering the structure of the graph. Fig. S7 of the SM shows the entropy production rates of ARW, MERW, and URW on the graph induced by the visited nodes, i.e, including links that have not yet been visited. In this case, ARW is still outperforming MERW and URW on the Erdös-Rényi graph, but is only marginally better on the other networks. The optimal dispersion of the fully-local ARW on the visited links of the network comes at a price: ARW takes longer than URW and MERW to cover the whole graph. This is consequence of (i) an initial so-called warm-up phase in which ARW is localised in the region where it was initialised, and (ii) a typical exploration time of the network. During the warm-up phase, our process finely tunes the eigenvector centrality and the transition probability matrix to set off an efficient exploration of the network. As shown in Fig. S8 of the SM, where we plot the average number of time steps needed to discover new links, ARW remains indeed localised in the first few visited links. This is also the main reason why the network coverage time-the number of time steps to visit all links-is, on average, 10 2 − 10 4 time steps longer for ARW than for MERW or URW (see Fig. 2). We find that the coverage time T is related to the total number of links L as where α is the scaling exponent. Remarkably, after the initial warm up-which is evident by the overall upward shift of ARW power-law fit-the scaling exponents of ARW and MERW are similar (α A ≈ 2.9, α M ≈ 2.7), but larger than the one of URW which has no global constraint to satisfy (α U ≈ 1.3). We remark that optimising entropy production rate and coverage time are two different tasks with the former much harder than the latter. Although having both properties is certainly appealing, our main goal is to optimise dispersion. However, in an attempt to also optimize coverage time, we point out that in our studies, the initial right eigenvector r 1 plays a key role in determining the initial warm-up time and the overall accuracy of ARW in optimizing the spreading while exploring the network (see Fig. S10 of the SM). We hope that our work will stimulate further investigations to explore the trade-off between optimal spreading accuracy and exploration times.
In this Letter, we have proposed an adaptive random walk that has optimal spreading properties, outperforming the well-known MERW. Via a large-deviation tilting on the fluctuating trajectory entropy observable, ARW typically observes a maximum entropy production rate while exploring the network, exploiting only local information. Besides the theoretical novelty driven by a large deviation study of random-walk rare events, we believe that our work can be a fundamental step towards the study of network information spreading [58,59] in all such cases where no prior knowledge on the network is available, or when the network is changing in time [37,38,60]. ARW could also be used to study dispersion properties of other dynamical processes on real networks, e.g., aiming at optimal exploration in congested networks [61,62]. Furthermore, the algorithm at the core of ARW could also be used to sample other rare-event fluctuations associated with any additive observables of random walks [39,40].

CODE
All the code used in the manuscript is available at https://github.com/gabriele-di-bona/ARW.

ACKNOWLEDGMENTS
FC is deeply grateful to Pierpaolo Vivo and Hugo Touchette for valuable comments and suggestions in the writing stage of the manuscript.

ADAPTIVE RANDOM WALK ALGORITHM
In this Section we give details on the adaptive random walk process Y = (Y ) n =1 which is based on an efficient algorithm to sample large-deviation rare events [39,40]. Differently from the main text, we will not restrict to the particular case of a MERW, but we will describe the numerical scheme in its full generality. However, all the results in the main text (MT) can be obtained fixing f (X ) = k X in Eq. (18) and replacing s = 1 in Eq. (19) and following.
We consider a Markov chain X = (X ) n =1 evolving in a finite discrete state space Γ of N states (this is the graph G = (V, E) in the MT) according to the (irreducible and aperiodic) transition matrix Π with elements π ij that characterize the probability of going from a state X = i at time to a state X +1 = j at time + 1. Associated with the Markov chain X, we consider a purely time-additive observable of the general form where f is any function of the state. As mentioned in the MT, one can estimate the likelihood of fluctuations of C n by calculating the scaled cumulant generating function (SCGF) in Eq. (5)MT (and Eq. (6)MT) and by Legendre-Fenchel transforming it to obtain the large deviation rate function, i.e., the leading exponential behaviour of the probability density in Eq. (4)MT. However, this procedure is known to be difficult since it involves the calculation of the dominant (Perron-Frobenius) eigenvalue ζ s of the non-negative tilted matrix see [39,43,45,47] for further details. This problem can be mathematically formalized as finding the solution of the following spectral equationΠ where along with ζ s we have the right eigenvector r s . Both these are fundamental to define the driven process, which is characterized by a transition matrix whose components are as in the first equality of Eq. (8)MT.
In the following, we show a numerical method that approximates the dominant right eigenvector r s and, consequently, the dominant eigenvalue ζ s , and so the SCGF Ψ too (Eq. (6)MT), by making use of an adaptive stochastic power-method scheme. The core of the algorithm is rooted on works of reinforcement learning for risk-sensitive control of Markov chains [50,51,63,64] and was also recently adapted to estimate large deviation functions of continuous-time diffusion processes [52,53]. Here, we repropose a discussion of the method that appeared in [39].
A bare power method scheme recursively multiplies an arbitrary vector v 0 := r (0) s (the superscript (0) refers to the zero-th step in the numerical scheme) by the transition matrixΠ s . In this case, we can write where r (n+1) s is the approximation at the (n + 1)-th step of the dominant right eigenvector normalized by the (n + 1)-th estimate of the dominant eigenvalue ζ . This can be calculated as where, a priori, the component i 0 is arbitrarily chosen, but here we fix it to be the maximum component of r (n) s in magnitude such that the right eigenvector is normalized according to its infinity norm 1 . As n → ∞ we then have Numerically, it is known that the power method scheme in Eq. (22) is particularly inefficient for large (and very connected) state spaces, because at each step one matrix multiplication needs to be calculated. To overcome this problem, as we will see in the following, one may consider to implement a numerical scheme that simulates single Markov chain transitions rather than keeping track of the full structure of the state space in the matrix multiplication. These transitions could be naively proposed according to the tilted matrix in Eq. (19). Notwithstanding this, due to the presence of the exponential factor, such an approach could lead to divergences. For this reason, one considers importance sampling [54] and simulates the (n + 1)-th transition step according to the n-th estimate of the driven process in Eq. (21), that is With a simulated transition from i to j the estimate of the right eigenvector in Eq. (22) therefore reads where we normalize the right eigenvector with r in magnitude) and, as usual in importance sampling, we unbias the transition multiplying by the likelihood ratio 2 π ij /(π (n) s ) ij . Notice that it is only in the limit n → ∞ that the estimate in Eq. (25) tends to the driven process of Eq. (21). Indeed, the normalisation appearing in Eq. (25) has a pivotal role in correctly estimating the driven process transition probabilities, and it is only in the limit n → ∞ that such normalisation converges to 1.
Eventually, convergence of Eq. (26), as in Eq. (24), is realized making use of a stochastic approximation scheme. Stochastic approximation methods [55,56] are discrete-time (stochastic) recursive schemes similar, in form, to gradient descent methods. They make use of an annealing schedule, also called learning rate, to quantify, at each time step, 'how much' of the new information coming from an algorithm is used to update the value of an observable of interest in the next step. In our case, this observable of interest is the dominant right-eigenvector, and the stochastic approximation scheme applied to the algorithm in Eq. (26) reads where 1 Yn=i makes the update asynchronous, viz. it selects only the state i at time step n, and λ is the learning rate, or annealing schedule, which should satisfy certain mathematical features [55] and is here assumed to have the general form A proof of the convergence of this numerical method is given in [55], where it is shown that the numerical scheme tracks the solution of a particular nonautonomous differential equation. As mentioned in the MT, the learning rate is a fundamental ingredient of the stochastic approximation scheme we have introduced as it really determines how much of what has been learnt up to time n is used to update the right-eigenvector r s in the next time step. Generally speaking, apart from the mathematical conditions given in [55] which ensure the mathematical convergence of Eq. (27), to our knowledge there is no general recipe to determine the 'optimal' form of the learning rate λ one should choose to have good-fast-numerical convergence of Eq. (27). Indeed, λ may depend on many different features related to the particular model investigated: the initial condition r (0) s , the local structure of the state space encoded in π ij , etc. Hence, an exact functional form of λ may be very hard to find and is beyond the scope of our work. Generally speaking, the key rule is that λ should not decay neither too quickly nor too slowly in time as the random walk Y needs some time to learn the local features of the network, but it should not be allowed too much time to avoid slow or inaccurate convergence.
The remarkable advantage of this adaptive stochastic numerical scheme, based on the combination of spectral methods with important sampling, is that it allows us to estimate the SCGF Ψ by simulating a single trajectory of a Markov chain that learns on the run the particular rare event of interest c of Eq. Generally speaking, a direct implementation of the numerical scheme presented above, although extremely efficient from a computational point of view, could still lead to noisy results in the calculation of the dominant eigenvalue and right eigenvector in absence of good initial guesses for r s and ζ s . In the simulations used for Fig. (1)MT and Fig. (2)MT we have initialized the vector r s with random components sampled from the [0, 1] uniform distribution and then normalized the vector according to the L 1 norm. We will detail this point in Section 19. We also address the reader to Chapter 4 of [39] for a more general discussion.

Pseudocode
For full transparency of the work done, we propose in the following a pseudocode for the ARW, obtained by implementing the algorithm presented in the previous Section for the particular case of a transition matrix Π associated with an URW and tilting parameter s = 1. We address the reader to the GitHub folder 3 where we have collected all the codes used to draft the manuscript.

DETAILS ON THE ENTROPY PRODUCTION RATE
As we have stated in the previous Section, in a large deviation context the SCGF is fundamental to obtain the entropy production rate h(s). The SCGF can be calculated starting from the tilted matrixΠ s = (π s ) ij , which in the case shown in the MT has components (π s ) ij = π ij e s ln ki = π ij k s i = a ij k s−1 where we have used f (i) = k i as function of the state. Hence, the SCGF is given by the logarithm of the dominant eigenvalue ζ s ofΠ s , that is Ψ(s) = log(ζ s ) .
Eventually, the entropy rate of the driven process can be obtained taking Eq. (8)MT and plugging it into Eq. (3)MT, and can be expressed in terms of the SCGF as in Eq. (10)MT [12]. The entropy rate of the driven process h(s) (given by Eq. (10)MT) has a unique global maximum in s = 1. This can be seen by taking the derivative of h(s) with respect to s and putting it equal to 0. This reads We notice that if the SCGF Ψ exists, then it is analytic and convex [43,45]. Hence, we have that Ψ (s) ≥ 0. As a consequence, Eq. (31) is satisfied either for s = 1 or when Ψ (s) = 0. Since for finite graphs Ψ (s) = 0 can only be true for s → ±∞ (otherwise the function would show some non-analyticities for finite values of s), we are left with only the critical point s = 1, which can be directly checked to be a maximum of the function h. In Fig. 3, we plot the entropy rate h(s) for an Erdös-Rényi graph with 100 nodes and average degree 3, computed using Eq. (30) and Eq. (10)MT.

SETTING OFF THE LEARNING RATE FOR THE ADAPTIVE RANDOM WALK
In this Section we give details on how to set off the learning rate λ in Eq. (28) appearing in the ARW algorithm (see Eq. (27)). As we mentioned in Section , an exact functional form of the learning rate λ as a function of the network topology and the initial conditions does not exist in the literature and is, arguably, hard to find. Even harder-perhaps impossible-to find is a good (optimal), form of the learning rate without having any information on the actual exploration space. These are very much open questions that people in the field of stochastic approximation and reinforcement learning are pursuing. Nonetheless important, here, we do not seek to find an answer for them, and we limit ourselves to show a possible numerical way that one can implement to set off a good learning rate.
First of all, we make use of the general learning rate form of Eq. (28) that was proposed and used in many other works related to stochastic approximation methods, see for instance [50? , 51]. Given this form, the problem boils down to finding a good exponent β. Although this must satisfy the mathematical conditions stated in [55] to guarantee numerical converge, this requirement is not enough to select one single β or a small set of them. Indeed, it is often the case that although the mathematical convergence is guaranteed, numerically one faces a full spectrum of different behaviour: too slow convergence, too fast convergence, numerical instability-all leading to bad learning. To avoid such a situation, a numerical tuning of β appears necessary.
To set off a good value of β that we have used in all our simulations with the most different topologies, we use Erdös-Rényi random graphs as toy models. As shown in the following figures, the fact that the same β, 'optimally' set for Erdös-Rényi random graphs, appears to work as well also for scale-free random graphs is symptom of numerical stability. In fact, this numerical stability appears to be very pronounced as the same β shows to work well even on Erdös-Rényi graphs of very different mean degree.
In Fig. 4, and 5, we respectively analyse and plot how the entropy production rate calculated when M ∈ {100, 500, 1000} or M ∈ {0.01E, 0.1E, 0.3E} links of a graph (E is the size of the set of links in the graph, introduced to show that we do not need to know the exact number of links explored) have been explored by ARWs with different exponents β ∈ {0.01, 0.05, 0.1, 0.25, 0.5, 1.0} varies with the mean degree of Erdös-Rényi and Barabasi-Albert random graphs. It appears evident that all values of β 0.25 work very well, reaching a  very high entropy production rate. We stress here that although we have not compared the entropy production rate of ARWs with the real optimal one given by the logarithm of the adjacency matrix of the M links explored, by the compression of the lines with decreasing β one may, correctly (see comparison with Fig. 6, and 7), infer that those βs are the best ones, i.e., leading to a numerical convergence of the entropy production rate given by Eq. (32) very close to the optimal one.
In order to further demonstrate that the previous two figures already allow to infer the best βs, we plot in Fig.  6, and 7, the relative error of the entropy production rate measured by (32) with respect to the optimal one given by the logarithm of the adjacency matrix of the M links explored.
From Fig. 6, and 7 two further interesting remarks are in order. Firstly, by the fact that lines are, on average, mostly flat with varying k we conclude that the relative error of the entropy production rate does not really depend on the mean degree. Therefore, we can test the best β on a single Erdös-Rényi random graph with a certain number of nodes and average degree and use that result for any other graph. Secondly, it indeed seems the case that any value of β 0.25 can be used to get an optimal entropy production rate. However, ARW algorithms with these βs are not all numerically stable the same way, as we show in Tab. I. Indeed, it turns out that for values of β that are too small, the algorithms do not finish the simulation. Eventually, we conclude that the value β = 0.1 is a good trade-off between numerical stability and convergence properties to the optimal entropy production rate.
To conclude, we show another extended numerical study analysing the behaviour of the entropy production rate, and its relative error with respect to the optimal value, as a function of β out of single trajectories, starting at random positions, that cover only a very small portion of M = 100 or M = 1000 links of big networks of 10000 nodes. We plot these results in Fig. 8.
By investigating Figs. 8 (c) and (d) it is evident, once again, that the most interesting values one should consider in a simulation are characterised by a plateau for β 0.4. Larger values lead to a smaller entropy production rate. Notice also that since the entropy is an extensive observable, the larger the number of links visited, the bigger the value of the entropy rate. Figs. 8 (a) and (b) are not affected by the size of the set of links visited and give further insights: the smaller the value of β, the smaller the relative error of the entropy production rate, but also the bigger the fluctuations that may be related to the numerical instability we previously discussed.  4,5,7,9,12,15,18,21,24,27,30 of an Erdös-Rényi random graph with 1000 nodes. (b,d,f) Similarly to (a,c,e) but for a Barabasi-Albert random graph with 1000 nodes and mean degree k = 4, 6,8,10,12,14,16,18,20,22,24,26,28,30. The shaded area around each solid line is delimited by the first and third quartiles.  4,5,7,9,12,15,18,21,24,27,30 of an Erdös-Rényi random graph with 1000 nodes. (b,d,f) Similarly to (a,c,e) but for a Barabasi-Albert random graph with 1000 nodes and mean degree k = 4, 6,8,10,12,14,16,18,20,22,24,26,28,30. The shaded area around each solid line is delimited by the first and third quartiles.

MEASURING THE ENTROPY PRODUCTION RATE
In Fig. (1)MT we compare the spreading performances of ARW, MERW, and URW by calculating the entropy production rates of each process while they explore the network. Clearly, there is some freedom in the choice of what to consider as explored up to a certain moment in the evolution of each process. The most conservative choice, which is implemented in the calculations appearing in Fig. (1)MT, is to consider the graph visited composed only by the links (and connecting nodes) that have actually been crossed by the process in consideration. This is the case we face in the reality when, for instance, a person takes choices leading them to unknown situations that, in turn, are connected with other choices that, however, they do not know until the moment they face that particular situation. In such a scenario, the stochastic matrix entering in Eq. (17)MT at a certain time n for each process will have to have non-zero entries only for the edges effectively crossed. This means that a redesign of the transition matrix-cancelling non-zero entries for unvisited links-for all the process while they run on the network needs to be done.
Another possibility is to consider the graph discovered composed by all the links that connect visited nodes. In the real world, this case happens every time one does not need to cross a link between two nodes to visit them if these have already been visited via other paths. Furthermore, a more theoretical reason pushing for this attempt comes from the implementation of the ARW as in the algorithm 1: the right eigenvector is built on the nodes, not on the links, and is updated every time a node is visited, independently on the link used. Similarly, the transition matrix of the estimated driven process is updated for all the nodes connected to the one the random walk has just visited. Similarly to the previous case, the transition matrices of ARW, MERW and URW need to be redesigned-cancelling non-zero entries for links that connect at least one unvisited node-every time a new node in the network is discovered. Therefore, in this scenario, the entropy production rate is calculated as where V (n) is the set of visited nodes up to time n, and with a little abuse of notation π ij (M ) = π ij /( i,j ∈V (n) π ij ) if (i, j) ∈ V (n), 0 otherwise, and ρ(M ) is calculated as the dominant left eigenvector of Π(M ) as in the case treated in the MT. In Fig. 9 we re-propose Fig. (1)MT for the entropy production rate calculated with Eq. (32). Noticeably, the ARW, obtained with a learning rate λ(n) = 1/((n + 1) 0.1 ) in Eq. (14)MT or Eq. (27) for s = 1, is only marginally better than MERW and URW on the Barabasi-Albert and on the air transportation network (the same analyzed in the MT), although still much better over the Erdös-Rényi random graph. This is evidence of the fact that the ARW is effectively optimized only over the links it actually crosses. For this reason, we argue that the entropy production rate needs to be calculated over the graph of visited links as done in Fig. (1)MT.

LINK-EXPLORATION TIME
As mentioned in the MT, the ARW is characterized by a long warm-up time that prolongs the time to explore other links in the network. During the warm up, the ARW is localized in very few links (1 to 3) and by continuously crossing them it finely tunes the eigenvector centrality and the transition matrix estimate of the driven process in the visited portion of the graph, so as to set off an optimal exploration of the network. Indeed, once the warm up is over, the coverage time scaling exponent of the ARW is very similar to that of the MERW (see Fig. 2MT). In Fig. 10 we compare the link exploration time of MERW, URW, and ARW over the same networks considered in Fig. 1MT and in Fig. 9. For each simulation, we save every time step in which the random walk explores a new link for the first time, and we plot them for a random instance of each process in Fig. 10(a-c), while in Fig. 10(d-f) we plot the medians, first and third quartiles, taken over 1000 trajectories. Differently from Fig. 2MT, here we can analyze how the discovery rate of the process decreases or increases while the network is discovered highlighting the temporal extension of the warm-up time as well as the number of links involved. As mentioned, this warm-up time is responsible for the longer-of about 10 2 to 10 4 steps-link coverage time of the ARW with respect to MERW and URW.

IDEAS ON HOW TO OPTIMIZE THE ADAPTIVE RANDOM WALK
Such a long warm-up time seems to be necessary for the ARW to set off optimal initial conditions (in terms of right eigenvector and driven process transition matrix) for the exploration of the entire network. However, it also seems that the warm-up time can be shortened at the expense of the optimal spreading of the process. Indeed, the initial condition for the right eigenvector and the entropy production rate play as two conjugated quantities: the choice of the initial right eigenvector r (0) 1 influences both the spreading properties and the overall coverage time of the process. We would like to stress here that, conversely, the learning rate λ(n) appearing in the algorithm in Eq. (27) plays only a little role in determining the full coverage time of the network. Rather, it is pivotal in characterising the convergence of the algorithm in Eq. (27), determining how high-maximal/optimal in the best case scenario-is the entropy production rate of the process obtained. Indeed, as we can see from Fig.  11, by setting off different processes with the most various values of β in the learning rate in Eq. (28), we show that all processes have, roughly, the same full coverage time of the network, although very different spreading (entropy production rate) properties. We have run 1000 simulations for each process, with random initial node, on the giant connected component of an Erdös-Rényi random graph of 1000 nodes and average degree 3. On the one hand, notice that in Fig. 11(a) the link exploration times of the ARW processes are almost all identical, higher than those of MERW and URW, while, in Fig. 11(b), the difference between the entropy production rate of the ARW process and the optimal one is much lower for lower values of β. Furthermore, notice that for β ≤ 0.25 the ARW is much better than the MERW, while for higher values of β the optimal convergence is not reached. In particular, in our simulations in the MT we keep β = 0.1 as a good trade-off between numerical stability-see Tab. I where we compare for various choices of β the number of times a simulation goes in overflow (# Errors)-and convergence to the maximal entropy production rate. As previously mentioned below Eq. (28),   Fig. 11 we compare the number of times the simulation has crashed (# Errors) on a total of 1000 simulations for each process considered, possibly because of an overflow. Evidently, the numerical stability of the algorithm behind ARW decreases the smaller the β.
we believe that this result can be further improved by analyzing more numerical simulations and various network topologies. All this, however, is beyond the scope of our paper.
In the MT we considered r (0) 1 to be an L 1 -normalized random vector. In Fig. 12 instead, we study the new ARW, which is essentially equivalent to the ARW so far considered, apart from r (0) 1 that, in this case, is a non-normalized random vector with components sampled from the uniform distribution over [0, 1].
Interestingly, ARW is much quicker than ARW in exploring the network-it has a smaller scaling exponent α A -see Fig. 12 (a) and (b) and it has about the same overall link coverage time of MERW. Furthermore, similarly to the MERW, it drastically decelerates in the last phase ( Fig. 12(a)). Notwithstanding this, its spreading performance are, on average, no better than MERW (Fig. 12(c)(d)) and certainly not as good as ARW.
This is only a preliminary study that shows an interesting interplay between initial conditions and overall spreading performances of the adaptive random walk. Further studies in this direction may help to find the best conditions for the ARW process to show short link exploring/coverage time and maximal entropy production rate while exploring the network.