The arrow of time across five centuries of classical music

The concept of time series irreversibility -- the degree by which the statistics of signals are not invariant under time reversal -- naturally appears in non-equilibrium physics in stationary systems which operate away from equilibrium and produce entropy. This concept has not been explored to date in the realm of musical scores as these are typically short sequences whose time reversibility estimation could suffer from strong finite size effects which preclude interpretability. Here we show that the so-called Horizontal Visibility Graph method -- which recently was shown to quantify such statistical property even in non-stationary signals -- is a method that can estimate time reversibility of short symbolic sequences, thus unlocking the possibility of exploring such properties in the context of musical compositions. Accordingly, we analyse over 8000 musical pieces ranging from the Renaissance to the early Modern period and certify that, indeed, most of them display clear signatures of time irreversibility. Since by construction stochastic processes with a linear correlation structure (such as 1/f noise) are time reversible, we conclude that musical compositions have a considerably richer structure, that goes beyond the traditional properties retrieved by the power spectrum or similar approaches. We also show that musical compositions display strong signs of nonlinear correlations, that nonlinearity is correlated to irreversibility, and that these are also related to asymmetries in the abundance of musical intervals, which we associate to the narrative underpinning a musical composition. These findings provide tools for the study of musical periods and composers, as well as criteria related to music appreciation and cognition.


I. INTRODUCTION
The quantitative description of the structure in musical compositions has a long history of interdisciplinary research, with contributions from musical theory, information theory, mathematics to physics. Traditional quantitative analysis of the temporal structure underlying musical pieces have mainly addressed linear correlations, such as the ones captured by spectral (Fourier) analysis, starting from the pioneering work of Voss and Clarke [1,2] and subsequently followed by a wealth of more in-depth analysis [3][4][5][6][7][8][9][10][11][12][13]. It is nowadays widely accepted that music presents a so-called 1/f power spectrum and that this is a fingerprint of "appealing sound" [14][15][16][17][18][19]. However, recent evidence has challenged this vision, as it has been suggested that pleasantness could be also related to nonlinearities present in music compositions, a property which by definition is not captured in the power spectra [20]. These findings motivate further exploration into quantitative ways of measuring structure in music compositions that goes beyond linear theories. Amongst others, musicians and musicologists have addressed the breakdown of continuity and temporality [44], the effect of asymmetry in melody [49] and the importance of musical irreversibility [47]. In all cases, a relationship with pleasantness has been explored.
Inspired by both statistical physics and nonlinear dynamical concepts, here we introduce statistical measures for irreversibility, nonlinearity and asymmetry and present a description of music scores by gauging their interrelations. We primarily explore to which extent classical music manifests statistical time irreversibility and further on introduce nonlinearity and asymmetry. Simply put, a stationary signal is (statistically) time reversible if the statistical properties of the signal are invariant under time reversal [24], whereas the signal is statistically irreversible in the opposite case. For instance, white noise is a stochastic process known to be statistically reversible: if one listens to temporal white noise and subsequently to the same signal after time reversal, it is not possible to distinguish not only which is which, but also whether they sound different. The notion of statistical time irreversibility, which has been associated with an "arrow of time", has deep relations in non-equilibrium physics with concepts such as dissipation and entropy production. For instance, the amount of entropy produced by a thermodynamic system out of equilibrium has been linked, in non-equilibrium steady states (NESS), to the extent in this system displays time irreversibility [21][22][23]. Interestingly, it is well known that a large family of stochastic processes -which include pink or 1/f noise as a special case-are statistically reversible [25]. The presence or absence of statistical reversibility is a priori a well-suited concept to explore to which extent the regularities and patterns present in musical compositions go beyond linear correlation structures. Standard methods that estimate irreversibility in (discrete) stationary signals usually require long time series sizes for an accurate estimation when the alphabet (number of different states) is large, simply because the amount of possible m-grams scales exponentially with the alphabet size. This is a problem expected to emerge in musical compositions, as these are seldom large, typically consisting of sequences of some hundreds of notes, and the alphabet size (e.g. the number of different notes involved in the piece) is rarely exponentially small. Here we leverage on a recently introduced approach, horizontal visibility graph irreversibility (HVG-I) [32,33], which we argue actually bridges this gap and is able to extract meaningful measures of statistical irreversibility in short sequences and applies to both stationary and non-stationary signals. We extend the method to deal with short sequences by defining a measure of HVG-irreversibility -which we can link to entropy production-and a confidence index which states when a certain irreversibility value is genuine or, on the contrary, is just a finite size effect and therefore spurious. Equipped with these tools, we can then explore time series irreversibility in music. Interestingly, we find that a large amount of compositions indeed display time irreversibility and therefore can be understood as signals generated by systems which operate out of equilibrium and producing entropy. We explore in detail the relation between irreversibility, entropy production and the presence of nonlinear temporal correlations in music, establishing that irreversibility is a key feature of musical compositions which is not related to linear information displayed by the power spectrum, and we are finally able to interpret such fingerprints in terms of musical composition.
The rest of the paper is as follows: in section II we present the database, which consists of over 8000 musical compositions from 77 different composers spanning several centuries and different musical periods, from the Renaissance to the beginning of the Modern Period. In section III we present the theory and methods used to estimate time irreversibility and entropy production in music. In section IV we present the results from these methods and complement them with additional characterisation provided for nonlinearity and interval asymmetry. The former attempts to quantify the temporal correlation structure which persists in musical compositions once linear temporal correlations are removed, whereas the latter is a strongly musical-based notion. We provide a global picture by comparing the performance and relations between irreversibility, nonlinearity and interval asymmetry, and in section V we conclude.

II. DATASET
We collected 8856 midi files of 77 different composers extracted from the Kunst der fugue midi dataset [36]. Since each piece usually incorporates different voices (different time series, each one corresponding to a different voice), a first task was to decide which of these voices should be considered as the main voice or pitch sequence of the piece (note that the full, multivariate analysis could be done as well, and we leave that approach for a future work). To choose the pitch sequence in multi-voice pieces, we used the following two-step criterion: 1) the pitch sequence must be longer than 30 notes and 2) the sequence should have the largest number of different notes from all sequences in the same piece. With this criterion we are assuming that the selected pitch sequence would have most of the relevant features in the musical piece (such as the melody or the theme with its variations). We checked that applying the criterion above was unambigious, and only one sequence could be extracted from each piece. We processed each midi file by parsing it into a comma separated value (CSV) [37] format and with the aid of a Julia script [48] we extracted the pitch sequence (see left panel of Fig.1 for an illustration). Some examples of pitch sequences extracted from individual pieces are depicted for illustration in the right panel of Figure 1.
To have a rough idea of the characteristics of these sequences, in the right panel of Figure 2 we display the sequence size histogram, whereas in the left panel of the same figure we depict the distribution of the number of different notes (i.e. the alphabet) per piece. As expected, typically the alphabet is too large (in principle there are 128 symbols, and in practice the average alphabet size is 30) and the sequence size too short (median 427) for standard time irreversibility methods -such as comparing the frequency of n-grams in the forward and backward sequence-to be  applicable, hence motivating the use of graph-theoretic approaches such as the one described in the next section.

A. Reversibility and entropy production in stationary systems
A stationary process is said to be time reversible if the joint probability distribution of the forward and backward process are statistically equivalent. More concretely, let S = (x 1 , x 2 , . . . , x N ) be a time series of N data, and denote S * = (x N , x N −1 , . . . , x 1 ) the backward time series. The forward and backwards joint distributions are denoted respectively P F (N ) := P (x 1 , x 2 , . . . , x N ), P B (N ) := P (x N , x N −1 , . . . , x 1 ). We say that the time series S is statistically time reversible if and only if P F (m) d =P B (m), ∀m = 1, . . . , N , where d = should be interpreted here as having two distributions which cannot be distinguished. Accordingly, statistical time reversibility is usually known as the property of a time series whose statistics remain the same when the series is flipped. Note that in practice P F (m 1) are hard to estimate, and in the event that only a single realisation S is available, P F (N ) cannot be estimated at all. In those cases it is customary to estimate P F (m) and P B (m) for 2 ≤ m N , since a sufficient condition for rejecting time reversibility is to reject indistinguishability for small m. Gaussian linear processes such as white noise or colored noise, or conservative chaotic processes such as Hamiltonian chaos are statistically time reversible, and related to processes in thermodynamic equilibrium in statistical physics. Nonlinear stochastic processes, or dissipative chaotic processes on the other hand are generally found to be irreversible [25], and are associated to processes that operate away from equilibrium in a thermodynamic sense.
There are various possible approaches to quantify the degree of irreversibility, starting from the obvious choice of comparing the m-gram statistics in S and S * , to more exotic approaches [25,26,32]. A notable result dictates that when the signal x(t) = x t is generated by an underlying thermodynamic system, then the amount of time irreversibility of an (infinitely long, i.e. N → ∞) trajectory S is related to the amount of entropy that the underlying thermodynamic system is producing [22]. In particular, in the event that all active (i.e. out of equilibrium) degrees of freedom are characterised in the phase-space variable x, then the steady-state rate of entropy production σ tot is related to the time irreversibility of S via where k B is the Boltzmann constant and KLD(·||·) is the Kullback-Leibler divergence. Note that for any distributions Q and R, KLD(Q||R) defined by: quantifies their distinguishability since KLD(Q||R) = 0 if and only if Q and R are identical, and is positive otherwise. Note also that when the observable x does not fully incorporate all active degrees of freedom, the right hand side in Eq.1 is only a lower bound of the true entropy production rate. Moreover, in practice one usually cannot estimate the full hierarchy of m-grams, so any partial result (m < ∞) again provides a lower bound to σ tot . Also, when x is defined on a continuous support, it is customary to symbolize it. In the case when x is intrinsically discrete . . , v |V | } then a proper estimate of the rhs in Eq.1 can only be attained when the sequence size N |V | (exponentially larger): this is needed for collecting sufficient statistics on each m-gram. As we will show below, this latter observation is crucial in our context, as the musical compositions we consider here are relatively short symbolic sequences (N ranges from a few hundreds to a few thousand points), while the number of symbols (notes) present in each musical piece is typically of the same order of magnitude as N , thus making the direct empirical estimation of eq.1 ineffective. In what follows we introduce an alternative, graph-theoretic method [32] which we show circumvents these issues.

B. HVG-irreversibility
A time series of N points can be transformed into a so-called horizontal visibility graph (HVG) of N nodes via the so-called horizontal visibility algorithm [27,28]. This is a non-parametric method that enables the characterisation of time series and their underlying dynamics using combinatorics and graph theory.
Definition III.1 Let S = {x 1 , . . . , x N }, x i ∈ R be a real-valued scalar sequence of N data. Its horizontal visibility graph HVG(S ) is defined as an undirected graph of N vertices, where each vertex i ∈ {1, 2, . . . , N } is labelled in correspondence with the ordered datum x i . Hence x 1 is related to vertex i = 1, x 2 to vertex i = 2, and so on. Then, two vertices i, j (assume i < j without loss of generality) share an edge if and only if x k < inf(x i , x j ), ∀k : i < k < j.
HVG implements an ordering criterion which can be visualized in Figure 3 (see [27] for a convexity criterion that generates 'natural' visibility graphs instead). Visibility and Horizontal Visibility graphs were introduced in the context of time series analysis with the aims of using the tools of Graph Theory and Network Science [29] to describe the structure of time series and their underlying dynamics from a combinatorial perspective (for other proposals for graph-theoretical time series analysis, see [30,31]).
Among others, the concept of time series irreversibility has been recently explored within the context of visibility graphs [32][33][34]. In a nutshell, if the HVGs of S and S * have the same properties, then S is said to be HVGreversible, and the concept has been shown to be applicable both in stationary and non-stationary processes [33]. How is HVG-reversibility checked in practice? Since each node i = 1, . . . , N in the HVG is associated to a datum x i , i = 1, . . . , N in the graph, there is a natural node ordering associated to the arrow of time. Such ordering is therefore inherited by the degree sequence, which has a natural representation k = (k 1 , . . . , k N ), where k i is the degree of node i (i.e., the number of links adjacent to node i). Now, while the HVG is initially an undirected graph, it can be converted into a directed one by assigning a direction to each link in the HVG such that if i < j, then the link is i → j. Assigning a direction to each of the links splits the degree sequence k = k in + k out , where k in = (k in 1 , k in 2 , . . . , k in N ) is the in-degree sequence and k in i counts the number of links which are incident to node i, and respectively k out = (k out 1 , k out 2 , . . . , k out N ) where k out i counts the number of links that emanate from node i. Importantly, by construction one then has that the in-degree sequence of the HVG extracted from a given sequence Figure 3: Sample time series of N = 6 data and its associated Horizontal Visibility Graph (HVG) (see [32] for details). By assigning a temporal arrow to each of the links, the degree sequence splits into an in degree and an out degree sequence, such that flipping the time series (time reversal operation) is equivalent to interchanging the in and out degree sequences. Assessing time reversibility in the time series reduces in this context to spot differences in the statistics of the in and out degree sequences, such as comparing the in and out degree distributions.
is equal to the out-degree sequence of the HVG extracted from the time-reversed sequence, and therefore in order to assess time reversibility in S , one can simply explore the statistical differences between the in-degree sequence and the out-degree sequence in the HVG (see [32,33] for details and Fig. 3 for an illustration).
In [32] some of us proposed that the right hand side Eq.1 could actually be approximated by comparing the in and out degree distributions of HVG(S ), as these are the (m = 1)-point marginal distribution of the in and out degree sequences. Note that only needing to look at m = 1 statistics is a substantial difference with respect to the benchmark method based on comparing m-gram statistics of the time series, as in this latter case by construction the statistics of 1-grams are invariant under time reversal, and irreversibility can only be checked for m = 2 or higher. This resulting reduction will enhance our ability to effectively use the method in short sequences, as we will show below. Whereas originally the HVG method only looked at strings of size m = 1 (1-grams) in the degree sequence, one could of course further generalise this measure if it were needed to account for blocks of arbitrary size in the degree sequence, following the spirit of Eq.1. For a block size m, let us consider strings of size m within the in and out degree sequences such that with a little abuse of notation, k m in = (k in 1 , k in 2 , . . . , k in m ) and similarly for k m out . Additionally, let P in m (k) the marginal distribution of the in-degree sequence blocks k m in (respectively for P out m (k)). Then, we now define the HVG-irreversibility of order m as the Kullback-Leibler divergence between the in and out degree sequence blocks marginals This quantity is null if and only if the size-m blocks are equally distributed in the (infinitely long) degree sequence, and is positive otherwise.
To summarise, the procedure to evaluate HVG-reversibility in a time series S is as follows: • From S we construct the directed HVG, and subsequently extract the in and out degree sequences k in and k out . Note that for S * , k in and k out are interchanged. This procedure is applicable when S comes from both stationary and non-stationary processes alike.
• To quantify for HVG-reversibility (and HVG-entropy production), we make use of Eq. 2. Note that m = 1 is the smallest nontrivial case here, as 1-grams of the degree sequence already incorporate temporal directionality (when using m-grams of S vs S * , m = 2 is the simplest nontrivial case instead).

C. On the number of symbols
Once we have outlined the procedure to estimate the HVG-irreversibility, we now consider the problem of working with short experimental sequences. First, it is important to note that the in and out degrees typically take values Figure 4: (Left panel) Log-log plot of the minimum number of effective symbols needed to determine time irreversibility of a given musical composition, as a function of the size N of the composition, using our HVG-based method (red plus) and using a standard 2-gram comparison in the note sequence (black crosses). We can see that the number of symbols in the method based on the note sequence is systematically in the same order of magnitude of total size of the sequence, hence finite size effects are too strong and make the method unapplicable. The method based on the HVG requires a substantially smaller number of symbols, orders of magnitude smaller than the sequence size, hence making this method useful in this dataset. (Right panel) Histogram of the reduction factor ρ (see the text) which compares the effective number of symbols needed in the HVG irreversibility method with respect to a standard method based on n-gram statistics. When we average over all the musical pieces, the average number of symbols is reduced 17 times, and this reduction can reach up to 75 times in some cases. This strongly reduces the finite size effects and hence makes the HVG method useful to explore time irreversibility in short sequences.
from a small alphabet -systematically smaller than the original musical note alphabet as we will show below-, whose size increase at most logarithmically with N . This is because the probability that an arbitrary node in a HVG has a certain in and out degree k typically decays exponentially fast with k [35]. Moreoever, it is important to recall that temporal irreversibility can already be assessed for m = 1, that is, by only looking at the marginal distribution of the in and out degree sequences. Conversely, if we were to estimate time irreversibility directly on the note sequences, then we would at least need to consider strings of size m = 2 consecutive notes. The effective number of symbols needed is therefore much larger than in our HVG setting. In the left panel of figure 4 we plot the effective number of symbols required to assess time irreversibility using HVG (red plus sign) and using a simple 2-gram comparison of the forward and backward note sequence (black crosses). We see that in most of the cases, the number of note 2-grams is of the same order of magnitude than the size of the note sequence, hence making the standard approach useless. On the other hand, the approach based on HVG keeps the number of symbols needed to a bare minimum and is therefore useful in this context. To give an additional quantitative idea of the symbol reduction given by the HVG method, let us define a reduction factor ρ associated to a given musical piece as ρ = total number of strings of size m = 2 empirically found in the MIDI note sequence total number of different (in our out) degrees empirically found in the in and out degree sequence .
In figure 4 we plot the histogram P (ρ) estimated over our whole dataset. The average reduction is about 17, which means that our irreversibility method yields on average a 1700% reduction over the standard method based on counting the statistics directly on the note sequence. This implies that the finite size effects (subsampling) -which will appear due to the fact that the musical composition are not exponentially larger than the number of symbolswill be contained in the case of the HVG method, thus enabling its use in applications where time series are short, such as in musical compositions.
Having said that, while small, finite size effects are still expected to emerge in finite time series, and because of that, the Kullback-Leibler divergence will always be positive (vanishing for reversible processes only asymptotically), which poses some interpretability problems. In what follows we introduce a confidence index whose aim is to solve this interpretability issue.

D. Irreversibility ratio IRm: a confidence index
Let us define the m-order Irreversibility Ratio IR m by standardizing the net irreversibility measure KLD m (in||out) with respect to a null model where the original sequences are shuffled: where σ is the standard deviation. The process of standardizing is applied in order to be able to compare results across samples (musical pieces) with different sizes and marginal distributions, something which is recurrent in the musical compositions we analyse. More concretely, in the ideal situation of estimating irreversibility for infinitely long sequences, KLD m (in||out) vanishes if and only if P in m (k) = P out m (k), and therefore values different from zero would indicate that the sequence is statistically HVG-irreversible. However, in practice this is not so clear cut: the quantity KLD m (in||out) for a reversible process is only asymptotically null (vanishing as the sample size goes to infinity), and will be finite (yet small) for finite samples due to statistical deviations in the estimation of joint probability functions P (k in ) and P (k out ). By standardizing this quantity with respect to a suitable null model, one can quantify the effective deviation of a given finite sample from the expected value if that finite sample was generated by a truly reversible process. The null model is built by taking 200 randomizations of the sample sequence, and computing KLD m (in||out) on each randomized sample. Then Eq.3 measures the effective distance of a sample to its null model, in standard deviation (aka 'sigma') units. For instance, if IR m ≤ 1 this means that the irreversibility value of a sample finite sequence is not statistically distinguishable from a finite sample of the same size and marginal distribution extracted from a truly reversible process. That does not necessarily mean that the process is HVG-reversible, it only means that there is no statistical significance to assert otherwise. Similarly, if e.g. IR m = 2, this means that the HVG-irreversibility value of the sample is two standard deviations larger than the one expected for its (reversible) null model. Figure 5: Irreversibility ratios IR1 for three theoretically HVG-reversible processes (uniform white noise, unbiased random walk, pink noise) and one theoretically HVG-irreversible process (fully chaotic logistic map xn+1 = 4xn(1 − xn)). In every case we generate a time series of N = 10 4 data points and symbolize with a vocabulary of |V | = 100 symbols. Red crosses correspond to the empirical irreversibility value of order 1 of the time series KLD1(in||out), black dots correspond to the ensemble average irreversibility value KLD1(in||out) null model of 10 3 null models constructed by randomizing the empirical time series (see the text), and the brackets correspond to ± one standard deviation (note that the Y axis is in logarithmic scales). The irreversibility ratio of order 1 (see the text) decides whether the empirical time series is reversible (IR1 ≤ 1) or irreversible (IR1 > 1), and correctly predicts the true nature of the process. Interestingly, the empirical irreversibility value KLD1(in||out) of pink noise is notably larger than for white noise or random walk, which could mislead to the suggestion that pink noise is irreversible. The irreversibility ratio asserts that this is not the case, suggesting that the difference in the raw values is due to differences in the specific marginal distribution of the underlying processes, which only yield spurious effects on the determination of temporal irreversibility.
For illustration and validation, in Figure 5 we plot the raw HVG-irreversibility value KLD 1 (in||out) (red crosses) and IR 1 for time series extracted from four synthetic dynamical processes: (i) white noise, (ii) random walk, (iii) pink noise and (iv) a fully chaotic process. Process (i) is an (uncorrelated) uniform white noise. This is a stationary stochastic process with delta-like autocorrelation function, it is statistically reversible and HVG-reversible. The method correctly identifies this character as IR 1 < 1 in this case. Process (ii) is an unbiased discrete random walk x(t + 1) = x(t) + η, where η ∼ uniform(−1/2, 1/2). Interestingly, while this is a non-stationary process and thus could be seen as irreversible, it can be shown to be so-called HVG-stationary [33] and explored adequately within the HVG framework, as HVG-reversible with IR 1 < 1. This is indeed convenient as Brownian particles do not produce entropy on average, so in the context of HVG-reversibility, we recover the relation between reversibility and entropy production in this non-stationary process. Process (iii) is a linearly correlated noise with a 1/f spectrum. This is again, by definition, a time reversible process, and is a paradigmatic (stationary) stochastic process to describe music from the pioneering works of Voss and Clarke [1,2]. Also in this case we correctly detect the reversible character as IR 1 < 1. Finally, process (iv) is a deterministic chaotic process generated by a fully chaotic logistic map x t+1 = 4x t (1 − x t ). This process is dissipative and statistically time irreversible (and HVG-irreversible), as certified by IR 1 > 1.
Another important aspect is to understand how IR m is affected by the time series length N . Intuitively, if the underlying process is HVG-reversible, then KLD 1 (in||out) should be similar to its null model (within its uncertainty) and both measures should be decaying with the same trend as N increases, and therefore one should expect IR m < 1, independently of N . If, on the other hand, the underlying process is HVG-irreversible, then KLD 1 (in||out) should be systematically larger than zero and remain positive as we increase N . Since the null model of this process is HVG-reversible, its irreversibility value should decrease as N is increased, we thus expect IR m effectively to increase with N without bounds. This only means that for an HVG-irreversible process, whereas for short time series the distinguishability is small (IR m close to 1), as we increase the series size it is systematically easier to ascertain that the series (and process) is HVG-irreversible. In some sense, IR m is therefore a measure of confidence. We illustrate this dependence on series size in Fig 6, where in panels (a,b) we consider a truly reversible process (white noise x t = η, η ∼ Uniform(0, 1)) and in panels (c,d) we consider a truly irreversible process (a fully chaotic logistic map x t+1 = 4x t (1 − x t )). Results indeed confirm our discussion above.
Once we have illustrated how the irreversibility ratio works, let us introduce a classification of net HVG-irreversibility for a given musical piece as it follows: Definition III.2 A musical piece is defined as HVG-reversible (or simply reversible) if IR 1 ≤ 1. If 1 < IR 1 ≤ 4 we say that the process is HVG-irreversible with weak confidence. If 4 < IR 1 ≤ 10, we say that it is HVG-irreversible with strong confidence, and if IR 1 > 10 we say that the musical piece is HVG-irreversible with extreme confidence.
A similar classification can be defined for higher orders m > 1, however this is not needed for this work since IR 1 shows strong correlation with IR m>1 for most of the musical pieces considered, hence it will be enough to concentrate our analysis on m = 1. Incidentally, note that since blocks are extracted from the HVG's degree sequence (not directly from the sequence of notes), in principle the HVG-irreversibility measure at m = 1 already gathers temporal information of different time scales.
Summing up, we have shown that the concept of HVG-reversibility is better suited than the standard concept of statistical time reversibility to investigate the arrow of time in stationary and nonstationary processes [33], and HVG is a useful approach when time series are short. We can safely conclude that IR m is a measure that quantifies our certainty that the time series under study was generated by an HVG-irreversible process. To compute such confidence, the size of the series (and the resulting finite-size effects) and the specific shape of the marginal distribution of the signal must be taken into account in order to provide a quantifier which is not affected by these variables. On the other hand, and once the analysis based on IR m leads us to conclude that the process is indeed HVG-irreversible, we will then use KLD m (in||out) as a bound for the true (thermodynamic) entropy production rate of the process. In the next sections we will compute HVG-reversibility metrics, but in order to lower down verbosity we will refer to it indistinctively as either HVG-reversibility or just reversibility.

A. Irreversibility
To start, we have explored the confidence of time series irreversibility -as quantified by IR m -for all the pieces considered in this work. In the left panel of figure 7 we depict in semi-log scales the normalised histograms P (IR m ) Figure 6: (a) Raw order-1 irreversibility value KLD1(in||out) of a time series of size N generated by a reversible white noise process xt = η, η ∼ Uniform(0,1) (red crosses). For comparison, the results of a null model (mean ± one standard deviation) where we randomize the original time series 200 times and compute the irreversibility value is shown in black solid circles. The raw irreversibility value decreases without bounds in a fashion similar to the null model, certifying that positive irreversibility values are here only due to finite size effects which vanish as N increases. (b) Order-1 irreversibility ratio IR1 for the process depicted in panel (a), certifying that the time series is not distinguishable from a reversible process for any time series size. (c) Similar to (a), for a time series of size N generated by an irreversible, fully chaotic logistic map xt+1 = 4xt(1 − xt) (red crosses). The raw irreversibility value is always positive and stabilizes for N > 10 3 . The null model is by definition reversible, and its irreversibility value is only positive due to finite size effects, hence decreases as N increases. Distinguishability therefore increases with N . (d) Similar to (b), for the chaotic case, certifying that the process is irreversible and that the level of confidence increases without bounds. for m = 1, 2, 3. The orange area denotes the region IR m > 1, showing that a large percentage of pieces are irreversible (HVG-irreversible) at all m orders, with varying degree of confidence. A second observation is that the histograms for m = 2 and m = 3 are indeed very similar. In order to further understand to which extent the three measures IR 1 , IR 2 and IR 3 are correlated, we have computed the Pearson correlation coefficient between IR m and IR m−1 (see appendix figure 14), showing that indeed we find a strong correlation between all of them. The interpretation of this finding is two-fold: we can first argue that, in the context of classical music and the database analysed in this work, higher-order irreversibility is irrelevant, and all the structure can be efficiently captured by order-1 HVG-reversibility. Second, this also means that from now on we can safely focus our analysis on m = 1, which is notably faster to estimate. The abundance of each irreversibility confidence class is depicted in the middle panel of figure 7, showing that all four classes have a notable representation, where only about 30% of the whole dataset complies with a reversible structure. In the right panel of the same figure we have plotted in semi-log the normalized histogram of KLD 1 (in||out), for all pieces (black curve) and only for those pieces which have previously been certified to be irreversible (IR 1 > 1). In both cases, values mostly concentrate in the interval [0, 1], and most of the pieces with high irreversibility value (e.g. KLD 1 (in||out) > 0.1) correspond to those pieces previously checked to be indeed irreversible according to the irreversibility ratio criterion. Distributions decay rapidly in both cases, highlighting that this measure is highly concentrated towards the left end of the spectrum.  We can now concentrate on the subset of pieces which are certified to be irreversible (IR 1 > 1), and we can rank both composers and pieces according to their net irreversibility value. For composers, this ranking is shown in table I. Since irreversibility is linked with entropy production, we could provocatively say that this is a ranking of the composers which, on average, have a 'more out of equilibrium' compositional process, i.e. the composers whose compositional style dissipates more energy and on average produce more entropy accordingly. A similar ranking for the most irreversible pieces is depicted in  In order to explore the evolution of irreversibility over different periods, in Fig. 8 we plot the values of IR 1 (top) and KLD 1 (in||out) (bottom) for all pieces as a function of the date of birth of the composer of the piece. Interestingly, the irreversibility ratios seem to fluctuate in a non-random way. To highlight such modulation, in panel (b) of figure

B. A nonlinearity index ξ
To complement the irreversibility analysis of musical compositions, in a second step we consider the temporal arrangement of note sequences within each piece. While it has been extensively certified that music evidences longrange temporal correlations with (typically) a heavy tailed power spectrum [1,3,7,8], less is known about nonlinear correlations. However, recent studies have reported evidence of nonlinear correlations in musical pieces and discussed on their possible relevance in their structure [20].
In order to assess the amount of nonlinear temporal correlations, we define a nonlinearity index ξ inspired in the index previously introduced in [39] by computing the significance (and the amount of nonlinearity) in the Magnitude Detrended Fluctuation Analysis (MDFA) of each musical piece and its Fourier-fixed surrogates (null model with linear correlations). Basically for the calculation of ξ we compare both MDFA computations (original and surrogates) in terms of the local slopes of a fitting polynomial (ŷ) for the function F (s)/s (see Appendix B for details): where N ws is the total number of windows of size s,ŷ (x i ) is the first derivative of the polynomial evaluated at the ith window of size (x i = log(s i )) and ŷ s (x i ) sur and σ(ŷ s (x i )) sur represent the mean and variance of the slopes in the ensemble of surrogates at the ith window size respectively (see Appendix B for more details). Surrogates were generated with the Iterative Amplitude Adjusted Fourier Transform (IAAFT) algorithm [38,42], preserving the marginal distribution and the power spectrum of the original piece. By construction, ξ ≤ 1 would indicate that the signal only evidences (at most) linear correlations, whereas if ξ > 1 then the signal has correlations of nonlinear nature (not reflected in the power spectrum), and the larger ξ the stronger they are [42]. In Figure 9 we measure the nonlinearity index ξ for all the pieces considered in the database. The left panel displays its frequency histogram, certifying that indeed a large majority of pieces display a high nonlinearity index. In the right panel of the same figure we plot ξ as a function of the piece composer's date of birth. Notably, we find that a substantial amount of all musical compositions considered display different degrees of nonlinearity, and the similarity of this panel with panels (a) and (d) in Figure 8 is suggestive. Note that though evidence for different profiles of nonlinear correlations in music scores has been reported previously [20], a specific index such as ξ, able to quantify the amount of nonlinearity in a signal, was lacking.

C. Irreversibility vs nonlinearity
In order to understand and link the concept of time irreversibility to nonlinearity, we have investigated to which extent the irreversible character holds when the pieces keep their linear correlations structure but are randomized otherwise. For each piece we have therefore constructed a surrogate piece where the linear correlation structure (power spectrum) is maintained by applying the same technique to the one described previously (IAAFT surrogates), and we then compared IR 1 in both cases. In the left panel of figure 10 we compare the histograms of IR 1 for all pieces and for all surrogate pieces. Interestingly, in the case of the surrogates a large percentage of the pieces now display HVG-reversibility (see the middle and right panel of the same figure for a graphical demonstration of the irreversibility loss induced by surrogating the signals). This result can be explained as follows: Gaussian linear processes with a prescribed power spectrum are indeed reversible. We can understand surrogate pieces as stochastic processes with a prescribed power spectrum (the same as the original piece) but no additional temporal correlation kernels beyond the linear one, and therefore by construction surrogates should be time reversible. All in all, these results point to the fact that time series irreversibility has a connection to nonlinearity. Such connection is reinforced by the similarity between the right panel of Figure 9 (nonlinearity index ξ) and panels (a) and (d) of Figure 8 where we display the values of IR 1 and KLD 1 (in||out) for all pieces as a function of the date of birth of the composer of the piece. Since the method we use to estimate the amount and significance of nonlinearity has been proved to depend on the length of the series [43] (see appendix B), for the estimation of the statistical dependence between nonlinearity and any other property we use the mean value for the local slopes in the MDFA function, which is also related with the amount of nonlinearity and is less dependent of the length of the series [39] (see Appendix B): whereŷ(x) is the polynomial fitted for log(F (s)/s) used for the calculation of the index ξ (quantities are determined before the significance test in equation 4). In order to quantify the apparent correlation between nonlinearity and irreversibility, we have computed the mutual information between the amount of nonlinearity proxy ŷ (x) and KLD 1 (in||out) for all pieces and also for the subset of irreversible pieces, using the discrete mutual information between two random variables X, Y we define a mutual information confidence index, in the same fashion as nonlinearity and irreversibility indexes, by substracting the average MI null and dividing by the standard deviation σ(MI) null of a null model: where the null model is generated random shuffling the elements of one of the variables. The MI indexes are computed by sampling 1000 realizations of the null model. We chose this statistical dependence measure because it captures all kind of correlations (linear and nonlinear) that would be relevant for this case, in contrast with other measures that assume linear dependencies (e.g. Pearson or Spearman coefficients).
Results of MI index are plotted in figure 11. We conclude that HVG-irreversibility is indeed intimately related in musical compositions with the presence of nonlinear correlations in the signal. Noting that HVG-reversibility is a proxy for entropy production, this relation manifests a link between a physical concept (dissipation and entropy production) and a statistical one (nonlinear temporal correlations), hence giving a physical interpretation to the latter.

D. Interval asymmetry
With the aim of linking the patterns observed in terms of irreversibility and nonlinearity with a quantity of musical significance, we finally consider the statistics of intervals. An interval is defined as the distance between two consecutive notes in a musical piece. For a given sequence of N notes (n t ) N t=1 , one can define its respective sequence of intervals as (i t ) N −1 t=1 , with i t = n t+1 − n t , whose properties have been previously analysed [50][51][52][53][54]. One particular known result is that small intervals are predominantly descending while large ones are typically ascending [49]. This property introduces an asymmetry in the distribution of intervals which, intuitively, would contribute to the heterogeneity of the joint distributions of consecutive notes in melodic sequences. In order to investigate the relation between interval asymmetry, irreversibility and nonlinearity, we first look at the interval distribution of the set of pieces we studied. We only consider intervals shorter or equal to an octave (12 semitones) since larger intervals are less frequent. Figure 12 displays the interval distributions observed in the complete set of original pieces (left panel) and for the ensemble of surrogates generated previously in the nonlinearity test (right panel). The most evident difference between both distributions is the frequency of the zero interval (when the note keeps the same value), which is lower for the original pieces. The claim of the difference between ascending and descending intervals holds in our data, the inverals −4, −3, −2, −1 are more frequent than 4, 3, 2, 1 respectively, whereas the interval 5 is more frequent than −5. However, it is not clear that for larger intervals the claim holds. Since the interval distributions are computed for the whole corpus (over 8000 pieces) and not for individual pieces we cannot relate directly this distribution asymmetry with irreversibility. To explore the interval statistics on individual pieces and the possible relation of the asymmetry with nonlinearity and irreversibility we measure the difference between positive (ascending) and negative (descending) intervals (D ↑↓ ) for a given piece: where I ↓ is the number of negative intervals (when i t < 0) and I ↑ the number of positive ones (i t > 0). If the interval distribution of a piece is symmetric (same number of ascending and descending intervals) then the difference D ↑↓ = 0 and if there is only one direction in the melody (ascending or descending) the difference would be D ↑↓ = 1. Results for the interval difference (D ↑↓ ) are shown in fig 13, where D ↑↓ is determined for all the original pieces and for their ensemble of surrogates. First panel (left) shows the distributions for the values of interval difference (D ↑↓ ), the second panel (right) is a plot of the average value of D ↑↓ for each composer. We systematically find a strong interval assymmetry in a large part of the musical pieces. According to the Mutual Information index (figure 11) D ↑↓ indeed strongly correlates with the HVG-irreversibility metric KLD 1 (in||out), and to a lesser extent with the nonlinearity metric ŷ (x) , thus concluding that the irreversibility and nonlinearity traits observed in musical compositions can indeed be narrowed down to musical concepts.

V. DISCUSSION
In this work we have made use of tools from statistical physics, nonlinear dynamics and graph theory to characterise music scores beyond the linear correlation paradigm provided by the power spectrum, in terms of time irreversibility, nonlinear temporal correlations and asymmetric distribution of note intervals. All of these properties intervene in what could be called a "musical narrative", the flow of a composition. We have established different levels of correlations amongst these quantifiers, some of which invite to reflection. For instance, the finding that in musical compositions nonlinear traits show a considerable correlation with irreversibility, which is a natural hallmark of time directionality, unravels the unforeseen notion that nonlinearity is related to a preferential time arrow. By means of our interval asymmetry exploration we have further found evidence which suggests that -at least some of-the amount of irreversibility present in musical compositions can be explained in the light of such interval asymmetry. Furthermore, its ensuing connection to nonlinearity gives new insight. It is enticing to associate the idea of irreversibility to structure in the scores at multiple time scales, intercalated with bursts of elements of surprise, it may well be that the degree of irreversibility is linked to the balance between these two factors.
Interestingly, the concepts of irreversibility, directionality and their musical implications have been qualitatively evoked in music theory under multiple forms. For instance, temporal directionality has been established in terms of irreversible relations of before and after [44]. According to Hastey [44], for some theorists "directionality arises in the compositions only through our ability to predict the future course of events. In tonal music a leading tone or passing dissonance implies an expected resolution. Such expectations whether realized or not, constitute in our imagination goals toward which the music is directed". On the other hand, irreversibility was one of the main concerns of musicians such as Anton Webern, one of the exponents of atonality and serialism. Similarly, alegoric relations to thermodynamics and information theory have been previously explored by musicians e.g. Iannis Xenakis, father of stochastic music [45]. Our approach provides a novel quantitative description of these concepts. In other works, the complexity of music has often been related to linear temporal correlations, using 1/f noise as a paradigm offering a balance between predictability and surprise and justifying its enjoyment. However, in our study we show that preserving the exact same power spectrum is not enough to preserve properties such as irreversibility (Fig.10), interval asymmetry (Fig.13) or nonlinearity (by construction). The fact that these properties are pervasive across five centuries of classical music, together with the observation that such emergence is nontrivial (for instance Gaussian linear stochastic processes are indeed statistically time reversible), challenge the traditional link between pleasantness and linear correlations in music. Furthermore, these three properties show to be statistically related, a result that points to a deep relation between directionality, dissipation, and nonlinearity and their possible relations with the pleasantness in music, a dimension that might be instrumental for the study of perception, music appreciation and cognition.
Summing up, while the statistical properties underlying musical compositions have mainly focused on the presence of linear correlations in the signal (again, the 1/f noise paradigm), here we show that classical music compositions over five centuries, encompassing pieces from 77 composers from the Renaissance up to the early modern period, display strong nonlinear correlations. We have certified that such nonlinearities are indeed strongly related to an adequate definition of statistical time irreversibility (HVG-irreversibility), able to quantify the statistical arrow of time in stationary and nonstationary signals alike and well-defined to handle short sequences. Since HVG-reversibility -as quantified by the Kullback-Leibler divergence between the in and out order-m degree distributions of the signal's horizontal visibility graphs-is a proxy for the amount of thermodynamic entropy produced by a physical signal, exploring time irreversibility in musical compositions allows us to quantify the process of composition in out-of-thermodynamic-equilibrium' terms. We indeed find that over two thirds of the compositions display this signature. Our study of the value of KLD 1 (in||out), which is independent of the piece size, shows only small variations over musical periods and thus leads us to conclude that this is a common trait of tonal music.
This work should be taken as a first step of a more in depth, inclusive research program. The three main elements of music are rhythm, melody and harmony [46]. While here we have only addressed melody, we expect the integration of rhythm and harmony in our line of research to be enlightening. The update of our study in order to include the advent of atonal-dodecaphonic, serial, stochastic, concrete and spectral music, amongst others, is another fascinating open challenge. x(i), i = 1, ..., N , the standard DFA method consists in the following steps: 1) the original signal is integrated , where x denotes its average value, 2) the integrated time series is then divided into nonoverlapping windows of size s. 3) Each data segment of length s-size is then fitted using a polynomial y m (j) of degree m. 4) Next, the root-mean-square fluctuation from the polynomial, F (s), is calculated: The procedure is repeated by varying s such that the fluctuation function is obtained in terms of the segment length, which represents the time scale where correlations might be present. When auto-correlations scale like a power law, the rms fluctuation function F (s) behaves as F (s) ∼ s α , where α is the Hurst exponent. A value of α > 0.5 indicates the presence of persistent correlations, e.g. α = 1 is the case for 1/f noise. On the other hand, a value of 0 < α < 0.5 corresponds to anti-correlations and α = 0.5 to white noise The magnitude Detrended Fluctuation Analysis (MDFA) introduced by Ashkenazy et al [39] is a method capable to detect the presence of nonlinear correlations in a time series. This method can be summarized by the following recipe: 1) for a given time series x(i) the increment series is defined as ∆x(i) ≡ x(i + 1) − x(i), 2) the increment series is decomposed into a magnitude series and sign series: ∆x(i) = sgn(∆x(i)) | ∆x(i) |, their respective means are subtracted to avoid artificial trends, 3) because of the limitations of the DFA method for estimating α < 0.5 (anti-correlated series), the magnitude and sign series are integrated first to make sure they are positively correlated. 4) The DFA method is implemented on the integrated magnitude and sign series. 5) In order to obtain the respective scaling exponents, the function F (s)/s is estimated, the 1/s factor is to compensate the integration made before. If the data obey a scaling law, the fluctuation function should behave as F (s)/s ∼ s α−1 . It has been shown that the magnitude series (| ∆x(i) |) is the one that carries information regarding nonlinear correlations in the original time series [40].
To evaluate the evidence and amount of nonlinear correlations it is necessary to compare the MDFA results with the appropriate surrogate data results, these surrogates preserve the linear correlations of the original time series but lack of any possible nonlinear correlations. We generate 20 surrogates for each piece with the Iterative Amplitude Adjusted Fourier Transform (IAAFT) algorithm [38,42]. In panels a and b from figure 15 the results for both original piece (red crosses) and surrogates (shaded area) are shown, the diference of the original data from the surrogates in panel b is evidence for the presence of nonlinear correlations. However, the functions F (s) and F (s)/s do not necessary follow a power law behavior. To be able to quantify the amount of nonlinearity in the time series we define a nonlinearity index ξ, given by comparing the scaling behavior of the original data and its surrogates. The proper comparison should be given by the slope of the original data and the slopes of the null model [39]. In our case, instead of having a single scaling we have different regions with different scalings. In order to compare both scaling behaviors we first fit a polynomialŷ to the original MDFA data and to each of its surrogates (ŷ s ), evaluate the first derivative ofŷ (slopes) at each point (log(s)) in the MDFA and compute the index ξ as follows: where N ws is the total number of window sizes,ŷ (x i ) is the slope the polynomial evaluated at the ith window size (x i = log(s i )) and ŷ s (x i ) sur and σ(ŷ s (x i )) sur represent the mean and variance of the slopes in the ensemble of surrogates at the ith window size respectively. By construction, ξ ≤ 1 would indicate that the signal only evidences Figure 16: Effect of piece length on the nonlinearity index ξ, mean local slope ŷ (x) and asymmetry D ↑↓ .
(at most) linear correlations, whereas if ξ > 1 the signal has correlations of nonlinear nature (not reflected in the power spectrum), and the larger ξ the stronger. Figure 17: Effect of piece length on IR1 and KLD1(in||out) As shown by R 2 , there are subtle correlations of the irreversibility ratio and irreversibility value with piece length (panels left and middle), and this correlation is removed (right panel) if we only consider the irreversibility value KLD1(in||out) the pieces which have been previously certified to be irreversible (IR1 > 1).
Appendix C: Dependencies of IR1 and KLD1(in||out) with series size According to the theoretical analysis conducted in Fig.6, let us assume that a time series of size N is generated by a certain dynamical process. Then we have that • if the process is reversible, then IR 1 is systematically below 1 and is insensitive to series size N and on the other hand KLD 1 (in||out) decreases with series size.
• if the process is irreversible, then KLD 1 (in||out) is reasonably stable and insensitive to series size N , whereas IR 1 is expected to grow with series size.
Let us consider now the musical pieces. Each of them is composed by different composers (so a priori by a possibly different 'dynamical process'), and each of them has different length, spanning from dozens to thousands of notes. Comparison across composers and pieces is therefore difficult. In Fig.17 we depict the values of IR 1 and KLD 1 (in||out) as a function of the piece length N for all pieces in the dataset (left and middle panels) and only for those pieces that have been certified as irreversible, i.e. those for which IR 1 > 1. Panels are in log-log to have a better visualisation of all the points. We can see in the left panel a small increasing trend: this is indeed related to all those pieces which are irreversible, whose irreversibility ratio IR 1 tends to be larger for larger time series. Similarly, in the middle panel we can appreciate a subtle decreasing trend: this is indeed related to all those pieces which are reversible, whose reversibility value KLD 1 (in||out) tends to be smaller for larger time series. In the right where panel there is no trend