Information content of note transitions in the music of J. S. Bach

Music has a complex structure that expresses emotion and conveys information. Humans process that information through imperfect cognitive instruments that produce a gestalt, smeared version of reality. How can we quantify the information contained in a piece of music? Further, what is the information inferred by a human, and how does that relate to (and differ from) the true structure of a piece? To tackle these questions quantitatively, we present a framework to study the information conveyed in a musical piece by constructing and analyzing networks formed by notes (nodes) and their transitions (edges). Using this framework, we analyze music composed by J. S. Bach through the lens of network science and information theory. Regarded as one of the greatest composers in the Western music tradition, Bach's work is highly mathematically structured and spans a wide range of compositional forms, such as fugues and choral pieces. Conceptualizing each composition as a network of note transitions, we quantify the information contained in each piece and find that different kinds of compositions can be grouped together according to their information content and network structure. Moreover, we find that the music networks communicate large amounts of information while maintaining small deviations of the inferred network from the true network, suggesting that they are structured for efficient communication of information. We probe the network structures that enable this rapid and efficient communication of information--namely, high heterogeneity and strong clustering. Taken together, our findings shed new light on the information and network properties of Bach's compositions. More generally, our framework serves as a stepping stone for exploring musical complexities, creativity and the structure of information in a range of complex systems.

I. INTRODUCTION
From Tibetan throat singing to Scottish piobaireachd to modern hip hop, music is a universal aspect of human culture, enjoyed by people of all ages from all around the world. It has even been proposed that music is a fundamental part of being human [1]. Though styles, sounds, and instruments vary drastically from one culture and time period to another, it is indisputable that music has had a substantial impact on the development of humans and society [2,3]. Through music we can tell stories [4], convey messages [5], and evoke the strongest of emotions [6][7][8]. It is a common human experience to feel pensive or despondent after hearing a slow song in a minor key or to feel carefree or energized after hearing an upbeat song in a major key. But how does something as abstract as music communicate so much? Past literature has discussed music in terms of expectation and surprise [9][10][11]. In order to be evolutionarily successful, our brains are adept at forming expectations based on prior events. When these expectations are contradicted by an experience, we feel surprised. With surprise can come a host of other emotions: we may feel relief when the dissonant sound we expected was actually consonant, or we may feel distress when the musical resolution we expected did not occur [12]. But how do we quantify these expectations and surprises? How do we mathematically formalize and measure the information conveyed by a piece of music? Fundamentally, music is composed of fleeting and elusive sounds, and hence may appear hard to measure.
Here, we seek to extract order from music's complexity by examining music through the lens of network science. A network consists of nodes and edges, which represent entities and the connections between them, respectively. Conceptualizing each note as a node and each transition between two notes as an edge, we can build a network for any piece of music [13][14][15][16][17]. This representation enables us to use physics-based approaches to quantitatively analyze aspects of a musical piece. Using music networks, we build a framework to study the information conveyed by a piece and apply this framework to provide a comprehensive analysis of Bach's compositions. Bach is a natural case study given his prolific career, the wide appreciation his compositions have garnered, and the influence he had over contemporaneous and subsequent composers. His diverse compositions (from chorales to fugues) for a wide range of musicians (from singers to orchestra members) often share a fundamental underlying structure of repeated (and almost mathematical) musical themes and motifs. These features of Bach's compositions make them particularly interesting to study using a mathematical framework.
As we listen to music, we form expectations. Upon hearing a particular note, we anticipate which notes might come next based on past transitions. The less likely the outcome, the more surprised we are upon hearing it. This "surprisal" can be quantified by the Shannon information entropy [18]. Ideas from information theory have led to illuminating insights in a wide range of settings, including language [19,20], social networks [21,22], transportation patterns [23] and music [24,25]. We draw upon these ideas to quantify the information present in the music networks. While prior research has attempted to quantitatively identify patterns and features present across different kinds of music [15,26-28], understanding how humans perceive these patterns is more nuanced and complex than simply evaluating the structure of compositions because humans are not perfect learners. Rather, studies have consistently found that humans assimilate patterns of information presented to them through imperfect perceptual systems, resulting in slightly inaccurate representations of transition structures [29][30][31][32]. This observation raises interesting questions about the information that is perceived by a human; in particular, how does the inferred structure relate to, and differ from, the true structure of a musical piece? Further, are there any patterns in music that particularly shine through the messy process of human perception and, if so, how do these patterns vary across different kinds of music? While these questions are nuanced and can depend on factors like training, recent advances in the study of how humans learn networks of information offer a valuable framework to address these questions [31,33-35].
Here, we draw upon ideas from network science, information theory and cognitive science to build a framework to investigate the information conveyed by music. We then use this framework to provide a systematic analysis of music composed by J. S. Bach. We begin in Sec. II with a discussion of how music can be represented as a network along with details of the compositions analyzed in our work. Next, in Sec. III, we study the information present in the networks. We find that Bach's music networks contain more information than expected from typical (or random) transition structures. Strikingly, we also find that certain composition forms are clustered together based on their information content. We investigate how the network structure influences information content, and show that the higher information in these music networks and the differences observed across musical pieces within each compositional form can be explained by the heterogeneity in node degrees (or the number of distinct pitches that follow a given note). Next, in Sec. V, we use a maximum-entropy model for how humans perceive networks of information to examine how closely the inferred transition structure of a piece aligns with the true network structure. We hypothesize that the music networks maintain a low deviation between the inferred and true network, and this property is driven by tight clustering in the network. Additionally, we find that certain compositional forms can be distinguished based on the discrepancies between the original and the inferred network. Together, our framework introduces a fresh perspective on music, and sheds new light on properties of Bach's music. By performing a systematic study of how information in a complex system, like music, is structured and perceived by humans, our work provides insights on human creativity and how humans experience the world around them. Our study also opens up numerous interesting directions for further inquiry, which we discuss in Sec. VII.

II. MUSIC AS A NETWORK OF NOTE TRANSITIONS
FIG. 1. An example of a network constructed from a musical piece using the method described in our paper. At the top, we show a toy musical piece. Below, we show the network in which notes are nodes and transitions between notes, whether isolated or played simultaneously as part of a chord, are directed edges. The direction of the edge matches the temporal direction of the transition.
We note that there have been previous efforts in constructing and analyzing different network representations of music [13][14][15][16][17]. In our study, we focus on investigating the information conveyed by note transitions in music and begin with a basic representation of the note transitions. We study a wide range of Bach's compositions including: preludes, fugues, inventions, cantatas, English suites, French suites, chorales, Brandenburg concertos, toccatas, and concertos. The audio files for these pieces were collected and read in MIDI format, from which the sequence of notes was extracted (see Methods section A 1 for further details on each compositional type and the sources for each piece). Each note present in a piece is represented as a node in the network, with notes from different octaves represented as distinct nodes. The transitions between notes are calculated separately for different instruments. If there is a transition from note i to note j, then we draw a directed edge from node i to node j (see Fig. 1). For chords, where multiple notes occur at the same time, edges are drawn from every note in the first chord to every note in the second chord. To simplify our analysis, we remove any self loops in the network, thereby restricting ourselves to understanding the structure of transitions to the next different note in the piece. We begin by examining unweighted networks of note transitions to focus on how the network structure alone impacts the information content and perception of a musical piece. After understanding the skeleton of the transitions, we then add weights to the edges based on how frequently various transitions occur. This procedure allows us to disentangle the effects of the network structure (comprising the set of possible note transitions) and edge weights (comprising the note transition probabilities). Although our emphasis has been on building a basic representation of the note transitions present in a musical piece, it is important to highlight the potential to extend this representation to capture other essential aspects of music. We expand on how future efforts could incorporate more musical realism and complexity in Sec. VII.
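As a rough illustration of the construction described in this section, the sketch below builds an unweighted directed network from a single-voice note sequence. It is a minimal sketch under stated assumptions, not the pipeline used in the paper: the function name `build_transition_network` and the toy melody are hypothetical, MIDI parsing is omitted, and chords and multiple instruments are not handled.

```python
from collections import defaultdict


def build_transition_network(notes):
    """Return {note: set of notes that follow it} for a single-voice sequence.

    Repeated notes are skipped, so each edge points to the next *different*
    note, which is equivalent to removing self loops.
    """
    edges = defaultdict(set)
    for src, dst in zip(notes, notes[1:]):
        if src != dst:  # drop self loops
            edges[src].add(dst)
    # Notes without any outgoing transition still count as nodes.
    for note in notes:
        edges.setdefault(note, set())
    return dict(edges)


# Hypothetical toy melody; the octave number makes C4 and C5 distinct nodes.
melody = ["C4", "E4", "G4", "E4", "E4", "C4", "C5"]
network = build_transition_network(melody)
```

Here the repeated E4 does not create an edge from E4 to itself, and C4 and C5 appear as separate nodes because they belong to different octaves.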

III. QUANTIFYING THE INFORMATION IN NETWORKS
FIG. 2. The model of information production using random walks. (a) An example of a random walk on the network of note transitions is shown using the blue dotted line. At each node, the walker chooses an outgoing edge to traverse, with every outgoing edge equally likely. This walk generates a sequence of notes as shown below. (b) The amount of information, or the entropy, generated when a walker traverses an edge from a node depends on the degree of the node. When traversing nodes with a high versus low degree, the walker has more choices for which edge to pick and hence, such a transition generates more information. Thus, nodes with a higher degree (right) are said to have higher entropy than nodes with a low degree (left). (c) To calculate the entropy of the entire network, one needs to weigh the contribution of each node by the probability that a walker will occupy it. For networks with the same average degree, those with a wider range of degrees (right) have a higher entropy than those with a narrower range of degrees (left).
We seek to measure the amount of information produced by a sequence of notes. Although note sequences can have long-range temporal dependencies [36,37] and higher order structure [38,39], as a first analytical step, we focus on the Markov transition structure. That is, we study the information contained in individual note transitions. This information is quantified by the Shannon entropy of a random walk on the network [18,40] (Fig. 2; see also the Methods section A 2 for further details). Given a network of transitions, the contribution of the $i$th node to the entropy can be written in terms of the entries of the transition probability matrix $P$ as:
$$S_i = -\sum_j P_{ij} \log P_{ij}. \tag{1}$$
In the case of directed unweighted networks, $P_{ij} = 1/k_i^{\text{out}}$ for every edge leaving node $i$, where $k_i^{\text{out}}$ is the out-degree of the node. Hence, for unweighted networks, the node-level entropy is $S_i = \log(k_i^{\text{out}})$, which is solely determined by the out-degree. To calculate the entropy of the entire network, the contributions of the nodes are weighted by their stationary distribution (the probability that a walker ends up at node $i$ after infinite time), which we denote by $\pi_i$ [40]. The entropy of the network is then:
$$S = \sum_i \pi_i S_i = -\sum_i \pi_i \sum_j P_{ij} \log P_{ij}. \tag{2}$$
For undirected and unweighted networks, the stationary distribution has a simple analytical form, $\pi_i = k_i / 2E$, where $k_i$ is the degree of node $i$, and $E$ is the total number of edges. The network entropy is then:
$$S = \frac{1}{2E} \sum_i k_i \log k_i. \tag{3}$$
By contrast, for directed networks the stationary distribution depends on the detailed structure of the network and cannot be written in closed form. Hence, for our directed music networks, we calculate the stationary distribution numerically and use Eq. 2 to compute the entropy of each piece.
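The numerical calculation described above can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes a strongly connected network in which every node has at least one outgoing edge, measures entropy in bits (base-2 logarithm, an assumption), and finds the stationary distribution by iterating a "lazy" random walk, which has the same stationary distribution as the original walk but also converges on periodic networks. All function names are hypothetical.

```python
import math


def transition_matrix(A):
    """Row-normalize a 0/1 adjacency matrix (every node needs an out-edge)."""
    return [[a / sum(row) for a in row] for row in A]


def stationary_distribution(P, tol=1e-13, max_iter=100_000):
    """Iterate the lazy walk pi <- (pi + pi P) / 2 until it stops changing."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        step = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        new = [(a + b) / 2 for a, b in zip(pi, step)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def network_entropy(A):
    """Entropy of a random walk: node entropies weighted by occupancy."""
    P = transition_matrix(A)
    pi = stationary_distribution(P)
    node_entropy = [-sum(p * math.log2(p) for p in row if p > 0) for row in P]
    return sum(w * s for w, s in zip(pi, node_entropy))


# Directed toy network with edges 0->1, 0->2, 1->2, 2->0.
directed = [[0, 1, 1],
            [0, 0, 1],
            [1, 0, 0]]
# Undirected star: a hub (node 0) connected to three leaves.
star = [[0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]]
S_directed = network_entropy(directed)
S_star = network_entropy(star)
```

For the star, the walker sits on the hub half of the time ($\pi_0 = k_0/2E = 3/6$), so the result can be checked by hand against the undirected closed form: $S = \tfrac{1}{2}\log_2 3$.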
To understand the amount of information produced by the music networks, we compare them to randomized (or "null") networks of the same size; that is, networks with the same number of nodes and edges (see the Methods section A 5 for details on generating null networks). This comparison helps develop an intuition for the amount of information that networks of the same size typically contain. If the note transitions in the music networks do have distinct properties that allow them to communicate a large amount of information, then we would expect Bach's networks to contain more information than the null transition structures. By averaging over 100 random networks for each piece, we find that the real networks have consistently higher entropy, and thereby contain more information, than their random counterparts (Fig. 3A). Moreover, by comparing across pieces, we observe that the different kinds of compositions cluster together based on their entropy. The chorales, typically meant to be sung by groups in ecclesiastical settings, are shorter and simpler diatonic pieces that display a markedly lower entropy than the rest of the compositions studied. By contrast, the toccatas, characterized by more complex chromatic sections that span a wider melodic range, have a much higher entropy. It is possible that the chorales' functions of meditation, adoration, and supplication are best supported by predictability and hence low entropy, whereas the entertainment functions of the toccatas and preludes are best supported by unpredictability and hence high entropy.
Recall that, for unweighted networks, the node-level entropy is determined solely by the out-degrees of the nodes. Accordingly, it is useful to assess differences between the true networks and others wherein the node-level entropies have been fixed by preserving the true degree distribution. To perform this assessment, we compare the entropy of the real networks with another set of null models: randomized networks which preserve both the in- and out-degree of each node (see the Methods section A 5 for details on generating these networks). We observe that the entropies of the networks are largely preserved (see Fig. 3B). Although this preservation is expected for undirected networks (where the entropy is determined only by the degree distribution), it need not exist for directed networks (where the different stationary distributions contribute to the entropy). We therefore find that the entropy of music networks is primarily determined by their degree distributions rather than their stationary distributions.
To gain intuition for how the entropy of note transitions depends on network structure, consider the case of unweighted and undirected networks. The network entropy takes a particularly simple form, as shown in Eq. 3. Following a Taylor expansion around the average degree of the network (see the Methods section A 2), one obtains:
$$S \approx \log \langle k \rangle + \frac{\mathrm{Var}(k)}{2 \langle k \rangle^2}, \tag{4}$$
where $\langle k \rangle$ is the average degree of the network and $\mathrm{Var}(k)$ is the variance of the degrees. To first order, we see that the entropy increases logarithmically with the average degree of the network. To second order, the entropy increases with the variance or the heterogeneity of the degrees, such that more information will be produced by networks with heterogeneous (or broader) degree distributions. We quantify this breadth with a degree heterogeneity measure (Eq. 5). Many networks that we encounter in our daily lives are characterized by heterogeneous degree distributions, typically with few high degree "hub" nodes and many low degree nodes [41][42][43]. By contrast, regular graphs, which have homogeneous degrees, produce random walks with the least entropy (see Fig. 2(c)).
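The steps of this expansion can be written out briefly. The derivation below is a sketch that assumes the undirected entropy has the form $S = \frac{1}{2E}\sum_i k_i \log k_i$ with natural logarithms (for another base, the variance term is divided by the natural log of that base), and that the network has $N$ nodes so that $2E = N\langle k \rangle$; the symbol $N$ is introduced here only for illustration.

```latex
\begin{align}
S &= \frac{1}{2E}\sum_i k_i \log k_i
   = \frac{\langle k \log k \rangle}{\langle k \rangle}, \\
k \log k &\approx \langle k \rangle \log \langle k \rangle
   + \bigl(\log \langle k \rangle + 1\bigr)\bigl(k - \langle k \rangle\bigr)
   + \frac{\bigl(k - \langle k \rangle\bigr)^2}{2 \langle k \rangle}, \\
\langle k \log k \rangle &\approx \langle k \rangle \log \langle k \rangle
   + \frac{\mathrm{Var}(k)}{2 \langle k \rangle}, \\
S &\approx \log \langle k \rangle + \frac{\mathrm{Var}(k)}{2 \langle k \rangle^2}.
\end{align}
```

The linear term vanishes on averaging because $\langle k - \langle k \rangle \rangle = 0$, which is why the first correction to $\log \langle k \rangle$ comes from the variance of the degrees.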
Where does Bach's music fall along this spectrum? We found in Fig. 3A that the music networks analyzed have consistently higher entropy than null networks with the same number of nodes and edges (in other words, randomized networks with the same average degree). In the Supplementary Information Sec. D 4, we show that this higher information content of Bach's music networks is due to higher heterogeneity in their in- and out-degree distribution; that is, the music networks are more heterogeneous in their degrees than expected from transition structures of their size, enabling them to pack more information into their structure. Since we have focused our analysis on the first-order sequential relationships among notes, which are likely common across different kinds of music, we expect this result to generalize to other kinds of music as well.
In Fig. 3A, we also observed that various pieces belonging to certain compositional forms were clustered together in their entropy. Consistent with this observation, we find that the pieces which are clustered together in their entropy have very similar degrees (see Supplementary Information Sec. D 3). Examples include English suites, French suites, and chorales. In contrast, fugues did not cluster together in their entropy as much as other composition types and displayed diverse average degrees. For the compositions that are grouped together in their entropy, we find that the differences observed among the pieces in the group can be explained by their degree heterogeneity (see Supplementary Information Sec. D 4). We can, for example, see this relation in the chorales, where the pieces which have a higher in- and out-degree heterogeneity tend to have a higher entropy, despite having similar degrees (Fig. 3C). We note that this relationship between the entropy and degree heterogeneity holds even in our data set of directed networks, likely because the in- and out-degrees tend to be correlated.

IV. HOW HUMANS PERCEIVE NETWORKS OF INFORMATION
A key aspect of human communication involves receiving and assimilating information in the form of interconnected stimuli, ranging from sequences of words in language and literature to melodic notes of a musical piece, and even abstract concepts. Humans assimilate this information and build representations of the underlying structure of inter-item relationships, as depicted in Fig. 4a. As noted earlier, humans build these internal network models using imperfect cognitive instruments that result in slightly distorted versions of true network structures. The information that is perceived by a human is the sum of the information present in the system and the inaccuracies that stem from the imperfect cognitive processes involved in perception [17]. In the previous section, we focused on quantifying the actual information present in the system (see Fig. 2). We will now account for the second piece: the inaccuracies that arise due to the imperfect cognitive process of perceiving information (see Fig. 4).
FIG. 4. (a) A key aspect of human communication involves receiving and assimilating information in the form of interconnected stimuli. Humans assimilate patterns of information presented to them through imperfect perceptual systems, which results in slightly inaccurate internal models of the underlying transition structure. (b) When forming internal network models of the world, humans strike a balance between accuracy and complexity. The parameter $\eta$ quantifies this trade-off between accuracy and cost. In panel (i), we see the example network built when solely maximizing the accuracy ($\eta \to 0$), which forms a perfect representation of reality. However, building this network requires perfect memory and is computationally expensive. In panel (iii), we see the network built when solely minimizing the computational cost ($\eta \to 1$), in which all nodes are connected to all other nodes, unlike the original network. Constructing this network does not require significant cost, but it provides no accuracy in representing the original information. Humans tend to display intermediate values of $\eta = 0.80$ [17], thereby constructing networks that preserve some but not all of the true transition structure, as shown in panel (ii). Figure adapted with permission from Ref. [35].
To understand how humans learn and represent transition structures, researchers have conducted a number of experiments and introduced a range of models describing how humans internally construct transition networks [31][32][33][44][45][46]. A common thread across a number of these studies and models is that humans integrate transition probabilities over time, relating items that are adjacent to each other as well as those separated by transitions of length two, three, and so on [17,32,45,47]. This integration allows for lower computational costs and better generalizations about new information at the cost of accuracy. Here, we focus on one such model based on a free-energy principle which captures this temporal integration and inaccuracies in perception [17,31]. The model postulates that when constructing internal network representations of information, humans aim to maximize the accuracy of their internal representation while simultaneously minimizing the computational cost required for its construction [17,31,35,48]. On the one hand, a human could learn the structure with no errors, forming a perfectly accurate network of the transitions (Fig. 4b (i)), but that formation process would be computationally expensive. On the other hand, one could disregard accuracy and have the least expensive representation (Fig. 4b (iii)). Most humans do something in between by recalling the sequence of transitions sometimes accurately and sometimes inaccurately, thereby forming a fuzzy perception of the true network (Fig. 4b (ii)). Formally, the competition between computational complexity and accuracy can be captured by a free energy model of people's internal representation [31]. The learned transition probabilities under this model ($\hat{P}$) can be written in terms of the true transition probabilities ($P$) as follows:
$$\hat{P} = (1 - \eta)\, P\, (I - \eta P)^{-1}, \tag{6}$$
where $I$ is the identity matrix and $\eta \in [0, 1]$ captures the errors in representation.
A detailed derivation of this expression is provided in the Methods section A 3. We emphasize the similarity of this form across multiple different theories of cognition [30,33,34]. By relating the inferred transition structure to the true network structure, this framework enables one to explore questions about the information that a human perceives from a given network. Given our interest in such questions in the context of music, we use this model to compute the inferred network for each musical piece. We note that studies of musical expectancy have highlighted the role of statistical learning as a mechanism, alongside other factors, in musical expectancy and knowledge acquisition [49][50][51][52].
For the rest of our discussion, we use the term "inferred network" on its own to refer to the network calculated using the model of perception discussed above. Prior work indicates that, on average, humans display an $\eta = 0.80$ in large-scale online laboratory experiments [17]. Given a network of note transitions with transition probabilities ($P$), we use this empirically measured value to calculate the inferred network ($\hat{P}$) using Eq. 6. In the context of music, it is important to recognize that the inferred structure would naturally exhibit variations, potentially influenced by factors like an individual's level of training. Nonetheless, this framework provides interesting insights regarding the types of structures that could be considered more effective in accurately communicating information, while taking into account the limitations of human perceptual systems. We provide a discussion of how future research could expand upon our research and improve the study of information perception in music in Sec. VII.
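To make the calculation of the inferred network concrete, the sketch below assumes that the learned transition matrix takes the form $\hat{P} = (1-\eta) P (I - \eta P)^{-1}$ reported for this class of models [31], which is equivalent to summing walks of every length with geometrically decaying weights, $(1-\eta)\sum_{t \ge 0} \eta^t P^{t+1}$. Instead of inverting a matrix, it solves the equivalent fixed-point equation $\hat{P} = (1-\eta)P + \eta P \hat{P}$ by iteration. The function names are hypothetical and the code is an illustration, not the authors' implementation.

```python
def matmul(X, Y):
    """Multiply two square matrices stored as lists of lists."""
    n = len(X)
    return [[sum(X[i][m] * Y[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def inferred_transitions(P, eta=0.8, tol=1e-12, max_iter=10_000):
    """Solve P_hat = (1 - eta) * P + eta * P @ P_hat by fixed-point iteration.

    For 0 <= eta < 1 the fixed point equals (1 - eta) * P @ inv(I - eta * P).
    """
    n = len(P)
    P_hat = [[(1 - eta) * p for p in row] for row in P]
    for _ in range(max_iter):
        PP = matmul(P, P_hat)
        new = [[(1 - eta) * P[i][j] + eta * PP[i][j] for j in range(n)]
               for i in range(n)]
        diff = max(abs(new[i][j] - P_hat[i][j])
                   for i in range(n) for j in range(n))
        P_hat = new
        if diff < tol:
            break
    return P_hat


# Two notes that always alternate: 0 -> 1 -> 0 -> ...
P = [[0.0, 1.0],
     [1.0, 0.0]]
P_hat = inferred_transitions(P, eta=0.8)
```

In this two-note example the true network has no self loops, yet the inferred network assigns probability $\eta/(1+\eta)$ to staying on the same note, because walks of length two return to the starting node. Setting `eta=0.0` returns the true transition matrix unchanged.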

V. QUANTIFYING DISCREPANCIES IN THE PERCEPTION OF MUSIC NETWORKS
We are now prepared to investigate the extent to which the inferred music networks deviate from their true structure. Networks that display a low deviation between the inferred and true structure can be regarded as more effective in accurately communicating information. Hence, this framework provides insight into the communicative success of a network, from the point of view of how the network interacts with our imperfect perceptual systems. Mathematically, one can quantify the deviations between the inferred network ($\hat{P}$) and the original network ($P$) using the Kullback-Leibler (KL) divergence:
$$D_{\mathrm{KL}}(P \,\|\, \hat{P}) = -\sum_i \pi_i \sum_j P_{ij} \log \frac{\hat{P}_{ij}}{P_{ij}}, \tag{7}$$
where $\pi_i$ is the stationary distribution of the original network. The lower the KL-divergence, the closer the inferred network is to the true network, and hence the network can be considered more effective in communicating information accurately. Do Bach's musical compositions possess distinct features that result in smaller discrepancies in their perceived structure? How do pieces differ in these discrepancies? What are the structural differences between the musical pieces that lead to such differences? To answer these questions, for each musical piece, we compute the KL-divergence between the true transition probabilities $P$ and the inferred transition probabilities $\hat{P}$. Then, to understand whether these music networks do indeed maintain low discrepancies in their inferred structure, we compare them against random networks with the same number of nodes and edges. The data confirm our intuition (Fig. 5A): Bach's music networks have a lower KL-divergence than random networks of the same size. Even if we compare against null networks with the same in- and out-degree distributions, we still see that the music networks have a lower KL-divergence (Fig. 5B). This finding suggests that the lower KL-divergence of these networks cannot be explained by their degree distributions alone. Additionally, we observe interesting variations in the KL-divergence among the different compositional forms (Fig. 5). The chorales, at one extreme, seem to have the highest KL-divergence, while the preludes and toccatas have the lowest KL-divergence. In what follows, we attempt to identify and interpret the network properties that underlie the observed variations in the discrepancies of the inferred information across compositional forms and pieces.
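The KL-divergence between a true and an inferred transition matrix can be computed in a few lines. The sketch below assumes the standard form $-\sum_i \pi_i \sum_j P_{ij} \log(\hat{P}_{ij}/P_{ij})$, measured in bits (base-2 logarithm, an assumption), and takes the stationary distribution as an input. The two-note example and its inferred probabilities, $1/(1+\eta)$ and $\eta/(1+\eta)$ with $\eta = 0.8$, are illustrative values worked out by hand for a pair of alternating notes, not data from the paper.

```python
import math


def kl_divergence(P, P_hat, pi):
    """KL divergence (in bits) from true transitions P to inferred P_hat.

    Only true transitions (P[i][j] > 0) contribute; P_hat must be positive
    wherever P is, which holds for the inference model when eta < 1.
    """
    total = 0.0
    for i, row in enumerate(P):
        for j, p in enumerate(row):
            if p > 0:
                total -= pi[i] * p * math.log2(P_hat[i][j] / p)
    return total


eta = 0.8
# Two alternating notes: true transitions and stationary distribution.
P = [[0.0, 1.0],
     [1.0, 0.0]]
pi = [0.5, 0.5]
# Inferred transitions for this network, worked out by hand.
P_hat = [[eta / (1 + eta), 1 / (1 + eta)],
         [1 / (1 + eta), eta / (1 + eta)]]
d_kl = kl_divergence(P, P_hat, pi)
```

For this example the divergence reduces to $\log_2(1+\eta)$: every true transition is remembered with probability $1/(1+\eta)$ instead of 1, and the rest of the probability leaks onto transitions that never occur.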

A. Transitive clustering coefficient
As seen in the previous section, the discrepancies in the inferred transition structure for the music networks could not be explained by the distribution of degrees alone. For undirected networks, prior research has demonstrated that the KL-divergence between the inferred and true transition structures decreases with an increase in the density of triangles within the network [17]. This relationship can be demonstrated by substituting the expression for the inferred version of a network (Eq. 6) into the equation for the KL-divergence (Eq. 7). We now extend this analysis to our directed networks, with the aim of generalizing this finding. By performing this substitution and keeping terms to leading order in $\eta$, we derive the following expression for the KL-divergence in terms of the original network's adjacency matrix ($A$):
$$D_{\mathrm{KL}}(P \,\|\, \hat{P}) \approx -\log(1 - \eta) - \eta \sum_i \frac{\pi_i}{k_i^{\text{out}}} \sum_{j,l} \frac{A_{ij} A_{il} A_{lj}}{k_l^{\text{out}}}. \tag{8}$$
Here we see that the KL-divergence depends on a product of the form $A_{ij} A_{il} A_{lj}$, which quantifies the transitive relationships present in the network. More explicitly, it depends on the number of directed triangles of the form i → j → k and i → k.
To quantify the extent to which a network has clusters of this form, we introduce a measure termed the transitive clustering coefficient of the network, defined along similar lines to the clustering coefficient of a network [53,54]. For each node, this quantity is measured by dividing the number of transitive triangles that node $i$ is a part of ($\Delta_i^{T}$) by the number of possible directed triangles:
$$C_i^{T} = \frac{\Delta_i^{T}}{\text{number of possible directed triangles at node } i}. \tag{9}$$
Here the number of possible directed triangles is computed from $k_i^{\text{tot}}$, the total degree (in + out) of the node. We average this quantity over all nodes in the network to report a single value for each piece. As indicated by Eq. 8, we expect the KL-divergence of the networks to primarily be driven by the transitive clustering coefficient. This relationship is indeed evident in Fig. 5C, where we observe that musical networks with a higher transitive clustering coefficient tend to exhibit lower KL-divergence values. In this context, we also observe that the preludes and toccatas (which demonstrated relatively lower KL-divergence values) are characterized by a larger density of transitive triangles compared to other pieces like the chorales.
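Counting transitive triangles is straightforward once the network is stored as an adjacency matrix. The sketch below counts, for each node i, the pairs (l, j) with edges i → l, l → j and i → j, which is the product $A_{ij} A_{il} A_{lj}$ summed over j and l. It is only an illustration: it counts triangles in which node i is the source, whereas the coefficient in the text may count every triangle a node takes part in, and the normalization by the number of possible triangles is deliberately left out because its exact form is not reproduced here. The function name is hypothetical.

```python
def transitive_triangles(A):
    """For each node i, count pairs (l, j) with i -> l, l -> j and i -> j."""
    n = len(A)
    counts = []
    for i in range(n):
        count = 0
        for l in range(n):
            if not A[i][l]:
                continue  # need the first step i -> l
            for j in range(n):
                # second step l -> j, closed by the shortcut i -> j
                if A[l][j] and A[i][j]:
                    count += 1
        counts.append(count)
    return counts


# Directed toy network with edges 0->1, 0->2, 1->2, 2->0.
A = [[0, 1, 1],
     [0, 0, 1],
     [1, 0, 0]]
triangles = transitive_triangles(A)
```

In this toy network the only transitive triangle is 0 → 1 → 2 closed by 0 → 2. The cycle 0 → 2 → 0 does not count, since self loops are excluded.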
A natural question that arises at this point is: What is the significance of these transitive relationships within the networks, and why do they contribute to reduced disparities between the inferred and true structure? From a cognitive science perspective, this relationship between the KL-divergence and clustering arises from the tendency of humans to count transitions of length two, as discussed previously. In a scenario where a given node i is connected to node j and node j links to node k, a human learner may erroneously draw an edge between node i and node k in their mind. However, if the network originally had a direct link from node i to node k, such an error would reinforce an existing edge, thereby aligning the inferred network more closely with the true network. Hence, we expect networks with high clustering to be more robust to errors made during inference. From a music perspective, interpreting these triangles is not straightforward since the networks are unweighted. Nevertheless, the presence of a large density of such triangles suggests that if there is a transition between notes i and j, and notes i and k, there is likely also a transition between notes j and k. This could potentially reflect the tendency of music to form tonally stable sequences of note transitions. Substantiating these claims would require further efforts, which we elaborate on in Sec. VII.
Analyzing the transitive clustering further, we find that the musical networks generally have a higher transitive clustering coefficient than degree-preserving random networks (Fig. 5D), suggesting that this feature is not due to mere coincidence. Fig. 5D also reveals an interesting exception: the preludes appear to have a lower transitive clustering coefficient than the corresponding null networks that preserve their size and degree distribution, while the chorale pieces generally have a higher transitive clustering coefficient than expected from null networks. We probe this further in the Supplementary Information and identify meso-scale structures that could lead to the observed differences between the compositional forms.

VI. ACCOUNTING FOR NOTE TRANSITION FREQUENCIES
So far, we have focused our attention on the information content and perception of unweighted (or binary) note transition networks created from Bach's music. These networks only captured whether or not a transition exists between two notes and were not sensitive to how frequently each transition occurs. The binary networks enabled us to probe how the structure of the transitions supports effective communication. However, in many real networks, not all transitions occur with the same frequency. To reflect the different frequencies with which transitions may occur, we construct networks in which each transition is weighted by how often it occurs. For example, if note i follows note j 90% of the time and note k follows note j 10% of the time, the edge from node j to node i will be more heavily weighted than the edge from node j to node k (see the Methods section A 1 for further details on network construction). Adding this piece of information to the networks leads us to new questions about the role that transition weights play in communicating information to listeners. For example, how is the information generated by a random walk on the network altered by differences in the frequencies of transitions? Do these differences in frequencies reduce the discrepancies in the inferred network?

Weights reduce the surprisal of transitions
For unweighted networks, the node-level entropy of a random walk is determined solely by the out-degree $k^{\mathrm{out}}_i$, since each outgoing edge is traversed with probability $P_{ij} = 1/k^{\mathrm{out}}_i$. If the edges are weighted by their transition frequencies, the $P_{ij}$ will no longer be uniformly distributed, and each outgoing edge will not have an equal probability of being traversed. Hence, incorporating the edge weights reduces the node-level entropy. This observation is intuitive since non-uniformities in any distribution lead to decreases in entropy. However, extending this intuition to the entropy produced by the entire network is not as straightforward, since one must weight the contribution of each node by the stationary distribution of the random walkers, which cannot be expressed in closed form for directed networks. Generally, we find that the entropy of weighted networks is still lower than that of the corresponding unweighted networks (Fig. 6A). This finding suggests that the different weights do indeed reduce the overall surprisal generated by the networks.
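The node-level effect can be illustrated with a minimal sketch, which is not the authors' implementation. The function name `node_entropy` and the choice of base-2 logarithms (bits) are assumptions made for illustration; the two-edge node mirrors the 90% / 10% example given earlier in this section.

```python
import math

def node_entropy(out_weights):
    """Shannon entropy (in bits, an assumed unit) of one random-walk step
    leaving a node whose outgoing edges carry the given non-negative weights."""
    total = sum(out_weights)
    probs = [w / total for w in out_weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)

# A node with two outgoing edges.
uniform = node_entropy([1, 1])   # unweighted: each edge has P = 1/2
weighted = node_entropy([9, 1])  # weighted: P = 0.9 and P = 0.1

print(f"unweighted: {uniform:.3f} bits, weighted: {weighted:.3f} bits")
```

The unweighted node yields exactly 1 bit, while the weighted node yields roughly 0.47 bits, which is the node-level reduction described above. The network-level entropy additionally depends on the stationary distribution and is not captured by this sketch.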

Weights reduce discrepancies between the inferred network and the original network
Incorporating the transition frequencies also helps us to understand the role that the weights play in the human inference of note transitions. We observe that the weighted networks of note transitions have lower KL-divergence than the binary networks (Fig. 6B). This observation suggests that the weights aid in forming more accurate internal representations of the transition structures, thereby reducing the discrepancies between the inferred and true structure.
In light of these data, we next verify the role that the network structure plays in the communicative success of weighted networks by comparing the entropy and KL-divergence of the weighted music networks with edge-rewired null networks. In the analysis of unweighted networks, we observed that the entropy was primarily driven by the degree distribution of the network and was not sensitive to the precise connectivity pattern. To make this observation, we compared the entropy of the real music networks to randomized networks that preserved the exact degree of each node and hence held the node-level entropies fixed. Along similar lines, here we make use of null models that keep the node-level entropies fixed by preserving the in- and out-degree of each node and the out-weights at each node (see the Methods section for details on the null models). By comparing the entropy of the weighted music networks to the degree-preserving weighted null models, we see that the entropies of the real networks are largely unchanged, although the real networks have marginally higher entropies than the null networks (Fig. 6C, top). These results support our conclusion that the entropy in the real networks is still primarily driven by their degree distribution. When we compare the KL-divergence of the real weighted networks with the degree-preserving weighted null models, we find that the real networks have a lower KL-divergence than the corresponding null networks (Fig. 6C, bottom). Together, these results suggest that incorporating the weights into our network analysis does not qualitatively alter our results on the effects of network structure.
Accounting for the note transition frequencies in our network model leads to several interesting lines of inquiry. For instance, is it the specific distribution of weights that improves the accuracy of the inferred music networks? Future work could evaluate this possibility by comparing the KL-divergence of the weighted networks with a class of null models that preserve the skeleton of the network, but permute the edge weights. It would also be interesting to test whether higher edge weights are concentrated in triangular clusters of the network, offering a potential explanation for the lower KL-divergence of the weighted networks compared to the binary networks.
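One way such a weight-permuting null model could be set up is sketched below. This is a hypothetical illustration of the proposed future analysis, not something done in this study: the function name `permute_edge_weights`, the dictionary representation, and the choice to shuffle weights globally across all edges (rather than, say, within each node's outgoing edges) are all assumptions.

```python
import random

def permute_edge_weights(weighted_edges, seed=None):
    """Return a network with the same edges (skeleton) but shuffled weights.

    weighted_edges: dict mapping (source, target) -> weight.
    The weights are shuffled globally, so per-node out-weights are NOT
    preserved; this is one assumed design choice among several.
    """
    rng = random.Random(seed)
    edges = sorted(weighted_edges)               # fixed order for reproducibility
    weights = [weighted_edges[e] for e in edges]
    rng.shuffle(weights)
    return dict(zip(edges, weights))

# Hypothetical toy network keyed by MIDI note numbers.
toy = {(60, 62): 5, (62, 64): 1, (64, 60): 2}
null = permute_edge_weights(toy, seed=0)
```

Comparing the KL-divergence of the real weighted network with an ensemble of such nulls would isolate the contribution of where the weights sit, as opposed to which edges exist.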

VII. CONCLUSIONS AND FUTURE DIRECTIONS
Across language, literature, music and even abstract concepts, humans demonstrate the remarkable ability to identify patterns and relationships from sequences of items, an essential aspect of information sharing and communication [31,35,55-57]. Here, we draw upon ideas from network science, information theory and statistical physics to build a framework that serves as a stepping stone for studying the information conveyed by a musical piece. We use this framework to analyze networks of note transitions in a wide range of music composed by J. S. Bach. For each musical piece, we construct a network of note transitions by drawing directed edges between notes that are played consecutively. We then quantify the amount of information generated by the network structure and find that different compositional forms can be grouped together based on their information entropy. We relate the information content of each piece to its network structure, enabling us to gain insight into the structural properties of various pieces. Next, inspired by recent progress in the field of statistical learning which demonstrates how humans infer transition structures across visual and auditory domains [31,49,50,56], we use a computational model [17,31] for how humans learn networks of information to compute the average "inferred" network structure for each piece. We then quantify the discrepancies between the inferred and true transition structures under this model. Here too, we observe interesting differences among the pieces, which we attribute to differences in the clustering of the networks. Finally, we study how the frequencies of transitions influence the information content and perception of the musical pieces, by weighting the transitions by the number of times they occur. We find that the weights reduce the overall entropy or surprisal of the transitions, and also reduce the deviations between the inferred and actual network, suggesting that the weights aid in accurate inference of these transition structures.
Furthermore, we find that the music networks contain more information and maintain lower discrepancies in the inferred structure than expected from typical transition structures of the same size. This provides us insight into features that make networks of information effective at communicating information. In general, networks which are denser (have a higher average degree) produce more information (have a higher information entropy). For networks of comparable average degree, more heterogeneous structures (with higher variance in their degree distribution) produce more information than those that are more regular or homogeneous in their degree (Fig. 7(i)). Moreover, networks which have a high degree of clustering maintain a lower divergence from human expectations (Fig. 7(ii)). Together, these findings suggest that for networks of a given size, rapid and accurate communication of information is supported by structures that are simultaneously heterogeneous and clustered (Fig. 7). Notably, such structures are widely prevalent across complex systems [41-43, 58, 59].
We hope that our framework inspires further exchange between physics, cognitive science, and musicology. On a broader scale, our study also adds to investigations on how information in complex systems is structured. To conclude, we highlight a number of exciting directions for future inquiry and outline ways in which our framework can be expanded upon and improved.

Future directions
A natural follow-up to this analysis would be to examine works of other composers, particularly works outside the Western tradition. This also prompts questions aimed at assessing how various styles or genres of music differ [60-62]. In particular, what are the key features by which a listener distinguishes between music from two eras, say the Classical and the Romantic eras? How do the differences in structure then impact how the piece is perceived by a listener? Consequently, a quantitative assessment of musical compositions like ours raises the intriguing possibility of identifying works of a composer or genres that may not be a priori obvious to musicologists.
FIG. 7. Network structures that support effective communication of information. Networks with a larger variance or heterogeneity in their node degrees, as shown in the top panel, pack more information into their structure and have a higher entropy. Clustering in the network, as shown in the bottom panel, makes the structure more resilient to errors made by humans when building an internal representation of the information, allowing the network to be inferred more accurately. Together, these structures convey a large amount of information that can be learned by humans more accurately, and are hence more efficient for communication.
Systematically analyzing the information that we extract from complex systems also provides us with new tools to understand human creativity and experiences. A question that often arises in the context of how humans experience music is: What makes a musical composition appealing to the human ear? While individual preferences in music vary widely and are highly subjective, there is still general agreement on certain composers being considered "influential" or "great". This fact raises the possibility that there may be some inherent qualities that are common to musical pieces which are widely considered appealing. Identifying such features might give us insight into the creative process of composing music and also complement existing work using AI to generate music [63,64]. Several attempts have been made to identify such patterns. For example, Ref. [15] analyzed note transition networks in certain compositions by Bach, Chopin, and Mozart as well as Chinese pop music, and suggested that "good" music is characterized by the small-world property [53] and heavy-tailed degree distributions. On the other hand, Ref. [26] studied selected compositions from Bach's Well-Tempered Clavier and found non-heavy-tailed degree distributions, suggesting that such distributions are not necessary for music to be appealing. It would be interesting to devise future experiments to determine whether our findings relate to the aesthetic or emotional appeal of a piece. In our study, we found that Bach's music networks had a higher number of transitive triangular clusters, enabling them to be learned more efficiently than arbitrary transition structures. Are pieces with a larger number of these triangles also more appealing to a listener? Future work could assess this possibility by conducting experiments that ask people to rate Bach's compositions and analyzing whether these ratings correlate with the presence of triangular clusters. More generally, our work focuses not solely on the information inherent in the transition structure of music, but also on how the information in this transition structure is perceived by a human listener. This framework might be useful in studying cognitive aspects of music and in bridging patterns observed in data with cognitive theories of music.
In future work, it also would be interesting to extend our analysis to examine how music networks evolve with time. There are three potentially interesting lines of inquiry here. First, how do the entropy and KL-divergence of a musical piece change as the piece progresses? Does this temporal change differ among the various compositional forms? Second, how has the music of a specific composer (whether Bach or otherwise) changed over the course of their lifetime? Has it become more intricate and complex, holding more information? Perhaps as the composer gains experience, their compositions convey information more efficiently and accurately, as reflected in a reduced KL-divergence? If the exact dates of when each piece was composed were known, then the framework used in our paper might provide answers to these questions. Third, how has music of a given genre, say classical music, changed over the years across composers? Ref. [28], for example, studied the fluctuation in pitch between adjacent notes in compositions by Bach, Mozart, Beethoven, Mendelssohn, and Chopin, and found that the largest pitch fluctuations of a composer gradually increased over time from Bach to Chopin. As mentioned earlier, it would be interesting to expand our analysis to different composers, and see how the information and expectations vary across composers and time.
Lastly, we also identify limitations within our analysis that highlight directions for further effort. First, our work relies on a simplistic representation of music that could be expanded to incorporate more musical realism and complexity. For instance, one could account for differences in timbre, the intervals between notes, or even fused notes or chords, which are known to play a key role in music perception [65,66]. Second, while we have focused on the information present in first-order sequential relationships among the notes, future work could capture higher-order correlations, hierarchies, and more intricate structures inherent in music [36-39]. Recent advances in studying higher-order dependencies and structures present in networks offer a promising approach to capturing this complexity [67-69]. Incorporating such subtleties would not only improve our understanding of how the networks are structured, but also how they are perceived. Expanding on this understanding, it would be beneficial to conduct targeted experiments that specifically address and build models of the perception of distinct musical attributes. Further, exploring the variability of music perception among individuals, considering factors such as musical training or cultural influences, would also be interesting.
The directions outlined above would expand our capacity to address more specific questions regarding the composer idiosyncrasies, era characteristics, and genres discussed earlier. As such, our work offers a flexible framework that can be utilized by a wide range of scholars both in and outside of physics. Beyond music, our study can also be extended to a range of complex systems present around us, such as language and social networks. For example, one could analyze works of literature and ask: Does the entropy of noun transitions in various works of Shakespeare differ based on their genre? More specifically, do the information content and learnability of noun transitions or relationships between characters differ between tragedies and comedies? By providing an example of a systematic and comprehensive analysis of the actual and perceived information in music, our study complements and adds to the rich study of language, music, and art as complex systems [26,70,71].
different movements and our data set has separate MIDI files for each movement. We analyze each movement separately and average our measurements over them to yield a single measured quantity for each piece, as indexed by a unique BWV number. In the case of the chorales, we analyzed the 186 four-part chorales in BGA Vol. 39 with BWV numbers 253-438.
The MIDI files were read in MATLAB using the readmidi function [76] to obtain information about the notes being played. Different instruments in a piece are stored in separate channels within each data file. The transitions between notes are calculated separately for each instrument or track. We assign each note present in a piece a node in the network, and notes from different octaves are assigned distinct nodes. We then draw an edge from note i to note j if there is a transition between them. If there are multiple notes being played at a single time t (as is the case with chords), edges are drawn from the previously played note to all notes at time t, and from all the notes being played at time t to the subsequent note(s). This procedure gives us a directed binary network of note transitions. The code and data used to construct the networks are available at [77]. We also construct weighted versions of these networks, where each edge is weighted by the number of times the corresponding transition occurs.
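The construction described above can be sketched for a single track as follows. This is an illustrative Python sketch and not the authors' MATLAB code at [77]. The function name `build_transition_network`, the representation of a track as a list of sets of MIDI note numbers (a set with several notes standing for a chord), and the toy track are all assumptions.

```python
from collections import Counter

def build_transition_network(events):
    """Count note transitions in one track.

    events: list of sets of notes in playing order; a set with more than one
    note represents a chord. Edges are drawn from every note at one time
    step to every note at the next time step.
    Returns a Counter mapping (note_i, note_j) -> number of transitions.
    """
    counts = Counter()
    for previous, current in zip(events, events[1:]):
        for i in previous:
            for j in current:
                counts[(i, j)] += 1
    return counts

# Hypothetical toy track: C4, E4, a C4+G4 chord, then E4 (MIDI note numbers,
# which already distinguish octaves).
track = [{60}, {64}, {60, 67}, {64}]
weighted = build_transition_network(track)  # weighted network (edge counts)
binary = set(weighted)                      # binary network (edge existence)
```

In this toy track the transition from note 60 to note 64 occurs twice, so it has weight 2 in the weighted network but appears as a single edge in the binary one.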

Entropy of random walks on networks
We use random walks to model how a sequence of information is generated from an underlying network of information. Under this model, a walker traverses the network by picking an outgoing edge to traverse at each node. Given a network with adjacency matrix $A$ and matrix elements $A_{ij}$, the probability that a walker transitions from node $i$ to node $j$ in a standard Markov random walk is

$$P_{ij} = \frac{A_{ij}}{k^{\mathrm{out}}_i},$$

where $k^{\mathrm{out}}_i = \sum_j A_{ij}$ is the out-degree of node $i$. We are interested in quantifying how much information is contained in the resulting sequence, which is captured by the entropy of the random walk:

$$S = -\sum_i \pi_i \sum_j P_{ij} \log P_{ij},$$

where $\pi$ is the stationary distribution of the walkers, which satisfies the condition $\pi^{\top} P = \pi^{\top}$. For the simplest possible case of an undirected and unweighted network, $P_{ij} = 1/k_i$ for each neighbor $j$ of node $i$ and $\pi_i = k_i/2E$, where $k_i$ is the degree of the $i$th node and $E = \sum_{i,j} A_{ij}/2$ is the total number of edges. The entropy in this case simplifies to

$$S = \frac{1}{2E} \sum_i k_i \log k_i = \frac{\langle k \log k \rangle}{\langle k \rangle}.$$

We can apply a Taylor expansion to this expression around the average degree of the network (using natural logarithms), and thereby obtain

$$S \approx \log \langle k \rangle + \frac{\mathrm{Var}(k)}{2 \langle k \rangle^{2}}.$$

Hence we find that the entropy of random walks increases logarithmically with the average degree of the network. Additionally, it grows as the variance of the degrees increases. This formalization enables us to relate the information content of various music networks to their network structure. The code used to measure the entropy of random walks on the networks analyzed is available at [77].
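A minimal numerical sketch of the random-walk entropy is given below; it is not the authors' code at [77]. It assumes a row-normalized transition matrix ($P_{ij}$ is the probability of stepping from node $i$ to node $j$), a strongly connected network in which every node has at least one outgoing edge, and base-2 logarithms. The stationary distribution is approximated with a "lazy" power iteration, a choice made here only so that periodic networks also converge. All function names are hypothetical.

```python
import math

def transition_matrix(A):
    """Row-normalize an adjacency (or weight) matrix given as a list of lists."""
    P = []
    for row in A:
        k_out = sum(row)                 # out-degree (or total out-weight)
        P.append([a / k_out for a in row])
    return P

def stationary_distribution(P, iterations=5000):
    """Approximate the distribution pi that is unchanged by one step of the walk.

    Uses the lazy update pi <- (pi + pi P) / 2, which has the same fixed
    point as pi <- pi P but also converges on periodic networks.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        step = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        pi = [0.5 * a + 0.5 * b for a, b in zip(pi, step)]
    return pi

def random_walk_entropy(A):
    """Entropy (in bits) of a random walk on the network with matrix A."""
    P = transition_matrix(A)
    pi = stationary_distribution(P)
    return -sum(pi[i] * p * math.log2(p)
                for i, row in enumerate(P) for p in row if p > 0)

# Undirected path on three nodes: degrees 1, 2, 1.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
S = random_walk_entropy(path)
```

For this undirected example the result can be checked by hand against the sum of $k_i \log_2 k_i$ divided by $2E$, which gives $(2 \cdot 1)/4 = 0.5$ bits.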

Model for how humans learn networks
As discussed in the main text, humans do not infer the transition probabilities of sequences of information with perfect accuracy due to imperfections in their cognitive processes. Studies have consistently found that in forming internal representations of transition structures, humans integrate transition probabilities over time [17,32,45,47]. This process results in humans connecting items in the sequence that are not directly adjacent to each other. Mathematically, we can express the inferred transition structure $\hat{P}$ in terms of the true transition structure $P$ under this model of fuzzy temporal integration as

$$\hat{P} = \sum_{\Delta t = 0}^{\infty} f(\Delta t)\, P^{\Delta t + 1}, \qquad \text{(A3)}$$

where $f(\Delta t)$ is the weight given to the higher powers of $P$ and is a decreasing function of $\Delta t$ such that longer-distance associations contribute less to a person's network representation. The functional form of $f(\Delta t)$ is obtained using a free energy model described in Ref. [31]. This model suggests that when forming internal representations of information, each human arbitrates a trade-off between accuracy and cost. The optimal distribution for $f(\Delta t)$ under this model is then a Boltzmann distribution with a parameter $\beta$ that quantifies the trade-off between cost and accuracy in forming an internal representation of the information:

$$f(\Delta t) = \frac{1}{Z}\, e^{-\beta \Delta t},$$

where $Z = \sum_{\Delta t = 0}^{\infty} e^{-\beta \Delta t} = (1 - e^{-\beta})^{-1}$ is a normalization constant. Substituting this expression to simplify Eq. A3, we obtain an equation that relates the inferred transition probabilities $\hat{P}$ to the true transition probabilities $P$:

$$\hat{P} = (1 - \eta)\, P\, (I - \eta P)^{-1}, \qquad \text{(A5)}$$

where $\eta = e^{-\beta}$ and $I$ is the identity matrix. Prior work has estimated the value of $\eta$ to be 0.8 from large-scale online experiments in humans [17]. Using this measured value of $\eta$, we use Eq. A5 to calculate the inferred network for any given music network (code available at [77]).
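The learning model can be sketched numerically as follows; this is not the authors' code at [77]. The sketch assumes that the inferred structure is the Boltzmann-weighted sum of powers of the true transition matrix, $(1 - \eta) \sum_{\Delta t \ge 0} \eta^{\Delta t} P^{\Delta t + 1}$, and evaluates it by truncating the series rather than inverting a matrix. The function names and the number of retained terms are illustrative choices.

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inferred_transitions(P, eta=0.8, terms=300):
    """Approximate P_hat = (1 - eta) * sum_{dt >= 0} eta**dt * P**(dt + 1).

    eta = 0.8 is the value quoted in the text; terms sets the truncation
    (eta**300 is negligible for eta = 0.8).
    """
    n = len(P)
    P_hat = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in P]   # holds P**(dt + 1), starting at dt = 0
    weight = 1.0 - eta              # holds (1 - eta) * eta**dt
    for _ in range(terms):
        for i in range(n):
            for j in range(n):
                P_hat[i][j] += weight * power[i][j]
        power = matmul(power, P)
        weight *= eta
    return P_hat

# Directed 3-cycle 0 -> 1 -> 2 -> 0: the learner "smears" probability onto
# transitions of length two and three that are absent from the true network.
cycle = [[0, 1, 0],
         [0, 0, 1],
         [1, 0, 0]]
P_hat = inferred_transitions(cycle)
```

Each row of `P_hat` still sums to one, and setting `eta=0.0` recovers the true transition matrix, which corresponds to a learner who makes no integration errors.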

KL-divergence
To quantify how much the distorted learned transition structure $\hat{P}$ differs from the original transition structure $P$, we calculate the Kullback-Leibler (KL) divergence between the two transition structures. The Kullback-Leibler divergence is a measure of how different a probability distribution is from a reference distribution, and is given by

$$D_{\mathrm{KL}}(P \,\|\, \hat{P}) = -\sum_i \pi_i \sum_j P_{ij} \log \frac{\hat{P}_{ij}}{P_{ij}},$$

where $\pi$ is the stationary probability distribution of the transition matrix $P$, obtained by solving $\pi^{\top} P = \pi^{\top}$. The KL-divergence is always nonnegative and attains the value zero if and only if $\hat{P} = P$. The larger the KL-divergence, the more the inferred network $\hat{P}$ differs from the original network. Hence, this quantity acts as a measure of the extent to which a network gets scrambled by the inaccuracies of human learning or, in other words, how accurately the network structure is inferred.
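A minimal sketch of this computation follows; it is not the authors' code. It assumes the stationary-distribution-weighted form of the KL-divergence between the rows of the true and inferred transition matrices, with base-2 logarithms. The function name `kl_divergence` and the hand-specified example matrices are illustrative.

```python
import math

def kl_divergence(P, P_hat, pi):
    """D_KL(P || P_hat) = -sum_i pi_i sum_j P_ij * log2(P_hat_ij / P_ij).

    P, P_hat: transition matrices as lists of lists (rows sum to one).
    pi: stationary distribution of P.
    Terms with P_ij = 0 contribute nothing and are skipped.
    """
    total = 0.0
    for i, row in enumerate(P):
        for j, p in enumerate(row):
            if p > 0:
                total -= pi[i] * p * math.log2(P_hat[i][j] / p)
    return total

# Directed 3-cycle (uniform stationary distribution) and a hypothetical
# inferred matrix that keeps only a fraction q of the mass on true edges.
P = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
q = 0.5
P_hat = [[0.2, q, 0.3], [0.3, 0.2, q], [q, 0.3, 0.2]]
pi = [1 / 3, 1 / 3, 1 / 3]
divergence = kl_divergence(P, P_hat, pi)
```

In this example every true transition has probability one, so the divergence reduces to $-\log_2 q$: the less probability the learner keeps on real edges, the larger the divergence.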

Null Models
We aim to identify distinct features in the music networks that enable them to convey information effectively.To assess whether our observations are merely due to random chance or are instead a unique feature of our dataset, we compare our measurements on the real music networks with the following null network models [78,79].
1. Null networks with the same number of nodes and edges. These are obtained by generating random networks with the same number of nodes and edges, and enable us to assess whether the quantity we have measured is to be expected merely based on network size.
2. Degree-preserving null networks. These are randomized networks of the same size, with the additional constraint that the in- and out-degrees of each node in the network are preserved. Such networks are constructed by swapping edges between pairs of nodes in the network iteratively, such that the in- and out-degrees of each node are preserved but the connectivity (or topology) of the network is randomized. This class of null models enables us to evaluate the role that connectivity or topology plays in the quantity we are measuring.
We can generalize the degree-preserving null networks to weighted networks. We are interested in degree-preserving randomized networks since these keep the node-level entropies fixed and allow us to study the impact of topology on the quantities we are measuring. In the case of weighted networks, the node-level entropies are determined by the out-weights and out-degrees of the nodes. Hence, our procedure of swapping edges between pairs of nodes in the network still works since it preserves the out-weights of each node in addition to the in- and out-degrees. With these null models, we can benchmark the presence of the quantities we are interested in, and identify the role that the connectivity pattern or size plays. The code used to generate the null networks is available at [77].
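The edge-swapping procedure for the degree-preserving null networks can be sketched as below. This is an illustrative sketch, not the authors' code at [77]: the function name `degree_preserving_rewire`, the number of attempted swaps, and the decision to reject swaps that would create self-loops or duplicate edges are assumptions.

```python
import random

def degree_preserving_rewire(edges, n_swaps, seed=None):
    """Randomize a directed network while preserving all in- and out-degrees.

    edges: iterable of distinct (source, target) pairs.
    Each attempt picks two edges a -> b and c -> d and rewires them to
    a -> d and c -> b. Attempts that would create a self-loop or duplicate
    an existing edge are skipped (an assumed convention).
    """
    rng = random.Random(seed)
    edge_list = list(edges)
    edge_set = set(edge_list)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if a == d or c == b or (a, d) in edge_set or (c, b) in edge_set:
            continue
        edge_set -= {(a, b), (c, d)}
        edge_set |= {(a, d), (c, b)}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list

# Hypothetical toy network on four nodes.
original = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (1, 3)]
null = degree_preserving_rewire(original, n_swaps=100, seed=1)
```

Because each swap only exchanges the targets of two edges, every source keeps its number of outgoing edges and every target keeps its number of incoming edges. For a weighted network, one way to also keep each node's out-weights fixed would be to let every edge carry its weight along with its source during the swap; that detail is an interpretation and is not shown here.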

Transitive Clustering Coefficient
Along the lines of the clustering coefficient of a node [53,54], we define the transitive clustering coefficient as a measure of the degree to which nodes in a directed network tend to form transitive relationships. The transitive clustering coefficient of a node $i$ (for an unweighted graph with no self-loops) is given by the ratio of $\Delta^T_i$, the number of transitive triangles that node $i$ is a part of, to a denominator set by $k^{\mathrm{tot}}_i$, the total degree (in + out) of the node. The denominator simply counts the number of triangles that could exist within the neighborhood of node $i$. The possible directed triangles involving node $i$ can be divided into two categories: those representing cyclic relationships and those representing transitive relationships (Fig. 8). The number of transitive triangles involving node $i$ that actually exist can be expressed in terms of the adjacency matrix $A$ of the graph. This expression counts a subset of the total number of triangles, and is a special case of the expression derived in Ref. [80]. We use this expression to measure the transitive clustering coefficient of each music network (code available at [77]).
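To make the distinction between transitive and cyclic triangles concrete, the following sketch counts transitive triangles by brute force. It is an assumption-laden illustration and not the authors' adjacency-matrix expression: a transitive triangle is taken here to be an ordered triple of distinct nodes (a, b, c) with edges a -> b, b -> c and a -> c, and only the raw counts are returned, without the normalization by the number of possible triangles. The function name is hypothetical.

```python
from itertools import permutations

def transitive_triangle_counts(nodes, edges):
    """Count, for every node, the transitive triangles it takes part in.

    A transitive triangle is an ordered triple of distinct nodes (a, b, c)
    with edges a -> b, b -> c and the "shortcut" a -> c. Brute force over
    all triples, so suitable only for small networks.
    """
    edge_set = set(edges)
    counts = {v: 0 for v in nodes}
    for a, b, c in permutations(nodes, 3):
        if (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set:
            for v in (a, b, c):
                counts[v] += 1
    return counts

nodes = [0, 1, 2]
transitive = transitive_triangle_counts(nodes, [(0, 1), (1, 2), (0, 2)])
cyclic = transitive_triangle_counts(nodes, [(0, 1), (1, 2), (2, 0)])
```

In the first network every node takes part in one transitive triangle, whereas the cyclic network contains none, even though both have three edges on the same three nodes.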

Information content
To better visualize the variation in information content among the musical compositions, we assign each piece an index number and plot the information entropy for each piece as a function of its index number (Fig. 9A). We observe here more clearly how different compositional forms tend to have pieces clustered together in their entropies. As reported in the main text, we find that the chorales have a markedly lower entropy than the rest of the compositions studied. In contrast, the toccatas and the second set of preludes have a much higher entropy. To relate the information entropy of the music networks to their structure, we compare their entropy to corresponding null networks (Fig. 3A and B in the main text), where we conclude that the information entropy is primarily determined by the degree distributions. In the case of undirected and unweighted networks, the network entropy depends upon the logarithm of the average degree of the network and the heterogeneity in the degree distribution (Eq. 4) to first and second order, respectively [17,40]. We now provide supplementary results that relate the information entropy of the music networks to their structure.

Understanding the information entropy to first order: average degree
On plotting the information entropy of the music networks as a function of their average degree (Fig. 9B), we see that the differences in the information entropy of the compositional forms to first order arise due to differences in their average degrees. Although we observed in Fig. 9A that the compositional forms are clustered together in their entropy, it is clear that some pieces (such as the chorales, French suites, English suites, and cantatas) are more tightly clustered than the fugues and first set of preludes. These differences can be explained by how much the average degrees vary across pieces. In Fig. 10, we plot the entropy of the music networks as a function of the average network degree, separately for each composition type. Additionally, we also report the standard deviation in the average degree of the pieces for each composition type. Studying these plots, we observe that the English suites, French suites, and chorales (which clustered more tightly in their entropies) have more narrowly distributed average degrees, while the fugues (which are more spread out in their entropy) display more diverse average degrees.
In Fig. 3A of the main text, we observed that the entropy of the real music networks is larger than that of corresponding randomized null networks with the same number of nodes and edges. Since the average degree is the same for the two networks, we hypothesize that the differences arise due to higher in- and out-degree heterogeneity as per Eq. 4. To test our hypothesis, we compare the in- and out-degree heterogeneity of the music networks (calculated using Eq. 5) with their corresponding null networks in Fig. 11. In general, we observe that Bach's music networks are indeed more heterogeneous than expected from the random networks of the same size. This organization allows them to pack more information into their structure.
The heterogeneity in degrees can also explain the differences in entropies observed between pieces that are tightly clustered together in their entropy. As observed earlier, compositions such as the chorales, French suites, English suites, and cantatas have pieces that are clustered together in their average degree and, consequently, in their entropy. We expect that the differences observed among the pieces in each group can be explained by differences in their degree heterogeneity. In Fig. 12 and Fig. 3C, we plot the entropies of the pieces that clustered together as a function of their in- and out-degree heterogeneity, and in general observe that the pieces with higher heterogeneity have a higher information entropy. However, we note that our sample size for most compositional forms is small and hence, we only report the chorales in the main text.

Further analysis of the transitive clustering coefficient
In our analysis of the discrepancies between the actual and perceived information content of note transitions in Bach's musical compositions, we found that these discrepancies were primarily driven by the presence of transitive triangular clusters. These transitive triangular clusters tend to bring the inferred network closer to the actual network, making the network more learnable. As shown in Fig. 13A, the real (unweighted) music networks tend to have a higher transitive clustering coefficient than random networks that preserve the degree of each node, indicating that this is a distinct feature of the music networks that is not merely due to coincidence. The data in Fig. 13A have a striking shape, which we elaborate on and analyze in this section. First, we observe that the chorale pieces tend to have a higher transitive clustering coefficient than expected from networks of the same size and degree distribution. Second, although the preludes have a higher transitive clustering coefficient than other compositional forms, the value is still lower than expected from networks of the same size and degree distribution. Indeed, by examining only the x-axis, we notice that the null networks corresponding to the preludes have a higher transitive clustering coefficient than the null networks corresponding to the chorales. However, by examining the y-axis, we see that the difference between the real chorale and prelude networks is not that pronounced. We hypothesize that these differences might be due to the presence of meso-scale features in the networks, such as core-periphery structure.
a. Core-periphery structure
Core-periphery structure in a network refers to the presence of two components: a tightly connected "core" and a sparsely connected "periphery." The core consists of nodes which are well-connected to each other and to the periphery, while the nodes in the periphery are sparsely connected to one another and to the nodes in the core [93,94]. We hypothesize that the presence of a relatively larger core might explain why the chorales have a higher clustering coefficient than expected given their size and degree. Similarly, a smaller than expected core for the preludes might explain why their clustering coefficient was lower than expected from networks of the same size and degree distribution. Since the core consists of nodes that are well-connected to themselves and the periphery, if there are a larger number of edges occurring within the core and between the core and periphery than between the periphery nodes, it is likely that these edges will form the clusters that we are interested in. We denote the edges between two nodes that belong to the core by core-core (CC), those between nodes that belong to the periphery by periphery-periphery (PP), and those between the nodes in the core and the nodes in the periphery by core-periphery (CP).
To test our hypothesis, we compute the core-periphery structure for each music network using the method described by Borgatti and Everett [94]. We then compute, for each network, the ratio of the combined number of core-core (CC) and core-periphery (CP) edges to the number of periphery-periphery (PP) edges. To interpret this ratio, we compare it to that of the corresponding degree-preserving null networks (Fig. 13B). Strikingly, we observe that the chorales have a higher fraction of edges that are within or emanating from the core than expected from their corresponding null networks. The preludes are at the other end, with a lower fraction of edges that are within or emanating from the core than expected from their corresponding null networks. This pattern of findings suggests that the chorales have a more pronounced core-periphery structure than expected by chance, while the preludes have a less pronounced core-periphery structure than expected. Although the preludes still have a slightly higher transitive clustering coefficient than the other pieces, the differences are not as pronounced as one would expect because of these differences in their core-periphery structure.
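Once a core assignment is available, the edge ratio itself is a simple count. The sketch below assumes the core membership has already been obtained (for example, from a Borgatti-Everett style partition, which is not implemented here); the function name `core_edge_ratio` and the `core` argument are hypothetical.

```python
def core_edge_ratio(edges, core):
    """Return (CC + CP) / PP for a directed edge list.

    `core` is the set of nodes assigned to the core; every other node is
    treated as periphery. Returns infinity when there are no PP edges.
    """
    cc = cp = pp = 0
    for u, v in edges:
        n_core_endpoints = (u in core) + (v in core)
        if n_core_endpoints == 2:
            cc += 1  # both endpoints in the core
        elif n_core_endpoints == 1:
            cp += 1  # one endpoint in the core, one in the periphery
        else:
            pp += 1  # both endpoints in the periphery
    if pp == 0:
        return float("inf")
    return (cc + cp) / pp


# Toy example: nodes 0 and 1 form the core, nodes 2 and 3 the periphery.
toy_edges = [(0, 1), (1, 0), (0, 2), (3, 1), (2, 3)]
toy_ratio = core_edge_ratio(toy_edges, core={0, 1})
```

In the toy example there are two CC edges, two CP edges, and one PP edge. Applying the same function to each degree-preserving null network and averaging gives the reference value against which the real ratio is compared.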
By performing this additional analysis, we provide an example of how the music networks display interesting mesoscale structures that differ from one compositional form to another, resulting in differences in how their network structure is perceived.

FIG. 13. Core-periphery analysis of the music networks. (A) The transitive clustering coefficient of the real music networks compared to null networks that preserve the in- and out-degree of each node. For the degree-preserving null networks, we report the average over 100 independent realizations, with error bars denoting the standard error of the sample. (B) The ratio of the number of core-core (CC) and core-periphery (CP) edges to the number of periphery-periphery (PP) edges in the real music networks compared to degree-preserving null networks. For the degree-preserving null networks, we report the average value computed over 100 independent random graphs. In both panels, the dotted line indicates the line y = x. Colors and markers indicate the type of piece, as shown in the legend.

FIG. 3. Quantifying the information of Bach's music using the entropy of random walks on networks of note transitions. (A) Entropy of Bach's music networks ($S_{\text{real}}$) compared with random networks of the same size ($S_{\text{rand}}$). We report the entropy of the corresponding random networks after averaging over 100 independent realizations. The error bars for $S_{\text{rand}}$ indicate the standard error of the sample. (B) The entropy of Bach's music networks ($S_{\text{real}}$) compared with random networks that preserve the in- and out-degree of each node ($S_{\text{deg}}$). We report the entropy of the corresponding degree-preserving random networks after averaging over 100 independent realizations. The error bars for $S_{\text{deg}}$ indicate the standard error of the sample. (C) The entropy of the chorales as a function of the average in-degree heterogeneity $H_{\text{in}} = \mathrm{Var}(k_{\text{in}})/\langle k_{\text{in}} \rangle$ (top) and out-degree heterogeneity $H_{\text{out}} = \mathrm{Var}(k_{\text{out}})/\langle k_{\text{out}} \rangle$ (bottom) of the networks. In panels (A) and (B), each data point represents a single piece. Color and marker indicate the type of piece, as shown in the legend. The dashed line represents the line y = x. In panel (C), the dotted line indicates the best linear fit, and the reported $r_s$ value is the Spearman correlation coefficient.

FIG. 4. How humans process networks of information. (a) A key aspect of human communication involves receiving and assimilating information in the form of interconnected stimuli. Humans assimilate patterns of information presented to them through imperfect perceptual systems, which results in slightly inaccurate internal models of the underlying transition structure. (b) When forming internal network models of the world, humans strike a balance between accuracy and complexity. The parameter η quantifies this trade-off between accuracy and cost. In panel (i), we see the example network built when solely maximizing the accuracy (η → 0), which forms a perfect representation of reality. However, building this network requires perfect memory and is computationally expensive. In panel (iii), we see the network built when solely minimizing the computational cost (η → 1), in which all nodes are connected to all other nodes, unlike the original network. Constructing this network does not require significant cost, but it provides no accuracy in representing the original information. Humans tend to display intermediate values of η = 0.80 [17], thereby constructing networks that preserve some but not all of the true transition structure, as shown in panel (ii). Figure adapted with permission from Ref. [35].

FIG. 5. Quantifying the difference between the actual information and the perceived information in Bach's music networks by calculating the KL-divergence between the actual and perceived network. (A) KL-divergence of the real music networks ($D^{\text{real}}_{\text{KL}}$) compared with random networks of the same size ($D^{\text{rand}}_{\text{KL}}$). We report the KL-divergence of the corresponding random networks after averaging over 100 independent realizations. The error bars for $D^{\text{rand}}_{\text{KL}}$ indicate the standard error of the sample. (B) KL-divergence of the real music networks ($D^{\text{real}}_{\text{KL}}$) compared with random networks that preserve the in- and out-degree of each node ($D^{\text{deg}}_{\text{KL}}$). We report the KL-divergence of the corresponding degree-preserving random networks after averaging over 100 independent realizations. The error bars for $D^{\text{deg}}_{\text{KL}}$ indicate the standard error of the sample. (C) KL-divergence of the real music networks as a function of the transitive clustering coefficient of the network, $C = \left\langle \Delta^{T}_{i} / \left[ k^{\text{tot}}_{i} \left( k^{\text{tot}}_{i} - 1 \right) \right] \right\rangle$. (D) The transitive clustering coefficient of the real music networks compared with random networks that preserve the in- and out-degree of each node. For the degree-preserving random networks, we report the transitive clustering coefficient after averaging over 100 independent realizations, with error bars denoting the standard error of the sample. In all the panels, each data point represents a single piece. Color and marker indicate the type of piece, as shown in the legend. The dotted line in panels (A), (B), and (D) represents the line y = x.

FIG. 6. Accounting for the frequencies of the note transitions in our analysis. (A) Entropy of the weighted versions of Bach's music networks ($S_{\text{weighted}}$) compared with the corresponding unweighted versions ($S_{\text{unweighted}}$). (B) The KL-divergence of the weighted versions of Bach's music networks ($D^{\text{real,w}}_{\text{KL}}$) compared with the corresponding unweighted versions ($D^{\text{real}}_{\text{KL}}$). (C) Top: Entropy of the weighted note transition networks ($S_{\text{real,w}}$) compared with degree-preserving edge-rewired null networks ($S_{\text{deg,w}}$). Bottom: The KL-divergence of the weighted note transition networks ($D^{\text{real,w}}_{\text{KL}}$) compared with degree-preserving edge-rewired null networks ($D^{\text{deg,w}}_{\text{KL}}$). In all panels, each data point represents a single piece. Color and marker indicate the type of piece, as shown in the legend. The dashed line represents the line y = x. In the top plot of panel (C), we report the average deviation of the data points from the line y = x.

FIG. 8. The 8 different possible triangles with node i as a vertex in a directed graph. The triangles that represent transitive relationships are marked with the letter 'T'.

FIG. 9. The entropy of Bach's music networks and its relation to the average degree of the network. (A) The entropy of Bach's music networks ($S_{\text{real}}$) indexed by the pieces. (B) The entropy of Bach's music networks ($S_{\text{real}}$) as a function of the average degree of the network $\langle k \rangle$. Each data point in panels (A) and (B) represents a single piece. Colors and markers indicate the type of piece, as shown in the legend.

FIG. 10. The relation between the information entropy and the average degree of the music networks, plotted separately for each compositional form. The entropy of Bach's music networks ($S_{\text{real}}$) is plotted against the average degree of the network $\langle k \rangle$. Each data point represents a single piece. Colors and markers indicate the type of piece, as shown in the legend.