Low Algorithmic Complexity Entropy-deceiving Graphs

In estimating the complexity of objects, in particular of graphs, it is common practice to rely on graph- and information-theoretic measures. Here, using integer sequences with properties such as Borel normality, we explain how these measures are not independent of the way in which an object, such as a graph, can be described or observed. From observations that can reconstruct the same graph and are therefore essentially translations of the same description, we will see that when applying a computable measure such as Shannon Entropy, not only is it necessary to pre-select a feature of interest where there is one, and to make an arbitrary selection where there is not, but also that more general properties, such as the causal likelihood of a graph as a measure (opposed to randomness), can be largely misrepresented by computable measures such as Entropy and Entropy rate. We introduce recursive and non-recursive (uncomputable) graphs and graph constructions based on these integer sequences, whose different lossless descriptions have disparate Entropy values, thereby enabling the study and exploration of a measure's range of applications and demonstrating the weaknesses of computable measures of complexity.


I. THE USE OF SHANNON ENTROPY IN NETWORK PROFILING
One of the main objectives behind the application of Shannon Entropy is the characterization of the randomness or 'information content' of an object such as a graph. Here we introduce graphs with interesting deceptive properties, particularly disparate Entropy (rate) values for the same object when looked at from different perspectives, revealing the inadequacy of classical information-theoretic approaches to graph complexity.
Central to information theory is the concept of Shannon's information Entropy, which quantifies the average number of bits needed to store or communicate the statistical description of an object.
Consider an ensemble $X(R, p(x_i))$, where $R$ is the set of possible outcomes (the random variable), $n = |R|$, and $p(x_i)$ is the probability of an outcome in $R$. The Shannon Entropy of $X$ is then given by:
Definition II.6.
$$H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)$$
This implies that to calculate $H(X)$ one has to know or assume the probability mass distribution of the ensemble $X$. One caveat regarding Shannon's Entropy is that one is forced to make an arbitrary choice regarding granularity. Take for example the bit string 01010101010101. The Shannon Entropy of the string at the level of single bits is maximal, as there are the same number of 1s and 0s, but the string is clearly regular when 2-bit (non-overlapping) blocks are taken as basic units, in which instance the string has minimal complexity because it contains only 1 symbol (01) from among 4 possible ones (00, 01, 10, 11).
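The granularity caveat can be checked numerically. The following is a minimal sketch, assuming non-overlapping blocks and empirical frequencies as the probability estimates; the function name `block_entropy` is ours and purely illustrative.

```python
from collections import Counter
from math import log2

def block_entropy(s, block_size):
    """Shannon Entropy (bits per block) of the non-overlapping blocks of s."""
    usable = len(s) - len(s) % block_size          # drop an incomplete tail block
    blocks = [s[i:i + block_size] for i in range(0, usable, block_size)]
    counts = Counter(blocks)
    total = len(blocks)
    # sum of p * log2(1/p), with p = c / total estimated from frequencies
    return sum((c / total) * log2(total / c) for c in counts.values())

s = "01010101010101"
print(block_entropy(s, 1))  # single bits: 7 zeros and 7 ones
print(block_entropy(s, 2))  # 2-bit blocks: only "01" occurs
```

At block size 1 the estimate is 1 bit (the maximum for a binary alphabet), while at block size 2 it is 0, so the same string looks maximally random or perfectly regular depending only on the chosen granularity.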
A generalization consists in taking into consideration all possible "granularities", that is, the Entropy rate:
Definition II.7. Let $Pr(s_i, s_{i+1}, \ldots, s_{i+L-1}) = Pr(s)$ with $|s| = L$ denote the joint probability over blocks of $L$ consecutive symbols. The Shannon Entropy rate [28] (also known as granular Entropy or n-gram Entropy) of a block of $L$ consecutive symbols, denoted by $H_L(s)$, is:
$$H_L(s) = -\sum_{s_1 \in A} \cdots \sum_{s_L \in A} Pr(s_1, \ldots, s_L) \log_2 Pr(s_1, \ldots, s_L)$$
where $A$ is the alphabet of symbols. Thus to determine the Entropy rate of the sequence, we estimate the limit when $L \to \infty$.
It is not hard to see, however, that $H_L(s)$ will diverge as $L$ tends to infinity if the number of symbols increases, but when applied to a binary string, $H_L(s)$ will reach a minimum at the granularity in which a statistical regularity is revealed.
The Shannon Entropy [28] of an object $s$ is simply $H_L(s)$ for a fixed block size $L = i$, so we can drop the subscript.
We can define the Shannon Entropy of a graph $G$, with respect to $i$, by:
Definition II.8.
$$H(G_i, P) = -\sum_{j=1}^{|G_i|} P(G_i(j)) \log_2 P(G_i(j))$$
where $P$ is a probability distribution of $G_i$, $G_i(j)$ is the $j$-th element of $G_i$, and $i$ is a feature of interest of $G$, e.g. edge density, degree sequence, number of over-represented subgraphs/graphlets (graph motifs), and so on.
When P is the uniform distribution (every graph of the same size is equally likely), it is usually omitted as a parameter of H.
The most common applications of Entropy to graphs are to degree sequence distribution and edge density (adjacency matrix), which are labelled graph invariants. In molecular biology, for example, a common application of Entropy is to count the number of 'branchings' [26] per node by, e.g., randomly traversing a graph starting from a random point. The more extensive the branching, the greater the uncertainty of a graph's path being traversed in a unique fashion, and the higher the Entropy. Thorough surveys of graph Entropy are available in [26,30,44], so we will avoid providing yet another one. In most, if not all, of these applications of Entropy, very little attention is paid to the fact that Entropy can lead to completely disparate results depending on the ways in which the same objects of study are described, that is, to the fact that Entropy is not a graph invariant (either for labelled or unlabelled graphs) vis-à-vis object description, a major drawback for a complexity measure [38,42] of typicality, randomness, and causality. In the survey [26], it is suggested that there is no 'right' definition of Entropy. Here we formally confirm this to be the case in a fundamental sense.
Indeed, Entropy requires a pre-selection of a graph invariant, but it is itself not a graph invariant. This is because ignorance of the probability distribution makes Entropy necessarily dependent on the description of the chosen graph invariant, there being no such thing as an Invariance theorem [6,18,33] for Shannon Entropy to provide a convergence of values independent of description language, as there is in algorithmic information theory for algorithmic (Kolmogorov-Chaitin) complexity.
Definition II.9. The algorithmic complexity of an object $G$ is the length of its shortest computational description (computer program) in a reference language (of which it is independent up to an additive constant), such that the shortest generating computer program fully reconstructs $G$ [6,18,20,33].
For example, the mathematical constant π is believed to be an absolute Borel normal number (Borel normal in every base), and so one can take the digits of π in any base and use n × n digits as the entries of a graph adjacency matrix of size n × n by taking n consecutive segments of n digits of π. The resulting graph will have n nodes and an edge density of 0.5 because the occurrence of 1 or 0 in π in binary has probability 0.5 (the same holds for π in decimal after mapping digit i to 0 if i < 5 and to 1 otherwise, or in general, for any base b, to 0 if i < b/2 and to 1 otherwise), thus complying with the definition of an Erdős-Rényi (E-R) graph (albeit of high density).
As theoretically predicted and numerically demonstrated in Fig. 1(A and B), the degree distribution will approximate a normal distribution around n/2, the expected degree at edge density 0.5. This means that the graph adjacency matrix will have maximal Entropy (if π is Borel normal) but low degree-sequence Entropy, because all values are around n/2 and do not span all the possible node degrees (in particular, low degrees). This means that algorithmically constructing a graph can give rise to an object with a different Entropy when the feature of interest of the said graph is changed.
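The contrast between the two descriptions can be explored with a short sketch. It makes several assumptions that are ours rather than the paper's: the decimal digits of π are produced with Gibbons' unbounded spigot algorithm, a digit is mapped to 1 if it is at least 5, only the upper triangle is filled (so the graph is undirected with no self-loops), and degree-sequence Entropy is taken over the empirical distribution of degree values.

```python
from collections import Counter
from math import log2

def pi_digits():
    """Yield the decimal digits of pi (3, 1, 4, ...) via Gibbons' spigot."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

def entropy(values):
    """Shannon Entropy (bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return sum((c / total) * log2(total / c) for c in counts.values())

def pi_graph(n):
    """Symmetric 0/1 adjacency matrix with zero diagonal from thresholded digits."""
    digits = pi_digits()
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            adj[i][j] = adj[j][i] = 1 if next(digits) >= 5 else 0
    return adj

n = 40
adj = pi_graph(n)
upper = [adj[i][j] for i in range(n) for j in range(i + 1, n)]
degrees = [sum(row) for row in adj]
h_adj = entropy(upper)                    # at most 1 bit per entry
h_deg = entropy(degrees) / log2(n)        # normalized by the maximum log2(n)
print(h_adj, h_deg, sum(degrees) / n)
```

Under these assumptions the adjacency-matrix Entropy sits close to its 1-bit maximum, while the degree values cluster near n/2 and cannot all be distinct, so the normalized degree-sequence Entropy stays strictly below 1.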
Fig. 1. Graphs based on the digits of π in base 2 (A) and in base 10 (B), undirected and with no self-loops. C: A graph based on the 64 calculated bits of a partially computable Ω Chaitin number [5]. It appears to have some structure but any regularity will eventually vanish as it is a Martin-Löf algorithmic random number [23].

A graph does not have to be of low algorithmic complexity to yield incompatible observer-dependent Entropy values. One can take the digits of an Ω Chaitin number (the halting probabilities of optimal Turing machines with prefix-free domains), some of the digits of which are uncomputable. In Fig. 1(C) we show a graph based on the first 64 digits of an Ω Chaitin number [5], thus a highest-algorithmic-complexity graph in the long run (it is ultimately uncomputable). Since randomness implies normality [23], the adjacency matrix has maximal Entropy, but for the same reasons as obtain in the case of the π graphs, it will have low degree-sequence Entropy. For algorithmic complexity, in contrast, as we will see in Theorem III.6, all graphs have the same algorithmic complexity regardless of their (lossless) descriptions (e.g. adjacency matrix or degree sequence), as long as the same and only the same graph (up to an isomorphism) can be reconstructed from their descriptions. One can also start from completely different graphs. For example, Fig. 2 shows how Shannon Entropy is applied directly to the adjacency matrix as a function of edge density, with the same Entropy values retrieved despite their very different (dis)organization.
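That adjacency-matrix Entropy depends on edge density alone can be illustrated with two small graphs. This is a minimal sketch with hypothetical example graphs of our choosing (a ring and a star with one extra edge), not the graphs of Fig. 2.

```python
from math import log2

def adjacency_entropy(adj):
    """Entropy (bits per entry) of the 0/1 entries above the diagonal."""
    n = len(adj)
    bits = [adj[i][j] for i in range(n) for j in range(i + 1, n)]
    p = sum(bits) / len(bits)                      # edge density
    return sum(x * log2(1 / x) for x in (p, 1 - p) if x > 0)

def from_edges(n, edges):
    """Build a symmetric adjacency matrix from a list of undirected edges."""
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    return adj

n = 8
ring = from_edges(n, [(i, (i + 1) % n) for i in range(n)])          # 8 edges
star_plus = from_edges(n, [(0, i) for i in range(1, n)] + [(1, 2)])  # 8 edges

print(adjacency_entropy(ring), adjacency_entropy(star_plus))
print(sorted(map(sum, ring)), sorted(map(sum, star_plus)))
```

Both graphs have 8 of 28 possible edges and hence identical adjacency Entropy, although one is perfectly regular (every degree is 2) and the other is dominated by a hub.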
The Entropy rate will be low for the regular antelope graph, and higher, but still far removed from randomness, for the E-R graph, because by definition the degree-sequence variation of an E-R graph is small. However, in scale-free graphs the degree distribution is artificially scaled, spanning a large number of different degrees as a function of the number of connected edges per added node, and resulting in an over-estimation of their degree-sequence Entropy, as can be numerically verified in Fig. 3. Degree-sequence Entropy points in the opposite direction to the entropic estimation of the same graphs arrived at by looking at their adjacency matrices, when in reality, scale-free networks produced by, e.g., Barabási-Albert's preferential attachment algorithm [1], are recursive (algorithmic and deterministic, even if probabilities are involved), as opposed to the E-R construction built (pseudo-)randomly. The Entropy of the degree sequence of scale-free graphs would suggest that they are almost as random as, or even more random than, E-R graphs for exactly the same edge densities. To circumvent this, ad hoc measures of modularity have been introduced [31], to precisely capture how removed a graph is from 'scale-freeness' by comparing any graph to a scale-free randomized version of itself, thereby compelling consideration of a pre-selected feature of interest ('scale-freeness').
Furthermore, an E-R graph can be recursively (algorithmically) generated or not, and so its Shannon Entropy has no connection to the causal, algorithmic information content of the graph, and can only provide clues for low Entropy graphs that can be characterized by other graph-theoretic properties, without need of an entropic characterization. One can always update the ensemble distribution to accommodate special cases, but only after gaining knowledge by other methods.

Fig. 3. Degree-sequence Entropy misrepresents the generative quality of each group of graphs, suggesting that B-A graphs are as random as or more random than E-R graphs, despite their recursive (causal/algorithmic and deterministic) nature, whereas in fact this should make B-A networks less random than E-R graphs. Here the E-R graphs have exactly the same edge density as the B-A graphs for 4 and 5 preferentially attached edges per node. This plot illustrates how, for all purposes, Entropy can be easily fooled and cannot tell apart higher causal content from apparent randomness.

B. A low complexity and high Entropy graph
We introduce a method to build a family of recursive graphs with maximal Entropy but low algorithmic complexity, hence graphs that appear statistically random but are, however, of low algorithmic randomness and thus causally (recursively) generated. Moreover, these graphs may have maximal Entropy for some lossless descriptions but minimal Entropy for other lossless descriptions of exactly the same objects, with both descriptions characterizing the same object and only that object, thereby demonstrating how Entropy fails at unequivocally and unambiguously characterizing a graph independent of a particular feature of interest. We denote by 'ZK' the graph (unequivocally) constructed as follows:
1. Let 1 → 2 be a starting graph G connecting a node with label 1 to a node with label 2. If a node with label n has degree n, we call it a core node; otherwise, we call it a supportive node.
2. Iteratively add a node n+1 to G such that the number of core nodes in G is maximized.
The resulting graph is typified by the one in Fig. 4.
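The construction rule above can be read in more than one way. The sketch below follows one greedy reading that is entirely our assumption: labels are assigned in creation order, the graph is treated as undirected, and node k is brought up to degree k by first linking it to existing higher-labelled (supportive) nodes and then adding fresh degree-one nodes. It is not the authors' Wolfram Language program and may not reproduce the graph of Fig. 4 node for node; the name `build_zk_like` is hypothetical.

```python
def build_zk_like(steps):
    """Greedy sketch: make node k a core node (degree k) for k = 1..steps."""
    adj = {1: {2}, 2: {1}}                 # starting graph 1 -> 2
    for k in range(2, steps + 1):
        # Reuse existing supportive nodes (labels above k) not yet linked to k.
        for v in sorted(adj):
            if len(adj[k]) == k:
                break
            if v > k and v not in adj[k]:
                adj[k].add(v)
                adj[v].add(k)
        # Add new degree-one supportive nodes until node k has degree k.
        while len(adj[k]) < k:
            new = max(adj) + 1
            adj[new] = {k}
            adj[k].add(new)
    return adj

adj = build_zk_like(12)
core_degrees = [len(adj[k]) for k in range(1, 13)]
print(core_degrees)
print("".join(str(d) for d in core_degrees))
```

Under this reading a node never gains edges after it becomes a core node, so the degrees of nodes 1 to k are exactly 1 to k, and concatenating them gives the leading digits of the Champernowne sequence.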

C. Properties of the ZK graph
The degree sequence $d$ of the labelled nodes, $d = 1, 2, \ldots, n$, is the Champernowne constant [7] $C$ in base 10, a transcendental real whose decimal expansion is Borel normal [4], constructed by concatenating representations of successive integers:
$$C_{10} = 0.1234567891011121314151617181920\ldots$$
whose digits are the labelled node degrees of $G$ for $n = 20$ iterations (sequence A033307 in the OEIS).
The sequence of edges is a recurrence relation built upon previous iteration values between core and supportive nodes, defined by: [...] where r = (1 + sqrt(5) [...]
The notation $ZK_n^m$ is used where we want to emphasize the number of generation or time steps in the process of constructing $ZK_n$. The symbol $\Delta(ZK)$ denotes the maximum degree of the graph. Nodes in the ZK graph belong to 2 types: core and supportive nodes.
Definition III.3. Node $x$ is a core node iff $\exists\, m \in \{1, \ldots, n-3\}$ such that $x \in \Delta(ZK_n^m)$. Otherwise it is a supportive node.
Theorem III.1. To convert $ZK_{r-1}$ to $ZK_r$, we need to add 2 supportive nodes to $ZK_{r-1}$ if $r$ is odd, or one supportive node if $r$ is even.
Proof. By induction: The basis: $ZK_3$ has 3 core nodes, denoted by $c_3$, and 2 supportive nodes, denoted by $s_3$. As described in the construction procedure, to convert $ZK_3$ to $ZK_4$, we choose a supportive node with maximum degree. Here, since we have only $s_3$ nodes, their degree is one. So we need to connect it to 3 other supportive nodes. As we have only one left, we need to add 2 supportive nodes. Now, $ZK_4$ has 3 supportive nodes, 2 of them new, $s_4$, and one old, $s_3$.
The old one is of degree 2, and we need to convert it to 5; we have 2 other supportive nodes left, so we need a new supportive node $s_4$. Therefore, the assumption is true for $ZK_3$ and $ZK_4$ (the basis).
Inductive step: We show that if the assumption is true for $ZK_{n-1}$, then it is true for $ZK_n$.
We consider 2 cases:
1. $n-1$ is odd
2. $n-1$ is even
Case one: If $n-1$ is odd then $n-2$ is even, which means we have added one supportive node with degree one, and to convert $ZK_{n-1}$ to $ZK_n$ we need to have a core node with degree $n$. The maximum degree of a supportive node is $n-3$, and we have only one supportive node which is not connected to the core candidate node, which implies that the degree of the core candidate node will be $n-2$, and we would need to add 2 extra supportive nodes to our graph.
Case two: If $n-1$ is even then $n-2$ is odd, and therefore $ZK_{n-1}$ has 2 supportive nodes with degree one (they have only been connected to the last core nodes). So we would need to add only one node to convert the supportive node with maximum degree to a core node with degree $n$.

[Figure caption fragment] [...] depicted here) that are based on Entropy rate (word repetition). While useful for quantifying specific features of the graph that may appear interesting, no graph-theoretic or entropic measure can account for the low (algorithmic) randomness and therefore (high) causal content of the network.

[...] technique, other than compression, that uses the concept of algorithmic probability to approximate algorithmic complexity [39][40][41]. This means that randomness characterizations by algorithmic complexity are robust, as they are independent of object description, and are therefore, in an essential way, parameter-free, meaning that there is no need for pre-selection or arbitrary selection of features of interest for proper graph profiling.

if n is odd then |V
Theorem III.2. $\forall r \in \{1, \ldots, n\}$ there is a maximum of 3 nodes with degree $r$ in $ZK_n$.

Proof. By induction:
The basis: The assumption is true for $ZK_3$.
Inductive step: Assume that in $ZK_{n-1}$, $\forall r \in \{1, \ldots, n-1\}$, there is a maximum of 3 nodes with degree $r$. We show that then, $\forall r \in \{1, \ldots, n\}$, there is a maximum of 3 nodes with degree $r$ in $ZK_n$.
The proof is direct using Theorem III.1. To generate $ZK_n$, we add a maximum of 2 supportive nodes. These nodes have degree one and there is no other node with degree one except the first core node (the core node with degree 1). Thus we have a maximum of 3 nodes with degree one. The degree of each of the other supportive nodes will be increased by one, and by the induction hypothesis no such degree has been repeated more than 3 times.
Theorem III.3. ZK is of maximal degree-sequence Entropy.
Proof. The degree sequence of the ZK graph can be divided into 2 parts:
1. A dominating degree subsequence associated with the core nodes (always longer than subsequence 2 of supportive nodes), generated by the infinite series that defines the Champernowne constant:
$$C_{10} = \sum_{n=1}^{\infty} \sum_{k=10^{n-1}}^{10^{n}-1} \frac{k}{10^{\,n(k-10^{n-1}+1)+9\sum_{l=1}^{n-1} 10^{l-1} l}}$$
2. A second degree sequence associated with the supportive nodes, whose digits do not repeat more than 3 times, and which therefore, by Theorem III.2, has a maximal n-order Entropy rate for n > 2 and a high Entropy rate for n < 2.
Therefore, the degree sequence of ZK is asymptotically of maximal Entropy rate.
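The asymptotic claim can be probed on a finite prefix. The sketch below is illustrative only: it measures the single-digit Entropy of the first 10,000 digits of the Champernowne sequence (our choice of prefix length), and a finite prefix approaches the maximum of $\log_2 10$ bits only slowly because small leading digits are over-represented.

```python
from collections import Counter
from itertools import count, islice
from math import log2

def champernowne_digits():
    """Yield the digits of 1, 2, 3, ... concatenated in base 10."""
    for k in count(1):
        yield from str(k)

def entropy(values):
    """Shannon Entropy (bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return sum((c / total) * log2(total / c) for c in counts.values())

prefix = list(islice(champernowne_digits(), 10_000))
print("".join(prefix[:20]))
print(entropy(prefix), log2(10))   # estimate versus the maximum for 10 symbols
```

The estimate is already high for this prefix, while remaining below $\log_2 10 \approx 3.32$ bits; longer prefixes or larger block sizes can be explored with the same functions.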
Proof. By demonstration: The computer program generating the ZK graph, written in the Wolfram Language, is: [...] starting from the graph defined by 1 → 2 as the initial condition.
The length in bytes of NestList with AddEdges and the initial condition is an upper bound on the algorithmic complexity of ZK, which grows by only $\log_{10} i$ and is therefore of low algorithmic randomness.
We now show that we can fully reconstruct ZK from the degree sequence. As we know that we can also reconstruct ZK from its adjacency matrix (denoted by Adj(ZK)), we therefore have it that both are lossless descriptions from which ZK can be fully reconstructed and for which Entropy provides contradictory values depending on the feature of interest.
Theorem III.5. $\forall n \in \mathbb{N}$, all instances of $ZK_n$ are isomorphic.
Proof. The only degree of freedom in the graph reconstruction is the selection of a supportive node to convert to a core node when there are several supportive nodes of maximal degree.
As has been proven in Theorem III.1, the number of nodes added to the graph is independent of the supportive node selected for conversion to a core node. Any two instances of the graph therefore have the same number of nodes and edges, and by mapping the node selected at each step in one instance $ZK_n$ to the node selected at the corresponding step in another instance $ZK'_n$ we get $f : V(ZK_n) \to V(ZK'_n)$, such that $f$ is a bijection (both one-to-one and onto).
Finally, we prove that all isomorphic graphs have about the same (e.g. low) algorithmic complexity:
Theorem III.6. Let $G'$ be an isomorphic graph of $G$. Then $|K(G') - K(G)| \le c$, where $c$ is a constant independent of $G$.
Proof. The idea is that if there is a significantly shorter program $p'$ for generating $G$ compared to a program $p$ generating $Aut(G)$, we can use $p'$ to generate $Aut(G)$ via $G$ and a relatively short program $c$ that tries, e.g., all permutations, and checks for isomorphism.
Let's assume that there exists a program $p'$ such that $||p'| - |p|| > c$, i.e. the difference is not bounded by any constant, and that $K(G) = |p'|$. We can replace $p$ by $p' + c$ to generate $Aut(G)$ such that $K(Aut(G)) \le |p'| + c$, where $c$ is a constant independent of $G$ that represents the size of the shortest program that generates $Aut(G)$, given any $G$. Then we have it that $|K(Aut(G)) - K(G)| \le c$, which is contrary to the assumption.
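The "relatively short program that tries all permutations and checks for isomorphism" can be made concrete. The following brute-force sketch is only meant to show that such a program has a fixed size independent of the input graph; it is exponential in the number of nodes, and the function name and the convention of labelling nodes 0 to n-1 are our assumptions.

```python
from itertools import permutations

def find_isomorphism(n, edges_g, edges_h):
    """Return a relabelling (tuple) mapping G onto H, or None if none exists.

    Nodes are 0..n-1 and edges are undirected pairs.
    """
    g = {frozenset(e) for e in edges_g}
    h = {frozenset(e) for e in edges_h}
    if len(g) != len(h):
        return None
    for perm in permutations(range(n)):
        # Relabel every edge of G with perm and compare with the edges of H.
        if {frozenset(perm[v] for v in e) for e in g} == h:
            return perm
    return None

path = [(0, 1), (1, 2), (2, 3)]
relabelled_path = [(2, 0), (0, 3), (3, 1)]
star = [(0, 1), (0, 2), (0, 3)]
print(find_isomorphism(4, path, relabelled_path))
print(find_isomorphism(4, path, star))
```

The length of this checker does not grow with the graph, which is the role the constant $c$ plays in the argument; only its running time does.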
The number of Borel-normal numbers that can be used as the degree sequence of a graph is determined by the necessary and sufficient conditions in [16,17] and is countably infinite.

D. Degree-sequence targeted Entropy-deceiving graph construction
Taking advantage of the correlation between 2 variables X 1 , X 2 (starting independently) with the same probability distribution, let M be a 2 × 2 matrix with rows normalized to 1.

Consider the random variables
$$\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} = M \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$$
The correlation between Y 1 and Y 2 is just the inner product between the two rows of M . This can be used to generate a degree distribution of a graph with any particular Entropy, provided the resulting degree sequence complies with or is completed according to the necessary and sufficient conditions for building a graph [16,17].
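The inner-product statement can be verified in a few lines. The derivation below is a sketch under assumptions that we make explicit: $Y = MX$ with $M = (m_{ij})$, the variables $X_1, X_2$ are uncorrelated with common variance $\sigma^2$, and "rows normalized to 1" is read as unit Euclidean norm, $m_{i1}^2 + m_{i2}^2 = 1$.

```latex
\begin{align*}
\operatorname{Cov}(Y_1, Y_2)
  &= \operatorname{Cov}(m_{11}X_1 + m_{12}X_2,\; m_{21}X_1 + m_{22}X_2) \\
  &= m_{11}m_{21}\operatorname{Var}(X_1) + m_{12}m_{22}\operatorname{Var}(X_2)
   = \sigma^2\,(m_{11}m_{21} + m_{12}m_{22}), \\
\operatorname{Var}(Y_i)
  &= \sigma^2\,(m_{i1}^2 + m_{i2}^2) = \sigma^2, \qquad i = 1, 2, \\
\operatorname{Corr}(Y_1, Y_2)
  &= \frac{\operatorname{Cov}(Y_1, Y_2)}{\sqrt{\operatorname{Var}(Y_1)\operatorname{Var}(Y_2)}}
   = m_{11}m_{21} + m_{12}m_{22}.
\end{align*}
```

The cross terms vanish because $X_1$ and $X_2$ are uncorrelated, so choosing the angle between the two rows of $M$ fixes the correlation of $Y_1$ and $Y_2$ directly.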

IV. GRAPH ENTROPY VERSUS GRAPH ALGORITHMIC COMPLEXITY
The ensemble of the graphs compatible with the ZK graph for the Entropy of its degree distribution thus consists of the set of networks that have near-maximal degree-sequence Entropy, as the sequence distribution is uninformative (nearly every degree appears only once) and thus does not reduce statistical uncertainty, despite the algorithmic nature of the ZK graph (and assuming one does not know that the graph is deterministically generated, a reasonable assumption of ignorance characteristic of the general observer in a typical, realistic case).
The size of the ensemble is thereby close to $|d|!$, the number of permutations of the elements of the degree distribution $d$ of the ZK graph, constrained by the number of sequences that [...]

A sound characterization of a complexity measure can thus be established as a function that captures strictly more information about (any) $S$ than any (computable) function. All computable functions are thus not good candidates for universal measures of complexity, as they can be replaced by a measure as a function of the property (or combination of properties) of interest and nothing else.

C. Dependence on assumed distributions
An argument against the claim that Entropy yields contradictory values when used to profile randomness (even statistical randomness) is that one can change the domain of the Entropy measure in such a way as to make Entropy consistent with any possible description of a graph. For example, because we have proven that the ZK algorithm is deterministic and can only produce a single ZK graph, it follows that there is no uncertainty in the production of the object, there being only one graph for the formula. In this way, building a distribution of all formulae generating the ZK graph will always lead to Shannon Entropy H(ZK) = 0 for the 'right' description using the 'right' ensemble containing only the ZK formula(e).
According to the same argument the digits of the mathematical constant π (to mention only the most trivial example) would have Shannon Entropy H(π) = 0, because the digits are produced deterministically and the 'right' ensemble for π should be that containing only formulae deterministically generating the digits π.
Directly changing the ensemble on which Entropy operates for a specific object only facilitates conformity to some arbitrary Entropy value dictated by an arbitrary expectation, e.g. that H(π n ) = 0 for any initial segment of π of length n (entailing an Entropy rate of 0 as well) because π is deterministic and therefore no digit is surprising at all, or alternatively, lim n→∞ H(π n ) = ∞ if Shannon Entropy is supposed to measure statistical randomness.
Moreover, this misbehaviour has to do not with a lack of knowledge but with the lack of an invariance theorem, because π is deterministically generated and hence its digits do not fundamentally reduce uncertainty. But if one assumes that the digits of π are not stochastic in order to assign it a Shannon Entropy equal to zero, then one is forced to concede that even perfect statistical randomness, produced by a supposedly Borel-normal number, has, in objective terms, a Shannon Entropy (and Entropy rate) equal to zero, but the highest Shannon Entropy (and Entropy rate) from an observer perspective (as it will never be certain that the streaming digits are truly π). In other words, the asymptotic behaviour after taking into consideration the digits of π approximates maximum Shannon Entropy, but π itself has a Shannon Entropy of zero.
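The dependence on the assumed ensemble can be made tangible with a toy computation. This is a minimal sketch using a 3-digit prefix of π and two ensembles of our choosing: one containing only that prefix, and one in which all 3-digit strings are equally likely.

```python
from math import log2

def shannon_entropy(distribution):
    """Shannon Entropy (bits) of a dict mapping outcomes to probabilities."""
    return sum(p * log2(1 / p) for p in distribution.values() if p > 0)

prefix = "314"

# Ensemble 1: the object is known to be deterministic, so it is the only outcome.
singleton = {prefix: 1.0}

# Ensemble 2: every string of 3 decimal digits is considered equally likely.
uniform = {f"{k:03d}": 1 / 1000 for k in range(1000)}

print(shannon_entropy(singleton))   # no uncertainty under this ensemble
print(shannon_entropy(uniform))     # about 3 * log2(10) bits under this one
```

The object is the same in both cases; only the ensemble changes, and with it the value assigned by Shannon Entropy, from zero to the maximum for strings of that length.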

D. An algorithmic Maximum Entropy Model
Following the statistical mechanics approach [2], a typical recursively generated graph such as the ZK graph would, based on its degree sequence, be characterized as being typically random from the observer perspective, because Shannon Entropy will find the graph to be statistically random and thus just as random as any member of the set of all graphs with (near) maximal degree-sequence Entropy. This gives no indication of the actual recursive nature of the ZK graph and misleads the observer.
In contrast, the type of approach introduced in [41], based upon trying to find clues to the recursive nature of an object such as a graph, would asymptotically find the causal nature of a recursively-generating object such as the ZK graph, independent of probability distributions, even if it is more difficult to estimate.
Rectifying the approaches based on models of maximum entropy involves updating and replacing the assumption of the maximum entropy ensemble. An example illustrating how to achieve this in the context of, e.g., a Bayesian approach, has been provided in [45] and consists in replacing the uninformative prior by the uninformative algorithmic probability distribution, the so-called Universal Distribution, as introduced by Levin [20]. The general approach has already delivered some important results [46] by, e.g., quantifying the degree of human cognitive randomness that previous statistical approaches and measures such as Entropy made it impossible to quantify. Animated videos have been made available explaining applications to graph complexity (https://youtu.be/E238zKsPCgk) and to cognition in the context of random generation tasks (https://youtu.be/E-YjBE5qm7c). A tool has also been placed online (http://complexitycalculator.com/) for sequences and arrays, and thus the reader can experiment with an actual numerical tool and explore the differences between the statistical and the algorithmic approaches.

V. CONCLUSIONS
The methods introduced here allow the construction of 'Borel-normal pseudo-random graphs', uncomputable number-based graphs and algorithmically produced graphs, while illustrating the shortcomings of computable graph-theoretic and Entropy approaches to graph complexity beyond random feature selection, and their failure when it comes to profiling randomness and hence causal-content (as opposed to randomness).
We have shown that Entropy is highly observer-dependent even in the face of full accuracy and access to lossless object descriptions and thus has to be complemented by measures of algorithmic content. We have produced specific complexity-deceiving graphs for which Entropy retrieves disparate values when an object is described differently (thus with different underlying distributions), even when the descriptions reconstruct exactly the same, and only the same, object. This drawback of Shannon Entropy, ultimately related to its dependence on distribution, is all the more serious because it is easily overlooked in the case of objects other than strings, for instance, graphs. For an object such as a graph, we have shown that changing the descriptions may not only change the values but actually produce divergent, contradictory values.
We constructed a graph ZK about which the following is true when it is described by its adjacency matrix $Adj(ZK)$: $\lim_{n \to \infty} H(Adj(ZK_n)) = 0$ for growing graph size $n$. Contradictorily, considering the degree sequence of the same ZK graph, we found that $\lim_{n \to \infty} H(Seq(ZK_n)) = \infty$ for the same growth rate $n$, even though both $Adj(G_n)$ and $Seq(G_n)$ are lossless descriptions of the same graph that construct exactly the same ZK graph, and only a ZK graph.
This means that not only does one need to choose a description of interest in order to apply a definition of Entropy, such as the adjacency matrix of a network (or its incidence or Laplacian) or its degree sequence, but that as soon as the choice is made, Entropy becomes a trivial counting function of the specific feature of interest, and of that feature alone. In the case of, for example, the adjacency matrix of a network (or any related matrix associated with the graph, such as the incidence or Laplacian matrices), Entropy becomes a function of edge density, while for degree sequence, Entropy becomes a function of sequence normality.
Entropy can thus trivially be replaced by such functions without any loss, but it cannot be used to profile the object (randomness, or information content) in any way independent of an arbitrary feature of interest.
These results and observations have far-reaching consequences. For example, recent literature appears contradictory, by turns suggesting that cancer cells display an increase in Entropy [34], and also reporting that cancer cells display a decrease in Entropy [37], in both cases applied to a function of degree distribution over networks of molecular interactions.
Cells are also believed to be in a state of criticality between evolvability and robustness [9,35] that may make them look random though they are not. This means that Entropy may be overestimating randomness in the best case or misleading in the worst case, as we have found in the instance of disparate values for the same objects, thus suggesting that additional safeguards are needed to achieve consistency and soundness.
New developments [39,41] promise more robust complementary measures of (graph) complexity less dependent on object description, measures based upon the mathematical theory of randomness and algorithmic probability which are better equipped to profile causality and algorithmic information content and cover statistical randomness and thus can be considered an observer-improved generalization of Shannon Entropy.