Leaf-to-leaf distances and their moments in finite and infinite m-ary tree graphs

We study the leaf-to-leaf distances on full and complete m-ary graphs using a recursive approach. In our formulation, leaves are ordered along a line. We find explicit analytical formulae for the sum of all paths for arbitrary leaf-to-leaf distance r as well as the average path lengths and the moments thereof. We show that the resulting explicit expressions can be recast in terms of Hurwitz-Lerch transcendents. Results for periodic trees are also given. For incomplete random binary trees, we provide first results by numerical techniques; we find a rapid drop of leaf-to-leaf distances for large r.


Introduction
The study of graphs and trees, i.e. objects (or vertices) with pairwise relations (or edges) between them, has a long and distinguished history throughout nearly all the sciences. In computer science, graphs, trees and their study are closely connected with, e.g., sorting and search algorithms [1]; in chemistry, the Wiener number is a topological index intimately correlated with, e.g., chemical and physical properties of alkane molecules [2]. In physics, graphs are equally ubiquitous, not least because of their immediate usefulness for systematic perturbation calculations in quantum field theories [3]. In mathematics, graph theory is in itself an accepted branch of mainstream research and graphs are a central part of the field of discrete mathematics [4]. An important concept that appears in all these fields is the distance or path length in a graph, i.e. the distance between certain vertices, given in terms of the number of edges connecting them [5,6,7]. For trees, i.e. undirected graphs in which any two vertices are connected by exactly one path, various results exist [8,9,10] that compute, for example, the path lengths from the top of the tree to its final leaves. In a binary tree such as shown in Fig. 1, this path length might correspond, e.g., to the number of yes/no decisions one performs when searching for information.
Tree-like structures have recently also become more prominent in quantum physics with the advent of so-called tensor network methods [11]. These provide elegant and powerful tools for the simulation of quantum many-body systems. In a recent publication [12] we show that certain correlation functions and measures of quantum entanglement can be constructed by a holographic distance and connectivity dependence along a tree network connecting certain leaves [13]. In these quantum systems, the leaves are ordered according to their physical distance, for example the separation of magnetic ions in a quantum wire. This ordering imposes a new restriction on the tree itself, and the path lengths which become important are leaf-to-leaf distances across the tree. In the present work, we shall concentrate on full and complete trees. We derive the average path lengths for varying leaf-to-leaf distances with leaves ordered in a one-dimensional line, as shown e.g. in Fig. 1 for a binary tree. The method is then generalised to m-ary trees and the moments of the path lengths. Explicit analytical results are derived for finite and infinite trees. We also consider the case of periodic trees. Last, we numerically study the case of incomplete random graphs, which is most closely related to the tree tensor networks considered in Ref. [12].
2. Average leaf-to-leaf path length in complete binary trees

Recursive formulation
Let us start by considering the complete binary tree shown in Figure 1. It is a connected graph with no loops in which each internal vertex other than the root is 3-valent. The root node is the vertex of degree two at the top of Figure 1. The rest of the internal vertices each have two daughter nodes and one parent. A leaf node has no daughters. The depth of a vertex denotes the number of edges separating it from the root node, with the root node at depth zero. With these definitions, a binary graph is complete or perfect if all of the leaf nodes are at the same depth and all the levels are completely filled. We now denote by the level, n, a complete set of vertices that have the same depth. These are enumerated with the root level as 0. We will refer to a level n tree as a complete tree where the leaves are at level n. The path length, ℓ, is the number of edges that are traversed in going from one external node to another (cp. Figure 1). Note that in some fields the path length refers to the sum of the levels of each of the vertices in the tree [1], whilst the quantity we study here is known as the distance [6].
Let us now impose an order on the tree of Figure 1 such that the external nodes are enumerated from left to right to indicate position values, $x_i$, for leaf $i$. Then we can define a leaf-to-leaf distance $r = |x_i - x_j|$ for any pair of leaves $i$ and $j$. This is equivalent to the notion of distance on a one-dimensional physical lattice. Let the length $L$ be the length of the lattice, i.e. the number of external nodes. Then for such a complete binary tree, we have $L = 2^n$.
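The relation between the leaf-to-leaf distance r and the path length ℓ can be illustrated with a minimal sketch. It assumes (this labelling is not part of the text) that leaves are indexed 0, 1, ..., L − 1 from left to right, so that integer division by the number of daughters maps a node to its parent; the function name `path_length` is hypothetical.

```python
def path_length(i: int, j: int, m: int = 2) -> int:
    """Number of edges between the leaves at 0-based positions i and j
    of a complete m-ary tree (assumed left-to-right labelling)."""
    steps = 0
    # Integer division by m maps a node index to the index of its parent
    # within the level above; climb until both leaves share an ancestor.
    while i != j:
        i //= m
        j //= m
        steps += 1
    return 2 * steps  # 'steps' edges up to the common ancestor, 'steps' down


# Level-3 binary tree (L = 8): three pairs, all with separation r = 1
print(path_length(0, 1), path_length(1, 2), path_length(3, 4))  # 2 4 6
```

Pairs with the same separation r can thus have different path lengths, which is why the set of path lengths for given r has to be averaged.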
Clearly, there are many pairs of leaves that are separated by $r$ from each other (cp. Figure 1). Let $\{\ell_n(r)\}$ denote the set of all corresponding path lengths. We now want to calculate the average path length $\mathcal{L}_n(r)$ from the set $\{\ell_n(r)\}$. We first note that for a level $n$ tree the number of possible paths with separation $r$ is given as $2^n - r$. In Figure 2, we see that any complete level $n$ tree can be decomposed into two level $n-1$ sub-trees, each of which contains $2^{n-1}$ leaves. Let $S_n(r)$ denote the sum of all possible path lengths in the set $\{\ell_n(r)\}$. The structure of the decomposition in Figure 2 suggests that we need to distinguish two classes of separations $r$. First, for $r < 2^{n-1}$, paths are either completely contained within one of the two level $n-1$ trees or they bridge from the left level $n-1$ tree to the right level $n-1$ tree. Those which are completely contained sum to $2S_{n-1}(r)$. Of the paths with separation $r$ that bridge across the two level $n-1$ trees, there are $r$, and each has length $\ell_n = 2n$. Next, for $r \geq 2^{n-1}$, paths no longer fit into a level $n-1$ tree and always bridge from left to right. Again, each such path is $2n$ long and there are $L - r = 2^n - r$ such paths. Putting it all together, we find that

$$S_n(r) = \begin{cases} 2S_{n-1}(r) + 2nr, & r < 2^{n-1}, \\ 2n\,(2^n - r), & r \geq 2^{n-1}, \end{cases} \qquad (1)$$

for $n > 1$ and with $S_1(1) = 2$, since the single pair of leaves in a level 1 tree is joined by a path of two edges. Dividing by the total number of possible paths with separation $r$ then gives the desired average path length

$$\mathcal{L}_n(r) = \frac{S_n(r)}{2^n - r}. \qquad (2)$$
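The counting argument above can be checked numerically. The following sketch compares the two-case recursion for the sum of path lengths with a direct enumeration over all pairs of leaves with separation r. The 0-based leaf labelling and the function names (`path_length`, `S_brute`, `S_rec`) are assumptions made for illustration only.

```python
def path_length(i: int, j: int) -> int:
    """Edges between leaves i and j (0-based) of a complete binary tree."""
    steps = 0
    while i != j:
        i //= 2  # move both nodes to their parents
        j //= 2
        steps += 1
    return 2 * steps


def S_brute(n: int, r: int) -> int:
    """Sum of path lengths over all 2**n - r pairs with separation r."""
    L = 2 ** n
    return sum(path_length(i, i + r) for i in range(L - r))


def S_rec(n: int, r: int) -> int:
    """Same sum from the recursion described in the text (1 <= r < 2**n)."""
    if r >= 2 ** (n - 1):
        # every path bridges the two level n-1 sub-trees and has length 2n
        return 2 * n * (2 ** n - r)
    # two copies of the level n-1 problem plus r bridging paths of length 2n
    return 2 * S_rec(n - 1, r) + 2 * n * r


for n in range(1, 7):
    for r in range(1, 2 ** n):
        assert S_rec(n, r) == S_brute(n, r)

print(S_rec(3, 1))  # 22, i.e. an average of 22/7 over the 7 adjacent pairs
```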
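For $r < 2^{n-1}$, we can expand the recursion (1) as

$$S_n(r) = 2S_{n-1}(r) + 2nr \qquad (3a)$$

$$\phantom{S_n(r)} = 2\left[2S_{n-2}(r) + 2(n-1)r\right] + 2nr. \qquad (3b)$$

After $\nu$ such expansions, we arrive at

$$S_n(r) = 2^{\nu} S_{n-\nu}(r) + 2r \sum_{k=0}^{\nu-1} 2^k (n-k). \qquad (4)$$

The expansion can continue while $r < 2^{n-\nu-1}$. It terminates when $n - \nu$ becomes so small that the leaf-to-leaf distance $r$ is no longer contained within the level-$(n-\nu)$ tree. Hence the smallest permissible value of $n - \nu$ is given by

$$n_c(r) = \lfloor \log_2 r \rfloor + 1, \qquad (5)$$

where $\lfloor\cdot\rfloor$ denotes the floor function. For clarity, we will suppress the $r$ dependence, i.e. we write $n_c \equiv n_c(r)$ in the following. Continuing with the expansion of $S_n(r)$ up to the $n_c$ term, we find

$$S_n(r) = 2^{n-n_c} S_{n_c}(r) + 2r \sum_{k=0}^{n-n_c-1} 2^k (n-k) \qquad (6a)$$

$$\phantom{S_n(r)} = 2^{n-n_c+1}\, n_c \left(2^{n_c} - r\right) + 2nr \sum_{k=0}^{n-n_c-1} 2^k - 2r \sum_{k=0}^{n-n_c-1} k\, 2^k. \qquad (6b)$$

Details for the summations occurring in Equation (6b) are given in Appendix A. From (6b) we obtain

$$S_n(r) = 2^{n+1} n_c + 2^{n-n_c+2}\, r - 2(n+2)\, r. \qquad (7)$$

Hence the average path lengths are given by

$$\mathcal{L}_n(r) = \frac{2^{n+1} n_c + 2^{n-n_c+2}\, r - 2(n+2)\, r}{2^n - r}. \qquad (8)$$

In the limit of $n \to \infty$ for fixed $r$, we have

$$\lim_{n\to\infty} \mathcal{L}_n(r) \equiv \mathcal{L}_\infty(r) = 2\left(n_c + 2^{1-n_c}\, r\right). \qquad (9)$$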
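As a hedged numerical illustration of this expansion, the sketch below evaluates a closed form for the sum of path lengths and the corresponding finite and infinite averages, and checks them against the recursion. The closed-form expression in `S_closed` is a reconstruction obtained here by summing the recursion, so it should be read as an assumption rather than as a quotation of the authors' formula; all function names are hypothetical.

```python
from fractions import Fraction


def n_c(r: int) -> int:
    """floor(log2(r)) + 1 for r >= 1, computed exactly with integers."""
    return r.bit_length()


def S_rec(n: int, r: int) -> int:
    """Sum of path lengths from the two-case recursion (1 <= r < 2**n)."""
    if r >= 2 ** (n - 1):
        return 2 * n * (2 ** n - r)
    return 2 * S_rec(n - 1, r) + 2 * n * r


def S_closed(n: int, r: int) -> int:
    """Assumed closed form: 2^(n+1) n_c + 2^(n-n_c+2) r - 2 (n+2) r."""
    nc = n_c(r)
    return 2 ** (n + 1) * nc + 2 ** (n - nc + 2) * r - 2 * (n + 2) * r


def L_finite(n: int, r: int) -> Fraction:
    """Average path length: sum divided by the number of pairs, 2**n - r."""
    return Fraction(S_closed(n, r), 2 ** n - r)


def L_infinite(r: int) -> Fraction:
    """n -> infinity limit, 2 (n_c + 2^(1 - n_c) r)."""
    nc = n_c(r)
    return 2 * (nc + Fraction(r, 2 ** (nc - 1)))


for n in range(1, 9):
    for r in range(1, 2 ** n):
        assert S_closed(n, r) == S_rec(n, r)

print(L_infinite(1), L_infinite(5))  # 4 17/2
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the comparison with the recursion is not affected by floating-point rounding.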
We emphasise that $\mathcal{L}_\infty(r) < \infty$ for all $r < \infty$. In Figure 3 we show finite and infinite path lengths $\mathcal{L}_n(r)$. We see that whenever $r = 2^i$, $i \in \mathbb{N}$, we have a cusp in the $\mathcal{L}_n(r)$ curves. Between these points, the $\lfloor\cdot\rfloor$ function enhances deviations from the leading $\log_2 r$ behaviour. This behaviour originates from the self-similar structure of the tree. Consider a sub-tree with $\nu$ levels: the largest separation that can occur within it is $r = 2^\nu - 1$, for which the path length is $2\nu$. Once $r$ reaches the sub-tree size $2^\nu$, paths of length $2\nu$ are no longer possible and all paths are longer, so there is a cusp where this path length is removed from the possibilities. The constant average length for $r \geq L/2$ arises because there is only one possible path length, $2n$, connecting the two primary sub-trees, which is clear from (1).

Average leaf-to-leaf path length in complete ternary trees
Ternary trees are those where each node has three daughters. Let us denote by $S^{(3)}_n(r)$ the sum of all path lengths in the set $\{\ell^{(3)}_n(r)\}$ for given $r$, in analogy to the binary case discussed before. Furthermore, $L = 3^n$. Following the arguments which led to Equation (1), we have

$$S^{(3)}_n(r) = \begin{cases} 3S^{(3)}_{n-1}(r) + 4nr, & r < 3^{n-1}, \\ 2n\,(3^n - r), & r \geq 3^{n-1}. \end{cases} \qquad (10)$$

This recursive expression can again be understood readily when looking at the structure of a ternary tree. Clearly, $S^{(3)}_n(r)$ will now consist of the sum of path lengths for three level $n-1$ trees, plus the sum of all paths that connect nodes across the three trees of level $n-1$. The length of these paths is solely determined by $n$, irrespective of the number of daughters, and hence remains $2n$. As before, we need to distinguish between the case when $r$ fits within a level $n-1$ tree, i.e. $r < 3^{n-1}$, and when it connects different level $n-1$ trees, $r \geq 3^{n-1}$. For $r < 3^{n-1}$, there are now $2r$ such paths, i.e. $r$ between the left and centre level $n-1$ trees and $r$ between the centre and right level $n-1$ trees. For $r \geq 3^{n-1}$ there are $L - r = 3^n - r$ paths. We again expand the recursion (10) and find, with $n^{(3)}_c = \lfloor \log_3 r \rfloor + 1$ in analogy to (5), that

$$S^{(3)}_n(r) = 2 \cdot 3^n\, n^{(3)}_c + \left(3^{\,n-n^{(3)}_c+1} - 3 - 2n\right) r$$

and

$$\mathcal{L}^{(3)}_n(r) = \frac{S^{(3)}_n(r)}{3^n - r}, \qquad \mathcal{L}^{(3)}_\infty(r) = 2n^{(3)}_c + 3^{\,1-n^{(3)}_c}\, r.$$

Average leaf-to-leaf path length in complete m-ary trees
The methodology and discussion of the binary and ternary trees can be generalised to trees of $m > 1$ daughters, known as m-ary trees. The maximal path length for any tree is independent of $m$ and determined entirely by the geometry of the tree. Each external node is at depth $n$ and a maximal path has the root node as the lowest common ancestor; therefore the maximal path length is $2n$. A recursive function can be obtained using similar logic to before. For a given $n$, there are $m$ subgraphs with the structure of a tree with $n-1$ levels. When $r$ is less than the size of each subgraph ($r < m^{n-1}$), the sum of the paths is therefore the sum of $m$ copies of the subgraph along with the paths that connect the $m-1$ neighbouring pairs of subgraphs. When $r$ is larger than or equal to the size of the subgraph ($r \geq m^{n-1}$), the paths are all maximal. When all this is taken into account, the recursive function is

$$S^{(m)}_n(r) = \begin{cases} m\,S^{(m)}_{n-1}(r) + 2n(m-1)\,r, & r < m^{n-1}, \\ 2n\,(m^n - r), & r \geq m^{n-1}. \end{cases} \qquad (14)$$

This can be solved in the same way as the binary case to obtain an expression for the sum of the paths for a given $m$, $n$ and $r$,

$$S^{(m)}_n(r) = 2m^n n^{(m)}_c + 2r\left[\frac{m^{\,n-n^{(m)}_c+1} - m}{m-1} - n\right]. \qquad (15)$$

The average path length is then

$$\mathcal{L}^{(m)}_n(r) = \frac{S^{(m)}_n(r)}{m^n - r} \qquad (16)$$

and

$$\mathcal{L}^{(m)}_\infty(r) = 2\left[n^{(m)}_c + \frac{m^{\,1-n^{(m)}_c}}{m-1}\, r\right]. \qquad (17)$$
We note that, in analogy with Equation (5), we have used

$$n^{(m)}_c = \lfloor \log_m r \rfloor + 1 \qquad (18)$$

in deriving these expressions. Figure 3 shows the resulting path lengths in the $n \to \infty$ limit for various values of $m$.
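The m-ary generalisation can be checked in the same spirit. The sketch below compares a direct enumeration, the m-ary recursion and a closed form for m = 2, 3, 4. The closed form used in `S_closed` and the limit in `L_infinite` were obtained here by summing the recursion and are therefore assumptions; the 0-based leaf labelling and all function names are likewise hypothetical.

```python
from fractions import Fraction


def n_c(r: int, m: int) -> int:
    """floor(log_m(r)) + 1, computed with integers to avoid rounding."""
    nc = 1
    while m ** nc <= r:
        nc += 1
    return nc


def path_length(i: int, j: int, m: int) -> int:
    """Edges between leaves i and j (0-based) of a complete m-ary tree."""
    steps = 0
    while i != j:
        i, j, steps = i // m, j // m, steps + 1
    return 2 * steps


def S_brute(n: int, r: int, m: int) -> int:
    return sum(path_length(i, i + r, m) for i in range(m ** n - r))


def S_rec(n: int, r: int, m: int) -> int:
    """m-ary recursion: m sub-trees plus (m - 1) r bridging paths."""
    if r >= m ** (n - 1):
        return 2 * n * (m ** n - r)
    return m * S_rec(n - 1, r, m) + 2 * n * (m - 1) * r


def S_closed(n: int, r: int, m: int) -> int:
    """Assumed closed form: 2 m^n n_c + 2 r [(m^(n-n_c+1) - m)/(m-1) - n]."""
    nc = n_c(r, m)
    geometric = (m ** (n - nc + 1) - m) // (m - 1)  # exact: m - 1 divides it
    return 2 * m ** n * nc + 2 * r * (geometric - n)


def L_infinite(r: int, m: int) -> Fraction:
    """n -> infinity limit, 2 [n_c + m^(1 - n_c) r / (m - 1)]."""
    nc = n_c(r, m)
    return 2 * (nc + Fraction(r, (m - 1) * m ** (nc - 1)))


for m in (2, 3, 4):
    for n in range(1, 5):
        for r in range(1, m ** n):
            assert S_brute(n, r, m) == S_rec(n, r, m) == S_closed(n, r, m)

print(L_infinite(1, 2), L_infinite(1, 3), L_infinite(1, 4))  # 4 3 8/3
```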

Variance of path lengths in complete m-ary trees
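In addition to the average path length $\mathcal{L}^{(m)}_n(r)$, we also consider the variance of the path lengths,

$$\mathrm{var}\left[\ell^{(m)}_n(r)\right] = \left\langle \left[\ell^{(m)}_n(r)\right]^2 \right\rangle - \left[\mathcal{L}^{(m)}_n(r)\right]^2.$$

Here $\langle\cdot\rangle$ denotes the average over all paths for given $r$ in an m-ary tree as before. In order to obtain the variance, we need an expression for the sum of the squares of path lengths. This can again be done recursively, i.e. with $Q^{(m)}_n(r)$ denoting this sum of squared path lengths for an m-ary graph at leaf-to-leaf distance $r$, we have, similarly to Equation (14),

$$Q^{(m)}_n(r) = \begin{cases} m\,Q^{(m)}_{n-1}(r) + 4n^2(m-1)\,r, & r < m^{n-1}, \\ 4n^2\,(m^n - r), & r \geq m^{n-1}. \end{cases} \qquad (19)$$

Here, the difference to Equation (14) is that we have squared the length terms $2n$. As before, expanding down to $n_c$ (here and in the following, we suppress the $(m)$ superscript of $n^{(m)}_c$ for clarity) gives a term containing $Q^{(m)}_{n_c}(r)$,

$$Q^{(m)}_n(r) = m^{n-n_c}\, Q^{(m)}_{n_c}(r) + 4(m-1)\,r \sum_{k=0}^{n-n_c-1} m^k (n-k)^2 \qquad (20a)$$

$$\phantom{Q^{(m)}_n(r)} = 4n_c^2\, m^{n-n_c}\left(m^{n_c} - r\right) + 4(m-1)\,r \left[n^2 \sum_{k=0}^{n-n_c-1} m^k - 2n \sum_{k=0}^{n-n_c-1} k\,m^k + \sum_{k=0}^{n-n_c-1} k^2 m^k\right]. \qquad (20b)$$

Evaluating the three series (see Appendix A.2) gives the closed form, Equation (20c). Using Equations (20c), (16) and (15), we obtain the variance, which is shown in Figure 4 for selected $m$.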
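A short sketch of the variance computation follows. It evaluates the sum of squared path lengths recursively (the same recursion as for the sum of path lengths, but with the length 2n squared) and compares the resulting variance with a direct enumeration. The leaf labelling and all function names are assumptions made for illustration.

```python
from fractions import Fraction


def path_length(i: int, j: int, m: int) -> int:
    """Edges between leaves i and j (0-based) of a complete m-ary tree."""
    steps = 0
    while i != j:
        i, j, steps = i // m, j // m, steps + 1
    return 2 * steps


def S_rec(n: int, r: int, m: int) -> int:
    """Sum of path lengths (recursion for 1 <= r < m**n)."""
    if r >= m ** (n - 1):
        return 2 * n * (m ** n - r)
    return m * S_rec(n - 1, r, m) + 2 * n * (m - 1) * r


def Q_rec(n: int, r: int, m: int) -> int:
    """Sum of squared path lengths: same recursion with (2n)**2."""
    if r >= m ** (n - 1):
        return 4 * n * n * (m ** n - r)
    return m * Q_rec(n - 1, r, m) + 4 * n * n * (m - 1) * r


def variance(n: int, r: int, m: int) -> Fraction:
    """<l^2> - <l>^2 over all m**n - r pairs with separation r."""
    count = m ** n - r
    mean = Fraction(S_rec(n, r, m), count)
    return Fraction(Q_rec(n, r, m), count) - mean ** 2


def variance_brute(n: int, r: int, m: int) -> Fraction:
    lengths = [path_length(i, i + r, m) for i in range(m ** n - r)]
    mean = Fraction(sum(lengths), len(lengths))
    return Fraction(sum(x * x for x in lengths), len(lengths)) - mean ** 2


for m in (2, 3):
    for n in range(1, 5):
        for r in range(1, m ** n):
            assert variance(n, r, m) == variance_brute(n, r, m)

print(variance(3, 1, 2), variance(3, 4, 2))  # 104/49 0
```

The second value illustrates that the variance vanishes once r is at least as large as a primary sub-tree, since all such paths have the same length 2n.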

General moments of path lengths in complete m-ary trees
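The derivation in section 4.1 suggests that any $q$-th raw moment of path lengths can be calculated similarly as in Equation (19). Indeed, let us define $M^{(m)}_{q,n}(r)$ as the sum of the $q$-th powers of the path lengths for given $r$. In analogy to Equation (14), it obeys

$$M^{(m)}_{q,n}(r) = \begin{cases} m\,M^{(m)}_{q,n-1}(r) + 2^q n^q (m-1)\,r, & r < m^{n-1}, \\ 2^q n^q\,(m^n - r), & r \geq m^{n-1}. \end{cases} \qquad (24)$$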
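By expanding, this gives

$$M^{(m)}_{q,n}(r) = m^{n-n_c}\, M^{(m)}_{q,n_c}(r) + 2^q (m-1)\,r \sum_{k=0}^{n-n_c-1} m^k (n-k)^q. \qquad (25)$$

As before, $n_c$ corresponds to the first $n$ value where, for given $r$, we have to use the second part of the recursion in Equation (24). Hence we can substitute the second part of (24) for $M^{(m)}_{q,n_c}(r)$, giving

$$M^{(m)}_{q,n}(r) = 2^q n_c^q\, m^{n-n_c}\left(m^{n_c} - r\right) + 2^q (m-1)\,r \sum_{k=0}^{n-n_c-1} m^k (n-k)^q. \qquad (26)$$

In order to derive an explicit expression for this, similar to section 2.2, we need again to study the final sum of Equation (26). We write

$$\sum_{k=0}^{n-n_c-1} m^k (n-k)^q = (-1)^q \sum_{k=0}^{n-n_c-1} m^k (k-n)^q = (-1)^q \left[\Phi(m,-q,-n) - m^{n-n_c}\, \Phi(m,-q,-n_c)\right], \qquad (27)$$

where in the last step we have introduced the Hurwitz-Lerch Zeta function $\Phi$ [14,15] (also referred to as the Lerch transcendent [16] or the Hurwitz-Lerch Transcendent [17]). It is defined as the sum

$$\Phi(z,s,u) = \sum_{k=0}^{\infty} \frac{z^k}{(k+u)^s} \qquad (28)$$

for $|z| < 1$, and by analytic continuation otherwise. The properties of $\Phi(z,s,u)$ that we use, Equations (29a)-(29c), are listed in Ref. [16]. Hence we can write

$$M^{(m)}_{q,n}(r) = 2^q n_c^q\, m^{n-n_c}\left(m^{n_c} - r\right) + (-2)^q (m-1)\,r \left[\Phi(m,-q,-n) - m^{n-n_c}\, \Phi(m,-q,-n_c)\right]. \qquad (30)$$

Averages of $M^{(m)}_{q,n}(r)$ are obtained, as before, by dividing by the number of paths, $m^n - r$.

Figure 5. A periodic, complete, binary tree with n = 8 levels. Circles and lines as in Figure 1.

The properties (29a)-(29c) can be used to show that, for a given $m$ and $q$, $\Phi(m,-q,-n)$ can be expressed as a polynomial in $n$ of order $q$. Therefore in the $n \to \infty$ limit, we find

$$\lim_{n\to\infty} \frac{M^{(m)}_{q,n}(r)}{m^n - r} = 2^q n_c^q \left(1 - \frac{r}{m^{n_c}}\right) - (-2)^q\, \frac{(m-1)\,r}{m^{n_c}}\, \Phi(m,-q,-n_c).$$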
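The recursion for the q-th moment can be explored numerically without evaluating the Hurwitz-Lerch function itself. The sketch below computes the sum of q-th powers of path lengths recursively and compares the averaged moment at large n with a truncated convergent series. That series, 2^q [n_c^q (1 − r m^(−n_c)) + (m − 1) r Σ_{k>n_c} k^q m^(−k)], was derived here directly from the recursion and is an assumption rather than the authors' stated result; the function names and the truncation parameter `terms` are hypothetical.

```python
from fractions import Fraction


def n_c(r: int, m: int) -> int:
    """floor(log_m(r)) + 1, computed with integers."""
    nc = 1
    while m ** nc <= r:
        nc += 1
    return nc


def M_rec(q: int, n: int, r: int, m: int) -> int:
    """Sum of q-th powers of path lengths (recursion, 1 <= r < m**n)."""
    if r >= m ** (n - 1):
        return (2 * n) ** q * (m ** n - r)
    return m * M_rec(q, n - 1, r, m) + (2 * n) ** q * (m - 1) * r


def mean_moment(q: int, n: int, r: int, m: int) -> Fraction:
    """q-th raw moment: sum divided by the number of pairs, m**n - r."""
    return Fraction(M_rec(q, n, r, m), m ** n - r)


def mean_moment_limit(q: int, r: int, m: int, terms: int = 200) -> Fraction:
    """Assumed n -> infinity limit, with the infinite sum truncated."""
    nc = n_c(r, m)
    tail = sum(Fraction(k ** q, m ** k) for k in range(nc + 1, nc + 1 + terms))
    return 2 ** q * (nc ** q * (1 - Fraction(r, m ** nc)) + (m - 1) * r * tail)


# Adjacent leaves (r = 1) in a binary tree: first and second raw moments
print(float(mean_moment(1, 40, 1, 2)), float(mean_moment_limit(1, 1, 2)))
print(float(mean_moment(2, 40, 1, 2)), float(mean_moment_limit(2, 1, 2)))
```

For q = 1 the limit reproduces the infinite-tree average path length, and for q = 2 it gives the second moment entering the variance.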

Complete m-ary trees with periodicity
Up to now we have always dealt with trees in which the maximum distance $r$ was set by the number of leaves, i.e. $r < m^n$. This is known as a hard wall or open boundary in terms of physical systems defined along $r$. A periodic boundary can be realised by having the leaves of the tree form a circle, as depicted in Figure 5 for a binary tree. For such a binary tree, only distances $r \leq L/2$ are relevant since all cases with $r > L/2$ can be reduced to the smaller value $L - r$ by going around the periodic tree in the opposite direction. Therefore we can write

$$S^{(m)}_{\bullet,n}(r) = S^{(m)}_n(r) + S^{(m)}_n(L - r),$$

where $r < L/2$ and the subscript $\bullet$ denotes the periodic case. Note that for $r = L/2$ the clockwise and anti-clockwise paths are the same, so they only need to be counted once. In the simple binary tree case we can expand this via (7), with $n_c$ as in Equation (5), to give

$$S^{(2)}_{\bullet,n}(r) = 2^{n+1} n_c + \left(2^{n-n_c+2} - 4\right) r.$$

For incomplete random binary trees (cp. Figure 6), we construct the graphs numerically and measure $\mathcal{L}^{(2,R)}_n(r)$ as shown in Figure 7. For small $n$, we have computed all $(L-1)!$ graphs (cp. Figure 7(a)), while for large $n$ we have averaged over a finite number $N \ll (L-1)!$ of randomly chosen binary trees among the $(L-1)!$ possible trees (cp. Figure 7(b)). We see in Figure 7(a) that, similar to the complete binary trees considered in section 2, the path lengths increase with $r$ until they reach a maximal value. Then they start to decrease rapidly, unlike the complete graph in Figure 3. We also see that for such small trees, we are still far from the infinite complete tree result $\mathcal{L}^{(2)}_\infty(r)$ of Equation (9). Finally, we also see that when we choose 10,000 random binary trees from the 10! = 3,628,800 possible such trees at $L = 11$, the average path lengths for each $r$ are still distinguishably different from an exact summation of all path lengths. This suggests that rare tree structures are quite important. In Figure 7(b) we nevertheless show estimates of $\mathcal{L}^{(2,R)}_n(r)$ for various $n$. As before, the shape of the curves for large $n$ is similar to those for small $n$.
Clearly, however, the cusps in L

Conclusions
We have calculated an analytic form for the average length of the path that separates two leaves with a given separation, ordered according to the physical distance on a line, in a complete binary tree graph. This result is then generalised to a complete tree where each vertex has any number of children. In addition to the mean path length, it is found that the raw moments of the distribution of path lengths have an analytic form that can be expressed in a concise way in terms of the Hurwitz-Lerch Zeta function. These findings are calculated for open trees, where the leaves form an open line, periodic trees, where the leaves form a circle, and infinite trees, which is the limit where the number of levels, n, goes to infinity. Each of these results has a neat form and characteristic features due to the self-similarity of the trees.

Appendix A.2. Series used in section 4.1
The explicit expressions for the series terms occurring in Equation (20b) are given here. The first part is a simple geometric series given by Equation (A.3). The second part is an arithmetico-geometric series similar to Equation (A.2). The final part is also an arithmetico-geometric series; its closed form, Equation (A.5), is given in Ref. [18].