Complex distributions emerging in filtering and compression

In filtering, each output is produced by a certain number of different inputs. We explore the statistics of this degeneracy in an explicitly treatable filtering problem in which filtering performs the maximal compression of relevant information contained in inputs (arrays of zeroes and ones). This problem serves as a reference model for the statistics of filtering and related sampling problems. The filter patterns in this problem conveniently allow a microscopic, combinatorial treatment, which lets us find the statistics of outputs, namely the exact distribution of output degeneracies, for arbitrary input sizes. We observe that the resulting degeneracy distribution of outputs decays as $e^{-c\log^\alpha \!d}$ with degeneracy $d$, where $c$ is a constant and the exponent $\alpha>1$, i.e., faster than a power law. Importantly, its form depends essentially on the size of the input data set, appearing closer to a power-law dependence for small data set sizes than for large ones. We demonstrate that for sufficiently small input data set sizes, typical for empirical studies, this distribution could easily be perceived as a power law. We extend our results to filter patterns of various sizes and demonstrate that the shortest filter pattern provides the maximally informative representations of the inputs.


I. INTRODUCTION
Compression, filtering, and cryptography are related areas in signal and information processing [1][2][3][4]. By definition, a large number of possible inputs are mapped to a smaller number of possible outputs, so that a given output may correspond to multiple inputs; the number of these inputs is the output's degeneracy. A similar problem emerges in cooperative systems with a large number of local minima in the energy landscape, in particular, in spin glasses and deep learning neural networks [5][6][7]. The configuration space of a system of this sort can be divided into a set of domains (basins) of attraction of these minima. One can ask: what are the statistics of these domains of attraction, and what is the distribution of their sizes? This is analogous to the degeneracy statistics problem. These statistics are important because the degeneracy distribution gives the most relevant entropy when studying information problems [8], that is, it is the distribution containing the most information relevant to the system under study.
These issues were explored in a recent series of works [9][10][11][12][13] which exploited the principle of maximum entropy, see also Refs. [14][15][16][17][18]. The finding of Refs. [9][10][11][12][13] is that maximally informative samples drawn from data exhibit statistics with broad distributions. More specifically, their theory, based on entropy optimization, predicts power-law-like distributions of the degeneracy of maximally informative outputs (minimal sufficient representations). The title of the paper "Minimum description codes are critical" [10], published in the journal Entropy, states the message of that theory most succinctly. On the other hand, empirical distributions for this kind of problem can be markedly distinct from power laws; see, for example, the collection of empirical curves in Fig. 1 of Ref. [10].
One obstacle to understanding such phenomena is that the need to obtain sufficient samples to resolve the degeneracy distribution limits the system size which can be examined. In the present work we contribute to these efforts by proposing a model for which sampling distributions can be obtained exactly, for the entire state space and up to large system sizes. We introduce and explore a reference filtering problem straightforwardly treatable through purely combinatorial techniques. This filter extracts all positions of a given local pattern in the input, see Fig. 1. We study a family of such filter patterns and demonstrate that the smallest pattern, extracting single ones from the inputs, generates outputs with the highest entropy of the degeneracy distribution, called relevance, Fig. 2 and Table I. According to Refs. [10,12], this type of information entropy is maximized by the representations of the samples of a complex system which provide the most information, the maximally informative representations, which occurs in a critical region where the sample distribution becomes complex. Outside of this region outputs are either nearly all the same, or nearly all different, giving no useful information about the underlying system. Note that the entropy of the samples themselves, called the resolution, is not maximum in this critical region, but in the limit that all samples are different. We establish a link between filtering and sampling problems based on the idea that filter patterns of different lengths with their positions can serve as sampling variables. We demonstrate that the behavior of our filtering problem is analogous to that found in the more complex systems of Refs. [9][10][11][12][13]. The entropy of the degeneracy distribution of filter outputs increases as the filter length decreases, but vanishes in the extreme limit, for which resolution is maximized, with outputs being identical to inputs. Thus within a family of filters, the most informative filter is the shortest one with length greater than one.
We directly obtain the complex degeneracy distributions for outputs generated from various input sets. We develop efficient recursive methods that allow us to find this distribution for large input strings, as can be seen in Fig. 3, which would not be accessible through empirical sampling methods. Due to the tractability of this problem, we are able to identify precisely how these distributions deviate from a power-law dependence. We find scaling forms that accurately describe the tail of the degeneracy distributions (see Fig. 4). We discover that these distributions are essentially shaped by the size of the input data. That is, input data sets of relatively small size, typical for empirical studies, produce degeneracy distributions of outputs that are closer to a power law than the distributions of outputs from very large input data sets. Finally, we develop a mean-field theory and obtain the asymptotics of the degeneracy distribution and the spectrum of degeneracies. Our findings indicate that the phenomena we observe should apply to more general filtering and compression problems.
Our paper is organized in the following way. In Sec. II we introduce our filter and the synthetic input data sets for it. In Sec. III we describe how the filtering problem can be viewed as sampling of a complex system. In Secs. IV, V, and VI we obtain the basic relations for the degeneracy of outputs, develop our algorithm, obtain the complex degeneracy distributions of outputs for the complete input data sets, and describe their features. We develop the mean-field theory of these distributions in filtering in Sec. VII. In Sec. VIII we obtain the degeneracy distributions of outputs for uniformly random input data sets of various sizes. In Sec. IX we discuss our results and indicate possible generalizations. In the Appendix we give the combinatorial derivations of recursive relations used in Sec. IV and explicit asymptotics. The exact results obtained by our algorithm and recursions are provided in the Supplementary Material [19].

II. A REFERENCE FILTERING PROBLEM
We study the distribution of outputs in a solvable filtering problem by implementing a purely combinatorial, microscopic approach not involving entropy considerations. Let the input data be a set of $N$ strings $(x_i)$ of zeroes and ones, $x_i = 0, 1$, of length $n$, with the periodic condition $x_1 = x_{n+1}$. We consider two types of data set. The first is the complete set of all possible unique inputs. Its size $N$ is determined by the input length $n$, $N = 2^n$. Second, we consider data sets of arbitrary size $N$ consisting of strings of uniformly randomly generated zeroes and ones constrained by the same periodic condition. In the latter situation, some of the elements of a data set may coincide. Clearly, in the limit $N \to \infty$, we arrive at a situation equivalent to the complete data set. (We stress that the random data set of size $N = 2^n$ differs from the complete data set.) The filter works as follows: every instance of a specific pattern in the input is marked by a one in the corresponding position in the output. All other positions are marked with zeroes. This produces a minimal coding of the positions of the pattern occurrences in the input. We illustrate the results from a few example filter patterns in Fig. 2. We observe complex degeneracy distributions reminiscent of those observed in, for example, Ref. [20].
For the sake of simplicity, in the remainder of this paper we focus on the simplest of these filters. Each sequence of ones of length 1 in the input (i.e., every 1 whose neighbors are both zeroes) gives 1 at the same position in the output. All other sequences of ones or zeroes in the input produce zeroes in the corresponding places in the output, as shown in Fig. 1. In other words, the input vector $(x_i)$, $i = 1, 2, ..., n$, $x_i = 0, 1$, is transformed to the output vector with the following components:
$$y_i = (1 - x_{i-1})\,x_i\,(1 - x_{i+1}). \tag{1}$$
As we will see, this filter is the maximally informative one in a family of similar filters. Other filter patterns may be analysed using the same methods we describe below.
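The filter described above can be tried directly on small inputs. The following is a minimal sketch, assuming the periodic boundary condition of Sec. II; the function names `filter_isolated_ones` and `output_degeneracies` are illustrative and do not come from the paper.

```python
from collections import Counter
from itertools import product


def filter_isolated_ones(x):
    """Apply the 010 filter to a periodic 0/1 sequence x.

    The output has a 1 exactly where the input has a 1 whose two
    neighbors (on the ring) are both 0.
    """
    n = len(x)
    return tuple(
        (1 - x[i - 1]) * x[i] * (1 - x[(i + 1) % n])  # x[-1] wraps around the ring
        for i in range(n)
    )


def output_degeneracies(n):
    """Map each output to the number of inputs producing it (complete input set)."""
    return Counter(filter_isolated_ones(x) for x in product((0, 1), repeat=n))


if __name__ == "__main__":
    print(filter_isolated_ones((0, 1, 0, 1, 1, 0)))
    print(output_degeneracies(4))
```

Brute-force enumeration of all $2^n$ inputs is only practical for small $n$, which is why a combinatorial treatment is needed for larger systems.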

III. FILTERING AS A SAMPLING PROBLEM
Let us discuss the link between filtering and sampling in more detail. The input $(x_i)$ to our filtering problem is a string of $n$ ones and zeroes. It contains alternating contiguous chains of ones and zeroes of different lengths, so that one may completely define the sequence by listing the starting positions of all the strings of ones and their respective lengths. With this in mind, one may reformulate the problem of filtering for patterns in sequences as sampling a system of correlated discrete variables in the following way. Let us introduce the set of discrete variables $\{s_i^l\}$, for $i = 1, ..., n$ and $l = 1, ..., n-1$. The ('spin') variable $s_i^l$ is 1 if the input has a string of $l$ ones beginning at position $i$, and is 0 otherwise. These variables correspond to the outputs of filters for chains of consecutive ones. We include one more variable which distinguishes between the states of all zeroes and all ones. The system, then, has $2^m$ possible states, where $m = n(n-1)+1$. Strings of ones in the input do not overlap with each other, which restricts the configurations of the variables $\{s_i^l\}$ that can occur. We include these constraints in our system by imposing a cost for each violation of the mutual exclusion rules, the cost function being [...] where [...] functions as a cyclic Heaviside function. We now have a complex system consisting of $m$ state variables, whose interactions are governed by the cost function $U$. Each of the $2^n$ possible input sequences corresponds to a different minimum of this cost function. Now let us consider the sampling of this system. Suppose we may measure only $n$ of these variables $s_i^l$. Which $n$ variables will be most informative about the state of the system? Intuitively, shorter filter patterns, i.e., smaller values of $l$, should provide more detailed information about the system, and indeed the variables $s_i^{l=1}$ contain the most information about the state of the system. (To see this, note that the variable $s_i^l$ gives information about the state of $l+2$ input digits. In particular, when $s_i^l = 1$, it fully determines $l+2$ digits of the input, which happens for a fraction $2^{-(l+2)}$ of the inputs. Thus, the total amount of information conveyed by $s_i^l$, in terms of the number of digits of the input it reveals, is $(l+2)\,2^{-(l+2)}$, which is indeed maximum for $l = 1$.) The set $\{s_i^1\}$ is also the most informative representation of the input from the perspective of Ref. [20], in the sense that samples obtained from observing these $n$ variables have a larger entropy of their degeneracy distributions, Eq. (30), than samples using any other subset of $n$ variables. This is apparent from the fact that any subset of $n$ variables containing larger filters will have fewer available configurations, see also Table I. We detail this analogous behavior further in Sec. VI.

IV. OUTPUTS AND THEIR DEGENERACIES FOR COMPLETE INPUT DATASET
Let us begin by considering the complete input data set, all $2^n$ different configurations of zeroes and ones. Each output consists of isolated ones separated by strings of zeroes of various lengths. The total number of possible outputs, $M(n)$, is then the number of ways of arranging isolated ones in a chain of length $n$. This number coincides with the number of different configurations of dimers in a closed chain (ring) of length $n$:
$$M(n) = \sum_{m=0}^{[n/2]} \frac{n}{n-m}\binom{n-m}{m},$$
where $[n/2]$ is the integer part of $n/2$, see Appendix A 1. Comparing this number for successive values of $n$, we see that $M(n) = M(n-1) + M(n-2)$, with initial conditions $M(3) = 4$, $M(4) = 7$. The elements of the sequence may be written in terms of the roots of the characteristic equation [21][22][23]
$$z^2 = z + 1.$$
Thus
$$M(n) = \left(\frac{1+\sqrt{5}}{2}\right)^{n} + \left(\frac{1-\sqrt{5}}{2}\right)^{n} \cong z_g^n,$$
where the last expression gives the large $n$ limit. Here the largest root $(1+\sqrt{5})/2 = 1.61803... \equiv z_g$ is the famous golden ratio.
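As a quick consistency check of the counting above, the sketch below evaluates $M(n)$ in three ways: as a sum over the number $m$ of ones, through the two-term recursion with $M(3) = 4$, $M(4) = 7$, and through the two roots of $z^2 = z + 1$. The function names are illustrative assumptions, and the closed form uses floating-point arithmetic, so it is only reliable for moderate $n$.

```python
from math import comb, sqrt


def num_outputs_sum(n):
    """M(n) as a sum over the number m of isolated ones on a ring of length n."""
    return sum(n * comb(n - m, m) // (n - m) for m in range(n // 2 + 1))


def num_outputs_recursive(n):
    """M(n) from M(n) = M(n-1) + M(n-2) with M(3) = 4 and M(4) = 7 (n >= 3)."""
    a, b = 4, 7  # M(3), M(4)
    if n == 3:
        return a
    for _ in range(n - 4):
        a, b = b, a + b
    return b


def num_outputs_closed_form(n):
    """M(n) from the two roots of z**2 = z + 1 (floating point, moderate n only)."""
    z_g = (1 + sqrt(5)) / 2
    return round(z_g ** n + (1 - z_g) ** n)


if __name__ == "__main__":
    for n in range(3, 12):
        print(n, num_outputs_sum(n), num_outputs_recursive(n), num_outputs_closed_form(n))
```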
The key point for our study is that the degeneracy of an output is the product of the degeneracies of the strings of zeroes between ones in the output. As is clear from Eq. (1), the presence of a 1 in an output at position i fixes the digits of its inputs at positions i − 1, i, and i + 1. These digits must be 0, 1, and 0, respectively. This is true for each 1 in the output. On the other hand, each of the remaining digits of these inputs compatible with a given output must be either a 0 or be a 1 with one or two neighboring ones. All the degrees of freedom in the input corresponding to a given output correspond to these digits.
Let an output with $m \geq 1$ ones contain $m$ strings of zeroes with lengths $\ell_1, \ell_2, ..., \ell_m$. Then the degeneracy of this output equals
$$d = \prod_{i=1}^{m} \tilde{d}(\ell_i). \tag{7}$$
Here $\tilde{d}(\ell)$ is the number of input strings of length $\ell$, having the first and last digits 0, that generate an output string of zeroes. This number plays an important role in our problem, similar to prime numbers in number theory, so we call the $\tilde{d}(\ell)$ prime degeneracies. Suppose that the output contains $\mu_\ell$ strings of zeroes of length $\ell$, $\ell = 1, 2, ...$, where
$$\sum_{\ell} \mu_\ell = m, \qquad \sum_{\ell} \ell\,\mu_\ell = n - m.$$
Then Eq. (7) may be rewritten
$$d = \prod_{\ell} \tilde{d}(\ell)^{\mu_\ell}$$
for $m \geq 1$. Let us find the prime degeneracies explicitly. The number of these configurations, i.e., the degeneracy $\tilde{d}(\ell)$ of the output string of zeroes, can be obtained recursively by taking into account three points:
(i) Relevant input configurations of length $\ell$ are obtained by inserting 0 or 1 into each relevant configuration of length $\ell - 1$ between the first and second positions of the sequence. (Recall that the first and last positions of the input sequence are fixed to 0.)
(ii) Input strings of length $\ell$ beginning and/or ending with 010 are irrelevant, and so they should be removed from the set generated at the previous step. These configurations can be obtained by inserting two digits 10 into each relevant input string of length $\ell - 2$ between its first and second positions.
(iii) Finally, there exist input strings, compatible with the output string of zeroes, that cannot be obtained by inserting a single digit into relevant input strings of length $\ell - 1$ between their first and second positions. These are the input strings of length $\ell$ beginning with 0110. These inputs can be obtained by inserting 110 into each relevant input string of length $\ell - 3$ between its first and second positions.

FIG. 3. (a,c,e) Degeneracy distribution for the complete input data set: number of outputs of a given degeneracy vs. degeneracy. (b,d,f) Cumulative degeneracy distribution for the complete input data set: number of outputs of degeneracy greater than or equal to a given degeneracy vs. degeneracy. The input length is n = 21, 50, 120. Open square symbols represent a mean-field approximation to the cumulative distribution, given by Eq. (38).
Following these rules, the degeneracy of a string of zeroes at the output, the prime degeneracy $\tilde{d}(\ell)$, can be written recursively as a linear difference equation:
$$\tilde{d}(\ell) = 2\tilde{d}(\ell-1) - \tilde{d}(\ell-2) + \tilde{d}(\ell-3), \tag{10}$$
with the initial condition $\tilde{d}(1) = \tilde{d}(2) = \tilde{d}(3) = 1$. The solution of Eq. (10) may be explicitly expressed in terms of the complex roots $z_1$, $z_2$, and $z_3$ of the characteristic equation
$$z^3 = 2z^2 - z + 1, \tag{11}$$
giving
$$\tilde{d}(\ell) = A_1 z_1^{\ell} + A_2 z_2^{\ell} + A_3 z_3^{\ell}, \tag{12}$$
where
$$A_j = \frac{1}{3z_j - 1}, \qquad j = 1, 2, 3. \tag{13}$$
One of the roots of Eq. (11), $z_1$, say, is real,
$$z_1 = 1.754877... \tag{14}$$
It determines the large $\ell$ asymptotics of the prime degeneracies $\tilde{d}(\ell)$:
$$\tilde{d}(\ell) \cong \frac{z_1^{\ell}}{3z_1 - 1} = \frac{z_1^{\ell+1}}{z_1^4 - 2}, \tag{15}$$
where we used the identity $z_1^4 - 2 = z_1(3z_1 - 1)$, which follows from Eq. (11). The two other roots are complex conjugate numbers,
$$z_{2,3} = 0.12256... \pm 0.74486...\,i, \qquad |z_2| = |z_3| = 0.754877... \tag{16}$$
The special case of the periodic output of length $n$ with all digits 0 has to be considered separately. First let us take an arbitrary digit of the input. The number of input configurations where this digit is 0 and the resulting output has only zeroes is given by $\tilde{d}(n+1)$, because the fixed 0 of the periodic input plays the role of first and last digit of the configurations of a string of $n+1$ digits. If the chosen digit is 1, then the number of the relevant input configurations equals $1 + \sum_{i=2}^{n-1} i\,\tilde{d}(n-i)$, where the sum over $i$ accounts for the configurations where the digit is in a group of $i$ consecutive ones, plus one configuration with all input digits equal to 1. Consequently, we obtain the following expression for the degeneracy of the output with all zeroes:
$$d_D(n) = \tilde{d}(n+1) + 1 + \sum_{i=2}^{n-1} i\,\tilde{d}(n-i), \tag{17}$$
which is the largest possible degeneracy of an output of a given length. Applying the recursion relation for prime degeneracies $\tilde{d}$, Eq. (10), to the terms on the right-hand side of Eq. (17), we find that the largest degeneracy $d_D(n)$ satisfies the same difference equation as Eq. (10), though with a different initial condition, see, e.g., Ref. [24] and the On-Line Encyclopedia of Integer Sequences [25]. We present this equation here for future reference,
$$d_D(n) = 2d_D(n-1) - d_D(n-2) + d_D(n-3), \tag{18}$$
with the initial condition $d_D(3) = 5$, $d_D(4) = 10$, and $d_D(5) = 17$. With this initial condition, we get the explicit solution of this equation,
$$d_D(n) = z_1^n + z_2^n + z_3^n, \tag{19}$$
where $z_1 \equiv z_d$, $z_2$, $z_3$ are given by Eqs. (14) and (16), and its large $n$ asymptotics
$$d_D(n) \cong z_d^n. \tag{20}$$

FIG. 4. [...] The black curves represent least-squares fittings of $\ln N_{cum}(d, n)$ as $\ln N^*_{cum}(n) + B_n \ln^{\alpha_n} d$ for each $n$. (b) Cumulative degeneracy distribution $\ln\{-\ln[N_{cum}(d, n)/N^*_{cum}(n)]\}$ vs. $\ln \ln d$ for n = 20, 40, 60, 80, 100, 120. Inset: exponent $\alpha$ vs. $1/n$.
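The recursion for the prime degeneracies and the expression for the largest degeneracy derived in this section can be checked numerically. The sketch below assumes the three insertion rules (i)-(iii) translate into "twice the previous term, minus the term two steps back, plus the term three steps back", and compares the result with direct enumeration for short strings; all function names are illustrative.

```python
from itertools import product


def prime_degeneracies(lmax):
    """Return a list d with d[l] = prime degeneracy of a gap of l zeroes (l = 1..lmax)."""
    d = [0, 1, 1, 1]  # d[0] is a placeholder; d(1) = d(2) = d(3) = 1
    for l in range(4, lmax + 1):
        d.append(2 * d[l - 1] - d[l - 2] + d[l - 3])
    return d[: lmax + 1]


def brute_prime_degeneracy(l):
    """Count strings of length l with first and last digit 0 and no isolated 1."""
    count = 0
    for mid in product((0, 1), repeat=max(l - 2, 0)):
        s = (0,) + mid + (0,) if l >= 2 else (0,)
        has_isolated_one = any(
            s[i - 1] == 0 and s[i] == 1 and s[i + 1] == 0
            for i in range(1, len(s) - 1)
        )
        if not has_isolated_one:
            count += 1
    return count


def largest_degeneracy(n):
    """Number of periodic inputs of length n whose output is all zeroes."""
    d = prime_degeneracies(n + 1)
    return d[n + 1] + 1 + sum(i * d[n - i] for i in range(2, n))


if __name__ == "__main__":
    d = prime_degeneracies(12)
    print(d[1:])
    print([brute_prime_degeneracy(l) for l in range(1, 13)])
    print([largest_degeneracy(n) for n in range(3, 10)])
```

The ratio of successive prime degeneracies approaches the real root $1.754877...$ quoted in Sec. VII, which gives an independent check of the recursion.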

V. CALCULATING THE EXACT DEGENERACY SPECTRUM
In principle, for each of the $2^n$ possible inputs we can obtain, one by one, an output numerically by applying Eq. (1). In practice, we use a more efficient algorithm described below. This algorithm focuses on outputs with a fixed number of ones and exploits the factorization of the output degeneracies, see Eq. (7).
The full list of degeneracies can be found from integer partitions in an explicit form, as follows. Let us introduce the operator $P$, which generates all integer partitions of a positive integer $k$ into $r$ integers, that is, $P(k, r)$ is the matrix whose rows $i = 1, 2, ..., w$ are all different integer partitions $\{P_{i1}(k, r), P_{i2}(k, r), ..., P_{ij}(k, r), ..., P_{ir}(k, r)\}$ of $k$ into $r$ integers, $\sum_{j=1}^{r} P_{ij}(k, r) = k$ [26,27]. For an output of length $n$, we consider all partitions of $n - m$ into $m$ integers, for all possible $m = 1, 2, ..., [n/2]$. For each such partition [i.e., a row of $P(n-m, m)$], we find the degeneracy $d_i = \prod_j \tilde{d}(P_{ij}(n-m, m))$ [see Eq. (7)]. Some of them coincide, so we take their union. Finally, we add the largest degeneracy $d_D(n)$ (corresponding to $m = 0$) to the resulting set. In summary, for the full set of degeneracies $D_{full}(n)$, we have
$$D_{full}(n) = \{d_D(n)\} \cup \bigcup_{m=1}^{[n/2]} \bigcup_{i} \Big\{ \prod_{j=1}^{m} \tilde{d}\big(P_{ij}(n-m, m)\big) \Big\},$$
with $\tilde{d}(\ell)$ and $d_D(n)$ provided by Eqs. (10) and (17), respectively.
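For each integer partition, i.e., for each row $i$ of the matrix $P(n-m, m)$, we introduce a function giving the number of pieces of length $\ell$ present in this partition,
$$\mu_\ell(i) = \sum_{j=1}^{m} \delta\big(P_{ij}(n-m, m), \ell\big),$$
so that
$$\sum_{\ell} \mu_\ell(i) = m, \qquad \sum_{\ell} \ell\,\mu_\ell(i) = n - m.$$
One can then write
$$d_i = \prod_{\ell} \tilde{d}(\ell)^{\mu_\ell(i)}.$$
The number of outputs that contain $m$ chains of zeroes with lengths specified by the integer partition $P_i(n-m, m)$ is then obtained by considering the number of distinct permutations of the $m$ strings of zeroes, multiplying by $n$, and finally dividing by $m$, giving
$$\frac{n}{m}\,\frac{m!}{\prod_{\ell} \mu_\ell(i)!},$$
where the product in the denominator is over the lengths of the parts of this partition. The total number of outputs with degeneracy $d$ is then finally:
$$N(d, n) = \delta\big(d, d_D(n)\big) + \sum_{m=1}^{[n/2]} \sum_{i} \frac{n}{m}\,\frac{m!}{\prod_{\ell} \mu_\ell(i)!}\,\delta(d, d_i), \tag{26}$$
where $\delta(a, b)$ is the Kronecker symbol. This is the expression we use for computing $N(d, n)$ in an efficient way.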
For the sake of brevity, hereafter we refer to N (d, n) as a degeneracy distribution, without normalization. The distribution N (d, n) can also be built up recursively starting from small values of d and n, as we show in Appendix A 2.
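The partition-based procedure of this section can be sketched as follows. This is an illustrative implementation under the assumptions stated in the text (each partition of $n - m$ into $m$ parts contributes $n/m$ times the number of distinct orderings of its parts), not the authors' code; `partitions_into`, `degeneracy_distribution`, and `brute_force_distribution` are hypothetical names, and the brute-force routine is included only as a cross-check for small $n$.

```python
from collections import Counter
from itertools import product
from math import factorial, prod


def partitions_into(k, r, max_part=None):
    """Yield partitions of k into exactly r positive parts, in non-increasing order."""
    if max_part is None:
        max_part = k
    if r == 0:
        if k == 0:
            yield ()
        return
    for p in range(min(max_part, k - (r - 1)), 0, -1):
        if p * r < k:  # remaining parts (each <= p) cannot reach k
            break
        for rest in partitions_into(k - p, r - 1, p):
            yield (p,) + rest


def degeneracy_distribution(n):
    """Return Counter {d: N(d, n)} for the complete input set on a ring of length n."""
    d = [0, 1, 1, 1]  # prime degeneracies, d[0] unused
    for l in range(4, n + 2):
        d.append(2 * d[l - 1] - d[l - 2] + d[l - 3])
    dist = Counter()
    # m = 0: the all-zero output carries the largest degeneracy
    dist[d[n + 1] + 1 + sum(i * d[n - i] for i in range(2, n))] += 1
    for m in range(1, n // 2 + 1):
        for part in partitions_into(n - m, m):
            mu = Counter(part)  # mu[l] = number of gaps of length l
            degeneracy = prod(d[l] ** c for l, c in mu.items())
            orderings = factorial(m)
            for c in mu.values():
                orderings //= factorial(c)
            dist[degeneracy] += n * orderings // m
    return dist


def brute_force_distribution(n):
    """Same distribution by enumerating all 2**n inputs (small n only)."""
    outputs = Counter(
        tuple((1 - x[i - 1]) * x[i] * (1 - x[(i + 1) % n]) for i in range(n))
        for x in product((0, 1), repeat=n)
    )
    return Counter(outputs.values())


if __name__ == "__main__":
    print(sorted(degeneracy_distribution(8).items()))
```

The partition-based routine avoids enumerating inputs, so its cost is governed by the number of integer partitions rather than by $2^n$.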

VI. OUTPUT DEGENERACY DISTRIBUTION FOR COMPLETE INPUT DATASETS
Using this algorithm we obtained the number of outputs $N(d)$ for the full spectrum of degeneracies $d$ for $n$ up to 120, see Supplementary Material [19]. These results demonstrate that the degeneracies $d_i$, $i = 1, ..., D$, form a discrete spectrum of values, where $D$ is the rank of the largest degeneracy, and $d_1 = 1$. Fig. 3 shows the resulting distribution and the corresponding cumulative distribution $N_{cum}(d)$ for n = 21, 50, and 120. Here $N_{cum}(1) = M(n)$ is the total number of outputs. Figs. 3(b), (d), and (f) demonstrate that the cumulative degeneracy distributions decay with $d$ more rapidly than a power law. On the other hand, the decay of the cumulative distribution is well described by the function
$$N_{cum}(d) \cong N_{cum}(1)\,e^{-c \ln^{\alpha} d}, \tag{27}$$
where $c$ is a positive number and the exponent $\alpha$ approaches 2.3 as $n \to \infty$, see the inset in Fig. 4(b). Notice that the degeneracy distribution for smaller $n$, Fig. 3(a), appears more like a power law than the degeneracy distribution for larger $n$, Fig. 3(c), for example, since the range of $d$ increases with $n$, while the exponent $c \ln^{\alpha-1} d$ of the function $d^{-c \ln^{\alpha-1} d}$ varies slowly. Similarly, the cumulative distribution plotted in log-log scale for n = 21 deviates from linear (power-law) behavior noticeably less than for larger $n$. The wide range of degeneracies $d$ that we observe enables us to present, in Fig. 4(b), $\ln\{-\ln[N_{cum}(d)/N_{cum}(1)]\}$ vs. $\ln \ln d$. This plot supports the functional form given in Eq. (27). Note that in Fig. 4 we assumed that the coefficient factor of the asymptotics in Eq. (27) is close to $N_{cum}(1)$, which is justified by the results. Fig. 5 shows how the set of degeneracies $D_{full}(n)$ varies with $n$, see below for more detail.
In Sec. IV we derived the explicit expression for the largest degeneracy $d_D(n)$ corresponding to the output with all zeroes, Eq. (19), and found its large $n$ asymptotics, $d_D(n) \cong z_d^n$, Eq. (20). As is natural, $N(d_D, n) = 1$. The second largest degeneracy corresponds to an output with a single 1. The third largest degeneracy is for an output with two ones separated by a single 0. The fourth largest degeneracy is of the output with two ones separated by two 0s. Clearly,
$$d_{D-1}(n) = \tilde{d}(n-1), \qquad d_{D-2}(n) = \tilde{d}(1)\,\tilde{d}(n-3), \qquad d_{D-3}(n) = \tilde{d}(2)\,\tilde{d}(n-4).$$
Using the asymptotics of $\tilde{d}(\ell)$, Eq. (15), and $\tilde{d}(1) = \tilde{d}(2) = 1$, we find that, asymptotically, at large $n$,
$$d_{D-1}(n) \cong \frac{z_d^{n}}{z_d^4 - 2}, \qquad d_{D-2}(n) \cong \frac{z_d^{n-2}}{z_d^4 - 2}, \qquad d_{D-3}(n) \cong \frac{z_d^{n-3}}{z_d^4 - 2}.$$
One may also notice a complex structure in the cumulative distributions $N_{cum}(d)$, Figs. 3(b), (d), and (f), resembling a staircase, with steep jumps between steps. The heights of these jumps are especially large in the region of high degeneracies. Similar structures may be observed in real systems, see for example Fig. 3 of Ref. [20]. Inserting the asymptotics of $\tilde{d}(\ell)$, Eq. (15), into the expression for the degeneracy, Eq. (7), we see that outputs with the same number $m$ of ones have degeneracies exponentially close to $z_d^n/(z_d^4 - 2)^m$ if these ones are separated by many zeroes and $n$ is large. The slight deviations from this asymptotic value mean that the degeneracies are split among many points falling in a narrow range. For example, in the rightmost step, corresponding to outputs with exactly two ones, the number of these outputs is approximately $n^2/2$ because there are relatively few outputs in which the two ones are close together. This is the height of this jump. These degeneracies are split into a set of about $n/2$ distinct values, corresponding to different intervals between the ones. Each one occurs in $n$ outputs corresponding to the location of the same structure in different parts of the ring. Other jumps are produced by outputs with $m$ strongly separated ones, or by outputs with, e.g., a pair of ones separated by one 0 with $m - 2$ ones far from each other and from that pair, and so on. This forms the rich staircase-like structure that we observe in Figs. 3(b), (d), and (f) and that is also reflected in Fig. 5. Note that any short filter of our sort will produce a similar complex structure in the degeneracy distribution, as the degeneracy associated with a chain of zeroes (in the output) of a given length must asymptotically grow exponentially with the length of the chain, as in Eq. (20).
We list the key numbers defining the asymptotics of a few example cases in Table I, including an example of a two-dimensional input, and give the full degeneracy distributions for selected cases in Fig. 2. It is worth discussing the meaning of these results, and their interpretation in terms of sampling of complex systems. In particular, let us consider the results for the family of filters composed of a string of ones with a zero at each end, 010, 0110, 01110, etc., indicated in boldface in the Table. We can consider the length of these filters as a crude control parameter of our sampling. Intuitively, we expect shorter filters to be more informative. Resolution is defined as the entropy of a sample,
$$H[y] = -\sum_{y} \frac{d_y}{N} \ln \frac{d_y}{N}, \tag{29}$$
where the sum runs over the distinct outputs $y$ and $d_y$ is the degeneracy of output $y$. It is a measure of the ability to distinguish, at the output, between different input states. If the sample contains $M$ distinct outputs, all with the same degeneracy, the resolution is simply $\ln M$, see Eq. (29). Notice that this is the case when all outputs are different (all degeneracies equal to 1, $M = N$), and also when all outputs are the same (degeneracy equal to $N$, $M = 1$). The inverse of the probability of a particular output gives the expected number of distinct outputs, if we assume the probabilities to be uniform. In this sense, we can regard the resolution as the expectation of the logarithm of the number of distinct states in a sample, based on the observation of one random state. Shorter filter patterns correspond to higher resolution. The resolution attains its maximum value for the filter pattern 0 or 1, either of which results in a unique output for each input (in the case of 1, the output is identical to the input). However, in this case all outputs are distinct, and so these filters are not informative about the system being sampled. As shown in [12], the measure which indicates the informativeness of a sample is the entropy of the degeneracy distribution,
$$H[d] = -\sum_{d} \frac{d\,N(d, n)}{N} \ln \frac{d\,N(d, n)}{N}, \tag{30}$$
where $N = 2^n$ for the complete input set. This is called the relevance. Comparing Eqs. (29) and (30) shows that the resolution is never smaller than the relevance,
$$H[y] \geq H[d].$$
As can be seen in the Table, the relevance is greater for shorter filters, but is actually zero for the shortest possible filters 0 and 1. (In this extreme, $z_d = 1$, while $z_g = z_a = 2$.) We measured the relevance for a variety of short filter patterns. Specifically, we find that the entropy $H[d]$ is maximal for the filter pattern extracting single ones, on which we focus in this work, and so this pattern provides the most informative representations of the inputs. The number $z_d$ gives the asymptotics of the largest degeneracies. It quickly approaches 2 as the filter pattern length increases. The largest degeneracy is $d_D \cong z_d^n$. Since $N = 2^n$, this means that almost all inputs concentrate in a few outputs, and in the limit, in a single state, i.e., all outputs are the same and the filter patterns are not informative. For the shortest few patterns, the value of $z_d$ quickly falls, accompanied by a rapidly increasing relevance, indicating a transition to informative sampling. In contrast, $z_g$, which gives the total number of outputs $M(n)$, increases with decreasing filter length, as shorter filters have more possible outputs. Taken together, these results indicate that the maximally informative sampling for a given family of filters is the shortest pattern having length greater than 1. This behavior is analogous to the transition observed in more complex problems (see for example [10]), reinforcing the relevance of our tractable model to the study of more complex problems.
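Resolution and relevance can be evaluated from any table of degeneracy counts. The sketch below assumes the standard definitions used in this section (the entropy over outputs for the resolution, and the entropy over degeneracy classes for the relevance) and uses a brute-force count for a small ring; the function names are illustrative, and the values it prints are not taken from Table I.

```python
from collections import Counter
from itertools import product
from math import log


def degeneracy_counts(n):
    """Counter {d: number of outputs with degeneracy d} for the 010 filter, by brute force."""
    outputs = Counter(
        tuple((1 - x[i - 1]) * x[i] * (1 - x[(i + 1) % n]) for i in range(n))
        for x in product((0, 1), repeat=n)
    )
    return Counter(outputs.values())


def resolution(counts, N):
    """Entropy over outputs: each of the m outputs of degeneracy d has probability d/N."""
    return -sum(m * (d / N) * log(d / N) for d, m in counts.items())


def relevance(counts, N):
    """Entropy over degeneracy classes: class d has total probability d*m/N."""
    return -sum((d * m / N) * log(d * m / N) for d, m in counts.items())


if __name__ == "__main__":
    n = 12
    counts = degeneracy_counts(n)
    print("resolution:", resolution(counts, 2 ** n))
    print("relevance: ", relevance(counts, 2 ** n))
```

For the identity filter, where every output has degeneracy 1, the same functions give a resolution of $\ln N$ and a relevance of zero, in line with the discussion above.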
Importantly, the sets of outputs obtained with the larger filters are subsets of the set of the outputs obtained with our reference filter pattern 010. For this pattern, the value $z_g$, which gives the asymptotics of the number of different outputs, is maximal, and the number of different outputs is the largest one. (Notice that the zero-pattern in Table I has the same $z_g$ as the reference pattern but a lower entropy $H[d]$.) Some filters have the property that two occurrences of the pattern cannot partially overlap. All such filter patterns of a given length give an identical degeneracy distribution, as we show in the table. We suggest that the filtering problem with such non-overlapping [...]

[...] $N(1, n)$, which is the number of configurations having only groups of zeroes of length 1, 2, and 3 between single ones. We find the large $n$ asymptotics for this number:
$$N(1, n) \cong z_a^n,$$
where $z_a = 1.46557...$ is the positive real root of the characteristic equation $z^4 = z^2 + z + 1$.
Note that there are three key constants in this problem, $z_g$, $z_d$, and $z_a$, which appear in the main asymptotics: $M(n) \cong z_g^n$, $d_D(n) \cong z_d^n$, and $N(1, n) \cong z_a^n$. Indeed, $z_g$ enters the distribution $N(d, n)/M(n)$ after normalization, $z_d$ enters the asymptotics for degeneracies, see, e.g., Eq. [...] Fig. 6, see also the Supplementary Material [19]. As is natural, $D(n)$ is smaller than the number $p(n)$ of integer partitions of $n$. Fig. 6 demonstrates that $D(n)$ is close to $p(n)/n$ for large $n$. The well known large $n$ asymptotics $p(n) \cong \frac{1}{4\sqrt{3}\,n}\,e^{\pi\sqrt{2n/3}}$ [28][29][30] enables us to estimate $D(n) \sim e^{\pi\sqrt{2n/3}}$.
By ranking the full set of degeneracies $D_{full}(n)$ for n = 120 we arrive at Fig. 7, where the main plot presents the number of different degeneracies lower than or equal to $d$ vs. $d$, increasing roughly as $\exp[\pi\sqrt{2\ln d/(3\ln z_d)}]$ in the region $1 \ll d \ll d_D$, and the inset shows the number of different degeneracies higher than or equal to $d$ vs. $d$. The latter demonstrates a staircase-like structure similar to that of $N_{cum}(d)$.
In Fig. 7 we also plot the number of different degeneracies lower than or equal to $d$ in the infinite system, i.e., the rank of each degeneracy $d$. To obtain the full set of different degeneracies for $n \to \infty$ up to some $d_{max}$ we generate the set of prime degeneracies $\tilde{d} \leq d_{max}$. We start by listing all powers $m \geq 1$ of the first prime degeneracy $\tilde{d}$ while $\tilde{d}^m \leq d_{max}$. Then, we multiply each member of that list by increasing powers of the next prime degeneracy, while the product stays lower than or equal to $d_{max}$, and so on with all the remaining prime degeneracies $\tilde{d} \leq d_{max}$. This procedure will result in duplicate degeneracies that should be removed, particularly for those values of $d$ that have non-unique multiplicative partitions in terms of the prime degeneracies.
For example, the two smallest (larger than 1) prime degeneracies are $\tilde{d}(4) = 2$ and $\tilde{d}(5) = 4$, so a multiplicative partition of $d$ with at least one part equal to $\tilde{d}(5) = 4$ is not unique, because there is at least one other partition of $d$ where the contribution of the 4 is given in terms of $\tilde{d}(4)$ as $2^2$. However, apart from the partitions with parts equal to 4, the non-unique partitions are very rare, see Appendix A 2. In fact $\tilde{d}(5) = 4$ is the only prime degeneracy $\tilde{d}(\ell)$ that can be expressed as a product of lower $\tilde{d}$ for all $\ell \leq 5000$.
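The procedure for listing the distinct degeneracies of the infinite system up to a cutoff can be sketched as below. It is an illustrative reading of the steps described above (multiply by increasing powers of each prime degeneracy in turn and discard duplicates), with the hypothetical name `distinct_degeneracies`; the trivial degeneracy 1 is included in the result, and a set is used so that duplicates such as $4 = 2^2$ are removed automatically.

```python
def distinct_degeneracies(d_max):
    """Sorted list of all products of prime degeneracies that do not exceed d_max."""
    # Prime degeneracies larger than 1, generated by the three-term recursion
    primes = []
    a, b, c = 1, 1, 1  # prime degeneracies of gaps of length 1, 2, 3
    while True:
        a, b, c = b, c, 2 * c - b + a  # advance to the next gap length
        if c > d_max:
            break
        primes.append(c)

    values = {1}
    for p in primes:
        new_values = set()
        for v in values:
            w = v * p
            while w <= d_max:  # increasing powers of p while below the cutoff
                new_values.add(w)
                w *= p
        values |= new_values  # the set union removes duplicate products
    return sorted(values)


if __name__ == "__main__":
    print(distinct_degeneracies(20))
```

The rank of a degeneracy $d$ in the infinite system is then its position in this sorted list.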
The rank of the degeneracies in the infinite system can be estimated with precision for large $d$, as shown by the black line in Fig. 7. For large $\ell$ the logarithms of prime degeneracies are uniformly spaced, $\ln \tilde{d}(\ell) \approx \ell \ln z_d$. Assuming that all degeneracies $d$ have a unique factorization in terms of the prime degeneracies $\tilde{d}$, the expected number of different degeneracies smaller than $d \sim z_d^m$ would be $\text{rank}\, d \approx \sum_{n \leq m} p(n)$, where $p(n)$ is the number of integer partitions of $n$. The vast majority of the degeneracies for which the assumption does not hold are the values of $d$ which are multiples of 4, because that factor 4 can also be expressed as $2^2$. We therefore remove from each term of the previous sum the number of integer partitions that have at least one factor equal to 2, which is given by $p(n-2)$, and write for large $d$: [...] where we have used the two leading terms of the asymptotic expansion of the number of integer partitions [28], $p(n) \approx$ [...]

TABLE I. The constants $z_g$, $z_d$, and $z_a$ give the asymptotics of, respectively, the total number of outputs $M(n) \cong z_g^n$, the largest degeneracy $d_D(n) \cong z_d^n$, and the number of outputs with degeneracy one, $N(1, n) \cong z_a^n$, for large $n$. The final row corresponds to a simple filter, as illustrated, applied to 2D input (see also Fig. 2). Note that we also included filter patterns consisting of all zeroes. For each filter we also give the relevance $H[d]$ (in nats) calculated from the degeneracy distribution and the resolution $H[y]$ for inputs of size n = 36. For the sake of comparison, the standard entropy of the inputs of this size is $H = 36 \ln 2 = 24.95330$. Finally we include the number of distinct degeneracies $D$ for each pattern.

VII. MEAN-FIELD THEORY
Here we develop a mean-field theory enabling us to describe the cumulative distribution of degeneracies, $N_{\rm cum}(d)$, in the region of large $d$ where there are few ones in the outputs, and so the gaps of zeroes between them are typically large. In this situation one can assume that the ones exist in a sea (or a mean field) of zeroes, far from each other, so that the degeneracy of an output is completely determined by the number of its ones (and the output size $n$).
This ansatz is based on the observation that the three terms on the right-hand side of Eq. (12) behave very differently for increasing $\ell$: The first term grows exponentially as $z_d^{\ell}$, where $z_d \equiv z_1 = 1.754877\ldots$ is real. The combined contribution of the other two terms is also real, since $z_2 = z_3^{*}$ (here $*$ denotes the complex conjugate). It decays exponentially because $|z_2| = |z_3| = 0.754877\ldots < 1$. Thus, for increasing $\ell$, the deviation of the asymptotics from the exact value of $\tilde{d}(\ell)$ approaches 0 exponentially rapidly, making this approximation excellent for large $\ell$.
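The rapid convergence to the exponential asymptotics can be checked numerically. The sketch below rests on an assumption: the recurrence $\tilde{d}(\ell) = 2\tilde{d}(\ell-1) - \tilde{d}(\ell-2) + \tilde{d}(\ell-3)$ with $\tilde{d}(1) = \tilde{d}(2) = \tilde{d}(3) = 1$ is inferred from the quoted roots and from the values of $\tilde{d}(\ell)$ quoted in the text; it is not copied from Eq. (12). The function name is hypothetical.

```python
def prime_degeneracies(l_max):
    """Return a list d with d[l] = d~(l) for 1 <= l <= l_max (d[0] is unused).

    Assumed recurrence (inferred, not taken from Eq. (12)):
        d~(l) = 2 d~(l-1) - d~(l-2) + d~(l-3),  d~(1) = d~(2) = d~(3) = 1.
    """
    d = [None, 1, 1, 1]
    for l in range(4, l_max + 1):
        d.append(2 * d[l - 1] - d[l - 2] + d[l - 3])
    return d


if __name__ == "__main__":
    z_d = 1.754877  # value quoted in the text
    d = prime_degeneracies(60)
    for l in (5, 10, 20, 40):
        # The ratio of successive values should approach z_d as l grows.
        print(l, d[l], d[l + 1] / d[l] - z_d)
```

The printed differences shrink quickly with $\ell$, in line with the exponentially decaying contribution of the two complex-conjugate roots.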
Consider an output with $m$ ones, separated by $m$ strings of zeroes of lengths $\ell_1, \ell_2, \ldots, \ell_m$. If all $\ell_i \gg 1$, the degeneracy $d(m)$ of this output with $m$ ones is accurately given by the asymptotic expression where we used the condition $\sum_{i=1}^{m} \ell_i = n - m$ that the number of ones plus the number of zeroes must be equal to the total number of digits $n$.
The total number of different outputs with $m$ ones is $N(m) = \frac{n}{n-m}\binom{n-m}{m}$. There are jumps in the cumulative distribution at these points when $n$ is large, and $N_{\rm cum}(m)$ approaches the top points of these jumps.
The resulting mean-field theory expression for the distribution takes the following form: This approximation is compared with the exact cumulative distribution in Fig. 3. The approximation can also be obtained from the exact result for $N(d, n)$, Eq. (26), by approximating the weighted sum $\sum_i$ of Kronecker symbols in the expression for $N(d, n)$ by a single Kronecker symbol with a factor. One may also obtain the asymptotics of this distribution, as shown in Appendix C.

VIII. DEGENERACY DISTRIBUTIONS OF OUTPUTS FOR RANDOMLY GENERATED INPUT DATASETS
We observed in the previous section that the outputs generated by complete input data sets do not produce real power laws. One should note, however, that in empirical studies, input data sets, typically, are not complete. Usually, when inputs (input size $n$) are sufficiently large, the input data set sizes, $N$, are much smaller than those of complete data sets, $N \ll C^{n}$, $C > 1$. Based on the sets of outputs of complete data sets, which we obtained in the previous section (listed in the Supplementary Material [19]), here we find and explore the distribution of outputs of randomly generated data sets of various sizes. (In principle, these sizes can also be bigger than or equal to that of complete data sets.)

Let the size of a randomly generated input data set be $N$, i.e., we apply filtering to $N$ randomly generated rings of $n$ zeroes and ones. We assume that all degeneracies of the outputs of the corresponding complete input data set are known, namely the full set of pairs $\{d_i, N(d_i)\}$, where $i = 1, 2, \ldots, D$, and $D$ is the total number of degeneracies for this $n$ in the case of a complete input data set. Then for the randomly generated input data set we obtain the following expected number of outputs of degeneracy $d$:
$$\langle N(d) \rangle_N = \sum_{i=1}^{D} N(d_i) \binom{N}{d} \left(\frac{d_i}{2^{n}}\right)^{d} \left(1 - \frac{d_i}{2^{n}}\right)^{N-d}. \qquad (39)$$
The probability that a randomly generated input string produces a given output is simply $d_i/2^{n}$, where $d_i$ is the degeneracy of the output with respect to the total input set. The probability that $d$ of the $N$ inputs produce this output is then simply given by a binomial expression. Summing over all total degeneracies (multiplying by the number of instances of a given degeneracy $d_i$ and summing over $i$) gives the above expression. Here outputs of any degeneracy within the interval $1 \le d \le d_D$ are present with nonzero probability, in contrast to the case of the complete input data set. The results of the application of this formula coincide with those obtained by recording statistics of outputs directly filtered from randomly generated inputs.

Let us apply Eq. (39), for instance, to the cases $n = 21$ [see Fig. 3(a) above for the complete input data set consisting of all $2^{21}$ configurations] and $n = 120$, and inspect the distributions of outputs of uniformly randomly generated input sets of different sizes. The results are shown in Fig. 8. For each size of the random input data set we present the degeneracy distribution and its cumulative counterpart. Interestingly, the form of these distributions is reminiscent of samples from real systems, see for example Fig. 5 of Ref. [20]. These figures demonstrate that for $N \ll 2^{n}$, the distributions indeed resemble a power law. The reason is that for sufficiently small $N$, the distribution does not approach the large values of $d$ for which the variation of the exponent $c \ln^{\alpha-1} d$ [see Eq. (27)] becomes noticeable. As $N$ approaches $2^{n}$, the distributions become closer to their counterparts for the complete input data set. Clearly, the distributions obtained in the limit $N \to \infty$ will coincide with those found for the complete input data set.
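The binomial argument behind Eq. (39) can be sketched in a few lines. This is an illustrative implementation under stated assumptions: inputs are drawn independently and uniformly (with replacement), the function name is hypothetical, and the spectrum used in the example is a made-up toy spectrum (chosen only so that $\sum_i d_i N(d_i) = 2^{n}$), not data from the Supplementary Material.

```python
from math import comb


def expected_outputs(spectrum, n, N, d):
    """Expected number of outputs produced by exactly d of N random inputs.

    spectrum: iterable of (d_i, N_i) pairs for the complete input set of
              2**n strings, where N_i outputs have degeneracy d_i.
    """
    total = 0.0
    for d_i, count in spectrum:
        q = d_i / 2 ** n                      # chance one input hits a given output
        total += count * comb(N, d) * q ** d * (1 - q) ** (N - d)
    return total


if __name__ == "__main__":
    toy_spectrum = [(1, 2), (2, 3)]           # hypothetical: 2*1 + 3*2 = 2**3
    n, N = 3, 5
    for d in range(1, N + 1):
        print(d, expected_outputs(toy_spectrum, n, N, d))
```

A simple consistency check is that $\sum_d d\,\langle N(d)\rangle_N = N$, because every sampled input lands on exactly one output.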
Curiously, one may obtain distributions with a very similar form for different values of n by choosing N in order to maintain the scaling variable

IX. DISCUSSIONS AND CONCLUSIONS
Our straightforward, purely combinatorial treatment reveals features of distributions of outputs hidden from other approaches. For complete input data sets passed through our filter, we have obtained degeneracy distributions markedly distinct from power laws. On the other hand, these distributions decay as $N_{\rm cum}(d) \propto e^{-c \ln^{\alpha} d}$, $\alpha > 1$, much slower than exponentially, and in this sense they can still be called "critical". We have observed that the entire form of these output distributions essentially depends on the input size $n$, which strongly differs, for example, from heavy-tailed degree distributions of complex networks having exponential cutoffs [31,32].
For randomly generated input data sets, we found degeneracy distributions which could easily be taken for power laws in empirical studies, if the data set size $N$ is essentially smaller than $2^{n}$. As $N \to \infty$, these distributions approach the clearly non-power-law shape of the distributions for the complete input data set. Thus we show that the size of an input data set matters for these problems.

FIG. 8. Double logarithmic scale plots of (a,c,e,g) the degeneracy distributions and (b,d,f,h) the cumulative degeneracy distributions obtained for randomly generated input data sets of different sizes $N$ for $n = 21$ and $n = 120$. The specific sizes of input data sets for $n = 120$ are chosen to produce distributions similar to those for $n = 21$.
Our model filter can be used as a convenient reference filtering problem. Although apparently rather different in construction, we argue that our problem can serve as a tractable representative model for sampling of complex systems. We focused on simple input sets which were uniformly random strings of zeroes and ones. Correlated inputs are more challenging for analytical treatment. Exploring the simplest filter patterns, we showed that the statistics of outputs is determined not by the form of filter patterns but rather by what occurs in the gaps of zeroes between them. The degeneracy corresponding to each such gap can be found using recursion relations. We then used an integer partitions apparatus to aggregate the statistics of prime degeneracies from these gaps, finding the exact full spectrum of output degeneracies.
Considering the permutations of the integer partitions allows us to calculate the resulting exact degeneracy distribution $N(d, n)$. Alternatively, using multiplicative partitions we derive coupled linear recursions providing the exact $N(d, n)$ for any finite $d$ and $n$ and the explicit large-$n$ asymptotics. Finally, we developed a mean-field theory which describes the approximate degeneracy spectrum, degeneracy distribution, and their asymptotic behavior. Our mean-field theory of Sec. VII also derives from the gaps between outputs.
These results show that in filter problems of this kind, the statistics of outputs is determined by the gaps between outputs, which are essentially determined by the size of a filter pattern. Longer filter patterns than ours, Eq. (1), can be subjected to a similar analysis, see Fig. 2 and Table I. Longer filter patterns, which give less information about the input string, indeed produce less informative degeneracy distributions (i.e., having lower entropy).
For the sake of simplicity, we studied inputs containing only zeroes and ones. We expect that our approach and the mean-field theory should be applicable to richer inputs containing more degrees of freedom: larger sets of numbers, vectors, etc., as well as to more general cooperative systems with a large number of local minima. The next natural step after the mean-field theory, namely a fluctuation theory, should be based on accounting for small gaps between ones and the fluctuations of the lengths of these gaps.
In summary, we have presented a representative filtering problem for which we can obtain exact and complete results for the degeneracy distributions. Our methods may be generalized to other filtering and compression problems involving more complex filter patterns and complex, not necessarily synthetic, higher-dimensional inputs. We believe the qualitative insights gained from studying this system can be applied to information processing problems in general.
The total number $M(n)$ of different outputs in our problem is the number of all possible periodic (period $n$) combinations of zeroes and ones having no neighboring ones. Clearly, this number coincides with the total number of different combinations of dimers in a ring of length $n$.
To find the number of different outputs with k ones, that is, the number of ways of placing k dimers in the ring, it is convenient to consider the cases when the first output digit is 1 and when it is 0 separately. This is equivalent to fixing the state of two chosen neighboring units: either there is a dimer connecting this pair or this dimer is absent.
(i) When the first digit of an output is 1, both the second and the last digits must be 0, and the number of outputs in this case is given by the binomial coefficient $\binom{n-k-1}{k-1}$, which corresponds to starting with the sequence $1, \{0\}^{n-k}$ (where $\{0\}^{n-k}$ is a sequence of $n - k$ zeroes), and choosing $k - 1$ out of $n - k - 1$ distinct zeroes to be replaced by 01. Due to the periodic boundary condition, the last 0 cannot be selected for replacement, hence the number of zeroes that can be replaced is $n - k - 1$.
(ii) When the first digit of an output is 0, we start from the sequence $\{0\}^{n-k}$ and replace $k$ zeroes by 01. In this case the replacement may be made at any of the $n - k$ zeroes, and the number of outputs is $\binom{n-k}{k}$. The total number of different outputs is obtained by summing these two contributions
$$N(k) = \binom{n-k-1}{k-1} + \binom{n-k}{k} = \frac{n}{n-k}\binom{n-k}{k}, \qquad \text{(A1)}$$
where we used the notation $N(k)$ from Sec. VII.
Summing over all possible values of $k$ gives $M(n) = \sum_{k} N(k)$, i.e., Eq. (4).
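The counting argument of cases (i) and (ii) can be checked by brute force for small rings. The sketch below is illustrative only (function names are hypothetical): it enumerates all binary rings of length $n$ with $k$ ones and no two cyclically neighboring ones, and compares the count with the sum of the two binomial coefficients.

```python
from itertools import product
from math import comb


def ring_outputs_with_k_ones(n, k):
    """Brute-force count of length-n rings with k ones, no neighboring ones."""
    count = 0
    for bits in product((0, 1), repeat=n):
        if sum(bits) != k:
            continue
        # Periodic boundary: position n-1 is a neighbor of position 0.
        if any(bits[i] and bits[(i + 1) % n] for i in range(n)):
            continue
        count += 1
    return count


def outputs_with_k_ones(n, k):
    """Case (i) plus case (ii): first digit 1, or first digit 0."""
    first_is_one = comb(n - k - 1, k - 1) if k >= 1 else 0
    first_is_zero = comb(n - k, k)
    return first_is_one + first_is_zero


if __name__ == "__main__":
    n = 10
    for k in range(n // 2 + 1):
        print(k, ring_outputs_with_k_ones(n, k), outputs_with_k_ones(n, k))
```

Summing the second column over $k$ gives the total number of outputs $M(n)$ for that ring size.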

Recursions for N (d, n)
Chains of exactly three zeroes in the output have no degrees of freedom in the input; they can only be produced in one way. Thus outputs containing no chains of zeroes of length greater than three have degeneracy 1. For the input size $n$, the number of such outputs, $N(1, n)$, can be obtained recursively. Here we derive the recursive relations for $N(1, n)$, $N(2, n)$, and $N(4, n)$, and indicate the recursions for higher degeneracy $d$. See also Supplementary Material [19].
Sequences of degeneracy 1 can be regarded as assortments of three kinds of building blocks, 01, 001, and 0001, put together in a ring of length $n$. All configurations of blocks are allowed, as long as the total number of binary digits is $n$. Let us consider a particular position $i$ in the ring and the block to which $i$ belongs. If we add a block 01 between the block of $i$ and one of its neighbor blocks (say, the one to the right) to every possible configuration of length $n - 2$, we get every possible configuration of length $n$ that has a block 01 to the right of the block of $i$. Doing the same with configurations of length $n - 3$ and blocks 001, we get all configurations with a block 001 to the right of the block of $i$. Finally, repeating the procedure for configurations of length $n - 4$ and blocks 0001 gives all configurations with a block 0001 to the right of the block of $i$. Since every block must be 01, 001, or 0001, the union of these three sets is the full set of configurations of degeneracy 1 in a system of $n$ digits. Thus, for the number of configurations in this set, $N(1, n)$, we can write
$$N(1, n) = N(1, n-2) + N(1, n-3) + N(1, n-4). \qquad \text{(A3)}$$
The explicit solution of this linear difference equation is given in terms of the roots, $z_1$, $z_2$, $z_3$, and $z_4$, of the characteristic equation $z^4 = z^2 + z + 1$: where the coefficients of the powers of the roots $z_i$ are found through the initial condition,

We can expand the analysis to higher values of degeneracy. For example, the second smallest value, $d_2 = 2$, corresponds to the outputs with blocks of the types 01, 001, and 0001 and a single string 00001. Applying the same reasoning as above, we get three terms similar to the ones of Eq. (A3). We must, additionally, consider configurations with degeneracy 1, to which we add a string 00001 in the same way, instead of one of the other blocks:
$$N(2, n) = N(2, n-2) + N(2, n-3) + N(2, n-4) + N(1, n-5),$$
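The block-insertion argument can be tested against direct enumeration. The following sketch makes two assumptions that are not taken from the paper: the seed values for small $n$ are obtained by brute force rather than from the initial condition of Eq. (A5), and the degeneracy-2 recursion is written as the three terms of Eq. (A3) plus a term $N(1, n-5)$ for inserting the string 00001. Function names are hypothetical.

```python
from itertools import product


def count_rings(n, long_gaps):
    """Brute-force count of length-n rings whose gaps of zeroes have length
    1, 2, or 3, except for exactly `long_gaps` gaps of length 4."""
    count = 0
    for bits in product((0, 1), repeat=n):
        ones = [i for i, b in enumerate(bits) if b]
        if not ones:
            continue
        m = len(ones)
        # Cyclic gap after each one; a single one leaves a gap of n - 1.
        gaps = [(ones[(j + 1) % m] - ones[j] - 1) % n for j in range(m)]
        if all(1 <= g <= 4 for g in gaps) and gaps.count(4) == long_gaps:
            count += 1
    return count


def by_recursion(n_max, n_seed=9):
    """N(1, n) and N(2, n) from brute-force seeds plus the linear recursions."""
    n1 = {n: count_rings(n, 0) for n in range(1, n_seed + 1)}
    n2 = {n: count_rings(n, 1) for n in range(1, n_seed + 1)}
    for n in range(n_seed + 1, n_max + 1):
        n1[n] = n1[n - 2] + n1[n - 3] + n1[n - 4]
        n2[n] = n2[n - 2] + n2[n - 3] + n2[n - 4] + n1[n - 5]
    return n1, n2


if __name__ == "__main__":
    n1, n2 = by_recursion(12)
    for n in range(10, 13):
        print(n, n1[n], count_rings(n, 0), n2[n], count_rings(n, 1))
```

The recursion costs a constant number of additions per size, whereas the enumeration grows as $2^{n}$, which is why the brute force is used only for seeding and spot checks.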
and so on. See the list of all recursive relations for d < 100 with initial conditions in the Supplementary Material [19]. The corresponding large n asymptotics are and so on. See the list of the resulting large n asymptotics of N (d<100, n) in the Supplementary Material [19].
More generally, for any value of degeneracy $d$ one can write the recursion relation for $N(d, n)$ in terms of the multiplicative partition of $d$ into prime degeneracies. Let us present the full set of recursion relations for $N(d, n)$. The reasoning is as follows.
(i) The recursion relation for $N(1, n)$ is given by Eq. (A3) with the initial condition Eq. (A5).
(iv) Any degeneracy in the spectrum except $d_D$ (and $d = 1$) is multiplicatively separable into the prime degeneracies $\tilde{d}(\ell)$, namely,
$$d = \prod_{\ell} \tilde{d}(\ell)^{\mu_\ell}, \qquad \text{(A11)}$$
where the powers $\mu_\ell \equiv \mu(\ell)$ are non-negative integers. Due to the exception $\tilde{d}(5) = \tilde{d}^{2}(4) = 2^2$ and the coincidence $\tilde{d}(5)\tilde{d}(8) = 4 \times 21 = 7 \times 12 = \tilde{d}(6)\tilde{d}(7)$, etc., the multiplicative partition into the prime degeneracies $\tilde{d}(\ell)$, Eq. (A11), may not be unique, see below. With all possible integers $\mu(4) \ge 0$, $\mu(6) \ge 0$, $\mu(7) \ge 0, \ldots$ we generate the full list $\mathcal{D}$ of all degeneracies except 1 and $d_D$ in the spectrum for $n \to \infty$. We place the generated degeneracies in ascending order, ignoring repetitions for some degeneracies. For a finite $n$, only some of the degeneracies from this list are present in the spectrum. In practice, we need degeneracies $d$ only up to some, possibly large but finite, value. So, generating this list, we use a finite set of non-negative integers $\mu_4, \mu_6, \mu_7, \ldots$, each restricted from above.
(v) For any degeneracy $d$ in this list $\mathcal{D}$ we define the vector $L(d) \equiv (\ell_1, \ell_2, \ldots, \ell_{\max})$, indicating that the prime degeneracies $(\tilde{d}(\ell_1), \tilde{d}(\ell_2), \ldots, \tilde{d}(\ell_{\max}))$, given in ascending order, are present in at least one of the partitions of $d$, Eq. (A11). That is, if a prime degeneracy $\tilde{d}(\ell)$ is present in $L(d)$, then the ratio $d/\tilde{d}(\ell)$ also belongs to $\mathcal{D}$. Then the recursion relation for $N(d, n)$ has the form where in the sum $\ell$ takes the values $\ell_1, \ell_2, \ldots, \ell_{\max}$, i.e., the components of the vector $L(d)$. This formula sums the number of configurations $N(d/\tilde{d}(\ell), n - \ell - 1)$ of smaller systems to which a block of $\ell$ zeros (followed by a one) can be added in order to form a configuration of size $n$ and degeneracy $d$. We should not explicitly include in this sum configurations that achieve degeneracy $d$ by inserting more than one block of zeros. The insertions of additional blocks, besides a block of length $\ell$, say, into smaller configurations are already accounted for in the calculation of the degeneracy of the configuration of size $n - \ell - 1$ into which the block of length $\ell$ will be inserted. Note that the ratio $d/\tilde{d}(\ell)$ in each of the terms in the sum is one of the degeneracies, smaller than $d$, in the list $\mathcal{D}$ with the degeneracy $d = 1$ added, so this set of recursions together with the recursion for $N(1, n)$, Eq. (A3), is closed. The initial conditions for these recursions are where $\delta(i, j)$ is the Kronecker symbol. Note that when $d$ is a prime degeneracy, $d = \tilde{d}(\ell) = \tilde{d}(\ell_{\max})$, except $d = 4 = \tilde{d}(5)$

(vii) Our recursions, Eq. (A12), with their initial conditions, Eq. (A13), lead to the following large-$n$ asymptotics of $N(d, n)$: for $d$ with the multiplicative partition into prime degeneracies, Eq. (A12), chosen in such a way that the power of 2, $\mu_4$, in this partition is maximal.
Let us discuss this point in more detail. We stressed above that a multiplicative partition of $d \in \mathcal{D}$, Eq. (A12), may not be unique. For large $n$, each of the possible multiplicative partitions of $d$ contributes to $N(d, n)$ with its own power of $n$. The leading asymptotics, Eq. (A14), is the contribution with the maximal power of $n$, i.e., it originates from the partition with the maximal sum $\sum_\ell \mu_\ell$. Let us check whether the multiplicative partition of $d$ with the maximal sum $\sum_\ell \mu_\ell$ is unique and whether this maximum corresponds to the maximal $\mu_4$. In other words, we check that, if there exist several different multiplicative partitions of $d$, only one of them has the maximal $\mu_4$, and that this partition has the maximal sum $\sum_\ell \mu_\ell$.

To explain why the property of primality is very likely to hold for all large enough $\tilde{d}(\ell)$, let us consider the number of conventional prime numbers smaller than some number $d$, which grows as $\sim d/\ln d$. Since $\tilde{d}(\ell) \sim z_d^{\ell}$, the number of conventional primes smaller than $\tilde{d}(\ell)$ grows exponentially with $\ell$ as $\sim z_d^{\ell}/\ell$. Additionally, the average number of conventional primes in the factorization of numbers of the magnitude of $d$ grows as $\sim \ln \ln d$ [33]. A necessary condition for some $\tilde{d}(\ell)$ to not be prime is that all of its prime factors are also factors of at least one other $\tilde{d} < \tilde{d}(\ell)$. The combination of the exponential increase of conventional primes smaller than $\tilde{d}(\ell)$, and the increase of the number of factors of $\tilde{d}(\ell)$ as $\ell$ increases, makes the probability of the degeneracy $\tilde{d}(\ell)$ not being prime approach 0 very rapidly. In the set of values $\tilde{d}(\ell \le 200)$, the only ones that do not contain at least one conventional prime factor absent from all factorizations of smaller $\tilde{d}$ are $\tilde{d}(5) = 4$, $\tilde{d}(8) = 21$, $\tilde{d}(12) = 200$, $\tilde{d}(13) = 351$, and $\tilde{d}(24) = 170\,625$, and of these only $\tilde{d}(5)$ is actually not a prime degeneracy.
We inspected all products of prime degeneracies $\tilde{d}(\ell \le 200)$, i.e., up to $\tilde{d}(200) \approx 1.7 \times 10^{48}$, focusing on the products producing non-unique multiplicative partitions. Apart from $2^2 = 4$ discussed above, we find only three such combinations with $\ell \le 200$ (in this number, we do not include the products of these combinations and arbitrary prime degeneracies). Namely, $2^2 \times 21 = 7 \times 12$, $2^9 \times 200 \times 351^2 = 12^6 \times 65^2$, and $2^2 \times 12^2 \times 170625 = 7 \times 200^2 \times 351$. For each of these combinations we confirm that the left side of the equality corresponds to the maximal $\sum_\ell \mu_\ell$ and that this partition is unique. Clearly, the same is true for the products of these combinations and arbitrary prime degeneracies. Thus partitions contributing to the leading asymptotics of $N(d, n)$ are unique, and the gauge is fixed by demanding $\mu_4 = \max$, which leads to Eq. (A14). For grasping the form of Eq. (A14), see also Eqs. (A10), especially the asymptotics of $N(84, n)$.
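The quoted coincidences are easy to verify with exact integer arithmetic. As a hedged sketch, the code below generates $\tilde{d}(\ell)$ from an assumed recurrence, $\tilde{d}(\ell) = 2\tilde{d}(\ell-1) - \tilde{d}(\ell-2) + \tilde{d}(\ell-3)$ with $\tilde{d}(1) = \tilde{d}(2) = \tilde{d}(3) = 1$, which is inferred from the values quoted in the text rather than taken from the paper's equations, and then tests each identity. The function names are hypothetical.

```python
def prime_degeneracies(l_max):
    """d[l] = d~(l) for 1 <= l <= l_max, from the assumed (inferred) recurrence."""
    d = [None, 1, 1, 1]
    for l in range(4, l_max + 1):
        d.append(2 * d[l - 1] - d[l - 2] + d[l - 3])
    return d


def coincidences(d):
    """Return (label, holds) pairs for the non-unique products quoted in the text."""
    return [
        ("2^2 = 4", d[4] ** 2 == d[5]),
        ("2^2 * 21 = 7 * 12", d[4] ** 2 * d[8] == d[6] * d[7]),
        ("2^9 * 200 * 351^2 = 12^6 * 65^2",
         d[4] ** 9 * d[12] * d[13] ** 2 == d[7] ** 6 * d[10] ** 2),
        ("2^2 * 12^2 * 170625 = 7 * 200^2 * 351",
         d[4] ** 2 * d[7] ** 2 * d[24] == d[6] * d[12] ** 2 * d[13]),
    ]


if __name__ == "__main__":
    for label, holds in coincidences(prime_degeneracies(24)):
        print(label, holds)
```

In each identity the left-hand side carries the larger total exponent, consistent with the statement that the partition with maximal $\mu_4$ also has the maximal sum of powers.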
Thus we have the chain of conveniently coupled linear recursion relations that one can easily process starting from $d = 1$. These recursions generate exact $N(d, n)$ for finite $d$ and $n$, and in this sense provide the exact solution of the problem. Moreover, the leading large-$n$ asymptotics of $N(d, n)$ are found explicitly. Note that the time of computing all $N(d < d_0, n)$, where $d_0$ is fixed, by using our recursions, is proportional to $n$, i.e., the computation for each next size takes the same time.

Table S1 contains the complete data on the spectrum of degeneracy of outputs for complete input data sets of length $n$ up to $n = 21$, namely all pairs $\{d_i, N_i\}$, $i = 1, \ldots, D$, where $d_i$ is the $i$-th value of degeneracy and $N_i$ is the corresponding number of outputs having this degeneracy. This data for $n = 21$ is presented in graphical form in Fig. 3(a,b). Table S2 presents the total number of different values of degeneracy $D$ for each input length $n \le 120$.

TABLE S1. Spectrum of degeneracy produced by complete input data sets with input strings of length $n$, i.e., the pairs $\{d_i, N_i\}$, $i = 1, \ldots, D$, where $d_i$ is the $i$-th value of degeneracy and $N_i$ is the corresponding number of outputs having this degeneracy, $D$ is the total number of different values of degeneracy.