Resolution of Nested Neuronal Representations Can Be Exponential in the Number of Neurons

Collective computation is typically polynomial in the number of computational elements, such as transistors or neurons, whether one considers the storage capacity of a memory device or the number of floating-point operations per second of a CPU. However, we show here that the capacity of a computational network to resolve real-valued signals of arbitrary dimensions can be exponential in N, even if the individual elements are noisy and unreliable. Nested, modular codes that achieve such high resolutions mirror the properties of grid cells in vertebrates, which underlie spatial navigation.

The brain encodes many stimulus variables, such as the position of an object, the orientation of an edge, or the frequency of a sound, with high precision. Multidimensional stimulus spaces are represented by joint activity across neurons, each of which fires noisy, unreliable spikes. Yet, despite the stochastic nature of neuronal discharge, the nervous system achieves a highly efficient representation of the outside world [1].
The simplest representation of a continuous stimulus variable is a one-to-one map onto a neuronal firing rate. Given unreliable spikes, however, a labeled line code across different neurons is more robust and efficient [2]. The place code for spatial position in the hippocampus is an instance of such a labeled line code. One drawback of such a code is that the resolution only scales linearly in the number of neurons [2][3][4].
A place code can be improved upon by using a cascade of self-similar, periodic representations at different scales, as depicted in Fig. 1. Each successive level refines the representation at the previous coarser scale, such that the overall resolution scales exponentially in the number of neurons, as we show in this Letter.
Neuronal coding of sensory information at multiple scales occurs in many brain areas [5] and arises naturally in the theory of sparse coding [6]; we show here how the dense coding at multiple scales found in the entorhinal cortex and related areas, where each neuron has multiple firing fields, can be highly efficient, even though the firing rate of a single neuron no longer maps onto a single stimulus, but to many possible stimuli.
We consider $N$ statistically independent neurons encoding a compact but possibly high-dimensional stimulus space, normalized to $[0, 1]^D$. For simplicity, each stimulus is assumed to be equally likely. Each neuron's response is characterized by the number of spikes $k_i$ emitted within a time $\tau$ after stimulus onset. The neuron's mean firing rate depends on the stimulus $x$ through its tuning curve $\Omega_i(x)$.
The number of spikes is stochastic, so that observing a response $K = (k_1, \ldots, k_N)$ across the population of neurons has a probability
$$P(K \mid x) = \prod_{i=1}^{N} P_i\big(k_i \mid \tau\,\Omega_i(x)\big). \qquad (1)$$
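As a minimal sketch of this independent-spiking model, the snippet below draws spike counts for a small one-dimensional population. Poisson noise and the periodic, von Mises-shaped tuning curve are assumptions borrowed from the concrete example given later in this Letter; the function names and parameter values (`von_mises_rates`, `sample_response`, `f_max = 20`, and so on) are purely illustrative.

```python
import numpy as np

def von_mises_rates(x, centers, sigma=0.3, lam=1.0, f_max=20.0):
    """Mean firing rates of periodic tuning curves with period lam (assumed form)."""
    phase = 2.0 * np.pi * (x - centers) / lam
    return f_max * np.exp((np.cos(phase) - 1.0) / sigma**2)

def sample_response(x, centers, tau=1.0, rng=None, **tuning):
    """Draw independent Poisson spike counts k_i with means tau * rate_i(x)."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.poisson(tau * von_mises_rates(x, centers, **tuning))

def log_likelihood(K, x, centers, tau=1.0, **tuning):
    """log P(K | x), dropping the x-independent term -sum(log k_i!)."""
    mean = tau * von_mises_rates(x, centers, **tuning)
    # Independence turns the product over neurons into a sum of logs.
    return float(np.sum(K * np.log(mean) - mean))

# Hypothetical population: 16 neurons with evenly spaced centers on [0, 1).
centers = np.linspace(0.0, 1.0, 16, endpoint=False)
K = sample_response(0.37, centers, rng=np.random.default_rng(0))
```

Because the neurons are independent, the log-likelihood is a plain sum over neurons, which is what later makes the Fisher information additive.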
Published by the American Physical Society under the terms of the Creative Commons Attribution 3.0 License.Further distribution of this work must maintain attribution to the author(s) and the published article's title, journal citation, and DOI.
0031-9007/12/109(1)/018103(5)
We ask how accurately an ideal observer can deduce the stimulus $x$ from $K$. For this purpose, we use the Fisher information matrix $J$ with components
$$J_{\alpha\beta}(x) = \left\langle \frac{\partial \ln P(K \mid x)}{\partial x_\alpha}\, \frac{\partial \ln P(K \mid x)}{\partial x_\beta} \right\rangle_{P(K \mid x)}, \qquad (2)$$
where $\alpha, \beta \in \{1, \ldots, D\}$. The Cramér-Rao inequality [7] relates the inverse of $J$ to the covariance matrix $\Sigma$ of any unbiased estimator $\hat{x}$ of the encoded stimulus $x$,
$$\Sigma(\hat{x} \mid x) \geq J(x)^{-1}. \qquad (3)$$
Let us first consider a population of neurons with tuning curves $\Omega_i(x)$ such that each has a single peak of width $\sigma$. Suppose the tuning curves differ only in the center $c_i$ of the peak, so that $\Omega_i(x) = \Omega(x - c_i)$. A single tuning curve has Fisher information $J_i(x)$. For the stochastic population model in Eq. (1), the Fisher information $J_\sigma(x)$ for the population is simply the sum $J_\sigma(x) = \sum_{i=1}^{N} J_i(x)$, which for many neurons can be written as an integral over the centers, weighted by the density of the centers. Assume that the centers $c_i$ are uniformly distributed in $[0, 1]^D$, so that the tuning curves cover the entire stimulus space. As the centers become increasingly dense with increasing $N$, the Fisher information $J_\sigma(x)$ becomes independent of the specific stimulus $x$ and scales linearly in the number of neurons, […] (4), for some function $K$ [3,8,9]. If the stimulus space were not compact, but instead encompassed all of $\mathbb{R}^D$, then […].
The linear scaling of Eq. (4) in $N$ can be dramatically improved by switching to a nested modular code, in which each module consists of a subpopulation of $M$ periodic tuning curves, as in Fig. 1. Each module is associated with a unique spatial period.
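The linear scaling of a single population can be checked numerically. The sketch below assumes Poisson noise, for which the Fisher information of independent neurons in one dimension is the standard expression $\tau \sum_i \Omega_i'(x)^2 / \Omega_i(x)$, and it assumes a von Mises-shaped tuning curve on a ring of period 1 rather than an aperiodic place code. The function name `population_fisher` and all parameter values are hypothetical.

```python
import numpy as np

def population_fisher(x, n_neurons, sigma=0.3, lam=1.0, f_max=20.0, tau=1.0):
    """Fisher information at stimulus x for N independent Poisson neurons (1-D)."""
    centers = lam * np.arange(n_neurons) / n_neurons   # evenly spaced centers
    phase = 2.0 * np.pi * (x - centers) / lam
    rate = f_max * np.exp((np.cos(phase) - 1.0) / sigma**2)
    # Derivative of each tuning curve with respect to the stimulus x.
    d_rate = -rate * (2.0 * np.pi / lam) * np.sin(phase) / sigma**2
    # Independent neurons: the single-neuron contributions simply add up.
    return tau * np.sum(d_rate**2 / rate)

for n in (50, 100, 200):
    print(n, population_fisher(0.37, n))
```

Doubling the number of neurons doubles the printed value, and for dense centers the result no longer depends on where `x` lies.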
We first compute the Fisher information for a single module that has periodic tuning curves, such that $M = N$. Any unimodal tuning curve $\Omega(x)$ on $[0, 1]^D$ can be periodically extended. Let $\Gamma \subset \mathbb{R}^D$ be a nondegenerate, affine point lattice [10] such that $(v_\alpha)_{1 \le \alpha \le D}$ is a basis for $\mathbb{R}^D$ and $u$ a center. Let $U \subset \mathbb{R}^D$ be a fundamental domain of this lattice. Then there is a canonical coordinate transformation $\Phi: U \to [0, 1)^D$ in terms of an invertible matrix $T$ and a vector $w$, such that $\Phi(x) = -w + Tx$. One defines the periodic extension of $\Omega$ as […]. This definition is illustrated in Fig. 1(b) and is independent of the particular representation (or translational shift) for $U$.
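To make the construction concrete, here is a minimal sketch of a periodic extension on a rectangular lattice. It assumes that the extension amounts to evaluating the unimodal curve at $\Phi(x \bmod \Gamma)$, with lattice center $u = 0$, $w = 0$, and $T$ equal to the identity divided by the period; the Gaussian bump and the names `base_tuning` and `periodic_extension` are illustrative choices, not taken from the Letter.

```python
import numpy as np

def base_tuning(y, sigma=0.15, f_max=1.0):
    """Unimodal bump on the unit cube [0, 1)^D, peaked at the cube's center."""
    y = np.atleast_2d(np.asarray(y, dtype=float))
    return f_max * np.exp(-np.sum((y - 0.5) ** 2, axis=-1) / (2.0 * sigma**2))

def periodic_extension(x, lam):
    """Evaluate the periodically extended tuning curve at points x of shape (n, D)."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    # Reduce x modulo the lattice, then map the cell [0, lam)^D onto [0, 1)^D.
    y = np.mod(x, lam) / lam
    return base_tuning(y)

points = np.array([[0.10, 0.20], [0.60, 0.70], [0.25, 0.25]])
values = periodic_extension(points, lam=0.5)
print(values)
```

The first two points differ by one lattice vector and therefore give the same firing rate, which is exactly the ambiguity a single module cannot resolve.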
A family of shifted, periodic tuning curves on the lattice $\Gamma$ constitutes a module and is associated with Fisher information $J_{\sigma,\Gamma}$. We now relate $J_{\sigma,\Gamma}$ to the original $J_\sigma$. Under the inverse map $\Phi^{-1}$, the transformed tuning curves have centers $c'$, where […]. By changing the variables in the Fisher information [7] and using the periodicity of $\Phi((x + \varphi) \bmod \Gamma)$ on $U$, we get […]. In the last step, we interchanged the multidimensional integration and matrix multiplication, as $T$ and its transpose $T'$ are independent of $\varphi$. Under the map $\Phi$, we obtain […]. Thus, we have derived the following rule: $J_{\sigma,\Gamma}(x) = T J_\sigma(x) T'$. For an orthogonal lattice $\Gamma$ defined by $v_\alpha = \lambda_\alpha e_\alpha$ on the canonical basis $(e_\alpha)_{1 \le \alpha \le D}$ of $\mathbb{R}^D$, each entry $(\alpha, \beta)$ in the Fisher information matrix is rescaled by $(\lambda_\alpha \lambda_\beta)^{-1}$ under this transformation. If the original Fisher information matrix is diagonal, then so is $J_{\sigma,\Gamma}(x)$. Therefore, rescaling the periodic tuning curves by a factor of 2 quadruples the Fisher information, but at the cost of introducing ambiguity: the value of $x$ can only be recovered modulo the lattice $\Gamma$. Resolving this ambiguity requires a multiscale representation consisting of multiple modules spanning different spatial periods.
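The transformation rule can be illustrated with a few lines of linear algebra. This sketch assumes an orthogonal lattice, so that $T$ is diagonal with entries equal to the inverse periods; the helper name `module_fisher` and the example matrices are hypothetical.

```python
import numpy as np

def module_fisher(J, periods):
    """Apply J_module = T J T' for an orthogonal lattice, with T = diag(1 / period)."""
    T = np.diag(1.0 / np.asarray(periods, dtype=float))
    return T @ np.asarray(J, dtype=float) @ T.T

# Entry (a, b) of J is divided by period_a * period_b.
J = np.array([[2.0, 1.0], [1.0, 2.0]])
print(module_fisher(J, [0.5, 0.25]))

# Halving the period in every dimension quadruples a diagonal Fisher information.
print(module_fisher(3.0 * np.eye(2), [0.5, 0.5]))
```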
How should these different spatial periods be chosen? Suppose there are $L$ modules with spatial periods $\{\lambda_1, \ldots, \lambda_L\}$, arranged in decreasing order from the longest period to the shortest, as in Fig. 1. Each module has $M = N/L$ tuning curves, and each tuning curve within a module is associated with a different phase shift.
The easiest case to analyze is the one in which the lattices $\Gamma_k$ are orthogonal [11], the rescaling is uniform in each dimension, and the tuning curve is radially symmetric. In such a case, a module on lattice $\Gamma_k$ has a Fisher information matrix $J_{\sigma,\Gamma_k}(x) = \lambda_k^{-2} J_\sigma$, and radial symmetry implies that $J_\sigma = J I$ is diagonal and proportional to the identity matrix $I$ [3]. To ensure that the first module represents $x \in [0, 1]^D$ unambiguously, we treat this module as a special case and make it aperiodic; i.e., the first module is a place code with the same tuning width that a periodic module with $\lambda = 1$ would have.
Within the first module, the expected error in an unbiased estimate of $x$ asymptotically approaches $1/\sqrt{J}$, according to Eq. (3). This error sets a lower bound on the period of the next module that refines the representation of $x$. Hence, each $\lambda_k$ should obey […] (8), where $C$ is a safety factor, such that $1 \ll C < \sqrt{J}$. For a $C$ larger than unity, the next module can correct for the error in the previous module. The Fisher information for the nested population is […]. If we take only the last term in the series and substitute $L = N/M$, we see that […]. For fixed module size $M$, the Fisher information scales exponentially in the number of neurons $N$. Such a coding scheme, therefore, outperforms a single module that only scales linearly in $N$. Note that this scaling is independent of the dimension $D$.
As a concrete example, we consider a set of tuning curves with Poisson noise, centers $c_i$, tuning width $\sigma$, and period $\lambda$, as given by $\Omega_i(x) = f_{\max} \exp\!\left( \frac{1}{\sigma^2} \sum_{\alpha=1}^{D} \{\cos[2\pi (x_\alpha - c_{i,\alpha})/\lambda] - 1\} \right)$ [12]. If the number of neurons per period is $M \gg 1$ and the centers are uniformly distributed, the module's average Fisher information is given by […], where $K_n(x) = \exp(-1/x)\, I_n(-1/x)$ and $I_n(x)$ is the $n$th order modified Bessel function of the first kind. In Fig. 2(a), the Fisher information for a large population of $N = 300\,000$ (place) cells is plotted for stimulus dimensions $D = 1$ to $10$. Dividing the population into separate grid modules according to Eq. (10) with $C = 20$ leads to a Fisher information that is orders of magnitude higher, irrespective of the dimension $D$ of the stimulus space.
To corroborate these analytical results, we sampled the response $K$ and estimated the minimum mean square error based on the posterior probability distribution $p(x \mid K)$ [13], as shown in Figs. 2(c) and 2(d). These simulations show that the Cramér-Rao bound is tight as long as the grid codes obey the constraint in Eq. (8). For place codes, it is known that on short time scales [14], or for low numbers of neurons [9], the Cramér-Rao bound will not be tight, so that the Fisher information underestimates the error in decoding the signal. The same will hold true for grid codes. However, when $N > M \gg 1$ and the expected number of spikes at the center of each tuning curve is appreciable, a nested modular code leads to an error that scales as $M^{-N/M}$.
Note that we assume that the firing of neurons is uncorrelated. Whether this assumption holds in cortex is a matter of fierce debate [15]. The Fisher information deteriorates with increasing noise correlations, but its scaling in $N$ does not, at least not for the correlation strengths measured by Ecker et al. in cortex.
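The contrast between nested and place codes can be sketched numerically. The code below rests on stated assumptions rather than on the Letter's exact equations: the first period is 1, each following period is taken as $C$ times the previous module's expected error, i.e. $\lambda_k = C \lambda_{k-1} / \sqrt{J}$ with equality, each module contributes $\lambda_k^{-2} J$, and a place code with the same $L \cdot M$ neurons is assigned $L \cdot J$ on account of linear scaling. All function names are hypothetical.

```python
import math

def nested_periods(J, C, L):
    """Periods of L nested modules, assuming lambda_k = C * lambda_{k-1} / sqrt(J)."""
    periods = [1.0]                      # the coarsest module spans the unit interval
    for _ in range(L - 1):
        periods.append(C * periods[-1] / math.sqrt(J))
    return periods

def nested_fisher(J, C, L):
    """Total Fisher information: module k contributes J / lambda_k**2."""
    return sum(J / lam**2 for lam in nested_periods(J, C, L))

def place_fisher(J, L):
    """Place code with L * M neurons, assuming linear scaling in neuron number."""
    return L * J

# Illustrative values: per-module Fisher information J = 1e4, safety factor C = 20.
for L in (1, 2, 3, 4):
    print(L, place_fisher(1e4, L), nested_fisher(1e4, 20.0, L))
```

Under these assumptions each added module multiplies the dominant term by $J / C^2$, whereas the place code only gains another additive $J$.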
Periodic tuning curves have been found in the entorhinal cortex of rodents, in neurons termed grid cells ([16]; see Supplemental Material [17]). This unexpected discovery has inspired theorists to explore the combinatorial capacity of modular periodic codes and how they might be used in the brain [18]. In some cases, the stimulus space is intrinsically periodic; orientation and color hue are but two examples. But when the space of stimulus $x$ is infinite instead of periodic, different spatial periods can be combined to encode a much larger range of $x$ uniquely than would otherwise be possible [19]. Indeed, the exponential range that can result confers a relative precision that is also exponential [20]. This Letter, in contrast, shows that the absolute precision in $x$ can be exponential in $N$. Precision is of paramount importance for path integration, for which the mammalian brain is thought to use grid cells [21]. Interestingly, the periodic lattices for neighboring grid cells share similar spatial periods and orientations, but are spatially translated relative to each other [16]. Moreover, along the dorso-ventral axis of the entorhinal cortex, the typical spatial period of the lattice grows from roughly 20 cm to several meters, while the ratio of grid field width to spatial period remains constant [22]. Our theoretical analysis indicates that these grid cell properties may endow the brain with a highly accurate representation of space; the same principles might be used for representing other continuous, high-dimensional stimuli.

FIG. 1 (color online). Example of nested modules. (a) All modules, except for the coarsest one, have periodic tuning curves $\Omega_i(x - c_i)$. A module consists of a set of tuning curves with the same period but different phases $c_i$. The spatial periods for modules 2 and 3 are $\lambda_2 = 0.45$ and $\lambda_3 = 0.3$, respectively. In each module, we highlight a single tuning curve by a solid line to show the period. Shifted but otherwise identical tuning curves are dashed. Nested modules successively refine the representation of the stimulus. Periodicity implies that the map from stimulus to population response is not one-to-one within a single module. Only the ensemble response provides a unique representation of $x$. (b) A unimodal tuning curve in two dimensions, shown at the top, can be rescaled and periodically extended using Eqs. (5) and (6). The periodic tuning curve $\Omega_\Gamma$ in the lower panel is based on a rectangular lattice $\Gamma$ spanned by $v_1 = \lambda (1, 0)'$ and $v_2 = \lambda (0, 1)'$, with $\lambda = 1/2$. For this lattice, a fundamental domain $U$ is depicted.

FIG. 2 (color online). (a) Grid codes (GC) outperform place codes (PC), regardless of the number of stimulus dimensions. A population of $N = 3 \times 10^5$ neurons is divided into one, two, or three modules, according to Eq. (8). The neurons' tuning width is fixed as $\sigma = 2$. (b) The Fisher information for a neuronal population with $M = 10^5$ neurons per module and $D = 3$. For a nested modular code with $L$ modules, the Fisher information grows exponentially in $N = L \cdot M$, whereas it is linear in $N$ for a place code. (c) The error of the estimator that minimizes $(x - \hat{x})^2$ for place and grid codes in $D = 3$ dimensions, based on sampling the stochastic response 1200 times. Each module comprises $M = 8^3$ equidistantly spaced cells. The lattice lengths for different modules are scaled according to Eq. (8), with a safety factor $C = 20$. The Cramér-Rao bound of Eq. (10) is tight for both the grid and place codes. (d) When the lattice lengths contract more strongly than allowed by Eq. (8), such that $C = 1$, the error fails to improve in a nested modular code. Although the Fisher information predicts an error even lower than in (c), the uncertainty derived from the first module's response is larger than the lattice length scale of the next finer-grained module, so that adding modules with finer spacing does not improve the resolution.

Figure 2(b) underscores the key finding of this Letter: the Fisher information grows exponentially in the number $N$ of encoding neurons. The place code, in contrast, is linear in $N$.