Geometric framework to predict structure from function in neural networks

Neural computation in biological and artificial networks relies on the nonlinear summation of many inputs. The structural connectivity matrix of synaptic weights between neurons is a critical determinant of overall network function, but quantitative links between neural network structure and function are complex and subtle. For example, many networks can give rise to similar functional responses, and the same network can function differently depending on context. Whether certain patterns of synaptic connectivity are required to generate specific network-level computations is largely unknown. Here we introduce a geometric framework for identifying synaptic connections required by steady-state responses in recurrent networks of threshold-linear neurons. Assuming that the number of specified response patterns does not exceed the number of input synapses, we analytically calculate the solution space of all feedforward and recurrent connectivity matrices that can generate the specified responses from the network inputs. A generalization accounting for noise further reveals that the solution space geometry can undergo topological transitions as the allowed error increases, which could provide insight into both neuroscience and machine learning. We ultimately use this geometric characterization to derive certainty conditions guaranteeing a nonzero synapse between neurons. Our theoretical framework could thus be applied to neural activity data to make rigorous anatomical predictions that follow generally from the model architecture.


I. INTRODUCTION
Structure-function relationships are fundamental to biology [1][2][3]. In neural networks, the structure of synaptic connectivity critically shapes the functional responses of neurons [4,5], and large-scale techniques for measuring neural network structure and function provide exciting opportunities for examining this link quantitatively [6][7][8][9][10][11][12][13][14][15]. The ellipsoid body in the central complex of Drosophila is a beautiful example where modeling showed how the structural pattern of excitatory and inhibitory connections enables a persistent representation of heading direction [16][17][18][19]. Lucid structure-function links have also been found in several other neural networks [20][21][22][23]. However, it is generally hard to predict either neural network structure or function from the other [5,24]. For example, functionally inferred connectivity can capture neuronal response correlations without matching structural connectivity [25][26][27][28], and network simulations with structural constraints do not automatically reproduce function [29][30][31]. Two broad modeling difficulties hinder the establishment of robust structurefunction links. First, models with too much detail are difficult to adequately constrain and analyze. Second, models with too little detail may poorly match biological mechanisms, the model mismatch problem. Here we propose a rigorous theoretical framework that attempts to balance these competing factors to predict components of network structure required for function.
Neural network function probably does not depend on the exact strength of every synapse. Indeed, multiple network connectivity structures can generate the same functional responses [32,33], as illustrated by structural variability across individual animals [24,34] and artificial neural networks [29,[35][36][37]. Such redundancy may be a general feature of emergent phenomena in physics, biology, and neuroscience [38][39][40]. Nevertheless, some important details may be consistent despite this variability, and here we find well-constrained structure-function links by characterizing all connectivity structures that are consistent with the desired functional responses [24]. We also account for ambiguities caused by measurement noise. Our goal is not to find degenerate networks that perform equivalently in all possible scenarios. We instead seek a framework that finds connectivity required for specific functional responses, independently of whatever else the network might do.
The model mismatch problem has at least two facets. First, neurons and synapses are incredibly complex [41][42][43][44], but which complexities are needed to elucidate specific structure-function relationships is unclear [5,45,46]. This issue is very hard to address in full generality, and here we seek a theoretical framework that makes clear experimental predictions that can adjudicate candidate models empirically. In particular, we predict neural network structure only when it occurs in all networks generating the functional responses. This high bar precludes the analysis of biophysically-detailed network models, which require numerical exploration of the connectivity space that is typically incomplete [24,32,[47][48][49]. We instead focus on recurrent firing rate networks of threshold-linear neurons, which are growing in popularity because they strike an appealing balance between biological realism, computational power, and mathematical tractability [12,16,18,20,22,23,29,30,37,[50][51][52][53][54][55].
The second facet of the model mismatch problem is hidden variables, such as missing neurons, neuromodulator levels, and physiological states [5,[56][57][58]. Here we take inspiration from whole-brain imaging in small organisms [15], such as Caenorhabditis elegans [9], larval zebrafish [8,12,57], and larval Drosophila [11], and assume access to all relevant neurons. Our model neglects neuromodulators and other state variables, which would be interesting to consider in the future. Furthermore, many experiments indirectly assess neuronal spiking activity, such as by calcium florescence [58][59][60][61] or hemodynamic responses [25,[62][63][64]. We restrict our analysis to steady-state responses to mitigate mismatch between fast firing rate changes and these inherently slow measurement techniques.
Our analysis begins with an analytical characterization of synaptic weight matrices that realize specified steady-state responses as fixed points of neural network dynamics [Figs. 1(a) and 1(b)]. A key insight is that asymmetrically constrained dimensions appear as a consequence of the threshold nonlinearity. Synaptic weight components in these semiconstrained dimensions are completely uncertain in one half of the dimension but well-constrained in the other. We then compute error surfaces by finding weight matrices with fixed points near the desired ones. This error landscape has a continuum of local and global minima, and constant-error surfaces exhibit topological transitions that add semiconstrained dimensions as the error increases. This may help explain the importance of weight initialization in machine learning, as poorly initialized models can get stuck in semiconstrained dimensions that abruptly vanish at nonzero error. By studying the geometric structure of the neural network ensemble that can approximate the functional responses, we derive analytical formulas that pinpoint a subset of connections, which we term certain synapses, that must exist for the model to work [ Fig. 1(c)]. These analytical results are especially useful for studying high-dimensional synaptic weight spaces that are otherwise intractable. Since the presence of a synapse is readily measurable, our theory generates accessible experimental predictions [ Fig. 1(c)]. Tests of these predictions assess the utility of the modeling framework itself, as the predictions hold across model parameters. Their successes and failures can thus move us forward toward identifying the mechanistic principles governing how neural networks implement brain computations.
The rest of the paper begins in Sec. II with a toy problem that concretely demonstrates the approach illustrated in Fig. 1 and relates the geometry of the solution space (all synaptic weight matrices that realize a given set of response patterns) to the concept of a certain synapse. In Sec. III, we explain how the solution space for a limited number of response patterns can be calculated for an arbitrarily large threshold-linear recurrent neural network. Section IV is devoted to three simple toy problems that provide additional insights into how the geometry of the solution space can help us to identify certain synapses. This is followed by Sec. V, where we explain and numerically test the precise algebraic relation that must be satisfied for a synapse to be certain when the response patterns are orthonormal. Section VI generalizes our analyses to include noise, including numerical tests via simulation. Finally, Sec. VII concludes the paper by summarizing our main results and discussing important future directions.

II. AN ILLUSTRATIVE TOY PROBLEM
To gain intuition on how robust structure-function links can be established, including the effects of nonlinearity, we begin by analyzing the structural implications of functional responses in a very simple threshold-linear feedforward network [ Fig. 2(a)]. We assume that two input neurons, x 1 and x 2 , provide signals to a single driven neuron, y, via synaptic weights, w 1 and w 2 . The weights are unknown, and we constrain their possible values using two neuronal response patterns, labeled μ = + and μ = −. We suppose that steady-state activities of the input neurons and driven neuron are nonlinearly related according to y = Φ w 1 x 1 + w 2 x 2 , (1) where x 1 , x 2 , and y denote firing rates of the corresponding neurons, and Φ(s) = max(0, s) (2) is the threshold-linear transfer function. The driven neuron responds (y = 1) when x 1 = x 2 = 1 in the μ = + pattern. In contrast, the driven neuron does not respond (y = 0) when x 1 = −x 2 = 1 in the μ = − pattern. If the transfer function were linear, then it is easy to see that there is a unique set of weights, w 1 = w 2 = 1 2 , that produces these driven neuron responses, the brown dot in Fig. 2(b).
How does the nonlinearity change the solution space of weights that reproduce the driven neuron responses? To answer this question, we define two linear combinations of weights, η ± = w 1 ± w 2 , (3) which correspond to the driven neuron's input drive in patterns μ = ±. Equation (1) now yields rather simple algebraic constraints for the two patterns: The example of Fig. 2 concretely illustrates the general procedure diagramed in Fig. 1. . Section III will generalize the first two parts of this procedure to characterize the solution space of any threshold-linear recurrent neural network, assuming that the number of response patterns is at most the dimensionality of the weight vectors. Sections IV and V will then generalize the final part of this procedure to pinpoint synaptic connections that are critical for generating any specified set of orthonormal responses.

III. SOLUTION SPACE GEOMETRY A. Neural network structure and dynamics
Consider a neural network of ℐ input neurons that send signals to a recurrently connected population of driven neurons [ Fig. 3(a)]. We compactly represent the network connectivity with a matrix of synaptic weights, w im , where i = 1, …, indexes the driven neurons, and m = 1, …, + ℐ indexes presynaptic neurons from both the driven and input populations. We suppose that activity in the population of driven neurons dynamically evolves according to w im x m − D , (6) where y i is the firing rate of the ith driven neuron, x m is the firing rate of the mth input neuron, and τ i is the time constant that determines how long the ith driven neuron integrates its presynaptic signals. It is possible that prior biological knowledge dictates that certain synapses appearing in Eq. (6) are absent. For notational convenience, in this paper we will assume that the number of synapses onto each driven neuron remains the same, 2 and we will denote this number of the incoming synapses as . Note that = ℐ + for a general recurrent network, = ℐ + − 1 for recurrent networks without self-synapses, and = ℐ for feedforward networks. We suppose that the network functionally maps input patterns, x μm , to steady-state driven signals, y μi ⩾ 0, where μ = 1, …, labels the patterns [ Fig. 3 We assume throughout that ⩽ , as the number of known response patterns is typically small, and the number of possible synaptic inputs is large. Experimentally, different response patterns often correspond to different stimulus conditions, so we will often refer to μ as a stimulus index and x μm → y μi as a stimulus transformation.

B. Decomposing a recurrent network into feedforward networks
Our goal is to find features of the synaptic weight matrix that are required for the stimulus transformation discussed above. For notational simplicity, let us consider the case where we potentially have all-to-all connectivity, so that = + ℐ, but we will later explain how 2 It will become progressively evident that our construction of the solution space and certainty condition can be trivially adapted to the case where the number of presynaptic neurons changes from one driven neuron to another. our arguments generalize. Since all time-derivatives are zero at steady-state, the response properties provide × nonlinear equations for × unknown parameters 3 : w im x μ, m − D . (7) Inspection of the above equation, however, reveals that each neuron's steady-state activity depends only on a single row of the connectivity matrix [ Fig. 3(c)]; the responses of the ith driven neuron, {y μi , μ = 1, …, P}, are only affected by its incoming synaptic weights, {w im , m = 1, …, }. Thus, the above equations separate into independent sets of equations, one for each driven neuron. In other words, we now have to solve feedforward problems, each of which will characterize the incoming synaptic weights of a particular driven neuron, which we term the target neuron. Note that since a generic target neuron receives signals from both the input and the driven populations, the activities of both input and driven neurons serve to produce the presynaptic input patterns that drive the responses of the target neuron in the reduced feedforward problem.

C. Solution space for feedforward networks
We have just seen how we can solve the problem of finding synaptic weights consistent with steady-state responses of a recurrent population of neurons, provided we know how to solve the equivalent problem for feedforward networks. Accordingly, we will now focus on a feedforward network, where a single target neuron, y, receives inputs from neurons {x m ; m = 1, …, }, to find the ensemble of synaptic weights that reproduce this target neuron's observed responses. The constraint equations are y μ = Φ ∑ m = 1 N x μm w m , (8) where y μ now stands for the activity of the target neuron driven by the μth input pattern, and w is the -vector of synaptic weights onto the target neuron. Assuming that the × matrix x is rank P, we let the × matrix X be rank with X μm = x μm for μ = 1, …, .
This implies that the last − rows of X span the null space of x, and X defines a basis transformation on the weight space,  The linearly independent columns of X −1 define the basis vectors corresponding to the η coordinates, In other words, where e m is the physical orthonormal basis whose coordinates, {w m }, correspond to the material substrates of network connectivity. These basis vectors can be obtained from ε μ by an inverse basis transformation: We can thus write any vector of incoming weights as In terms of η coordinates, the nonlinear constraint equations take a rather simple form: y μ = Φ η μ for μ = 1, ⋯, P . (14) Accordingly, η coordinates succinctly parametrize the solution space of all weight matrices that support the specified fixed points [ Fig. 3(d)]. Each η dimension can be neatly categorized into one of three types. First, for each stimulus condition μ where y μ > 0, we must have η μ > 0. This in turn implies that Φ(η μ ) = η μ = y μ . Because the coordinate η μ must adopt a specific value to generate the transformation, we say that μ defines a constrained dimension. We denote the number of constrained dimensions as ⩽ . Second, note that the threshold in the transfer function implies that Φ(a) = 0 for all a ⩽ 0. Therefore, for any stimulus condition such that y μ = 0, we have a solution whenever η μ 0. Because positive values of η μ are excluded but all negative values are equally consistent with the transformation, we say that μ defines a semiconstrained dimension. We denote the number of semiconstrained dimensions as = -. Finally, we have no constraint equations for η μ if μ = + 1,···, . Because all positive or negative values of η μ are equally consistent with the stimulus transformation, we say that μ defines an unconstrained dimension. We denote the number of unconstrained dimensions as = − . Altogether, the stimulus transformation is consistent with every incoming weight vector that satisfies η μ = y μ if y μ > 0, μ ⩽ P −∞ < η μ ⩽ 0 if y μ = 0, μ ⩽ P −∞ < η μ < ∞ if μ > P . (15) Note that one can enumerate the solutions in the physically meaningful w coordinates by simply applying the inverse basis transformation in Eq. (9) to any solution found in η coordinates.
Going forward, it will be convenient to extend the -dimensional vector of target neuron activity to an -dimensional vector whose components along the unconstrained dimensions are equal to zero, because this will allow us to compactly write equations in terms of dot products between the activity vector and vectors in the -dimensional weight space. Rather than introducing a new notation for this extended -dimensional vector, we simply write y with y μ = 0 for μ = + 1, …, . It is critical to remember that this is merely a notational convenience, and the solution space distinguishes between semiconstrained dimensions and unconstrained dimensions according to Eq. (15). In particular, y μ = 0 is a constraint equation for semiconstrained dimensions, but y μ = 0 is a notational convenience for unconstrained dimensions.

D. Back to the recurrent network
To understand how the solution space geometry of the feedforward network can be translated back to the recurrent network, it is useful to group together the steady-state activities of all input and driven neurons that are presynaptic to the ith driven neuron as a × input pattern matrix, z (i) . 4 The entries of the matrix, z μm (i) , correspond to the responses of the mth presynaptic neuron to the μth stimulus. At this point it is easy to see that when biological constraints dictate that some of the synapses are absent, then one should just exclude those presynaptic neurons when constructing z (i) , such that the m index excludes those presynaptic neurons. Similarly, by a suitable reordering, which will depend on the driven neuron, we can always ensure that m = 1, …, runs only over the neurons that are presynaptic to the given driven neuron.
Once the input patterns feeding into the ith neuron are known, we can follow the steps outlined in the previous subsection to define the × full rank extension of z (i) , Z (i) , and the η (i) coordinates via The nature of the η μ (i) coordinates, that is whether they are constrained, semiconstrained, or unconstrained, is determined by how the ith neuron responded to the stimulus conditions, as in Eq. (15). Repeating this process for all driven neurons provides a geometric characterization of the entire recurrent network solution space, which involves all elements of the synaptic weight matrix, w im .
An important special case is all-to-all network connectivity. In this case, the Z (i) matrices are the same for all driven neurons, and therefore the directions corresponding to the 4 In fact, one can easily incorporate the case when the number of presynaptic partners differs from one driven neuron to another. This just means that the z (i) matrices will have dimensions × i , where i represents the number of presynaptic partners of the ith neuron.
η coordinates are also preserved. 5 In particular, the orientation of the unconstrained subspace with respect to the physical basis does not change from one driven neuron to another. However, how a given driven neuron responds to a particular stimulus determines whether the corresponding η direction is going to be constrained or semiconstrained for the feedforward network associated with that driven neuron.

IV. CERTAIN SYNAPSES IN ILLUSTRATIVE 3D EXAMPLES
Although we have found infinitely many weight matrices that produce a given stimulus transformation, it is nevertheless possible that the solutions imply firm anatomical constraints (e.g., Sec. II). In this paper we focus on finding synapses that must be nonzero in order for the response patterns to be fixed points of the neural network dynamics. We refer to such synapses as certain, because the synapse must exist in the model, and its sign is identifiable from the response patterns. It is clear from the geometry of the solution space that the relative orientations between the η coordinates and the physical w coordinates are significant determinants of synapse certainty. To build quantitative intuition for how the solution space geometry precisely determines synapse certainty, we begin by first analyzing a few illustrative toy problems. In the next section we will describe the more general treatment of high-dimensional networks. Importantly, we select and parametrize each toy problem to introduce concepts and notations that will reappear in the general solution.
More specifically, we first consider three feedforward examples with = 3 [ Fig. 4(a)]. The first two examples have = 3, and the third has = 2. In the first example, we will assume that the driven neuron does not respond to the first two stimulus patterns, but responds positively to the third pattern. So we have two semiconstrained and one constrained dimension, η 1 ⩽ 0, η 2 ⩽ 0, and η 3 = y 3 > 0. (17) In contrast, in the second example we will have two constrained and one semiconstrained dimension, η 1 = y 1 > 0, η 2 = y 2 > 0, and η 3 ⩽ 0. (18) The final example will feature one unconstrained, one semiconstrained, and one constrained dimension, η 1 ⩽ 0, η 2 = y 2 > 0, and −∞ < η 3 < ∞ . (19) For technical simplicity we will consider orthonormal input patterns, X −1 = X T , which implies that 5 Nevertheless, the vector spaces of synaptic weights are fundamentally distinct for different driven neurons, as these vector spaces pertain to the incoming synapses onto different driven neurons. The fact that the Z (i) matrices are the same for all i means that the relative orientation of the η directions, with respect to the physical w-coordinate axes (labeled by the presynaptic indices), remains the same for all the driven neurons. ∑ m = 1 N X μm X vm = δ μv = ε μ ⋅ ε v , (20) where δ μν is the Kronecker δ function, which equals 1 if μ = ν and 0 if μ ≠ ν, so ε μ = ε μ . This trivially implies that the η coordinates are related to the synaptic coordinates via a rotation, so the spherical biological bound on the physical coordinates transforms to an identical spherical bound on the η coordinates: A. Problem 1 Let us first focus on the example with two semiconstrained and one constrained dimension, whose solution space is depicted in deep yellow in Fig. 4(b). Suppose we are interested in assessing whether the w 1 synapse is certain. Since the w 1 = 0 plane divides the weight space into the positive and the negative halves, the synapse will be certain if this plane does not intersect with the solution space, which clearly depends on the orientation of the plane relative to the various η directions [ Fig. 4(b)]. It is thus useful to consider how the w 1 = 0 plane's unit normal vector pointing toward positive weights, e ≡ e 1 , is oriented relative to the η directions. For ease of graphical illustration, here we assume the specific orientation diagramed in Figs. 4(b) and 4(c). Using Eq. (12) and the orthogonality of X, we can parametrize e as e = ∑ μ = 1 N = 3 X μ1 ε μ = cosθc + sinθs, (22) where c = − ε 3 , and s = − cosγ ε 1 + sin γ ε 2 (23) [Figs. 4(b) and 4(c)]. Geometrically, c and s are unit vectors along the projections of e onto the constrained and semiconstrained subspaces [ Fig. 4(b)]. Thus, cos θ ⩾ 0 and sin θ ⩾ 0, making θ an acute angle. In this example, γ is also an acute angle, as depicted in Fig. 4(c).
The perpendicular distance can be identified from Eq. (24) as d s = y 3 cot θ . (26) Substituting this expression for d s into Eq. (25), one finds through simple algebra that the w 1 = 0 hyperplane does not intersect the solution space, and hence the synapse is certain, if the response magnitude exceeds a critical value, y 3 > y cr = W sin 2 θ cos 2 γ cos 2 θ + sin 2 θ cos 2 γ , (27) which we generally refer to as y-critical.
Notice that if θ increases in Fig. 4(b), then the orange line in Fig. 4(c) comes closer to the origin, making it intersect with the solution space for more γ angles. Therefore, the synapse is more difficult to identify, and indeed Eq. (27) shows that y cr increases. However, if γ increases, then the orange line in Fig. 4(c) rotates away from the solution space, making the synapse easier to identify with small d s . Accordingly, y cr decreases.
It will turn out that the concept of y-critical is general, and y cr can always be expressed in terms of projections of ê along several specific directions. In this example, if we define e s* and e y to be projections of e along, s * = − ε and y = ε 3 respectively, then it is easy to check that one can re-express y cr as y cr = W e s * 2 e y 2 + e s * 2 .
We will later discover that these projections are closely related to correlations between pre-synaptic and postsynaptic neuronal activity patterns. Thus, the expressions in Eq. (28) will provide a deeper understanding of the determinants of synapse certainty.

B. Problem 2
Having identified two key angles, θ and γ, that play a role in synapse certainty, let us look at the example of two constrained and one semiconstrained dimensions to uncover other important geometric quantities. In this case, the solution space is a ray defined by η 1 = y 1 , η 2 = y 2 , and −∞ < η 3 0, and the magnitude of η 3 is at most for solutions within the weight bound [ Fig. 4(d)]. Figure 4(d) shows a geometry where the w 1 = 0 plane intersects the solution space at the point w int = y 1 ε 1 + y 2 ε 2 + η 3 ε 3 .
Now we must have e ⋅ w int = 0, (31) as the intersection point lies on the w 1 = 0 plane by definition, where we have defined e ≡ e 1 as in the previous toy problem. The projection directions of e onto the constrained and semiconstrained subspaces are given by c = cosβ ε 1 + sinβ ε 2 , and s = ε 3 , [ Fig. 4(d)]. Then combining Eqs. (22) and (32), we can find an equation to determine η 3 at the intersection point e ⋅ w int = cosθ y 1 cosβ + y 2 sinβ + η 3 sinθ = 0. (33) We next introduce α to represent the angle between c and y [ Fig. 4(e)], such that y 1 = ycos(β − α) and y 2 = y sin(β − α), (34) where y = y . The first two terms in Eq. (33) can then be trigonometrically combined with a difference of angles identity to arrive at ycosθcosα + η 3 sinθ = 0 η 3 = − y cot θcosα . (35) To be able to identify the sign of w 1 , this intersection point must lie beyond the weight bounds of the solution line segment, so η 3 < − W . After some straightforward algebra we obtain the certainty condition as y > y cr ≡ W sin 2 θ cos 2 θcos 2 α + sin 2 θ . (36) From the geometry of the problem in Figs. 4(d) and 4(e), one sees that as θ or α increases, the point where the orange hyperplane intersects the yellow line is closer to the origin. Indeed, y cr increases, making it more difficult to identify the synapse sign. Again, one can re-express y cr as Eq. (28) in terms of projections, with the role of s * being played by ε 3 .

C. Problem 3
Through the two above examples we found three angles, θ,α, and γ, that determine how large the response of the driven neuron has to be in order for a given synapse to be certain. However in both examples the number of patterns were equal to the number of synapses, = . When < , we have unconstrained dimensions, and the projection of the e ≡ e 1 vector into the unconstrained subspace will also matter, because it relates to how much we do not know about the response properties of the driven neuron.
Here we consider a = 3 example with one constrained, one semiconstrained, and one unconstrained dimension rection as a linear combination of its projections along the [ Fig. 4(f)]. In this case, we can express the e synaptic diconstrained, semiconstrained and unconstrained dimension as e = ∑ μ = 1 N X μ1 ε μ = cos θc + sinθcosϕs + sin θsin ϕu, (37) where we can always choose the directions of the unit vectors to make θ and φ acute angles.
For the example shown in Fig. 4(f), this is achieved by choosing c = ε 2 , s = − ε 1 , and u = ε 3 . (38) Obtaining the certainty condition again involves ascertaining whether the w 1 = 0 hyperplane intersects the deep yellow solution space [ Fig. 4(f)]. In the example of Fig. 4(f), one can see that increasing the driven neuron response moves the yellow plane up, and there will come a critical point when the orange w 1 = 0 plane just touches the solution space at the corner (η 1 = 0, η 2 = y cr , η 3 ). Thus, w int = y cr ε 2 + η 3 ε 3 . (39) Since this corner point has a negative η 3 component and lies on the bounding sphere, we must also have we can then determine y cr through simple algebra as y cr = W sin 2 θsin 2 ϕ cos 2 θ + sin 2 θ sin 2 ϕ . (42) The final result now depends on the two acute orientation angles, θ and φ. By inspection of Fig. 4(f) or Eq. (42), it is clear that y cr increases if either θ or φ increases toward π/2. One therefore needs a larger response (y 3 ) to make the synapse certain. We can again express y cr in terms of projections y cr = W e u 2 e y 2 + e u 2 , (43) where e u is the projection of e along u, and e s* does not appear because the intersection occurred at the origin of the semiconstrained subspace.

A. High-dimensional feedforward networks
We have seen in the previous section how geometric considerations can identify synapses that must be present to generate observed response patterns in small networks. One can similarly ask when a synapse is required in high-dimensional networks [ Fig. 5(a)].
Although the rigorous derivation is intricate, this certainty condition is remarkably simple for orthonormal X (Appendix A). Quantitatively, orthonormal X imply that only a few parameters matter for the certainty condition, each illustrated in the previous section and abstractly summarized in Fig. 5 For any given synapse, its physical basis vector, e m , can always be written as a sum of components in the constrained, semiconstrained, and unconstrained subspaces, where c m , s m and u m denote the partial sums over μ in the constrained, semiconstrained, and unconstrained subspaces, respectively. Note that ε μ are orthogonal unit vectors if and only if X is an orthogonal matrix. In this case, the decomposition of e m is a sum of three orthogonal vectors that can be parameterized by two angles, e m = cosθ c m + sinθ cos ϕs m + sinθ sinϕu m , (45) where c m , s m , and u m are unit vectors in the constrained, semiconstrained, and unconstrained subspaces, and (θ, ϕ) are spherical coordinates 6 specifying the orientation of e m with respect to these subspaces [e.g., Fig. 4 As we have seen in the toy examples, these two orientation angles heavily influence whether the synapse is certain.
Additionally, because the solution space's height along c m [e.g., Fig. 4(b)] is controlled = by the angle between c m and y, the equation for the w m = 0 hyperplane that divides the positive and negative synaptic regions in the solution space depends on 6 The angles also depend on the synapse but we have dropped the m index for brevity. y ⋅ c m = y cos α, (47) where y is the length of y and α is the angle between y and c m [ Fig. 4(e)]. Finally, there is another critical angle, which we call γ, that encodes how s m is oriented with respect to the solution space in the semiconstrained subspace. Using a more convenient direction, s m ′ ≡ − Sgn (cos α)s m , which is either along or opposite to the s m direction, we define γ to be the minimal angle between s m ′ and the solution space [e.g., Fig. 4(b)]. It is generally given by (Appendix A), where s μ ′ is the μth component of s m ′ , and we have suppressed m to avoid cluttered notation. Although this definition and equation for γ may initially appear opaque, we soon clarify its meaning in terms of interpretable projections of the synapse vector.
which effectively measures the weakness of the presynaptic neuron's activity, as it is the amount of presynaptic drive for which we do not have any information on the target neuron's response. The more subtle quantity is The condition that s μ ′ < 0 selects for patterns where the sign of the presynaptic activity is Sgn(cosα) = Sgn(e y ), but the postsynaptic neuron does not respond. In other words, presynaptic activity should have promoted a response in the target neuron according to the observed activity correlation. That it does not generates uncertainty in the sign of the synapse. See Appendix A for a heuristic derivation of y cr based on this argument.
We can gain more useful intuition by interpreting our result in relation to what we would obtain in a linear neural network. In the linear problem, there are only constrained and unconstrained dimensions; every dimension that was semiconstrained in the nonlinear problem becomes constrained, with all solutions having η μ = y μ for μ = 1, …, . This implies that y cr, lin = W e u 2 e y 2 + e u 2 . (55) Returning to the nonlinear problem, recall that the certainty condition finds the largest y for which the solution space and w m = 0 hyperplane intersect within the weight bound, and this intersection is simply a point when y = y cr . Importantly, each semiconstrained dimension can either behave like a linear constrained dimension with η μ = y μ = 0 at this intersection point (toy problems 1 and 3), or like an unconstrained dimension with η μ < 0 at the intersection point (toy problems 1 and 2). 7 The first case occurs when e m ⋅ ε μ 7 Since this intersection point depends on m, the semiconstrained dimension indexed by μ can behave as constrained for some synapses and unconstrained for others.
where e p 2 = 1 − e u 2 , r y 2 = e y 2 /e p 2 , r s * 2 = e s * 2 / e p 2 − e y 2 , and all three composite parameters can be independently set between 0 and 1. Conceptually, r y and r s* merely normalize e y and e s* by their maximal values, and e p is the projection of e m into the activity-constrained subspace spanned by both constrained and semiconstrained dimensions. One could also interpret r s* as quantifying the effect of threshold nonlinearity. For instance, r s* = 0 describes the case where all semiconstrained dimensions are effectively constrained, but r s* increases as some of the semiconstrained dimensions start to behave like unconstrained dimensions. As expected, y cr is a decreasing function of r y 2 and e p 2 and an increasing function of r s * 2

B. Regarding nonorthogonal input patterns
While a complete treatment of the certainty condition for generally correlated input patterns is beyond the scope of this paper, we could find a conservative bound for y-critical that may be useful when patterns are close to being orthogonal. The details of the derivation are discussed in the final subsection of Appendix A.
The major challenge caused by nonorthogonal patterns is that the spherical weight space becomes elliptical in terms of the η coordinates. Thus, the main idea behind the bound is that one can always find the sphere that just encompasses this ellipse. We can then use our formalism to obtain a conservative y-critical, such that if the norm of y is larger than this value then all solutions within the encompassing sphere have a consistent sign for the synapse under consideration. An interesting insight that emerges from our analysis is that the relative orientations between n m ≡ ∑ μ = 1 N X mμ −1 ε μ (58) and the various important η directions play the role of θ, φ, α and γ (Appendix A). Note that n m = e m when X is an orthogonal matrix. We anticipate that n will also be an important player in a more comprehensive treatment of nonorthogonal patterns.

C. Application to recurrent networks
As we explained in Sec. III, to find the ensemble of all incoming weight vectors onto the ith driven neuron, one can use the results obtained for the feedforward network and just substitute X with the Z (i) matrix. Consequently, identifying certain synapses onto the ith neuron can follow the route outlined for the feedforward scenario as long as Z (i) is orthogonal. So for example, if we want to ascertain whether any incoming synapse to the ith neuron is certain, we have to replace x μm z μm (i) and y μ → y μi in Eqs. (51)-(54) to compute y cr .

D. Numerical illustration of the certainty condition
To illustrate and test the theory numerically, we first considered a small neural network of three input neurons and three driven neurons [ Fig. 6(a)]. This small number of synapses meant that we could comprehensively scan the entire spherical weight space without relying on a numerical algorithm to find solutions. 8 This is important because numerical techniques, such as gradient descent learning, potentially find a biased set of solutions that incompletely test the theory. We supposed that each driven neuron has three inputs, and we constrained weights with two orthonormal stimulus responses. We set W = 1 for all simulations and numerically screened weights randomly. See Appendix F for complete simulation details.
The first driven neuron in Fig. 6(a), y 1 , receives only feedforward drive, and we suppose that it responds to one stimulus condition with response y (μ = 2), but it does not respond to the other (μ = 1). Its synapses thus have one constrained, one semiconstrained, and one unconstrained dimension, and all of the terms in Eq. (49) contribute to y-critical. We could thus use y 1 to verify Eq. (49). Moreover, this scenario includes the illustrative example of Fig. 4(f) as a special case, so we could also use y 1 to verify Eq. (42).
To these ends, we decided to focus on a two-parameter family of input patterns, x = −sinψcosχ cosψcosχ sinχ cos ψ sinψ 0 , (59) where rows correspond to different input patterns and columns correspond to different input neurons, as usual, and we extend x to the full-rank orthogonal matrix X = −sinψcos χ cos ψcosχ sinχ cosψ sinψ 0 sinψ sinχ −cosψ sinχ cosχ .
8 Our results for the certainty condition hold for network ensembles that exactly generate the desired responses. For numerical tests, we had to allow for small deviations from the desired responses, but our predictions proved robust.
By Eq. (44), the physical basis vector corresponding to the synapse from the first input neuron is thus e 1 = cosψ ε 2 + sinψ cos χ − ε 1 + sin ψ sinχ ε 3 , (61) and it has the same general form as Eqs. (37) and (45), where ε 3 plays the role of u. If ψ and χ are both acute, then one can identify them with θ and φ in Fig. 4(f), and the roles of c and s are played by ε 2 and − ε 1 , respectively. In this case α = 0, cosγ = 0, and the theoretical dependencies of y cr on θ and φ are given by Eq. (42). Figure 6(b) illustrates these dependencies as the purple and dark green curves. If ψ is acute, but χ is obtuse, then according to our conventions, θ = ψ, φ = πχ, c = ε 2 , and s = ε 1 . Now α = 0 and cosγ = 1, and our general formula, Eq. (49), implies These dependencies are plotted as the pink and the light green curves in Fig. 6(b). We do not plot cases where ψ is obtuse, because obtuse and acute ψ result in equivalent y cr formulas. Whether ψ is acute or obtuse nevertheless matters because it determines the sign of the w 1 synapse when it is certain.
The black dots in Fig. 6(b) show the largest response magnitude, y, for which we numerically found solutions with both positive and negative w 1 (see Appendix F for numerical methods), thereby providing a numerical estimate of y cr . The theoretical curves and numerical points precisely aligned in all cases. The differences between the light and dark theoretical curves illustrates the effect of nonlinearity. When χ is obtuse, the semiconstrained dimension effectively behaves as unconstrained, and the mixing angle between the semiconstrained and unconstrained dimension is irrelevant to y-critical. When χ is acute, the semiconstrained dimension effectively behaves as constrained, as if its coordinate were set to zero. Moreover, these results confirmed that stronger responses were needed to make synapses fixed sign when the synaptic direction was less aligned with the constrained dimension [ Fig We next wanted to check the validity of our results for the recurrently connected neurons in Fig. 6(a). We therefore needed to tailor the steady-state activity levels of the recurrent network to result in orthogonal presynaptic input patterns for each driven neuron. In mathematical terms, Z (i) must be an orthogonal matrix for i = 1, 2, 3. We achieved this by considering a two-parameter family of driven neuronal responses in which the activity patterns of y 2 and y 3 were matched to those of x 1 and x 3 , respectively. This construction means that all three driven neurons receive the same input patterns. To ensure positivity of driven neuronal responses, we set χ as an acute angle and ψ as the negative of an acute angle.
Although y 2 has both feedforward and recurrent inputs, we can analyze its connectivity in exactly the same way as y 1 . Recurrence only complicates the analysis for neurons that synapse onto themselves, like y 3 , since changing the output activity also changes the input drive. So y 3 and y cr are not independent. Here we focused on the certainty condition for the self-synapse, w y3,y3 , for which y cr = cos χ, and y 3 = sin χ. Therefore, the synapse should be certain if 45° < χ ⩽ 90°. Since θ = π/2 − χ according to our conventions, 9 this is equivalent to 0⩽ θ < 45° [ Fig

A. Finding the solution space in the presence of noise
So far we have only considered exact solutions to the fixed point equations. However, it is also important to determine weights that lead to fixed points near the specified ones. For example, biological variability and measurement noise generally make it infeasible to specify exact biological responses. Furthermore, numerical optimization typically produces model networks that only approximate the specified computation. We therefore define the ℰ-error surface as those weights that generate fixed points a distance ℰ from the specified ones, where y μi is the specified activity of the ith driven neuron in in the μth fixed point, and y μi is the corresponding activity level in the fixed point approached by the model network when it is initialized as y i (t = 0) = y μi . If the network dynamics do not approach a fixed point, perhaps oscillating or diverging instead [54], we say ℰ = ∞.
Each ℰ-error surface can be found exactly for feedforward networks. For illustrative purposes, let us first consider the = 1 feedforward scenario in which the driven neuron is active in every response pattern. This means that y μ > 0 for all μ = 1, …, , and we can reorder the μ indices to sort the driven neuron responses in ascending order, 0 < y 1 < y 2 < ⋯ < y . Here we assumed that no two response levels are exactly equal, as is typical of noisy responses. Since all responses are positive, the zero-error solution space has no semiconstrained dimensions, and the only freedom for choosing w is in the = − unconstrained dimensions. Therefore, the zero-error surface of exact solutions, 0 , is a -dimensional linear subspace, and 0 is a point in the -dimensional activity-constrained subspace.
How does this geometry change as we allow error? For 0 < ℰ < y 1 , we must have y μ > 0 for all μ. Therefore, the nonlinearity is irrelevant, and ℰ-error surfaces are spherical in the activity-constrained η coordinates [Eq. (63), Fig. 7(a(i))]. However, once ℰ = y 1 it becomes possible that y 1 = 0, and suddenly a semi-infinite line of solutions appears with η 1 ⩽ 0. As ℰ further increases, this line dilates to a high-dimensional cylinder [ Fig. 7(a(ii))]. A similar transition happens at ℰ = y 2 , whereafter two cylinders cap the sphere [ Fig. 7(a(iii))]. Things get more interesting as ℰ increases further because two transitions are possible. A third cylinder appears at ℰ′ = y 3 . However, at ℰ″ = y 1 2 + y 2 2 it is possible for both y 1 and y 2 to be zero, and the two cylindrical axes merge into a semi-infinite hyperplane defined by η 1 ⩽ 0, η 2 ⩽ 0. Thus, when ℰ″ < ℰ′ the error surface grows to attach a third cylinder [ Fig. 7(a(iv))], and when ℰ″ < ℰ′ the two cylindrical surfaces merge to also include planar surfaces in between [ Fig. 7(av)]. These topological transitions continue by adding new cylinders and merging existing ones, and the sequence is easily calculable from {y μ }. Note that we use the terminology "topological transition" to emphasize that the structure of the error surface changes discontinuously at these values of error. The geometric transitions we observe here also relate to topological changes in a formal mathematical sense. For instance, while there are no incontractible circles in Fig. 7(a(ii)), one develops as we transition to Fig.  7(a(iii)).
In general, y μ may also be zero or negative in the presence of noise. Whenever y μ = 0, the μth response pattern generates a semiconstrained dimension in 0 . However, if some response levels are negative, then there are no exact solutions at all. However, it becomes possible to find solutions when ℰ = ∑ μ | y μ < 0 y μ 2 ,, and each response pattern associated with a negative y μ acts as a semiconstrained dimension in ℇ . As illustrated above, more semiconstrained dimensions open up as more error is allowed in each of these cases.
This geometry only approximates ℰ-error surfaces for recurrent networks (Appendix C). For instance, displacing y μi from its specified value changes the input pattern that define the ε μ directions for downstream driven neurons, but this effect is neglected here. We will nevertheless find that this feedforward approximation to ℰ-error surfaces is practically useful for predicting synaptic connectivity in recurrent networks as well.

B. Predicting connectivity in the presence of noise
The threshold nonlinearity and error-induced topological transitions can have a major impact on synapse certainty [ Fig. 7 We therefore generalized the certainty condition to include the effects of error, including topological transitions in the error surface (Appendix C). As before, finding the certainty condition amounts to determining when the w m = 0 hyperplane intersects the solution space within the weight bound, but to account for noise of magnitude ε, we must now check whether an intersection occurs with any ℰ-error surface with ℰ ⩽ ε Although we lack an exact expression for y-critical in the presence of noise, we derived several useful bounds and approximations (Appendix C). We usually focus on a theoretical upper bound for y-critical, y cr,max . Note that this upper bound suffices for making rigorous predictions for certain synapses, because y > y cr,max ⇒ y > y-critical.
We also computed a lower bound, y cr,min , to assess the tightness of the upper bound. This bound is y cr,min = W e s * 2 + e u 2 e y 2 + e s * 2 + e u 2 + ε W (65) without topological transitions. Both bounds increase with error and should be considered to be bounded above byW. As expected, both expressions reduce to Eq. (51) as ε/W → 0. We also note that the two bounds coincide, to leading order in ε/W, if e y ≪ max(e s* , e u ) and e p /max(e s* , e u ) = (1), and we argue in Appendix B that this is typical when the network size is large.
The effect of topological transitions is that y cr,max and y cr,min become the maximums of several terms, each corresponding to a way that constrained dimensions could behave as semiconstrained within the error bound (Appendix C). We compute each term from generalizations of Eqs. (64) and (65) that account for the amount of error needed to open up semiconstrained dimensions.

C. Testing the theory with simulations
To examine our theory's validity, we assessed its predictions with numerical simulations of feedforward and recurrent networks [ Fig. 8(a)]. Each assessment used gradient descent learning to find neural networks whose late time activity approximated some specified orthogonal configuration of input neuron activity and driven neuron activity (Appendix F).
We then used our analytically derived certainty condition with noise to identify a subset of synapses that were predicted to not vary in sign across the model ensemble (W = 1), and we checked these predictions using the numerical ensemble. We similarly checked predictions from simpler certainty conditions that ignored the nonlinearity or neglected topological transitions in the error surface (Appendix C). Note that we expected gradient descent learning to often fail at finding good solutions in high dimensions, as our theory predicts that each semiconstrained dimension induces local minima in the error surface [ Fig.  7(a)]. Since we did not want the theory to bias our numerical verification of it, we focused our simulations on small to moderately sized networks, where we could reasonably sample the initial weight distribution randomly. Future work will consider more realistic neural network applications.
We first considered feedforward network architectures, for which our analytical treatment of noise is exact. To illustrate how nonlinearity and noise affect synapse certainty, we calculated the magnitude of postsynaptic activity needed to make a particular synapse sign certain [ Fig. 8 We specifically considered 102 random input-output configurations of a small feedforward network with 6 input neurons ( = 5, = 2), which were tailored to have orthonormal input patterns and generate one topological error surface transition at small errors. In particular, we generated random orthogonal matrices by exponentiating random antisymmetric matrices, we set one element of y to a small random value to encourage the topological transition, and we ensured that the other nonzero random element of y was large enough to preclude additional transitions (Appendices C and F). For each input-output configuration, we then systematically varied the magnitude of driven neuron activity, y, finding 10 5 synaptic weight matrices with moderate error, ℰ 2 ≈ ε 2 , for each magnitude y. Since randomly screening a six-dimensional synaptic weight space is not numerically efficient, we applied gradient descent learning. Nevertheless, the small network size meant that we could comprehensively sample the solution space and numerically probe the distinct predictions made by each bound or approximation used to estimate y-critical.
As expected, the maximum value of y that produced numerical solutions with mixed synapse signs [ Fig. 8(b), black dots] was always below the theoretical upper bound for y-critical This means that these simplified calculations for estimating y-critical make erroneous predictions, because the synapse sign is supposed to be exclusively positive or negative whenever y exceeds y-critical, by definition. Therefore, we were able to accurately assess synapse certainty, and this generally required us to include both the nonlinearity and noise-induced topological transitions in the error surface.
We next asked how often we could identify certain synapses in larger networks. For this purpose, we generated 25 random input-output configurations in the feedforward setting (Appendix F), again with orthonormal input patterns, but this time we increased the number of input neurons from 4 to 100 across the configurations [ Fig. 8(c)]. As we increased the size of the network, we kept / fixed at 0.25 and / fixed at 1 [ Fig. 8(c), brown] or 0.5 [ Fig. 8(c), purple]. These scaling relationships put our simulations in the setting of high-dimensional statistics [65], where both the number of parameters and the number of constraints increase with the size of the network. In this high-dimensional regime, a simple heuristic argument suggests that the number of zero-error certain synapses should scale linearly with the number of synapses (Appendix B), because y cr and the typical magnitude of y scale equivalently with . Here we tested this prediction by setting y randomly, setting y = 1 − ln 2/-to approximate the median norm of vectors in the unit -ball (Appendix B), and numerically finding a small error solutionC for each configuration (ℰ 2 / ≈ 10 −6 ).
As expected, we empirically found that the number of certain synapses predicted by the theory [Fig. 8(c), solid lines] scaled with the network size linearly [ Fig. 8(c), dashed lines]. The jaggedness of the solid curves reflect the fact that each point is specific to the random input-output configuration constructed for that value of . The purple curve corresponds to the case when = 2 = 4 and the brown curve when = = 4 . Furthermore, for every certain synapse predicted, we verified that its predicted sign was realized in the numerical solution we found [ Fig. 8(c), circles]. These results suggest that the theory will predict many synapses to be certain in realistically large neural systems.
Finally, we empirically tested our theory for a recurrent network [Figs. 8(d) and 8(e)], where our treatment of noise is only approximate. For this purpose, we considered networks without the self-coupling terms, = ℐ + − 1. We constructed a single random configuration with nonnegative driven neuron responses and orthogonal presynaptic patterns for one of the driven neurons 10 (Appendix F). This driven neuron could thus serve as the target neuron for our analyses. Note that it is sometimes possible to orthogonalize the input patterns for more than one driven neuron, but this is irrelevant to our analysis and is not pursued here. We then used gradient descent learning to find around 4500 networks that approximated the desired fixed points with variable accuracy. For technical simplicity, we first found connectivity matrices using a proxy cost function that treated the network as if it were feedforward. We then simulated the neural network dynamics with these weights and correctly evaluated the model's error as prescribed by Eq. (63).
This network ensemble revealed that constrained and semiconstrained dimensions accurately explained the structure of the solution space for recurrent networks with nonzero error. Figure 8(d) shows the projection of the corresponding solution space along two η directions, one predicted to be constrained by the feedforward theory and the other predicted to be semiconstrained. As predicted, the extension of the solution space along the negative semiconstrained direction was clearly discernible. However, recurrence implies that the exact solution space is not perfectly cylindrical around the semiconstrained axes (Appendix C), because the driven neuron inputs to the target neuron can themselves vary due to noise. Here this effect was empirically insignificant, and the geometric structure of the solution space conformed rather well to our feedforward prediction. One might have expected the error [color in Fig. 8(d)] to increase monotonically as one moves away from the center of semiconstrained cylinder, but this expectation is incorrect for two reasons. First, we are visualizing the error surface as a projection along two dimensions, yet variations in other 10 We performed this numerical experiment with several random configurations to confirm that the results did not qualitatively depend on the random sample. η coordinates add variation to the error. 11 Second, we are visualizing the solution space for one target neuron, but other driven neurons in the recurrent network contribute to the summed error represented by the color.
Moreover, the theory correctly predicted how the number of certain synapses would decrease as a function of ε [ Fig. 8(e)], and we never found a numerical violation of the theoretical certainty condition that included nonlinearity and noise. In Fig. 8(e), the yellow circles represent the number of certain synapses that were predicted by the theory and verified to have synapse signs that agreed with the theoretical prediction. Here accurate predictions did not require us to account for topological error surface transitions. In contrast, although our simulations usually agreed with the predictions of the linear theory [ Fig. 8(e), cyan circles], they could also disagree. In Fig. 8(e), the blue crosses indicate configurations where the linear theory incorrectly predicted some synapse signs. The absence of red crosses reiterates the consistency of predictions coming from the nonlinear treatment.

VII. DISCUSSION
In summary, we enumerated all threshold-linear recurrent neural networks that generate specified sets of fixed points, under the assumption that the number of candidate synapses onto a neuron is at least the specified number of fixed points. We found that the geometry of the solution space was elegantly simple, and we described a coordinate transformation that permits easy classification of weight-space dimensions into constrained, semiconstrained, and unconstrained varieties. This geometric approach also generalized to approximate error-surfaces of model parameters that imprecisely generate the fixed points. We used this geometric description of the error surface to analyze structure-function links in neural networks. In particular, we found that it is often possible to identify synapses that must be present for the network to perform its task, and we verified the theory with simulations of feedforward and recurrent neural networks.
Rectified-linear units are also popular in state of the art machine learning models [29,[66][67][68], so the fundamental insights we provide into the effects of neuronal thresholds on neural network error landscapes may have practical significance. For example, machine learning often works by taking a model that initially has high error and gradually improving it by modifying its parameters in the gradient direction [69]. However, error surfaces with high error can have semiconstrained dimensions that abruptly vanish at lower errors (Fig.  7). Local parameter changes typically cannot move the model through these topological transitions, because models that wander deeply into semiconstrained dimensions are far from where they must be to move down the error surface. The model has continua of local and global minima, and the network needs to be initialized correctly to reach its lowest possible errors. This could provide insight into deep learning theories that view its success as a consequence of weight subspaces that happen to be initialized well [70,71].
The geometric simplicity of the zero-error solution space provides several insights into neural network computation. Every time a neuron has a vanishing response, half of a 11 For example, imagine projecting the 3D surfaces in Fig. 7(a) along two dimensions. dimension remains part of the solution space, which the network could explore to perform other tasks. In other words, by replacing an equality constraint with an inequality constraint, simple thresholding nonlinearities effectively increase the computational capacity of the network [72,73]. The flexibility afforded by vanishing neuronal responses thereby provides an intuitive way to understand the impressive computational power of sparse neural representations [50,[74][75][76]. Furthermore, the brain could potentially use this flexibility to set some synaptic strengths to zero, thereby improving wiring efficiency. This would link sparse connectivity to sparse response patterns, both of which are observed ubiquitously in neural systems.
Our theory could be extended in several important ways. First, we only derived the certainty condition to identify critical synapses from orthonormal sets of fixed points. Although our orthogonal analysis also provides a conservative bound for a general set of fixed points (Appendix A), a more precise analysis will be needed to pinpoint synapses in realistic biological settings where stimulus-induced activity patterns may be strongly correlated. Since our error surface description made no orthonormality assumptions, this analysis will only require more complicated geometrical calculations to discern whether the synapse sign is consistent across the space of low-error models. Furthermore, we could use the error surfaces to identify multisynapse anatomical motifs that are required for function, or to estimate the fraction of models in which an uncertain synapse is excitatory versus inhibitory. It would also be interesting to relax the assumption that the number of fixed points is small. This would allow us to consider scenarios where the fixed points can only be generated nonlinearly. We could also consider cases where no exact solution exists at all. Here we assumed that we knew the activity level of every neuron in the circuit. This is not always the case, and it will be important to determine how unobserved neurons alter the error landscape for synaptic weights connecting the observed neurons. The error landscape geometry will also be affected by recurrent network effects that we ignored here (Appendix C). It will be interesting to see whether the geometric toolbox of theoretical physics can provide insights into the nontrivial effects of unobserved neurons and recurrent network dynamics. Finally, we note that it will sometimes be important to analyze networks with alternate nonlinear transfer functions. Our analyses already apply exactly to recurrent networks with arbitrary threshold-monotonic nonlinear transfer functions (Appendix D). Moreover, our analyses can approximate any nonlinearity by treating its departures from threshold-linearity as noise (Appendix D). An extension to capped rectified linear units [67], which saturate above a second threshold, would also be straightforward. In particular, semiconstrained dimensions would emerge from any condition where the target neuron is inactive or saturated.
Our primary motivation for undertaking this study was to find rigorous theoretical methods for predicting neural circuit structure from its functional responses. This identification can be used to corroborate or broaden circuit models that posit specific connectivity patterns, such as center-surround excitation-inhibition in ring attractors [16][17][18] or contralateral relay neuron connectivity in zebrafish binocular vision [12,77]. More generally, if an experimental test violates the certainty conditions we derived using our ensemble modeling approach, it will suggest that some aspect of model mismatch is important. We could then move on to the development of qualitatively improved models that might modify neuronal nonlinearities, relax weight bounds, incorporate subcellular processes or neuromodulation, or hypothesize hidden cell populations. However, we hope that our focus on predictions that follow with certainty from simple network assumptions will enable predictions that are relatively insensitive to minor mismatches between our abstract model and the real biological brain. More nuanced predictions may require more nuanced models.
An important parameter of the theory is the weight bound. In particular, W bounds the magnitude of synaptic weight vectors in biological networks, and our certainty condition declares a synapse to be necessary when the ratio y/W exceeds a critical value. It is not a priori clear how to set this scale parameter without additional biological data. Nevertheless, one could use the neuronal activity data to compute each synapse's W-critical value, below which the certainty condition is satisfied, and rank-order the synapses according to decreasing W-critical values. Until we know the value of W, we do not know where to draw the line between certain synapses and uncertain synapses. However, our theory predicts that all of the certain synapses will be at the top of the list, which specifies a sequence of experimentally testable predictions and may already provide biological insights into the important synaptic connections. Testing these predictions can help constrain the theory's biological bound parameter.
Our theory describes function at the level of neural representations. This description is useful because many systems neuroscience experiments measure representations directly, and it is important to build mechanistic models that explain these data in terms of neural network interactions [12,15,18,77]. However, it would also be interesting to link structure to function at the higher levels of behavior and cognition. This is a significantly different problem because multiple representations can support the same high-level functions, and both neural network structure and representation can change over time [78][79][80][81][82][83][84][85]. Consequently, experimental tests of our current framework must measure network structure and representation on timescales shorter than the network's representational dynamics, and certain synapses may be most biologically meaningful in innate circuits with limited plasticity. Extensions to our framework may also be useful for relating structural and representational dynamics in circuits for learning [86].
An exciting prospect is to explore how our ensemble modeling framework can be combined with other theoretical principles and biological constraints to obtain more refined structurefunction links. For instance, we could refine our ensemble by restricting to stable fixed points. Alternatively, once the sign of a given synapse is identified, Dale's principle might allow us to fix the signs of all other synapses from this neuron [87]. This would restrict the solution space and could make other synapses certain. Utilizing limited connectomic data to impose similar restrictions might also be a fruitful way to benefit from largescale anatomical efforts [7,10,13,14]. Finally, rather than restricting the magnitude of the incoming synaptic weight vector, we could consider alternate biologically relevant constraints, such as limiting the number of synapses, minimizing the total wiring length, or positing that the network operates at capacity [88,89]. These changes would modify the certainty conditions in our framework, as well as our experimental predictions. We could therefore assess candidate optimization principles and biological priors experimentally. While the base framework developed here was designed to identify crucial network connections required for function, we hope that our approach will eventually allow us to assess theoretical principles that determine how neural network structure follows from function.

ACKNOWLEDGMENTS
The authors thank Tianzhi (Lambus) Li, Srini Turaga, Andrew Saxe, Ran Darshan, and Larry Abbott for helpful discussions and comments on the manuscript. This work was supported by the Howard Hughes Medical Institute and the Janelia Visiting Scientist Program.

Preliminaries
For completeness, we begin by briefly reviewing a few central concepts from the main manuscript.

a. From recurrent to feedforward networks
Let us consider a neural network of ℐ input neurons that send signals to an interconnected population of driven neurons governed by dynamical equations (6) where, as prescribed in the main manuscript, y μi and x μm denote steady-state activity levels of the driven and input neurons to the μth stimulus, which we have combined into z μm , and is the number of incoming synapses onto each of the driven neurons. Equation (A1) provides × nonlinear equations for × unknown parameters. However, we immediately notice that the steady-state activity of neuron i depends only on the ith row of the connectivity matrix, so these equations separate into independent sets of equations with unknowns, the weights onto a given driven neuron. In other words, the recurrent network involving driven and ℐ input neurons decomposes into feedforward networks with = + ℐ feedforward inputs. The steady-state equations for these feedforward networks are given by where we have now suppressed the i index in y μi and in w im . For this feedforward network we will refer the ith neuron as the target neuron, and it is as if that all the neurons (driven and input) are providing feedforward inputs to it. As long as we only consider exact solutions to the fixed point equations, the problem of identifying synaptic connectivity in a recurrent network reduces to solving the problem for feedforward networks. Thus, in the rest of this Appendix we will focus on identifying w m 's satisfying Eq. (A2).
Note that the main text used the notation z μm (i) to emphasize that the set of presynaptic neurons may depend on the target neuron, but we simply write z μm throughout the Appendices with the understanding that the formalism applies to a specified target neuron whose index is suppressed. Furthermore, for conceptual simplicity the main text first stated many results in a feedforward setting with a single driven neuron, but the Appendices immediately treat the general case where presynaptic partners may come from either the input or driven populations of neurons.

b. A convenient set of variables
In all our discussions in this section the input neuronal response matrix, z μm , will be assumed to be fixed. Note that z μm connects synaptic weight vectors to the target response vector and can be used to define weight combinations, the η coordinates. Each η coordinate controls the target response to a single stimulus condition: It is rather convenient to extend this set of η coordinates to a basis set of η coordinates, such that all synaptic weights can be uniquely expressed as a linear combination of these η coordinates, and vice versa. To see how this can be done, we will henceforth make the simplifying assumption that the × matrix has the maximal rank, , although we anticipate that much of our framework, results, and insights will apply more generally. If z has maximal rank, then its kernel will be an ( − )-dimensional linear subspace spanned by ( − ) orthogonal basis vectors, denoted by ε μ for μ = + 1 … . We can now extend z to an × matrix, Z, as follows: Z μm = z μm for μ = 1…P, and ∀m, Z μm = ε μm for μ = P + 1…N, and ∀m, where ε μm is the mth component of the null vector ε μ . With this construction, it is easy to see that the new η coordinates, where for notational simplicity we have ordered the response patterns such that y μ = 0 only for μ = 1, …, . Also, we extend the y μ 's to an -dimensional vector, y , by assigning y μ = 0 for μ = + 1 … .
For later convenience we also define the number of semiconstrained and unconstrained dimensions as, = − and = − , respectively.

Derivation of the certainty condition for orthogonal input patterns
Our goal here is to use the solution space (i.e., ensemble of weights that are precisely able to recover the specified target responses) to derive a condition for when we can be certain that a given synapse must be nonzero. For technical simplicity, we will specialize to the case when all the response patterns are orthonormal, i.e., where I is the identity matrix. Then we can always choose the extended Z matrix to be an × orthogonal matrix, such that Z −1 = Z T and the ε μ vectors now form an orthonormal basis. Motivated by biological constraints, we will impose a bound on the magnitude of the synaptic weight vector. For orthonormal response patterns, this translates into a spherical bound on η coordinates as well [see Fig. 5 We refer to this -dimensional ball, in which all admissable synaptic weights reside, as the weight space.

a. A heuristic argument for y-critical
Before diving into the rigorous and technical derivation, in this subsection we first try to intuitively understand how the certainty condition (51) can arise. For this purpose, let us start with a linear theory with no unconstrained dimension, so = = 0. In this case, there is a unique set of weights that can precisely reproduce the observed responses: Since Z μm = z μm represents the responses of the mth presynaptic neuron, the solution for the mth synaptic weight (A12) is simply the correlation between the pre and post synaptic activity. In a linear theory, the sign of the synapse is thus dictated by the sign of the correlation between the pre and post synaptic neuron.
Let us now allow a single ( th) unconstrained direction. One can think of this situation as if we do not have the information on how the target neuron would respond to the unconstrained stimulus pattern. If we knew that this response was say, y u , then we would have been able to determine the sign of w m : Now, it is easy to recognize that the first term is just e ⋅ y = ye y , where we have suppressed the m index on e m here to reduce notational clutter and will continue to doso while referring to the synapse direction whose we are considering. 12 e y = e ⋅ y refers to the projection of e along y. Also, note that in this simple case with one unconstrained direction, the projection of e along the unconstrained subspace is just given by e u = e ⋅ ε N = Z Nm . Further, since Z is orthogonal, to have any solution at all Substituting the maximum |y u | from Eq. (A15) into Eq. (A14), after some algebra we get the condition for sign certainty as y > y cr = W e u 2 e u 2 + e y 2 . (A16) The same argument applies if the th direction is semiconstrained instead of unconstrained, with one notable difference. If the th pattern was semiconstrained that means y = 0, and the nonlinear thresholding is masking how the target neuron would have responded in a linear model. 13 However, the ambiguity in sign can only arise if the second term has a sign opposite to the first term, or a sign opposite to Sgn(e y ). Moreover, for the thresholding to act, the target response for the semiconstrained pattern must be negative in the linear theory, so Z m has to have the same sign as e y to generate the ambiguity. And, if it is indeed so, then we obtain a certainty condition that is identical to Eq. (A16) except that e u → e s , the projection of e along the semiconstrained direction: y > y cr = W e s 2 e s 2 + e y 2 . (A17) If Z m and e y have opposite signs, then the synapse always has the same sign throughout the solution space.
While this derivation of y cr is heuristic and only deals with a single semiconstrained or unconstrained dimension, it provides intuition for the general result (51). Essentially, whether the sign of a given synapse is constant across the solution space depends on two competing quantities: the correlation between the pre-and postsynaptic responses; and the strength of the postsynaptic drive for patterns where the target response is either unknown or masked by the thresholding nonlinearity.

b. Hyperplane dividing excitatory and inhibitory synaptic regions
Having gained some intuition about the certainty condition, let us now proceed to a rigorous derivation of the result. Since the constrained coordinates are fixed for the weight vectors that belong to the solution space [the deep yellow wedge in Fig. 5(b)], we must have so that the solution space resides within an ( + )-dimensional ball with radius 13 Note that our relation between the sign of the synapse and the sign of the correlation is based on a linear response.
as depicted in Fig. 5(b), by the yellow region. We refer to this semiconstrained plus unconstrained subspace as the flexible subspace. Now, the synaptic direction of interest, e, can be decomposed into its projections along constrained, semiconstrained and unconstrained subspaces. For notational simplicity, let us denote e μ ≡ e ⋅ ε μ = Z μm as the component of e along ε μ . Note that the second equality follows from the orthogonality assumption and Eq. (A9). In general, in this manuscript we will subscripts on e to denote projections of e along different directions or subspaces. We are the projections of e along these directions. One could think of θ, φ as representing a spherical coordinate system where the role of x, y, and z axesare played by s, u, and c respectively, and our definitions (A20)-(A22) imply the convention, 0 ⩽ {θ, φ} < π/2.
For later convenience, let us also introduce the projection of e onto the activity-constrained subspace e p ≡ p ⋅ e = cos 2 θ + sin 2 θcos 2 ϕ = ∑ μ = 1 P e μ 2 ⩾ 0, where p ≡ ∑ μ = 1 P e μ ε μ ∑ μ = 1 P e μ 2 . (A23) We would also like to emphasize that we can compute θ, φ just from the knowledge of the neuronal responses, z μm = e μ , which is particularly useful for numerical calculations: 14 Again, we remind the readers that in the main manuscript these projected vectors were denoted by c m , s m and u m .
where we have now also suppressed the index m in w m . Also, we have defined α ∈ [0, π] to be the angle between y and c. We now notice that the origin of the flexible subspace, w s = w u = 0, is in the solution space and the sign of w for this solution point is given by In other words, if the sign of the synapse is certain, this certain sign must be Sgn(cos α), which corresponds to the sign of the correlation between the target neuron and the presynaptic neuron. Intuitively, positive correlations point to an excitatory connection, and negative correlations point to an inhibitory connection.

c. Special case without unconstrained dimensions
To derive the certainty condition, let us start by looking at the case when = , so that there are no unconstrained directions, or equivalently, ϕ = 0. In this case, the solution space is just the all-negative orthant in the -dimensional semiconstrained hypersphere (Fig. 9), and the equation for the w = 0 hyperplane can be written as sinθ s′ ⋅ w s = ycosθ cosα , where the right-hand side (RHS) is positive, and we have introduced which flips the direction of s if cos α > 0, or equivalently, if e y > 0. Now, if the w = 0 hyperplane (orange lines in Fig. 9) is far enough along s′ from the origin that it does not intersect with the all-negative orthant within the weight bounds, then we can be certain that w is nonzero and always has a consistent sign. To check this, we need to compare the cone angle that the orange hyperplane subtends at the center, φ, with the minimum angle, γ, that the s′ vector makes with the all-negative orthant.
First, φ can easily be inferred from trigonometry: where d s = y cot θ|cos α| represents the distance from the center of the semiconstrained sphere to the hyperplane. The expression for d s follows from the general mathematical result that if is an equation for a hyperplane, where x denotes the coordinate vector and β = sin θs′ | β | = sin θ, β 0 are constants, then the perpendicular distance, d ⊥ , to it from a point x = ζ is given by Note, we are interested in the distance from the origin, ζ = 0, to the hyperplane satisfying Eq. (A28), so β = sin θ s′ | β | = sin θ and β 0 = − ycosθ | cosα|.
To provide a geometric intuition for γ, let us first assume that s′ does not point into the allnegative orthant. If we can find the projection of s′ on the correct boundary of the solution space, then γ will be given by the angle between s′ and the appropriate semiconstrained boundary vector, s * (Fig. 9). Since all the components in the solution space (all-negative orthant) have to be negative or zero, to find the appropriate projection vector of s′ onto the boundary of solution space, we essentially have toset all the positive components to zero: where again the sign in Θ is determined by the sign of e y .
A formal way to see that γ is indeed given by Eq. (A34) is to start with any unit vector, w s , lying in the solution space. Then, the angle, γ, between s′ and w s is given by where we have defined A ± as the set of all μ indices for which s μ ′ is positive/negative, respectively. Since w s is in the solution space, w sμ ⩽ 0, and therefore the second term sums positive quantities while the first term subtracts. Thus, where both the equalities are achieved when w s is aligned with the boundary semiconstrained vector, s * , or w s = s * , as argued previously. Note also, that this formal proof did not assume any restrictions on s′ direction and thus Eq. (A34) turns out to be a general result that also holds if s′ points into the all-negative orthant.

d. General case with unconstrained dimensions
We can extend the above analysis to the case when we have unconstrained dimensions by noting that, for a given set of unconstrained coordinates, the solution space is again the all-negative orthant in a semiconstrained hypersphere. Isometry along unconstrained dimensions ensures that it is always possible to make one of the null directions, lets say ε N , align with u. Then, the w = 0 hyperplane Eq. (A26) reads w = sinθcosϕs ⋅ w s + ycosθcosα + η u sin θsinϕ = 0, which can be rewritten as sinθcosϕs′ ⋅ w s = ycos θ cosα − η u ′ sinθsinϕ, where we have introduced η u ′ = − Sgn (cosα)η u . To have a certain synapse, the w = 0 hyperplane cannot intersect the solution space for any allowed value of η u ′ .
The direction of s′ is independent of the unconstrained coordinates and hence the value of γ remains unchanged. However, the cone-angle, φ, does depend on the unconstrained coordinates in two ways. First, the radius, W , of the -dimensional spherical subspace containing admissible solutions is now where η ⊥ is the magnitude of the weight-vector in the ( − 1)-dimensional subspace that is perpendicular to ε N = u. We note in passing that Eq. (A40) implies η ⊥ , η u ′ ⩽ W 2 − y 2 = W .
(A42) So we will now look into cases when y ⩾ y cr,lin which also means that Eq. (A41) will remain valid. For us to be certain that w is nonzero, we have to make sure that even the largest φ that one can obtain by varying η ⊥ and η u ′ is still smaller than γ. Clearly, to make φ large it is best to make η ⊥ = 0. Also, it is clear from inspection that cos φ starts to initially decrease as η u ′ increases from zero, being dominated by the linear term. However, as the quadratic term in η u ′ in the denominator becomes more and more important, cos φ reaches a minimum and starts to increase. Imposing d cos φ/dη u ′ = 0, we can find that this minimum is reached at where we substituted y cr,lin from Eq. (A42) and used the fact that W ⩾ y ⩾ y cr,lin to obtain the inequality. This proves that the minimum cos φ indeed occurs at an allowed positive value of η u ′ ⩽ W . Substituting the above η u ′ in Eq. (A43), after some algebra we find that this minimum value of cos φ, or equivalently the maximum φ, is given by cos φ = y 2 cos 2 θcos 2 α − W 2 − y 2 sin 2 θ sin 2 ϕ W 2 − y 2 cos 2 ϕsin 2 θ . (A45) The certainty condition then requires cos 2 φ = y 2 cos 2 θcos 2 α − W 2 − y 2 sin 2 θsin 2 ϕ W 2 − y 2 cos 2 ϕsin 2 θ > cos 2 γ, which can be recast as y > y cr ≡ W cos 2 γ sin 2 θcos 2 ϕ + sin 2 θ sin 2 ϕ cos 2 α cos 2 θ + cos 2 γ sin 2 θcos 2 ϕ + sin 2 θsin 2 ϕ , It is illuminating to express y-critical in terms of the projections, e y , e u , e s * , of the synaptic direction, e, respectively along the data vector, y, the unconstrained unit vector, u, and the semiconstrained boundary vector, s * : y cr = W e s * 2 + e u 2 e y 2 + e s * 2 + e u 2 , where e y ≡ e ⋅ y = ∑ μ = 1 C y μ e μ ∑ μ = 1 C y μ 2 = cosθ cosα, e s * ≡ e ⋅ s * = ∑ μ ∈ A − e μ 2 = − Sgn(cosα )sinθcos ϕcos γ, and e u is given by Eq. (A22). We note that setting ϕ = 0 precisely reproduces the correct limit with no unconstrained directions (A37).

Regarding orthogonal input patterns in recurrent networks
While our analysis of the solution space and the certainty condition (A48) translate directly to recurrent networks, the requirement of orthogonality for the derivation of our certainty condition imposes certain technical restrictions on its scope when it comes to recurrent neural networks.
The certainty condition we derived for feedforward networks can be applied to two different recurrent neural network set ups. First, let us consider networks where neurons have selfcouplings. A consequence of having orthogonal response patterns in this case is that the certainty condition can only be satisfied for self-couplings w ii , as long as W ⩾ 1. This is because the imposition of orthogonality in response patterns also restricts the correlation between the target neuron and the other neurons: However, for the synapse-sign to be certain, the responses of pre and postsynaptic neurons need to be correlated. To see the problem more quantitatively, suppose we are interested in constraining the synapse from the mth neuron onto the ith neuron, as before. Now, the first P elements of the unit vectors, e i and e m contain the responses of the ith and the mth neuron in the patterns. We have already derived a decomposition of e m in terms of its projections onto the constrained, semiconstrained and unconstrained subspaces (A20). Similarly, e i can be decomposed as e i = yy + 1 − y 2 y ⊥ , where y lies entirely along the constrained directions, and y ⊥ is orthogonal to it and only has components along unconstrained directions. Then, orthogonality implies e i ⋅ e m = y cosθcosα + 1 − y 2 sinθsinϕ y ⊥ ⋅ u = 0 sinθ sinϕ = − ycos θcos α y ⊥ ⋅ u 1 − y 2 . (A52) Starting from the certainty condition (A48), we can now go through a sequence of (in)equalities: y 2 > W 2 sin 2 θsin 2 ϕ + sin 2 θ cos 2 ϕcos 2 γ sin 2 θ sin 2 ϕ + sin 2 θ cos 2 ϕcos 2 γ + cos 2 θ cos 2 α ⩾ W 2 sin 2 θ sin 2 ϕ sin 2 θsin 2 ϕ + cos 2 θcos 2 α = W 2 y 2 cos 2 θcos 2 α y 2 cos 2 θcos 2 α + 1 − y 2 y ⊥ ⋅ u 2 cos 2 θ cos 2 α = W 2 y 2 y 2 + 1 − y 2 y ⊥ ⋅ u 2 ⩾ W 2 y 2 , where we substituted sin θ sin ϕ from Eq. (A52). Note that the RHS is minimized when u and y ⊥ are either aligned or antialigned. Even in this case, RHS = W 2 y 2 , and thus the certainty condition cannot be satisfied if W ⩾ 1. One can check that when i = m, because the RHS in the first equation of (A52) is one and not zero, no similar constraints appear. Indeed, the certainty condition may be satisfied depending upon the specific response patterns.
As a second possibility, suppose that no self-couplings are present. Then to be able to apply our framework and determine the couplings w im for a given i, we only need the truncated row vectors of z whose ith column entry is absent, to be orthonormal. Therefore, the response of the ith driven neuron, which consists of the entries of the ith column, can now be chosen independently from the responses of its input neurons. In other words, e i and e m , m ≠ i, no longer need to satisfy orthogonality constraint of Eq. (A52). Consequently, the w im weights can indeed satisfy the certainty condition, just as in the feedforward case.

Implied conservative bound on the certainty condition for nonorthogonal input patterns
A complete treatment of the certainty conditions for nonorthogonal fixed point patterns is beyond the scope of this work. However, here we provide some preliminary results and insights by explaining how our formalism for analyzing orthogonal fixed-point patterns can be simply adapted to derive an exact, but conservative, upper bound for y-critical that applies to general sets of patterns.
Conceptually speaking, deriving the certainty condition amounts to determining when the w m = 0 hyperplane intersects the solution space within the sphere of weight vectors with norm at most W. Because the solution space is exceedingly simple in η coordinates, our orthogonal analysis used η coordinates to conveniently recast the equations for the bounding sphere and w m = 0 hyperplane. To adapt this analysis to the nonorthogonal case, it is important to account for three important changes to the geometry of the problem. Most fundamentally, the ε μ directions corresponding to η coordinates are no longer orthonormal.
However, the mathematical notions of orthogonality and normality are implicitly defined with respect to the inner-product structure imposed on the vector space, and our geometrical calculations from the orthogonal case easily carry over to the nonorthogonal case if we redefine the inner product structure of the weight space to give an orthonormal coordinate system with respect to the η coordinates rather than the physical coordinates. 15 In practice, all that this will entail is interpreting the η coordinates as if they define coordinates along orthogonal axes, and we will never need to explicitly write down the associated inner product. Second, the equation for the weight bound in the orthogonal system of η coordinates is elliptical around the origin, rather than spherical. However, for any ellipse one can find a sphere that just encompasses it. If we can find the radius of this bounding sphere, then one can look for an intersection anywhere within this sphere and our geometrical approach for deriving Eq. (A48) will carry over and provide a conservative bound for y-critical. This bound will poorly approximate the true y-critical when some axes of the ellipse are much longer than others. Third, the normal vector to the w m = 0 hyperplane is no longer e m in the orthogonal system of η coordinates. Therefore, the projections of e m in Eq.
(A48) must be generalized to become projections of the hyperplane's normal vector.
To obtain the radius of the bounding sphere, consider the SVD decomposition of the datamatrix: where L and R are × and × orthogonal rotation matrices, and Λ is a × rectangular diagonal matrix whose only nonzero entries are given by 15 In particular, for an orthogonal Z matrix, the two inner product structures defined via e m ⋅ e n = δ mn and ε μ ⋅ ε v = δ μν are equivalent, but this is not the case when Z is nonorthogonal. Although one would conventionally adopt the first inner product structure, both the derivation and interpretation of the conservative y-critical formula is easier in terms of the latter inner product structure, which makes all of the response pattern directions orthonormal by definition.
Λ aa = λ a ⩾ 0, a = 1…P . (A55) Note that λ 1 ,·⋯ ,λ are called the singular values of z. We can now define rotated coordinates: so that η′ = L T LΛR T w = Λw′ η a ′ = λ a w a ′ ∀ a = 1…P . (A57) Note that η and η′ are vectors in the current notation. We also note that since w′ is just a rotation of the original synaptic coordinates, the biological bound does not change as we go from w to w′ coordinates: where in the last step we have used the fact that the orthogonal matrix L does not change the L2-norm as one goes from η′ to η coordinates, and λ max is defined as the maximal singular value. To obtain spherical symmetry, we thus define unconstrained coordinates via η μ = λ max w μ ′ , ∀ μ > P, (A60) such that We now realize that the problem of finding y-critical using this conservative bound can be recast into the problem of the orthogonal case: As we just described, the η coordinates satisfy the conservative spherical bound. Our goal can then be to find the minimum value of y for which the hyperplane satisfying, w m = 0, does not intersect the solution space. Now, if we define Z to be, as in the orthogonal case, a full rank extension 16 of z: Z μm = z μm , ∀ μ ⩽ P, λ max R μm T , ∀ μ > P, (A63) so that for all value of μ, then the hyperplane equation can be rewritten as From Eq. (A65) it is clear that is perpendicular to the w m = 0 hyperplane, where we remind the readers that ε μ 's were defined by Eq. (A9). n m thus plays the role of e m whose orientation with respect to the constrained, semiconstrained and unconstrained dimensions determines the certainty condition. Specifically, y cr = W ′ n s * 2 + n u 2 n y 2 + n s * 2 + n u 2 = W λ max n s * 2 + n u 2 n y 2 + n s * 2 + n u 2 , where the various projections of n m are given by sign as n y . To summarize, the above analysis suggests that both Z and Z −1 will play an important role in generalizing the certainty condition to nonorthogonal patterns, and especially the relative orientation of n m (defined by Z −1 ) with respect to the η directions.

CERTAIN IN LARGE FEEDFORWARD NETWORKS
For given values of (= ℐ), , in a feedforward setting we will here try to assess how likely it is that noiseless orthonormal neuronal responses require a given synapse to be nonzero. As we have seen in Eq. (49), whether a synapse is certain to exist depends on six parameters, θ, φ, γ, α, W, and y. The first four quantities depend on how e is oriented with respect to various directions in the weight space. Since e is a unit vector, typically we expect its component along any given direction to be O(1/ N). Thus, we typically expect Then, we find that as the network size increases y cr behaves as and y cr is essentially pushed up toward W.
However, the typical scale of y behaves similarly as the dimensions increase. To see this concretely, let us define y cons ≡ y μ | μ = 1…C as a -dimensional vector, and assume that every possible y cons is equally likely within a sphere of radius W (larger activity levels of the target neuron admit no solutions respectively. Since y and y cr scale similarity as one increases the network size, the probability of a synapse being certain should not change as the network size increases. In Fig. 8(c), we show that if we choose, y = 1 − ln 2/ , as the approximate median value in simulations with random input-output configurations (see Appendix F for details), then the number of certain synapses does indeed increase linearly with .
To quantitatively estimate the probability of finding a certain synapse, we can compute the fraction of volume of y cons 's for which the synapse is certain for the typical projections (B2), as compared to the volume of y cons 's for which solutions to the steady-state equations exist. We know that y cons has to lie within a -dimensional sphere of radius W in order for there to be any solutions to the problem. 17 However, for the synapse sign to be certain, we need W ⩾ y cons = y > y cr , where y cr is given by Eq. (B2). So, we need to compare the spherical shell volume, V W ⩾ y > y cr , with the volume of the -dimensional sphere, V y⩽W .
To find V W ⩾ y > y cr , we have to subtract the -dimensional spherical volume with radius y cr from the spherical volume with radius W. Since n-dimensional spherical volumes scale as the nth power of the radius, the probability, P, of ascertaining the sign of the synapse is approximately given by Thus, we get 17 The allowed y cons must also lie in the all positive orthant, but as we will compute the ratio of two spherical volumes the reduction factor will cancel out.
The most prominent feature of Eq. (B8) is that the probability only depends on the ratios of the various dimensions. Hence, it does not change as we increase the size of the network as long as the ratios are kept constant.
For the purpose of illustration and numerically testing this feature we assessed how certainty predictions changed when the network size is increased while holding the ratios between , , and fixed. In Fig. 8(c) we have plotted the number of certain synapses in simulations generated from random data as we scale up maintaining the ratios between , , and (see Appendix F for more details). We illustrate two cases. In the first example, no unconstrained directions were present, and = 3 . Then P = 1 − e −1/3 ≈ 0.28, so one has a 28% chance of being able to determine the sign of the connections. This answer incidentally is the same as an example with = = . As another example, Fig. 8(c) considered the case when = 2 = 2 . According to Eq. (B8), then P = 1 − e −1/5 ≈ 0.18, so the chance of determining the sign drops to about 18%. We only expect these numbers to be approximate. For example, our arguments relied on the assumption that all target responses admitting solutions are equally likely, an assumption that definitely needs to be revisited for realistic networks. However, the scaling behavior should hold for other probabilistic distributions as long as the scale of y cons behaves similar to Eq. (B5) with increasing .

APPENDIX C: NONZERO-ERROR CERTAINTY CONDITIONS
There are various reasons why we may want to not only consider weights that exactly reproduce the specified neuronal responses, but also weights that do so approximately. For instance, we are always limited by the accuracy of the measurement apparatus. More importantly, there are various sources of biological noise that typically lead to uncertainties in observed values of neuronal responses. For the purpose of this paper we will consider any set of weights to be part of the ε-error solution space if it is able to reproduce the specified neuronal responses with an error ⩽ ε [see Eq. (63) for definition of E]. We will neglect uncertainties in the input responses to the target neuron, but we will comment on their possible effects toward the end of this Appendix.

Errors in feedforward networks
Let us first focus on feedforward networks. Allowing for error increases the value of y cr by expanding the solution space. One way to think about this is to realize that we have to now make sure that Eq. (51) is satisfied for any nonnegative y = y + δ , where y and δ are vectors in the -dimensional activity-constrained subspace, the former representing the observed responses, and the latter coming from noise. We will initially assume that all the observed responses are nonnegative, so a zero-error solution is possible and the noise is bounded by | δ | ⩽ ε. Our strategy will be to first seek the minimum y needed to have a certain synapse for a given δ . We then find the maximum among these y-critical values as we let δ vary within the ε-ball. Since this procedure will guarantee that the w = 0 hyperplane does not intersect the entire solution space with ℇ ⩽ ε, this means that the synapse must exist for the network to generate the specified responses patterns. The synapse's sign will match the zero-error analysis. We will first estimate y-critical when the error is small enough to not induce topological transitions in the error surface. In the subsequent sections, we will include the effects of topological transitions, as well as explain how to deal with situations where some of the observed responses are negative, which is possible due to noise.

a. When all observed responses are nonnegative and no topological transitions occur
To understand how errors affect the certainty conditions, let us consider the case where the observed responses are nonnegative and the allowed error satisfies 0 < ε < min{y μ } μ=1, …, , so that no topological transitions can occur. If some responses that were zero in y are now nonzero in y , then both e y and e s * can change due to the noise. Withoutloss of generality, let us assume that δ only has nonzero components along μ = + 1, …, semiconstrained dimensions, 18 where A μ is 1 if (y ⋅ c) and e μ have the same sign and 0 otherwise. This follows from the definition of the boundary projection vector (A33) and e s * (A49). Thus, for a given δ the certainty condition (A48) yields 18 The components of δ along these semiconstrained directions are all positive since y must be nonnegative. 19 This will happen if a component of y that was zero now has a nonzero component. As before, one can interpret the above inequality as equivalently specifying either y-critical or W-critical. For a fixed y , one can obtain a minimum value of the left-hand side (LHS) of the latter inequality by varying δ within the ε ball. The square root of this is W-critical. Then as long as W is less than W-critical, we will have a certain synapse. Inverting the relation, one finds y-critical as the minimum y needed to make the synapse sign certain for all δ and given y and W. More explicitly, equating the two sides of the inequality forany given y, δ , and W, we get a minimal y that depends on δ . To find y-critical, we have to take the maximum of the minimal y as we vary over all possible δ in the ε-ball.
Let us first obtain a lower bound on y-critical. By inspection of the LHS of the above inequality, it is clear that the more the δ -dependent terms can cancel the y -dependent terms, the harder it is to satisfy the certainty condition. We observe that in Eq. (C3), the second term is minimized when δ = − εy. 20 Accordingly, one can obtain a lower bound on y-critical by substituting δ = − εy in Eq. (C3): e ⋅ y 2 y − ε 2 + e s * 2 + e u 2 y − ε 2 > W 2 e s * 2 + e u 2 , or, y > y cr, min, 0 ≡ W e s * 2 + e u 2 e y 2 + e s * 2 + e u 2 + ε, where we have used e y = e ⋅ y and e s * 2 = e s * 2 , since δ has no components along the semiconstrained directions. We will see later that this simple lower bound can approximate the actual y-critical very well in many situations. Notice that we used a subscript "0" to denote this lower bound. This is because, as we will soon see, when noise allows for topological transitions, one may be able to obtain stricter lower bounds by allowing some constrained dimensions to behave as semiconstrained. This "0" emphasized that no constrained indices behave as semiconstrained. Next, we can find an upper bound for y-critical by noting 20 This assumes that y > ε. Smaller values of y permit y = 0 and all weights can be set to zero. where the first inequality is true because e s * 2 ⩽ e s * 2 , and the second inequality because we have dropped positive (δ 2 ) terms. Then we can obtain an upper bound on y-critical by finding a y such that even the last expression on the RHS is greater than W 2 . Specifically, e ⋅ y 2 + 2 e ⋅ y e ⋅ δ + e s * 2 + e u 2 y 0 2 + 2 y ⋅ δ > W 2 e s * 2 + e u 2 . (C6) So, let us try to find the δ that minimizes the LHS: LHS = y 2 e y 2 + 2ye y (e ⋅ δ ) + e s * 2 + e u 2 y 2 + 2yy ⋅ δ = y 2 e y 2 + e s * 2 + e u 2 + 2y e y e + e s * 2 + e u 2 y ⋅ δ = y 2 e y 2 + e s * 2 + e u 2 + 2y ξ ⋅ δ , where ξ ≡ e y ∑ μ = 1 P e μ ε μ + e s * 2 + e u 2 y, and we have noted that u ⋅ δ = 0 because δ must be in the activity-constrained subspace. It is now clear that LHS is minimized if δ anti-aligns with ξ . Then Eq. (C6) yields y 2 e y 2 + e s * 2 + e u 2 − 2y ξ ε > W 2 e s * 2 + e u 2 . (C8) Equating the two sides of Eq. (C8) and solving for y, 21 we now get an upper bound for y-critical: y cr, max, 0 ≡ W 2 e s * 2 + e u 2 e y 2 + e s * 2 + e u 2 + ε 2 ξ 2 e y 2 + e s * 2 + e u 2 2 + εξ e s * 2 + e u 2 + e y 2 , where ξ is the norm of ξ and can be simplified as 21 This quadratic equation obviously has two solutions. The correct one can easily be identified, for instance, by taking the ε → 0 limit. As with lower bound, we will see that to obtain the correct upper bound in presence of topological transitions, one has to maximize over several upper bounds. Hence, we refer the above upper bound that does not include any effects from topological transitions with an index "0." Finally, we would like to point out that for small errors one can also obtain an approximate correction to y-critical that lies in between y cr,min,0 and y cr,max,0 . To obtain this estimate, let us first write down the bound on y that one would obtain from Eq. (C3) as δ →

0:
y > W e s * 2 + e u 2 e y 2 + e s * 2 + e u 2 . (C12) If δ has components along any semiconstrained direction that contributes toward the original e s * vector, then e s * 2 < e s * 2 , and comparing Eqs. (A48) and (C12) we see that, as δ → 0, the bound on y will be less than the zero-error y cr . In other words, for sufficiently small errors, if δ explores directions that contribute to e s * , then the corresponding bound on y is going to be smaller than even the zero-error y cr . Thus, for these small errors the leading order corrections to Eq. (A48) is obtained only if δ do not have any components along these semiconstrained directions. This means e s * 2 = e s * 2 , and we can reorder the indices such that the semiconstrained directions along which excursions of δ will be considered range from + 1, …, ; i.e., Sgn s μ ′ is negative for these, and only these, semiconstrained indices. To obtain the certainty condition, one can then follow steps (C5) 22 through (C9) except that δ is restricted to only have nonzero components along constrained directions those semiconstrained directions that do not contribute to e s * , i.e., for μ = 1 … . In other words, it can at the most anti-align with a truncated ξ , 22 Since we are only interested in the leading order correction, we could also drop the (δ 2 ) terms needed to arrive at an expression such as the RHS of Eq. (C5). We will see later how y cr,appr,0 can be generalized to provide an approximation, y cr,appr , to y-critical that accounts for topological transitions.
Reassuringly, we see that at ε = 0, y cr,appr,0 , y cr,max,0 , and y cr,min,0 , all reduce to the zero-error y cr Eq. (A48). Also it is obvious that the coefficient of ε in y cr,appr,0 is greater than that of y cr,min,0 but less than that of y cr,max,0 . Finally, note that y cr,appr,0 coincides with y cr,min in the maximally nonlinear case where e s * 2 = e p 2 − e y 2 .
In Fig. 10(a), we have plotted how these different quantities depend on e p , e y and e s * . In particular we note that as the network size increases, these curves typically come closer together [ Fig. 10(b)], so that they provide a good approximation for y-critical. Finally, for future reference we point out that for a given set of input patterns, z μm , the various y-criticals that we have computed above depend on the orientation of the target response vector, or y, and the total noise budget, ε. In other words, y cr, min, 0 = y cr, min, 0 (y, ε), y cr, max, 0 = y cr, max, 0 (y, ε), and y cr, appr, 0 = y cr, appr . 0 (y, ε).

b. Comparing predictions from linear and nonlinear models
To assess the effects of nonlinearity it is useful to compare the predictions for certainsynapses between the linear and nonlinear theory. In a linear theory, there are no semiconstrained directions, and therefore, a lower bound, leading order and upper bound on y-critical can be obtained from Eqs. (C4), (C15), and (C11), respectively, by setting e s * = We note that since all these quantities are increasing function of e s * , the linear values are always less than or equal to the nonlinear counterparts. Since no topological transitions are possible in a linear theory, these expressions do not need a qualifying "0" index. In Fig. 10, we show a comparison between the upper bound on y-critical obtained in the linear and the nonlinear theories.

c. y cr,max , when one and only one nonzero response is smaller than noise
In the previous section we have considered responses which are either zero, or positive and greater than the noise bound, ε. In this section, we consider a situation where one and only one of the observed responses is smaller than the noise, |y 1 | < ε.
Note that once we admit noise, it is possible for the small observed response to be negative. In this case, there is no zero-error solution as Φ(η 1 ) cannot be negative, and therefore we need a minimum noise, and incur a minimum error: δ 1 = − y 1 ℰ min = y 1 2 . (C19) In fact, since the noise for this observation has to be positive, we must have Φ η 1 = y 1 + δ 1 ≡ δ 1 ′ ⩾ 0.
(C20) Then using we obtain a modified bound on the noise: Let us now introduce a new reduced response vector whose response to the first pattern is set to zero: We can then identify δ 1 ′ to be the noise associated with the μ = 1 response in this new feedforward problem, while the other δ μ 's can continue to represent the noise associated with all the other responses. Thus, a sufficient condition for a given synapse to be certain is y ′ > y cr, max, 0 y′, ε′ .

(C25)
A very similar condition arises if y 1 is positive but small enough to admit a topological transition. To see how, let us first remember that in order for a synapse to be certain, the solution space should not intersect with the w = 0 hyperplane. Now, let us look at the solution space coming from denoised y 's that have y 1 > 0. Since, the solution space corresponding to these y 's do not have any additional semiconstrained dimension as compared to the observed response, y , the condition for no intersection with this part of the solution space is simply given by y > y cr, max, 0 (y, ε), a condition that guarantees a certain synapse when no topological transitions are considered.
Next consider the solution space for denoised y 's with y 1 = 0. The solution space for these y 's have an additional semiconstrained dimension corresponding to the first pattern.
We can therefore use the reduced response vector, y ′ (C24), so that the solution space corresponding this new response vector with the error bound, ε′ (C23), along with the solution space with y 1 > 0 accounts for the full solution space of y with error ε. Note, that the noise budget is again reduced according to Eq. (C23) since we are committed to making at least an error of y 1 to convert the first response to a semiconstrained dimension. To ensure that there is no intersection of the w = 0 hyperplane with the y ′ solution space we must therefore also satisfy Eq. (C25). We note that to calculate the right-hand side using Eq. (C11), the various projections have to be recalculated according to e y′ = ∑ μ = 2 C y μ e μ ∑ μ = 2 C y μ 2 and e s′ * = ∑ μ ∈ A − e μ 2 + Θ Sgn e y′ e 1 e 1 2 . (C27) The condition (C25) on y ′ translates to a condition on | y |: | y | > y cr, max, 0 2 y, ε′ + y 1 2 | y | > y cr, max, 0 y′, ε′ 1 − y 1 where we have defined y μ 's to be the μth component of y.
We note that this is a nonlinear inequality as the right-hand side depends on | y | through its implicit dependence on ε′. When we have a negative y 1 , only Eq. (C28) needs to be satisfied to guarantee a certain synapse, but if y 1 is positive, both Eqs. (C26) and (C28) have to be satisfied. It is not hard to see how this process should be continued if one has more than one topological transition within the allowed error. Since we know the precise sequence of topological transitions, all the sequential certainty-conditions can in principle be obtained. A synapse is certain if all of its certainty-conditions are satisfied.
So far, we have obtained a way to check whether a synapse is certain given the response data, y . We also have an upper bound of y-critical, y cr,max,0 , ignoring effects from topological transitions when all the observed responses are nonnegative. We will now investigate how topological transitions can change this upper bound. We will start by quantifying effects from a single topological transition by finding potentially a new upper bound for y-critical, y cr,max , such that we can say that if | y | > y cr,max , then the synapse is certain. Suppose we start out with a data vector whose norm is so large that there are no topological transitions. Then as we decrease the strained norm, but keep its orientation, y, fixed, eventually a semiconstrained dimension will open up in the solution space, in our example, the first direction. If we keep decreasing further, then at some point another response dimension will become semiconstrained due to the presence of noise. Let us however consider the situation where y cr,max (that is yet to be computed) is going to turn out to be larger than the norm when the second transition occurs. In this case, we do not have to consider this possibility (and any other transitions) because then if | y | > y cr,max the second transition cannot occur. We will later find a condition that guarantees this. Since we are trying to find the smallest value of y cr,max that we can find, what all this means is that at y cr,max = y cr,max y, one of the two inequalities, (C26) or (C25), becomes an equality. While the first equality is trivial to solve as the right-hand side does not depend on y cr,max , the second equation is highly nonlinear 23 : In particular, we notice that there are two competing effects that ultimately determine y cr,max,1 . The numerator depends on ε′, which decreases as y cr,max,1 increases and 23 Here the "1" in the subscript indicates that this possibility for y-critical is computed by only considering the first topological transition.
therefore has an overall effect of decreasing y cr,max,1 . However, the presence of y 1 in the denominator within the square root tends to increase y cr,max,1 . To determine the correct upper bound for y-critical one has to compare the y cr, max, 1 y′, y 1 , ε′ obtained from Eq. (C29) with y cr, max, 0 (y, ε), and then choose the maximum because then both the inequalities (C26) and (C25) will be satisfied. For the negative response case, we simply need to solve Eq. (C29) to obtain y cr,max,1 . Before we describe how we can obtain y cr,min , let us note that if y cr,min y μ > ε, ∀ μ > 1, then the second transition occurs at a magnitude that is lower than y cr,max,1 , and therefore does not need to be incorporated in the y cr,max,1 calculation. Indeed, Eq. (C32) is a sufficient condition but not a necessary one.

d. y cr,min , when one and only one nonzero response is smaller than noise
When we have a small response, |y 1 | < ε, we have seen that we have to consider solution space around a reduced response vector, y ′ (C24), with a smaller error budget, ε′ (C23). Accordingly, we can obtain an equation for a lower bound on y-critical using Eq. (C4), 24 y cr, min, 1 = y cr y′ + ε′, where μ = 1 along with μ = + 1 … are all treated as semiconstrained. As before, since the norms along y and y ′ are related via we get an equation for y cr,min,1 very similar to Eq. (C29) for y cr,max,1 : y cr,min ≡ max y cr, min, 0 , y cr, min, 1 .
(C38) e. When more than one nonzero responses are smaller than allowed error It is not difficult to see how the arguments above generalize if we have more than one small (< ε) observed response. We have to consider cases where all the negative observed responses, and different possible combinations of the positive responses, are set to zero. Let us denote T to be one such possible set of μ indices. As before, we define a reduced response vector, which is now indexed by T: so y T,μ = 0 for all μ ∈ T. Then, essentially following the same algebraic manipulations as above we obtain a lower bound according to To reiterate, whenever an observed response is negative, which is inconsistent with a threshold linear transfer function, this means that some of the noise budget has to be used up to bring this response up to zero, and the same noise reduction occurs if one wants to consider topological transitions. Each y cr,min,T evaluated this way provides us with a lower bound, and hence we have to take a maximum over all these to find the tightest lower bound, y cr,min . To make things explicit, let us also enumerate the new expressions for the various projections of e that one needs to calculate y cr y T : (C41) Once we have a lower bound, we can obtain conservative upper bounds analogous to Eq. (C31) for each T: 25 The second root gives a negative result, and accordingly does not reduce to the correct ε → 0 limit. y cr, max, T ≡ y cr, max, 0 y T , ε T 1 − ∑ μ ∈ T y μ 2 , where ε T ≡ ε 2 − y cr,min 2 ∑ μ ∈ T y μ 2 .
(C42) As before, for a consistent upper bound for y-critical, we need to take the maximum over all y cr,max,T 's.
Finally, we note that we do not need to consider all possible transitions. While going through the sequence of transitions, as soon as we find a T such that y cr, min, T y μ > ε for all μ ∉ T we can stop as this means that by the time | y | is small enough that any additional y μ 's can be set to zero, the synapse is already uncertain.

f. Numerical simulation
To illustrate the behavior of the various y-critical functions and check their utility, in Fig. 11 we have plotted y cr,min (light gray curve) and y cr,max (black curve) for the same 102 configurations as the ones depicted in Fig. 8(b) involving a feedforward simulation with = 6, = 5, = 2, and ℇ < ε = 0.1. We also defined, y cr,appr , as a maximum over different approximations, y cr,appr,T 's, that incorporate topological transitions and are defined as (C43) We have plotted y cr,appr , the approximation of y-critical, in green in Fig. 11. As in Fig. 8(b), the black dots here denote the maximum value of y in our simulations that still admitted mixed signs for the synapse under consideration, for details on the simulations, please see Appendix F. As one can see, most of the black dots seem to closely track the y cr,min curve, but some of the dots lie between the y cr,appr and y cr,min curves.

New sources of corrections in recurrent neural networks
It is clear that recurrent neural networks inherit error corrections to y-critical that were already present in the feedforward case. There are two additional sources of error that one could consider as one moves from feedforward to recurrent networks. However, our numerical simulations of recurrent networks suggest that these are sometimes small effects, and we leave their systematic study for the future.
First, we could account for the fact that the ε μ directions themselves can change. This is because the inputs driving any given driven neuron can no longer be assumed to be fixed at z μm if the other driven neurons suffer from noise. However, these activity patterns define the ε μ directions and η μ coordinates. Allowing noise in input neurons would lead to similar corrections.
Second, the total error in Eq. (63) may be unevenly distributed across the driven neurons. If the total squared error summed over all responses and neurons is ε tot 2 , then on average, the root-mean-square error associated with each driven neuron is ε tot / D. We can thus hope that a substitution of ε = ε tot / D in the various y-critical formulas will provide a good approximation. However, it is also possible that a few neurons will incur most of the error (up to ε tot ), potentially leading to violation of the certainty conditions computed from the root-mean-square error over neurons.

APPENDIX D: BEYOND THRESHOLD-LINEAR TRANSFER FUNCTIONS
So far, we have always modeled the firing rate as a threshold-linear function applied to the input drive. Here we will explain how our analyses of y-critical with noise also provide a formalism to analyze a much more general class of nonlinear transfer functions.

Bounded deviations from the threshold-linear function
Let us start by considering transfer functions with bounded differences from the thresholdlinear function: In this case, the fixed-point equations become Since |Δ(x)| is bounded by Δ 0 , we have a bound on the squared norm of δ : It is therefore clear that we can estimate y-critical for the Ψ nonlinearity with exactly the same formalism that we used to estimate y-critical for the threshold nonlinearity in the presence of noise. In particular, all the y-critical estimates [ (64) to obtain estimates and bounds on y-critical.

NETWORKS WITH SELF-CONNECTIONS
As discussed in Appendix A, when one moves from feedforward to recurrent neural networks with self-synapses, the input patterns can no longer be considered independent from the target neuron responses. How then does one assess synapse certainty for driven neurons with self-synapses, such as y 3 in Fig. 6(a), y in Fig. 12(a), y 1 in Fig. 12(c), and y 2 in Fig. 12(c)?
We begin by concretely analyzing neuron y in Fig. 12(a), because this is the conceptually simplest example, and fundamentally the mathematical analyses are the same for the other examples. In particular, to test our formalism and analytical results using this neuron, we performed low-dimensional simulations where the × extended input pattern matrix was Z = x 1 x 2 y −sin ψ cosχ cosψ cosχ sinχ cosψ sinψ 0 sinψ sinχ −cos ψ sinχ cosχ (E1) which is the same as X in Eq. (60), except that the role of x 3 is now played by y itself. 26 The third column of Eq. (E1) corresponds to the responses of the driven neuron, but it also provides a self-input. The two input neuron responses are given by the first two columns. Equation (E1) is meant to correspond to the case where = 2 and = 1, such that μ = 1, 2, 3 correspond to the constrained, semiconstrained, and unconstrained response patterns, respectively. For the purpose of numerical testing, we assumed that χ ∈ (0°, 90°) and ψ ∈ (−90°, 0°), as this range of angles ensures that driven neuron responses were nonnegative. 27 We set W = 1. See Appendix F for numerical simulation details. This example problem has one self-coupling, w, and two feedforward couplings, u 1 and u 2 . For this response structure, one can use Eq. (51) to calculate y cr for each of these three couplings. We find y cr, w = W cos χ, y cr, u 1 = W cos 2 ψ + cos 2 χ sin 2 ψ, y cr, u 2 = W sinχ .

(E2)
We assess synapse certainty by checking whether these formulas for y cr are smaller than the magnitude of y , 26 In the context of Fig. 6(a), y, x 1 , and x 2 can be identified with y 3 , y 2 , and x 2 , respectively. 27 This range also allowed us to ensure nonnegativity for the other example recurrent circuits in Figs. 6(a) and 12(c). y = sinχ .
consistent with the conventions of Eq. (37). Next, basic trigonometric manipulations tell us that the certainty condition can never be satisfied for u 1 [ Fig. 12(b), middle]. Finally, we see that the condition for certainty is always just not satisfied for u 2 . Here this implies that all solutions have u 2 ⩾ 0 [ Fig. 12(b), right]. This is because it is a very special case where cos γ = e s * = 0 and y ⊥ in Eq. (A51) is aligned with u, so that both the inequalities in Eq.
(A53) turn into equalities. In each panel of Fig. 12(b), we tracked the fraction of positive and negative synapse signs across the simulations, as we varied θ. In particular, we see that w had a unique sign as long as θ < 45°, u 1 always had mixed signs, and u 2 was nonnegative. 29 The same exact response matrix (E1) can also be used to consider neurons y 3 in Fig. 6(a) and y 2 in Fig. 12(c). The only difference is that the three columns respectively encode the responses of the second driven neuron, second input neuron, and third driven neuron in Fig.  6(a) and the responses of the single input neuron and two driven neurons, y 1 and y 2 , in Fig.   12(c). This correspondence can be seen by comparing the numerical results in Figs. 12(b) and 12(d). We use this correspondence to avoid having to simulate the full network in Fig.  6(a), and the numerical results in Fig. 6(c) are the same as those in Fig. 12(b).

Low-dimensional numerical methods
Here we detail the numerical methods relevant for Figs. 6 and 12.

a. Feedforward analysis
To test the analytic dependence in Fig. 6(b), we wanted to simulate solutions without biasing ourselves by the particular search algorithm used to find solutions. Accordingly, to find solutions to the fixed point equations (A2) with very small error (ℇ < ε = 0.01) we performed a random screen where each weight was chosen randomly from a uniform distribution between −1 and +1. For feedforward circuits, given the synaptic weights, one can obtain the fixed point responses of the target neuron by direct substitution of the known input responses in Eq. (A2) and then comparing these simulated target responses with the known target responses. We varied ψ, χ in the response data (E1) systematically in steps of 28 Note that e yy = sin χ ε 1 + cos χ ε 3 = cos θ ε 1 + sin θ ε 3 , where we have identified c and u with ε 1 and ε 3 , respectively. 29 As we explained before, y cr,u2 = y for all values of θ. Hence, the certainty condition is not satisfied because u 2 may be zero. The fact that u 2 can vanish is not discernible from our simulations because the weight magnitudes were generated randomly, and weights where u 2 = 0 comprise a zero measure set.
6°. 30 For the light and dark green curves, ψ was fixed at 45°, and χ was varied between (0°, 90°) and (90°, 180°), respectively, while for the pink and purple curves χ was fixed at 45°, 135°, respectively, and ψ was varied between (0°, 90°). Finally, for a given choice of ψ, χ, we systematically varied y between 0 and 1, in intervals of Δy = 0.01. For each value of ψ, χ, and y, we obtained ~(10 2 -10 4 ) solutions 31 satisfying the error and the biological bound (A11) from five to ten million different trial weight vectors. We then identified the maximal value of y for which the solutions had both positive and negative w 1 's. This simulation point should lie beneath the theoretical y cr if no error is allowed. However, since the error is small but nonzero, occasionally the y-criticals determined from simulations did slightly exceed the theoretical value. Also, since we vary y by small amounts y = 0.01, we expected the simulated y-criticals to be discrete but close to the theoretical predictions, which is exactly what we found in Fig. 6(b).

b. Recurrent analysis
Because the recurrent network solution space separates into several feedforward solution spaces at zero error, we numerically treated the driven neurons one at a time. To find solutions for the recurrent neurons in Figs. 6(a), 12(a), and 12(c), we fixed χ, ψ and then performed screens with random weights, selected in the same manner as the feedforward simulations discussed earlier. For each set of weights, and for each μ = 1, 2, we obtained the late time values of y by solving the time evolution equation [Eq. (6) with τ i = 20 ms] using Euler's method starting with initial conditions y i (0) = y μi , for μ = 1, 2. We used a time step of Δt = 0.2 ms. The y μ 's obtained from the simulation at late times, t ~ 600 ms, were then compared with y μ to obtain ℇ. If the weights satisfied, ℰ < 0.05 D and the biological bound (A11), then we considered the weights as solutions and checked the sign of the synaptic weights. For every value of ψ, χ, we found at least 50 solutions 32 to test the certainty predictions.

High-dimensional numerical methods
Here we detail the numerical methods relevant for Figs. 8 and 11.

a. Generating random orthogonal matrices
In several simulations we had to generate orthogonal response matrices. This meant that we had to obtain orthonormal -dimensional vectors. This was done by first generating an × matrix, G, where each of its entries was randomly selected from a uniform distribution between −1 and +1. We then antisymmetrized the matrix, G → (G − G T )/2, and a random orthogonal response matrix was then obtained via matrix exponentiation, Z = e G , where the matrix exponential is defined by substituting the matrix G into the power series 30 Since here we were primarily interested in the zero-error result, we restricted ourselves to a range of ψ, χ where no topological transitions can occur due to the small but finite error we had to allow for numerical simulations. 31 The number of solutions varied between 200 and 40 000 depending primarily on the value of y, the higher the value, typically the more difficult it was to find solutions. 32 The number of solutions varied between 10 4 and 10 5 for the = 1 simulation in Fig. 12(b) and between 56 and 110 for the = 2 simulation in Fig. 12(d). Note that for the = 2 simulation we have a six-dimensional weight space, which makes it a lot harder to find solutions through random scanning. Also, for this latter case we only checked that the biological constraint is satisfied by the incoming weights to y 1 . expansion of the exponential function and is distinct from the simple exponentiation of individual matrix elements. The first rows of Z could then correspond to the orthogonal patterns, and Z can be interpreted as the orthogonal extension of z as discussed before.

b. Generating random orthogonal matrices with nonnegativity constraints
In recurrent networks, all driven neurons must have nonnegative responses for all patterns. Accordingly, when the input response pattern includes responses of driven neurons, we follow a different procedure for generating the response matrix, which works as long as ℐ ⩾ -1. We started by choosing a ( × )-dimensional matrix, z, containing the responses of driven neurons and ℐ ⩾ -1 input neurons. The first columns of z corresponded to driven neuron responses, and the last columns to input neuron responses, such that z = (y x).
We made sure that the responses of the driven neurons were all nonnegative, as the threshold nonlinearity dictates, by choosing them to lie randomly between 0 and 1. To mimic a sparse response pattern, we set driven responses to 0 with 50% probability. The feedforward inputs, however, were randomly selected between −1 and 1. We then orthogonalized the input responses to the target neuron as follows. We start by normalizing the ν = 1 pattern: Then, for each row, ν = 2 … , in a sequential order we performed the following operations:

2.
We next changed the first m = 1 … ν − 1 elements of the νth row of x and thus z. The other elements of the νth row of z were left unchanged. In particular, none of the driven neuron responses changed during this step.

3.
Finally, we rescaled all the elements of the νth row of z for normalization: This algorithm essentially uses the responses of the input neurons to the νth stimulus to ensure that the full νth response pattern involving both the driven and input neurons is orthogonal to all μ ⩽ ν − 1 patterns.

c. Generating target responses and response directions
To generate target responses with null responses, we simply randomly selected numbers between 0 and 1 for the = − nonzero responses.
In some simulations, we wanted to consider situations where one has to account for a single topological transition to compute y-critical. Accordingly, we tailored the responses as follows. First, we set one nonzero response of the target neuron to a small value, 0.1ε. y was then obtained by dividing the response vector by its norm. We then only considered those y's whose other entries were large enough to prevent additional topological transitions from affecting y-critical. This was done by: evaluating y cr,min , the theoretical lower bound for y-critical that includes the first topological transition (Appendix C); constructing y = y cr,min y, which approximates the activity vector right below y-critical; and ensuring that all the other entries of y were greater than the allowed error, which guarantees that no other constrained dimensions can become semiconstrained in between y cr,min and the true y-critical. This way, typically one and only one constrained direction became semiconstrained when we allowed solutions with errors ≲ε.

d. Finding solutions using gradient descent learning in feedforward networks
In all the high-dimensional simulations, we had to find solutions to the fixed point equations (A2). Since scanning a high-dimensional synaptic weight space randomly is not numerically efficient, we applied gradient descent learning 33 to obtain solutions. For feedforward networks, this meant using the loss function We performed gradient descent optimization until we reached the desired error bound, ℇ < ε.
The initial weights were first chosen randomly from a uniform distribution between −1 and 1. The initial weight vector was then rescaled to have a norm between 0 and W = 1, chosen uniformly.

e. Finding solutions using gradient descent learning in recurrent networks
To find solutions for recurrent neural networks, we used the modified loss function, to perform gradient descent, instead of Eq. (63). Since the responses of the driven neurons can vary for nonzero errors, the two loss functions, ℇ and ℰ, differ. However, it is numerically a lot quicker to obtain solutions via gradient descent with ℰ as compared to using back-propagation through time to consider the entire time evolution of the network. Thus, the strategy we adopted to find solutions with ℇ ≲ ε was to first find weights satisfying ℰ ≲ ε = ε/10. Also, the gradient descent was done in two stages. In the first stage we minimized the error associated with each individual driven neuron, ℰ i treating it as a 33 Typically with learning rate ~0.01. feedforward problem. Once each of these errors were less than ε, we performed a second stage of gradient descent to minimize ℰ down to ε. Next, we obtained the late time values of y i 's by solving the time evolution equations (6) with τ i = 20 ms using Euler's method with step time Δt = 0.2 ms for the weights obtained via gradient descent and starting with initial conditions y i (0) = y μi , ∀μ,i. The y μi 's obtained at late times, t ~ 600 ms, this way were compared with y μi to obtain ℇ. Finally, we checked that the weights satisfied the biological bound 34 (A11).
The solutions were obtained using a gradient descent learning rate of 0.02. We varied the norm of the target response vector, y = yy, systematically by Δy = 0.01 in a manner similar to the low-dimensional simulations.
In Fig. 8(c), we considered a single input-output configuration for a given value of and , and we found a single solution with ℰ < 0.001 P using a gradient descent learning rate of 0.005. Cartoon of theoretical framework. (a) We first specify some steady-state responses of a recurrent threshold-linear neural network receiving feedforward input. (b) We then find all synaptic weight matrices that have fixed points at the specified responses. Red (blue) matrix elements are positive (negative) synaptic weights. (c) When a weight is consistently positive (or consistently negative) across all possibilities, then the model needs a nonzero synaptic connection to generate the responses. We therefore make the experimental prediction that this synapse must exist. We also predict whether the synapse is excitatory or inhibitory.

FIG. 2.
An illustrative two-dimensional problem. (a) Cartoon depicting two stimulus response patterns in a simple feedforward network with two input neurons and one driven neuron. (b) Since the driven neuron in panel (a) responds in one condition but not the other, we have one constrained dimension (magenta axis) and one semiconstrained dimension (green axis). The yellow ray depicts the space of weights, (w 1 , w 2 ), that generate the stimulus transformation.
The weight vector 1 2 , 1 2 (brown dot) would uniquely generate the neural responses in a linear network. We assume that the magnitude of the weight vector is bounded by W, such that all candidate weight vectors lie within a circle of that radius. A nonzero synapse x 2 → y exists in all solutions, but the x 1 → y synapse can be zero because the yellow ray intersects the w 1 = 0 axis.

FIG. 3.
Finding network structure that implements functional responses. (a) Cartoon depicting a recurrent network of driven neurons (blue) receiving feedforward input from a population of input neurons (orange). (b) The μth pattern of input neuron activity (x μm ) appears at t = 0 and drives the recurrent neurons to approach the steady-state response pattern (y μi ) via feedforward and recurrent network connectivity (w im ). (c) (Left) We focus on one driven neuron at a time, referred to henceforth as the target neuron, to determine its possible incoming synaptic weights, w m . (Right) These weights must reproduce the target neuron's steady-state responses from the steady-state activity patterns of all presynaptic neurons. (d) The yellow planes depict the subspace of incoming weights that can exactly reproduce all nonzero responses of the target neuron, and the subregion shaded dark yellow indicates weights that also reproduce the target neuron's zero responses. The top graph depicts the weight space parametrized by physically meaningful w coordinates, but the solution space is more simply parametrized by abstract η coordinates (bottom). The η coordinates depend on the specified stimulus transformation (x μm → y μi ), and η c , η s , and η u are coordinates in -dimensional constrained, -dimensional semiconstrained, and -dimensional unconstrained subspaces, respectively.

FIG. 4.
Geometric quantities determining whether neurons must be synaptically connected in several three-dimensional toy problems. (a) Cartoon depicting the = 3 feedforward network corresponding to the toy problems. (b), (c) Geometrically determining whether a synapse is nonzero when the target neuron responds to one input pattern but does not to two other patterns. A synapse can only vanish if the w 1 = 0 plane (orange circle) intersects the solution space (dark yellow wedge) within the weight bounds (bounding sphere). For example, this intersection occurs in panel (b), so the synapse is not required for the responses. For every synapse one can associate a direction in synaptic weight space (orange arrow) that is normal to the planes with constant synaptic weight. This synapse vector can be decomposed into its projections into the semiconstrained subspace (green arrow, s ) and along the constrained dimension (pink arrow, c ). In this example, whether the synapse is certain is determined by the size of the bounding synapse space, W [see panel (b)], the angle θ between the synapse direction (orange arrow) and the closest axis of the constrained dimension − ε 3 [see panel (b)], and the angle γ between s and its closest vector in the solution space s * [see panel (c)]. In panel (c), d s depicts the perpendicular distance from the origin of the yellow semiconstrained plane in panel (b) to its intersection line with the w 1 = 0 orange plane. If this distance is sufficiently large, then the orange line will not intersect the target neuron responds to two input patterns but not the third pattern. In panel (d), the orange w 1 = 0 plane intersects the solution space (deep solution space within the yellow plane's circular bound of radius W . (d), (e) Geometrically determining whether a synapse is nonzero when the yellow line) within the bounding sphere, so the synapse is not certain. In this example, the factors that determine synapse certainty are W [see panel (d)], the angle θ that the synapse vector (orange arrow) makes with its projection along the constrained subspace (pink arrow) [see panel (d)], and the angle α between the target response vector (brown arrow) and the pink arrow [see panel (e)]. The angle β does not ultimately matter, but it is included in the diagrams to aid the derivation. Here d s is the distance from the brown dot to the point of intersection between the yellow line and the orange plane. Again this point will lie outside the bounding sphere if d s is large enough, and this signals a certain synapse. (f) Geometrically determining whether a synapse is nonzero when the target neuron responds to one input pattern but does not to a second pattern. In the figure shown, the w 1 = 0 orange plane intersects the solution space (deep yellow semicircle) within the bounding sphere, so the synapse is not certain. In this example, apart from W, what determines synapse certainty are the angles θ and φ, which encode how the synapse vector (orange arrow) can be decomposed into its projections along the constrained direction (pink arrow), semiconstrained direction (green arrow) and unconstrained direction (purple arrow).

FIG. 5.
Identifying certain synapses in high-dimensional networks. (a) Cartoon depicting the high-dimensional feedforward network under consideration. (b) Geometrically determining whether a synapse is nonzero throughout a high-dimensional solution space. A synapse can only vanish if the w m = 0 hyperplane (orange circle) intersects the solution space (dark yellow wedge) within the weight bounds (bounding sphere). In the example shown, this intersection does not occur, so the synapse must be present. For orthonormal neural responses, only a few parameters determine whether this intersection occurs (Appendix A). First, the magnitude of the weight bound, W, controls the extent of the solution space.
Second, there are three projections of the synapse direction (orange arrow) whose lengths are important determinants of the certainty condition: e y , the length of projection along the target response vector (pink arrow); e s* , the length of projection along the closest boundary vector in the semiconstrained solution subspace [green arrow, see also s * in Fig. 4(c)]; and e u , the length of projection into the unconstrained subspace (purple arrow). Note that the shown example would have had an intersection if the solution space (dark yellow wedge) were moved down (along c) to lie below the hyperplane (orange circle). The solution space's height is proportional to the magnitude of the postsynaptic responses, y. Thus, the solution space does not intersect the hyperplane only if y exceeds a critical value, y cr . (c) Plots of the certainty condition, Eq. (57), for W = 1. The red, blue, and purple curves plot y cr as a function of r y = e y /e p for e p = 0.3, 0.6, and 0.9, respectively. Different purple shades correspond to different values of r s * = e s * / e p 2 − e y 2 . As this ratio increases, nonlinear effects increase y cr and make the sign harder to determine. The red and blue curves are for the Testing the certainty condition with exhaustive low-dimensional simulations. (a) A simple recurrent network with three input neurons and three driven neurons (Appendix E). (b) We plot the theoretically derived y cr for feedforward synapses to y 1 as we vary θ (green curves) or φ (magenta curves), keeping the other angle fixed at 45°. The lighter shades correspond to cosγ = 1 ⇒ r s* = 1. The darker shades correspond to cosγ = 0 ⇒ r s* = 0, where the predictions from the nonlinear network match those of a linear network. The dots represent y cr estimated through simulations, and they agree well with the theory. (c) (Bottom) Bar graph of the fraction of solutions with positive (red) and negative (blue) self-couplings (y 3 → y 3 ) as a function of θ. (Top) As predicted, all solutions have positive w y3,y3 when y − y cr > 0. The theory accounting for error explains numerical ensembles of feedforward and recurrent networks. (a) Cartoon of a recurrent neural network. We disallow recurrent connectivity of neurons onto themselves throughout this figure. = 1 corresponds to the feedforward case, and W = 1 for all panels. (b) Comparison of numerical and theoretical y-critical values for 102 random configurations of input-output activity (Appendix F). We considered a feedforward network with ℐ = 6, = 5, = 2. For each configuration and postsynaptic activity level y, we used gradient descent learning to numerically find many solutions to the problem with ℇ ≈ 0.1. The black dots correspond to the maximal value of y in our simulations that resulted in an inconsistent sign for the synaptic weight under consideration. The continuous curves show theoretical values for y-critical that upper bound the true y-critical (y cr,max , black), that neglect topological transitions in the error surface (yellow), or that neglect the threshold nonlinearity (cyan). Only the black curve successfully upper bounded the numerical points. Configurations were sorted by the y cr,max value predicted by the black curve. (c) The number of certain synapses increased with the total number of synapses in feedforward networks. Purple and brown correspond to = 2 = 4 and = = 4 , respectively. The solid lines plot the predicted number of certain synapses. The circles represent the number of correctly predicted synapse signs in the simulations. The dashed brown and purple lines are best-fit linear curves with slopes 0.16(±0.01) and 0.07(±0.01) at 95% confidence level, significantly less than the zero error theoretical estimates of 0.28 and 0.18 (Appendix B). (d), (e) Testing the theory in a recurrent neural network with = 10, ℐ = 7, = 4, = 8, and = 3. Each dot shows a model found with gradient descent learning. (d) x and y axes show two η coordinates predicted to be constrained and semiconstrained, respectively, and the color axis shows the model's root-mean-square error over neurons, ℰ/ D. Although our theory for error surfaces is approximate for recurrent networks, the solution space was well explained by the constrained and semiconstrained dimensions. Note that the numerical solutions tend to have constrained coordinates smaller than the theoretical value (vertical line) because the learning procedure is initialized with small weights and stops at nonzero error. (e) The x axis shows the model's error, and the y axis shows the number of synapse signs correctly predicted by the nonlinear theory (yellow dots or red crosses) or linear theory (cyan dots or blue crosses). Dots denote models for which every model prediction was accurate, and crosses denote models for which some predictions failed.

FIG. 9.
Cartoons depicting the orientation of the semiconstrained projection of a given synaptic weight direction s′ within the semiconstrained subspace and its impact on determining the sign of the given weight. In these plots, the yellow wedges represent the solution space, η 1 , η 2 ⩽ 0. d s is the distance of the w m = 0 orange line (hyperplane in higher dimension) from the origin. If d s is small, as in the left plot (a), then the projection angle γ is smaller than φ, half of the angle subtended by the orange line to the origin, and therefore the orange line and the yellow cone intersect. This means that solutions with both positive and negative w's are present. In the right plot (b), d s is sufficiently large such that γ > φ and consequently, all the solutions must have consistent sign.

FIG. 10.
Dependence of y-critical on various parameters for nonzero errors. (a) The red, blue, and purple curves track y-critical as a function of e y /e p for e p = 0.5, 0.7, and 0.9, respectively.
The dotted, dashed and bold curves represent the lower bound, leading-order and upper bound y-critical curves for a fixed error, ε = 0.1W. The darker shade correspond to the most nonlinear case when e s * / e p 2 − e y 2 = 1, while the lighter shade correspond to e s * = 0. These latter curves are also the ones that one obtains in a linear theory. Clearly, the difference between the linear and nonlinear theory increases as e p increases. In all these cases y-critical decreases with increase of e p , and for a given e p , as e y /e p increases. Also, as e s * increases and the semiconstrained dimensions become more important, it becomes harder to constrain the synapse sign, and therefore y-critical increases. (b) The green, brown, and orange curves again track y-critical, but this time as a function of ε, for networks with = 3,9, and 27 input neurons, respectively. The dotted, dashed and bold curves plot the lower bound, leading-order and upper bound on y-critical for typical values of e p , e y , and e s * that one expects in these networks (B1). We see that these curves come closer together as the network size increases. The dot-dashed curves correspond to the linear theory (e s * = 0), which remains clearly separated from the nonlinear curves. In each of these networks, / = 2/3 and / = 1/2.

FIG. 11.
Testing bounds on y-critical for solutions with error. We show the same 102 random configurations of input-output activity as Fig. 8(b). The bold black, green, and gray curves represent the upper bound y cr,max , approximate y cr,appr , and lower bound y-critical y cr,min , values, respectively. The black dots correspond to the maximum value of y in our simulations that resulted in mixed signs for the synaptic weights under consideration.