Statistical physics of neural systems with non-additive dendritic coupling

How neurons process their inputs crucially determines the dynamics of biological and artificial neural networks. In such neural and neural-like systems, synaptic input is typically considered to be merely transmitted linearly or sublinearly by the dendritic compartments. Yet, single-neuron experiments report pronounced supralinear dendritic summation of sufficiently synchronous and spatially close-by inputs. Here, we provide a statistical physics approach to study the impact of such non-additive dendritic processing on single neuron responses and the performance of associative memory tasks in artificial neural networks. First, we compute the effect of random input to a neuron incorporating nonlinear dendrites. This approach is independent of the details of the neuronal dynamics. Second, we use those results to study the impact of dendritic nonlinearities on the network dynamics in a paradigmatic model for associative memory, both numerically and analytically. We find that dendritic nonlinearities maintain network convergence and increase the robustness of memory performance against noise. Interestingly, an intermediate number of dendritic branches is optimal for memory functionality.

Information processing in artificial and biological neural networks crucially depends on the processing of inputs in single neurons (e.g. [31]). The dendrites, i.e., the branched protrusions of a biological nerve cell, or the input preprocessing stage of formal neurons, constitute the main input sites. Traditionally, dendrites are modeled as passive, cable-like conductors which integrate incoming presynaptic signals linearly or sublinearly and propagate the change in voltage to the cell body or soma, where it is subject to nonlinear transformations [59]. Accordingly, the input preprocessing in formal neurons is usually assumed to be a linear or sublinear summation.
Single-neuron experiments, however, demonstrate the occurrence of strongly supralinear dendritic amplification. Biophysically, this is caused by action potentials generated in the dendrite of the neuron. Such dendritic spikes are mediated by voltage-dependent ion channels such as sodium, calcium, and NMDA channels [4,13,32,45,49]. In particular, dendritic spikes may emerge if sufficiently synchronous inputs are received by the same branch of a dendrite. The many inputs to the dendrites can thus be processed non-additively, depending on their spatial and temporal distribution [46,49]. This implies crucial deviations from the classical assumptions on linear dendritic input processing as modeled, e.g., by cable equations. It has been recently shown that dendritic spikes are present and prominent all over the brain (e.g. [35]).
A number of theoretical studies have highlighted the importance of nonlinear, spiking dendrites for the input processing in single neurons: Simulations of neuron models with detailed channel density and morphology showed dendritic spike generation in agreement with neurobiological experiments [4,13,45-47]. Further, firing rate models have been developed [38] which reproduce the response properties of detailed models to diverse stimuli and behave like multi-layered feed-forward networks of simple rate neurons [46,47,49]. Two- and multi-layer feed-forward networks of binary, deterministic neurons have been studied using statistical physics methods [6,7,22]. In particular, the so-called committee machine may be seen as a neuron model incorporating a layer of dendrites with step-like activation functions, i.e., without analog signal transmission [12,64]. Neurons in biological networks receive time-dependent, noisy input at high rates, which often makes a statistical description of the response properties of single neurons necessary. In Ref. [63], the authors derived such a description for linear and quadratic dendritic summation, together with some numerical results for a biologically plausible, sigmoidal dendritic nonlinearity. The propagation of dendritic spikes in branched dendrites with step-like activation functions has been studied in Ref. [17], providing the somatic input as a numerical solution to a high-dimensional system of nonlinear equations. To date there is no efficient statistical description for neurons with biologically plausible, sigmoidal dendritic nonlinearities. In biological systems, neurons form complex, recurrent networks. Thus, a description which allows one to analytically study networks of neurons with multiple nonlinear dendrites is especially desirable.
Recent single neuron experiments investigated the role of active dendrites in detecting specific spatio-temporal input patterns [13,37,54]. Theoretical studies showed that nonlinear dendrites improve the ability of single neurons and ensembles of single neurons to discriminate and learn different input patterns [39,48,51,52]. Besides trivially multiplying the single neuron abilities to detect input patterns, a network of neurons can store, retrieve and complete spatio-temporal patterns aided by its recurrent dynamics: It can function as an associative memory device [2,22,24]. Yet, the impact of nonlinear dendrites on associative memory networks is unknown.
So far, only a few studies have considered the impact of non-additive dendrites on network dynamics. Selectivity and invariance of network responses to external stimuli and their intensity were analyzed in a firing rate model [42,43]. Refs. [34,42,66] proposed that NMDA-receptor dependent dendritic nonlinearities play a crucial role in working memory, i.e., in the formation of persistent activity in unstructured networks. Nonlinear, multiplicative dendritic processing arising from spatial summation of input across the dendritic arbor was similarly shown to enable spontaneous and persistent network activity [67]. Dendritic spikes were suggested to work as coincidence detectors and provide a neuronal basis for temporal and spatial context in biological networks [29,60]. Refs. [36,61] studied networks of bursting neurons, where the bursts facilitate the emergence of patterns of coordinated neuronal activity and can be explained by dendritic spikes. Further, it was shown that nonlinear dendrites can enable robust propagation of synchronous spiking in random networks with biologically plausible substructures [27] and in purely random networks [41]. Finally, dendritic spikes were related to so-called sharp-wave ripples in the hippocampus, which are important for long-term memory consolidation [40].
Networks of binary neurons with linear input summation have been intensively investigated in statistical physics ("Hopfield networks", [2,22,24]), and extensions to different nonlinear and non-monotonic transfer functions exist (cf. e.g. [25,26,44,50,56]). All of these studies assumed point neurons; neural networks of arborized neurons with non-additive coupling have not been studied in comparable setups. Hopfield networks are paradigmatic models for associative memory which may in particular contribute to solving two important conundrums in neuroscience: how biological neural networks achieve a high memory capacity and how they can work so reliably under the experimentally found noisy conditions. The incorporation of non-additive dendrites into these models may therefore (1) shed light on the impact of these features on memory capacity and robustness and, at the same time, (2) allow us to understand the underlying mechanisms due to their analytical tractability.
In the first part of this article, we describe the response properties of single neurons in the presence of biologically plausible dendritic nonlinearities within a statistical framework. In the second part, we employ these results to study the effect of nonlinear dendrites on associative memory networks. We consider networks of the Hopfield type, as this standard model for associative memory lends itself to analytical treatment and allows us to concisely work out the effects of nonlinear dendritic enhancement. We find that dendritic nonlinearities improve pattern retrieval by effectively reducing the thresholds of neurons and by increasing the robustness to noise. The improvement is strongest for intermediate numbers of dendritic branches. We quantify these effects and illustrate our analytical findings with numerical simulations.

A. Basic model for non-additive processing in dendrites
Consider an extended topological structure of a neuron consisting of one point-like soma and B independent dendritic compartments (Fig. 1). Each compartment receives its inputs from a number of presynaptic neurons and transfers its output to the soma. We assume that nonlinear dendritic integration takes place over a time window ∆t. Our model can be applied to dendrites with fast or slow dendritic spikes, where ∆t assumes values of 2-3 ms (fast sodium spikes [4,13]) or tens of ms (slow, e.g., NMDA spikes [49,53]). The durations of the different dendritic spikes have timescales similar to their integration windows. The input arriving at a branch within ∆t is denoted by u. On each branch, we capture the non-additive input summation of the dendrites through a piecewise linear, sigmoidal transfer function

g(u) = u for u < θ, and g(u) = D for u ≥ θ. (1)

If u is smaller than the threshold θ, i.e., u < θ, the inputs superpose linearly. Biologically, this means that u has not reached the threshold θ for dendritic spike generation and that it is conventionally transferred to the soma. If the threshold is exceeded, i.e., u ≥ θ, the inputs superpose non-additively and a fixed dendritic output strength D is attained. This models the effect of a dendritic spike elicited by sufficiently strong input. The summation scheme as well as the compartmentalization are in agreement with experimental findings and modeling studies [4,13,40,43,46,49]. Our approaches may be directly extended to neurons with multiple stages of dendritic processing (cf. App. A3).

FIG. 1. (Caption) A point neuron compared to a neuron with dendrites (or dendritic branches) modeled as separate compartments (B). The red boxes mark the neuronal unit for each case. Circles represent linear summation of inputs (C); triangles and squares represent somatic and dendritic processing, respectively. In a point neuron, all inputs are summed up linearly and then processed nonlinearly. In a model with nonlinear dendrites there is an additional, preceding layer where inputs to each dendrite are summed up linearly and then subjected to a dendritic nonlinearity. This nonlinearity is modeled as a piecewise linear function with threshold θ and saturation strength D, incorporating the effect of dendritic spikes (D). The somatic transfer function is not constrained in the model.
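The branch nonlinearity and the linear somatic summation described above can be sketched in a few lines of code (a minimal illustration; function names and example values are ours, not part of the model definition):

```python
# Piecewise linear dendritic nonlinearity (Eq. (1)): below the dendritic
# threshold theta the branch input u is transmitted unchanged, above it a
# fixed dendritic spike strength D is attained.

def dendritic_transfer(u, theta, D):
    """Branch transfer function g(u) of Eq. (1)."""
    return u if u < theta else D

def somatic_input(branch_inputs, theta, D):
    """Linear somatic summation of the B branch outputs (cf. Eq. (6))."""
    return sum(dendritic_transfer(u, theta, D) for u in branch_inputs)

# Example: two subthreshold branches pass their input unchanged, while a
# superthreshold branch is clipped to the dendritic spike strength D.
F = somatic_input([0.2, 0.4, 1.5], theta=1.0, D=3.0)  # 0.2 + 0.4 + 3.0 = 3.6
```

For instance, with θ = 1 and D = 3, branch inputs (0.2, 0.4, 1.5) yield a somatic input of 3.6: the superthreshold branch contributes the fixed strength D instead of its raw input.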

B. Capturing dendritic spikes by an effective somatic input strength
To quantify the impact of non-additive dendritic events on the neuronal input processing, the temporal and spatial distribution of synaptic input must be taken into account. We consider a neuron with B dendritic branches b ∈ {1, . . . , B} and some time interval of the length of the dendritic integration window. x b denotes the number of synapses on branch b that are active within this window. The numbers of active synapses are distributed according to P (x 1 , . . . , x B ). Furthermore, we allow for distributed connection strengths by assigning the synapses weights w which are independently and identically distributed according to P (w). Averages with respect to P (x 1 , . . . , x B ) and P (w) can be interpreted as ensemble averages or temporal averages. The ensemble average is taken over a large number of neurons at a fixed time where each neuron has numbers x b of active inputs and weights w which are samples of P (x 1 , . . . , x B ) and P (w), respectively. Under the additional assumption of a large number of synaptic contacts on each branch, the averages may also be understood as time averages which are taken at a fixed neuron over a suitably segmented long time interval in which the active inputs are changing. In this article, we follow the first interpretation of ensemble averages.
What is the effective input to the soma given that synaptic inputs are distributed across branches? We assume that each branch samples a volume in which synapses of axons from S other (presynaptic) neurons can be synaptically contacted [1]. A synapse is present and active with probability p_b, such that P(x_1, . . . , x_B) factors into independent binomial distributions with parameters S and p_b. Alternatively, the total number of active synaptic terminals across branches might be fixed to S (e.g. due to homeostatic learning), which suggests a multinomial distribution for P(x_1, . . . , x_B). Then, the x_b on different branches are not independent but negatively correlated, with covariance Cov[x_b, x_c] = −S p_b p_c for b ≠ c. The input to branch b is given by the linear sum

u_b = Σ_{i=1}^{x_b} w_i, (2)

and we are interested in the distribution P(u_1, . . . , u_B) of input across branches. According to Wald's equation [65], the Blackwell-Girshick equation [8], and the conditional covariance formula [55], we have

E[u_b] = E[x_b] E[w], (3)

Var[u_b] = E[x_b] Var[w] + Var[x_b] E[w]^2, (4)

and, for b ≠ c,

Cov[u_b, u_c] = Cov[x_b, x_c] E[w]^2, (5)

where the weights contribute no extra covariance term due to the independence of the synaptic weights w. P(u_1, . . . , u_B) is in general a complicated distribution (App. A2). Since its first moments are known (Eqs. (3)-(5)), we approximate P(u_1, . . . , u_B) by a multivariate normal distribution, i.e., by the maximum entropy distribution for the given moments. This enables us to derive an effective input to the soma that depends only on the number B of branches, the probabilities p_b, S, and the moments E[w] and Var[w].

For only linear branches, i.e. u_b < θ on all branches, the input to the soma is simply given by the linear sum Σ_{b=1}^B u_b. For u_b ≥ θ, the dendritic nonlinearity sets in and branch b provides input of strength D to the soma. Since somatic preprocessing is linear, the total input to the soma is

F = Σ_{b=1}^B g(u_b), (6)

with the dendritic transfer function g of Eq. (1). The evaluation of the sum is numerically simple but denies analytical treatment. Yet, for Gaussian P(u_1, . . . , u_B), we may compute its mean (App. A1) via

E[F] = Σ_{b=1}^B ( D P(u_b ≥ θ) + E[u_b ; u_b < θ] ) = B ( D P_NL + C_NL ), (7)

where we exploited that the marginal distribution P(u) of the multivariate normal distribution P(u_1, . . . , u_B) is a normal distribution again and defined

P_NL := ∫_θ^∞ P(u) du, (8)

C_NL := ∫_{−∞}^θ u P(u) du. (9)

In the second line of Eq. (7) we assumed p_b = p_0, where p_0 is a constant probability independent of b. Throughout the rest of the article we follow this choice for simplicity, although many results are independent of p_b or may be easily generalized to arbitrary p_b. Eq. (7) may be interpreted as follows: each branch contributes the saturated strength D with the probability P_NL of a dendritic spike, and the linearly transmitted subthreshold input C_NL otherwise.

The variance of F follows analogously,

Var[F] = Σ_{b=1}^B Var[g(u_b)] + Σ_{b≠c} ( ∫∫ g(u) g(v) P(u, v) du dv − E[g(u_b)] E[g(u_c)] ), (10)

where P(u, v) is the marginal distribution of P(u_1, . . . , u_B) in two variables. For binomially distributed synapses, P(u, v) = P(u) P(v) factors into two independent normal distributions, so that the covariance terms vanish, which yields

Var[F] = B ( D^2 P_NL + ∫_{−∞}^θ u^2 P(u) du − (D P_NL + C_NL)^2 ). (11)

For multinomially distributed synapses, the double integral in Eq. (10) needs to be evaluated numerically (see App. A1 and Fig. 2).
The mean E [F ] and its variance Var [F ] (as well as the expected number E [k] of nonlinear branches and its variance Var [k], see below) may also be computed without the Gaussian approximation employing the exact expression for P (u 1 , . . . , u B ) (App. A2).
We note that our approximation can be employed to compute the input statistics to neurons with several layers of non-additive dendritic branches by iteratively applying the formulas for E [F ] and Var [F ] (Eqs. (7)-(10)) to each branching point and using the result as a new input to the next layer (cf. App. A3 for a derivation).
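For independent (binomially distributed) branches, the Gaussian approximation of the somatic input moments can be sketched as follows. This is our own minimal implementation, using closed-form Gaussian integrals for the truncated moments; the parameter values in the example are illustrative only:

```python
import math

def _phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _Phi(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def somatic_moments(B, S, p0, E_w, Var_w, theta, D):
    """Mean and variance of the somatic input F in the Gaussian
    approximation, for binomially distributed (independent) branches."""
    # Branch-input moments via Wald / Blackwell-Girshick (Eqs. (3), (4)).
    E_x, Var_x = S * p0, S * p0 * (1.0 - p0)
    mu = E_x * E_w
    var = E_x * Var_w + Var_x * E_w ** 2
    sigma = math.sqrt(var)
    z = (theta - mu) / sigma
    # P_NL: probability that a branch input exceeds theta (Eq. (8));
    # C_NL: linear contribution E[u; u < theta] of a subthreshold branch (Eq. (9)).
    P_NL = 1.0 - _Phi(z)
    C_NL = mu * _Phi(z) - sigma * _phi(z)
    E_F = B * (D * P_NL + C_NL)
    # Truncated second moment E[u^2; u < theta] of a normal for the variance.
    E_u2_sub = (mu ** 2 + var) * _Phi(z) - sigma * (theta + mu) * _phi(z)
    Var_F = B * (D ** 2 * P_NL + E_u2_sub - (D * P_NL + C_NL) ** 2)
    return E_F, Var_F
```

In the limits θ → ±∞ the formulas reduce to the purely linear case, E[F] = B E[u], and the fully saturated case, E[F] = BD, which provides a quick sanity check of an implementation.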

C. Features of the somatic input
We compare the input F to the soma obtained from numerical simulations (Eq. (6) and Fig. 2, circles) with the Gaussian approximation (Eqs. (7) and (10); Fig. 2, lines) and with the exact expression (App. A2). The mean E[F] (Eq. (7)) has two contributions: the first (from nonlinearly enhanced inputs) is D E[k], the second (from linearly summed inputs) is monotonically increasing in B. Since D is comparably large, the maximum of E[k] induces a maximum in E[F]. The latter is shifted to the right due to the monotonic increase of the linear contribution. The shift indicates that a few additional branches may further increase E[F], because there synapses can provide input which would otherwise be lost on saturated, nonlinear branches. Because a further increase in the number of branches, however, leads to a substantial loss of (non-additive) input, the maximum of E[F] occurs at an intermediate number of branches.

Up to now we considered combined processing of inhibition and excitation on the dendritic branches. Often, inhibitory synapses are found to directly target the soma [19,30]. Such input can be readily incorporated in our model by including an extra term in Eq. (7),

E[F] = B ( D P_NL + C_NL ) + E[u_S], (12)

where P_NL and C_NL (Eqs. (8) and (9)) are computed using the dendritic input u = u_D with mean E[u_D], the average input to a dendrite, and E[u_S] is the mean direct somatic input. Both summation scenarios may lead to different collective dynamics on the network level (see below).

Concluding, we modeled the somatic input of a neuron with non-additive dendrites. Our findings are independent of a specific neuron model. We introduced a Gaussian approximation to describe the input irrespective of the particular distribution of active synapses across branches (Eqs. (7) and (10)). It provides a sufficiently good description (cf. Fig. 2), simplifies calculations, and is therefore used in the remainder of this article.

D. Deterministic Hopfield networks of arborized neurons
How do nonlinear dendrites influence the dynamics of associative neural networks? Because of its analytical accessibility and its relevance in neural computation, we consider a Hopfield network [24] of N neurons n ∈ {1, . . . , N} with discrete states v_n ∈ {−1, +1} and asynchronous updates of one random unit at a discrete time t ∈ N. We might interpret t as being measured in units of N^{−1} ∆t, so that on average each neuron is updated once per ∆t (≈ dendritic spike duration) and samples states which are present in other neurons for ∆t (≈ dendritic integration window). The update rule for the conventional deterministic Hopfield model reads

v_n(t + 1) = sign(u_n(t) − Θ), (13)

where n is the neuron updated at t; sign(x), with sign(x) = −1 if x < 0 and sign(x) = 1 otherwise, is the neuronal transfer function; and Θ denotes the neuronal threshold. u_n is the linear field

u_n = Σ_{m=1}^N w_{n,m} v_m. (14)

FIG. 3. (Caption) The effective somatic input (Eq. (23)) for dendritic nonlinearities with strengths D = 4 (solid orange) and D = 6 (solid red) leads to effective thresholds ϑ ≈ 2.5 (dashed orange) and ϑ ≈ 1.9 (dashed red), respectively.
The couplings w_{n,m} between neurons n and m ∈ {1, . . . , N} are assumed to be symmetric, w_{n,m} = w_{m,n}. Then, a Lyapunov function may be derived, satisfying E_L(t + 1) ≤ E_L(t) [22,24]. Equality E_L(t + 1) = E_L(t) only occurs if the state of the network upon update is not changed or in the rare case when u_n matches exactly the threshold Θ. Since E_L is bounded and u_n = Θ implies v_n(t + 1) = 1, the weak Lyapunov property guarantees convergence of the system. Thus, the network converges to an asymptotically stable minimum in the energy landscape

E_L = −(1/2) Σ_{n≠m} w_{n,m} v_n v_m + Θ Σ_{n=1}^N v_n. (15)

To store P patterns ξ^p, p ∈ {1, . . . , P}, the couplings in the Hopfield model are set in a Hebbian manner [21],

w_{n,m} = N^{−1} Σ_{p=1}^P ξ^p_n ξ^p_m. (16)

Classically, the storage of random, uncorrelated patterns ξ^p_n ∈ {−1, +1} is studied, where ξ^p_n = ±1 with equal probabilities. Self-coupling terms w_{n,n} may lead to spurious states close to stored patterns and are usually omitted, w_{n,n} = 0 [22]. In this article we adopt these conventions.
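The weak Lyapunov property of the update rule (Eq. (13)) with Hebbian couplings (Eq. (16)) is easy to verify numerically. The following toy script (our own sketch; network size and seed are arbitrary) asserts at every asynchronous update that the energy of Eq. (15) never increases:

```python
import random

def hopfield_demo(N=50, P=3, Theta=0.0, steps=2000, seed=0):
    """Toy check of the weak Lyapunov property: asynchronous sign updates
    (Eq. (13)) with Hebbian couplings (Eq. (16)) never increase E_L (Eq. (15))."""
    rng = random.Random(seed)
    xi = [[rng.choice([-1, 1]) for _ in range(N)] for _ in range(P)]
    # Hebbian couplings w_{n,m} = (1/N) sum_p xi^p_n xi^p_m, self-couplings omitted.
    w = [[0.0 if n == m else sum(xi[p][n] * xi[p][m] for p in range(P)) / N
          for m in range(N)] for n in range(N)]
    v = [rng.choice([-1, 1]) for _ in range(N)]

    def energy():  # E_L = -(1/2) sum_{n != m} w_{n,m} v_n v_m + Theta * sum_n v_n
        return (-0.5 * sum(w[n][m] * v[n] * v[m]
                           for n in range(N) for m in range(N))
                + Theta * sum(v))

    E0 = E = energy()
    for _ in range(steps):
        n = rng.randrange(N)
        u_n = sum(w[n][m] * v[m] for m in range(N))
        v[n] = -1 if u_n - Theta < 0 else 1   # sign convention of Eq. (13)
        E_new = energy()
        assert E_new <= E + 1e-9              # weak Lyapunov property
        E = E_new
    return E0, E
```

Since each flip changes the energy by (v'_n − v_n)(Θ − u_n) ≤ 0 under the sign rule, the assertion holds for any symmetric coupling matrix with vanishing self-couplings.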
An alternative and similarly common model represents the neuronal states via v_n ∈ {0, 1} (cf. e.g. [62]). For an appropriate choice of variables (cf. [22]) this is equivalent to the {−1, +1}-model, when introducing a dependence of the neuronal threshold Θ on the couplings w_{n,m}. Since we assume constant Hebbian couplings (Eq. (16)), we can include this term into the (then still constant) thresholds and translate a {0, 1}-model into a {−1, +1}-model. Because it is most often used in classical statistical mechanics studies of neural networks (cf. [3]), we adopt the {−1, +1}-representation.

FIG. 4. (Caption) The energy (Eq. (15)) for linear input summation (solid black) decreases and the network converges towards a fixed point. Including weakly nonlinear branches (thick gray, θ = 0.6 and D = 1) does not alter the dynamics or the energy of the modified network (which is now given by Eq. (27)). This is confirmed by the vanishing Hamming distance d = (1/(2N)) Σ_{n=1}^N |v_n − v'_n| (dashed gray) between the two systems. For stronger dendritic spikes (solid red, θ = 0.1 and D = 2), deviations of the somatic input from its ensemble average F̃ and slightly asymmetric couplings lead to occasional increases in the energy (red square), but network convergence is preserved (checked for 1000 runs). The dynamics converge towards an attractor that is different from that of the linear network, as shown by the Hamming distance (dashed red) between the systems. When inhibitory and excitatory inputs are processed separately by the soma and the dendrites, respectively, no energy function is known and we exemplarily choose the energy function given by Eq. (27) with ϑ = Θ. The energy is then non-monotonic (solid orange, θ = 0.1 and D = 2), but the system nonetheless reaches a stationary state (checked in simulations up to t = 2000) which is different from the one of the linear system, cf. the Hamming distance (dashed orange).
We now modify this well-known model to incorporate dendritic branches. For simplicity, we assume that each neuron has B dendritic branches. The arborization changes the network topology (Fig. 1) since neurons are now linked to branches. The coupling matrix becomes an N × B × N "matrix" with entries w_{n,b,m} that characterize the coupling of neuron m to branch b of neuron n. The input to an individual dendrite is given by the dendritic field

u_{n,b} = Σ_{m=1}^N w_{n,b,m} v_m. (17)

The inputs are processed by the dendrites according to Eq. (1) and the somatic input is given by Eq. (6). Taken together, the update rule at time t reads

v_n(t + 1) = sign(G_n(t) − Θ), (18)

where

G_n = Σ_{b=1}^B g(u_{n,b}) (19)

is the total somatic input of neuron n. Like in the classical Hopfield model, we assume a Hebbian rule which strengthens connections w_{n,b,m} between co-active neurons such that their expected value is E[w_{n,b,m}] = B^{−1} w_{n,m}, with w_{n,m} given by Eq. (16). Because the process of adjustment of synaptic weights is subject to fluctuations [18], we further assume that the weights w_{n,b,m} are distributed with variance Var[w_{n,b,m}] = w_{n,m}^2 B^{−2} Var[w]. The width of the distribution is proportional to the mean (with a parameter Var[w]), which avoids excessively large deviations for small weights.
The network is fully connected and because input correlations across branches vanish (App. A4), this setup can be identified with the binomial scenario introduced before, with S = N and p_b = p_0 = 1. Eqs. (3) and (4) yield (App. A4)

E[u_{n,b}] = B^{−1} u_n, (20)

Var[u_{n,b}] = B^{−2} Var[w] Σ_{m=1}^N w_{n,m}^2. (21)

The moments are computed as ensemble averages over an ensemble of neurons with index n at a fixed network state (annealed approximation). The neural identity is preserved as it is specified by the parameters w_{n,m} that determine the expectation value and the variance of the distribution of the weights w_{n,b,m} over which we average. An averaging over x_b is unnecessary, as the network is fully connected and the number of active synapses per branch is therefore fixed. The mean somatic input at neuron n, E[G_n], follows from Eqs. (7)-(9). It depends on v_1, . . . , v_N and n only via the mean input per branch (Eq. (20)), and thus via the linear field u_n (Eq. (14)). We may therefore define an effective input function F̃ = F̃(u_n) as

F̃(u_n) := E[G_n(v_1, . . . , v_N)]. (22)

From Eqs. (7)-(9) we find

F̃(u_n) = B ( D P_NL(u_n) + C_NL(u_n) ), (23)

with

P_NL(u_n) = ∫_θ^∞ N(u; B^{−1} u_n, σ_n^2) du, (24)

C_NL(u_n) = ∫_{−∞}^θ u N(u; B^{−1} u_n, σ_n^2) du, (25)

where N(u; µ, σ^2) denotes a normal density and σ_n^2 := Var[u_{n,b}] (Eq. (21)). Analogously, Eq. (11) shows that the standard deviation of G_n, Std[G_n], is a function of the mean input per branch only. We may therefore define S̃td(u_n) := Std[G_n(v_1, . . . , v_N)] and compute it from Eqs. (11) and (23).
To investigate the convergence properties of the network, we consider its state (v_1(t), . . . , v_N(t)) at time t and the update of a neuron n by Eq. (18). Guided by Fig. 2, we neglect the fluctuations, Std[G_n] ≪ E[G_n], and replace G_n in Eq. (18) by its mean F̃(u_n). This approximation of the response of neuron n by the response of a typical neuron with identity n (as specified by the weights w_{n,m}) simplifies the analysis of the network dynamics. For reasonable parameter choices, the deviations are small and lead to erroneous updates of neuron states very rarely (Fig. 4). Furthermore, we may assume that the effective input function F̃(u_n) is strictly monotonic in u_n and thus invertible (App. A5). The dynamics are then equivalent to the conventional Hopfield network dynamics (Eqs. (13)-(15), see App. A5) with coupling matrix w_{n,m} and an effective threshold ϑ. ϑ is determined by the intersection of the effective somatic input F̃ and the neuronal threshold Θ (Fig. 3),

ϑ := F̃^{−1}(Θ). (26)

In particular, for symmetric weights, w_{n,m} = w_{m,n}, the dynamics has a Lyapunov function (App. A5)

E_NL = −(1/2) Σ_{n≠m} w_{n,m} v_n v_m + ϑ Σ_{n=1}^N v_n. (27)

By construction, E_NL decreases in time and the system converges towards a dynamical fixed point. Thus, the supralinear dendrites effectively reduce the neuronal threshold to ϑ ≤ Θ and leave the convergence properties of the system unchanged.
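Because the effective input function is strictly monotonic, the effective threshold (Eq. (26)) can be obtained by simple bisection. The sketch below uses a smooth toy stand-in for the effective input function (our own qualitative choice, not the model's Eq. (23)) to illustrate that thresholding F̃(u) at Θ is the same as thresholding u at ϑ:

```python
import math

def effective_threshold(F_tilde, Theta, lo=-50.0, hi=50.0, tol=1e-10):
    """Invert a strictly increasing effective input function by bisection,
    yielding the effective threshold with F_tilde(result) = Theta (Eq. (26))."""
    assert F_tilde(lo) < Theta < F_tilde(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F_tilde(mid) < Theta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy, strictly increasing stand-in for the effective somatic input:
# linear transmission plus a smooth dendritic amplification term.
B, D, theta = 4, 2.0, 0.5
def F_toy(u):
    return u + B * D / (1.0 + math.exp(-8.0 * (u / B - theta)))

Theta = 3.0
vartheta = effective_threshold(F_toy, Theta)
# Thresholding F_toy(u) at Theta is now equivalent to thresholding u at
# vartheta, i.e., a conventional Hopfield update with a reduced threshold.
```

Since the amplification term is non-negative, the resulting effective threshold is smaller than Θ, in line with the threshold reduction described in the text.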
To study the convergence of the extended Hopfield model, we generate a network by first drawing random patterns with ξ^p_n = ±1 with equal probabilities and setting the Hebbian synaptic weights w_{n,m} according to Eq. (16). Then, the w_{n,b,m} are drawn from a Gaussian distribution with mean E[w_{n,b,m}] = B^{−1} w_{n,m} and variance Var[w_{n,b,m}] = w_{n,m}^2 B^{−2} Var[w], as explained above. We note that generally Σ_{b=1}^B w_{n,b,m} =: w̄_{n,m} ≠ w̄_{m,n}, so that the neuronal connectivity in the presence of dendrites is not symmetric like in the classical Hopfield model. The energy function in Eq. (27) was derived for symmetric weights w̄_{n,m} = w_{n,m} = w_{m,n} = w̄_{m,n}, which may be seen as an approximation valid at least for slightly asymmetric couplings. Fig. 4 shows that this approximate energy function correctly reflects the convergence of the network. Also stronger deviations from the symmetric scenario (quantified by Var[w]) leave the findings largely unchanged (App. A6).
The results of numerical simulations displayed in Fig. 4 illustrate the convergence properties of our model. If the threshold reduction by dendrites is small, subthreshold inputs to a neuron remain subthreshold also in the presence of nonlinear dendrites and the network converges to the same state as for linear branches (Fig. 4, gray). However, if the effective threshold reduction is stronger, inputs that are subthreshold may become superthreshold due to the dendritic nonlinearity and the same initial conditions tend to converge towards different attractors (Fig. 4, red). Since deviations of the somatic input from its ensemble averageF may violate our approximation and the network is slightly asymmetric, the energy function can occasionally increase (Fig. 4, red square). However, for the considered parameters, these events occur rarely and do not affect the long-term convergence of the system (checked for 1000 runs, not shown).
If inhibitory synapses project directly onto the soma instead of being mingled with excitatory synapses on the dendrites, the effective somatic input F̃ is given by Eq. (12). Since the dendritic saturation by excitatory input may be exceeded by linear inhibition, F̃ is non-monotonic and no Lyapunov function is apparent. The energy of the system as given by Eq. (27) with, e.g., ϑ = Θ does not decrease monotonically in time (Fig. 4, orange). However, numerical simulations suggest that the network nonetheless reaches a stable fixed point, as exemplarily shown in Fig. 4. Such network convergence despite non-monotonic transfer functions is known from other systems [23,26,44]. These studies do not split excitation and inhibition but choose transfer functions non-monotonic in the total, linear input u_n.

E. Capacity of stochastic Hopfield networks with non-additive dendritic input processing
We now assess the extent to which a network of binary neurons with nonlinear dendrites is capable of storing and retrieving specific patterns. Since biological neurons are noisy, i.e. their input-output relation is not fully reliable (e.g. [57]), we generalize the above deterministic dynamics to allow for stochasticity. For the analysis of the storage capacity of the extended Hopfield network, we exploit the analogy between spin glasses and neural networks and employ statistical physics methods [2,22].
As a generalization of the deterministic update rule (Eq. (18)), we use the common Glauber dynamics [15,22,56] with asynchronous updates, according to which the state of a randomly chosen neuron n is set to v_n = ±1 with probability

p_n(v_n = ±1) = (1 + exp(∓2β [G_n − Θ]))^{−1}. (28)

This is equivalent to flipping the state of the respective neuron with probability p_n(v_n → −v_n) = (1 + exp(2β [G_n − Θ] v_n))^{−1}. Here, T := β^{−1} is the pseudo-temperature, a measure for the noise in the system, and G_n (Eq. (19)) is the input to the neuron. We recall that P is the number of stored patterns and the fraction α = P N^{−1} is called the load parameter. In mean-field theory, we obtain the temporally averaged state of unit n in the ensemble-averaged, stochastic network by replacing G_n by the ensemble average F̃(u_n) (Eq. (23)). Further, we replace the fluctuating argument of F̃(u_n) by its average, which yields F̃(⟨u_n⟩); the resulting coupled mean-field equations are given in App. A7.

FIG. 5. (Caption) (A) Overlap m versus temperature T. The overlap decreases with increasing T and reaches zero at the critical temperature T_c, above which retrieval fails. This phase transition is discontinuous for the nonlinear and continuous for the linear model of input processing. Dendritic nonlinearities increase T_c. (B) Overlap m versus load α at zero temperature, T = 0. The overlap decreases with increasing α and displays a discontinuous jump to zero at the critical storage capacity α_c. α_c increases with stronger dendritic nonlinearities up to a level at which the effective threshold ϑ vanishes and the linear scenario with Θ = 0 is approached (dash-dotted black).

FIG. 6. (Caption fragment) In the limit α = 0 (B, orange), the jump in ∆m is discontinuous, which slightly changes the critical temperature but leaves the system's critical behavior unchanged.
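The Glauber flip probability above can be written as a one-line function; in the zero-temperature limit β → ∞ it recovers the deterministic threshold rule (Eq. (18)). A minimal sketch (the guard against overflow is ours):

```python
import math

def flip_probability(v_n, G_n, Theta, beta):
    """Glauber probability of flipping neuron n:
    p(v_n -> -v_n) = 1 / (1 + exp(2*beta*(G_n - Theta)*v_n))."""
    x = 2.0 * beta * (G_n - Theta) * v_n
    if x > 700.0:        # avoid overflow in exp for near-deterministic updates
        return 0.0
    return 1.0 / (1.0 + math.exp(x))
```

For large β a neuron aligned with its effective field almost never flips and a misaligned neuron flips almost surely, while for large T the flip probability approaches one half, i.e., pure noise.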
The overlap between pattern p and state v_n is defined by m_p := N^{−1} Σ_{n=1}^N ξ^p_n v_n. Without loss of generality we study the retrieval of pattern p = 1, so that m := m_1 estimates the quality of retrieval. m is given as an implicit solution to a set of coupled integral equations (App. A7, Eqs. (A7.3), (A7.6), (A7.8), and (A7.9)). In particular, we consider two limits, α ≈ 0 and T = 0.
First, we study a finite number of patterns P, so that in the thermodynamic limit of large N we have α ≈ 0 (App. A8). The overlap m is given by the zeros of

∆m := (1/2) [ tanh(β (F̃(m) − Θ)) − tanh(β (F̃(−m) − Θ)) ] − m, (29)

i.e., by the solutions of the fixed-point equation

m = (1/2) [ tanh(β (F̃(m) − Θ)) − tanh(β (F̃(−m) − Θ)) ], (30)

where the functions P_NL and C_NL entering F̃ are given by Eqs. (24) and (25). The solutions of the transcendental equation ∆m = 0 are obtained numerically and compared to simulation results of the Hopfield network with nonlinear dendrites (Fig. 5A).
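For orientation, in the linear case (F̃(u) = u, Θ = 0) Eq. (30) reduces to the classical mean-field equation m = tanh(βm), whose nontrivial retrieval solutions exist below T_c = 1. A damped fixed-point iteration suffices to solve it (our own minimal sketch):

```python
import math

def overlap_fixed_point(T, m0=0.9, iters=10000):
    """Solve m = tanh(m / T) by damped fixed-point iteration: the
    alpha ~ 0 mean-field equation of the conventional (linear, Theta = 0)
    Hopfield model, to which Eq. (30) reduces without dendrites."""
    beta = 1.0 / T
    m = m0
    for _ in range(iters):
        m = 0.5 * m + 0.5 * math.tanh(beta * m)  # damping aids convergence
    return m

m_low = overlap_fixed_point(T=0.5)   # nonzero overlap: retrieval works
m_high = overlap_fixed_point(T=1.5)  # overlap decays to zero: retrieval fails
```

The overlap vanishes continuously as T approaches T_c from below, consistent with the second-order transition of the linear model discussed in the text.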
The dendritic nonlinearities have a strong impact on the overlap curve. They change its shape and increase the critical temperature T_c, which marks the transition between functioning and non-functioning associative memory. They give rise to a discontinuous, first-order phase transition with a non-zero critical overlap m_c := m(T_c). For the same parameters, the conventional Hopfield model displays a continuous, second-order phase transition. These findings may be understood by graphically solving Eq. (30): For the considered, not too large Θ (Θ < 0.448), linear input processing leads to a concave ∆m(m) for high temperatures, so that the overlap continuously goes to zero when T approaches the critical temperature T_c (Fig. 6A). In the presence of nonlinear dendrites, we need to take into account that P N^{−1} Var[w] ∝ α (Eqs. (23)-(25)) is small, so that the dendritic nonlinearity F̃ sharply rises to its maximal (saturation) value at m with Bθ − m ≈ 0, due to its dependence on P_NL(m) (cf. Eq. (24) with u_n = m; C_NL(m) ∝ α is small). The second tanh is approximately constant there (because P_NL(−m) and C_NL(−m) are small for small P N^{−1} Var[w]) and may be neglected. The sharp rise in F̃ thus induces a convex turn in the right-hand side of Eq. (30) and results in a stable and an unstable fixed point of m (Fig. 6B, red). With growing temperature T, the two fixed points vanish in a saddle-node bifurcation at non-zero m_c ≈ Bθ and the system undergoes a discontinuous phase transition of first order. For α = 0, the increase in F̃ is jump-like, so that no unstable fixed point appears and the critical overlap is m_c = Bθ (Fig. 6B, orange).
The increased critical temperature T_c due to the dendrites implies an increased robustness of the network against thermal fluctuations and may be intuitively understood as follows: If the network state is close to a learned pattern, the input to the neurons is either strongly positive and thus further amplified by the dendritic nonlinearities, or strongly negative and not affected by the dendrites. The overall strengthening of the stored patterns counteracts the influence of the temperature (Eq. (28)) and stabilizes the patterns against thermal fluctuations. Consequently, the nonlinear dendrites allow pattern retrieval in a temperature regime in which linear neurons fail.
Because for the conventional Hopfield model neuronal thresholds Θ decrease the critical temperature (cf. Fig. 5A, black), we test whether our results are a mere consequence of an effective threshold reduction, ϑ ≤ Θ, by the nonlinear dendrites (cf. Eq. (26)). Repeating the above calculations and simulations with Θ = ϑ = 0, we find that the results described above hold also for vanishing neuronal thresholds (App. A8).
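The threshold effect in the conventional model can be checked directly. In the α → 0 limit, a standard form of the conventional overlap equation with threshold is m = ½[tanh(β(m − Θ)) + tanh(β(m + Θ))] (the linear counterpart of the two-tanh structure of Eq. (30)); this is an assumption of the sketch, not a quote of the paper's equations. The code bisects on whether fixed-point iteration retains a non-zero overlap, recovering T_c = 1 for Θ = 0 and a reduced T_c for Θ = 0.4:

```python
import math

def overlap(T, theta, iters=4000):
    """Iterate the alpha -> 0 overlap equation
    m = (tanh(beta*(m-theta)) + tanh(beta*(m+theta))) / 2, starting from m = 1."""
    beta, m = 1.0 / T, 1.0
    for _ in range(iters):
        m = 0.5 * (math.tanh(beta * (m - theta)) + math.tanh(beta * (m + theta)))
    return m

def critical_temperature(theta, lo=0.3, hi=1.5):
    """Bisect for the temperature above which the overlap collapses to zero."""
    for _ in range(30):
        mid = 0.5 * (lo + hi)
        if overlap(mid, theta) > 0.01:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(round(critical_temperature(0.0), 2))  # → 1.0 (standard Hopfield)
print(round(critical_temperature(0.4), 2))  # ≈ 0.77: the threshold lowers T_c
```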
Second, we consider the zero temperature limit T = 0, in which thermal fluctuations cease and the binary neurons are deterministic threshold units (App. A9). The overlap m is determined by the coupled order-parameter equations (31), where ϑ is the effective threshold (Eq. (26)). We solve these coupled equations numerically and compare them to the simulation data of a Hopfield network with nonlinear dendrites (Fig. 5B). Eqs. (31) are equivalent to the order parameter equations of the conventional Hopfield model with threshold ϑ. For Θ > 0, we may therefore conclude that the dendritic branches reduce the neuronal threshold to ϑ ≤ Θ and thereby improve the critical storage capacity α_c of the network. Analogously to T_c, α_c denotes the critical load above which retrieval of patterns fails. We note that the improved performance is a direct effect of the dendritic nonlinearity, as demonstrated in Fig. 5: the connectivity of neurons is the same for networks with linear and nonlinear dendrites, while there is a clear increase in the critical temperature T_c and load α_c in the latter case.
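The existence of a critical load can also be checked by direct simulation of the conventional T = 0 network that Eqs. (31) reduce to. The sketch below (illustrative sizes; symmetric Hebbian weights, zero threshold) starts the deterministic dynamics at a stored pattern and measures the remaining overlap, once well below and once well above the classical capacity α_c ≈ 0.138:

```python
import numpy as np

def retrieval_overlap(N, P, seed=0, sweeps=20):
    """Store P random patterns with Hebbian weights, start the T=0 network
    at pattern 1, update asynchronously, and return the final overlap."""
    rng = np.random.default_rng(seed)
    xi = rng.choice([-1, 1], size=(P, N))
    w = (xi.T @ xi) / N            # symmetric Hebbian couplings
    np.fill_diagonal(w, 0.0)       # no self-coupling
    v = xi[0].astype(float).copy() # initialize at the first stored pattern
    for _ in range(sweeps):
        for n in rng.permutation(N):
            v[n] = 1.0 if w[n] @ v >= 0 else -1.0
    return float(xi[0] @ v) / N

print(retrieval_overlap(500, 10))   # alpha = 0.02: overlap stays close to 1
print(retrieval_overlap(500, 150))  # alpha = 0.3 > alpha_c: retrieval fails
```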
These findings may be understood by considering a neuronal threshold Θ ≠ 0, which generally introduces an asymmetry between the two states v_n = ±1 of a unit n. Since we assume the storage of random patterns ξ^p_n = ±1 with equal probabilities, a non-zero Θ impedes the retrieval of learned patterns. For a positive threshold Θ > 0, the threshold reduction by the dendritic nonlinearities attenuates the asymmetry of the network and improves the retrieval of random patterns. In the zero temperature limit and for strong nonlinearities, our model becomes equivalent to the standard Hopfield model without threshold, Θ = 0 (Fig. 5B, dash-dotted black).
We note that the agreement of analytical and numerical results as shown in Fig. 5 is even better if the couplings w_{n,m} = Σ_{b=1}^{B} w_{n,b,m} are normalized such that they exactly equal the symmetric Hebbian weights. This holds in particular for stronger dendritic spikes (Fig. 5B, red), which emphasize the asymmetry. To check whether the above results hold also for larger deviations from the assumption of symmetric couplings, w_{n,m} = w_{m,n}, we repeat the simulations for larger Var[w] (cf. Eq. (21); App. A6). In the deterministic limit T = 0 with many patterns, we find that stronger asymmetries impede the quality of retrieval. For finite temperatures T > 0 and few patterns, α ≈ 0, the impact of moderately asymmetric synaptic weights is negligible.
Complementing our analytical study of the limiting cases α ≈ 0 and T = 0, we compute the quality of retrieval in the α-T -phase space numerically. We find that our associative memory network with non-additive dendrites enables memory functioning in a larger α-T -region than the model with linear branches (cf. App. A10).

F. Optimal number of dendritic branches for memory function
Finally, we investigate the impact of varying numbers B of dendritic branches on the performance of the extended stochastic Hopfield network. As shown above, non-additive dendritic input processing leads to an increased storage capacity and more robust memory retrieval by amplifying strong input to the neuron. Fig. 2B shows that, when this input is fixed, the average somatic input E[F] (= F̄) is maximal for an intermediate number of dendrites. This leads us to expect optimal memory performance for intermediate branch numbers.
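This non-monotonicity can be reproduced with a minimal toy version of the branch model: K synchronous unit inputs fall binomially on B branches (as in Eqs. (3)-(5)); a branch transmits its input linearly below the dendritic threshold θ and contributes a fixed spike amplitude D above it. All parameter values below are illustrative, not those of Fig. 2. The expected somatic input B·E[f(u_b)] then peaks at an intermediate branch number: few branches saturate and deliver only B·D, while many branches dilute the input below threshold and recover linear summation.

```python
from math import comb

def mean_somatic_input(B, K=60, theta=5, D=8.0):
    """E[F] = B * E[f(u)] with u ~ Binomial(K, 1/B): a branch contributes
    its input u below the dendritic threshold theta and the spike
    amplitude D at or above it (toy transfer function)."""
    p = 1.0 / B
    pmf = [comb(K, k) * p**k * (1 - p) ** (K - k) for k in range(K + 1)]
    e_f = sum(k * pmf[k] for k in range(theta)) + D * sum(pmf[theta:])
    return B * e_f

inputs = {B: mean_somatic_input(B) for B in range(1, 61)}
best = max(inputs, key=inputs.get)
print(best)  # an intermediate branch number maximizes E[F]
```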
We thus study the Hopfield network for varying B. In analogy to Fig. 5A, we compute the overlaps m for the limiting case α ≈ 0 (Fig. 7A). The critical temperature T_c (Fig. 7B) displays a maximum at an intermediate number of branches, here B_{T,opt} = 30. For larger θ, the optimal branch number is smaller (not shown). We can understand the maximum in T_c by considering the Δm(m) curves (Fig. 7C). They display maxima at m_c ≈ Bθ (see discussion of Fig. 6), with absolute heights determined by m_c and F̄(m_c), where the latter is maximal for intermediate branch numbers. In combination, they yield the maxima of Δm, which are highest for intermediate numbers B_{T,opt} of branches. Upon increasing temperature, the corresponding curves are thus the last to fall entirely below zero at their T_c(B), so that T_c is highest for such branch numbers.
Another notable feature is the growing critical overlap m_c at the critical temperature T_c with increasing numbers of branches (Fig. 7D). As shown in the discussion of Fig. 6, the critical overlap for small α is given by m_c ≈ Bθ. The argument is correct for strong nonlinearities D and moderate θB, but breaks down for θB ≈ m_c > 1: since the overlap is naturally bounded by 1, only the solution m_c = 0 remains if a larger overlap would be needed to reach the upturn point of Δm (cf. Fig. 7C). In particular, for large B, the behavior of the linear scenario with m_c = 0 is reobtained.
Thus, the performance of the memory network depends non-trivially on the number of dendritic branches B. Additionally, depending on the purpose of the memory network, its robustness against noise (specified by T_c) may be balanced against the quality of retrieval (for which m_c gives a worst-case measure) (cf. Fig. 7A).

III. CONCLUSION: NONLINEAR DENDRITES IMPROVE PATTERN RETRIEVAL
Non-additive processing of synaptic input is an important feature of biological neurons and may have severe consequences for neural processing in single neurons and networks. In this work, we first studied the influence of dendritic spikes in a neuron with a variable number of dendritic branches and sufficiently synchronous spiking input of variable strength, independently of a specific neuron model. We derived an approximation for the somatic input in the presence of nonlinear dendrites (Eqs. (7)-(10) and App. A3). This approximation allows the analytical investigation of dendritic summation phenomena in networks of arbitrary connectivity. Second, we extended the well-known Hopfield model to include neurons with branches that process inputs non-additively. Employing the results from the first part, we constructed networks which are, at each neuron, ensemble-averaged over the nonlinear dendrites and their inputs, such that the overall connectivity and thus the neural identities in the networks are preserved. These networks could be analyzed analytically with statistical physics methods. We used them to approximate the full dynamics of networks with nonlinear dendrites. We find that, for a deterministic Hopfield network, the dendritic nonlinearities reduce the neuronal thresholds (Eq. (26)) and the network still converges to a dynamical fixed point (Eq. (27)). Separate processing of inhibition and excitation (Eq. (12)) can break the monotonic decrease of common energy functions. A mean-field analysis for a stochastic Hopfield network revealed an improved memory storage capacity and a greater robustness against thermal fluctuations due to non-additive dendritic input processing (Eqs. (30) and (31) and App. A10). An intermediate number of dendritic branches was shown to optimally support memory functionality of the associative network.
Our findings help to advance the understanding of the role of nonlinear dendrites in three respects: (i) Earlier works studied the ability of arborized neurons to discriminate patterns [48,51]. They focused on a combinatorial approach of counting the numbers of different input-output functions of single neurons with multiple dendrites. In contrast, our study assumes a dynamical perspective. We derived an expression for the approximate somatic input which may be readily used to investigate the dynamics of networks of neurons with nonlinear dendrites for arbitrary connectivity and synaptic weights. (ii) We applied our results and studied the capability of networks with non-additive dendrites to serve as memory devices. Our work shows that nonlinear dendrites can increase the capacity and the robustness of memory retrieval against thermal fluctuations in recurrent, dynamic associative memory networks. (iii) Finally, our theoretical results suggest that there might be an intermediate number of dendritic branches that is optimal for network functionality [20,48]. This may have severe implications for biological neural circuits featuring non-additive dendrites, e.g. in the hippocampus. Since non-additive dendritic integration may take place in sliding window-like segments of the dendritic tree, a precise number of independent dendritic compartments is unlikely to be found [49]. Biological studies suggest around 50-100 independent sites capable of generating dendritic spikes per neuron, depending on the neuron type and function [11,54]. Our model shows optimal memory performance for such numbers of branches, depending on the dendritic parameters (see discussion of Fig. 7).
For biological neural networks, we suggest that nonlinear dendrites may serve to stabilize memory recall against noisy network background activity. In such networks, memories might be stored in so-called Hebbian cell assemblies with higher internal connectivities or connection strengths that display elevated firing rates when presented with a specific input pattern [5,58]. Similar to our Hopfield model, matching patterns of activity provide a larger input to the other cells of the assembly which is amplified by the dendritic nonlinearities. In our model we restricted ourselves to Hebbian learning of coupling strengths, and noise and patterns were equally nonlinearly enhanced. Biological neural networks can change their wiring as well as the dendritic nonlinearities in an activity dependent manner [16,37,49]. Further, some kinds of dendritic spikes amplify only temporally highly coordinated inputs [4]. Both features may contribute to a selective nonlinear enhancement of pattern activity and may increase the stabilizing effects of non-additive dendrites in memory networks.
On a theoretical level of statistical physics, our work may be continued in several directions: Preliminary calculations suggest that for certain parameters new phenomena arise, such as improved memory retrieval for moderate noise levels (this is reminiscent of stochastic resonance [10,28]). Further, previous studies on non-monotonic transfer functions found beneficial effects for memory performance in artificial and biological neural systems [9,24,26,44]. The novel kind of nonmonotonicity which is due to dendritic reception of excitatory input and somatic reception of inhibitory input (cf. Eq. (12)) should be further explored and linked to these findings. Finally, many studies suggest that neural plasticity exploits dendritic spikes and dendritic compartmentalization [16,37,49]. Non-Hebbian learning rules that are tailored to utilize dendritic spikes and branches were shown to increase the memory capabilities of single neurons or ensembles of such neurons [12,33,39,48,51]. Therefore, such dendrite-based learning is expected to boost also the performance of associative memory networks and is a particularly important target of future studies. Our statistical treatment (Eqs. (7)-(10)) and the numerical approach can be directly applied to address these issues. Gaining insight into these matters will help to better understand and utilize the full power of dendritic computation.

IV. ACKNOWLEDGMENTS
Supported by the BMBF (grant no. 01GQ1005B) and the DFG (grant no. TI 629/3-1). We thank Sven Jahnke and Gunter Weber for helpful discussions.
Appendix A1: Gaussian approximation of the somatic input moments

We compute the first moments of the somatic input F (Eq. (6)) using a Gaussian approximation of the dendritic input distribution P(u_1, ..., u_B) and assuming statistically identical branches, i.e. p_b = p_0, b ∈ {1, ..., B}. Means E[u_b], variances Var[u_b] and covariances Cov[u_b, u_c] of P(u_1, ..., u_B) are given by Eqs. (3)-(5). We start from the definition of E[F] and pick u := u_1 without loss of generality to obtain an expression in terms of P(u), the marginal distribution of P(u, u_2, ..., u_B), which is thus Gaussian with mean E[u] and variance Var[u].
Using the definition of the dendritic transfer function f (Eq. (1)), we split the integral at the dendritic threshold, where we used partial integration in the second line and introduced the definitions of the auxiliary quantities P_NL and C_NL. The second moment is computed similarly. For a multinomial distribution of active synapses across branches, the double integral I_F needs to be computed numerically (see Fig. 2). These results are discussed in the main text and in the caption of Fig. 2. The calculations may be easily extended to cover non-uniform branch probabilities p_b, dendritic thresholds θ_b, and spike strengths D_b.
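The Gaussian marginal makes E[f(u)] available in closed form. Under the toy piecewise transfer f(u) = u for u < θ and f(u) = D otherwise (an assumed simplification of Eq. (1); the values of μ, σ, θ, D below are arbitrary test values), splitting the integral and using the standard truncated-normal identity gives E[f(u)] = μΦ(z) − σφ(z) + D(1 − Φ(z)) with z = (θ − μ)/σ. The sketch checks this against Monte Carlo:

```python
import math, random

def closed_form(mu, sigma, theta, D):
    """E[f(u)] for u ~ N(mu, sigma^2), f(u) = u (u < theta), D (u >= theta):
    mu*Phi(z) - sigma*phi(z) + D*(1 - Phi(z)) with z = (theta - mu)/sigma."""
    z = (theta - mu) / sigma
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return mu * Phi - sigma * phi + D * (1.0 - Phi)

def monte_carlo(mu, sigma, theta, D, n=200_000, seed=1):
    """Sample-based estimate of the same expectation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        u = rng.gauss(mu, sigma)
        total += u if u < theta else D
    return total / n

mu, sigma, theta, D = 4.0, 2.0, 5.0, 10.0
print(round(closed_form(mu, sigma, theta, D), 3))  # → 5.147
print(round(monte_carlo(mu, sigma, theta, D), 2))  # agrees within MC error
```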
Similar to E[F], we compute the expected number E[k] of branches in the nonlinear regime. The dendritic transfer function f(u) in Eq. (A1.1) is replaced by a step function step(u − θ) to count the number of branches above threshold θ. Here, we defined step(x) = 0 if x < 0 and step(x) = 1 otherwise. Since the step function satisfies step²(x) = step(x), we derive the second moment of the distribution of the number k of nonlinear branches in the same way. The binomial distribution of synapses among branches provides P(u, v) = P(u)P(v) and thus a closed-form expression, while I_k needs to be computed numerically in the multinomial case.
Appendix A2: Exact mean and variance of the effective somatic input and the number of nonlinear branches

We now derive the mean somatic input E[F] and its variance Var[F] for the exact distribution P(u_1, ..., u_B) (Eq. (2)) with Gaussian P(w). We decompose P(u_1, ..., u_B) into conditional distributions with means x_b E[w] and variances x_b Var[w], so that P(u_1, ..., u_B) is a weighted superposition of Gaussian distributions. For non-Gaussian P(w) and large numbers of small inputs, we may employ a central limit theorem to establish Gaussianity of the conditional distributions. This yields E[F] in terms of the marginal distribution P(x) of P(x_1, ..., x_B), where we assumed identical statistics for all branches only in the last line; similar to App. A1, we defined the auxiliary quantities P_NL,u|y and C_NL,u|y (Eq. (A2.7)). The second moment of F (Eq. (A2.8)) follows analogously, where the assumption of identical branch statistics was used in the last step and P(x, y) is the marginal distribution of P(x_1, ..., x_B). Analogously, we may compute the exact average number E[k] of nonlinear branches. Similar to App. A1, we use a step function to count the number k of branches in the nonlinear regime (Eq. (A2.9)); the second moment follows in the same way.

Appendix A3: Somatic input for multi-layered dendritic trees

The approximation for the somatic input F (Eqs. (7)-(10)) may be readily employed to compute the somatic input for more complex dendritic arbors and multiple steps of non-additive dendritic processing (Fig. A3.1). For this, whenever necessary, we assume that the distribution of inputs may be approximated by the maximum entropy distribution for given mean and variance, i.e. by a normal distribution (cf. discussion of Eqs. (3)-(5)). For simplicity of presentation, we assume that the number of sub-branches B_l at a level l ∈ {1, ..., L} is identical for all branches b_l ∈ {1, ..., B_l} at the level, that inputs arrive only at the terminal branches and sufficiently synchronously, and that the distribution of synapses is identical across the terminal branches.
To compute the neuronal input for such a neuron, we may start from the soma and recursively work towards the terminal branches: the first moments of the effective somatic input follow by applying Eqs. (7)-(10) level by level. We note that one can also introduce factors implementing branch coupling strengths at this point. Fig. A3.2 shows good agreement of our analytical predictions with simulation results, both for the mean somatic input E[F] and its standard deviation Std[F]. Dendritic thresholds and dendritic spike strengths in layer l = 1 were increased to compensate for large input strengths in the absence of branch coupling factors. We note that, as for the single-layered neuron, the input is largest for intermediate numbers of branches B_{1,opt} and B_{2,opt} (cf. discussion of Fig. 2).
The derivation can be directly generalized to cover non-identical branches, i.e. branches with different probabilities for the formation of synapses, different numbers of sub-branches (cf. App. A1 and A2), linear branches, and additional external inputs at intermediate-level branches. To incorporate linear branches, we may set the dendritic threshold to infinity; to incorporate external inputs to intermediate-level dendrites, we can add an additional, linear input branch to the considered dendrite, where the external inputs arrive. We may thus conclude that our approach covers arbitrary tree-like dendritic structures.
Appendix A4: Expected branch input in the extended Hopfield model

We now derive the expected input to branch b of neuron n in the extended Hopfield model when the states of the neurons are fixed and the average is taken over the weight distribution. By construction, the branch weights for distinct presynaptic neurons are independently chosen from their respective (in general different) distributions. Because the input across branches is uncorrelated, we may employ the results for the somatic input we derived for binomially distributed active synapses (Eqs. (7) and (11)).

Appendix A5: Effective threshold and convergence of the deterministic network

Assuming that F̄ (Eq. (23)) is strictly monotonically increasing and therefore also invertible, we find that

v_n(t + 1) = sign(F̄(u_n(t)) − Θ) = sign(u_n(t) − ϑ),   (A5.1)

with the effective threshold ϑ (Eq. (26)); see also Fig. 3. This update rule is equivalent to the conventional one with threshold ϑ. Hence, the energy function E_NL of the system is obtained from E_L (Eq. (15)) by substituting Θ by ϑ. For symmetric couplings, w_{n,m} = w_{m,n}, E_NL is monotonically decreasing and the system converges to a steady state. More explicitly, by assuming that neuron n is updated, one finds that the resulting energy difference is non-positive. Equality holds only for v_n(t + 1) = v_n(t) or u_n(t) = ϑ, where the latter implies v_n(t + 1) = 1. Therefore, the energy E_NL either decreases in time or remains constant only if the state of the network does not change or the updated neuron n is set to v_n = 1. Since the energy is bounded from below, the network converges to a stable state which is given by a minimum in the energy landscape E_NL(v_1, ..., v_N). Thus, the effective nonlinearity reduces the neuronal threshold to ϑ ≤ Θ as compared to linear input summation (Eq. (15)) but maintains network convergence. For ϑ to be uniquely defined, the dendritic nonlinearities have to be strong enough, BD > Θ, so that F̄ intersects the constant function Θ (Fig. 3). Analytical calculations show that for D > θ the transfer function F̄ is strictly monotonic (cf. Fig. A5.1).
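The energy argument can be verified directly in a small simulation. The sketch below (illustrative sizes; symmetric Hebbian weights with zero diagonal) runs asynchronous updates with an effective threshold ϑ, using the tie-breaking rule v_n = 1 at u_n = ϑ, and checks that E_NL = −½ Σ_{n≠m} w_{n,m} v_n v_m + ϑ Σ_n v_n never increases:

```python
import numpy as np

def run_and_check(N=100, P=5, theta_eff=0.1, sweeps=10, seed=2):
    """Asynchronous T=0 dynamics with effective threshold theta_eff;
    returns the energy trace E_NL, which must be non-increasing
    for symmetric couplings."""
    rng = np.random.default_rng(seed)
    xi = rng.choice([-1, 1], size=(P, N))
    w = (xi.T @ xi) / N            # symmetric Hebbian couplings
    np.fill_diagonal(w, 0.0)
    v = rng.choice([-1.0, 1.0], size=N)
    energies = []
    for _ in range(sweeps):
        for n in rng.permutation(N):
            u = w[n] @ v
            v[n] = 1.0 if u >= theta_eff else -1.0  # v = +1 at u = theta_eff
            energies.append(-0.5 * v @ w @ v + theta_eff * v.sum())
    return energies

E = run_and_check()
print(all(b <= a + 1e-12 for a, b in zip(E, E[1:])))  # → True
```

Flipping neuron n changes the energy by −(v'_n − v_n)(u_n − ϑ), which is negative for every genuine update and zero at ties, exactly as in the appendix argument.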
Since experiments demonstrate supralinear dendritic amplification, e.g., with thresholds of θ ≈ 3.8 mV and spike amplitudes of D ≈ 10 mV [4], this parameter regime is biologically plausible.

Fig. A6.1 (caption): Parameters as in Fig. 4, with Var[w] = 0.5. Simulations are performed with identical initial states, topology, and order of updates. As the linear Hopfield model is not affected by Var[w], its energy (solid black) decreases monotonically and reaches a fixed point. For the extended Hopfield model with non-additive dendrites and more asymmetric couplings, the energy decreases (solid red) with rare events of increasing energy (red squares) due to deviations from the mean-field approach or asymmetric couplings (cf. discussion of Eq. (27)). The convergence of the system is preserved (checked for 1000 runs). The Hamming distance d = (2N)^{-1} Σ_{n=1}^{N} |v_n − v'_n| (dashed red) between the systems shows that they settle into different attractors.

Appendix A6: Asymmetric couplings and convergence of a Hopfield network with nonlinear dendrites
The couplings of the classical Hopfield network are symmetric, w_{n,m} = w_{m,n}, so that convergence is guaranteed by a Lyapunov function (Eq. (15)). In the extended model, we argued that the coupling weights are only approximately symmetric. Our simulations indicate that although convergence of the extended deterministic Hopfield network is not guaranteed by a Lyapunov function for asymmetric couplings, it reaches a fixed point, as shown exemplarily by Fig. A6.1 and confirmed for 1000 runs (not shown), also for larger asymmetries Var[w]. To study the impact of asymmetric couplings on the memory performance of the extended stochastic Hopfield network, we repeat the simulations shown in Fig. 5 for larger Var[w]. For a small load α ≈ 0 and non-zero temperatures T > 0, the analytical calculations for symmetric couplings agree well with the simulation results (Fig. A6.2A). Yet, in the zero temperature limit T = 0, the asymmetries decrease the storage capacity of the network compared to the symmetric case (Fig. A6.2B).
Appendix A7: Mean-field calculations for a stochastic Hopfield model with nonlinear dendritic branches

We now derive the mean-field equations for the overlap m := m^1 = N^{-1} Σ_{n=1}^{N} ξ^1_n v_n of the network state (v_1, ..., v_N) with pattern p = 1, i.e. (ξ^1_1, ..., ξ^1_N). The following calculations go along those provided in [14,22]. In the starting expression we employed the point symmetry tanh(x) = −tanh(−x) in the second line. We now assume that the number P of patterns is large, of order O(N). We define the mean square overlap r := α^{-1} Σ_{q≠1} (m^q)² and assume that the m^q are independent, zero-centered random variables with variance αrP^{-1}. Then, the sum ξ^1_n Σ_{p≠1} ξ^p_n m^p can be seen as a Gaussian noise term of variance αr, and the sum N^{-1} Σ_{n=1}^{N} may be treated as an average over this noise. Since ξ^1_n = ±1 with equal probabilities, the overlap with the first pattern is given by Eq. (A7.2), with the effective somatic input F̄ given by Eqs. (23)-(25). Next, the correlations quantified by r must be determined self-consistently. We define ū^q_n := Σ_{p≠q} ξ^p_n m^p and, because ξ^q_n m^q = u_n − ū^q_n is small, of order O(N^{-1/2}), we expand Eq. (A7.2) into a Taylor series to first order (Eq. (A7.5)), where we took into account again that ξ^1_n = ±1 with equal probabilities. Solving Eq. (A7.5) for m^q, squaring it and averaging over all patterns q yields the self-consistency equation for r, where we use that ξ^1_n = ±1 with equal probabilities in the third line. This transcendental equation for m is solved numerically (Fig. 5).
Appendix A8: Critical temperature for vanishing neuronal threshold

It is shown in the main text that dendritic nonlinearities elevate the critical temperature T_c above which retrieval fails (Fig. 5). The increased critical temperature T_c may result partially from the effectively reduced neuronal threshold ϑ ≤ Θ (Eq. (26)). To exclude this effect, we study the impact of the nonlinearity on the critical temperature T_c for vanishing neuronal threshold Θ = 0. We find that when changing Θ from Θ = 0.4 to Θ = 0, the critical temperature T_c is altered only slightly (Fig. A8.1, see differences between dashed and dotted vertical lines) and the behavior of the system remains the same.
In contrast, the threshold θ of the dendritic nonlinearity has a strong influence on the critical behavior of the network. For θ = ∞, we reobtain linear input summation (Fig. A8.1, dash-dotted black and solid gray). For finite 0 < θ < ∞, the critical temperature T_c is increased (Fig. A8.1, green and lime, blue and cyan). For θ = 0, there is no phase transition at all (Fig. A8.1, red and orange) and memory retrieval at arbitrarily high temperatures T (although with smaller and smaller overlaps m) is possible. This may be understood by assuming θ = 0 and T = β^{-1} → ∞ in Eq. (A8.1).