Fooling the classifier: Ligand antagonism and adversarial examples

Machine learning algorithms are sensitive to so-called adversarial perturbations. This is reminiscent of cellular decision-making where antagonist ligands may prevent correct signaling, like during the early immune response. We draw a formal analogy between neural networks used in machine learning and the general class of adaptive proofreading networks. We then apply simple adversarial strategies from machine learning to models of ligand discrimination. We show how kinetic proofreading leads to “boundary tilting” and identify three types of perturbation (adversarial, non adversarial and ambiguous). We then use a gradient-descent approach to compare different adaptive proofreading models, and we reveal the existence of two qualitatively different regimes characterized by the presence or absence of a critical point. These regimes are reminiscent of the “feature-to-prototype” transition identified in machine learning, corresponding to two strategies in ligand antagonism (broad vs. specialized). Overall, our work connects evolved cellular decision-making to classification in machine learning, showing that behaviours close to the decision boundary can be understood through the same mechanisms.


Introduction
Machine learning is becoming increasingly popular with major advances coming from deep neural networks [22]. Deep learning has improved the state of the art in automated tasks like image processing [16], speech recognition [12] and machine translation [28], and has already seen a wide range of applications in research and industry. Despite their high performance, neural networks suffer from blind spots: small perturbations added to unambiguous samples may lead to misclassification [29].
Such adversarial examples are most obvious in image recognition: for example, a panda is misclassified as a gibbon, or a handwritten 3 as a 7 [11]. Adversarial examples can be universal, and are often transferable across multiple architectures (see Akhtar and Mian [1] for a recent review). Two important properties should be noted: adversarial examples appear even in linear problems, and they can be significant notwithstanding the small L∞ norm of the perturbation.
Another broad class of complex systems dealing with categorization and inference is found in cellular decision-making. For instance, immune cells have to discriminate between foreign and self ligands, irrespective of ligand quantity [5]. Such immune decisions are also prone to detrimental perturbations in a phenomenon called ligand antagonism [6]. Antagonism appears to be a general (and possibly necessary [7]) feature of cellular decision-making, and has been observed in T cells [2], mast cells [31] and other recognition processes like olfactory sensing [26].
In this work, we draw a correspondence between biophysical models of cellular decision-making displaying antagonism on the one hand, and adversarial examples in machine learning on the other hand. We start by drawing a formal analogy between feedforward neural networks and adaptive proofreading (or adaptive sorting) models [8,20,6], and we illustrate visually how to recast immune recognition into an image recognition problem. Then we show how direct adversarial perturbations correspond to antagonism by many weakly interacting ligands for the simplest model of adaptive proofreading. We refine our analysis by showing how kinetic proofreading steps work by "tilting" the decision boundary of cellular decision, corresponding to a strategy proposed in machine learning to defend against adversarial perturbations. We finally explore the geometry of the decision boundary for immune recognition, and we exhibit the emergence of a critical point, which we associate with a "feature-to-prototype" transition recently proposed in machine learning [18].

Neural network for immune decision-making
We consider cellular decision-making based on ligand quality (notation τ) irrespective of quantity (notation L). An example can be found in immune recognition with the "lifetime dogma" [5], where it is assumed that a T cell discriminates ligands based on their characteristic binding time τ to T cell receptors. Defining τc as the activation threshold binding time, the problem boils down to ignoring many subthreshold ligands (τ < τc) while being able to respond to few agonist ligands with τ > τc [2,5,8]. Ligand discrimination becomes a nontrivial problem when the cell cannot measure single-binding events, but only has access to global quantities such as the total number of bound receptors (Fig. 1 A).
We assume an idealized situation where a given receptor $i$, upon ligand binding (on-rate $k^{\mathrm{on}}_i$), can exist in $N$ biochemical states (corresponding to phosphorylation stages of the receptor tails in the immune context [23,14]). Those states allow the receptor to effectively compute different quantities, such as $c^i_m = k^{\mathrm{on}}_i \tau_i^m$, $0 \le m \le N$, which can be done with kinetic proofreading [23]. In particular, ligands with larger $\tau$ give a relatively larger value of $c^i_N$ due to the geometric amplification associated with proofreading steps. We assume receptors to be identical, so that any downstream receptor processing by the cell must be done on the sum $C_m = \sum_i c^i_m = \sum_i k^{\mathrm{on}}_i \tau_i^m$. We also consider a quenched situation in which only one ligand is locally available for binding to every receptor. In reality, there is a constant motion of ligands, such that $k^{\mathrm{on}}_i$ and $\tau_i$ are functions of time and stochastic treatments are required [27,21,25], but on the time-scale of primary decision it is reasonable to assume that the ligand distribution does not change much [2].
A neural network-like diagram is displayed in Fig. 1 B to illustrate a general state-based decision-making mechanism based on those principles. The input nodes are the receptors, and the first layer of the network corresponds to the nodes $C_m$, summing up contributions of all receptors at a given proofreading step. With logarithmic activation functions between the first and second layer, the second layer effectively integrates these $C_m$ to perform decision-making. While this network represents a simplified view of decision-making, it describes well the general class of "adaptive sorting" or "adaptive proofreading" models, accounting for many aspects of immune recognition [6]. Probability of decision-making in this context is a monotonically increasing function of the quantity
$$T_{N,m} = \frac{C_N}{C_m} = \frac{\sum_i k^{\mathrm{on}}_i \tau_i^N}{\sum_i k^{\mathrm{on}}_i \tau_i^m}. \tag{1}$$
If $L$ ligands with identical $\tau$ and $k^{\mathrm{on}}$ are presented to the T cell, we find $T_{N,m} = \frac{k^{\mathrm{on}} L \tau^N}{k^{\mathrm{on}} L \tau^m} = \tau^{N-m}$, so that the binding times of the ligands can be directly evaluated irrespective of their quantity, corresponding to the lifetime dogma [9,5]. If we now add $L_{\mathrm{anta}}$ antagonists with lower binding time $\tau_{\mathrm{anta}} < \tau$ and equal on-rate $k^{\mathrm{on}}$, we find $T_{N,m} = \frac{L \tau^N + L_{\mathrm{anta}} \tau_{\mathrm{anta}}^N}{L \tau^m + L_{\mathrm{anta}} \tau_{\mathrm{anta}}^m}$, which is smaller than the response $\tau^{N-m}$ for a single type of ligands, corresponding to ligand antagonism [10,3,2,6] (Fig. 1 C). Fig. 1 D shows experimental curves of immune detection and antagonism (redrawn from [8]) compared to a model similar to Eq. 1 with (N, m) = (4, 2) and identical on-rates.
To further the analogy with machine learning, we represent ligand mixtures as grayscale images in Fig. 1 E, where the intensity of each pixel corresponds to the binding time of a single ligand. The goal of the T cell is to detect the presence of white pixels, and because the T cell cannot single out individual pixels, it needs to compute functions such as Eq. 1 to perform decision-making. The response is modulated by the binding times of "background" ligands. With our convention, it is easier for the system to detect white pixels in a black background than in a gray background, corresponding to the idea that ligands close to threshold (more white than black) antagonize more strongly than ligands far from it, as proposed in [2]. This effect might be important for filtering out ambiguous cases [21], where many slightly less white pixels could be mistaken for white ones.
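The two properties quoted above (quantity independence for pure ligands, and a lowered output in mixtures) can be checked numerically. The following is a minimal sketch, assuming Eq. 1 has the ratio form $T_{N,m} = \sum_i k^{\mathrm{on}}_i \tau_i^N / \sum_i k^{\mathrm{on}}_i \tau_i^m$ implied by the special cases in the text; the function name `response` and the binding times 4.0 and 2.0 are illustrative choices, not values from the paper.

```python
def response(k_on, tau, N, m):
    """Output T_{N,m} = sum_i k_on[i]*tau[i]**N / sum_i k_on[i]*tau[i]**m."""
    numerator = sum(k * t**N for k, t in zip(k_on, tau))
    denominator = sum(k * t**m for k, t in zip(k_on, tau))
    return numerator / denominator

N, m = 4, 2
tau_ag, tau_anta = 4.0, 2.0  # illustrative binding times, arbitrary units

# Pure agonists: the output equals tau**(N - m) whatever the quantity L.
pure_small = response([1.0] * 10, [tau_ag] * 10, N, m)
pure_large = response([1.0] * 1000, [tau_ag] * 1000, N, m)

# Mixture: the same 10 agonists plus 1000 antagonists with a shorter binding time.
tau_mix = [tau_ag] * 10 + [tau_anta] * 1000
mixed = response([1.0] * len(tau_mix), tau_mix, N, m)

print(pure_small, pure_large, mixed)
```

In this sketch the mixture output falls between the pure-antagonist value $\tau_{\mathrm{anta}}^{N-m}$ and the pure-agonist value $\tau^{N-m}$, which is the antagonism described in the text.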

Fast Gradient Sign Method and antagonism
Coming back to Eq. 1, (N, m) = (1, 0) corresponds to a recently proposed model for antagonism in olfaction, with the role of $k^{\mathrm{on}}$ played by inverse affinity $\kappa^{-1}$, the role of $\tau$ played by efficiency $\eta$, and where the spiking rate of olfactory receptor neurons is a function $J(T_{N,m})$ [26] that can be interpreted as a scoring function in the machine learning sense. Notice that in this case, $T_{1,0}$ computes the average $\tau$ weighted by $k^{\mathrm{on}}_i$.
Figure 1. Ligand discrimination interpreted as an image recognition problem. A) Typical receptor occupancy of ligand binding events through time. B) Representation of an immuneural network. C) The T cell responds to few agonists alone, while in the presence of antagonists, it fails to notice them. D) Dose-response curves for ligands with different binding times, for pure ligand types and for mixtures, for theory and experiment (redrawn from [8]). E) Immune pictures: schematics of images the T cell observes. The red frames mark when, upon adding agonists, the T cell starts responding, corresponding to the red dashed lines in D), right panel.
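We follow the original Fast Gradient Sign Method (FGSM) [11] by computing the maximum adversarial perturbation $\eta = \epsilon\,\mathrm{sgn}(\nabla_x J)$ with $\|\eta\|_\infty \le \epsilon$ and $\nabla_x J$ the gradient of the scoring function. In our case, $\nabla_x J$ with respect to parameters $k^{\mathrm{on}}_i$ and $\tau_i$ gives
$$\frac{\partial J}{\partial k^{\mathrm{on}}_i} = J'(T_{1,0})\,\frac{\tau_i - T_{1,0}}{A}, \qquad \frac{\partial J}{\partial \tau_i} = J'(T_{1,0})\,\frac{k^{\mathrm{on}}_i}{A},$$
where $A = \sum_j k^{\mathrm{on}}_j$. From the above expression, we find that an equivalent maximum adversarial perturbation is given by three intuitive rules (Fig. 2 A).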
• Decrease all $\tau_i$ by $\epsilon$
• Decrease $k^{\mathrm{on}}_i$ for ligands with $\tau_i > T_{1,0}$
• Increase $k^{\mathrm{on}}_i$ for ligands with $\tau_i < T_{1,0}$
Next, we consider important limiting cases. For instance, we start from $L$ identical ligands with $\{k^{\mathrm{on}} = 1, \tau\}$ and response $T^{\mathrm{before}}_{1,0} = \tau$, where $\tau$ itself is of order 1 (in proper units). A drastic change of response occurs if we suddenly add $R$ ligands with short binding times $\tau'$ and small $k^{\mathrm{on}}$ of order $\epsilon$, see Fig. 2 B (footnote 2). We then have
$$T^{\mathrm{after}}_{1,0} = \frac{L\tau + \epsilon R \tau'}{L + \epsilon R}.$$
If there are many receptors compared to initial ligands, and $\tau' \ll \tau$, the relative change is of order 1 as soon as $\epsilon R \sim L$, giving a decrease comparable to the original response instead of being of order $\epsilon$. The limit where $R$ is big thus corresponds to a strong antagonistic effect of many weakly bound ligands, which yields the same effect as "competitive antagonism" in olfaction [26] (footnote 3). A similar change is observed if initially many ligands $R$ have binding times $\tau$ between 0 and $\epsilon$: decreasing their binding time to 0 yields a change of the numerator from $L\tau + \epsilon R \to L\tau$ while leaving the denominator unchanged at $L + R$, so that again the relative change is set by $T^{\mathrm{after}}_{1,0}/T^{\mathrm{before}}_{1,0} = L\tau/(L\tau + \epsilon R)$. Both these situations are reminiscent of the adversarial perturbations in [11], where it was observed that adding $\eta = \epsilon\,\mathrm{sgn}(\nabla_x J)$ leads to a significant perturbation on the scoring function $J$ of order $\epsilon N$, with $N$ being the (usually high) dimensionality of the input space (corresponding to the number of pixels). There is thus a direct correspondence between the number of pixels in a picture and the high number of available receptors $R$. In both cases, the change of scoring function can be large despite the small amplitude of the perturbation, intuitively corresponding to a steep gradient in the maximally adversarial direction.
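The three rules and the first limiting case can be tried out on a toy mixture. This is a minimal sketch, assuming the scoring function is the identity ($J = T_{1,0}$) and that unoccupied receptors are represented by $k^{\mathrm{on}} = 0$, $\tau = 0$; the helper names and the clipping of parameters at zero are assumptions made for the illustration, not part of the original method.

```python
def t10(k_on, tau):
    """T_{1,0}: average binding time weighted by the on-rates."""
    return sum(k * t for k, t in zip(k_on, tau)) / sum(k_on)

def sign(x):
    return (x > 0) - (x < 0)

def fgsm_step(k_on, tau, eps):
    """One sign-gradient step that lowers T_{1,0}; parameters are clipped at 0."""
    total = sum(k_on)
    T = t10(k_on, tau)
    grad_k = [(t - T) / total for t in tau]   # dT/dk_on_i
    grad_tau = [k / total for k in k_on]      # dT/dtau_i
    new_k = [max(k - eps * sign(g), 0.0) for k, g in zip(k_on, grad_k)]
    new_tau = [max(t - eps * sign(g), 0.0) for t, g in zip(tau, grad_tau)]
    return new_k, new_tau

L, R, eps = 10, 10_000, 0.01
k_on = [1.0] * L + [0.0] * R   # R receptors initially see no ligand (k_on = 0)
tau = [1.0] * L + [0.0] * R

before = t10(k_on, tau)
k_adv, tau_adv = fgsm_step(k_on, tau, eps)
after = t10(k_adv, tau_adv)
print(before, after)
```

With these illustrative numbers, no parameter moves by more than 0.01, yet the output drops by roughly a factor of ten because $\epsilon R$ is ten times larger than $L$.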

Boundary tilting and categorizing perturbations
In the immune context, such strong adversarial effects have to be mitigated because they correspond to antagonism by self [20]. This is done with kinetic proofreading [23,2,8], i.e. in our language by taking an output $T_{N,m}$ with $N > m > 0$. This ensures that self ligands with $\tau \ll 1$ barely contribute to $T_{N,m}$, since they appear in the numerator and denominator as $\tau^N$ and $\tau^m$. Their contribution can be neglected compared to the contribution from other ligands with $\tau \sim 1$, even when the self ligands are numerous. This imposes an inverted hierarchy of antagonism, where the strongest antagonizing ligands exist closer to threshold [6], contrary to the case where $m = 0$. We now show that proofreading provides a boundary tilting effect, similar to what is described in machine learning [30] (Fig. 2 C, see Appendix 1 for an illustration of this effect on the discrimination of the original 3 vs 7 MNIST from Goodfellow et al. [11]).
We numerically compute how the decision boundary changes when $L_2$ ligands at $\tau_2$ are added to the initial $L_1$ ligands at $\tau_1$, i.e. we compute the manifold so that
$$T_{N,m}(\{L_1, \tau_1\}, \{L_2, \tau_2\}) = \frac{L_1 \tau_1^N + L_2 \tau_2^N}{L_1 \tau_1^m + L_2 \tau_2^m} = \tau_c^{N-m}. \tag{5}$$
The case $m = 0$ (Fig. 2 C, top left) corresponds to a very tilted boundary, close to the plane $L_2 = 0$, and a strong antagonistic case. In this situation, assuming $\tau_1 \sim \tau_c$, each new ligand added with $\tau_2$ close to 0 gives a reduction of $T_{1,0}$ proportional to $\tau_c / L_1$ in the limit of small $L_2$ (see Appendix 1, [6]), which is again of the order of the response $T_{1,0} = \tau_1 \sim \tau_c$ in the plane $L_2 = 0$. This is clearly not infinitesimal, corresponding to a steep gradient of $T_{1,0}$ in the $L_2$ direction. We call the perturbation in this case "adversarial".
Footnote 2: Equivalently, we could have assumed that $R$ short binding ligands change their $k^{\mathrm{on}}$ from 0 to $\epsilon$.
Footnote 3: One difference with olfaction is that for competitive antagonism, the concentration $C$ is of order 1 while the affinity $\kappa^{-1}$ is big; conversely, here the concentration $R$ is big while $k^{\mathrm{on}}$ is low. Since we consider the product of both terms, both situations lead to similar effects, but our focus on a small change of $k^{\mathrm{on}}$ makes the comparison with machine learning more direct.
This should be contrasted to the case for higher $m$ (Fig. 2 C, top right), where the boundary is vertical, independent of $L_2$, such that decision-making is based only on the $L_1$ ligands at $\tau_1$ initially present. Here, the change of response induced by the addition of each ligand with small binding time $\tau_2$ is of order $\tau_2^m$, which due to proofreading is a very small number when $\tau_2 \simeq 0$ [6]. Contrary to the previous case, the gradient of $T_{N,m}$ with respect to this vertical direction is almost flat and very small compared to the response in the $L_2 = 0$ plane. We call the perturbation in this case "non adversarial".
Tilting of the boundary only occurs when $\tau_2$ gets sufficiently close to the threshold binding time $\tau_c$ (Fig. 2 C, bottom). In this regime, each new ligand added with quality $\tau_2 = \tau_c - \epsilon$ contributes an infinitesimal change of $T_{N,m}$ proportional to $(\tau_c - \tau_2)/L_1 = \epsilon/L_1$, which gives a weak gradient in the direction $L_2$. But even with such small perturbations one can easily cross the boundary because of the proximity of $\tau_2$ to $\tau_c$, which explains the tilting. The cases where the boundary is tilted while the gradient is weak are of a different nature than the adversarial case of Fig. 2 C, top left, where the boundary is tilted as well but the gradient is steep. For this reason we call them "ambiguous". Similar ambiguity is observed experimentally: it is well known that antagonists (ligands close to threshold) also weakly agonize an immune response [2]. Our categorization of perturbations is presented in Table 1.
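For two ligand types the boundary can be written in closed form, which makes the three categories easy to compare. The sketch below assumes the boundary condition $T_{N,m} = \tau_c^{N-m}$ for a mixture of $L_1$ ligands at $\tau_1$ and $L_2$ ligands at $\tau_2$ with equal on-rates, and solves it for $L_1$; the function names and all numerical values ($\tau_c = 3$, $\tau_1 = 4$, $L_2 = 1000$) are illustrative assumptions.

```python
def response(L1, tau1, L2, tau2, N, m):
    """T_{N,m} for a two-type mixture with equal on-rates."""
    return (L1 * tau1**N + L2 * tau2**N) / (L1 * tau1**m + L2 * tau2**m)

def boundary_L1(L2, tau1, tau2, tau_c, N, m):
    """Number of agonists L1 for which the mixture sits exactly on the boundary."""
    theta = tau_c ** (N - m)
    return L2 * (theta * tau2**m - tau2**N) / (tau1**N - theta * tau1**m)

tau_c, tau1, L2 = 3.0, 4.0, 1000.0
cases = {
    "adversarial": (1, 0, 0.1),       # no proofreading, tau2 far below threshold
    "non adversarial": (4, 2, 0.1),   # proofreading, tau2 far below threshold
    "ambiguous": (4, 2, 2.9),         # proofreading, tau2 just below threshold
}
needed = {}
for name, (N, m, tau2) in cases.items():
    needed[name] = boundary_L1(L2, tau1, tau2, tau_c, N, m)
    print(f"{name}: L1 = {needed[name]:.2f} agonists needed against {L2:.0f} ligands at tau2 = {tau2}")
```

With these numbers the boundary requires about 2900 agonists in the first case, less than one in the second, and about 44 in the third, which mirrors the strongly tilted, vertical and mildly tilted boundaries described in the text.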

Dichotomy of antagonism close to the decision boundary
These observations motivate a more precise study of the gradient towards the decision boundary. We follow Krotov and Hopfield, who studied a similar problem for an MNIST digit classifier, encoded with generalized rectified polynomials of variable degree n [18] (reminiscent of the iterative FGSM introduced in Kurakin et al. [19]). The general idea is to find out how to most efficiently fool the system, and how this depends on the architecture of the decision algorithm. Krotov and Hopfield identified qualitative changes from a "feature" to a "prototype" encoding with increasing n, accompanied by a better resistance to adversarial perturbations [17,18]. While for small n, digits on the boundary are the initial digits to which a weak, distributed perturbation is added (corresponding to the learned "features"), for big n, they are intermediate forms with no clear identity, as would be expected for ambiguous digits that are already difficult to recognize for human observers (Fig. 3 A).
We consider the dynamics of a ligand mixture when following the gradient of $T_{N,m}$, and study how to most efficiently fool the decision-maker (or in biological terms, how to best antagonize it). We iteratively change the binding time of nonagonist ligands with $\tau < \tau_c$ to
$$\tau \to \tau - \epsilon\,\frac{\partial T_{N,m}}{\partial \tau} \tag{6}$$
while keeping the distribution of foreign ligands with $\tau > \tau_c$ constant. Biologically, these dynamics should be thought of as a foreign agent trying to antagonize the immune system by rapidly mutating and generating antagonist ligands to mask its non-self part. Such antagonistic phenomena have been proposed as a mechanism for HIV escape [15,24] and associated vaccine failure [13]. We consider two initial ligand distributions with different visual representations: one with pixels randomly distributed (Fig. 3 B), the other with pixels arranged to form the letters "MTL" (Fig. 3 C). The letter "T" contains ligands (pixels) just below threshold while the "M" and "L" are made up of ligands above threshold. We then follow the dynamics of Eq. 6, and display the ligand distribution at the decision boundary for different values of N, m as well as the number of steps to reach the boundary in the descent defined by Eq. 6. In both cases, for small m, we see strong adversarial effects, as the boundary is almost immediately reached. As m increases, in Fig. 3 B the distribution of ligands concentrates around one peak, visually corresponding to a weak "whitening" of our visual representation, while the two peaks in Fig. 3 C approach each other. For m = 2, a qualitative change occurs: the ligands suddenly spread over a broad range of binding times in both Figs. 3 B and C, and the number of iterations in the gradient dynamics to reach the boundary drastically increases. For m > 2, the ligand distribution becomes bimodal, and the ligands close to $\tau = 0$ barely change, while a subpopulation of ligands peaks closer to the boundary in the gray "antagonistic" zone. Visually, this corresponds to black pixels reappearing for higher m while all other pixels turn whitish gray, which gives pictures at the boundary very different from the original ones. Consistent with this, the number of $\epsilon$-sized steps to reach the boundary is 3 to 4 orders of magnitude higher for m > 2 than for m < 2. The qualitative difference is most striking for the "MTL" case: for m ≤ 2, one can still distinguish the three letters while for m > 2, the "T" almost entirely disappears so that "MTL" (Montreal) turns into (the city of) "ML" (Machine Learning).
The qualitative change of behaviour observed at m = 2 can be understood by studying the contribution to the potential $T_{N,m}$ of ligands with very small binding times $\tau \sim 0$. Assuming without loss of generality that only two types of ligands are present (agonists $\tau_1 > \tau_c$ and self $\tau_2 = \tau$, similar to equation 5), an expansion in $\tau$ gives, up to a constant, $T_{N,m} \propto -\tau^m$ for small $\tau$ (see Fig. 3 D for a representation of this potential and Appendix 2 for this calculation). In particular, for $0 < m < 1$, $\frac{\partial T_{N,m}}{\partial \tau} \propto -\tau^{m-1}$ diverges as $\tau \to 0$, which corresponds to the steep gradient described above for adversarial perturbations. In this regime, the ligands close to $\tau \sim 0$ follow the steep gradient to quickly localize close to the minimum of this potential (unimodal distribution of ligands for small m in Fig. 3 B, C). The potential close to $\tau \sim 0$ flattens for $1 < m < 2$, but it is only at m = 2 that a critical point appears at $\tau = 0$, and an inflexion point (square) appears in between the minimum (circle) and $\tau = 0$ (Fig. 3 D). This explains the sudden broadening of the ligand distribution, and the associated increase in the number of steps to reach the decision boundary. For m > 2, ligands close to 0 are pinned while only ligands with large enough $\frac{\partial T_{N,m}}{\partial \tau}$ can efficiently move towards the minimum (which is closer to the boundary as N, m increase). Flatter potentials are obtained for large N, m, which explains the many required iterations to reach the boundary.
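The small-$\tau$ behaviour quoted above can be made explicit with a short expansion. This is a sketch under the stated two-type assumption, with equal on-rates and $0 < m < N$; the shorthands $r = L_2/L_1$ and $\epsilon = \tau/\tau_1$ are introduced here for convenience.

```latex
T_{N,m}
  = \frac{L_1 \tau_1^{N} + L_2 \tau^{N}}{L_1 \tau_1^{m} + L_2 \tau^{m}}
  = \tau_1^{N-m}\,\frac{1 + r\,\epsilon^{N}}{1 + r\,\epsilon^{m}}
  \simeq \tau_1^{N-m}\left(1 - r\,\epsilon^{m}\right)
  \qquad (r\,\epsilon^{m} \ll 1),
\\[4pt]
\frac{\partial T_{N,m}}{\partial \tau}
  \simeq -\,m\,r\,\tau_1^{N-m-1}\,\epsilon^{m-1}
  \;\propto\; -\tau^{m-1}.
```

The leading correction is $-\tau^m$ up to a constant, so the slope at $\tau \to 0$ diverges for $0 < m < 1$, is finite for $m = 1$, and vanishes for $m > 1$, consistent with the flattening of the potential described in the text.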
The change at m > 2 is strongly reminiscent of the transition observed by Krotov and Hopfield in their study of gradient dynamics similar to Eq. 6 for rectified polynomials with increasing degree n [18], applied on digit classifiers. This is best visible in Fig. 3 B. For m < 2, all "background" ligands below threshold only slightly change their τ, corresponding to a broad non-specific antagonizing effect, reminiscent of the speckled pattern for low n in Fig. 3 A. The distribution of ligands in Fig. 3 B then barely changes and stays unimodal. Conversely, for m > 2, the main antagonizing effect comes from a specialized subpopulation of ligands, corresponding to the appearance of a bimodal distribution where some ligands "localize" at the maximally antagonizing τ just below threshold (minimum of $T_{N,m}$ in Fig. 3 D). Similarly, only the subset of pixels that spatially correlate with the initial digit change value in Fig. 3 A for big n. In Fig. 3 C, the same dichotomy between global and specialized antagonism is observed. For m < 2 the pictures barely change, indicating a non-specific antagonizing effect, while for m > 2, only the part corresponding to the "T" part of the pictures changes while the black background of the "M" and "L" remains unchanged, which is again reminiscent of the ambiguous digits in Fig. 3 A for big n.

Discussion
Using gradient-based methods from machine learning we can phenomenologically relate adversarial examples to ligand antagonism. Simple models are fooled by gradient-based methods and mitigate the effects by tilting the decision boundary with kinetic proofreading. Gradient descent close to the boundary can display two qualitative behaviours reminiscent of what is observed in machine learning, which we further characterize by the appearance of a critical point for m ≥ 2 for ligands at τ = 0. Interestingly, the models of adaptive proofreading presented here were first generated with in silico evolution [20]. Strong antagonism naturally appeared in the simplest simulations and required modification of objective functions very similar to what has been done at the same time for adversarial examples in machine learning [11]. Both ligand antagonism and adversarial examples appear to be instances of the general phenomenon of fooling the classifier. Adaptive proofreading models as presented here are arguably the simplest instance of this phenomenon, amenable to analytical studies and helpful to build our intuition to perturbations in decision-making.
A caveat of our approach is that in biophysical models a clear decision axis in the τ direction exists, which is not usually the case in machine learning. There, the algorithm has to effectively learn representations, such as pixel statistics and spatial correlations in images [16]. Case in point, a spatial transformation was recently proposed as an adversarial attack to exploit such adversarial directions [33]. However, underlying low-dimensional manifold descriptions could still combine higher level information in ways similar to our individual parameters τ, so that the theory presented here could still apply once those directions are discovered.
Many internal biological systems have evolved to perform decision-making, and it is fascinating that quantitative studies of such systems allow for connections with machine learning. From the biology standpoint, it means that deep insights might come from the general study of computational systems built via machine learning. Our study of Fig. 3, inspired by gradient descent in machine learning, suggests that changing agents can present themselves in two distinct regimes when confronted with cellular detectors. Biochemically, antagonism manifests itself via broadly distributed antagonizing ligands (m < 2) or via specific optimization of ambiguous antagonists (m > 2), depending on the detection mechanism used by immune cells. From the defence standpoint, the case m ≥ 2 appears to be much more resistant to adversarial perturbations, and thus would be most relevant in an immune context where detectors (immune cells) have to filter out antagonistic perturbations. This might be relevant for the pathology of HIV infections [15,24,13] or, more generally, could provide explanations on the diversity of altered peptide ligands [32]. The case m < 2 with a steep gradient might be more relevant in signaling contexts, where it might be valuable to separate well mixtures of inputs. For olfaction it has been suggested that such strong antagonism allows for a "rescaling" of the distribution of typical odor molecules, ensuring a broad range of detection irrespective of the quantity of molecules presented [26].
On the machine learning side, new inspirations coming from biology are not restricted to classical sensory systems or neuroscience, but may also come from cellular decision-making, such as immune recognition. Our results in Fig. 3 suggest that flattening of the adversarial directions via a critical point might be key to resisting adversarial perturbations, yielding qualitative changes in the dynamics towards the decision boundary. Those are characterized by a subpopulation of inputs moving towards the boundary to define samples ambiguous to a human observer. This is reminiscent of the perturbed animal pictures fooling humans [4], e.g. with chimeric images that combine different animal parts (such as spider and snake). Lastly, one can show mathematically in the biophysical context that it is not possible to fully get rid of antagonism close to the boundary [6]. Bypassing this requires that the "signal" of the agonist ligands is strong and far enough removed from the boundary (which can be done using many proofreading steps). Similar inevitability theorems might be generalizable to machine learning.

Appendix 1
Boundary tilting in digit classification
Appendix 1 Figure 1. Boundary tilting in digit classification. A) 3s and 7s. (i) Typical 3 and 7 from MNIST. (ii) Average 3 and 7 ($\bar{3}$, $\bar{7}$) of the traditional test set, (iii, iv) with adversarial perturbation, found by (v) taking the sign of the difference between $\bar{3}$ and $\bar{7}$, which corresponds closely to (vi), the perturbation found with FGSM. B) Projection of 3s and 7s on its PCs. The classes are separated by the blue line from a linear Support Vector Machine, and the triangle and star show the average of the classes with and without adversarial perturbation. From (i) to (iv) we have cycled through permutations of perturbing training and/or test set with their specific adversarial perturbation. On the right panels, note how the boundary has tilted and the triangle moved away from the star parallel to the decision boundary.
The goal of this section is to illustrate boundary tilting in a machine learning context on a very simple example, namely the MNIST dataset as explored first in [11], which happens to be almost linear.
We are interested in a binary classification problem (response vs. no response); thus, it suffices to take a subset of 3s and 7s from MNIST. Typical 3s and 7s are shown in Fig. 1 A (i-iv). Tanay and Griffin [30] pointed out that the adversarial perturbation generated with the Fast Gradient Sign Method (FGSM) proposed in [11] can also be found via $D = \mathrm{sgn}(\bar{3} - \bar{7})$, Fig. 1 A (v). Note its similarity to the FGSM adversarial perturbation $\mathrm{sgn}(w) = \mathrm{sgn}(\nabla_x J)$ (Fig. 1 A (vi)).
To reveal the linearity of binary digit discrimination, we computed the principal components (PCs) of the traditional training set of 3s and 7s, and projected all digits in the test set on PC1 and PC2 (Fig. 1 B). With a Support Vector Machine (ordinary linear regression) trained on the transformed coordinates PC1 and PC2 of the training set, we achieve over 95% accuracy on the test set. While such accuracy is far from the state of the art in digit recognition, it is much higher than the typical accuracy of detection for single cells (e.g. immune cells present false negative rates of 10% for strong antagonists [2]). The red and blue stars in the figure denote the average digits $\bar{3}$ and $\bar{7}$.
Next, we transformed the test set as $3 \to 3' = 3 - \epsilon_{\mathrm{test}} D$, $7 \to 7' = 7 + \epsilon_{\mathrm{test}} D$, where $\epsilon_{\mathrm{test}} = 0.4$ is the strength of the adversarial perturbation (Fig. 1 A (iii)). $\bar{3}$ and $\bar{7}$ moved towards each other in Fig. 1 B, orthogonal to the decision boundary and along the line between the initial averages. This adversarial perturbation moves the digits in what we call an adversarial direction perpendicular to the decision boundary, and reduces the accuracy of the linear regression model to a mere 69%.
Goodfellow et al. proposed adversarial training as a method to mitigate adversarial effects by FGSM. We implemented adversarial training by adding the adversarial perturbation $\epsilon_{\mathrm{train}} D_{\mathrm{train}} = \epsilon_{\mathrm{train}}\,\mathrm{sgn}(\bar{3}_{\mathrm{train}} - \bar{7}_{\mathrm{train}})$ to the images in the training set, computing the new PCs and training the linear regression model. This effectively "tilts" the decision boundary, while keeping 95% accuracy. In the presence of the original adversarial perturbations, we see the effect of the tilted boundary: the perturbation moves digits parallel to the decision boundary, thereby preserving the good classification accuracy, giving a simple example of the more general phenomenon studied in [30].
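The attack step of this appendix can be illustrated without MNIST. The sketch below is not the procedure used here: it swaps in synthetic Gaussian "digits" and a nearest-mean linear classifier in place of PCA plus a linear SVM, and only shows how the perturbation $D = \mathrm{sgn}(\bar{a} - \bar{b})$ of small amplitude per pixel degrades a linear classifier. The dimension, sample sizes, class separation and the value of `EPS` are illustrative assumptions.

```python
import random

rng = random.Random(0)
DIM, N_TRAIN, N_TEST, EPS = 100, 200, 200, 0.2

# Hidden direction along which the two synthetic classes differ.
signs = [rng.choice((-1.0, 1.0)) for _ in range(DIM)]

def sample(label, n):
    """Draw n noisy points around +0.2*signs (label +1) or -0.2*signs (label -1)."""
    return [[0.2 * label * s + rng.gauss(0.0, 1.0) for s in signs] for _ in range(n)]

def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]

train_a, train_b = sample(+1, N_TRAIN), sample(-1, N_TRAIN)
mu_a, mu_b = mean(train_a), mean(train_b)
w = [a - b for a, b in zip(mu_a, mu_b)]          # normal of the linear boundary
mid = [(a + b) / 2 for a, b in zip(mu_a, mu_b)]  # boundary passes through the midpoint
D = [1.0 if x > 0 else -1.0 for x in w]          # D = sgn(mean_a - mean_b)

def predict(x):
    score = sum(wi * (xi - mi) for wi, xi, mi in zip(w, x, mid))
    return 1 if score > 0 else -1

def accuracy(points_a, points_b):
    hits = sum(predict(x) == 1 for x in points_a) + sum(predict(x) == -1 for x in points_b)
    return hits / (len(points_a) + len(points_b))

test_a, test_b = sample(+1, N_TEST), sample(-1, N_TEST)
adv_a = [[xi - EPS * di for xi, di in zip(x, D)] for x in test_a]  # a -> a - eps*D
adv_b = [[xi + EPS * di for xi, di in zip(x, D)] for x in test_b]  # b -> b + eps*D

acc_clean = accuracy(test_a, test_b)
acc_adv = accuracy(adv_a, adv_b)
print(f"clean accuracy {acc_clean:.2f}, adversarial accuracy {acc_adv:.2f}")
```

Each coordinate moves by only 0.2, well below the unit noise per coordinate, but the shifts add up along the normal of the boundary because the perturbation is aligned with it in every dimension.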

Gradient in the $L_2$ direction
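We recall results from [7] to show how the addition of subthreshold ligands one at a time changes the output. We first consider $\{L_1, \tau_c\}$ threshold ligands, with output $T_{N,m}(L_1, \tau_c) = \tau_c^{N-m}$. The main result of [7] is the linear response of $T_{N,m}(L_1, \tau_c)$ to the addition of $\{L_2, \tau_c - \epsilon\}$ subthreshold ligands, which to first order in $\epsilon$ and in $L_2/L_1$ reads
$$T_{N,m}(\{L_1, \tau_c\}, \{L_2, \tau_c - \epsilon\}) \simeq T_{N,m}(L_1, \tau_c) - \epsilon\,\frac{L_2}{L_1}\,\left.\frac{d}{d\tau} T_{N,m}(L, \tau)\right|_{\tau = \tau_c},$$
where the derivative plays the role of the coefficient in a mean-field description. As the derivative $\left.\frac{d}{d\tau} T_{N,m}(L, \tau)\right|_{\tau=\tau_c} > 0$, and $\epsilon = \tau_c - \tau_2$, each additional subthreshold ligand at $\tau_2$ decreases the output by a value proportional to $\epsilon / L_1$. In the case (N, m) = (1, 0), the mean-field approximation is exact, i.e. the first derivative $\frac{dT}{d\tau} = 1$ is the only nonzero derivative. With the addition of a single subthreshold ligand $\tau_2 \sim 0$, so that $\epsilon \sim \tau_c$, the output is maximally reduced by an amount of order $\tau_c / L_1$, a finite quantity, as described in the main text. For higher m, the linear approximation holds only for ligands at $\tau_2$ close to threshold.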
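The range of validity of the linear response can be checked directly on the two-type output. The sketch below compares the exact decrease caused by one added ligand with a first-order estimate $(\epsilon / L_1)\, dT/d\tau$ evaluated at $\tau_c$, using $T = \tau^{N-m}$ for pure ligands. That estimate is a small-$L_2$ expansion written for this illustration, and the function names and values ($L_1 = 100$, $\tau_c = 3$) are assumptions.

```python
def response(L1, tau1, L2, tau2, N, m):
    """T_{N,m} for a two-type mixture with equal on-rates."""
    return (L1 * tau1**N + L2 * tau2**N) / (L1 * tau1**m + L2 * tau2**m)

def exact_drop(L1, tau_c, tau2, N, m):
    """Decrease of the output when one ligand at tau2 joins L1 ligands at tau_c."""
    return tau_c ** (N - m) - response(L1, tau_c, 1, tau2, N, m)

def linear_drop(L1, tau_c, tau2, N, m):
    """First-order estimate (eps / L1) * dT/dtau, with T = tau**(N - m)."""
    eps = tau_c - tau2
    return eps / L1 * (N - m) * tau_c ** (N - m - 1)

L1, tau_c = 100, 3.0
for N, m, tau2 in [(1, 0, 0.0), (4, 2, 2.9), (4, 2, 0.0)]:
    print(N, m, tau2, exact_drop(L1, tau_c, tau2, N, m), linear_drop(L1, tau_c, tau2, N, m))
```

For (N, m) = (1, 0) the estimate stays close to the exact drop even at $\tau_2 = 0$. For (4, 2) it is reasonable only near threshold, while a ligand at $\tau_2 = 0$ leaves the output unchanged, in line with the remark that the linear approximation holds only close to threshold for higher m.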

Appendix 2 Gradient descent towards the boundary
Our immune model is well-suited to characterize the decision boundary between two classes, because the classifier is analytical. We want to know how to most efficiently change the binding time of the short binding self ligands to cause the immune model to reach the decision boundary. We have taken inspiration from [18] and adapted our approach from the iterative FGSM [19]. At first, we sample $L_s$ self ligands from a normal distribution folded around $\tau = 0$ and $L_{Ag}$ agonist ligands from a narrowly peaked normal distribution above $\tau_c$. The agonist ligand distribution, the "signal" in the immune picture, remains constant. Next, we bin ligands in $M$ equally spaced bins $\tau_b$, $b \in [1, M]$, and we compute the gradient for those bins for which $\tau_b < \tau_c$,
$$\frac{\partial T_{N,m}}{\partial \tau_b} = L_b\,\frac{N \tau_b^{N-1} - m\,\tau_b^{m-1}\,T_{N,m}}{\sum_{b'} L_{b'} \tau_{b'}^{m}},$$
where $L_b$ is the number of ligands in the $b$th bin. We subtract this value multiplied by a small number $\epsilon$ from the exact binding times, as in Eq. 6 in the main text, and we compute a new output $T_{N,m}$. We repeat this procedure until $T_{N,m}$ dips just below the response threshold $\tau_c^{N-m}$. Finally, we display the ligand distribution and the immune pictures, as we did in Fig. 3 in the main text.
The reason why we bin ligands and compute the gradient in batches is to prevent the gradient from becoming negligibly small. If we computed the gradient for each ligand with an individual binding time, there would be exactly one ligand with that specific binding time, and because the gradient scales with $L$, we would need to go through many more iterations. Decreasing the bin size and step size may enhance the resolution, but is not required. We found good results by considering bins with a bin size of 0.2s and $\epsilon$ = 0. For the immune pictures at random ligand order in Fig. 3 B, we have drawn $L_{\mathrm{self}} = 7000$ from $\{\tau_s\} \in |\mathcal{N}(0, 0.33)|$ and $L_{\mathrm{ag}} = 3000$ from $\{\tau_{\mathrm{ag}}\} \in \mathcal{N}(3.5, 0.1)$. For the MTL → ML transition in Fig. 3 C, we have distributed the pixels in the 179x431 frame (appropriately set equal to $R$, the number of receptors) as $L_{\mathrm{self}} = 0.60R$, $L_{\mathrm{anta}} = 0.12R$, $L_{\mathrm{ag}} = 0.28R$. We sampled self ligands from $\{\tau_s\} \in |\mathcal{N}(0, 0.33)|$, antagonists from $\{\tau_{\mathrm{anta}}\} \in \tau_c - |\mathcal{N}(0, 0.33)|$ and agonists from $\{\tau_{\mathrm{ag}}\} \in |\mathcal{N}(3.5, 0.01)|$, and set $\tau_c = 3$. The picture is engineered such that the agonist ligands fill the M and the L, and the antagonists fill the T (which is why its color is slightly darker than the M and L). The self ligands fill the area around the letters M, T and L, such that the self ligands with the highest binding times surround the T. We have chosen this example to make the effect of proofreading explicit (and of course because we are based in Montreal and study Machine Learning). This result is generic, and the ambiguity of the true decision boundary can be visualized with any well-designed image.
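The binned descent described in this appendix can be sketched in a few lines. This is an illustration under several assumptions rather than the code used for Fig. 3: the ligand numbers are scaled down (210 self, 90 agonists, keeping the 7:3 ratio), the second argument of $\mathcal{N}$ is read as a standard deviation, the step size `eps=0.05` and the cap `max_steps` are arbitrary, binding times are clipped at zero, and the gradient is evaluated at bin centres using the derivative of the ratio form of $T_{N,m}$ for equal on-rates.

```python
import random

def descend_to_boundary(taus, N, m, tau_c, eps=0.05, bin_size=0.2, max_steps=2000):
    """Binned gradient descent on T_{N,m}; bins centred at or above tau_c stay fixed."""
    taus = list(taus)
    threshold = tau_c ** (N - m)
    for step in range(max_steps):
        denom = sum(t**m for t in taus)
        T = sum(t**N for t in taus) / denom
        if T < threshold:
            return taus, T, step
        bins = [int(t // bin_size) for t in taus]
        counts = {}
        for b in bins:
            counts[b] = counts.get(b, 0) + 1
        for i, b in enumerate(bins):
            center = (b + 0.5) * bin_size
            if center >= tau_c:
                continue  # agonist bins: the "signal" is kept constant
            # dT/dtau_b = L_b * (N*tau_b^(N-1) - m*tau_b^(m-1)*T) / sum(tau^m)
            grad = counts[b] * (N * center**(N - 1) - m * center**(m - 1) * T) / denom
            taus[i] = max(taus[i] - eps * grad, 0.0)
    denom = sum(t**m for t in taus)
    return taus, sum(t**N for t in taus) / denom, max_steps

rng = random.Random(1)
tau_c = 3.0
self_ligands = [abs(rng.gauss(0.0, 0.33)) for _ in range(210)]
agonists = [rng.gauss(3.5, 0.1) for _ in range(90)]
mixture = self_ligands + agonists

results = {}
for N, m in [(3, 1), (4, 2)]:
    results[(N, m)] = descend_to_boundary(mixture, N, m, tau_c)
    _, T, steps = results[(N, m)]
    print(f"(N, m) = ({N}, {m}): T = {T:.2f} after {steps} steps")
```

The loop stops either when the output dips below $\tau_c^{N-m}$ or when `max_steps` is exhausted, so a returned step count equal to the cap means the boundary was not reached with these settings. Comparing the step counts for m = 1 and m = 2 gives a rough feel for the slowdown discussed in the main text.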

Effect of agonist binding times
For a given initial distribution, the distribution at the boundary is robust to parameter values, provided that neither the mean nor the variance of the agonist distribution is too large. If they are too large, then for high N and m the decision boundary cannot be reached, even with all self ligands employed at the maximally antagonizing τ. A prediction from our model is that with sufficient $L_{\mathrm{ag}}$ at a high enough $\tau_{\mathrm{ag}}$, a fixed number of self ligands $L_{\mathrm{self}}$ is not able to antagonize the response. This is a way to escape the inevitability theorem, which states that close to the boundary, there always exists an adversarial perturbation that causes misclassification, no matter how strong the adversarial defence is. Perceptibly changing the signal naturally allows for a change of classification, but these macroscopic perturbations are no longer small or imperceptible, and do not fall under the umbrella of adversarial examples.
If we allow for changes in $\{\tau_{\mathrm{ag}}\}$, we find that after the final iteration, $\{\tau_{\mathrm{ag}}\}$ has gone below $\tau_c$, such that we are not capturing antagonistic effects anymore. Instead, we trivially find that we lose response when there is no more signal to respond to. We could have taken a lower limit for agonists, i.e. $\{\tau_{\mathrm{ag}}\} \ge \Theta$, but the value of $\Theta$ is arbitrary and not easily justifiable. For one, it should be larger than $\tau_c$ to still provide a net weight to the response class agonist ligands. In Appendix 2 Figure 1 we have computed the distribution at the boundary in the case where we allow agonists to change binding times until a threshold. We set $\Theta = 3.1$s to be the center of the first bin past $\tau_c$, sample $L_{\mathrm{self}} = 6000$ from $\{\tau_s\} \in |\mathcal{N}(0, 0.33)|$ and $L_{\mathrm{ag}} = 4000$ from $\{\tau_{\mathrm{ag}}\} \in 10 - \mathcal{N}(0, 0.1)$. Even at m = 0, we have to go through some iterations before reaching the boundary: we are sufficiently far away from the boundary due to the long binding agonists. The agonist ligands change binding times quickly because the gradient is much steeper for agonists with $\tau_b > \tau_c$ than for nonagonists. This is also clear from the antagonism potential in Fig. 3 D in the main text: above $\tau_c$ the gradient only gets steeper. For m ≥ 1, all agonists congregate in the first bin $\tau_b \sim \tau_c$. We observe an overall graying of the immune pictures at m = 1, which is undone at higher m when only a subset of nonagonist ligands changes binding time and antagonizes the response. This gives the typical bimodal distribution at larger (N, m), again due to the flat gradient at $\tau \sim 0$. It provides more evidence for the appearance of a critical point in a robust adversarial defence.
Appendix 2 Figure 1. Characterization of the decision boundary when agonists are not constant.

Behaviour for small binding times
Consider a mixture with $L_1$ ligands at $\tau_1 > \tau_c$ and $L_2$ ligands with small binding time $\tau_2 \to \tau = \epsilon \tau_1 \ll \tau_1$. To understand the behaviour of $T_{N,m}$ as a function of $\tau$ we expand $T_{N,m}$ in the small variable $\epsilon = \frac{\tau}{\tau_1}$ as