Quantitatively Visualizing Bipartite Datasets

As experiments continue to increase in size and scope, a fundamental challenge of subsequent analyses is to recast the wealth of information into an intuitive and readily interpretable form. Often, each measurement conveys only the relationship between a pair of entries, and it is difficult to integrate these local interactions across a dataset to form a cohesive global picture. The classic localization problem tackles this question, transforming local measurements into a global map that reveals the underlying structure of a system. Here, we examine the more challenging bipartite localization problem, where pairwise distances are available only for bipartite data comprising two classes of entries (such as antibody-virus interactions, drug-cell potency, or user-rating profiles). We modify previous algorithms to solve bipartite localization and examine how each method behaves in the presence of noise, outliers, and partially observed data. As a proof of concept, we apply these algorithms to antibody-virus neutralization measurements to create a basis set of antibody behaviors, formalize how potently inhibiting some viruses necessitates weakly inhibiting other viruses, and quantify how often combinations of antibodies exhibit degenerate behavior.


Introduction
Given a country's geographic map, it is straightforward to determine the distance between any pair of cities. Yet posing this question in reverse (called the classic localization problem) is far more challenging: given only the distances between pairs of cities, can we reconstruct the full geographic map (1)? Across all scientific disciplines, the interactions between vast numbers of entries are routinely measured, yet the deeper relationships underlying these entries only become apparent when recast into a global description of the system. For geographic maps, large tables of city-city distances are less interpretable than a 2D map positioning cities relative to one another.
To take another example from the field of human perception, the similarity between pairs of colors reveals that reds, greens, blues, and violets cluster together (Figure 1A, left). Yet by embedding these measurements into 2D space (without any additional information about the colors themselves), the colors naturally form a highly intuitive color wheel (Figure 1A, right). This representation greatly reduces the complexity of the system, enabling us to hypothesize how new colors would be perceived and predict trends in the data (e.g., that each color has a maximally-distant "complementary color" on the opposite side of the wheel).
When systems have such a simple underlying structure, we intuitively expect that a straightforward algorithm can dissect the pairwise distances and recover the global embedding. Indeed, for complete and noise-free data this can be achieved in two steps: first centering the distances to reveal a matrix of inner products, and then using the singular value decomposition to determine the coordinates (Appendix A.1) (2). For noisy or partially-missing data, numeric minimization (3,4) and semidefinite programming relaxations (5-8) have been developed to drive nonlinear dimensionality reduction (7), nuclear magnetic resonance spectroscopy (9,10), and sensor network localization (4-6, 8, 11).
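For complete and noise-free data, this two-step recovery can be sketched in a few lines (a minimal numpy illustration of classical MDS; the function name is ours, not from Appendix A.1):

```python
import numpy as np

def classical_mds(D, d=2):
    """Recover d-dimensional coordinates from a complete, noise-free
    n x n matrix of pairwise Euclidean distances (classical MDS)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D * D) @ J               # Gram matrix of the centered points
    vals, vecs = np.linalg.eigh(B)           # eigendecomposition (B is symmetric)
    idx = np.argsort(vals)[::-1][:d]         # keep the top-d eigenpairs
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

# Sanity check: the recovered embedding reproduces the input distances
# (the coordinates themselves are only determined up to a rigid transform).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(10, 2))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
X_hat = classical_mds(D, d=2)
D_hat = np.linalg.norm(X_hat[:, None, :] - X_hat[None, :, :], axis=-1)
```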
In this work, we consider a twist on this classical problem that we call bipartite localization, where a bipartite dataset consists of two classes of entries, and interactions can only be measured between (and not within) each class. Since previous methods are poorly suited to handle bipartite data (12,13), we modify existing methods and tailor them for bipartite localization. In particular, we discuss two variants of the popular multidimensional scaling (MDS) algorithm, metric MDS and bipartite MDS, as well as a semidefinite programming (SDP) approach (6). Each method has its own advantages: metric MDS is the simplest and most flexible numerical framework, bipartite MDS provides a closed-form solution up to an affine transform, and SDP uses a convex relaxation that is less prone to becoming trapped in local minima. For reproducibility, we provide a GitHub repository with example code for each algorithm.
As a proof of principle, we apply these methods to the pressing issue of antibody-virus interactions, where multiple antibodies are assessed against panels of virus mutants (Figure 1B). Unlike many previous efforts that either exclusively visualized the viruses or the antibodies (23,24) or required data to be normalized (25), we embed both types of entries into a shared space that directly corresponds to experimental measurements. The resulting map presents a natural context to probe features such as clustering and to explore the tradeoffs and inherent constraints of the system. Through these embeddings, we collapse the complexity of datasets into a readily interpretable and quantitative framework.

The Need for Embedding Algorithms
Before exploring the algorithms, we motivate the need for such embeddings by describing several potential applications. To ground this discussion, we suppose the bipartite classes represent antibodies and viruses (with distances describing antibody-virus interactions), although these applications generalize to any bipartite dataset.
First, an embedding combines datasets and predicts unmeasured interactions. For example, we cannot directly compare an antibody measured against viruses #1-6 with a second antibody measured against viruses #7-12 (top two rows in the Figure 1B dataset). Yet by embedding both antibodies, we predict their behavior against all viruses in the dataset. Hence, embeddings represent a form of matrix completion (28,29).
Second, an embedding defines the intra-class distances between any two viruses (or two antibodies), a quantity that by definition cannot be directly measured through antibody-virus interactions. This intra-class distance describes how differently any antibody can neutralize the two viruses (i.e., essentially quantifying their cross-reactivity). In the limit where two viruses lie on the same point, they will be neutralized identically by all antibodies; when the two viruses lie far apart, their neutralization can greatly differ.
Third, the inferred virus-virus distances are crucial when designing future experiments. Viruses that are close together offer redundant information, whereas sampling viruses that are spread out across the map can detect more distinct antibody phenotypes.
Fourth, an embedding defines a basis set of behaviors, which is essential for systems where no mechanistic models exist. For example, there is a dearth of models that enumerate the space of antibody behaviors (30-32), which hinders theoretical exploration into features such as the optimality or degeneracy of the antibody response (both of which we address later in this work).
Finally, embeddings provide a fundamentally different vantage to study a system, and this shift in perspective could help uncover its underlying rules. For example, the complex sequence-to-function relationship of viral proteins may be simpler to crack within a low-dimensional embedding. Similarly, the antibody response changes with each viral exposure, and the dynamics of how each antibody evolves may be more readily understood within the context of an embedding.

Algorithms
We next develop the algorithms to transform pairwise interactions into a global map of a system. In bipartite embedding, we seek to recover the bipartite set of points {x*_i}_{i=1}^m, {y*_j}_{j=1}^n ⊂ R^d given the noisy distance matrix D ∈ R^{m×n} of the form

D_ij = D*_ij + ε_ij,  D*_ij = ||x*_i − y*_j||,  (1)

where distance is only measured between the {x*_i} and {y*_j}. D*_ij represents the true distance, which is perturbed by independently and identically distributed random noise ε_ij. The goal is to use the noisy D_ij with (i, j) ∈ E, where E represents the subset of measured values, to find an embedding {x_i}_{i=1}^m, {y_j}_{j=1}^n that approximates the true embedding {x*_i}, {y*_j}. In the following sections, we describe three algorithms to tackle this problem.

Metric Multidimensional Scaling (Metric MDS)
The most straightforward numerical approach is to randomly initialize each x_i and y_j, and then apply numerical methods (e.g., gradient descent or differential evolution) to match their coordinates as closely as possible to the distance matrix. In this paper, we use the least-squares loss function

min_{{x_i},{y_j}} Σ_{(i,j)∈E} (||x_i − y_j|| − D_ij)²,  (2)

although we note that other loss functions can strongly affect the embedding (Figure S2). While this method is simple to implement, it is liable to get trapped in local minima and does not harness the underlying structure of the bipartite data.
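A minimal sketch of metric MDS under this loss (our own illustration, using scipy's L-BFGS-B with random restarts; the paper does not prescribe a particular optimizer):

```python
import numpy as np
from scipy.optimize import minimize

def metric_mds(D, mask, m, n, d=2, n_restarts=5, seed=0):
    """Least-squares metric MDS for an m x n bipartite distance matrix D.
    Only entries with mask[i, j] == True enter the loss; coordinates are
    randomly initialized, and the best of several restarts is kept."""
    rng = np.random.default_rng(seed)

    def loss(z):
        X = z[:m * d].reshape(m, d)
        Y = z[m * d:].reshape(n, d)
        dist = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
        return np.sum((dist[mask] - D[mask]) ** 2)

    best = None
    for _ in range(n_restarts):
        res = minimize(loss, rng.uniform(-1, 1, (m + n) * d), method="L-BFGS-B")
        if best is None or res.fun < best.fun:
            best = res
    return best.x[:m * d].reshape(m, d), best.x[m * d:].reshape(n, d)
```

Restarts partially mitigate the local-minima issue noted above, at the cost of extra compute.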

Bipartite Multidimensional Scaling (Bipartite MDS)
In stark contrast, bipartite MDS provides a closed-form solution (up to a rigid transform) for noise-free and complete data. Although variants of the classical monopartite problem have been developed to deal with large datasets and noisy measurements (33), to our knowledge this technique has not been extended to complete bipartite data.
The key insight underlying classic MDS is that the doubly-centered squared-distance matrix is intimately related to the inner products (Gram matrix) of the embedded points. More precisely, we define the centering matrix

J_k = I_k − (1/k) 1_k 1_k^T,

which subtracts the mean from any vector, where I_k is the k × k identity matrix and 1_k is the all-ones vector of size k (with J_k 1_k = 0). Consider the complete noise-free bipartite squared-distance matrix

(D* ∘ D*)_ij = ||x*_i||² + ||y*_j||² − 2 x*_i · y*_j,

where ∘ denotes entrywise multiplication. Double-centering reveals the inner products of the embedding,

−(1/2) J_m (D* ∘ D*) J_n = (J_m X*)(J_n Y*)^T = X* (J_n Y*)^T,

where in the second equality we assume without loss of generality that the points in X* are centered at the origin (J_m X* = X*).

Algorithm 1 Bipartite Multidimensional Scaling (Bipartite MDS)
Steps:
1. Define a complete distance matrix D̄ equal to D at measured values, with missing values filled in using the mean of all observed entries in the same row and column.
2. Compute the double-centered matrix and its rank-d SVD, UΣV^T = −(1/2) J_m (D̄ ∘ D̄) J_n.
3. Determine A_V and t_V by minimizing the difference between D_ij and ||x_i − y_j|| using non-convex numerical minimization or SDP (see Appendix A.4).

The rank-d singular value decomposition (SVD) of the double-centered squared-distance matrix, UΣV^T = −(1/2) J_m (D* ∘ D*) J_n, determines the embedding of X* and Y* up to linear transforms parameterized by some matrix A_V ∈ R^{d×d} and translation t_V ∈ R^d. These (A_V, t_V) are determined by utilizing the distance information ||x*_i − y*_j|| = D*_ij and minimizing (2) using semidefinite programming or numeric minimization (Appendix A.4).
In summary, this algorithm reduces the embedding problem with (m + n)d unknown variables into the simpler problem of determining the d² + d unknown variables in A_V and t_V, regardless of the size of D! This same approach can be used for a noisy distance matrix D (Algorithm 1). A caveat of this method is that it cannot readily handle missing values. In the numerical experiments below, we first fill in any missing values using the mean of all observed entries in the same row and column of the distance matrix; this leads to poor behavior when a substantial fraction of values are missing, which can be ameliorated with metric MDS post-processing (Figure S3).
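The closed-form core of this method (the double-centering and rank-d SVD of Algorithm 1) can be sketched as follows; the affine calibration of A_V and t_V (Appendix A.4) is omitted here, so the returned factors match the true embedding only up to a linear transform:

```python
import numpy as np

def bipartite_center_svd(D, d=2):
    """Double-center the squared bipartite distances and return the rank-d
    SVD factors (the embedding up to a linear transform; the affine
    calibration step is not shown)."""
    m, n = D.shape
    Jm = np.eye(m) - np.ones((m, m)) / m
    Jn = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * Jm @ (D * D) @ Jn          # equals (centered X) @ (centered Y).T
    U, S, Vt = np.linalg.svd(B)
    return U[:, :d] * S[:d], Vt[:d].T

# Verify the double-centering identity on synthetic noise-free data.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (12, 2))
Y = rng.uniform(-1, 1, (15, 2))
D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
Fx, Fy = bipartite_center_svd(D, d=2)
Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
```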

Semidefinite Programming (SDP)
Lastly, we investigate an intermediate algorithm that harnesses the bipartite nature of the data to perform a more robust numerical search. More precisely, by forming a positive-semidefinite matrix, we can adapt the sensor network localization SDP algorithm (6) and utilize efficient conic solvers for bipartite embedding (34,35). We define the combined coordinates

Z = [X; Y] ∈ R^{(m+n)×d}

and further define the inner product matrix G ∈ R^{(m+n)×(m+n)} as

G = Z Z^T,

so that the squared distance between x_i and y_j can be entirely written in terms of the entries of G, namely,

||x_i − y_j||² = G_{i,i} + G_{m+j,m+j} − 2 G_{i,m+j}.

Note that we can exactly recast the optimization over X and Y in terms of an optimization over a positive semidefinite matrix G of rank d. To this end, we introduce an extra error matrix E ∈ R^{m×n} and minimize over the sum of errors:

minimize Σ_{(i,j)∈E} |E_ij|
subject to G_{i,i} + G_{m+j,m+j} − 2 G_{i,m+j} = D_ij² + E_ij for (i, j) ∈ E,
  G ⪰ 0,
  Σ_{i,i'=1}^m G_{i,i'} = 0.  (10)

The final constraint ensures that the X coordinates are centered at the origin, removing their translational degree of freedom. Note that to achieve this convex conic program, we removed the non-convex rank-d constraint on G, which must now be added back. Thus, we apply a rank-d SVD to G, G = UΣU^T.

Algorithm 2 Semidefinite Programming (SDP)
Steps:
1. Solve for G ∈ R^{(m+n)×(m+n)} from Equation 10.
2. Compute the top-d SVD, G = UΣU^T. The embedded coordinates {x_i} are given by the first m rows of UΣ^{1/2}, while {y_j} are given by the final n rows.
The resulting m + n coordinates are given by the rows of UΣ^{1/2}. As with metric MDS, missing values are seamlessly handled in SDP, since the objective in Equation 10 is restricted to the measured distances. As shown in the following sections, SDP often recovers a better embedding than metric or bipartite MDS, especially when there are many missing values. Note that we specifically chose a different loss function for metric MDS (Equation 2, optimized for systematic noise) and SDP (Σ_{(i,j)∈E} | ||x_i − y_j||² − D_ij² |, optimized to handle outliers) in order to explore the diversity of embedding behaviors. When analyzing datasets, it is worth trying multiple loss functions to determine which one best characterizes the system (Figure S2C). For completeness, we note that bipartite MDS is a closed-form method that does not explicitly use any loss function.
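Solving the conic program itself requires an SDP solver, but the two identities that make the method work, expressing squared distances through G and extracting coordinates from a rank-d factorization (Algorithm 2, step 2), can be checked directly; here a ground-truth G stands in for the solver's output (our own sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, d = 10, 12, 2
X = rng.uniform(-1, 1, (m, d))
Y = rng.uniform(-1, 1, (n, d))
Z = np.vstack([X, Y])
G = Z @ Z.T                                   # inner-product (Gram) matrix

# Squared distance written purely in terms of entries of G.
i, j = 3, 5
lhs = np.sum((X[i] - Y[j]) ** 2)
rhs = G[i, i] + G[m + j, m + j] - 2 * G[i, m + j]
assert np.isclose(lhs, rhs)

# Rank-d factorization of G recovers all coordinates up to an orthogonal
# transform, so every bipartite distance is reproduced exactly.
vals, vecs = np.linalg.eigh(G)
idx = np.argsort(vals)[::-1][:d]
Z_hat = vecs[:, idx] * np.sqrt(vals[idx])
D_true = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
D_hat = np.linalg.norm(Z_hat[:m][:, None, :] - Z_hat[m:][None, :, :], axis=-1)
```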

Numerical Experiments
We first assess the three embedding algorithms (metric MDS, bipartite MDS, and SDP) using simulated data with m = 20 entries x_i and n = 20 entries y_j (each chosen uniformly on [−1, 1] × [−1, 1]). These points generate the true distance matrix, which we then perturb and use as the input matrix D. The accuracy of the resulting embedding is calculated as the RMSE between the estimated and true coordinates (once aligned via a rigid transform),

RMSE = sqrt( (Σ_i ||x_i − x*_i||² + Σ_j ||y_j − y*_j||²) / (m + n) ).
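The rigid alignment prior to computing the RMSE can be performed with orthogonal Procrustes analysis (our own sketch; translation plus rotation/reflection, all of which leave distances unchanged):

```python
import numpy as np

def aligned_rmse(P, Q):
    """RMSE between point sets P (estimate) and Q (truth) after the best
    rigid alignment (translation + rotation/reflection) of P onto Q."""
    Pc, Qc = P - P.mean(axis=0), Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)      # orthogonal Procrustes solution
    resid = Pc @ (U @ Vt) - Qc
    return np.sqrt(np.mean(np.sum(resid ** 2, axis=1)))

# A rotated and translated copy of a point set aligns back with ~zero error.
rng = np.random.default_rng(3)
Q = rng.uniform(-1, 1, (40, 2))
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
P = Q @ R + np.array([2.0, -1.0])
```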

Systematic Noise and Missing Values
To generate the input matrix D, we perturb each entry of the true distance matrix by adding a random value uniformly chosen from [−σ, σ] (x-axis) and withhold a fraction fMissing of randomly selected entries (y-axis). Of the three algorithms, SDP exhibits the most robust behavior in the presence of missing values (Figure 2), and in the noise-free case along the y-axis it undergoes a phase transition from near-perfect recovery when fMissing ≤ 0.6 to noisy recovery (Figure S4A). In contrast, the error of bipartite MDS increases nearly proportionally to fMissing, since each missing value must be initialized as the row/column mean, which effectively perturbs the distance matrix. Metric MDS also finds poorer embeddings with larger fMissing, as it occasionally gets trapped in local minima (even in the low-noise limit). When D is fully observed along the x-axis, the error increases approximately linearly with noise for all three algorithms (RMSE ≈ σ/2, Figure S4B), although metric MDS displays somewhat erratic behavior as it may get stuck in local minima. The bottom panels in Figure 2 show example embeddings in the intermediate regimes when σ = 0.1 and fMissing = 0.6 (purple) or when σ = 0.6 and fMissing = 0.1 (brown), with gray lines connecting the true coordinates to their numerical approximations.

In terms of overall performance, the region of near-perfect recovery is largest for SDP, followed by bipartite MDS and metric MDS (Figure 2). One way to improve these algorithms is to combine them, for example, by using SDP or bipartite MDS to initialize the coordinates in metric MDS. These combined algorithms substantially improve embedding accuracy, allowing bipartite MDS to handle missing values and extending the capability of SDP to embed noisy measurements (Figure S3).

Handling Large Outliers and Bounded Measurements
In addition to noisy measurements, datasets may contain outliers that distort an embedding. Bipartite MDS is highly susceptible to large outliers, which can corrupt the largest singular vectors of the squared-distance matrix (Figure 3A). In contrast, SDP minimizes the sum of absolute (un-squared) deviations (36), and such a loss is far more robust against gross corruptions. Metric MDS exhibits intermediate behavior, although we note that the choice of loss function heavily influences this behavior (Figure S2).
Lastly, we explored each algorithm's tolerance to distances given as upper or lower bounds, which can arise when an experiment measures a value outside of its dynamic range. Figure 3B shows the embedding from the same distance matrix, now modified to represent 30% of measurements as upper or lower bounds. In this complete and noise-free case, both metric MDS and SDP can directly utilize these bounds to generate near-perfect reconstructions. In contrast, bipartite MDS cannot directly incorporate bounded data, and hence we replace each bounded measurement by the bound itself, which leads to worse reconstruction.

Analysis of Antibody-Virus Measurements
We next applied these embedding algorithms to an influenza dataset where the neutralization by 27 stem antibodies was measured against 49 viruses that circulated between 1933 and 2019 (Figure S1).
The following section transforms these experimental measurements into map distances to embed these antibody-virus interactions, while all subsequent sections utilize this embedding to probe the antibody response.

Transforming Antibody-Virus Measurements into Distances
For each antibody-virus pair, the inhibitory concentration required to neutralize 50% of virus particles (IC50, in Molar units) was measured, with lower values signifying a more potent antibody (27). IC50s ranged from 8.6 × 10^-11 M (very strong neutralization) to >1.6 × 10^-7 M (weak neutralization outside the range of the assay).
To briefly describe the biological context for this dataset, each of the 27 antibodies targets the stem region of hemagglutinin, one of the key surface proteins on the influenza virus. This stem domain is highly conserved, and antibodies targeting it can neutralize very diverse viruses; for example, some antibodies measurably neutralize both the H1N1 and H3N2 influenza subtypes, which is rarely seen in antibodies targeting the head domain of this same viral protein (37).
Yet even these broadly neutralizing antibodies have limits. Antibodies that potently neutralize H1N1 viruses tend to weakly neutralize H3N2 strains (and vice versa), while antibodies that neutralize all viruses tend to have intermediate effectiveness. These trends hint that there is an underlying tradeoff between antibody potency (how much a virus is neutralized) and breadth (how many diverse viruses can be neutralized). Such patterns are difficult to directly discern from a table of pairwise interactions, yet they naturally emerge through an embedding.
To that end, we first converted these antibody-virus neutralization measurements into distances. Antibodies typically have IC50s > 10^-10 M (since selection does not act below this point (38,39)), and hence we define antibody-virus distance as D_ij = log10(IC50 / 10^-10 M) (Figure 4A). We then applied all three embedding algorithms to create a global map of the system. Since both the dimensionality and the ground truth coordinates are not known, we assessed each algorithm through cross validation (training on 90% of data, testing on the remaining 10%).
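The conversion from neutralization measurements to map distances is a one-liner (our own sketch of the stated formula):

```python
import numpy as np

def ic50_to_distance(ic50_molar):
    """Map an IC50 (in Molar) to the distance D = log10(IC50 / 1e-10 M)."""
    return np.log10(np.asarray(ic50_molar) / 1e-10)

# A maximally potent antibody (IC50 = 1e-10 M) sits at distance 0 from a
# virus, while a 1000-fold weaker interaction sits at distance 3.
distances = ic50_to_distance([1e-10, 1e-7])
```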
Metric MDS performed the best in all dimensions and exhibited a sharp "elbow" at d = 2, suggesting that a 2D landscape captures the underlying structure of the system (Figure 4C). We note that the 2D cross-validation RMSE was 0.44 (Figure 4D), so that withheld neutralization measurements are predicted to within 10^0.44 ≈ 3-fold, which is comparable to the noise of the neutralization assay.

Designing Optimal Antibody Cocktails
The resulting map provides a powerful way to quantify trends in the data (Figure 4B). For example, the H1N1 viruses [green] and H3N2 viruses [blue] cluster together, as expected based on their genetic similarity. Interestingly, the centers of these clusters are ≈2.5 map units apart, demonstrating that while antibodies can be highly potent against H1N1 or H3N2 viruses, no antibody in the panel could strongly neutralize both subtypes.
Similar to the color wheel example in Figure 1A, the antibody-virus embedding not only represents the entities in this specific dataset, but also describes other potential antibodies and viruses. For such entities, the embedding serves as a discovery space to quantify and constrain their behavior.
For example, within this framework we can design a mixture of n antibodies that neutralizes the 5 viruses at the top of the H1N1 cluster as well as the 5 viruses at the top of the H3N2 cluster as potently as possible (Figure S5). This question lies at the heart of ongoing efforts to find new broadly-neutralizing antibodies, yet few methods exist to predict or even constrain antibody behavior.
To that end, we use each point on the map to describe a potential antibody whose neutralization against each mapped virus is determined by its map distance. This reduces the complex biological problem of enumerating antibody behavior to a straightforward geometry problem.
The best n = 1 antibody mixture against these 10 viruses is represented by the center of the smallest circle that covers every virus (Figure S5, distance ≤ 1.4 [IC50 ≤ 10^-8.6 M] for each virus). For a mixture with n = 2 antibodies, the potency dramatically improves by using one H1N1-specific antibody and one H3N2-specific antibody (distance ≤ 0.3 [IC50 ≤ 10^-9.7 M] for each virus). This problem can be readily extended to mixtures of arbitrarily many antibodies covering any set of mapped viruses. Given the growing number of efforts to find broadly neutralizing antibodies (40-43), it is essential to have some framework to estimate the limits of antibody behavior. Such estimations inform us when the antibodies already discovered are near the theoretical best behavior (and hence further searching is less likely to lead to significant improvement) or when there could exist antibodies that perform orders of magnitude better than what we have currently seen (44).
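Under the assumption that every map location is a viable antibody, the n = 1 design reduces to a smallest-enclosing-circle problem. One possible sketch solves it by direct minimax optimization (the paper does not specify a solver; the function name and approach are ours):

```python
import numpy as np
from scipy.optimize import minimize

def best_single_antibody(virus_coords):
    """Position one antibody (a point on the map) to minimize its largest
    distance to a set of virus coordinates, i.e., find the center of the
    smallest circle covering every virus."""
    V = np.asarray(virus_coords, dtype=float)
    worst = lambda c: np.max(np.linalg.norm(V - c, axis=1))
    res = minimize(worst, V.mean(axis=0), method="Nelder-Mead",
                   options={"xatol": 1e-8, "fatol": 1e-8})
    return res.x, res.fun      # antibody position, worst-case distance

# The smallest circle covering (0,0), (2,0), and (1,1) is centered at (1,0)
# with radius 1, so the best single antibody sits at (1,0).
center, radius = best_single_antibody([(0, 0), (2, 0), (1, 1)])
```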
Nevertheless, quantifying the degree of antibody degeneracy becomes tractable through an embedding. Such analyses necessarily make the strong assumption that every point on the map represents a viable antibody. Moreover, there may be other antibody phenotypes (e.g., from highly-specific hemagglutinin head-targeting antibodies) that are not represented by any point on the map; in essence, the embedding serves to locally extrapolate antibody behavior based on the specific interactions provided as input (Figure 4A). Yet with these caveats, we can explore how often a mixture made within this space of antibodies can be mimicked by a single antibody.
We describe an antibody mixture by n points in Figure 4B, with the i-th antibody neutralizing the j-th virus with an IC50^ij = 10^(−10+D_ij) dictated by the map distance D_ij between the antibody and virus. Since all antibodies in our panel bind to the same region of the hemagglutinin stem (45-47), we treat their binding as competitive, so only one antibody can bind to each hemagglutinin monomer at a time. Thus, a mixture's neutralization against virus j is given by

1 / IC50^{mix,j} = Σ_i f_i / IC50^ij,  (11)

where f_i represents the fraction of antibody i in the mixture (with Σ_i f_i = 1). A diluted antibody with small f_i will effectively have a weaker (larger) IC50, which in the embedding translates to an extra "distance handicap" of −log10(f_i) added to its distance from any virus. We note that this binding model has been verified on antibody mixtures from this specific panel (44) and on other datasets (48,49). For simplicity, we restrict ourselves to equimolar n-antibody mixtures (f_i = 1/n).
Given a specific mixture (n random points on the map, sampled near the H1N1 and H3N2 clusters), we quantify the closest approximating single antibody (another point on the map) by scanning through every possible location and minimizing the average fold-difference between the mixture's and antibody's neutralization profiles across all viruses. Figure 5A shows a mixture of 2 antibodies (gray), one of which is potent against the blue H3N2 viruses on the left of the map and the other potent against the green H1N1 viruses, that behaves nearly identically to a single antibody (red) in the middle of the map. While a few viruses are neutralized differently by the mixture and antibody (vertical black lines, right panel of Figure 5A), on average the antibody's IC50s are within 1.6-fold of the mixture's values against these 50 diverse viruses. This discrepancy is comparable to the ≈2-fold error of the assay, and hence given either neutralization profile, we could not determine whether it arose from an individual antibody or a mixture.
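The competitive-binding model of Equation 11 can be sketched as follows (our own implementation of the stated mixture rule):

```python
import numpy as np

def mixture_ic50(ic50s, fractions):
    """IC50 of a competitive antibody mixture against a single virus:
    1 / IC50_mix = sum_i f_i / IC50_i (Equation 11)."""
    ic50s = np.asarray(ic50s, dtype=float)
    fractions = np.asarray(fractions, dtype=float)
    return 1.0 / np.sum(fractions / ic50s)

# An equimolar mixture of two identical antibodies behaves exactly like
# either antibody alone, while diluting a single antibody to fraction f
# weakens its IC50 by 1/f (the "distance handicap" on the map).
same = mixture_ic50([1e-9, 1e-9], [0.5, 0.5])
diluted = mixture_ic50([1e-9], [0.25])
```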
Higher-order mixtures unlock more unique behaviors that cannot be replicated by an individual antibody. For example, not only does the 4-antibody mixture in Figure 5B show a 3.6-fold difference from the nearest approximating antibody, but the mixture's measurements are systematically lower across nearly all viruses. Thus, neutralization profiles exhibiting such strong breadth are indicative of multiple antibodies.
To systematically explore degeneracy, we sampled 100 antibody mixtures for each n (with 2 ≤ n ≤ 10) and found the closest approximating single antibody. The resulting distributions of the mean fold-difference are shown in Figure 5C. While 2-antibody mixtures tend to resemble individual antibodies, higher-order mixtures often exhibit distinctive profiles with a fold-difference > 2 to the closest approximating antibody. By the time n ≥ 5 antibodies are combined, it becomes exceedingly rare for the mixture to match any single antibody.

Discussion
Embedding algorithms fill a "hole" in our understanding by transforming local pairwise interactions into a global map. Such algorithms have been used to identify when a new viral variant arises, quantify drug-protein interactions, and distinguish between cell types (25,50,51). Yet we propose that such algorithms also provide the groundwork for new theoretical studies that only become possible when we reveal the underlying structure of a system.
In the context of antibody-virus interactions, an embedding provides a rigorous approach to extrapolate available measurements. Each point describes a potential antibody, and the entire map defines a basis set of antibody behaviors. By coupling these data-driven results with a biophysical model of how antibodies collectively act, we can model higher-order mixtures and pave the way to study the complex array of antibodies within each person.
More work is needed to understand the limits of these embeddings and quantify their predictive power. At the same time, we are just beginning to scratch the surface of the aspects of the antibody response that can be probed with these embeddings, from designing antibody cocktails to determining how the antibody response evolves on the map with each viral exposure.
As datasets continue to grow in size and complexity, it becomes increasingly important to quantitatively visualize interactions between entities. Future datasets may require multi-localization, where higher-order interactions (e.g., between a ligand and multimeric receptor (22); antibodies, antigens, and cell receptors (52); or single-cell multi-omics datasets (53)) are embedded in a low-dimensional space.

Figure S2. … 3A). (B) Large systematic noise is handled better by mean squared error, since this represents the maximum likelihood estimator for approximately-Gaussian error (see σ = 1, fMissing = 0 from Figure 2). (C) Cross-validation for the influenza data in Figure 4 is slightly lower for metric MDS with mean squared error for embeddings with dimension ≥ 2.

Figure 1 .
Figure 1. Embedding monopartite or bipartite data in Euclidean space. (A) The perceived similarity between colors recovers the canonical color wheel. Adapted from Table 4.1 of (26), with distance = 1 − (dissimilarities in table). (B) Embedding antibody neutralization against strains of the influenza virus. In this case, only antibody-virus distance can be measured experimentally, and some distances are missing (tan). Viruses are colored from lightest-to-darkest hues (oldest to more recent strains; full data in Figure S1). Adapted from Figure 5 of (27), with distance = log10(50% Neutralization / 10^-10 Molar).

Figure 2 .
Figure 2. Performance on a simulated dataset. Top: Phase diagram of embedding error as a function of the element-wise noise σ of the distance matrix and the fraction fMissing of missing entries for metric multidimensional scaling (metric MDS), bipartite multidimensional scaling, and semidefinite programming (SDP). Error is computed as the average Euclidean distance between the numerical and true coordinates (aligned using a rigid transform). Diagrams show the average of 10 runs, and the metric MDS results were smoothed because its embedding accuracy was erratic. Bottom: Examples of the embedding when σ = 0.1 and fMissing = 0.6 (purple box) as well as σ = 0.6 and fMissing = 0.1 (brown box) for each method. Edges connect the numerical coordinates to the true embedding.

Figure 3 .
Figure 3. Embedding with outliers and bounded data. (A) Embedding a noise-free distance matrix D with three highly-corrupted measurements (highlighted in red). (B) Embedding a distance matrix where 30% of entries are replaced with upper or lower bounds (blue and purple).

Figure 4 .
Figure 4. Mapping influenza antibody-virus interactions. (A) Experimentally measured distance matrix between 27 antibodies and 49 influenza viruses (27). (B) The metric MDS embedding in 2D. (C) 10-fold cross-validation RMSE (calculated using the withheld distances). (D) Example of 2D cross validation for each method, demonstrating that metric MDS performs the best.

Figure 5 .
Figure 5. Degeneracy of antibody mixtures. Examples of (A) a 2-antibody mixture that behaves like a single antibody and (B) a 4-antibody mixture that exhibits distinct behavior from any individual antibody. Left: the antibodies in the mixture (gray) and the best approximating antibody (red). Right: the neutralization IC50s across all viruses. The fold-difference between the mixture and antibody is shown by the vertical black lines for each virus, with the mean fold-difference given in the bottom-right. (C) For each mixture containing n antibodies (x-axis), we sample 100 equimolar mixtures and quantify their average fold-difference to the nearest approximating antibody.

Figure S1. Figure S2.
Figure S1. Annotated influenza antibody-virus data from Creanga et al. (27). (A) Neutralization measurements of 49 influenza viruses against 27 antibodies targeting the stem of influenza hemagglutinin. The inhibitory concentration of antibody needed to neutralize 50% of viruses (IC50, grayscale) is shown. Some antibody-virus interactions were not measured (tan), and some antibodies exhibited weak neutralization (IC50 > 1.6 × 10^-7 M, light-blue) outside the dynamic range of the assay. (B) The same 2D metric MDS embedding (as in Figure 4B) with the antibodies and viruses labeled.

Figure S4 .
Figure S3. Post-processing an embedding with metric MDS. As in Figure 2, data is simulated with element-wise noise σ and a fraction fMissing of missing entries. The result of each embedding is used to initialize one additional metric MDS run, which greatly improves accuracy. Error is computed as the average Euclidean distance between the numerical and actual coordinates (aligned using a rigid transform). Example plots at the bottom show an embedding when σ = 0.1 and fMissing = 0.6 (purple box) as well as σ = 0.6 and fMissing = 0.1 (brown box) for each method.