Singular vectors of sums of rectangular random matrices and optimal estimation of high-rank signals: the extensive spike model


Estimating structure in high-dimensional data from noisy observations constitutes a fundamental problem across many disciplines, especially in the age of big data. A common scenario is that such data are presented as a large matrix. Such matrices could contain, for example, the observed time series of many recorded neurons in neuroscience, the expression levels of many genes across many conditions in genomics, or the time series of many stock prices in finance. Given such data matrices, one often wishes to: (1) understand the structure of the data via its singular value decomposition; (2) denoise the data in order to find clean signals hidden in the data; and (3) estimate the covariance structure of these clean hidden signals. These hidden signals could correspond, for example, to temporally correlated cell assemblies in neuroscience, gene modules in genomics, or sectors of correlated stocks in finance.
In this work, we develop new random matrix theory (RMT) tools in order to quantitatively study the basic question of how the singular value decomposition (SVD) of an arbitrary high-dimensional hidden signal matrix is deformed under additive observation noise. Based on this understanding of the relation between the data and signal SVDs, we go on to derive both optimal denoisers of the data to recover the hidden signal, as well as optimal estimators for the signal covariance.
An influential line of prior related research has studied spiked matrix models, focusing on an asymptotic limit in which the size of the hidden signal matrix tends to infinity but its rank remains finite [20][21][22][23][24][25][26][27]. These works consider the addition of a random noise matrix to the signal to generate a data matrix, and they study how the singular values and singular vectors of the data are related to those of the signal. The finite number of signal eigenvalues or singular values constitutes a set of "spikes" in the signal spectrum, hence the name spiked matrix model.
A key observation in these models is that the addition of noise to the signal yields a data matrix that: (1) has inflated singular values relative to the signal, and (2) has singular vectors that are rotated relative to the signal. For the finite-rank rectangular spiked matrix model, both the degree of singular-value inflation and the angle of singular-vector rotation can be explicitly computed [22,23]. Notably, in this finite-rank regime, the multiple spikes do not interact as they are deformed from signal to data. This means that to predict the mapping from a given signal singular value to the corresponding data singular value, as well as the angle between a data and signal singular vector, one only needs to know the noise distribution and the singular value of the signal spike in question; one does not need to know all the other signal singular values. This underlying simplicity in the relation between signal and data spectral structure implies that one can optimally denoise data, and optimally estimate signal covariance, by applying shrinkage functions that act independently, albeit non-linearly, on each singular value or eigenvalue of the data [23,28,29]. The idea is that these shrinkage functions, which shrink data singular values independently, partially reverse the independent singular-value inflation and compensate for the independent singular-vector rotation, both due to additive noise.
In this work, however, we demonstrate that the assumptions and consequences of the finite-rank model may constitute a significant limitation for the practical application of this theory and its associated estimation techniques. For example, below we will see that the spectral structure of random matrices of large size (e.g., 1000 × 500), and of even moderate rank (e.g., 10), cannot be accurately modeled by the finite-rank spiked matrix model. This lack of numerical accuracy of the finite-rank theory for large but finite-size matrices of moderate rank could have a significant impact on the three problems of spectral understanding, data denoising, and signal covariance estimation across the empirical sciences, where the effective rank of signals is expected to vary significantly, and sometimes even be quite high. Therefore, it is imperative to develop a new theory that more accurately describes data containing higher-rank signals. We develop that theory by generalizing the finite-rank theory to an extensive-rank theory in which the rank of the signal matrix is proportional to the size of the signal and data matrices, working in an asymptotic limit where both the size and rank approach infinity.
We note that it is not immediately obvious how to extend existing finite-rank results to the extensive regime. The finite-rank theory [20][21][22] makes use of algebraic formulas for matrices with low-rank perturbations that do not directly generalize, and so one must resort to more elaborate tools from RMT and free probability. Along these lines, powerful theoretical methods have been developed in recent years for studying the eigendecomposition of sums of square Hermitian matrices [30], and for deriving techniques to optimally estimate arbitrary square-symmetric matrices from noisy observations [31][32][33][34][35][36].
However, the situation for rectangular matrices, relevant to data from many fields including neuroscience, genomics, and finance, lags behind that of square matrices. While the singular value spectrum of sums of rectangular matrices has been calculated [37][38][39][40], and a few works have studied optimal denoising of rectangular matrices under a known (usually Gaussian) prior [41][42][43], there are currently no methods for determining the deformation of the singular vectors of a rectangular signal matrix due to an additive noise matrix.
The outline of our paper is as follows. In Section II we motivate our work with an illustrative numerical study of the spiked matrix model, showing that the finite-rank theory fails to accurately predict the outlier singular values and singular-vector deformations in data matrices containing even moderate-rank signals. In Section III we introduce tools from RMT that we will need to derive our results, including Hermitianization, block matrix resolvents, block Stieltjes transforms and their inversion formulae, and block R-transforms. In Section IV we study how the singular values and singular vectors of an arbitrary rectangular signal matrix are deformed under the addition of a noise matrix to generate a data matrix. To do so, we derive a subordination relation that relates the resolvent of the Hermitianization of a data matrix to that of its hidden signal matrix in Section IV A. We next employ this subordination relation to derive expressions for the overlap between data singular vectors and signal singular vectors in Section IV B.
We then apply these results to study the extensive spike model, in which the rank of the signal spike is assumed to grow linearly with the number of variables (and observations), in Section IV C. There we map out the phase diagram of the SVD as a joint function of signal strength and rank ratio. Intriguingly, we find that certain transitions in the singular value spectrum of the data do not coincide with the detectability of the signal, as they do in the finite-rank model. Finally, in Section V we exploit the expressions for singular-vector overlaps in order to derive optimal rotationally invariant estimators for both data denoising (Section V A) and signal-covariance estimation (Section V B). We find that, unlike in the finite-rank model, in the extensive-rank model signal singular values interact nontrivially to generate data singular values. Therefore, we obtain more sophisticated optimal data denoisers and signal-covariance estimators that take into account these nontrivial extensive-rank interactions, and which furthermore significantly outperform their simpler, non-interacting, finite-rank counterparts.
We note that during the preparation of this manuscript, a set of partially overlapping results appeared on a preprint server [44]. In our discussion section, we describe the relation and additional contributions of our work relative to that of [44].

II. A MOTIVATION: INADEQUACIES OF THE FINITE-RANK SPIKED MATRIX MODEL
Let Y be an N1 × N2 signal matrix. We can think of each of the N1 rows of Y as a variable, and each of the N2 columns as a distinct experimental condition or time point, with Yij representing the clean, uncorrupted value of variable i under condition j. Now consider a noisy data matrix R, given by R = Y + X, where X is a random N1 × N2 additive noise matrix. X is assumed to have a well-defined limiting singular value spectrum in the limit of large N1 with fixed aspect ratio c = N1/N2. Furthermore, we assume the probability distribution PX(X) over X is rotationally invariant, meaning PX(O1 X O2^T) = PX(X), where O1 and O2 are arbitrary orthogonal matrices of size N1 × N1 and N2 × N2, respectively. These assumptions guarantee the asymptotic freeness of X and Y. For a general definition of freeness, see [40].
We are interested in understanding the relationship between the singular value decomposition (SVD) of the data matrix R and the SVD of the clean signal matrix Y. In general we will write the SVD of the data as R = Û1 Ŝ Û2^T, where each Ûa, for a = 1, 2, is an Na × Na matrix with orthonormal columns ûak for k = 1...Na, and Ŝ is a diagonal N1 × N2 matrix with ŝk along the diagonal.
As a motivating example, we will study a version of the spiked matrix model [20][21][22] in which the signal matrix Y is given by Y = s U1 U2^T, where each Ua, for a = 1, 2, is an Na × K matrix with orthonormal columns uak for k = 1...K, and s is the signal strength. This signal model can be thought of as a rank-K spike of strength s, in that its singular value spectrum has K singular values all equal to s.
In the finite-rank setting, where K remains finite as N1, N2 → ∞, there is a signal-detectability phase transition [20,22] in the singular value structure of the data matrix R. For s < scrit, where scrit is a critical signal strength that depends on the singular value spectrum of the noise matrix X, the entire signal in Y is swamped by the additive noise X and cannot be seen in the data R. More precisely, in the large-size limit, when s < scrit the singular value spectrum of the data R is identical to the singular value spectrum of the noise X. Furthermore, no left (right) singular vector of the data matrix R has an O(1) overlap with the K-dimensional signal subspace corresponding to the column space of U1 (U2). However, for s > scrit the singular value spectrum of the data R is not only composed of a noise bulk, identical to the spectrum of X as before, but also acquires K outlier singular values all equal to ŝ. The data spike at ŝ occurs at a slightly larger value than the signal spike at s. This reflects singular-value inflation in the data R relative to the signal Y, due to the addition of noise X. Furthermore, each singular vector of the data R corresponding to an outlier singular value acquires a nontrivial O(1) overlap with the K-dimensional signal subspace of Y, even in the asymptotic limit N1, N2 → ∞.
The location of the outlier data singular values and their corresponding singular-vector overlaps with the signal subspace have been calculated for finite K and general rotationally invariant noise matrices X [22]. In the special case where the elements of X are i.i.d. Gaussian, explicit formulas can be derived (see Appendix A for a review). This signal-detectability phase transition in the finite-rank spiked model is depicted in Fig. 1 for an i.i.d. Gaussian noise matrix X.
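The finite-rank phase transition for Gaussian noise can be checked numerically in a few lines. The sketch below is our own illustration, not code from the paper; it uses the standard Gaussian-noise formulas for the outlier location and left singular-vector overlap (reviewed in Appendix A), and the specific sizes and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N1, N2 = 700, 1000                # aspect ratio c = N1/N2 = 0.7, as in Fig. 1
c = N1 / N2
s_crit = c ** 0.25                # critical signal strength for Gaussian noise

def spiked_model(s):
    """Rank-1 spiked model R = s*u1*u2^T + X with X_ij ~ N(0, 1/N2).
    Returns the top data singular value and its squared overlap with u1."""
    u1 = rng.standard_normal(N1); u1 /= np.linalg.norm(u1)
    u2 = rng.standard_normal(N2); u2 /= np.linalg.norm(u2)
    X = rng.standard_normal((N1, N2)) / np.sqrt(N2)
    R = s * np.outer(u1, u2) + X
    U, sv, _ = np.linalg.svd(R, full_matrices=False)
    return sv[0], float((U[:, 0] @ u1) ** 2)

# Above threshold: outlier at s_hat = sqrt((1+s^2)(s^2+c))/s, and squared
# left overlap (s^4 - c)/(s^4 + c s^2), per the finite-rank Gaussian formulas.
s = 1.5
s_hat, ov = spiked_model(s)
s_hat_th = np.sqrt((1 + s**2) * (s**2 + c)) / s
ov_th = (s**4 - c) / (s**4 + c * s**2)

# Below threshold (0.5 < s_crit ~ 0.91): the top data singular value sticks
# to the noise bulk edge 1 + sqrt(c), and the overlap vanishes with N.
s_hat_sub, ov_sub = spiked_model(0.5)
```

For these sizes the simulated outlier and overlap typically agree with the finite-rank formulas to within a few percent at s = 1.5, while at s = 0.5 no outlier emerges; Section II shows how this agreement degrades once the rank is moderate rather than 1.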
Notably, according to the finite-rank theory, the K spikes do not interact. More precisely, above the critical signal strength, in the large-size limit, the K identical singular values of Y are all predicted to map to K identical outlier singular values of the data matrix R. Furthermore, the overlaps of the K corresponding data singular vectors with the signal subspace are predicted to be identical and completely independent of the finite value of K (see [45], however, for finite-size fluctuations in the square-symmetric spiked covariance model). More generally, if the signal Y consists of K different rank-1 spikes, each with a unique signal strength sl for l = 1, ..., K, the corresponding location of the data spike ŝl can be computed by inserting each sl into a single local singular-value inflation function ŝ(s) (depicted in Fig. 1), without considering the location of any other signal spike sl′ for l′ ≠ l. In this precise sense, at finite K the spikes do not interact; one need not consider the position of any other signal spikes to compute how any one signal spike is inflated to a data spike. The same non-interacting picture is true for singular-vector overlaps (Fig. 1B).

FIG. 1. A. The color plot in the bottom panel shows the singular value spectrum of the spiked matrix model given by the finite-rank theory in the asymptotic limit with aspect ratio c = N1/N2 = 0.7 (see Appendix A for formulas). The singular value of the data matrix R is on the x-axis, and the strength of the single non-zero singular value of the signal matrix Y is on the y-axis. The "bulk" spectrum of the data is identical to the spectrum of the noise matrix X; the bounds of that spectrum are the vertical dashed grey lines. Above the critical signal scrit = c^(1/4) (black horizontal line), the data has an outlier singular value, shown as a solid red curve. The top two panels show theory curves corresponding to two horizontal slices, with s = 0.85, 1.5, together with a histogram of singular values, each of a single instantiation with N2 = 2000. The top panel has a single outlier singular value very close to the theory prediction; the panel below it shows a data spectrum that is indistinguishable from noise. B. The overlap of the top left singular vector of the data with the left singular subspace of the signal, given by the finite-rank theory. The overlap becomes non-zero at exactly the same critical signal, scrit, at which an outlier singular value appears in the data. X is Gaussian i.i.d. with variance 1/N2 throughout.
This lack of interaction between different spikes in the signal, as they are corrupted to generate data spikes, allows optimal denoising operations based on the finite-rank theory to be remarkably simple. For example, estimators for both data denoising [23,28,29], which corresponds to trying to directly estimate the signal Y given the corrupted data R, and covariance estimation [46], which corresponds to estimating the true covariance matrix C = Y Y^T from the data R, both involve applying a single shrinkage function that non-linearly modifies each data singular value of R in a manner that acts independently of any other singular value. This shrinkage function, applied to each data singular value ŝ, in a sense optimally undoes the singular-value inflation s → ŝ and compensates for the singular-vector rotation ua → ûa, which arise in going from signal Y to data R = Y + X. Moreover, the reason the shrinkage can act independently on each data singular value is directly related to the property of the finite-rank theory that each signal singular value is inflated independently through the same inflation function, while each signal singular vector is rotated independently through the same random rotation.
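As a concrete, and deliberately naive, illustration of this idea, the sketch below implements only the inflation-undoing half of such a scheme for i.i.d. Gaussian noise of entry variance 1/N2: it inverts the finite-rank inflation map on each outlier singular value and zeroes the bulk. The function names are ours, and a genuinely optimal shrinker would additionally correct for the singular-vector rotation [23,28,29].

```python
import numpy as np

def inflate(s, c):
    """Finite-rank map from a signal singular value s to the data outlier
    s_hat, for i.i.d. Gaussian noise (valid for s > c**0.25)."""
    return np.sqrt((1 + s**2) * (s**2 + c)) / s

def deflate(s_hat, c):
    """Invert the inflation map: solve s^4 + (1 + c - s_hat^2) s^2 + c = 0
    for the larger root in s^2 (valid above the bulk edge 1 + sqrt(c))."""
    q = s_hat**2 - 1 - c
    return np.sqrt((q + np.sqrt(q**2 - 4 * c)) / 2)

def denoise_finite_rank(R, c):
    """Keep the data singular vectors, deflate each outlier singular value,
    and zero out the noise bulk."""
    U, sv, Vt = np.linalg.svd(R, full_matrices=False)
    edge = 1 + np.sqrt(c)
    sv_new = np.where(sv > edge, deflate(np.maximum(sv, edge), c), 0.0)
    return (U * sv_new) @ Vt

# Round trip: deflate(inflate(s)) recovers s exactly above threshold
s_rec = deflate(inflate(2.0, 0.5), 0.5)

# A rank-5 example: denoising sharply reduces the Frobenius error
rng = np.random.default_rng(1)
N1, N2 = 500, 1000; c = N1 / N2
U1 = np.linalg.qr(rng.standard_normal((N1, 5)))[0]
U2 = np.linalg.qr(rng.standard_normal((N2, 5)))[0]
Y = 3.0 * U1 @ U2.T
R = Y + rng.standard_normal((N1, N2)) / np.sqrt(N2)
err_raw = np.linalg.norm(R - Y) ** 2
err_den = np.linalg.norm(denoise_finite_rank(R, c) - Y) ** 2
```

The round trip is exact in closed form, which is precisely the "non-interacting" property exploited by finite-rank shrinkers; the extensive-rank estimators of Section V abandon this one-value-at-a-time structure.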
In this work, however, we find that the assumptions and resulting consequences of the finite-rank theory may constitute a significant limitation for the practical application of this model, both to explain the properties of noise-corrupted data and to optimally denoise such data. To illustrate, we test the finite-rank theory for various values of K, with N1 and N2 fixed. In Figure 2 we show simulation results in which we find substantial deviations between simulations and finite-rank theory predictions, for both the location of the leading data singular value outlier and the data-signal singular-vector overlap, for K as small as 10 with N1 = 1000. Thus, even for moderate numbers of spikes and relatively large matrices, the finite-rank theory cannot explain the SVD of the data well (though, as mentioned above, see [45] for finite-size fluctuations in the square-symmetric case). As a consequence, as we will show below, typical denoising techniques, which depend crucially on the predicted singular structure of the data, perform poorly even for moderate K.
Thus, motivated by the search for better denoisers of higher-rank data, we extend the finite-rank theory to a completely different asymptotic limit of extensive rank, in which the rank K of the data is proportional to the number of variables N1 as both become large. We show that our extensive-rank theory both: (1) more accurately explains the SVD of large data matrices of even moderate rank, and (2) provides better denoisers in these cases than the finite-rank theory. Interestingly, our extensive-rank theory also reveals qualitatively new phenomena that do not occur at finite rank, including highly nontrivial interactions between the extensive number of signal singular values as they become corrupted to generate data singular values under additive noise.

III. MATHEMATICAL PRELIMINARIES
We review some basic concepts from random matrix theory and introduce notation. Let M be an N × N Hermitian matrix. We denote by G_M(z) the matrix resolvent of M, G_M(z) = (z I_N − M)^(−1). We define the normalized trace operator τ as τ[A] = (1/N) Tr A. The Stieltjes transform g_M(z) is the normalized trace of G_M(z): g_M(z) = τ[G_M(z)]. In this work, we will be interested in the singular values and vectors of rectangular matrices. In order to apply Hermitian matrix methods to a rectangular matrix R ∈ R^(N1×N2), we will work with its Hermitianization, which we denote with boldface throughout: R = [[0, R], [R^T, 0]], which is an N × N Hermitian matrix, with N = N1 + N2. The eigenvalue-eigenvector pairs of R can be written as ±s with eigenvectors (u1; ±u2)/√2, where s is a singular value of R, and u1, u2 are the corresponding left and right singular vectors. This will allow us to extract information about the singular value decomposition of a rectangular matrix R from the eigen-decomposition of the Hermitian matrix R.
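A small numerical sketch (ours, with arbitrary sizes) makes the content of the Hermitianization concrete: the eigenvalues of R are the ± pairs of the singular values of R, padded with |N2 − N1| zeros.

```python
import numpy as np

rng = np.random.default_rng(2)
N1, N2 = 40, 60
R = rng.standard_normal((N1, N2)) / np.sqrt(N2)

# Hermitianization: an (N1+N2) x (N1+N2) symmetric matrix with R off-diagonal
H = np.block([[np.zeros((N1, N1)), R],
              [R.T, np.zeros((N2, N2))]])

eigs = np.linalg.eigvalsh(H)
svals = np.sort(np.linalg.svd(R, compute_uv=False))

pos = np.sort(eigs[eigs > 1e-8])        # positive eigenvalues of H
n_zero = np.sum(np.abs(eigs) < 1e-8)    # should be |N2 - N1| exact zeros

# Relatedly, R R^T and R^T R share their nonzero eigenvalues
e1 = np.sort(np.linalg.eigvalsh(R @ R.T))
e2 = np.sort(np.linalg.eigvalsh(R.T @ R))
```

This is why spectral statements about the rectangular R can be read off from the symmetric matrix H throughout the paper.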
Hermitianization leads naturally to a Hermitian block resolvent, which is a function of two complex scalars z1 and z2, rather than one: G_R(z) = (Z − R)^(−1), where Z = diag(z1 I_N1, z2 I_N2) and z = (z1, z2) is a complex vector. This block resolvent can be computed explicitly, with each block written in terms of a standard square-matrix resolvent: G_11^R(z) = z2 (z1 z2 I − R R^T)^(−1), G_22^R(z) = z1 (z1 z2 I − R^T R)^(−1), with off-diagonal blocks G_12^R(z) = (z1 z2 I − R R^T)^(−1) R and G_21^R(z) = R^T (z1 z2 I − R R^T)^(−1). Analogously, we define the block Stieltjes transform g_R(z) as the 2-element complex vector consisting of the normalized traces of each diagonal block of G_R: g_R(z) = (τ1[G_11^R(z)], τ2[G_22^R(z)]). Here we have introduced notation for the block-wise normalized traces, τa[M] = (1/Na) Tr M_aa, where M_aa is the a-th diagonal block of size Na × Na.

FIG. 2 (partial caption). For the same signal model as in panel (C), we plot the overlap of the leading data left singular vector with the leading signal singular vector as a function of signal strength s, for different K and for the finite-rank theory (black). We see that the finite-rank theory incorrectly estimates both the singular values and singular vectors of signals of even moderate rank K. See Appendix A for finite-rank theory formulas.
Notationally, we write the block vectors and block matrices g_R and G_R in bold, while we indicate the component blocks in standard roman font, with indices a, b ∈ {1, 2} for both the scalar blocks g_a^R and the matrix blocks G_ab^R.
We will also use the fact that the eigenvalues of RR^T and R^T R differ by exactly |N2 − N1| zero eigenvalues. Each element g_a^R(z) can be written in terms of the corresponding singular value density: g_a^R(z) = ∫ ρ_a^R(s) z_b / (z1 z2 − s²) ds, with b ≠ a, where ρ_a^R(s) denotes the singular value distribution of R, accounting for Na singular values. Note that for non-zero s with finite singular value density, ρ_2^R(s) = c ρ_1^R(s). The special case in which the two arguments are equal, z1 = z2 = z, will be important, and so we abbreviate g_R(z) := g_R(z, z).
We can write an inversion relation for the singular value densities using the Sokhotski-Plemelj theorem, which states lim_{η→0+} 1/(x − iη) = P(1/x) + iπδ(x), where P denotes the principal value; applied to the equal-argument block Stieltjes transform, it yields ρ_a^R(s) = (2/π) lim_{η→0+} Im g_a^R(s − iη). Finally, we define the block R-transform, R^R(t) = g_R^(−1)(t) − 1/t, where t ∈ C² is in the range of g_R; we denote by g_R^(−1) the functional inverse of the block Stieltjes transform g_R, satisfying g_R^(−1)(g_R(z)) = z; and 1/t is the component-wise multiplicative inverse of t.
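The inversion relation is easy to verify numerically. The sketch below (ours) recovers the Marchenko-Pastur eigenvalue density of X X^T from the ordinary Stieltjes transform via the Sokhotski-Plemelj limit, using a small but nonzero η; the block version works the same way component by component.

```python
import numpy as np

rng = np.random.default_rng(3)
N1, N2 = 1000, 2000
c = N1 / N2
X = rng.standard_normal((N1, N2)) / np.sqrt(N2)
lam = np.linalg.eigvalsh(X @ X.T)         # Marchenko-Pastur eigenvalues

def g(z):
    """Empirical Stieltjes transform of X X^T."""
    return np.mean(1.0 / (z - lam))

# Sokhotski-Plemelj: rho(x) = (1/pi) lim_{eta -> 0+} Im g(x - i*eta)
eta, x = 0.05, 1.0
rho_num = g(x - 1j * eta).imag / np.pi

# Closed-form Marchenko-Pastur density at x, for comparison
a, b = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
rho_mp = np.sqrt((b - x) * (x - a)) / (2 * np.pi * c * x)
```

In practice η should be taken much larger than the eigenvalue spacing but much smaller than the scale over which the density varies, exactly as in the overlap calculation of Section IV B.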
The block R-transform will arise naturally in our calculation of the subordination relation for the sum of free rectangular matrices, R = Y + X, and as we shall verify, it is additive for independent, rotationally invariant matrices: R^R(t) = R^Y(t) + R^X(t).

IV. THE SINGULAR VALUE DECOMPOSITION OF SUMS OF RECTANGULAR MATRICES
In this section, we characterize how an additive noise matrix X deforms the singular values and vectors of a signal matrix Y to generate the singular values and vectors of the data matrix R = Y + X (see (1) and following text). We consider general signal matrices of the form Y = U1 S U2^T, where each Ua is an Na × Na orthogonal matrix (a = 1, 2), and S is an N1 × N2 diagonal matrix.
We begin by deriving an asymptotically exact subordination formula relating the block resolvents (9) of R and Y in the limit N1, N2 → ∞ with the aspect ratio c = N1/N2 fixed. From this, we extract both the singular value spectrum of R, as well as the overlaps between the singular vectors of R and those of the signal matrix Y.

A. A Subordination Relation for the Sum of Rectangular Matrices
Exploiting the rotational invariance of PX(X), we first calculate the block resolvent of R as an expectation over arbitrary rotations of the noise X. Thus, we write R = Y + O1 X O2^T, where the Oa are Haar-distributed orthogonal Na × Na matrices. We can write the Hermitianization (7) of R in terms of the Hermitianized X and Y: R = Y + Ō X Ō^T, where we have written Ō = diag(O1, O2). The main result of this section is the following subordination relation for the expectation of the block resolvent G_R, taken over the random block-orthogonal matrix Ō: ⟨G_R(z, z)⟩_Ō = G_Y(z − R^X(g_R(z))).
As mentioned above, the notation g_R(z) refers to the special case in which the argument of g_R is the two-dimensional complex vector with equal components, z1 = z2 = z. Note that the argument of G_Y, by a slight abuse of notation, is the vector with components z − R_a^X(g_R(z)) for a = 1, 2. In Appendix B we present the detailed derivation for this case, which is sufficient for computing the singular values and associated singular-vector overlaps; the general case follows analogously. We provide a sketch of the calculation here.
We first write the analog of a partition function, and observe that we can write the desired matrix inverse as a derivative of the corresponding free energy. We would like to average this over the block-orthogonal matrix Ō, yielding a "quenched" average free energy. In Appendix B, we show that in the large-N limit, the quenched and annealed averages are equivalent. In short, viewing log Z_R as a function of Ō, we find that it has Lipschitz constant proportional to 1/√N, and then use the concentration of measure of the orthogonal group SO(N), with additional concentration inequalities, to show that the two averages agree asymptotically. We can therefore calculate our desired block resolvent from the annealed average. We proceed by writing the determinant as a Gaussian integral, and then we substitute R = Y + Ō X Ō^T, extract the terms that do not depend on Ō, and take the expectation of the terms that do, which yields an intermediate integral I_X(v). This integral is analogous to the Harish-Chandra-Itzykson-Zuber (HCIZ) or spherical integral, which appears in the calculation of the subordination relation for sums of square-symmetric matrices [33,34,36]. We compute this "block-spherical" integral asymptotically in Appendix C, and highlight key points of the calculation here.
First, we observe the key difference between our calculation and the square-symmetric case. In the square-symmetric case, the expectation is over a single Haar-distributed orthogonal matrix that rotates v arbitrarily, and so the expectation depends only on the norm of v. In our rectangular case, however, Ō has two blocks, and they rotate the N1- and N2-dimensional blocks of v separately, so that I_X(v) depends on the norms of each of these two blocks. Therefore, we define the 2-component vector t whose components are the block-wise norms of v. We calculate the expectation (24) by performing an integral over an arbitrary N-dimensional vector, while enforcing block-wise norm constraints using the Fourier representation of the delta function, and introducing integration variables q1 and q2.
To compute the integral, we make a saddle-point approximation in the asymptotic limit of large N. Appealingly, we find the saddle-point conditions are of the form g_X(q*) = t. That is, the block Stieltjes transform arises naturally, and the saddle point of the block-spherical integral (24) is its functional inverse evaluated at the vector of block-wise norms of v.
Inserting the saddle-point solution, we find that asymptotically the logarithm of I_X(v) is given by a saddle-point free energy H_X(t) which, for a neighborhood of values of t around 0, has gradient with elements proportional to the block R-transform (14). Thus, the block R-transform arises via the anti-derivative of the logarithm of the block-spherical integral, analogously to the regular R-transform in the case of square-symmetric matrices.
Note that, given the definition in (24), it is straightforward to see that I_R(v) = I_Y(v) I_X(v), and thus H_R(t) = H_Y(t) + H_X(t). Therefore, we have established the additivity of the block R-transform as well.
Continuing with the derivation of the subordination relation (18), we next substitute the result for I_X(v) back into the Gaussian integral over v (23), and then introduce another pair of integration variables, t̂, in order to decouple v from its block-wise norms t. Performing the Gaussian integral, we find an effective free energy. Note that the block resolvent of Y arises here naturally as a function of the two-element vector z − t̂, despite the fact that we set out to find G_R evaluated at the point (z, z).
The integrals over t and t̂ yield an additional pair of saddle-point conditions. The first requires t̂* = R^X(t*), and combining it with the second gives the saddle point, and thus the desired annealed free energy. We next take the derivative with respect to Y (see Appendix B for a more careful treatment). Finally, to find t*, we take the block-wise normalized traces, which give g_R(z) = g_Y(z − R^X(t*)) = t*, and that completes the derivation of the block resolvent subordination relation (18).
We note that the saddle-point condition (31) turns out to be the subordination relation for the block Stieltjes transform: g_Y(ζ) = g_R(z), with ζ ≡ z − R^X(g_R(z)). Note that while g_R is evaluated at the scalar point (z, z), the argument of g_Y is the vector subordination function ζ, whose two components are distinct in general.
The singular value spectrum of the sum of rectangular matrices can thus be obtained by first finding the block Stieltjes transform, either by employing the additivity of the block R-transform or by solving the subordination relation (33), and then using the inversion relation (13).
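The subordination relation can be checked numerically when both matrices are Gaussian, since the block Stieltjes transform then reduces to ordinary sums over singular values. The sketch below is our own check, with arbitrary sizes and a point z chosen safely away from the spectrum; it assumes the standard result (derived in Appendix D of the paper) that the block R-transform of an i.i.d. Gaussian matrix with entry variance σ²/N2 is linear: R_1^X(t) = σ² t2 and R_2^X(t) = c σ² t1.

```python
import numpy as np

rng = np.random.default_rng(4)
N1, N2 = 600, 1000
c = N1 / N2
s2x, s2y = 0.4, 0.6                       # entry variances (in units of 1/N2)
X = rng.standard_normal((N1, N2)) * np.sqrt(s2x / N2)
Y = rng.standard_normal((N1, N2)) * np.sqrt(s2y / N2)
R = Y + X

def block_g(M, z1, z2):
    """Empirical block Stieltjes transform (g1, g2) of the Hermitianization
    of M at the complex vector (z1, z2), via the singular values of M."""
    sv2 = np.linalg.svd(M, compute_uv=False) ** 2     # eigenvalues of M M^T
    g1 = z2 * np.mean(1.0 / (z1 * z2 - sv2))
    # M^T M has the same nonzero eigenvalues plus N2 - N1 zeros
    g2 = z1 * (np.sum(1.0 / (z1 * z2 - sv2)) + (N2 - N1) / (z1 * z2)) / N2
    return np.array([g1, g2])

z = 2.0 - 0.4j                             # a point off the real axis
gR = block_g(R, z, z)

# Subordinated argument zeta = z - R^X(g^R(z)), with the linear Gaussian
# block R-transform: R1 = s2x * t2, R2 = c * s2x * t1
zeta1 = z - s2x * gR[1]
zeta2 = z - c * s2x * gR[0]

gY = block_g(Y, zeta1, zeta2)              # should reproduce gR component-wise
```

At these sizes the two sides of the subordination relation agree to a few parts in a thousand, limited only by finite-N fluctuations.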

B. Deformation of Singular Vectors Due to Additive Noise
Turning now to the singular vectors of the data matrix R = Û1 Ŝ Û2^T = Y + X, we quantify the effect of the noise X on the signal Y = U1 S U2^T via the matrix of squared overlaps between the clean singular vectors of the signal, Ua (with a = 1, 2 for left and right, respectively), and the noise-corrupted singular vectors of the data, Ûa, written element-wise as the matrix (Ûa^T Ua)², where the square is taken element-wise. In the noiseless case X = 0, one has (Ûa^T Ua)² = I_Na, signifying perfect correspondence between signal and data singular vectors. In the presence of substantial noise, the overlaps of a signal singular vector are generically distributed over order Na data singular vectors and are of order 1/Na. We therefore define the rescaled expected square overlap between a given singular vector ûa of R, with corresponding singular value ŝ, and a given singular vector ua of Y, with corresponding singular value s, as Φa(ŝ, s) ≡ Na E[(ûa · ua)²], where once again a = 1, 2 for left and right singular vectors, respectively. To see how to obtain the expected square overlaps from the block resolvent G_R(z), we write each of the diagonal blocks G_aa^R(z) (9) in terms of their eigendecomposition, and multiply on both sides by a "target" singular vector of Y, say ua with associated singular value s. If we choose z = ŝ − iη where ρ_a^R(ŝ) ∼ O(1), with Na^(−1) ≪ η ≪ 1, and take the imaginary part, then we get a weighted average of the square overlaps of a macroscopic number of singular vectors ûak of R that have singular values close to ŝ, with the target singular vector ua, each weighted by (π/2) ρ_a^R(ŝk). If we first take the limit of large Na and then take η → 0, we obtain the expectation Φa(ŝ, s). Now, we use the subordination relation (18) to replace the resolvent of R with the resolvent of Y, evaluated at the 2-component vector ζ = z − R^X(g_R(z)). The resulting expressions can be written in terms of the real and imaginary parts of the block R-transform of the noise X. In the following section we provide simplified expressions for the important case of Gaussian noise.

Arbitrary Signal with Gaussian Noise
We show in Appendix D that the block R-transform of an N1 × N2 (with c = N1/N2) Gaussian matrix with i.i.d. entries of variance σ²/N2 is R_1^X(t) = σ² t2, R_2^X(t) = c σ² t1. Note that from the definition of the R-transform, one can find that R_2^A(t) t2 = c R_1^A(t) t1 for any rectangular A with aspect ratio c, and (39) is the only pair of linear functions of t that satisfies this constraint.
We substitute (39) into the block Stieltjes transform subordination relation, and then use the identity g_2^R(z) = c g_1^R(z) + (1 − c)/z (for arbitrary rectangular R, the spectra of RR^T and R^T R differ only by a set of 0 eigenvalues). Then, using the definition of the block Stieltjes transform, we arrive at g_1^R(z) = ζ2 g_{YY^T}(ζ1 ζ2), with ζ1 = z − σ²(c g_1^R(z) + (1 − c)/z) and ζ2 = z − c σ² g_1^R(z). This is a self-consistency equation for the block Stieltjes transform of R, g_1^R(z), that depends on the noise variance σ², the aspect ratio c, and the standard Stieltjes transform of the signal covariance, g_{YY^T}(z).

FIG. 3 (caption). X is i.i.d. Gaussian with entry variance σx²/N2, with σx² = 1. Top three panels show singular vector overlaps for 3 horizontal slices associated with 3 fixed "target" signal singular values s = 1.5, 2.5, 3.5, for 10 realizations of random matrices with N1 = 1500 and N2 = 1000. Each grey dot denotes an overlap between a left singular vector of R with singular value ŝ (position on x-axis) and the left singular vector of Y with singular value closest to s. Red dots reflect binning the singular values of R from all 10 realizations, with the number of bins set to √N2, giving bin width ≈ 0.13. Blue is the theoretical prediction from Φ1(ŝ, s) in (43a). Note that as the signal singular value s increases, Φ1(ŝ, s) as a function of ŝ becomes more concentrated about a value larger than s. This reflects the fact that singular vector structure in the signal Y at singular value s is mapped to singular vector structure in the data R at larger singular values ŝ, due to singular value inflation under the addition of noise X.
Once this equation is solved, the singular vector overlaps can be obtained as well. We introduce notation for the real and imaginary parts of the block Stieltjes transform, evaluated at z = ŝ − i0+, where we assume that the spectral density at ŝ is finite. We then insert these into (39) to get the real and imaginary parts of the block R-transform of X. After defining an auxiliary function ν(z) for notational ease, we can finally simplify the overlaps (38) for the case of Gaussian noise, obtaining Φ1(ŝ, s) in (43a). Formula (43a) is confirmed in Figure 3, which shows the left singular vector overlaps between data and signal when the signal Y is Gaussian as well. The bottom color plot can be thought of as an input-output map for the singular value structure under additive noise. It shows that a signal singular vector associated with a given singular value s undergoes, loosely speaking, both "diffusion" and "inflation", aligning partially with data singular vectors across a range of singular values, with a peak associated with larger singular values ŝ > s. In the upper three panels we observe that individual overlaps are not self-averaging: a smooth overlap function emerges only when one averages either over many overlaps within a range of singular values, or over many instantiations.
We stress that these formulas for the overlap of data singular vectors with signal singular vectors do not depend directly on the unobserved signal Y . Rather, they depend only on the noise variance and the block Stieltjes transform, g R 1 (z), of the noisy data matrix, R. Furthermore, g R 1 (z) can be estimated empirically via kernel methods for the empirical spectral density and its Hilbert transform [32, 35, 36]. This suggests that significant information about the structure of the unobserved extensive signal can be inferred from noisy empirical data, laying the foundation for the optimal estimators derived below.

C. SVD of the Extensive Spike Model
We now return to the spiked matrix model R = Y + X, with signal Y = sU 1 U T 2 , where s is a scalar and the U a are N a × K matrices with orthonormal columns. But now we assume the rank of the spike grows linearly with the number of rows at a fixed rank ratio b, i.e. K = bN 1 , while the aspect ratio c = N 1 /N 2 is fixed as before. We will assume the elements of the noise matrix X are i.i.d. Gaussian: X ij ∼ N (0, 1/N 2 ). In the following we first discuss the singular values, and then the singular vectors, of the extensive-rank model.
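As a concrete reference, the extensive spike model is straightforward to simulate. In the sketch below (our own illustration; sizes, seed, and parameter values are arbitrary), the U a are taken as the first K columns of random orthogonal matrices, and we check the basic property used in the next subsection, namely that Y Y T has exactly K eigenvalues equal to s 2 and the rest zero:

```python
import numpy as np

rng = np.random.default_rng(1)
N1, N2 = 200, 300      # aspect ratio c = N1/N2
b = 0.1                # rank ratio
K = int(b * N1)        # extensive rank: K = b N1
s = 2.0                # common signal singular value

# Orthonormal bases for the signal left/right singular subspaces.
U1 = np.linalg.qr(rng.standard_normal((N1, K)))[0]
U2 = np.linalg.qr(rng.standard_normal((N2, K)))[0]
Y = s * U1 @ U2.T

# Additive i.i.d. Gaussian noise with variance 1/N2.
X = rng.standard_normal((N1, N2)) / np.sqrt(N2)
R = Y + X

# Y Y^T = s^2 U1 U1^T has K eigenvalues s^2 and N1 - K zeros.
ev = np.sort(np.linalg.eigvalsh(Y @ Y.T))
assert np.allclose(ev[-K:], s**2) and np.allclose(ev[:-K], 0.0)
```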

Singular Value Spectrum of the Extensive Spike Model
Y Y T has K eigenvalues equal to s 2 and N 1 − K zero eigenvalues. Its Stieltjes transform is therefore g Y Y T (z) = b/(z − s 2 ) + (1 − b)/z. We can now make use of the self-consistency equation for g R 1 (z), (40). Momentarily writing g in place of g R 1 (z) and simplifying, we find a quartic polynomial for g = g R 1 (z), where we write ζ 2 = z − cσ 2 g as above. We solve this numerically for z near the real line in order to find the density of singular values of R (see Appendix E for the polynomial coefficients and details of the numerical solution). For strong signal s, the spectrum in the extensive case differs from the finite-rank case most clearly in that singular values reflecting the signal do not concentrate at a single data singular value. Rather (see Figure 4, top), for sufficiently strong signal s, the presence of noise blurs the signal singular values into a continuous bulk that is disconnected from the noise bulk. This signal bulk appears near the single outlier predicted by the finite-rank theory, but has significant spread.
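The numerical recipe — solve the polynomial at z = x + iϵ, keep the root on the physical branch, and read the density off the imaginary part — can be illustrated on a case with a known closed form. The sketch below is our own illustration, not the Appendix E quartic: it applies the recipe to the quadratic Marchenko-Pastur self-consistency for the eigenvalues of the pure-noise Gram matrix (the s → 0 limit of the noise bulk), with an arbitrary shape ratio:

```python
import numpy as np

lam = 0.25          # shape ratio of the noise matrix (illustrative value)
eps = 1e-9          # small imaginary offset toward the real axis

def density(x):
    """Marchenko-Pastur eigenvalue density via polynomial root selection.

    g(z) = (1/p) tr[(z - X X^T/n)^{-1}] solves
        lam*z*g^2 - (z - (1 - lam))*g + 1 = 0;
    the physical branch has Im g < 0 in the upper half-plane, and
        rho(x) = -(1/pi) Im g(x + i*eps).
    """
    z = x + 1j * eps
    roots = np.roots([lam * z, -(z - (1 - lam)), 1.0])
    g = roots[np.argmin(roots.imag)]        # pick the branch with Im g < 0
    return -g.imag / np.pi

# The density should integrate to 1 over the bulk [(1-sqrt(lam))^2, (1+sqrt(lam))^2].
lo, hi = (1 - np.sqrt(lam))**2, (1 + np.sqrt(lam))**2
xs = np.linspace(lo, hi, 4001)
vals = np.array([density(x) for x in xs])
dx = xs[1] - xs[0]
mass = float(np.sum((vals[:-1] + vals[1:]) / 2) * dx)   # trapezoid rule
assert abs(mass - 1.0) < 0.01
```

The extensive spike case follows the same pattern with the quartic coefficients from Appendix E in place of the quadratic, and the branch selection is identical.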
At very weak signals s there is a single, unimodal bulk spectrum, just as in the finite-rank setting, but in contrast, these weak signals make their presence felt by extending the leading edge of the bulk beyond the edge of the spectrum predicted by the finite-rank theory, even when the signal strength s is below the critical signal strength s crit predicted by the finite-rank model (Figure 4, third panel).
At intermediate signal strength s, the singular value distribution exhibits a connected bimodal regime not present in the finite-rank model (Figure 4). Thus, as s increases, we see two qualitative changes: first a crossover from a single unimodal bulk to a single bimodal bulk, and then from one connected bulk to two disconnected bulks. This final splitting of the signal bulk from the noise bulk is a phase transition, as the block Stieltjes transform goes from having a single branch cut to two disjoint branch cuts. This transition happens at significantly larger signal s than the signal-detectability phase transition in the finite-rank regime (Figure 4, bottom).
In the limit of low rank (small b), the spectrum approaches the finite-rank theory as expected (Figure S1). Interestingly, we find that as a function of rank ratio b, there are three distinct regimes. For sufficiently strong signals (Figure 5A), the signal bulk remains disjoint from the noise bulk for all b. For intermediate signals (Figure 5B), the two bulks merge but the spectrum remains bimodal for all b. Finally, for weak signals (Figure 5C), there is a single connected bulk for all b.

Singular Vector Subspace Overlap in the Extensive Spike Model
We now turn to the singular vectors of the extensive spike model. For simplicity we focus on the left singular vectors. Since the K non-zero singular values of the signal are degenerate, the only meaningful overlap to study is a subspace overlap, i.e. the squared projection of each data singular vector, û 1k , onto the entire subspace defined by U 1 . Therefore we compute ∥U T 1 û 1k ∥ 2 . Since this is an extensive sum, we expect that it is self-averaging and should be well predicted by bΦ 1 (ŝ k , s), where Φ is defined in (43).
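This subspace overlap is simple to measure numerically. The sketch below is our own illustration (sizes, seed, signal strength, and the thresholds in the final checks are arbitrary choices); it projects each left singular vector of R onto the column space of U 1 :

```python
import numpy as np

rng = np.random.default_rng(2)
N1, N2 = 300, 450
b, s = 0.1, 3.0
K = int(b * N1)

U1 = np.linalg.qr(rng.standard_normal((N1, K)))[0]
U2 = np.linalg.qr(rng.standard_normal((N2, K)))[0]
R = s * U1 @ U2.T + rng.standard_normal((N1, N2)) / np.sqrt(N2)

# Subspace overlap of each data left singular vector with span(U1):
# ||U1^T u_hat_k||^2 lies in [0, 1]; chance level for a random vector is b.
u_hat, s_hat, _ = np.linalg.svd(R, full_matrices=False)
overlaps = np.sum((U1.T @ u_hat)**2, axis=0)

# For a strong signal, the top K data singular vectors align well with
# the signal subspace, while the remaining ones sit near chance level b.
assert overlaps[:K].mean() > 0.5
assert overlaps[K:].mean() < 3 * b
```

Note that the overlaps sum to K over all N 1 data singular vectors, since U 1 has K orthonormal columns; the signal strength only redistributes this total mass toward the top of the spectrum.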
After solving (46) for the block Stieltjes transform of R, we insert the result in (43) to find Φ 1 (ŝ, s). In Figure S2 we return to the simulation results presented in Figure 2 and show that the extensive-rank theory predicts both the leading outlier singular value and the subspace overlap of the corresponding singular vector, even when the finite-rank theory fails.
In Figure 6 we explore the phase diagram of the extensive-rank model and confirm the theoretical predictions for singular vector overlaps against numerical simulations. We observe that the extensive spike model exhibits singular value inflation in its singular-vector overlaps: not only are data singular values larger than the corresponding signal singular values, just as in the finite-rank model, but the singular-vector overlap also peaks at the upper edge of the data singular values.
Figure 7 summarizes the results of this section with a two-dimensional phase diagram in the signal-strength vs. rank (s-b) plane. It shows the boundaries between the three regimes of the singular value spectrum: unimodal, bimodal, and disconnected. Additionally, the color map shows the average excess signal subspace overlap of the singular vectors associated with the top b fraction of singular values. Since, by chance, any random vector is expected to have an overlap b with the signal subspace, we compute the excess overlap as Φ̄(ŝ, s) = b(Φ(ŝ, s) − 1). We then average the excess overlap across the singular vectors associated with the top b singular values, i.e. over ŝ > t, where the threshold t is given by b = ∫ t ∞ ρ R 1 (ŝ) dŝ. Importantly, the figure demonstrates that, in contrast to the finite-rank setting, the transitions in the data singular value spectrum do not coincide with the detectability of the signal. Rather, the alignment of the data singular vectors with the signal subspace is a smooth function of both signal strength s and rank ratio b, and nonzero excess overlap can occur even in the unimodal regime.

V. OPTIMAL ROTATIONALLY INVARIANT ESTIMATORS
We now consider two estimation problems given noisy observations, R = Y + X: (1) denoising R in order to optimally reconstruct Y , and (2) estimating the true signal covariance, C = Y Y T . We focus on the case where both the signal Y and the noise X are rotationally invariant (P Y (M ) = P Y (O 1 M O 2 ) for arbitrary orthogonal matrices O 1 , O 2 , and similarly for X). In this setting it is natural to consider rotationally invariant estimators F that transform consistently with rotations of the data [47, 48]. Such F can only alter the singular values of R while leaving the singular vectors unchanged. More generally, when Y is not rotationally invariant, our results yield the best estimator that only modifies the singular values of R.
Our problem thus reduces to determining optimal shrinkage functions for the singular values. In the finite-rank case, distinct singular values and their associated singular vectors of Y respond independently to noise, so the optimal shrinkage of ŝ depends only on ŝ [23, 29, 46]. As we show below, this is no longer the case in the extensive-rank regime: the optimal shrinkage for each singular value generally depends on the entire data singular value spectrum.

A. Denoising Rectangular Data
We first derive a minimal mean-square error (MMSE) denoiser to reconstruct the rotationally invariant signal, Y , from the noisy data, R. Under the assumption of rotational invariance, the denoised matrix is constrained to have the same singular vectors as the data R, and thus takes the form Ỹ = Û 1 ϕ(Ŝ) Û T 2 . Minimizing the MSE with respect to ϕ(ŝ m ) gives the optimal shrinkage function, which appears to require knowledge of the very matrix being estimated, namely Y . However, in the large-size limit it is possible to estimate ϕ * (ŝ m ) via the resolvent G R (z). We first express the shrinkage in terms of a sum over resolvent matrix elements; as z is brought toward the singular value ŝ m , this sum is increasingly dominated by the contribution from the m th singular vector pair. We next apply the subordination relation (18), yielding a product of Y with a Y -resolvent, whose trace is readily found, where ζ a (z) = z − R X a (g R (z)), and we have used the identity τ [CG C (z)] = zg C (z) − 1 for arbitrary symmetric C.

FIG. 7. Singular Vector Overlaps Disregard Singular Value Phases. The two-dimensional phase diagram shows the average "excess" subspace overlap (48) of the top b fraction of data singular vectors with a signal of strength s (y-axis) and rank ratio b (x-axis). The (lower) grey line separates the unimodal and bimodal regimes of the SV spectrum, and the (upper) black line separates the connected phase from the disconnected phase. The singular vector overlap does not respect the boundaries of the SV spectrum: the signal impacts the data via significant overlaps with the signal subspace well below the boundary between the unimodal and bimodal regimes. Aspect ratio c = 0.7.
which depends only on the block Stieltjes transform of the empirical data matrix, R, and the block R-transform of the noise, X. Importantly, the dependence on the unknown signal Y is gone, making this formula amenable to practical applications, at least when the noise distribution of X is known.
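The minimization step itself is elementary: with Ỹ = Û 1 ϕ(Ŝ) Û T 2 fixed to the data singular vectors, the Frobenius error decouples over singular values and the optimum is the "oracle" value û T 1m Y û 2m (a standard least-squares fact). The sketch below is our own illustration with arbitrary sizes; it verifies that this oracle shrinkage can never do worse than keeping the raw singular values, since ϕ(ŝ) = ŝ is itself a feasible choice:

```python
import numpy as np

rng = np.random.default_rng(3)
N1, N2 = 200, 300
K, s = 20, 2.0

U1 = np.linalg.qr(rng.standard_normal((N1, K)))[0]
U2 = np.linalg.qr(rng.standard_normal((N2, K)))[0]
Y = s * U1 @ U2.T
R = Y + rng.standard_normal((N1, N2)) / np.sqrt(N2)

u_hat, s_hat, vt_hat = np.linalg.svd(R, full_matrices=False)

# Oracle shrinkage: phi*_m = u_hat_m^T Y v_hat_m minimizes
# || U_hat diag(phi) V_hat^T - Y ||_F^2 over the choice of phi.
phi_star = np.einsum('im,ij,jm->m', u_hat, Y, vt_hat.T)
Y_oracle = (u_hat * phi_star) @ vt_hat

mse_raw = np.sum((R - Y)**2)       # phi(s) = s, i.e. no denoising
mse_oracle = np.sum((Y_oracle - Y)**2)
assert mse_oracle <= mse_raw       # raw R is in the feasible set
```

The point of (54) is precisely to replace the oracle quantity û T 1m Y û 2m , which requires Y , with an expression computable from R alone.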
For i.i.d. Gaussian noise with known variance σ 2 /N 2 , we have R X 1 (g) = σ 2 g 2 and the general relation g 2 (z) = cg 1 (z) + (1 − c)/z, so (54) simplifies considerably. Writing out the real and imaginary parts of g R , we obtain a simple expression depending only on the variance of the noise and the Hilbert transform of the observed data spectral density. This expression for the Gaussian case was derived previously in [41].
Figure 8 compares (55) to the optimal shrinkage derived from the finite-rank theory [29]. The extensive-rank formulas recover many more significant singular values (Figure 8A). Moreover, the mean-square error of Y * = Û 1 ϕ * (Ŝ) Û T 2 is superior to that of the finite-rank denoiser, steadily improving as a function of the signal rank, while the finite-rank denoiser worsens (Figure 8B). In fact, for our simulations with N 1 = 1000 and N 2 = 500, the extensive-rank denoiser outperformed the finite-rank denoiser for all K > 5, across the range of signal strengths tested. Finally, given an estimate of the noise variance σ 2 , we are able to numerically estimate g R 1 (ŝ) with kernel methods (Appendix F) and compute an empirical shrinkage function that is very close to the theoretical optimum (Figure 8C).
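A minimal version of such a kernel estimate (our own sketch, following the Cauchy-kernel idea of the cited references; the placeholder spectrum and the bandwidth choice are ad hoc) evaluates the empirical Stieltjes transform slightly off the real axis, which simultaneously smooths the spectral density (imaginary part) and its Hilbert transform (real part):

```python
import numpy as np

def stieltjes_kernel(vals, x, eta):
    """Smoothed empirical Stieltjes transform at x + i*eta.

    Re part  -> smoothed Hilbert transform of the spectral density,
    -Im/pi   -> Cauchy-kernel density estimate with bandwidth eta.
    """
    z = x + 1j * eta
    return np.mean(1.0 / (z - vals))

rng = np.random.default_rng(4)
N = 2000
vals = rng.standard_normal(N)          # placeholder spectrum for the sketch
eta = N ** (-1 / 3)                    # ad hoc bandwidth, shrinking with N

xs = np.linspace(-5, 5, 1001)
g = np.array([stieltjes_kernel(vals, x, eta) for x in xs])
rho = -g.imag / np.pi                  # smoothed spectral density
hilb = g.real                          # smoothed Hilbert transform

# The Cauchy-kernel density estimate should integrate to ~1.
dx = xs[1] - xs[0]
mass = float(np.sum((rho[:-1] + rho[1:]) / 2) * dx)
assert abs(mass - 1.0) < 0.05
```

In practice one would apply this to the empirical singular values of R (with the block-transform conventions of Section IV and the bandwidth tuning of Appendix F) to obtain the two ingredients of the shrinkage formula (55) directly from data.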

B. Estimating the Signal Covariance
We now derive an MMSE-optimal rotationally invariant estimator for the signal covariance, C = Y Y T . Just as in [31, 33], and similarly to our results in the previous section, the optimal estimator is given by C * = Û 1 ψ * (Ŝ) Û T 1 . We observe that the top-left block of the square of the Hermitianization Y is given by C, and so we can calculate the optimal shrinkage function by the inversion relation (13). We then apply the subordination relation and, again using the identity τ [CG C (z)] = zg C (z) − 1 for arbitrary symmetric C, we conclude with a formula valid for a general noise matrix X. Once again, for i.i.d. Gaussian noise with known variance σ 2 /N 2 , our estimator (60) simplifies considerably. Using the optimal shrinkage function ϕ found above for rectangular denoising, together with the real part of g R 1 (ŝ), we finally obtain (61). Just as in optimal data denoising, we find that, given an estimate of the noise variance σ 2 , the optimal shrinkage for covariance estimation depends only on the spectral density of R and its Hilbert transform, which can be estimated directly from data.
In Figure 8 we show the optimal shrinkage function for the extensive spike model and demonstrate that it can be approximated given only an estimate of the noise variance and the empirical data matrix, R (Figure 8C). We find that the optimal singular value shrinkage derived for covariance estimation (61), ψ(ŝ), is substantially different from the shrinkage ϕ(ŝ) (55) obtained for denoising the rectangular signal (Figure 8C): the denoising shrinkage suppresses the noise more aggressively, but also suppresses the signal singular values more.
Finally, we compare the shrinkage obtained from assuming a multiplicative form of noise instead of the additive spiked rectangular model studied here. In the finite-rank regime, the spiked rectangular model can instead be modeled as a multiplicative model with data arising from a spiked covariance. Concretely, the data in the multiplicative model are generated as R mult = √C mult X, i.e. each column is sampled from a spiked covariance C mult = Y Y T + I. In the finite-rank regime, with Gaussian noise, the two models yield identical spectra and covariance-eigenvector overlaps. The optimal shrinkage for covariance estimation in the multiplicative model, for arbitrary C mult , has been reported previously (Eq. (13) of [31] for Gaussian noise and Eq. IV.8 of [33] for more general noise); here we consider the impact of employing the multiplicative shrinkage formula on data generated from the additive spiked rectangular model. We observe (Figure 8D, top) that for small rank ratio (b = 0.05) the two models give fairly similar eigenvalue distributions. Nevertheless, applying the optimal multiplicative shrinkage to the additive-model data gives poor results: the shrinkage obtained is non-monotonic in the data eigenvalue (Figure 8D, bottom). Furthermore, the mean-square error in covariance estimation obtained with the multiplicative shrinkage worsens as a function of rank (Figure 8E).

VI. DISCUSSION
While one approach to estimation depends on prior information about the structure of the signal (such as sparsity of singular vectors, for example), we have followed a line of work on rotationally invariant estimation that assumes there is no special basis for either the signal or the noise [47, 48]. In this approach, knowledge of the expected deformation of the singular value decomposition (SVD) of the data due to noise allows for the explicit calculation of optimal estimators.
In the case of finite-rank signals, where the impact of additive noise on singular values and vectors is known [21, 22], formulas for optimal shrinkage for both denoising [23, 28, 29] and covariance estimation [46] have been found. For extensive-rank signals, however, while formulas for the singular value spectrum of the free sum of rectangular matrices are known [37, 38, 40], there are no prior results for the singular vectors of sums of generic rectangular matrices (though see [44] for contemporaneous results).
Even in the setting of square, Hermitian matrices, results on eigenvectors of sums are relatively new [30, 31]. Recent work derived a subordination relation for the product of square symmetric matrices and applied it to a "multiplicative" noise model, in which each observation of high-dimensional data is drawn independently from some unknown, potentially extensive-rank, covariance matrix [33]. In that context, knowledge of the overlaps of the data covariance with the unobserved population covariance is sufficient to enable the construction of an optimal rotationally invariant estimator [31, 33, 35, 36].
We have derived analogous results for signals with additive noise: we have computed an asymptotically exact subordination relation for the block resolvent of the free sum of rectangular matrices, i.e. for the resolvent of the Hermitianization of the sum in terms of the resolvents of the Hermitianizations of the summands. From the subordination relation, we derived the expected overlap between singular vectors of the sum and singular vectors of the summands. These overlaps quantify how singular vectors are deformed by additive noise. We have calculated separate optimal non-linear singular-value shrinkage expressions for signal denoising and for covariance estimation. Under the assumption of i.i.d. Gaussian noise, these shrinkage functions depend only on the noise variance and the empirical data singular value density, which we have shown can be estimated by kernel methods.
We have applied our results to study the extensive spike model. We found a significant improvement in estimating signals with even fairly low rank ratios, relative to methods based on the finite-rank theory. Our results may have significant impact on ongoing research questions around spiked matrix models [24-27], such as the detectability of spikes or optimal estimates of the number of spikes.
The subordination relation derived here is closely related to operator-valued free probability, which provides a systematic calculus for block matrices with orthogonally/unitarily invariant blocks, such as the 2 × 2-block Hermitianizations Y , X, R. In that approach, spectral properties of a matrix are encoded via 2 × 2 operator-valued Stieltjes and R-transforms, whose diagonal elements correspond exactly to the block Stieltjes and R-transforms defined here. A fundamental result in this context is an additive subordination relation for the operator-valued Stieltjes transform, which is identical in form to (33) [40].
We comment briefly on our derivation of the block resolvent subordination, which is summarized in Section IV A and treated fully in Appendix B. First, we note that previous work derived resolvent subordination relations for square symmetric matrices using the replica method [33, 34, 36]. These works assume the replicas decouple, which results in a calculation equivalent to computing the annealed free energy. Here we used concentration-of-measure arguments to prove that the annealed approximation is asymptotically correct (Appendix B).
In the course of our derivation of the subordination relation, we encountered the expectation over arbitrary block-orthogonal rotations of the Hermitianization of the noise matrix (Eq. (24) in the main text, and Appendix C), which we called a "block spherical integral". As noted in the main text, this integral plays a role analogous to the HCIZ spherical integral, which appears in the derivation of the subordination relation for square symmetric matrices [36]. In that setting, the logarithm of the rank-1 spherical integral yields the antiderivative of the standard R-transform for square symmetric matrices [49]. To our knowledge, the particular block spherical integral in our work (Appendix C) has not been studied previously. In fact, it is very closely related to the rectangular spherical integral, whose logarithm is the antiderivative of the so-called rectangular R-transform [37]. In our setting, two such rectangular spherical integrals are coupled, and the logarithm of the result is the antiderivative of the block R-transform (14) (up to component-wise proportionality constants related to the aspect ratio). While the rectangular R-transform is additive, its relationship to familiar RMT objects such as the Stieltjes transform is quite involved. In contrast, the block R-transform that arises from the block spherical integral is a natural extension of the scalar R-transform, with a simple definition in terms of the functional inverse of the block Stieltjes transform. Furthermore, as mentioned above, the block R-transform is essentially a form of the more general operator-valued R-transform from operator-valued free probability. This formulation is appealing because it provides a direct link between a new class of spherical integrals and operator-valued free probability.
We stress that, even under the assumption of Gaussian i.i.d. noise, the optimal estimators we obtained in Section V are not quite bona fide empirical estimators, as they depend on an estimate of the noise variance. This may not be a large obstacle, but we leave it for future work. We do note that while, under the assumption of finite-rank signals, appropriate noise estimates can be obtained straightforwardly, for example from the median data singular value (see e.g. [28]), this is no longer the case in the extensive regime that we study. In empirical contexts in which one has access to multiple noisy instantiations of the same underlying signal, however, a robust estimate of the noise variance may be readily available.
Other recent work has also studied estimation problems in the extensive-rank regime. Reference [50] studied the distribution of pairwise correlations in the extensive regime. Reference [41] studied optimal denoising under a known, factorized extensive-rank prior, and arrived at the same shrinkage function we find for the special case of Gaussian i.i.d. noise (55). References [43] and [42] studied both denoising and matrix factorization (dictionary learning) with a known, extensive-rank prior.
Lastly, during the writing of this manuscript, the preprint [44] presented work partially overlapping with ours. They derived the subordination relation for the resolvent of Hermitianizations as well as the optimal rotationally invariant data denoiser, and additionally established a relationship between the rectangular spherical integral and the asymptotic mutual information between data and signal. However, unlike our work, this contemporaneous work: (1) does not calculate the optimal estimator of the signal covariance; (2) does not explore the phase diagram of the extensive spike model and its associated conceptual insights about the decoupling of singular value phases from singular vector detectability, which occurs at extensive but not finite rank; (3) does not extensively explore numerically the inaccuracy and inferior data-denoising and signal-estimation performance of the finite-rank model compared to the extensive-rank model, a key motivation for the extensive-rank theory; (4) at a technical level, follows the approach of [33], using a decoupled replica approach that yields an annealed approximation, whereas we prove the annealed approximation is accurate using results from concentration of measure; and (5) also at a technical level, employs the rectangular spherical integral, resulting in rectangular R-transforms, whereas we introduce the block spherical integral yielding the block R-transform, thereby allowing us to obtain simpler formulas.
We close by noting that our results for optimal estimators depend on the assumption of rotational (orthogonal) invariance. Extending this work to derive estimators for extensive-rank signals with structured priors is an important topic for future study. The rectangular subordination relation and the resulting formulas for singular vector distortion due to additive noise hold for arbitrary signal matrices. These may prove to be of fundamental importance from the perspective of signal estimation in the regime of high-dimensional statistics, as any attempt to estimate the structure of a signal in the presence of noise must overcome both the distortion of the signal's singular value spectrum and the deformation of the signal's singular vectors.

The result depends only on R X , the block R-transform of the noise matrix X, and the block-wise norms of the vector v. Introducing the two-element vector t, whose a th entry is (1/N a )∥v a ∥ 2 , and in anticipation of a saddle-point condition below, we write H X (t) as a contour integral within C 2 from 0 to t, where ⊙ is the element-wise product.
In order to decouple v from t, we introduce integration variables and Fourier representations of the delta-function constraints δ(N a t a − ∥v a ∥ 2 ). In the resulting expression we introduce the diagonal N × N matrix T , which has t 1 along the first N 1 diagonal elements followed by t 2 along the remaining N 2 elements.
The integral over v is a Gaussian integral with inverse covariance zI − T − Y (which is positive-definite for sufficiently large z). Crucially, this covariance is exactly G Y (z − t), the block resolvent of Y with a shifted argument, z − t. Note that the block resolvent, as a function of two complex numbers, has emerged naturally in our calculation.
The result is the inverse square root of the determinant. Thus, ignoring proportionality constants, we have an exponential of P X,Y (t, t) := −β 1 t 1 t 1 − β 2 t 2 t 2 + H X (t), where, we remind the reader, β a := N a /N .
We expect this integral to concentrate around its saddle point in the large-size limit. Taking the derivative of P X,Y (t, t) with respect to t a gives the following appealing saddle-point condition for t: t = R X (t) (B15). In order to take the derivatives with respect to t a , we find it helpful to write out the expression explicitly; taking the derivative of (B14) then gives the final saddle-point condition, which we can write concisely in vector notation. Thus, asymptotically, the desired free energy is obtained. Informally, to derive the matrix subordination relation, we differentiate F R , which, from (B14), yields zI − T − Y for all Hermitian test matrices A with bounded spectrum, as written informally in the main text.

For sufficiently large z, the matrix in the determinant is always positive, and this is a smooth function on SO(N ), bounded above and below by constants c ± := log(z ± (∥A∥ op + ∥B∥ op )), where ∥•∥ op is the operator norm. For such z, we prove the Lipschitz bound below (see Section B 0 a for the proof), where µ := π ∥B∥ op e −c − and ∥•∥ 2 is the Euclidean norm. In particular, we will be interested in the case that the orthogonal matrix O is block diagonal with blocks O a ∈ SO(N a ), so that O is a member of the product space SO(N 1 ) × SO(N 2 ) with N 1 + N 2 = N . The group SO(N a ) with Haar measure and Hilbert-Schmidt metric obeys a logarithmic Sobolev inequality with constant 4/(N a − 2), so the product space has Sobolev constant max a 4/(N a − 2). This shows that the limiting annealed average is less than or equal to the limiting quenched average. We could obtain a lower bound via a similar argument, but we have directly, via Jensen's inequality, that the quenched average is less than or equal to the annealed average. To prove (B23), note that the gradient of f (B22) involves M = z − A + OBO T as above; the ordinary Euclidean norm of the gradient then follows.

FIG. 1 .
FIG. 1. Background: Signal-Detectability Phase Transition in the Finite-Rank Spiked Matrix Model. A. The color plot in the bottom panel shows the singular value spectrum of the spiked matrix model given by the finite-rank theory in the asymptotic limit with aspect ratio c = N 1 /N 2 = 0.7 (see Appendix A for formulas). The singular value of the data matrix, R, is on the x-axis, and the strength of the single non-zero singular value of the signal matrix, Y , is on the y-axis. The "bulk" spectrum of the data is identical to the spectrum of the noise matrix, X. The bounds of that spectrum are the vertical dashed grey lines. Above the critical signal, s crit = c 1/4 (black horizontal), the data has an outlier singular value shown as a solid red curve. The top two panels show theory curves corresponding to two horizontal slices, with s = 0.85, 1.5, together with a histogram of singular values, each of a single instantiation with N 2 = 2000. The top panel has a single outlier singular value very close to the theory prediction. The panel below shows a data spectrum that is indistinguishable from noise. B. The overlap of the top left singular vector of the data with the left singular subspace of the signal, given by the finite-rank theory. The overlap becomes non-zero at exactly the same critical signal, s crit , at which an outlier singular value appears in the data. X is Gaussian i.i.d. with variance 1/N 2 throughout.

FIG. 2 .
FIG. 2. Finite-Rank Theory Fails To Capture the Singular Value Structure of the Spiked Rectangular Matrix Model. A. Singular-value inflation, i.e. leading data singular value ŝ 1 as a function of signal singular value s 1 , for spikes of various ranks K. Black shows the finite-rank theory (which is independent of the rank of the spike). Matrix size in this and all subsequent panels is N 1 = 1000, N 2 = 500. Numerical results are presented as mean and standard deviation over 10 instantiations for each value of b and s. B. Singular-vector rotation, i.e. overlap of the leading data left singular vector, û 1 , with the K-dimensional left singular space of the signal, U 1 . C. To break the degeneracy of the spikes with rank K > 1 in panel A, here we consider a single leading signal spike with singular value s, along with K − 1 spikes drawn independently and uniformly in [0, s]. We plot the leading data singular value as a function of s for various K, compared to the finite-rank theory (black). D. For the same signal model as in panel C, we plot the overlap of the leading data left singular vector with the leading signal singular vector as a function of signal strength s, for different K and for the finite-rank theory (black). We see that the finite-rank theory incorrectly estimates both the singular values and singular vectors of signals of even moderate rank K. See Appendix A for the finite-rank theory formulas.

Since u a is an eigenvector of G Y aa (ζ 1 , ζ 2 ) with eigenvalue ζ b /(ζ 1 ζ 2 − s 2 ), where b = 2 for a = 1 and b = 1 for a = 2, we find

FIG. 3 .
FIG. 3. Singular Vector Overlaps of Sums of Gaussians. The color plot (bottom) shows the theoretical prediction of the logarithm of the overlap, log 10 Φ 1 (ŝ, s) (Eq. (43a)), between the left singular vectors of R = Y + X and those of Y , as a bivariate function of the associated singular values ŝ of R and s of Y . The dashed line is the identity, ŝ = s. Here the signal Y and noise X are both rectangular Gaussian matrices with aspect ratio c = N 1 /N 2 = 3/2. Elements of Y are i.i.d. with variance σ 2 y /N 2 , where σ 2 y = 3. Elements of X are i.i.d. with variance σ 2 x /N 2 , with σ 2 x = 1. The top three panels show singular vector overlaps for 3 horizontal slices associated with 3 fixed "target" signal singular values s = 1.5, 2.5, 3.5, for 10 realizations of random matrices with N 1 = 1500 and N 2 = 1000. Each grey dot denotes an overlap between a left singular vector of R with singular value ŝ (position on x-axis) and the left singular vector of Y with singular value closest to s. Red dots reflect binning the singular values of R from all 10 realizations, with the number of bins set to √N 2 , giving bin width ≈ 0.13. Blue is the theoretical prediction from Φ 1 (ŝ, s) in (43a). Note that as the signal singular value s increases, Φ 1 (ŝ, s) as a function of ŝ becomes more concentrated about a value larger than s. This reflects the fact that singular vector structure in the signal Y at singular value s is mapped to singular vector structure in the data R at larger singular values ŝ, due to singular value inflation under the addition of noise X.

FIG. 4 .
FIG. 4. Signal-Strength Transition in the SV Density of the Extensive Spike Model. Each row of the bottom color map shows theoretical predictions for the singular value density of the extensive spike model, ρ R 1 (ŝ), corresponding to different signal strengths s (along the y-axis) at a fixed rank ratio of b = K/N 1 = 0.25. In all panels in this figure, the aspect ratio is fixed to c = N 1 /N 2 = 0.7. Features of the finite-rank spike model are shown as lines for comparison. The horizontal black dashed line indicates the threshold signal strength s crit above which the finite-rank model acquires an outlier singular value. The red curve indicates the position of this outlier singular value. The vertical grey dashed lines indicate the edges of the bulk spectrum of the finite-rank model. The top 3 panels, corresponding to horizontal slices of the color map, plot the singular value density at 3 different signal strengths, s = 0.8, 1.5, and 2. Solid blue curves indicate theoretical predictions from numerically solving (46), while grey histograms indicate the spectral density from a single realization with N 2 = 2000. For comparison, the red vertical spike indicates the position of the outlier singular value in the finite-rank theory, while the grey dashed spike indicates the edge of the noise bulk in this theory. Together these panels demonstrate that as s increases, the singular value density first undergoes a crossover from a unimodal to a bimodal regime, and then a phase transition from a connected to a disconnected phase.

For strong signal s (Fig. 6, top panel), the overlap of the data singular vectors with the true signal subspace is reasonably approximated by the finite-rank theory [22]. However, for moderate signals (Fig. 6, second panel) the data singular vectors interact, competing for the signal subspace. Singular vectors associated with the leading edge of the signal bulk have higher subspace overlap with the signal, while those at the lower edge overlap less. Perhaps most intriguingly, even for weak signals below the finite-rank phase transition at s = s_crit, the top data singular vectors still overlap significantly with the signal subspace (Fig. 6, third panel). Note that this overlap is nontrivial and O(1) even when the singular value spectrum of the data is in the unimodal bulk regime.

FIG. 5. Rank-Ratio Transitions in the SV Density of the Extensive Spike Model. Each row of the bottom color maps shows theoretical predictions for the singular value density ρ_1^R(ŝ) corresponding to different rank ratios b (y-axis) at a fixed signal strength; all color maps share the same color scale. The top 3 panels show matching theory (blue curves) and empirics of a single realization (grey histograms) for 3 rank ratios b = 0.05, 0.25, 0.5; the aspect ratio is fixed to c = N1/N2 = 0.7 with N2 = 2000. Comparisons to the finite-rank theory are shown using the same conventions as in Fig. 4. A. Results for s = 1.8, illustrating that for sufficiently strong signal the singular value density remains in the disconnected phase for all values of b. B. Results for s = 1.55, illustrating that for intermediate signal strengths the density undergoes a transition from disconnected to connected as the rank ratio b increases. C. Results for s = 0.9, illustrating that for subthreshold signals the density remains connected for all b. Throughout, s_crit = c^{1/4} ≈ 0.915.

FIG. 6. Singular Vector Overlaps in the Extensive Spike Model. Each row of the bottom color plot shows the theoretical prediction for the overlap of a left singular vector, with singular value ŝ, of the data matrix R = Y + X with the entire K-dimensional signal subspace of Y (i.e., the squared norm of the projection of the singular vector onto this subspace). The prediction is given by bΦ1(ŝ, s), using (43) after numerically solving for g_1^R(ŝ) from (46). Different rows along the y-axis correspond to different signal strengths s for Y. Comparisons to the finite-rank theory are shown using the same conventions as in Fig. 4. The top 3 panels show horizontal slices for s = 0.75, 1.4, 2.0. Solid grey curves indicate the singular value density of the data matrix R in the extensive-rank model. Blue dots indicate numerical calculations of the overlap for a single realization with N2 = 2000. Solid light blue lines through the blue dots indicate matching theoretical predictions for this overlap. For comparison, the horizontal dashed line indicates the overlap predicted by the finite-rank theory (which depends only on s and not ŝ). The 3rd panel indicates that the signal subspace of Y is detectable in the top data singular vectors of R, even at small signal strengths s below the transition in the singular value density of R from unimodal to bimodal. The aspect ratio is c = N1/N2 = 0.7, while the rank ratio for Y is fixed at b = K/N1 = 0.1.

FIG. 8. Optimal Denoising of Extensive Spikes. A. Each row of the bottom color map shows the optimal shrinkage function ϕ*(ŝ) (55) for denoising data from the extensive spike model. Different rows on the y-axis correspond to different rank ratios b = K/N1 of the signal Y, while the signal strength of Y is fixed at s = 1.8 and the aspect ratio is fixed at c = 0.7. The top 3 panels show horizontal slices with b = 0.25, 0.1, 0.01. Blue (darker) curves indicate the optimal shrinkage function for the extensive-rank model, while orange (lighter) curves indicate the optimal shrinkage function for the finite-rank model ([29], Eq. (7)), which does not depend on b. These panels indicate that the optimal shrinkage function for the extensive-rank model balances singular values more than that of the finite-rank model by more (less) aggressively shrinking larger (smaller) singular values. B. Comparison of mean-square error in rectangular data denoising of K spikes, as a function of K for fixed signal strength s = 2, using the optimal shrinkage function for the finite-rank model (orange) versus that of the extensive spike model (blue). Even at small spike numbers of K = 10 for N1 = 1000 by N2 = 500 sized data matrices, the extensive denoiser outperforms the finite-rank denoiser, and at larger K the extensive (finite-rank) denoiser gets better (worse). C. Empirical shrinkage, and comparison of the optimal shrinkage functions for two different errors: in blue (darker), denoising the rectangular signal matrix, ϕ*(ŝ) (55), and in green (lighter), estimating the N1 × N1 signal covariance matrix, ψ*(ŝ) (61), for signal strength s = 1.5 and rank ratio b = 0.1, with aspect ratio c = 0.7. Lighter curves show the theoretical optima found using Eq. (46). Darker dots show empirical shrinkage obtained via kernel estimation (see Appendix F) of the block Stieltjes transform from the data singular values with N2 = 2000. D. Comparison between the multiplicative model (spiked covariance) and the additive model (spiked rectangular model) with K = 50. Top: eigenvalue spectra of the data covariance (RR^T) for the multiplicative and additive models. Bottom: optimal shrinkage for covariance estimation under the wrong model; the data spectrum is generated by the additive model, and the shrinkage function of the multiplicative model ([31], Eq. (13)) is compared against that of the correct, additive model. E. Mean-square error in covariance estimation as a function of K using the multiplicative model (orange) vs the correct, additive model (blue). Throughout D and E, s = 2 with N1 = 1000 and N2 = 1500, and the multiplicative model is displayed in orange (lighter) and the additive model in blue (darker).
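The general shrinkage-based denoising pipeline used throughout this figure can be sketched as follows. Note that `phi_hard` below is a crude hard-threshold stand-in of our own, not the optimal ϕ*(ŝ) of Eq. (55), which requires solving (46) numerically; sizes are reduced from the figure's for speed.

```python
import numpy as np

rng = np.random.default_rng(2)

def denoise(R, phi):
    """Apply a scalar shrinkage function phi to the singular values of R."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return U @ np.diag(phi(s)) @ Vt

# Hypothetical hard-threshold shrinker (stand-in for the optimal phi*).
# Threshold sits just above the noise bulk edge 1 + sqrt(c) for c = N1/N2 = 2.
def phi_hard(s, threshold=2.5):
    return np.where(s > threshold, s, 0.0)

# Rank-K signal plus i.i.d. Gaussian noise, as in the additive spike model.
N1, N2, K, s_sig = 200, 100, 10, 2.0
U, _ = np.linalg.qr(rng.normal(size=(N1, K)))
V, _ = np.linalg.qr(rng.normal(size=(N2, K)))
Y = s_sig * U @ V.T
R = Y + rng.normal(scale=1 / np.sqrt(N2), size=(N1, N2))

Y_hat = denoise(R, phi_hard)
mse_raw = np.mean((R - Y) ** 2)      # error of the raw data
mse_den = np.mean((Y_hat - Y) ** 2)  # error after shrinkage
```

Replacing `phi_hard` by the extensive-rank ϕ*(ŝ), the finite-rank shrinker of [29], or the covariance shrinker ψ*(ŝ) reproduces the comparisons in panels A through E.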
FIG. S1. Singular Value Density of the Extensive Spike Model in the Low-Rank Limit. The color plot shows the singular value density of the extensive spike model with aspect ratio c = N1/N2 and signal strength s = 1.5. The rank ratio, b = K/N1, varies along the y-axis. The top 3 panels show horizontal slices for b = 0.001, 0.005, 0.01, together with empirical histograms of individual model instantiations with N2 = 2000. The extensive-rank theory converges to the finite-rank theory as b gets small.

Consider the $N_2$ singular values $s_m$ of $Y$ (including $N_2 - N_1$ zeros when $N_2 > N_1$). Then $zI - Y - T$ decouples into $2 \times 2$ blocks of the form
$$\begin{pmatrix} z - t_1 & -s_m \\ -s_m & z - t_2 \end{pmatrix},$$
which allows us to write
$$\det(zI - Y - T) = \prod_m \left[(z - t_1)(z - t_2) - s_m^2\right].$$
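As a sanity check, the decoupling can be verified numerically for a small square $Y$ (a sketch under our own assumptions: the hermitization $M = \begin{pmatrix} 0 & Y \\ Y^T & 0 \end{pmatrix}$, $T = \mathrm{diag}(t_1 I, t_2 I)$, and the particular values of $z, t_1, t_2$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

N = 4
z, t1, t2 = 5.0, 0.3, 0.7
Y = rng.normal(size=(N, N))
s = np.linalg.svd(Y, compute_uv=False)  # singular values s_m of Y

# Hermitized matrix M and block-diagonal T.
M = np.block([[np.zeros((N, N)), Y], [Y.T, np.zeros((N, N))]])
T = np.diag([t1] * N + [t2] * N)

# det(zI - M - T) should equal the product of the 2x2 block determinants,
# since conjugating by block_diag(U, V) from the SVD Y = U S V^T decouples
# the matrix into 2x2 blocks [[z - t1, -s_m], [-s_m, z - t2]].
lhs = np.linalg.det(z * np.eye(2 * N) - M - T)
rhs = np.prod((z - t1) * (z - t2) - s**2)
```

Here `lhs` and `rhs` agree to numerical precision, confirming the factorization.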

$^{-1} = G_Y(z - \mathcal{R}(t^*))$. But we argued above that $\frac{d}{dY} F_R(Y) = G_R(z)$, which gives the subordination relation. More formally, consider a Hermitian test matrix $A$ with a bounded spectral distribution, and observe that $\frac{1}{N}\frac{d}{dy}\log\det(M + yA) = \tau\!\left[A M^{-1}\right]$. Substituting $Y \to Y + yA$ into the expression for $F_R$ (B14) and differentiating, we find $\lim_{N\to\infty} \tau\!\left[A\, \mathbb{E}_O\!\left[G_R(z)\right]\right] = \lim_{N\to\infty} \tau\!\left[A\, G_Y\!\left(z - \mathcal{R}_X(t^*)\right)\right]$. Taking the normalized block-wise traces of both sides yields
$$g_R(z) = g_Y\!\left(z - \mathcal{R}_X(t^*)\right). \tag{B20}$$
Comparing to (B18), we have $t^* = g_R(z)$, and (B20) becomes the subordination relation for the block Stieltjes transform. Substituting $t^* = g_R(z)$ into (B19), we obtain the desired resolvent relation.

Proof that the Annealed Free Energy Asymptotically Equals the Quenched Free Energy

Suppose $A, B$ are Hermitian matrices with bounded spectra. Define the function
$$f(O) := \frac{1}{N} \log\det\!\left(zI - A + OBO^T\right), \tag{B22}$$
for arbitrary orthogonal $O \in SO(N)$.

By Jensen's inequality, $\frac{1}{N}\mathbb{E}_O\!\left[\log\det(M)\right] \le \frac{1}{N}\log\mathbb{E}_O\!\left[\det(M)\right]$, so in the limit they are equal:
$$\lim_{N\to\infty} \frac{1}{N}\mathbb{E}_O\!\left[\log\det(M)\right] = \lim_{N\to\infty} \frac{1}{N}\log\mathbb{E}_O\!\left[\det(M)\right].$$
$M^{-2}$ and $OB^2O^T$ are positive definite Hermitian matrices, so $\mathrm{Tr}\!\left[M^{-2} OB^2O^T\right] \le N\, \|B\|_{op}^2 \|M^{-2}\|_{op}$.