Phys. Rev. E 98, 052120 (2018)

Data-driven diagnosis for compressed sensing with cross validation

We propose a data-driven procedure for diagnosing the results of compressed sensing as success or failure. Compressed sensing is an efficient data-acquisition method widely used in experimental physics. Many previous studies demonstrated that compressed sensing works well if sparse modeling can be applied, that is, if the physical phenomena of interest can be described by only a few explanatory factors. However, it is difficult to confirm the assumption of sparse modeling for individual instances in advance because we do not know the true representation of the instance being investigated or its degree of sparseness. To overcome this difficulty, we examined a statistical tool called cross validation, in which all available data are randomly divided into two subsets and the accuracy with which a sparse representation estimated from the data in one subset describes the data in the other subset is evaluated as an error. In particular, we focused on the dependence of the cross-validation error on the size ratio of the two subsets. Our analysis, inspired by statistical mechanics, showed that the cross-validation error asymptotically follows a power law of the size ratio when the total amount of available data is exactly at the critical point between success and failure. Hence, compressed-sensing results can be diagnosed in a data-driven manner by comparing the behavior of the cross-validation error with the power law.


I. INTRODUCTION
In today's era of data-driven science, compressed sensing (CS) is becoming increasingly common in experimental physics across disciplines. CS is a statistical method that reduces measurement time and thereby allows experiments to be conducted more efficiently [1,2]. A key idea for finding the correct solution in spite of insufficient data is sparse modeling, in which it is assumed that physical phenomena have a sparse representation; that is, they are described by only a small number of explanatory factors. The research history of CS can be traced back to exploration geophysics in the 1970s [3,4]. Modern theoretical work, in which it was mathematically proved that CS works perfectly under the assumption of sparse modeling [5-8], has expanded the scope of application of CS to magnetic resonance imaging [9,10], ghost imaging [11,12], quantum state tomography [13,14], nuclear magnetic resonance spectroscopy [15,16], and black-hole observation with radio interferometry [17,18]. More recently, the authors and their collaborators applied CS to a quasiparticle interference experiment with scanning tunneling microscopy and scanning tunneling spectroscopy [19].

Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article's title, journal citation, and DOI.
In this article, we address an oft-overlooked but fundamental problem: detecting false results of CS caused by very small datasets. It is well known that CS exhibits a threshold behavior: it can succeed if the amount of data is above a critical value and fails otherwise [5,6]. In statistical mechanics, this phenomenon can be interpreted as a phase transition from failure to success of CS with an increasing amount of data, and its phase diagram and phase boundary have been extensively analyzed [20-25]. The phase boundary indicates the minimum amount of data necessary for the success of CS. In practice, however, no method is available to determine whether the given data are sufficient because the minimum amount of data depends on the degree of sparseness of the phenomenon being estimated. Thus, it is important to diagnose the results of CS in a data-driven manner, without any prior knowledge of the true representation of the phenomenon or its sparseness. To develop a data-driven diagnosis method, we examined an orthodox statistical technique called cross validation (CV), in which all available data are divided into training data and validation data, and the accuracy with which a sparse representation estimated from only the training data describes the validation data is evaluated as an error [26,27]. We show that, when the total amount of data is at the critical value between success and failure of CS, the CV error (CVE) asymptotically decreases according to a power law with respect to the ratio of the sizes of the training and validation datasets. We thus propose CV-based diagnostics that assess the results of CS without prior knowledge of the true sparseness of the phenomenon, in which the asymptotic behavior of the CVE is monitored and compared with the power law.
The rest of the article is organized as follows. In Sec. II, the two main concepts, compressed sensing and cross validation, are mathematically formulated. In Sec. III, we describe how to analyze the cross-validation error using statistical mechanics. In Sec. IV, our method is proposed and its performance is discussed through numerical experiments. Section V concludes the article.

II. FORMULATION

A. Compressed sensing
The purpose of CS is to accurately obtain the true representation of the target phenomenon x^0 = (x^0_1, ..., x^0_N) ∈ R^N, which contains only a few nonzero components, from a small set of data samples Y = {y_μ | μ ∈ M}, where M = {1, 2, ..., M} and M < N. Each component of the data is acquired in the form of an inner product, y_μ = ⟨a_μ, x^0⟩ ≡ Σ_i a_{μi} x^0_i, where a_μ = (a_{μ1}, ..., a_{μN}) ∈ R^N is a measurement vector. The set of measurement vectors is denoted by A = {a_μ | μ ∈ M}. Basis pursuit is a representative estimation algorithm in noise-free CS [28]. The basis-pursuit estimator x̂_BP is defined as

x̂_BP = argmin_x ‖x‖_1 subject to y_μ = ⟨a_μ, x⟩ for all μ ∈ M,  (1)

where ‖·‖_1 is the l_1-norm of a vector, defined by ‖x‖_1 = Σ_i |x_i|. The optimization problem in Eq. (1) can be solved with linear programming. Note that the equality constraints alone do not yield a unique solution because the size of the dataset Y is smaller than the dimension of the representation x. However, l_1-norm minimization leads to the true representation, that is, x̂_BP = x^0, if it is sufficiently sparse.
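As a concrete illustration, the l_1 minimization in Eq. (1) can be cast as a linear program by introducing auxiliary variables t_i ≥ |x_i| and minimizing Σ_i t_i. The following sketch solves basis pursuit on a toy instance; it assumes NumPy and SciPy are available, and the problem sizes and random seed are arbitrary illustrative choices, not values from the article.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve min ||x||_1 subject to A x = y via linear programming.

    Variables are stacked as [x, t] with constraints -t <= x <= t,
    so minimizing sum(t) minimizes the l1-norm of x.
    """
    M, N = A.shape
    c = np.concatenate([np.zeros(N), np.ones(N)])   # objective: sum of t
    I = np.eye(N)
    A_ub = np.block([[I, -I], [-I, -I]])            # x - t <= 0 and -x - t <= 0
    b_ub = np.zeros(2 * N)
    A_eq = np.hstack([A, np.zeros((M, N))])         # equality constraints A x = y
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
                  bounds=[(None, None)] * N + [(0, None)] * N)
    return res.x[:N]

# Toy instance: N = 6, one nonzero component, M = 4 < N measurements.
rng = np.random.default_rng(0)
N, M = 6, 4
x0 = np.zeros(N)
x0[2] = 1.5                                         # sparse true representation
A = rng.normal(0.0, 1.0 / np.sqrt(N), size=(M, N))  # Gaussian measurement vectors
y = A @ x0
x_bp = basis_pursuit(A, y)
```

Because x^0 is itself feasible, the minimizer always satisfies ‖x̂_BP‖_1 ≤ ‖x^0‖_1 up to solver tolerance, and for a sufficiently sparse x^0 the two coincide.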
Many theoretical studies have been conducted on the performance of CS with basis pursuit. In an influential study, it was proved using geometry that, in the large-size limit N → +∞, basis pursuit obtains the true representation exactly if the ratio of the number of samples to the dimension of the representation, α = M/N, exceeds a critical value [5,6]. The critical value α_c is given by a formula, Eq. (2), expressed in terms of ρ, the degree of sparseness of the true representation, and integrals ∫ dz φ(z) of the standard Gaussian density φ(z), in the case of random Gaussian measurement vectors A [29], which will be analyzed later in this study. These results, indicating a phase transition of CS with respect to the amount of data, were discussed and confirmed from the viewpoint of statistical mechanics [20-22]. However, the formula for α_c, Eq. (2), is of little practical use because it depends on ρ. In practical situations, the sparseness ρ of the true representation is not known in advance; consequently, it cannot be judged whether the data already acquired are sufficient for CS.
In other representative work, the performance of CS was analyzed using a mathematical concept called the restricted isometry property [7,30]. For each integer K, the restricted isometry constant δ_K of a measurement matrix A = (a_1^T, ..., a_M^T)^T is defined as the smallest number such that

(1 − δ_K) ‖x‖_2^2 ≤ ‖Ax‖_2^2 ≤ (1 + δ_K) ‖x‖_2^2

holds for all K-sparse vectors x. The transpose of matrix A is denoted by A^T. A vector is said to be K-sparse if it has at most K nonzero components. Using the restricted isometry constant δ_{2K} of A, it has been shown that if δ_{2K} < √2 − 1, all K-sparse vectors are guaranteed to be recovered by basis pursuit [30]. This sufficient condition has subsequently been improved several times [31-35]. However, these conditions are no more useful in practice than the aforementioned formula for α_c because whether the true representation x^0 is K-sparse is not known in advance.
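Computing δ_K exactly is intractable for large matrices, since it requires scanning all size-K supports, but for tiny examples it can be brute-forced, which makes the definition concrete. A minimal sketch with NumPy follows; the example matrix is an arbitrary toy choice with orthonormal columns, for which δ_K = 0 by construction.

```python
import numpy as np
from itertools import combinations

def rip_constant(A, K):
    """Brute-force the restricted isometry constant delta_K.

    delta_K is the smallest delta such that
    (1 - delta)||x||^2 <= ||A x||^2 <= (1 + delta)||x||^2
    for every K-sparse x; equivalently, the worst deviation from 1
    of the eigenvalues of A_S^T A_S over all supports S of size K.
    """
    N = A.shape[1]
    delta = 0.0
    for S in combinations(range(N), K):
        sub = A[:, list(S)]                       # columns restricted to support S
        eig = np.linalg.eigvalsh(sub.T @ sub)     # spectrum of the Gram matrix
        delta = max(delta, eig.max() - 1.0, 1.0 - eig.min())
    return delta

# A matrix whose columns are orthonormal satisfies delta_K = 0 for all K <= N.
A = np.eye(4)[:, :3]
```

For generic Gaussian measurement matrices, δ_K grows with K, which is why the recovery guarantee δ_{2K} < √2 − 1 limits how large K may be for a given M.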

B. Cross validation
Our proposed method for data-driven diagnosis is based on CV, one of the most frequently used statistical tools for parameter tuning and model selection [26,27]. For example, the regularization parameter of the least absolute shrinkage and selection operator (LASSO) used in noisy CS, which is not a topic tackled in this study, is often determined by applying CV [36]. The procedure of CV is formulated as follows. The dataset Y is divided into a validation dataset Y_V = {y_μ | μ ∈ V} and a training dataset Y_T = {y_μ | μ ∈ M \ V}, where V is a nonempty proper subset of M. An algorithm such as basis pursuit is then applied to the training dataset Y_T to obtain an estimate x̂_T of the representation. Note that CV is a versatile tool applicable to other estimation algorithms, though only the use of basis pursuit is discussed hereafter. The CVE is then evaluated by comparing x̂_T with Y_V. In this study, the CVE is defined as the mean-squared error

CVE = (1 / M_V) Σ_{μ ∈ V} ( y_μ − ⟨a_μ, x̂_T⟩ )²,

where M_V is the size of Y_V. Considering the fluctuation in the values of the CVE, the CV procedure is repeated many times with different partitions of the same dataset into training and validation subsets. We stress that the calculation of the CVE requires neither the true representation nor its sparseness; therefore, CV-based methods satisfy the demand of being data-driven.
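The split-train-validate round above can be sketched as follows. This is a minimal illustration with NumPy; the estimator is passed in as a callable so that basis pursuit or any other algorithm can be plugged in, and the sanity check uses a hypothetical oracle estimator (one that returns the truth), not a method from the article.

```python
import numpy as np

def cv_error(A, y, validation_idx, estimate):
    """One cross-validation round.

    A, y           : measurement matrix and data samples
    validation_idx : indices mu in V held out for validation
    estimate       : callable (A_train, y_train) -> x_hat, e.g. basis pursuit
    Returns the mean-squared error on the validation set.
    """
    mask = np.zeros(len(y), dtype=bool)
    mask[validation_idx] = True
    x_hat = estimate(A[~mask], y[~mask])     # estimate from training data Y_T only
    resid = y[mask] - A[mask] @ x_hat        # compare the estimate with Y_V
    return np.mean(resid ** 2)

# Sanity check with an oracle estimator in the noise-free setting:
rng = np.random.default_rng(1)
N, M = 20, 12
x0 = np.zeros(N)
x0[[3, 7]] = [1.0, -2.0]                     # sparse true representation
A = rng.normal(0.0, 1.0 / np.sqrt(N), size=(M, N))
y = A @ x0
cve = cv_error(A, y, validation_idx=[0, 1, 2], estimate=lambda At, yt: x0)
```

In practice the round is repeated over many random partitions V and the CVE values are aggregated; varying the size of V changes the training-to-validation size ratio studied below.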

III. ANALYSIS
We analyzed the CVE in the large-size limit, N → +∞, when basis pursuit is used as the estimation algorithm. The typical value of the CVE is expressed as an expected value over both the estimate and the problem instance. The angular brackets ⟨·⟩_{x̂_T} denote thermal averaging with respect to the Boltzmann distribution

P(x) ∝ ∏_{μ ∈ M \ V} δ( y_μ − ⟨a_μ, x⟩ ) e^{−β ‖x‖_1},

in which, compared with the original optimization in Eq. (1), the δ functions represent the equality constraints and the Boltzmann factor represents the l_1-norm minimization in the low-temperature limit β → +∞. The square brackets [·]_{Y,A,x^0} denote configurational averaging with respect to {Y, A, x^0}. Our analysis deals with a simple noise-free case in which each component of the measurement vectors, a_{μi}, is an independent and identically distributed Gaussian random variable with mean zero and variance N^{−1}, and each component of the true representation, x^0_i, is independently generated from the sparse prior distribution

P(x^0_i) = (1 − ρ) δ(x^0_i) + ρ N(x^0_i; 0, σ_x²),

where N(x; 0, σ_x²) denotes a Gaussian density. The parameters ρ and σ_x² represent the degree of sparseness and the magnitude of the nonzero components, respectively. For convenience, we introduce a free-energy function defined through the partition function Z, the normalization constant of the Boltzmann distribution above; the CVE is then expressed as a partial derivative of the free energy. Evaluating the configurational average [ln Z]_{Y,A,x^0} is technically involved and is carried out with the replica method developed in the field of spin-glass theory [37].
The replica method is based on an identity called the replica trick:

[ln Z]_{Y,A,x^0} = lim_{n → +0} (1/n) ln [Z^n]_{Y,A,x^0}.

The replica trick replaces [ln Z]_{Y,A,x^0} with ln [Z^n]_{Y,A,x^0}, and the latter is easier to calculate than the former when n is a positive integer. The value in the limit n → +0 is extrapolated from the resultant values at positive integers n. The extrapolation, however, assumes that the functional relation of ln [Z^n]_{Y,A,x^0} to positive integers n remains valid for arbitrary real n, which is not always mathematically justified. This is why the replica trick is called a trick. A lengthy but straightforward calculation, explained in Sec. S-I of the supplemental material [38], gives an analytic result for the CVE together with saddle-point equations (SPEs) with respect to the order parameters (Q, χ, Q̂, χ̂), in terms of which auxiliary quantities such as θ_0 are determined. In our analysis, the replica-symmetry ansatz was assumed, and the validity of the analytic result will be confirmed later through numerical analysis.

FIG. 1. Typical values of the cross-validation error plotted against k for ρ = 0.1. The solid line (magenta), indicating an asymptotic power law at large k, represents the critical case, α = α_c ≈ 0.3288. Each dashed line (red) approaching a nonvanishing value in the limit of large k represents a failure case below criticality, α = 0.30, 0.31, 0.32. Each dashed line (blue) decaying exponentially to zero represents a success case above criticality, α = 0.33, 0.34, 0.35.

IV. DISCUSSION
We examined the dependence of the CVE on the parameter k, the ratio of the size of the training dataset to that of the validation dataset, which is derived from Eqs. (12) and (13). Figure 1 plots CVE values against k in log-log scale. At a glance, regardless of the value of α, a larger k corresponds to a smaller CVE. This trend is natural because a larger k implies that the training dataset Y_T is larger, so the estimate x̂_T in the CV procedure is expected to be closer to the true representation x^0. To develop data-driven diagnostics, we focus on the manner in which the CVE decreases with k: for α > α_c, the CVE converges exponentially to zero with respect to k, whereas for α < α_c it remains finite at large values of k.
We analyzed the critical case α = α_c in more detail. In the limit k → +∞, the solution of the SPEs (13) converges to that of the SPEs from which Eqs. (2) and (3) are derived in [20]. Thus, the leading-order terms of the Taylor expansion of Eq. (13a) around (θ_1, θ_0) = (0, θ_c) are relevant to obtaining an asymptotic formula for the CVE. A similar calculation for Eqs. (13a) and (13b) after substitution of Eqs. (13c) and (13d) shows that θ_1 is asymptotically proportional to 1/k. Therefore, the CVE has a power-law relation to k with exponent −2 when the amount of data is on the phase boundary, α = α_c. This indicates that the results of CS can be diagnosed as success or failure by applying CV at various values of k and comparing the decreasing trend of the CVE with the power law.

As mentioned above, we performed numerical analysis to confirm the theoretical results obtained under the replica-symmetry ansatz. The numerical results are shown in Figs. 2 and 3. According to the settings, we generated numerous triplets {Y, A, x^0} and calculated the CVE by applying basis pursuit to {Y, A}. The median over 5000 instances of {Y, A, x^0} was plotted to account for the expectation denoted by [·]_{Y,A,x^0}. Extrapolation to the large-size limit was performed by linear regression of the CVE using a simple model of its system-size dependence, fitted with the least-squares method. Error bars represent the standard error evaluated using the bootstrap method. The analytic solution is close to the extrapolated value; therefore, the results of our analysis with the replica-symmetry ansatz can be considered valid.

We also conducted numerical experiments to show that our proposed method works well in practical cases of finite N for individual instances, whereas the theoretical results from the replica method describe a typical case of infinite N averaged over instances. The results for two instances of {Y, A, x^0} with parameters N = 300, ρ = 0.1, σ_x² = 1, and α = 0.3 are shown in Fig. 4. Each left panel shows a set of data samples Y, from which a sparse representation is estimated using basis pursuit. Judging from the results of CS shown in the middle panels, the top and bottom instances represent success and failure cases, respectively. We need to determine whether our proposed CV-based method can distinguish between success and failure properly in a data-driven manner. Following our proposal, CVE values, which are calculated from the data alone, are plotted against k in log-log scale in the right panels. The top instance shows an exponential decay of the CVE that is faster than the power law with exponent −2, consistent with the success of CS. In contrast, in the bottom instance of failure, the CVE remains finite at the largest k that can be examined when M = 90.
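The diagnostic itself reduces to checking the decay of the CVE against the critical power law k^{−2}, for example by fitting the slope in log-log scale. The following is a minimal sketch with NumPy; the tolerance on the slope is an illustrative choice, not a value prescribed by the article, and the synthetic data are generated exactly on the power law.

```python
import numpy as np

def diagnose(k_values, cve_values, tol=0.2):
    """Compare the decay of CVE(k) with the critical power law k^(-2).

    Fits a straight line in log-log scale; a slope steeper than -2
    suggests success of CS, a shallower slope (or plateau) suggests
    failure, and a slope near -2 indicates the critical case.
    """
    slope, _ = np.polyfit(np.log(k_values), np.log(cve_values), 1)
    if slope < -2.0 - tol:
        return "success", slope
    if slope > -2.0 + tol:
        return "failure", slope
    return "critical", slope

# Synthetic check: data lying exactly on the power law CVE = 0.5 * k^(-2).
k = np.array([2.0, 4.0, 8.0, 16.0, 32.0])
cve = 0.5 * k ** -2.0
label, slope = diagnose(k, cve)
```

With real CVE data, the fit would be restricted to the largest accessible k values, where the asymptotic behavior derived above applies.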
In Sec. S-II of the supplemental material [38], we show the results for 200 other instances with various system sizes from N = 100 to 1000. For small system sizes, a few instances were found to be difficult to diagnose by our method because it is ambiguous whether the convergence of the CVE is faster or slower than the power law. Practically, in such a case we should regard the results of CS as failure and acquire more data so as not to draw a hasty conclusion. In addition, the results of the diagnosis were not always consistent with the true results of CS. A closer look reveals that all the inconsistent instances were diagnosed as failure by our method even though CS itself succeeded. Our method thus makes a conservative diagnosis because, in CV, part of the available data must be set aside for validation. Conversely, our method never failed to detect a hidden failure of CS, and it protects us from meaningless or misleading discussion derived from insufficient data. As a whole, our method worked well; in particular, all of the instances with N ≥ 800 were correctly diagnosed as far as we examined. Therefore, we conclude that our proposed CV-based method performs well in data-driven diagnosis for CS.

V. CONCLUSION
We have proposed a data-driven framework based on CV to diagnose the results of CS as success or failure even when the true representation is unknown. The framework notifies us when the amount of data has just exceeded the critical threshold and helps improve experimental design so as to maximize the utilization of experimental resources. In addition, our framework is expected to provide a criterion for choosing among the numerous CS algorithms, such as basis pursuit, orthogonal matching pursuit [39,40], iterative hard thresholding [41,42], and subspace pursuit [43]. As is often the case, different algorithms yield different results even when the same dataset is analyzed, but our framework enables us to distinguish which results can be trusted by evaluating the relation of the CVE to the parameter k for each algorithm [44]. Moreover, our framework should be widely applicable to many other formulations of CS. A representative formulation is matrix completion [45,46], which is useful for quantum state tomography in quantum information science [13,14] and for phase retrieval in x-ray crystallography [47]. Regarding computational cost, the cross validation used intensively in our proposed method is time-consuming because it involves many repetitions of estimation from various training datasets. Future work on developing efficient cross-validation methods is needed, and the approach of generalized cross validation would be promising [36,48-50]. Statistical methodologies such as CS deserve further discussion from the viewpoint of statistical mechanics, including phase transitions and critical phenomena. It is important not only to point out the existence of a phase transition in information processing but also to establish a practical framework of data-driven diagnosis.

FIG. 4. Results of numerical experiments on two instances, shown in the top and bottom rows, which correspond to success and failure cases of compressed sensing, respectively. In the left panels, filled (green) and open (gray) circles indicate observed and unobserved data, respectively. In the middle panels, estimated and true sparse representations are shown by filled (red) and open (black) circles, respectively. The overlap of corresponding pairs of filled and open circles in the top instance indicates a perfectly accurate estimation. Basis pursuit is applied for estimation from the samples. In the right panels, CVE values are plotted against k in log-log scale. In both instances, the parameters are set to N = 300, ρ = 0.1, σ_x² = 1, and α = 0.3.