Random Projection using Random Quantum Circuits

The random sampling task performed by Google's Sycamore processor gave us a glimpse of the "quantum supremacy" era, and shone a spotlight on the power of random quantum circuits in the abstract task of sampling outputs from (pseudo-)random circuits. In this manuscript, we explore a practical near-term use of local random quantum circuits in the dimensional reduction of large low-rank data sets. We make use of the well-studied dimensionality reduction technique called the random projection method, which has been used extensively in applications such as image processing, logistic regression, and entropy computation of low-rank matrices. We prove that the matrix representations of local random quantum circuits with sufficiently short depths ($\sim O(n)$) serve as good candidates for random projection. We demonstrate numerically that their projection abilities are not far off from the computationally expensive classical principal components analysis on the MNIST and CIFAR-100 image data sets. We also benchmark the performance of quantum random projection against the commonly used classical random projection in the tasks of dimensionality reduction of image datasets and computation of Von Neumann entropies of large low-rank density matrices. Finally, using variational quantum singular value decomposition, we demonstrate a near-term implementation of extracting the singular vectors with dominant singular values after quantum-random-projecting a large low-rank matrix to lower dimensions. All these numerical experiments demonstrate the ability of local random circuits to randomize a large Hilbert space at sufficiently short depths while robustly retaining properties of large datasets in reduced dimensions.


I. INTRODUCTION
Many problems in machine learning and data science involve the dimensional reduction of large data sets with low ranks [1] (e.g., image processing). Dimensional reduction as a preprocessing step reduces the computational complexity of the later stages of processing. Principal Components Analysis (PCA) [2], reliant on Singular Value Decomposition (SVD), is one such method to reduce the dimension of data sets by retaining only the singular vectors with dominant singular values. There are quantum circuit implementations for PCA (and SVD) [3][4][5][6] and for related applications [7], some of which are near-term (Noisy Intermediate-Scale Quantum (NISQ) technologies era [8]) algorithms [6].
Techniques like PCA (and SVD) involve a complexity of $O(N^3)$, where $N$ is the size (or the dimension) of the data vectors. An alternative to such computationally expensive dimensional reduction methods is the random projection method [9][10][11], in which we multiply the data sets with certain random matrices and project them to a lower-dimensional subspace. Recent years have witnessed fruitful usage of an especially thoughtful variant of such random projections, which are known to preserve the distance between any two vectors in the data set (say $\vec{x}_1$ and $\vec{x}_2$) in the projected subspace up to an error that scales as $O(\sqrt{\log(N)/k})$, where $N$ is the original dimension and $k$ is the reduced dimension of each data vector. This choice is motivated by the Johnson-Lindenstrauss lemma (JL lemma) [12], introduced at the end of the last century. Since this manuscript will exclusively use such transformations to validate all the key results, we shall hereafter refer to such candidates as good random projectors. Such projection techniques are beneficial to myriad applications because the preservation of distances between data vectors ensures that their distinctiveness is uncompromised, thereby rendering them usable for discriminative tasks such as classification schemes like logistic regression [13].
Classically, this is advantageous compared to other methods like PCA because the random matrix used for projection is independent of the data set considered. The time complexity involved in random projection arises from the matrix multiplication complexity $O(N^{2.37})$ [14], followed by the usual SVD complexity of $O(k^2\,\mathrm{poly}\log(k))$ on the reduced matrix, making the resulting scheme cheaper than PCA (or SVD). It must be emphasized that a further reduction of the time complexity to $O(N\,\mathrm{poly}\log(N))$ can be afforded using the Fast Johnson-Lindenstrauss transforms [15]. Several candidates have been studied for classical random projection, including Haar random matrices, Gaussian random matrices, etc. But the memory complexity of storing such matrices can be potentially huge (proportional to $N^2$ times the precision of each matrix entry). This has engendered the introduction of several competing candidates with better memory complexity (sparse matrices with random integer entries) and multiplication complexity. The latter category is mainly considered in practical applications today [10] and will also be used to compare against the results of the quantum variants in this manuscript.
Classical random projections performed by using projectors sampled from Haar random unitaries suffer from the innate problem of storage due to their exceptionally high memory usage. Even in the quantum setting, implementing Haar random unitaries requires exponential resources, as shown by counting arguments [16]. As a result, it is natural to consider unitary $t$-designs, which only match the Haar measure up to the $t$-th moment. Quantum implementation of such $t$-designs, as studied in this manuscript, is efficient owing to the fact that local random quantum circuits approach approximate unitary $t$-designs [17][18][19] at sufficiently short depths ($O(t^{10.5}\,\mathrm{poly}\log(N))$) [20,21]. (Here, we have assumed that the number of qubits $n$ required to encode a data vector or a wave vector of size $N$ is $n \sim \log(N)$.) It was shown recently that even shorter depths suffice [22]. The primary workhorse of this manuscript will be based on such quantum circuits, which, as we shall eventually show, not only perform better in accuracy than the standard, more commonly used classical variants but also require fewer single-qubit random rotation gates ($O(\mathrm{poly}(\log N))$) for implementation.
The flow of the paper is as follows. In Sec. II, we begin with an introduction to the JL lemma and how it makes the random projection method effective. This is followed by a brief introduction to the Haar measure and approximate Haar unitaries generated from local random quantum circuits. Then, we explicitly prove that local random quantum circuits which are exact unitary 2-designs can satisfy the JL lemma with the same high probability as Haar random matrices, thereby making them good random projectors. We then extend the results to approximate unitary 2-designs, discuss the bounds on depths needed to achieve a certain error threshold in the JL lemma, and derive a slightly different probability of satisfying the latter. We note that the quantum memory required to store a 2-design or approximate 2-design is $O(\mathrm{poly}(\log N))$, where $N$ is the size of the data vector. It is worth noting that it has been previously shown in Ref. [23] that approximate unitary $t$-designs can be used to satisfy the JL lemma, thus corroborating our assertion that even they are good candidates for random projection. The exponentially low limit obtained in Ref. [23] is better than the limit derived in this paper only for system sizes $N \geq O(10^4)$. For $N \sim O(10^3)$, the limits derived in this manuscript for unitary 2-designs are tighter.
For numerical quantification of the key assertions, we first use the MNIST and CIFAR-100 image data sets [24,25] and show that the quantum random projection preserves distances post-projection not far off from computationally expensive algorithms like PCA (along lines similar to Ref. [11]) and similarly to the classical random projection. This task does not require one to know the singular values or the singular vectors explicitly. We compare the performance of quantum random projection with the commonly used classical random projection technique. Instead of benchmarking it against Haar random matrices generated classically, we make use of classical random projectors whose storage and multiplication are efficient. To this end, we use the Subsampled Randomised Hadamard Transform (SRHT) [15] for different sizes of data sets (1024 and 2048, corresponding to 10 and 11 qubits, respectively). As a second instance, we look at a task that requires us to calculate the singular values of large low-rank data matrices and the singular vectors associated with them. In this regard, we perform the computation of entropies of low-rank density matrices by randomly projecting them to a reduced subspace (along the lines of Refs. [26,27]) to get the dominant singular values post-projection. We also demonstrate that one can construct the simplest quantum random projector by performing quantum random projection and extracting the dominant singular values using the variational quantum singular value decomposition (VQSVD) [6]. Here, random projection to a lower dimension allows us to optimize using a lower-dimensional variational ansatz. The combined effect of the variational nature of the algorithm and the fact that unitary $t$-designs are short-depth establishes good testing grounds for the implementation of this demonstration in near-term devices [8]. These demonstrations highlight the ability of local random circuits to efficiently randomize a large Hilbert space (and hence require exponentially fewer parameters to create a random projector) and serve as good random projectors for dimensionality reduction.

A. Random Projection
The random projection method is a computationally efficient technique for dimensionality reduction and is useful in many problems in data science, signal processing, machine learning, etc. (see, for example, [10,11]). The effectiveness of the method stems from the Johnson-Lindenstrauss lemma [12].

Lemma 1. For any $0 < \epsilon < 1$ and $n \in \mathbb{Z}^+$, let $k \in \mathbb{Z}^+$ satisfy
$$k \geq \frac{4\ln n}{\epsilon^2/2 - \epsilon^3/3}. \tag{1}$$
Then, for any set of $n$ vectors $\{\vec{x}_i\} \subset \mathbb{R}^N$, there exists a map $f : \mathbb{R}^N \to \mathbb{R}^k$ such that, for all pairs $(\vec{x}_i, \vec{x}_j)$,
$$(1-\epsilon)\,||\vec{x}_i - \vec{x}_j||_2^2 \leq ||f(\vec{x}_i) - f(\vec{x}_j)||_2^2 \leq (1+\epsilon)\,||\vec{x}_i - \vec{x}_j||_2^2, \tag{2}$$
where $||\cdot||_2$ refers to the $\ell_2$ norm.
Proof. See Lemma 1 of Ref. [12].

Definition 1: Random projections vs. good random projections
Multiplication with Gaussian or Haar random matrices, along with a scaling factor, followed by projection to a reduced subspace is one function that obeys Eq. 2 [28,29]. Essentially, this follows from the fact that the expected value of the Euclidean distance post-random-projection is equal to the Euclidean distance in the original subspace. And the distances post-random-projection are not distorted beyond a factor of $\epsilon$ with high probability because the variance of the distances post-random-projection is sufficiently low.
From now on, we will consider random projections that satisfy the JL lemma (Eq. 2) to be good random projections. In this regard, the JL lemma says that any set of $n$ points in a high-dimensional Euclidean space (say, $\mathbb{R}^N$) can be embedded into a lower number of dimensions (say, $k = O(\epsilon^{-2}\log n)$) by a random projection, preserving all pairwise distances to within a multiplicative factor of $1 \pm \epsilon$. This is also equivalent to preserving all the pairwise inner products (or angles). Formally,
$$(1-\epsilon)\,||\vec{x}_i - \vec{x}_j||_2^2 \leq ||\Pi\vec{x}_i - \Pi\vec{x}_j||_2^2 \leq (1+\epsilon)\,||\vec{x}_i - \vec{x}_j||_2^2, \tag{3}$$
where $||\cdot||_2$ refers to the $\ell_2$ norm, $\vec{x}_i \in \mathbb{R}^N\ \forall i$, and $\Pi$ denotes the random projection matrix of size $k \times N$ (or $N \times k$, in which case the random matrix multiplies the data vectors from the right). Matrices obeying Eq. 3 will be called good random projectors from now on.
Several other candidates which satisfy the JL lemma have been considered for random projection in various applications. These random matrices include the Subsampled Randomised Hadamard Transform (SRHT) and the Input Sparsity Transform (IST) [10,15,30,31]. These random projectors are database-friendly because, unlike Gaussian or Haar random matrices, whose storage cost is proportional to the number of matrix entries times the precision, they can be realised by matrices that are sparse and have whole-number entries.
For benchmarking the quantum random projection in our analysis later, we will be using the SRHT [32] to compare against the performance of random projection using random quantum circuits. We picked the SRHT because we want to compare random matrices that can be efficiently stored and multiplied: in a classical setting, that is the SRHT, and in a quantum setting, it is the 2-designs that act as quantum random projectors. We construct an SRHT random projector as in Algorithm 1.
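As a concrete illustration, the following sketch builds an SRHT projector $\Pi = \sqrt{N/k}\,R H D$ (random sign flips $D$, normalised Walsh-Hadamard transform $H$, uniform row subsampling $R$) and checks that it roughly preserves a pairwise distance. The function names and the simple dense construction are ours for illustration; practical implementations apply $H$ via the fast transform rather than as a dense matrix.

```python
import numpy as np

def hadamard_matrix(N):
    """Sylvester construction of the N x N Walsh-Hadamard matrix (N a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < N:
        H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
    return H

def srht_projector(N, k, seed=None):
    """k x N SRHT projector: Pi = sqrt(N/k) * R @ H @ D."""
    rng = np.random.default_rng(seed)
    D = np.diag(rng.choice([-1.0, 1.0], size=N))     # random sign flips
    H = hadamard_matrix(N) / np.sqrt(N)              # orthonormal Hadamard
    rows = rng.choice(N, size=k, replace=False)      # uniform row subsample
    return np.sqrt(N / k) * (H @ D)[rows, :]

# JL-style sanity check: a pairwise distance is roughly preserved
rng = np.random.default_rng(0)
N, k = 1024, 256
Pi = srht_projector(N, k, seed=1)
x, y = rng.normal(size=N), rng.normal(size=N)
orig = np.linalg.norm(x - y)
proj = np.linalg.norm(Pi @ (x - y))
distortion = abs(proj - orig) / orig
print(distortion)   # small relative distortion
```

On real data one never forms $H$ densely; the fast Walsh-Hadamard transform applies it in $O(N \log N)$ time, which is the source of the SRHT's speed advantage.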

B. Approximate Unitary 𝑡-designs
In the next section, we will show that random matrices sampled uniformly from the Haar measure satisfy the JL lemma.
Though exact replication of Haar random unitaries is not possible in a quantum circuit, because they require exponential resources [16], we will show that exact or approximate 2-designs [19], which match the Haar measure only up to the second moment, suffice to satisfy the JL lemma. We will introduce the definitions related to approximate $t$-designs in this section and provide theorems on approximate $t$- (or 2-) designs satisfying the JL lemma.

Definition 1: Moment Operator
The $t$-th moment superoperator with respect to a probability distribution $\nu(U)$ defined on the unitary group $U(N)$ is defined as
$$M_t^{(\nu)}(\rho) = \int_{U(N)} U^{\otimes t}\, \rho\, \left(U^{\dagger}\right)^{\otimes t}\, d\nu(U),$$
where $d\nu(U)$ is the volume element of the probability distribution $\nu(U)$.

Definition 2: Exact Unitary 𝑡-design
Let us define $\Delta_t^{(\nu)}(\cdot)$ [31][33][34] as
$$\Delta_t^{(\nu)}(\cdot) = M_t^{(\nu)}(\cdot) - M_t^{(\mu_H)}(\cdot),$$
where $\mu_H$ refers to the uniform distribution over the Haar measure. Unitaries $U$ sampled from a distribution $\nu(U)$ are said to form an exact unitary $t$-design iff $\Delta_t^{(\nu)} = 0$. This essentially means that $\nu(U)$ mimics the Haar measure up to the $t$-th moment.

Definition 3: 𝛼 Approximate unitary 𝑡-designs
A distribution $\nu(U)$ over the unitary group $U(N)$ is said to form an $\alpha$-approximate unitary $t$-design iff
$$\left\|\Delta_t^{(\nu)}\right\|_{\diamond} \leq \alpha,$$
where $\|\cdot\|_{\diamond}$ refers to the diamond norm (see, for example, [22]). Though the $\alpha$-approximate unitary design definition here involves the diamond norm, formulations using other norms exist [35], and the theorems in the following section generalize to those formulations as well. Local random quantum circuits of length $O(\log(N)(\log(N) + \log(1/\alpha)))$ become $\alpha$-approximate 2-designs [22].
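A standard numerical diagnostic for how close an ensemble is to a $t$-design is the frame potential $F_t = \mathbb{E}_{U,V}\,|\mathrm{Tr}(U^{\dagger}V)|^{2t}$, which equals $t!$ (for $t \leq N$) exactly on a unitary $t$-design and exceeds it otherwise. The sketch below is our own illustration, assuming nothing beyond this definition: it Monte-Carlo estimates $F_2$ for Haar samples (generated via the QR decomposition) and recovers a value near $2! = 2$; substituting samples from a candidate circuit ensemble tests how well it approximates a 2-design.

```python
import numpy as np

def haar_unitary(N, rng):
    """Sample from the Haar measure on U(N) via QR with phase correction."""
    Z = (rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))

def frame_potential(samples, t=2):
    """Monte-Carlo estimate of F_t = E |Tr(U^dag V)|^(2t) over distinct pairs."""
    M = len(samples)
    vals = [abs(np.trace(samples[i].conj().T @ samples[j])) ** (2 * t)
            for i in range(M) for j in range(M) if i != j]
    return float(np.mean(vals))

rng = np.random.default_rng(7)
ensemble = [haar_unitary(8, rng) for _ in range(200)]
F2 = frame_potential(ensemble, t=2)
print(F2)   # close to 2! = 2, the exact 2-design value
```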

III. RANDOM QUANTUM CIRCUITS AS RANDOM PROJECTORS
In this section, we show that local random quantum circuits which are approximate unitary 2-designs (or exact unitary 2-designs) are suitable candidates for random projection (and will be called quantum projectors from now on). We show that quantum projectors satisfy the Johnson-Lindenstrauss lemma, so that their random projection is an $\ell_2$ subspace embedding with a very high probability of having a very low error. And if one were to compute specific quantities like entropy, one should quantify whether such random matrices produce projected singular values that are close to the true singular values with high probability. In a later section, we discuss how the projection can be done on real quantum computers and how the reduced-dimensional vectors and their singular values can be read out from near-term quantum computers. In the following theorems, we denote the Haar measure distribution as $\mu_H$ and the distribution corresponding to an $\alpha$-approximate $t=2$ design as $\nu_{2,\alpha}$. Proofs of the theorems can be found in Appendix A.

Proof. See Appendix A
It is worth mentioning that upper bounds on the distortion with exponential scaling have been obtained before [23], namely $2^{4}\exp(-2^{-4}k\epsilon^{2})$ for the Haar measure and $2^{10}\exp(-2^{-10}k\epsilon^{2})$ for approximate $t$-designs. These limits are better than the limits obtained here only for $N, k > 10^{4}$. For the cases explored in this paper, our limits are tighter than the exponential limits. For the plots in the experiments section, we use the ansatz of [36], which is assumed to be an exact 2-design ansatz beyond a certain depth (Fig. 1). The main text contains the depths at which the ansatz matches the exact 2-design limit ($D \sim 150$). There are many candidate local random circuit architectures which are $\alpha$-approximate 2-designs [22]. Instead of studying the projection abilities of different local random circuit architectures, Appendix E contains some experiments where we look at a less expensive ansatz, and hence one in the approximate unitary 2-design regime, obtained by choosing a lower depth ($D \sim 50$) of the same circuit (analogous to [34]) (Fig. 1).
FIG. 1. The local random quantum circuit used in preparing a quantum random projector is the ansatz of [36], which is known to converge to the exact 2-design limit of the variance of the local cost function beyond a certain depth. The circuit contains a layer of $R_y(\pi/4)$ rotations (often used to make all directions symmetric in a variational training procedure; we do not necessarily need this component). This is followed by alternating random single-qubit rotations and ladders of CPHASE operations, repeated $D$ times. For $n = 10, 11$ (the dimensions studied in this paper), the circuit reaches the exact 2-design limit (variance limit) at $D \geq 150$.
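A minimal dense-matrix sketch of this kind of ansatz is given below. The decomposition of the random single-qubit rotations into $R_z R_y$ with uniform angles, and all function names, are our assumptions for illustration; the exact gate set of Ref. [36] may differ. The final line checks that the assembled matrix is unitary.

```python
import numpy as np

def kron_all(ops):
    """Kronecker product of a list of single-qubit operators."""
    out = np.array([[1.0 + 0j]])
    for op in ops:
        out = np.kron(out, op)
    return out

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rz(theta):
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

def cphase(n, ctrl, tgt):
    """Diagonal CPHASE (CZ) between qubits ctrl and tgt on n qubits."""
    U = np.eye(2 ** n, dtype=complex)
    for b in range(2 ** n):
        if (b >> (n - 1 - ctrl)) & 1 and (b >> (n - 1 - tgt)) & 1:
            U[b, b] = -1.0
    return U

def random_circuit_unitary(n, depth, rng):
    """Fig. 1-style ansatz: an Ry(pi/4) layer, then `depth` blocks of
    random single-qubit rotations followed by a ladder of CPHASEs."""
    U = kron_all([ry(np.pi / 4)] * n)
    for _ in range(depth):
        layer = [rz(rng.uniform(0, 2 * np.pi)) @ ry(rng.uniform(0, 2 * np.pi))
                 for _ in range(n)]
        U = kron_all(layer) @ U
        for q in range(n - 1):                     # CPHASE ladder
            U = cphase(n, q, q + 1) @ U
    return U

rng = np.random.default_rng(42)
U = random_circuit_unitary(4, 20, rng)
print(np.allclose(U.conj().T @ U, np.eye(16)))     # unitarity check: True
```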

IV. EXPERIMENTS ON QUANTUM RANDOM PROJECTORS
In this section, we consider two different experiments to benchmark the performance of the quantum random projection discussed in the previous section against the SRHT projection, which will be labeled classical random projection in the plots. This should be viewed as a comparison of random projectors that can be stored and applied efficiently, in terms of memory and time complexity, in the classical vs. the quantum setting. Since quantum random projectors approximate the Haar measure, their projection abilities are expected to be better than those of the SRHT projectors, because the latter are less random than the Haar measure. However, in certain applications, it is known that both converge to similar performance as the size of the data set tends to infinity [29]. We see in Appendix D that their performances start becoming closer when we increase the size of the data matrices and vectors from 1024 to 2048 (corresponding to 10 and 11 qubits, respectively).
We initially consider a task that does not require us to know the singular values and is concerned only with dimensionality reduction. In this regard, we reduce the dimensions of the MNIST [24] and CIFAR-100 [25] image datasets and benchmark the performance of quantum random projection against classical random projection. We also compare it with the computationally expensive principal components analysis (PCA), which gives the exact projection onto the dominant singular vectors of the datasets and cannot be outperformed beyond a certain rank.
In the second task, we calculate the Von Neumann entropy of low-rank density matrices (along the lines of [26]), which requires us to know the singular values after random projection in addition to performing the dimensionality reduction. We compare the performance of quantum random projection (QRP) vs. classical random projection (CRP) for this task over different ranks ($r$) of the density matrices.
In this section, we pick the local random quantum circuit from Fig. 1, and we assume that we can make arbitrary projection operators with any number of basis vectors; i.e., if the operator $P_k$ projects onto the first $k$ basis states $(|b_1\rangle, |b_2\rangle, \dots, |b_k\rangle)$ in some basis, then
$$P_k = \sum_{i=1}^{k} |b_i\rangle\langle b_i|,$$
where we do not have any restriction on which basis we pick or what values $k$ can take. In a later section, we discuss the simplest projection operator one can construct by measuring one or more qubits and restricting to particular outputs (0 or 1) on those qubits, as shown in Fig. 2. It is worth noting that this scheme has a structure similar to that of quantum autoencoders [37], but the circuit here is data-agnostic.

A. Dimensionality reduction of Image data sets
In this subsection, we benchmark the performance of QRP against CRP in the task of dimension reduction of subsets of two different image datasets, MNIST and CIFAR-100. We also plot the performance of the computationally expensive PCA, which is supposed to capture all the singular vectors with nonzero singular values. When the reduced dimension is greater than the rank of the system, PCA can never be outperformed.
MNIST contains 28x28 grayscale images. The matrix representations of the images were boosted to 32x32 by adding zeros so that they can be reshaped into 1024x1 normalised vectors. (Note that this is not a common quantum encoding scheme. We use QRP on the normalized data vectors for a direct comparison with CRP.) We have to do this preprocessing step because the projectors that we consider (both CRP and QRP) are of the form $k \times 2^n$ and hence take only $2^n$-dimensional vectors as input.

FIG. 2. Schematic of the quantum random projection. The data vector has to be encoded into the circuit through one of the existing encoding schemes (see main text). This is followed by the local random quantum circuit and either partial measurements (the number of qubits measured depends on how low the final reduced dimensions are) or an arbitrary projection operator. For partial measurements, the algorithm proceeds only if the measurement leaves the measured qubits in only 0 (or only 1). This is equivalent to reducing the data set's size by 1/2, 1/4, 1/8 and so on, depending on how many qubits are measured.
The CIFAR-100 images are colored and had to be converted to 32x32 grayscale so that they can be fed as input to our projectors. Unlike MNIST, which contains handwritten digits from 0 to 9, the CIFAR-100 dataset contains images belonging to 100 different classes, including airplanes, automobiles, birds, cats, trucks, etc. As a result, CIFAR-100 is expected to have more features and hence greater rank compared to MNIST when we consider subsets from each of these datasets.
To perform the comparison between CRP and QRP, we took 1000 images from each of these datasets. In each of these subsets, we reshaped the images into 1024x1 normalized vectors and performed random projection to lower dimensions (x-axis of Fig. 3). Then, we randomly sampled two data vectors and compared the error percentage in the $\ell_2$ norm (Euclidean distance) between them in the original space and in the reduced-dimensional space obtained after random projection. This procedure is repeated 10,000 times, and the mean error percentages and their 95% confidence intervals for different reduced dimensions have been reported in the plots of Fig. 3.
The random projections are performed by multiplying the vectors with random matrices (see Algorithm 1 and Algorithm 2), where $\Pi_S$ is the SRHT projector and $U$, $P_k$ are the sampled matrix representation of the local random quantum circuit and the projector used. The PCA projection is obtained by first computing the singular value decomposition of the dataset and projecting onto the subspace of dominant singular vectors.
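The experiment above can be emulated at the matrix level as follows. This sketch is our own simplification: a Haar random unitary stands in for the circuit-generated projector, Gaussian vectors stand in for the image vectors, and the error percentage is the same metric reported in Fig. 3.

```python
import numpy as np

def haar_unitary(N, rng):
    Z = (rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))

def qrp(U, k, x):
    """Quantum-style projection: rotate by U, keep the first k amplitudes,
    and rescale by sqrt(N/k) so distances are preserved in expectation."""
    N = U.shape[0]
    return np.sqrt(N / k) * (U @ x)[:k]

rng = np.random.default_rng(3)
N, k, trials = 256, 64, 2000
U = haar_unitary(N, rng)
errs = []
for _ in range(trials):
    x, y = rng.normal(size=N), rng.normal(size=N)  # stand-ins for image vectors
    d0 = np.linalg.norm(x - y)
    d1 = np.linalg.norm(qrp(U, k, x) - qrp(U, k, y))
    errs.append(100 * abs(d1 - d0) / d0)
mean_err = np.mean(errs)
print(mean_err)   # mean % distance error after projection
```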
FIG. 3. Projection errors on the MNIST and CIFAR-100 image datasets using different schemes. The plots show the mean percentage errors in the distance between 10,000 different random pairs of data vectors in the MNIST and CIFAR-100 data sets. The envelopes represent 95% confidence intervals. We see that PCA outperforms the random projection methods beyond a certain rank. Among the random projection methods, though there is not much difference between classical random projection (CRP) and quantum random projection (QRP), we observe that the latter performs slightly better.
Fig. 3 shows that PCA outperforms the random projection methods beyond a certain rank. This is because, beyond the rank of the dataset considered, PCA projects exactly onto the subspace with nonzero singular values. Despite that, we see that the random projection methods, which are not computationally expensive (because they do not compute the subspace with nonzero singular values), perform to the same extent as, and even better than, PCA at lower reduced dimensions. This dominance at lower reduced dimensions is visible in larger-dimensional datasets (see Appendix D). We also see that PCA takes more reduced dimensions to catch up with the random projection algorithms in the case of CIFAR-100 because it has a comparatively higher rank (loosely, because it has more features) than the MNIST dataset. The data vectors dimensionally reduced via quantum random projection could be used in quantum machine learning applications such as training an image recognition/classification model (see, for example, [38]).
Within the random projection methods, quantum random projection performs slightly better than classical random projection, mainly because Haar random matrices are more random and have tighter JL lemma bounds than the classical random projector. The performance of quantum random projectors away from the exact 2-design limit is analyzed in Appendix E by looking at shorter depths ($\sim 50$) of the circuit in Fig. 1 and hence a less expressive ansatz.
The discussion in this section assumed the existence of an exact amplitude encoding scheme for the data vectors. This would require impractical depths of $O(2^n)$ unless the data vectors are genuinely quantum, e.g., ground states of a family of local Hamiltonians. However, for general data vectors like image data vectors, we do not necessarily need exact encoding. Preserving the distinctness of image data vectors $(\vec{x}, \vec{y})$ to good enough accuracy enables us to use them for many image processing applications, such as recognition and classification. In this regard, there has been substantial work on approximate amplitude encoding. These schemes encompass approximately encoding data vectors whose amplitudes are all positive [39], real [40], and even complex [41] using shallow parametrized quantum circuits.
With the plots in Fig. 3, we showed that for exactly encoded data vectors $(\vec{x}, \vec{y})$ and quantum-random-projected vectors $(\vec{\tilde{x}}, \vec{\tilde{y}})$,
$$\left|\, |\vec{x} - \vec{y}| - |\vec{\tilde{x}} - \vec{\tilde{y}}|\, \right| \leq \delta \tag{13}$$
on average for pairs of images in the data set used. Here $\delta$ is a very small fraction of $|\vec{x} - \vec{y}|$. A good approximate amplitude encoding scheme is bound to preserve this distance with minimal error, since it preserves the distinctness of the samples as well, i.e. (calling $\vec{x}_a, \vec{y}_a$ the approximately encoded vectors),
$$\left|\, |\vec{x} - \vec{y}| - |\vec{x}_a - \vec{y}_a|\, \right| \leq \Delta \tag{14}$$
on average. Here $\Delta$ is a small fraction of $|\vec{x} - \vec{y}|$.
With Eqs. 13 and 14, it is clear that even with approximate amplitude encoding, quantum random projection would preserve the distinctness of samples (up to a perturbation of $\delta + \Delta$) and be useful for image processing applications. The exact value of $\Delta$ depends on the efficiency of the approximate encoding used.
The other alternative for circumventing the impractical depths of exact data encoding is to adopt different encoding schemes. One can start by reducing the resolution of the images (equivalent to reducing the number of pixels), which results in a reduced classical image data vector dimension (say, $d < 2^n$), and use any other existing data encoding scheme that uses a qubit register of dimension greater than $d$ but with polynomial depth (for example, [42], [43]).
Let $\Phi(\cdot)$ be the encoding function that takes the original data vector and encodes it as a data vector of dimension $2^n$. Then, to check how well the distinctness is preserved, experiments need to be run on the $n$ qubits with a quantum random projector corresponding to $n$ qubits. Mathematically, we need to check how low the following values are (on average) for two data vectors $\vec{x}, \vec{y}$ from the original dataset:
$$\left|\, |\vec{x} - \vec{y}| - |\Phi(\vec{x})_r - \Phi(\vec{y})_r|\, \right|, \tag{15}$$
where $\Phi(\vec{x})_r, \Phi(\vec{y})_r$ are the reduced, randomly projected encoded vectors.
In this work, we confined ourselves to experiments involving an exact encoding scheme, despite its impractical depths, because the preservation of distance for the exact encoding scheme implies the same for the approximate encoding schemes, as described earlier. Checking the preservation of distance for other encoding schemes would require knowing the exact form of the encoding in Eq. 15.
Just like its classical counterpart, quantum random projection also allows us to reconstruct the images back to the original size after the projection. For classical methods, for a data vector $\vec{x}$ and its reduced data vector $\vec{\tilde{x}} = \Pi\vec{x}$, the reconstruction is
$$\vec{x}_{rec} = \Pi^{\dagger}\vec{\tilde{x}} = \Pi^{\dagger}\Pi\,\vec{x}.$$
For the quantum case, we need to put back the extra qubits (or the subspace to which we projected) to return to the original size and then apply the inverse of the unitary circuit used for projection. For example, if we had projected onto the subspace where one of the qubits is in the $|0\rangle$ state, we boost the size back to the original by introducing a new qubit in $|0\rangle$ and applying the inverse unitary circuit ($U^{\dagger}$) on this combined system. For a general projector, $|\tilde{x}\rangle \to |\tilde{x}\rangle \otimes |p\rangle$, where the tensor product with $|p\rangle$ ensures that we get back the original size of the data (image), and the reconstruction is done as
$$|x_{rec}\rangle = U^{\dagger}\left(|\tilde{x}\rangle \otimes |p\rangle\right).$$
This is similar to the reconstruction done in [37]. These reconstructions work on the premise that the product $\Pi^{\dagger}\Pi \sim \mathbb{I}$, the identity of the original data dimension. It is trivial to see that this holds for the PCA projector. It turns out that this also holds for the random projectors: in a larger-dimensional space, finding almost orthogonal vectors becomes more common; this has been studied in [44] and was used in the discussion of [11]. Fig. 4 shows how one would reconstruct an image from the MNIST dataset after dimensionally reducing a subset of the MNIST images.
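The reconstruction step can be illustrated numerically. In this sketch (our own idealization, with a Haar unitary standing in for the circuit), the reconstruction error equals the norm of the component of the rotated vector lost to the discarded amplitudes, which for a generic unit vector is roughly $\sqrt{1 - k/N}$ and vanishes as $k \to N$.

```python
import numpy as np

def haar_unitary(N, rng):
    Z = (rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))

rng = np.random.default_rng(5)
N = 256
U = haar_unitary(N, rng)
x = rng.normal(size=N)
x /= np.linalg.norm(x)                 # normalised "image" vector

errs = []
for k in (64, 128, 224, 256):
    P = np.eye(N)[:k]                  # keep the first k amplitudes
    # project, pad the discarded amplitudes with zeros, then undo U
    x_rec = U.conj().T @ (P.T @ (P @ (U @ x)))
    errs.append(np.linalg.norm(x_rec - x))
    print(k, errs[-1])                 # error shrinks roughly like sqrt(1 - k/N)
```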
With the reconstructed quantum data vectors, there are processing applications that have a computational advantage over classical processing. For example, the complexity of the quantum edge detection algorithm [45] is polynomial and does not require exponential resources if we have an image encoded either exactly or approximately. The measurement outputs of the edge detection algorithm contain information about the edges. To get the outputs of this experiment, one can also adopt the classical shadows approach [46,47] to obtain the probabilities of all the bit strings with fewer measurements than full-state tomography.

B. Entropy estimation of low-rank density matrices
In this subsection, we compare the performance of the quantum random projectors against the classical random projector (SRHT) on a task that requires one to obtain the approximate singular values of the dominant singular vectors of a data matrix after dimensionality reduction. Unlike the previous task, this one is concerned with reducing the dimensions of a large data matrix, rather than individual data vectors, by random projection. After the dimensionality reduction, we check how well the system captures the properties of the dataset by computing the error percentage in a particular property of the data matrix that requires knowledge of all its singular values. Specifically, we consider randomly generated positive semidefinite density matrices with random singular vectors whose singular values follow a certain profile. We then compute their entropy after quantum random projection and check its accuracy (along the lines of [26]). The exact singular value profile of these density matrices depends on the nature of the system. In this experiment, we consider singular values that decay linearly and exponentially up to the rank of the system and are zero afterward. These profiles can be motivated by the existence of physical systems that exhibit them. A thermal ensemble of a simple harmonic oscillator mode of frequency $\omega$ with $N$ internal degrees of freedom has an exponentially decaying profile; here, the singular values of $\rho$ are proportional to $1, e^{-\beta\hbar\omega}, e^{-2\beta\hbar\omega}, e^{-3\beta\hbar\omega}, \dots$ and so on. And it is known that the maximal second-order Rényi entropy ensemble of a system with a simple harmonic oscillator mode of frequency $\omega$ and $N$ internal degrees of freedom follows a linearly decaying singular value profile for its density matrix [48]. The main text contains the plots for the linearly decaying singular profile, and Appendix C contains the plots for the exponential decay profile.
Here is the procedure to perform random projection given a semi-positive definite matrix  of dimension  ×  • Project the original density matrix of size  ×  to a lower dimension  ×  using Π  and Π • Perform SVD (classical) or QSVD (quantum) on the lower dimensional matrix to get the singular vectors with singular values p1 , p2 , p3 ,.... p which are approximations to  1 ,  2 , ...  .
• Then we obtain an approximation to the entropy using $\tilde{S} = \sum_i \tilde{p}_i \ln(1/\tilde{p}_i)$.

The accuracy of the approximated entropy is bounded by the following theorem.

Theorem IV.1. For a random matrix $\Pi$ satisfying the Johnson-Lindenstrauss (JL) lemma with a distortion $\epsilon/\sqrt{r}$, where $\epsilon \leq 1/6$ and $\delta \leq 1/2$, the difference between the Von Neumann entropy of a density matrix $\rho$ computed using the random projection with $\Pi$ (denoted $\tilde{S}(\rho)$) and the true entropy ($S(\rho)$) can be bounded as follows with probability at least $(1 - \delta)$.

Proof. See Appendix A.
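As a numerical sanity check of the procedure above, the following sketch uses a classical stand-in for the quantum projector: a $k \times N$ matrix with orthonormal rows scaled by $\sqrt{N/k}$, playing the role of the rows of the random circuit's unitary. It projects a rank-10 density matrix with a linearly decaying spectrum and compares the approximate entropy to the true one (all names here are illustrative, not from the paper's code):

```python
import numpy as np

rng = np.random.default_rng(1)
N, k, r = 1024, 512, 10

# rank-r density matrix with a linearly decaying spectrum (as in the text)
s = np.linspace(1.0, 1.0 / r, r)
s /= s.sum()
q, _ = np.linalg.qr(rng.normal(size=(N, r)))
rho = (q * s) @ q.T

# k x N projector: orthonormal rows rescaled by sqrt(N/k)
Q, _ = np.linalg.qr(rng.normal(size=(N, k)))
Pi = np.sqrt(N / k) * Q.T

rho_small = Pi @ rho @ Pi.T                      # k x k reduced matrix
p = np.linalg.svd(rho_small, compute_uv=False)   # approximate singular values
p = p[p > 1e-12]                                 # drop numerical zeros

S_approx = np.sum(p * np.log(1.0 / p))           # entropy from the projection
S_true = np.sum(s * np.log(1.0 / s))             # exact Von Neumann entropy
print(S_true, S_approx)                          # the two agree closely
```

With $k = N/2$ and a small rank $r$, the relative error in the entropy is typically at the few-percent level, consistent with the trends plotted in Fig. 5.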
Fig. 5 shows the error percentage in the computed entropy after random projection for density matrices of size 1024 × 1024 with linearly decaying singular values up to a certain rank ($r$ = 10, 50, 100, 400 in the plot) and zero afterward. The x-axis represents the different reduced dimensions ($k$). The accuracies are better for low ranks, as expected, and get worse for larger ranks. We observe that the quantum random projector and the classical random projector perform to similar extents (if not with better quantum than classical performance for density matrices with very low rank) in this task. This matches the trends reported in [26], where similar performance was reported for various other classical random projection matrices such as Gaussian, SRHT, and IST. We also show, in Fig. 6, the accuracies with which the quantum and classical random projectors capture the singular values of the system for rank $r$ = 10 when the system's size has been reduced by half.

FIG. 6. The plot shows the accuracy with which the quantum random projector and the classical random projector pick out the singular values of the density matrix for $r$ = 10 ($N$ = 1024, $k$ = 512) when reducing the system size by half. The envelope represents 95% confidence intervals obtained by running the experiments over 10,000 randomly generated density matrices.

Here, we see that the quantum random projectors perform better than their classical counterpart, mainly because the Haar random matrices that the random circuits approximate are more random than any classical random projector that could be stored with similar or comparable complexity. The appendix contains a discussion of how the accuracy improves when we increase the size of the original datasets from 1024 × 1024 to 2048 × 2048.
We discuss the same plots for density matrices with an exponentially decaying singular-value profile up to a certain rank in Appendix C. We observe there that increasing the rank does not change the singular-value profile much, and hence the accuracy of the random projection algorithms remains roughly constant. Appendix E contains the error plots for the accuracy in individual singular values for the case $r$ = 10 obtained using less expressive random circuits (depth ∼ 50).

V. HOW TO PROJECT IN A REAL QUANTUM COMPUTER?
For the quantum random projection to work, in addition to sampling a unitary from exact (or approximate) 2-designs, we also need a circuit component for the projection operators. In one of the previous sections, we considered arbitrary projection operators, which may not be efficiently implementable in a quantum computer with polynomial resources. However, we can consider the simplest projection operations usable for quantum random projection. In Fig. 2, we looked at the simplest such operation: measuring some of the qubits and proceeding only if those qubits are in a certain state (|0⟩ or |1⟩). This is equivalent to projecting onto the subspace where those qubits take that specific value. For example, in a circuit of 10 qubits, measuring one of the qubits and proceeding only when that qubit is in |0⟩ reduces the data vector (or ket) dimension from 1024 to 512. However, projection through measurement differs from the classical projection in that a quantum measurement (wavefunction collapse) automatically takes care of the normalization factor, so the extra $\sqrt{N/k}$ prefactor is not needed. Since the Hilbert space considered here is large and the ket entries are randomized, the normalization due to wavefunction collapse matches the prefactor we would apply in a classical random projection.
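The measure-and-postselect projection described above can be sketched on a simulated statevector as follows (a minimal illustration, assuming the measured qubit is the most-significant one, so that its |0⟩ branch occupies the first half of the amplitude vector):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10                                       # 10 qubits -> dimension 1024

# a random normalized 1024-dimensional ket
psi = rng.normal(size=2**n) + 1j * rng.normal(size=2**n)
psi /= np.linalg.norm(psi)

# amplitudes with the first qubit in |0> occupy the first half of the vector
kept = psi[: 2 ** (n - 1)]
prob0 = np.linalg.norm(kept) ** 2            # probability of the |0> outcome
psi_reduced = kept / np.linalg.norm(kept)    # collapse renormalizes for free

print(len(psi_reduced), prob0)               # dimension 512; prob0 near 1/2
```

Note that no explicit $\sqrt{N/k}$ rescaling appears: for a randomized ket, the measurement probability is close to $k/N$, so the renormalization step plays exactly the role of the classical prefactor, as argued in the text.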
To demonstrate quantum random projection with a simple projection operation, we consider projection operators of the form $\frac{1}{2}(1+\sigma_z^{(i)})$, the projector onto the subspace where the $i$-th qubit is in the |0⟩ state. To demonstrate this, we perform such a quantum random projection on a large data matrix of size 1024 × 1024 with a linearly decaying singular-value profile and rank $r$ = 5, reducing the data vectors to sizes 512, 256, and 128 by projecting out 1, 2, and 3 qubits, respectively. Then we retrieve the dominant singular vectors by performing a variational quantum singular value decomposition (VQSVD) [6]. But since the data matrix has been dimensionally reduced, the ansatz we use for finding the right singular vectors is also of reduced size (Fig. 7). Details regarding the implementation of VQSVD and the ansatz type used can be found in Appendix F.

FIG. 7. The figure shows the schematic of the variational quantum SVD after quantum random projection to lower dimensions. The data matrix needs to be loaded using a set of unitary gates with techniques like importance sampling (see the related discussion in the appendix of [6]). Similar to the setup in Fig. 2, we perform projection by measuring a few qubits at the top. This is followed by a training procedure to obtain the dominant singular vectors and their singular values. The singular vectors on the right end belong to the lower-dimensional space and hence require a lower-dimensional ansatz.
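In operator form, the projector $\frac{1}{2}(1+\sigma_z^{(i)})$ can be built explicitly as a tensor product; the following sketch (our own illustration, not the paper's code) constructs it and confirms that it is idempotent and halves the dimension:

```python
import numpy as np

sz = np.diag([1.0, -1.0])   # Pauli-Z

def qubit0_projector(n, i):
    """(1 + sigma_z)/2 on qubit i of an n-qubit register, identity elsewhere."""
    op = np.ones((1, 1))
    for q in range(n):
        factor = (np.eye(2) + sz) / 2 if q == i else np.eye(2)
        op = np.kron(op, factor)
    return op

P = qubit0_projector(4, 0)
print(int(round(np.trace(P))))   # trace 8: rank 2^4 / 2, i.e. half the space
```

Projecting out $m$ such qubits composes $m$ of these operators, reducing the dimension by a factor of $2^m$ (1024 → 512 → 256 → 128 for $m$ = 1, 2, 3, as used above).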
Fig. 8 shows the accuracy with which we were able to reconstruct the individual singular vectors after quantum random projection. This demonstrates how one can perform quantum random projection on near-term devices, as the VQSVD algorithm used to retrieve the dominant vectors is a near-term algorithm. The accuracy with which the singular values are retrieved depends on the expressivity of the ansatz and on whether the training procedure falls into a barren plateau. We have not pursued the most accurate retrieval of the singular vectors, as that is beyond the scope of this paper. There are many strategies to avoid barren plateaus and improve the convergence rate [49][50][51]. We used the identity-block strategy [51] to avoid barren plateaus (more details in Appendix F).

VI. CONCLUSION
In this work, we explored a practically useful application of local random quantum circuits: the dimensional reduction of large low-rank data sets. The essence of the applicability of local random circuits to this task is their ability to anticoncentrate rapidly at linear or sub-linear depths [52,53]. This makes them good random projectors to lower dimensions, meaning they preserve the distinctness of the different dominant data vectors of a large dataset after dimensional reduction.
The theorems discussed in this paper show that, just like Haar random matrices, which are good random projectors, their approximate quantum implementations, the exact and approximate t-designs, are also good random projectors. The rapid anticoncentration of the Hilbert space at linear depths means that the number of random parameters (the random rotation parameters) required to create and reproduce a random projector is logarithmic in the size of the data sets. Such efficiency in storage complexity is not possible for classically generated Haar random matrices or for any classical random projector. We then benchmarked its performance against the commonly used classical random projector, the Subsampled Randomized Hadamard Transform (SRHT). The quantum random projectors performed slightly better than this classical candidate because they approximate Haar random matrices, which are more random than the classical candidate. We demonstrated these comparisons on various tasks such as image compression, reconstruction, and retrieving the singular values of the dominant singular vectors after dimension reduction.
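The storage-complexity argument above can be made concrete with a back-of-envelope count (illustrative assumptions: a brickwork circuit with roughly three rotation angles per qubit per layer and depth of order $n$, versus storing every entry of a dense classical random matrix):

```python
# Parameter count needed to reproduce a quantum random projector on n qubits
# versus the entries of a dense N x N classical random matrix (N = 2^n).
# The "3 angles per qubit per layer, depth ~ n" figures are assumptions
# for illustration, not the paper's exact circuit specification.
for n in (10, 20, 30):
    N = 2 ** n
    quantum_params = 3 * n * n      # angles: ~3 per qubit per layer, depth ~ n
    classical_entries = N * N       # dense Gaussian projector storage
    print(n, N, quantum_params, classical_entries)
```

Even at $n = 30$ (a $10^9$-dimensional space), the circuit description fits in a few thousand angles, while a dense classical projector of the same size is astronomically large; this is the storage gap the paragraph refers to.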
Though the initial discussion assumed arbitrary projection operators onto arbitrary subspaces, we showed that simple projection operators and projection subspaces exist. We demonstrated this simplest quantum random projection and retrieved the dominant singular vectors post-projection via VQSVD [6]. This shows the applicability of such quantum random projections and their retrievals on near-term devices.
Dimensionality reduction facilitated by random projections as discussed in this work can also precede kernel-based variants of PCA, wherein an eigenvalue decomposition of the Gram matrix associated with the higher-dimensional embedding (often called the kernel) is sought [54], especially if the said Gram matrix is low-rank. Beyond the precincts of classical data, such a technique can act as an effective precursor to improve the efficiency of simulation even on quantum data, as has been studied in recent work [55]. The crux of the idea is rooted in PCA but applied to quantum data, wherein repeated Schmidt decompositions of the states and vectorized forms of arbitrary operators are performed, followed by subsequent removal of singular vectors associated with non-dominant singular values, akin to PCA. The techniques explored in this work involving good random projections can be used in conjunction, prior to the application of such a protocol, to contract the effective space of the states/operators involved. Owing to the demonstrated near-term applicability, a similar reduction can also be afforded as a preprocessing step in a host of quantum algorithms manipulating quantum data [56] on noisy hardware. Such protocols are of active interest to the scientific community due to their profound physicochemical applications, ranging from exotic condensed-matter systems like Rydberg excitonic arrays [57], modeling higher-dimensional spin-graphical architectures in quantum gravity [58] and in the learning theory of neural networks [59], constructing unknown Hamiltonians through time-series analysis [60,61], tomographic estimation of quantum states [62,63], the electronic structure of molecules and periodic materials [64], quantum preparation of low-energy states of desired symmetry [64,65], and even order-disorder transitions in the conventional Ising spin glass using quantum annealers [66] and quantum variants of the Sherrington-Kirkpatrick model [67], to name a few.
We did an extensive comparison using a deep (depth ∼ 150) exact 2-design ansatz and deferred the discussion of circuits away from the exact 2-design limit to Appendix E. This is because various random circuit architectures anticoncentrate just like the exact 2-design ansatz and hence could be good candidates for random projection; this could be a good starting point for future study. It is also worth studying and constructing quantum random projectors suited to specific applications and datasets (for example, the datasets in health care [68,69]). It has to be noted that the results derived in the main text assumed noiseless quantum gates and measurements. Similar theorems need to be understood for real quantum computers, where different noise sources are unavoidable. This leads to a possible future study: understanding the extent to which the theorems in the main text remain valid on real quantum computers by performing statistical analysis of the bitstrings output by real quantum computers (see Refs. [70,71]).

VII. CODE AVAILABILITY
The classical and quantum random projection matrices (and the rotation parameters used to generate the circuit) used for the comparisons, the data matrix used to generate Fig. 8, and the code for generating the plots in this paper will be made available upon reasonable request. The simulation for the retrieval of dominant singular vectors through VQSVD was done in the Paddle Quantum framework [72].
projection is made such that one of the qubits is projected onto, say, the |0⟩ state. Then, reconstruction is done by appending the $U^\dagger$ circuit to the reduced quantum state with the measured qubit in |0⟩ (this is equivalent to adding zeros to the basis elements where the measured qubit is in the |1⟩ state, just as we boosted dimensions from k = 400 or 700 to 1024).
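The reconstruction step above can be sketched numerically as follows (a stand-in illustration: a Haar-like random unitary from a QR decomposition plays the role of the circuit $U$, and the measured qubit is assumed to be the most-significant one):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
dim = 2 ** n

# stand-in for the random circuit U: unitary from QR of a complex Gaussian
G = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
U, _ = np.linalg.qr(G)

psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)
psi /= np.linalg.norm(psi)

phi = U @ psi
reduced = phi[: dim // 2]                            # project qubit 0 onto |0>
padded = np.concatenate([reduced, np.zeros(dim // 2)])  # zero-pad the |1> branch
psi_rec = U.conj().T @ padded                        # append U^dagger

overlap = abs(np.vdot(psi, psi_rec))                 # overlap with the original
print(overlap)
```

By construction the overlap equals the weight of the original state on the kept subspace, i.e. $\|P U \psi\|^2$, which is what survives the projection.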

Theorem III.1. Let $U \in U(N)$ be sampled uniformly from the Haar measure ($\mu_H$) and let $\vec{x}_1, \vec{x}_2 \in \mathbb{R}^N$. Then the matrix $\Pi_{k \times N}$, obtained by considering any $k$ rows of $U$ followed by multiplication with $\sqrt{N/k}$, satisfies

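A quick numerical check of the construction in Theorem III.1 (illustrative only, not a proof): take $k$ rows of a Haar-random orthogonal matrix, rescale by $\sqrt{N/k}$, and observe that pairwise distances are nearly preserved:

```python
import numpy as np

rng = np.random.default_rng(4)
N, k = 1024, 256

# Haar-random orthogonal matrix from QR of a Gaussian matrix
O, _ = np.linalg.qr(rng.normal(size=(N, N)))
Pi = np.sqrt(N / k) * O[:k, :]        # any k rows, rescaled by sqrt(N/k)

x1, x2 = rng.normal(size=N), rng.normal(size=N)
d_orig = np.linalg.norm(x1 - x2)
d_proj = np.linalg.norm(Pi @ (x1 - x2))
print(d_proj / d_orig)                # close to 1: distances nearly preserved
```

The rows of $\Pi$ satisfy $\Pi \Pi^T = (N/k)\,\mathbb{1}_k$ exactly; the distortion of a fixed pair of vectors concentrates around zero at rate $O(1/\sqrt{k})$, which is the JL-type behavior the theorem formalizes.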
$\tilde{S} = \sum_i \tilde{p}_i \ln(1/\tilde{p}_i)$
FIG. 4. The schematic of the steps involved in the dimensionality reduction and image reconstruction of image datasets using (a) PCA, (b) CRP, and (c) QRP. The figure shows the reconstructed images for various reduced dimensions. Though projectors with dimensions 700 and 400 are not straightforward to construct, a reduced dimension of 512 represents projection by measuring one of the qubits (so the size drops from 1024 to 512) and processing further only if it is 0 or 1. The figure illustrates the reconstruction of one of the data vectors from the MNIST data subset we are experimenting with; a quantitative comparison of the reconstruction against classical methods is given in Appendix B, along with a description of the construction of the projection operators.
FIG. 5. The plots in this figure show the accuracies of quantum random projection and classical random projection in the entropy computation of randomly generated density matrices of size N = 1024 and ranks $r$ = 10, 50, 100, 400 with a linearly decaying singular-value profile. The envelopes represent 90 percent confidence intervals obtained by running the experiments over 100 randomly generated density matrices. The accuracies improve with decreasing rank, as expected.

FIG. 8. The figure shows the errors in the singular values obtained by reconstructing the dominant singular vectors after quantum random projection of a randomly generated data matrix of rank $r$ = 5 using variational quantum SVD for various reduced dimensions $k$ = 512, 256, 128.