Max-Planck-Institut für Mathematik in den Naturwissenschaften Leipzig

Tensor Network Compressed Sensing with Unsupervised Machine Learning

We propose tensor-network compressed sensing (TNCS), which combines the ideas of compressed sensing, tensor networks (TN), and machine learning. The central idea is to compress and communicate real-life information through a generative TN state by making projective measurements in a designed way. First, the state $|\Psi\rangle$ is obtained by unsupervised TN learning; the data to be communicated are then encoded in the separable state with the minimal distance to the projected state $|\Phi\rangle$, where $|\Phi\rangle$ is obtained by partially projecting $|\Psi\rangle$. This yields a protocol analogous to compressed sensing assisted by neural-network machine learning, in which the projections are designed to rapidly minimize the uncertainty of the information in $|\Phi\rangle$. To characterize the efficiency of TNCS, we propose a quantity named q-sparsity that describes the sparsity of quantum states, analogous to the sparsity of signals required in standard compressed sensing. The need for q-sparsity in TNCS is essentially due to the fact that TN states obey the area law of entanglement entropy. Tests on real-life data (hand-written digits and fashion images) show that TNCS has competitive efficiency and accuracy.


I. INTRODUCTION
Hybridizing the ideas and techniques of information theory and quantum physics has led to significant and fruitful achievements. On the one hand, quantum physics may enhance the communication of classical information. In dense/super-dense coding protocols [1-7], for example, a previously shared entangled state between a sender and the receiver(s) is used to send more classical information than is possible without the resource of entanglement. On the other hand, classical techniques can assist quantum approaches. One example is the use of compressed sensing [8] (see also the book in Ref. [9]) to improve quantum state tomography [10-13].
Here we combine the ideas of compressed sensing [8], quantum communication [14], and unsupervised tensor network (TN) machine learning [15]. Compressed sensing is a powerful scheme for compressing classical data by sampling, and is particularly useful when sampling the data is difficult or expensive. In magnetic resonance imaging, for instance, compressed sensing can greatly reduce the number of required samples and thus significantly improve the efficiency [16].
In quantum communication, measurements are also expensive, since quantum states are difficult to prepare and each measurement collapses or disturbs the state to some extent. Consequently, the quantum communications of real-life data (e.g., images of $O(10^3)$ bits or more), even including the corresponding simulations of the quantum processes on classical [...]

FIG. 1. Flowchart of TNCS: (1) train the Born machine $|\Psi\rangle$ representing the probability distribution of the data that Alice considers sending; (2) encode the specific piece of information to be sent by projecting $|\Psi\rangle$; (3) decode the information as a generative process by the projected Born machine.

To explain TNCS, let us consider the following scenario. Alice wants to send a piece of classical information $\{x\}$, e.g., an image of the hand-written digit "3" consisting of $O(10^2)$ pixels, to Bob. She intends to send only a small number of pixels (or features, in the terminology of machine learning), denoted by $\{x^{[\text{sent}]}\}$, to Bob by classical communication, which might be unsafe or even public. The rest of the information $\{x^{[\text{rest}]}\}$ (with $\{x\} = \{x^{[\text{sent}]}\} \cup \{x^{[\text{rest}]}\}$) is encoded in the Born machine $|\Psi\rangle$. To recover $\{x^{[\text{rest}]}\}$, Bob projects $|\Psi\rangle$, previously provided by Alice, in the way determined by $\{x^{[\text{sent}]}\}$. The projections take $|\Psi\rangle$ to another entangled state denoted by $|\Phi\rangle$, and by design $\{x^{[\text{rest}]}\}$ is encoded in the separable state that has the minimal distance to $|\Phi\rangle$. Therefore, Bob can reliably recover $\{x^{[\text{rest}]}\}$ by implementing projections on $|\Phi\rangle$ or, in the terms of TN machine learning, generate $\{x^{[\text{rest}]}\}$ by sampling from $|\Phi\rangle$. A flowchart of TNCS is given in Fig. 1.
Two key questions remain: how to construct $|\Psi\rangle$, and how to design the projections on it so that the information to be sent is optimally encoded in $|\Phi\rangle$ in the above way. Our proposal is the following. First, Alice trains $|\Psi\rangle$ by the unsupervised TN machine learning algorithm [15], so that $|\Psi\rangle$ represents the probability distribution of a large amount of information that Alice considers sending. $|\Psi\rangle$ is called a Born machine since the probability of each piece of information is the square of the corresponding coefficient in $|\Psi\rangle$ [35]. Then, to send a specific piece of information, she sends Bob the pixels for which the uncertainty of the remaining pixels in the probability distribution is minimized. The full information $\{x\}$ is thus efficiently compressed to (in other words, can be accurately reconstructed from) a small part of the image $\{x^{[\text{sent}]}\}$ and the Born machine $|\Psi\rangle$.
We test TNCS on datasets of hand-written digits and fashion images (namely MNIST [36] and fashion-MNIST [37]) by classical simulations. Any image in the training or testing sets can be reconstructed reliably and efficiently. The efficiency is indicated by the compression ratio $r = N_f/N \simeq 10\%$, where $N = \#\{x\}$ denotes the total number of features in $\{x\}$ and $N_f = \#\{x^{[\text{sent}]}\}$ denotes the number of sent features. In other words, the information Bob accesses is about 10 times the information that Alice needs to send through the classical channels. Most of the information is encoded in the Born machine (quantum state). As in compressed sensing, randomly choosing $\{x^{[\text{sent}]}\}$ already leads to small compression ratios. Better performance is reached by choosing $\{x^{[\text{sent}]}\}$ with a sampling protocol based on the entanglement of $|\Psi\rangle$, and by implementing post-selections to access $\{x^{[\text{rest}]}\}$. Finally, we propose q-sparsity to characterize the sparsity of a quantum state. For TNCS, q-sparsity characterizes how fast the Shannon entropy of the probability distribution decreases when the Born machine $|\Psi\rangle$ is projected, and hence how efficient the compressed sampling via $|\Psi\rangle$ can be. An empirical equation to estimate the number of pixels required for reliable reconstruction is given.
Our proposal is similar in spirit to recent works that combine standard compressed sensing with classical probabilistic models. For instance, auto-encoders parameterized by deep neural networks have been used to design an efficient Markov sampling process that significantly reduces the compression ratio [38,39]. Here, we employ matrix product states (MPS) [40], which possess a simple and shallow architecture, to encode the information and to design the sampling strategy based on entanglement. While this work focuses on demonstrating TNCS by classical simulations, our scheme could in principle be implemented on quantum platforms. TNCS could thus pave a way to processing real-life data through quantum many-body states. The possible advantages of TNCS in efficiency, security, and other respects when implemented on quantum platforms are left for future discussion.
This paper is organized as follows. In Sec. II, we explain the basic concepts and theory of TNCS. In Sec. III, TNCS is improved by introducing a quantum-inspired protocol and post-selections. In Sec. IV, we introduce q-sparsity to characterize the sparsity of quantum states based on entanglement and projections and, in this work, to qualitatively characterize the efficiency of TNCS. Finally, we present a summary and a perspective on quantum communications in Sec. V.

II. TENSOR-NETWORK COMPRESSED SENSING
In this section, we mainly explain TNCS as a TN scheme that efficiently deals with quantum many-body states (optimizations, projections, etc.) by classical simulations. Note that TNCS can in principle be generalized to quantum platforms. Some discussions of the quantum nature of TNCS can be found in Appendices D, E, and F from the perspectives of entanglement, security, and efficiency.
Suppose Alice wants to send Bob an image of a handwritten digit "3" by TNCS (Fig. 1). She first trains the state $|\Psi\rangle$ as the generative model for the training set of many "3" images in MNIST. This can be done with the unsupervised TN machine learning algorithm [15]. The idea is to first map the images to quantum states. For example, the $n$-th pixel ($0 \le x_n \le 1$) is mapped to the state of a qubit as

$$x_n \to |s(x_n)\rangle = \cos\left(\frac{x_n\pi}{2}\right)|0\rangle + \sin\left(\frac{x_n\pi}{2}\right)|1\rangle, \tag{1}$$

with $|0\rangle$ and $|1\rangle$ the two eigenstates of the Pauli matrix $\hat{\sigma}^z$.
In this way, one image with pixels $\{x\} = (x_1, x_2, \cdots)$ is mapped to a separable state $|\psi\rangle = \prod_{\otimes n} |s(x_n)\rangle$. The Born machine $|\Psi\rangle$ is then optimized to capture the probability distribution of the training set, by minimizing the distance (negative log-likelihood) to the probability distribution of the images in the training set. See Appendix A for more details. Here, we take $|\Psi\rangle$ in the form of an MPS [40]. Note that TNCS is a general scheme: one may also choose other TN forms to represent $|\Psi\rangle$, such as a tree TN or MERA [26,31,32], or simply a quantum state without a specific entanglement structure.
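The feature map of Eq. (1) and the resulting separable state can be illustrated with a minimal sketch. The function names `feature_map` and `product_state` are hypothetical, and the dense vector is only practical for a handful of pixels since it has $2^N$ entries; an actual implementation would keep the product state in factorized form.

```python
import math

def feature_map(x):
    """Map a pixel 0 <= x <= 1 to qubit amplitudes (cos, sin), following Eq. (1)."""
    return (math.cos(x * math.pi / 2), math.sin(x * math.pi / 2))

def product_state(pixels):
    """Dense amplitudes of the separable state built from a short pixel list.

    The first pixel is the most significant bit of the basis-state index.
    """
    amps = [1.0]
    for x in pixels:
        c, s = feature_map(x)
        # Tensor product of the current amplitudes with the new qubit.
        amps = [a * b for a in amps for b in (c, s)]
    return amps

if __name__ == "__main__":
    state = product_state([0.2, 0.7, 0.5])
    print(len(state), sum(a * a for a in state))
```

Because each single-qubit state is normalized, the product state is normalized as well, which the printed squared norm lets one check.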
In the sense of machine learning, although we only use the "3" images in the training set to optimize $|\Psi\rangle$, it is expected that $|\Psi\rangle$ approximately gives the probability distribution of any "3" image. In other words, $|\Psi\rangle$ learns the probability distribution of the "3" images from a finite (training) set, but can generalize to generate and/or recognize arbitrary "3" images that it has never learned. The ability of a machine-learning model to process information beyond the training set is known as the generalization power (see, e.g., [41]). As shown in previous works [15,25,26,28-32], TN models (including MPS) possess remarkable generalization power that is competitive with neural networks. Notably, TN models have advantages over neural networks in that they are highly interpretable and allow quantum processes to be implemented.
As $|\Psi\rangle$ gives the probability distribution of the "3" images in the training set and beyond (due to its generalization power), it is possible to use $|\Psi\rangle$ to communicate any "3" image. As a direct advantage, Alice can train $|\Psi\rangle$ without knowing the specific "3" image that will be sent to Bob. In other words, different "3" images can be communicated with the same state $|\Psi\rangle$, as long as $|\Psi\rangle$ can "recognize" each of them (in the sense of machine learning) as an image of "3" (see Appendix B for more discussion).
In the communication, Alice sends Bob only a small part of this image, $\{x^{[\text{sent}]}\}$, and $|\Psi\rangle$; Bob then projects $|\Psi\rangle$ according to $\{x^{[\text{sent}]}\}$ as

$$|\Phi\rangle = \frac{1}{C} \prod_{x_n \in \{x^{[\text{sent}]}\}} \langle s(x_n)|\Psi\rangle, \tag{2}$$

with $C$ a constant that normalizes $|\Phi\rangle$. $\{x^{[\text{sent}]}\}$ should be selected so that $\#\{x^{[\text{sent}]}\}$ is small while Bob can still accurately reconstruct the rest of the pixels $\{x^{[\text{rest}]}\}$ from $|\Phi\rangle$. Note that in the classical sense, Bob just needs the data of $|\Psi\rangle$ (e.g., the tensors in the MPS) to simulate the projections. When TNCS is considered as a quantum protocol, many copies of $|\Psi\rangle$ are needed to obtain at least one instance of the expected projected state. Some relevant discussion can be found in Appendix F.

The selection of $\{x^{[\text{sent}]}\}$ is analogous to the sampling process of compressed sensing [9]. In a standard compressed sensing scheme, one may randomly choose a certain number of pixels from the image as $\{x^{[\text{sent}]}\}$. The sampling can be compressed because the randomly selected pixels lead to approximately evenly distributed frequencies, while an image is normally sparse in frequency space. These properties are known as incoherence and sparsity, the two conditions under which compressed sensing works efficiently.

For TNCS, one may also randomly choose $\{x^{[\text{sent}]}\}$ from $\{x\}$, which we denote as random ordering (RO). Each projection by $|s(x_n)\rangle$ in Eq. (2) projects the state towards the separable state $|\psi\rangle = \prod_{\otimes n} |s(x_n)\rangle$. With sufficient data in $\{x^{[\text{sent}]}\}$, $|\Phi\rangle$ is eventually projected to a state for which $\prod_{\otimes n} |s(x_n)\rangle$ ($x_n \in \{x^{[\text{rest}]}\}$) is the separable state with the minimal distance to $|\Phi\rangle$ among all separable states. Therefore, Bob can access $\{x^{[\text{rest}]}\}$ by simply projecting on $|\Phi\rangle$. We can already see that the data are efficiently compressed by TNCS if $\prod_{x_n \in \{x^{[\text{rest}]}\}} |\langle\Phi|s(x_n)\rangle| \to 1$ with $\#\{x^{[\text{sent}]}\} \ll \#\{x\}$. In other words, Bob accurately obtains $\{x^{[\text{rest}]}\}$ when $|\Phi\rangle$ is projected to a state that is close enough to the separable state corresponding to $\{x^{[\text{rest}]}\}$. Later, in Sec. IV, we discuss the efficiency further in terms of the sparsity of quantum states.
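As a hedged illustration of the projection in Eq. (2), the sketch below contracts each sent qubit of a small dense state with $\langle s(x_n)|$ and renormalizes. It assumes real amplitudes and stores the state as a NumPy array with one axis per qubit, which is a stand-in for the MPS used in the paper and is feasible only for a few qubits; `qubit` and `project` are hypothetical names.

```python
import numpy as np

def qubit(x):
    """Single-qubit feature state of Eq. (1) as a length-2 real vector."""
    return np.array([np.cos(x * np.pi / 2), np.sin(x * np.pi / 2)])

def project(psi, sent):
    """Project the sent qubits of psi and return the normalized remainder.

    psi  : real array of shape (2,) * N holding the state amplitudes.
    sent : dict {qubit index: pixel value} for the sent pixels.
    """
    phi = psi
    # Contract from the highest axis down so the lower indices stay valid.
    for site in sorted(sent, reverse=True):
        phi = np.tensordot(phi, qubit(sent[site]), axes=([site], [0]))
    return phi / np.linalg.norm(phi)

if __name__ == "__main__":
    # Two-qubit state (|01> + |10>)/sqrt(2): knowing pixel 0 fixes pixel 1.
    bell = np.array([[0.0, 1.0], [1.0, 0.0]]) / np.sqrt(2)
    print(project(bell, {0: 0.0}))
```

For this two-qubit state, sending the first pixel as 0 leaves the second qubit in $|1\rangle$, matching the maximally entangled example discussed in Appendix D.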
To extract $\{x^{[\text{rest}]}\}$, let us consider that Bob samples each pixel from $|\Psi\rangle$ only once, which we dub one-shot measurement. When $|\Psi\rangle$ is prepared as a quantum state (instead of classical data in the form of a TN), the one-shot measurement corresponds to measuring only one copy of $|\Psi\rangle$. To generate $\{x^{[\text{rest}]}\}$ from $|\Phi\rangle$, he measures the qubits in the basis of the Pauli matrix $\hat{\sigma}^z$. The probability $P(x_n)$ of the $n$-th pixel being $x_n = 0$ or $1$ is determined by $\hat{\rho}_n$ as $P(x_n = x) = \langle x|\hat{\rho}_n|x\rangle$ with $x = 0, 1$, where $\hat{\rho}_n$ is the reduced density matrix of the $n$-th qubit,

$$\hat{\rho}_n = \mathrm{Tr}_{/n} |\Phi\rangle\langle\Phi|, \tag{3}$$

with $\mathrm{Tr}_{/n}$ the trace over all degrees of freedom except the $n$-th qubit. Note that $\sum_x P(x) = \mathrm{Tr}\,\hat{\rho}_n = 1$ due to the normalization of $|\Psi\rangle$. Considering TNCS as a quantum protocol, the one-shot measurement is cheap and feasible in experiments, as it requires only one copy of the state. One drawback is that only black-or-white pixels ($x = 0$ or $1$) are generated, not gray-scale ones. From the perspective of machine learning on classical computers, obtaining $\{x^{[\text{rest}]}\}$ in this way amounts to generating $\{x^{[\text{rest}]}\}$ by the Born machine $|\Phi\rangle$ [15], which involves sampling from the probability distributions $P(x)$.

We test TNCS with RO and the one-shot measurement on the MNIST and fashion-MNIST datasets, which consist of real-life images of hand-written digits and Zalando's articles, respectively. Each dataset contains 10 classes of images, with 60,000 training images and 10,000 testing images in total. Each image contains $28 \times 28 = 784$ gray-scale pixels. In Fig. 2(a) and (b), we show the accuracy of TNCS for different compression ratios $r = N_f/N$ (green solid and purple dashed lines).
The accuracy is characterized by the average peak signal-to-noise ratio (PSNR), which (say, between the original data $\{x\}$ and the reconstructed data $\{y\}$) is defined as

$$\mathrm{PSNR}(\{x\}, \{y\}) = 10 \log_{10} \frac{784}{\sum_n (x_n - y_n)^2}. \tag{4}$$

We average the PSNR over the reconstructions of all the images in the testing set, which the Born machine did not learn in the training process. We take the bond dimensions of the MPS to be $\chi = 16$ and $40$. In general, the PSNR increases with $r$ and $\chi$ as expected, and TNCS works well by simply sampling a small number of $\{x^{[\text{sent}]}\}$ randomly from $\{x\}$ and implementing the one-shot measurement on $|\Psi\rangle$.
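A short sketch of the PSNR as $10\log_{10}$ of the number of pixels divided by the summed squared error, which is the form consistent with the relation $\mathrm{MSE} = 10^{-\mathrm{PSNR}/10}$ quoted in Sec. III. Pixel values are assumed to lie in $[0, 1]$; the function names are hypothetical.

```python
import math

def psnr(x, y):
    """PSNR between two equal-length pixel lists with values in [0, 1]."""
    sq_err = sum((a - b) ** 2 for a, b in zip(x, y))
    if sq_err == 0:
        return math.inf  # identical images
    return 10 * math.log10(len(x) / sq_err)

def mse_from_psnr(value):
    """Mean-square error per pixel corresponding to a given PSNR."""
    return 10 ** (-value / 10)

if __name__ == "__main__":
    original = [0.0] * 784
    reconstructed = [0.1] * 784  # uniform error of 0.1 on every pixel
    print(psnr(original, reconstructed), mse_from_psnr(20.0))
```

A uniform per-pixel error of 0.1 gives a mean-square error of 0.01, which corresponds to a PSNR of about 20, the level used as the reference accuracy in Sec. IV.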

III. IMPROVING EFFICIENCY WITH ENTANGLEMENT-ORDERED SAMPLING PROTOCOL AND POST-SELECTIONS
While RO works well for TNCS, in the following we improve the performance (i.e., higher PSNR and higher efficiency with a smaller compression ratio) by incorporating a quantum-inspired sampling protocol based on entanglement, together with post-selections of measurements.
Regarding the sampling, the results change if Alice selects $\{x^{[\text{sent}]}\}$ differently. Other than RO, a natural way of selecting, dubbed variance ordering (VO), is to select the pixels according to their variance. The variance of the $n$-th pixel is calculated from the training set as

$$V_n = \frac{1}{K} \sum_{i=1}^{K} (x_{i,n} - \bar{x}_n)^2, \quad \bar{x}_n = \frac{1}{K} \sum_{i=1}^{K} x_{i,n}, \tag{5}$$

where $x_{i,n}$ is the $n$-th pixel in the $i$-th image of the training set and $K$ is the number of training images. By choosing $\{x^{[\text{sent}]}\}$ as the pixels with the highest variance, the PSNR is clearly improved [see the black diamonds and orange pentagons in Fig. 2(a) and (b)].
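Variance ordering can be sketched as follows. The sketch assumes the population variance (division by $K$) over the training set, and the name `variance_order` is hypothetical; the first $N_f$ entries of the returned list would play the role of the sent pixels.

```python
def variance_order(images):
    """Return pixel indices sorted by decreasing variance over the training set.

    images : list of K equal-length pixel lists (values in [0, 1]).
    """
    num_images = len(images)
    num_pixels = len(images[0])
    means = [sum(img[n] for img in images) / num_images
             for n in range(num_pixels)]
    variances = [sum((img[n] - means[n]) ** 2 for img in images) / num_images
                 for n in range(num_pixels)]
    return sorted(range(num_pixels), key=lambda n: variances[n], reverse=True)

if __name__ == "__main__":
    toy_set = [[0.0, 0.0, 0.5],
               [0.0, 1.0, 0.5],
               [0.0, 0.0, 0.4],
               [0.0, 1.0, 0.6]]
    n_sent = 2
    print(variance_order(toy_set)[:n_sent])
```

Note that the ordering depends only on the training set, not on the image being sent, so it can be computed once and shared in advance.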
A better-motivated way is to select based on the entanglement of $|\Psi\rangle$, so that $\{x^{[\text{sent}]}\}$ minimizes the uncertainty of $\{x^{[\text{rest}]}\}$ in the probability distribution given by the Born machine,

$$P(\{x^{[\text{rest}]}\}) = \Big( \prod_{x_n \in \{x^{[\text{rest}]}\}} |\langle s(x_n)|\Phi\rangle| \Big)^2, \tag{6}$$

where $|s(x_n)\rangle$ stands for the state associated with the $n$-th pixel $x_n$ [see Eq. (1)], and $|\Phi\rangle$ satisfies Eq. (2). The task is to find the $N_f$ pixels $\{x^{[\text{sent}]}\}$ that minimize the Shannon entropy

$$S^{\text{Shannon}} = -\sum_{\{x^{[\text{rest}]}\}} P(\{x^{[\text{rest}]}\}) \ln P(\{x^{[\text{rest}]}\}). \tag{7}$$

Aiming at this task, let us begin with a simpler question: which pixel should be sent if Alice sends only one pixel? This can be determined by the single-site entanglement entropy (SEE), which (say, for the $n$-th qubit) is defined as

$$S^{\text{ent}}_n = -\mathrm{Tr}\,\hat{\rho}_n \ln \hat{\rho}_n. \tag{8}$$

$S^{\text{ent}}_n$ quantifies the information about the rest of the system that is gained if one has the information of the $n$-th qubit. Such a quantity has been utilized to safely reduce the number of pixels for efficient supervised TN machine learning [42]. With $S^{\text{ent}}_n$, Alice can choose the $\tilde{n}$-th pixel with $\tilde{n} = \arg\max_n S^{\text{ent}}_n$, so that Bob gains as much information as possible from one sent pixel.
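Based on the above scheme, we propose the following Markov sampling strategy to select $\{x^{[\text{sent}]}\}$, dubbed the entanglement-ordered sampling protocol (EOSP).

1. With an $N$-qubit state $|\Psi(N)\rangle$ (initialized as $|\Psi\rangle$), calculate the SEE $S^{\text{ent}}_n$ of all qubits, and find the qubit that has the maximal $S^{\text{ent}}_n$, i.e., $\tilde{n} = \arg\max_n S^{\text{ent}}_n$.
2. From the reduced density matrix of the $\tilde{n}$-th qubit, $\hat{\rho}_{\tilde{n}}$, calculate its dominant eigenstate $|s_{\tilde{n}}\rangle$.
3. Project the $\tilde{n}$-th qubit onto $|s_{\tilde{n}}\rangle$ and normalize, which gives an $(N-1)$-qubit state.
4. If fewer than $N_f$ qubits have been projected, return to step 1 with the $(N-1)$-qubit state; otherwise, output the positions of the projected qubits as $\{x^{[\text{sent}]}\}$.

In short, EOSP selects the pixels in the order of entanglement (EO). A simple example that helps to understand EOSP is provided in Appendix C. We emphasize that the order of projections in EOSP is determined solely by the entanglement properties of the generative MPS, and does not depend on the specific images to be sent.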
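The following is a minimal sketch of EOSP on a small dense state, under stated assumptions: the state is stored as an array with one axis per qubit instead of an MPS (feasible only for a few qubits), and the loop of "project the selected qubit onto the dominant eigenstate of its reduced density matrix, then repeat" is our reading of the protocol together with the worked example in Appendix C. All function names are hypothetical.

```python
import numpy as np

def reduced_density_matrix(psi, n):
    """2x2 reduced density matrix of qubit n for a normalized state psi."""
    mat = np.moveaxis(psi, n, 0).reshape(2, -1)
    return mat @ mat.conj().T

def single_site_entropy(psi, n):
    """Single-site entanglement entropy -Tr(rho ln rho) of qubit n."""
    lam = np.linalg.eigvalsh(reduced_density_matrix(psi, n))
    lam = lam[lam > 1e-12]  # drop numerical zeros before taking the log
    return float(-np.sum(lam * np.log(lam)))

def eosp_order(psi, n_sent):
    """Return the original indices of the qubits selected by EOSP, in order."""
    labels = list(range(psi.ndim))  # original labels of unprojected qubits
    order = []
    for _ in range(n_sent):
        entropies = [single_site_entropy(psi, k) for k in range(psi.ndim)]
        k = int(np.argmax(entropies))
        _, vecs = np.linalg.eigh(reduced_density_matrix(psi, k))
        dominant = vecs[:, -1]  # eigenvector of the largest eigenvalue
        # Project qubit k onto the dominant eigenstate and renormalize.
        psi = np.tensordot(dominant.conj(), psi, axes=([0], [k]))
        psi = psi / np.linalg.norm(psi)
        order.append(labels.pop(k))
    return order
```

The returned order depends only on the state, not on any particular image, in line with the remark above that the projection order is fixed by the entanglement of the generative state. With an MPS, the reduced density matrices would be obtained by local contractions rather than by reshaping a dense array.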
To obtain better accuracy when reconstructing gray-scale images (note that the images in the MNIST dataset are gray-scale), we generate the pixels $\{x^{[\text{rest}]}\}$ by locating the separable state with the maximal probability, i.e.,

$$\{x^{[\text{rest}]}\} = \arg\max_{\{x_n\}} \prod_n |\langle s(x_n)|\Phi\rangle|, \tag{9}$$

where the product $\prod_n$ runs over $\{x^{[\text{rest}]}\}$. This means that each projective basis state $|s(x_n)\rangle$ is the dominant eigenstate of the corresponding single-site reduced density matrix of $|\Phi\rangle$ [Eq. (3)]. We dub this way of generating the pixels "post-selection", considering that post-selections [43] are needed to reach the maximum in Eq. (9) when the generative MPS is prepared as a quantum state. Note that for classical simulations of TNCS, the post-selection scheme does not increase the cost much compared with the one-shot scheme, as Bob can sample as many times as he wants from one $|\Psi\rangle$, or he may simply calculate the reduced density matrices and their dominant eigenvectors from the MPS on a classical computer.

Fig. 2 [...]

In combination with classical compressed sensing, it was proposed to utilize classical probabilistic models to significantly reduce the compression ratio [38,39]. For instance, Ref. [38] investigated compressed sensing assisted by auto-encoders on MNIST and calculated the mean-square error (MSE), defined as $\mathrm{MSE} = \sum_n (x_n - y_n)^2 / 784 = 10^{-\mathrm{PSNR}/10}$. The auto-encoder is formed by two parts: a fully-connected deep neural network as the so-called recognizer, which encodes the data into latent variables, and a neural network with the same architecture as the generator, which decodes the latent variables. The auto-encoder is trained and tested on all images of the ten classes in the dataset simultaneously. The MSE per image ranges approximately from 0.06 to 0.01 when the compression ratio is taken as $r \simeq 1\%$ to $10\%$ for all the samples in the testing dataset. For TNCS, we train the MPS on the images in a single class of the training set, in order to control the computational complexity and to focus on the performance of different reconstruction schemes given a well-trained generative MPS. We obtain MSE from about 0.047 to 0.016 for the samples in each class of the testing set. More numerical results are provided in the supplemental material [44].
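The post-selection step for a single pixel can be sketched as below: take the dominant eigenvector of the single-site reduced density matrix and invert the feature map of Eq. (1) to read off a gray-scale value. Taking magnitudes of the eigenvector components to remove the overall sign is an assumption of this sketch, as is the hypothetical name `post_select_pixel`.

```python
import numpy as np

def post_select_pixel(rho):
    """Gray-scale pixel whose feature state is the dominant eigenvector of rho.

    rho : 2x2 Hermitian reduced density matrix of one qubit.
    """
    _, vecs = np.linalg.eigh(rho)  # eigenvalues in ascending order
    v = vecs[:, -1]                # dominant eigenvector
    # Invert |s(x)> = cos(x pi/2)|0> + sin(x pi/2)|1>; magnitudes remove
    # the arbitrary overall sign or phase of the eigenvector.
    return float(2 / np.pi * np.arctan2(abs(v[1]), abs(v[0])))

if __name__ == "__main__":
    x_true = 0.3
    s = np.array([np.cos(x_true * np.pi / 2), np.sin(x_true * np.pi / 2)])
    # Mostly pure state along |s(0.3)> with a small maximally mixed part.
    rho = 0.9 * np.outer(s, s) + 0.1 * np.eye(2) / 2
    print(post_select_pixel(rho))
```

Unlike the one-shot measurement, which returns only 0 or 1, this yields a continuous pixel value, which is why it suits gray-scale reconstruction.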

IV. Q-SPARSITY
A prerequisite for conventional compressed sensing to work efficiently is the sparsity of the signals. For images, it is known that the signals are usually not sparse in real space. Therefore, a transformation (such as the discrete cosine or wavelet transformation) is applied to move to another space in which the signals are sparse.
In TNCS, the prerequisite is that the quantum probability distribution, i.e., the state $|\Psi\rangle$, should be "sparse". Here, the sparsity is gained in a completely different way from standard compressed sensing, namely by mapping the data to a higher-dimensional quantum Hilbert space. This is analogous to support vector machines [45], which map to a higher-dimensional space where the data can be better classified. In the unsupervised TN machine learning algorithm, each pixel $x$ is mapped to the state of a qubit [Eq. (1)], and one image is then mapped to the direct product state of $N$ qubits, with $N$ the number of pixels. Such a vector is defined in a $2^N$-dimensional space $\mathcal{H}$. The MPS $|\Psi\rangle$ describes the joint probability distribution of the "vectorized" images in $\mathcal{H}$. Essentially, one still deals with the data in real space. However, the probability distribution becomes sparse in this higher-dimensional space, since it can be well captured by an MPS. An MPS is sparse because such a representation can only reach a small corner of $\mathcal{H}$ that satisfies the so-called one-dimensional area law of entanglement entropy [46,47].
However, it is not easy to characterize the sparsity of a many-qubit state (including an MPS) by simply treating it as a vector, as its dimension is exponentially large. We propose to use EOSP to do so. In each step of EOSP, the qubit with the maximal SEE is projected. The entanglement of the state $|\Phi\rangle$ formed by the unprojected qubits decreases after each projection. Fig. 3 shows the SEE per site $\bar{S}(\tilde{n}) = \sum_n S^{\text{ent}}_n(\tilde{n})/\tilde{n}$ of $|\Phi(\tilde{n})\rangle$ [see Eq. (8)] for different numbers of unprojected qubits $\tilde{n} = N - N_f$. One can see that $\bar{S}(\tilde{n})$ decays rapidly as $\tilde{n}$ decreases, meaning that the unprojected qubits are almost in a separable state for small $\tilde{n}$. For $\bar{S}(\tilde{n}) = 0$, no information is gained by knowing the unprojected pixels. That is, when $\bar{S}(\tilde{n})$ becomes zero, all information is contained in the projected pixels and there is no uncertainty left in the remaining pixels.
From the implication of $\bar{S}(\tilde{n})$ discussed above, we define the q-sparsity $S_q$ to qualitatively describe the sparsity of a quantum state (including an MPS) as [...], with $d$ the dimension of one vectorized pixel. In this work, we focus on qubits with $d = 2$. Q-sparsity characterizes how fast the information of a quantum state can be extracted (or how fast the uncertainty of the rest can be reduced) by projections. Thus it characterizes the sparsity of quantum probability distributions in an essentially different way from the sparsity of (classical) probability distributions, since quantum processes (e.g., measurements) are involved. For any given state, the q-sparsity satisfies $0 \le S_q \le 1$. Obviously, we have $S_q = 0$ for separable states. For the maximally entangled state [48], we have $S_q = 1$ since $\bar{S}(\tilde{n}) = \ln 2$ for any $\tilde{n}$. For an intermediate case, we take the $N$-qubit GHZ state as an example. We have $\bar{S}(N) = \ln 2$ originally, and $\bar{S}(\tilde{n} < N) = 0$ after one projection. Therefore, we have $S_q = 2^{-N+1}$. For the conventional k-sparsity in comparison, we have $S_k = 2/2^N = 2^{-N+1} = S_q$, since the state has only two non-zero coefficients in the $2^N$-component vector. For the generative MPSs, we numerically obtain $S_q = 2^{-768.6}$ and $2^{-765.6}$ with $\chi = 16$ for MNIST and fashion-MNIST, respectively, and $S_q = 2^{-770.0}$ and $2^{-767.5}$ with $\chi = 40$.
For TNCS, $S_q$ characterizes the efficiency, i.e., the compression ratio. The smaller $S_q$ is, the faster $\bar{S}(\tilde{n})$ decays in general with the projections, and the fewer pixels $\{x^{[\text{sent}]}\}$ Bob requires to accurately reconstruct the full information by TNCS. Therefore, analogous to conventional compressed sensing, TNCS requires the quantum probability distribution to be sparse in the higher-dimensional Hilbert space, i.e., $N + \log_2 S_q \ll N$. Based on our results on MNIST and fashion-MNIST (Fig. 2), we have $N_f \simeq 10\% N$ (with $N = 784$) to reach PSNR $\simeq 20$ using EOSP, and [...] with $c \simeq 6$ (the precision of $c$ is taken up to $O(1)$ considering the fluctuations between different classes and datasets). For other orderings (RO and VO), $c$ is larger (meaning that more pixels are required) to reach a PSNR comparable to the one that EO achieves.

V. SUMMARY AND PERSPECTIVE
In this work, we propose a novel compressed sensing approach that combines the ideas of compressed sensing, quantum communication, and unsupervised TN machine learning. The key step is to train the state $|\Psi\rangle$ (a Born machine) by the unsupervised TN machine learning algorithm, so that the targeted piece of information can be encoded in the separable state with the minimal distance to $|\Phi\rangle$, which is obtained by implementing projections on $|\Psi\rangle$ in a designed way. The q-sparsity is proposed as a property of any quantum state, and is used to estimate the efficiency of TNCS. We apply TNCS to real-life datasets (hand-written digits and fashion images).
While TNCS, as a classical method, compresses and communicates data with competitive efficiency and accuracy, the scheme can potentially be generalized to quantum states on quantum setups. One may build and optimize the Born machine on quantum platforms to compress and transmit information based on quantum theory. In particular, one could use a TN to design the quantum circuit that serves as the Born machine [31], or directly realize the Born machine as a many-qubit state. The main difficulties are the implementation of TN models (or many-qubit states) on quantum devices and the measurement of entanglement, which are not yet feasible with current technology. It is hoped, however, that quantum devices will eventually surpass their classical counterparts, at which point the relevant obstacles could be tackled. Our work provides new possibilities for processing real-life data by secure quantum communications (see more discussion in Appendix E).

Appendix A

In the generative TN machine learning algorithm proposed in Ref. [15], each image is mapped to a product state of $N$ qubits as

$$|\psi_i\rangle = \prod_{\otimes n=1}^{N} |s(x_{i,n})\rangle, \tag{A1}$$

with $|s(x_{i,n})\rangle = \cos(x_{i,n}\pi/2)|0\rangle + \sin(x_{i,n}\pi/2)|1\rangle$ and $N$ the total number of pixels in one image. Here, $x_{i,n}$ is the $n$-th pixel (gray-scale, with $0 \le x_{i,n} \le 1$) of the $i$-th image. The coefficients in the quantum state $|\Psi\rangle$ are optimized to minimize the negative log-likelihood (NLL), defined as

$$f = -\frac{1}{K} \sum_{i=1}^{K} \ln |\langle\psi_i|\Psi\rangle|^2, \tag{A2}$$

where the summation $\sum_i$ is over all $K$ training images. The NLL characterizes the resemblance between two probability distributions.
In this work, we choose the TN to be a matrix product state (MPS). The coefficients of $|\Psi\rangle$ take a special form satisfying

$$|\Psi\rangle = \sum_{\{s\},\{a\}} \prod_{n=1}^{N} A^{[n]}_{s_n a_n a_{n+1}} \prod_{\otimes n=1}^{N} |s_n\rangle, \tag{A3}$$

where $A^{[n]}$ represents a tensor that corresponds to the $n$-th pixel. The indexes $\{a\}$ are known as the virtual bonds of the MPS; their dimensions are bounded by $\dim(a_n) \le \chi$, with $\chi$ called the virtual bond dimension. An MPS is an efficient representation of quantum many-body states, in which the total number of parameters scales linearly with $N$ as $\sim 2N\chi^2$, whereas the dimension of the Hilbert space scales exponentially as $\sim 2^N$. The tensors in the MPS are updated alternately by the gradient method as $A^{[n]} \leftarrow A^{[n]} - \tau\, \partial f/\partial A^{[n]}$, with $\tau$ the gradient step; see Ref. [25] or [15] for more details. After convergence, $|\Psi\rangle$ gives the joint probability of the pixels. The probability of any image $\{x\}$ in $|\Psi\rangle$ is given as

$$P(\{x\}) = \Big| \prod_{\otimes n=1}^{N} \langle s(x_n)|\Psi\rangle \Big|^2. \tag{A4}$$

Note that the probability is the square of the corresponding coefficient; such a TN state is therefore also called a Born machine [35].
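To make the MPS form concrete, the sketch below evaluates the Born probability of an image from a small open-boundary MPS by contracting each tensor with the feature state of its pixel. The tensor layout `(physical, left bond, right bond)` with boundary bonds of dimension 1, the real-valued tensors, the explicit division by $\langle\Psi|\Psi\rangle$, and all function names are assumptions of this sketch rather than details taken from the paper.

```python
import numpy as np

def mps_amplitude(tensors, pixels):
    """Overlap of the MPS with the product state defined by the pixels.

    tensors[n] has shape (2, chi_left, chi_right); boundary bonds have size 1.
    """
    env = np.ones((1,))
    for A, x in zip(tensors, pixels):
        s = np.array([np.cos(x * np.pi / 2), np.sin(x * np.pi / 2)])
        mat = np.tensordot(s, A, axes=([0], [0]))  # (chi_left, chi_right)
        env = env @ mat
    return float(env[0])

def mps_norm_sq(tensors):
    """Squared norm <Psi|Psi> via left-to-right transfer contractions."""
    env = np.ones((1, 1))
    for A in tensors:
        env = np.einsum('ac,sab,scd->bd', env, A, A.conj())
    return float(env[0, 0])

def mps_probability(tensors, pixels):
    """Born probability of an image: squared amplitude over the squared norm."""
    return mps_amplitude(tensors, pixels) ** 2 / mps_norm_sq(tensors)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_sites, chi = 4, 3
    bonds = [1, chi, chi, chi, 1]
    mps = [rng.normal(size=(2, bonds[n], bonds[n + 1])) for n in range(n_sites)]
    print(sum(A.size for A in mps), "parameters, bound", 2 * n_sites * chi ** 2)
    print(mps_probability(mps, [0.0, 1.0, 0.5, 0.2]))
```

The contraction cost grows linearly with the number of pixels, which is what makes the $2^N$-dimensional state tractable in this form.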

Appendix B: Ambiguous correlations of information in TNCS
Another immediate question about TNCS is how to determine the samples (denoted by A) for training the Born machine $|\Psi\rangle$, and how they relate to the information (denoted by B) that can be transferred or reconstructed through $|\Psi\rangle$. Obviously, we have A ⊆ B. The size of the complementary set C = B − A characterizes the generalization power of the Born machine.
Evidently, C has to be "ambiguously" correlated to A somehow. Let us consider an extreme situation, where all training samples in A are formed by uncorrelated random numbers. The trained state $|\Psi\rangle$ is an entangled state. However, such a state obviously cannot be used to effectively transfer a random image, as no correlations exist between the random image and the state.
In this work, we choose A and B as the training and testing images of the same dataset, respectively. For instance, A and B are handwritten digits "3" or images of dresses. Although the "microscopic information" (pixels) of all the images in A and B are different from each other, a human being can recognize the "macroscopic information" of each image as a digit "3" (or a dress) without any problem. This suggests that A and B (thus A and C) must be correlated somehow. In other words, we here ensure the existence of the "ambiguous" correlations between A and B by the "macroscopic" information.
With TN machine learning, we can define the "ambiguous" correlation in a somewhat more rigorous way: A and B are "ambiguously" correlated if the Born machine trained on A can accurately recognize the data in B. For instance, one may train two Born machines on the "3" and "4" images in the training set, respectively, and construct a classifier that accurately recognizes "3" and "4" images [49]. To classify an image in B or the testing set, one compares the probabilities of this image in the two Born machines, and the classification is given by the largest probability.
The above recognition scheme provides much useful information. For instance, the Born machine trained on the "3" images can be used to implement TNCS for an image of "3" written by the reader, as long as it can be recognized by the Born machine. Obviously, TNCS cannot be implemented by the Born machine of "3" if the reader writes a "4". How to characterize and quantify such ambiguous correlations more rigorously is an important issue for TNCS. One direction is to develop more universal classifiers for pattern recognition (not limited to digits or one particular kind of data). This would also help to further understand and model the recognition process.
Appendix C: A simple example to understand entanglement-ordered sampling protocol

To explain why the entanglement-ordered sampling protocol (EOSP) works, let us consider the following four-qubit state as an example,

$$|\psi\rangle = \frac{1}{\sqrt{2}} (|01\rangle + |10\rangle) \otimes \frac{1}{2} (|01\rangle + \sqrt{3}|10\rangle). \tag{C1}$$

Such a state can describe a dataset of four images (0, 1, 0, 1), (0, 1, 1, 0), (1, 0, 0, 1), and (1, 0, 1, 0), with the probabilities P = 1/8, 3/8, 1/8, and 3/8, respectively. If Alice wants to send two pixels and encode the other two in the state, the pixel that Alice should choose first is obviously the first (or the second) pixel. Since the first two qubits are in the maximally entangled state, one of the pixels is determined by knowing the other. The second pixel Alice chooses should be the third or the fourth one. These two qubits are entangled (but not maximally), so knowing one of them gives some (but not the full) information about the other. In all, Alice should send the first (or second) and the third (or fourth) pixels to Bob.
EOSP gives the same answer. The SEE of $|\psi\rangle$ satisfies $S^{\text{ent}}_1 = S^{\text{ent}}_2 = \ln 2 > S^{\text{ent}}_3 = S^{\text{ent}}_4 \simeq 0.562$. In step 1 of EOSP, Alice chooses the first or the second pixel. The reduced density matrices satisfy $\hat{\rho}_1 = \hat{\rho}_2 = I/2$, with $I$ the $2 \times 2$ identity. Therefore, Alice decides to measure the first qubit by $|0\rangle\langle 0|$ or $|1\rangle\langle 1|$. In either case, the resulting three-qubit state is $|x\rangle \otimes \frac{1}{2}(|01\rangle + \sqrt{3}|10\rangle)$ with $x = 0$ or $1$. In the second iteration, Alice has $S^{\text{ent}}_2 = 0$ and $S^{\text{ent}}_3 = S^{\text{ent}}_4 \simeq 0.562$, so she decides to send the third (or fourth) pixel. In comparison, Alice would choose to send the first and second pixels according to the variance, which is not a good idea since Bob would then gain no information about the third and fourth pixels. Again, we emphasize that this example is meant to help understand EOSP; it is too simple to draw any general conclusions about the advantages or disadvantages of quantum methods over classical ones.
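The numbers in this example can be checked with a short script. The state used below is an assumption: a product of two entangled pairs with real, positive amplitudes chosen to reproduce the listed image probabilities 1/8, 3/8, 1/8, and 3/8. The script compares the single-site entanglement entropies with the pixel variances under the same distribution.

```python
import numpy as np

# Assumed state: (|01> + |10>)/sqrt(2) on qubits 1-2, (|01> + sqrt(3)|10>)/2 on 3-4.
pair12 = np.array([[0.0, 1.0], [1.0, 0.0]]) / np.sqrt(2)
pair34 = np.array([[0.0, 1.0], [np.sqrt(3), 0.0]]) / 2
psi = np.einsum('ab,cd->abcd', pair12, pair34)

def single_site_entropy(state, n):
    """Entanglement entropy -Tr(rho ln rho) of qubit n (0-based index)."""
    mat = np.moveaxis(state, n, 0).reshape(2, -1)
    lam = np.linalg.eigvalsh(mat @ mat.conj().T)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log(lam)))

entropies = [single_site_entropy(psi, n) for n in range(4)]

# Variance of each binary pixel under the Born probabilities |psi|^2.
prob = psi ** 2
variances = []
for n in range(4):
    p_one = float(np.moveaxis(prob, n, 0)[1].sum())  # P(pixel n = 1)
    variances.append(p_one * (1 - p_one))

if __name__ == "__main__":
    print("SEE      :", np.round(entropies, 3))
    print("variance :", np.round(variances, 4))
```

Under these assumptions, the entropies single out one qubit from each pair in successive EOSP iterations, whereas the two largest variances both belong to the first pair, in line with the comparison made in the text.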

Appendix D: Quantum nature in TNCS
With $N_f = \#\{x^{[\text{sent}]}\} = 0$, Bob will randomly generate an image according to the probability distribution given by $|\Psi\rangle$. If post-selection is used, the result will approach the separable state that has the minimal distance to $|\Psi\rangle$. This separable state gives the image that has the maximal probability in the probability distribution. We dub such an image, generated with no known pixel, the quantum average. One $|\Psi\rangle$ gives one unique quantum average (we assume that all $\hat{\rho}_n$'s have nondegenerate eigenvalues). As shown in Fig. A1, the quantum average is different from the simple average $\bar{x}_n = \sum_i x_{i,n}/K$, where no correlations are considered. Correlations (and entanglement) enter the quantum average through the reduced density matrices.

Fig. A2 shows which pixels are selected in EO and VO with different values of $N_f$. To illustrate the orders, a pixel is marked redder than the pixels that come after it in the order. Both EO and VO manage to capture the general shapes. In particular, a "checker-board" pattern appears in EO for relatively large $N_f$. This brings higher efficiency for the following reason. Since any two nearest-neighbor pixels should be strongly correlated, the corresponding qubits are expected to be in a highly entangled state. It means that one only needs to know the information of one qubit (pixel) to access the information of the other qubit (pixel). Taking the maximally entangled two-qubit state $|01\rangle + |10\rangle$ as an example, if one knows that the first qubit is in the state $|0\rangle$ (or $|1\rangle$), meaning that the first pixel $x_1 = 0$ (or $x_1 = 1$), one will know that the second qubit is in the state $|1\rangle$ (or $|0\rangle$), meaning that the second pixel $x_2 = 1$ (or $x_2 = 0$). In this case, one only needs to send the information of one of the pixels, and the rest will be obtained from the state.
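The difference between the most probable image and the pixel-wise mean can be seen on a toy distribution. The sketch below is only illustrative: the three-pixel distribution `PROBS` is hypothetical (not taken from the paper's data), the probabilities play the role of the training frequencies in the simple average, and the "quantum average" is modeled simply as the most probable configuration, following the statement above.

```python
# Hypothetical toy distribution over three binary pixels (not from the paper).
PROBS = {
    (1, 1, 0): 0.40,
    (0, 0, 1): 0.35,
    (0, 1, 1): 0.25,
}


def most_probable_image(probs):
    """Configuration with the maximal probability (stand-in for the quantum average)."""
    return max(probs, key=probs.get)


def simple_average(probs):
    """Pixel-wise mean, which ignores all correlations between pixels."""
    n_pixels = len(next(iter(probs)))
    return [sum(p * img[n] for img, p in probs.items()) for n in range(n_pixels)]


def threshold(means, cut=0.5):
    """Binarize the pixel-wise mean to compare it with an actual image."""
    return tuple(int(m > cut) for m in means)


if __name__ == "__main__":
    print(most_probable_image(PROBS))
    print(simple_average(PROBS))
    print(threshold(simple_average(PROBS)))
```

Here the binarized pixel-wise mean coincides with the least probable of the three images, while the most probable image is a different one, because the mean discards the correlations between pixels.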
Intuitively, both the quantum entanglement and the (classical) variance measure the amount of carried information. For instance, consider a pixel (labeled as $n$) that is always black in all the training images. Such a pixel obviously carries no information, and we have $S_n = V_n = 0$. On the other hand, if a pixel changes dramatically across the training images, it will normally (though not necessarily) contain more information, and we will have large $S_n$ and $V_n$. One essential difference is that $S_n$ and $V_n$ are properties of the quantum state and the classical data, respectively. In our case, the quantum quantity (EO) outperforms the classical one (VO), providing evidence of a quantum advantage in the TNCS.
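As a worked illustration of how the two quantities behave alike in the simplest case, assume (for this sketch only) a binary pixel that equals 1 with probability $p_n$ and whose single-site reduced density matrix is diagonal with entries $(1 - p_n,\, p_n)$. Then

```latex
V_n = p_n (1 - p_n), \qquad
S_n = -p_n \ln p_n - (1 - p_n) \ln (1 - p_n), \qquad
\lim_{p_n \to 0} V_n = \lim_{p_n \to 0} S_n = 0 .
```

Under this assumption both quantities vanish for a pixel that never changes and are maximal at $p_n = 1/2$, where $V_n = 1/4$ and $S_n = \ln 2$. They differ once the reduced density matrix is not fixed by the single-pixel statistics alone.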
However, we cannot claim here any general quantum advantage over classical information processing based on these two specific methods. As we stated before, EO considers certain non-local properties while VO is purely local. Nevertheless, TNCS indeed provides a new path to investigate quantum advantages over classical information techniques. Several important and interesting questions remain to be investigated, such as how to define new (classical or quantum) quantities that further lower the compression ratio and/or increase the accuracy. Possible choices include the (classical) covariance of the training data, the (quantum) correlation functions from $|\Psi\rangle$, and the multipartite entanglement. The performance of both quantum and classical methods for selecting $\{x^{[\text{sent}]}\}$ needs to be pushed to its limits before the possible quantum advantages can be discussed more clearly.
Appendix E: Security of TNCS with quantum many-body states
In the scenario depicted above, TNCS can potentially be used to securely send information via quantum many-body states. The information is secure under the assumption that $|\Psi\rangle$ cannot be intercepted or replaced without Alice or Bob noticing, and that those without $|\Psi\rangle$ cannot reconstruct the full information solely from $N_f \ll N$ pixels. With a subset of the pixels of the image sent by Alice, Bob will need many copies of the state (the generative model) to project the corresponding qubits onto a product state according to the values of the sent pixels. An attacker Eve may simply intercept the entire communication to get the information intended for Bob. Eve may also be able to hide her presence by sending Bob the same pixels she intercepted together with certain new many-body states, e.g., product states, which encode the image she received. As we argue below, this interception attack by Eve can be detected if the sent states are highly entangled.
Let us remind the readers that security in the "standard" quantum cryptographic protocols relies on shared non-locality in Ekert's scheme [50], or on the entanglement of the effective shared two-party density matrix (cf. [51,52]) in prepare-and-measure schemes like the Bennett-Brassard one [53]. Since in the present situation Alice sends Bob many copies of a quantum many-body state, she may restrict herself to sending a (moderately or highly) entangled state. Our scheme belongs to such a case according to the entanglement properties of the generative models shown in Fig. 3. Moreover, Bob, who has access to many copies of this state, can measure and detect this entanglement using many entanglement witnesses [54], for instance the ones aimed at measuring the many-body Bell correlations with pair correlators (cf. [55][56][57][58][59]). The witness operators for these protocols might even be announced publicly.
If Eve intercepts the communication, she needs to send Bob states having the same entanglement properties: not too weak, but also not too strong. Thus she has to measure what she needs to send on several copies, say the initial ones, and replace them by something completely unrelated. By performing the same measurement on these unrelated copies, Bob should be able to detect the interception. Of course, this way of assuring security is neither easy nor cheap, but in a sense it consists in using the methods needed to characterize the sent state, which might be useful and even necessary for its application in other quantum protocols (simulation, computation, metrology, etc.).
In this scheme, there is a possible leak of the information from $\{x^{[\text{sent}]}\}$. Since the information to be sent is not restricted to the data that train $|\Psi\rangle$, Alice can provide the copies of $|\Psi\rangle$ to multiple parties in advance, and send any piece of "ambiguously" correlated information to each party at any time afterwards. Different pieces of information can be sent via the copies of the same state.
Meanwhile, Alice does not allow other parties to access the coefficients of $|\Psi\rangle$, so that she remains the only provider of the state. One potential risk is that Alice provides too many copies of $|\Psi\rangle$ to others, with which the coefficients of $|\Psi\rangle$ can be cracked by, e.g., quantum state tomography [60]. In our case, this risk should be low since $N$ is large. Specifically, as the state is designed in the form of an MPS, Alice should avoid sending out polynomially many copies or more [61].
In the scenario discussed above, Alice sends a small part of the classical information $\{x^{[\text{sent}]}\}$ classically. The disadvantage is that the qubits held by Alice and the receivers need to be kept entangled over a certain distance until Alice implements measurements on her qubits. The discussions about security given above also apply to this scheme.

Appendix F: Efficiency of TNCS with quantum many-body states
It is not obvious whether the TNCS with quantum many-body states could exhibit any advantage in efficiency over the one using classical computers. Let us give some preliminary discussion by considering the total number of qubits in the copies that Bob needs to reconstruct the information by measurements, and the number of bits needed to send the state classically in the form of an MPS. For the quantum-state scheme, we consider for simplicity that Bob uses one-shot measurements for generating $\{x^{[\text{rest}]}\}$. Therefore, the number of copies of the state $|\Psi\rangle$ that Bob needs is determined by how efficiently $|\Psi\rangle$ can be projected to the targeted state $|\Phi\rangle$ [Eq. (2)].
The probability of projecting $|\Psi\rangle$ to $|\Phi\rangle$ can be estimated as

$$P = \langle\tilde{\Phi}|\tilde{\Phi}\rangle,$$

with $|\tilde{\Phi}\rangle = \prod_{x_n \in \{x^{[\text{sent}]}\}} \langle s(x_n)|\Psi\rangle$ the measured "state" without normalization. If Bob has $M$ copies of $|\Psi\rangle$, the probability of obtaining at least one $|\Phi\rangle$ satisfies

$$P_M = 1 - (1 - P)^M.$$

To further simplify our discussion on estimating $P$, let us consider the following "ideal" case. We assume that the im-

With this number of copies and supposing $N_i \sim O(10^3)$, the probability of having at least one expected projected state is high.
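The dependence of the success probability on the number of copies can be sketched numerically. The snippet below assumes that the $M$ copies are measured independently and that each projection succeeds with the same probability $P$, so that the chance of at least one success is $1 - (1 - P)^M$. The function names and the example values of $P$ and the target probability are hypothetical and not taken from the paper.

```python
import math


def prob_at_least_one(p, m):
    """Probability that at least one of m independent projections succeeds."""
    return 1.0 - (1.0 - p) ** m


def copies_needed(p, target):
    """Smallest m such that 1 - (1 - p)**m >= target, for 0 < p < 1 and 0 < target < 1."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


if __name__ == "__main__":
    p_single = 1e-3  # hypothetical single-copy success probability
    m = copies_needed(p_single, 0.99)
    print(m, prob_at_least_one(p_single, m))
```

For small $P$ the required number of copies scales roughly as $\ln[1/(1 - P_{\text{target}})]/P$, so the estimate of $P$ directly controls the cost of the quantum-state scheme.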
We stress that the case with the fashion-MNIST dataset (and most other real-life cases) is much more complicated than the ideal one discussed above. The images are gray-scale (or color), so the number of values each pixel can take is much larger than 2. The images to be sent are normally not restricted to the training set. Consequently, $|\langle\Psi|\phi_i\rangle|^2$ will probably suffer from large fluctuations across, e.g., different images (indexed by $i$) and different datasets. Meanwhile, it is unclear whether any tricks (e.g., state tomography) can be used to reduce the number of needed copies. Therefore, systematic investigations and thorough simulations are needed to draw any conclusions, which we leave for future study.
Appendix G: Images generated by TNCS
Fig. A4 provides some reconstructed images with different numbers $N_f$ of $\{x^{[\text{sent}]}\}$ to give an intuitive impression of how the accuracy increases with $N_f$. The pixels $\{x^{[\text{sent}]}\}$ are picked in three different orders (EO, RO, and VO). Take the reconstruction of a dress image as an example (last three rows in Fig. A4). The quantum average ($N_f = 0$) is quite different from the image to be sent. With only $N_f \simeq 5$ known pixels picked by EO, the sleeves emerge. In contrast, the sleeves do not appear until 50 pixels are known if they are picked randomly. For the VO, the sleeves also emerge with 5 pixels but in a bad shape. The shape of the sleeves is reconstructed with $N_f \simeq 20$ in VO to a similar quality as with $N_f \simeq 5$ in EO. The length of the sleeves is corrected with $N_f \simeq 50$ for EO and $N_f \simeq 110$ for RO and VO.