Data compression for quantum machine learning

The advent of noisy intermediate-scale quantum computers has introduced the exciting possibility of achieving quantum speedups in machine learning tasks. These devices, however, are composed of a small number of qubits and can faithfully run only short circuits. This puts many proposed approaches for quantum machine learning beyond currently available devices. We address the problem of efficiently compressing and loading classical data for use on a quantum computer. Our proposed methods allow both the required number of qubits and the depth of the quantum circuit to be tuned. We achieve this by using a correspondence between matrix-product states and quantum circuits, and we further propose a hardware-efficient quantum circuit approach, which we benchmark on the Fashion-MNIST dataset. Finally, we demonstrate that a quantum-circuit-based classifier can achieve accuracy competitive with current tensor learning methods using only 11 qubits.


I. INTRODUCTION
The rapid development of quantum computers has spurred proposals for quantum speedups in many fields, not least for applications in machine learning. One direction can be summarized as quantum-enhanced machine learning, where quantum algorithms are applied to classical data [1]. Exploration in this direction has led to various quantum machine learning algorithms [2][3][4][5][6][7]. In certain settings, the use of fault-tolerant quantum computers provides a provable advantage over classical approaches [8,9]. However, the significant resource cost of these methods makes finding practical methods for noisy intermediate-scale quantum (NISQ) devices a high priority.
A recurring problem is the loading of classical data into quantum machine learning algorithms. A typical approach is to represent the data as a quantum state, which can be done in multiple ways. For example, one could encode a black-and-white image by mapping the classical bits with values 0 and 1 to the corresponding qubit (quantum bit) states of the quantum computer [10,11]. While conceptually simple and easy to implement with a single-layer quantum circuit, this encoding requires one qubit per pixel, so even modestly sized images with hundreds of pixels, such as those in Fashion-MNIST [12], would be well beyond the qubit capabilities of current devices. By leveraging quantum entangling operations, it is also possible to encode the image in a state of a logarithmic number of qubits [13,14]. For the same example of Fashion-MNIST, this requires a much more manageable 10 qubits, but the resulting circuit is too deep for the current fidelity of gates and qubit coherence times.
In parallel, tensor network methods have been applied to basic data compression and machine learning tasks with near state-of-the-art performance [15][16][17][18]. These methods represent quantum states as a product of tensors with cutoff parameters (bond dimensions) that allow for systematic approximations of the quantum state by limiting the quantum entanglement. Importantly, a large class of these tensor networks, known as matrix-product states (MPS) [19,20], can be directly mapped to quantum circuits with a depth that scales polynomially in the number of qubits and the bond dimension [21][22][23][24][25]. This contrasts with the exponential scaling with the number of qubits for exact parameterizations of generic states.
In this work, we address the problem of loading classical data by introducing a quantum data-encoding scheme that provides control over both the number of qubits and the quantum circuit depth. We achieve this in two steps. First, we exploit the mapping between matrix-product states and quantum circuits by encoding each image as an MPS. We control the depth of the corresponding circuit via the bond dimension of the MPS, and we control the number of qubits by splitting the image into patches (where each patch is encoded as an independent MPS). We test this encoding by using an MPS-based classifier on the Fashion-MNIST dataset. This MPS-based approach can already be implemented directly on a quantum computer. Second, we propose a hardware-efficient quantum circuit compression, which similarly allows for control over both the number of qubits and the circuit depth. In this case, however, the compression method is not limited by the entanglement of the quantum state in the same way as an MPS. We demonstrate that a hardware-efficient quantum circuit classifier can achieve competitive accuracy on the Fashion-MNIST dataset using only 11 qubits. Together, these two efforts provide a scalable method to tune classification accuracy on quantum devices according to the available hardware.

II. IMAGE CLASSIFICATION USING MATRIX-PRODUCT STATES
In this section, we describe the MPS approach to machine learning, including the data-encoding scheme and the classifier. We focus on the task of image classification on the Fashion-MNIST dataset, which contains 60000 training images and 10000 test images from ten label classes. In our experiments, we resize the default Fashion-MNIST images from 28 × 28 to 32 × 32 using bilinear interpolation, so that the number of pixels (1024 = 2^10) is a power of two; this facilitates the patching procedure that we introduce in the following subsection.
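The resizing step can be sketched as follows. This is a minimal numpy illustration of bilinear interpolation, not the authors' preprocessing code (App. B states that PyTorch was used to load and transform the data). The function name `bilinear_resize` is hypothetical, and the corner-aligned sampling convention is an assumption; library resizers may sample at pixel centers instead, which gives slightly different values.

```python
import numpy as np


def bilinear_resize(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Bilinear resize using corner-aligned sampling (an assumed convention)."""
    in_h, in_w = img.shape
    # Sample points on the input grid; the first and last samples hit the corners.
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical weights, shape (out_h, 1)
    wx = (xs - x0)[None, :]  # horizontal weights, shape (1, out_w)
    # Interpolate along x on the two neighbouring rows, then along y.
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bottom = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bottom * wy


rng = np.random.default_rng(0)
image = rng.random((28, 28))  # stand-in for a Fashion-MNIST image scaled to [0, 1]
resized = bilinear_resize(image, 32, 32)
print(resized.shape)  # (32, 32)
```

Because every output pixel is a convex combination of input pixels, the resized values stay in [0, 1], which the FRQI encoding below requires.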

A. Data encoding
A standard way to encode classical images in a quantum system is the so-called flexible representation of quantum images (FRQI) [13,26], in which N pixels are encoded using log₂ N + 1 qubits. Each N-pixel grayscale image is viewed as a flattened N-dimensional vector (p_0, …, p_{N−1}) with pixel values p_i ∈ [0, 1]. This vector is then encoded into the quantum state

$$|\psi\rangle = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} |x\rangle \otimes \left( \cos\frac{\pi p_x}{2}\,|0\rangle + \sin\frac{\pi p_x}{2}\,|1\rangle \right). \tag{1}$$
The first log₂ N qubits, which we refer to as address qubits, label the pixel locations, i.e., the computational basis states |x⟩ correspond to binary representations of the location in the N-dimensional array; see Fig. 1a for a schematic. The remaining qubit, which we refer to as the color qubit, encodes the pixel value or brightness. This encoding is similar to amplitude encoding [10,14,27], but the use of the color qubit allows for an absolute intensity scale for the image, which is lost due to normalization of the state in amplitude encoding. Several quantum image processing algorithms that exhibit quantum speedups also rely on FRQI [28,29]. By convention, we enumerate the pixels following a snake pattern, as depicted in Fig. 1a.
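The FRQI state of Eq. (1) can be written down explicitly as a state vector. The sketch below is a minimal numpy illustration under two assumptions that the text does not pin down: the snake pattern is taken to be a row-by-row traversal that reverses every other row, and the color qubit is placed last in the qubit ordering. The helper names (`snake_flatten`, `frqi_state`, `frqi_decode`) are hypothetical. The decoder illustrates why the absolute intensity survives: each pixel value is recovered from the conditional probability of the color qubit, independently of the overall normalization.

```python
import numpy as np


def snake_flatten(img: np.ndarray) -> np.ndarray:
    """Row-by-row flattening that reverses every other row (assumed snake order)."""
    rows = [row if r % 2 == 0 else row[::-1] for r, row in enumerate(img)]
    return np.concatenate(rows)


def frqi_state(pixels: np.ndarray) -> np.ndarray:
    """FRQI amplitudes for N pixel values in [0, 1], with N a power of two.

    Ordering: the log2(N) address qubits come first and the color qubit is the
    last (least significant) qubit, so the amplitude of |x>|c> is at index 2*x + c.
    """
    n_pix = pixels.size
    assert n_pix & (n_pix - 1) == 0, "N must be a power of two"
    theta = np.pi * pixels / 2
    amps = np.stack([np.cos(theta), np.sin(theta)], axis=1) / np.sqrt(n_pix)
    return amps.reshape(-1)


def frqi_decode(psi: np.ndarray) -> np.ndarray:
    """Recover pixel values from measurement probabilities alone."""
    probs = np.abs(psi.reshape(-1, 2)) ** 2  # rows: address x; columns: color 0 / 1
    ratio = np.clip(probs[:, 1] / probs.sum(axis=1), 0.0, 1.0)
    return (2 / np.pi) * np.arcsin(np.sqrt(ratio))


img = np.array([[0.0, 0.5], [1.0, 0.25]])
pixels = snake_flatten(img)  # [0.0, 0.5, 0.25, 1.0]
psi = frqi_state(pixels)     # 2 address qubits + 1 color qubit = 8 amplitudes
print(psi.size, np.isclose(np.linalg.norm(psi), 1.0))  # 8 True
```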
The FRQI uses quantum entanglement between the address and color qubits. For a generic image, this requires a circuit depth polynomial in N [13,25]. Although FRQI uses only log₂ N + 1 qubits to encode an N-pixel image, the hardware requirements thus shift from the qubits to the gates. To reduce the circuit depth, we instead use an approximate compressed representation based on matrix-product states (see App. A for a brief review of MPS). The bond dimension χ of the MPS limits the quantum entanglement and thus controls the accuracy of the approximation, as illustrated in Figs. 1c and 1d. Importantly, there exists a direct mapping between MPS and sequential quantum circuits [21][22][23][24][25], as outlined in App. A. The circuit depth scales linearly in the number of qubits and polynomially in χ. The FRQI can thus be coupled with the MPS approximation to reduce the circuit depth to O(poly(χ) log₂ N).
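One standard way to obtain an MPS approximation with bond dimension at most χ is a left-to-right sweep of truncated singular value decompositions. The text does not state which compression algorithm was used, so the sketch below is an assumption, not the authors' procedure; the function names are hypothetical. It compresses an 11-qubit FRQI state of a random stand-in 32 × 32 image and reports the fidelity with the exact state for several χ.

```python
import numpy as np


def state_to_mps(psi, n_qubits, chi):
    """Left-to-right SVD sweep, keeping at most chi singular values per bond.

    Returns tensors of shape (left bond, 2, right bond).
    """
    tensors = []
    rest = psi.reshape(1, -1)  # (left bond, all remaining qubits)
    for _ in range(n_qubits - 1):
        left = rest.shape[0]
        u, s, vh = np.linalg.svd(rest.reshape(left * 2, -1), full_matrices=False)
        keep = min(chi, s.size)  # truncate the bond to at most chi
        tensors.append(u[:, :keep].reshape(left, 2, keep))
        rest = s[:keep, None] * vh[:keep]  # push the weights to the right
    tensors.append(rest.reshape(rest.shape[0], 2, 1))
    return tensors


def mps_to_state(tensors):
    """Contract the MPS back into a dense state vector."""
    out = tensors[0]
    for t in tensors[1:]:
        out = np.tensordot(out, t, axes=([-1], [0]))
    return out.reshape(-1)


def compression_fidelity(psi, n_qubits, chi):
    """|<psi|psi_chi>|^2 / <psi_chi|psi_chi> for a normalized input psi."""
    approx = mps_to_state(state_to_mps(psi, n_qubits, chi))
    return abs(np.vdot(psi, approx)) ** 2 / np.vdot(approx, approx).real


rng = np.random.default_rng(0)
pixels = rng.random(1024)  # stand-in for a flattened 32 x 32 image
theta = np.pi * pixels / 2
psi = np.stack([np.cos(theta), np.sin(theta)], axis=1).reshape(-1) / np.sqrt(1024)  # FRQI, 11 qubits
for chi in (1, 2, 4, 8, 16, 32):
    print(chi, round(compression_fidelity(psi, 11, chi), 4))
```

For 11 qubits the largest bond needed for an exact representation is 2^5 = 32, so the fidelity reaches 1 at χ = 32; smaller χ trades accuracy for a shallower circuit.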
To control the number of qubits, we can divide our image into patches and encode each patch independently using the FRQI (see Fig. 1). If we split the image into N_p patches, the encoding scheme requires (log₂(N/N_p) + 1) N_p qubits. Taking N_p = N means that each pixel is encoded in a single qubit, as considered in Refs. [15,30]. The encoded image is then a product state of N qubits in the states |ψ_x⟩ = cos(πp_x/2)|0⟩ + sin(πp_x/2)|1⟩. We refer to this as the single-pixel limit. This patching procedure allows us to interpolate between the FRQI limit (N_p = 1) and the single-pixel limit.
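The qubit count (log₂(N/N_p) + 1) N_p can be checked directly for a 32 × 32 image. In this sketch the helper names are hypothetical, and reading a "2 × 4" patching as 2 rows by 4 columns of patches is an assumption; the qubit count is the same either way.

```python
import math

import numpy as np


def split_into_patches(img, grid_rows, grid_cols):
    """Cut an image into a grid_rows x grid_cols grid of equal patches."""
    h, w = img.shape
    ph, pw = h // grid_rows, w // grid_cols
    return [img[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
            for r in range(grid_rows) for c in range(grid_cols)]


def frqi_qubits(n_pixels, n_patches):
    """(log2(N / N_p) + 1) * N_p: address qubits plus one color qubit per patch."""
    per_patch = n_pixels // n_patches
    assert per_patch & (per_patch - 1) == 0, "pixels per patch must be a power of two"
    return (int(math.log2(per_patch)) + 1) * n_patches


img = np.zeros((32, 32))
for rows, cols in [(1, 1), (2, 4), (32, 32)]:
    patches = split_into_patches(img, rows, cols)
    print(rows, cols, patches[0].shape, frqi_qubits(img.size, len(patches)))
# 1 1 (32, 32) 11      <- FRQI limit
# 2 4 (16, 8) 64       <- 8 patches of 128 pixels
# 32 32 (1, 1) 1024    <- single-pixel limit
```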

B. The MPS classifier
To classify the different images, we train an MPS classifier [15,16] with physical legs of dimension 2 and a single additional "label" leg of dimension L, where L is the number of labels (L = 10 for Fashion-MNIST). We contract each image MPS with the classifier MPS; the element with the largest amplitude in the resulting length-L vector is the predicted label. This contraction is shown in Fig. 1b. In our experiments, we use the Adam optimizer [31] with learning rate 10^−4 and batch size 128; see App. B. In Fig. 2a, we show the test accuracy obtained when using our MPS data compression for various numbers of patches and bond dimensions χ_img, with the classifier bond dimension fixed at χ_class = 10. We achieve performance comparable with state-of-the-art tensor network methods, but at a fraction of the hardware requirements. In particular, Ref. [16] achieved a test accuracy of approximately 88% on the Fashion-MNIST dataset by assigning a single qubit to each pixel, which would cost 784 qubits for the original 28 × 28 images. As shown in Fig. 2, we can achieve similar accuracy with relatively shallow circuits (i.e., bond dimension 2 to 3) and only 64 qubits, corresponding to the 2 × 4 patch case.
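The contraction of Fig. 1b can be sketched as a left-to-right sweep that carries an environment tensor with one label index, one image bond, and one classifier bond. This is an illustrative numpy sketch, not the authors' JAX code. Assumptions: all tensors are real (the FRQI amplitudes are real, so no complex conjugation is applied), the label leg sits on a single classifier site `label_site` whose position is a free choice here, and the helper names are hypothetical. The image and classifier below are random stand-ins.

```python
import numpy as np


def classify(img_mps, cls_mps, label_site):
    """Contract an image MPS with a classifier MPS that carries one label leg.

    img_mps[k]: (a, 2, a')                      image tensors
    cls_mps[k]: (b, 2, b'), or (b, 2, L, b') at label_site
    Returns the length-L score vector.
    """
    env = np.ones((1, 1, 1))  # axes: (label, image bond, classifier bond)
    for k, (a_t, b_t) in enumerate(zip(img_mps, cls_mps)):
        if k != label_site:
            b_t = b_t[:, :, None, :]  # trivial label axis of size 1
        env = np.einsum("lab,asc,bsmd->lmcd", env, a_t, b_t)
        env = env.reshape(-1, env.shape[2], env.shape[3])  # merge the label axes
    return env[:, 0, 0]


def random_mps(n_sites, chi, rng):
    """Random open-boundary MPS with bond dimension chi (demo data only)."""
    dims = [1] + [chi] * (n_sites - 1) + [1]
    return [rng.normal(size=(dims[k], 2, dims[k + 1])) for k in range(n_sites)]


rng = np.random.default_rng(0)
n_sites, n_labels, label_site = 6, 10, 3
img_mps = random_mps(n_sites, 2, rng)  # stand-in for a compressed image, chi_img = 2
cls_mps = random_mps(n_sites, 4, rng)  # classifier, chi_class = 4
left, _, right = cls_mps[label_site].shape
cls_mps[label_site] = rng.normal(size=(left, 2, n_labels, right))  # attach the label leg
scores = classify(img_mps, cls_mps, label_site)
print(scores.shape, int(np.argmax(np.abs(scores))))  # predicted label = largest amplitude
```

Sweeping site by site keeps the cost polynomial in the bond dimensions, whereas contracting the dense 2^n-dimensional vectors would be exponential in the number of sites.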
Additionally, we find that increasing both the image bond dimension χ_img (number of gates) and the number of patches (number of qubits) improves the test accuracy. Notably, the accuracy as a function of the image bond dimension plateaus at a different point for each number of patches. This suggests that the number of patches (and thus the number of qubits) is important for improving the accuracy. Since our method allows us to tune both parameters, it allows us to find an optimal compression of the image that respects the limitations of the device.
Fig. 2b shows the dependence of the test accuracy on the classifier bond dimension χ_class for a fixed image bond dimension χ_img = 4. We find that beyond χ_class = 10, increasing the bond dimension has a relatively small impact on the classification accuracy for most choices of patching. We similarly observe that increasing the number of patches increases the accuracy in all cases.

III. CLASSIFICATION USING QUANTUM CIRCUITS
In this section, we describe an approach to quantum machine learning based on parameterized quantum circuits [32][33][34]. We use sequential circuits both to encode the classical data and to implement the classifier. The sequential circuit structure is inspired by the preceding MPS approach but is specifically tailored to the local, pairwise connectivity of many quantum computer realizations; we therefore refer to these circuits as hardware-efficient.

A. Quantum data encoding
When mapping the MPS-based approach of the previous section to a quantum circuit, we are left with a circuit depth that scales polynomially with χ. This is because the mapping entails a sequence of multi-qubit gates, each acting on log₂ χ + 1 qubits, which must subsequently be decomposed into the two-qubit gates and single-qubit rotations implemented on physical devices (see App. A). We propose an alternative circuit structure for encoding the classical data as a quantum state, which consists of M_img layers of sequentially arranged two-qubit gates, as shown in Fig. 3a. These gates are parameterized and optimized such that the resulting state has maximal fidelity with the exact FRQI encoding of the image. Note that since the pixel values are implicitly contained in the probabilities of measuring each of the computational basis states, the optimization can in principle be performed without computing overlaps of states on the quantum computer. We open-source these processed circuits at https://zenodo.org/record/6562229.
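The objective being optimized can be sketched as a small statevector simulation: M_img staircase layers of two-qubit gates applied to |0…0⟩, followed by the fidelity with the exact FRQI state. Assumptions in this sketch: each gate is taken to be exp(i Σ θ_{ρ,γ} σ_ρ ⊗ σ_γ) with the identity term dropped (15 real parameters per gate), every layer runs in the same direction over neighbouring qubits (0,1), (1,2), …, and qubit 0 is the most significant bit. All function names are hypothetical. Only the forward pass is shown; the paper optimizes with Adam, which in practice means differentiating this objective, for example with JAX.

```python
import numpy as np

# Pauli basis: sigma_0 = I, sigma_1 = X, sigma_2 = Y, sigma_3 = Z.
PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]


def two_qubit_gate(theta):
    """exp(i * sum theta[r, g] sigma_r (x) sigma_g), skipping the global-phase term (0, 0)."""
    h = np.zeros((4, 4), dtype=complex)
    for r in range(4):
        for g in range(4):
            if (r, g) != (0, 0):
                h += theta[r, g] * np.kron(PAULIS[r], PAULIS[g])
    evals, evecs = np.linalg.eigh(h)  # h is Hermitian, so exp(i h) = V exp(i D) V^dagger
    return evecs @ np.diag(np.exp(1j * evals)) @ evecs.conj().T


def apply_two_qubit(psi, gate, q, n):
    """Apply a 4x4 gate to qubits (q, q+1); qubit 0 is the most significant bit."""
    block = psi.reshape(2**q, 4, 2 ** (n - q - 2))
    return np.einsum("ij,ajb->aib", gate, block).reshape(-1)


def sequential_state(params, n):
    """params: shape (M, n-1, 4, 4); each layer is a staircase (0,1), (1,2), ..."""
    psi = np.zeros(2**n, dtype=complex)
    psi[0] = 1.0
    for layer in params:
        for q, theta in enumerate(layer):
            psi = apply_two_qubit(psi, two_qubit_gate(theta), q, n)
    return psi


def fidelity(psi, target):
    """|<target|psi>|^2 for normalized states."""
    return abs(np.vdot(target, psi)) ** 2


rng = np.random.default_rng(0)
n_qubits, n_layers = 5, 2  # 4 address qubits + 1 color qubit for a 16-pixel image
pixels = rng.random(16)
theta_px = np.pi * pixels / 2
target = np.stack([np.cos(theta_px), np.sin(theta_px)], axis=1).reshape(-1) / 4.0  # FRQI
params = 0.1 * rng.normal(size=(n_layers, n_qubits - 1, 4, 4))
print(fidelity(sequential_state(params, n_qubits), target))  # objective to maximize
```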
In Fig. 3b, we display a sample rescaled 32 × 32 image compressed using the sequential ansatz for M_img = 1, 2, and 3, using the FRQI encoding on 11 qubits. We could additionally include the patching procedure to control the number of qubits used; however, due to the computational cost of simulating the quantum circuits, we restrict ourselves to a single patch, N_p = 1. We use the Adam optimizer to obtain the optimal circuit compression.
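The sequential circuit structure that we use is a subclass of MPS with bond dimension χ = 2^{M_img} [21-25], as explained in App. A. Compared with a generic MPS of the same bond dimension, our quantum circuit therefore requires exponentially fewer parameters to generate entanglement entropy S ∼ log χ: the number of circuit parameters grows linearly with M_img = log₂ χ, whereas the number of MPS parameters grows as χ². Conversely, for the same number of parameters, our quantum circuits generate more entanglement.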
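A quick parameter count makes the exponential gap concrete. Assumptions: each general two-qubit gate carries 15 real parameters (16 Pauli coefficients with the global-phase term fixed), each staircase layer has n − 1 gates, and MPS parameters are counted as raw tensor entries with bonds capped both at χ and at the maximal Schmidt rank; gauge freedom lowers the effective MPS count somewhat but does not change its χ² growth. The function names are hypothetical.

```python
def circuit_params(n_qubits: int, layers: int, per_gate: int = 15) -> int:
    """Staircase circuit: (n - 1) two-qubit gates per layer, 15 real parameters each."""
    return layers * (n_qubits - 1) * per_gate


def mps_params(n_qubits: int, chi: int, phys: int = 2) -> int:
    """Raw entries of an open-boundary MPS whose bonds are capped at chi."""
    bonds = [min(chi, phys**k, phys ** (n_qubits - k)) for k in range(1, n_qubits)]
    dims = [1] + bonds + [1]
    return sum(dims[k] * phys * dims[k + 1] for k in range(n_qubits))


# 11 qubits, M_img = 3 layers versus an MPS with chi = 2**3 = 8.
print(circuit_params(11, 3), mps_params(11, 2**3))  # 450 808

# On a long chain the gap grows exponentially with the number of layers M.
for m in range(1, 7):
    print(m, circuit_params(100, m), mps_params(100, 2**m))
```

For 11 qubits the MPS count is limited by the short chain, so the gap is modest; on the longer chain the MPS count grows roughly as 4^M while the circuit count grows linearly in M.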

B. Quantum circuit classifier
To classify the encoded images, we similarly use a hardware-efficient sequential circuit with M_class layers, as shown in Fig. 3a. It is possible to implement the MPS classifier of Sec. II B directly as a quantum circuit, but this has two undesirable features. The first is that, as for the state, the circuit will consist of multi-qubit gates whose size is set by the bond dimension. The second is that this approach requires projections for some of the qubits, so that the number of shots (runs of the circuit) required to accurately measure the classification outcome scales exponentially with the number of qubits used.
To classify the images, we measure the four right-most qubits in Fig. 3a. Of the 16 = 2^4 bit-string outcomes, the first 10 correspond to the classes of our images, and the classification is made by taking the bit string with the highest probability. Note that following our sequential layers, we include three additional gates before measuring, as shown in Fig. 3a. These ensure that information can propagate from the bottom color qubit to all measured qubits. For M_class ≥ 3, these extra gates are not required, but they improve the accuracy of the classification.
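The readout rule can be sketched on a simulated state vector: marginalize the measurement probabilities onto the four label qubits, keep the first 10 of the 16 bit strings, and take the most probable one. Assumptions: the "right-most" qubits of Fig. 3a are taken to be the last four in the state-vector ordering (qubit 0 most significant), and the helper names are hypothetical. On hardware the same distribution would be estimated from shot counts.

```python
import numpy as np


def label_probabilities(psi, n_qubits, n_label=4):
    """Marginal distribution of the last n_label qubits (qubit 0 is most significant)."""
    amps = psi.reshape(2 ** (n_qubits - n_label), 2**n_label)  # rows: i_S, columns: i_L
    return np.sum(np.abs(amps) ** 2, axis=0)  # s_{i_L} = sum over i_S of |amplitude|^2


def predict(psi, n_qubits, n_classes=10):
    """Most probable label bit string among the first n_classes outcomes."""
    s = label_probabilities(psi, n_qubits)
    return int(np.argmax(s[:n_classes]))  # bit strings 10..15 are ignored


# 11-qubit example: most weight on the unused outcome 12, the rest on class 3.
psi = np.zeros(2**11)
psi[5 * 16 + 12] = np.sqrt(0.7)
psi[9 * 16 + 3] = np.sqrt(0.3)
print(predict(psi, 11))  # 3
```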
We report the test accuracy achieved on the Fashion-MNIST dataset using our quantum circuit approach in Fig. 4. As we increase the number of layers M_img in the encoded image state, we see a significant increase in the classification accuracy. We additionally include the results for the exact state encoded using FRQI, which are quickly approached as the number of layers in the image encoding increases. Moreover, as a function of M_class, the accuracy appears to plateau after a small number of layers. This shows that with only a modest number of layers in both the state and the classifier, we can achieve competitive classification accuracy with resource requirements that are realistic for NISQ quantum computers.
We also note the dashed red line in Fig. 4, which corresponds to an MPS experiment with χ_class = 16 and χ_img = 4. The quantum circuit contains far fewer parameters than this MPS, but it nonetheless achieves competitive accuracy in the M_img = 3 case.

IV. DISCUSSION
In this paper, we proposed encoding and compression schemes for processing classical data on NISQ devices. Our approach provides control over the required physical resources, namely the number of qubits and the circuit depth. Furthermore, we demonstrated that, using hardware-efficient circuits for both the data encoding and the classifier, we can achieve competitive accuracy on the Fashion-MNIST dataset. Having established the capabilities of hardware-efficient circuits on image classification problems, we note that the patching protocol used in our MPS experiments provides a straightforward method to scale accuracy to the available hardware.
We note that investigating patching for the quantum circuit case would be significantly more difficult on a classical computer: the wall-clock time required to compress the full image and subsequently optimize the highly entangled sequential circuits is prohibitive. For a small number of layers, MPS-based methods could be used, but these also become infeasible as the number of layers increases. On the other hand, the patching can be efficiently implemented on a near-term quantum device.
The quantum circuits are shallow representations capable of efficiently encoding long-range entanglement. We contrast these circuits with matrix-product states, which are ideally suited to encoding locally entangled states. It is interesting to consider whether quantum advantages can be achieved by exploiting different ways of encoding entanglement, depending on the learning task and dataset.
A natural extension of our work is to consider other circuit and MPS structures, such as brickwall-patterned circuits, MERA [36], and higher-dimensional variants. One could also consider hybrid architectures in which neural networks act as autoencoders that preprocess the inputs to the quantum architecture. We also note that although the best image recognition methods on Fashion-MNIST typically achieve accuracies of about 96% [37,38], they require several million parameters, which we contrast with the several thousand that large MPS and hardware-efficient circuits would require.
Furthermore, while we discussed two methods for classification, our image compression scheme can be used more generally. Improvements to the quantum classifier, for instance by incorporating additional structure or matching the connectivity of NISQ devices, remain interesting open questions. Additionally, as with any quantum optimization problem, a realistic algorithm should take into account the effects of gate errors and decoherence. Nevertheless, the approach we introduce allows for practical machine learning tasks to be performed with realistic quantum resources, requiring as few as 11 qubits. The encoded Fashion-MNIST images can be used as a quantum dataset for benchmarking quantum classifiers. By providing control over the number of qubits and circuit depth, we have introduced a flexible image encoding approach for the NISQ era and beyond.

FIG. 6. (a) The mapping between an MPS tensor and a unitary matrix. The MPS tensors are used to define a unitary action on an additional set of qubits initialized to a product state, where the number of additional qubits depends on the bond dimension of the MPS. (b) An MPS with bond dimension χ is equivalent to a quantum circuit where each gate acts on log₂ χ + 1 qubits. (c) Every M-layer sequential quantum circuit with two-qubit gates belongs to the set of sequential quantum circuits with (M + 1)-qubit gates. Here, we show the equivalence for M = 3.
An MPS with bond dimension χ in which all tensors satisfy Eq. (A2) can be mapped exactly to a sequential quantum circuit with unitaries acting on log₂ χ + 1 qubits, as shown in Fig. 6. For practical implementations, each unitary gate must be further decomposed into single- and two-qubit gates. For a generic quantum gate acting on log₂ χ + 1 qubits, this requires O(poly(χ)) single- and two-qubit gates [39], resulting in a total cost of O(poly(χ) n) quantum operations for n qubits.
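To make the O(poly(χ)) factor concrete: a generic unitary on k qubits can be compiled into O(4^k) CNOT and single-qubit gates (for example, via the quantum Shannon decomposition). Setting k = log₂ χ + 1 gives a worst-case estimate for generic gates; structured gates may need far fewer.

$$
4^{k} = 4^{\log_2 \chi + 1} = 4\left(2^{\log_2 \chi}\right)^{2} = 4\chi^{2}
\quad\Longrightarrow\quad
O\!\left(4^{k}\right) n = O\!\left(\chi^{2} n\right) \text{ elementary gates for } n \text{ qubits.}
$$

For instance, χ = 8 gives gates on k = 4 qubits, and the per-gate cost scales as 4χ² = 256 up to a constant factor.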
The mapping is depicted diagrammatically in Fig. 6a, where the unitary acts on additional qubits prepared in the product state |0⟩^{⊗k}; we refer the reader to [25] for more details. On the other hand, a sequential quantum circuit with M layers of two-qubit gates can be viewed as an equivalent sequential circuit with a single layer of (M + 1)-qubit gates (see Fig. 6c). This circuit, in turn, can be mapped to an MPS with bond dimension χ = 2^M. Every single-layer circuit is thus exactly equivalent to an MPS with χ = 2.
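The bound χ = 2^M can be checked numerically: apply M staircase layers of random two-qubit unitaries to |0…0⟩ and count the nonzero Schmidt coefficients across every cut. Assumptions: all layers run in the same direction over (0,1), (1,2), …, the random unitaries come from a QR decomposition of a complex Gaussian matrix, and the helper names are hypothetical. The check asserts only the upper bound, which is what the equivalence with (M + 1)-qubit gates guarantees.

```python
import numpy as np


def random_unitary(dim, rng):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))  # rescale column phases


def staircase_state(n, layers, rng):
    """|0...0> evolved by `layers` staircases of random gates on (0,1), (1,2), ..."""
    psi = np.zeros(2**n, dtype=complex)
    psi[0] = 1.0
    for _ in range(layers):
        for q in range(n - 1):
            block = psi.reshape(2**q, 4, 2 ** (n - q - 2))
            psi = np.einsum("ij,ajb->aib", random_unitary(4, rng), block).reshape(-1)
    return psi


def schmidt_ranks(psi, n, tol=1e-10):
    """Number of non-negligible singular values across each cut (first `cut` qubits | rest)."""
    return [int(np.sum(np.linalg.svd(psi.reshape(2**cut, -1), compute_uv=False) > tol))
            for cut in range(1, n)]


rng = np.random.default_rng(0)
n = 8
for m in (1, 2, 3):
    ranks = schmidt_ranks(staircase_state(n, m, rng), n)
    print(m, ranks, max(ranks) <= 2**m)  # bond dimension never exceeds 2**m
```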

Appendix B: MPS training
As discussed in the main text, we train using the Adam optimizer with learning rate 10^−4 and a minibatch size of N_b = 128. We trained for 3000 epochs in most cases. For training, we made use of the JAX library [40], and we used PyTorch [41] to load and transform the datasets.
We used a log-softmax cross-entropy loss function with L2 regularization.
Given the classifier output vectors and their corresponding labels within a minibatch (together denoted by {(s, t)}), the loss function is defined by Eq. (B1), where the sum is over all tuples of classifier output s and the correct label t for the corresponding image. In this equation, s_j is the jth element of the vector s, w_i are the weights in the classifier MPS, λ is the regularization strength, C is a constant used to avoid vanishing gradients, and N_b is the minibatch size. In our experiments, we set λ = 10^−4 and C = 1.
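A loss of this type can be sketched as follows. This is a hypothetical form, not a transcription of Eq. (B1): it assumes that C multiplies the scores inside the softmax (which would counteract vanishing gradients when the scores are small probabilities, as in the circuit model with C = N) and that the L2 term is λ times the sum of squared weights. The function name `classifier_loss` is hypothetical; numpy is used for brevity, and `jax.numpy` offers the same operations if gradients are needed.

```python
import numpy as np


def classifier_loss(scores, labels, weights, lam=1e-4, c=1.0):
    """Hypothetical loss: mean log-softmax cross-entropy on c * scores plus an L2 penalty.

    scores:  (N_b, L) classifier outputs s for one minibatch
    labels:  (N_b,) integer labels t
    weights: list of classifier tensors w_i
    c:       assumed to rescale the scores inside the softmax
    """
    z = c * scores
    z = z - z.max(axis=1, keepdims=True)  # numerical stabilisation; softmax is shift invariant
    log_softmax = z - np.log(np.sum(np.exp(z), axis=1, keepdims=True))
    cross_entropy = -np.mean(log_softmax[np.arange(len(labels)), labels])
    l2 = lam * sum(np.sum(np.abs(w) ** 2) for w in weights)
    return cross_entropy + l2


# Uniform scores give a cross-entropy of log(10) for 10 classes.
print(classifier_loss(np.zeros((4, 10)), np.array([0, 1, 2, 3]), [np.zeros(3)]))
```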
We initialized our classifier MPS using stacked identity matrices with Gaussian noise centered at 0 with width 10^−4. For the most part, the choice of initialization had minimal impact on the training, but the random noise needed to be sufficiently small to prevent exploding loss functions stemming from exponential buildup due to the sequential nature of a tensor network. This was occasionally an issue during training as well; we resolved it by factoring out the norm of the tensor network as needed, since we ultimately only cared about the relative values of the output prediction vector. The training accuracy is shown in Fig. 7. We note that the plot is not monotonic. The classifier MPS reaches a point of maximal accuracy; for short matrix-product states, this leads to overtraining. For longer matrix-product states, this also leads to some degree of overtraining, but eventually the fluctuations introduced by the Adam optimizer build up until the output explodes (again because of the sequential nature of an MPS, a small change in the tensors builds up exponentially). The degree of overtraining in the longer matrix-product states is thus somewhat variable; we note, however, that the test accuracy in Fig. 2 is much cleaner, and it is ultimately the relevant quantity.
In the quantum circuit model, each two-qubit gate is parameterized as
$$\hat{U}(\theta) = \exp\left( i \sum_{\rho,\gamma=0}^{3} \theta_{\rho,\gamma}\, \sigma_\rho \otimes \sigma_\gamma \right),$$
with the matrices σ_0 = I, σ_1 = σ_x, σ_2 = σ_y, and σ_3 = σ_z. We set θ_{0,0} = 0 to fix the phase degree of freedom of the gate. An input image state |ψ_img⟩ is classified by feeding it to a circuit classifier Û and then measuring a subset of qubits L, which we call the label qubits. We denote the remaining qubits in the system by S. The measurements yield a probability vector s with elements
$$s_{i_L} = \sum_{i_S} \left| \langle i_S, i_L |\, \hat{U}\, | \psi_{\mathrm{img}} \rangle \right|^2,$$
where i_S and i_L denote the bitstrings corresponding to the computational basis states of the respective qubit sets. Note that the probability is normalized, i.e., Σ_{i_L} s_{i_L} = 1. The prediction is given by arg max_{i_L} s_{i_L}. Because Fashion-MNIST contains 10 label classes, we use four label qubits. This outputs a vector of length 2^4 = 16, and we disregard the final six bitstrings.
In training our circuit model, we used the same loss function as in Eq. (B1). We did not use any regularization (λ = 0) and chose C = N (the number of pixels). We used a minibatch size of N_b = 100 for 1600 epochs and trained using the Adam optimizer with learning rate 8 × 10

FIG. 1. (a) Our encoding scheme consists of splitting the image into patches, each of which is encoded in a quantum superposition state consisting of address qubits and a color qubit, as defined in Eq. (1). (b) In the MPS learning protocol, each patch is separately encoded into an MPS consisting of tensors with physical dimension 2 corresponding to the address (circle) qubits and the color (square) qubit. These MPS are concatenated and contracted with an MPS-based classifier (purple) with fixed bond dimension. An additional classifier leg of dimension 10, represented by three lines, is used to classify the Fashion-MNIST images. (c) An example encoded Fashion-MNIST image with bond dimension χ_img = 4 for different numbers of patches, compared with the original uncompressed image. (d) The compressed image as a single patch with varying bond dimension χ_img = 1, 2, 4, 8.

FIG. 2. (a) The test accuracy for classification using MPS as a function of the image bond dimension χ_img for fixed classifier bond dimension χ_class = 10. Each curve corresponds to a different number of patches; see Fig. 1. We report the average of the best 100 test accuracies to limit stochastic effects. The numbers in parentheses in the legend are the required numbers of qubits on a quantum computer. The dashed red line shows the single-pixel limit, which is an exact encoding of the image with χ_img = 1. (b) The test accuracy as a function of the classifier bond dimension χ_class for fixed image bond dimension χ_img = 4.

FIG. 3. (a) The quantum circuit encoding and classification scheme using 11 qubits. We compress each image using M_img layers of sequential circuits consisting of two-qubit gates. The address and color qubits are indicated by circles and a square, respectively, as in Fig. 1. We classify each image with M_class layers of sequential two-qubit gates (the case M_img = M_class = 2 is shown for illustration). Before the final readout, three additional trainable two-qubit gates are added. (b) Compression of a selected image using M_img = 1, 2, 3, compared with the uncompressed image.

FIG. 4. The test accuracy using hardware-efficient quantum circuits for image compression and classification on the Fashion-MNIST dataset. We again report the average of the best 100 converged iterations to limit stochastic effects. The accuracy is plotted against the number of layers M_class in the quantum circuit classifier. We show the results for images compressed using M_img = 1, 2, 3, compared with the exact uncompressed images. For reference, we show the accuracy we achieve for the χ_img = 4 single-patch MPS case with χ_class = 16.

FIG. 7. (a) The training accuracy as a function of the image bond dimension. (b) The accuracy as a function of the classifier bond dimension.