Ensemble-learning error mitigation for variational quantum shallow-circuit classifiers

Classification is one of the main applications of supervised learning. Recent advances in developing quantum computers have opened a new possibility for machine learning on such machines. Due to the noisy performance of near-term quantum computers, error mitigation techniques are essential for extracting meaningful data from noisy raw experimental measurements. Here, we propose two ensemble-learning error mitigation methods, namely bootstrap aggregating and adaptive boosting, which can significantly enhance the performance of variational quantum classifiers for both classical and quantum datasets. The idea is to combine several weak classifiers, each implemented on a shallow noisy quantum circuit, to make a strong one with high accuracy. While both of our protocols substantially outperform error-mitigated primitive classifiers, adaptive boosting shows better performance than bootstrap aggregating. The protocols have been exemplified for classical handwritten digits as well as quantum phase discrimination of a symmetry-protected topological Hamiltonian, in which we observe a significant improvement in accuracy. Our ensemble-learning methods provide a systematic way of utilising shallow circuits to solve complex classification problems.


I. INTRODUCTION
Machine learning, as a method in which computers learn patterns within data, has revolutionized almost all aspects of our lives [1]. Classification algorithms are among the most important types of machine learning tasks, with a wide range of applications in finance, business, industry, marketing, and scientific research [2,3]. In these algorithms, all data are divided into a few discrete classes that contain elements with certain common features. So far, numerous classification algorithms have been developed, such as logistic regression [4], decision trees [5], k-nearest neighbors [6], support vector machines [7], and neural network classifiers [8]. The complexity that big data brings to the training process may decrease the accuracy of these algorithms. To overcome this, one can adopt ensemble-learning methods, in which several classifiers are combined to make a stronger one with higher prediction accuracy. The most prominent ensemble-learning methods are Bootstrap Aggregating (Bagging) [9] and Adaptive Boosting (AdaBoost) [10,11], both developed in the classical machine learning context.
In this paper, we develop two ensemble-learning error mitigation techniques, namely Bagging and AdaBoost, for VQCs, combining a few weak quantum classifiers into a strong one with enhanced accuracy. This allows the use of shallow circuits for each of the classifiers and improves noise resilience. We find that both of the proposed protocols significantly outperform the ZNE method in classification tasks. Of the two protocols, AdaBoost shows stronger performance, namely higher accuracy and more noise resilience, than Bagging. Since our ensemble-learning algorithms can achieve accurate classification with only shallow circuits, they are NISQ-friendly and feasible for applications using existing quantum-computing technologies. This makes them very distinct from other quantum versions of ensemble-learning algorithms [82-86], which utilize deep-circuit quantum subroutines, such as quantum phase estimation [87], quantum means estimation [88,89], and Grover search [90], to speed up the training process and reduce the sample complexity.
We note that a plain ensemble-learning method, namely plurality voting, has been applied to improve the performance of VQCs [91]. However, plurality voting mainly reduces variance and improves the robustness of the model to data; it cannot significantly improve the accuracy of the model.

A. Classification Problems
Classification tasks are supervised machine learning problems in which the goal is to predict a discrete class label y for given unknown data x. In general, the classifier is trained on a labeled training dataset with M_s samples, D = {(x_i, y_i)}_{i=1}^{M_s}, where x_i is an input vector with N_f features and y_i is the corresponding class label, which takes K_c different values (i.e. y_i ∈ {1, ..., K_c}). After training, the classifier can be described as a map ŷ = f(x), where ŷ is the predicted label for a given input x. For a good classifier, we expect that y = ŷ, namely predicting the correct class label. In reality, the prediction might be wrong for some inputs; nonetheless, the objective is to keep the ratio of wrong predictions as small as possible.
Recent advancements in developing quantum computers have opened a new territory for exploiting such machines for solving classification problems. In this case, apart from conventional classification problems, which deal with classical datasets, one can also consider quantum datasets D = {(|x_i⟩, y_i)}_{i=1}^{M_s}, where |x_i⟩ is a quantum state representing the input features and y_i is the corresponding class label, which takes K_c different values. The inherent nature of quantum datasets justifies the use of a quantum classifier, as no classical counterpart can be used for such data. The situation is, however, very different for classical datasets, as it is still an open question whether the full capacity of quantum computers can be exploited for the classification of classical data.
In general, one constructs a classifier f(x) by training it on a dataset D, which can be either classical or quantum. We denote the accuracy of f(x) as 1−e, where e is the error rate

e = (1/M_s) Σ_{i=1}^{M_s} I(ŷ_i ≠ y_i),

where I(·) is the indicator function, equal to 1 when its argument is true and 0 otherwise. Variational Quantum Algorithms (VQAs) are the most promising approach for achieving quantum advantage on NISQ computers. In these algorithms, the complexity is divided between a quantum circuit and a classical optimizer. Therefore, even a shallow quantum circuit might be sufficient to achieve a complex task. Recently, VQAs have also been used to develop quantum classifiers for both classical [38-40] and quantum [40,41,73] datasets. Nonetheless, in the NISQ era, developing new techniques for mitigating the effect of noise is essential for scaling up classification algorithms to deal with more complex datasets, which normally demand larger numbers of qubits and deeper circuits.
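The error rate above is simply the fraction of misclassified samples. A minimal Python helper (our own illustrative code, not part of the paper's Julia implementation) makes the definition concrete:

```python
import numpy as np

def error_rate(y_true, y_pred):
    """e = (1/M_s) * sum_i I(y_hat_i != y_i): the fraction of samples
    whose predicted label differs from the true one."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_pred != y_true))

# accuracy is 1 - e; here 1 of 5 predictions is wrong
y = [1, 3, 5, 7, 1]
yhat = [1, 3, 7, 7, 1]
print(error_rate(y, yhat))  # 0.2
```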
In this work, we focus on Variational Quantum Classifiers (VQCs). A VQC circuit contains three parts: an encoding circuit, a parameterized circuit, and measurement. The schematic representation of the circuit is shown in Fig. 1(a). For quantum datasets, the encoding circuit is not needed, as the data can directly be fed into the parameterized circuit. For classical datasets, however, the input data x_i has to be encoded into a quantum state |x_i⟩. Amplitude encoding is the most efficient way of converting classical data into a quantum state, with an exponential advantage through mapping N_f features into
N_q = log_2(N_f) qubits as

|x_i⟩ = (1/||x_i||) Σ_{j=0}^{N_f−1} x_{ij} |j⟩,

where ||x_i|| = √(x_i^T x_i) is the norm of x_i and |j⟩ is a quantum state of N_q qubits with the binary representation of j in the computational basis. This encoding is assumed to be done through a Quantum Random Access Memory (QRAM) module [92-95]. It is worth emphasizing that our protocol does not depend on any specific encoding method and can easily be generalized to other encoders, such as rotation encoding [57,96,97]. Therefore, for the sake of brevity, we focus only on amplitude encoding. The output of the encoder is fed into a parameterized circuit that contains several layers. Each layer of the parameterized circuit starts with a series of local rotations ⊗_q G^(q)(θ_q) acting on all qubits, with

G^(q)(θ_q) = R_z^(q)(θ_{q1}) R_x^(q)(θ_{q2}) R_z^(q)(θ_{q3}),

where R_α^(q)(θ) = e^{−iθσ_α^(q)/2} (for α = x or z) and σ_α^(q) is the Pauli operator α acting on qubit q. The single-qubit rotations are followed by a series of two-qubit controlled-not gates ∏_q U_CX^(q,q+1), with

U_CX^(q,q+1) = |0⟩⟨0|^(q) ⊗ I^(q+1) + |1⟩⟨1|^(q) ⊗ σ_x^(q+1),

where I^(q) represents the identity acting on qubit q. Therefore, the action of the parameterized circuit with D_l layers on N_q qubits can be described by a unitary operator of the form

U(θ) = ∏_{d=1}^{D_l} [ ∏_q U_CX^(q,q+1) ⊗_q G^(q)(θ_q^(d)) ].

The schematic of the circuit is shown in Fig. 1(a). The output of the circuit is given by U(θ)|x_i⟩. By measuring the last few qubits of the circuit, one can determine the class label of the input |x_i⟩; the number of measured qubits is ⌈log_2(K_c)⌉. The measurement outcomes can be described by projectors {Π_k}_{k=1}^{K_c}, where K_c is the number of classes. For instance, for binary classification (i.e. K_c=2), only the last qubit is measured, and the projectors are given by Π_0 = |0⟩⟨0| and Π_1 = |1⟩⟨1|, acting on qubit N_q. Similarly, for a four-class problem, one has to measure the last two qubits (namely qubits N_q−1 and N_q), and the classes are determined by the projectors Π_k ∈ {|00⟩⟨00|, |01⟩⟨01|, |10⟩⟨10|, |11⟩⟨11|}, which act on qubits N_q−1 and N_q. The probabilities of the measurement outcomes are considered as the probabilities of obtaining each class label ŷ_i, as

p_{ik} = ⟨x_i| U†(θ) Π_k U(θ) |x_i⟩.

The label ŷ_i is determined by the class k whose probability p_{ik} is maximum, namely ŷ_i = arg max_k p_{ik}. The schematic of the procedure is shown in Fig. 1(a). The performance of the VQC is evaluated by a loss function described by the cross-entropy

L(θ) = − Σ_{i=1}^{M_s} Σ_{k=1}^{K_c} y_{ik} log p_{ik},

where (y_{i1}, ..., y_{iK_c}) is the one-hot encoding of the true class label y_i: only the element y_{ik} corresponding to the right class label is 1, and the rest are 0. By using the Adam optimizer [98], a gradient-based method, one can iteratively update θ in order to minimize the loss function. More details about the training can be found in the appendix. For an optimal θ* at which the loss function converges to its minimum, the quantum circuit is trained and can be used for classifying unseen data x. Our classifier is considered a strong one if ŷ = f(x; θ*) assigns the correct class label to most of the unseen data x.
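The classical post-processing around the VQC just described can be sketched in a few lines. The following Python snippet is purely illustrative (the paper's simulations use Julia), and the helper names `amplitude_encode`, `predict_label`, and `cross_entropy` are our own: it amplitude-encodes a feature vector, turns measurement-outcome probabilities into a predicted label, and evaluates the cross-entropy loss against a one-hot true label.

```python
import numpy as np

def amplitude_encode(x):
    """|x> = (1/||x||) * sum_j x_j |j> on N_q = ceil(log2(N_f)) qubits.
    Zero-padding for non-power-of-two N_f is our own convention."""
    x = np.asarray(x, dtype=float)
    n_q = max(1, int(np.ceil(np.log2(len(x)))))
    amp = np.zeros(2 ** n_q)
    amp[: len(x)] = x
    return amp / np.linalg.norm(amp), n_q

def predict_label(probs):
    """y_hat = argmax_k p_k over the K_c measurement-outcome probabilities."""
    return int(np.argmax(probs))

def cross_entropy(p, y_onehot, eps=1e-12):
    """L = -sum_k y_k log p_k for a single sample with one-hot label y."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y_onehot, dtype=float)
    return float(-np.sum(y * np.log(p + eps)))

state, n_q = amplitude_encode([3.0, 4.0])
print(state, n_q)                       # [0.6 0.8] 1 -- normalized amplitudes

p = [0.1, 0.2, 0.6, 0.1]                # outcome probabilities, K_c = 4
print(predict_label(p))                 # 2
print(round(cross_entropy(p, [0, 0, 1, 0]), 4))  # -log(0.6) = 0.5108
```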

C. The performance of VQC for Classical Datasets
In order to show the performance of the VQC for classifying classical data, we consider the MNIST dataset, which contains handwritten digit images with 8×8 pixels (i.e. N_f=64 features) [99]. Each pixel takes a value between 0 (perfectly white) and 1 (perfectly black). For the sake of simplicity, and without loss of generality, we only consider odd numbers, and thus our classification has four different classes, labeled by the digits 1, 3, 5, and 7. A typical image for each of these four classes is presented in Fig. 1(b). The dataset contains 2267 samples, of which 1541 are used for training and the 726 unseen samples are used for testing the accuracy. We train the circuit shown in Fig. 1(a) with N_q=6 for various numbers of layers D_l. In Fig. 2 we plot both the training and test accuracies as a function of the circuit depth D_l. Each data point is averaged over 50 random initializations of the circuit parameters, and the error bars show the variation of the accuracy across these 50 repetitions. The training and test accuracies are very close to each other, which shows that the training is not affected by overfitting. Due to this, in what follows, we only report the test accuracy as a quantification measure for the quality of our procedure. In addition, the accuracy improves rapidly up to D_l∼7 layers before entering a slow convergence regime. In fact, one needs D_l=12 layers to achieve an accuracy of 0.94, and even with up to D_l=16 layers one still cannot reach an accuracy of 0.95. Increasing the number of layers also decreases the error bars, indicating robustness against parameter initialization. Note that here our quantum circuit is noise-free and all quantum gates operate perfectly; that is why the accuracy keeps improving as layers are added. In practice, since gates are imperfect and each of them induces noise in the system, the accuracy has a more complex dependence on the circuit depth, as will be discussed in the following sections.

D. VQC with ZNE
NISQ quantum computers suffer from imperfect gate operations and short qubit coherence times. While single-qubit operations can be achieved with fidelity ∼0.999 [100], two-qubit gates are more susceptible to noise. For the sake of simplicity, in order to simulate the effect of noise in NISQ computers, one can consider two-qubit gates as the only source of noise in the system. In this paper, we emulate the effect of noise as a depolarizing channel which affects the operation of controlled-not gates on qubits q and q+1 as

ξ_CX(ρ) = (1−P) U_CX^(q,q+1) ρ U_CX^(q,q+1)† + P (I^(q,q+1)/4) ⊗ Tr_{q,q+1}[ρ],   (9)

where P quantifies the strength of decoherence. Note that this noise model is very pessimistic, as with probability P the two qubits are left in a maximally mixed state, which means that decoherence kills all the information they carry. In order to reduce the impact of noise in near-term quantum computers, error mitigation techniques [76-81] have been developed for post-processing noisy data. In this paper, we use the ZNE method [76-78] to mitigate errors in primitive VQCs: the zero-noise expectation value of an observable is extrapolated from its values at different noise levels. To achieve this, one has to systematically increase the noise in the system and measure the expectation value of the desired observable at different noise strengths. In our case, since the noise is assumed to be only in controlled-not gates, we can increase the noise strength by gate folding [101]: we replace each controlled-not gate with an odd number of controlled-not gates. Since (U_CX^(q,q+1))^2 = I^(q,q+1), all odd powers of U_CX^(q,q+1) are expected to be equivalent to a single controlled-not gate. However, for the noisy operation ξ_CX in Eq. (9), the multiplication of controlled-not gates induces more noise in the system. We perform ZNE for circuits in which every controlled-not gate is replaced with 1, 3, 5, and 7 consecutive gates, which approximately correspond to noise strengths of P, 3P, 5P, and 7P, respectively. The error-mitigated result for P=0 is estimated through third-order polynomial extrapolation.
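The extrapolation step of ZNE can be sketched numerically. The snippet below (illustrative Python with made-up expectation values, not the paper's Julia code) fits a third-order polynomial to values measured at the folded noise strengths P, 3P, 5P, 7P and evaluates it at zero noise:

```python
import numpy as np

def zne_extrapolate(noise_levels, values, order=3):
    """Fit a degree-`order` polynomial to expectation values measured at
    amplified noise strengths and evaluate the fit at zero noise."""
    coeffs = np.polyfit(noise_levels, values, deg=order)
    return float(np.polyval(coeffs, 0.0))

# gate folding: 1, 3, 5, 7 copies of each CNOT ~ noise strengths P, 3P, 5P, 7P
levels = np.array([1.0, 3.0, 5.0, 7.0])
# toy expectation values decaying with noise (illustrative numbers only)
values = np.exp(-0.1 * levels)
print(zne_extrapolate(levels, values))  # close to the true zero-noise value 1.0
```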

III. ENSEMBLE-LEARNING ALGORITHMS
Ensemble-learning classifiers have been introduced in the classical machine learning literature for enhancing the precision of weak classifiers [102]. In these methods, a group of weak classifiers is combined to make a strong classifier with high accuracy. There are several ensemble-learning techniques for classification problems; the most prominent include Bagging [9] and AdaBoost [10,11].
The ZNE method, at best, removes the effect of noise in VQCs; it usually cannot outperform a noise-free quantum computer. Therefore, when error-free shallow circuits are insufficient for accurate classification, the improvement by ZNE is limited. In the following, we adapt two ensemble-learning error mitigation algorithms, namely Bagging and AdaBoost, to VQCs and show how these methods can enhance classification accuracy.

A. Ensemble-Learning: Bagging VQC
The Bagging algorithm has been developed as one of the most successful ensemble-learning techniques in the context of classical machine learning [9]. In the Bagging algorithm, a group of classifiers, each trained independently on a different training dataset D_i, is combined to make a stronger one. After training, for any given data, all classifiers assign a class label, and the final prediction is decided by a majority vote among these results. The simplicity of the Bagging algorithm has made it one of the most popular algorithms in classification problems. Here, we show how the Bagging algorithm can be adapted for VQCs. To implement this, we train L_c different VQCs with shallow circuits, all with equal numbers of layers. The difference between these classifiers is in the initialization of the parameters, which results in different optimal values of θ*. Hence, one gets L_c different VQCs, all trained independently. For unknown data x, we use the majority vote among these L_c classifiers to assign a class label. Therefore, the final classifier can be described as

F(x) = arg max_k Σ_{l=1}^{L_c} I( f_l(x; θ*_l) = k ),

where f_l(x; θ*_l) is a VQC, as given in Eq. (8).
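The majority vote above reduces to a simple counting operation over the labels returned by the trained classifiers. A minimal sketch in Python (our own illustration; the tie-breaking convention of choosing the smallest label is an assumption, as the text does not specify one):

```python
from collections import Counter

def bagging_predict(predictions):
    """Majority vote over the class labels returned by the L_c trained VQCs.
    Ties are broken in favor of the smallest label (our own convention)."""
    counts = Counter(predictions)
    label, _ = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
    return label

# three weak classifiers vote on one input; the majority label wins
print(bagging_predict([1, 3, 3]))  # 3
```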
To see the performance of the Bagging algorithm, in Fig. 4(a) we plot the test accuracy as a function of L_c for two types of noise-free circuits, with D_l=2 and D_l=3 layers, respectively. Each data point is again averaged over 50 random initializations. As expected, in the absence of noise, the quantum circuit with D_l=3 layers always outperforms the circuit with D_l=2 layers. More importantly, even for such shallow circuits, the accuracy improves as the number of classifiers L_c increases, such that for L_c=10 one can achieve an accuracy of 0.8856 (for the circuit with D_l=2 layers) and 0.9173 (for the circuit with D_l=3 layers). To achieve a similar accuracy with a single circuit, one needs a quantum circuit with D_l=6 layers (0.9205); see Fig. 2. Note that these results are all for noise-free computers (i.e. perfect controlled-not gates with P=0), and, as we will see later, the improvement achieved by Bagging becomes even more pronounced in the presence of noise.

B. Ensemble-Learning: AdaBoost VQC
AdaBoost is an alternative ensemble-learning algorithm that is used to improve the accuracy of weak classifiers [10,11]. It can be used for classifiers that at least slightly outperform a random guess, namely those with 0 ≤ e < (K_c−1)/K_c [11]. While in the Bagging approach the VQCs are trained in parallel (i.e. independently), in the AdaBoost scheme the VQCs are trained sequentially. We consider L_c different quantum classifiers f_l(x; θ_l), with l=1, 2, ..., L_c. In the AdaBoost training process, one assigns a proper weight to each input data point x_i in the loss function. After training each classifier (namely, finding an optimal set of parameters θ*), the weights are updated for training the next one, based on the performance of the last classifier. Hence, the training procedures of the classifiers are interconnected and can only be accomplished sequentially. For simplicity, we assume that these quantum classifiers have the same circuit design with equal depths. However, each classifier starts with a different random initial parameterization θ_l and uses a different loss function, depending on the weights. The AdaBoost algorithm pursues the following steps to make a single strong classifier via a proper interconnected training of these L_c classifiers:

• Step 1: Initializing the input weights. We assign an initial weight w_{1,i} = 1/M_s to all the inputs in the dataset.
• Step 2: Training the quantum classifier f_l(x; θ_l). We train the quantum circuit of the classifier f_l(x; θ_l) (initially starting with l=1) using the weights W_l = {w_{l,i}} and the weighted loss function

L_l(θ_l) = − Σ_{i=1}^{M_s} w_{l,i} Σ_{k=1}^{K_c} y_{ik} log p_{ik}.

When training finishes, one gets the optimal parameters θ*_l, which correspond to the classifier f_l(x; θ*_l).

• Step 3: Computing the error rate. For the trained classifier f_l(x; θ*_l), one can compute the weighted error rate as

e_l = Σ_{i=1}^{M_s} w_{l,i} I( f_l(x_i; θ*_l) ≠ y_i ).

• Step 4: Computing the classifier's weight.
Based on the error rate e_l, we can assign a weight to the classifier as

α_l = ln( (1−e_l)/e_l ) + ln(K_c − 1).

Note that for classifiers better than a random guess, namely e_l < (K_c − 1)/K_c, the coefficient α_l is always positive.
• Step 5: Updating the input weights. For those input data x_i whose correct class label y_i the classifier f_l(x_i; θ*_l) fails to estimate, we increase the input weight w_{l+1,i}. The reason is that, during the training of the next classifier, this input data will have more impact on the loss function and thus might be correctly classified by the next classifier. The input weights are updated as

w_{l+1,i} = (w_{l,i} / Z_l) exp( α_l I( f_l(x_i; θ*_l) ≠ y_i ) ),

where Z_l is the normalizing factor

Z_l = Σ_{i=1}^{M_s} w_{l,i} exp( α_l I( f_l(x_i; θ*_l) ≠ y_i ) ).

The above steps are summarized in Fig. 3(a). Note that the weak performance of the L_c chosen classifiers can be due to different reasons, such as shallow circuits or noisy gate operations.
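Steps 3-5 can be condensed into one weight-updating round. The sketch below (our own illustrative Python, following the multi-class SAMME-style formulas above; it assumes 0 < e_l < (K_c−1)/K_c so the logarithms are well defined) computes the weighted error rate, the classifier weight α_l, and the re-normalized input weights:

```python
import numpy as np

def adaboost_round(weights, mistakes, n_classes):
    """One round of multi-class AdaBoost weight updating (Steps 3-5).
    `mistakes` is the boolean array I(f_l(x_i) != y_i).
    Assumes 0 < e_l < (n_classes - 1) / n_classes."""
    weights = np.asarray(weights, dtype=float)
    mistakes = np.asarray(mistakes, dtype=bool)
    e_l = float(np.sum(weights[mistakes]))                     # Step 3
    alpha_l = np.log((1 - e_l) / e_l) + np.log(n_classes - 1)  # Step 4
    new_w = weights * np.exp(alpha_l * mistakes)               # Step 5
    return e_l, alpha_l, new_w / new_w.sum()                   # divide by Z_l

w = np.full(4, 0.25)  # Step 1: uniform initial weights over M_s = 4 samples
e, a, w2 = adaboost_round(w, [False, False, True, False], n_classes=4)
print(e)   # weighted error rate: 0.25
print(w2)  # the misclassified sample now carries most of the weight
```

With e_l = 0.25 and K_c = 4, α_l = ln 3 + ln 3 > 0, so the misclassified sample's weight is multiplied by 9 before normalization.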
To see the performance of AdaBoost VQC, we first consider shallow VQC circuits whose gate operations are perfect (i.e. the controlled-not gates are noise-free, with P=0), so that their accuracy is only limited by the depth of the circuit. In Fig. 4(b) we plot the test accuracy as a function of L_c for two types of circuits with only D_l=2 and D_l=3 layers. As the figure shows, the accuracy increases with the number of classifiers. In addition, 3-layer circuits provide better accuracy than 2-layer circuits, because, in the absence of noise, a 3-layer circuit naturally performs better than a 2-layer one. One can compare the performance of Bagging and AdaBoost in Figs. 4(a) and 4(b): the AdaBoost VQC with L_c=6 classifiers can achieve an accuracy of 0.95, while the Bagging VQC cannot exceed an accuracy of 0.92 even with L_c=10 classifiers. This is because, during the AdaBoost sequential training, each classifier is provoked to correct the mistakes of the previous classifier through weight updating. In contrast, in the Bagging algorithm the classifiers are trained in parallel, independently of each other; therefore, the mistakes are not corrected as efficiently as in the AdaBoost algorithm.

C. Ensemble-Learning VQCs on NISQ computers
In this section, we consider noisy quantum computers in which the controlled-not gates are noisy and operate according to Eq. (9). The strength of noise is quantified by the decoherence rate P, which affects all the two-qubit gates of the circuit equally. We compare four different scenarios: (i) a normal VQC with a deep circuit of D_l=12 layers without ZNE; (ii) a VQC with a deep circuit of D_l=12 layers with ZNE; (iii) Bagging VQC with shallow circuits of D_l=2 and D_l=3 layers and various numbers of classifiers; and (iv) AdaBoost VQC with shallow circuits of D_l=2 and D_l=3 layers and various numbers of classifiers. We first fix the circuit depth at D_l=2 for Bagging and AdaBoost and plot the test accuracy as a function of the decoherence rate P in Figs. 5(a)-(c), for L_c=3, 6, and 9 classifiers, respectively. As the figure shows, the test accuracy for a normal VQC with a deep circuit of D_l=12 decays rapidly as P increases. ZNE can indeed enhance the accuracy for such a deep circuit, but for larger P the decay is still significant. Interestingly, a shallow Bagging VQC with D_l=2 can outperform the deep-circuit classifier, even with ZNE, when P > 0.06. Increasing the number of classifiers from L_c=3 to L_c=9 slightly improves the performance of Bagging. Remarkably, the AdaBoost algorithm, even with shallow circuits of D_l=2 layers, can outperform the other scenarios for noise rates of P > 0.02, and its accuracy remains stably high even for very strong decoherence rates up to P=0.18.
Similarly, one can consider Bagging and AdaBoost with D_l=3 layers in Figs. 5(d)-(f) for L_c=3, 6, and 9 classifiers, respectively. In this case, Bagging and AdaBoost outperform deep circuits with ZNE when P > 0.02. As P increases, the performance of Bagging and AdaBoost remains fairly close for L_c=3 and L_c=6 classifiers. By increasing the number of classifiers L_c or the noise rate P, AdaBoost again outperforms Bagging. Note that the AdaBoost algorithm benefits greatly from increasing the number of classifiers due to its interconnected training method, which improves the classifiers based on the mistakes of the previous ones. Another interesting observation is that on very noisy quantum computers, i.e. for large P, AdaBoost with D_l=2 layers is better than AdaBoost with D_l=3 layers. This is because deeper circuits naturally have more two-qubit gates and thus are more susceptible to noise.
In summary, both of our ensemble-learning error mitigation methods, namely Bagging VQC and AdaBoost VQC, provide a significant improvement over the conventional ZNE method. This is a general behavior and can also be observed for rotation encoding of the input data (results not shown). Moreover, thanks to its interconnected training method, the AdaBoost algorithm can outperform Bagging, in particular when the number of classifiers increases. The AdaBoost accuracy enhancement over the other methods becomes even more pronounced when the quantum computer is subjected to strong decoherence, namely large P.

D. Classification of Quantum Data
In this section, we apply our ensemble classification methods to a quantum dataset. The input data are quantum states taken from the ground state of a chain of N_q qubits interacting via the Hamiltonian

H = −J Σ_i Z_i X_{i+1} Z_{i+2} − h_1 Σ_i X_i X_{i+1} − h_2 Σ_i X_i,   (17)

where J is the three-body spin coupling, h_1 is the two-body spin exchange interaction, and h_2 is the magnetic field. Note that the three-body interaction term flips a central spin with the addition of a phase that depends on the quantum states of its neighbors. This Hamiltonian commutes with the two string operators X_odd(even) = ∏_{i∈odd(even)} X_i. This implies that the Hamiltonian has a Z_2 × Z_2 symmetry, which results in the emergence of a Symmetry-Protected Topological (SPT) phase described by a non-local order parameter [104,105]. In Ref. [41], the phase diagram of this Hamiltonian has been determined through density matrix renormalization group analysis [103]. The Hamiltonian has three different phases as (h_1/J, h_2/J) vary, namely the antiferromagnetic, paramagnetic, and SPT phases. In the absence of the two-body interaction, i.e. h_1=0, the Hamiltonian becomes solvable via the Jordan-Wigner transformation and shows a quantum phase transition from the SPT to the paramagnetic phase at a specific value of h_2/J. Recently, the phase diagram of this system has also been determined through quantum convolutional neural networks [41], which have been experimentally realized on superconducting quantum computers for a system of size N_q=7 [66].
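For small chains, the Hamiltonian described above can be built as a dense matrix and diagonalized directly. The sketch below is our own illustration: the assignment of terms (J to the three-body Z X Z coupling, h_1 to the two-body X X exchange, h_2 to the transverse field) follows the text, while open boundary conditions are our assumption.

```python
import numpy as np

# single-qubit Pauli matrices
I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def op_on(n, ops):
    """Tensor product placing the operators in `ops` ({site: matrix})
    on an n-qubit chain, with identities on all other sites."""
    out = np.array([[1.0]])
    for site in range(n):
        out = np.kron(out, ops.get(site, I2))
    return out

def spt_hamiltonian(n, J, h1, h2):
    """H = -J sum Z_i X_{i+1} Z_{i+2} - h1 sum X_i X_{i+1} - h2 sum X_i,
    with open boundaries (our assumption)."""
    H = np.zeros((2**n, 2**n))
    for i in range(n - 2):
        H -= J * op_on(n, {i: Z, i + 1: X, i + 2: Z})
    for i in range(n - 1):
        H -= h1 * op_on(n, {i: X, i + 1: X})
    for i in range(n):
        H -= h2 * op_on(n, {i: X})
    return H

H = spt_hamiltonian(4, J=1.0, h1=0.2, h2=0.5)
print(H.shape)              # (16, 16)
print(np.allclose(H, H.T))  # real symmetric, hence Hermitian: True
```

The ground state, `np.linalg.eigvalsh(H)` / `eigh`, then serves as the input quantum data at each point of the (h_1/J, h_2/J) plane.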
Here, we use our ensemble classification methods for determining the phase diagram of the system. The quantum circuit is exactly the same as before, shown in Fig. 1(a), with one important difference: since the input is itself a quantum state, the encoder is no longer needed and the quantum state can directly be fed into the parameterized circuit. Similar to the approach of Ref. [41], we only measure the last qubit despite having three phases, i.e. three classes. This method labels the phases as SPT and non-SPT. Since the antiferromagnetic and paramagnetic phases are well separated and have no boundary, they can easily be recognized in the phase diagram, as we will see in the following.

FIG. 6. The test performance for phase recognition of the ground state of the SPT Hamiltonian (17) with 15 qubits, as a function of h1/J and h2/J. The upper panels show the performance of Bagging for quantum circuits with D_l=4 layers with ensembles of size (a) Lc=1, (b) Lc=3, and (c) Lc=7 classifiers, respectively. The lower panels show the performance of AdaBoost for quantum circuits with only D_l=2 layers with ensembles of size (d) Lc=1, (e) Lc=3, and (f) Lc=7 classifiers, respectively. The blue and red lines represent the real phase boundaries computed through density matrix renormalization group [41,103]. Note that using a single circuit (Lc=1) is not really ensemble learning; we include it only to show how the results improve as the number of classifiers increases.
First, we focus on the Bagging algorithm for phase recognition of the SPT Hamiltonian with N_q=15 qubits, using an ensemble of circuits with D_l=4 layers. We consider the phase diagram in the (h_1/J, h_2/J) plane with a resolution of 64 × 64 pixels. For training, we randomly select the ground states at M_s=400 random points in the (h_1/J, h_2/J) plane as our training data. In Figs. 6(a)-(c) we plot the results of our Bagging VQC for ensembles of L_c=1, 3, and 7 classifiers, respectively. The phase boundaries, computed by density matrix renormalization group [41,103], are plotted as blue and red lines. As evident in the figures, the Bagging VQC can indeed capture the phase diagram, and the precision improves as the number of classifiers increases. It is worth emphasizing that for shallower circuits with depth D_l < 4 layers, the precision for capturing the phase diagram goes down (results not shown).
In particular, the performance is poor for circuits with D_l=2 layers, no matter how many classifiers we use. This shows that increasing the number of classifiers alone cannot compensate for insufficient circuit depth. This is because the classifiers are trained independently, so their weaknesses cannot be corrected during training.
Second, we also exploit AdaBoost for capturing the phase diagram of the Hamiltonian with very shallow circuits of D_l=2 layers. As in the previous cases, we use the same circuit as shown in Fig. 1(a), without the encoder part, and the same dataset that we used for the Bagging algorithm. In Figs. 6(d)-(f) we depict the phase diagram of the system using AdaBoost circuits with a depth of D_l=2 layers and L_c=1, 3, and 7 classifiers, respectively. Note that AdaBoost can only become effective for more than one classifier. As the figures clearly show, the AdaBoost protocol can indeed determine the phase diagram even with shallow circuits of only D_l=2 layers. As expected, the precision improves as the number of classifiers L_c increases. In particular, for L_c=7 the phase boundaries between the SPT and the other phases are captured quite precisely. The fact that circuits with only D_l=2 layers are enough for recognizing the phase boundaries already shows the superiority of AdaBoost VQC over Bagging VQC. As mentioned before, this is because in AdaBoost VQC the training of the classifiers is not independent: each classifier tries to correct the errors of the previous ones through weight updating.

IV. CONCLUSIONS
We introduced two ensemble-learning error mitigation algorithms, namely Bagging and AdaBoost, for VQCs. These algorithms can significantly enhance the precision of classification using only shallow quantum circuits with very few parameters to train. Our protocols have been tested on both classical (handwritten digits) and quantum (phase recognition of an SPT Hamiltonian) datasets. On imperfect NISQ computers, our ensemble-learning error mitigation methods significantly outperform the ZNE method in classification tasks. Thanks to its interconnected training approach, which tends to correct the mistakes of one classifier in the training of the next, the AdaBoost method achieves better accuracy and shows better robustness against noise than the Bagging algorithm. The superiority of AdaBoost over ZNE and Bagging becomes even more prominent as the number of classifiers increases, in particular in the large-noise limit.
Our ensemble-learning error mitigation techniques are very general. For classification problems, they are applicable to both classical and quantum datasets and work for both amplitude and rotation encodings. The application of our ensemble-learning error mitigation methods is not limited to classification and can be generalized to other supervised machine-learning problems, such as kernel learning and regression. Moreover, they can also be used for non-variational classification methods, such as quantum support vector machines. Our ensemble-learning VQCs are NISQ-friendly and very distinct from other ensemble-learning proposals [82-86], which are hardware demanding, relying on multi-qubit controlled unitaries and quantum subroutines such as quantum phase estimation, Grover search, and quantum mean estimation. Therefore, our approach can solve a broad category of machine learning problems with today's NISQ technologies.
For numerical simulations, we rely on the Julia packages "VQC.jl" and "QuantumCircuit.jl". In the training process of our VQCs, we use the Adam optimizer [98], a gradient-based method, with a learning rate of 5 × 10^{-3} to update the quantum circuit parameters θ. The gradients are obtained by automatic differentiation methods supported by the VQC.jl package. The training procedure uses 500 optimization iterations for the weak quantum classifiers in both AdaBoost VQC and Bagging VQC, and 1500 iterations for the normal deep VQC. To make the results initialization-independent, the performance is averaged over 50 random initial samples for noise-free circuits and over 20 for noisy circuits. For quantum classification of the SPT Hamiltonian, the training dataset consists of the ground states at M_s=400 random points in the (h_1/J, h_2/J) plane. To show the performance of the classifier, we depict the phase diagram in the (h_1/J, h_2/J) plane with a resolution of 64×64, as shown in Fig. 6. The number of optimization iterations is fixed at 1000, and each data point has been averaged over 10 different random initializations.
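The Adam update used in training is standard; a self-contained sketch of one Adam step (our own illustrative Python, minimizing a stand-in scalar loss rather than the actual VQC loss; only the learning rate 5 × 10^{-3} is taken from the text) reads:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=5e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One standard Adam update of the parameters theta.
    m, v are the first/second moment estimates; t is the 1-based step index."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)          # bias-corrected first moment
    v_hat = v / (1 - b2**t)          # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# minimize f(theta) = theta^2 as a stand-in for the VQC loss
theta = np.array([1.0])
m, v = np.zeros(1), np.zeros(1)
for t in range(1, 501):              # 500 iterations, as for the weak VQCs
    grad = 2 * theta                 # gradient of theta^2
    theta, m, v = adam_step(theta, grad, m, v, t)
print(abs(theta[0]) < 1.0)           # parameter moved toward the minimum: True
```

In the paper's actual pipeline, `grad` would instead come from automatic differentiation of the cross-entropy loss through the circuit.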

FIG. 1. Circuit design and the classical dataset. (a) The quantum circuit used in our ensemble-learning VQC protocols contains three parts: the encoder, the parameterized circuit, and measurement. The encoder transforms classical input data into a quantum state. While in this paper we use amplitude encoding, the protocol works equally well for rotation encoding. For quantum datasets, the encoder is not needed. The prepared quantum states are fed into a parameterized circuit in which each qubit first undergoes a local rotation G^(q)(θ_q) = R

FIG. 2. Increasing layers in noise-free quantum circuits. The training and testing accuracies of the normal VQC for classifying the MNIST dataset with odd digits {1, 3, 5, 7} are shown as a function of the depth D_l of the parameterized circuit. Each data point is averaged over 50 different random initial sets of parameters, and the error bars represent their standard deviation. The performance improves monotonically with the number of circuit layers on a noise-free quantum computer. The closeness of the two accuracies shows that the training does not lead to over-fitting.

FIG. 3. (a) The training process of AdaBoost VQC. The dataset D and the data weights W_l, which are initially uniform for the first weak VQC, are used to train the l-th weak VQC on a shallow circuit, shown in the top dotted box. After training, the error rate e_l is computed, from which one obtains the classifier's weight α_l. Then the data weights are updated to W_{l+1} using α_l and the previous data weights W_l. The process repeats until all L_c classifiers are trained. These trained weak VQCs are combined according to their weights α_l to make a single strong AdaBoost VQC with high accuracy. (b) The unknown input data |x⟩ is fed into the L_c different trained weak VQCs. The probabilities Π_l^k of measurement outcome k from the different weak VQCs are then averaged with weights α_l. The final predicted class label is the k with the largest weighted outcome, namely arg max_k Σ_{l=1}^{L_c} α_l Π_l^k.
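The boosting round and the weighted vote of Fig. 3 can be sketched in a few lines. This is a hedged illustration: the SAMME multiclass form of α_l and the exponential weight update below are standard AdaBoost choices assumed here, and the paper's exact expressions may differ; the weak VQCs are replaced by arrays of predictions and outcome probabilities.

```python
import numpy as np

def adaboost_round(weights, y_pred, y_true, n_classes):
    """One boosting round: error rate e_l, classifier weight alpha_l
    (SAMME multiclass form -- an assumption), and the renormalised
    data weights W_{l+1} that emphasise misclassified samples."""
    miss = (y_pred != y_true).astype(float)
    e = float(np.sum(weights * miss))
    alpha = np.log((1 - e) / e) + np.log(n_classes - 1)
    new_w = weights * np.exp(alpha * miss)
    return e, alpha, new_w / new_w.sum()

def adaboost_predict(alphas, probs):
    """Weighted vote of Fig. 3(b): argmax_k sum_l alpha_l * Pi_l^k.
    probs has shape (L_c, n_samples, n_classes)."""
    return np.argmax(np.tensordot(alphas, probs, axes=1), axis=1)

# Four samples, four classes, one misclassified point (index 3).
w = np.full(4, 0.25)
e, alpha, w_next = adaboost_round(w, np.array([0, 1, 2, 0]),
                                  np.array([0, 1, 2, 3]), 4)

# Two trained classifiers voting on two samples of a two-class problem.
probs = np.array([[[0.8, 0.2], [0.3, 0.7]],
                  [[0.6, 0.4], [0.2, 0.8]]])
pred = adaboost_predict(np.array([2.0, 1.0]), probs)
```

After the round, the weight of the misclassified sample grows, so the next weak VQC concentrates on it, which is the mechanism behind the adaptive part of AdaBoost.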

• Step 6: Training the next classifier. Repeat from Step 2 until all the L_c classifiers are trained.
• Step 7: Combining the classifiers. One can combine the trained classifiers in order to obtain a stronger one. The combination is weighted according to the strength of each classifier, quantified by α_l: ŷ = F_AB(x; θ*) = arg max_k Σ_{l=1}^{L_c} α_l Π_l^k.
[Truncated figure-caption fragment: "... and (b) when the circuit depths are the same. The figures clearly show that AdaBoost can outperform Bagging. For instance, by considering circuits with D_l = 3 layers, the AdaBoost ..."]

FIG. 4. The performance of ensemble-learning VQCs in noise-free circuits. The test accuracy of MNIST {1, 3, 5, 7} classification using ensemble-learning VQCs is plotted as a function of the number of weak classifiers L_c. The ensembles contain noise-free shallow quantum circuits with either D_l = 2 or D_l = 3 layers. The panels represent: (a) Bagging VQC; and (b) AdaBoost VQC. For both algorithms, each data point is averaged over 20 different samples of the initial parameters and the error bars are the standard deviation of those results. Increasing the number of classifiers monotonically enhances the performance of both algorithms. Since the quantum circuits are noise-free, the performance of the circuit with D_l = 3 layers is always better than that of the circuit with D_l = 2 layers. It is worth noting that for the same circuit depth D_l and number of classifiers L_c, the AdaBoost VQC always outperforms the Bagging VQC.
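For contrast with the weighted AdaBoost vote, the Bagging VQC of Fig. 4(a) trains each weak classifier on a bootstrap replicate of the dataset and combines them with uniform weights. The sketch below assumes this standard Bagging recipe; the arrays stand in for the weak VQCs' training data and outcome probabilities.

```python
import numpy as np

def bootstrap_sample(X, y, rng):
    """Draw one bootstrap replicate: |D| points sampled with replacement,
    used to train a single weak VQC of the Bagging ensemble."""
    idx = rng.integers(0, len(X), size=len(X))
    return X[idx], y[idx]

def bagging_predict(prob_list):
    """Uniform (unweighted) average of the classifiers' outcome
    probabilities, followed by argmax over the class labels."""
    return np.argmax(np.mean(prob_list, axis=0), axis=1)

rng = np.random.default_rng(7)
X = np.arange(20).reshape(10, 2)       # toy 10-sample dataset
y = np.arange(10) % 4
Xb, yb = bootstrap_sample(X, y, rng)   # replicate for one weak classifier

# Two weak classifiers' outcome probabilities for 2 samples, 2 classes.
probs = np.array([[[0.8, 0.2], [0.4, 0.6]],
                  [[0.6, 0.4], [0.1, 0.9]]])
pred = bagging_predict(probs)
```

The only difference from the AdaBoost vote is that every classifier contributes equally, which is why Bagging cannot reweight itself toward hard samples.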

FIG. 5. Comparison of various VQC algorithms on noisy quantum circuits. The test accuracy of four different VQC algorithms is plotted as a function of the decoherence rate P. The four strategies include normal VQCs, performed on a deep circuit of D_l = 12 layers, with and without ZNE, as well as our ensemble-learning algorithms, namely Bagging VQC and AdaBoost VQC. In the upper panels, both Bagging and AdaBoost are performed on shallow circuits with D_l = 2 layers and ensembles of size: (a) L_c = 3; (b) L_c = 6; and (c) L_c = 9 classifiers, respectively. In the lower panels, both Bagging and AdaBoost are performed on shallow circuits with D_l = 3 layers and ensembles of size: (d) L_c = 3; (e) L_c = 6; and (f) L_c = 9 classifiers, respectively. Each data point is averaged over 50 random samples of initial parameters. The results show that while the conventional error mitigation technique ZNE can indeed enhance the classification accuracy of deep circuits, its performance falls below that of the ensemble-learning classifiers with shallow circuits as P increases. The best outcome is achieved by AdaBoost, whose accuracy improves significantly as the number of classifiers increases and remains robust even at large decoherence rates.
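The ZNE baseline of Fig. 5 extrapolates expectation values measured at amplified noise back to the zero-noise limit. The sketch below assumes the common polynomial (Richardson-style) variant with a toy linear noise model; the noise-scaling method and fit order used in the paper's benchmark may differ.

```python
import numpy as np

def zne_extrapolate(scales, values, degree=1):
    """Zero-noise extrapolation: fit a polynomial to expectation values
    measured at amplified noise scales c >= 1, then evaluate it at c = 0."""
    coeffs = np.polyfit(scales, values, degree)
    return float(np.polyval(coeffs, 0.0))

# Toy noise model (an assumption): the measured signal decays linearly in c.
ideal = 0.8
scales = np.array([1.0, 2.0, 3.0])     # noise-amplification factors
noisy = ideal - 0.1 * scales           # what the noisy device would report
mitigated = zne_extrapolate(scales, noisy)
```

This recovers the ideal value only when the fitted model matches the true noise dependence, which is one reason ZNE on a deep circuit can fall behind the ensemble of shallow circuits at large decoherence rates.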