Quantum Adversarial Machine Learning

Adversarial machine learning is an emerging field that studies the vulnerabilities of machine learning approaches in adversarial settings and develops techniques to make learning robust to adversarial manipulations. It plays a vital role in various machine learning applications and has recently attracted tremendous attention across different communities. In this paper, we explore different adversarial scenarios in the context of quantum machine learning. We find that, similar to traditional classifiers based on classical neural networks, quantum learning systems are likewise vulnerable to crafted adversarial examples, independent of whether the input data is classical or quantum. In particular, we find that a quantum classifier that achieves nearly state-of-the-art accuracy can be conclusively deceived by adversarial examples obtained by adding imperceptible perturbations to the original legitimate samples. This is explicitly demonstrated with quantum adversarial learning in different scenarios, including classifying real-life images (e.g., handwritten digit images in the MNIST dataset), learning phases of matter (such as ferromagnetic/paramagnetic orders and symmetry-protected topological phases), and classifying quantum data. Furthermore, we show that, based on the information of the adversarial examples at hand, practical defense strategies can be designed to fight against a number of different attacks. Our results uncover the notable vulnerability of quantum machine learning systems to adversarial perturbations. This not only reveals a novel perspective in bridging machine learning and quantum physics in theory but also provides valuable guidance for practical applications of quantum classifiers based on both near-term and future quantum technologies.


I. INTRODUCTION
The interplay between machine learning and quantum physics may lead to unprecedented perspectives for both fields [1]. On the one hand, machine learning, or more broadly artificial intelligence, has progressed dramatically over the past two decades [2,3]. Many problems that were extremely challenging or even inaccessible to automated learning have been solved successfully [4,5]. This raises new possibilities for utilizing machine learning to crack outstanding problems in quantum science as well [1,6-16]. On the other hand, the idea of quantum computing has revolutionized theories and implementations of computation. In turn, it gives rise to striking new opportunities to enhance, speed up, or innovate machine learning with quantum devices [17-19]. This emergent field is growing rapidly, and notable progress is being made on a daily basis. Yet, it is still largely in its infancy, and many important issues remain barely explored [1,17-19]. In this paper, we study one such issue: quantum machine learning in various adversarial scenarios. We show, with concrete examples, that quantum machine learning systems are likewise vulnerable to adversarial perturbations (see Fig. 1 for an illustration), and that suitable countermeasures should be designed to mitigate the associated threat.
In classical machine learning, two topics have been actively investigated: the vulnerability of machine learning to intentionally crafted adversarial examples, and the design of proper defense strategies. Together they have given rise to the emergent field of adversarial machine learning [20-33]. Adversarial examples are inputs to machine learning models that an attacker has crafted to cause the model to make a mistake. The first seminal adversarial example dates back to 2004, when Dalvi et al. studied the techniques used by spammers to circumvent spam filters [34]. They showed that linear classifiers could be easily fooled by a few carefully crafted modifications to the content of spam emails, with no significant change in the meaning and readability of the message. Such modifications include adding innocent text or substituting synonyms for words that are common in malicious messages. Since then, adversarial learning has attracted enormous attention, and different attack and defense strategies have been proposed [22,27,32,33,35,36]. More strikingly, adversarial examples can even take the form of imperceptibly small perturbations to input data, such as a human-invisible change to every pixel in an image [21,37,38]. A prominent example of this kind in deep learning was first observed by Szegedy et al. [21]. It has since become a celebrated prototypical example that showcases the vulnerability of machine learning in a dramatic way. Starting with an image of a panda, an attacker may add a tiny amount of carefully crafted noise (imperceptible to the human eye), causing the image to be classified incorrectly as a gibbon with notably high confidence. In fact, adversarial examples are now widely believed to be ubiquitous in classical machine learning. Almost all types of learning models suffer from adversarial attacks, for a wide range of data types including images, audio, text, and other inputs [23,24]. From a more theoretical computer science perspective, the vulnerability of classical classifiers to adversarial perturbations is reminiscent of the "No Free Lunch" theorem: there exists an intrinsic tension between adversarial robustness and generalization accuracy [39-41]. More precisely, it has recently been proved that any classical classifier can be adversarially deceived with high probability if the data distribution satisfies the $W_2$ Talagrand transportation-cost inequality [42]. This is a general condition satisfied in a large number of situations, such as when the class-conditional distribution has a log-concave density or is the uniform measure on a compact Riemannian manifold with positive Ricci curvature.
[Fig. 1 (caption, partially recovered): ... Adding a small amount of carefully crafted noise will cause the same quantum classifier to misclassify the slightly modified image, which is indistinguishable from the original one to human eyes, as a "gibbon" with notably high confidence.]
* lmduan@tsinghua.edu.cn
† dldeng@tsinghua.edu.cn
Meanwhile, over the past few years, a number of intriguing quantum learning algorithms have been discovered [17,43-61], and some have been demonstrated in proof-of-principle experiments [62]. These algorithms exploit unique and enigmatic properties of quantum phenomena (such as superposition and entanglement) and promise exponential advantages over their classical counterparts. Notable examples include the HHL (Harrow-Hassidim-Lloyd) algorithm [63], quantum principal component analysis [64], the quantum support-vector machine [65,66], and the quantum generative model [58]. Despite this remarkable progress, quantum learning in adversarial scenarios remains largely unexplored [67,68]. Liu and Wittek recently made a noteworthy step in this direction [67]. They showed theoretically that a perturbation whose size scales inversely with the Hilbert space dimension of the quantum system to be classified should suffice to cause a misclassification. This indicates a fundamental trade-off between the robustness of classification algorithms against adversarial attacks and the potential quantum advantages we expect for high-dimensional problems. Yet, in practice, it is unclear how to obtain adversarial examples for a quantum learning system, and corresponding defense strategies are lacking as well.
In this paper, we study the vulnerability of quantum machine learning to various adversarial attacks, with a focus on a specific learning model called quantum classifiers. We show that, similar to traditional classifiers based on classical neural networks, quantum classifiers are likewise vulnerable to carefully crafted adversarial examples, which are obtained by adding imperceptible perturbations to the legitimate input data. We carry out extensive numerical simulations for several concrete examples. These cover diverse types of data: handwritten digit images in the MNIST dataset, simulated time-of-flight images from cold-atom experiments, and quantum data from a one-dimensional transverse-field Ising model. They also cover different attack strategies for obtaining the adversarial perturbations. In the white-box setting, these include the fast gradient sign method [32], basic iterative method [27], momentum iterative method [35], and projected gradient descent [32]. In the black-box setting, they include the transfer-attack method [69] and zeroth-order optimization [33]. Based on these adversarial examples, practical defense strategies, such as adversarial training, can be developed to fight against the corresponding attacks. We demonstrate that, after adversarial training, the robustness of the quantum classifier to the specific attack increases significantly. Our results shed new light on the fledgling field of quantum machine learning by uncovering, through comprehensive numerical simulations, the vulnerability of quantum classifiers. They provide valuable guidance for practical applications of quantum classifiers to intricate problems where adversarial considerations are inevitable.

II. CLASSICAL ADVERSARIAL LEARNING AND QUANTUM CLASSIFIERS: CONCEPTS AND NOTATIONS
Modern technologies based on machine learning (especially deep learning) and data-driven artificial intelligence have achieved remarkable success in a broad spectrum of application domains [2,3]. These range from face/speech recognition, spam/malware detection, and language translation to self-driving cars and autonomous robots. This success creates the illusion that machine learning can currently be applied robustly and reliably to virtually any task. Yet, as machine learning has found its way from labs to the real world, the security and integrity of its applications raise increasingly serious concerns [20,23,24]. This is especially true in safety- and security-critical environments, such as self-driving cars, malware detection, biometric authentication, and medical diagnostics [70]. For instance, the sign recognition system of a self-driving car may misclassify a stop sign with a little dirt on it as a parking prohibition sign, and subsequently cause a catastrophic accident. In medical diagnostics, a deep neural network may incorrectly identify a slightly modified dermatoscopic image of a benign melanocytic nevus as malignant, even with 100% confidence [71], possibly leading to a medical disaster. To address these crucial concerns, the new field of adversarial machine learning has emerged [25]. It studies the vulnerabilities of different machine learning approaches in various adversarial settings and develops appropriate techniques to make learning more robust to adversarial manipulations.
This field has attracted considerable attention and is growing rapidly. In this paper, we take one step further and study the vulnerabilities of quantum classifiers and possible strategies to make them more robust to adversarial perturbations. For simplicity and concreteness, we focus only on supervised learning scenarios, although a generalization to unsupervised cases is possible and worth systematic future investigation. We start with a brief introduction to the basic concepts, notations, and ideas of classical adversarial learning and quantum classifiers. In supervised learning, the training data is labeled beforehand: $D_N = \{(x^{(1)}, y^{(1)}), \dots, (x^{(N)}, y^{(N)})\}$, where $x^{(i)}$ is the data to be classified and $y^{(i)}$ denotes its corresponding label. The essential task of supervised learning is to learn from the labeled data a model $y = h(x; \eta)$ (a classifier) that provides a general rule on how to assign labels to data outside the training set [72]. This is usually accomplished by minimizing a loss function over a set of model parameters, collectively denoted as $\eta$: $\min_\eta L_N(\eta)$. Here $L_N(\eta) = \frac{1}{N}\sum_{i=1}^{N} L(h(x^{(i)}; \eta), y^{(i)})$ denotes the loss function averaged over the training data set. Different loss functions and optimization methods have been developed to solve this minimization problem. Each has its own advantages and disadvantages, and the choice depends on the specific problem.
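The following is a minimal illustration of the training problem $\min_\eta L_N(\eta)$. It assumes a logistic-regression classifier $h(x;\eta)$ with $\eta = (w, b)$, the log loss, and plain full-batch gradient descent. None of these choices is prescribed by the text, and the four-point data set is made up for illustration.

```python
import math

def h(x, eta):
    """Hypothetical classifier h(x; eta): logistic model returning P(y = 1 | x)."""
    w, b = eta
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def loss(p, y, eps=1e-12):
    """Log loss for one sample with label y in {0, 1}; eps guards against log(0)."""
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def averaged_loss(data, eta):
    """L_N(eta): the per-sample loss averaged over the N labeled samples in data."""
    return sum(loss(h(x, eta), y) for x, y in data) / len(data)

def train(data, dim, lr=0.5, steps=500):
    """Approximately solve min_eta L_N(eta) with full-batch gradient descent."""
    w, b = [0.0] * dim, 0.0
    n = len(data)
    for _ in range(steps):
        gw, gb = [0.0] * dim, 0.0
        for x, y in data:
            err = h(x, (w, b)) - y  # dL/dz for the logistic model with log loss
            gw = [g + err * xi / n for g, xi in zip(gw, x)]
            gb += err / n
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b

data = [([0.0, 0.1], 0), ([0.2, 0.0], 0), ([1.0, 0.9], 1), ([0.8, 1.0], 1)]
eta = train(data, dim=2)
print(averaged_loss(data, ([0.0, 0.0], 0.0)), averaged_loss(data, eta))
```

The parameters are what gets optimized here. The adversarial problem discussed next instead freezes them and optimizes over the input.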
Generating adversarial examples is a different process from training the classifier: we consider the model parameters $\eta$ as fixed and instead optimize over the input space. More specifically, we search for a perturbation $\delta$ within a small region $\Delta$ which, when added to the input sample $x^{(i)}$, maximizes the loss function:
$$\max_{\delta \in \Delta} L(h(x^{(i)} + \delta; \eta), y^{(i)}). \tag{1}$$
Here, we constrain $\delta$ to lie in a small region $\Delta$ to ensure that the adversarial perturbation does not completely change the input data. The choice of $\Delta$ is domain-specific and depends crucially on the problem under consideration. A widely adopted choice is the $\ell_p$-norm bound $\|\delta\|_p \le \epsilon$, where the $\ell_p$-norm is defined as $\|\delta\|_p = \left(\sum_j |\delta_j|^p\right)^{1/p}$. In addition, since there is more than one way to attack machine learning systems, different classification schemes of attack strategies have been proposed in adversarial machine learning [24,25,73,74]. Here, we follow Ref. [25] and classify attacks along three dimensions:
timing: when the attack takes place, such as attacks on models vs. on algorithms;
information: what information the attacker has about the learning model or algorithm, such as white-box vs. black-box attacks;
goals: the reasons for attacking, such as targeted vs. untargeted attacks.
We will not attempt to exhaust all possible attack scenarios, which is infeasible given their vastness and complexity. Instead, we focus on several types of attacks that already capture the essential messages we want to deliver in this paper. In particular, along the "information" dimension, we consider white-box and black-box attacks. In the white-box setting, the attacker has full information about the learned model and the learning algorithm. The black-box setting, in contrast, assumes that the adversary does not have precise information about either the model or the algorithm used by the learner. In general, obtaining adversarial examples in the black-box setting is more challenging. Along the "goals" dimension, we distinguish two major categories: targeted and untargeted attacks. In a targeted attack, the attacker aims to deceive the classifier into outputting a particular target label. In contrast, untargeted attacks (also called reliability attacks in the literature) merely attempt to cause the classifier to make erroneous predictions, without aiming at any particular class. We also mention that a number of different methods have been proposed to solve the optimization problem in Eq. (1) or its variants in different scenarios [23]. We refer to Refs. [21-23, 25, 27, 31-33, 35, 36, 69, 75] for more technical details. For our purpose, we mainly explore the following methods. In the white-box setting, we use the fast gradient sign method (FGSM) [32], basic iterative method (BIM) [27], projected gradient descent (PGD) [32], and momentum iterative method (MIM) [35]. In the black-box setting, we use the transfer attack [22], substitute model attack [31,69], and zeroth-order optimization (ZOO) attack [33].
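The sketch below illustrates FGSM and PGD for the $\ell_\infty$ choice of the constraint region, $\|\delta\|_\infty \le \epsilon$. The target is a fixed logistic classifier whose weights are made up for illustration; this model is an assumption and does not appear in the text. FGSM takes a single step $x + \epsilon\,\mathrm{sign}(\nabla_x L)$. PGD repeats smaller signed steps of size $\alpha$ and projects back onto the $\epsilon$-ball after each one. In both cases the model parameters stay fixed and only the input moves.

```python
import math

def predict_proba(x, w, b):
    """Fixed, already-trained logistic model P(y = 1 | x); w and b are made-up values."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def input_gradient(x, y, w, b):
    """Gradient of the log loss with respect to the input x (parameters stay fixed)."""
    err = predict_proba(x, w, b) - y
    return [err * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(x, y, w, b, eps):
    """FGSM: one step of size eps along sign(grad_x L), the l_inf worst case for a linearized loss."""
    g = input_gradient(x, y, w, b)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

def pgd(x, y, w, b, eps, alpha, steps):
    """Iterated signed-gradient ascent, projected onto {x' : ||x' - x||_inf <= eps} after each step."""
    x_adv = list(x)
    for _ in range(steps):
        g = input_gradient(x_adv, y, w, b)
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

w, b = [2.0, -1.0, 0.5], -0.2   # hypothetical trained parameters, held fixed
x, y = [0.3, 0.2, 0.1], 1       # legitimate sample, correctly classified as 1
x_fgsm = fgsm(x, y, w, b, eps=0.1)
x_pgd = pgd(x, y, w, b, eps=0.1, alpha=0.02, steps=10)
print(predict_proba(x, w, b), predict_proba(x_fgsm, w, b), predict_proba(x_pgd, w, b))
```

For this linear toy model, the sign of the input gradient never changes, so PGD ends at the same corner of the $\epsilon$-ball as FGSM. The two methods differ only for models whose loss is genuinely nonlinear in the input.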
Another major motivation for studying adversarial learning is to develop proper defense strategies that enhance the robustness of machine learning systems to adversarial attacks. Along this direction, a number of countermeasures have been proposed in recent years [25]. For instance, Kurakin et al. introduced the idea of adversarial training [76], where the robustness of the targeted classifier is enhanced by retraining it with both the original legitimate data and the crafted data. Samangouei et al. proposed a mechanism [77] that uses a generative adversarial network [78] as a countermeasure against adversarial perturbations. Papernot et al. proposed a defensive mechanism [79] against adversarial examples based on distilling knowledge in neural networks [80]. Each of these defense mechanisms works notably well against particular classes of attacks, but none of them can serve as a generic solution for all kinds of attacks. In fact, we cannot expect a universal defense strategy that makes all machine learning systems robust to all types of attacks. A strategy that closes off one kind of attack will unavoidably open another vulnerability to attacks that exploit the underlying defense mechanism. In this work, we use adversarial training to enhance the robustness of quantum classifiers against certain types of adversarial perturbations.
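The following is a minimal sketch of adversarial training in the spirit described above. It assumes a logistic classifier, FGSM as the attack, and an even mix of clean and freshly crafted examples in every epoch. These are our own choices for illustration, not the exact procedure of Ref. [76] or of this paper. The key point is that the adversarial copies are regenerated against the current model at each epoch.

```python
import math

def proba(x, w, b):
    """Logistic model P(y = 1 | x) with weights w and bias b."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def fgsm(x, y, w, b, eps):
    """FGSM against the current model: x + eps * sign(grad_x L) for the log loss."""
    err = proba(x, w, b) - y  # dL/dz
    return [xi + eps * ((err * wi > 0) - (err * wi < 0)) for xi, wi in zip(x, w)]

def adversarial_training(data, dim, eps, lr=0.5, epochs=300):
    """Each epoch: craft FGSM copies against the current model, then take one
    gradient-descent step on the clean and crafted samples together."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        batch = data + [(fgsm(x, y, w, b, eps), y) for x, y in data]
        n = len(batch)
        gw, gb = [0.0] * dim, 0.0
        for x, y in batch:
            err = proba(x, w, b) - y
            gw = [g + err * xi / n for g, xi in zip(gw, x)]
            gb += err / n
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def accuracy(data, w, b, eps=0.0):
    """Fraction classified correctly, after an FGSM attack of size eps when eps > 0."""
    hits = 0
    for x, y in data:
        x_eval = fgsm(x, y, w, b, eps) if eps > 0 else x
        hits += (proba(x_eval, w, b) > 0.5) == (y == 1)
    return hits / len(data)

data = [([-1.0, -0.6], 0), ([-0.8, -1.0], 0), ([0.9, 0.7], 1), ([0.6, 1.0], 1)]
w, b = adversarial_training(data, dim=2, eps=0.2)
print(accuracy(data, w, b), accuracy(data, w, b, eps=0.2))
```

As the paragraph above notes, robustness obtained this way is specific to the attack used during training. A model hardened against FGSM need not resist other attacks.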
Quantum classifiers are counterparts of classical ones that run on quantum devices. In recent years, a number of different approaches have been proposed to construct efficient quantum classifiers [45, 47-57, 65, 81, 82], some of which have even been implemented in proof-of-principle experiments. One straightforward construction, called the quantum variational classifier [45,47,49], uses a variational quantum circuit to classify the data in a way analogous to classical support vector machines [72]. Variants of this type of classifier include hierarchical quantum classifiers [55] and quantum convolutional neural networks [53]. Examples of the former are those inspired by the structure of tree tensor networks or the multiscale entanglement renormalization ansatz. Another approach, called the quantum kernel method [50,51,81], uses the quantum Hilbert space as the feature space for the data and computes the kernel function via quantum devices. Both the quantum variational classifier and the quantum kernel approach have been demonstrated in a recent experiment with superconducting qubits [51]. In addition, hierarchical quantum classifiers have been implemented using the IBM quantum experience [83], and their robustness to depolarizing noise has been demonstrated in principle [55]. These experiments showcase the intriguing potential of noisy intermediate-scale quantum devices [84] for solving practical machine learning problems, and such devices are widely expected to become available in the near future. An unambiguous demonstration of quantum advantage, however, is still lacking. Despite this exciting progress, an important question about the reliability of quantum classifiers remains largely unexplored, one of both theoretical and experimental relevance: are they robust to adversarial perturbations?
[Fig. 2 (schematic of the circuit classifier; panel labels: p layers, Rot. layer, Ent. layer). Caption, partially recovered: ... with $\theta$ representing the Euler angles, $q$ identifying the qubit, and $i = 1, 2, \dots, p$ labeling the layers. The entangler unit entangles all qubits and is composed of a series of CNOT gates. The initial state $|\psi_{\text{in}}\rangle$, which is an $n$-qubit state, encodes the complete information of the input data to be classified. The projective measurement on the output qubits gives the predicted probability for each category, and the input data is assigned the label bearing the largest probability.]

III. VULNERABILITY OF QUANTUM CLASSIFIERS
As advertised in the above discussion, quantum classifiers are vulnerable to adversarial perturbations. In this section, we first introduce the general structure of the quantum classifiers and the learning algorithms used in this paper. We also introduce several attack methods for obtaining adversarial perturbations, with technical details provided in the Appendix. We then apply these methods to concrete examples to explicitly show the vulnerability of quantum classifiers in diverse scenarios. These include quantum adversarial learning of real-life images (e.g., handwritten digit images in MNIST), topological phases of matter, and quantum data from the ground states of physical Hamiltonians.
A. Quantum classifiers: training and adversarial attacks
Quantum classifiers take quantum states as input. Thus, when they are used to classify classical data, we first need to convert the classical data into quantum states. This can be done with an encoding operation, which implements a feature map from the $D$-dimensional Euclidean space (where classical data is typically represented by $D$-dimensional vectors) to the $2^n$-dimensional Hilbert space of $n$ qubits: $\phi: \mathbb{R}^D \to \mathbb{C}^{2^n}$. There are two common ways of encoding classical data into quantum states: amplitude encoding and qubit encoding [45,48,63-65,85-90]. The amplitude encoder maps an input vector $x \in \mathbb{R}^D$ (with possible preprocessing such as normalization) directly onto the amplitudes of the $2^n$-dimensional ket vector $|\psi_{\text{in}}\rangle$ of $n$ qubits in the computational basis. Here, for simplicity, we assume that $D$ is a power of two, so that we can use the $D = 2^n$ amplitudes of an $n$-qubit system. If $D < 2^n$, we can append $2^n - D$ zeros to the input vector to make it of length $2^n$. Using the routines in Refs. [91-93], such a conversion can be achieved with a circuit whose depth is linear in the number of features of the input vectors. With certain approximations or structure, the required overhead can be reduced to polylogarithmic in $D$ [94,95]. This encoding operation can also be made more efficient with more sophisticated approaches such as tensorial feature maps [45]. The qubit encoder, in contrast, uses $D$ qubits (rather than $O(\log D)$ as in amplitude encoding) to encode the input vector. We first rescale the data vectors element-wise to lie in $[0, \pi/2]$. We then encode each element with a qubit using the scheme $|\phi_d\rangle = \cos(x_d)|0\rangle + \sin(x_d)|1\rangle$, where $x_d$ is the $d$-th element of the rescaled vector. The total quantum input state that encodes the data vector is then the tensor product $|\phi\rangle = \bigotimes_{d=1}^{D} |\phi_d\rangle$. Qubit encoding does not require a quantum random access memory [89] or a complicated circuit to prepare the highly entangled state $|\psi_{\text{in}}\rangle$. However, it demands many more qubits, which makes it more challenging to numerically simulate the training and adversarial attack processes on a classical computer. As a result, we focus only on amplitude encoding in this work; the generalization to other encoding schemes is straightforward and worth investigating in the future.
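The two encoders can be sketched on classical state vectors as follows. The zero-padding and normalization follow the description above. The following are assumptions for illustration: min-max rescaling of each feature into $[0, \pi/2]$ with user-supplied bounds `lo` and `hi`, the ordering of qubits in the tensor product, and the function names.

```python
import math

def amplitude_encode(x):
    """Zero-pad x to length 2**n, then L2-normalize: the amplitudes of an n-qubit |psi_in>."""
    n = max(1, math.ceil(math.log2(len(x))))
    padded = list(x) + [0.0] * (2 ** n - len(x))
    norm = math.sqrt(sum(v * v for v in padded))
    if norm == 0:
        raise ValueError("the zero vector cannot be amplitude-encoded")
    return n, [v / norm for v in padded]

def qubit_encode(x, lo, hi):
    """Rescale each feature from [lo, hi] to [0, pi/2] (hypothetical min-max rescaling),
    map it to cos(x_d)|0> + sin(x_d)|1>, and take the tensor product over all D features."""
    state = [1.0]
    for v in x:
        angle = (v - lo) / (hi - lo) * math.pi / 2
        qubit = (math.cos(angle), math.sin(angle))
        # Kronecker product; the first feature becomes the most significant qubit.
        state = [s * q for s in state for q in qubit]
    return state

n, amps = amplitude_encode([3.0, 4.0, 0.0])        # D = 3 -> padded to 4 amplitudes, n = 2 qubits
print(n, amps)
print(len(qubit_encode([0, 128, 255], 0, 255)))    # D = 3 features -> 3 qubits -> 8 amplitudes
```

The size contrast is visible directly. Amplitude encoding stores $D$ numbers in about $\log_2 D$ qubits, while qubit encoding needs $D$ qubits and hence a $2^D$-dimensional state vector when simulated classically.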
We choose a hardware-efficient quantum circuit classification model, which has been used as a variational quantum eigensolver for small molecules and quantum magnets in a recent experiment with superconducting qubits [96]. A schematic illustration of the model is shown in Fig. 2. Without loss of generality, we assume that the number of categories to be classified is $K$ and that each class is labeled by an integer $1 \le k \le K$. We use $m$ qubits ($2^{m-1} < K \le 2^m$) as output qubits that encode the category labels. A convenient strategy for turning discrete labels into a vector of real numbers is the so-called one-hot encoding [72]. It converts a discrete input value $0 < k \le K$ into a vector $a \equiv (a_1, \dots, a_{2^m})$ of length $2^m$ with $a_k = 1$ and $a_j = 0$ for $j \ne k$. For convenience of presentation, we use $y$ and $a$ interchangeably to denote the labels throughout the rest of the paper. In this circuit model, we first prepare the input state $|\psi_{\text{in}}\rangle \otimes |1\rangle^{\otimes m}$, where $|\psi_{\text{in}}\rangle$ is an $n$-qubit state encoding the complete information of the data to be classified. We then apply a unitary transformation consisting of $p$ layers of interleaved operations. Each layer contains a rotation unit that performs arbitrary single-qubit Euler rotations and an entangler unit that generates entanglement between qubits. This generates a variational wavefunction $|\psi_{\text{out}}\rangle = \prod_{i=1}^{p} U_i(\theta_i)\left(|\psi_{\text{in}}\rangle \otimes |1\rangle^{\otimes m}\right)$. Here $U_i(\theta_i) = \left(\prod_{q=1}^{n+m} U_{q,i}(\theta_{q,i})\right) U_{\text{ENT}}$ denotes the unitary operation for the $i$-th layer, and $U_{\text{ENT}}$ represents the unitary operation generated by the entangler unit. $U_{q,i}(\theta_{q,i})$ is the Euler rotation acting on qubit $q$ in layer $i$. We use $\theta_i$ to denote collectively all the parameters in the $i$-th layer and $\Theta$ to denote collectively all the parameters involved in the whole model. We note that arbitrary single-qubit rotations together with the controlled-NOT gate form a universal gate set for quantum computation. Hence our circuit classifier is universal as well, in the sense that it can approximate any desired function as long as $p$ is large enough. One may choose other models, such as hierarchical quantum classifiers [55] or the quantum convolutional neural network [53]. We expect the attack methods and the general conclusions to carry over straightforwardly to these models.
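The following is a small state-vector sketch of the layered circuit. It assumes several conventions the text leaves open:
- each single-qubit Euler rotation is taken as $R_z(c)R_x(b)R_z(a)$;
- the entangler is a linear chain of CNOTs between neighboring qubits;
- the rotation unit acts before the entangler within each layer;
- the $m$ output qubits are the least-significant bits of the basis index.

The function names are hypothetical, and plain Python like this is only practical for a handful of qubits.

```python
import cmath
import math
import random

def apply_1q(state, gate, q, n):
    """Apply a 2x2 gate to qubit q of an n-qubit state vector (qubit 0 = most significant bit)."""
    stride = 1 << (n - 1 - q)
    out = list(state)
    for i in range(len(state)):
        if not i & stride:  # qubit q is |0> at index i; its partner index has qubit q in |1>
            a, b = state[i], state[i | stride]
            out[i] = gate[0][0] * a + gate[0][1] * b
            out[i | stride] = gate[1][0] * a + gate[1][1] * b
    return out

def rz(t):
    return [[cmath.exp(-0.5j * t), 0], [0, cmath.exp(0.5j * t)]]

def rx(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -1j * s], [-1j * s, c]]

def apply_cnot(state, ctrl, tgt, n):
    """Flip qubit tgt on every basis state in which qubit ctrl is |1>."""
    cm, tm = 1 << (n - 1 - ctrl), 1 << (n - 1 - tgt)
    out = list(state)
    for i in range(len(state)):
        if i & cm and not i & tm:
            out[i], out[i | tm] = state[i | tm], state[i]
    return out

def prepare_input(psi_in, m):
    """|psi_in> (x) |1>^(x m): the m output qubits are the least-significant bits, all |1>."""
    dim_out = 1 << m
    state = [0j] * (len(psi_in) * dim_out)
    for k, amp in enumerate(psi_in):
        state[k * dim_out + dim_out - 1] = amp
    return state

def circuit(state, thetas, n):
    """p layers; in each, an Euler rotation on every qubit, then a CNOT chain (assumed layout)."""
    for layer in thetas:  # layer[q] = (a, b, c), the Euler angles for qubit q
        for q, (a, b, c) in enumerate(layer):
            for gate in (rz(a), rx(b), rz(c)):  # applies Rz(a), then Rx(b), then Rz(c)
                state = apply_1q(state, gate, q, n)
        for q in range(n - 1):
            state = apply_cnot(state, q, q + 1, n)
    return state

# Usage: one input qubit plus one output qubit (n = 2), p = 3 random layers.
random.seed(0)
n_total, p = 2, 3
thetas = [[tuple(random.uniform(0, 2 * math.pi) for _ in range(3)) for _ in range(n_total)]
          for _ in range(p)]
psi_out = circuit(prepare_input([0.6, 0.8], m=1), thetas, n_total)
print(sum(abs(a) ** 2 for a in psi_out))  # the norm stays 1 up to rounding (unitarity)
```

The trainable parameters $\Theta$ correspond to the nested list `thetas`, with $3(n+m)$ angles per layer under these conventions.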
During the training process, the variational parameters $\Theta$ are updated iteratively so as to minimize a chosen loss function. The measurement statistics on the output qubits determine the predicted label for the input data encoded in the state $|\psi_{\text{in}}\rangle$. For example, in the case of two-category classification, we can use $y \in \{0, 1\}$ to label the two categories, and a single output qubit suffices. We estimate the probability of each class by measuring the expectation values of the projectors: $P(y = l) = \mathrm{Tr}(\rho_{\text{out}} |l\rangle\langle l|)$, where $l = 0, 1$. Here $\rho_{\text{out}} = \mathrm{Tr}_{\text{in}}(|\psi_{\text{out}}\rangle\langle\psi_{\text{out}}|)$ is the reduced density matrix of the output qubit, obtained by tracing out the $n$ input qubits. We assign the label $y = 0$ to the data sample $x$ if $P(y = 0)$ is larger than $P(y = 1)$, and say that $x$ is classified into category 0 with probability $P(y = 0)$. The generalization to multi-category classification is straightforward. One observation simplifies the numerical simulations: the diagonal elements of $\rho_{\text{out}}$, denoted as $g \equiv (g_1, \dots, g_{2^m}) = \mathrm{diag}(\rho_{\text{out}})$, directly give the probabilities of the corresponding categories.
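Because $\rho_{\text{out}}$ is obtained by tracing out the input register, each diagonal element is a marginal of a measurement on the output register: $g_k = \sum_j |\langle j, k|\psi_{\text{out}}\rangle|^2$. The sketch below computes $g$ directly from a state vector. It assumes the output qubits occupy the least-significant bits of the basis index, which is a layout choice rather than something fixed by the text. The function names are hypothetical.

```python
def output_probabilities(state, m):
    """g = diag(rho_out), rho_out = Tr_in |psi_out><psi_out|.

    Assumes the m output qubits are the least-significant bits of the basis index,
    so index = j * 2**m + k, with j the input-register value and k the output value.
    """
    dim_out = 1 << m
    g = [0.0] * dim_out
    for idx, amp in enumerate(state):
        g[idx % dim_out] += abs(amp) ** 2  # g_k = sum_j |<j, k|psi_out>|^2
    return g

def predict(state, m):
    """Assign the label with the largest probability (labels indexed 0 .. 2**m - 1 here)."""
    g = output_probabilities(state, m)
    return max(range(len(g)), key=g.__getitem__), g

label, g = predict([0.6, 0.0, 0.0, 0.8], m=1)  # one input qubit, one output qubit
print(label, g)
```

Only the $2^m$ diagonal entries are accumulated, and the full reduced density matrix is never formed. This is the simplification the paragraph above points to.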
In classical machine learning, a number of different loss functions have been introduced for training networks and characterizing their performance. Different loss functions have their own pros and cons and are best suited to different problems. For our purpose, we define the following loss function based on the cross-entropy for a single data sample encoded as $|\psi_{\text{in}}\rangle$:
$$L(h(|\psi_{\text{in}}\rangle; \Theta), a) = -\sum_{k=1}^{2^m} a_k \log g_k. \tag{2}$$
During the training process, a classical optimizer searches for the optimal parameters $\Theta^*$ that minimize the loss function averaged over the training data set: $L_N(\Theta) = \frac{1}{N}\sum_{i=1}^{N} L(h(|\psi_{\text{in}}^{(i)}\rangle; \Theta), a^{(i)})$. Various gradient descent algorithms, such as stochastic gradient descent [97] and quantum natural gradient descent [98,99], can be employed for this optimization. We train the quantum classifiers with Adam [100,101], an adaptive learning rate optimization algorithm designed specifically for training deep neural networks.
[Fig. 3: A sketch of adding adversarial perturbations to the input data for quantum classifiers. Throughout this paper, we mainly focus on evasion attacks [25], which are the most common type of attack in adversarial learning. In this setting, the attacker attempts to deceive the quantum classifiers by adjusting malicious samples during the testing phase. Adding a tiny amount of adversarial noise can cause quantum classifiers to make incorrect predictions.]
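The label encoding and loss can be sketched as follows. The sketch reads "loss function based on cross-entropy" as $L = -\sum_k a_k \log g_k$, with a one-hot label $a$ and measured output probabilities $g$. The small constant guarding against $\log 0$ and the function names are our own additions.

```python
import math

def one_hot(k, K, m):
    """Label k in 1..K -> vector a of length 2**m with a_k = 1, where 2**(m-1) < K <= 2**m."""
    if not (2 ** (m - 1) < K <= 2 ** m and 1 <= k <= K):
        raise ValueError("invalid label or number of output qubits")
    a = [0] * (2 ** m)
    a[k - 1] = 1
    return a

def cross_entropy(a, g, eps=1e-12):
    """L = -sum_k a_k log g_k for one sample; eps guards against log(0)."""
    return -sum(ak * math.log(gk + eps) for ak, gk in zip(a, g))

def averaged_loss(labels, probs):
    """L_N: the single-sample loss averaged over N (label, probability-vector) pairs."""
    return sum(cross_entropy(a, g) for a, g in zip(labels, probs)) / len(labels)

print(one_hot(3, 4, 2))                      # four classes need m = 2 output qubits
print(cross_entropy([0, 1], [0.25, 0.75]))   # -log(0.75)
```

With a one-hot label, only the probability assigned to the correct class enters the loss. Minimizing it therefore pushes that probability toward 1.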
A crucial quantity for minimizing $L_N(\Theta)$ is its gradient with respect to the model parameters. Owing to the special structure of our quantum classifiers, this quantity can be obtained directly from projective measurements through the following equality [59]:
$$\frac{\partial L_N(\Theta)}{\partial \vartheta} = \frac{1}{2}\left(\langle L_N(\Theta)\rangle_{\vartheta + \frac{\pi}{2}} - \langle L_N(\Theta)\rangle_{\vartheta - \frac{\pi}{2}}\right). \tag{3}$$
Here $\vartheta$ denotes an arbitrary single parameter in our circuit classifier. $\langle L_N(\Theta)\rangle_\xi$ ($\xi = \vartheta$, $\vartheta + \frac{\pi}{2}$, and $\vartheta - \frac{\pi}{2}$) represents the expectation value of $L_N(\Theta)$ with the corresponding parameter set to $\xi$. We note that the equality in Eq. (3) is exact. This is in sharp contrast to other models of quantum variational classifiers, where the gradients can in general only be approximated by finite-difference methods. It has been proved that an accurate gradient based on quantum measurements can lead to substantially faster convergence to the optimum than the finite-difference approach in many scenarios [102].
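The exactness of the shift rule, compared with finite differences, can be checked on the smallest possible example: one qubit prepared as $R_x(\vartheta)|0\rangle$, for which $\langle Z\rangle = \cos\vartheta$. The sketch applies the rule to the expectation value of an observable. It assumes the parameterized gate has the form $e^{-i\vartheta P/2}$ with $P$ a Pauli operator, which is the setting in which the $\pm\pi/2$ shift is exact. It is an illustration only, not the code used in the paper.

```python
import math

def expval_z(theta):
    """<Z> on the state Rx(theta)|0>, computed from the simulated amplitudes (equals cos(theta))."""
    amp0 = math.cos(theta / 2)
    amp1 = -1j * math.sin(theta / 2)
    return abs(amp0) ** 2 - abs(amp1) ** 2  # P(0) - P(1)

def parameter_shift(f, theta):
    """Exact derivative for gates exp(-i theta P / 2) with P a Pauli operator: shift by +-pi/2."""
    return 0.5 * (f(theta + math.pi / 2) - f(theta - math.pi / 2))

def central_difference(f, theta, h=1e-4):
    """Finite-difference approximation, with truncation error of order h**2."""
    return (f(theta + h) - f(theta - h)) / (2 * h)

for theta in (0.3, 1.2, -2.0):
    exact = -math.sin(theta)
    print(theta, parameter_shift(expval_z, theta) - exact,
          central_difference(expval_z, theta) - exact)
```

The shift-rule error is at the level of floating-point rounding. The finite-difference error is governed by the step size $h$, and on hardware it would be further amplified by shot noise divided by $2h$.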
We now give a general recipe for generating adversarial perturbations for quantum classifiers. As in classical adversarial learning, this task essentially reduces to another optimization problem. We search for a small perturbation within an appropriate region $\Delta$ that, when added to the input data, maximizes the loss function. A quantum classifier can classify both classical and quantum data. Yet, adding perturbations to classical data is equivalent to adding perturbations to the initial quantum state $|\psi_{\text{in}}\rangle$. Hence, it is sufficient to consider only perturbations to $|\psi_{\text{in}}\rangle$, regardless of whether the data to be classified is quantum or classical. A pictorial illustration of adding adversarial perturbations to the input data of a quantum classifier is shown in Fig. 3. In the case of untargeted attacks, we search for a perturbation operator $U_\delta$ acting on $|\psi_{\text{in}}\rangle$ that maximizes the loss function:
$$U_\delta \equiv \operatorname*{arg\,max}_{U_\delta \in \Delta} L(h(U_\delta|\psi_{\text{in}}\rangle; \Theta^*), a). \tag{4}$$
Here $\Theta^*$ denotes the fixed parameters determined during the training process, and $|\psi_{\text{in}}\rangle$ encodes the information of the data sample $x$ under attack. $a$ represents the correct label of $x$ in one-hot encoding. In the case of targeted attacks, on the other hand, we search for a perturbation $U_\delta$ that minimizes (rather than maximizes) the loss function with respect to a particular target label:
$$U_\delta \equiv \operatorname*{arg\,min}_{U_\delta \in \Delta} L(h(U_\delta|\psi_{\text{in}}\rangle; \Theta^*), a^{(t)}). \tag{5}$$
Here $a^{(t)}$ is the target label, which differs from the correct one ($a \ne a^{(t)}$). In general, $\Delta$ can be the set of all unitaries that are close to the identity operator. This corresponds to the additive attack in classical adversarial machine learning, where we modify each component of the data vector independently. In our simulations, we implement this type of attack using automatic differentiation [103], which computes derivatives to machine precision. In addition, for simplicity we can further restrict $\Delta$ to the set of products of local unitaries that are close to the identity operator. This corresponds to the functional adversarial attack [104] in classical machine learning. Clearly, the search space for the functional attack is much smaller than that for the additive attack, and one may regard the former as a special case of the latter.
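The following toy illustration shows an untargeted attack on the input state. In place of a trained circuit, it uses a hypothetical fixed classifier with $P(y=0) = |\langle w|\psi\rangle|^2$ for a fixed unit vector $w$ and real amplitudes. It takes a single gradient-ascent step of length $\epsilon$ on the amplitudes and then renormalizes. Since the adversarial state stays close to $|\psi_{\text{in}}\rangle$, the two are related by a unitary close to the identity, as in the additive attack. The analytic gradient here stands in for the automatic differentiation used in the paper, and all vectors are made up.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def prob_class0(psi, w):
    """Toy fixed classifier (hypothetical): P(y = 0) = |<w|psi>|^2 for a fixed unit vector w."""
    return sum(a * b for a, b in zip(w, psi)) ** 2

def grad_loss(psi, w):
    """Gradient of L = -log P(y = 0) with respect to the real input amplitudes psi."""
    overlap = sum(a * b for a, b in zip(w, psi))
    return [-2.0 * wi / overlap for wi in w]  # d(-log overlap**2) / d psi_i

def one_step_attack(psi, w, eps):
    """Move the amplitudes a distance eps along the loss gradient, then renormalize."""
    g = grad_loss(psi, w)
    gn = math.sqrt(sum(x * x for x in g))
    return normalize([p + eps * x / gn for p, x in zip(psi, g)])

def fidelity(a, b):
    return sum(x * y for x, y in zip(a, b)) ** 2

w = normalize([1.0, 1.0, 0.0, 0.0])        # hypothetical "class 0" direction
psi = normalize([0.55, 0.55, 0.45, 0.44])   # legitimate input with correct label y = 0
adv = one_step_attack(psi, w, eps=0.25)
print(prob_class0(psi, w), prob_class0(adv, w), fidelity(psi, adv))
```

With these made-up numbers, the predicted label flips from 0 to 1. The fidelity between the legitimate and adversarial states nonetheless stays above 0.96. A realistic attack would iterate such steps (as in BIM or PGD) against the trained circuit rather than this one-vector toy model.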
We numerically simulate the training and inference processes of the quantum classifiers on a classical cluster using the Julia language [105] and the Yao.jl [106] framework. We run the simulations in parallel on CPUs or GPUs, depending on the scenario. The parallel nature of the mini-batch gradient descent algorithm naturally fits the strengths of GPUs. For the more resource-consuming cases, we therefore use CuYao.jl [107], a highly efficient GPU implementation of Yao.jl [106], to gain speedups. We find that calculating mini-batch gradients on a single GPU is ten times faster than running in parallel on CPUs with forty cores. Automatic differentiation is implemented with Flux.jl [108] and Zygote.jl [109]. With this implementation, we can optimize over a large number of parameters for circuit depths as large as p = 50. In general, we find that increasing the circuit depth (model capacity) improves the achieved accuracy. We verify that the model does not overfit, since the losses on the training and validation data sets are close. Hence, there is no need to introduce regularization techniques such as Dropout [110] to avoid overfitting. Having introduced the general structure of our quantum classifiers and the methods to train them and to obtain adversarial perturbations, in the following subsections we will

B. Quantum adversarial learning of images
Quantum information processors possess unique properties, such as quantum parallelism and quantum superposition, that make them intriguing candidates for speeding up image recognition in machine learning. It has been shown that some quantum image processing algorithms may achieve exponential speedups over their classical counterparts [111,112]. Researchers have employed quantum classifiers on many different image data sets [45]. Here, we focus on the MNIST handwritten digit classification dataset [113], which is widely considered a real-life testbed for new machine learning paradigms. For this dataset, near-perfect results have been achieved with various classical supervised learning algorithms [114]. The MNIST dataset consists of hand-drawn digits from 0 through 9 in the form of gray-scale images. Each image is two-dimensional and contains 28 × 28 pixels. Each pixel has an integer value ranging from 0 to 255, with 0 meaning the darkest and 255 the whitest color. For our purpose, we reduce the size of the images from 28 × 28 pixels to 16 × 16 pixels, so that we can simulate the training and attacking processes of the quantum classifier with moderate classical computational resources. In addition, we normalize these pixel values and encode them into a pure quantum state using the amplitude encoding method described in Sec. III A.
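The preprocessing step can be sketched as below. The text does not say how the images were downsized. Bilinear interpolation with aligned corners is our assumption, as are the function names and the synthetic test image (no real MNIST data is used). Since $16 \times 16 = 256 = 2^8$, each image becomes the amplitude vector of an 8-qubit state.

```python
import math

def bilinear_resize(img, out_h, out_w):
    """Resize a 2D list of pixel values by bilinear interpolation (corner pixels aligned)."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for r in range(out_h):
        y = r * (in_h - 1) / (out_h - 1)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for c in range(out_w):
            x = c * (in_w - 1) / (out_w - 1)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out

def mnist_to_amplitudes(img28):
    """28x28 pixels in 0..255 -> 16x16 -> flattened, L2-normalized 256-vector (8 qubits)."""
    small = bilinear_resize(img28, 16, 16)
    flat = [p / 255.0 for row in small for p in row]
    norm = math.sqrt(sum(p * p for p in flat))
    if norm == 0:
        raise ValueError("an all-black image cannot be amplitude-encoded")
    return [p / norm for p in flat]

# Synthetic 28x28 "image" (a bright vertical bar), not real MNIST data.
img = [[255 if 12 <= c <= 15 else 0 for c in range(28)] for _ in range(28)]
amps = mnist_to_amplitudes(img)
print(len(amps), math.log2(len(amps)))
```

The division by 255 has no effect after L2 normalization. It is kept only to mirror the "normalize the pixel values" step described above.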
We first train the quantum classifiers to identify different images in MNIST with sufficient classification accuracy. The first case we consider is a two-category classification problem: we aim to classify images of the digits 1 and 9 with a quantum classifier whose structure is shown in Fig. 2. From the MNIST dataset, we select all images of 1 and 9 to form a sub-dataset. It contains a training set of size 11633 (used for training the quantum classifier), a validation set of size 1058 (used for tuning hyperparameters, such as the learning rate), and a testing set of size 2144 (used for evaluating the final performance of the quantum classifier). In Fig. 4, we plot the average accuracy and loss for the training and validation sets as a function of the number of epochs. The accuracy for both training and validation increases rapidly at the beginning of the training process and then saturates at a high value of ≈ 98%. Meanwhile, the average loss for both training and validation decreases as the number of epochs increases. The difference between the training and validation losses is very small, indicating that the model does not overfit. In addition, we test the quantum classifier on the testing set and find that it achieves a notable accuracy of 98% after around fifteen epochs. For two-category classification, the distinction between targeted and untargeted attacks blurs, because the target label can only be the alternative label. Hence, to illustrate the vulnerability of quantum classifiers under targeted attacks, we also need to consider multi-category classification. To this end, we train a quantum classifier to distinguish four categories of handwritten digits: 1, 3, 7, and 9. Our results are plotted in Fig. 5.
As in the two-category case, both the training and validation accuracies increase rapidly at the beginning of the training process and then saturate, here at ≈ 92%, which is lower than for the two-category case. After training, the classifier predicts the corresponding digits for the testing dataset with an accuracy of 91.6%. One can further increase the accuracy for both the two- and four-category classifications by using the original 28 × 28-pixel images in MNIST or a quantum classifier with more layers, but this demands more computational resources.
After training, we now fix the parameters of the corresponding quantum classifiers and generate adversarial examples for them.

FIG. 6. The clean and the corresponding adversarial images for the quantum classifier generated by the basic iterative method (see Appendix). Here, we apply the additive attack in the white-box untargeted setting. For the legitimate clean images, the quantum classifier correctly predicts their labels with confidence larger than 78%. After the attacks, the classifier misclassifies the crafted images of digit 1 (9) as digit 9 (1) with notably high confidence, although the differences between the crafted and clean images are almost imperceptible to human eyes. In fact, the average fidelity is 0.916, which is very close to unity.

White-box attack: untargeted
In the white-box setting, the attacker has full information about the quantum classifiers and the learning algorithms. In particular, the attacker knows the loss function that has been used and can therefore calculate its gradients with respect to the parameters that characterize the perturbations. As a consequence, we can use gradient-based methods developed in the classical adversarial machine learning literature to generate adversarial examples, such as FGSM [32], BIM [27], PGD [32], and MIM [35]. In untargeted attacks, the attacker only attempts to make the classifier predict incorrectly, without aiming at any particular class. In classical adversarial learning, a well-known example in the white-box untargeted scenario concerns facial biometric systems [115]. By wearing a pair of carefully crafted eyeglasses, the attacker can have her face misidentified by a state-of-the-art face-recognition system as any other arbitrary face (dodging attacks). Here, we show that quantum classifiers are vulnerable to such attacks as well.
For the simplest illustration, we first attack the two-category quantum classifier discussed above additively in the white-box untargeted setting. In Fig. 6, we randomly choose samples of digits 1 and 9 from MNIST and then solve Eq. (4) iteratively by the BIM method to obtain their corresponding adversarial examples. The figure shows the original clean images and their adversarial counterparts for the two-category quantum classifier. For these particular clean images, the quantum classifier correctly assigns their labels with confidence larger than 78%. Yet, after the attacks the same classifier misclassifies the crafted images of digit 1 (9) as digit 9 (1) with a fairly high confidence of 73%. Strikingly, the adversarial examples look the same as the original legitimate samples: they differ only by a tiny amount of noise that is almost imperceptible to human eyes. We also check that this vulnerability is not specific to particular images but generic for most (if not all) images in the dataset. To this end, we apply the same attack to all images of digits 1 and 9 in the MNIST testing set. In Fig. 7(a), we plot the accuracy as a function of the number of BIM iterations. The accuracy decreases rapidly at the beginning of the attack, indicating that more and more adjusted images are misclassified. After five BIM iterations, the accuracy drops to zero, and all adjusted images become adversarial examples misclassified by the quantum classifier. In addition, we need to characterize how close a clean legitimate image is to its adversarial counterpart in the quantum framework. We therefore define the fidelity between the quantum states that encode them, $F = |\langle \psi_{\rm adv} | \psi_{\rm leg} \rangle|^2$, where $|\psi_{\rm adv}\rangle$ and $|\psi_{\rm leg}\rangle$ denote the states that encode the adversarial and legitimate samples, respectively. In Fig. 7(b), we compute the average fidelity at each BIM iteration and plot the accuracy as a function of the average fidelity. The fidelity measures the overlap between the legitimate and adversarial states, so the accuracy is expected to decrease as the average fidelity decreases, as Fig. 7(b) confirms. More interestingly, even when the accuracy drops to zero, namely when all the adjusted images are misclassified, the average fidelity is still larger than 0.73. This is a fairly high average fidelity, given that the Hilbert space dimension of the quantum classifier is already very large.
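The additive attacks above can be illustrated with a deliberately simple stand-in for the trained quantum classifier. In this hypothetical toy model, which is our assumption and not the circuit of Fig. 2, the input vector is amplitude-encoded. Label 0 is then read out as the probability of finding the most significant qubit in |0⟩. FGSM takes one step of size ε along the sign of the loss gradient. BIM takes several smaller steps of size α and clips the result back into an L∞ ball of radius ε around the legitimate sample. Gradients are taken here by central finite differences, standing in for the automatic differentiation (Flux.jl/Zygote.jl) mentioned above. The fidelity F = |⟨ψ_adv|ψ_leg⟩|² is computed on the normalized states. All names and numbers are illustrative.

```python
import math


def normalize(x):
    n = math.sqrt(sum(v * v for v in x))
    return [v / n for v in x]


def prob_label0(x):
    """Hypothetical toy classifier: amplitude-encode x and return the probability
    that the most significant qubit is found in |0> (weight of the first half)."""
    s = normalize(x)
    return sum(a * a for a in s[: len(s) // 2])


def loss(x, label):
    """Cross-entropy loss -log P(label | x) for the two-label toy classifier."""
    p0 = prob_label0(x)
    return -math.log(p0 if label == 0 else 1.0 - p0)


def grad(f, x, h=1e-6):
    """Central finite differences, standing in for automatic differentiation."""
    g = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g


def sign(v, tol=1e-9):
    return 0 if abs(v) < tol else (1 if v > 0 else -1)


def fgsm(x, label, eps):
    """One step of size eps that increases the loss of the true label."""
    g = grad(lambda z: loss(z, label), x)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


def bim(x, label, eps, alpha, steps):
    """Iterated steps of size alpha, clipped to an L-infinity ball of radius eps."""
    adv = list(x)
    for _ in range(steps):
        g = grad(lambda z: loss(z, label), adv)
        adv = [a + alpha * sign(gi) for a, gi in zip(adv, g)]
        adv = [min(max(a, xi - eps), xi + eps) for a, xi in zip(adv, x)]
    return adv


def fidelity(x, y):
    """F = |<psi_adv|psi_leg>|^2 for the (real) amplitude-encoded states."""
    a, b = normalize(x), normalize(y)
    return sum(u * v for u, v in zip(a, b)) ** 2


x_leg = [0.6, 0.5, 0.3, 0.2, 0.3, 0.25, 0.2, 0.15]  # hypothetical 3-qubit sample, label 0
x_bim = bim(x_leg, 0, eps=0.15, alpha=0.05, steps=5)
x_fgsm = fgsm(x_leg, 0, eps=0.15)
print(prob_label0(x_leg), prob_label0(x_bim), fidelity(x_leg, x_bim))
```

For this toy input, the BIM run pushes the label-0 probability from about 0.77 to below 1/2 while the fidelity stays above 0.7. This mirrors, in miniature, the high-fidelity misclassification described above.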
In the above discussion, we have used Eq. (4), which is suitable for untargeted attacks, to generate adversarial examples. However, the problem we considered is a two-category classification problem, for which the distinction between targeted and untargeted attacks is ambiguous. A less ambiguous test is to carry out untargeted attacks on the four-category quantum classifier. We have done so, and the results, plotted in Fig. 7(c-d), are similar to the corresponding results for the two-category scenario. We can also use different optimization methods for white-box untargeted attacks on the quantum classifiers. In Table I, we summarize the performance of two different methods (BIM and FGSM) in attacking both the two-category and four-category quantum classifiers. Both methods perform noticeably well. So far, we have obtained adversarial examples for the quantum classifiers by additive attacks, where each component of the data vector is modified independently. In real experiments, realizing such adversarial examples with quantum devices might be challenging, because this requires implementing complicated global unitaries with very high precision. A more practical approach is to consider functional attacks, where the adversarial perturbation operators are implemented with a layer of local unitary transformations. In this case, the search space is much smaller than for additive attacks, so we may not be able to find the most efficient adversarial perturbations. Yet, once we find such perturbations, they could be much easier to realize in the quantum laboratory. To study functional attacks, our numerical simulations add a layer of local unitary transformations before the quantum states are sent to the classifiers. We restrict these local unitaries to be close to the identity so as to keep the perturbations reasonably small. We apply both the BIM and FGSM methods to solve Eq. (4) in the white-box untargeted setting. Part of our results for functional attacks are plotted in Fig. 8. Both the BIM and FGSM methods perform somewhat worse here than for additive attacks. For instance, for functional attacks there is still a residual accuracy of about 14% after six BIM iterations [see Fig. 8(a)], even though the average fidelity has already decreased to 0.2 [see Fig. 8(c)]. This is in sharp contrast to additive attacks, where five BIM iterations suffice to reduce the accuracy to zero [see Fig. 7(a)] while keeping the average fidelity larger than 0.73 [see Fig. 7(b)]. The weaker performance of both methods is consistent with the much smaller search space of functional attacks compared with additive attacks.
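A functional attack can be illustrated in the same toy setting. The additive perturbation is replaced by one layer of single-qubit rotations R_y(θ_q) = exp(−iθ_q σ^y/2), one angle per qubit. BIM updates the angles and clips them to |θ_q| ≤ ε, so that the layer stays close to the identity. The choice of R_y rotations, the toy read-out (weight of the first half of the state vector), and all function names are assumptions for illustration only. The text does not specify which local unitaries were used. Because the layer is unitary, the perturbed state needs no renormalization.

```python
import math


def apply_ry(state, qubit, theta):
    """Apply R_y(theta) = exp(-i theta sigma^y / 2) to one qubit of a real state
    vector; qubit q is bit q of the basis-state index."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    bit = 1 << qubit
    out = list(state)
    for i in range(len(state)):
        if not i & bit:
            a, b = state[i], state[i | bit]
            out[i] = c * a - s * b
            out[i | bit] = s * a + c * b
    return out


def local_layer(state, thetas):
    """Functional perturbation: one R_y rotation per qubit (hypothetical choice)."""
    for q, th in enumerate(thetas):
        state = apply_ry(state, q, th)
    return state


def prob_label0(state):
    """Toy read-out: probability that the most significant qubit is in |0>."""
    return sum(a * a for a in state[: len(state) // 2])


def functional_bim(psi, eps, alpha, steps, h=1e-6, tol=1e-9):
    """BIM on the rotation angles against label 0, keeping |theta_q| <= eps."""
    n = len(psi).bit_length() - 1
    thetas = [0.0] * n

    def loss(th):
        return -math.log(prob_label0(local_layer(psi, th)))

    for _ in range(steps):
        grads = []
        for q in range(n):
            tp, tm = list(thetas), list(thetas)
            tp[q] += h
            tm[q] -= h
            grads.append((loss(tp) - loss(tm)) / (2 * h))
        moves = [0.0 if abs(g) < tol else math.copysign(alpha, g) for g in grads]
        thetas = [min(max(t + d, -eps), eps) for t, d in zip(thetas, moves)]
    return thetas


raw = [0.6, 0.5, 0.3, 0.2, 0.3, 0.25, 0.2, 0.15]  # hypothetical sample, label 0
norm = math.sqrt(sum(v * v for v in raw))
psi = [v / norm for v in raw]
thetas = functional_bim(psi, eps=0.6, alpha=0.1, steps=8)
psi_adv = local_layer(psi, thetas)
fid = sum(a * b for a, b in zip(psi, psi_adv)) ** 2
print(thetas, prob_label0(psi), prob_label0(psi_adv), fid)
```

For this particular read-out, only the rotation on the most significant qubit changes the label probability. The attack therefore effectively has a single free angle, a small illustration of the reduced search space discussed above.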

White-box attack: targeted
In targeted attacks, unlike untargeted ones, the attacker attempts to mislead the classifier into assigning a data sample to a specific targeted category. Face recognition again provides a good example of why targeted attacks matter. An attacker may try to disguise her face inconspicuously so that she is recognized as an authorized user of a laptop or phone that authenticates users through face recognition. In classical adversarial learning, this type of attack is called an impersonation attack. Surprisingly, Ref. [115] showed that physically realizable and inconspicuous impersonation attacks can be carried out by wearing a pair of carefully crafted glasses designed to deceive state-of-the-art face-recognition systems. In this subsection, we show that quantum classifiers are likewise vulnerable to targeted attacks in the white-box setting.
We consider attacking the four-category quantum classifier. In Fig. 9, we randomly choose samples of digits 1, 3, 7, and 9 from MNIST and then solve Eq. (5) iteratively by the BIM method to obtain their corresponding adversarial examples. The figure shows the original legitimate images and their targeted adversarial counterparts for the four-category quantum classifier. For these legitimate samples, the quantum classifier assigns the labels correctly with high confidence. After the targeted attacks, however, the same classifier is misled into classifying the crafted images of digits {7, 1, 3, 9} as the targeted digits {9, 3, 7, 7} with fairly high confidence. This happens even though the differences between the crafted and legitimate images are almost imperceptible. To further illustrate how this works, in Figs. 10(a-d) we plot the classification probability of each digit and the loss functions with respect to particular digits as a function of the number of BIM iterations. Here, we randomly choose an image of a given digit and then perform either additive [Figs. 10(a-b)] or functional [Figs. 10(c-d)] targeted attacks with the BIM method. For instance, in Fig. 10(a) the chosen image is of digit 1 and the targeted label is digit 3. At the beginning, the quantum classifier correctly identifies this image as digit 1 with probability P(y = 1) ≈ 0.41. As the number of BIM iterations increases, P(y = 1) decreases and P(y = 3) increases. After about six iterations, P(y = 3) exceeds P(y = 1), indicating that the classifier begins to be deceived into predicting the image as digit 3. Fig. 10(b) shows the loss as a function of the number of iterations. As the iteration number increases, the loss for classifying the image as digit 1 (3) increases (decreases), consistent with the behavior of the classification probabilities in Fig. 10(a). More surprisingly, we can in fact fool the quantum classifier into identifying any image as a given targeted digit. Figs. 10(e-f) and Table II show this clearly: there, we perform additive attacks on all images of digits {1, 3, 7, 9} with different targeted labels and different attacking methods. In Figs. 10(e-f), we plot the accuracy versus the average fidelity. For a given targeted label l (l = 1, 3, 7, or 9), we perform additive attacks on all images whose original labels differ from l. We then compute the accuracy and the average fidelity over these images. From these figures, the accuracy can indeed decrease to zero even while the average fidelity remains larger than 0.85, indicating that the quantum classifier incorrectly classifies all the images as digit l. In Table II, we summarize the performance of the BIM and FGSM methods in attacking the four-category quantum classifier in the white-box targeted setting.

TABLE II. The accuracy α_adv (in %) and average fidelity F for the four-category quantum classifier with depth p = 10 on the test dataset when attacked by different methods for different targeted labels. Here, we consider additive attacks with both the BIM and FGSM methods. For the BIM method, we generate adversarial examples using three iterations with a step size of 0.05. For the FGSM method, we use a single step with a step size of 0.03.
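For the targeted variant, the BIM update descends the loss of the chosen target label instead of ascending the loss of the true label. The sketch below uses a hypothetical four-class stand-in, which is our assumption. The amplitude-encoded state is read out by measuring its two most significant qubits, so outcome k collects the weight of the k-th quarter of the vector. The sample, ε, α, and iteration count are illustrative values, not those used for Table II.

```python
import math


def class_probs(x):
    """Hypothetical 4-class toy read-out: amplitude-encode x and measure the two
    most significant qubits; outcome k collects the k-th quarter of the vector."""
    norm2 = sum(v * v for v in x)
    q = len(x) // 4
    return [sum(v * v for v in x[k * q:(k + 1) * q]) / norm2 for k in range(4)]


def targeted_loss(x, target):
    return -math.log(class_probs(x)[target])


def grad(f, x, h=1e-6):
    """Central finite differences, standing in for automatic differentiation."""
    g = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g


def sign(v, tol=1e-9):
    return 0 if abs(v) < tol else (1 if v > 0 else -1)


def targeted_bim(x, target, eps, alpha, steps):
    """BIM that descends the loss of the target label, clipped to |x' - x| <= eps."""
    adv = list(x)
    for _ in range(steps):
        g = grad(lambda z: targeted_loss(z, target), adv)
        adv = [a - alpha * sign(gi) for a, gi in zip(adv, g)]
        adv = [min(max(a, xi - eps), xi + eps) for a, xi in zip(adv, x)]
    return adv


def predict(x):
    p = class_probs(x)
    return max(range(4), key=lambda k: p[k])


x_leg = [0.7, 0.5, 0.3, 0.2, 0.2, 0.2, 0.1, 0.1]  # hypothetical sample, predicted class 0
x_adv = targeted_bim(x_leg, target=1, eps=0.25, alpha=0.05, steps=10)
print(predict(x_leg), predict(x_adv), class_probs(x_adv))
```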

Black-box attack: transferability
Unlike white-box attacks, black-box attacks assume limited or even no information about the internal structures of the classifiers and the learning algorithms. In classical adversarial learning, two basic premises that make black-box attacks possible have been actively studied [73]: the transferability of adversarial examples and probing the behavior of the classifier. Adversarial sample transferability is the property that an adversarial example produced to deceive one specific learning model can also deceive another model. This holds even if the two architectures differ greatly or the models are trained on different sets of training data [21,22,31]. In probing, the other premise, the attacker uses the victim model as an oracle to label a synthetic training set, which is then used to train a substitute model. The attacker thus need not even collect a training set to mount the attack. Here, we study the transferability of adversarial examples in a more exotic setting. We first generate adversarial examples for different classical classifiers and then investigate whether they transfer to the quantum classifiers. This would have important future applications in situations where the attacker only has access to classical resources.
Our results are summarized in Table III. To obtain these results, we first train two classical classifiers with training data from the original MNIST dataset: one based on a convolutional neural network (CNN) and the other on a feedforward neural network (see Appendix V for details). We then use three different methods (BIM, FGSM, and MIM) to produce adversarial examples in a white-box untargeted setting for each classical classifier separately. Finally, we evaluate the trained quantum classifier on these adversarial examples. Table III shows that the quantum classifier performs much worse on the adversarial examples than on the original legitimate samples. For instance, on the adversarial examples generated for the CNN classifier by the MIM method, the accuracy of the quantum classifier is only 62.3%. This is 29.7 percentage points lower than on the clean legitimate samples. Roughly speaking, 29.7% of the adversarial examples originally produced for attacking the CNN classifier therefore transfer to the quantum classifier. This transferability ratio may not be as large as that between two classical classifiers. Yet, the quantum classifier is structurally completely different from the classical ones, so it is somewhat surprising that such a high transferability ratio can be achieved. We expect that the transferability ratio might increase significantly if another quantum classifier is used as the surrogate classifier. We leave this interesting problem for future studies.
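The accuracy drop quoted above is a proxy for transferability: 62.3% against an implied clean accuracy of 62.3% + 29.7% = 92.0%. The small sketch below contrasts it with a per-sample count, using hypothetical toy predictions. The per-sample count records examples that the target classifier gets right when clean but wrong after the transferred perturbation. The two measures agree only when no sample that was misclassified when clean becomes correctly classified after the perturbation. The function names and data are illustrative assumptions.

```python
def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def transfer_rate(clean_preds, adv_preds, labels):
    """Fraction of samples classified correctly when clean but incorrectly
    after the transferred adversarial perturbation."""
    flipped = sum(c == y and a != y for c, a, y in zip(clean_preds, adv_preds, labels))
    return flipped / len(labels)


# Hypothetical predictions of the target classifier on five samples.
labels = [1, 1, 9, 9, 1]
clean_preds = [1, 1, 9, 1, 1]  # one clean error (sample 3)
adv_preds = [9, 1, 1, 9, 1]  # samples 0 and 2 flip; sample 3 is fixed by chance

drop = accuracy(clean_preds, labels) - accuracy(adv_preds, labels)
print(round(drop, 3), transfer_rate(clean_preds, adv_preds, labels))  # 0.2 0.4
```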

Adversarial perturbations are not random noises
The above discussions explicitly demonstrate the vulnerability of quantum classifiers to adversarial perturbations. The existence of adversarial examples is likewise a general property of quantum learning systems with high-dimensional Hilbert spaces: for almost every image of a handwritten digit in MNIST, there exists at least one corresponding adversarial example. Yet, it is worth clarifying that adversarial perturbations are not random noise. They are carefully engineered to mislead the quantum classifiers and in fact occupy only a tiny subspace of the total Hilbert space. To demonstrate this explicitly, we compare how random noise and adversarial perturbations affect the accuracy of the two- and four-category quantum classifiers. For simplicity and concreteness, we consider uncorrelated depolarizing noise, which occurs in a number of experimental platforms for quantum computing, such as Rydberg atoms, superconducting qubits, and trapped ions [116–119]:
$$\mathcal{E}(\rho) = (1-\beta)\,\rho + \frac{\beta}{3}\left(\sigma^x \rho\, \sigma^x + \sigma^y \rho\, \sigma^y + \sigma^z \rho\, \sigma^z\right), \tag{6}$$
where ρ denotes the density matrix of a qubit, σ^{x,y,z} are the usual Pauli matrices, and β ∈ [0, 1] characterizes the strength of the noise. In Fig. 11, we plot the classification accuracy of the quantum classifiers against two quantities. The first is the noise strength β. The second is the average fidelity between the original state and the state after a single layer of depolarizing noise, Eq. (6), acts on each qubit. The accuracy of both the two- and four-category quantum classifiers decreases roughly linearly as β increases and the average fidelity decreases. This is in sharp contrast to adversarial perturbations [compare Figs. 10(e-f), 8(c), and 7(b)(d)], where the accuracy drops dramatically as soon as the average fidelity begins to decrease from unity. This contrast indicates that adversarial perturbations are not random noise. Because the accuracy decreases only linearly with the average fidelity, this result also implies that quantum classifiers are rather robust to random noise. One may also consider bit-flip or phase-flip noise and observe similar results. Our numerical simulations of the defense strategy by data augmentation also reflect the difference between adversarial perturbations and random noise. The quantum classifier performs noticeably better when the training set is augmented with adversarial examples than with noisy samples.
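Equation (6) can be checked numerically on a single qubit. The sketch below applies the depolarizing channel to a pure state ρ = |ψ⟩⟨ψ| with plain Python complex arithmetic and evaluates the fidelity ⟨ψ|$\mathcal{E}$(ρ)|ψ⟩. For a unit-trace ρ, σ^xρσ^x + σ^yρσ^y + σ^zρσ^z = 2I − ρ, so the channel reduces to $\mathcal{E}$(ρ) = (1 − 4β/3)ρ + (2β/3)I. The single-qubit fidelity is therefore 1 − 2β/3 for every pure state, i.e., linear in β. For a product state with independent noise on each of n qubits, the fidelity is (1 − 2β/3)^n. No such closed form is assumed here for the entangled states that appear inside the classifier. The helper names are illustrative.

```python
import cmath
import math

X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def depolarize(rho, beta):
    """Single-qubit depolarizing channel of Eq. (6); the Pauli matrices are
    Hermitian, so sigma rho sigma^dagger = sigma rho sigma."""
    out = [[(1 - beta) * rho[i][j] for j in range(2)] for i in range(2)]
    for s in (X, Y, Z):
        srs = matmul(matmul(s, rho), s)
        for i in range(2):
            for j in range(2):
                out[i][j] += beta / 3 * srs[i][j]
    return out


def pure_state(theta, phi):
    """Bloch-sphere parametrization cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>."""
    return [math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2)]


def density(psi):
    return [[psi[i] * psi[j].conjugate() for j in range(2)] for i in range(2)]


def fidelity(psi, rho):
    """<psi| rho |psi> for a pure reference state psi."""
    return sum(psi[i].conjugate() * rho[i][j] * psi[j] for i in range(2) for j in range(2)).real


psi = pure_state(1.1, 0.4)
for beta in (0.0, 0.1, 0.3):
    f = fidelity(psi, depolarize(density(psi), beta))
    print(beta, round(f, 6), round(1 - 2 * beta / 3, 6))
```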

C. Quantum adversarial learning topological phases of matter
Classifying different phases and the transitions between them is one of the central problems in condensed matter physics. Recently, various machine learning tools and techniques have been adopted to tackle this intricate problem. In particular, a number of supervised and unsupervised learning methods have been introduced to classify phases of matter and identify phase transitions [8,10,120–130], giving rise to the emergent research frontier of machine learning phases of matter. Following these theoretical approaches, proof-of-principle experiments have been carried out on different platforms [131–134] to demonstrate their feasibility and unparalleled potential. These platforms include doped CuO₂ [134], electron spins in diamond nitrogen-vacancy centers [131], and cold atoms in optical lattices [132,133]. In addition, a recent work has pointed out the vulnerability of these machine learning approaches to adversarial perturbations [135]. It showed that typical phase classifiers based on classical deep neural networks are extremely vulnerable to adversarial attacks. Adding a tiny amount of carefully crafted noise, or even changing a single pixel of the legitimate sample, may cause the classifier to make erroneous predictions with a surprisingly high confidence level.
Despite this exciting progress in machine learning phases of matter, most previous approaches are based on classical classifiers. Using quantum classifiers to classify different phases and transitions remains barely explored. In this section, we study how quantum classifiers can classify different phases of matter. We focus on topological phases, which are widely believed to be more challenging for machine-learning approaches than conventional symmetry-breaking phases such as the paramagnetic/ferromagnetic phases [120,128,129,136]. We show, through a concrete example, that quantum classifiers are likewise vulnerable to adversarial perturbations. We consider the following 2D square-lattice model for the quantum anomalous Hall (QAH) effect, where a combination of spontaneous magnetization and spin-orbit coupling leads to quantized Hall conductivity in the absence of an external magnetic field:
$$\begin{aligned} H_{\rm QAH} =\; & J_{\rm SO}^{(x)} \sum_{\mathbf r} \big[(c^\dagger_{\mathbf r\uparrow} c_{\mathbf r-\hat x\downarrow} - c^\dagger_{\mathbf r\uparrow} c_{\mathbf r+\hat x\downarrow}) + {\rm H.c.}\big] + iJ_{\rm SO}^{(y)} \sum_{\mathbf r} \big[(c^\dagger_{\mathbf r\uparrow} c_{\mathbf r-\hat y\downarrow} - c^\dagger_{\mathbf r\uparrow} c_{\mathbf r+\hat y\downarrow}) + {\rm H.c.}\big] \\ & - t \sum_{\langle \mathbf r,\mathbf s\rangle} (c^\dagger_{\mathbf r\uparrow} c_{\mathbf s\uparrow} - c^\dagger_{\mathbf r\downarrow} c_{\mathbf s\downarrow}) + \mu \sum_{\mathbf r} (c^\dagger_{\mathbf r\uparrow} c_{\mathbf r\uparrow} - c^\dagger_{\mathbf r\downarrow} c_{\mathbf r\downarrow}). \end{aligned} \tag{7}$$
Here $c^\dagger_{\mathbf r\sigma}$ ($c_{\mathbf r\sigma}$) is the fermionic creation (annihilation) operator with pseudospin σ = (↑, ↓) at site r, and x̂, ŷ are unit lattice vectors along the x and y directions. The first two terms describe the spin-orbit coupling, with $J_{\rm SO}^{(x)}$ and $J_{\rm SO}^{(y)}$ denoting its strength along the x and y directions, respectively. The third and fourth terms denote the spin-conserved nearest-neighbor hopping and the on-site Zeeman interaction, respectively. In momentum space, this Hamiltonian has two Bloch bands, and its topological structure can be characterized by the first Chern number
$$C_1 = \frac{1}{2\pi} \int_{\rm BZ} dk_x\, dk_y\, F_{xy}(\mathbf k), \tag{8}$$
where $F_{xy} = \partial_{k_x} A_y - \partial_{k_y} A_x$ is the Berry curvature. It is built from the Berry connection $A_\nu = \langle \phi(\mathbf k) | i\partial_{k_\nu} | \phi(\mathbf k)\rangle$ (ν = x, y), where φ(k) is the Bloch wavefunction of the lower band. The integration runs over the whole first Brillouin zone (BZ). It is straightforward to show that $C_1 = -{\rm sign}(\mu)$ when 0 < |μ| < 4t and $C_1 = 0$ otherwise. The above Hamiltonian can be implemented with synthetic spin-orbit couplings in cold-atom experiments [137], and the topological index $C_1$ can be obtained from standard time-of-flight images [138,139]. Indeed, the Haldane model, whose physics and Hamiltonian structure resemble those of Eq. (7), has been realized experimentally with ultracold fermionic atoms in a periodically modulated optical honeycomb lattice [140]. For our purpose, we first train a two-category quantum classifier to assign the labels $C_1 = 0$ or $C_1 = 1$ to the time-of-flight images. To obtain the training data, we diagonalize the Hamiltonian in Eq. (7) with open boundary conditions. We then calculate the atomic density distributions of the lower band in different spin bases. These density distributions can be measured directly with time-of-flight imaging in cold-atom experiments and serve as our input data. We vary the spin-orbit coupling strength $J_{\rm SO}$ and the hopping t in both the topological and topologically trivial regions to generate several thousand data samples. As for the images of handwritten digits above, we use amplitude encoding to convert the density-distribution data into input quantum states for the quantum classifier. In Fig. 12(a), we plot the average accuracy and loss as a function of the number of epochs. After training, the quantum classifier identifies the time-of-flight images with reasonably high accuracy. Yet, this accuracy is somewhat lower than for the paramagnetic/ferromagnetic phases discussed in the next subsection. This is consistent with the general belief that topological phases are harder to learn.
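The Chern number in Eq. (8) can be evaluated numerically with the lattice method of Fukui, Hatsugai, and Suzuki. This method sums gauge-invariant Berry phases around the plaquettes of a discretized Brillouin zone. The sketch below assumes a two-band Bloch Hamiltonian h(k) = d(k)·σ with d(k) = (2J_SO sin k_x, 2J_SO sin k_y, μ − 2t(cos k_x + cos k_y)) and $J_{\rm SO}^{(x)} = J_{\rm SO}^{(y)} = J_{\rm SO}$. We take this form as an illustrative assumption. Sign conventions differ between references, so only two features should be compared with the text: |C_1| = 1 for 0 < |μ| < 4t and C_1 = 0 otherwise, and the sign flip under μ → −μ. Function names and the grid size are illustrative.

```python
import cmath
import math


def d_vector(kx, ky, mu, t, j_so=1.0):
    """Assumed Bloch vector of h(k) = d(k) . sigma for the model of Eq. (7)."""
    return (2 * j_so * math.sin(kx),
            2 * j_so * math.sin(ky),
            mu - 2 * t * (math.cos(kx) + math.cos(ky)))


def lower_band(d):
    """Normalized eigenvector of d . sigma with eigenvalue -|d| (any gauge)."""
    dx, dy, dz = d
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    if dz >= 0:  # pick the gauge with the larger norm, 2r(r + dz) vs 2r(r - dz)
        u = (complex(dx, -dy), complex(-(r + dz), 0.0))
    else:
        u = (complex(-(r - dz), 0.0), complex(dx, dy))
    n = math.sqrt(abs(u[0]) ** 2 + abs(u[1]) ** 2)
    return (u[0] / n, u[1] / n)


def link(a, b):
    """U(1) link variable <a|b> / |<a|b>|."""
    z = a[0].conjugate() * b[0] + a[1].conjugate() * b[1]
    return z / abs(z)


def chern_number(mu, t, n=24):
    """Fukui-Hatsugai-Suzuki lattice Chern number of the lower band."""
    ks = [2 * math.pi * i / n for i in range(n)]
    u = [[lower_band(d_vector(kx, ky, mu, t)) for ky in ks] for kx in ks]
    total = 0.0
    for i in range(n):
        for j in range(n):
            ip, jp = (i + 1) % n, (j + 1) % n
            loop = (link(u[i][j], u[ip][j]) * link(u[ip][j], u[ip][jp])
                    / (link(u[i][jp], u[ip][jp]) * link(u[i][j], u[i][jp])))
            total += cmath.phase(loop)
    return round(total / (2 * math.pi))


for mu in (-5.0, -1.0, 1.0, 5.0):
    print(mu, chern_number(mu, t=1.0))
```

The lattice sum is gauge invariant, so the per-k choice between the two eigenvector gauges in `lower_band` does not affect the result.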
Unlike conventional phases or images of handwritten digits, topological phases are characterized by nonlocal topological invariants (such as the first Chern number) rather than local order parameters. Topological invariants capture only global properties of the system and are insensitive to local perturbations. One might therefore intuitively expect adversarial examples to be harder to obtain in this case. Yet, here we show that adversarial examples do exist and that the quantum classifier is indeed vulnerable in learning topological phases. To obtain adversarial examples, we attack the quantum classifier additively in the white-box untargeted setting. Part of our results are plotted in Fig. 12(b). The accuracy of the quantum classifier on time-of-flight images decreases rapidly as the number of attacking iterations increases. After about six iterations it falls below 0.4, indicating that more than 60% of the attacked images in the test set are misclassified. To illustrate this more concretely, in Fig. 13 we randomly choose a time-of-flight image and then solve Eq. (4) iteratively by the BIM method to obtain its adversarial example. Again, the adversarial example looks the same as the clean legitimate time-of-flight image: they differ only by a tiny perturbation that is imperceptible to human eyes. In addition, Table IV summarizes the performance of two different methods (BIM and FGSM) in attacking the quantum classifier. Both methods perform noticeably well.

D. Adversarial learning quantum data
In the above discussion, we used quantum classifiers to classify classical data (images) and studied their vulnerability to adversarial perturbations. This may have important applications in solving practical machine learning problems in daily life. In such a scenario, however, a prerequisite is to first convert classical data into quantum states. This may require costly processes or techniques, such as quantum random access memories [89], which may nullify the potential quantum speedups [90]. Classical classifiers can only take classical data as input, whereas quantum classifiers can also directly classify quantum states produced by quantum devices. Indeed, it has been shown that certain quantum algorithms could offer an exponential speedup over their classical counterparts in classifying quantum data directly. Examples include quantum principal component analysis [141] and the quantum support vector machine [65]. In this subsection, we consider the vulnerability of quantum classifiers in classifying quantum states. For simplicity and concreteness, we consider the following 1D transverse-field Ising model:
$$H_{\rm Ising} = -\sum_{i} \sigma^z_i \sigma^z_{i+1} - J_x \sum_i \sigma^x_i, \tag{9}$$
where $\sigma^z_i$ and $\sigma^x_i$ are the usual Pauli matrices acting on the ith spin and $J_x$ is a positive parameter describing the strength of the transverse field. This model maps to free fermions through a Jordan-Wigner transformation and is exactly solvable. At zero temperature, it features a well-understood quantum phase transition at $J_x = 1$, between a paramagnetic phase for $J_x > 1$ and a ferromagnetic phase for $J_x < 1$. It is an exemplary toy model for studying quantum phase transitions and an excellent testbed for new methods and techniques. Here, we use a quantum classifier with the structure shown in Fig. 2 to classify the ground states of $H_{\rm Ising}$ with $J_x$ varying from 0 to 2. We show that this approach is also extremely vulnerable to adversarial perturbations.
To generate the data sets for training, validation, and testing, we sample a series of Hamiltonians with $J_x$ varying from 0 to 2 and calculate their ground states, which serve as input data for the quantum classifier. We train the quantum classifier on the generated training dataset; the training results are shown in Fig. 14. Strikingly, our quantum classifier is very efficient at classifying these ground states of $H_{\rm Ising}$ into paramagnetic and ferromagnetic phases. A circuit with depth p = 5 is enough to achieve near-perfect classification accuracy. By contrast, for topological phases a quantum classifier with depth p = 10 only reaches an accuracy of around 90%. In addition, one can also use the quantum classifier to study the quantum phase transition.
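The ground-state data set can be illustrated for a small chain with exact diagonalization. The code below assumes an open chain of n = 6 spins and the Hamiltonian of Eq. (9). It finds the ground state by power iteration on shift·I − H, starting from |+⋯+⟩. H commutes with the spin-flip parity ∏_i σ^x_i, so this starting state stays in the even-parity sector. This avoids the near-degeneracy of the two ferromagnetic ground states. The chain length, boundary conditions, iteration count, and the end-to-end correlator used as a sanity check are all illustrative assumptions. The resulting real vectors would play the role of the quantum inputs.

```python
import math


def apply_h(vec, n, jx):
    """Return H|vec> for H = -sum_i Z_i Z_{i+1} - jx * sum_i X_i on an open chain.
    Bit i of a basis index is 0 for spin up (Z = +1) and 1 for spin down."""
    out = [0.0] * len(vec)
    for b, amp in enumerate(vec):
        if amp == 0.0:
            continue
        zz = 0
        for i in range(n - 1):
            zi = 1 - 2 * ((b >> i) & 1)
            zj = 1 - 2 * ((b >> (i + 1)) & 1)
            zz += zi * zj
        out[b] -= zz * amp
        for i in range(n):
            out[b ^ (1 << i)] -= jx * amp  # sigma^x flips spin i
    return out


def ground_state(n, jx, iters=500):
    """Power iteration on (shift - H), with shift >= ||H||, starting from |+...+>."""
    dim = 2 ** n
    shift = (n - 1) + n * jx
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        hv = apply_h(v, n, jx)
        w = [shift * a - c for a, c in zip(v, hv)]
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    return v


def end_to_end_zz(v, n):
    """<Z_0 Z_{n-1}> in the state v (real amplitudes)."""
    total = 0.0
    for b, amp in enumerate(v):
        z0 = 1 - 2 * (b & 1)
        zl = 1 - 2 * ((b >> (n - 1)) & 1)
        total += amp * amp * z0 * zl
    return total


n = 6
for jx in (0.2, 1.0, 2.0):
    psi = ground_state(n, jx)
    print(jx, round(end_to_end_zz(psi, n), 3))
```

The end-to-end correlation is close to 1 deep in the ferromagnetic phase and small in the paramagnetic phase, which is the contrast the classifier has to learn.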
As with classical input data, quantum classifiers are also vulnerable to adversarial perturbations when classifying quantum data. To show this explicitly, we attack the above quantum classifier, trained with quantum inputs, additively in the white-box untargeted setting. Part of our results are plotted in Fig. 15. In Fig. 15(a), we plot the accuracy as a function of the number of BIM iterations. The accuracy decreases to zero after ten BIM iterations, so the quantum classifier misclassifies all the slightly adjusted quantum states, even those far away from the phase transition point. In Fig. 15(b), we plot the accuracy as a function of the average fidelity for different attacking methods. Both the BIM and FGSM methods are notably effective in this scenario. The accuracy of the quantum classifier on the generated adversarial examples decreases to zero, whereas the average fidelity remains moderately large for both methods.

IV. DEFENSE: QUANTUM ADVERSARIAL TRAINING
In the above discussions, we have explicitly shown that quantum classifiers are vulnerable to adversarial perturbations. This may raise serious concerns about the reliability and security of quantum learning systems, especially for safety- and security-critical applications such as self-driving cars and biometric authentication. It is therefore of both fundamental and practical importance to study defense strategies that increase the robustness of quantum classifiers to adversarial perturbations.
In general, adversarial examples are hard to defend against for two reasons. First, it is difficult to build a precise theoretical model of how adversarial examples are crafted. Crafting them is a highly nonlinear and nonconvex optimization process, and we lack proper theoretical tools to analyze it. This makes it notoriously hard to argue theoretically that a particular defense strategy will rule out a set of adversarial examples. Second, defending against adversarial examples requires the learning system to produce proper outputs for every possible input, and the number of possible inputs typically scales exponentially with the size of the problem. Most of the time, machine learning models work very well, but only on a very small fraction of all possible inputs. Nevertheless, classical adversarial machine learning has proposed a variety of defense strategies in recent years to mitigate the effect of adversarial attacks. These include adversarial training [76], gradient hiding [142], defensive distillation [79], and defense-GAN [77]. Each of these strategies has its own advantages and disadvantages, and none of them is effective against all types of adversarial attacks. In this section, we study how to increase the robustness of quantum classifiers against adversarial perturbations. We adapt one of the simplest and most effective methods, namely adversarial training, to quantum learning. We show that it can significantly improve how well quantum classifiers defend against adversarial attacks.
The basic idea of adversarial training is to strengthen model robustness by injecting adversarial examples into the training set. It is a straightforward brute-force approach in which one simply generates many adversarial examples using one or more chosen attacking strategies and then retrains the classifier with both the legitimate and adversarial samples. For our purpose, we employ a robust optimization [143] approach and reduce the task to solving a typical min-max optimization problem:
$$\min_{\Theta}\ \frac{1}{N}\sum_{i=1}^{N}\ \max_{U_\delta\in\Delta} L\big(h(U_\delta|\psi_{\rm in}^{(i)}\rangle;\Theta),\,y^{(i)}\big), \qquad (10)$$
where |ψ_in^(i)⟩ is the i-th sample under attack, y^(i) denotes its original label, and N is the number of training samples. The meaning of Eq. (10) is clear: we train the quantum classifier to minimize the adversarial risk, that is, the average loss under the worst-case perturbations of the input samples. This min-max formulation has already been studied extensively in the field of robust optimization, and many methods for solving such min-max problems have been developed [143]. One efficient method is to split Eq. (10) into two parts: the outer minimization and the inner maximization. The inner maximization is exactly the problem of generating adversarial perturbations, which we have discussed in detail in Secs. II and III. The outer minimization boils down to minimizing the loss function on adversarial examples. With this in mind, we use a three-step procedure to solve the full optimization problem. In the first step, we randomly choose a batch of input samples |ψ_in^(i)⟩ together with their labels y^(i). In the second step, we calculate the 'worst-case' perturbation of each |ψ_in^(i)⟩ with respect to the current model parameters Θ_t, that is, we solve
$$U_{\delta^*} = \arg\max_{U_\delta\in\Delta} L\big(h(U_\delta|\psi_{\rm in}^{(i)}\rangle;\Theta_t),\,y^{(i)}\big).$$
In the third step, we update the parameters Θ_t according to the outer minimization problem evaluated at U_δ*:
$$\Theta_{t+1} = \Theta_t - \eta\,\nabla_{\Theta}\sum_{i} L\big(h(U_{\delta^*}|\psi_{\rm in}^{(i)}\rangle;\Theta_t),\,y^{(i)}\big),$$
where η is the learning rate and the sum runs over the chosen batch. We repeat these three steps until the accuracy converges to a reasonable value.
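The three-step procedure above can be illustrated with a minimal classical sketch. It is not the quantum classifier used in this paper. As an assumption, h(x; Θ) is replaced by a softmax-linear model acting on normalized real vectors (stand-ins for amplitude-encoded states). The inner maximization is approximated by three BIM steps of size 0.05 with renormalization, and the outer minimization is plain gradient descent on legitimate plus adversarial samples. All function names, data, and hyperparameters are hypothetical.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)          # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def normalize(X):
    # pi_C: rescale each row so it is a valid real-amplitude state vector
    return X / np.linalg.norm(X, axis=1, keepdims=True)


def loss_and_grads(W, X, y):
    """Mean cross-entropy of the stand-in h(x; W) = softmax(W x).

    Returns the loss and its gradients with respect to W and to the inputs X.
    """
    n = X.shape[0]
    P = softmax(X @ W.T)                          # (n, n_classes)
    Y = np.eye(W.shape[0])[y]                     # one-hot labels
    loss = -np.mean(np.log(P[np.arange(n), y] + 1e-12))
    G = (P - Y) / n                               # dL/dlogits
    return loss, G.T @ X, G @ W


def bim_attack(W, X, y, alpha=0.05, steps=3):
    """Inner maximization: a few signed-gradient ascent steps, renormalizing each time."""
    X_adv = X.copy()
    for _ in range(steps):
        _, _, grad_X = loss_and_grads(W, X_adv, y)
        X_adv = normalize(X_adv + alpha * np.sign(grad_X))
    return X_adv


def adversarial_training(X, y, n_classes, epochs=200, batch=64, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((n_classes, X.shape[1]))
    for _ in range(epochs):
        idx = rng.choice(len(X), size=batch, replace=False)  # step 1: random batch
        Xb, yb = X[idx], y[idx]
        Xb_adv = bim_attack(W, Xb, yb)                        # step 2: worst case for current W
        X_mix = np.vstack([Xb, Xb_adv])                       # legitimate + adversarial samples
        y_mix = np.concatenate([yb, yb])
        _, grad_W, _ = loss_and_grads(W, X_mix, y_mix)        # step 3: outer minimization
        W -= lr * grad_W
    return W


def accuracy(W, X, y):
    return np.mean(np.argmax(X @ W.T, axis=1) == y)


# Toy data: 8-dimensional "states" clustered around two orthogonal basis vectors.
rng = np.random.default_rng(1)
d, n_per_class = 8, 200
centers = np.eye(d)[:2]
X = normalize(np.vstack([c + 0.3 * rng.standard_normal((n_per_class, d)) for c in centers]))
y = np.repeat([0, 1], n_per_class)

W = adversarial_training(X, y, n_classes=2)
clean_acc = accuracy(W, X, y)
adv_acc = accuracy(W, bim_attack(W, X, y), y)
print(f"clean accuracy {clean_acc:.3f}, accuracy under BIM {adv_acc:.3f}")
```

Swapping `bim_attack` for another attack in step 2 is the only change needed to train against a different adversary.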
Part of our results are shown in Fig. 16. We also notice that, due to the competition between the inner maximization and the outer minimization, the accuracies on the legitimate training and validation data sets both oscillate at the beginning of the adversarial training process.
The above example explicitly shows that adversarial training can indeed increase the robustness of quantum classifiers against a certain type of adversarial perturbation. Yet, it is worth mentioning that an adversarially trained quantum classifier may only perform well on adversarial examples generated by the same attacking method. It may not perform as well when the attacker uses a different attack strategy. In addition, due to gradient masking [31,142], adversarial training tends to make the quantum classifier more robust to white-box attacks than to black-box attacks. In fact, we do not expect a universal defense strategy that is effective against all types of adversarial attacks. An approach that blocks one kind of attack on the quantum classifier will inevitably leave another vulnerability open to an attacker who knows and exploits the underlying defense mechanism. In the field of classical adversarial learning, an intriguing defense mechanism that is effective against both white-box and black-box attacks has been proposed recently [77]. This strategy, called defense-GAN, leverages the representational power of a GAN to diminish the effect of adversarial perturbations by projecting input data onto the range of the GAN's generator before feeding it to the classifier. More recently, a quantum version of GAN (dubbed QGAN) has been proposed theoretically [43,44], and a proof-of-principle experimental realization of QGAN has been reported with superconducting quantum circuits [62]. Likewise, it would be interesting and important to develop a defense-QGAN strategy to enhance the robustness of quantum classifiers against adversarial perturbations. We leave this topic for future study.
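The sketch below makes the projection step of defense-GAN concrete. Instead of a trained GAN, it assumes a hypothetical linear generator G(z) = A z with orthonormal columns, so the result can be checked against the exact orthogonal projection A Aᵀ x. As in defense-GAN, z is found by gradient descent on ||G(z) − x||² from several random starts, and G(z*) is the input that would be passed to the classifier. The perturbation is random noise, not a crafted attack, and nothing here is a quantum (defense-QGAN) implementation.

```python
import numpy as np


def defense_gan_project(x, A, steps=500, lr=0.1, restarts=4, seed=0):
    """Approximate G(z*) with z* = argmin_z ||G(z) - x||^2 for a linear G(z) = A z.

    Runs gradient descent from several random starting points and keeps the best z.
    A real defense would use a trained (nonlinear) generator instead of A.
    """
    rng = np.random.default_rng(seed)
    best_z, best_err = None, np.inf
    for _ in range(restarts):
        z = rng.standard_normal(A.shape[1])
        for _ in range(steps):
            z -= lr * 2.0 * A.T @ (A @ z - x)      # gradient of ||A z - x||^2
        err = np.sum((A @ z - x) ** 2)
        if err < best_err:
            best_z, best_err = z, err
    return A @ best_z


rng = np.random.default_rng(0)
A, _ = np.linalg.qr(rng.standard_normal((16, 4)))  # hypothetical generator, orthonormal columns
x_clean = A @ rng.standard_normal(4)               # a "legitimate" input inside the generator's range
x_adv = x_clean + 0.1 * rng.standard_normal(16)    # perturbed input (random stand-in for an attack)
x_proj = defense_gan_project(x_adv, A)
print(np.linalg.norm(x_adv - x_clean), np.linalg.norm(x_proj - x_clean))
```

Because the clean input lies in the generator's range, projection removes every component of the perturbation orthogonal to that range. This is why the projected input ends up closer to the clean one.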

V. CONCLUSION AND OUTLOOK
In summary, we have systematically studied the vulnerability of quantum classifiers to adversarial examples in different scenarios. We found that, similar to classical classifiers based on deep neural networks, quantum classifiers are likewise extremely vulnerable to adversarial attacks. Adding a tiny amount of carefully crafted perturbations, which are imperceptible to human eyes or undetectable by conventional methods, to the original legitimate data (either classical or quantum mechanical) causes the quantum classifiers to make incorrect predictions with notably high confidence. We introduced a generic recipe for generating adversarial perturbations for quantum classifiers with different attacking methods and gave three concrete examples in different adversarial settings: classifying real-life handwritten digit images in MNIST, simulated time-of-flight images for topological phases of matter, and quantum ground states for studying the paramagnetic/ferromagnetic quantum phase transition. In addition, through adversarial training, we have shown that the vulnerability of quantum classifiers to specific types of adversarial perturbations can be significantly suppressed. Our discussion mainly focuses on supervised learning based on quantum circuit classifiers, but generalizations to unsupervised learning and other types of quantum classifiers are possible and straightforward. Our results reveal a novel aspect of the vulnerability of quantum machine learning systems to adversarial perturbations, which would be crucial for practical applications of quantum classifiers in the realms of both artificial intelligence and machine learning phases of matter.
It is worthwhile to clarify the differences between the quantum adversarial learning discussed in this paper and the quantum generative adversarial networks (QGAN) studied in previous works [43,44,46,62,144]. A QGAN contains two major components, a generator and a discriminator, which are trained alternately in an adversarial game. At each learning round, the discriminator optimizes her strategies to identify the fake data produced by the generator, whereas the generator updates his strategies to fool the discriminator. At the end of the training, this adversarial procedure ends up at a Nash equilibrium point. There, the generator produces data that match the statistics of the true data from the original training set, and the discriminator can no longer identify the fake data with a probability larger than one half. The major goal of QGAN is to produce new data (either classical or quantum mechanical) that match the statistics of the training data, rather than to generate adversarial examples endowed with wild patterns.
This work only reveals the tip of the iceberg. Many important questions remain unexplored and deserve further investigation. First, the existence of adversarial examples seems to be a fundamental feature of quantum machine learning applications in high-dimensional spaces [67] due to the concentration of measure phenomenon [145]. Thus, we expect that various machine learning approaches to a variety of high-dimensional problems, such as separability-entanglement classification [146,147], quantum state discrimination [148], quantum Hamiltonian learning [149], and quantum state tomography [7,150], should also be vulnerable to adversarial attacks. Yet, it remains unclear how, in practice, to find all possible adversarial perturbations in these scenarios and how to develop experimentally feasible countermeasures that strengthen the reliability of these approaches. Second, in classical adversarial learning a strong "No Free Lunch" theorem has been established recently [39][40][41], which shows that there exists an intrinsic tension between adversarial robustness and generalization accuracy. In the future, it would be interesting and important to prove a quantum version of such a profound theorem and study its implications for practical applications of quantum technologies. In addition, there seems to be a deep connection between the existence of adversarial perturbations in quantum deep learning and the phenomenon of orthogonality catastrophe in quantum many-body physics [151,152]. In orthogonality catastrophe, adding a weak local perturbation to a metallic or many-body localized Hamiltonian makes the ground state of the slightly modified Hamiltonian orthogonal to that of the original one in the thermodynamic limit. A thorough investigation of this connection would provide new insight into both adversarial learning and orthogonality catastrophe. Finally, an experimental demonstration of quantum adversarial learning would be a crucial step towards practical applications of quantum technologies in artificial intelligence.
[…] during the testing phase. This setting assumes no modification of the training data, which is in sharp contrast to poisoning attacks, where the adversary tries to poison the training data by injecting carefully crafted samples to compromise the whole learning process. Within the evasion-attack umbrella, the attacks considered in this paper can be further categorized along different classification dimensions: additive or functional, targeted or untargeted, and white-box or black-box. In this Appendix, we give more technical details about the attack algorithms used.

White-box attacks
White-box attacks assume full information about the classifier, so the attacker can exploit the gradient of the loss function, ∇_x L(h(x + δ; Θ), y). For convenience and conciseness, we use x and |ψ_in⟩ (y and a) interchangeably throughout the Appendix to represent the input data (corresponding label). A number of gradient-based methods have been proposed in the classical adversarial learning community to generate adversarial samples. In this work, we adapt some of these methods to the quantum setting, including the FGSM, BIM, and PGD methods. In the following, we introduce these methods one by one and provide pseudocode for each.
Quantum-adapted FGSM method (Q-FGSM). The FGSM method is a simple one-step scheme for obtaining adversarial examples and has been widely used in the classical adversarial machine learning community [22,32]. It calculates the gradient of the loss function with respect to the input of the classifier. The adversarial examples are generated using the following equation:
$$x^* = x + \epsilon\,\mathrm{sign}\big(\nabla_x L(h(|\psi_{\rm in}\rangle;\Theta^*),a)\big), \qquad (A1)$$
where L(h(|ψ_in⟩; Θ*), a) is the loss function of the trained quantum classifier, ε is the perturbation bound, ∇_x denotes the gradient of the loss with respect to a legitimate sample x with correct label a, and x* denotes the generated adversarial example corresponding to x. For additive attacks, where we modify each component of the data vector independently, ∇_x is computed componentwise and the data vector is renormalized if necessary. For functional attacks, we use a layer of parametrized local unitaries to implement the perturbations to the input data |ψ_in⟩. In this case, ∇_x is implemented via the gradient of the loss with respect to the parameters defining the local unitaries, and Eq. (A1) should be understood as
$$|\psi^*\rangle = U(\omega^*)|\psi_{\rm in}\rangle,\qquad \omega^* = \omega + \epsilon\,\mathrm{sign}\big(\nabla_{\omega} L(h(U(\omega)|\psi_{\rm in}\rangle;\Theta^*),a)\big),$$
where ω denotes collectively all the parameters of the local unitaries and U(ω) the corresponding layer of local unitaries. A pseudocode representation of the Q-FGSM algorithm for additive attacks is shown in Algorithm 1. The pseudocode for functional attacks is similar and straightforward, and is thus omitted for brevity.
Quantum-adapted BIM method (Q-BIM). The BIM method is a straightforward extension of the basic FGSM method [27]. It generates adversarial examples by iteratively applying the FGSM method with a small step size α:
$$x^*_0 = x,\qquad x^*_{k} = \pi_C\Big(x^*_{k-1} + \alpha\,\mathrm{sign}\big(\nabla_x L(h(x^*_{k-1};\Theta^*),a)\big)\Big),$$
where x*_k denotes the modified sample at step k and π_C is the projection operator that normalizes the wavefunction. A pseudocode representation of the Q-BIM algorithm for additive attacks is shown in Algorithm 2. In each iteration k, every component is updated as (x_k)_j = (x_{k−1})_j + δ_j, and the result is projected as x_k = π_C(x_k). After T iterations, the algorithm returns |ψ*⟩ = |ψ_T⟩.
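The additive Q-FGSM and Q-BIM updates can be sketched in a few lines of NumPy. The classifier is a hypothetical stand-in for a trained circuit: a fixed random orthogonal matrix U followed by a two-outcome projective measurement, with loss L = −log p_a. Because π_C renormalizes the state after every step, the gradient used here is that of the loss evaluated on x/||x||. This choice, the real-valued amplitudes, and all function names are assumptions rather than the paper's implementation. The fidelity |⟨ψ|ψ*⟩|² is the quantity averaged in Tables I and IV.

```python
import numpy as np


def normalize(x):
    """pi_C: rescale x so that it is a valid (real-amplitude) quantum state."""
    return x / np.linalg.norm(x)


def make_toy_classifier(dim, seed=0):
    """Hypothetical stand-in for a trained circuit: a fixed random orthogonal U.

    Label a is the outcome of measuring U|psi> onto the a-th half of the basis,
    so p_a = psi^T M_a psi with M_a = U^T P_a U.
    """
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    half = dim // 2
    masks = [np.r_[np.ones(half), np.zeros(half)], np.r_[np.zeros(half), np.ones(half)]]
    return U, [U.T @ np.diag(m) @ U for m in masks]


def class_probs(x, Ms):
    psi = normalize(x)
    return np.array([psi @ M @ psi for M in Ms])


def loss_grad(x, a, Ms):
    """Gradient of L(x) = -log p_a(x / ||x||), where p_a = x^T M_a x / x^T x."""
    M = Ms[a]
    return -2.0 * (M @ x) / (x @ M @ x) + 2.0 * x / (x @ x)


def q_fgsm(psi, a, eps, Ms):
    """Additive Q-FGSM, Eq. (A1): one signed-gradient step of size eps, then pi_C."""
    return normalize(psi + eps * np.sign(loss_grad(psi, a, Ms)))


def q_bim(psi, a, alpha, steps, Ms):
    """Additive Q-BIM: `steps` Q-FGSM steps of size alpha, applying pi_C after each."""
    x = psi.copy()
    for _ in range(steps):
        x = q_fgsm(x, a, alpha, Ms)
    return x


def fidelity(psi, phi):
    return abs(np.vdot(psi, phi)) ** 2


U, Ms = make_toy_classifier(16)                               # 16 amplitudes = 4 qubits
psi = normalize(U.T @ np.r_[np.ones(8), 0.3 * np.ones(8)])    # legitimate state, label 0
psi_adv = q_bim(psi, a=0, alpha=0.1, steps=3, Ms=Ms)
p_clean, p_adv = class_probs(psi, Ms)[0], class_probs(psi_adv, Ms)[0]
F = fidelity(psi, psi_adv)
print(f"p(label 0): {p_clean:.3f} -> {p_adv:.3f}, fidelity {F:.3f}")
```

Setting `steps=1` and `alpha=eps` reduces Q-BIM to Q-FGSM, which mirrors the relation between Eq. (A1) and the iterative update.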

Black-box attacks: transfer attack
Unlike white-box attacks, black-box attacks assume that the adversary does not have full information about either the model or the algorithm used by the learner. In particular, the adversary has no information about the loss function used by the quantum classifier and thus cannot use gradient-based attacking methods to generate adversarial examples. Yet, for simplicity, we assume that the attacker has access to a vast dataset to train a local substitute classifier that approximates the decision boundary of the target classifier. Once the substitute classifier is trained with high confidence, any white-box attack strategy can be applied to it to generate adversarial examples. Owing to the transferability of adversarial examples, these can then be used to deceive the target classifier. In this work, we consider the transfer attack in a more exotic setting, where we use different classical classifiers as the local substitute classifier to generate adversarial examples for the quantum classifier. The two classical classifiers are based on a CNN and an FNN, respectively. In Table V, we show the detailed structures of the CNN and FNN. To train these two classical classifiers, we use the Adam optimizer [100] with a batch size of 256 and a learning rate of 10^−3. The learning process is implemented using Keras [153], a high-level deep learning library running on top of the TensorFlow framework [154]. After training, both the CNN and FNN classifiers achieve a remarkably high accuracy on the legitimate testing dataset (98.9% and 99.9%, respectively; see Table III in the main text).
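A minimal sketch of the transfer-attack pipeline follows, under heavy simplifying assumptions. The data are two Gaussian classes in ten dimensions. A nearest-centroid model plays the role of the black-box target (the quantum classifier in the paper), and a logistic-regression substitute is trained by the attacker on its own labeled data. Adversarial examples are crafted with FGSM on the substitute only and then fed to the target. None of these models, data, or parameters come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10


def sample(n):
    """Two Gaussian classes with means -0.5 and +0.5 in every coordinate (toy data)."""
    y = rng.integers(0, 2, n)
    X = rng.standard_normal((n, d)) + np.where(y[:, None] == 1, 0.5, -0.5)
    return X, y


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Target (black box to the attacker): nearest-centroid classifier fit on its own data.
X_t, y_t = sample(1000)
centroids = np.stack([X_t[y_t == 0].mean(axis=0), X_t[y_t == 1].mean(axis=0)])


def target_predict(X):
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


# Substitute trained by the attacker: logistic regression fit by gradient descent.
X_s, y_s = sample(1000)
w, b = np.zeros(d), 0.0
for _ in range(500):
    p = sigmoid(X_s @ w + b)
    w -= 0.1 * X_s.T @ (p - y_s) / len(y_s)
    b -= 0.1 * np.mean(p - y_s)


def fgsm_on_substitute(X, y, eps):
    """White-box FGSM against the substitute: dL/dx = (sigmoid(w.x + b) - y) * w."""
    grad_x = (sigmoid(X @ w + b) - y)[:, None] * w[None, :]
    return X + eps * np.sign(grad_x)


X_test, y_test = sample(2000)
X_adv = fgsm_on_substitute(X_test, y_test, eps=1.0)
clean_acc = np.mean(target_predict(X_test) == y_test)
adv_acc = np.mean(target_predict(X_adv) == y_test)
print(f"target accuracy: clean {clean_acc:.3f}, transferred FGSM {adv_acc:.3f}")
```

The target's gradients are never used. The attack succeeds only because the substitute learns a decision boundary similar to the target's, which is the transferability property the paragraph relies on.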

[Table V body, partially recovered: Classifier based on CNN: Conv(64,8,8)+ReLU, Conv(128,4,4), …; Classifier based on FNN: FC(512)+ReLU, …]
We use three different methods, namely the BIM, FGSM, and MIM methods, to attack both the CNN and FNN classifiers in a white-box setting to obtain adversarial examples. These attacks are implemented using Cleverhans [156]. For the BIM attack, the number of attack iterations is set to ten and the step size α to 0.01. For the FGSM attack, the number of iterations is one and the step size is 0.3. For the MIM method, the number of attack iterations is set to ten, the step size to 0.06, and the decay factor µ to 1.0. A detailed description of the MIM method, together with pseudocode, can be found in Ref. [135]. The performance of both classifiers on the corresponding sets of adversarial examples is shown in Table III in the main text. From that table it is clear that the attack is very effective: the accuracy of both classifiers decreases to below 1%. After the adversarial examples are generated, we test the quantum classifier on them and find that its accuracy decreases noticeably (see Table III in the main text).
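For reference, the momentum iterative method accumulates a decayed sum of L1-normalized gradients and steps along its sign. The sketch below follows the L∞ form of the MIM update introduced by Dong et al. and uses the step size 0.06, ten iterations, and decay factor µ = 1.0 quoted above. The budget eps = 0.3, the clipping to [0, 1], and the logistic toy target are assumptions. This is not the Cleverhans implementation.

```python
import numpy as np


def mim_attack(x0, grad_fn, eps=0.3, alpha=0.06, steps=10, mu=1.0):
    """L_inf momentum iterative method.

    Accumulates L1-normalized gradients with decay mu, takes signed steps of size
    alpha, and clips to the eps-ball around x0 and to the valid range [0, 1].
    """
    x = x0.copy()
    g = np.zeros_like(x0)
    for _ in range(steps):
        grad = grad_fn(x)
        g = mu * g + grad / (np.sum(np.abs(grad)) + 1e-12)  # momentum on normalized gradient
        x = x + alpha * np.sign(g)
        x = np.clip(x, x0 - eps, x0 + eps)   # stay within the perturbation budget
        x = np.clip(x, 0.0, 1.0)             # stay within the valid pixel range
    return x


# Hypothetical white-box target: logistic model p(y=1|x) = sigmoid(w.x), true label y = 1.
rng = np.random.default_rng(0)
w = rng.standard_normal(16)
x0 = rng.uniform(0.2, 0.8, 16)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss(x):
    return -np.log(sigmoid(w @ x))


def grad_fn(x):
    return (sigmoid(w @ x) - 1.0) * w        # d loss / d x for label y = 1


x_adv = mim_attack(x0, grad_fn)
print(loss(x0), loss(x_adv))
```

With mu = 0 the update reduces to BIM with the same step size. The momentum term is what stabilizes the update direction across iterations.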
FIG. 1. A schematic illustration of quantum adversarial machine learning. (a) A quantum classifier that successfully identifies the image of a panda as "panda" with state-of-the-art accuracy. (b) Adding a small amount of carefully crafted noise causes the same quantum classifier to misclassify the slightly modified image as a "gibbon" with notably high confidence, even though the modified image is indistinguishable from the original one to human eyes.

FIG. 4. The average accuracy and loss as a function of the number of training steps.We use a depth-10 quantum classifier with structures shown in Fig. 2 to perform binary classification for images of digits 1 and 9 in MNIST.To train the classifier, we use the Adam optimizer with a batch size of 256 and a learning rate of 0.005 to minimize the loss function in Eq. (2).The accuracy and loss are averaged on 11633 training samples and 1058 validation samples (which are not contained in the training dataset).

FIG. 5. The average accuracy and loss for the four-category quantum classifier as a function of the number of epochs. Here, we use a quantum classifier with structures shown in Fig. 2 and depth forty (p = 40) to perform multi-class classification for images of digits 1, 3, 7, and 9. To train the classifier, we use the Adam optimizer with a batch size of 512 and a learning rate of 0.005 to minimize the loss function in Eq. (2). The accuracy and loss are averaged on 20000 training samples and 2000 validation samples.

FIG. 7. Effect of adversarial untargeted additive attacks on the accuracy of the quantum classifier for classifying handwritten digits. We use the basic iterative method to obtain adversarial examples. The circuit depth of the model is 20, and the step size is 0.1. (a)-(b) For the classifier that distinguishes digits 1 and 9, the accuracy decreases as the average fidelity between the adversarial and clean samples decreases, and as the number of iterations of the attacking algorithm increases. (c)-(d) Similar plots for classifying the four digits 1, 3, 7, and 9.

FIG. 8. Effects of adversarial untargeted functional attacks on the accuracy of the quantum classifier for classifying handwritten digits 1 and 9. Here, the adversarial perturbation operators are assumed to be a layer of local unitary transformations. We use both the BIM method and the FGSM method to obtain adversarial examples. (a) For the BIM method, we generate adversarial perturbations using different numbers of iterations with a fixed step size of 0.1. (b) For the FGSM method, we generate adversarial perturbations using different step sizes, and the accuracy drops with increasing step size.

FIG. 9. Visual illustration of adversarial examples crafted using different attacks. From top to bottom: the clean and adversarial images generated for the quantum classifier by the BIM algorithm. By applying the additive attack, we can change the quantum classifier's classification result. The top images represent a correctly predicted legitimate example. The bottom images are incorrectly predicted adversarial examples, even though they bear a close resemblance to the clean image. Here, the attacking algorithm we employed is BIM(0.1,3).

FIG. 10. White-box targeted attacks on the four-category quantum classifier with depth p = 40. (a) The classification probability for each digit as a function of the number of attacking epochs. Here, we use the BIM method to attack the quantum classifier. (b) The loss for classifying the image as 1 or 3 as a function of the number of epochs. (c-d) Similar plots for the functional attacks. (e-f) The accuracy as a function of the average fidelity during the attacking process. Here, we consider additive attacks with both the BIM (e) and FGSM (f) methods.

FIG. 11. Effects of depolarizing noise of varying strength on the accuracy of the quantum classifiers with depth p = 20. The mean classification accuracy on the test set is shown as a function of the fidelity between the original input states and the states affected by depolarizing noise of varying strength on each qubit. The accuracy and fidelity are averaged over 1000 random realizations. (a) Results for the two-category quantum classifier. (b) Results for the four-category quantum classifier.
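The fidelity against which accuracy is reported in Fig. 11 can be computed as ⟨ψ|ρ|ψ⟩. Here ρ is obtained by applying a single-qubit depolarizing channel to every qubit. The sketch assumes the convention D_p(ρ) = (1 − 3p/4)ρ + (p/4)(XρX + YρY + ZρZ), which is equivalent to replacing the qubit by the maximally mixed state with probability p. The paper does not state how its noise strength is parametrized, so this convention and the function names are assumptions.

```python
from functools import reduce

import numpy as np

I2 = np.eye(2)
PAULIS = (np.array([[0, 1], [1, 0]]), np.array([[0, -1j], [1j, 0]]), np.diag([1.0, -1.0]))


def on_qubit(op, k, n):
    """Embed a single-qubit operator acting on qubit k into an n-qubit operator."""
    return reduce(np.kron, [op if j == k else I2 for j in range(n)])


def depolarize_each_qubit(rho, p, n):
    """Apply D_p(rho) = (1 - 3p/4) rho + (p/4)(X rho X + Y rho Y + Z rho Z) to every qubit."""
    for k in range(n):
        terms = [on_qubit(P, k, n) for P in PAULIS]
        rho = (1 - 3 * p / 4) * rho + (p / 4) * sum(P @ rho @ P.conj().T for P in terms)
    return rho


def fidelity_after_noise(psi, p):
    """Fidelity <psi| rho_noisy |psi> between a pure state and its depolarized version."""
    n = len(psi).bit_length() - 1            # len(psi) = 2**n amplitudes
    rho = np.outer(psi, psi.conj())
    return float(np.real(psi.conj() @ depolarize_each_qubit(rho, p, n) @ psi))


psi0 = np.zeros(16)
psi0[0] = 1.0                                # |0000>, four qubits
for p in (0.0, 0.1, 0.2):
    print(p, fidelity_after_noise(psi0, p))
```

For the product state |0000⟩ each qubit contributes a factor 1 − p/2. At p = 0.2 the fidelity is therefore 0.9^4 = 0.6561, which gives a quick sanity check of the channel convention.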

FIG. 12. (a) The average accuracy and loss for the two-category classifier as a function of the number of epochs.Here, we use a quantum classifier with structures shown in Fig. 2 and depth ten (p = 10) to perform binary classification for topological/nontopological phases.To train the classifier, we use the Adam optimizer with a batch size of 512 and a learning rate of 0.005 to minimize the loss function in Eq. (2).The accuracy and loss are averaged on 19956 training samples and validation samples.(b) The accuracy of the quantum classifier as a function of the iterations of the BIM attack.Here, the BIM step size is 0.01.

FIG. 13. The clean and the corresponding adversarial time-of-flight images for using the quantum classifier to classify topological phases. (Top) A legitimate sample of the density distribution in momentum space for the lower band with lattice size 10 × 10. (Bottom) An adversarial example obtained by the fast gradient sign method, which differs from the original one only by a tiny amount of noise that is imperceptible to human eyes.

FIG. 14. The average accuracy and loss as a function of the number of training steps. We use a depth-10 quantum classifier with structures shown in Fig. 2 to classify the ferromagnetic/paramagnetic phases for the ground states of H_Ising. We plot the accuracy on 1182 training samples and 395 validation samples (which are not in the training dataset). We present the results of the first 200 iteration epochs. The learning rate is 0.005. The difference between the training loss and the validation loss is very small, indicating that the quantum classifier does not overfit. The final accuracy on the 395 test samples is roughly 98%.

FIG. 15. Effect of additive adversarial attacks on the accuracy of the two-category quantum classifier in classifying the ferromagnetic/paramagnetic phases for the ground states of the transverse field Ising model. We use both the BIM and FGSM methods to generate adversarial examples in the white-box untargeted setting. For the BIM method, we fix the step size to 0.05 and the iteration number to ten. For the FGSM method, we perform the attack using a single step with step sizes ranging from 0.1 to 1.0. The circuit depth of the quantum classifier being attacked is p = 10, and the system size for the Ising model is L = 8. (a) The results for the BIM attack. (b) The accuracy as a function of the average fidelity between the legitimate and adversarial samples for both the BIM and FGSM methods.

TABLE I.
Average fidelity (F) and accuracy (in %) of the quantum classifier when additively attacked by the BIM and FGSM methods in the white-box untargeted setting. For the two-category (four-category) classification, we use a model circuit of depth p = 10 (p = 40). For the BIM method, we generate adversarial examples using three iterations with a step size of 0.1 and denote this attack as BIM(3, 0.1). For the FGSM method, we generate adversarial examples using a single step with a step size of 0.03 (0.05) for the two-category (four-category) classifier and denote these attacks as FGSM(1, 0.03) and FGSM(1, 0.05), respectively.

TABLE III.
Black-box attacks on the quantum classifier. Here, the adversarial examples are generated by three different methods (i.e., BIM, FGSM, and MIM) for two different classical classifiers, one based on a CNN and the other on an FNN (see Appendix). This table shows the corresponding accuracy (in %) for each case on the MNIST test dataset. We denote the prediction accuracy of the classical neural networks (quantum classifier) on the test set as α_C (α_Q), and the prediction accuracy on the adversarial test set as α_C^adv (α_Q^adv). The accuracy of the quantum classifier drops significantly on the adversarial examples generated for the classical neural networks.

TABLE IV.
Average fidelity F and accuracy (in %) of the two-category quantum classifier with depth p = 10 when attacked by the BIM and FGSM methods in the white-box untargeted setting. Here, the accuracy and fidelity are averaged over 2000 testing samples.
FIG. 16. Strengthening the robustness of the quantum classifier against adversarial perturbations by quantum adversarial training. In each epoch, we first generate adversarial examples with the BIM method for the quantum classifier with the current model parameters. The iteration number is set to three and the BIM step size to 0.05. Then, we train the quantum classifier with both the legitimate and crafted samples. The circuit depth of the quantum classifier is ten, and the learning rate is 0.005.
In Fig. 16, we consider the adversarial training of a quantum classifier for identifying handwritten digits in MNIST. We use the BIM method in the white-box untargeted setting to generate adversarial examples. We use 20000 clean images and generate their corresponding adversarial images. The clean images and the adversarial ones together form the training data set, and another 2000 images are used for testing. The figure shows that, after adversarial training, the accuracy of the quantum classifier on both the adversarial and the legitimate samples increases significantly. At the beginning of the training, the accuracy on the adversarial samples in the testing set remains zero. This is because the initial model parameters are chosen randomly, so the quantum classifier has not yet learned enough information and performs poorly even on legitimate samples. Hence, for each sample the BIM method can always find an adversarial example, resulting in zero accuracy on the testing set of adversarial examples. After this early stage of adversarial training, the accuracy begins to increase rapidly, and the quantum classifier correctly classifies more and more crafted samples. In other words, the BIM attack becomes ineffective on more and more samples. At the end of the training, the accuracies on both the legitimate and adversarial data sets converge to a saturated value larger than 98%. This indicates that the adversarially retrained quantum classifier is essentially immune to the adversarial examples generated by the BIM attack.

TABLE V.
Model architectures for the classical neural networks. (a) The CNN architecture consists of three layers: a 2D convolution layer, a ReLU activation layer [155], and a fully connected flattening layer with 0.5 dropout regularization. The last layer is then connected to the final softmax classifier, which outputs the probability for each possible handwritten digit. In our case, we have four categories: 1, 3, 7, and 9. (b) The feedforward neural network architecture consists of fully connected layers and dropout [110] layers with a dropping rate of 0.1, which are important for avoiding overfitting.