Enhancing Quantum Adversarial Robustness by Randomized Encodings

The interplay between quantum physics and machine learning gives rise to the emergent frontier of quantum machine learning, where advanced quantum learning models may outperform their classical counterparts in solving certain challenging problems. However, quantum learning systems are vulnerable to adversarial attacks: adding tiny, carefully crafted perturbations to legitimate input samples can cause misclassifications. To address this issue, we propose a general scheme to protect quantum learning systems from adversarial attacks by randomly encoding the legitimate data samples through unitary or quantum error correction encoders. In particular, we rigorously prove that both global and local random unitary encoders lead to exponentially vanishing gradients (i.e., barren plateaus) for any variational quantum circuit that aims to add adversarial perturbations, independent of the input data and the inner structures of the adversarial circuits and quantum classifiers. In addition, we prove a rigorous bound on the vulnerability of quantum classifiers under local unitary adversarial attacks. We show that random black-box quantum error correction encoders can protect quantum classifiers against local adversarial noises, and that their robustness increases as we concatenate error correction codes. To quantify the robustness enhancement, we adapt quantum differential privacy as a measure of the prediction stability of quantum classifiers. Our results establish versatile defense strategies for quantum classifiers against adversarial perturbations, providing valuable guidance to enhance the reliability and security of both near-term and future quantum learning technologies.


I. INTRODUCTION
The flourishing of machine learning has created unprecedented opportunities and achieved dramatic success in both research and commercial fields [1,2]. Some notoriously challenging problems, ranging from predicting protein structures [3] and weather forecasting [4] to playing the game of Go [5,6], have been cracked recently. Meanwhile, the field of quantum computation has also made tremendous progress in recent years [7,8], giving rise to unparalleled opportunities to speed up, enhance, or innovate machine learning [9-12]. Within this vein, ideas and concepts from the physics domain have been utilized as core ingredients for quantum machine learning algorithms [13-20]. Notable examples in this direction include the Harrow-Hassidim-Lloyd algorithm [13], quantum principal component analysis [14], quantum generative models [12,15,16], quantum support vector machines [17], and variational quantum algorithms based on parametrized quantum circuits [18-21]. Yet, an important issue regarding quantum learning systems concerns their reliability and security in adversarial scenarios, especially for noisy intermediate-scale quantum (NISQ) devices [22]. Here, we introduce general defense strategies based on randomly encoding legitimate data samples, and analytically establish their adversarial robustness in a rigorous fashion (see Fig. 1 for illustration).
Adversarial machine learning is an emerging frontier that studies the vulnerability of machine learning systems and develops defense strategies against adversarial attacks [23,24]. In the classical scenario, the prediction of a deep neural network can be susceptible to tiny, carefully crafted noises added to the legitimate input data, which may even be imperceptible to human eyes [25-29]. These adversarial noises can be generated either by a malicious adversary or by worst-case experimental noise from an unknown source. Recent works have demonstrated that quantum learning systems are vulnerable in adversarial settings, similar to their classical counterparts [30-32], sparking a new interdisciplinary research frontier of quantum adversarial machine learning [30-35]. From the theoretical aspect, even an exponentially small perturbation can cause a moderate adversarial risk for a given quantum classifier [32]. Furthermore, it has been shown that there exist universal adversarial attacks for multiple quantum classifiers or input data samples [31]. More recently, quantum adversarial learning has been experimentally demonstrated with both large-scale real-life datasets and quantum datasets on superconducting quantum devices [33]. To improve the robustness of quantum machine learning algorithms and defend against adversarial attacks, a straightforward approach is to employ quantum-adapted adversarial training [30]. However, adversarial training in general requires generating a large number of adversarial samples and may only perform well against the same attacking method that generated those samples.
In classical adversarial learning, randomness has been suggested as a possible resource for developing defense strategies against adversarial perturbations [36-42]. However, these results are mostly empirical, and there has been no unified framework for employing randomness in this context. In quantum computation, quantum error correction codes are widely used to detect and correct experimental errors; however, the correctable errors are assumed to be local, while adversarial perturbations are either carefully engineered or worst-case noises. In addition, the vanishing gradients (i.e., barren plateaus) of quantum circuits with randomly distributed parameters offer potential protection against most commonly used gradient-based adversarial algorithms [30,43]. A promising route to provable adversarial robustness for quantum classifiers is thus to combine randomness with the quantum error correction and barren plateau phenomena studied in quantum computation.
In this paper, we propose an approach that employs a randomized encoding procedure to protect quantum learning systems from potential adversarial perturbations. In practical adversarial learning scenarios, adversarial perturbations can originate either from carefully crafted perturbations created by attackers with full access to the gradient information [30] or from worst-case experimental noises of unknown sources [44]. We show the effectiveness of our scheme using two concrete types of random encoders, which mask the gradient information from the adversary and improve the robustness of quantum learning algorithms. The first type uses random unitary encoders and is more practical for NISQ devices, whereas the second type exploits quantum error correction encoders, which will be necessary for future fault-tolerant quantum computation.
For the first type, we rigorously prove that a random global unitary encoder satisfying the 2-design property [45] leads to exponentially small gradients for adversarial variational circuits, and thus creates barren plateaus [46-60] that hinder gradient-based algorithms from generating adversarial perturbations. We further prove that even random encoders that decompose into tensor products of unitary 2-design blocks of smaller sizes can generate barren plateaus for the adversaries. To benchmark the performance, we carry out numerical simulations on classifying the topological phases of the cluster-Ising model [61,62] with different loss functions and system sizes. For the second type of encoder, we consider local adversarial perturbations generated by worst-case experimental noises. We prove a lower bound on the adversarial risk in this setting, based on the concentration of measure phenomenon in high-dimensional spaces. We analytically show that a random black-box quantum error correction (QEC) [63,64] encoding procedure can improve the robustness of quantum learning systems against local unitary attacks. In particular, we show that it suffices to concatenate only $O(\log\log(n))$ levels of QEC encoders to bound the adversarial risk below a constant value. We adapt quantum differential privacy [65-67] to measure the robustness of quantum classifiers against adversarial perturbations, and prove an information-theoretic upper bound on the adversarial risk of quantum learning algorithms satisfying differential privacy.
The randomized encoding approach introduced in this paper is distinct from previous work that either exploits deterministic encoders for binary classification [68] or adds white noise [44]. Compared with the deterministic encoder scheme, which uses amplitude and phase encoding, our approach uses variational unitary circuits that are more experimentally compatible with NISQ devices. Whereas adding white noise may diminish the performance of the quantum classifiers, our approach does not influence the accuracy of the classification algorithms. Furthermore, in contrast to classical algorithms that employ randomness against adversarial attacks, our approach provides rigorous theoretical bounds rather than empirical performance benchmarks. Our results not only establish a profound connection among quantum error correction, quantum differential privacy, the barren plateau phenomenon, and quantum adversarial robustness, but also provide practical defense strategies that may prove valuable in future applications of quantum learning technologies.
The paper is organized as follows. In Sec. II, we introduce the basic concepts and the general framework for quantum adversarial learning. In Sec. III, we present two theorems demonstrating that both global and local randomized unitary encoders acting on the input data samples lead to vanishing gradients, which hamper gradient-based algorithms from creating adversarial perturbations. We provide numerical evidence on classifying the phases of the cluster-Ising model to benchmark the effectiveness of our approach. In Sec. IV, we give two theorems, one proving the vulnerability of quantum classifiers against local unitary adversarial perturbations, the other demonstrating that black-box quantum error correction encoders can effectively defend against local unitary adversarial noises on the input data samples. Finally, in Sec. V, we discuss several open problems and conclude.

II. BASIC CONCEPTS AND GENERAL FRAMEWORK
Machine learning technologies have recently achieved remarkable breakthroughs in various real-world applications [1,2], including natural language processing [69], automated driving [43], and medical diagnostics [70]. Meanwhile, serious concerns have been raised about the integrity and security of such technologies in various adversarial scenarios [23,25,26]. For instance, recognition software in a medical diagnostics system, or a sign-recognition system in a self-driving car, may cause catastrophic medical or traffic accidents if it is not robust against occasional modifications (which may even be imperceptible to human eyes) of the medical scans or traffic images it identifies [71]. To address these vital problems and concerns, the field of adversarial machine learning has been developed to construct potential adversarial manipulations of machine learning systems and to defend against them under different scenarios [72]. The field has attracted considerable attention, with rapid developments of both attack and defense strategies in different adversarial settings. For simplicity and concreteness, we will focus our discussion on the setting of supervised learning, although generalizations to unsupervised or reinforcement learning settings are possible and worth systematic future investigation.
On the one hand, a number of algorithms have been proposed to transform the adversarial attack problem into an optimization problem and to solve it, or its variants, through optimization strategies [25,27-29,72-77]. We divide adversarial attacks into black-box and white-box attacks according to the amount of information the adversary has about the target classifier. In the white-box setting, the attacker has full information about the inner structure and algorithm of the classifier. In the black-box setting, by contrast, the attacker possesses only partial or even no information about the classifier. A crucial piece of information in adversarial settings is the gradient of the classifier, which can be calculated from its inner structure, algorithm, and loss function. In the white-box setting, various algorithms, such as the fast gradient sign method (FGSM) [74], the basic iterative method (BIM) [29], projected gradient descent (PGD) [74], and the momentum iterative method (MIM) [78], have been developed based on the gradient information. In the black-box setting, algorithms that exploit the transferability property of neural-network classifiers have been developed, including the transfer attack [28], the substitute model attack [73,75], and the zeroth-order optimization (ZOO) attack [77]. On the other hand, a number of defense strategies against adversarial attacks have been developed as well. Notable examples include adversarial training [79], defensive generative adversarial networks [80,81], and knowledge distillation [82,83]. These algorithms have achieved satisfying robustness against particular types of adversarial attacks. In general, however, we cannot expect a defense strategy that promotes the robustness of all machine learning algorithms against arbitrary adversarial attacks as long as the adversary knows the information about the classifier. An alternative protocol for protecting the classifier is to hide this information from the attackers. Algorithms along this direction include adding random noise or random transformations, which smooth the gradients and the landscape of the loss function [36-41]. As a trade-off, these approaches in general increase the difficulty of training the classifier.
Quantum classifiers are analogs of classical classifiers that aim to solve classification problems with quantum devices [84]. In this paper, we propose a defense strategy for quantum classifiers against adversarial attacks through randomized encoders. We start with a brief introduction to the basic concepts, notations, and ideas of quantum classifiers and quantum adversarial learning. In general, a quantum classification task in the supervised learning setting aims to assign a label $s \in S$ to an input quantum data sample $\rho \in \mathcal{H}$, with $S$ a countable label set and $\mathcal{H}$ a subspace of the entire Hilbert space. For technical simplicity, we suppose that the input quantum states are pure states. The supervised learning procedure aims to learn a function (called a hypothesis function) $h: \mathcal{H} \to S$ that outputs a label $s \in S$ for each input state $\rho \in \mathcal{H}$. To achieve this goal, we parametrize the hypothesis function with $\theta \in \Xi$, where $\Xi$ is the parameter space. We train the classifier with a set of training data $T_N = \{(|\psi^{(1)}\rangle, s^{(1)}), \ldots, (|\psi^{(N)}\rangle, s^{(N)})\}$, where $|\psi^{(i)}\rangle$ and $s^{(i)}$ ($i = 1, \ldots, N$) are the input states and the corresponding labels. This is usually achieved by minimizing a chosen loss function, $\min_{\theta \in \Xi} L_N(\theta)$, over the parameter space, with $L_N(\theta)$ denoting the loss function averaged over the training set. A number of quantum classifiers with different structures, loss functions, and optimization methods have been proposed [17,85-98]. Each approach bears its pros and cons, and the choice of classifier depends on the specific problem. A straightforward way to construct a quantum classifier, known as the variational quantum classifier [85,86,88], is to exploit variational quantum circuits [18-20] to optimize the loss function, analogously to quantum support vector machines [96]. There exist a number of variants of the structure of the variational quantum circuits, including hierarchical quantum classifiers [93] and quantum convolutional neural networks [91].
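As a minimal, self-contained illustration of this pipeline (our own toy sketch, not one of the classifiers cited above), the following example encodes a scalar feature into a single-qubit rotation, applies a one-parameter variational gate, assigns a label from the sign of the measured $\langle Z\rangle$, and trains by a simple grid search over the parameter; the helper names `hypothesis` and `loss` are hypothetical.

```python
import numpy as np

# Pauli-Z observable used for the measurement at the end of the circuit
Z = np.diag([1.0, -1.0])

def ry(theta):
    """exp(-i * theta * Y / 2): a single-qubit variational rotation gate."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def hypothesis(x, theta):
    """h_theta: encode feature x as a rotation, apply the variational gate, measure <Z>."""
    psi = ry(x) @ np.array([1.0, 0.0])   # data-encoding circuit acting on |0>
    psi = ry(theta) @ psi                # one-parameter "classifier" V(theta)
    expz = float(psi @ Z @ psi)          # amplitudes are real here, so <Z> is real
    return (1 if expz >= 0 else -1), expz

def loss(theta, data):
    """Empirical square loss L_N(theta) averaged over the training set T_N."""
    return float(np.mean([(hypothesis(x, theta)[1] - s) ** 2 for x, s in data]))

# Toy training set: small rotation angles -> label +1, angles near pi -> label -1
data = [(0.1, +1), (0.3, +1), (2.9, -1), (3.0, -1)]
thetas = np.linspace(-np.pi, np.pi, 201)
theta_star = thetas[int(np.argmin([loss(t, data) for t in thetas]))]
print("trained theta:", round(float(theta_star), 2), " loss:", round(loss(theta_star, data), 4))
```

In a real variational quantum classifier the grid search would be replaced by gradient-based optimization over many circuit parameters, but the structure (encode, apply $V(\theta)$, measure, minimize $L_N$) is the same.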
Recent research has shown that quantum classifiers also suffer from vulnerability under adversarial attacks [30-32,35], with an experimental demonstration marking the latest progress [33]. Unlike the training procedure, finding an adversarial example for a quantum classifier can be regarded as a different optimization program over the input data space. Specifically, the adversary's goal is to discover the unitary perturbation $U_\delta$ within a restricted region $\Delta$ close to the identity which, after being applied to the legitimate input state, maximizes the loss function:

$$\max_{U_\delta \in \Delta} L\big(\Theta;\, U_\delta |\psi\rangle_{\rm in}\big). \qquad (1)$$

In the white-box setting, the inner structures of the quantum classifiers and the loss function are known to the attackers. Hence, the attackers can solve the optimization problem in Eq. (1) by exploiting the gradient information of the loss function. Several algorithms have been developed to attack quantum classifiers, such as the quantum-adapted BIM, FGSM, and MIM algorithms [30].
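To make the attacker's optimization in Eq. (1) concrete, here is a single-qubit sketch of a BIM-style iterative attack (our own toy model under simplifying assumptions, not the quantum-adapted algorithms of Ref. [30]): the perturbation is a single rotation angle `delta`, the gradient is estimated by finite differences, and each step is clipped back into the allowed region.

```python
import numpy as np

def ry(theta):
    """Single-qubit rotation exp(-i * theta * Y / 2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

Z = np.diag([1.0, -1.0])

def predict(psi):
    """Toy 'trained' classifier: simply the measured <Z> of the input state."""
    return float(psi @ Z @ psi)

def attack_bim(psi, eps=0.6, eta=0.1, steps=20):
    """BIM-style attack: iteratively ascend the loss with respect to the single
    perturbation angle delta, clipping to the allowed region |delta| <= eps."""
    label = np.sign(predict(psi))            # legitimate label of the sample
    delta, h = 0.0, 1e-4
    for _ in range(steps):
        # finite-difference estimate of d(loss)/d(delta), with loss = -label * <Z>
        lp = -label * predict(ry(delta + h) @ psi)
        lm = -label * predict(ry(delta - h) @ psi)
        delta = float(np.clip(delta + eta * (lp - lm) / (2 * h), -eps, eps))
    return delta

psi = ry(1.0) @ np.array([1.0, 0.0])         # legitimate sample, <Z> = cos(1) > 0
delta = attack_bim(psi)
print("perturbation angle:", round(delta, 3),
      " <Z> after attack:", round(predict(ry(delta) @ psi), 3))
```

Even this tiny example flips the sign of the prediction with a bounded perturbation, which is exactly the behavior the defenses below aim to suppress.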
Defense strategies under these adversarial settings remain largely unexplored, with most attention concentrated on proving the robustness of given classifiers [68,99,100]. Notable algorithms for boosting the robustness of a quantum classifier, such as adversarial training [30] and adding random noise [44], still suffer from white-box adversarial attacks or from the loss of useful information.
Here, we propose a generally applicable scheme to protect quantum machine learning systems in adversarial settings using randomized encoders. Our essential idea is illustrated in Fig. 1. We recast the classification task as a three-party protocol, in which Alice prepares a legitimate quantum input data sample, Bob receives the data sample and performs the classification, and Eve is the potential adversary performing adversarial manipulations. We assume that Alice and Bob share a codebook $C = \{p_i, E_i\}$ consisting of different encoders $E_i$ with the corresponding decoders, together with the probability distribution $\{p_i\}$ for choosing $E_i$. The agreement on the codebook can be established via quantum key distribution [101] or quantum teleportation [63]. We show that, by randomly choosing an encoder from the codebook, the encoded quantum data can be made robust against adversarial noises. Roughly speaking, the random transformation induced by the encoder masks the information obtainable by the adversary, thus mitigating the adversarial risk. Specifically, in the following two sections we consider two types of codebooks shared by Alice and Bob, concerning variational quantum machine learning on NISQ devices and fault-tolerant quantum machine learning in the future, respectively. We provide analytical bounds on the robustness of the protected quantum machine learning systems under adversarial settings.
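The three-party protocol can be sketched numerically as follows (a toy model of ours with Haar-random $4 \times 4$ matrices standing in for the codebook encoders; all names are hypothetical). Without an attack, Bob's matching decoder makes the encoding transparent; with an attack, Eve's fixed perturbation appears in Bob's frame as the scrambled rotation $E_i^\dagger U_\delta E_i$, which varies randomly with the encoder choice.

```python
import numpy as np

rng = np.random.default_rng(7)

def haar_unitary(d):
    """Haar-random unitary via QR decomposition with phase correction."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

d = 4
codebook = [haar_unitary(d) for _ in range(8)]   # shared encoders E_i, uniform p_i
H = np.diag([1.0, 1.0, -1.0, -1.0])              # Bob's measured observable

def classify(psi):
    return float(np.real(psi.conj() @ H @ psi))

psi = np.zeros(d, dtype=complex); psi[0] = 1.0   # Alice's legitimate sample

# Without attack: Bob applies the matching decoder E^+, so encoding is transparent.
E = codebook[int(rng.integers(len(codebook)))]
received = E.conj().T @ (E @ psi)
print("prediction unchanged:", np.isclose(classify(received), classify(psi)))

# With attack: a fixed perturbation U_delta, crafted without knowledge of E_i, acts
# in Bob's frame as E_i^+ U_delta E_i, which varies randomly with the encoder choice
# and therefore carries no stable gradient information for Eve.
U_delta = haar_unitary(d)
outcomes = [classify(Ei.conj().T @ U_delta @ Ei @ psi) for Ei in codebook]
print("prediction spread under the codebook:", np.round(outcomes, 3))
```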

III. DEFENDING AGAINST ADVERSARIAL ATTACKS WITH BARREN PLATEAUS
We first consider the case of adding a random unitary transformation as an encoder. We note that any adversarial attack can be effectively implemented by adding an $L$-layer parametrized variational quantum circuit (PVQC) $U(\theta)$, as shown in Fig. 2(a). More concretely, we can write the adversarial PVQC as

$$U(\theta) = \prod_{l=1}^{L} U_l(\theta_l)\, W_l, \qquad (2)$$

where $U_l(\theta_l) = \exp(-i\theta_l A_l)$ is the parametrized variational component in each layer, $A_l$ is a Hermitian operator, and $W_l$ is a unitary operator representing the fixed component in each layer. We assume the classifier $V(\Theta)$ is well trained with parameters $\Theta$. It can be a general unitary operator, such as the $P$-layer PVQC shown in the figure. To perform a prediction, we simply measure some particular qubits at the output of the classifier and assign labels according to the measurement outcomes. Given an input pure state $|\psi_{\rm in}\rangle$, the loss function can be regarded as the expectation value of a Hermitian operator $H$. For the legitimate input and the adversarial input, the loss functions can be written as

$$L(\Theta) = \langle\psi_{\rm in}|\, V^\dagger(\Theta) H V(\Theta)\, |\psi_{\rm in}\rangle, \qquad L(\Theta; \theta) = \langle\psi_{\rm in}|\, U^\dagger(\theta) V^\dagger(\Theta) H V(\Theta) U(\theta)\, |\psi_{\rm in}\rangle. \qquad (3)$$

To protect the quantum classifier $V(\Theta)$ from the adversarial PVQC $U(\theta)$, we exploit a random encoder $E$ and the corresponding decoder $E^\dagger$ to encrypt the legitimate data sample $|\psi_{\rm in}\rangle$. We note that the codebook $C = \{p_i, E_i\}$ contains a particular set of encoders $E_i$ with probability distribution $p_i$. We assume that $C$ is a unitary 2-design [45], namely that its first and second moments are equivalent to the corresponding moments with respect to the Haar measure $d\mu_H(E)$:

$$\sum_i p_i\, E_i^{\otimes k} M \big(E_i^\dagger\big)^{\otimes k} = \int d\mu_H(E)\, E^{\otimes k} M \big(E^\dagger\big)^{\otimes k}, \qquad k = 1, 2,$$

where $M$ is an arbitrary operator.
With a randomly chosen encoder $E_i$ applied to the input, the loss function becomes

$$L(\Theta, E_i; \theta) = \langle\psi_{\rm in}|\, E_i^\dagger\, U^\dagger V^\dagger H V U\, E_i\, |\psi_{\rm in}\rangle, \qquad (4)$$

where $U \equiv U(\theta)$ and $V \equiv V(\Theta)$ are parametrized with $\theta$ and $\Theta$, respectively. In the adversarial setting, we assume that the adversarial PVQC is initialized with $\theta_0$ such that $U(\theta_0) = I$, i.e., the adversary starts from a legitimate quantum sample and explores the gradient direction to maximize the value of the loss function. We denote $\partial_{\theta_l} L(\Theta, E_i; \theta)$ the gradient of $L(\Theta, E_i; \theta)$ with respect to each parameter $\theta_l$, $l = 1, \ldots, L$, in the adversarial PVQC. Now, we are ready to present our first theorem regarding the expectation and variance of each gradient component.

Theorem 1. Suppose we exploit a randomly chosen global unitary encoder $E_i$ from a unitary 2-design codebook $C = \{p_i, E_i\}$. The expectation and variance of the derivatives of the loss function defined in Eq. (4) with respect to any component $\theta_l \in \theta$ satisfy the following (in)equalities:

$$\mathbb{E}_{E_i \in C}\left[\partial_{\theta_l} L(\Theta, E_i; \theta_0)\right] = 0, \qquad (5)$$

$$\mathrm{Var}_{E_i \in C}\left[\partial_{\theta_l} L(\Theta, E_i; \theta_0)\right] \le \frac{c\, \mathrm{Tr}(\rho^2)\, \mathrm{Tr}(A_l^2)\, \mathrm{Tr}(H_V^2)}{d^3}, \qquad (6)$$

where $c$ is a constant of order one (the exact prefactor is derived in Appendix A), $\theta_0$ are the initial parameters for the adversarial PVQC with $U(\theta_0) = I$, $\rho = |\psi_{\rm in}\rangle\langle\psi_{\rm in}|$ is the density matrix of the input state, $A_l$ is the Hermitian generator of the parametrized variational component in the $l$-th layer, $H_V = V^\dagger H V$, and $d = 2^n$ is the dimension of the Hilbert space.
Proof. We give a brief sketch of the essential idea here. The full proof is technically involved and thus left to Appendix A. As we assume that the random encoder codebook $C = \{p_i, E_i\}$ satisfies the unitary 2-design properties and $U(\theta_0) = I$, we can obtain the expectation and variance of the gradients by calculating the first and second moments via Haar integrals. To derive the analytical results, the Haar integrals are computed using the Schur-Weyl duality [106]. We first prove that $\mathbb{E}_{E_i \in C}[\partial_{\theta_l} L(\Theta, E_i; \theta_0)]$ is an integral over the first moment and thus vanishes. We then calculate the variance of the gradient using $\mathrm{Var}[X] = \mathbb{E}[X^2] - \mathbb{E}[X]^2$. The variance can thus be obtained from a second-moment integral, which results in an exponentially small value and yields Ineq. (6).
This theorem guarantees that, by choosing a random unitary encoder from the codebook $C$, we can bound the variance of the gradients with respect to any parameter of any potential adversarial PVQC by an exponentially small value. By Chebyshev's inequality, this theorem implies that the probability of finding a gradient of amplitude larger than any fixed constant $\tau > 0$ along any direction is exponentially small. It has been proved in Ref. [32] that the vulnerability of a quantum classifier grows exponentially with the system size $n$, and perturbations of only $O(\sqrt{1/d})$, $d = 2^n$, can render a considerable adversarial risk. However, this result does not undermine the security guarantee of our scheme, because we prove that the gradient vanishes at a more rapid speed $O(1/d)$. The exponentially small gradients for the adversarial PVQC lead to a barren plateau, which requires exponentially high precision and exponentially many iteration steps for an adversary that exploits gradient-based algorithms to construct an adversarial example. Therefore, this scheme protects quantum machine learning systems by masking the gradient information from the attackers. We emphasize that our protection encoder can be efficiently realized using a circuit containing only $O(n^2)$ gates to satisfy the unitary 2-design requirement, which is roughly the same scaling as most quantum classifiers commonly used in practice.
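The vanishing-gradient statistics of Theorem 1 can be checked numerically in a few lines. The sketch below is our own toy instance under stated assumptions: Haar-random matrices stand in for 2-design encoders, a random fixed matrix plays the trained classifier $V$, and single-qubit Pauli-$Z$ operators are chosen for $A_l$ and $H$. It samples the exact derivative $i\langle\psi|E^\dagger [A_l, H_V] E|\psi\rangle$ at $\theta_0$ and shows the mean near zero and the variance shrinking with the qubit number.

```python
import numpy as np

def haar_unitary(d, rng):
    """Haar-random unitary (a proxy for a 2-design encoder) via QR decomposition."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def gradient_samples(n, n_samples=200, seed=0):
    """Samples of the adversarial gradient at theta_0 (where U(theta_0) = I) over
    random global encoders; the derivative equals i<psi|E^+ [A_l, H_V] E|psi>."""
    rng = np.random.default_rng(seed)
    d = 2 ** n
    psi = np.zeros(d, dtype=complex); psi[0] = 1.0           # legitimate input |0...0>
    A = np.diag(np.where(np.arange(d) % 2 == 0, 1.0, -1.0))  # generator: Z on last qubit
    H = np.diag(np.where(np.arange(d) < d // 2, 1.0, -1.0))  # observable: Z on first qubit
    V = haar_unitary(d, rng)                                 # a fixed "trained" classifier
    HV = V.conj().T @ H @ V
    comm = 1j * (A @ HV - HV @ A)                            # i[A_l, H_V] is Hermitian
    grads = []
    for _ in range(n_samples):
        phi = haar_unitary(d, rng) @ psi                     # randomly encoded input
        grads.append(float(np.real(phi.conj() @ comm @ phi)))
    return np.array(grads)

for n in (2, 4, 6):
    g = gradient_samples(n)
    print(f"n={n}:  mean={g.mean():+.4f}  var={g.var():.5f}")
```

The observed variance decays with the Hilbert-space dimension, qualitatively matching Eqs. (5) and (6); the exact decay rate depends on the norms of $A_l$ and $H$.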
We also stress that the adversarial PVQC is restricted to a small neighborhood of the identity operator and thus does not itself satisfy unitary 2-design. The barren plateaus faced by the adversary are induced by the random encoding process with the codebook. This is in sharp contrast to the barren plateaus for variational quantum circuits studied in the previous literature [46,47], where the variational circuits themselves are required to be unitary 2-designs.
Theorem 1 can be further extended to other codebooks. For example, we consider another model, where the encoder $E_i$ can be written as a tensor product of $m$-qubit blocks ($m < n$), with each block satisfying unitary 2-design. We show that with these encoders one can create barren plateaus for the adversarial PVQC at a lower cost in the number of gates. Without loss of generality, we assume that $n = m\xi$ and $E_i = \bigotimes_{j=1}^{\xi} E_i^j$, such that the ensemble $\{p_i^j, E_i^j\}$ forms a unitary 2-design for all $j$. We can similarly decompose the operator $A_l$ in each layer of the adversarial PVQC as

$$A_l = \sum_{k=1}^{K} \bigotimes_{j=1}^{\xi} A_{l,k}^j, \qquad (7)$$

where each $A_{l,k}^j$ acts on the $j$-th block. We assume that $\mathrm{Tr}\big[(A_{l,k}^j)^2\big] \le 2^m$ for all $l, j, k$, and that each $A_{l,k}^j$ is traceless. We remark that this assumption is reasonable, in the sense that it is satisfied by most commonly used quantum variational circuits. We have the following theorem:

Theorem 2. Assume we exploit a randomly chosen encoder $E_i$ that can be written as the tensor product of $\xi$ $m$-qubit blocks independently chosen from a unitary 2-design codebook $C = \{p_i, E_i^j\}$ ($j = 1, \ldots, \xi$), and that the operators in the adversarial PVQC can be decomposed as in Eq. (7). Then the following (in)equalities hold:

$$\mathbb{E}_{E_i \in C}\left[\partial_{\theta_l} L(\Theta, E_i; \theta_0)\right] = 0, \qquad (8)$$

$$\mathrm{Var}_{E_i \in C}\left[\partial_{\theta_l} L(\Theta, E_i; \theta_0)\right] \le c_H K^2 \left(\frac{2^m}{2^{2m}-1}\right)^{\xi} = O\!\left(\frac{1}{d}\right), \qquad (9)$$

where $c_H$ is a constant depending only on the measured observable $H$ (the exact prefactor is derived in Appendix A).

Proof. We sketch the main idea of the proof here and leave the technical details to Appendix A. As we assume that each block $E_i^j$ in the encoder satisfies unitary 2-design independently, we can obtain the expectation and variance of the loss function by calculating the Haar integral separately on each $E_i^j$. According to the decomposition in Eq. (7), we regard $A_l$ as a summation of terms that are tensor products of operators on each block and calculate these terms separately. In the first step, we derive the zero expectation of the gradients in Eq. (8) by calculating the first moment, similar to Theorem 1. Next, we compute the variance of the gradients by calculating the second-moment Haar integral for each $E_i^j$. The result of the integral contains $2^\xi$ terms. We can then derive the upper bound on the variance in Eq. (9) based on the assumption that each $A_{l,k}^j$ is traceless with $\mathrm{Tr}\big[(A_{l,k}^j)^2\big] \le 2^m$ for all $l, j, k$.
Theorem 2 indicates that, under particular assumptions on the adversarial PVQC, even if the encoder only satisfies unitary 2-design on each subspace $SU(2^m)$ for some $m \ge 2$, the variance of the gradients of the adversarial PVQC still decreases exponentially as the system size increases. By exploiting this scheme, we can reduce the gate count required in Theorem 1 from $O(n^2)$ to $O(\xi m^2) = O(n)$. Compared with Theorem 1, this codebook requires fewer experimental resources at the price of a larger upper bound on the variance of the gradients. We mention that this encoder scheme carries over to adversarial PVQCs with other inner structures, although, due to technical difficulties, we can only analytically derive the variance bound under some constraints on the adversary.
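An analogous numerical check can be made for the block-product encoders of Theorem 2. The sketch below is again our own toy instance under stated assumptions: $m = 2$ Haar-random blocks, and a global-parity generator $A = Z^{\otimes n}$ whose $Z \otimes Z$ block factors are traceless, as the theorem assumes.

```python
import numpy as np

def haar_unitary(d, rng):
    """Haar-random unitary via QR decomposition with phase correction."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def local_encoder(n, m, rng):
    """E_i = tensor product of n/m independent Haar-random m-qubit blocks."""
    E = np.array([[1.0 + 0j]])
    for _ in range(n // m):
        E = np.kron(E, haar_unitary(2 ** m, rng))
    return E

def grad_variance_local(n, m=2, n_samples=300, seed=1):
    """Variance of the adversarial gradient at theta_0 under block-random encoders.
    The generator A = Z tensored over all qubits has traceless 2-qubit block
    factors (Z x Z), matching the decomposition assumed in Theorem 2."""
    rng = np.random.default_rng(seed)
    d = 2 ** n
    diagA = np.array([1.0])
    for _ in range(n):
        diagA = np.kron(diagA, np.array([1.0, -1.0]))
    A = np.diag(diagA)                                       # global parity operator
    psi = np.zeros(d, dtype=complex); psi[0] = 1.0
    H = np.diag(np.where(np.arange(d) < d // 2, 1.0, -1.0))
    V = haar_unitary(d, rng)
    HV = V.conj().T @ H @ V
    comm = 1j * (A @ HV - HV @ A)
    grads = []
    for _ in range(n_samples):
        phi = local_encoder(n, m, rng) @ psi                 # block-randomly encoded input
        grads.append(float(np.real(phi.conj() @ comm @ phi)))
    return float(np.var(grads))

for n in (2, 4, 6):
    print(f"n={n}:  var={grad_variance_local(n):.5f}")
```

The variance still shrinks as qubits are added, even though each encoder block only scrambles two qubits, illustrating the trade-off described above.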
We stress that our approach does not rely on any specific properties of the quantum classifier $V(\Theta)$: it does not require $V(\Theta)$ to be a unitary 2-design and applies to arbitrary quantum classifiers. Therefore, we can avoid a barren plateau landscape when training the quantum classifier by using shallow circuits or quantum circuits with specific structures that are not unitary 2-designs, such as quantum convolutional neural networks [49,91]. Even though we only rigorously prove the case in which the loss function can be regarded as the expectation value of a Hermitian operator $H$, our method can also effectively protect quantum classifiers equipped with other loss functions. This claim is supported by the numerical results using the Kullback-Leibler (KL) divergence [107] presented below.
To verify that the scaling results of the above theorems remain valid for quantum machine learning models with modest system sizes and different loss functions, we carry out numerical simulations on classifying topological phases of the ground states of the cluster-Ising model [61,62]:

$$H(\lambda) = -\sum_{i=2}^{n-1} \sigma^x_{i-1} \sigma^z_i \sigma^x_{i+1} + \lambda \sum_{i=1}^{n-1} \sigma^y_i \sigma^y_{i+1}, \qquad (10)$$

where $\sigma^\alpha_i$ ($\alpha = x, y, z$) denotes the Pauli matrices on the $i$-th qubit and $\lambda$ is the interaction strength. Here, we take the open boundary condition. This model features a phase transition at $\lambda = 1$ between the cluster phase for $0 < \lambda < 1$ and the antiferromagnetic phase for $\lambda > 1$. We sample the Hamiltonian at different parameters $\lambda$ from 0 to 2 and compute the corresponding ground states. We then construct the dataset using these ground states with the corresponding labels. We carry out the classification task using variational quantum classifiers of system sizes varying from four to fourteen qubits and depth ten. We consider two types of loss functions for the classifiers: (i) the normalized square loss $1 - |\langle\phi|\psi_{\rm out}\rangle|^2$, where $|\psi_{\rm out}\rangle$ is the output state at the end of the circuit in Fig. 2 and $|\phi\rangle$ is the state encoding the target label; (ii) the KL divergence between $|\psi_{\rm out}\rangle$ and $|\phi\rangle$. We construct the encoder via a PVQC of four layers and sample the gradients from an adversarial PVQC of four layers. The results are obtained by averaging over variational encoders and adversarial PVQCs with random parameters and input data samples. Further details of the numerical results are provided in Appendix B. As shown in Fig. 3(a), the expectation value of the gradient along any direction in the adversarial PVQC converges to zero rapidly as we increase the number of samples, consistent with Eq. (5). From Fig. 3(b), we observe that the variance of the gradients decays exponentially as the system size increases from four to fourteen qubits. The outcome of this numerical simulation fits the result for the global encoder setting given by Eq. (6). In Fig. 3(c), we perform numerical experiments for the local encoder setting at $m = 2$ in Eq. (9). We construct the encoder using a PVQC that can be written as a tensor product of two-qubit blocks, each satisfying unitary 2-design, by randomly varying the parameters in each block. The two-qubit blocks are set to be two-layer variational quantum circuits with the inner structure described in Appendix B. We observe that the variance of the gradients approaches zero rapidly as the system size increases. The numerical result confirms the exponential decay of gradients predicted in Eq. (9).
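For reference, a cluster-Ising dataset of this kind can be generated by exact diagonalization at small sizes. The sketch below is our own minimal construction (six qubits for speed; the helper name `embed` is hypothetical): it builds the cluster-Ising Hamiltonian with open boundaries and labels each ground state by its phase.

```python
import numpy as np

I2 = np.eye(2)
sx = np.array([[0., 1.], [1., 0.]], dtype=complex)
sy = np.array([[0., -1j], [1j, 0.]])
sz = np.diag([1. + 0j, -1.])

def embed(ops, sites, n):
    """Embed the given single-qubit operators at the given sites of an n-qubit chain."""
    full = [I2] * n
    for op, s in zip(ops, sites):
        full[s] = op
    out = np.array([[1. + 0j]])
    for f in full:
        out = np.kron(out, f)
    return out

def cluster_ising(n, lam):
    """Cluster-Ising chain: -sum sx.sz.sx + lam * sum sy.sy, open boundaries."""
    H = np.zeros((2 ** n, 2 ** n), dtype=complex)
    for i in range(1, n - 1):
        H -= embed([sx, sz, sx], [i - 1, i, i + 1], n)
    for i in range(n - 1):
        H += lam * embed([sy, sy], [i, i + 1], n)
    return H

def ground_state(n, lam):
    _, vecs = np.linalg.eigh(cluster_ising(n, lam))
    return vecs[:, 0]

# Labeled dataset: cluster phase (label 0, lam < 1) vs antiferromagnetic (label 1, lam > 1)
n = 6
lams = np.linspace(0.2, 1.8, 8)          # grid avoiding the critical point lam = 1
dataset = [(ground_state(n, lam), int(lam > 1.0)) for lam in lams]
print("samples:", len(dataset), " Hilbert-space dim:", dataset[0][0].shape[0])
```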

IV. DEFENDING LOCAL ADVERSARIAL NOISES BY BLACK-BOX QUANTUM ERROR CORRECTION
As mentioned in the previous section, adversarial perturbations can be regarded as experimental noises in the worst case. Under experimental settings, most operations and noises are local [7,108]. Therefore, in this section we consider the case in which both the adversarial perturbation and the state preparation can be written as tensor products of single-qubit rotations [30]. This setting is widely employed in qubit-encoding quantum computation and machine learning [109]. We consider the quantum classifier model introduced in Sec. II. We first analytically evaluate the vulnerability of quantum classifiers against such local adversarial perturbations. We suppose the quantum classifier $h: \mathcal{H} \to S$ maps the locally encoded data from $\bigotimes_{i=1}^n SU(2)$ to a label set $S = \{s_1, \ldots, s_K\}$ that contains $K$ labels. We assume that the input data sample $g_\rho$ is chosen from $\mathcal{H}$ according to a probability measure $\mu(\cdot)$, and denote $\mu(h^{-1}(s_k))$ the fraction of data that will be assigned the label $s_k$ by the classifier. We now introduce the following measure of adversarial risk:

Definition 1. Consider a hypothesis function $h: \mathcal{H} \to S$. Suppose the input data $\rho$ is chosen from $\mathcal{H}$ according to the measure $\mu(\cdot)$, and an adversarial attack $A: \rho \to \rho'$, $\forall \rho \in \mathcal{H}$, occurs under the constraint $d(\rho, \rho') \le \epsilon$. We denote $M_\epsilon = \{\rho \in \mathcal{H} \,|\, h(\rho) \neq h(\rho')\}$ the set containing all the states that can be turned into adversarial data samples. The adversarial risk is defined as $\mu(M_\epsilon)$.
We consider the set of input states that can be prepared by a local unitary operator acting on a fixed initial state (e.g., $|0\rangle^{\otimes n}$), so that the classification of the quantum data is equivalent to the classification of the group elements $\bigotimes_{i=1}^n SU(2)$. For technical simplicity, we assume that the input data sample $g_\rho$ is uniformly chosen according to the Haar measure for each qubit, $\mu_H^{\otimes n}(\cdot)$, and denote $\mu_H^{\otimes n}(h^{-1}(s_k))$ the fraction of data that will be assigned the label $s_k$ by the classifier. For two states $\rho = g_\rho |0\rangle^{\otimes n}$ and $\sigma = g_\sigma |0\rangle^{\otimes n}$, where $g_\rho = \bigotimes_{i=1}^n g_\rho^i$ and $g_\sigma = \bigotimes_{i=1}^n g_\sigma^i$ are chosen from $\bigotimes_{i=1}^n SU(2)$, we exploit the normalized Hamming distance to measure the difference between $g_\rho$ and $g_\sigma$:

$$d_{\rm NH}(g_\rho, g_\sigma) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\left[g_\rho^i \neq g_\sigma^i\right]. \qquad (11)$$

This normalized Hamming distance measures the fraction of qubits on which $g_\rho^i$ and $g_\sigma^i$ differ. We can then deduce the following theorem concerning the effectiveness of local unitary adversarial attacks:

Theorem 3. Consider a quantum classifier that maps an input sample from $\bigotimes_{i=1}^n SU(2)$ to a $K$-label set $S = \{s_1, \ldots, s_K\}$. Suppose we choose an operator $g_\rho$ from $\bigotimes_{i=1}^n SU(2)$ according to the Haar measure on each qubit, $\mu_H^{\otimes n}(\cdot)$. Without loss of generality, we assume $\mu_H^{\otimes n}(h^{-1}(s_1)) = \max_k \mu_H^{\otimes n}(h^{-1}(s_k))$. There always exists a perturbation $g_\rho \to g_\rho'$ with $d_{\rm NH}(g_\rho, g_\rho') \le \tau$ such that the adversarial risk is greater than $R \in (0,1)$ if

$$\tau \ge O\!\left(\frac{1}{\sqrt{n}}\right) \left(\sqrt{\ln \frac{1}{\mu_H^{\otimes n}(h^{-1}(s_1))}} + \sqrt{\ln \frac{1}{1-R}}\,\right), \qquad (12)$$

with the precise constants given in Appendix C.

Proof. We provide the intuition here; the technical details of the full proof are provided in Appendix C. Notice that the input data space $\bigotimes_{i=1}^n SU(2)$, when equipped with the Haar measure $\mu_H^{\otimes n}(\cdot)$ and the normalized Hamming distance $d_{\rm NH}(\cdot, \cdot)$, forms a Lévy family [110,111] whose concentration function decays as $e^{-O(n\tau^2)}$. By exploiting the concentration of measure phenomenon on the Lévy family [112,113], we show that the measure of all data within distance $\tau$ of any subset of $\bigotimes_{i=1}^n SU(2)$ with non-negligible measure approaches one as $n$ grows. We then use De Morgan's law to show that, in the case of $K$-label classification, by choosing the label $k \in \{2, 3, \ldots, K\}$ that minimizes the corresponding neighborhood term, we can bound the adversarial risk from below by $R$. This ends the proof of the theorem.
The above theorem indicates that, for any quantum classifier receiving input data of n qubits, an adversarial attack that changes only a fraction O(1/√n) of the qubits will result in a moderate adversarial risk bounded below by R. As the system size n increases, the vulnerability of a quantum classifier becomes more severe even for local unitary adversarial attacks. It has been shown in Ref. [32] that, in the setting of globally encoded quantum data from SU(d) (d = 2^n) and global adversarial perturbations, a perturbation of strength O(1/√d) under the Hilbert-Schmidt distance suffices to guarantee a moderate adversarial risk. Compared with this global case, the adversarial risk under a local unitary attack is not as severe, since additional constraints have been assumed on the possible attacks. However, for large quantum machine learning systems, Eq. (12) still shows that the prediction is unstable even under a tiny noise. We remark that Eq. (12) also holds for other distance measures, such as the normalized Hilbert-Schmidt distance d_NHS(g_ρ, g_σ) = (1/n) Σ_{i=1}^n d_HS(g_ρ^i, g_σ^i), which averages the Hilbert-Schmidt distance over the qubits. This follows from the fact that the normalized Hilbert-Schmidt distance is always bounded above by the normalized Hamming distance.
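The relation between the two local distance measures can be checked numerically. The following sketch (an illustration, not part of the paper's derivation) draws per-qubit Haar-random SU(2) encoders, perturbs a random subset of qubits, and verifies that the normalized Hilbert-Schmidt distance (with the single-qubit Hilbert-Schmidt distance rescaled to lie in [0, 1]) never exceeds the normalized Hamming distance:

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_su2(rng):
    # Haar-random 2x2 unitary via QR of a complex Ginibre matrix,
    # projected into SU(2) by removing the global phase.
    z = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    q, r = np.linalg.qr(z)
    q = q @ np.diag(np.diag(r) / np.abs(np.diag(r)))
    return q / np.sqrt(np.linalg.det(q))

def d_hs_normalized(psi, phi):
    # Hilbert-Schmidt distance between pure-state density matrices,
    # divided by sqrt(2) so that it lies in [0, 1] for qubit states.
    rho, sig = np.outer(psi, psi.conj()), np.outer(phi, phi.conj())
    d = rho - sig
    return np.sqrt(np.trace(d @ d.conj().T).real) / np.sqrt(2)

n = 8
zero = np.array([1.0, 0.0], dtype=complex)
g_rho = [haar_su2(rng) for _ in range(n)]
flip = rng.random(n) < 0.4                 # qubits hit by the local attack
g_sig = [haar_su2(rng) if f else g for g, f in zip(g_rho, flip)]

d_nh = float(np.mean(flip))                # fraction of unequal local unitaries
d_nhs = float(np.mean([d_hs_normalized(a @ zero, b @ zero)
                       for a, b in zip(g_rho, g_sig)]))
assert d_nhs <= d_nh + 1e-9                # d_NHS is dominated by d_NH
```

Each unperturbed qubit contributes 0 to both distances, while a perturbed qubit contributes 1 to the Hamming distance but at most 1 to the rescaled Hilbert-Schmidt distance, which is why the bound holds qubit by qubit.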
Quantum error correction (QEC) codes [63,64] are widely used in quantum computation to protect computations from local noises and are believed to be a crucial building block for future implementations of quantum computers. Inspired by this, it is natural to ask whether QEC codes can effectively protect quantum machine learning systems from local unitary adversarial attacks. A typical QEC procedure contains an encoder E, an error correction circuit C, and a decoder D. The encoder E encodes a logical quantum state ρ_L from the logical Hilbert space H_L into a physical quantum state ρ_P in the physical Hilbert space H_P. When errors occur, we perform error correction on the physical qubits to correct particular types of errors and use the decoder to recover the original logical state ρ_L. A popular choice for QEC is the [n_0, k, t] code [63], which encodes every k logical qubits into n_0 physical qubits and is able to correct ⌊(t − 1)/2⌋ erroneous physical qubits for each logical qubit. Without loss of generality, we consider the case k = 1 and t = 3. The corresponding QEC codes can correct one local error for each logical qubit.
We consider the model in Fig. 4(a). The adversarial settings are similar to Sec. III, except that we assume both the logical state |ψ_L⟩ and the adversarial attack are local. To protect the quantum machine learning system from such adversarial attacks, we apply a quantum error correction encoder after the state preparation stage and the corresponding decoder before the classification stage. The QEC encoder is a black-box oracle for the adversary and can thus be regarded as a random encoder that encodes each logical qubit into physical qubits and is able to correct particular types of local errors on these physical qubits. We remark that the assumption of a random QEC encoder is experimentally practical. A straightforward approach is to permute the physical qubits randomly, so that the adversary does not know the encoding correspondence between logical and physical qubits. The fact that short random circuits are good QEC codes [114] indicates that random QEC codes can also be realized using circuits containing only O(n log n) gates.
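The random-permutation construction mentioned above can be sketched in a few lines. This is a toy illustration under our own assumptions (logical qubit j is assigned the j-th block of a uniformly random permutation), not the paper's concrete encoder:

```python
import numpy as np

rng = np.random.default_rng(3)

def random_qec_layout(n_logical, n0, rng):
    # Assign each logical qubit a block of n0 physical qubits taken from a
    # uniformly random permutation of all physical-qubit indices, so an
    # adversary who only sees physical indices cannot target a given block.
    perm = rng.permutation(n_logical * n0)
    return [sorted(perm[j * n0:(j + 1) * n0]) for j in range(n_logical)]

layout = random_qec_layout(n_logical=4, n0=3, rng=rng)
# Sanity check: every physical qubit belongs to exactly one logical block.
flat = sorted(q for block in layout for q in block)
assert flat == list(range(12))
```

Because the permutation is drawn uniformly, each of the (n·n_0)! relabelings is equally likely, which is the sense in which the encoder acts as a black box to the adversary.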
To quantify the enhancement of adversarial robustness, we adapt the idea of quantum differential privacy (QDP) and utilize it as a measure of adversarial robustness [67,115,116]. Differential privacy [66] is the property of an algorithm whose outputs cannot be distinguished when neighboring datasets are given as input. We can thus use differential privacy to measure the sensitivity and vulnerability of the algorithm to changes of the input. For two input data samples separated by a small distance, differential privacy bounds the distance between the outputs of the algorithm. A formal definition of quantum differential privacy is given below: Definition 2. (Quantum differential privacy [67]) Consider a quantum algorithm Q and a measurement M on its output. The algorithm Q is said to be (ε(τ), γ)-quantum differentially private if, for all input quantum states ρ, σ that satisfy d(ρ, σ) ≤ τ, the following inequality holds for any possible subset Y of all possible outcomes of the measurement: Pr[M(Q(ρ)) ∈ Y] ≤ e^{ε(τ)} Pr[M(Q(σ)) ∈ Y] + γ, (13) where ε(τ) is a function of the distance τ.
For technical simplicity, we focus on the case γ = 0, referred to as ε-QDP. As shown in Fig. 4(b), if we consider two neighboring input quantum data x and x′, the ε-QDP property bounds the difference between the probability distributions {p_1, p_2, p_3, p_4} and {p_1′, p_2′, p_3′, p_4′} after the classification by restricting each ratio p_i′/p_i to (e^{−ε}, e^{ε}). We focus on locally encoded states in this section and exploit the normalized Hamming distance as the distance measure. It is shown in Refs. [44,67,115] that adding any amount of white noise makes the algorithm satisfy quantum differential privacy under the trace distance and the normalized Hamming distance. Therefore, it is reasonable to assume that, under experimental settings, the quantum machine learning system satisfies quantum differential privacy, as we can always keep a tiny amount of quantum white noise whose influence is negligible. We remark that as the distance τ increases, the corresponding ε(τ) always increases according to the definition given by Eq. (13). For a quantum classification algorithm Q : H → R^{|S|} that maps a quantum data sample to a probability distribution on the label set S, the quantum differential privacy property in Definition 2 has a direct connection with the quantum adversarial risk in Definition 1. The adversarial risk for Q can be written as the probability that Q assigns different labels to ρ and ρ + δ, where δ is the adversarial perturbation and the expectation value is averaged over all choices of ρ according to the measure µ(·). By Jensen's inequality, the probability that two random variables drawn from the distributions P = (p_1, ..., p_{|S|}) and Q = (q_1, ..., q_{|S|}) take the same value is bounded below by Σ_i p_i q_i ≥ exp(Σ_i p_i log q_i) = exp(−d_KL(P‖Q) − H(P)). Here, H(·) is the Shannon entropy and d_KL(·‖·) is the KL divergence between two distributions. Noticing that the algorithm Q is ε(τ)-QDP, with P = Q(ρ), Q = Q(ρ + δ), and ‖δ‖ ≤ τ, each |ln(p_i/q_i)| is bounded by ε(τ). As a result, the KL divergence between P and Q is bounded above by 2ε(τ)² [117]. Therefore, we can derive the following information-theoretic upper bound on the adversarial risk of a classifier Q that is ε(τ)-QDP.
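The two inequalities used in this argument can be verified numerically. The sketch below (an illustration under our own sampling assumptions, not part of the proof) draws random 4-label output distributions P and neighboring distributions Q obeying a QDP-style ratio bound, then checks the Jensen bound on the collision probability and the 2ε² bound on the KL divergence:

```python
import numpy as np

rng = np.random.default_rng(1)

def shannon_entropy(p):
    return float(-np.sum(p * np.log(p)))

def kl_divergence(p, q):
    return float(np.sum(p * np.log(p / q)))

for _ in range(1000):
    # Output distribution P of a 4-label classifier on a legitimate sample,
    # and a neighboring distribution Q with bounded likelihood ratios.
    p = rng.dirichlet(np.ones(4)) + 1e-9
    p = p / p.sum()
    q = p * np.exp(rng.uniform(-0.1, 0.1, size=4))
    q = q / q.sum()

    eps = float(np.max(np.abs(np.log(p / q))))   # realized ratio bound
    collision = float(np.sum(p * q))             # Pr[two samples agree]

    # Jensen's inequality: sum_i p_i q_i >= exp(-d_KL(P||Q) - H(P))
    assert collision >= np.exp(-kl_divergence(p, q) - shannon_entropy(p)) - 1e-12
    # A pointwise ratio bound eps implies d_KL(P||Q) <= 2 eps^2 for small eps
    assert kl_divergence(p, q) <= 2 * eps**2 + 1e-12
```

The second assertion relies on the standard differential-privacy fact that a pointwise ratio bound ε gives d_KL ≤ ε(e^ε − 1), which is at most 2ε² for the small ε values sampled here.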
Proposition 1. Assume we have a quantum classifier Q : H → R^{|S|} that satisfies ε(τ)-QDP. When performing an adversarial attack A : ρ → ρ′ with d(ρ, ρ′) ≤ τ, the adversarial risk R(τ) is bounded above by R(τ) ≤ 1 − e^{−2ε(τ)²} E_{ρ∈H}[e^{−H(Q(ρ))}], where the expectation value is averaged over all ρ chosen from H according to the measure µ(·).
It is worthwhile to mention that E_{ρ∈H}[e^{−H(Q(ρ))}] is a constant that depends only on the classifier Q itself. When the classifier is well trained and provides the correct label with large confidence, E_{ρ∈H}[e^{−H(Q(ρ))}] is close to 1. In particular, if Q predicts the label with unit confidence, E_{ρ∈H}[e^{−H(Q(ρ))}] = 1. As we decrease ε(τ), the adversarial risk R(τ) decreases polynomially. This indicates that one can improve the adversarial robustness simply by amplifying the quantum differential privacy property (i.e., decreasing the parameter ε). This leads us to the next theorem concerning the amplification of quantum differential privacy using black-box QEC: Theorem 4. Suppose we use a quantum classifier that satisfies ε(τ)-QDP under the normalized Hamming distance. We apply a QEC encoder that encodes each logical qubit into n_0 physical qubits and is able to correct an arbitrary error on one of these qubits. Assume that the inner structure of the QEC encoder is unknown to the adversary. If we randomly choose arbitrary ρ and σ on the physical qubits with d_NH(ρ, σ) ≤ τ, then for any subset Y of all possible outcomes of the measurement, Pr[M(Q(ρ)) ∈ Y] ≤ e^{ε(n_0(n_0−1)τ²/δ)} Pr[M(Q(σ)) ∈ Y] with probability at least 1 − δ.
Proof. The normalized Hamming distance measures the fraction of qubits that differ from the legitimate data. Assume an adversarial attack ρ → ρ′ occurs on the physical qubits such that d_NH(ρ, ρ′) ≤ τ. Under a black-box QEC procedure, a logical qubit becomes erroneous only if it contains at least two erroneous physical qubits. Therefore, the expected fraction of affected logical qubits is O(n_0(n_0 − 1)τ²) [63]. As a consequence, to achieve the bound in Theorem 3 on the logical qubits, the adversary must alter at least an O(1/n^{1/4}) fraction of the physical qubits in expectation. The QEC encoder thus mitigates the adversarial risk in expectation.
By Markov's inequality, for randomly chosen ρ and ρ′ on the physical qubits with d_NH(ρ, ρ′) ≤ τ, the induced distance on the logical qubits satisfies d_NH(ρ_L, ρ_L′) ≤ n_0(n_0 − 1)τ²/δ with probability at least 1 − δ. By the definition of quantum differential privacy, it is then guaranteed with probability at least 1 − δ that Pr[M(Q(ρ)) ∈ Y] ≤ e^{ε_QEC} Pr[M(Q(σ)) ∈ Y], where ε_QEC = ε(n_0(n_0 − 1)τ²/δ).
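The quadratic suppression of the logical error fraction at the heart of this proof can be checked with a short Monte Carlo sketch (an illustration, not the paper's calculation): each physical qubit is perturbed independently with probability τ, and a block of a distance-3 code fails once at least two of its qubits are hit.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(2)

n0, tau, n_blocks = 5, 0.02, 200_000
# Each physical qubit is perturbed independently with probability tau;
# a block of the [n0, 1, 3]-style code fails once >= 2 of its qubits are hit.
hits = rng.random((n_blocks, n0)) < tau
logical_error_rate = float(np.mean(hits.sum(axis=1) >= 2))

# Leading-order prediction: a C(n0, 2) * tau^2 fraction of logical qubits.
predicted = comb(n0, 2) * tau**2
assert abs(logical_error_rate - predicted) / predicted < 0.2
```

The estimated logical error fraction agrees with the C(n_0, 2)τ² = O(n_0(n_0 − 1)τ²) scaling used above, and is far below the physical perturbation rate τ.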
The above theorem indicates that, for a quantum classifier satisfying quantum differential privacy, a black-box QEC encoder can effectively amplify the quantum differential privacy property with high probability. In particular, the QEC encoder promotes the ε(τ)-QDP quantum classifier to a new quantum classifier that satisfies ε(n_0(n_0 − 1)τ²/δ)-QDP for at least a 1 − δ fraction of all possible input data samples. Under the normalized Hamming distance, a QEC encoder can thus boost the robustness of the quantum classifier against perturbations added to the input quantum states with large probability, as long as τ n_0(n_0 − 1)/δ ≤ 1. As n_0 is a constant for a fixed QEC encoder, this is a constant threshold for τ. We remark that a quantum algorithm satisfying quantum differential privacy can also be obtained by adding white noise [44]. However, white noise erases information required for classification and should be suppressed for variational quantum classifiers on NISQ devices. In contrast, our approach only assumes the existence of tiny noise and can guarantee ε-QDP for arbitrarily small ε > 0 by concatenating QEC encoders. Compared with the previous work [44], which relies solely on white noise to produce the quantum differential privacy property, our approach avoids the risk of losing too much information to noise. Theorem 4 opens a door for studying the promotion of adversarial robustness through QEC. Intuitively, a QEC encoder can mitigate the adversarial risk from a local unitary adversarial attack by reducing the bound in Eq.
(12) to O(1/n^{1/4}) with large probability. This is because QEC reduces the error rate from p to O(p²) in expectation [63]. We mention that it is necessary to keep the inner structure of the QEC encoder confidential from the potential adversary. If the QEC encoder is known to the attacker, the QEC circuit together with the quantum classifier can be regarded as an enlarged quantum classifier of system size n·n_0. According to Theorem 3, such a quantum classifier is more vulnerable under adversarial attacks.
In fault-tolerant quantum computation, errors are assumed to occur locally on each qubit for each quantum operation, independently and with probability below the threshold p_thres. To mitigate the influence of these errors, multiple levels of QEC are concatenated to bound the error below an expected value ζ; it has been proved that O(log log(1/ζ)) levels of QEC are enough [63]. As we mentioned in the previous sections, adversarial perturbations are not random experimental noises. Instead, they are either carefully engineered noises from a hostile adversary or worst-case experimental noises. Nevertheless, Theorem 4 indicates that we can always decrease the parameter ε in the ε-QDP property of the quantum classifier by concatenating additional levels of QEC. We have shown in Theorem 3 that adversarial perturbations of strength O(1/√n) can lead to a moderate adversarial risk under local unitary adversarial attacks. To suppress this potential adversarial risk, the effective distance after concatenating the QEC levels should be kept below the Ω(1/√n) threshold. We can thus deduce the following corollary.
Corollary 1. Consider the classifier Q discussed in Theorem 4. After concatenating L_QEC levels of QEC, one can guarantee with high probability that Q satisfies ε(1/√n)-QDP for randomly chosen ρ and σ with d_NH(ρ, σ) ≤ τ, as long as L_QEC = O(log log n). The proof of this corollary follows from applying Theorem 4 repeatedly and fixing the failure probability at each level to δ/L_QEC. This corollary indicates that only O(log log n) layers of concatenated QEC encoders are needed to guarantee ε(O(1/√n))-QDP for the quantum classifier; the number of required QEC levels scales doubly logarithmically with the system size n. This shows that fault-tolerant quantum computers with black-box QEC are robust against adversarial attacks with large probability, even under worst-case noises.
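The doubly logarithmic scaling can be made concrete with a small sketch (illustrative constants n_0 = 5 and initial τ = 0.05 are our own choices): iterate the leading-order concatenation map τ → C(n_0, 2)τ² until the effective perturbation strength drops below the 1/√n threshold, and count the levels needed.

```python
import math

def levels_needed(n, n0=5, tau=0.05):
    # Iterate the leading-order concatenation map tau -> C(n0, 2) * tau^2
    # until the effective perturbation strength drops below 1/sqrt(n).
    c = math.comb(n0, 2)
    levels, t = 0, tau
    while t > 1 / math.sqrt(n):
        t = c * t * t       # one extra level of concatenated QEC
        levels += 1
    return levels

# Each squaring of n (here, each 10^12-fold increase) costs only O(1)
# extra levels, reflecting the O(log log n) scaling of Corollary 1.
print(levels_needed(10**6), levels_needed(10**12), levels_needed(10**24))
```

With these constants the counts are 3, 5, and 6 levels for n = 10^6, 10^12, and 10^24: the system size grows by twelve orders of magnitude while the required depth grows by one level.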

V. CONCLUSIONS AND OUTLOOK
In this paper, we proposed a general approach to protect quantum learning systems in adversarial scenarios using randomized encoders. We rigorously proved that random unitary encoders forming a unitary 2-design create barren plateaus for any adversarial parametrized variational quantum circuit, which prevents the creation of adversarial perturbations. To benchmark the performance of our approach, we carried out numerical simulations on classifying topological phases of ground states of the cluster-Ising Hamiltonian. We remark that this approach is feasible on NISQ devices, as the classifiers, adversarial circuits, and encoders can all be implemented by variational quantum circuits. In addition, we proved that black-box quantum error correction encoders unknown to the adversary can mitigate the adversarial risk by promoting differential privacy against local unitary noises. Our results develop versatile defense strategies to enhance the reliability and security of quantum learning systems, which may have far-reaching consequences for applications of quantum artificial intelligence based on both near-term and future quantum technologies.
Many questions remain and warrant further investigation. For instance, our discussions in this paper mainly focus on quantum supervised learning scenarios. Yet, unsupervised and reinforcement learning approaches may also suffer from the vulnerability problem [72]. Thus, it will be interesting and important to develop similar defense strategies in the context of quantum unsupervised or reinforcement learning, where obtaining rigorous analytical performance guarantees might be more challenging. In addition, how to extend our results to the scenario of quantum delegated learning with multiple clients [118] is well worth future study. Finally, it is also of crucial importance to carry out an experiment demonstrating our defense strategies against adversarial perturbations. This would be a key step toward secure and reliable quantum artificial intelligence technologies.

(C4) Given k = 2, 3, ..., K, any data sample in the intersection of the τ-extensions {h^{-1}(s_i)_τ}_{i=1}^k can be transformed into a data sample in any h^{-1}(s_i) when a perturbation ρ → ρ′ of amplitude τ occurs. The adversarial attack can thus change the labels of all the data samples in this intersection set. By De Morgan's law, the measure of this intersection set satisfies

FIG. 1: An illustration of exploiting randomized encoding to defend against adversarial attacks. In the quantum learning task, Alice prepares an input data sample and sends it to Bob for classification. To protect the legitimate data against the potential adversary Eve, Alice and Bob share a codebook, and Alice randomly chooses an encoder from the codebook to transform the original data into encoded data, from which Eve can barely obtain any useful information. Alice then sends Bob the encoded quantum data and classical information about the encoder. Bob receives the messages, translates the encoded quantum data back into the original data, and performs the classification.

FIG. 2: (a) An illustration of exploiting a random unitary encoder to defend against adversarial attacks from a parametrized variational quantum circuit. In the scenario without adversarial attacks, an input state |ψ_in⟩ is fed to the parametrized variational classifier V(Θ) directly. In the adversarial scenario, a parametrized adversarial variational circuit U(θ) is used to add an evasion attack [72], as sketched in the upper panel. In the lower panel, a random unitary encoder E and the corresponding decoder E† are added before and after U(θ) to protect the data sample against the potential adversary Eve. (b) With a random global unitary encoder, the landscape for any adversarial circuit exhibits a barren plateau (i.e., vanishing gradients) regardless of the inner structure of the circuit. The variables θ_1 and θ_2 are variational parameters of U(θ).

FIG. 3: Numerical results for the gradients of adversarial variational circuits. (a) The mean values of ∂_{θ_l} L(Θ, E_i; θ_0) as functions of the sample size N for different system sizes n. The loss function is taken to be the KL divergence. The mean values of the gradients are averaged over all parameters in the adversarial parametrized variational quantum circuits. (b) The average variances of ∂_{θ_l} L(Θ, E_i; θ_0) for the KL divergence and the normalized square loss as functions of the system size n. The encoders used here are global random parametrized variational quantum circuits. (c) Similar to (b), but with encoders that are tensor products of two-qubit random parametrized variational blocks.

FIG. 4: (a) An illustration of exploiting black-box quantum error correction (QEC) encoders to defend against local unitary adversarial attacks. An initial state |0⟩^⊗n is sequentially encoded by a local encoder and a QEC encoder into the logical state |ψ_L⟩ and the physical state |ψ_P⟩. The physical state is exposed to local adversarial attacks from potential adversaries. It then enters a QEC decoder and is classified by a quantum classifier. (b) A sketch of the connection between quantum differential privacy (QDP) and adversarial robustness. The quantum classifier maps the input data x into a probability distribution {p_1, ..., p_4} and predicts its label according to the maximum likelihood. Different labels are distinguished by different colors. Given a perturbed data sample x′ with d(x, x′) ≤ τ, ε-QDP limits the shift of the probability distribution by bounding |ln(p_i′/p_i)| ≤ ε (i = 1, ..., 4).
FIG. S1: (a) An illustration of the P-layer variational quantum circuits used to construct encoders and classifiers in the numerical simulations. Each layer contains two single-qubit rotation units (the green boxes) and one entangling unit (the red box). In each rotation unit, we perform an Euler-angle rotation Z(θ^k_{i,u})X(θ^k_{i,v}); (u, v) = (d, c) or (b, a) distinguishes the two rotation units, i = 1, 2, ..., P denotes the layer index, and k = 1, 2, ..., m + n denotes the qubit index. (b) and (c) The loss and accuracy averaged over the training set and the validation set as functions of the epoch, for the 12-qubit quantum classifier with (b) the KL divergence and (c) the normalized square loss as the loss function. Each epoch consists of 10 iterations.
The types of gates can be further restricted to single-qubit rotations and nearest-neighbor entangling gates. Therefore, such a random encoder E_i ∈ C can be efficiently realized by a PVQC with O(n²) gates. In this case, the loss function for a fixed encoder E_i admits an explicit representation. As shown in Refs. [102-105], quantum circuits can implement unitary 2-designs efficiently: a circuit with only O(n²) [O(n)] gates is sufficient for attaining an exact (approximate) unitary 2-design.