The Expressive Power of Parameterized Quantum Circuits

Parameterized quantum circuits (PQCs) have been broadly used as a hybrid quantum-classical machine learning scheme to accomplish generative tasks. However, whether PQCs have better expressive power than classical generative neural networks, such as restricted or deep Boltzmann machines, remains an open issue. In this paper, we prove that PQCs with a simple structure already outperform any classical neural network for generative tasks, unless the polynomial hierarchy collapses. Our proof builds on known results from tensor networks and quantum circuits (in particular, instantaneous quantum polynomial circuits). In addition, PQCs equipped with ancillary qubits for post-selection have even stronger expressive power than those without post-selection. We employ them as an application for Bayesian learning, since it is possible to learn prior probabilities rather than assuming they are known. We expect that it will find many more applications in semi-supervised learning where prior distributions are normally assumed to be unknown. Lastly, we conduct several numerical experiments using the Rigetti Forest platform to demonstrate the performance of the proposed Bayesian quantum circuit.


I. INTRODUCTION
There is a widespread belief, termed 'quantum supremacy', that quantum computers will outperform classical computers [1]. One characterization of quantum supremacy relates to the expressive power of quantum computing, since the probability distributions generated by quantum devices may not be efficiently and accurately sampled by classical means. Two leading proposals toward this goal are Boson sampling [2] and instantaneous quantum polynomial time (IQP) circuits [3].
System noise in current implementations is known to be the major roadblock. Extensive work has explored whether noisy intermediate-scale quantum (NISQ) [4] devices can also outperform classical computers on specific computational tasks. It has been proved that, with system noise, quantum supremacy disappears in Boson sampling [5] but remains in IQP [6]. Beyond demonstrating the existence of quantum supremacy, the problem of finding practical applications in which NISQ devices offer quantum advantages needs further study.
Quantum machine learning has attracted wide attention because of its potential to efficiently process very large amounts of data. It is also exploited as an alternative testbed to confirm quantum advantages [7][8][9][10][11][12][13]. With NISQ devices, potential quantum advantages may still be retained, because most statistical machine learning algorithms are robust to system noise, i.e., the noise contained in the input data and models has a negligible influence on the final results [14,15].
Expressive power is a central topic in classical machine learning and it has generated great interest in quantum machine learning. It is deeply tied to two major topics in machine learning: discriminative modeling and generative modeling, which aim to learn patterns and the probability distribution of input data [16], respectively. Expressive power in discriminative learning relates strongly to classification performance, e.g., by employing the kernel method [17], the kernel support vector machine (SVM) can efficiently classify nonlinear data. In generative modeling, the power of two highly successful models, the restricted Boltzmann machine (RBM) and the deep Boltzmann machine (DBM) [18,19], to represent quantum many-body states has been extensively investigated [20][21][22][23][24]. Consequently, RBM and DBM have been broadly applied to physics research, e.g., identifying phase transitions, solving many-body wave functions, and accelerating Monte Carlo simulations [25][26][27][28][29].
Parameterized quantum circuits (PQCs) are a promising NISQ scheme that has demonstrated potential for practical applications with quantum advantages. By employing variational hybrid quantum/classical algorithms, PQCs have been applied to both generative [30][31][32] and discriminative [33][34][35][36] tasks. A PQC is composed of a set of parameterized single-qubit and controlled single-qubit gates with noise, and the parameters are iteratively optimized by a classical optimizer. In general, the proposed PQCs can be divided into two types: multiple-layer PQCs (MPQCs) and tensor network PQCs (TPQCs). An MPQC consists of multiple blocks of quantum circuits in which the arrangement of quantum gates in each block is identical [30,31,37]. Mathematically, we denote the input quantum state as $|0\rangle^{\otimes N}$ with $N$ qubits, the total number of blocks as $L$, and the $i$-th block as $U(\theta_i)$, where the number of parameters is proportional to the number of qubits, $|\theta| \propto N$, and $N$ is logarithmically proportional to the dimension of the generated data. The generated quantum state of an MPQC, $|\Phi\rangle$, is defined as $|\Phi\rangle = \prod_{i=1}^{L} U(\theta_i) |0\rangle^{\otimes N}$. TPQCs treat each block as a local tensor. The arrangement of the blocks follows a specified tensor network, such as matrix product states and tree tensor networks [35]. Mathematically, the $i$-th block $U(\theta_i)$ is composed of $M_i$ local tensor blocks, with $M_i \propto N/2^i$, denoted as $U(\theta_i) = \prod_{j=1}^{M_i} U(\theta_i^j)$. The generated state of a TPQC is defined as $|\Phi\rangle = \prod_{i=1}^{L} \prod_{j=1}^{M_i} U(\theta_i^j) |0\rangle^{\otimes N}$. Refer to Section II for more details.
Although PQCs have provided strong evidence of quantum advantage [38,39], two important questions remain unexplored: (1) What is the expressive power of PQCs? (2) Is there any quantum advantage of PQCs that can be used to solve practical problems? A comparison of expressive power between PQCs and classical neural networks is desirable, and may benefit both physics and machine learning, since PQCs are capable of solving many kinds of machine learning tasks, and classical machine learning methods have also been extensively applied to physics research.
To analyze their relationships, we will first prove that MPQCs can be formulated in the tensor network language. This will show that MPQCs, TPQCs, and classical neural networks have a close connection with tensor networks, such as matrix product states (MPS) and the multi-scale entanglement renormalization ansatz (MERA) [40,41]. We will then exploit entanglement entropy, a metric for evaluating the expressive power of tensor network states, to characterize the expressive power of PQCs and neural networks. We will provide a rigorous proof that, given a number of trainable parameters that scales polynomially with the number of qubits, MPQCs, TPQCs, DBM, and long range RBM can efficiently exhibit volume law entanglement, while short range RBM can only efficiently exhibit area law entanglement [42].
Before answering the question of whether PQCs have any quantum advantages over classical generative algorithms, we remark that entanglement entropy is not the only metric for quantifying expressive power. Even though MPQCs, DBM, and long range RBM can efficiently represent quantum states with volume law entanglement, we devise a toy model to prove that some probability distributions can be efficiently generated by MPQCs, DBM, and long range RBM, but are difficult to generate with TPQCs. We further prove that instantaneous quantum polytime (IQP) circuits [43] are a special subclass of MPQCs. The probability distribution generated by IQP cannot be sampled efficiently and accurately by any classical neural network [3]. This indicates that, from the perspective of complexity theory, MPQCs have a stronger expressive power than classical neural networks and have the potential to become a practical application with 'quantum supremacy' [44].
Finally, we equip MPQCs with ancillary qubits for post-selection, a model we call ancillary driven MPQCs (AD-MPQCs). We show that the class of AD-MPQCs contains post-IQP circuits as a special case. Apart from the stronger expressive power, AD-MPQCs also provide additional benefits from the machine learning perspective. Specifically, an AD-MPQC with a simple structure, which we call the Bayesian quantum circuit (BQC), is devised for Bayesian learning. Theoretically, we prove that the expressive power of BQC is equivalent to post-IQP. From the machine learning point of view, the ancillary qubits of BQC can be used to represent additional information, such as a prior distribution. BQC can not only exploit priors to improve the performance of a learning task, but can also enable the estimation of prior distributions from the given data, which is highly desired for semi-supervised learning [46]. To the best of our knowledge, BQC is the first PQC that can learn prior distributions from given data. A toy model is designed to verify its effectiveness. The BQC experiments are implemented in Python, leveraging the pyQuil library to access the numerical simulator known as the quantum virtual machine (QVM) [47].

A. Boltzmann Machine
The Boltzmann machine (BM), inspired by the Ising model, plays a significant role in the development of deep neural networks and aims to learn a distribution over the set of its inputs [48,49]. Specifically, a BM can be divided into two parts: $N$ visible units $v = \{v_i\}_{i=1}^{N}$ and $M$ hidden units $h = \{h_j\}_{j=1}^{M}$. Given trainable parameters $w_{ij}$ and $b_i$, the Hamiltonian is defined as $H(s) = \sum_i b_i s_i + \sum_{i < j} w_{ij} s_i s_j$, with $s = \{v, h\}$. The joint probability distribution over the visible and hidden units is defined as
$$P(v, h) = \frac{e^{-H(v,h)}}{Z},$$
where $Z = \sum_{v} \sum_{h} e^{-H(v,h)}$ is called the partition function. For generative tasks, the marginal probability of the visible units, $P(v) = \sum_{h} P(v, h)$, is maximized by optimizing $w_{ij}$ and $b_i$. The restricted Boltzmann machine (RBM) [50] is a special type of BM, which can be learned more efficiently. Mathematically, the Hamiltonian of an RBM is defined as
$$H(v, h) = \sum_i b_i s_i + \sum_{i=1}^{N} \sum_{j=1}^{M} w_{ij} v_i h_j,$$
where only the connections between visible units and hidden units remain. An RBM is sparse (or short range) if the connection between visible and hidden units is sparse. For a short range RBM, the visible unit $v_i$ only connects with $2k + 1$ hidden units, with a small constant $k$ or $k \sim O(\log M)$. Similarly, an RBM is non-sparse (or long range) if $k$ satisfies $k \sim O(M)$.
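The definitions above can be made concrete with a small brute-force sketch. It assumes binary units in {0, 1} and uses separate visible and hidden bias lists (`a`, `c`), which are illustrative choices rather than conventions fixed by the text; the enumeration costs $2^{N+M}$ terms and is only meant for tiny models.

```python
import itertools
import math

def rbm_energy(v, h, a, c, w):
    """RBM Hamiltonian: bias terms plus visible-hidden couplings only."""
    e = sum(a[i] * v[i] for i in range(len(v)))
    e += sum(c[j] * h[j] for j in range(len(h)))
    e += sum(w[i][j] * v[i] * h[j] for i in range(len(v)) for j in range(len(h)))
    return e

def rbm_marginal(a, c, w):
    """Return {v: P(v)} with P(v) = sum_h exp(-H(v, h)) / Z, by enumeration."""
    n, m = len(a), len(c)
    weights = {}
    for v in itertools.product((0, 1), repeat=n):
        weights[v] = sum(math.exp(-rbm_energy(v, h, a, c, w))
                         for h in itertools.product((0, 1), repeat=m))
    z = sum(weights.values())  # partition function Z
    return {v: wv / z for v, wv in weights.items()}

# Hypothetical parameters for N = 3 visible and M = 2 hidden units.
a = [0.1, -0.2, 0.3]
c = [0.0, 0.5]
w = [[-1.0, 0.5], [0.2, -0.7], [0.0, 1.2]]
p_v = rbm_marginal(a, c, w)
```

Training would adjust `a`, `c`, and `w` so that `p_v` assigns high probability to the observed data; that step is omitted here.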
A deep Boltzmann machine (DBM) [19], unlike an RBM, which includes only one layer of hidden units, contains many layers of hidden units. In a DBM, multiple hidden layers can be learned by training one hidden layer at a time, as for an RBM. When we calculate the probability distribution between the $n$-th layer and the $(n+1)$-th layer, the hidden units of the $n$-th layer $h_n$ are treated as visible units $v_n$ and $P(v_n, h_{n+1})$ is obtained as in an RBM.

B. Tensor Networks
The matrix product state (MPS) is a natural choice to efficiently represent 1D low energy quantum states [40]. We denote a quantum state of a one-dimensional lattice with $N$ sites as $|\Psi\rangle = \sum_{j_1, j_2, ..., j_N = 1}^{d} C_{j_1 j_2 ... j_N} |j_1\rangle \otimes |j_2\rangle \otimes ... \otimes |j_N\rangle$, where all sites have the same dimension $d$. The state $|\Psi\rangle$ can be completely described by a rank-$N$ tensor $C_{j_1 j_2 ... j_N}$ with $d^N$ elements in total. However, such exponential scaling implies that the computation cost becomes prohibitive for large $N$. MPS enables $|\Psi\rangle$ to be approximated with high accuracy using only $O(\mathrm{poly}(N))$ parameters. We rewrite $|\Psi\rangle$ as follows:
$$|\Psi\rangle = \sum_{l, r} C_{l,r} |l\rangle \otimes |r\rangle,$$
where $l$ corresponds to the first site, $l = j_1$, and $r$ corresponds to the remaining $N - 1$ sites, $r = (j_2, ..., j_N)$. Let $C_{l,r} = \sum_a U_{l,a} S_{a,a} V^{\dagger}_{a,r}$ be the singular value decomposition (SVD) of $C_{l,r}$. Then we have
$$|\Psi\rangle = \sum_a S_a |a\rangle_l \otimes |a\rangle_r, \tag{3}$$
where $|a\rangle_l = \sum_l U_{l,a} |l\rangle$, $|a\rangle_r = \sum_r V^{*}_{r,a} |r\rangle$, and $S_a = S_{a,a}$. Eqn. (3) is called the Schmidt decomposition, and the entanglement of the bipartite systems $l$ and $r$ is characterized by $S_a$. Specifically, the bond dimension between the first site $j_1$ and the remaining $N - 1$ sites $\{j_2, ..., j_N\}$ is given by the number of non-zero values in $S_a$, and the entanglement of the corresponding bipartite system closely relates to the bond dimension, as explained in Subsection II C.
By successively performing SVD at each site in turn, we can split the rank-$N$ tensor $C_{j_1 j_2 ... j_N}$ into $N$ local tensors $\{A^{j_i}\}_{i=1}^{N}$. Mathematically, analogous to Eqn. (3), the matrix product state of $|\Psi\rangle$ is defined as
$$|\Psi\rangle = \sum_{j_1, ..., j_N} \sum_{a_1} A^{j_1}_{a_1} C_{a_1,(j_2,...,j_N)} |j_1, ..., j_N\rangle = \sum_{j_1, ..., j_N} \sum_{a_1, ..., a_{N-1}} A^{j_1}_{a_1} A^{j_2}_{a_1,a_2} \cdots A^{j_N}_{a_{N-1}} |j_1, ..., j_N\rangle, \tag{4}$$
where $S_{a_1,a_1}$ and $V^{\dagger}_{a_1,(j_2,...,j_N)}$ have been multiplied and reshaped to a vector $C_{a_1,(j_2,...,j_N)}$, and the matrix $U_{j_1}$ is decomposed into a collection of $d$ row vectors $A^{j_1}$ with entries $A^{j_1}_{a_1} = U_{j_1,a_1}$. The number of parameters (elements) in an MPS scales as $O(N d M^2)$, where $M$ represents the maximum bond dimension among all $S_{a,a}$. When $M$ is small, or when truncation methods are employed to keep $M$ small, an MPS can efficiently approximate the quantum state with polynomially many parameters.
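To illustrate Eqn. (4), the following sketch contracts a hand-written MPS with bond dimension 2. The GHZ state is chosen here only as a convenient example (it is not discussed in the text), and the tensor names `FIRST`, `BULK`, and `LAST` are hypothetical. The point is that a chain of small local tensors reproduces all $2^N$ amplitudes while storing far fewer numbers.

```python
import itertools
import math

# Local tensors A^{j} for the physical index j in {0, 1}; bond dimension M = 2.
# Boundary sites carry vectors, bulk sites carry 2x2 matrices.
FIRST = {0: [1.0, 0.0], 1: [0.0, 1.0]}            # A^{j_1}_{a_1}
BULK = {0: [[1.0, 0.0], [0.0, 0.0]],
        1: [[0.0, 0.0], [0.0, 1.0]]}              # A^{j_i}_{a_{i-1}, a_i}
LAST = {0: [1.0, 0.0], 1: [0.0, 1.0]}             # A^{j_N}_{a_{N-1}}

def mps_amplitude(bits):
    """Contract the chain left to right to get the amplitude C_{j_1 ... j_N}."""
    vec = FIRST[bits[0]]
    for j in bits[1:-1]:
        mat = BULK[j]
        vec = [sum(vec[a] * mat[a][b] for a in range(2)) for b in range(2)]
    last = LAST[bits[-1]]
    return sum(vec[a] * last[a] for a in range(2)) / math.sqrt(2)

N = 6  # number of sites (must be at least 2 for this sketch)
amps = {bits: mps_amplitude(bits) for bits in itertools.product((0, 1), repeat=N)}

mps_params = 2 * (2 * 2) + (N - 2) * (2 * 2 * 2)  # entries stored in the local tensors
full_params = 2 ** N                              # entries of the rank-N tensor C
```

The stored entry count grows linearly in `N` for fixed bond dimension, in line with the $O(N d M^2)$ scaling, whereas `full_params` grows as $d^N$.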
String bond states (SBS) [51] can be treated as an extension of MPS. The mathematical representation of SBS is
$$|\Psi\rangle = \sum_{j_1, ..., j_N} \prod_{s \in S} \mathrm{Tr}\Big( \prod_{i \in s} A^{j_i,s}_{a_i} \Big) |j_1, ..., j_N\rangle, \tag{5}$$
where $S$ is a set of strings, $s \in S$ is an ordered subset of $\{1, 2, ..., N\}$, and $A^{j_i,s}_{a_i}$ corresponds to the $A^{j_i}_{a_{i-1},a_i}$ in Eqn. (4) given a fixed $s$. The key idea of SBS is to place strings of operators on a lattice with $N$ sites. Some examples of string operators are illustrated in Figure 1, where each string operator is denoted by a specific color, i.e., Figures 1 (a

C. Entanglement Entropy
The entanglement entropy (also called von Neumann entropy) $S(\rho_A)$ of a bipartite system $\rho_{AB}$ is defined as
$$S(\rho_A) = -\mathrm{Tr}(\rho_A \ln \rho_A), \tag{6}$$
where $\rho_A = \mathrm{Tr}_B\, \rho_{AB}$ is the reduced density matrix of system $A$. For a quantum system $A$ that satisfies the area (volume) law, its entanglement entropy grows proportionally with the boundary area (volume) of system $A$, denoted as $S(\rho_A) \sim O(|\partial A|)$ ($S(\rho_A) \sim O(|A|)$). The maximum entanglement entropy of a bipartite system is logarithmically bounded by the bond dimension $D$ as defined in Subsection II B, i.e., $S(\rho_A) \sim \ln D$. A quantum system $A$ that satisfies the area law has an efficient MPS representation, since in the one-dimensional case a constant $O(|\partial A|)$ implies that $D$ is also a constant and the number of parameters used in the MPS is small. In contrast, for a quantum system $A$ that satisfies the volume law, the entanglement entropy scales with the number of sites, i.e., $S(\rho_A) \sim N$. Because $S(\rho_A) \sim \ln D$, the bond dimension must scale exponentially with $N$, $D \sim O(e^{N})$, so the number of parameters required in the MPS is exponentially large and the quantum system $A$ cannot be efficiently represented by an MPS.
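The relation between entropy and bond dimension can be checked numerically. The sketch below assumes the state is already given by normalized Schmidt coefficients $S_a$ (as in the Schmidt decomposition of Subsection II B), so that the eigenvalues of $\rho_A$ are $S_a^2$; the function name is hypothetical.

```python
import math

def entanglement_entropy(schmidt):
    """S = -sum_a p_a ln p_a with p_a = S_a^2, for normalized Schmidt coefficients."""
    probs = [s * s for s in schmidt if s > 0]
    return -sum(p * math.log(p) for p in probs)

product_state = [1.0]                       # bond dimension D = 1
bell_state = [1 / math.sqrt(2)] * 2         # bond dimension D = 2
D = 8
max_entangled = [1 / math.sqrt(D)] * D      # equal coefficients saturate S = ln D

s_product = entanglement_entropy(product_state)
s_bell = entanglement_entropy(bell_state)
s_max = entanglement_entropy(max_entangled)
```

With equal Schmidt coefficients the entropy equals $\ln D$ exactly, so an entropy that grows linearly in $N$ forces $D$ to grow exponentially in $N$.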

D. Quantum Circuits
Analogous to classical computers, a quantum computer accomplishes its computation by applying quantum gates to quantum bits (qubits).
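As stated in [52], a set of single- and two-qubit gates consisting of rotation gates and controlled-NOT (CNOT) gates is universal for quantum computation. In other words, any function computable in this model can be computed using only these gates. We denote the phase rotation gate $R_\phi$, z-axis rotation gate $R_Z(\theta)$, x-axis rotation gate $R_X(\gamma)$, and y-axis rotation gate $R_Y(\alpha)$ as follows:
$$R_\phi = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\phi} \end{pmatrix}, \quad R_Z(\theta) = \begin{pmatrix} e^{-i\theta/2} & 0 \\ 0 & e^{i\theta/2} \end{pmatrix}, \quad R_X(\gamma) = \begin{pmatrix} \cos\frac{\gamma}{2} & -i\sin\frac{\gamma}{2} \\ -i\sin\frac{\gamma}{2} & \cos\frac{\gamma}{2} \end{pmatrix}, \quad R_Y(\alpha) = \begin{pmatrix} \cos\frac{\alpha}{2} & -\sin\frac{\alpha}{2} \\ \sin\frac{\alpha}{2} & \cos\frac{\alpha}{2} \end{pmatrix}.$$
The CNOT gate, defined as
$$\mathrm{CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix},$$
flips the target qubit iff the control qubit is $|1\rangle$. Other quantum gates can be represented by the above universal gate set (single-qubit identities hold up to a global phase), e.g., the Pauli-Z gate, defined as $Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$, can be represented by $R_Z(\theta = \pi)$; the T gate, defined as $T = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{pmatrix}$, can be represented by $R_\phi(\phi = \pi/4)$; the Hadamard gate (H gate), defined as $H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$, can be represented by $R_X(\pi/2) R_Z(\pi/2) R_X(\pi/2)$; and the two-qubit controlled-Z gate (abbreviated as CZ gate), defined as $\mathrm{CZ} = \mathrm{diag}(1, 1, 1, -1)$, can be represented by $(I \otimes H)\,\mathrm{CNOT}\,(I \otimes H)$. Proposition 1 below demonstrates how to use the universal gate set to express other quantum gates [53].
Proposition 1. A controlled unitary $W$ gate ($CW$) can be simulated by a quantum network composed of single-qubit gates and CNOT gates. Suppose that $W = R_Z(\theta) R_Y(\alpha) R_Z(\beta)$; then it can be simulated by the quantum circuits $A$, $B$ and $C$ shown in Figure 2.
FIG. 2: Simulation of controlled unitary gates.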
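Proposition 1 can be sanity-checked numerically. The sketch below uses the standard textbook choice $A = R_Z(\theta) R_Y(\alpha/2)$, $B = R_Y(-\alpha/2) R_Z(-(\theta+\beta)/2)$, $C = R_Z((\beta-\theta)/2)$, which is an assumption here and may differ from the circuits drawn in Figure 2. It verifies the two identities that make the construction work: $ABC = I$ (control in $|0\rangle$) and $AXBXC = W$ (control in $|1\rangle$). The angle values are arbitrary.

```python
import cmath
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def rz(t):
    return [[cmath.exp(-1j * t / 2), 0], [0, cmath.exp(1j * t / 2)]]

def ry(t):
    return [[math.cos(t / 2), -math.sin(t / 2)], [math.sin(t / 2), math.cos(t / 2)]]

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

X = [[0, 1], [1, 0]]
I2 = [[1, 0], [0, 1]]

theta, alpha, beta = 0.7, 1.3, -0.4          # arbitrary test angles
W = matmul(matmul(rz(theta), ry(alpha)), rz(beta))

A = matmul(rz(theta), ry(alpha / 2))
B = matmul(ry(-alpha / 2), rz(-(theta + beta) / 2))
C = rz((beta - theta) / 2)

abc = matmul(matmul(A, B), C)                                # control = |0>
axbxc = matmul(matmul(matmul(matmul(A, X), B), X), C)        # control = |1>
```

Here each `X` stands for the action of a CNOT on the target when the control qubit is $|1\rangle$; when the control is $|0\rangle$ the CNOTs act trivially and the target sees only `A`, `B`, `C`.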

IQP circuits
The instantaneous quantum polynomial (IQP) circuit consists of commuting gates that are diagonal in the Z basis. The basic framework of IQP circuits is illustrated in Fig. 3.
Given $N$ qubits, the IQP circuits can generate distributions $p_I(x) = |\langle x| H^{\otimes N} U_Z H^{\otimes N} |0\rangle^{\otimes N}|^2$, where $U_Z$ is composed of $O(\mathrm{poly}(N))$ commuting gates, e.g., the single-qubit T gate and the CZ gate.
IQP circuits are proven to be capable of generating probability distributions $p_I$ that cannot be classically simulated efficiently [45]. The main result of IQP is summarized in the following proposition.
Proposition 2. If the output probability distributions generated by uniform families of IQP circuits could be weakly classically simulated to within multiplicative error $1 \le c \le \sqrt{2}$, then post-BPP = PP and the PH would collapse to its third level.
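For small $N$, the IQP output distribution can be computed by direct enumeration, which makes the structure $H^{\otimes N} U_Z H^{\otimes N}$ explicit. The sketch below assumes $U_Z$ is built only from T and CZ gates and uses the fact that a Hadamard layer maps $|0\rangle^{\otimes N}$ to a uniform superposition; the function name and the example circuit are hypothetical. The cost is exponential in $N$, consistent with the hardness statement above.

```python
import cmath
import itertools
import math

def iqp_distribution(n, t_qubits, cz_pairs):
    """Brute-force p_I(x) = |<x| H^n U_Z H^n |0...0>|^2 for a diagonal U_Z."""
    basis = list(itertools.product((0, 1), repeat=n))
    # Diagonal entry of U_Z on |y>: a phase e^{i pi/4} for each T gate acting on a 1,
    # and a sign -1 for each CZ gate whose two qubits are both 1.
    diag = {}
    for y in basis:
        phase = cmath.exp(1j * math.pi / 4 * sum(y[q] for q in t_qubits))
        sign = (-1) ** sum(y[a] * y[b] for a, b in cz_pairs)
        diag[y] = phase * sign
    probs = {}
    for x in basis:
        # <x| H^n |y> = (-1)^{x.y} / 2^{n/2}, and H^n |0...0> = sum_y |y> / 2^{n/2}.
        amp = sum((-1) ** sum(xi * yi for xi, yi in zip(x, y)) * diag[y] for y in basis)
        probs[x] = abs(amp / 2 ** n) ** 2
    return probs

p_iqp = iqp_distribution(3, t_qubits=[0, 1], cz_pairs=[(0, 1), (1, 2)])
p_trivial = iqp_distribution(2, t_qubits=[], cz_pairs=[])  # U_Z = identity
```

With an empty $U_Z$ the two Hadamard layers cancel, so all probability returns to the all-zero string.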

E. Parameterized Quantum Circuits
Parameterized quantum circuits (PQCs), as a special type of quantum circuit model, are composed of a set of parameterized single-qubit and controlled single-qubit gates. In this work, a PQC is used to implement a unitary transformation operator $U(\theta)$ with $O(\mathrm{poly}(N))$ parameterized quantum gates, where $N$ is the number of input qubits.
Several recent works [30,31,35] have employed PQCs to accomplish generative tasks. One major reason is that the superposition property allows the number of trainable parameters to be dramatically reduced. In generative tasks, PQCs produce the probability $q(X = x) = |\langle x|\Psi_G\rangle|^2$ measured in the computational basis $|x\rangle$, where $|\Psi_G\rangle = U(\theta)|0\rangle^{\otimes N}$. The parameters $\theta$ can be optimized using only classical approaches,
$$\min_{\theta} L(q_\theta(x), p(x)),$$
where $L(\cdot, \cdot)$ is a loss function that measures the dissimilarity of the generated and the targeted probability distributions. For example, if the loss function is the negative log-likelihood [54], the optimization is
$$\min_{\theta} -\frac{1}{D} \sum_{i=1}^{D} \ln q_\theta(x_i),$$
where the dataset $\mathcal{D} = \{x_i\}_{i=1}^{D}$ is sampled from the targeted probability distribution $p(X)$, the size of $\mathcal{D}$ is $D$, and each example of $\mathcal{D}$ is denoted as $x_i$. Another loss function that is broadly employed is the maximum mean discrepancy (MMD). The MMD loss is defined as
$$L_{\mathrm{MMD}} = \Big\| \sum_{x_i \in x} q(x_i)\phi(x_i) - \sum_{x_i \in x} p(x_i)\phi(x_i) \Big\|^2, \tag{11}$$
where $\phi(x_i)$ maps the $i$-th input data, $x_i$, into a high-dimensional reproducing kernel Hilbert space [55], and $p(x_i)$ refers to the target probability distribution. More details about the MMD loss and how to optimize it by gradient descent with unbiased estimation are given in [31,56].
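Both losses are straightforward to evaluate when the distributions are available explicitly. The sketch below assumes distributions are given as dictionaries over integer-labelled outcomes and uses a Gaussian kernel $K(x, y) = \langle\phi(x), \phi(y)\rangle$ to expand the squared norm in the MMD; the kernel choice, bandwidth, and example numbers are illustrative assumptions, not the settings of [31,56].

```python
import math

def nll(samples, q):
    """Negative log-likelihood of the samples under the model distribution q."""
    return -sum(math.log(q[x]) for x in samples) / len(samples)

def gaussian_kernel(x, y, sigma=1.0):
    return math.exp(-((x - y) ** 2) / (2 * sigma ** 2))

def mmd(q, p, kernel=gaussian_kernel):
    """Squared MMD: sum_{x,y} (q(x) - p(x)) K(x, y) (q(y) - p(y))."""
    xs = sorted(set(q) | set(p))
    diff = {x: q.get(x, 0.0) - p.get(x, 0.0) for x in xs}
    return sum(diff[x] * kernel(x, y) * diff[y] for x in xs for y in xs)

target = {0: 0.5, 1: 0.25, 2: 0.25}   # hypothetical p(x)
model = {0: 0.4, 1: 0.4, 2: 0.2}      # hypothetical q(x)
data = [0, 0, 1, 2]                   # hypothetical samples from p(x)

loss_nll = nll(data, model)
loss_mmd = mmd(model, target)
```

On hardware, `q` would be estimated from measurement samples rather than read off exactly; the expansion in terms of the kernel is what makes that sample-based estimate possible.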
In the following, we define two types of PQCs that are the focus of the present work, where the major difference is the layout of quantum gates to compose U (θ).

Multilayer Parameterized Quantum Circuits
Multilayer parameterized quantum circuits (MPQCs) are composed of $L$ blocks, where each block implements $U(\theta_i)$, with $i \in [1, L]$ and $L \sim \mathrm{poly}(N)$. A unitary operator $U(\theta) = \prod_{i=1}^{L} U(\theta_i)$ is applied to $N$ input qubits. An example of an MPQC is illustrated in Fig. 4. In each block, the arrangement of quantum gates is identical. Moreover, each qubit is acted on by at least one parameterized gate (shown in yellow), and CNOT gates within the block can connect any two qubits. A further requirement is that the number of CNOT gates in each block is no larger than $N$. The use of MPQCs for generative tasks has been explored in [30,31], although the layout of quantum gates in each block and the optimization methods vary.

Tensor Network Parameterized Quantum Circuits
Another type of PQC is the tensor network PQC (TPQC), which generally inherits a tensor network structure, e.g., MPS, tree tensor network, or MERA. In other words, CNOT gates can only connect two local qubits. Mathematically, the quantum state $|\Psi_G\rangle$ generated by TPQCs is formulated as $|\Psi_G\rangle = \prod_{i=1}^{L} \prod_{j=1}^{M_i} U(\theta_i^j) |0\rangle^{\otimes N}$, where $M_i$ represents the number of local blocks in the block $U(\theta_i)$. For example, a TPQC that inherits the layout of a tree tensor network is given in Fig. 5. Figure 6 illustrates another example of a TPQC. Employing TPQCs to accomplish generative tasks has been investigated in [35].
FIG. 5: An example of TPQCs where the CNOT gates in different layers have different local constraints.

III. EXPRESSIVE POWER OF PARAMETERIZED QUANTUM CIRCUITS
The goal of a generative learning network is to learn a distribution $q(x)$ that approximates a targeted probability distribution $p(x)$ within a tolerable error $\epsilon$. The expressive power of a generative learning machine directly determines how well the generated distribution can match the target distribution (e.g., Eqn. (9)). The stronger the expressive power is, the smaller the dissimilarity between the two distributions will be.
FIG. 6: An example of TPQCs that inherits the layout of MPS.
One of the main results in this paper is as follows. Since it has been proved that DBM has a stronger expressive power than long range RBM [20], this concludes the theorem.
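We first relate MPQCs and TPQCs to tensor network states in the following theorem. In the tensor network language, the CNOT gate is a rank-4 tensor $\mathrm{CNOT}^{\sigma'\tau'}_{\sigma\tau}$, whose entries equal 1 if $\sigma' = \sigma$ and $\tau' = \tau \oplus \sigma$, and 0 otherwise. The CNOT gate can be decomposed into two local tensors with bond dimension $D = 2$. One possible solution is
$$\mathrm{CNOT}^{\sigma'\tau'}_{\sigma\tau} = \sum_{b=1}^{2} W^{\sigma',\sigma}_{1b} W^{\tau',\tau}_{2b},$$
where $W^{\sigma',\sigma}_{1b}$ and $W^{\tau',\tau}_{2b}$ correspond to two local rank-3 tensors, and one explicit representation is $W_{11} = |0\rangle\langle 0|$, $W_{12} = |1\rangle\langle 1|$, $W_{21} = I$, and $W_{22} = X$. Suppose that there exist $k$ CNOT gates between the $i$-th and $(i+1)$-th qubits, where the first $i$ qubits and the remaining $N - i$ qubits compose a bipartite system; the maximal bond dimension of such a bipartite system is $2^k$. Since the bond dimension scales exponentially with the number of CNOT gates, $O(\mathrm{poly}(\log D))$ blocks are required to generate an MPS with bond dimension $D$.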
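The bond-dimension-2 decomposition of the CNOT gate can be verified directly. The sketch below assumes one particular pair of local tensors, projectors $|0\rangle\langle 0|$, $|1\rangle\langle 1|$ on the control side and $I$, $X$ on the target side; other choices related by a change of basis on the bond are equally valid. The helper `max_bond_dimension` simply restates the $2^k$ bound from the text.

```python
def kron(a, b):
    """Kronecker product of two 2x2 matrices, giving a 4x4 matrix."""
    return [[a[i // 2][j // 2] * b[i % 2][j % 2] for j in range(4)] for i in range(4)]

P0 = [[1, 0], [0, 0]]   # |0><0| on the control qubit
P1 = [[0, 0], [0, 1]]   # |1><1| on the control qubit
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]

W1 = [P0, P1]           # control-side local tensor, indexed by the bond b
W2 = [I2, X]            # target-side local tensor, indexed by the bond b

# Sum over the two values of the bond index.
terms = [kron(W1[b], W2[b]) for b in range(2)]
cnot_from_tensors = [[terms[0][i][j] + terms[1][i][j] for j in range(4)] for i in range(4)]

CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

def max_bond_dimension(k):
    """Upper bound 2^k on the bond dimension after k CNOTs across one cut."""
    return 2 ** k
```

Because each CNOT across a cut contributes at most a factor of 2, reaching bond dimension `D` needs at least $\log_2 D$ such gates across that cut.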

FIG. 7: The mapping between MPQC and MPS.
Although CNOT gates increase the bond dimension, they cannot directly represent arbitrary local tensors $A^{j_i}_{a_i}$ defined in Eqn. (4), because the local tensors $W$ of CNOT gates defined in Eqn. (14) are fixed. This issue can be tackled by using the parameterized single-qubit gates. In summary, any MPS with bond dimension $D$ can be simulated by PQCs with $O(\mathrm{poly}(\log D))$ blocks, in which CNOT gates increase the bond dimension and parameterized single-qubit gates form arbitrary local tensors.
Figure 7 depicts a mapping between MPQC and MPS, where, for illustrative purposes, we assume that $N - 1$ CNOT gates are applied to the data qubits in sequence. The middle section of Figure 7 indicates the effects of CNOT gates and parameterized single-qubit gates. All local tensors applied to the same qubit can be merged into one local tensor (cf. Eqn. (4)) and yield the corresponding MPS, as shown in the right section of Figure 7.
Theorem 4 implies that MPQCs and TPQCs with polynomially (logarithmically) many blocks can efficiently represent quantum states with volume (area) law entanglement. However, expressive power does not depend solely on the amount of entanglement. Even though both long range RBM and MPQCs can represent quantum states with volume law entanglement, some quantum states, such as those generated by the translation-invariant Ising spin model [20], can be efficiently represented by constant depth quantum circuits, but are hard for RBM.
Two major differences between TPQCs and MPQCs are that (i) CNOT gates in TPQCs cannot connect any two qubits arbitrarily and (ii) the blocks are replicated based on the structure of the tensor networks. These restrictions limit the expressive power of TPQCs.
Theorem 5. Some probability distributions generated by MPQCs, DBM, and long range RBM cannot be efficiently generated by TPQCs.
Proof. The theorem is proved by construction. In DBM and long range RBM, the correlation between any two visible units can be built by linking them to the same hidden unit. Similarly, in MPQCs, any two qubits can be correlated by applying a CNOT gate. We provide an example of a distribution that can be easily generated by DBM, long range RBM, and MPQCs but is difficult for TPQCs. Given $N$ binary inputs $\{v_i\}_{i=1}^{N}$ where $v_1 = 1$ and $v_i = 0$ for $i \in \{2, 3, ..., N\}$, we define the targeted distribution as $p(v_1 = 1, v_i = 0, v_N = 1) = 1$ with $i \in \{2, 3, ..., N - 1\}$. For DBM and long range RBM, this distribution can be generated by introducing one hidden unit $h_1$. As shown in the left panel of Figure 8, each visible unit encodes a binary input and the number of trainable parameters is 2. Similarly, the distribution can be generated by MPQCs. By encoding $v_i$ into the $i$-th qubit, only one CNOT gate is required to connect the first and the $N$-th qubit, as illustrated in the right panel of Figure 8. However, this distribution cannot be efficiently generated by TPQCs, which prohibit long range interactions.
From the perspective of computational complexity, we can obtain the following theorem:
Theorem 6. There exist probability distributions generated by MPQCs with $O(\mathrm{poly}(N))$ blocks, where $N$ is the number of input qubits, which cannot be simulated efficiently by classical neural networks unless the polynomial hierarchy (PH) collapses.
Proof. This theorem can be proved by combining Theorem 7 below with Proposition 2. Theorem 7 shows that any IQP circuit with $N$ qubits and $O(\mathrm{poly}(N))$ commuting gates can be transformed into an MPQC with $O(\mathrm{poly}(N))$ blocks. As stated in Proposition 2, there exist probability distributions, generated by IQP, that cannot be efficiently simulated by classical circuits (including DBM or long range RBM).
Before proving that an IQP circuit can be efficiently simulated by an MPQC, we first define the arrangement of quantum gates in each block. As shown in Fig. 10, from left to right in each block, the seven parameterized single-qubit gates are $R_X$, $R_Z$, $R_X$, $R_\phi$, $R_Z$, $R_Y$ and $R_Z$, followed by $N - 1$ CNOT gates, all of which are controlled by the first qubit. For simplicity, we will use a 7-tuple of angles, listed in the order above, to represent the composition of the seven parameterized single-qubit gates.
Next we demonstrate that the internal diagonal matrix $U_Z$ can also be simulated using the predefined block structure. Without loss of generality, we assume that the $i$-th circuit depth in $U_Z$ contains $M_T$ T gates and $M_{CZ}$ CZ gates, with $M_T + M_{CZ} \le N$. For example, the colored region in Figure 9 indicates that $M_T = 2$ and $M_{CZ} = 2$. Similar to the simulation of H gates, two blocks are sufficient to simulate $M_T$ ($M_T \le N$) T gates at the same circuit depth. Since $T = R_\phi(\pi/4)$, the T gates can be simulated by applying $(0, 0, 0, \pi/4, 0, 0, 0)$ followed by $(0, 0, 0, 0, 0, 0, 0)$.
We next show how to use predefined blocks to efficiently simulate a CZ gate. Suppose that a CZ gate is applied to the $k$-th qubit and controlled by the $j$-th qubit, with $j \le k$. Since an explicit connection between the two qubits may not exist in the predefined block, we first use 14 blocks to simulate a SWAP gate that moves the $j$-th control qubit to the first qubit. We then use six blocks to simulate the CZ gate that is applied to the $k$-th qubit and controlled by the first qubit. Lastly, 14 blocks are employed to simulate another SWAP gate that moves the first control qubit back to its original position. For example, as shown in the left panel of Figure 11, the CZ gate indicated by the blue box can be represented by an equivalent circuit controlled by the first qubit. The central problem in simulating the SWAP operation is how to simulate a single CNOT gate applied to two arbitrary qubits, since a SWAP gate is composed of three CNOT gates, as illustrated in the left panel of Figure 11. The first and third CNOT gates of the SWAP operation can each be simulated by four blocks. Recall that, by Proposition 1, a single CNOT gate (namely the CX gate) can be decomposed into the single-qubit circuits $A_1$, $B_1$, and $C_1$ together with CNOT gates. We set all parameters of the first block to 0 except the parameters corresponding to the $k$-th qubit, which are set to $(0, 0, 0, 0, 0, \pi/2, 0)$ to simulate $A_1$. Next, we set all parameters of the second block to 0 except the parameters corresponding to the $k$-th qubit, which are set to $(0, 0, 0, 0, 0, -\pi/2, \pi)$ to simulate $B_1$. Then, we set all parameters of the third block to 0 except the parameters corresponding to the $k$-th qubit, which are set to $(0, 0, 0, 0, 0, 0, \pi)$ to simulate $C_1$. Lastly, all parameters of the fourth block are set to 0.
Six blocks are required to simulate the second, reversed CNOT gate (R-CNOT gate) in the SWAP operation. Since R-CNOT $= (H \otimes H)\,\mathrm{CNOT}\,(H \otimes H)$, we use four blocks to simulate $(H \otimes H)\,\mathrm{CNOT}$ and then use two extra blocks to simulate the last two Hadamard gates. For the first four blocks, the parameters of the first three parameterized gates that are applied to the first and $i$-th qubits are set to $\pi/2, \pi/2, \pi/2$, with the aim of simulating two H gates. The remaining parameters of the first four blocks follow the same setting as for simulating the CNOT gate defined above. The last two blocks follow a similar setting to that for simulating the Hadamard layer, where the first three parameterized gates that are applied to the first and $i$-th qubits simulate two H gates and the remaining parameters are set to zero. To conclude, a SWAP gate can be composed of a total of 14 blocks.
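The two gate identities used in this construction, R-CNOT $= (H \otimes H)\,\mathrm{CNOT}\,(H \otimes H)$ and SWAP $=$ CNOT $\cdot$ R-CNOT $\cdot$ CNOT, can be checked with explicit 4x4 matrices. The sketch below is a plain numerical verification on two qubits, not the block-level parameter setting described above; helper names are hypothetical.

```python
import math

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def kron(a, b):
    """Kronecker product of two 2x2 matrices, giving a 4x4 matrix."""
    return [[a[i // 2][j // 2] * b[i % 2][j % 2] for j in range(4)] for i in range(4)]

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(len(a)) for j in range(len(a)))

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]     # control = first qubit
R_CNOT = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]   # control = second qubit
SWAP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]

HH = kron(H, H)
r_cnot_built = matmul(matmul(HH, CNOT), HH)           # (H x H) CNOT (H x H)
swap_built = matmul(matmul(CNOT, R_CNOT), CNOT)       # three alternating CNOTs

swap_blocks = 4 + 6 + 4   # CNOT, R-CNOT, CNOT block counts from the text
```

The block count follows the text: four blocks for each ordinary CNOT and six for the reversed one.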
Finally, because the CZ gate can be reformulated as CZ = (I ⊗ H)CNOT(I ⊗ H), it can also be simulated by using six blocks.
In summary, since H gates, T gates, and CZ gates can each be efficiently simulated by using a constant number of blocks, $O(\mathrm{poly}(N))$ blocks are sufficient to simulate an IQP circuit with $O(\mathrm{poly}(N))$ T and CZ gates.
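The per-gate costs stated in the proof can be combined into a rough block-count estimate for a given IQP circuit. The sketch below is an illustrative upper bound under stated assumptions: each Hadamard layer costs two blocks, all T gates at one depth share two blocks, each CZ gate is simulated separately, and a CZ whose control is already the first qubit skips both SWAPs. The function names and the example circuit are hypothetical.

```python
def blocks_for_cz(j, k):
    """Blocks for a CZ on qubit k controlled by qubit j (1-indexed, j <= k)."""
    if j == 1:
        return 6               # the control is already the first qubit
    return 14 + 6 + 14         # SWAP in, CZ from the first qubit, SWAP back

def blocks_for_iqp(layers):
    """layers: list of (num_t_gates, [(j, k), ...]) describing each depth of U_Z."""
    total = 2 + 2              # the two Hadamard layers, two blocks each
    for num_t, cz_pairs in layers:
        if num_t > 0:
            total += 2         # all T gates at one depth share two blocks
        total += sum(blocks_for_cz(j, k) for j, k in cz_pairs)
    return total

# Hypothetical circuit: depth 1 has two T gates and two CZ gates, depth 2 has one CZ gate.
example_layers = [(2, [(2, 3), (1, 4)]), (0, [(3, 5)])]
example_total = blocks_for_iqp(example_layers)
```

Since every gate contributes a constant number of blocks, the total is linear in the number of T and CZ gates, and therefore polynomial in $N$ whenever the gate count is.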

IV. BAYESIAN QUANTUM CIRCUIT
In Bayesian inference, additional information is given in the form of a prior probability distribution $p(\lambda)$, which represents our beliefs about the parameters of the learning algorithm, and the posterior probability distribution $p(\lambda|x)$ can be obtained by Bayes' rule
$$p(\lambda|x) = \frac{p(x|\lambda)\, p(\lambda)}{p(x)},$$
where $p(x|\lambda)$ is known as the likelihood function. It has been shown that the performance of many learning tasks can be dramatically improved if Bayesian models are employed [57][58][59][60][61].
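For discrete distributions, Bayes' rule amounts to reweighting the prior by the likelihood and normalizing. The sketch below uses two-bit strings as labels for $\lambda$, mirroring the ancillary register introduced later, but the numerical values are purely hypothetical.

```python
def posterior(prior, likelihood, x):
    """p(lambda | x) = p(x | lambda) p(lambda) / sum_lambda' p(x | lambda') p(lambda')."""
    joint = {lam: likelihood[lam][x] * prior[lam] for lam in prior}
    evidence = sum(joint.values())   # p(x)
    return {lam: val / evidence for lam, val in joint.items()}

prior = {"00": 0.5, "01": 0.5}                                   # hypothetical p(lambda)
likelihood = {"00": {0: 0.9, 1: 0.1}, "01": {0: 0.2, 1: 0.8}}    # hypothetical p(x | lambda)
post = posterior(prior, likelihood, x=1)
```

Observing `x = 1` shifts the belief strongly toward the label whose likelihood for that outcome is larger.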
Considering the significance of the Bayesian approach in classical machine learning, we devise a Bayesian quantum circuit (BQC) that enables PQCs to accomplish quantum machine learning tasks with Bayesian advantages. To the best of our knowledge, BQC is the first quantum method for a Bayesian generative model based on PQCs. The proposed BQC is capable of explicitly and efficiently generating prior, likelihood, and posterior distributions. Furthermore, we demonstrate that BQC has stronger expressive power than the MPQCs studied in the previous section.

A. Layouts and Optimization of BQC
Before elaborating on BQC, we first define the ancillary driven MPQCs (AD-MPQCs). An AD-MPQC can be divided into two parts, of which the first part aims to generate the targeted distribution and the second part aims to conduct post-selection. In contrast to an MPQC, in which all blocks are directly applied to the data qubits, some blocks in an AD-MPQC are conditionally applied to the data qubits for specific ancillary quantum states. A general layout of AD-MPQCs is illustrated in Figure 13. The BQC, in Figure 14, is a special case of AD-MPQCs in which the commonly shared blocks (green blocks in Fig. 13) do not exist. In BQC, after applying $K$ blocks $\{U(\gamma_i)\}_{i=1}^{K}$ to $M$ ancillary qubits, the generated state is $|\Psi_A\rangle = \prod_{i=1}^{K} U(\gamma_i) |0\rangle^{\otimes M}$. Measuring the state $|\Psi_A\rangle$ in the computational basis generates the prior distribution $q(\lambda) = |\langle\lambda|\Psi_A\rangle|^2$. Similarly, after conditionally applying $L$ blocks $\{U(\theta^i_{\lambda_i})\}_{i=1}^{L}$ to $N$ data qubits iff the ancillary state is $|\lambda_i\rangle$, $\forall \lambda_i \in \lambda$, and measuring in the computational basis $|x\rangle$, the likelihood distribution $q(x|\lambda_i) = |\langle x, \lambda_i|\Psi_{x,\lambda}\rangle|^2$ is generated, where $|\Psi_{x,\lambda}\rangle$ is the quantum state generated by the data qubits and ancillary qubits after applying a total of $K + |\lambda| L$ blocks.
FIG. 14: The general scheme of the proposed BQC.
In BQC, the parameterized gates in $U(\theta^i_{\lambda_i})$ are controlled single-qubit rotation gates, e.g., the controlled phase gate $CR_\phi(\phi)$, the controlled rotation gate along the x-axis $CR_X(\gamma)$, the controlled rotation gate along the y-axis $CR_Y(\alpha)$, and the controlled rotation gate along the z-axis $CR_Z(\theta)$, which are controlled by the ancillary quantum state $|\lambda\rangle$. To reduce the gate complexity, we introduce a flag qubit that is conditionally activated for the specified ancillary state, which enables each parameterized controlled-rotation gate to have only one control qubit. As a result of this extra control qubit, the CNOT gates used in MPQCs are replaced by $N$ Toffoli gates. Each Toffoli gate can be efficiently implemented by 10 single-qubit gates and 6 CNOT gates. We give an intuitive example of how to apply the block $U(\theta^1_{\lambda_k})$ to the data qubits iff the ancillary state is $\lambda_k = |10\rangle$ in Figure 15. The green region represents encoding the state $|\Psi_A\rangle$ into the ancillary qubits. The two pink regions represent how to conditionally activate and uncompute the flag qubit for the specific ancillary state $|01\rangle$. The black dotted box illustrates how the block $U(\theta^1_{\lambda_k})$ is conditionally applied to the data register for the specified ancillary state $|\lambda_k\rangle = |01\rangle$.
In the training process, we employ MMD defined in Eqn. (11) as the loss function. By measuring the data register and the ancillary register, the joint distribution $q(x_i, \lambda)$ is obtained by $q(x_i, \lambda) = |\langle x_i, \lambda|\Phi_{x,\lambda}\rangle|^2$, where $|\Phi_{x,\lambda}\rangle$ refers to the entangled quantum state generated by BQC.
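As a rough illustration of an MMD-type loss between a target distribution and a generated distribution over basis states, consider the sketch below. It assumes the squared-MMD form $\sum_{x,y}(p(x)-q(x))\,k(x,y)\,(p(y)-q(y))$ with a Gaussian kernel on integer-encoded states; the kernel choice, the bandwidth `sigma`, and the function names are assumptions and may differ from the definition in Eqn. (11).

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel between two integer-encoded basis states (an assumed choice)."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))

def mmd_squared(p, q, sigma=1.0):
    """Squared MMD between two distributions over the same basis states.

    p, q : lists of probabilities indexed by the integer encoding of the state.
    Computes sum_{x,y} (p(x) - q(x)) k(x, y) (p(y) - q(y)).
    """
    diff = [pi - qi for pi, qi in zip(p, q)]
    n = len(diff)
    return sum(diff[i] * gaussian_kernel(i, j, sigma) * diff[j]
               for i in range(n) for j in range(n))

target = [0.5, 0.5, 0.0, 0.0]         # toy target distribution
generated = [0.25, 0.25, 0.25, 0.25]  # toy generated distribution
loss = mmd_squared(target, generated)
```

The loss vanishes when the two distributions coincide and is positive otherwise, which is what makes it usable as a training objective for the joint distribution $q(x, \lambda)$.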

B. Expressive Power of BQC and AD-MPQCs
FIG. 15: An example of conditionally applying the block $U(\theta^1_{\lambda_k})$.
We first prove that BQC can be formulated by string bond states (SBS) and discuss the expressive power of BQC and AD-MPQC. By exploiting the connection between BQC and SBS, we prove that if the layout of the quantum gates in each block of AD-MPQCs is allowed to be varied, the AD-MPQC can be efficiently formulated by general tensor networks (GTNs) [23].
The central idea in formulating BQC by SBS is to treat all blocks controlled by the same ancillary state as a string operator, as defined in Eqn. (5). Given $N$ data qubits and $M$ ancillary qubits, the maximum number of string operators is $|\lambda| = 2^M$ and the generated quantum state is $\sum_{i=1}^{|\lambda|} \alpha_i |\lambda_i\rangle \otimes \prod_{j=1}^{L} U(\theta^j_{\lambda_i})|0\rangle^{\otimes N}$, where $\alpha_i$ stands for the probability amplitude of state $|\lambda_i\rangle$ with $\sum_i |\alpha_i|^2 = 1$. Since $\langle x_i|x_j\rangle = \delta_{ij}$, the generated states corresponding to different ancillary quantum states are independent. Analogous to the string operator defined in Eqn. (5), which is conditionally controlled by $s$, this mutual independence guarantees that the block $U(\theta^j_{\lambda_i})$ is applied to the data register iff the ancillary state is $|x_i\rangle$.
When there is only one ancillary quantum state, i.e., $|\lambda| = 1$, the number of string operators is one and BQC is equivalent to MPQC. This implies that the expressive power of BQC cannot be worse than that of MPQCs. Additionally, since BQC is a special case of AD-MPQCs, the expressive power of AD-MPQCs cannot be worse than that of BQC. Therefore, from the perspective of the entanglement entropy, the expressive power of BQC and AD-MPQCs cannot be worse than that of MPQCs. Since post-IQP circuits can be efficiently formulated by both AD-MPQCs and BQC, BQC has stronger expressive power than MPQCs from the perspective of computational complexity.
The main difference between general tensor networks (GTNs) and regular tensor networks is that a GTN allows us to reuse information from a tensor in another part of the network, which is also called the copy operation [23]. A GTN effectively combines different types of regular tensor networks into one network, which exponentially reduces the number of parameters needed to describe some functions compared to regular tensor networks. Figure 16 (a) gives an example of a GTN, which is composed of tree tensor networks (denoted by blue dots) and SBS (denoted by orange and green dots). The two blue arrows indicate the copy operations. Since the essence of the copy operation is independence, i.e., the orange and green dots are independent of each other, AD-MPQCs can efficiently represent such an independence relation by employing the ancillary register. As shown in Figure 16 (b), if the layout of quantum gates in each block is allowed to be varied, a quantum circuit corresponding to the GTN illustrated in Figure 16 (a) can be constructed. Although AD-MPQCs can be formulated by GTNs, whether there exist quantum states that can be efficiently simulated by AD-MPQCs but are hard for BQC is an open question.
FIG. 16: The left panel illustrates an example of a general tensor network, composed of tree tensor networks and string bond states. The right panel illustrates the corresponding quantum circuit.

V. NUMERICAL EXPERIMENTS
A. Generating Bar-and-Stripe Dataset
To demonstrate the advantages of the proposed BQC, we first use BQC to accomplish generative tasks, namely generating the 2 × 2 and 3 × 3 bars-and-stripes (BAS) datasets. The BAS dataset is composed of vertical bars and horizontal stripes, and some examples of BAS images are shown in Figure 17 (a). For $n \times m$ pixels, the number of images that belong to BAS is $N_{BAS} = 2^n + 2^m - 2$. The target distribution of such a generative task is denoted as $p(x)$, where $p(x_i) = 1/N_{BAS}$ iff $x_i$ is a valid BAS image. The generated probability distribution of BQC, $q(x) = \sum_{\lambda_i \in \lambda} q(x, \lambda_i)$, aims to approximate the targeted distribution $p(x)$, where $x$ refers to the generated images, $|\lambda|$ refers to the number of valid BAS patterns, and $q(x, \lambda_i)$ refers to the probability distribution of the generated images given a specific $\lambda_i$.
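The count $N_{BAS} = 2^n + 2^m - 2$ and the uniform target distribution can be verified by direct enumeration. The sketch below is one way to do so, assuming a row-major flattening of each image into a tuple of bits; the function names are hypothetical.

```python
from itertools import product

def bas_images(n, m):
    """Return the set of valid n x m bars-and-stripes images as flat 0/1 tuples."""
    images = set()
    # Horizontal stripes: every row is constant, so one bit per row fixes the image.
    for rows in product((0, 1), repeat=n):
        images.add(tuple(rows[r] for r in range(n) for _ in range(m)))
    # Vertical bars: every column is constant, so one bit per column fixes the image.
    for cols in product((0, 1), repeat=m):
        images.add(tuple(cols[c] for _ in range(n) for c in range(m)))
    return images

def target_distribution(n, m):
    """Uniform target p(x) = 1 / N_BAS on valid BAS images (zero elsewhere)."""
    images = bas_images(n, m)
    return {image: 1.0 / len(images) for image in images}

n_bas_2x2 = len(bas_images(2, 2))
n_bas_3x3 = len(bas_images(3, 3))
```

The all-zero and all-one images are both a stripe pattern and a bar pattern, which is why 2 is subtracted from $2^n + 2^m$; the set-based enumeration removes these duplicates automatically.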
We compare the generative performance of BQC with two existing MPQCs in the literature, i.e., data-driven quantum circuit learning (DDQCL) [30] and the quantum circuit Born machine (QCBM) [31]. The two major differences between DDQCL and QCBM are the layout of CNOT gates in each block and the optimization method. In DDQCL, the topology of CNOT gates is based on the topology of quantum devices, such as chain, star, and all-to-all connections. A gradient-free optimization approach is employed, i.e., the swarm optimization algorithm. In QCBM, the topology of CNOT gates is determined by the Chow-Liu tree algorithm, which is inspired by graphical models and efficiently extracts information from the training data among different nodes. An unbiased gradient-based optimization approach is employed in the training process. In accordance with the conventions in previous studies, in BQC, all BAS patterns are encoded in the qubits, where each data qubit stands for a pixel of the BAS image.
For the task of generating BAS images, the prior is a uniform distribution, since all BAS images are expected to be generated with the same probability. By applying $K$ blocks to the ancillary register with $M$ qubits, the generated quantum state $|\lambda\rangle$ is formulated as $|\lambda\rangle = \prod_{j=1}^{K} U_j(\gamma)|0\rangle^{\otimes M} = \frac{1}{\sqrt{|\lambda|}} \sum_{\lambda_i \in \lambda} |\lambda_i\rangle$. Since the BAS patterns are encoded into the qubits, the total number of data qubits is $N = n \times m$. For the specified ancillary state $\lambda = \lambda_i$, $L$ blocks $\{U(\theta^i_{\lambda_i})\}_{i=1}^{L}$ are conditionally applied to the $N$ data qubits, so that a total of $|\lambda|L$ blocks are required in BQC. Since there exists a one-to-one mapping in which each $\lambda_i$ represents a specific BAS image, we have $q(x = x_i) = q(x = x_i, \lambda = \lambda_i)$, where $q(x = x_i, \lambda = \lambda_j) = 0$ for $i \neq j$. We remark that this is a special case among generative tasks.
We first train BQC to generate BAS images with 2 × 2 pixels, where $N_{BAS} = 6$ valid images are expected to be generated uniformly after learning. In the experiment, the numbers of data qubits and ancillary qubits are set to $N = 4$ and $M = 3$, respectively. Since the prior distribution is known, the parameters of the $K = 2$ blocks $\{U_j(\gamma)\}_{j=1}^{2}$ are fixed, where the generated state is $\prod_{j=1}^{2} U_j(\gamma)|0\rangle^{\otimes M} = \frac{1}{\sqrt{6}} \sum_i |\lambda_i\rangle$ with $\lambda_i \in \lambda$ and $|\lambda| = 6$. In the numerical simulation, we use the function provided by QVM to directly generate the prior distribution $p(\lambda)$. In the learning process, we set $L = 2$ blocks $\{U_j(\theta^i_{\lambda_i})\}_{i=1}^{2}$ for the specified ancillary quantum state, where each block contains only 4 $CR_Y(\alpha)$ gates (interacting with the 4 data qubits separately) and 4 Toffoli gates, each connecting two qubits in sequence, as illustrated in Figure 15. In total, 48 trainable parameters are updated in the learning process.
When BQC is applied to generate 3 × 3 BAS images, with $N_{BAS} = 14$, the numbers of data qubits $N$ and ancillary qubits $M$ are set to 9 and 4, respectively. A uniform ancillary state is first generated by using the function provided by QVM. Analogous to the 2 × 2 BAS case, we set $L = 2$, and each block contains 9 $CR_Y(\alpha)$ gates (interacting with the 9 data qubits separately) and 9 Toffoli gates. Therefore, a total of 112 parameters are updated in the learning process.
Since QVM allows us to read the quantum states directly, the distribution of BAS images can be accessed exactly, as if measuring infinitely many times. The experimental results are illustrated in Figure 17. Here we define the accuracy as $N_{BAS}/N$, where $N$ represents the total number of generated images and $N_{BAS}$ represents the number of generated images that have BAS patterns. As shown in Table I, BQC outperforms state-of-the-art PQCs, where the accuracy of generating 2 × 2 and 3 × 3 BAS images is 99.96% and 98.65%, respectively.
How to learn a prior distribution $q(\lambda)$ efficiently and accurately is a critical topic in machine learning, e.g., learning the class priors in semi-supervised learning. Meanwhile, class priors are also important in learning from very sparse data and in developing binary classifiers that discriminate positive and unlabeled data [62,63]. To confirm the effectiveness of BQC in learning class prior distributions $q(\lambda)$ from given data, we devise a toy model. Specifically, the training data (referred to as the test data with unlabeled class in the above example) are sampled from a joint distribution $p(x, \lambda)$, i. [...] In the training process, we estimate two sets of targeted coefficients, i.e., $p(\lambda_1) = 0.7$, $p(\lambda_2) = 0.$ [...] II. The small variance is mainly caused by the fact that the limited parameters $\theta$ cannot approximate $N_1$ and $N_2$ well. We remark that different initial parameters have a subtle influence on the convergence, but the number of measurements determines whether the loss converges. Here QVM* stands for employing a general optimization method, while QVM employs the unbiased estimation optimization method.

VI. CONCLUSION AND DISCUSSION
In this paper, our first contribution is an evaluation of the expressive power of MPQCs, TPQCs, and classical neural networks. Using the entanglement entropy as the measure, we prove that MPQCs, TPQCs, long-range RBM, and DBM can efficiently simulate quantum states satisfying the volume law, which cannot be efficiently simulated by the short-range RBM. We next prove that MPQCs can efficiently simulate probability distributions generated by an IQP circuit. These distributions are difficult to simulate efficiently by classical neural networks unless the polynomial hierarchy collapses. We therefore see that MPQCs have stronger expressive power than TPQCs and classical neural networks.
Our second contribution is the proposal of BQC to accomplish Bayesian learning tasks. BQC is a special case of AD-MPQCs that can efficiently simulate the probability distribution generated by post-IQP circuits. It has stronger expressive power than MPQCs without ancillary qubits. In addition, the post-selection operation enables BQC to accomplish machine learning tasks without knowledge of the prior distributions. We perform two numerical simulations to validate the effectiveness of BQC. The first numerical simulation uses BQC to generate BAS images, in which BQC outperforms state-of-the-art PQCs. The second numerical simulation uses BQC to learn the class prior distribution, which is highly desirable for semi-supervised learning. The simulation results demonstrate that BQC can accurately estimate the prior distributions. These two tasks can be efficiently implemented on near-term quantum devices.
The parameterized quantum circuit (PQC) is a hybrid quantum-classical learning scheme that has accomplished various learning tasks using a limited number of quantum gates and a shallow circuit depth. With the benefit of strong expressive power and efficient implementation on near-term quantum devices, PQCs have the potential to tackle practical problems with quantum advantages. One future direction is to explore how to use PQCs to solve practical machine learning problems and to investigate whether the proposed quantum learning model can provide a definitive quantum advantage.

FIG. 8: A toy example to demonstrate that a probability distribution cannot be efficiently generated by TPQCs.

FIG. 11: The left panel illustrates an equivalent circuit described by the SWAP operation. The right panel shows the implementation of SWAP by two CNOT gates and one reversed CNOT gate.

FIG. 13: A general framework of AD-MPQC. The arrangement of quantum gates in each block is identical.

FIG. 17: The generative results obtained from DDQCL, QCBM, and our model. Since the BAS dataset can be regarded as a set of binary images, each image can be mapped to a distinct integer, shown on the x-axis of the figures. Figures (b), (c), and (e) are the generated results of 2 × 2 BAS images using QCBM, DDQCL, and BQC, respectively. Figures (d) and (f) are the generated results of 3 × 3 BAS images using QCBM and BQC, respectively.
Theorem 7. MPQCs can efficiently simulate any IQP circuit with $N$ qubits and $O(\mathrm{poly}(N))$ commuting gates, with at most $O(\mathrm{poly}(N))$ blocks, where each block contains no more than $7N$ single-qubit gates and $N - 1$ CNOT gates. Proof. A general IQP circuit is shown in Figure 9. Before