Maximising Quantum-Computing Expressive Power through Randomised Circuits

In the noisy intermediate-scale quantum era, variational quantum algorithms (VQAs) have emerged as a promising avenue to obtain quantum advantage. However, the success of VQAs depends on the expressive power of parameterised quantum circuits, which is constrained by the limited gate number and the presence of barren plateaus. In this work, we propose and numerically demonstrate a novel approach for VQAs, utilizing randomised quantum circuits to generate the variational wavefunction. We parameterize the distribution function of these random circuits using artificial neural networks and optimize it to find the solution. This random-circuit approach presents a trade-off between the expressive power of the variational wavefunction and time cost, in terms of the sampling cost of quantum circuits. Given a fixed gate number, we can systematically increase the expressive power by extending the quantum-computing time. With a sufficiently large permissible time cost, the variational wavefunction can approximate any quantum state with arbitrary accuracy. Furthermore, we establish explicit relationships between expressive power, time cost, and gate number for variational quantum eigensolvers. These results highlight the promising potential of the random-circuit approach in achieving a high expressive power in quantum computing.


I. INTRODUCTION
In quantum computing, a central issue is to solve problems using as few quantum gates as possible. The reason is that the accuracy of quantum gates is inevitably degraded by noise, causing the possibility of errors in each gate operation. Consequently, when the gate number increases, quantum computing becomes more and more unreliable, ultimately resulting in catastrophic failure. Therefore, the number of gates must be minimised for realising any quantum computing application. This issue is particularly relevant for noisy intermediate-scale quantum (NISQ) computers, in which large-scale quantum error correction is unavailable [1].
The success of VQAs depends on the expressive power of the variational wavefunction [36][37][38][39][40]. Since the variational wavefunction can only express a subset of all possible quantum states, a fundamental assumption is that the target quantum state is within this subset. Therefore, it is desirable to use a variational wavefunction with higher expressive power, capable of exploring a larger portion of the entire state space. By employing such a variational wavefunction, VQA has a better chance of finding the solution or attaining higher accuracy, as depicted in Fig. 2. To maximise the expressive power, we confront two challenges. First, regarding the practical implementation, the gate number limits the circuit size and the number of parameters. Second, highly expressive circuits usually resemble the unitary t-design, which can lead to difficulties during the training process due to the problem of vanishing gradients [41][42][43][44]. Despite the challenges, much attention has been devoted to designing circuits [45][46][47][48][49][50][51][52][53]. However, it remains largely unexplored that the expressive power can be improved by optimising the strategy of using circuits in the quantum-classical feedback loop.
In this work, we develop a new paradigm of utilising quantum circuits in VQAs, within which we can systematically increase the expressive power while the gate number stays the same. Instead of a deterministic quantum circuit, we consider a variational wavefunction that is the average of wavefunctions generated by random circuits, in the form $|\psi(\lambda)\rangle = \mathrm{E}[e^{i\gamma(\theta)}|\phi(\theta)\rangle]$. In contrast to deterministic-circuit VQAs, the distribution and phase functions are parameterised and optimised. Specifically, we propose to parameterise the distribution and phase with artificial neural networks (ANNs) [54]. In the spirit of VQAs, we solve problems by optimising the parameterised distribution and phase functions, as illustrated in Fig. 1(b).
The key feature of the random-circuit approach is the trade-off between the expressive power and the time cost. When the gate number is fixed, we can increase the expressive power with an enlarged time cost. This feature is because of the statistical error caused by introducing randomness. To suppress the statistical error, we must repeat the measurement more times than in the deterministic-circuit approach, resulting in an enlarged time cost. We propose methods to control this time cost. When we apply a strong constraint on the time cost, the distribution is close to a delta function. In this case, the time cost and expressive power of |ψ(λ)⟩ are almost the same as |ϕ(θ)⟩. When we apply a weak constraint on the time cost, distributions with more randomness are allowed. Then, the time cost is larger, and the expressive power becomes higher.
In the first part of this paper, we describe a general framework of VQAs using random circuits. Besides the framework, we also analyse the statistical error and introduce two methods for controlling the time cost. In the second part, we present a simple numerical demonstration taking the ground-state problem (i.e. the VQE algorithm) as an example. The numerical result illustrates the trade-off between expressive power and time cost. In the third part, we give a set of theorems to justify the performance and potential of the random-circuit approach.
The rigorous theoretical results are listed as follows. Firstly, in the low-cost limit, the expressive power of |ψ(λ)⟩ is not lower than |ϕ(θ)⟩. This result is obtained using a rectifier ANN [55][56][57] to parameterise the distribution function. Secondly, to justify the power-cost trade-off, we show that increasing the time cost always enlarges the expressive power. Thirdly, in the high-cost limit, we prove the universal approximation theorem of the random-circuit approach: For many finite-size ansatz circuits, we can approximate an arbitrary state to arbitrary accuracy as long as the time cost is sufficiently large. Fourthly, to have a concrete discussion about this trade-off feature, we specifically focus on the ground-state problem and take the Hamiltonian ansatz circuit [58] as an example. We obtain an upper bound of the error in approximating the ground state. We find that the error upper bound is a monotonic decreasing function of the time cost. Subject to a proper initial state, the error upper bound always vanishes in the limit of large time cost. Lastly, although VQAs are mainly developed for NISQ devices, we show the long-term potential of the random-circuit approach by analysing its performance with an increasing gate number. We prove that given a finite time cost, the error upper bound decreases with the gate number in the circuit. The gate number scales with the permissible error ϵ as $O(1/\epsilon^2)$.

II. VARIATIONAL QUANTUM-CIRCUIT MONTE CARLO
In this section, firstly, we introduce concepts necessary for understanding the random-circuit approach. Secondly, we give a general formalism of the approach. Finally, we analyse the statistical error and relate it to a quantity that can be evaluated with a quantum computer, and we introduce the methods for controlling the time cost.

A. Preliminaries
Our approach is called variational quantum-circuit Monte Carlo (VQCMC), which is inspired by quantum Monte Carlo in classical computing [59,60]. Quantum Monte Carlo methods combined with quantum computing have recently been proposed, including auxiliary-field Monte Carlo [61,62], Green's function Monte Carlo [62,63], variational Monte Carlo [62,64,65], full configuration interaction Monte Carlo [66,67] and stochastic series expansion Monte Carlo [68]. VQCMC consists of two main components: sample space and guiding function. In this section, we introduce the two components and present a way of parameterising the guiding function using an ANN.

Sample space
The sample space Ω is a set of quantum states from which we draw random samples. A quantum computer can prepare states in the form $|\phi(\theta)\rangle = U(\theta)|0\rangle^{\otimes n}$, where $|0\rangle^{\otimes n}$ is the initial state of n qubits, U(θ) is the unitary operator of a parameterised quantum circuit, and θ is a vector of parameters. An example of the parameterised circuit is illustrated in Fig. 1(b). For those familiar with conventional deterministic-circuit VQAs, U(θ) could be one of the prominent ansatz circuits, e.g. unitary coupled cluster ansatz [3], hardware-efficient ansatz [4], Hamiltonian ansatz [58], ADAPT-VQE [69], etc. All these states generated by the circuit form the sample space Ω = {|ϕ(θ)⟩ | θ ∈ Θ}, where Θ denotes the space of parameters ($\Theta \subseteq \mathbb{R}^K$ when the circuit has K real parameters).

Guiding function
The guiding function α(θ; λ) determines the distribution of states in the sample space. It also determines the phase factor $e^{i\gamma}$. In this work, we parameterise the guiding function with an ANN, as shown in Fig. 1(b). ANNs, consisting of interconnected neurons, have the ability to approximate a function with arbitrary accuracy [70][71][72]. Recently, they have been used to represent wavefunctions in classical computing [73][74][75][76][77][78]. In our case, the input to the network is the circuit parameter vector θ, and the output is a complex number α. The map from input to output depends on network parameters, represented by λ. Details of the ANN are given in Sec. II A 3. We solve problems by optimising λ.
The distribution of states |ϕ(θ)⟩ is described by a probability density function that is proportional to the magnitude of the guiding function, i.e. $P(\theta;\lambda) = |\alpha(\theta;\lambda)|/C(\lambda)$, where $C(\lambda) = \int d\theta\, |\alpha(\theta;\lambda)|$ is the normalisation factor. Note that θ is the random variable, and λ represents the parameters of the distribution. Given the guiding function, we can sample θ from the distribution using Markov chain Monte Carlo methods, e.g. the Metropolis-Hastings algorithm [79]. The phase factor is $e^{i\gamma(\theta;\lambda)} = \alpha(\theta;\lambda)/|\alpha(\theta;\lambda)|$. (3)
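The sampling step described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: the toy guiding function, the uniform symmetric proposal, the step size and the burn-in length are all assumptions made for the example. It draws θ from a density proportional to |α(θ)| with a Metropolis-Hastings chain and evaluates the phase factor α/|α|.

```python
import cmath
import math
import random


def guiding_function(theta):
    """Toy complex guiding function (hypothetical, for illustration only)."""
    return math.exp(-0.5 * theta * theta) * cmath.exp(1j * theta)


def metropolis_hastings(alpha, n_samples, step=2.0, theta0=0.0, burn_in=500, seed=0):
    """Draw samples of theta from the density proportional to |alpha(theta)|."""
    rng = random.Random(seed)
    theta = theta0
    weight = abs(alpha(theta))
    samples = []
    for iteration in range(burn_in + n_samples):
        proposal = theta + rng.uniform(-step, step)  # symmetric proposal
        new_weight = abs(alpha(proposal))
        # Accept with probability min(1, |alpha(proposal)| / |alpha(theta)|).
        if rng.random() * weight < new_weight:
            theta, weight = proposal, new_weight
        if iteration >= burn_in:
            samples.append(theta)
    return samples


def phase_factor(alpha, theta):
    """Phase factor alpha / |alpha| of the guiding function."""
    value = alpha(theta)
    return value / abs(value)


samples = metropolis_hastings(guiding_function, n_samples=20000)
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
```

For this toy choice, |α(θ)| is an unnormalised standard Gaussian, so the sample mean and variance should come out close to 0 and 1. Only ratios of |α| enter the acceptance rule, so the normalisation factor never has to be computed.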

Rectifier artificial neural network
We take the rectifier ANN as an example. A rectifier ANN is a feed-forward neural network with the rectifier activation function ReLU(z) = max{0, z} [80]. The activation function is an essential part of ANNs, giving them the ability to express nonlinear functions. Common activation functions include the rectifier, sigmoid and tanh functions. The rectifier is one of the most commonly used activation functions, and rectifier feed-forward neural networks are a potent tool in various applications owing to their rapid convergence, fast learning and simple expression, which simplifies calculations.
In the theoretical analysis, we take the rectifier ANN with only one hidden layer to parameterise the guiding function, see Fig. 3. It is straightforward to generalise to the case of multiple hidden layers and other activation functions. In the numerical experiment, we use a neural network with two hidden layers to have better training results. Without loss of generality, we suppose that the circuit parameter $\theta = (\vartheta_1, \vartheta_2, \ldots, \vartheta_K)^{\mathrm{T}}$ is a K-dimensional real vector (For clarity, we use the curly theta to denote the component of θ). Then the input layer has K neurons. Suppose the hidden layer has L neurons. The output of the hidden layer is an L-dimensional real vector h = ReLU(wθ + b). The output layer has only two neurons corresponding to the magnitude and phase of the guiding function α, respectively. Their outputs are two real numbers, $A = \sum_j w_{A,j} h_j + b_A$ and $B = \sum_j w_{B,j} h_j + b_B$, and the guiding function is $\alpha(\theta;\lambda) = F(\theta)\, W(A)\, e^{iB}$, where $\lambda = (w, b, w_A, b_A, w_B, b_B)$ denotes the network parameters, W(A) is a non-negative output function, and F(θ) is a prior guiding function (a complex-valued function in general).
We introduce the prior guiding function to incorporate knowledge/intuition about the target state, such as which states |ϕ(θ)⟩ may contribute more to the target state. We can also use the prior guiding function to confine θ to a subset of Θ, i.e. taking F(θ) = 0 for θ outside the subset. Without any intuition or confinement, we can take F(θ) = 1.
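A one-hidden-layer rectifier network of this kind can be sketched in a few lines. The sketch assumes that the two output neurons are affine in the hidden vector h and that they combine with the prior guiding function as α = F(θ) W(A) e^{iB}, with W(A) = e^{-A} and F(θ) = 1 as defaults; the function and argument names are hypothetical.

```python
import cmath
import math


def relu(z):
    """Rectifier activation function."""
    return max(0.0, z)


def guiding_function(theta, w, b, w_a, b_a, w_b, b_b,
                     output_fn=lambda a: math.exp(-a),
                     prior=lambda theta: 1.0):
    """Map a parameter vector theta (length K) to a complex number alpha.

    w is an L x K weight matrix and b a length-L bias vector of the hidden
    layer; (w_a, b_a) and (w_b, b_b) define the two output neurons A and B.
    The combination prior * output_fn(A) * exp(iB) is an assumption.
    """
    # Hidden layer: h = ReLU(w theta + b).
    hidden = [relu(sum(w_ji * t for w_ji, t in zip(row, theta)) + b_j)
              for row, b_j in zip(w, b)]
    # Output neuron A controls the magnitude, output neuron B the phase.
    out_a = sum(x * h for x, h in zip(w_a, hidden)) + b_a
    out_b = sum(x * h for x, h in zip(w_b, hidden)) + b_b
    return prior(theta) * output_fn(out_a) * cmath.exp(1j * out_b)


# Example with K = 1 and L = 2: the hidden layer computes ReLU(t) and ReLU(-t),
# so that A = |t| and B = t / 2.
alpha = guiding_function([2.0], w=[[1.0], [-1.0]], b=[0.0, 0.0],
                         w_a=[1.0, 1.0], b_a=0.0,
                         w_b=[0.5, -0.5], b_b=0.0)
```

In this example the magnitude of α is e^{-2} and its phase is 1 rad, which shows how the two output neurons separately shape the sampling density and the phase factor.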

B. General formalism
The core idea of VQCMC is expressing a quantum state as a weighted average of states in the sample space. Given the expression, we can evaluate the expected value of operators with the Monte Carlo method and then the loss function used to find optimal parameters.
In VQAs, we solve problems by optimising a variational wavefunction. In the deterministic-circuit approach, if U(θ) is the ansatz circuit, the state |ϕ(θ)⟩ is the variational wavefunction, i.e. we aim to find the solution in the state subset Ω.
In many VQAs, we determine the parameters of the variational wavefunction by minimising a loss function. In this work, we focus on VQAs in this category, including VQE, quantum approximate optimisation algorithm and quantum neural networks. However, we would like to note that the random-circuit approach can also be employed in variational quantum simulators that update parameters following certain differential equations.
We take VQE as an example, in which the loss function is the expected value of a Hamiltonian operator H. Because the variational wavefunction |ψ(λ)⟩ is unnormalised, the expected value of H reads $L(\lambda) = \langle\psi(\lambda)|H|\psi(\lambda)\rangle / \langle\psi(\lambda)|\psi(\lambda)\rangle$. By minimising the loss function, we can find the optimal |ψ(λ)⟩ to approximate the ground state of H.
In general, the loss function is a function of such expected values. We define the expected value of an operator O in the state |ψ(λ)⟩ as $\langle O\rangle(\lambda) = \langle\psi(\lambda)|O|\psi(\lambda)\rangle$, and the expected value in the normalised state as $\langle O\rangle(\lambda)/\langle\mathbb{1}\rangle(\lambda)$, where 𝟙 is the identity operator. With the notations, the loss function in VQE can be rewritten as $L(\lambda) = \langle H\rangle(\lambda)/\langle\mathbb{1}\rangle(\lambda)$.
Let $O_1, O_2, \ldots$ be a set of operators. The general loss function is a function of the expected values of these operators in the state |ψ(λ)⟩. We need to evaluate such loss functions to carry out VQAs.
Next, we present the method of evaluating an expected value ⟨O⟩(λ) by sampling states in Ω, and we also give a detailed pseudocode.

Given the expression of the variational wavefunction in Eq. (5), we can rewrite the expected value as $\langle O\rangle(\lambda) = \mathrm{E}[X_{\theta,\theta'}]$, where θ and θ′ are drawn independently from the distribution P(θ; λ). In this way, we have transformed the expected value in a quantum state into the expected value of the quantity $X_{\theta,\theta'} = e^{-i\gamma(\theta;\lambda)} e^{i\gamma(\theta';\lambda)} \langle\phi(\theta)|O|\phi(\theta')\rangle$. According to the Monte Carlo method, we can evaluate ⟨O⟩(λ) by computing the sample mean in randomly drawn (θ, θ′). We measure the quantity $X_{\theta,\theta'}$ on a quantum computer. We can use the Hadamard test [81] to measure such a quantity, including methods with and without an ancilla qubit. See Appendix A for a brief review of these methods. Regardless of which method we use, quantum computing outputs an estimate of $X_{\theta,\theta'}$, denoted by $\hat{X}_{\theta,\theta'}$. Due to the randomness in quantum measurement, $\hat{X}_{\theta,\theta'}$ is usually inexact. In what follows, we suppose that $\hat{X}_{\theta,\theta'}$ is unbiased and has a finite variance, which is true in the Hadamard test.
The pseudocode for evaluating ⟨O⟩(λ) is shown in Algorithm 1. The estimate $\widehat{\langle O\rangle}$ is the output value.
[Algorithm 1: evaluation of ⟨O⟩(λ). Only step 4 is recoverable: $\hat{X}_{\theta,\theta'} \leftarrow$ QuantumComputing(λ, θ, θ′, O).]
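The sampling procedure can be illustrated with a small classical simulation. The sketch below is not Algorithm 1 itself: it assumes a finite sample space of one-qubit states, assumes the quantity X is the phase-weighted matrix element ⟨ϕ(θ)|O|ϕ(θ′)⟩ with θ and θ′ drawn independently, and replaces the quantum subroutine with an exact classical evaluation (no shot noise). All function names are hypothetical.

```python
import cmath
import math
import random


def state(theta):
    """Toy one-qubit state cos(theta/2)|0> + sin(theta/2)|1>."""
    return (math.cos(theta / 2), math.sin(theta / 2))


def matrix_element(bra, op, ket):
    """<bra| op |ket> for a 2x2 operator given as nested tuples."""
    op_ket = [sum(op[r][c] * ket[c] for c in range(2)) for r in range(2)]
    return sum(complex(bra[r]).conjugate() * op_ket[r] for r in range(2))


def quantum_computing(gamma, theta, theta_p, op):
    """Classical stand-in for the quantum subroutine: returns X exactly."""
    phase = cmath.exp(-1j * gamma(theta)) * cmath.exp(1j * gamma(theta_p))
    return phase * matrix_element(state(theta), op, state(theta_p))


def estimate_expectation(thetas, probs, gamma, op, n_samples, seed=0):
    """Sample mean of Re X over independently drawn pairs (theta, theta')."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        theta, theta_p = rng.choices(thetas, weights=probs, k=2)
        total += quantum_computing(gamma, theta, theta_p, op).real
    return total / n_samples


def exact_expectation(thetas, probs, gamma, op):
    """<psi|op|psi> for the unnormalised |psi> = sum_k P_k e^{i gamma_k} |phi_k>."""
    psi = [sum(p * cmath.exp(1j * gamma(t)) * state(t)[r]
               for t, p in zip(thetas, probs)) for r in range(2)]
    return matrix_element(psi, op, psi).real


PAULI_Z = ((1, 0), (0, -1))
thetas = [0.0, math.pi / 2, math.pi]
probs = [0.5, 0.3, 0.2]
gamma = lambda t: 0.4 * t  # hypothetical phase function

estimate = estimate_expectation(thetas, probs, gamma, PAULI_Z, n_samples=20000)
exact = exact_expectation(thetas, probs, gamma, PAULI_Z)
```

Because each sample has magnitude at most ∥O∥ = 1, the statistical error of the sample mean here is of order 1/√M, so `estimate` and `exact` should agree to about two decimal places.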

C. Statistical errors
There are two sources of statistical errors: quantum measurement and randomised circuits. We will show that both of them can be suppressed by taking a large sample size M (see Algorithm 1). We will also show that statistical errors are amplified when ⟨𝟙⟩(λ) is small. Therefore, we have to take a larger sample size (i.e. time cost) when ⟨𝟙⟩(λ) is smaller. Through ⟨𝟙⟩(λ), we can control the time cost.
For the first error source, when we use a quantum circuit to measure the quantity $X_{\theta,\theta'}$, the measurement outcome in each run of the circuit is random. To reduce this statistical error due to quantum measurement, we can repeat the measurement. For example, in the Hadamard test, we take $\hat{X}_{\theta,\theta'}$ as the mean of outcomes in the repeated measurements. Then, the variance of $\hat{X}_{\theta,\theta'}$ decreases with the number of measurements, denoted by $M_Q$. In the ancilla-qubit Hadamard test, suppose O is a unitary Hermitian operator; then the variance of $\hat{X}_{\theta,\theta'}$ has an upper bound $\sigma_O^2$ that is inversely proportional to $M_Q$. The result is similar for a general operator, for which $\sigma_O^2$ depends on properties of O and details of the measurement protocol; see Appendix A.
For the second error source, ⟨O⟩ is the mean of the random variable $X_{\theta,\theta'}$, which fluctuates.
Overall, the variance of $\widehat{\langle O\rangle}$ has two terms and is proportional to 1/M: $\mathrm{Var}(\widehat{\langle O\rangle}) = (A + B)/M$, where $A = \mathrm{E}_{\theta,\theta'}[\mathrm{Var}(\hat{X}_{\theta,\theta'})]$ and $B = \mathrm{Var}(X_{\theta,\theta'})$. Here, A and B correspond to the first and second error sources, respectively.
Let ∥O∥ be the spectral norm of O. Then, $B \le \|O\|^2$, and the overall variance has the upper bound $\mathrm{Var}(\widehat{\langle O\rangle}) \le (\sigma_O^2 + \|O\|^2)/M$. When $\sigma_O^2 \propto 1/M_Q$, it is optimal to take $M_Q = 1$ to minimise the total number of measurements $M M_Q$ required for achieving a certain value of the variance.
To complete the discussion on statistical properties of $\widehat{\langle O\rangle}$, we note that when $\hat{X}_{\theta,\theta'}$ is unbiased, $\widehat{\langle O\rangle}$ is also unbiased.
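The statement that a single shot per sampled circuit pair is optimal can be checked with a few lines of arithmetic. The sketch assumes a variance bound of the form (σ²/M_Q + ∥O∥²)/M, where σ² is a hypothetical single-shot variance of the quantum measurement; the function names are illustrative.

```python
def required_samples(target_variance, norm_o, sigma2_single_shot, m_q):
    """Sample size M at which (sigma^2 / M_Q + ||O||^2) / M equals the target."""
    return (sigma2_single_shot / m_q + norm_o ** 2) / target_variance


def total_measurements(target_variance, norm_o, sigma2_single_shot, m_q):
    """Total number of circuit runs M * M_Q needed to reach the target variance."""
    return m_q * required_samples(target_variance, norm_o, sigma2_single_shot, m_q)


# Total cost for M_Q = 1, ..., 10 at a fixed target variance.
costs = [total_measurements(1e-4, norm_o=1.0, sigma2_single_shot=1.0, m_q=m_q)
         for m_q in range(1, 11)]
best_m_q = 1 + costs.index(min(costs))
```

Under this assumption the total cost equals (σ² + ∥O∥² M_Q)/target, which grows linearly with M_Q: repeating the measurement on the same circuit pair only reduces the first error source, whereas a fresh pair reduces both.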

Sign problem
In classical computing, quantum Monte Carlo suffers from the sign problem. It arises when the probability distribution sampled in the Monte Carlo simulation has a complex phase, resulting in a cancellation of positive and negative contributions. VQCMC has a similar problem: when the phase of $X_{\theta,\theta'}$ is oscillatory, the absolute value of ⟨O⟩ is small; then the relative error is large. The large relative error can lead to a large absolute error when evaluating expected values in the normalised state, ⟨O⟩/⟨𝟙⟩.
Let $\delta_O = \widehat{\langle O\rangle} - \langle O\rangle$ be the error in $\widehat{\langle O\rangle}$. To first order in the errors, the error in evaluating the normalised expected value is $\widehat{\langle O\rangle}/\widehat{\langle\mathbb{1}\rangle} - \langle O\rangle/\langle\mathbb{1}\rangle \simeq \frac{1}{\langle\mathbb{1}\rangle}\left(\delta_O - \frac{\langle O\rangle}{\langle\mathbb{1}\rangle}\delta_{\mathbb{1}}\right)$. We can find that the error is amplified by the factor of 1/⟨𝟙⟩. To suppress the error, we have to take a sufficiently large M, such that $\delta_O$ and $\delta_{\mathbb{1}}$ are sufficiently small compared with ⟨𝟙⟩. Therefore, when ⟨𝟙⟩ is small, the sample size is large. This observation is summarised in the following theorem, and the proof is in Appendix B.
Theorem 1.Let ε and κ be any positive numbers.When the sample size satisfies where the statistical error is smaller than ε with the probability
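The amplification of the statistical error by 1/⟨𝟙⟩ discussed above can be checked numerically. The sketch below compares the exact error of the ratio estimator with its first-order expansion for hypothetical values of the expected values and of the errors; nothing in it depends on the specific constants of Theorem 1.

```python
def normalised_error(o, one, delta_o, delta_one):
    """Exact error of the ratio (o + delta_o) / (one + delta_one) relative to o / one."""
    return (o + delta_o) / (one + delta_one) - o / one


def first_order_error(o, one, delta_o, delta_one):
    """First-order expansion: (delta_o - (o / one) * delta_one) / one."""
    return (delta_o - (o / one) * delta_one) / one


# Same normalised energy (-0.5) and same absolute errors, two normalisation factors.
error_large_norm = normalised_error(-0.5, 1.0, 1e-3, 1e-3)
error_small_norm = normalised_error(-0.05, 0.1, 1e-3, 1e-3)
amplification = error_small_norm / error_large_norm
```

Reducing ⟨𝟙⟩ from 1 to 0.1 at fixed δ_O and δ_𝟙 increases the error in the normalised expected value by roughly a factor of ten, which is why the sample size has to grow as 1/⟨𝟙⟩² to keep the error fixed.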

Modified loss functions and time-cost control
According to Theorem 1, we can control the time cost (i.e. the sample size) by confining the variational wavefunction in the region where ⟨𝟙⟩ is large. Here, we propose two methods for this purpose. The first method is generic for all VQAs that work through minimising a loss function. The second method is specific to VQE. We will give a rigorous theoretical justification of the second method.
In the first method, we modify the loss function and take $L'(\lambda) = L(\lambda) - z\tanh\left(\frac{\langle\mathbb{1}\rangle(\lambda) - x}{y}\right)$, where the tanh function is added to the raw loss function.
The tanh function plays the role of a barrier at ⟨𝟙⟩(λ) = x (see Fig. 4): In the vicinity of the barrier, the loss function increases rapidly (y determines how rapidly) when ⟨𝟙⟩(λ) decreases. The height of the barrier is determined by z. With the tanh function, λ prefers to stay where ⟨𝟙⟩ > x, thus controlling the time cost. Some other functions, e.g. the rectifier and sigmoid functions, can be used in a similar way (After a preliminary numerical test, we find that the tanh may perform better than the other two).
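The barrier can be written down directly. The sketch assumes the barrier term has the form −z tanh((⟨𝟙⟩ − x)/y) and is simply added to the raw loss; the numerical values of x, y and z below are illustrative only.

```python
import math


def barrier(norm, x, y, z):
    """Barrier term -z * tanh((norm - x) / y), which is large when norm < x."""
    return -z * math.tanh((norm - x) / y)


def barrier_loss(raw_loss, norm, x, y, z):
    """Raw loss plus the tanh barrier located at norm = x."""
    return raw_loss + barrier(norm, x, y, z)


# With a sharp barrier (small y), the penalty jumps by about 2z across norm = x.
below = barrier(0.1, x=0.5, y=0.05, z=5.0)
above = barrier(0.9, x=0.5, y=0.05, z=5.0)
```

The term vanishes at ⟨𝟙⟩ = x, approaches +z below the barrier and −z above it, so minimising the modified loss pushes λ towards the region ⟨𝟙⟩ > x. A larger y makes the transition smoother and easier for gradient-based optimisers.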
In the second method, we regularise the loss function of VQE by adding positive constants to the denominator and numerator, i.e. $L''(\lambda) = \frac{\langle H\rangle(\lambda) + \eta_H}{\langle\mathbb{1}\rangle(\lambda) + \eta_{\mathbb{1}}}$.
Now we explain why such a loss function can prevent ⟨𝟙⟩ from vanishing. The exact value of ⟨𝟙⟩ is always positive. Adding $\eta_{\mathbb{1}}$ to the denominator makes sure that the denominator is positive (with a certain probability) even with the presence of statistical error, noticing that the estimate $\widehat{\langle\mathbb{1}\rangle}$ may be negative due to the statistical error. When the denominator is positive, adding $\eta_H$ to the numerator increases the value of the loss function. The increment is larger when the denominator is smaller. Therefore, minimising the modified loss function can prevent the denominator from vanishing.
By taking proper values of $\eta_{\mathbb{1}}$ and $\eta_H$, the modified loss function L′′(λ) has two good properties. First, it is variational, i.e. $L''(\lambda) \ge E_g$, where $E_g$ is the ground-state energy of H. Therefore, minimising L′′(λ) always leads to a better result of the ground-state energy. The estimator of the modified loss function inherits this property. Second, we have an analytical upper bound of the statistical error in the energy (The raw loss function $L(\lambda) = \langle H\rangle(\lambda)/\langle\mathbb{1}\rangle(\lambda)$ is the expected value of the energy in the normalised state). These two properties are summarised in the following theorem, and the proof is in Appendix C.
Theorem 2. Let κ be any positive number.Take An estimate of the modified loss function with a probability of at least 1 − 2κ.
We can find that the estimator of the modified loss function is variational when the true ground-state energy $E_g$ is negative. The negativity condition can always be satisfied by subtracting a sufficiently large positive constant from the Hamiltonian, i.e. taking $H \leftarrow H - E_{\mathrm{const}}$. When $E_g$ is negative, $\hat{L}'' \ge E_g$ holds in the entire parameter space up to a controllable probability (Notice that $\langle H\rangle(\lambda)/\langle\mathbb{1}\rangle(\lambda) \ge E_g$).
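The variational property of the regularised loss can be illustrated with exact (noise-free) values. The sketch assumes the loss has the form (⟨H⟩ + η_H)/(⟨𝟙⟩ + η_𝟙) with positive constants; the values of η_H and η_𝟙 below are hypothetical and are not the ones prescribed by Theorem 2.

```python
def regularised_loss(h_value, one_value, eta_h, eta_one):
    """Modified VQE loss (h_value + eta_h) / (one_value + eta_one)."""
    return (h_value + eta_h) / (one_value + eta_one)


E_GROUND = -1.0           # negative ground-state energy
ETA_H, ETA_ONE = 0.05, 0.1

# Scan normalised energies e >= E_g and normalisation factors in (0, 1].
losses = []
for energy_step in range(0, 21):
    energy = E_GROUND + 0.1 * energy_step
    for norm_step in range(1, 11):
        norm = 0.1 * norm_step
        losses.append(regularised_loss(energy * norm, norm, ETA_H, ETA_ONE))

# Increment over the raw energy e = -0.9 for a small and a large normalisation factor.
increment_small_norm = regularised_loss(-0.9 * 0.1, 0.1, ETA_H, ETA_ONE) - (-0.9)
increment_large_norm = regularised_loss(-0.9 * 1.0, 1.0, ETA_H, ETA_ONE) - (-0.9)
```

Every value in `losses` stays at or above E_g, because ⟨H⟩ ≥ E_g⟨𝟙⟩ and η_H ≥ 0 > E_g η_𝟙 when E_g is negative. The increment over the raw energy is larger for the smaller normalisation factor, which is the mechanism that keeps the denominator away from zero.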

III. NUMERICAL DEMONSTRATION
In this section, we demonstrate the random-circuit approach by numerical simulations. We take VQE as an example of VQAs and solve the ground state of an anti-ferromagnetic Heisenberg model. We choose the barrier method to control the time cost, and we illustrate the trade-off between expressive power and time cost.
The anti-ferromagnetic Heisenberg model is on a randomly generated graph, as shown in Fig. 5. On the graph, each vertex represents a spin-1/2 particle, and each edge represents the interaction between the two spins. The Hamiltonian reads $H = J\sum_{\langle i,j\rangle}(X_i X_j + Y_i Y_j + Z_i Z_j)$, where $X_i$, $Y_i$ and $Z_i$ denote Pauli operators of the ith qubit, J is the coupling strength, and ⟨i, j⟩ denotes two connected spins on the graph. We take the coupling strength J such that the Hamiltonian is normalised by the spectral norm, i.e. ∥H∥ = 1.
We parameterise the circuit with a simplified version of the Hamiltonian ansatz. The circuit has only two parameters, $\theta_{XY}$ and $\theta_Z$. In the circuit, $N_T$ denotes the number of gate layers, and $R_0$ prepares the pairwise singlet state, in which $(|01\rangle - |10\rangle)/\sqrt{2}$ is the singlet state of spins i and j. With this simplified Hamiltonian ansatz circuit, we can plot the energy landscape as shown in Fig. 6. This energy landscape has undesired properties. First, it has a rugged surface. When using gradient descent and randomly choosing the initial point, we find that the point easily falls into a local minimum. Second, even for the global minimum, its error in the energy is about 0.05 (The exact ground-state energy is −1, and the global minimum is about −0.95). In other words, the expressive power of the ansatz circuit is insufficient for approximating the ground state to achieve an error smaller than 0.05. Therefore, the performance of the ansatz circuit is poor in the deterministic-circuit approach.
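As a small sanity check of the building blocks above, the following sketch evaluates the energy of a single bond in the singlet state. It assumes that each edge contributes X_iX_j + Y_iY_j + Z_iZ_j and sets J = 1; the dense-matrix helpers are written for this two-qubit example only.

```python
import math

PAULI_X = [[0, 1], [1, 0]]
PAULI_Y = [[0, -1j], [1j, 0]]
PAULI_Z = [[1, 0], [0, -1]]


def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]


def add(*matrices):
    """Entry-wise sum of square matrices of equal size."""
    size = len(matrices[0])
    return [[sum(mat[i][j] for mat in matrices) for j in range(size)]
            for i in range(size)]


def expectation(op, vec):
    """Real part of <vec| op |vec> for a normalised state vector."""
    op_vec = [sum(op[i][j] * vec[j] for j in range(len(vec)))
              for i in range(len(vec))]
    return sum(complex(v).conjugate() * w for v, w in zip(vec, op_vec)).real


# One antiferromagnetic bond with J = 1: XX + YY + ZZ.
bond = add(kron(PAULI_X, PAULI_X), kron(PAULI_Y, PAULI_Y), kron(PAULI_Z, PAULI_Z))

# Basis order |00>, |01>, |10>, |11>.
singlet = [0.0, 1 / math.sqrt(2), -1 / math.sqrt(2), 0.0]
aligned = [1.0, 0.0, 0.0, 0.0]

singlet_energy = expectation(bond, singlet)
aligned_energy = expectation(bond, aligned)
```

The singlet gives −3 per bond and the aligned state gives +1, which is why a product of pairwise singlets is a natural initial state for an antiferromagnetic model, even though it is not the ground state on a general graph.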
In the random-circuit approach, we can solve the ground state to a satisfactory accuracy even with the poor-performance ansatz circuit. By controlling the time cost, we can systematically increase the expressive power and approach the exact solution with a sufficiently large time cost.
In the random-circuit approach, we parameterise the guiding function using a rectifier ANN with two hidden layers, each consisting of 200 neurons. We take $\theta = (\theta_{XY}, \theta_Z)$ as the input to the ANN. For the output function, we take $W(A) = e^{-A}$. We choose the prior guiding function $F(\theta) = \sum_{i=1}^{N_\theta}\delta(\theta - \theta_i)$, where $\theta_i$ are $N_\theta = 100$ uniformly generated points in the parameter space. With this prior guiding function, we effectively utilise a finite sample space to simplify the numerical simulation.
Implementing the barrier method requires a proper initial value of the parameter λ. If the initial λ is randomly chosen in the parameter space, the corresponding value of ⟨𝟙⟩(λ) may be small and violate the ⟨𝟙⟩(λ) > x restriction. In this case, the statistical error can be large and cause problems in the initial stage of training. To avoid this issue, we have to choose an initial value satisfying ⟨𝟙⟩(λ) > x. We give two methods of doing this. First, we can take λ such that $P(\theta;\lambda) \approx \delta(\theta - \theta_0)$ is a delta-function distribution. For such a distribution, ⟨𝟙⟩(λ) ≈ 1. In Sec. IV A, we will show how to take λ to approximate the delta-function distribution by the ANN. Second, we can employ two-stage training, as shown in Fig. 7. In the first stage, we can take $L_b(\lambda) = -z\tanh\left(\frac{\langle\mathbb{1}\rangle(\lambda) - x}{y}\right)$ as the loss function to find a value of λ satisfying ⟨𝟙⟩(λ) > x. This tanh loss function is robust to the statistical error in ⟨𝟙⟩(λ) even when ⟨𝟙⟩(λ) is small [in contrast to L(λ), in which ⟨𝟙⟩(λ) is the denominator]. In the second stage, we take the value λ determined in the first stage as the initial value and minimise L′(λ). In both stages, we take z = 5, y = 1, and we use a gradient descent algorithm Adam to minimise loss functions. We obtain our numerical results with the two-stage method.
We set the barrier at x = 0.1, 0.2, . . ., 0.9. Then, for each x, the denominator ⟨𝟙⟩(λ) is confined in the ⟨𝟙⟩(λ) > x regime. For each value of x, we repeat the numerical experiment ten times. The results are shown in Fig. 8. We can find that the error decreases when we reduce the value of x, which illustrates the power-cost trade-off.

IV. THEORETICAL RESULTS ON THE EXPRESSIVE POWER
In this section, we present a few theoretical results on the expressive power of the random-circuit approach. First of all, let us introduce the ways of characterising expressive power and time cost.
We characterise the expressive power in two straightforward ways: the subset of variational wavefunctions and the error in the ground-state energy. Discussions based on the subset are general for various VQA tasks. When focusing on VQE, we employ the energy error.
We define the variational-wavefunction subset of |ψ(λ)⟩ as $V = \{ |\psi(\lambda)\rangle/\sqrt{\langle\mathbb{1}\rangle(\lambda)} \mid \lambda \in \Lambda \}$, where Λ denotes the space of parameters. Notice that states in this subset are normalised. The normalisation factor ⟨𝟙⟩(λ) is related to the sample size M (see Theorem 1). Given a finite permitted run time, the sample size is finite, and we can only utilise a subset of V in VQAs: We cannot evaluate a state to an adequate accuracy when ⟨𝟙⟩(λ) is too small. To reflect this time-cost constraint, we define $V_x = \{ |\psi(\lambda)\rangle/\sqrt{\langle\mathbb{1}\rangle(\lambda)} \mid \lambda \in \Lambda,\ \langle\mathbb{1}\rangle(\lambda) \ge x \}$. For a larger subset $V_x$, its expressive power is higher.
As shown in Appendix D, there is a simple relation between the variational-wavefunction subset and the covering number of the hypothesis space (a measure of expressive power).
For the variational-wavefunction subset $V_x$, we define the minimum error in the ground-state energy as $\epsilon_g(V_x) = \min_{|\psi\rangle \in V_x} \langle\psi|H|\psi\rangle - E_g$. The aim of VQE is to find the ground state of a Hamiltonian. The error $\epsilon_g$ describes how well the variational wavefunction can approximate the ground state.
We characterise the time cost with x. According to Theorem 1, the sample size M is proportional to $1/\langle\mathbb{1}\rangle(\lambda)^2$. When a larger M is taken, we can search for the solution in a subset $V_x$ with a smaller x. Specifically, given the sample size M, an upper bound of the statistical error ε and failure probability 2κ, we can search for the solution in $V_x$ with $x = \chi/(\varepsilon\sqrt{\kappa M})$. In short, when the time cost is larger, x is smaller, and vice versa.
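The relation between the sample size and the accessible subset can be turned into a small helper. The sketch takes the relation x = χ/(ε√(κM)) stated above at face value and treats χ (defined in Theorem 1) as a given input; the numerical values are hypothetical.

```python
import math


def accessible_threshold(chi, epsilon, kappa, sample_size):
    """Smallest normalisation factor x that can be handled with M samples."""
    return chi / (epsilon * math.sqrt(kappa * sample_size))


def required_sample_size(chi, epsilon, kappa, x):
    """Sample size M needed to search in V_x (inverse of the relation above)."""
    return (chi / (epsilon * x)) ** 2 / kappa


x_small_budget = accessible_threshold(chi=1.0, epsilon=0.05, kappa=0.05, sample_size=1e5)
x_large_budget = accessible_threshold(chi=1.0, epsilon=0.05, kappa=0.05, sample_size=4e5)
```

Quadrupling the sample size halves x, so each further enlargement of the accessible subset costs quadratically more quantum-computing time.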

A. Low-cost limit
The time cost of evaluating a state in the random-circuit approach is determined by the normalisation factor ⟨𝟙⟩(λ). The largest value that ⟨𝟙⟩(λ) can take is one, i.e. ⟨𝟙⟩(λ) = 1 corresponds to the low-cost limit.
Our first theoretical result is that Ω is a subset of the closure of $V_x$ (for all x < 1). In other words, we can express all states |ϕ(θ)⟩ ∈ Ω to arbitrary accuracy with the random-circuit variational wavefunction |ψ(λ)⟩, and the normalisation factor ⟨𝟙⟩(λ) approaches one. Therefore, the expressive power of |ψ(λ)⟩ in the low-cost limit is not lower than |ϕ(θ)⟩.
We assume that |ϕ(θ)⟩ has a finite gradient with respect to θ. Let n be a unit vector in the space of θ. We assume there exists a positive number ξ such that for all n and θ, $\| \mathbf{n}\cdot\nabla_\theta |\phi(\theta)\rangle \|_2 \le \xi$. Here, $\|\cdot\|_2$ denotes the ℓ2 norm. This assumption is true for all ansatz circuits to the best of our knowledge. An example is the case in which each parameter component $\vartheta_i$ is the angle of a rotation gate $e^{-i\sigma_i\vartheta_i}$. Under this assumption, we have the following theorem.
Theorem 3. Suppose the guiding function α is parameterised as a rectifier ANN, the hidden layer has L ≥ 2K neurons, $W(A) = e^{-A}$, and
The proof is given in Appendix E. To approximate |ϕ(θ)⟩ with the random-circuit variational wavefunction, we consider values of ANN parameters as follows [see Fig. 9(a); we use θ′ to denote the input to the neural network for clarity]: i) For the jth hidden neuron, we take $w_{j,i} = \delta_{j,2i-1} - \delta_{j,2i}$ and $b_j = -\sum_i (\delta_{j,2i-1} - \delta_{j,2i})\vartheta_i$; ii) For the output neuron A, we take $w_{A,j} = \tau$ and $b_A = K\ln(2/\tau)$; and iii) For the output neuron B, we take $w_{B,j} = b_B = 0$. With these parameters, hidden neurons realise the modulus calculation, i.e. $h_{2i-1} + h_{2i} = |\vartheta'_i - \vartheta_i|$.
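The parameter choice i) and ii) can be checked numerically. The sketch below builds the hidden-layer weights for a target point ϑ, evaluates W(A) = e^{-A} at an input θ′, and compares it with the product of Laplace densities (τ/2)^K exp(−τ Σ_i |ϑ′_i − ϑ_i|), which concentrates around ϑ as τ grows. Function names are hypothetical, and indices are zero-based in the code.

```python
import math


def hidden_layer_parameters(target):
    """Weights and biases with two hidden neurons per component of the target."""
    k = len(target)
    w = [[0.0] * k for _ in range(2 * k)]
    b = [0.0] * (2 * k)
    for i, t in enumerate(target):
        w[2 * i][i], b[2 * i] = 1.0, -t          # ReLU(theta'_i - theta_i)
        w[2 * i + 1][i], b[2 * i + 1] = -1.0, t  # ReLU(theta_i - theta'_i)
    return w, b


def magnitude(theta_in, target, tau):
    """W(A) = exp(-A) with w_A = tau on every hidden neuron and b_A = K ln(2/tau)."""
    w, b = hidden_layer_parameters(target)
    hidden = [max(0.0, sum(w_ji * x for w_ji, x in zip(row, theta_in)) + b_j)
              for row, b_j in zip(w, b)]
    out_a = tau * sum(hidden) + len(target) * math.log(2.0 / tau)
    return math.exp(-out_a)


value = magnitude([0.3, -1.0], target=[0.5, 0.2], tau=4.0)
laplace = (4.0 / 2.0) ** 2 * math.exp(-4.0 * (abs(0.3 - 0.5) + abs(-1.0 - 0.2)))

# Riemann sum of the one-dimensional density over [-10, 10] with tau = 3.
grid_step = 0.001
total = sum(magnitude([-10.0 + n * grid_step], [0.0], 3.0) * grid_step
            for n in range(20001))
```

The density integrates to one for every τ (assuming θ′ ranges over the whole real line) and approaches a delta function at ϑ in the limit of large τ, which is the low-cost limit ⟨𝟙⟩(λ) → 1.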

B. Power-cost trade-off
Our second theoretical result is a simple observation that justifies the trade-off between expressive power and time cost. The following proposition is straightforward, and we present it without proof. It shows that the variational-wavefunction subset $V_x$ enlarges monotonically when x decreases. Therefore, the expressive power increases monotonically with the time cost.

C. High-cost limit: Universal approximation theorem
Now, we discuss the extreme case of expressive power in the random-circuit approach. We ask whether all states in the Hilbert space can be approximated with the random-circuit variational wavefunction when an arbitrarily large time cost is allowed.
Suppose Ω is a spanning set of the entire Hilbert space; then all states can be expressed in the form $|\varphi\rangle = \int d\theta\, \beta(\theta)|\phi(\theta)\rangle$. If we can approximate β(θ) with the guiding function α(θ; λ), we can approximate |φ⟩ with the variational wavefunction |ψ(λ)⟩. Owing to the universal approximation theorem of ANNs, we can approximate an arbitrary continuous function to arbitrary accuracy [82]. Therefore, we have the following theorem. The proof is in Appendix F.
Theorem 4. Suppose the guiding function α is parameterised as a rectifier neural network, the hidden layer has L neurons, $W(A) = e^{-A}$, and F(θ) = 1. Under conditions i) Ω is a spanning set of the Hilbert space H, ii) the circuit parameter space Θ is compact, and iii) |ϕ(θ)⟩ has a finite gradient with respect to θ, for all normalised states |φ⟩ ∈ H and ϵ > 0, there exist L ∈ N and λ ∈ Λ such that the variational wavefunction |ψ(λ)⟩ approximates |φ⟩ with an error smaller than ϵ.
The compact condition holds when each component is a rotation angle of a single-qubit gate, i.e. its value is in the interval [0, 2π]. The spanning-set condition holds for many ansatz circuits, for example, a circuit with one layer of single-qubit rotation gates on the initial state $|0\rangle^{\otimes n}$.

D. Hamiltonian ansatz: Energy error versus time cost
In what follows, we discuss the trade-off feature taking the ground-state problem and the Hamiltonian ansatz as an example. We take the error in the ground-state energy $\epsilon_g(V_x)$ as the measure of the expressive power. We show that 1) an upper bound of $\epsilon_g(V_x)$ decreases with 1/x; 2) the error upper bound approaches zero in the limit 1/x → ∞. It is noteworthy that the error upper bound approaches zero for Hamiltonian ansatz with any circuit depth.

Hamiltonian ansatz
The Hamiltonian ansatz is a parameterised Trotterisation circuit [58]. Suppose that $H = \sum_{j=1}^{N_H} h_j\sigma_j$, where $\sigma_j$ are Pauli operators, $h_j$ are real coefficients, and $N_H$ is the number of terms. The Hamiltonian ansatz reads $U(\theta) = \left[\prod_{i=1}^{N_T}\prod_{j=1}^{N_H} e^{-i\omega_{i,j}\sigma_j}\right] R_0$, where $R_0$ is a unitary operator that prepares the initial state $|\Psi_0\rangle = R_0|0\rangle^{\otimes n}$, $\prod_{j=1}^{N_H} e^{-i\omega_{i,j}\sigma_j}$ is the unitary operator of one Trotter step, $\omega_{i,j}$ is the angle of the jth rotation gate in the ith Trotter step, and $N_T$ is the number of Trotter steps. We take the circuit parameter vector as $\theta = (t, \omega_1, \omega_2, \ldots, \omega_{N_T})$, in which t is a redundant real parameter. The usage of the redundant parameter will be shown later.

ANN configuration and prior guiding function
We use a rectifier ANN with L ≥ 2 hidden-layer neurons to parameterise the guiding function, and we take We choose a non-trivial prior guiding function F (θ) inspired by the Pauli-operator-expansion formula of real-time evolution [83]; See Appendix G.The prior guiding function has the following property: To approximate the ground state, we consider values of ANN parameters as follows [see Fig. 9(b)]: i) We suppose that ϑ 1 = t is the first input neuron; ii) For the jth hidden neuron, we take w j,i = (δ j,1 − δ j,2 )δ i,1 N T and b j = 0, such that only the first two hidden neurons are non-trivial; iii) For the output neuron A, we take w A,j = (δ j,1 − δ j,2 )τ −1 and b A = 0; and iv) For the output neuron B, we take w B,j = (δ j,1 − δ j,2 )E g and b B = 0. Here, τ is a positive number.
With the above configuration of parameters, the outputs are $A = \tau^{-1} N_T t$ and $B = E_g N_T t$. Then, the corresponding variational wavefunction is The operator $e^{-\frac{1}{2}(H - E_g)^2\tau^2}$ is a result of the integral over $t$ [84], and it projects the initial state onto the ground state in the limit $\tau \to +\infty$.
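The statement that a Gaussian operator arises from an integral over $t$ can be checked numerically with the standard Gaussian Fourier identity $e^{-\frac{1}{2}(H-E_g)^2\tau^2} = \int dt\, \frac{e^{-t^2/(2\tau^2)}}{\sqrt{2\pi}\,\tau}\, e^{-i(H-E_g)t}$. The sketch below is an illustration under assumptions: the normalisation of the Gaussian weight, the random $4 \times 4$ test matrix and the quadrature grid are choices made here and need not match the conventions of the guiding function in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 4))
H = (a + a.T) / 2                      # random real symmetric "Hamiltonian"
evals, evecs = np.linalg.eigh(H)
E_g = evals[0]                         # ground-state energy
shift = evals - E_g
tau = 1.5

# Left-hand side: the Gaussian operator built directly in the eigenbasis.
lhs = (evecs * np.exp(-0.5 * shift**2 * tau**2)) @ evecs.T

# Right-hand side: Gaussian-weighted integral of real-time evolution,
# evaluated with a plain Riemann sum on a uniform grid in t.
t = np.linspace(-10 * tau, 10 * tau, 4001)
dt = t[1] - t[0]
weights = np.exp(-t**2 / (2 * tau**2)) / (np.sqrt(2 * np.pi) * tau)
phases = np.exp(-1j * np.outer(t, shift))          # e^{-i (E_k - E_g) t}
spectral_filter = (weights[:, None] * phases).sum(axis=0) * dt
rhs = (evecs * spectral_filter) @ evecs.T

max_diff = np.max(np.abs(lhs - rhs))
```

Here `max_diff` measures how closely the time integral reproduces the Gaussian operator for this test matrix.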

Projection onto the ground state
The operator $e^{-\frac{1}{2}(H - E_g)^2\tau^2}$ partially projects the initial state onto the ground state when $\tau$ is finite. Let $\Delta$ be the energy gap between the ground state $|\Psi_g\rangle$ and the first excited state. When $\Delta \geq (\sqrt{2}\tau)^{-1}$, the error in the energy decreases exponentially with $\tau$; otherwise, the error is inversely proportional to $\tau$.

Lemma 1. Let $E(\lambda)$ be the energy of the state in Eq. (41). Then, the error in the energy has the upper bound where $p_g = |\langle\Psi_g|\Psi_0\rangle|^2$ and $\Delta = \max\{\Delta, (\sqrt{2}\tau)^{-1}\}$.
The proof is given in Appendix H.
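The partial projection can be illustrated numerically. Expanding $|\Psi_0\rangle$ in energy eigenstates with weights $p_k$, the filtered state has energy $\sum_k p_k E_k e^{-(E_k - E_g)^2\tau^2} / \sum_k p_k e^{-(E_k - E_g)^2\tau^2}$. The sketch below evaluates this for a hypothetical four-level spectrum and hypothetical overlaps. It only shows the qualitative behaviour and does not reproduce the bound of Lemma 1.

```python
import numpy as np

def filtered_energy(evals, probs, tau):
    """Energy of exp(-(H - E_g)^2 tau^2 / 2)|Psi_0> from spectrum and overlaps."""
    e_g = evals.min()
    w = probs * np.exp(-((evals - e_g) ** 2) * tau**2)
    return np.sum(w * evals) / np.sum(w)

# Hypothetical spectrum with gap 0.5 and ground-state overlap p_g = 0.1.
evals = np.array([0.0, 0.5, 1.2, 2.0])
probs = np.array([0.1, 0.3, 0.3, 0.3])
taus = np.array([0.0, 0.5, 1.0, 2.0, 4.0, 8.0])

# Energy error relative to the ground-state energy for each tau.
errors = np.array([filtered_energy(evals, probs, t) - evals.min() for t in taus])
```

In this toy example the entries of `errors` shrink as `tau` grows, consistent with projection onto the ground state in the limit $\tau \to +\infty$, provided $p_g > 0$.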

Monotonic dependence on the time cost
First, we give a lower bound of the normalisation factor $\langle 1 1\rangle(\lambda)$, which determines the time cost. The bound is stated in the following lemma, and the proof is in Appendix I.
Lemma 2. Let $\langle 1 1\rangle(\lambda)$ be the normalisation factor of the state in Eq. (41). Then, the normalisation factor has the lower bound where and

Then, with the lower bound in Eq. (43), we can conclude that the state in Eq. (41) with $\tau = c^{-1}(2\pi p_g/x)$ is in the subset $V_x$. Here, we have used that $c(\tau)$ is strictly monotonic, and $c^{-1}(\cdot)$ denotes the inverse function. Then, we can apply the error upper bound in Eq. (42) to $V_x$. We have the following result (which holds for all $N_T$), and the proof is straightforward.
Theorem 5. Take the Hamiltonian ansatz circuit. Suppose the guiding function $\alpha$ is parameterised as a rectifier neural network, the hidden layer has $L \geq 2$ neurons, and With a proper prior guiding function $F(\theta)$, the error in the energy satisfies where $\tau = c^{-1}(2\pi p_g/x)$. The error upper bound decreases monotonically with $1/x$ and approaches zero ($\tau$ approaches $\infty$) in the limit $1/x \to \infty$.
Notice that $1/x \propto \sqrt{M}$. Therefore, the error upper bound decreases monotonically with the time cost. We remark that a condition of the above result is $p_g > 0$, which is a requirement on the initial state $|\Psi_0\rangle$.

E. Hamiltonian ansatz: Scaling with the circuit depth
In Theorem 5, the monotonic relation between the error upper bound and the time cost factor $1/x$ holds for all circuit depths $N_T$. This raises the question: what is the advantage of using a more powerful quantum computer that can realise quantum circuits with more gates? We answer the question in this section. We show that for all $x < p_g$, the error upper bound decreases with the circuit depth and vanishes in the limit of a large depth.
The upper bound of $c(\tau)$ in Eq. (45) implies a lower bound of $\tau = c^{-1}(2\pi p_g/x)$. When $x < p_g$, Therefore, the upper bound of $\epsilon_g(V_x)$ in Eq. (46) decreases with $N_T$. In the limit $N_T \to \infty$, the error upper bound approaches zero. Let $\epsilon$ be the permissible error in the ground-state energy. According to Lemma 1, we can approximate the ground state and satisfy the permissible error $\epsilon$ by taking This value of $\tau$ determines the required circuit depth. Substituting $\tau$ into $\tau = c^{-1}(2\pi p_g/x)$ and taking into account Eq. (47), we can work out $N_T$.
Theorem 6. Take the same setup as in Theorem 5. For all $\epsilon > 0$ and all $x < p_g$, $\epsilon_g(V_x) \leq \epsilon$ holds when Here, $\tau$ is given by Eq. (48).
Because the gate number is proportional to $N_T$, the gate number scales as $O(1/\epsilon^2)$.

V. CONCLUSIONS
In this work, we have presented a method of realising a variational wavefunction with randomised quantum circuits in VQAs. This method can systematically increase the expressive power without changing the gate number. The cost is a longer time for evaluating a quantity in the Monte Carlo calculation. The trade-off between the expressive power and the time cost is analysed theoretically and illustrated numerically. In particular, for VQE, we have shown that the energy error due to the finite expressive power decreases with the time cost and eventually vanishes. These results demonstrate the viability and potential advantages of the random-circuit approach in VQAs, providing a pathway for improving the performance of quantum computing under the constraints of gate numbers and associated errors.

The conventional Hadamard test circuit requires an ancilla qubit [81], as shown in Fig. 10(a). In the final state of the circuit, the expected value of the ancilla-qubit $Z$ Pauli operator is Therefore, at the end of the circuit, we measure the ancilla qubit in the computational (i.e. $Z$) basis. For each circuit shot, we obtain a measurement outcome $\mu = \pm 1$, and

The measurement protocol depends on the properties of the operator $O$. The simplest case is that $O$ is unitary and Hermitian. In this case, we have where $\nu = \gamma(\lambda, \theta) - \gamma(\lambda, \theta')$. This quantity can be measured by taking $V_j = O$ and $\nu_j = \nu$. If the number of circuit shots is $M_Q$, the variance of the estimate of $X_{\theta,\theta'}$ has the upper bound $\sigma_O^2 = 1/M_Q$.

When $O$ is Hermitian, e.g. the Hamiltonian $H$ in VQE, we can evaluate $X_{\theta,\theta'}$ with the Hadamard test combined with Monte Carlo sampling. First, we express the operator $O$ as a linear combination of unitary operators, $O = \sum_j a_j V_j$. Here, $a_j$ are complex coefficients, and $V_j$ are unitary operators. Then, we can express $X_{\theta,\theta'}$ as We can measure the two parts separately. If we measure each part with $M_Q$ circuit shots, the variance of the estimate of $X_{\theta,\theta'}$ has the upper bound $\sigma_O^2 = 2C_O^2/M_Q$.
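The shot statistics of the ancilla-based test can be mimicked classically. The sketch below assumes the textbook Hadamard-test statistics, in which the outcome $\mu = \pm 1$ occurs with probability $(1 \pm r)/2$ for $r = \mathrm{Re}[e^{i\nu}\langle\phi(\theta')|V|\phi(\theta)\rangle]$, so that the sample mean of $\mu$ estimates $r$ with variance at most $1/M_Q$. The single-qubit states, the choice $V = Z$ and the function name `hadamard_test_estimate` are hypothetical and serve only as an illustration.

```python
import numpy as np

def hadamard_test_estimate(phi, phi_prime, V, nu, shots, rng):
    """Simulate `shots` ancilla outcomes; return (sample mean, exact value)."""
    # Exact value Re[e^{i nu} <phi'|V|phi>]; np.vdot conjugates its first argument.
    exact = np.real(np.exp(1j * nu) * np.vdot(phi_prime, V @ phi))
    exact = float(np.clip(exact, -1.0, 1.0))   # guard against rounding
    p_plus = (1.0 + exact) / 2.0               # assumed outcome statistics
    mu = rng.choice([1, -1], size=shots, p=[p_plus, 1.0 - p_plus])
    return mu.mean(), exact

rng = np.random.default_rng(1)
phi = np.array([1, 0], dtype=complex)                     # |0>
phi_prime = np.array([1, 1], dtype=complex) / np.sqrt(2)  # |+>
Z = np.diag([1.0, -1.0]).astype(complex)                  # unitary and Hermitian
M_Q = 20000

estimate, exact = hadamard_test_estimate(phi, phi_prime, Z, 0.0, M_Q, rng)
```

Since each outcome is $\pm 1$, the single-shot variance is $1 - r^2 \leq 1$, which gives the $1/M_Q$ bound on the variance of the sample mean.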

Ancilla-free Hadamard test
For certain models, the Hadamard test can be implemented without the ancilla qubit, e.g. fermion models with particle number conservation [62,85,86]. The ancilla-free circuit with particle number conservation is illustrated in Fig. 10(b). Suppose that the fermion system in states $|\phi(\theta)\rangle$ contains $m$ particles. If we take the Jordan-Wigner transformation for encoding fermions into qubits, the qubit states $|0\rangle$ and $|1\rangle$ denote unoccupied and occupied fermion modes, respectively. Then, the number of ones is the same as the number of particles. We introduce an $m$-particle state as the initial state, $|m\rangle = |1\rangle^{\otimes m} \otimes |0\rangle^{\otimes(n-m)}$. Given this $m$-particle initial state, we can prepare $|\phi(\theta)\rangle$ via a particle-number-preserving transformation on $|m\rangle$. Let $\bar{U}(\theta) = U(\theta)\prod_{i=1}^{m} X_i$, where $X_i$ is the $X$ Pauli operator on the $i$th qubit. It can be verified that $|\phi(\theta)\rangle = \bar{U}(\theta)|m\rangle$. The ancilla-free Hadamard test works under the condition that all $\bar{U}(\theta)$ and $V_j$ are particle-number-preserving operators. In the final state, we have

$$\left\langle Z \otimes |0\rangle\langle 0|^{\otimes(n-1)} \right\rangle = \mathrm{Re}\left[ e^{i\nu_j} \langle\phi(\theta')|V_j|\phi(\theta)\rangle \right]. \quad \text{(A5)}$$

Accordingly, we measure all qubits in the computational basis. The measurement outcome takes three values, $\mu = 0, \pm 1$: the outcome is the eigenvalue of the first-qubit $Z$ Pauli operator subject to the condition that all other qubits are in the state $|0\rangle$; otherwise, the outcome is zero.
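The three-valued outcome rule can be written as a short post-processing step on measured bitstrings. This is a minimal sketch: the function names `ancilla_free_outcome` and `estimate_from_shots` and the example shots are hypothetical, and it assumes the usual convention that bit 0 corresponds to the $Z$ eigenvalue $+1$ and bit 1 to $-1$.

```python
def ancilla_free_outcome(bits):
    """Map one computational-basis bitstring to mu in {0, +1, -1}.

    bits[0] is the first qubit; the remaining qubits must all read 0,
    otherwise the shot contributes zero.
    """
    if any(bits[1:]):
        return 0
    return 1 - 2 * bits[0]   # bit 0 -> +1, bit 1 -> -1 (assumed convention)

def estimate_from_shots(shots):
    """Average the three-valued outcomes over all circuit shots."""
    return sum(ancilla_free_outcome(b) for b in shots) / len(shots)

# Hypothetical measurement record for n = 3 qubits.
shots = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 0)]
mean_mu = estimate_from_shots(shots)   # (1 - 1 + 0 + 1) / 4
```

Shots in which any qubit other than the first reads 1 are kept in the average with value zero rather than discarded, which matches the projector $|0\rangle\langle 0|^{\otimes(n-1)}$ in the measured observable.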
Accordingly, we can express $|\varphi\rangle$ as an integral, as given in Eq. (36), in which Now, inspired by the wavefunction in Eq. (35), we approximate $\beta(\theta)$ with a continuous function Here, we have used that which can be proved in the same way as in Appendix E.

FIG. 1. (a) The schematic diagram of variational quantum algorithms using deterministic circuits. The problem is solved by finding the optimal circuit parameters $\theta$ that minimise the loss function $L(\theta)$. (b) The schematic diagram of variational quantum algorithms using random circuits. Circuits are sampled according to a parameterised guiding function $\alpha(\theta; \lambda)$, which determines the probability distribution. Instead of $\theta$, we optimise the parameterised guiding function $\alpha(\theta; \lambda)$ to solve a problem. See Section II for details.

FIG. 2. The subset of variational wavefunctions.In variational quantum algorithms (VQAs), the solution to a problem is represented by a quantum state.If the solution state is in the subset (e.g.state a) or close to the subset (e.g.state b) with a tolerable error ϵ, VQA can successfully solve the problem (up to a proper optimiser and other practical issues).If the solution state is far from the subset (e.g.state c), VQA fails.

FIG. 4. The barrier term $L_b(\lambda) = -z \tanh\left(\frac{\langle 1 1\rangle(\lambda) - x}{y}\right)$ in the modified loss function, where the parameter $x$ specifies the barrier's position. We illustrate this with two examples, $y = 0.4$ and $y = 0.1$, while holding $z = 5$ and $x = 0.8$ constant.

FIG. 6. Energy landscape of the simplified Hamiltonian ansatz circuit with $N_T = 2$. The coloured lines represent the training process of finding the minimum value using the VQA method from different initial points.

FIG. 7. The two-stage training in the barrier method. (a) In the first stage, we minimise the loss function $L_b$; the stage ends when $\langle 1 1\rangle(\lambda) > x$ is satisfied. Here, we take $x = 0.2, 0.4, 0.8$ as examples. (b) In the second stage, we take the value of $\lambda$ obtained in the first stage as the initial value and then minimise the loss function $L'(\lambda)$. $L(\lambda)$ decreases as the training proceeds. The colorbar represents the value of $\langle 1 1\rangle(\lambda)$.

FIG. 8. The ground-state energy estimated in the random-circuit approach with different barriers. Each point represents the final value of $L(\lambda)$ in a numerical experiment. For each value of $x$, the experiment is repeated 10 times. The dashed line indicates the global minimum energy in the deterministic-circuit approach.

FIG. 9. (a) The values of ANN parameters when approximating $|\phi(\theta)\rangle$ with the random-circuit variational wavefunction. (b) The values of ANN parameters when using the Hamiltonian ansatz to approximate the ground state.

FIG. 10. (a) Hadamard test with an ancilla qubit. The rotation gate is $R_Z(\nu_j) = e^{-iZ\nu_j/2}$. (b) Hadamard test without an ancilla qubit. $U_{\mathrm{GHZ}} = \prod_{i=1}^{m-1} \Lambda_{i,i+1} H_1$, where $\Lambda_{i,j}$ is a controlled-NOT gate with the $i$th qubit as the control qubit and the $j$th qubit as the target qubit, and $H_1$ is a Hadamard gate on the first qubit. The gate $U_{\mathrm{GHZ}}$ prepares a GHZ state |GHZ⟩ = 1