Capacity and quantum geometry of parametrized quantum circuits

To harness the potential of noisy intermediate-scale quantum devices, it is paramount to find the best type of circuits to run hybrid quantum-classical algorithms. Key candidates are parametrized quantum circuits that can be effectively implemented on current devices. Here, we evaluate the capacity and trainability of these circuits using the geometric structure of the parameter space via the effective quantum dimension, which reveals the expressive power of circuits in general as well as of particular initialization strategies. We assess the expressive power of various popular circuit types and find striking differences depending on the type of entangling gates used. Particular circuits are characterized by scaling laws in their expressiveness. We identify a transition in the quantum geometry of the parameter space, which leads to a decay of the quantum natural gradient for deep circuits. For shallow circuits, the quantum natural gradient can be orders of magnitude larger in value compared to the regular gradient; however, both of them can suffer from vanishing gradients. By tuning a fixed set of circuit parameters to randomized ones, we find a region where the circuit is expressive, but does not suffer from barren plateaus, hinting at a good way to initialize circuits. We show an algorithm that prunes redundant parameters of a circuit without affecting its effective dimension. Our results enhance the understanding of parametrized quantum circuits and can be immediately applied to improve variational quantum algorithms.

I. INTRODUCTION
Quantum computers promise to tackle problems that are challenging for classical computers, such as drug design, combinatorial optimisation and the simulation of many-body physics. While fully-fledged large-scale quantum computers with error correction are not expected to be available for many years, noisy intermediate-scale quantum (NISQ) devices have been investigated as a way to approach computationally hard problems with quantum processors available now and in the near future [1,2]. Variational quantum algorithms (VQA) [3][4][5][6] have been a major hope for achieving a quantum speedup with NISQ devices. The core idea is to update a parametrized quantum circuit (PQC) in a hybrid quantum-classical fashion: measurements performed on the PQC are fed into a classical computer, which proposes a new set of variational parameters. A key challenge has been the occurrence of barren plateaus, i.e. the gradients used for optimisation vanish exponentially with increasing number of qubits [7]. Barren plateaus have also been linked to the choice of cost function [8], to entanglement [9] and to noise [10]. Further, the classical optimization part of variational algorithms was shown to be NP-hard [11]. Quantum algorithms that avoid the feedback loop to circumvent the barren plateau problem have been proposed [12][13][14][15][16][17][18]. Besides this approach, initialization strategies [19][20][21] and layer-wise learning [22] for VQA could help to solve the aforementioned problems. However, tools to evaluate the power of these strategies are lacking. Hardware-efficient ansätze have been proposed to tailor a PQC to the restrictions of the hardware [23]. A widely used choice is quantum circuits arranged in layers of single-qubit rotations followed by two-qubit entangling gates. However, a key question is which space of states this ansatz type can express [24][25][26][27].

* thaug@ic.ac.uk
Here, we introduce the effective quantum dimension $G_C$ and the parameter dimension $D_C$ as quantitative measures of the capacity of a PQC. The parameter dimension $D_C$ is the total number of independent parameters of the quantum state that the PQC can express. In contrast, the effective quantum dimension $G_C$ [28,29] is a local measure that quantifies the space of states that can be accessed by locally perturbing the parameters of the PQC. Both measures can be derived from the quantum geometric structure of the PQC via the quantum Fisher information metric (QFI) $\mathcal{F}$ [30,31]. From the QFI, one can also obtain the quantum natural gradient (QNG) for more efficient gradient-based optimisation [30][31][32]. These tools allow us to evaluate the expressive power, trainability and number of redundant parameters of different PQCs, and to find better initialization strategies.
As a demonstration of our tools, we provide an in-depth investigation of popular hardware-efficient circuits composed of layered single-qubit rotations and two-qubit entangling gates in various arrangements. We find striking differences depending on the choice of circuit structure, which affect both the expressive power of the PQC in general and the quality of specific initialization strategies. We calculate the number of redundant parameters of various PQC types, as well as how fast they converge towards random quantum states as a function of the number of layers. The choice of entangling gate has a pronounced effect on the expressive power of particular initialization strategies.
We reveal a transition in the spectrum of the QFI in deep circuits, which leads to a decay of the QNG. For shallow circuits, the QNG can be orders of magnitude larger than the regular gradient. However, both suffer from the barren plateau problem. By tuning the PQC's parameters from zero to a random set of parameters, we find a region where both large gradients and a large effective quantum dimension $G_C$ coexist, which could serve as a good set of initial parameters for the training of variational algorithms. Finally, as an application of our method, we propose and apply an algorithm that prunes redundant parameters from PQCs while keeping the parameter dimension constant. This algorithm helps us to find expressive PQCs with a reduced number of parameters, which simplifies training for variational algorithms, as well as a reduced circuit depth, which eases the impact of noise.

FIG. 1. … parameters $\theta$ and initial state $|0\rangle$ of all $N$ qubits being in state zero. The PQC consists of an initial layer of $\sqrt{H_d}$ gates applied to each qubit, where $H_d$ is the Hadamard gate, followed by $p$ repeated layers of parametrized single-qubit rotations $V_l(\theta_l)$ and entangling gates $W_l$. $V_l(\theta_l)$ consists of single-qubit rotations $R_\alpha(\theta_{l,n}) = \exp(-i\sigma^\alpha_n\theta_{l,n}/2)$ at layer $l$ and qubit $n$ around axis $\alpha \in \{x, y, z\}$. b) The two-qubit entangling gates $w_l$ considered are CNOT gates (control-$\sigma^x$), CPHASE gates (control-$\sigma^z$, $\mathrm{diag}(1, 1, 1, -1)$) or $\sqrt{\text{iSWAP}}$ gates. c) The entangling layer $W_l$ is composed of the two-qubit entangling gates $w_l$, which are arranged in either a nearest-neighbor one-dimensional chain topology (denoted as CHAIN), an all-to-all connection (ALL) or an alternating fashion (ALT) for even and odd layers $l$.
The paper is organized as follows. First, we define PQCs in Sec. II and the parameter dimension $D_C$ in Sec. III. Then, we introduce the effective quantum dimension $G_C$ in Sec. IV. We present our results and the pruning algorithm in Sec. V and discuss them in Sec. VI. An overview of the definitions of symbols is given in Tab. I.

II. PARAMETRIZED QUANTUM CIRCUITS
A PQC generates an $N$-qubit quantum state $|\psi(\theta)\rangle = U(\theta)|0\rangle^{\otimes N}$ by applying the unitary $U(\theta)$, parametrized by the $M$-dimensional parameter vector $\theta$, to the product state $|0\rangle^{\otimes N}$, as shown in Fig. 1. The structure of the PQC influences its power to express quantum states [24,25,33]. One way to measure expressiveness is to determine the distance between the distribution of states generated by the circuit and the Haar-random distribution of states [24,25]. This tells us how well the PQC can express arbitrary states across the Hilbert space. The appearance of barren plateaus, or vanishing gradients, is connected to this measure [7,34]. Consider the expectation value $E = \langle 0|U^\dagger(\theta) H U(\theta)|0\rangle$ of a Hamiltonian $H$. The variance of its gradient,
$$\mathrm{var}(\partial_i E) = \langle (\partial_i E)^2 \rangle - \langle \partial_i E \rangle^2 ,$$
where $\langle\cdot\rangle$ denotes the statistical average over many random instances, can vanish exponentially with the number of qubits for PQCs with a random choice of parameters. The variance also decreases with the number of layers $p$ of the PQC until a specific $p_r$, beyond which it remains constant as $p$ is increased further. For local cost functions, it has been shown that in most cases a low variance of the gradient of such PQCs correlates with high expressibility [34].
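To make the gradient variance concrete, here is a minimal NumPy statevector sketch. It assumes a small circuit in the style of Fig. 1: a $\sqrt{H_d}$ layer, then $p$ layers of randomly chosen $x$, $y$ or $z$ rotations followed by a CNOT chain, with $H = \sigma^z_1\sigma^z_2$. The derivative is evaluated with the standard parameter-shift rule for gates $e^{-i\sigma\theta/2}$. All helper names (`circuit_state`, `grad_shift`, `gradient_variance`) are hypothetical, and this is not the authors' code [56].

```python
import numpy as np

PAULI = {
    "x": np.array([[0, 1], [1, 0]], dtype=complex),
    "y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "z": np.array([[1, 0], [0, -1]], dtype=complex),
}
HAD = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
# sqrt(H_d) = P_+ + i P_-, where P_± = (I ± H_d)/2 project onto the ±1 eigenspaces of H_d
SQRT_H = (np.eye(2) + HAD) / 2 + 1j * (np.eye(2) - HAD) / 2


def rot(axis, angle):
    """R_axis(angle) = exp(-i sigma_axis angle / 2)."""
    return np.cos(angle / 2) * np.eye(2) - 1j * np.sin(angle / 2) * PAULI[axis]


def apply_1q(state, gate, q):
    """Apply a 2x2 gate to qubit q of a state stored with shape (2,) * N."""
    return np.moveaxis(np.tensordot(gate, state, axes=([1], [q])), 0, q)


def apply_cnot(state, control, target):
    """Flip the target qubit inside the control = 1 slice."""
    state = state.copy()
    idx = [slice(None)] * state.ndim
    idx[control] = 1
    axis = target if target < control else target - 1  # axis index once `control` is sliced out
    state[tuple(idx)] = np.flip(state[tuple(idx)], axis=axis).copy()
    return state


def circuit_state(theta, axes):
    """sqrt(H_d) on every qubit, then p layers of rotations followed by a CNOT chain."""
    n_qubits = theta.shape[1]
    state = np.zeros((2,) * n_qubits, dtype=complex)
    state[(0,) * n_qubits] = 1.0
    for q in range(n_qubits):
        state = apply_1q(state, SQRT_H, q)
    for layer in range(theta.shape[0]):
        for q in range(n_qubits):
            state = apply_1q(state, rot(axes[layer, q], theta[layer, q]), q)
        for q in range(n_qubits - 1):
            state = apply_cnot(state, q, q + 1)
    return state


def energy_zz(theta, axes):
    """E = <psi| sigma^z_1 sigma^z_2 |psi> on the first two qubits."""
    probs = np.abs(circuit_state(theta, axes)) ** 2
    p12 = probs.reshape(2, 2, -1).sum(axis=2)  # marginal distribution of qubits 1 and 2
    return p12[0, 0] + p12[1, 1] - p12[0, 1] - p12[1, 0]


def grad_shift(theta, axes, k):
    """Parameter-shift rule: dE/dtheta_k = [E(theta + pi/2 e_k) - E(theta - pi/2 e_k)] / 2."""
    shift = np.zeros_like(theta)
    shift[k] = np.pi / 2
    return 0.5 * (energy_zz(theta + shift, axes) - energy_zz(theta - shift, axes))


def gradient_variance(n_qubits, n_layers, n_samples=200, k=(0, 0), seed=0):
    """var(d_k E) over random angles in [0, 2 pi) and random rotation axes."""
    rng = np.random.default_rng(seed)
    grads = []
    for _ in range(n_samples):
        theta = rng.uniform(0, 2 * np.pi, size=(n_layers, n_qubits))
        axes = rng.choice(["x", "y", "z"], size=(n_layers, n_qubits))
        grads.append(grad_shift(theta, axes, k))
    return np.var(grads)


print(gradient_variance(4, 4))
```

Repeating the call for increasing $N$ (with $p$ growing accordingly) is one way to probe the decay described above; no particular values are claimed here.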

III. PARAMETER DIMENSION
We now introduce the parameter dimension $D_C$ of a PQC as another measure of capacity. As an example, we take a parametrization that can represent arbitrary $N$-qubit quantum states with in total $M = 2^{N+1}$ real parameters $a, b$,
$$|\psi(a, b)\rangle = \sum_{j=1}^{2^N} (a_j + i b_j)\,|j\rangle, \qquad (2)$$
where $|j\rangle$ is the $j$-th computational basis state and $a_j, b_j \in \mathbb{R}$.
Normalization restricts the $2^{N+1}$ real parameters to the surface of a $(2^{N+1}-1)$-dimensional sphere, and removing the global phase leaves $D_C = 2^{N+1} - 2$ independent parameters. Thus, 2 of the $M$ parameters are dependent and do not change the quantum state, as they correspond to the norm and the global phase of the quantum state. Similarly, for a generic real-valued quantum state with $b_j = 0$, we find $D_C = 2^N - 1$ independent parameters, with 1 dependent parameter due to the norm of the real-valued quantum state. Analogous to the generic quantum state, we now define the parameter dimension $D_C$ of a PQC $C$ as the number of independent parameters that the PQC can express in the space of quantum states. In general, $D_C$ for $N$ qubits is upper bounded by that of the generic state Eq. (2), $D_C \le 2^{N+1} - 2$. We define the redundancy
$$R = 1 - \frac{D_C}{M},$$
which is the fraction of dependent parameters of the PQC that do not contribute to changing the quantum state. In the next section, we show how $D_C$ can be determined for hardware-efficient PQCs.
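The counting above can be checked numerically. This minimal sketch assumes the generic parametrization $\sum_j (a_j + i b_j)|j\rangle$, normalized explicitly, and the QFI convention $\mathcal{F}_{ij} = 4\,\mathrm{Re}[\langle\partial_i\psi|\partial_j\psi\rangle - \langle\partial_i\psi|\psi\rangle\langle\psi|\partial_j\psi\rangle]$. It computes $\mathcal{F}$ from exact derivatives of the normalized state and reads off $D_C$ as its rank. The function names and the rank tolerance are hypothetical choices.

```python
import numpy as np


def generic_state_qfi(a, b=None):
    """QFI of sum_j (a_j + i b_j)|j> / ||a + i b|| with respect to the real parameters (a, b).

    If b is None, only the real-valued parametrization (b_j = 0) is used.
    """
    b_vec = np.zeros_like(a) if b is None else b
    v = a + 1j * b_vec
    n = np.linalg.norm(v)
    psi = v / n
    eye = np.eye(len(v))
    # Exact derivatives of the normalized state:
    # d psi/d a_k = e_k/n - v a_k/n^3,   d psi/d b_k = i e_k/n - v b_k/n^3
    jac = eye / n - np.outer(v, a) / n**3
    if b is not None:
        jac = np.hstack([jac, 1j * eye / n - np.outer(v, b) / n**3])
    overlap = jac.conj().T @ psi  # entries <d_i psi|psi>
    return 4 * np.real(jac.conj().T @ jac - np.outer(overlap, overlap.conj()))


def parameter_dimension(n_qubits, real=False, seed=0):
    """Return (D_C, M): rank of the QFI at random (a, b) and the number of parameters."""
    rng = np.random.default_rng(seed)
    dim = 2**n_qubits
    a = rng.normal(size=dim)
    b = None if real else rng.normal(size=dim)
    qfi = generic_state_qfi(a, b)
    return int(np.linalg.matrix_rank(qfi, tol=1e-8)), qfi.shape[0]


def redundancy(d_c, m):
    """Fraction of dependent parameters, R = 1 - D_C / M."""
    return 1 - d_c / m


for n in (1, 2, 3):
    print(n, parameter_dimension(n), parameter_dimension(n, real=True))
```

The two zero eigenvalues in the complex case correspond to rescaling $(a, b)$ and to rotating the global phase, exactly the two dependent parameters named in the text.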

IV. EFFECTIVE QUANTUM DIMENSION
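Now, we explain how the QFI $\mathcal{F}(\theta)$ quantifies the expressive power of a PQC (see Appendix B for an introduction to the QFI and QNG, and Appendix G on how to calculate it). One can relate $\mathcal{F}(\theta)$ to the distance in the space of pure quantum states, which is given by the Fubini-Study distance
$$\mathrm{Dist}_Q\big(\psi(\theta), \psi(\theta + d\theta)\big) = \frac{1}{4}\, d\theta^{T} \mathcal{F}(\theta)\, d\theta ,$$
where $\mathrm{Dist}_Q(x, y) = 1 - |\langle x|y\rangle|^2$, and the QFI [30,31]
$$\mathcal{F}_{ij}(\theta) = 4\,\mathrm{Re}\left[\langle \partial_i\psi|\partial_j\psi\rangle - \langle\partial_i\psi|\psi\rangle\langle\psi|\partial_j\psi\rangle\right]$$
corresponds to the real part of the quantum geometric tensor. $\mathcal{F}(\theta)$ quantifies how much the quantum state changes when its parameter $\theta$ is adjusted infinitesimally to $\theta + d\theta$. The eigenvalue decomposition
$$\mathcal{F}(\theta) = V S V^{T}$$
gives us $V$, a real orthogonal matrix with the $i$-th eigenvector $\alpha^{(i)}$ placed in its $i$-th column, and $S$, a diagonal matrix with the $M$ non-negative eigenvalues $\lambda^{(i)}$ of $\mathcal{F}(\theta)$ along the diagonal. The eigenvalues and eigenvectors obey $\mathcal{F}(\theta)\alpha^{(i)} = \lambda^{(i)}\alpha^{(i)}$. Inserting the eigenvalue decomposition into the Fubini-Study distance gives
$$\mathrm{Dist}_Q\big(\psi(\theta), \psi(\theta + d\theta)\big) = \frac{1}{4}\, (V^{T} d\theta)^{T} S\, (V^{T} d\theta).$$
Now, we assume that the small variation of $\theta$ is along the $i$-th eigenvector of $\mathcal{F}(\theta)$, $d\theta = d\mu\,\alpha^{(i)}$, where $d\mu$ is an infinitesimal scalar. Since $V^{T}\alpha^{(i)} = e^{(i)}$, where $e^{(i)}$ is the $i$-th basis vector, we find
$$\mathrm{Dist}_Q\big(\psi(\theta), \psi(\theta + d\mu\,\alpha^{(i)})\big) = \frac{d\mu^2}{4}\, e^{(i)T} S\, e^{(i)} = \frac{d\mu^2}{4}\,\lambda^{(i)} .$$
When updating $\theta \to \theta + d\mu\,\alpha^{(i)}$, the quantum state thus changes at a rate proportional to $\lambda^{(i)}$. Eigenvalues $\lambda^{(i)} = 0$ are called singularities, as there is no change in the quantum state at all, i.e. $|\langle\psi(\theta)|\psi(\theta + d\mu\,\alpha^{(i)})\rangle| = 1$. The case of very small $\lambda^{(i)}$, i.e. $1 \gg \lambda^{(i)} > 0$, is called a near singularity and is associated with plateaus in classical machine learning, where training slows down [35].

FIG. 2. Example to demonstrate the effective quantum dimension $G_C$ and the parameter dimension $D_C$ for a single qubit parametrized as $|\psi(\theta, \phi)\rangle = \cos(\theta/2)|0\rangle + e^{i\phi}\sin(\theta/2)|1\rangle$. The quantum state is described by $M = 2$ parameters $\theta$ and $\phi$. $D_C = 2$ is the number of independent parameters of the quantum state. $G_C(\theta, \phi)$ denotes the number of independent directions in which the quantum state can change by locally perturbing its parameters $\theta, \phi$. For a random state $|v\rangle$ ($\theta \notin \{0, \pi\}$), two possible directions exist, along $v_\theta$ and $v_\phi$. The particular state $|w(\theta = \pi, \phi)\rangle$ can only be perturbed in direction $w_\theta$, as adjusting $\phi$ does not change the state (e.g. $|w(\pi, \phi + \epsilon)\rangle = |w(\pi, \phi)\rangle$ up to a global phase), thus $G_C(\theta = \pi, \phi) = 1$.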
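A quick numerical check of the relation between eigenvalues and infidelity, as a sketch. For the single-qubit state of Fig. 2, the code builds $\mathcal{F}$ from central finite-difference derivatives of the state vector (an assumption of this sketch; Appendix G discusses how the QFI is obtained on hardware). It then diagonalizes $\mathcal{F}$ and compares the infidelity after a small step $d\mu\,\alpha^{(i)}$ with $\lambda^{(i)} d\mu^2/4$. The function names and the step sizes `h` and `d_mu` are hypothetical.

```python
import numpy as np


def single_qubit_state(params):
    """|psi(theta, phi)> = cos(theta/2)|0> + exp(i phi) sin(theta/2)|1>."""
    theta, phi = params
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])


def qfi(state_fn, params, h=1e-5):
    """F_ij = 4 Re[<d_i psi|d_j psi> - <d_i psi|psi><psi|d_j psi>], using central differences."""
    params = np.asarray(params, dtype=float)
    psi = state_fn(params)
    cols = []
    for k in range(len(params)):
        step = np.zeros_like(params)
        step[k] = h
        cols.append((state_fn(params + step) - state_fn(params - step)) / (2 * h))
    jac = np.stack(cols, axis=1)
    overlap = jac.conj().T @ psi
    return 4 * np.real(jac.conj().T @ jac - np.outer(overlap, overlap.conj()))


params = np.array([0.7, 1.3])
F = qfi(single_qubit_state, params)
lam, V = np.linalg.eigh(F)  # F = V diag(lam) V^T, eigenvalues in ascending order

d_mu = 1e-3
psi0 = single_qubit_state(params)
for i in range(len(lam)):
    psi1 = single_qubit_state(params + d_mu * V[:, i])  # step along eigenvector alpha^(i)
    infidelity = 1 - abs(np.vdot(psi0, psi1)) ** 2
    print(f"lambda={lam[i]:.4f}  infidelity={infidelity:.3e}  lambda*dmu^2/4={lam[i] * d_mu**2 / 4:.3e}")
```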
We now define the effective quantum dimension $G_C(\theta)$ of a PQC $C$ as the rank of the QFI $\mathcal{F}(\theta)$. It is given by the total number of non-zero eigenvalues,
$$G_C(\theta) = \sum_{i=1}^{M} I\big(\lambda^{(i)}\big),$$
where $I(x) = 0$ for $x = 0$ and $I(x) = 1$ for $x \neq 0$. $G_C(\theta)$ is a local measure of expressiveness that counts the number of independent directions in the state space that can be accessed by an infinitesimal update of $\theta$.
A straightforward example is the generic single-qubit quantum state shown in Fig. 2,
$$|\psi(\theta, \phi)\rangle = \cos(\theta/2)\,|0\rangle + e^{i\phi}\sin(\theta/2)\,|1\rangle .$$
The eigenvalues and eigenvectors of its QFI $\mathcal{F}$ are straightforward to calculate: $\lambda^{(1)} = 1$, $\alpha^{(1)} = (1, 0)$ and $\lambda^{(2)} = \sin^2(\theta)$, $\alpha^{(2)} = (0, 1)$. The effective quantum dimension is $G_C(\theta, \phi) = D_C = 2$, except for the special case $\theta = n\pi$ with integer $n$, where $\lambda^{(2)} = 0$ and thus $G_C(n\pi, \phi) = 1$. Here, any change along the eigenvector $\alpha^{(2)}$ (corresponding to changing $\phi$) does not change the underlying quantum state. Note, however, that away from these singular parameters we find $G_C = 2$, which equals the maximal number of independent parameters $D_C$ of the system. As a further example, we consider the single-qubit circuit with Pauli $z$ matrix $\sigma^z$ and Hadamard gate $H_d$ …, for which we find $\mathcal{F} = \dots$.

For PQCs of the type shown in Fig. 1, which are arranged in a layer-wise structure with parametrized gates generated by Pauli operators, the effective quantum dimension $G_C$ is less than or equal to the parameter dimension $D_C$, which in turn is less than or equal to the number of parameters $M$:
$$G_C(\theta) \le D_C \le M .$$
Given these PQC types with a set of parameters $\theta_\text{random}$ drawn at random from $[0, 2\pi)$, we find numerically
$$G_C(\theta_\text{random}) = D_C .$$
Thus, we can calculate $D_C$ by determining $G_C(\theta_\text{random})$ for random sets of PQC parameters. The core intuition is that, starting from a sufficiently random initial parameter set, a change of the PQC parameters in the right direction can bring the state closer to any quantum state that can be expressed by the PQC. For specific choices of parameters such as $\theta = 0$, we find $G_C < D_C$. Moving sufficiently far away from these special points, we recover $G_C = D_C$. We stress that the relation $G_C(\theta_\text{random}) = D_C$ is not valid for arbitrary quantum circuits, e.g. for circuits whose parameters do not have a $2\pi$ periodicity.
As a simple example, take the evolution of a single qubit with a single parameter $t$, … . The evolution over all possible $t$ (note the absence of $2\pi$ periodicity) covers all possible quantum states, and thus $D_C = 2$, whereas the effective quantum dimension with only a single parameter $t$ is $G_C = 1 < D_C$.
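The single-qubit example of Fig. 2 can be reproduced with a short sketch. The code below evaluates $\mathcal{F}$ for $|\psi(\theta, \phi)\rangle$ from its exact derivatives and counts eigenvalues above a numerical tolerance as a stand-in for $I(\lambda^{(i)})$. The tolerance `tol` is an assumption, since exact zeros are never obtained in floating point, and the function names are hypothetical.

```python
import numpy as np


def single_qubit_qfi(theta, phi):
    """QFI of cos(theta/2)|0> + exp(i phi) sin(theta/2)|1> with respect to (theta, phi)."""
    psi = np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])
    d_theta = np.array([-np.sin(theta / 2) / 2, np.exp(1j * phi) * np.cos(theta / 2) / 2])
    d_phi = np.array([0.0, 1j * np.exp(1j * phi) * np.sin(theta / 2)])
    jac = np.stack([d_theta, d_phi], axis=1)
    overlap = jac.conj().T @ psi
    return 4 * np.real(jac.conj().T @ jac - np.outer(overlap, overlap.conj()))


def effective_dimension(qfi, tol=1e-10):
    """G_C = number of QFI eigenvalues above `tol` (numerical stand-in for lambda != 0)."""
    return int(np.sum(np.linalg.eigvalsh(qfi) > tol))


for theta in [0.0, 0.5, np.pi / 2, np.pi]:
    F = single_qubit_qfi(theta, phi=0.3)
    print(f"theta={theta:.3f}  eigenvalues={np.linalg.eigvalsh(F)}  G_C={effective_dimension(F)}")
```

At $\theta = 0$ and $\theta = \pi$ only the $\theta$ direction survives, so $G_C = 1$; elsewhere $G_C = 2 = D_C$.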
We now consider different types of hardware-efficient PQCs $|\psi(\theta)\rangle = U(\theta)|0\rangle^{\otimes N}$, which are circuits that can be run efficiently on NISQ quantum processors. We choose the initial state $|0\rangle^{\otimes N}$, followed by a single layer of the square root of the Hadamard gate ($\sqrt{H_d}$) on every qubit. Then, we repeat $p$ layers composed of parametrized single-qubit rotations and a set of two-qubit entangling gates (see Fig. 1a). The single-qubit rotations are either chosen randomly to be around the $x$, $y$ or $z$ axis, or fixed to a specific axis. The two-qubit entangling gates are either CNOT, CPHASE or $\sqrt{\text{iSWAP}}$ gates (see Fig. 1b), which are common native gates in current quantum processors [36]. The entangling gates in each layer are arranged in either a nearest-neighbor chain topology (CHAIN), all-to-all connections (ALL) or an alternating nearest-neighbor fashion (ALT) (see Fig. 1c). The numerical calculations are performed using Yao [37].

FIG. 3. … d) Histogram of the logarithm of the eigenvalues of the quantum Fisher information matrix $\mathcal{F}$. The width of the distribution increases with $p$, with a pronounced tail at small eigenvalues developing around $p \approx p_c$, which disappears for $p > p_c$. e) Variance of the gradient $\mathrm{var}(\partial_k E)$ and of the QNG $\mathrm{var}(\mathcal{F}^{-1}\partial_k E)$ with respect to the Hamiltonian $H = \sigma^z_1\sigma^z_2$. The gradient decays until $p \approx 20$, after which it remains constant. The QNG remains larger than the regular gradient, but decreases for $p > p_c$. f) Variance of the gradient and of the QNG for varying qubit number $N$ at depth $p = 2N$, showing an approximately exponential decrease with $N$.
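As a rough illustration of how $D_C$ can be read off from $G_C(\theta_\text{random})$, the following NumPy statevector sketch builds a small CNOT-CHAIN circuit with randomly chosen rotation axes and computes $G_C$ at random parameters and at $\theta = 0$. It is not the Yao-based code used for the paper's results, and the helper names are hypothetical. It relies on the identity $\partial_k|\psi(\theta)\rangle = \tfrac{1}{2}|\psi(\theta + \pi e_k)\rangle$, which holds for gates $R_\alpha(\theta) = e^{-i\sigma^\alpha\theta/2}$ because $R_\alpha(\theta + \pi) = -i\sigma^\alpha R_\alpha(\theta)$.

```python
import numpy as np

PAULI = {
    "x": np.array([[0, 1], [1, 0]], dtype=complex),
    "y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "z": np.array([[1, 0], [0, -1]], dtype=complex),
}
HAD = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
SQRT_H = (np.eye(2) + HAD) / 2 + 1j * (np.eye(2) - HAD) / 2  # squares to H_d


def apply_1q(state, gate, q):
    """Apply a 2x2 gate to qubit q of a state stored with shape (2,) * N."""
    return np.moveaxis(np.tensordot(gate, state, axes=([1], [q])), 0, q)


def apply_cnot(state, control, target):
    """Flip the target qubit inside the control = 1 slice."""
    state = state.copy()
    idx = [slice(None)] * state.ndim
    idx[control] = 1
    axis = target if target < control else target - 1
    state[tuple(idx)] = np.flip(state[tuple(idx)], axis=axis).copy()
    return state


def pqc_state(theta, axes):
    """CNOT-CHAIN PQC in the style of Fig. 1 as a flat state vector; theta, axes have shape (p, N)."""
    n = theta.shape[1]
    state = np.zeros((2,) * n, dtype=complex)
    state[(0,) * n] = 1.0
    for q in range(n):
        state = apply_1q(state, SQRT_H, q)
    for layer in range(theta.shape[0]):
        for q in range(n):
            t = theta[layer, q]
            gate = np.cos(t / 2) * np.eye(2) - 1j * np.sin(t / 2) * PAULI[axes[layer, q]]
            state = apply_1q(state, gate, q)
        for q in range(n - 1):
            state = apply_cnot(state, q, q + 1)
    return state.reshape(-1)


def pqc_qfi(theta, axes):
    """F = 4 Re[J^dag J - (J^dag psi)(J^dag psi)^dag] with exact columns d_k psi = psi(theta + pi e_k) / 2."""
    psi = pqc_state(theta, axes)
    cols = []
    for k in np.ndindex(theta.shape):
        shifted = theta.copy()
        shifted[k] += np.pi
        cols.append(0.5 * pqc_state(shifted, axes))
    jac = np.stack(cols, axis=1)
    overlap = jac.conj().T @ psi
    return 4 * np.real(jac.conj().T @ jac - np.outer(overlap, overlap.conj()))


def effective_dimension(qfi, tol=1e-8):
    """G_C = number of QFI eigenvalues above a numerical tolerance."""
    return int(np.sum(np.linalg.eigvalsh(qfi) > tol))


rng = np.random.default_rng(7)
n_qubits, n_layers = 3, 6
axes = rng.choice(["x", "y", "z"], size=(n_layers, n_qubits))
theta_random = rng.uniform(0, 2 * np.pi, size=(n_layers, n_qubits))
g_random = effective_dimension(pqc_qfi(theta_random, axes))  # estimate of D_C
g_zero = effective_dimension(pqc_qfi(np.zeros_like(theta_random), axes))
print(f"M={theta_random.size}  max D_C={2 ** (n_qubits + 1) - 2}  G_C(random)={g_random}  G_C(0)={g_zero}")
```

Computing $\mathcal{F}$ this way costs $M + 1$ state simulations; it is only meant for small $N$ and $p$.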

V. RESULTS
As a demonstration of our methods, we provide an in-depth characterization of a PQC consisting of randomly chosen $x$, $y$, $z$ rotations and CNOT gates in a chain topology as a function of the number of layers $p$ in Fig. 3. The parameter dimension $D_C$ (i.e. the number of independent parameters of the quantum state that can be expressed by the PQC) increases linearly with $p$ (Fig. 3a), until it reaches the maximal possible value $D_C = 2^{N+1} - 2$ at a characteristic number of layers $p_c$. This point is reflected in the spectrum of the QFI $\mathcal{F}$, averaged over random instances of the PQC (see Fig. 3b-d). Most notably, the variance of the logarithm of the non-zero eigenvalues reaches a maximum at $p_c$ (Fig. 3b). Further, the smallest eigenvalue reaches its minimum around $p_c$ (Fig. 3c). We can see this more clearly in the distribution of eigenvalues (Fig. 3d). With increasing $p$, the distribution becomes broader, with a pronounced tail of small eigenvalues of $\mathcal{F}$ appearing close to the transition at $p_c$. Above the transition, $p > p_c$, the small eigenvalues suddenly disappear from the distribution. We investigate the variance of the gradient and of the QNG in Fig. 3e for the two-qubit Hamiltonian $H = \sigma^z_1\sigma^z_2$. The variance of the regular gradient decays with $p$, reaching a minimum around $p \approx 20$ [7], after which it remains constant. The variance of the QNG remains larger than that of the regular gradient; however, the QNG decays for $p > p_c$. In Fig. 3f, we find numerically that the variance of both the regular gradient and the QNG vanishes exponentially with increasing number of qubits $N$, demonstrating the barren plateau problem. In Appendix D, we show that the same result is also found for more complicated Hamiltonians such as the transverse Ising model.
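To see why small eigenvalues inflate the QNG, here is a minimal sketch of the natural-gradient step $\mathcal{F}^{-1}\nabla E$. It assumes that $\mathcal{F}^{-1}$ is taken as the pseudo-inverse on the subspace of non-zero eigenvalues; the regularization used for the paper's figures is not specified here, and `rel_tol` is a hypothetical parameter. Each gradient component along an eigenvector $\alpha^{(i)}$ is divided by $\lambda^{(i)}$, so near-singular directions are amplified, and this amplification is lost once the small eigenvalues disappear for $p > p_c$.

```python
import numpy as np


def natural_gradient(qfi, grad, rel_tol=1e-10):
    """Pseudo-inverse QNG: sum over lambda_i > rel_tol * max(lambda) of alpha_i (alpha_i . grad) / lambda_i."""
    lam, vecs = np.linalg.eigh(qfi)
    keep = lam > rel_tol * lam.max()  # drop (numerically) singular directions
    kept_vecs = vecs[:, keep]
    return kept_vecs @ ((kept_vecs.T @ grad) / lam[keep])


# Toy QFI (illustrative values) with one regular, one near-singular and one singular direction
F = np.diag([1.0, 1e-4, 0.0])
grad = np.array([1.0, 1.0, 1.0])
print(natural_gradient(F, grad))
```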
In Fig. 4, we compare PQCs with different entangling gates and arrangements. All circuits show the same qualitative behavior regarding the transition in the QFI (see Fig. 3 and Appendix C), and all suffer from an exponential decrease of the variance of the gradient with increasing number of qubits. However, there are key differences between the PQCs. We show the variance of the gradient for the Hamiltonian $H = \sigma^z_1\sigma^z_2$ in Fig. 4a,c,e for different arrangements of the entangling gates (CHAIN, ALL, ALT) as well as different types of entangling gates (CNOT, CPHASE, $\sqrt{\text{iSWAP}}$). The variance decays with increasing $p$ until it reaches a constant level, which is the same for all gates and arrangements. However, CPHASE requires the most layers $p$ to converge, followed by $\sqrt{\text{iSWAP}}$ and CNOT. Fig. 4b,d,f shows the redundancy $R$, i.e. the fraction of redundant parameters of the PQC, which quickly reaches a constant level with increasing $p$.
$\sqrt{\text{iSWAP}}$ has a consistently low $R$, while for CNOT, $R$ depends strongly on the arrangement of the entangling gates. For CPHASE, $R$ is consistently larger. This can be understood by noting that $z$ rotations commute with the entangling CPHASE layer: when $z$ rotations appear on the same qubit in consecutive layers, they merge into a single rotation and thus yield a redundant parameter.
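This mechanism alone already gives a lower bound on the redundancy of a random-axis CPHASE circuit. The sketch below uses a hypothetical helper, not the QFI-based computation behind Fig. 4. It counts, for each qubit, rotations about $z$ that directly follow another $z$ rotation in the previous layer; with uniformly random axes, one expects roughly $1/9$ of the parameters in layers $l \ge 2$ to be removed this way. Other redundancies that the QFI would detect are not captured.

```python
import numpy as np


def merged_z_parameters(axes):
    """Count z rotations that directly follow a z rotation on the same qubit.

    axes has shape (p, N). With CPHASE entangling layers (diagonal, hence commuting
    with z rotations), each such pair merges into one rotation, removing one parameter.
    """
    axes = np.asarray(axes)
    consecutive_z = (axes[1:] == "z") & (axes[:-1] == "z")
    return int(consecutive_z.sum())


example = [["z", "x"],
           ["z", "z"],
           ["z", "y"]]  # qubit 0: z, z, z gives 2 merges; qubit 1: x, z, y gives none
print(merged_z_parameters(example))

rng = np.random.default_rng(0)
p, n = 50, 10
axes = rng.choice(["x", "y", "z"], size=(p, n))
print(merged_z_parameters(axes) / axes.size)  # lower bound on R from this mechanism alone
```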
We note that for these PQCs, the number of layers $p_c$ at which the transition of the QFI occurs can be estimated from the value of the redundancy $R$. Since each layer adds on average $(1 - R_C)\,b$ independent parameters until $D_C$ saturates, we find
$$p_c \approx \frac{2^{N+1} - 2}{(1 - R_C)\, b},$$
where $R_C$ is the value of $R$ for sufficiently large $p$ and $b$ is the number of parametrized rotations per layer. The eigenvalue spectra of these PQCs and of further types of PQCs are discussed in Appendix C.
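As a quick illustration of the estimate $p_c \approx (2^{N+1} - 2)/((1 - R_C)\,b)$, the sketch below evaluates it. The formula follows from each layer contributing on average $(1 - R_C)\,b$ independent parameters until $D_C$ reaches $2^{N+1} - 2$. The sketch assumes $b = N$ (one rotation per qubit per layer, as in Fig. 1), and the redundancy values are illustrative only, not taken from Fig. 4.

```python
def estimate_pc(n_qubits, redundancy, rotations_per_layer):
    """p_c ~ (2^(N+1) - 2) / ((1 - R_C) b): layers needed for D_C to saturate."""
    max_dc = 2 ** (n_qubits + 1) - 2
    return max_dc / ((1 - redundancy) * rotations_per_layer)


for r_c in (0.0, 0.1, 0.3):  # illustrative redundancy values only
    print(f"N=10, R_C={r_c}: p_c ~ {estimate_pc(10, r_c, 10):.1f}")
```

Higher redundancy means fewer new directions per layer, so more layers are needed before $D_C$ saturates.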
In Fig. 5, we fix the single-qubit rotations to be around the $y$ axis and investigate different entangling gates arranged in a nearest-neighbor one-dimensional chain. Depending on the choice of entangling gate, the variance of the gradient for $H = \sigma^z_1\sigma^z_2$ decays to a different constant level with increasing $p$ (see Fig. 5a). y-$\sqrt{\text{iSWAP}}$ matches the variance found in Fig. 3e, whereas y-CNOT and y-CPHASE have a higher variance. In Fig. 5b, we show the maximal $D_C$ for many layers $p$. $D_C$ scales exponentially for y-CNOT ($D_C \propto 2^N$) and y-$\sqrt{\text{iSWAP}}$ ($D_C \propto 2^{N+1}$), whereas for y-CPHASE we find numerically an approximately quadratic scaling $D_C \propto N^2$.
In Fig. 6, we show how $G_C$ and the variance of the gradient for $H = \sigma^z_1\sigma^z_2$ change when tuning the parameters of a PQC defined as $U(a\,\theta_\text{random})|0\rangle$, with $\theta_\text{random} \in [0, 2\pi)$ and $a \in [0, 1]$. Adjusting $a$ from $0$ to $1$ changes the PQC from one with all parameters zero to one with random parameters. As an example, we show a PQC consisting of layered, randomly chosen single-qubit rotations around the $x$, $y$ or $z$ axis and entangling gates arranged in a chain. In Fig. 6a, we show $G_C$ for different types of entangling gates. $G_C$ increases with $a$, reaching the parameter dimension $D_C$ at $a = 1$. For CNOT and $\sqrt{\text{iSWAP}}$, $G_C$ increases faster with $a$ than for the PQC with CPHASE gates. In Fig. 6b, the variance of the gradient decreases sharply once a particular value of $a$ is reached. Note that there is a specific range of parameters, $\log_{10}(a) \approx -2.5$, where the PQCs have a nearly maximal $G_C$ and the variance of the gradients remains large.
In Fig. 7, we show the scaling of $G_C(\theta = 0, N)$ with the number of qubits $N$ for a PQC with entangling gates in a chain arrangement, initialized with $\theta = 0$, which corresponds to the point $a = 0$ in Fig. 6. Numerically, we find linear scaling of $G_C(\theta = 0, N)$ for CPHASE entangling gates, quadratic scaling for CNOT gates, and higher-order polynomial or even exponential scaling for $\sqrt{\text{iSWAP}}$ gates.
As an application, we propose Algorithm 1 to remove redundant parameters from a PQC $C$. The algorithm calculates the eigenvectors of the QFI with eigenvalue zero. Parameters that have a non-zero amplitude in these eigenvectors can potentially be removed from the PQC without changing its expressive power. The algorithm removes one redundant gate, deletes the corresponding row and column of the QFI, and then recalculates the eigenvectors of the QFI. These steps are repeated until no redundant gates are left. The resulting pruned PQC $C_\text{pruned}$ has as many parameters as the parameter dimension $D_C$ of the original PQC. We demonstrate our algorithm on the CPHASE-CHAIN PQC in Appendix F and find a substantial reduction of parameters without affecting $D_C$.

FIG. 7. Effective quantum dimension $G_C(\theta = 0)$ plotted against the number of qubits $N$ for a circuit consisting of randomly chosen parametrized rotations around the $x$, $y$ or $z$ axis with parameters $\theta = 0$, and two-qubit entangling gates arranged in a nearest-neighbor chain. We compare CNOT, CPHASE and $\sqrt{\text{iSWAP}}$ entangling gates. From numerical results, we find that $G_C$ scales quadratically for CNOT gates, linearly for CPHASE gates, and with a higher-order polynomial or even exponentially for $\sqrt{\text{iSWAP}}$. The number of layers $p$ is chosen such that $G_C(\theta = 0)$ is maximized.

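Algorithm 1: Prune PQC of redundant parameters
Input: PQC $C$, QFI $\mathcal{F}_C(\theta_\text{random})$, number of parameters $N^C_\text{param}$ with $D_C < N^C_\text{param}$, empty set $K$
1 do
2   Get eigenvalues $\lambda^{(i)}$ of $\mathcal{F}_C(\theta_\text{random})$ sorted in ascending order, eigenvectors $\alpha^{(i)}$ and rank $r$
3   Compute $\beta = \sum_{i:\,\lambda^{(i)} = 0} |\alpha^{(i)}|$ (element-wise absolute values, summed over the eigenvectors with eigenvalue zero)
4   Pick the largest index $k$ such that $\beta_k \neq 0$
5   Update $\mathcal{F}_C(\theta_\text{random})$ by removing row $k$ and column $k$
6   Add the parameter of $C$ corresponding to index $k$ to set $K$
7 while $\mathcal{F}_C(\theta_\text{random})$ has zero eigenvalues
8 Removing the parameters in set $K$ from $C$ gives the pruned PQC $C_\text{pruned}$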
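A minimal NumPy sketch of Algorithm 1 follows. It assumes the QFI has already been evaluated at random parameters and that "zero" means "below a numerical tolerance `tol`", which is a hypothetical choice. It keeps track of the original parameter indices, so the returned `removed` list corresponds to the set $K$. The toy example uses a Gram matrix $\mathcal{F} = A^T A$ whose columns contain two linear dependencies, mimicking two redundant parameters.

```python
import numpy as np


def prune_redundant(qfi, tol=1e-8):
    """Return (kept, removed) original parameter indices following Algorithm 1."""
    kept = list(range(qfi.shape[0]))
    removed = []  # the set K, in removal order
    while True:
        sub = qfi[np.ix_(kept, kept)]  # QFI with the removed rows and columns deleted
        lam, vecs = np.linalg.eigh(sub)  # ascending eigenvalues
        null_vecs = vecs[:, lam < tol]  # eigenvectors with (numerically) zero eigenvalue
        if null_vecs.shape[1] == 0:  # full rank: no redundant parameters left
            break
        beta = np.abs(null_vecs).sum(axis=1)
        k = int(np.flatnonzero(beta > tol).max())  # largest index with beta_k != 0
        removed.append(kept.pop(k))
    return kept, removed


rng = np.random.default_rng(0)
a = rng.normal(size=(6, 3))
# Columns 3 and 4 are linear combinations of columns 0-2, so they add no new direction.
A = np.column_stack([a, a[:, 0] + a[:, 1], 2 * a[:, 2]])
kept, removed = prune_redundant(A.T @ A)
print(kept, removed)
```

Removing a parameter with $\beta_k \neq 0$ leaves the rank unchanged, because that parameter's direction is a linear combination of the others. The loop therefore ends with exactly as many parameters as the rank of the original QFI.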

VI. DISCUSSION
We investigated the capacity and trainability of hardware-efficient PQCs using the quantum geometric structure of the parameter space. We introduced the notions of parameter dimension $D_C$ and effective quantum dimension $G_C$, which are global and local measures, respectively, of the space of quantum states that can be accessed by the PQC. Both can be derived from the QFI. We applied these concepts to PQCs composed of layers of single-qubit rotations and different types of entangling gates arranged in various geometries. For comparable circuit depth $p$, we find strong numerical evidence that PQCs constructed from CNOT or $\sqrt{\text{iSWAP}}$ gates have a lower variance of the gradient, and thus higher expressibility, compared to PQCs with CPHASE gates. While two-qubit gates such as the CNOT and CPHASE gates can be converted into each other by applying specific single-qubit rotations, the PQCs we use only have a limited number of single-qubit rotations, and thus the choice and arrangement of two-qubit gates strongly affect the expressibility of the PQC. Without loss of generality, we study the variance of the gradient using the two-qubit Hamiltonian $H = \sigma^z_1\sigma^z_2$, where we take the variance over an ensemble of randomized PQCs. The variance of the gradient of a generic Hamiltonian $H = \sum_i b_i H_i$ consisting of a polynomial number of Pauli operators $H_i$ shows the same exponential decay as for the two-qubit Hamiltonian [7,8], which we demonstrate for a many-body Hamiltonian in Appendix D.
For a specific type of PQC composed of $y$ rotations and CPHASE gates, $D_C$ scales only quadratically with the number of qubits, which may imply that this PQC can be efficiently simulated on classical computers. We find that the redundancy of parameters varies strongly depending on the configuration of the PQC as well as on the type of gates.
The effective quantum dimension $G_C$ reveals the expressive power of a PQC under local variations around a specific parameter set. We find that, depending on the entangling gates, $G_C$ shows widely different scaling with the number of qubits, with the largest values found for $\sqrt{\text{iSWAP}}$ gates. While we only studied the case $\theta = 0$, PQCs with correlated parameters could feature similar behavior [19]. Tuning the parameters of a PQC from zero to a random set of parameters yields a crossover from large gradients and small $G_C$ to vanishing gradients and large $G_C$. For the PQCs investigated, we find a range of parameters that combines large gradients with a nearly maximal $G_C$, which could be an optimal starting point for gradient-based optimisation. Trade-offs between the expressibility of a circuit and the magnitude of its gradients are a key challenge in finding good initialization strategies [34].
When the number of layers $p$ is increased to $p_c$, a transition occurs in the QFI as $D_C$ reaches its maximal possible value. The transition is characterized by the disappearance of small eigenvalues of the QFI and a peak in the variance of the logarithm of the eigenvalues. This peak may be related to a transition in the optimization landscape of control theory: when the system becomes overparameterized, with more parameters than degrees of freedom, the optimization landscape changes from being spin-glass-like with many near-degenerate minima to one with many degenerate global minima [38,39]. The peak in the QFI could be used to identify this transition. The overparameterized regime may be useful for mitigating the effect of noise [40,41]. For deep circuits, $p > p_c$, the transition leads to a decay of the QNG as small eigenvalues are suppressed. For shallow circuits, $p < p_c$, the QNG can be orders of magnitude larger in value than the regular gradient; however, our numerical results suggest that both the regular gradient and the QNG decrease exponentially with the number of qubits. Thus, the QNG most likely cannot solve the barren plateau problem. This contrasts with the natural gradient in classical machine learning, which is known to be able to overcome the plateau phenomena that slow down optimization [35].
Imaginary-time evolution and variational quantum simulation use a matrix related to the QFI to update the parameters of the PQC [31,42]. The effective quantum dimension $G_C$ could give major insights into the convergence properties of these algorithms. Recent proposals for adaptively generated ansätze could benefit from the QFI by taking the geometry of the PQC into account when designing PQCs [21].
We demonstrated an algorithm to systematically reduce the number of parameters and the depth of PQCs while keeping the parameter dimension constant. This algorithm can be immediately applied to PQCs used in VQAs to reduce the number of parameters $M$ without sacrificing expressive power, since commonly used PQCs often contain more parameters than necessary. Removing redundant parameters reduces the computational effort for calculating the gradient as well as the QFI needed for the QNG, which has been shown to be highly beneficial for training [32,43]. Compared to ordinary gradient descent, the sampling overhead of training with the QFI and QNG has recently been proven to be asymptotically constant for both an increasing number of iterations and an increasing number of qubits [43]. Furthermore, training with the QNG has a reduced total cost since it approaches the optimum faster [43]. Thus, NISQ algorithms that use the QFI to update their parameters can train faster than ordinary gradient descent. Our algorithm reduces the cost of calculating the QFI in each iteration of the training by truncating the size of the QFI. As the QFI is a symmetric $M \times M$ matrix with $M(M+1)/2$ independent elements, removing a single parameter already reduces the number of elements to measure by $M$. Further, with our approach one can lower the number of parametrized gates needed to run the VQA, which is especially important for NISQ-era algorithms.
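Since the QFI is symmetric, its number of independent elements is $M(M+1)/2$. Removing one parameter lowers this count by exactly $M$:

```latex
\frac{M(M+1)}{2} - \frac{(M-1)M}{2} = \frac{M\,\big[(M+1) - (M-1)\big]}{2} = M
```

For example, $M = 100$ gives $5050$ independent elements, compared with $4950$ for $M = 99$.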
The QFI is widely used in quantum metrology [44] and quantum computing [31,45,46]. To facilitate its application, various methods to calculate the QFI on quantum computers have been developed and are continuously being improved [46,47]; we review them in Appendix G. The most commonly applied methods are the shift-rule [47,48], the Hadamard test [45,49,50] and direct measurement methods [51]. For these approaches, the number of circuits to measure scales quadratically, as $M^2$, with the number of parameters $M$. Various approximations of the QFI have been proposed [31,52,53,54]. Improved methods for the numerical simulation of the QFI are being developed as well [55]. We provide code that can simulate the QFI for 26 qubits on a desktop computer [56]. We note that calculations with a reduced number of qubits or layers can help to design better PQCs. Most commonly used PQCs are constructed according to specific rules in a layer-wise fashion. By evaluating the effective dimension of smaller PQCs, one can identify rules and patterns for constructing PQCs with few redundant parameters, and then extrapolate these rules to PQCs with many qubits and layers.
During the training of a PQC, the eigenvalue spectrum of the QFI can acquire specific features, as has been shown for restricted Boltzmann machines [57]. We show that the PQCs have characteristic eigenvalue spectra depending on the type of gates and their arrangement (see Appendix E). The eigenvalues hold important information about the trainability and generalization of a model. For example, in classical machine learning, a model that generalizes well is known to have a low effective dimension [29]. It would be interesting to study how these statements translate to quantum machine learning. The eigenvalues of the Hessian could be applied as well [58]. Further, connections to complementary measures of capacity based on the classical Fisher information [59] and on memory capacity [60] could be explored.
While we studied hardware-efficient PQCs, some of our results can be carried over to other types of PQCs. The transition we observed in the QFI spectrum could be used to characterize when a PQC is overparameterized. Further, $G_C(\theta)$ can be used to quantify the space of quantum states that can be reached by locally varying the parameters of a PQC. It would be straightforward to extend our concepts to evaluate the capacity and trainability of noisy PQCs [61], convolutional PQCs [62], optimal control [63], quantum metrology [64] and programmable analog quantum simulators [65].
Python and Julia code for the numerical calculations performed in this work is available at [56].

… We find that the variance of the gradient divided by the number of terms in the Hamiltonian has nearly the same value for both the two-qubit Hamiltonian and the transverse Ising Hamiltonian. As we take the variance over an ensemble of randomized PQCs, it does not matter which Pauli operator we use to calculate the variance.

FIG. … Fig. 4 in main text. a) Nearest-neighbor chain arrangement of entangling gates, b) all-to-all connectivity, c) alternating nearest-neighbor arrangement. All graphs for $N = 10$ qubits and $p = 50$ layers.

… to circuits with general generators $G$ [47,68,69]. The shift-rule for the QFI takes the following form [47,48]:
$$\mathcal{F}_{ij}(\theta) = -\frac{1}{8}\Big[\,|\langle\psi(\theta)|\psi(\theta + (e_i + e_j)\pi/2)\rangle|^2 - |\langle\psi(\theta)|\psi(\theta + (e_i - e_j)\pi/2)\rangle|^2 - |\langle\psi(\theta)|\psi(\theta + (-e_i + e_j)\pi/2)\rangle|^2 + |\langle\psi(\theta)|\psi(\theta - (e_i + e_j)\pi/2)\rangle|^2\Big],$$
where $e_i$ is the basis vector for the $i$-th index of the parameter $\theta$. The diagonal elements of the QFI simplify to