Effects of noise on the overparametrization of quantum neural networks

Overparametrization is one of the most surprising and notorious phenomena in machine learning. Recently, there have been several efforts to study if, and how, Quantum Neural Networks (QNNs) acting in the absence of hardware noise can be overparametrized. In particular, it has been proposed that a QNN can be defined as overparametrized if it has enough parameters to explore all available directions in state space. That is, if the rank of the Quantum Fisher Information Matrix (QFIM) for the QNN's output state is saturated. Here, we explore how the presence of noise affects the overparametrization phenomenon. Our results show that noise can "turn on" previously-zero eigenvalues of the QFIM. This enables the parametrized state to explore directions that were otherwise inaccessible, thus potentially turning an overparametrized QNN into an underparametrized one. For small noise levels, the QNN is quasi-overparametrized, as large eigenvalues coexist with small ones. Then, we prove that as the magnitude of noise increases all the eigenvalues of the QFIM become exponentially suppressed, indicating that the state becomes insensitive to any change in the parameters. As such, there is a pull-and-tug effect where noise can enable new directions, but also suppress the sensitivity to parameter updates. Finally, our results imply that current QNN capacity measures are ill-defined when hardware noise is present.


I. INTRODUCTION
Overparametrization has become one of the most important concepts for studying neural networks in classical machine learning. When a neural network is overparametrized, it has a capacity which is larger than the number of training points [1]. Despite being initially counterintuitive, as increasing the number of parameters can lead to overfitting, research has shown that overparametrization can actually improve the performance of a model [1-4]. For example, it has been observed that the generalization error can decrease when the model size is increased, a phenomenon known as double descent [5-7]. Additionally, overparametrization can provide convergence guarantees, ensuring that a model will be able to find a good solution during its optimization [8,9]. These benefits make overparametrization an important consideration in the design of classical machine learning algorithms.
In the past few years, there has been a significant amount of effort towards merging concepts from classical machine learning with those of quantum computing, leading to the blossoming field of Quantum Machine Learning (QML) [10-13]. The key idea here is that one can leverage the exponentially large dimension of the Hilbert space as a feature space to process and learn from data. Crucially, there is hope that QML has the potential of enabling a quantum advantage in the near-term [14,15].

* cerezo@lanl.gov
In this work we study how the recently developed understanding of overparametrization in QNNs [21-26] is affected by the presence of quantum noise. In particular, we will review the results of Ref. [21], which characterizes the critical number of parameters needed to overparametrize a QNN. It has been observed that underparametrized QNNs exhibit spurious local minima in the optimization landscape that hinder their trainability. By adding enough parameters to the circuit (hence overparametrizing it), these false local traps disappear. Since this facilitates the training of the QNN's parameters, the overparametrization onset corresponds to a veritable computational phase transition. Notably, in Ref. [21] the number of parameters needed to overparametrize a QNN is defined as the number needed to saturate the rank of the Quantum Fisher Information Matrix (QFIM), and concomitantly the QNN's capacity, as introduced in [27,28].
Our results show that the presence of hardware noise can increase the rank of a QFIM whose rank would have been saturated in a noiseless scenario. That is, noise can turn null eigenvalues of a noiseless-state QFIM into non-null eigenvalues of the corresponding noisy-state QFIM. As schematically depicted in Fig. 1, this means that hardware noise allows the QNN to explore previously unavailable directions. Hence, some of the redundant parameters in an overparametrized noiseless QNN become relevant to control trajectories in state space when the effect of noise is accounted for. As such, noise can potentially turn an overparametrized model into an underparametrized one. In addition, we analytically prove that as the noise strength (or depth of the circuit) increases, the eigenvalues of the QFIM become exponentially suppressed. Thus, for large noise levels (or for deep QNNs), the states become insensitive to any change in the parameters. On the positive side, our numerics show that for small noise levels the model behaves as quasi-overparametrized: large eigenvalues of the QFIM (the ones that are non-zero in the noiseless setting) coexist with small ones (the ones that were previously zero). Additionally, we prove that certain types of noise, specifically global depolarizing noise or measurement noise [42,47], cannot increase the rank of the QFIM. To conclude, we discuss the implications of our results for QNN capacity measures proposed in the literature, and for other fields such as quantum metrology.

A. Quantum Neural Networks
In this work we consider a QML task where the goal is to train a model on a dataset $S = \{\rho^{(s)}\}_{s=1}^{N}$ consisting of $n$-qubit quantum states. We use $d$ to denote the dimension of the composite quantum system, i.e., $d = 2^n$. The quantum model is parametrized through a QNN, which is a unitary quantum channel $C_\theta$ acting on input states $\rho^{(s)}$ as $C_\theta(\rho^{(s)}) = U(\theta)\rho^{(s)} U^\dagger(\theta)$. Here, $U(\theta)$ is taken to be of the form

$$U(\theta) = \prod_{m=1}^{M} e^{-i\theta_m H_m}, \tag{1}$$

where $H_m$ are traceless Hermitian operators taken from a set of generators $\mathcal{G}$, and $\theta \in \mathbb{R}^M$ is a vector of trainable parameters. This allows us to express $C_\theta$ as a concatenation of $M$ unitary channels,

$$C_\theta = C^{M}_{\theta_M} \circ \cdots \circ C^{1}_{\theta_1}, \tag{2}$$

with $C^{m}_{\theta_m}(\rho) = e^{-i\theta_m H_m}\,\rho\, e^{i\theta_m H_m}$. Thus, the output of the QNN is a parametrized state

$$\rho^{(s)}_\theta = C_\theta(\rho^{(s)}). \tag{3}$$

The variational parameters $\theta$ are trained by minimizing an appropriately chosen loss function $\mathcal{L}(\theta)$, which we consider to be of the form

$$\mathcal{L}(\theta) = \sum_{s=1}^{N} f_s\big(\mathrm{Tr}[\rho^{(s)}_\theta O_s]\big), \tag{4}$$

where $O_s$ and $f_s$ are, respectively, a (potentially) data-instance-dependent measurement and a post-processing function.

Figure 1. Schematic diagram of our main results. Consider the task of implementing a QNN, i.e., a parametrized unitary channel on a quantum computer. As shown in [21], the overparametrization phenomenon is defined as the QNN having enough parameters to explore all relevant directions in state space. a) For certain ansatzes the QNN can be efficiently overparametrized with few parameters, as there only exists a small number of available directions in state space. Moreover, for (most) such directions, changes in the parameter values usually translate into changes in state space. b) When the quantum device is faulty, quantum noise will act throughout the computation. In this work, we explore how hardware noise modifies the overparametrization phenomenon. Our results show that quantum noise can enable additional directions in state space. However, we also find that as the noise probability increases, the system becomes more and more insensitive to variations in the parameters.
While there are many aspects that define and distinguish a given QNN from another, we note that one of the most important is the choice of generators $\mathcal{G}$ from which the QNN in Eq. (1) is built. Once $\mathcal{G}$ is determined, the next aspect that defines a QNN is its depth, or equivalently, the number of parameters $M$. In particular, one wants to choose $\mathcal{G}$ and $M$ such that there exist parameter values for which the task at hand is solved. While in this work we will not discuss how to appropriately choose $\mathcal{G}$ (we instead refer the reader to [17,31]), let us consider the effect of increasing the value of $M$. In a nutshell, adding more parameters to a QNN increases its expressibility (up to a certain point) [31,48,49], meaning that the QNN can generate a wider breadth of unitaries. From a practical standpoint, adding new parameters can potentially enable new directions in the state space, and concomitantly in the loss function's landscape. This can improve the trainability of the model by removing spurious local minima, and increasing the dimension of the solution manifold [21,51]. In the following subsection we will see that the overparametrization phenomenon is indeed linked to the number of independent directions that are accessible in state space.
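As a concrete illustration of the construction above, the following minimal sketch builds a product-of-exponentials unitary and evaluates a single-state loss. The single-qubit generator layout `[Z, X, Z, X]`, the parameter values, the input state and the observable are assumptions chosen only for illustration; the helper names (`gate`, `qnn_unitary`, `loss`) are hypothetical.

```python
import numpy as np

# Single-qubit Pauli matrices, used here as illustrative (assumed) generators.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def gate(H, theta):
    """Return exp(-i * theta * H) for a Hermitian H via its eigendecomposition."""
    evals, evecs = np.linalg.eigh(H)
    return (evecs * np.exp(-1j * theta * evals)) @ evecs.conj().T


def qnn_unitary(thetas, generators):
    """Build U(theta) = prod_m exp(-i theta_m H_m); the m = 1 gate acts first."""
    U = np.eye(generators[0].shape[0], dtype=complex)
    for theta, H in zip(thetas, generators):
        U = gate(H, theta) @ U
    return U


def loss(thetas, generators, rho, O, f=lambda x: x):
    """Single-state loss f(Tr[rho_theta O]) with rho_theta = U rho U^dagger."""
    U = qnn_unitary(thetas, generators)
    rho_theta = U @ rho @ U.conj().T
    return f(np.real(np.trace(rho_theta @ O)))


generators = [Z, X, Z, X]                        # assumed layout, M = 4
thetas = np.array([0.3, 0.7, 1.1, 0.5])          # arbitrary parameter values
rho = np.array([[1, 0], [0, 0]], dtype=complex)  # input state |0><0|

U = qnn_unitary(thetas, generators)
value = loss(thetas, generators, rho, Z)
print("unitary:", np.allclose(U @ U.conj().T, np.eye(2)), "loss:", value)
```

Increasing $M$ in this sketch simply means appending entries to `generators` and `thetas`; whether the extra parameters open new directions in state space is exactly what the QFIM rank discussed next is meant to detect.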

B. Dynamical Lie algebra, quantum Fisher information, and overparametrization
We will briefly recall here the main results in [21]. We will begin by defining the Dynamical Lie Algebra (DLA) of a QNN [52,53], which can be used to characterize the group of unitaries that it can implement [31,48,49]. It follows that the DLA also determines the manifold of all states reachable by the QNN. This will allow us to interpret the overparametrization regime as that in which the QNN has enough parameters to explore all accessible directions in said manifold. In particular, we will show that the rank of the quantum Fisher information matrix can be used to detect the onset of overparametrization.
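Definition 1 (Dynamical Lie Algebra). Given a set of Hermitian generators $\mathcal{G}$, the dynamical Lie algebra $\mathfrak{g}$ is the subspace of operator space spanned by the repeated nested commutators of the elements in $i\mathcal{G}$. That is,

$$\mathfrak{g} = \mathrm{span}_{\mathbb{R}} \langle i\mathcal{G} \rangle_{\mathrm{Lie}}, \tag{5}$$

where $\langle i\mathcal{G} \rangle_{\mathrm{Lie}}$ denotes the Lie closure of $i\mathcal{G}$.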
The DLA contains information about the ultimate expressiveness of the QNN, since the group of reachable unitaries obtained for any possible parameter values $\theta \in \mathbb{R}^M$ (for an arbitrarily large number of parameters $M$) is obtained from the DLA via exponentiation, i.e., as $\{U(\theta)\}_\theta = \mathbb{G} = e^{\mathfrak{g}} \subseteq SU(d)$. We remark that $\mathbb{G}$ is known as the dynamical Lie group. Moreover, the manifold of states obtained from the action of the QNN on an input state $\rho$, given by $\{U\rho U^\dagger,\ U \in \mathbb{G}\}$, is known as the orbit of $\rho$ under $\mathbb{G}$.
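The Lie closure in Definition 1 can be computed numerically by repeatedly adding commutators until no new linearly independent operator appears. The sketch below is one straightforward way to do this for small systems (it is not taken from Ref. [21]); the function names are hypothetical and the tolerance is an assumed numerical cutoff.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def _as_real_vector(A):
    """Flatten a complex matrix into a real vector (real parts, then imaginary parts)."""
    return np.concatenate([A.real.ravel(), A.imag.ravel()])


def _try_add(basis, A, tol=1e-10):
    """Append A (normalized) to basis if it is linearly independent over the reals."""
    norm = np.linalg.norm(A)
    if norm < tol:
        return False
    A = A / norm
    stacked = np.array([_as_real_vector(B) for B in basis + [A]])
    if np.linalg.matrix_rank(stacked, tol=tol) == len(basis) + 1:
        basis.append(A)
        return True
    return False


def dla_basis(generators):
    """Return a basis of span_R <iG>_Lie built from repeated nested commutators."""
    basis = []
    for H in generators:
        _try_add(basis, 1j * H)
    grew = True
    while grew:  # stops once no commutator adds a new direction
        grew = False
        for A in list(basis):
            for B in list(basis):
                if _try_add(basis, A @ B - B @ A):
                    grew = True
    return basis


dim_xz = len(dla_basis([X, Z]))  # generators {X, Z}
dim_z = len(dla_basis([Z]))      # a single generator
print(dim_xz, dim_z)
```

For the generators $\{X, Z\}$ the closure adds $iY$, giving a three-dimensional algebra isomorphic to $\mathfrak{su}(2)$, while a single generator yields a one-dimensional algebra. The cost of this brute-force procedure grows quickly with the matrix dimension, so it is only meant for few-qubit checks.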
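From here, we can ask: by varying the parameters in the QNN, can we explore all accessible directions in the orbits of the input states, i.e., are we in the overparametrized regime? To answer this question, let us assume for now that the dataset consists of a single pure state $|\psi\rangle$. One can study the action of the QNN on $|\psi\rangle$ via the Quantum Fisher Information Matrix (QFIM). To define the QFIM, start by considering a distance measure $D$ between two pure states. In particular, we take $D$ to be the infidelity, i.e., $D(|\psi\rangle, |\phi\rangle) = 1 - |\langle\psi|\phi\rangle|^2$. Then, given a set of parameters $\theta$ and an infinitesimal perturbation $\delta$, an expansion to second order of $D$ between the quantum states $|\psi(\theta)\rangle = U(\theta)|\psi\rangle$ and $|\psi(\theta+\delta)\rangle = U(\theta+\delta)|\psi\rangle$ gives the Fubini-Study metric [54,55], i.e.,

$$D(|\psi(\theta)\rangle, |\psi(\theta+\delta)\rangle) = \frac{1}{4}\,\delta^{T} F(|\psi(\theta)\rangle)\,\delta + O(\|\delta\|^3). \tag{6}$$

Here, $F(|\psi(\theta)\rangle)$ is the QFIM for the state $|\psi(\theta)\rangle$, an $M \times M$ matrix whose elements are given by [56]

$$[F(|\psi(\theta)\rangle)]_{ij} = 4\,\mathrm{Re}\big[\langle\partial_i\psi(\theta)|\partial_j\psi(\theta)\rangle - \langle\partial_i\psi(\theta)|\psi(\theta)\rangle\langle\psi(\theta)|\partial_j\psi(\theta)\rangle\big], \tag{7}$$

where $|\partial_i\psi(\theta)\rangle = \partial|\psi(\theta)\rangle/\partial\theta_i$. As shown in Fig. 2, the eigenvalues and eigenvectors of the QFIM provide valuable geometrical information regarding how changes in the parameters translate into changes in the state. Crucially, the rank of the QFIM quantifies the number of independent directions in state space that can be explored by making infinitesimal changes in $\theta$.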
As such, one can determine if the QNN is overparametrized by checking if it has enough parameters so that the QFIM saturates its maximum achievable rank.

Figure 2. Let $U(\theta) \in e^{\mathfrak{g}}$ be a parametrized unitary and $|\psi\rangle$ some pure state, so that $|\psi(\theta)\rangle = U(\theta)|\psi\rangle$. Here we schematically show that the eigenvalues and eigenvectors of the QFIM, $F(|\psi(\theta)\rangle)$, inform how changes in parameter space translate into changes in state space. In particular, when modifying $\theta$ following the eigenvectors of the QFIM, the state $|\psi(\theta)\rangle$ explores the corresponding available direction in the tangent space $\mathfrak{g}|\psi(\theta)\rangle$. Additionally, the magnitude of the QFIM eigenvalues determines the sensitivity of the state to a change along an eigenvector direction [55]. As such, a large eigenvalue means that it is "easy" to nudge the state in state space, while small eigenvalues indicate that the state is insensitive to parameter changes in the direction of the associated eigenvector.
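To make the rank criterion concrete, the sketch below evaluates the pure-state QFIM, $[F]_{ij} = 4\,\mathrm{Re}[\langle\partial_i\psi|\partial_j\psi\rangle - \langle\partial_i\psi|\psi\rangle\langle\psi|\partial_j\psi\rangle]$, for a product-of-exponentials circuit using exact derivatives. The single-qubit ansatz, the parameter values and the helper names are assumptions for illustration only.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def gate(H, theta):
    """Return exp(-i * theta * H) for a Hermitian H via its eigendecomposition."""
    evals, evecs = np.linalg.eigh(H)
    return (evecs * np.exp(-1j * theta * evals)) @ evecs.conj().T


def pure_state_qfim(thetas, generators, psi0):
    """QFIM of |psi(theta)> = prod_m exp(-i theta_m H_m) |psi0> (gate m = 1 first)."""
    M = len(thetas)
    gates = [gate(H, t) for H, t in zip(generators, thetas)]
    states = [psi0]  # states[m] is the state after the first m gates
    for G in gates:
        states.append(G @ states[-1])
    psi = states[-1]
    dpsi = []
    for m in range(M):
        # d/dtheta_m brings down a factor -i H_m right after gate m ...
        v = -1j * generators[m] @ states[m + 1]
        # ... and the remaining gates are then applied unchanged.
        for G in gates[m + 1:]:
            v = G @ v
        dpsi.append(v)
    F = np.zeros((M, M))
    for i in range(M):
        for j in range(M):
            term = np.vdot(dpsi[i], dpsi[j]) - np.vdot(dpsi[i], psi) * np.vdot(psi, dpsi[j])
            F[i, j] = 4 * np.real(term)
    return F


generators = [Z, X, Z, X]               # assumed ansatz with M = 4 parameters
thetas = np.array([0.3, 0.7, 1.1, 0.5])
psi0 = np.array([1, 0], dtype=complex)  # |0>

F = pure_state_qfim(thetas, generators, psi0)
rank = np.linalg.matrix_rank(F, tol=1e-9)
print("QFIM eigenvalues:", np.linalg.eigvalsh(F), "rank:", rank)
```

In this example the rank is 2 even though there are four parameters, since a pure single-qubit state only has two independent directions available; the first parameter contributes nothing because $|0\rangle$ is an eigenstate of $Z$.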

Definition 2 (Overparametrization).
A QNN is said to be overparametrized if the number of parameters $M$ is such that the QFIM saturates its achievable rank $R$ at least in one point of the loss landscape. That is, if increasing the number of parameters past some minimal (critical) value $M_c$ does not further increase the rank of the QFIM, i.e.,

$$\max_{M \geqslant M_c,\ \theta} \mathrm{rank}\big[F(|\psi(\theta)\rangle)\big] = R. \tag{8}$$

The main result of Ref. [21] is that $M_c$ is directly linked to the dimension of $\mathfrak{g}$ (the circuit's DLA). In particular, the rank of the QFIM is upper bounded by $\dim(\mathfrak{g})$. Hence, one can potentially reach the overparametrization regime if the QNN has $\sim \dim(\mathfrak{g})$ parameters. Clearly, if $\dim(\mathfrak{g}) \in \Omega(\exp(n))$ (as is the case of controllable unitaries) then one cannot efficiently overparametrize the QNN. More interesting, however, are the cases when $\dim(\mathfrak{g}) \in O(\mathrm{poly}(n))$, such as those arising in [31,57]. We remark that while we have defined overparametrization as the regime where the number of parameters is such that the QFIM saturates its achievable rank in at least one point in the landscape, in practice one finds that the rank saturates simultaneously throughout most of the landscape (which is what brings about a computational phase transition) [21].
To finish, we note that in Ref. [21] it was also shown that Definition 2 has operational meaning in terms of the capacity of the QNN [27,28]. We recall that the capacity (or power) of a QNN is used to quantify the breadth of functions that it can capture [58]. For instance, let us consider the capacity measure of [28], which defines the effective quantum dimension of a QNN as

$$D_1(\theta) = \mathbb{E}\Big[\sum_{m=1}^{M} Z\big(\lambda_m(\theta)\big)\Big]. \tag{9}$$

Here $\lambda_m(\theta)$ are the eigenvalues of the QFIM for the state $|\psi(\theta)\rangle$, and $Z(x)$ is a function such that $Z(x) = 0$ for $x = 0$, and $Z(x) = 1$ for $x \neq 0$. Moreover, the expectation value is taken over the probability distribution from which states are sampled from $S$. It is straightforward to see that for a single-state dataset, $D_1(\theta) = \mathrm{rank}[F(|\psi(\theta)\rangle)]$. An alternative definition for the capacity of a QNN can be found in [27]. In the limit of large datasets, i.e., when $|S| \to \infty$, the effective quantum dimension of [27] converges to

$$D_2 = \max_{\theta}\big(\mathrm{rank}[I(|\psi(\theta)\rangle)]\big), \tag{10}$$

where $I(|\psi(\theta)\rangle)$ is the classical Fisher Information matrix, defined as

$$I(|\psi(\theta)\rangle) = \mathbb{E}\left[\frac{\partial \log p(|\psi\rangle, y; \theta)}{\partial\theta}\left(\frac{\partial \log p(|\psi\rangle, y; \theta)}{\partial\theta}\right)^{T}\right].$$

Here, $p(|\psi\rangle, y; \theta)$ describes the joint relationship between an input $|\psi\rangle$ and an output $y$ of the QNN. In addition, the expectation value is taken over the probability distribution that samples input states from the dataset. As shown in [21], the model's capacity, as quantified by the effective dimensions of Eqs. (9) or (10), is upper bounded as

$$D_1(\theta) \leqslant \dim(\mathfrak{g}), \qquad D_2 \leqslant \dim(\mathfrak{g}).$$

Moreover, one can show that when the QNN is overparametrized, $D_1(\theta)$ achieves its maximum value. This shows that overparametrizing a QNN is equivalent to saturating its capacity.

C. Quantum noise preliminaries
Quantum noise refers to the uncontrolled errors that occur when implementing a QNN on quantum hardware. Such errors may arise from a wide variety of sources, such as imperfections when implementing gates or when performing measurements, undesired qubit-qubit couplings or unwanted interactions between the qubits and their environment.
In this work, we model the action of the hardware noise present throughout a QNN by considering that noise channels $\mathcal{N}_m$ act before and after each unitary $U_m(\theta_m)$ (see Fig. 3). Here, we recall the definition of a unital Pauli channel.
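Definition 3 (Unital Pauli channel). A unital Pauli channel is a CPTP map $\mathcal{N}$ whose action on an operator $\rho$ is given by

$$\mathcal{N}(\rho) = \sum_{\alpha,\beta \in \{0,1\}^n} p_{\alpha\beta}\, X^{\alpha} Z^{\beta} \rho\, Z^{\beta} X^{\alpha},$$

where $\{p_{\alpha\beta}\}$ is a probability distribution (i.e., $p_{\alpha\beta} \geqslant 0$ and $\sum_{\alpha,\beta} p_{\alpha\beta} = 1$), $X^{\alpha} = \bigotimes_{j=1}^{n} X_j^{\alpha_j}$ and $Z^{\beta} = \bigotimes_{j=1}^{n} Z_j^{\beta_j}$.

In other words, a unital (identity preserving) Pauli channel consists of Pauli operators applied randomly according to a certain probability distribution. It is easy to see that it is diagonal in the Pauli basis. That is, its action maps a Pauli operator $X^{\alpha'} Z^{\beta'}$ to

$$\mathcal{N}(X^{\alpha'} Z^{\beta'}) = c_{\alpha'\beta'}\, X^{\alpha'} Z^{\beta'}, \qquad c_{\alpha'\beta'} = \sum_{\alpha,\beta} p_{\alpha\beta}\, (-1)^{\alpha\cdot\beta' + \alpha'\cdot\beta},$$

where we used the property $Z^{\beta} X^{\alpha} = (-1)^{\alpha\cdot\beta} X^{\alpha} Z^{\beta}$, together with the fact that the square of a Pauli operator is equal to the identity. Note that $c_{00} = 1$ implies that a unital Pauli noise channel maps the identity operator onto itself, which is a necessary and sufficient condition for a diagonal superoperator to be trace preserving. In what follows we will assume that $c_{\alpha'\beta'} \in (-1, 1)$ for all $\alpha'$ and $\beta'$ other than $\alpha' = \beta' = 0$ (this is necessary for Lemma 2 in Appendix E to hold, i.e., for the identity operator to be the only fixed point of the noisy channel). Pauli unital noise includes, as a special case, local depolarizing noise, which acts on each qubit $j \in \{1, \ldots, n\}$ as

$$\mathcal{N}^{\mathrm{Dep}}_j(\rho) = (1-p)\,\rho + p\,\mathrm{Tr}_j[\rho] \otimes \frac{\mathbb{1}_j}{2} = \left(1 - \frac{3p}{4}\right)\rho + \frac{p}{4}\left(X_j \rho X_j + Y_j \rho Y_j + Z_j \rho Z_j\right). \tag{15}$$

Here, $0 < p \leqslant 1$ denotes the probability of depolarization, and $X_j$, $Y_j$ and $Z_j$ are Pauli operators acting on the $j$th qubit. Moreover, $\mathrm{Tr}_j$ indicates the partial trace over qubit $j$. Similarly, we can construct an $n$-qubit channel consisting of a local depolarizing channel acting on each qubit as

$$\mathcal{N}^{\mathrm{Dep}}_{\mathrm{loc}} = \mathcal{N}^{\mathrm{Dep}}_1 \circ \cdots \circ \mathcal{N}^{\mathrm{Dep}}_n, \tag{16}$$

or the global depolarizing channel, whose action is

$$\mathcal{N}^{\mathrm{Depol}}(\rho) = (1-p)\,\rho + p\,\frac{\mathbb{1}}{d}, \tag{17}$$

where $\mathbb{1}$ denotes the $d \times d$ identity. Other examples of Pauli noise channels include bit- and phase-flip channels, as well as $T_2$ processes (i.e., the dephasing channel is a unital Pauli channel).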
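As shown in Fig. 3, in the presence of quantum noise the action of the QNN is modeled by noise channels interleaved with the unitary channels. Hence, the output of the noisy QNN is given by

$$\widetilde{\rho}_\theta = \widetilde{C}_\theta(\rho) = \mathcal{N}_{M+1} \circ C^{M}_{\theta_M} \circ \mathcal{N}_{M} \circ \cdots \circ C^{1}_{\theta_1} \circ \mathcal{N}_{1}(\rho), \tag{18}$$

for some (potentially layer-dependent) noise channels $\mathcal{N}_m$, with $m = 1, \ldots, M+1$.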
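The statement that a unital Pauli channel is diagonal in the Pauli basis can be checked numerically on a single qubit. The sketch below applies a channel of the form $\rho \mapsto \sum_P q_P\, P\rho P$ to each Pauli operator and reads off the coefficient $c_P = \mathrm{Tr}[P\,\mathcal{N}(P)]/2$. The helper names and the value $p = 0.1$ are illustrative assumptions; the depolarizing weights follow the standard Pauli-mixture form $(1 - 3p/4,\ p/4,\ p/4,\ p/4)$.

```python
import numpy as np

PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}


def pauli_channel(op, probs):
    """Apply op -> sum_P probs[P] * P op P for single-qubit Pauli operators P."""
    return sum(q * PAULIS[k] @ op @ PAULIS[k] for k, q in probs.items())


def pauli_coefficients(probs):
    """Coefficients c_P with N(P) = c_P * P, obtained as Tr[P N(P)] / 2."""
    return {
        k: float(np.real(np.trace(P @ pauli_channel(P, probs))) / 2)
        for k, P in PAULIS.items()
    }


def is_diagonal_in_pauli_basis(probs, tol=1e-12):
    """Check that Tr[Q N(P)] vanishes whenever Q != P."""
    return all(
        abs(np.trace(Q @ pauli_channel(P, probs))) < tol
        for kp, P in PAULIS.items()
        for kq, Q in PAULIS.items()
        if kp != kq
    )


p = 0.1  # assumed noise probability
bit_flip = {"I": 1 - p, "X": p}
depolarizing = {"I": 1 - 3 * p / 4, "X": p / 4, "Y": p / 4, "Z": p / 4}

c_bf = pauli_coefficients(bit_flip)
c_dep = pauli_coefficients(depolarizing)
print("bit flip:", c_bf)
print("depolarizing:", c_dep)
```

For the bit-flip channel the $X$ coefficient stays equal to one (only $Y$ and $Z$ are damped by $1-2p$), so it falls outside the assumption $c_{\alpha'\beta'} \in (-1,1)$, whereas the depolarizing channel damps all three non-identity Paulis by $1-p$.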

D. Mixed-state quantum Fisher information matrix
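As previously discussed, in the presence of noise the quantum states evolving through the circuit become mixed, and we must extend the formula of the QFIM in Eq. (7) to account for this. Following the same program that led to the QFIM for pure states, we can define a mixed-state QFIM as an expansion of the Bures distance, which is a measure of distinguishability between mixed states. The Bures distance is defined as

$$D_B(\rho, \sigma)^2 = 2\left(1 - \sqrt{\mathcal{F}(\rho, \sigma)}\right),$$

where $\mathcal{F}(\rho, \sigma) = \big(\mathrm{Tr}\sqrt{\sqrt{\rho}\,\sigma\sqrt{\rho}}\big)^2$ is the fidelity. Let $\rho_\theta$ be a parametrized mixed state, and let its spectral decomposition be

$$\rho_\theta = \sum_{\mu=1}^{d} r_\mu\, |r_\mu\rangle\langle r_\mu|,$$

where $\{r_\mu\}_{\mu=1}^{d}$ are the eigenvalues of $\rho_\theta$ (such that $r_\mu \geqslant 0$ for all $\mu$, and $\sum_\mu r_\mu = 1$), and $\{|r_\mu\rangle\}_{\mu=1}^{d}$ are the associated eigenvectors. Then, a second-order expansion of the Bures distance between $\rho_\theta$ and $\rho_{\theta+\delta}$ leads to the mixed-state QFIM (which reduces to Eq. (7) when $\rho_\theta$ is pure), whose entries are [55,61]

$$[F(\rho_\theta)]_{ij} = \sum_{\substack{\mu,\nu \\ r_\mu + r_\nu > 0}} \frac{2\,\mathrm{Re}\big(\langle r_\mu|\partial_i\rho_\theta|r_\nu\rangle\langle r_\nu|\partial_j\rho_\theta|r_\mu\rangle\big)}{r_\mu + r_\nu}.$$

Here we recall a few properties of the QFIM which will be used below, and we refer the reader to [61] for their proof.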
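1. $F$ is a symmetric matrix: $[F(\rho_\theta)]_{ij} = [F(\rho_\theta)]_{ji}$.
2. $F$ is positive semi-definite: $F(\rho_\theta) \geqslant 0$.
3. $F$ is convex: For any pair of states $\rho_\theta$ and $\sigma_\theta$ and for $0 \leqslant q \leqslant 1$ we have $F(q\rho_\theta + (1-q)\sigma_\theta) \leqslant q F(\rho_\theta) + (1-q) F(\sigma_\theta)$.
4. $F$ is invariant under unitary transformations: $F(V\rho_\theta V^\dagger) = F(\rho_\theta)$ for any parameter-independent unitary $V$.
5. $F$ is non-increasing under quantum channels: If $\Phi$ is a quantum channel, then $F(\Phi(\rho_\theta)) \leqslant F(\rho_\theta)$.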
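The spectral formula for the mixed-state QFIM translates directly into a few lines of linear algebra. The sketch below is a minimal implementation for small density matrices, assuming the standard form $[F]_{ij} = \sum_{r_\mu + r_\nu > 0} 2\,\mathrm{Re}(\langle r_\mu|\partial_i\rho|r_\nu\rangle\langle r_\nu|\partial_j\rho|r_\mu\rangle)/(r_\mu + r_\nu)$; the example state, angle, bit-flip probability and function name are illustrative choices.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def mixed_state_qfim(rho, drhos, tol=1e-12):
    """QFIM from the spectral decomposition of rho and the derivatives d_i rho."""
    r, V = np.linalg.eigh(rho)
    d_eig = [V.conj().T @ d @ V for d in drhos]  # derivatives in the eigenbasis
    denom = r[:, None] + r[None, :]
    mask = denom > tol                           # skip pairs with r_mu + r_nu = 0
    M = len(drhos)
    F = np.zeros((M, M))
    for i in range(M):
        for j in range(M):
            # entry [mu, nu] is <r_mu|d_i rho|r_nu> <r_nu|d_j rho|r_mu>
            num = 2 * np.real(d_eig[i] * d_eig[j].T)
            F[i, j] = np.sum(num[mask] / denom[mask])
    return F


# Example: rho = 0.9|+><+| + 0.1|-><-| rotated about Z by an angle theta.
theta = 0.4
rho0 = 0.5 * (np.eye(2) + 0.8 * X)
Uz = np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])
rho = Uz @ rho0 @ Uz.conj().T
drho = -0.5j * (Z @ rho - rho @ Z)               # d rho / d theta

F = mixed_state_qfim(rho, [drho])

# Unitary invariance: conjugating with a fixed unitary (Hadamard) leaves F unchanged.
Hd = (X + Z) / np.sqrt(2)
F_rotated = mixed_state_qfim(Hd @ rho @ Hd, [Hd @ drho @ Hd])

# Monotonicity: a bit-flip channel with probability 0.1 cannot increase F.
p = 0.1
F_noisy = mixed_state_qfim((1 - p) * rho + p * X @ rho @ X,
                           [(1 - p) * drho + p * X @ drho @ X])
print(F[0, 0], F_rotated[0, 0], F_noisy[0, 0])
```

For this state the rotation moves a Bloch vector of length 0.8 along a circle, so the single QFIM entry equals $0.8^2 = 0.64$, which gives a quick sanity check of the implementation.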

III. RESULTS
The previous section reviewed the results of Ref. [21], which analyzed the overparametrization phenomenon when no hardware noise is present. However, in a realistic scenario where the QNN is implemented on a near-term quantum device [62], we can expect that quantum noise will act throughout the circuit. Therefore, in what follows we set out to study how the results of Ref. [21] change when noise is considered. For simplicity, we will study the case when the dataset is composed of a single mixed state, $S = \{\rho\}$. The extension of our results to multi-state datasets is straightforward, as one simply needs to follow the approach taken in [21].

A. Single-qubit toy model
We start with a simple toy model that will help us gather intuition on the effects that noise may have on the rank and the eigenvalues of the QFIM, and hence on the QNN's overparametrization. As we will show, we can expect that the presence of quantum noise will generally: i) increase the rank of the QFIM, and ii) decrease the overall magnitude of the QFIM eigenvalues. To illustrate these two phenomena, we consider a simple single-qubit model undergoing bit-flip noise. The setup is as follows. First, we initialize the state of the single qubit to

$$\rho = 0.9\,|+\rangle\langle+| + 0.1\,|-\rangle\langle-|. \tag{24}$$

We choose a full-rank state to avoid issues in the QFIM (namely, discontinuities in its entries) arising from a change in the rank of the state [63,64]. Then, this state is sent through a circuit composed of four single-qubit rotations generated by $X$ and $Z$. This setup is depicted in Fig. 4(a, top). In channel notation, this QNN is expressed as the concatenation of four unitary channels,

$$C_\theta = C^{4}_{\theta_4} \circ C^{3}_{\theta_3} \circ C^{2}_{\theta_2} \circ C^{1}_{\theta_1}, \tag{26}$$

where each $C^{m}_{\theta_m}$ is either $C^{Z}_{\theta}(\rho) = e^{-i\theta Z/2}\rho\, e^{i\theta Z/2}$ or the analogously defined $C^{X}_{\theta}(\rho)$. The generators of the QNN are the Pauli matrices $\mathcal{G} = \{X, Z\}$, and it is straightforward to check that the DLA is simply $\mathfrak{g} = \mathrm{span}\{iX, iY, iZ\} \cong \mathfrak{su}(2)$, meaning that the QNN is universal or controllable [31,52]. Moreover, we can see that the maximum possible rank of the QFIM is $\max_\theta(\mathrm{rank}[F(\rho_\theta)]) = 2$, as the state lives on a two-dimensional shell inside of the Bloch sphere. As such, it is clear that the QNN is already overparametrized, since the maximum attainable rank of the QFIM is smaller than the number of parameters. To exemplify how noise affects the QNN, we evaluate the QFIM at three different sets of parameter values, $\theta_1$, $\theta_2$ and $\theta_3$, for which the noiseless QFIM has rank 1, 2 and 2, respectively. The (noiseless) trajectories corresponding to these choices are presented in Fig. 4(a, bottom). While the rank of the QFIM is indeed saturated at $\theta_2$ and $\theta_3$, for $\theta_1$ we have $\mathrm{rank}[F(\rho_{\theta_1})] = 1$ (this follows from $R_z(0)\rho R_z^\dagger(0) = \rho$). We have thus added this example to showcase the important role that the interplay between the initial state and the QNN parameters has in determining the rank of the QFIM.

Now, let us consider the case where bit-flip noise channels act before and after every unitary gate in the circuit (see Fig. 4(b, top) for a schematic portrayal of the setup). For convenience, we recall that the bit-flip channel is a special case of Pauli noise of the form

$$\mathcal{N}^{\mathrm{BF}}(\rho) = (1-p)\,\rho + p\,X\rho X,$$

such that the noisy QNN channel becomes

$$\widetilde{C}_\theta = \mathcal{N}^{\mathrm{BF}} \circ C^{4}_{\theta_4} \circ \mathcal{N}^{\mathrm{BF}} \circ C^{3}_{\theta_3} \circ \mathcal{N}^{\mathrm{BF}} \circ C^{2}_{\theta_2} \circ \mathcal{N}^{\mathrm{BF}} \circ C^{1}_{\theta_1} \circ \mathcal{N}^{\mathrm{BF}}. \tag{28}$$

A direct evaluation of the QFIM rank at the sets of parameter values previously considered reveals that

$$\mathrm{rank}[F(\widetilde{\rho}_{\theta_1})] = 1, \qquad \mathrm{rank}[F(\widetilde{\rho}_{\theta_2})] = 2, \qquad \mathrm{rank}[F(\widetilde{\rho}_{\theta_3})] = 3.$$

The trajectories defined by these rotations are presented in Fig. 4(b, bottom), for a value of $p = 0.1$. Here we can see that for $\theta_1$ and $\theta_2$ the rank of the QFIM is not increased. While it is obvious that for $\theta_1$ the noise does not change the output state of the QNN (as $\rho$ is a fixed point of the noise model), for $\theta_2$ the noise channels do change the output state of the QNN. Notably, we can see that all the noise channels are effectively applied at the end of the parametrized evolution in both cases. As we will show below, this implies that they cannot change the rank of the QFIM (see Theorem 1). Finally, for $\theta_3$ the noise does increase the rank of the QFIM from two to three. Here, the rank of the QFIM is maximal, which follows from the state evolving in the three-dimensional Bloch sphere.

Figure 4. a) The single-qubit state of Eq. (24) is sent through a noiseless QNN with four parameters as in Eq. (26). We plot in the Bloch sphere the three trajectories defined by $\theta_1$, $\theta_2$ and $\theta_3$. b) We consider the case where the single-qubit state of Eq. (24) is sent through a noisy QNN with four parameters, as in Eq. (28). Here, bit-flip noise channels act before and after every gate with probability $p = 0.1$. We plot in the Bloch sphere the three trajectories defined by $\theta_1$, $\theta_2$ and $\theta_3$. The action of the unitary gates is marked in blue, whereas the action of the noise channels is marked in red.

Figure 5. a) The single-qubit state of Eq. (24) is sent through a noiseless QNN as in Eq. (26), with parameters $\theta_3$. Here we show within the Bloch sphere how the state $\rho_\theta$ changes when the parameters are varied following the directions given by three eigenvectors of the QFIM $F(\rho_\theta)$. Two such directions are associated with the two non-zero eigenvalues (blue and red curves), and the third with a zero eigenvalue (green, non-visible curve). b) We consider that the single-qubit state of Eq. (24) is sent through a noisy QNN as in Eq. (28), with parameters $\theta_3$. Here we show within the Bloch sphere how the state $\widetilde{\rho}_\theta$ changes when the parameters are varied following the directions given by the three eigenvectors of the QFIM $F(\widetilde{\rho}_\theta)$ with associated non-zero eigenvalues (blue, red, and green curves).

Figure 6. Here, $\rho$ is a single-qubit input state to a noisy QNN. Then, let $\widetilde{\rho}_\theta$, $\widetilde{\rho}_{\theta+\delta_1}$ and $\widetilde{\rho}_{\theta+\delta_2}$ be the output states when the QNN parameters are $\theta$, $\theta+\delta_1$ and $\theta+\delta_2$, respectively. As schematically shown, the parameters $\theta+\delta_1$ ($\theta+\delta_2$) lead to the output state $\widetilde{\rho}_{\theta+\delta_1}$ ($\widetilde{\rho}_{\theta+\delta_2}$) having more (less) purity than $\widetilde{\rho}_\theta$, as the output state is farther from (closer to) the center of the Bloch sphere. Note that in all cases, the output states $\widetilde{\rho}_\theta$, $\widetilde{\rho}_{\theta+\delta_1}$ and $\widetilde{\rho}_{\theta+\delta_2}$ are less pure than the input state $\rho$ due to the presence of noise.
The fact that for $\theta_3$ the rank of the QFIM is increased indicates that the presence of noise enables a new direction in state space. As shown in Fig. 5(a), in the absence of noise (and hence when the rank of the QFIM is two), there are only two available directions in state space. These directions are depicted as blue and red lines corresponding to the trajectories followed by the state when the parameters are changed along the directions dictated by the eigenvectors of the QFIM with non-zero eigenvalues. Since the channel is unitary, these trajectories lie on the surface of a (fixed-purity) shell of the Bloch sphere. We have also verified that, as expected, the state remains unchanged when the parameters are varied along a direction corresponding to an eigenvector associated with a null eigenvalue of the QFIM (although this cannot be visualized in the plot because the initial and final states of the evolution are the same).
On the contrary, as shown in Fig. 5(b), when noise acts throughout the circuit (and hence when the rank of the QFIM is three), there are three available directions in state space. Here, the red, blue and green curves correspond to the trajectories that the state follows when changing the parameters along the directions given by the three eigenvectors of the QFIM with associated non-zero eigenvalue. Crucially, we can now see that there exists a direction (the blue curve) that preserves the purity of the quantum state. The other two directions, however, can both increase and decrease the purity of the output state. This is evidenced by the fact that the trajectories in the state space move inwards and outwards from the fixed-purity shell in the Bloch sphere.

Figure 7. Here we consider the case where the noisy QNN from Eq. (28) acts on the initial single-qubit state of Eq. (24). We plot the magnitude of the eigenvalues of the QFIM for the single-qubit toy model versus the probability of bit-flip error $p$. The top, middle and bottom panels respectively correspond to the parameter values $\theta_1$, $\theta_2$, $\theta_3$.
We find it important to remark that while some of the directions in state space can change the purity of the state $\widetilde{\rho}_\theta$, this does not imply that the QNN is purifying the state. As shown in Fig. 5(b), by perturbing the parameters $\theta$ along a direction $\delta$ one can move the final state inwards or outwards in the Bloch sphere. That is, we can decrease or increase the purity of $\widetilde{\rho}_{\theta+\delta}$ with respect to that of $\widetilde{\rho}_\theta$, see Fig. 6. However, this does not imply that $\widetilde{\rho}_{\theta+\delta}$ has less entropy than the initial state $\rho$. In other words, changing the variational parameters by $\delta$ implies preparing again the initial state $\rho$ and applying $\widetilde{C}_{\theta+\delta}$, not evolving from $\widetilde{\rho}_\theta$ to $\widetilde{\rho}_{\theta+\delta}$. Physically, we can interpret this as saying that the state evolving under the noisy QNN $\widetilde{C}_{\theta+\delta}$ is less sensitive to noise than that evolving under $\widetilde{C}_\theta$, and hence its purity gets less degraded by noise.
Next, let us evaluate how noise affects the magnitude of the QFIM eigenvalues. In Fig. 7 we plot the eigenvalues of the QFIM versus the probability of a bit-flip error $p$. In this case, it is manifest that for all the parameter values considered above, the magnitude of the non-zero eigenvalues decreases with $p$. Crucially, this holds not only for the eigenvalues of the QFIM that were non-zero in the noiseless setting, but also for those that the noise "turns on". This result indicates that the state's sensitivity to parameter changes decreases with increasing noise levels. As we will prove below (see Theorems 3 and 4), this is a general consequence of the presence of noise in a QNN.
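The rank increase in the toy model can be reproduced with a short Bloch-vector simulation. This is a sketch under stated assumptions rather than the exact setup of the figures: the gate ordering ($R_z, R_x, R_z, R_x$) and the parameter values ($\pi/4$ for every angle) are assumptions made here, since they are not fixed by the text above. The QFIM uses the standard single-qubit mixed-state expression $F = J^{T}\big(\mathbb{1} + r r^{T}/(1-|r|^2)\big) J$, where $r$ is the output Bloch vector and $J$ its Jacobian with respect to the parameters.

```python
import numpy as np


def rot_x(t):
    """Bloch-sphere rotation by angle t about the x axis."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])


def rot_z(t):
    """Bloch-sphere rotation by angle t about the z axis."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])


# Generators such that d/dt R(t) = GEN @ R(t).
GEN_X = np.array([[0, 0, 0], [0, 0, -1], [0, 1, 0]], dtype=float)
GEN_Z = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)


def bit_flip(p):
    """Bit-flip channel on the Bloch vector: x is preserved, y and z shrink by 1 - 2p."""
    return np.diag([1.0, 1 - 2 * p, 1 - 2 * p])


def output_and_jacobian(thetas, p, r0):
    """Output Bloch vector and its 3 x M Jacobian for the noisy circuit."""
    layers = [(rot_z, GEN_Z), (rot_x, GEN_X), (rot_z, GEN_Z), (rot_x, GEN_X)]  # assumed order
    N = bit_flip(p)
    r = N @ r0                                  # noise before the first gate
    cols = []
    for (rot, gen), t in zip(layers, thetas):
        R = rot(t)
        cols = [N @ R @ c for c in cols]        # propagate earlier derivatives
        cols.append(N @ gen @ R @ r)            # derivative with respect to this angle
        r = N @ R @ r                           # gate followed by noise
    return r, np.column_stack(cols)


def qubit_qfim(r, J):
    """Mixed-state single-qubit QFIM in Bloch form (requires |r| < 1)."""
    metric = np.eye(3) + np.outer(r, r) / (1.0 - r @ r)
    return J.T @ metric @ J


r0 = np.array([0.8, 0.0, 0.0])                  # rho = 0.9|+><+| + 0.1|-><-|
thetas = np.full(4, np.pi / 4)                  # assumed parameter values

r_clean, J_clean = output_and_jacobian(thetas, 0.0, r0)
r_noisy, J_noisy = output_and_jacobian(thetas, 0.1, r0)

rank_clean = np.linalg.matrix_rank(qubit_qfim(r_clean, J_clean), tol=1e-9)
rank_noisy = np.linalg.matrix_rank(qubit_qfim(r_noisy, J_noisy), tol=1e-9)
print("noiseless rank:", rank_clean, "noisy rank:", rank_noisy)
print("noisy QFIM eigenvalues:", np.linalg.eigvalsh(qubit_qfim(r_noisy, J_noisy)))
```

At this parameter point the noiseless QFIM has rank 2, because every derivative is tangent to the fixed-purity shell, while bit-flip noise with $p = 0.1$ raises the rank to 3. Sweeping `p` in the last lines is a simple way to inspect how the eigenvalues change with the noise level for these assumed parameters.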

B. Global depolarizing noise
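Here we study the overparametrization of general QNNs acting on an $n$-qubit state under a simple noise model: global depolarizing noise (see Eq. (17)). We henceforth assume that global depolarizing noise channels act before and after every unitary channel in the QNN with the same probability $p$. That is, we consider the case when

$$\mathcal{N}_m = \mathcal{N}^{\mathrm{Depol}} \quad \text{for all } m = 1, \ldots, M+1.$$

We remark here that since the noise channel $\mathcal{N}^{\mathrm{Depol}}$ acts before the first parametrized gate in the circuit, $C^{1}_{\theta_1}$, we avoid the change of rank in the quantum state (from a pure state to a full-rank state) that would occur otherwise. It is not hard to see that the action of the global depolarizing noise channels, $\mathcal{N}^{\mathrm{Depol}}(\rho) = (1-p)\rho + p\,\mathbb{1}/d$, can be commuted through to the end of the circuit, so that

$$\widetilde{C}_\theta = \big(\mathcal{N}^{\mathrm{Depol}}\big)^{M+1} \circ C_\theta.$$

Here, we have defined $\big(\mathcal{N}^{\mathrm{Depol}}\big)^{M+1}$ as the composition of $M+1$ copies of $\mathcal{N}^{\mathrm{Depol}}$. This shows that we can express the output state of the QNN as

$$\widetilde{\rho}_\theta = (1-p)^{M+1}\,\rho_\theta + \big(1 - (1-p)^{M+1}\big)\frac{\mathbb{1}}{d}.$$

More generally, we note that the previous result can be extended to the case when global depolarizing noise channels with different layer-dependent probabilities $p_m$ act in between the gates of the circuit. In particular, one now finds

$$\widetilde{\rho}_\theta = \prod_{m=1}^{M+1}(1-p_m)\,\rho_\theta + \Big(1 - \prod_{m=1}^{M+1}(1-p_m)\Big)\frac{\mathbb{1}}{d}.$$

Next, we study how the rank of the QFIM and the magnitude of its eigenvalues change due to the presence of global depolarizing noise. In this context, we find it convenient to first present a useful theorem.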
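The claim that global depolarizing channels can be commuted to the end of the circuit is easy to verify numerically. The sketch below assumes the standard form $\mathcal{N}^{\mathrm{Depol}}(\rho) = (1-p)\rho + p\,\mathbb{1}/d$ and compares a layer-by-layer noisy simulation against the closed-form mixture with weight $(1-p)^{M+1}$. The two-qubit generators, angles and noise probability are arbitrary illustrative choices.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def gate(H, theta):
    """Return exp(-i * theta * H) for a Hermitian H via its eigendecomposition."""
    evals, evecs = np.linalg.eigh(H)
    return (evecs * np.exp(-1j * theta * evals)) @ evecs.conj().T


def global_depolarizing(rho, p):
    """Assumed form of the channel: rho -> (1 - p) rho + p * identity / d."""
    d = rho.shape[0]
    return (1 - p) * rho + p * np.eye(d) / d


generators = [np.kron(Z, Z), np.kron(X, I2), np.kron(I2, X)]  # illustrative ansatz
thetas = [0.4, 0.9, 1.3]
p = 0.05
M = len(generators)

rho0 = np.zeros((4, 4), dtype=complex)
rho0[0, 0] = 1.0                                  # input state |00><00|

# Noiseless output state.
rho_clean = rho0
for H, t in zip(generators, thetas):
    U = gate(H, t)
    rho_clean = U @ rho_clean @ U.conj().T

# Noisy output: one channel before the first gate and one after every gate.
rho_noisy = global_depolarizing(rho0, p)
for H, t in zip(generators, thetas):
    U = gate(H, t)
    rho_noisy = global_depolarizing(U @ rho_noisy @ U.conj().T, p)

# Closed form obtained by pushing all M + 1 channels to the end.
q = (1 - p) ** (M + 1)
rho_closed_form = q * rho_clean + (1 - q) * np.eye(4) / 4
print("match:", np.allclose(rho_noisy, rho_closed_form))
```

The agreement relies only on the unitary channels mapping the identity to itself, which is why the same argument does not carry over to local noise acting on individual qubits.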
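Theorem 1. Consider the case when a single noise channel acts at the end of the QNN as $\widetilde{C}_\theta = \mathcal{N} \circ C_\theta$. The rank of the QFIM cannot be increased by the action of the noise. That is,

$$\mathrm{rank}[F(\widetilde{\rho}_\theta)] \leqslant \mathrm{rank}[F(\rho_\theta)],$$

where $\rho_\theta$ and $\widetilde{\rho}_\theta$ respectively denote the output states of the noiseless and noisy QNNs (see Eqs. (3) and (18)).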
We refer the reader to Appendix A for a proof of Theorem 1. The key implication of this theorem is that if the noise acts exclusively at the end of the circuit, then the rank of the QFIM cannot be increased. Hence, it follows that one can overparametrize the noisy QNN, $\widetilde{C}_\theta$, with the same number of parameters needed to overparametrize the noiseless one, $C_\theta$.
Using Theorem 1 we can readily prove the following result for the case of global depolarizing noise.
Theorem 2. When global depolarizing channels act before and after every unitary in the QNN, then the rank of the QFIM cannot be increased by the action of the noise.
The proof of this theorem can be found in Appendix B. Theorem 2 shows that the presence of global depolarizing channels cannot increase the number of available directions in state space, as the rank of the QFIM is non-increasing. Again, this means that one can overparametrize the noisy QNN with the same number of parameters needed to overparametrize the noiseless QNN, when global depolarizing noise acts throughout the circuit.
In addition, we can also analyze how the eigenvalues of the QFIM change due to the presence of the global depolarizing channels. Here, the following theorem holds.

Theorem 3. When global depolarizing channels act before and after every unitary in the QNN, the entries of the QFIM (and therefore its eigenvalues) are upper bounded by a quantity that decays exponentially with the product of the number of gates $M$ and the probability of depolarization $p$.
This theorem is proven in Appendix C. Theorem 3 indicates that while global depolarizing noise cannot increase the number of available directions in state space, it does suppress the sensitivity of the state to any variations in the parameters. This result can be used to further understand the so-called noise-induced barren plateau phenomenon [40,41], whereby noise erases all the features in the QML model's training landscape. For instance, let us note that given a linear loss function (i.e., $f_s(x) = x$ in Eq. (4)), one has

$$\partial_i \widetilde{\mathcal{L}}(\theta) = (1-p)^{M+1}\,\partial_i \mathcal{L}(\theta), \tag{36}$$

where $\widetilde{\mathcal{L}}$ and $\mathcal{L}$ denote the noisy and noiseless losses, since the noisy output state is a mixture of $\rho_\theta$, with weight $(1-p)^{M+1}$, and the parameter-independent maximally mixed state. Equation (36) shows that the optimization landscape becomes exponentially flat with $M$ and $p$ (hence a noise-induced barren plateau). As such, our results show that noise-induced insensitivities arise already at the level of state space, thus providing a more fundamental understanding of the noise-induced barren plateau phenomenon.
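One way to see why a suppression of this kind must occur uses only the convexity of the QFIM recalled in Sec. II D. The short derivation below is a sketch under the assumption that the global depolarizing channel acts as $\mathcal{N}^{\mathrm{Depol}}(\rho) = (1-p)\rho + p\,\mathbb{1}/d$; it is not necessarily the route followed in Appendix C, and the resulting bound may be looser than the one stated in Theorem 3. The shorthand $q$ is introduced here for convenience.

```latex
\begin{aligned}
\widetilde{\rho}_\theta &= q\,\rho_\theta + (1-q)\,\frac{\mathbb{1}}{d},
  \qquad q = (1-p)^{M+1}, \\
F(\widetilde{\rho}_\theta) &\leqslant q\,F(\rho_\theta) + (1-q)\,F\!\left(\frac{\mathbb{1}}{d}\right)
  = q\,F(\rho_\theta), \\
q &= (1-p)^{M+1} \leqslant e^{-p(M+1)} .
\end{aligned}
```

The second line uses convexity together with the fact that the maximally mixed state does not depend on $\theta$, so its QFIM vanishes. Since the inequality holds in the matrix sense, every eigenvalue of the noisy QFIM is bounded by $e^{-p(M+1)}$ times the largest eigenvalue of the noiseless one.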

C. Local depolarizing plus unital Pauli channels
In this section we show that some of the intuition gathered from the previous sections can be extended to more general Pauli noise models. Namely, we here consider how noise affects the QFIM for a QNN acting on $n$ qubits when a fairly general Pauli noise acts. As we will show, the entries of the QFIM, and concomitantly its eigenvalues, get exponentially suppressed with the product of the number of noise channels and the noise probability. We note that here we will not attempt to prove that, on average, noise increases the rank of the QFIM. This is due to the fact that there can exist special types of noise and parameter values for which the rank is not increased (see Sections III A and III B). Because of these subtleties, we will leave a more detailed rank analysis for future work.
In what follows we will consider a general noise model where noise channels are interleaved with the unitary channels of the QNN as for some (potentially layer-dependent) noise channels N_m, m = 1, ..., M + 1. Again, we note that N_1 acts before C^1_{θ_1}, which avoids the state changing from pure to mixed after the first parametrized gate. Moreover, as shown in Fig. 8, we will assume that each noise channel is composed of local depolarizing noise channels acting on each qubit plus some general unital Pauli noise. That is, where N^P_m(ρ) is an arbitrary unital Pauli quantum channel and N^{Dep}_{loc} is a product of local depolarizing channels as given by Eq. (16). For simplicity, we will assume that all local depolarizing noise channels have the same probability p. In Appendix F we show how our results can be generalized to the case where they have different (qubit- and layer-dependent) probabilities. Moreover, we note that the order in which N^{Dep}_{loc}(ρ) and N^P_m(ρ) act in Eq. (38) will be irrelevant for our purposes, as our results can also be shown to hold when the order is reversed (see Appendix F).
For this noise model, we prove the following theorem.
Theorem 4. Let C_θ be a noisy channel as in Eqs. (37) and (38), where a Pauli noise channel (composed of a local depolarizing noise acting on each qubit plus a general unital Pauli channel) acts before and after each gate of the QNN. Furthermore, let p be the probability of the local depolarizing channels as in Eq. (15). The entries of the QFIM, and thus its eigenvalues, are exponentially suppressed with the product of M and p as O(e^{−2p(M+1)}).
See Appendix D for a proof of Theorem 4. This theorem states that under very general noise models, the entries of the QFIM and its eigenvalues vanish exponentially with the noise probability and the number of gates. Crucially, we will have that irrespective of whether the rank of the QFIM is increased or not by the noise, if the circuit is too deep (large M), or if the noise levels are too high (large p), the state becomes insensitive to parameter changes. Similarly to Theorem 3, this result sheds new light on the noise-induced barren plateau phenomenon [40,41].

IV. NUMERICAL RESULTS
In this section we present numerical results that extend and complement our theoretical findings. All the simulations presented here have been performed in double precision with the open-source library qibo [65], using the fast qibojit backend [66]. The simulations have been carried out on CPUs, namely Intel Core i7-9750H and AMD Ryzen Threadripper PRO 3955WX cores.
In particular, we consider the problem where the QNN is given by a Hamiltonian Variational Ansatz (HVA) [22,67] with generators inspired by the transverse-field Ising model with periodic boundary conditions. That is, we have G = {H_0, H_1}, with and Here, the action of the noiseless QNN U(θ) is given by where L is the number of layers. Thus, the QNN has M = 2L parameters. We have fixed the initial state of the QNN to be the state |+⟩^{⊗n}. As shown in [21], the DLA associated with this ansatz has dimension dim(g) = 3n/2, meaning that the QNN can be overparametrized with only a polynomial (linear) number of parameters (or layers). In what follows, we will study how the presence of noise affects the QFIM. In all cases, the computations have been carried out at random points in parameter space.
First, in order to validate Theorem 2 we have simulated the action of global depolarizing channels acting before and after each gate. The results are depicted in Fig. 9, where the eigenspectrum of the QFIM is plotted for n = 10 qubits, M = 40 parameters, and different noise probabilities. Here we see that the rank of the QFIM is unaffected by global depolarizing noise as indicated by Theorem 2. These numerical results also allow us to verify the exponential decrease of the QFIM eigenvalues with the probability of the depolarizing noise, predicted by Theorem 3 (see inset in Fig. 9).
Second, we have simulated the case where local depolarizing channels act on each qubit before and after every gate in the circuit. The results are shown in Fig. 10, where we plot the eigenspectrum of the QFIM for n = 10 qubits, M = 40 parameters, and different noise probabilities. In contrast to global depolarizing channels, local noise does increase the rank of the QFIM. This can be observed in the plot from the fact that as soon as p is larger than zero, all the eigenvalues of the QFIM become non-null (as opposed to the noiseless case). As already discussed, this implies that noise enables new directions in state space. Moreover, here we can see that according to Definition 2, noise can turn an overparametrized QNN (with saturated rank) into an underparametrized one (where the rank of the QFIM is equal to the number of parameters). Notably, Fig. 10 shows that there exists a certain robustness to noise in the overparametrization phenomenon. This is evidenced by the fact that when the probability of noise acting is small (e.g., p ∼ 10^{−5}), there still exists a gap of about two orders of magnitude between the dominant eigenvalues and the "newly-appeared" ones. Hence, for small noise levels the system can be considered to be in a quasi-overparametrized regime, where large eigenvalues of the QFIM (the ones that were previously non-zero) coexist with small eigenvalues (the ones that were previously zero).
This separation in eigenvalue magnitude disappears when noise levels increase. As shown in Fig. 10, for large enough noise probability (e.g., p = 0.08) all the eigenvalues are exponentially vanishing. Moreover, in Fig. 11 we show the scaling of the QFIM entries and eigenvalues with the number of gates and noise probability. As in the case of global depolarizing noise, these decrease exponentially (with some statistical fluctuations observed). Taken together, the results in Figs. 10 and 11 numerically confirm the result established in Theorem 4.
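The rank increase and the gap between dominant and newly appeared eigenvalues discussed above can be illustrated in miniature. The sketch below is a hypothetical single-qubit example, not the n = 10 HVA simulation reported here. It assumes alternating R_y and R_z rotations acting on |0⟩, a bit-flip channel ρ → (1 − p)ρ + p XρX before and after every gate, and the Bloch-vector expression of the single-qubit QFIM, F_ab = ∂_a r · ∂_b r + (r · ∂_a r)(r · ∂_b r)/(1 − |r|²), with the second term dropped for pure states. All function names and parameter values are illustrative.

```python
import numpy as np


def rot_y(angle):
    """Bloch-sphere rotation about the y axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def rot_z(angle):
    """Bloch-sphere rotation about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def bloch_vector(theta, p):
    """Output Bloch vector; bit-flip noise shrinks the y and z components."""
    noise = np.diag([1.0, 1.0 - 2.0 * p, 1.0 - 2.0 * p])
    r = noise @ np.array([0.0, 0.0, 1.0])  # |0>, noise before the first gate
    for k, angle in enumerate(theta):
        rotation = rot_y(angle) if k % 2 == 0 else rot_z(angle)
        r = noise @ rotation @ r           # gate followed by noise
    return r


def qfim(theta, p, step=1e-6, tol=1e-9):
    theta = np.asarray(theta, dtype=float)
    r = bloch_vector(theta, p)
    jac = np.empty((3, theta.size))
    for a in range(theta.size):
        shift = np.zeros_like(theta)
        shift[a] = step
        jac[:, a] = (bloch_vector(theta + shift, p)
                     - bloch_vector(theta - shift, p)) / (2 * step)
    fisher = jac.T @ jac
    mixedness = 1.0 - r @ r
    if mixedness > tol:  # mixed state: changes in purity are also detectable
        radial = jac.T @ r
        fisher += np.outer(radial, radial) / mixedness
    return fisher


def numerical_rank(matrix, rel_tol=1e-8):
    eigs = np.linalg.eigvalsh(matrix)
    return int(np.sum(eigs > rel_tol * eigs.max()))


theta = np.array([0.3, 1.1, -0.7, 0.5])  # M = 4 parameters, arbitrary values
for p in (0.0, 1e-3, 0.1):
    eigs = np.sort(np.linalg.eigvalsh(qfim(theta, p)))[::-1]
    print(f"p = {p:g}: QFIM eigenvalues = {eigs}")
```

In this toy setting the noiseless state can only move on the surface of the Bloch sphere, so at most two eigenvalues are non-zero. With noise, the state can also move radially, which turns on a third eigenvalue that remains much smaller than the dominant ones when p is small.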

V. IMPLICATIONS TO CAPACITY MEASURES
Let us briefly discuss the implications of the previous results for the capacity measures of Refs. [27,28]. In particular, we have seen that both these measures are related to the maximum rank of the quantum or classical Fisher information matrix. For simplicity, we will consider first a noiseless QNN that has enough parameters M to be well beyond the overparametrization threshold. In this case, the rank of the QFIM, and concomitantly its capacities D_1(θ) = rank[F(ρ_θ)] and D_2, are saturated, and are such that D_1(θ), D_2 < M (see [21]).
The results presented in this work indicate that if hardware noise is present, then the rank of the QFIM can increase (e.g., the QFIM can become full rank). Using D_1(θ) as capacity measure [28] would imply that the QNN's capacity is increased by noise. Moreover, a similar conclusion can be drawn for the capacity measure of [27] as follows. Indeed, when the noise renders the QFIM full rank, then it becomes invertible and the following inequality holds [68] This implies that the classical Fisher information is also invertible, and thus full rank. Since rank[F(ρ_θ)] ⩾ rank[I(ρ_θ)] [69], this means that when the rank of the QFIM increases and becomes full rank, so does the rank of the classical Fisher information matrix. Hence, using Eq. (10) we see that noise can also increase the capacity of the QNN when D_2 is used as capacity measure.
We remark that it is true that new directions are enabled in state space by the action of noise, and that these can be partially controlled (see Fig. 5). However, it is also worth recalling that the sensitivity of the noisy state to parameter updates along these directions decreases exponentially with the noise magnitude. In fact, one can expect that in the regime where the noise is sufficiently large, the QFIM can be full rank (i.e., rank[F(ρ_θ)] = M) but the magnitude of its eigenvalues exponentially small (see Theorem 4). In this scenario, the QNN has a seemingly increased capacity due to noise, but the state is rendered insensitive to parameter changes.
The critical issue here is that the rank of the QFIM is a discrete number that depends on the number of strictly non-zero eigenvalues of the QFIM, but not on their magnitude. Such an issue could potentially be alleviated by considering capacity measures that depend on the magnitude of the eigenvalues. For instance, one could modify the measure of [28] (see Eq. (9)) as where λ_m(θ) are the eigenvalues of the QFIM for the state ρ_θ, Z^{(ϵ)}(x) = 0 for x ⩽ ϵ, and Z^{(ϵ)}(x) = 1 for x > ϵ. As such, one would only account for the eigenvalues of the QFIM that are larger than a given tuneable constant ϵ.
We leave however the study of such a measure for future work, as more research is needed to understand the interplay between the capacity of QNNs and quantum noise.
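As a minimal sketch of the difference between a rank-based count and the thresholded count described above, the following Python snippet applies the step function Z^{(ϵ)} to a list of eigenvalues. The function names and the three spectra are made up for illustration; they are not data from this work, and the choice ϵ = 10^{−3} is arbitrary.

```python
import numpy as np


def rank_capacity(eigenvalues):
    """Count the strictly non-zero (positive) eigenvalues, as a rank would."""
    return int(np.count_nonzero(np.asarray(eigenvalues) > 0.0))


def thresholded_capacity(eigenvalues, eps):
    """Count eigenvalues with Z^(eps)(x) = 1, i.e. strictly larger than eps."""
    return int(np.count_nonzero(np.asarray(eigenvalues) > eps))


# Made-up spectra for illustration only (M = 4 parameters).
spectra = {
    "noiseless": [2.0, 0.7, 0.0, 0.0],
    "weak noise": [1.9, 0.65, 3e-6, 1e-7],
    "strong noise": [4e-5, 2e-5, 9e-6, 5e-6],
}

eps = 1e-3  # arbitrary threshold chosen for this example
for name, spectrum in spectra.items():
    print(name, rank_capacity(spectrum), thresholded_capacity(spectrum, eps))
```

With these illustrative numbers the rank-based count jumps to M as soon as noise is present, whereas the thresholded count stays at its noiseless value for weak noise and drops to zero once all eigenvalues are suppressed below ϵ.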

VI. CONCLUSIONS
Theoretically understanding the performance of QNNs is a fundamental step to guaranteeing their success in practical realistic scenarios. While there have been tremendous efforts in studying noiseless QNNs, little is known about their performance when hardware noise acts throughout the computation. However, since noise is a defining property of near-term quantum devices, more research is needed to bridge the gap between our understanding of noiseless and noisy QNNs.
In this work we focus on the overparametrization of noisy QNNs. To analyze this phenomenon, we first present a toy model example which showcases how noise acting throughout a quantum circuit can indeed increase the rank of the QFIM. Crucially, this means that noise can transform an overparametrized QNN into an underparametrized one. Moreover, this toy model also illustrates a second general effect that noise exerts on QNNs, namely it produces a decrease in the magnitude of the QFIM eigenvalues and thus in the sensitivity of the quantum state to parameter updates.
We then derive analytical results proving that a noise channel at the end of a quantum circuit cannot increase the rank of the QFIM. This implies that certain noise models, like global depolarizing noise interleaved with unitary gates, or measurement noise, leave the rank of the QFIM unaffected. In turn, this means that the noisy QNN can be overparametrized with the same number of parameters as the noiseless QNN. However, we also prove that global depolarizing channels suppress the QFIM entries and its eigenvalues exponentially with the product of the number of gates and the probability of depolarization. This renders the output of the QNN insensitive to changes in the variational parameters.
Furthermore, we prove that for fairly general Pauli noise models (consisting of local depolarizing channels and unital Pauli noise), the eigenvalues and entries of the QFIM get exponentially suppressed with the circuit depth and the noise probability. Our results point to a combined effect arising from noise, whereby the rank of the QFIM can be increased, but at the same time the magnitude of all the eigenvalues of the QFIM (both the pre-existing ones and the ones that the noise "turned on") gets suppressed. Therefore, although noise enables new directions, it also makes the noisy state insensitive to changes in the parameter values.
With the help of numerical simulations we are able to identify three regimes for the overparametrization phenomenon in the presence of noise. The first corresponds to small noise levels. Here, the magnitude of the new non-zero eigenvalues is very small compared to that of pre-existing ones, whose magnitude remains largely unchanged, indicating a certain robustness to noise. In this "quasi-overparametrized" regime, the state is mostly insensitive when the parameters are moved along the directions associated with the newly appeared non-zero eigenvalues. We leave for future work the study of the performance and potential for quantum advantage of parametrized quantum circuits in this quasi-overparametrized regime. Indeed, there is evidence that specific types of noise can be useful to improve the trainability of noisy quantum circuits [70]. Then, there exists an intermediate regime where the magnitude of the new non-zero eigenvalues is comparable to that of the previous non-zero ones, but smaller than in the noiseless scenario. That is, in this regime there is no gap between large and small eigenvalues, but rather the eigenvalues (sorted from larger to smaller) lie on a continuous line. Finally, in the third regime, all the eigenvalues vanish and the state becomes (almost) completely insensitive to changes in the parameter values. Moreover, we find that some of the new directions are purity altering, meaning that the QNN can map the state to regions where it is more, or less, sensitive to the effects of noise.
We then study the implications of our results to current QNN capacity measures proposed in the literature [27,28]. We find that measures based on the QFIM rank can be misleading when noise is taken into account. In the presence of noise, the QFIM can be transformed from singular to full rank, indicating (according to rank-based measures) that noise can increase a QNN's capacity. However, the eigenvalues of the QFIM are exponentially suppressed, meaning that the state does not significantly change with parameter updates. This dissonance arises from the fact that the eigenvalue magnitude is not accounted for in rank-based measures, only the number of strictly non-zero eigenvalues. These capacity measures should then be modified accordingly, which is left for future work.
To conclude, we discuss the impact of our results beyond the overparametrization phenomenon. For instance, our results can be understood as shedding new light on the noise-induced barren plateau phenomenon whereby the optimization landscape of QML models gets exponentially flat with the noise (or the depth) of the circuit [40]. Namely, the flatness in the landscape arises from the state being insensitive to changes in the parameters, as evidenced by the exponentially suppressed eigenvalues of the QFIM. In addition, our results have critical implications for noisy-state quantum sensing [71][72][73][74][75]. Since the ultimate precision achievable for sensing external parameters depends on the quantum Fisher information through the quantum Cramér-Rao bound [76,77], our results demonstrate how the utility of a noisy state as a sensor gets degraded by the presence of noise. Finally, an open question that has recently received a lot of attention [78][79][80] is whether QNNs with poly-sized DLAs are classically simulable. Although this is a very relevant question to the present study (poly-DLA circuits are the only ones admitting efficient overparametrization), we want to stress that their efficient classical simulation is not always guaranteed without access to samples from a quantum computer [80]. Moreover, even if shown to be fully classically simulable, overparametrized QNNs would still find application in problems where one does not seek a computational advantage, e.g., in quantum sensing [72,81].

Using the right-hand side of Eq. (B1) we know that the action of the depolarizing noise can be pulled to the end of the circuit. Then, the proof follows from Theorem 1.

Figure 2. QFIM and directions in state space. Let U(θ) ∈ e^g be a parametrized unitary and |ψ⟩ some pure state, so that |ψ(θ)⟩ = U(θ)|ψ⟩. Here we schematically show that the eigenvalues and eigenvectors of the QFIM, F(|ψ(θ)⟩), inform how changes in parameter space translate into changes in state space. In particular, when modifying θ following the eigenvectors of the QFIM, the state |ψ(θ)⟩ explores the corresponding available direction in the tangent space g|ψ(θ)⟩. Additionally, the magnitude of the QFIM eigenvalues determines the sensitivity of the state to a change along an eigenvector direction [55]. As such, a large eigenvalue means that it is "easy" to nudge the state in state space, while small eigenvalues indicate that the state is insensitive to parameter changes in the direction of the associated eigenvector.

Figure 3. Noiseless and noisy quantum circuits. a) Noiseless quantum circuit consisting of parametrized unitary channels C^m_{θ_m}. b) Noisy quantum circuit where unital Pauli noise channels N_m are interleaved with the unitary channels.

Figure 4. Single-qubit toy model examples. a) We consider the case where the single-qubit state of Eq. (24) is sent through a noiseless QNN with four parameters as in Eq. (26). We plot in the Bloch sphere the three trajectories defined by θ1, θ2 and θ3. b) We consider the case where the single-qubit state of Eq. (24) is sent through a noisy QNN with four parameters, as in Eq. (28). Here, bit-flip noise channels act before and after every gate with probability p = 0.1. We plot in the Bloch sphere the three trajectories defined by θ1, θ2 and θ3. The action of the unitary gates is marked in blue, whereas the action of the noise channels is marked in red.

Figure 5. State space trajectories following perturbations along QFIM eigendirections. a) We consider that the single-qubit state of Eq. (24) is sent through a noiseless QNN as in Eq. (26), with parameters θ3. Here we show within the Bloch sphere how the state ρ_θ changes when the parameters are varied following the directions given by three eigenvectors of the QFIM F(ρ_θ). Two such directions are associated with the two non-zero eigenvalues (blue and red curves) and one with a zero eigenvalue (green, non-visible, curve). b) We consider that the single-qubit state of Eq. (24) is sent through a noisy QNN as in Eq. (28), with parameters θ3. Here we show within the Bloch sphere how the state ρ_θ changes when the parameters are varied following the directions given by the three eigenvectors of the QFIM F(ρ_θ) with associated non-zero eigenvalues (blue, red, and green curves).

Figure 6. Purity and new directions in state space. Here, ρ is a single-qubit input state to a noisy QNN. Then, let ρ_θ, ρ_{θ+δ1} and ρ_{θ+δ2} be the output states when the QNN parameters are θ, θ + δ1 and θ + δ2, respectively. As schematically shown, the parameters θ + δ1 (θ + δ2) lead to the output state ρ_{θ+δ1} (ρ_{θ+δ2}) having more (less) purity than ρ_θ, as the output state is farther from (closer to) the center of the Bloch sphere. Note that in all cases, the output states ρ_θ, ρ_{θ+δ1} and ρ_{θ+δ2} are less pure than the input state ρ due to the presence of noise.

Figure 7. Eigenvalues of the QFIM versus noise levels. Here we consider the case where the noisy QNN from Eq. (28) acts on the initial single-qubit state of Eq. (24). We plot the magnitude of the eigenvalues of the QFIM for the single-qubit toy model versus the probability of bit-flip error p. The top, middle and bottom panels respectively correspond to the parameter values θ1, θ2, θ3.

Figure 8. Schematic representation of a QNN under the general noise model considered. Our results are derived for a general noise model where unitary gates are interleaved by noise channels composed of local depolarizing noise channels acting on each qubit plus some general unital Pauli noise.

Figure 9. Eigenvalues of the QFIM under global depolarizing noise. Here we consider a problem where an n = 10 qubit state is sent through an HVA quantum circuit as in Eq. (40), with L = 20 (i.e., with M = 40 parameters). In the simulations, global depolarizing noise acts on all qubits before and after each gate. We show the magnitude of the m-th eigenvalue of the QFIM for different noise values p, at a random point in the landscape. The inset shows the scaling of two non-null eigenvalues with p.

Figure 10. Eigenvalues of the QFIM under local depolarizing noise. Here we consider a problem where an n = 10 qubit state is sent through an HVA quantum circuit as in Eq. (40), with L = 20 (i.e., with M = 40 parameters). In the simulations, local depolarizing noise channels act with the same probability p on all qubits before and after each gate. We show the magnitude of the m-th eigenvalue of the QFIM for different noise values p, at a random point in the landscape.

Figure 11. Average magnitude of the QFIM entries and eigenvalues in the presence of local depolarizing noise. Here we consider a problem where an n = 10 qubit state is sent through an HVA quantum circuit as in Eq. (40), with fixed parameter values for each (L, p). In the simulations, local depolarizing noise channels act with the same probability p on all qubits before and after each gate. We show the average magnitude of the entries of the QFIM and its eigenvalues for different a) number of layers L and b) noise values p. Bars depict the standard deviation across the different entries (or eigenvalues) of the QFIM.

ACKNOWLEDGMENTS
We acknowledge the Referees of [21] for pointing us in this fruitful research direction. We would also like to thank Max Hunter Gordon, Zoë Holmes, Eddie Schoute and Frédéric Sauvage for useful conversations. D.G-M. was supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, under the Computational Partnerships program. M.L. acknowledges support by the Center for Nonlinear Studies at Los Alamos National Laboratory (LANL) and by the U.S. Department of Energy (DOE), Office of Science, Office of Advanced Scientific Computing Research, under the Accelerated Research in Quantum Computing (ARQC) program. M.C. acknowledges the Quantum Science Center