Quantum algorithms for approximate function loading

Loading classical data into quantum computers represents an essential stage in many relevant quantum algorithms, especially in the field of quantum machine learning. Therefore, the inefficiency of this loading process means a major bottleneck for the application of these algorithms. Here, we introduce two approximate quantum-state preparation methods for the NISQ era inspired by the Grover-Rudolph algorithm, which partially solve the problem of loading real functions. Indeed, by allowing for an infidelity $\epsilon$ and under certain smoothness conditions, we prove that the complexity of the implementation of the Grover-Rudolph algorithm without ancillary qubits, first introduced by M\"ott\"onen $\textit{et al}$, results into $\mathcal{O}(2^{k_0(\epsilon)})$, with $n$ the number of qubits and $k_0(\epsilon)$ asymptotically independent of $n$. This leads to a dramatic reduction in the number of required two-qubit gates. Aroused by this result, we also propose a variational algorithm capable of loading functions beyond the aforementioned smoothness conditions. Our variational Ansatz is explicitly tailored to the landscape of the function, leading to a quasi-optimized number of hyperparameters. This allows us to achieve high fidelity in the loaded state with high speed convergence for the studied examples.


I. INTRODUCTION
Quantum computing has triggered a great interest in the last decades due to its theoretical capability to outperform classical information processing.Even though noise and decoherence are major drawbacks for the computational capacity of current quantum computers, quantum advantage has been experimentally achieved [1][2][3].Unfortunately, these accomplishments lack any industrial or scientific relevance, so the search of a useful application still remains.In this sense, the realistic experimental implementation of many promising quantum algorithms in several fields like solving systems of linear equations [4,5], performing data fitting [6], computing scattering cross sections [7,8], pricing financial derivatives [9][10][11] or initial conditions in differential equations [12][13][14], is constrained by the assumption that data can be efficiently loaded into a quantum device.In this context, the efficient loading of classical data into quantum computers is a particularly important problem, and represents a major bottleneck of the practical application of quantum computation in the NISQ-Era, especially with the emergence of the quantum machine learning field [15][16][17][18][19][20].
There exist different quantum embedding techniques transforming classical data into quantum information [18,21].In particular, we can distinguish two main embedding protocols depending on how the information is encoded.On the one hand, the basis embedding, in which each bit value "0" or "1" is mapped to a computational qubit state |0⟩ or |1⟩, respectively [22].In this way, the embedded quantum state corresponds to a uniform superposition of the bit-wise translations of binary strings.On the other hand, the amplitudeembedding technique encodes the normalized vector of clas-sical data, which is now not necessarily binary, into the amplitudes of a quantum state [23-25, 27-35, 55].In particular, these feature maps have been proposed to load discretized real valued functions [24,25,27,55] with relevant applications in loading initial conditions for solving partial derivatives equations [11][12][13][14], computing Monte-Carlo integrations [9,10,36] and quantum field theory [37,38].However, a practical implementation of these approaches generally incurs into an overhead of resources, which can be reflect into either an exponential number of entangling gates [27-29, 39, 55, 56], or the employment of a huge number of ancillary qubits [31][32][33].A rather different approach sustained by the Solovay-Kitaev theorem [41,42] is based on the application of quantum generative models to efficiently accomplish an approximate amplitude encoding of discretized real valued functions [43,44].Nonetheless, in these generic ansätze, increasing the number of hyperparameters does not necessarily reflect in improving the expressability of the Ansatz to capture the function details [45].Additionally, these variational methods usually suffer from training problems such as local minima and barren plateaus [46].
In this article, we present two approximate quantum algorithms to load real functions into quantum computers for the NISQ era.Our first protocol, inspired by the Grover-Rudolph algorithm and its implementation proposed by Möttönen et al [27,55], implements the algorithm without ancillary qubits with complexity O(2 k 0 (ϵ) ) for functions whose second logarithm derivative is upper bounded by a constant, with n the number of qubits, ϵ the infidelity respect to the exact state, and k 0 (ϵ) asymptotically independent of n.This leads to a dramatic reduction in the number of required two-qubit gates.Inspired by this result, we also introduce and benchmark a variational quantum circuit with applications in a broader family of functions.Our proposed Ansatz is adapted to the structure of the function, which intuitively correlates hyperparameters and expressibility.Moreover, by taking the angles provided by the Grover-Rudolph protocol, we can define a suitable ini-tial training angle set, which considerably improves the training process, avoiding barren plateaus and local minima.Finally we have numerically proven resilience of our algorithm against several noise sources.

II. THE GROVER-RUDOLPH ALGORITHM
The method of Grover and Rudolph, originally proposed in Ref. [55], describes a constructive protocol to load into an n-qubit quantum state the discretized version { f i } of certain integrable density function f : [x min , x max ] ⊂IR→IR + as (1)

A. Grover and Rudolph Algorithm without ancillas
The first proposal in the literature that provided an explicit circuit implementation of the Grover-Rudolph idea without using ancillary qubits was proposed by M. Möttönen et al [27].Without using ancillas, this protocol provides a constructive algorithm which applies a sequence of operation blocks, F (k−1) k (y, θ (k−1) ), to the initial state |0⟩ ⊗n .For k ∈ {1, . . ., n}, each k-qubit block, ), corresponds to a uniformly-controlled y-axis rotation, where the l-th component of the angle vector θ (k−1) is calculated as Here, δ k = x max −x min 2 k−1 and l ∈ {0, . . ., 2 k−1 − 1} is the index corresponding to the (l + 1)-th subinterval of the 2 k−1 partition of the interval [x min , x max ].Each multi-controlled gate comprising

l
) and bisects the (l+1)-th partition interval of the function by using the conditional probability of being in right or left side of the interval, as depicted in Fig. 1.
In the complete algorithm, the total number of angles needed scales as n−1 m=1 2 m−1 = 2 n − 1 , which is exponential in the number of qubits and requires an exponential number of multi-controlled C-NOT gates.Therefore, without using ancillary qubits, the required number of two qubit gates scales exponentially as the number of angles required do, and therefore the circuit complexity of implementing Grover-Rudolph algorithm without ancillas is O(2 n ).
We can conclude that this protocol without ancillas is theoretically capable of loading a discretized density function at the cost of an exponential overhead of resources [39,56] to prepare the state in Eq. (1) using blocks F (k−1) k (y, θ (k−1) ).
Effect of the uniformly-controlled y-axis rotation ) for k = 3 and n qubits.Each multi-controlled gate comprising F (k−1) k (y, θ (k−1) ) bisects the (l + 1)-th partition interval of the function by using the conditional probability of being in right or left side of the interval.
The Grover-Rudolph Algorithm.-Themethod of Grover and Rudolph, originally proposed in Ref. [26], describes a constructive protocol to load into an n-qubit quantum state the discretized version { f i } of certain integrable density function f : [x min , x max ] ⊂IR→IR + [37] as The protocol provides a constructive algorithm which applies a sequence of operation blocks, F (k−1) k (y, θ (k−1) ), to the initial state |0 ⊗n .For k ∈ {1, . . ., n}, each k-qubit block,

l
), corresponds to a uniformly-controlled y-axis rotation (see Supplemental Material [47]), where the l-th component of the angle vector θ (k−1) is calculated as Here,

l
) and bisects the (l+1)-th partition interval of the function by using the conditional probability of being in right or left side of the interval, as depicted in Fig. 1.
Moreover, Möttönen et al. [27] presented a generalization of this algorithm showing how to prepare an arbitrary n-qubit quantum state.This protocol requires 2 n+2 − 4n − 4 CNOT gates.Additionally, in [39,40], the same authors proved that each block F (k−1) k (y, θ (k−1) ) can be implemented with 2 k−1 CNOTs and 2 k−1 single-qubit gates.Therefore, we can conclude that this protocol and its generalizations are theoretically capable to load a discretized density function at the cost of an expo Eq. ( 1).
First Al rithm [26] cretized de an error in conditions quired ent Let us fi sity functi where δ k = The proof terial [47].

B. Grover and Rudolph Algorithm with ancillas
According to the original Grover-Rudolph algorithm [55], for each step k, we can efficiently prepare l |l⟩ by first loading the rotation angles, Eq. 2, with bit precision m into a bit string of m ancillary registers and then performing k controlled rotations of angle 2π/2 j , j = 1, ... m controlled by the ancillary registers.Indeed, the encoding of the rotation angles can be efficiently achieved if we have access to the oracle: An example case when this operation can be efficiently performed is when the function θ (k−1) l (l) can be well approximated by a polynomial and then, implemented by employing a polynomial amount of classical half adder operations, i.e.NAND gates.Lastly, the NAND gates are efficiently mapped into Toffoli quantum gates by making use of at most 3 qubits per classical bit [47].As this map is input-dependent, it enables us to compute it simultaneously for all θ (k−1) l (l) when the input state is a quantum superposition.However, this oracle is not explicitly provided in the original manuscript and the efficient circuit to which the authors refer is not presented, remaining as an oracle.Additionally, the original manuscript of Grover and Rudolph does not provide the analytical bounds of these approximations (m bit precision, oracle implementation), as well as implicitly makes use of additional ancillary qubits which incurs into an important cost for the NISQ era.Last but not least, some recent works have risen criticism about the feasibility of this original proposal [14,48,49].Moreover, to our best knowledge, there is no explicit efficient implementation of this oracle in terms of gates without employing ancillary qubits.ϵ ∈ [0.0001, 0.01] and n → ∞, following Eq.( 10), where the dotted lines correspond to the contour lines and denote the change of values.We can appreciate that k 0 ≤ 10 for most of the cases.

III. FIRST ALGORITHM
Inspired by the Grover-Rudolph algorithm [55], we present an efficient method to encode discretized density functions into quantum states.By permitting an error in the final state and assuming certain smoothness conditions, an angle clustering significantly reduces the required entangling gates.
According to this definition, we encode the discretized function into the amplitude of a quantum state, in contrast to the Grover-Rudolph algorithm, which does it in the probability, Eq. (1).We also consider our function defined in the interval [0, 1] as a standardization criterion.
and the circuit to perform it is provided in Fig. 2(i).
Let us analyze each part of the algorithm.First, we provide a sufficient condition on the target density function to guarantee an upper bound over the difference between two contiguous angles.
Lemma 1.Let f be a continuous function such that f : [0, 1] →IR + and consider a block comprising a uniformlycontrolled rotation of k qubits.Then, the difference between two contiguous angles is bounded by the second derivative of the logarithm of f in the following way: where y log f (y) ≤ η, then for each block comprising a uniformly-controlled rotation of k qubits, the difference between any two angles is bounded by This result allows us to cluster angles of each block in which Eq. ( 6) is fulfilled.Thus, we define a cluster representative angle, θ(k−1) , as ∀k ≥ k 0 , with k 0 + 1 the index of the first block in which Eq. ( 6) is fulfilled (successive blocks also verify it).Note that ∂ 2 y log f (y) ≤ η is a sufficient condition for the bound in Eq. ( 5), however, if in a singularity point it grew slower than 1 δ 2 k , the difference between consecutive angles still vanishes as we will analyze later.
We now analyze how clustering the angles according to Eq. ( 7) affects the final error of the process, measured by means of the fidelity with respect to the exact discretized state | f (x)⟩ n .Considering an n-qubit system, the unitary gate to prepare the quantum state representing the target density function can be written in terms of the blocks as U n = U n−1 θ (n−1) . . .U 0 θ (0) , where we define Let Ũn denote the operation U n when the rotation angles corresponding to each block k are replaced by a representative, θ(k−1) , such that its difference with any angle of the block  is at most η k , i.e. |θ (k−1) l − θ(k−1) | ≤ η k for l = 0, . . ., 2 k−1 − 1 and k = 1, . . ., n.Then, the following lemma can be proven: Lemma 2. Consider a system of n qubits and an error η k between any angle of the k-th block and its representative such that η k ≤ π, with k = 1, . . ., n.Then, the fidelity between the final states with and without clustering, Assuming that we cluster angles of the blocks for k since cos(x) ≥ e −x 2 for x ⪅ π/2.Therefore, if an infidelity ϵ = 1−F k 0 is allowed and the angles of all blocks comprised of more than k 0 (ϵ) qubits are clustered, with k 0 given by Eq. ( 4), then the fidelity satisfies F ≥ 1 − ϵ.In the asymptotic limit of n → ∞, we have that k 0 (ϵ) tends to which is independent of the system size, n.In Fig.
We finally study the implementation cost of the proposed protocol.We only take into account the latter type of gates and ignore single-qubit operations [50].Using the result of [39,56], which illustrates how a uniformly-controlled rotation of k qubits can be implemented with 2 k−1 CNOTs, the complexity of the circuit described in Th. 4 is O(2 k 0 (ϵ) ).

A. Normal Distribution
We apply the algorithm given by Th. 4 to a normal distribution with a mean value µ = 0.5 and for different values of the variance σ.We numerically benchmark the fidelity attained by our first protocol using the value of k 0 resulting of Eq.( 4) when we assume a maximum infidelity of ϵ = 0.05 and a system of 8 qubits.Notice that for this distribution, η = 2/σ 2 .In Fig. 3(i) and Table I, we have depicted the results from the simulations of | f (x)⟩ 8 and |Ψ( f )⟩ 8 , for different values of σ.We appreciate that the condition of F ≥ 0.95 = 1 − ϵ is not only satisfied in all cases, but the fidelities obtained are considerably better.Furthermore, the significant reduction in the number of two-qubit gates required to achieve these results is noteworthy.In the worst case, for σ = 0.3, the quantity of gates needed represents the 12.16% of the original set, while the fidelity of the experiment reaches 0.99841.

B. Generalization: Singular Points
In this section we study the generalization of the Th. 4 when the functions are allowed to have singularities on the TABLE I. Numerical data for the simulations depicted in Fig. 3(i).The values of k 0 have been computed using Eq. ( 4).In addition, the number of two-qubit gates is given by 2 k 0 − 1, and the last column is the percentage between the required gates and the total given by the Grover-Rudolph algorithm, which for n = 8 are 255.
An example is the beta density function, defined as with α, β > 0 and B(α, β) the beta function.Then, the second derivative of its logarithm is Thus, for α 1, we have a singularity at x = 0. Also, we see that if β 1, there is a singularity at x = 1.
In this situation, ∂ 2 x log f (x) can not be bounded by a finite factor η in the whole interval and, hence, Th. 4 can not be applied.However, under certain circumstances, this issue can be solved.Recall that, in Lem. 1, we obtained that the difference between two consecutive angles in a block of k qubits satisfies with Next, if for the limit k → ∞ the term ∂ 2 x log f (x)| x=2 −k+1 2 −2k+2 → 0, this value is decreasing and there exists a k * ≥ k max from which an are satisfied.This k * acts as the k 0 of Th. 4 and represents the last block without clustering.Then, for k > k * , the angles corresponding to the interval [2 −k+1 , 1] can be clustered, following our protocol.The case for a singularity in x = 1 is analogous and in Fig. 4 the resulting gates corresponding to the process of clustering the inner angles for a block of 3 qubits are depicted.Given that R y ( θ)R y (− θ) = 1, we can complete the identity of the clusterized block by subtracting θ from the rest of the angles.
In this situation, if we consider that singularities exist in x = {0, 1}, the total number of two-qubit gates required to load f into a quantum state is see the Appendix A for further details in gates decomposition.
Notice that this result is only valid for k * > 6, but since we are interested in the asymptotical behaviour of the protocol, Example of the reduction of gates for the block of three qubits F 2 3 (ŷ, θ (2) ), where it is assumed that both θ (2)  01 and θ (2)  10 can be approximated by θ. it is not an issue.All in all, we obtain that the complexity of this process is exponentially dependent on k * with an extra polynomial term.
On the other hand, if a singularity is found in (0, 1), we end up with multiple clusters of angles in each block.The reason behind this is that the clusters are formed with contiguous angles.Therefore, the reduction of gates is not significant, and the protocol can not be performed efficiently (polynomial).However, if the representatives of the disjointed clusters are equal, then we can create a single cluster, so the reduction is doable.In this sense, we have numerically observed that far from the singularities, all angles converge to a value of π/2, but we have not found an analytical proof yet.Let us see an example of a function that meets the previous description and analyze the outcome of the protocol.Consider the function where N is the normalization factor, and a system of n = 10 qubits.The second derivative of its logarithm is Hence, ∂ 2 x log f (x) has a singularity at x = 0. First of all, since ∂ 2 x log f (x) is a monotonic decreasing function, its maximum in the interval [2 −k+1 , 1] is found in x = 2 −k+1 , for any k = 1, . . ., n.Then, we can set k max = 1.Next, we need to compute the limit of Therefore, the difference in the angles is bounded and we can select a k * ≥ 1 for which the conditions of Th. 4 are satisfied in [2 −k * +1 , 1].The first condition we need to check is the inequality given by Since this term is decreasing, its maximum is found when k = 1, with a value of 3 32 < π.Thus, this condition is met for any k = 1, . . ., n, so we can select k * = 1.Additionally, if we compute the value of k 0 with ϵ = 0.01 and η = ∂ 2 x log f (x)| x=2 −k * +1 = 0.75 using Eq. ( 4), we obtain k 0 = 2.Then, the last block to remain unclustered must be the maximum between k 0 and k * , which in this case is 2. Now, in Fig. 5, we have depicted the result of the experiment of the considered function for a system of n = 10 qubits and the clustering starting with the 3-qubit block, following the protocol described in this section.With a fidelity of 0.99975, larger than the one required, we have that the final state |Ψ( f )⟩ 10 successfully captures the features of f with a reduction of the complexity of two-qubit gates from O(2 10 ) to O(2 2 ).

C. Analysis of resilience to noise in NISQ era
In this subsection, we present a theoretical and numerical analysis of how experimental errors affect different clustering levels in our first protocol and their impact on the final fidelity.The crucial point here is that when digital accuracy increases, the number of gates requested grows exponentially.As the introduction of these gates implies a growing experimental error in NISQ quantum processors, we observe a trade-off, see Fig. 6, between the clusterization error and the experimental error, quite similar to the trade-off observed in digital quantum simulations between the number of Trotter steps and the experimental error in Refs.[52,53].This balance is crucial for algorithms in noisy quantum processors.Reproducing a similar reasoning as the one in the aforementioned reference and references thereof, we first propose an approximated model of how the experimental error combined with the clustering error of our algorithm scales as a function of the number of nonparallel two-qubit gates.Then, we perform some numerical simulations introducing multiple realistic noises in our algorithm to support our theoretical predictions.
In order to establish a theoretical framework to understand the behavior of our system when clusterization and experimental errors are considered, we make the assumption that the main source of experimental noise comes from the two-qubit gates, while ignoring the noise arising from single-qubit rotations.Additionally, we consider that the application of each of these gates onto a quantum state, denoted as ρ, is modeled in the following form Where U corresponds to the desired dynamics while the second term introduces a certain Taylor expansion of the deviation from the exact evolution, with ξ ≪ 1.After m nonparallel gates, ignoring quadratic terms in ξ, the infidelity of the state has approximately evolved according to as the leading term.Note that the argument could be extended by replacing Ũ → T (•) , an arbitrary quantum channel, but the calculation is more complicated to produce a similar argument.In a rough approximation, for a certain value of the clustering index k 0 , the total fidelity of the protocol run on a , where η = 22.22, k 0 = 1, ..., 6 and n = 6 .As we can appreciate, the expected fidelity of our protocol is above the fidelity resulting of implementing the protocol without clustering in noisy devices, which dramatically tends to 0. Actually, the fidelity reaches a maximum for k 0 = 4 , which corresponds to a clustering error ϵ 0 = 0.0726.NISQ device has a contribution coming from this experimental noise together with the digital infidelity due to the clustering procedure.We can approximate this quantity by which in terms of the clustering error, ϵ and by using that 1 − ϵ ∼ e −η 2 /24(4 −k 0 −4 −n ) , and hence, 2 − 1, can be expressed as From the equation above, we can find the value of clustering error ϵ 0 for which F total achieves its maximum by deriving the function.This condition holds with α = 2ξ∥ Ũ∥ and µ = 96/η 2 , which is a transcendent equality, so it cannot be analytically solved.However, for the range of parameters in which we are interested and for sufficiently small, we can approximate ϵ 0 ∼ α 2 n− 3 2 √ 3 .In order to provide a numerical example for this expression, we analyze a normal distribution with σ = 0.3 encoded in n = 6 qubits.For the experimental error, we consider α = 2ξ|| Ũ|| = 0.0087.We depict the results of this analysis in Fig. 1.If we compare results from this analysis, we can see that the maximum fidelity is achieved for a value of ϵ 0 = 0.0726 while the predicted value reads ϵ 0 = 0.1131.
Once the theoretical framework has been established, let us carry out some numerical simulations analyzing the robustness (in terms of fidelity) against different noises.The main objective is to support the aforementioned analytical findings.The guidelines of the numerical experiment is described as follows.The circuit is transpiled to a native set of gates given by CNOT, Id, Rz (θ), X and Sx.The noise quantum channels considered are • Amplitude-Damping (T1) • Dephasing (T2) • Gate errors (rD, CNOT error) For a realistic scenario, we have taken the value of these errors from the IBM Jakarta quantum processor, which are summarized in Table II.Considering this set up, we have studied again the normal distribution with σ = 0.3 encoded in n = 6 qubits, focusing on the fidelity and l 2 norm (normalized with the system size such that it converges to the L 2 norm in the continuous limit) with respect to the exact discretized state.The numerical results are depicted in Fig. 7, which also shows a trade-off between the clustering and the experimental error.More explicitly, the maximal fidelity, respectively the minimal error measure with the norm L 2 , is achieved for a clusterization level k 0 = 2 , which reproduces the structure showing a trade off between errors predicted by our theoretical model.However the maximum is reached for a smaller value of k 0 , compared to the predictions, as the theoretical model is a first order simplification which becomes not that accurate when the  The blocks which have a number of angles lower or equal to the necessary parameters will remain unclustered.From the first block which has more angles, we proceed to reduce the number of angles to the number of parameters by clustering the ones that do not correspond to the special points.This leads to a significant reduction in the number of gates.(c) Zoom in of the F 4 5 (y, θ) block clusterization.As we only need 8 parameters, the 16 angles of this block are reduced to the half.We keep one local rotation for the clustering parameter and 7 multi-controlled gates corresponding to the intervals where the singularity is located.
presence of more kind of noises is assumed.This means that our idea based on clustering can be implemented with shallow but not trivial circuits and presumably offers a robust performance for the NISQ era.
Consequently, although our protocol introduces a controllable error, it significantly reduces the depth required for the computation as well, resulting in a more balanced and reliable outcome.In the presence of experimental noise, our algorithm achieves a good balance between fidelity (experimental + clustering errors) and feasibility (realistic depth), which is crucial for this stage of quantum computing, where computers are characterized by high error rates and limited coherence times.

A. Ansatz
Based on the previous protocol, we propose a variational Ansatz for loading functions beyond the conditions required by Th. 1.We consider (z + s + 1)p(k) hyperparameters for each k-qubit block satisfying k > k 0 , where z and s are respectively the number of zeros and singular points in the function, p(k) is polynomial number in k denoting the number of hyperparameters allowed per singular point/zero for the k-th block, and k 0 = max{ k | z + s + 1 ≥ 2 k }.Assuming p(k) ≥ 1, the minimum number of hyperparameters needed to capture the singular behavior of the function is z + s + 1.Therefore, for each k-qubit block F (k−1) k (ŷ, θ) comprising more than z + s + 1 parameters, we cluster the angles which do not correspond to the position of zeros or singularities.In this form, this proposal establishes an intuitive correlation between the number of hyperparameters p(k) and the expressability of the circuit to capture the details of the target functions in the most relevant points.By using the decomposition of the multi-controlled rotations [39,56], the number of two-qubit gates which must be included in the variational circuit sums up to with m the number of zeros/singularities and r k 0 +1 (n, s + z) growing polynomially with the system size n.A remarkable advantage of this Ansatz is the training procedure, since Grover-Rudolph algorithm provides a suitable set of initial training angles which considerably enhances the convergence of the protocol with respect to a random initialization.This fact, together with a scaling in the number of hyperparameters p(k) substantially slower than the system size, allow us to avoid the training procedure to get stuck into both local minima and barren plateaus.In Fig. 8, we illustrate how to select the hyperparameters of the variational circuit for a general function containing multiple zeros and singular points.

B. Training Process
We proceed to illustrate the training method for the variational circuit proposed in this article.This process consists of iterative steps in which a loss function that measures how far the outcome is from the desired state is recursively minimized to obtain the optimal parameters.Consider a function f and n qubits.The desired quantum state is where the f l terms are discrete approximations to the objective function, as presented in Def. 1. Now, given a set of angles θ, our Ansatz U(θ) returns the state where |0⟩ ≡ |0⟩ ⊗n , and the components of the obtained state are products of sines and cosines.
This process aims to find the optimal parameters θ for which |Ψ (θ)⟩ n approximates |Ψ( f )⟩ n , which is equivalent to minimizing the loss function.Here, we use the gradient descent method [51] to do so.
Then, given any angle θ (k−1) l , with k ∈ {1, . . ., n} and l ∈ {0, 2 k−1 − 1}, its value gets updated after each training step in the following way: where γ is the learning rate.Let us now compute the expression of the derivative: Since Ψ l (θ) is a product of sines and cosines, its partial derivative with respect to a given angle is Finally, different stopping criteria exist to put to an end the training process.A simple example is to fix a number of training steps.Another criteria, which is the one we consider for our numerical simulations, is to set a tolerance for the loss function.Therefore, when the difference between the cost function of two consecutive steps is less than the given tolerance, the process is considered satisfactory.
As a summary, in Algorithm 1 we have depicted the pseudocode corresponding to the training process.Algorithm1 Algorithm for the training process.
Require: n qubits, normalized function f , objective state |Ψ( f )⟩ n , circuit U(θ) with hyperparameters, stopping criteria and learning rate γ.1: Initialize parameters θ 2: while stopping criteria is not met do Let us now illustrate the behavior the variational circuit different density functions.

Sine Function
As a first example, we have tested the variational circuit with the normalized sine function sin x in the domain [0, 3  2 π].This example contains zeros in x = 0 and x = π and clearly does not satisfy the conditions in Th. 4 or a possible generalization, since it is not even positive.In order to train the parametrized quantum circuit, we use the mean squared error loss function and the gradient descent as optimization method, with a learning rate γ = 1.5.In Fig. 3(ii), we illustrate the arising loaded states of our trained circuits for different values of p(k) = 1, 2, 3 and k (the last case means the introduction of k hyperparameters per zero of the function) and a system comprising n = 5 qubits, with k 0 = 2.For the studied cases, our trained Ansatz is able to load the sine function with a fidelity larger than 0.97.Additionally, in Fig. 9, we show the resulting fidelity of the different combinations of the number of qubits and p(k), for n ∈ {5, . . ., 10} and k 0 = 3.

Black-Scholes Distribution
As a second example, let us now apply the proposed variational circuit to the Black-Scholes distribution, which is given by with s = Kc, where K and c are the parameters.The zeros are found in x = ± log (K s).
We benchmark the performance of our Ansatz for loading the Black-Scholes distribution, Eq.( 35), into a 5-qubit system.We choose the parameters of the distribution K = 45 and c = 3, and for the training parameters, we consider, k 0 = 2, a learning rate of γ = 1.5 and a tolerance of 10 −9 .In Fig. 10 (i), we depict the results of the from the numerical simulations of the discretized state |Ψ (BS(x))⟩ 5 and the trained Ansatz |Ψ (θ)⟩ 5 , for different values of p(k) and n = 5 qubits.We can appreciate that in all cases the circuit is able to capture the main features of the target state, consequently all fidelities are close to 1.We highlight that for p(k) = 3 the resulting state and the objective one are almost equal, ϵ = 5e−5.
We also provide an analysis for larger sytems.In Table III, we depict the numerical results for |Ψ (BS(x))⟩ 12 and |Ψ (θ)⟩ 12 in a system of 12-qubit system, with k 0 = 2, and different values of p(k).We are able to obtain large fidelities with only a few percentage of the original angles, which implies that with only a small portion of the initial parameter space, the circuit can approximately simulate the target state efficiently.
Finally, in Fig 10 (ii) and Table IV we analyze the performance of initializing the value of the parameters with the angles provided by the Grover-Rudolph method for different sizes of the system.We present the training results of the Ansatz initialized with several random angles sets and with the angles provided by the Grover-Rudolph method.We analyze the performance in terms of infidelity and number of steps of the training.We can observe how this initialization reduces drastically the number of steps, as well as enables us to achieve the largest fidelities.

V. CONCLUSIONS
In this article, we have considered the problem of loading real valued functions into a quantum computer, which is a major bottleneck for solving partial derivatives equations [11][12][13][14], computing Monte-Carlo integrations [9,10,36] and quantum field theory [37,38] and quantum machine learning [15,[17][18][19][20]. Firstly, inspired by the Grover-Rudolph algorithm [55] without ancillas, we have analytically proven that the complexity for implementing our method on a n-qubit system scales as O(2 k 0 (ϵ) ), with ϵ the infidelity with respect to the exact state and k 0 (ϵ) asymptotically independent of n.This reduction of two-qubit gates leads to a significant speedup, which allows us to implement quantum protocols involving data embeddings in large qubit systems.Additionally, we have generalized this method for functions containing a certain type of singularities, obtaining promising results for density functions with singularities that satisfy the expanded theorem conditions.Furthermore, we have proposed a variational Ansatz inspired in our previous protocol.We have observed that it can efficiently and accurately load functions with zeros and singularities.Our proposed Ansatz is tailored to the landscape of the function, providing an intuitive correlation between hyperparameters and expressability.Moreover, our previous protocol allows us to define a suitable initial training angle set, which considerably improves the training process, avoiding barren plateaus and local minima.As a future work, tensor networks could be used to prove the quasi-optimality in the minimal number of hyperparameters in the variational Ansatz.
ing the 2 n−1 combinations of control bits.Analytically this gate can be expressed as and since 2 n −1 i=0 |i⟩ ⟨i| = 1 n×n and ⟨i| j⟩ = δ i j , we can simplify it to A.2 Cost of Multi-Controlled us now analyze the cost of implementing a multicontrolled rotation in terms of single and two-qubit gates.
Proof.First, let us see that R y (θ) = XR y (−θ/2)XR y (θ/2): Then, The intuition behind this decomposition is that when all the control states are in |1⟩, we obtain the desired rotation, as illustrated in Eq. (39).Also, when the controlled rotations are activated but the ∧ n−2 (X) is not, we obtain the identity, since R y (−θ/2)R y (θ/2) = 1, as expected.The same occurs when the ∧ n−2 (X) gates are activated but the controlled rotations are not.Finally, when none of the controls are triggered, the circuits in both sides are equal to 1 ⊗n .Now, let us check that the circuits are equivalent.On the left hand side, we have: On the right-hand side: where we have ignored the identities on the left hand side of the equation for the sake of simplicity.If we expand the previous expression, we obtain the following: where we have defined the projector . . .
It is straightforward to see that both terms are equal to 0, due to the projector F(α).Then, we have where we have used that XR y (−θ/2)XR y (θ/2) = R y (θ) and XX =

1.
Let us now compute the last term: where we have used that R y (−θ/2)R y (θ/2) = 1 and Finally, as we wanted to prove.■ Corollary 1.The order of implementing the ∧ n−1 R y (θ) gate is linear in n, for n > 6.In fact, ∧ n−1 R y (θ) can be decomposed with 80n − 398 two-qubit gates.
Proof.Following Th. 1, we know that the multi-controlled rotation ∧ n−1 R y (θ) can be decomposed into two one-qubit controlled rotations, ∧ 1 R y (θ/2) and ∧ 1 R y (−θ/2) , and two ∧ n−2 (X).Therefore, we need to study the number of twoqubit gates necessary to implement the last two gates ∧ n−2 (X).
Corollary 7.4 from [54] states that, for n > 6, ∧ n−2 (X) can be realized with 8(n − 5) ∧ 2 (X).Then, we reduce the problem to obtain the cost of the Toffoli gate, ∧ 2 (X).Lemma 6.1 from the previous reference [54] states that any one-qubit unitary, ∧ 2 (U) can be implemented with 5 two-qubit gates comprising two CNOTs, two ∧ 1 (V), and one ∧ 1 V † , with V 2 = U.In our case, we have U = X.Then, we can choose V = 1−i 2 (1 + iX).We provide the exact decomposition of the Toffoli gate using the previous description in Fig. 13.
13. Circuit of the decomposition of the Toffoli gate, ∧ n−2 (X) using five two-qubit controlled operations, where V 2 = X.
Finally, if n > 6, then the number of gates required for decomposing ∧ n−1 R y (θ) , considering only two-qubit operators, is Theorem 2. Let f be a continuous function such that f : [0, 1] → R + and consider a block comprising a uniformly controlled rotation of k qubits.Then, the difference between two contiguous angles is bounded by the second derivative of the logarithm of f in the following way: where δ k = 1 2 k−1 , l ∈ {1, . . ., 2 k−1 − 1}.Proof.The discrete expression of the angles for a block comprised of uniformly controlled rotation with k qubits, first introduced in [55], is given by with δ k = 1 2 k−1 and l ∈ {1, . . ., 2 k−1 − 1}.We can define the continuous extension of the previous function as where the original expression can be recovered replacing y = lδ k .Now, given a function g and a displacement µ, we define the numerical derivative of g with respect to µ as Notice that for µ → 0, ∂ (µ) y g(y) → ∂ y g(y), the usual derivative.Then, for two consecutive angles, we have Now, we require a connection between the difference in the angles and the partial derivative of Eq.( 58).The Mean Value Theorem guarantees that the numerical derivative, which corresponds to the slope of the straight line connecting θ (k−1) l and l+1 , is constrained by the maximum absolute value of the exact derivative in that interval.Then, we have Let us now develop the term of the exact derivative of the function θ (k−1) (y): .
By introducing ( f (y )) in the first term and reorganizing the expression, we have: where the first term is positive.Now, since With this result, we get the following inequality for the absolute value of the derivative of θ (k−1) (y): (65) By using again the numerical derivative defined in Eq. ( 59), we can simplify the previous expression in the following way: Also, since f must be integrable, its primitive F exists.Then, Let us now plug it into the inequality If we follow the same argument as in Eq. (61), we have that the term of the numerical derivative is upper bounded by the exact derivative, in the absolute value, which corresponds to the second derivative of the function's logarithm.Hence, the following inequality holds ) Finally, by plugging this result in Eq. (61), we obtain as we wanted to proof.■ Corollary 2. Let f be a continuous function such that f : [0, 1] → R + and consider a block of uniformly controlled rotations of k qubits.Then, if ∃η ≥ 0 such that ∂ 2 y log f (y) ≤ η ∀y ∈ [0, 1], Corollary 3.For a non-standardized function f : [x min , x max ] → R + , the result in Corollary 2 holds with the modification in the bound where L = x max − x min and δ k = L 2 k−1 .Proof.The change of variables that map the x ′ ∈ [0, 1] with x ∈ [x max − x min ] is given by Now, by using the chain rule, we obtain: Then, the difference between any two angles is ∀l, l ′ ∈ {1, . . ., 2 k−1 }.
Proof.The worst scenario is when l = 1 and l ′ = 2 k−1 .In this case, by using the triangular inequality, we have Let us consider a system with n qubits, thus the unitary gate to prepare the quantum state representing the target density function can written in terms of the blocks as where we define Let denote the operation U n given a representative with an error η k between the angles for each block, i.e. |θ (k−1) l − θ(k−1) | ≤ η k for l = 0, . . ., 2 k−1 − 1 and k = 1, . . ., n.
Theorem 3. Consider a system of n qubits and an error η k between any angle of the k-th block and its representative such that η k ≤ π, with k = 1, . . ., n.Then, the fidelity between the final states with and without clustering, Proof.We use induction to prove the inequality.Hence, we start with the elemental case of n = 1 and later proceed assuming it is satisfied for n − 1 and check if it holds for n: • n = 1: Notice that the angles, given by take values between 2 arccos(1) = 0 and 2 arccos(0) = π.Then, since the cosine is a decreasing function in the interval [0, π/2] and η ≤ π, we have that • n > 1: Assume the condition holds for n − 1.We aim to recover the expression of ⟨0| ⊗n−1 U † n−1 Ũn−1 |0⟩ ⊗n−1 so we can use the induction hypothesis.Therefore, as we did for n = 1, we sandwich the operator with the state |0⟩ at each side.From now on during this proof, to simplify the notation, we denote the gate ).Then, for n: Notice that we now we can substitute the rotation terms in the following way: since θ (n−1) i 1 ...i n−1 , θ(n−1) , η n ∈ [0, π] ∀i 1 , . . .i n−1 ∈ {0, 1}.However, before introducing the inequality, we need to check that the rest of the terms have the same sign.To do so, we first sandwich the terms corresponding to the block n − 2 with the state |0⟩ as well: Here, since all the angles are between 0 and π, we have that ■ Notice that one of the conditions of the previous Theorem is that η k ≤ π ∀k ∈ {1, . . ., n}.Recall that, as seen in Eq. (78), η k ≡ δ k 8 η.Therefore, we obtain the following condition for the value of η for Th. 3 to be satisfied: In particular, if this holds for the smallest k, with value 1, it will be satisfied for the rest of the blocks.Then, we can write the previous condition simply as η ≤ 8π.(101) Note that in the limit for both F 0 → 1 and η → ∞, we have that as expected.Also, since k 0 must be an integer, we take the ceiling of its value.In addition, we consider the minimum of this parameter to be 2, so the condition of η ≤ 8π is satisfied.Hence, the final expression for k 0 is where we have defined ϵ ≡ − F 0 .
Once we have a fixed number of uniformly controlled rotation blocks, k 0 , for which there is no clustering, we count the number of necessary gates.The gate F k−1 k (ŷ, θ) can be implemented with 2 k−1 CNOTs and 2 k−1 single-qubit gates [56].Thus, the total number of gates is: and the circuit to perform it is given in Fig. 14.
Proof.Using Th. 2 and Cor. 4, we can define the representative angle for each block as in Eq. ( 78).Then, by applying Th. 3 with the development between Eqs. ( 94) and (101), it is clear that the fidelity will be larger than or equal to 1 − ϵ.Finally, Eq. ( 104) states that the number of two-qubit gates necessary to realize this protocol is 2 k 0 − 1, as we wanted to prove.■ l

FIG. 1 .
FIG. 1.Effect of the uniformly-controlled y-axis rotation F (k−1) k (y, θ (k−1) ) for k = 3 and n qubits.Each multi-controlled gate comprising F (k−1) k (y, θ (k−1) ) bisects the (l + 1)-th partition interval of the function by using the conditional probability of being in the right or left side of the interval.

25 FIG. 2 .
FIG. 2. (i)Quantum circuit performing the protocol presented in Th. 4, based on the Grover-Rudolph method for a system of n qubits.We cluster the angles of the blocks for k ≥ k 0 + 1 > 2, leading to a drastic reduction in the number of gates needed.(ii) Values of k 0 for η ∈ [0, 8π], ϵ ∈ [0.0001, 0.01] and n → ∞, following Eq.(10), where the dotted lines correspond to the contour lines and denote the change of values.We can appreciate that k 0 ≤ 10 for most of the cases.

8 FIG. 6 .
FIG.6.Total fidelity, F total , in terms of the clustering infidelity assumed in the protocol according to the expression ϵ(k 0 ) ∼ 1 − e −η 2 /24(4 −k 0 −4 −n ) , where η = 22.22, k 0 = 1, ..., 6 and n = 6 .As we can appreciate, the expected fidelity of our protocol is above the fidelity resulting of implementing the protocol without clustering in noisy devices, which dramatically tends to 0. Actually, the fidelity reaches a maximum for k 0 = 4 , which corresponds to a clustering error ϵ 0 = 0.0726.

FIG. 7 .
FIG. 7. (i) and(ii) Total fidelity, F total , in terms of the clustering infidelity assumed and k 0 in the protocol according to the expression ϵ(k 0 ) ∼ e −η 2 /24(4 −k 0 −4 −n ) , where η = 22.22, k 0 = 1, ..., 6 and n = 6 for the ideal and noisy cases.(iii) l 2 normalized error in terms of the clustering in the same conditions.The maximal fidelity, respectively the minimal error measure with the l 2 norm, is achieved for a clusterization level k 0 = 2 , which corresponds to ϵ 0 = 0.7222

FIG. 8 .
FIG. 8. Illustration of proposed variational circuit for a general function.(a) Target function.The proposed landscape generalizes the possible cases of special points that lead to singularities in the second logarithmic derivative.The special points are classified in the legend of the graph, where z denotes the zeros, s the singularities, and the index denotes whether the slope in the contiguous points has the same sign, index= 1, or not, index= 2. This is translated to a minimum amount of 7 necessary parameters required to capture the local behavior of the function in these points, one per index= 1 and 2 per index= 2. We add an additional parameter which represents the clusterization.(b) Variational circuit.The blocks which have a number of angles lower or equal to the necessary parameters will remain unclustered.From the first block which has more angles, we proceed to reduce the number of angles to the number of parameters by clustering the ones that do not correspond to the special points.This leads to a significant reduction in the number of gates.(c) Zoom in of the F 4 5 (y, θ) block clusterization.As we only need 8 parameters, the 16 angles of this block are reduced to the half.We keep one local rotation for the clustering parameter and 7 multi-controlled gates corresponding to the intervals where the singularity is located.

FIG. 10 .
FIG. 10. (i) Simulations of |Ψ (BS(x))⟩ 5 and |Ψ (θ)⟩ 5 for the Black-Scholes distribution, n = 5, k 0 = 2, and different values of p(k): (a) p(k) = 1, (b) p(k) = 2, and (c) p(k) = 3. Parameters of the distribution: K = 45 and c = 3. Training parameters: k 0 = 2, learning rate γ = 1.5 and tolerance of 10 −9.We can appreciate that in all cases the circuit is able to capture the main features of the target state, consequently all fidelities are close to 1. (ii) Results of training the variational ansätze for a different number of qubits (n=15,16,17,18) to load the Black-Scholes distribution.We compare the training results (numbers of step and final infidelity) of 40 randomly initialized parameters versus the initialization with Grover-Rudolph method angles.These randomly initialized parameters are generated by drawing samples from a uniform distribution between the values of 0 and π.As we can appreciate the initialization with Grover-Rudolph method always achieves the best performance, while most of random initializations get stuck around a fidelity value (vertical lines).Training parameters: k 0 = 2, p(k) = 1, learning rate γ = 1.5 and tolerance of 10 −9 .

FIG. 11 .
FIG. 11. Circuit corresponding to the uniformly controlled rotation gate F n−1 n (u, θ), with n − 1 control qubits, a target at the n-th qubit, an axis of rotation u, and a set of angles θ.
) ■ APPENDIX B: THEOREM PROOFS B.1 Relation Between the Difference in the Angles and the Target Function

√ a − √ b 2 = a + b − 2 √
ab ≥ 0, we have the well-known inequality for the geometric and arithmetic means √ ab ≤ a+b 2 .Then, if we select a =

δ k 8 η
77) since there are less than δ −1 k elements in the sum.■ In this situation, we can define the representative angle θ(k−1) for the clustering process as the one corresponding to the middle part of the interval, satisfying θ(k−1) − θ (k−1) l ≤ ∀l ∈ {0, 1, . . ., 2 k−1 − 1}.(78) B.2 Error Bound of the Algorithm 1. Relation Between the Difference in the Angles and the Fidelity

FIG. 14 .
FIG.14.Quantum circuit performing the protocol presented in this section, based on the Grover-Rudolph method for a system of n qubits.

TABLE III .
Numerical data for the simulations of |Ψ (BS(x))⟩ 12 and |Ψ (θ)⟩ 12 , n = 12, and different values of p(k), with k 0 = 2.The value #Angles is the number of independent angles necessary to generate |Ψ (θ)⟩ 12 , and %Angles is in comparison with the full circuit.We are able to obtain large fidelities with only a few percentage of the original angles, which implies that with only a small portion of the initial parameter space, the circuit can approximately simulate the target state efficiently.

TABLE IV .
16,17,18) results of training the variational ansätze for a different number of qubits (n=15,16,17,18)to load the Black-Scholes distribution.We compare the training results (numbers of step and final infidelity) of 40 randomly initialized parameters versus the initialization with Grover-Rudolph method angles.These randomly initialized parameters are generated by drawing samples from a uniform distribution between the values of 0 and π.Training parameters: k 0 = 2, p(k) = 1, learning rate γ = 1.5 and tolerance of 10 −9 .