Quantum state preparation protocol for encoding classical data into the amplitudes of a quantum information processing register's wave function

We present a protocol for encoding $N$ real numbers stored in $N$ memory registers into the amplitudes of the quantum superposition that describes the state of $\log_2N$ qubits. This task is one of the main steps in quantum machine learning algorithms applied to classical data. The protocol combines partial CNOT gate rotations with probabilistic projection onto the desired state. The number of additional ancilla qubits used during the implementation of the protocol, as well as the number of quantum gates, scale linearly with the number of qubits in the processing register and hence logarithmically with $N$. The average time needed to successfully perform the encoding scales logarithmically with the number of qubits, in addition to being inversely proportional to the acceptable error in the encoded amplitudes. It also depends on the structure of the data set in such a way that the protocol is most efficient for non-sparse data.


Quantum computing devices have made great progress towards the construction of a quantum computer whose computing power exceeds that of any existing classical computer [1][2][3]. In particular, a clear quantum advantage over classical computers was recently demonstrated using superconducting devices [4,5]. Multi-order-of-magnitude increases in the number of qubits and computing power are expected in the coming few years.
On the software side, new algorithms are continually being developed for future quantum computers [6,7]. In particular, as machine learning techniques become increasingly prevalent, researchers are exploring the potential for quantum computers to offer a computational advantage using similar techniques [8,9]. There have been a large number of proposals for using quantum computers to perform machine learning tasks. There have also been a few proof-of-principle experimental demonstrations of such tasks [10][11][12][13].
Quantum machine learning algorithms operate on data stored in the form of a quantum superposition in the state of a quantum information processing register. There are cases where the initial state can be encoded easily into the processing unit for machine learning processing. For example, the data could be a quantum state that results from easily reproducible quantum dynamics, e.g. a quantum simulation of a physical system. In this case it could be practically impossible to translate this data into classical form (because of the exponentially large Hilbert space) but easy to take the prepared quantum state and perform quantum machine-learning analysis on it. The situation is starkly different when dealing with input data that is provided in classical form and does not necessarily have any relation to quantum mechanical quantities. Assuming that the data is described by a set of $N$ real numbers $\{c_0, c_1, \ldots, c_{N-1}\}$, one first needs to encode this data into the quantum state of a quantum register. In this case, the step of encoding the classical data into the quantum processor can be the most challenging step in running the machine-learning algorithm.
A conceptually natural encoding of the data, which is used for example in the quantum support vector machine [14] and allows a straightforward evaluation of the distance between two data points, is amplitude encoding. This encoding can be described as preparing the state
$$|\psi\rangle = \frac{1}{\mathcal{N}}\sum_{k=0}^{N-1} c_k |k\rangle,$$
where $|k\rangle$ is the $n$-qubit state with the integer $k$ expressed in binary representation, and $n$ is the smallest integer greater than or equal to $\log_2 N$. For example, $|k=5\rangle$ in a three-qubit system corresponds to the state $|101\rangle$. This encoding step in some sense compresses $2^n$ data values such that they are stored in the quantum state of $n$ qubits. The denominator $\mathcal{N}$ is a normalization factor, here given by $\mathcal{N} = \sqrt{\sum_k |c_k|^2}$. However, we shall use the same symbol as a generic normalization factor below.
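As a concrete illustration of the target state, the short Python sketch below computes the amplitudes that amplitude encoding should produce for a given data list, together with the binary label of a basis state. It is a classical reference calculation only, not part of the quantum protocol; the function names are hypothetical, and padding the data with zeros up to $2^n$ entries is an assumption made here for data sets whose size is not a power of two.

```python
import math


def amplitude_encode(data):
    """Target amplitudes c_k / sqrt(sum_k |c_k|^2) on n = ceil(log2 N) qubits.

    Zero-padding up to 2**n entries is an assumption of this sketch.
    """
    N = len(data)
    n = max(1, math.ceil(math.log2(N)))      # smallest n with 2**n >= N
    padded = list(data) + [0.0] * (2 ** n - N)
    norm = math.sqrt(sum(c * c for c in padded))
    if norm == 0:
        raise ValueError("an all-zero data set cannot be normalized")
    return n, [c / norm for c in padded]


def basis_label(k, n):
    """Binary label of the computational basis state |k> on n qubits."""
    return format(k, f"0{n}b")


n, amps = amplitude_encode([3.0, 4.0, 0.0, 0.0, 1.0])
print(n, basis_label(5, n))   # prints: 3 101
```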
The task of encoding classical data into a quantum register is closely related to the problem of quantum state preparation, which has been studied by several authors in the past two decades [15][16][17][18][19][20][21][22], and to the study of quantum random access memory (qRAM) in more recent literature [23][24][25]. Early studies on this topic showed that one can perform state preparation of a general $n$-qubit state using a sequence of $\sim 2^n$ single- and two-qubit gates [15,17,18]. Later proposals showed improved, though still exponential, resource scaling [19]. These results are consistent with the intuition that a general $n$-qubit state is defined by $2^n$ complex basis-state amplitudes, with one normalization constraint and one irrelevant overall phase. The state-preparation gate sequence must therefore contain at least $2^{n+1} - 2$ adjustable real parameters to be able to access any point in the $n$-qubit Hilbert space, which leads to the conclusion that $\sim 2^n$ single- and two-qubit gates are needed to prepare an arbitrary target state. Other proposals showed that polynomial scaling in $n$ can be achieved for special cases depending on the structure of the data, e.g. if the number of basis states in the superposition is small [22] or if the superposition contains only basis states with a certain number of zeros and ones [21,25]. Other proposals demonstrated polynomial scaling in $n$ based on the assumption that a certain oracle that contains the data as controllable parameters can be implemented efficiently [16,20]. A protocol integrating qRAM and state preparation steps was described in Ref. [9]. However, this protocol also does not specify the details of the oracle implementation. Another recent direction of research in this area is approximate encoding, where adaptive learning techniques have been proposed to optimize the performance of encoding protocols for a fixed amount of resources [26][27][28][29].
Here we present a protocol that performs amplitude encoding efficiently when an exponentially large number of amplitudes are present in the data to be encoded, explicitly describing the quantum operations used in the protocol. The protocol is based on the use of partial CNOT gates and probabilistic measurement-induced projection. The steps used in this protocol are similar to those used in Grover's state preparation protocol [15,30]. As we shall see below, the protocol is most efficient when a large fraction of the values in the data set are of the same overall scale, while the protocol becomes increasingly inefficient as we approach the sparse limit where most of the data are zeros or negligibly small relative to the largest value in the data set.
As our starting point, we assume that the numbers $c_k$ are stored in $N$ memory registers, each as an integer with a length of $L$ bits, where $L$ sets the accuracy of the numbers $c_k$. Each memory register therefore contains $L$ bits that we call the value bits. We also assume that each memory register contains $n$ additional bits that encode the integer $k$. We refer to this part of the memory register as the index bits. The state of each memory register can therefore be expressed as $|k, c_k\rangle$, with a total of $n + L$ bits in each register. For example, if we take a data set that contains four elements, each of which contains five binary digits, the entire memory is described by the state
$$|00, c_{0,0}c_{0,1}c_{0,2}c_{0,3}c_{0,4}\rangle \otimes |01, c_{1,0}c_{1,1}c_{1,2}c_{1,3}c_{1,4}\rangle \otimes |10, c_{2,0}c_{2,1}c_{2,2}c_{2,3}c_{2,4}\rangle \otimes |11, c_{3,0}c_{3,1}c_{3,2}c_{3,3}c_{3,4}\rangle,$$
where $c_{k,l}$ (with $l = 0, 1, 2, \ldots, L-1$) are the individual bits in the bit string that encodes the value $c_k$. In practice, it might not be necessary to encode the index bits in a dedicated part of the memory, as the hardware might be designed to identify the index of any value stored in the memory based on the location where the value is stored. It should be noted that the memory registers are in a classical state that does not involve any quantum superposition, and they remain in the same state throughout the protocol. However, they must be physically isolated from the environment to prevent the environment from learning which memory register is accessed. The reason is that amplitude encoding relies on having a quantum superposition over all values of $k$, and each memory register is accessed in one branch of the superposition. If the information about which memory register is accessed leaked to the environment, this information would show that only one memory register (with a specific value of $k$) was accessed, and the state of the quantum computer would collapse to a state with a single value of $k$. For this reason we shall refer to the physical components that hold the memory bits as qubits, even though they remain in a classical state throughout the protocol.
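The memory layout can be sketched in Python as plain bit strings. This is a hypothetical helper for illustration only: it assumes unsigned integer data and takes $c_{k,0}$ to be the most significant value bit, a convention the text does not fix.

```python
def memory_registers(data, L):
    """Classical contents |k, c_k> of the N memory registers, as bit strings.

    Each register holds n index bits followed by L value bits c_{k,0} ... c_{k,L-1}.
    Assumptions: the c_k are unsigned integers and c_{k,0} is the most
    significant bit, i.e. c_k = sum_l c_{k,l} * 2**(L-1-l).
    """
    n = max(1, (len(data) - 1).bit_length())   # smallest n with 2**n >= N
    registers = []
    for k, c in enumerate(data):
        if not 0 <= c < 2 ** L:
            raise ValueError(f"c_{k} = {c} does not fit into {L} unsigned bits")
        registers.append((format(k, f"0{n}b"), format(c, f"0{L}b")))
    return registers


# Four data values with L = 5 value bits each.
for index_bits, value_bits in memory_registers([5, 17, 0, 31], L=5):
    print(index_bits, value_bits)
```

In this example the register with index bits `01` holds `10001`, i.e. $c_1 = 17$.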
To provide insight into the basic idea of the protocol, we start by presenting its first steps in the simple case where the data set contains only two numbers, $c_0$ and $c_1$, such that these two numbers are to be encoded into a single-qubit processing register, to which we shall refer as the CPU register. The memory is therefore in the state $|0, c_0\rangle \otimes |1, c_1\rangle$. We initialize the CPU qubit in the state $(|0\rangle + |1\rangle)/\sqrt{2}$. In addition to the CPU register, we introduce an ancilla qubit, the flag qubit, initialized in the state $|0\rangle$. Hence the combined system is initially in the state
$$\frac{|0\rangle + |1\rangle}{\sqrt{2}} \otimes |0\rangle \otimes |0, c_0\rangle |1, c_1\rangle,$$
where the first ket is that of the CPU qubit, the second ket is that of the flag qubit, and the product of kets on the right describes the state of the two memory registers.
With this initial state, we seek an operation that modifies the amplitudes in the quantum superposition and makes them proportional to the corresponding numbers $c_k$. Considering only one of the two computational basis states and its corresponding memory register, an operation that produces an amplitude approximately proportional to $c_k$ is a rotation by an angle proportional to the value in the memory register. If such a rotation is applied to the flag qubit, it causes the transformation
$$|k\rangle \otimes |0\rangle \rightarrow |k\rangle \otimes \left[\cos\left(\frac{c_k}{R}\right)|0\rangle + \sin\left(\frac{c_k}{R}\right)|1\rangle\right].$$
The parameter $R$ is a constant that the user can choose freely, and it should be chosen to be much larger than $|c_k|$, such that $\sin(c_k/R)$ is approximately equal to $c_k/R$, as we shall discuss in more detail below.
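The above-described rotation, which needs to be implemented for one value of $k$, can be described by the unitary operation $U_k = e^{-i\sigma_y^F c_k/R}$, where $\sigma_y^F$ is the Pauli $y$ operator of the flag qubit, which flips its states $|0\rangle$ and $|1\rangle$ (up to a phase). The operator $U_k$ can be expressed more explicitly in terms of the individual bits $c_{k,l}$ that comprise the number $c_k = \sum_{l=0}^{L-1} c_{k,l}\, 2^{L-1-l}$:
$$U_k = \prod_{l=0}^{L-1} \exp\left(-i\sigma_y^F\, c_{k,l}\,\frac{2^{L-1-l}}{R}\right).$$
Note that, depending on the convention used for the scale of the numbers $c_k$, there could be an overall scale factor, e.g. $2^L$, inside the exponent that we have omitted. Any such factor can be absorbed into the constant $R$ and hence does not affect the protocol.

In an efficient encoding protocol, the externally applied controls should not explicitly depend on the data values. Instead, these values should be retrieved from the memory and used to implement the necessary rotations on the flag qubit via a unitary operator that does not explicitly contain the numbers $c_k$. To demonstrate how this goal can be achieved, we first turn the numbers $c_{k,l}$ in $U_k$ into operators. We therefore rewrite the operator $U_k$ in the form
$$U_k = \prod_{l=0}^{L-1} \exp\left(-i\sigma_y^F\, \frac{1 - \sigma_z^{\mathrm{Memory},k,l}}{2}\, \frac{2^{L-1-l}}{R}\right), \tag{6}$$
where $\sigma_z^{\mathrm{Memory},k,l}$ is the Pauli $z$ operator of the $l$th value qubit in the $k$th memory register, so that $(1 - \sigma_z^{\mathrm{Memory},k,l})/2$ has the eigenvalue $c_{k,l}$ on the memory state. Each factor in this product is a partial CNOT gate, with the $l$th memory value qubit as the control qubit and the flag qubit as the target qubit.

The CNOT gate is usually generated by creating conditions for a control-qubit-dependent resonance, which results in Rabi oscillations in the target qubit conditioned on the state of the control qubit. By setting the pulse time such that half a Rabi oscillation is completed, a CNOT gate is implemented. By varying the pulse time, a partial CNOT gate can be implemented. Specifically, by setting the pulse time to obtain a rotation angle (on the Bloch sphere) of $2^{L-l}/R$ and using the $l$th memory qubit as the control qubit, the rotation corresponding to the $l$th term in Eq. (6) can be realized. An example of such an implementation of the CNOT gate dynamics, which is one of the standard two-qubit gate protocols for superconducting qubits, is described in Ref. [31]. Each one of the rotations in the product in Eq. (6) uses only the value qubits of the memory register with index $k$. To apply the appropriate rotation in each branch of the superposition, we additionally condition the rotations for register $k$ on the state of the CPU qubit matching the index qubit of that register, which gives the operator
$$U = \prod_{k=0}^{1}\prod_{l=0}^{L-1}\exp\left(-i\sigma_y^F\, \frac{1 + \sigma_z^{\mathrm{CPU}}\sigma_z^{\mathrm{Index},k}}{2}\, \frac{1 - \sigma_z^{\mathrm{Memory},k,l}}{2}\, \frac{2^{L-1-l}}{R}\right),$$
where $\sigma_z^{\mathrm{CPU}}$ and $\sigma_z^{\mathrm{Index},k}$ are the Pauli $z$ operators of the CPU qubit and of the index qubit of the $k$th memory register, and the first fraction equals 1 if the two qubits are in the same state and 0 otherwise. This operator performs the necessary transformations for both values of $k$.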
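The claim that a string of bit-controlled partial rotations reproduces the rotation by $c_k/R$ can be checked numerically. The sketch below, with hypothetical helper names, represents $\exp(-i\theta\sigma_y)$ as a real $2\times 2$ matrix and composes one rotation per set value bit, assuming the most-significant-bit-first convention $c_k = \sum_l c_{k,l}2^{L-1-l}$. Because all factors are rotations about the same axis, they commute and their angles simply add.

```python
import math


def ry(theta):
    """Real 2x2 matrix of exp(-i * theta * sigma_y) in the flag basis {|0>, |1>}."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(2)) for j in range(2)]
            for i in range(2)]


def flag_rotation_from_bits(value_bits, R):
    """Compose the per-bit partial rotations that make up U_k.

    Bit l (most significant first, an assumed convention) contributes
    exp(-i sigma_y 2**(L-1-l) / R) if it is 1 and the identity if it is 0,
    which is what a rotation controlled by that memory qubit does.
    """
    L = len(value_bits)
    u = [[1.0, 0.0], [0.0, 1.0]]
    for l, bit in enumerate(value_bits):
        if bit == "1":
            u = matmul(ry(2 ** (L - 1 - l) / R), u)
    return u


R = 200.0
u = flag_rotation_from_bits("10001", R)   # c_k = 17
amp0, amp1 = u[0][0], u[1][0]             # U_k applied to the flag state |0>
print(amp1, math.sin(17 / R))             # equal up to rounding
```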
It should be emphasized here that the product over k and the additional conditioning operator inside the exponential do not add serious complications to the implementation of U, e.g. in terms of resource scaling or the need to perform a separate operation for each value of k. As explained above, the product over k simply means that all the memory registers, which are separate physical objects, are accessed in the process. Physically only a single query operation, which has the ability to access any part of the memory, is applied. The part of the memory that responds to the query is determined by the state of the CPU qubit in the computational basis. As for the additional operator inside the exponential, although the operator U is now a multi-qubit operator in which the flag qubit rotation is conditioned on the memory value qubits as well as the matching of the CPU and memory index qubits (both of which become qubit strings when n > 1), the different condition terms can be efficiently mapped onto a single qubit that is used as the only control qubit in the implementation of the controlled rotation [33,34].
More specifically, if we are given $K$ control qubits and we wish to implement an operation on a target qubit conditioned on all the control qubits being in the state $|1\rangle$, we can use $K/2$ ancilla qubits initialized in the state $|0\rangle$ and perform $K/2$ Toffoli gates to obtain $K/2$ control qubits instead of the original $K$ control qubits. Next we use $K/4$ additional qubits and repeat the process to halve the number of control qubits once again. We repeat this process $\log_2 K$ times, using a total of $K-1$ ancilla qubits, to obtain a single control qubit that is in the state $|1\rangle$ if and only if all the original control qubits are in the state $|1\rangle$. After the controlled operation on the target qubit (which here is the flag qubit), the ancilla qubits can be returned to their initial state (i.e. $|0\rangle$ for all the ancilla qubits) by reversing the above process, such that the ancilla qubits are disentangled from the rest of the system, which is needed to prepare the desired state in the CPU qubit [35]. As an example, the procedure for implementing a multi-qubit-controlled operation for four control qubits is illustrated in the following diagram:
[Diagram: Toffoli-gate implementation of an operation controlled by four qubits, using three ancilla qubits]
We also note that an alternative implementation of multi-qubit-controlled operations, relying on the use of qutrits instead of qubits and not requiring any ancillae, was proposed recently in Ref. [36].
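The Toffoli ladder can be traced classically on computational basis states, where each Toffoli gate acting on a fresh ancilla in $|0\rangle$ simply writes the AND of its two controls. The sketch below (hypothetical function name, restricted for simplicity to $K$ being a power of two) counts the ancillas and layers it uses.

```python
def and_ladder(controls):
    """Classically track the Toffoli ladder that compresses K control bits.

    On computational basis states a Toffoli gate whose target starts in 0
    writes the AND of its two controls into the target. Each layer pairs up
    the current control bits and writes their ANDs into fresh ancillas,
    halving the number of controls. This sketch assumes K is a power of two.
    Returns (final_control_bit, ancillas_used, layers).
    """
    K = len(controls)
    if K == 0 or K & (K - 1):
        raise ValueError("K must be a power of two in this sketch")
    layer = list(controls)
    ancillas = layers = 0
    while len(layer) > 1:
        next_layer = []
        for a, b in zip(layer[0::2], layer[1::2]):
            target = 0            # fresh ancilla prepared in |0>
            target ^= a & b       # Toffoli: flip the target iff both controls are 1
            next_layer.append(target)
        ancillas += len(next_layer)   # one ancilla (and one Toffoli) per pair
        layer = next_layer
        layers += 1
    return layer[0], ancillas, layers


print(and_ladder([1, 1, 1, 1]))   # prints (1, 3, 2)
print(and_ladder([1, 0, 1, 1]))   # prints (0, 3, 2)
```

For $K = 4$ the ladder uses three ancillas in two layers, matching $K-1$ and $\log_2 K$. The number of Toffoli gates equals the number of ancillas; reversing the ladder to reset the ancillas doubles it.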
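Application of the operation $U$ leads to the following transformation in the quantum state of the system:
$$\frac{1}{\sqrt{2}}\sum_{k=0}^{1}|k\rangle\otimes|0\rangle \rightarrow \frac{1}{\sqrt{2}}\sum_{k=0}^{1}|k\rangle\otimes\left[\cos\left(\frac{c_k}{R}\right)|0\rangle + \sin\left(\frac{c_k}{R}\right)|1\rangle\right],$$
where we have omitted the state of the memory registers, which does not change.

It is now convenient to turn to the case of a multi-qubit CPU register. The CPU register is initialized in the state $2^{-n/2}(|0\rangle + |1\rangle)^{\otimes n}$, which can alternatively be expressed as $2^{-n/2}\sum_{k=0}^{2^n-1}|k\rangle$. Similarly to the single-qubit case, the protocol proceeds by implementing a conditional rotation on the flag qubit controlled by the CPU and memory registers:
$$2^{-n/2}\sum_{k=0}^{2^n-1}|k\rangle\otimes|0\rangle \rightarrow 2^{-n/2}\sum_{k=0}^{2^n-1}|k\rangle\otimes\left[\cos\left(\frac{c_k}{R}\right)|0\rangle + \sin\left(\frac{c_k}{R}\right)|1\rangle\right].$$
This transformation is implemented using the operator
$$U = \prod_{k=0}^{2^n-1}\prod_{l=0}^{L-1}\exp\left(-i\sigma_y^F\,\frac{2^{L-1-l}}{R}\,\frac{1 - \sigma_z^{\mathrm{Memory},k,l}}{2}\prod_{m=1}^{n}\frac{1 + \sigma_z^{\mathrm{CPU},m}\sigma_z^{\mathrm{Index},k,m}}{2}\right), \tag{10}$$
where $\sigma_z^{\mathrm{CPU},m}$, $\sigma_z^{\mathrm{Index},k,m}$ and $\sigma_z^{\mathrm{Memory},k,l}$ are the Pauli $z$ operators of the $m$th CPU qubit, of the $m$th index qubit of the $k$th memory register and of the $l$th value qubit of the $k$th memory register, respectively, and the index $m$ labels the $n$ qubits in the CPU register and in each memory index register. The product over $m$ is equal to 1 if the state of the CPU register matches that of the memory index register and is equal to 0 otherwise. As a result, the operator $U$ uses the appropriate values, i.e. $c_{k,l}$, in the $k$th branch of the quantum superposition of computational basis states.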
The implementation of the operator $U$ in Eq. (10) proceeds as follows. First, the information about the matching between the CPU qubits and the memory index qubits is mapped onto $n$ ancilla qubits, which we call the parity qubits. Then $n-1$ additional ancilla qubits are used to compress the information in the $n$ parity qubits into a single qubit. This single qubit is then used as a control qubit to implement the rotations of the flag qubit based on the values of the memory value bits. There are $L$ of these rotations, one for each value of $l$, and all of them can be performed simultaneously. It should also be noted that each one of these rotations is a Toffoli-type gate, i.e. a partial rotation of the flag qubit with two control qubits: one memory value qubit and the ancilla qubit that encodes the full CPU-memory-index matching condition. After the controlled rotations, the ancilla qubits are returned to their initial state.
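The matching condition that selects the memory register in each branch can be sketched classically as well. The text does not specify which gates write the parity qubits; the sketch below assumes, as one possible choice, two CNOTs followed by an X gate per parity qubit, and replaces the Toffoli ladder by a plain AND. The function name is hypothetical.

```python
def match_bit(cpu_index, register_index, n):
    """Control bit that enables memory register `register_index` when the CPU
    register is in the basis state |cpu_index>.

    Hypothetical gate choice: parity ancilla m starts in 0, receives a CNOT from
    CPU qubit m and one from index qubit m (giving their XOR), and an X gate then
    makes it 1 exactly when the two bits agree. The n parity bits are then
    combined into a single bit, i.e. their AND (the Toffoli-ladder step).
    """
    parity = []
    for m in range(n):
        p = 0
        p ^= (cpu_index >> m) & 1        # CNOT from CPU qubit m
        p ^= (register_index >> m) & 1   # CNOT from memory index qubit m
        p ^= 1                           # X gate: p == 1 iff the bits match
        parity.append(p)
    return int(all(parity))


n = 3
# For the CPU basis state |5>, only memory register 5 responds.
print([k for k in range(2 ** n) if match_bit(5, k, n)])   # prints [5]
```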
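After performing the above transformation (described by the operator $U$), we perform a measurement on the flag qubit. If it is found to be in the state $|1\rangle$, the system is projected onto the state
$$\frac{1}{\mathcal{N}}\sum_{k=0}^{2^n-1}\sin\left(\frac{c_k}{R}\right)|k\rangle\otimes|1\rangle.$$
The state of the CPU register is now disentangled from that of the flag qubit. If we set $R$ to a sufficiently large value such that $|c_k/R| \ll 1$ for all values of $k$, the state of the CPU register can be expressed as
$$\frac{1}{\mathcal{N}}\sum_{k=0}^{2^n-1}\sin\left(\frac{c_k}{R}\right)|k\rangle \approx \frac{1}{\mathcal{N}}\sum_{k=0}^{2^n-1}\left[\frac{c_k}{R} - \frac{1}{6}\left(\frac{c_k}{R}\right)^3\right]|k\rangle, \tag{11}$$
which, to lowest order, is the desired state. We shall comment on the deviation term shortly.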
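Because every operation in the protocol is controlled by computational basis states of the CPU and memory registers, the whole procedure can be simulated branch by branch. The Python sketch below (hypothetical names; memory registers omitted since they never change; zero-padding to $2^n$ entries assumed) computes the post-selected CPU state and the probability of finding the flag qubit in $|1\rangle$, and compares the result with the ideal amplitude-encoded state.

```python
import math


def encode_by_postselection(data, R):
    """Branch-by-branch simulation of the protocol on the CPU and flag qubits.

    The CPU register starts in a uniform superposition over 2**n basis states
    and the flag qubit in |0>. In branch |k> the controlled rotation sends the
    flag to cos(c_k/R)|0> + sin(c_k/R)|1>. Post-selecting the flag on |1> leaves
    the CPU register in sum_k sin(c_k/R)|k>, renormalized.
    Returns (cpu_amplitudes, success_probability).
    """
    N = len(data)
    n = max(1, (N - 1).bit_length())          # smallest n with 2**n >= N
    c = list(data) + [0.0] * (2 ** n - N)     # zero-padding (assumption)
    amp = 2 ** (-n / 2)                       # uniform-superposition amplitude
    flag_one = [amp * math.sin(ck / R) for ck in c]   # amplitudes of |k>|1>
    p_success = sum(a * a for a in flag_one)
    if p_success == 0:
        raise ValueError("the flag can never be found in |1> for all-zero data")
    norm = math.sqrt(p_success)
    return [a / norm for a in flag_one], p_success


def fidelity(state, data):
    """|<target|state>|^2 for the ideal target data / ||data|| (real amplitudes)."""
    target_norm = math.sqrt(sum(x * x for x in data))
    overlap = sum(s * x for s, x in zip(state, data)) / target_norm
    return overlap ** 2


data = [1.0, 2.0, 3.0, 4.0]
state, p = encode_by_postselection(data, R=100.0)
print(fidelity(state, data), p, sum(x * x for x in data) / (4 * 100.0 ** 2))
```

With $R = 100$ and data of order one, the fidelity is very close to one and the success probability is close to $\sum_k |c_k|^2/(2^n R^2)$.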
We now consider the resources required for the implementation of the protocol. In the steps described above, we introduced $2n$ extra qubits in addition to the CPU and memory registers. These extra qubits are one flag qubit, $n$ ancilla qubits to temporarily store the parity information, and $n-1$ additional ancilla qubits for the implementation of the $n$-qubit-controlled rotations, i.e. for compressing the parity information into a single qubit.
The number of single- and two-qubit gates performed during the protocol is the sum of two terms: $\sim n$ gates needed to prepare the single control qubit and $\sim L$ gates needed to perform the controlled rotations on the flag qubit. The total number of gates therefore scales linearly with the larger of the two parameters $n$ and $L$. As explained above, the $L$ operations involving different values of $l$ can be performed simultaneously. The $n$ operations needed to prepare the parity qubits can also be performed simultaneously. The step of preparing a single control qubit from $n$ control qubits can be partially parallelized. Specifically, the depth, i.e. the minimum number of steps when as many single- and two-qubit gates as possible are performed simultaneously, scales only as $\log_2 n$, since each step in the control-qubit compression procedure halves the number of control qubits. In addition to these scaling laws, the resource requirements depend on the success probability of the measurement step, i.e. the probability that the flag qubit is found in the state $|1\rangle$. The time needed to successfully prepare the desired state is, on average, proportional to the inverse of the success probability. The probability that the flag qubit is in the state $|1\rangle$ just before the measurement is given by
$$P_{\rm Success} = \frac{1}{2^n}\sum_{k=0}^{2^n-1}\sin^2\left(\frac{c_k}{R}\right) \approx \frac{1}{2^n}\sum_{k=0}^{2^n-1}\frac{|c_k|^2}{R^2}. \tag{12}$$
The success probability therefore depends on the data values $c_k$ that we wish to encode, as well as on the parameter $R$, which we can set freely.
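The resource counts stated above can be collected into a small tally. This is an illustrative sketch with a hypothetical function name, not a compiled circuit: the counts follow the text (one flag qubit, $n$ parity qubits, $n-1$ compression ancillas, an AND tree of depth $\lceil\log_2 n\rceil$, $L$ controlled rotations), and the expected number of attempts is the mean $1/P_{\rm Success}$ of a geometric distribution, which assumes independent repetitions.

```python
import math


def resource_estimate(n, L, p_success):
    """Illustrative tally of the resources described in the text.

    n: number of CPU qubits, L: value bits per memory register,
    p_success: probability of finding the flag qubit in |1>.
    Uncomputation of the ancillas is not included in the gate count.
    """
    if n < 1 or L < 1 or not 0.0 < p_success <= 1.0:
        raise ValueError("need n >= 1, L >= 1 and 0 < p_success <= 1")
    return {
        # flag qubit + n parity qubits + (n - 1) compression ancillas = 2n
        "extra_qubits": 1 + n + (n - 1),
        # an AND tree over n leaves has n - 1 two-input nodes (Toffoli gates)
        "compression_toffolis": n - 1,
        # each layer halves the number of controls
        "compression_depth": math.ceil(math.log2(n)) if n > 1 else 0,
        # one controlled partial rotation of the flag qubit per value bit
        "controlled_rotations": L,
        # mean number of independent attempts until the flag reads |1>
        "expected_attempts": 1.0 / p_success,
    }


print(resource_estimate(n=20, L=16, p_success=0.01))
```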
First we consider the parameter $R$. If we wanted to maximize $P_{\rm Success}$ while ignoring all other considerations, we would choose a small value of $R$, ideally one comparable to the values $c_k$. However, $R$ must be much larger than the largest value of $|c_k|$, which we denote by $c_{\max}$, to ensure that the approximation in Eq. (11) is valid for all values of $k$. The question then is how small we can take $R$ before we start having a non-negligible deviation from the ideal final state. The coefficient of the state $|k\rangle$ in the final state of the protocol is proportional to $\sin(c_k/R)$ instead of $c_k/R$. The relative error is therefore given by
$$\epsilon_k = \frac{\sin(c_k/R) - c_k/R}{c_k/R} \approx -\frac{1}{6}\left(\frac{c_k}{R}\right)^2. \tag{13}$$
It is worth noting here that this expression for the error is a conservative estimate: because all values of $\epsilon_k$ are negative, the renormalization factor in Eq. (11) at least partially suppresses the difference between the coefficients in the prepared state and in the desired state. If we set a maximum acceptable relative error $\epsilon$ for any individual value in the data, $R$ must be chosen such that $c_{\max}/R \le \sqrt{6\epsilon}$. In other words, $R \ge c_{\max}/\sqrt{6\epsilon}$. Substituting this inequality into Eq. (12), we obtain the inequality
$$P_{\rm Success} \lesssim \frac{6\epsilon}{2^n}\sum_{k=0}^{2^n-1}\frac{|c_k|^2}{c_{\max}^2}. \tag{14}$$
If the optimal value of $R$ is chosen, i.e. if $R$ is set close to $c_{\max}/\sqrt{6\epsilon}$, which keeps it well above $c_{\max}$ without growing with $n$ in a way that affects the scaling law, $P_{\rm Success}$ will be on the order of the right-hand side of Eq. (14).
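The choice of $R$ for a target error $\epsilon$ can be checked numerically. The sketch below (hypothetical function names) sets $R = c_{\max}/\sqrt{6\epsilon}$ and evaluates the exact relative error $|\sin(c_k/R) - c_k/R|/|c_k/R|$ for every nonzero $c_k$. The exact error comes out slightly below $\epsilon$, because the next term in the expansion of $\sin x / x$ is positive.

```python
import math


def choose_R(data, eps):
    """Smallest R with c_max / R <= sqrt(6 * eps), i.e. R = c_max / sqrt(6 eps)."""
    c_max = max(abs(c) for c in data)
    return c_max / math.sqrt(6.0 * eps)


def max_relative_error(data, R):
    """Exact worst-case |sin(c_k/R) - c_k/R| / |c_k/R| over the nonzero c_k."""
    return max(abs(math.sin(c / R) - c / R) / abs(c / R) for c in data if c != 0)


data = [0.3, -1.0, 2.5, 0.0]
eps = 1e-3
R = choose_R(data, eps)
print(R, max_relative_error(data, R))   # the error stays just below eps
```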
To analyze the role that the structure of the data plays in the efficiency of the protocol, it is instructive to define the density (or, in other words, non-sparsity) measure
$$\rho = \frac{1}{2^n}\sum_{k=0}^{2^n-1}\frac{|c_k|^2}{c_{\max}^2},$$
which allows us to express the optimized success probability as
$$P_{\rm Success} \sim 6\epsilon\rho.$$
The parameter $\rho$ takes its maximum value ($\rho = 1$) when all $|c_k|$ are equal, while $\rho \ll 1$ for sparse data, where only a small number of $c_k$ values are of the same order as $c_{\max}$. If a substantial fraction of all the numbers $|c_k|$ in the data set are comparable to $c_{\max}$, $\rho$ will be of order one. This situation is desirable for maximizing $P_{\rm Success}$. If, on the other hand, only a small fraction of the data set is non-negligible in comparison with the largest value $c_{\max}$, $\rho$ will be much smaller than one, and $P_{\rm Success}$ will be especially small. In the worst-case scenario, when the number of large $|c_k|$ values does not grow with the size of the data set, $\rho$ decreases exponentially with $n$. The latter case corresponds to a very sparse data set. The protocol is therefore well suited for dense data sets where a significant fraction of the data is of the same order as the maximum value: the sparser the data, the less efficient the protocol. Note that in order to assess the value of $\rho$ we need some basic information about the overall properties of the data set. Note also that some of this information can be measured. For example, $P_{\rm Success}$ can be measured and used to obtain an estimate of $\sum_k |c_k|^2$.
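The density measure and the resulting success probability are easy to evaluate for a given data set. The sketch below (hypothetical function names, zero-padding to $2^n$ entries assumed) contrasts a dense data set with a maximally sparse one of the same size.

```python
def density(data):
    """rho = sum_k |c_k|^2 / (2**n * c_max**2), with zero-padding to 2**n entries."""
    n = max(1, (len(data) - 1).bit_length())   # smallest n with 2**n >= N
    c_max = max(abs(c) for c in data)
    return sum(c * c for c in data) / (2 ** n * c_max ** 2)


def optimized_success_probability(data, eps):
    """Leading-order P_Success ~ 6 * eps * rho, valid for R = c_max / sqrt(6 eps)."""
    return 6.0 * eps * density(data)


dense = [1.0] * 1024
sparse = [1.0] + [0.0] * 1023
print(density(dense), density(sparse))   # prints 1.0 0.0009765625
```

For 1024 entries the sparse case has $\rho = 1/1024$, so at fixed $\epsilon$ it needs on average 1024 times more attempts than the dense case.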
Combining the success probability ($\propto \rho\epsilon$) with the time needed to implement the steps of the protocol ($\propto \log n$), we obtain the scaling law for the time needed to successfully prepare the encoded state:
$$T \sim \frac{\log n}{\rho\,\epsilon}.$$
Importantly, provided that the data is not sparse (i.e. the parameter $\rho$ is not exponentially small), as explained above, there are no exponential factors in the resource scaling laws.
The protocol can therefore be considered efficient in this case. It is worth noting here that dense data sets with an exponentially large number of elements are generally considered the most difficult ones for classical algorithms, because these algorithms operate on each element separately, leading to an exponential scaling in the required resources.
A few comments are in order at this point. The numbers $c_k$ are usually treated as real numbers, while the coefficients in a quantum superposition can in general be complex. As a result, one could compress the data further by taking advantage of the complex-number nature of the quantum superposition. We ignore this possibility in this work; the adjustment of the protocol to take advantage of this fact should be relatively straightforward.
However, the factor-of-two resource savings translates into a reduction of the required number of qubits by only one, which is in some sense minimal and not worth complicating the physical picture.
By setting a sufficiently large value for $R$ and implementing the protocol a large number of times, the success probability (heralded by the flag qubit being found in the state $|1\rangle$) gives a good approximation for the sum $\sum_k |c_k|^2$. However, it is not sufficient to rely on …

Another point concerns how to do the resource counting. It might seem at first sight that there is exponential scaling in resources because the protocol involves $2^n$ memory registers, i.e. a number that scales exponentially with $n$. However, this does not imply that the protocol is inefficient. The $\sim 2^n$ memory registers store the input data. Therefore their large number is simply a reflection of the fact that we are given an extremely large amount of input data for the computation. The more appropriate comparison is to say that, given this amount of data, a classical algorithm requires resources that scale as $2^n$ (i.e. $N$), while the quantum protocol could require resources that scale only as $n$ (i.e. $\log_2 N$), hence an exponential speedup. A related point is that in implementing the conditional operations (Eq. 6) an exponentially large number of operations are implemented. However, considering that the hardware is normally set up such that any memory register can be accessed, the $N$ different memory registers are accessed simultaneously in different branches of the quantum superposition, with the relevant memory register activated by the state of the CPU register. This situation is somewhat similar to that encountered in Grover's database search algorithm [37] or qRAM protocols [25], where all the data stored in the memory are queried simultaneously. One can therefore say that we are assuming a setup that supports such operations.
In this work we focused on the problem of encoding the entire data set, with all different values of k, into the CPU register. In some cases, one might be interested in analyzing only a subset of the data, e.g. the elements for which the index k satisfies a certain condition. It should be possible to incorporate such a condition on k when implementing the conditional operation U. The quantum circuit needed to implement such a condition will depend on the nature of the condition defining the subset of interest. A related point is that we assumed a uniform quantum superposition in the initial state of the CPU register, as well as an ancilla flag qubit initialized in the state |0 . Considering more general initial states can lead to a richer, and possibly computationally advantageous, variety of final states. We do not consider these possible extensions of our protocol here.
In conclusion, using steps similar to those used in Grover's state preparation algorithm and qRAM protocols, we have developed a protocol for amplitude encoding of classical data into a quantum processing register for quantum machine learning. Provided that the data is not sparse, the protocol is efficient. This proposal addresses one of the main bottlenecks for quantum machine learning algorithms and can be integrated into such algorithms in future quantum computing applications.
We would like to thank Jae Park, Kouichi Semba, Hefeng Wang and Naoki Yamamoto for useful discussions. This work was supported by MEXT Quantum Leap Flagship Program Grant Number JPMXS0120319794.