Experimental Quantum Learning of a Spectral Decomposition

Currently available quantum hardware allows for small-scale implementations of quantum machine learning algorithms. Such experiments aid the search for applications of quantum computers by benchmarking the near-term feasibility of candidate algorithms. Here we demonstrate the quantum learning of a two-qubit unitary by a sequence of three parameterized quantum circuits containing a total of 21 variational parameters. Moreover, we variationally diagonalize the unitary to learn its spectral decomposition, i.e., its eigenvalues and eigenvectors. We illustrate how this can be used as a subroutine to compress the depth of dynamical quantum simulations. One can view our implementation as a demonstration of entanglement-enhanced machine learning, as only a single (entangled) training data pair is required to learn a 4 × 4 unitary matrix.

Quantum simulation and machine learning are among the most promising applications of large-scale quantum computers. Of these, the discovery of algorithms with provable exponential speedup has been more challenging in the machine learning domain, in part because it is harder to port established machine learning techniques to the quantum setting [1]. Notable exceptions include linear system solvers [2][3][4], kernel methods [5,6], and Boltzmann machines [7,8]. But quantum simulation demonstrations [9][10][11][12][13] appear to be ahead of machine learning [14][15][16][17][18][19][20] in terms of the maximum problem sizes achieved, suggesting that quantum simulation might yield the earliest applications with quantum advantage.
Variational quantum algorithms [21][22][23] will likely facilitate near-term implementations for these applications. Such algorithms employ a problem-specific cost function that is evaluated on a quantum computer, while a classical optimizer trains a parameterized quantum circuit to minimize this cost.
Some variational quantum algorithms have interest beyond their ability to achieve quantum advantage on their own, and can serve as subroutines for larger quantum algorithms. These include quantum autoencoders for data compression [24,25] and linear algebra methods [26][27][28]. A common subroutine is the training of a variational quantum state to approximate the ground state of a given n-qubit Hamiltonian, which can be the Hamiltonian of a simulated model [29] or some other optimization objective [30][31][32]. Variational quantum algorithms to learn and diagonalize density matrices have also been developed [33][34][35], which is a fundamental subroutine that will have many uses including principal component analysis and estimation of quantum information quantities [36,37]. Another subroutine is the learning of one quantum state by a second state, where the output of a variational circuit is optimized to maximize the overlap with an input state that might itself be the output of another algorithm [38][39][40][41]. Although a minimum of 2(2^n − 1) real parameters are required to do this exactly in general, it is widely believed that for many cases of interest a polynomial number of parameters will suffice.
Beyond learning states, it is also useful to variationally learn a unitary channel ρ → UρU†. This is more challenging because now all of the 4^n − 1 matrix elements must match for an exact replica V of an arbitrary U, up to a global phase. A hybrid protocol for learning a unitary is provided by the quantum-assisted quantum compiling algorithm [38]. This is a low-depth subroutine appropriate for both near-term and fault-tolerant quantum computing.
Given a target unitary U and parameterized unitary V(θ), both acting on n qubits, quantum-assisted quantum compiling uses a maximally entangled Bell state on 2n qubits to compute the Hilbert-Schmidt inner product, |Tr(UV(θ)†)|. Since this inner product is directly related to the average fidelity between states acted on by U and V(θ) [42,43], this allows the action of U on all possible input states to be studied using a single entangled input state. Consequently, with this entanglement-enhanced learning strategy, only a single training state is needed to fully learn U, in contrast to the ∼2^n input-output pairs that are required in the absence of entanglement [44,45].
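The identity underlying this entanglement-enhanced strategy is that, for the maximally entangled state |Φ⟩ = (1/√d) Σ_i |i⟩|i⟩, the overlap ⟨Φ| U ⊗ V̄ |Φ⟩ equals Tr(UV†)/d. The following is a minimal numpy check of this identity only; it is an illustrative sketch, not the measurement procedure used in the experiment, which estimates these overlaps from circuit outcomes:

```python
import numpy as np

def haar_unitary(d, rng):
    # Haar-random unitary via QR of a complex Gaussian matrix
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

rng = np.random.default_rng(0)
d = 4  # two qubits
U = haar_unitary(d, rng)
V = haar_unitary(d, rng)

# maximally entangled state |Phi> = (1/sqrt(d)) sum_i |i>|i> on 2n qubits
phi = np.eye(d).reshape(-1) / np.sqrt(d)

# a single entangled input probes U against V on all inputs at once:
# <Phi| U (x) conj(V) |Phi> = Tr(U V^dagger) / d
overlap = phi.conj() @ np.kron(U, V.conj()) @ phi
assert np.isclose(overlap, np.trace(U @ V.conj().T) / d)
```

The magnitude of this single overlap thus carries the full Hilbert-Schmidt inner product between U and V.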
Although more challenging than state learning, learning a unitary can be used for a wide variety of quantum information applications, including circuit depth compression, noise-tailoring, benchmarking, and the 'black box uploading' of an unknown experimental system unitary [38]. Quantum-assisted quantum compiling has been demonstrated on 1+1 qubits, where a single-qubit variational circuit V(θ) learned a single-qubit unitary U [38].
Quantum-assisted quantum compiling can be generalized to learn not only a unitary, but also its Schur decomposition W(θ)D(γ)W(θ)†, where W(θ) is a parameterized unitary and D(γ) is a parameterized diagonal unitary. That is, one can use quantum-assisted quantum compiling to variationally diagonalize a unitary. This is useful for a variety of quantum information science applications, since access to the spectral decomposition of a unitary U enables arbitrary powers of U to be implemented using a fixed depth circuit. Specifically, suppose we learn the optimum parameters θ_opt and γ_opt such that U = W(θ_opt)D(γ_opt)W(θ_opt)†. We can then implement U^k using the fixed depth circuit W(θ_opt)D(kγ_opt)W(θ_opt)†. We stress that the parameter k here can take any real value and hence this approach can be used to implement non-integer and negative powers of U.
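The fixed-depth power trick can be verified directly in linear algebra. The following numpy sketch is illustrative only (it builds a 4 × 4 unitary from a random Hermitian generator rather than from circuits) and shows that W D(kγ) W† reproduces U^k for integer, fractional, and negative k:

```python
import numpy as np

rng = np.random.default_rng(1)

# build a 4x4 unitary U = exp(-i H) from a random Hermitian generator H
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (A + A.conj().T) / 2
lam, W = np.linalg.eigh(H)  # H = W diag(lam) W^dagger

def U_pow(k):
    # W D(k) W^dagger implements U^k at fixed depth for ANY real k
    return W @ np.diag(np.exp(-1j * k * lam)) @ W.conj().T

U = U_pow(1)

assert np.allclose(U_pow(3), np.linalg.matrix_power(U, 3))  # integer power
assert np.allclose(U_pow(0.5) @ U_pow(0.5), U)              # square root
assert np.allclose(U_pow(-1), U.conj().T)                   # inverse
```

Only the entries of the diagonal change with k, so the circuit depth is independent of the power being implemented.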
One important application of variational unitary diagonalization is quantum simulation. Let U be a Trotterized (or other) approximation to a short-time evolution e^{−iH∆t} by some Hamiltonian H. We assume that H is local [46][47][48][49], sparse [50][51][52], or given by a linear combination of unitaries [53][54][55], permitting efficient simulation with bounded error. Then W D^{t/∆t} W† is an approximate fast-forwarded evolution operator with a circuit depth that is independent of t. By contrast, most of the best known Hamiltonian simulation algorithms [49,52,55] have depths scaling at least linearly in t, inhibiting long time simulations on near-term hardware.
This low-depth algorithm, called variational fast-forwarding [56], lies at an exciting intersection of machine learning and quantum simulation and is a promising approach in the burgeoning field of variational quantum simulation [57][58][59][60][61][62][63][64][65][66][67][68]. Variational fast-forwarding has been demonstrated on 1+1 qubits [56]. Refinements of variational fast-forwarding for simulating a given fixed initial state [67] and for learning the spectral decomposition of a given Hamiltonian [68] have also been proposed. It is important to note that the unitary being diagonalized need not already be known. Therefore, variational unitary diagonalization could also be used to perform a 'black box diagonalization' of the dynamics of an unknown experimental system. Thus, this approach provides a new algorithmic tool for probing dynamics in an experimental setting.
In this work we use ibmq bogota to demonstrate the successful learning of a spectral decomposition on 2+2 qubits. Specifically, we diagonalize the short time evolution unitary for an Ising spin chain. After only 16 steps of training by gradient descent, the spectral decomposition is used to fast-forward the evolution of the Ising model, resulting in a dramatic reduction of simulation error compared with Trotterized Hamiltonian simulation and a significant (∼ 10×) increase in the effective quantum volume of the simulation.

METHODS
Learning task. We demonstrate the variational learning of the spectral decomposition of a unitary by learning a diagonalization of the short-time evolution operator of the 2-spin Ising model

H = J Z_1 Z_2 + B (X_1 + X_2). (1)

Here J quantifies the exchange energy, B is the transverse field strength, and Z_i and X_i are Pauli operators on the i-th qubit. We approximate the short-time evolution exp(−iH∆t) of the spin chain using a second-order Trotter approximation; that is, we take

U = e^{−iθ_B (X_1+X_2)} e^{−iθ_J Z_1 Z_2} e^{−iθ_B (X_1+X_2)}, (2)

where θ_B = B∆t/2 and θ_J = 2J∆t/2. The simulated Ising model parameters are listed in Table I. The specific circuit we used for U is shown in Fig. 1.

Ansatz. To learn the spectral decomposition of U we variationally compile it to a unitary with a structure of the form

V(θ, γ) = W(θ) D(γ) W(θ)†, (3)

where W is an arbitrary unitary, D is a diagonal matrix, and θ and γ are vectors of parameters. After successful training, D will capture the eigenvalues of U, and W will capture the rotation matrix from the computational basis to the eigenbasis of U.
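The accuracy of this symmetric splitting can be checked numerically. The sketch below assumes the standard transverse-field form of the two-spin Ising Hamiltonian implied by the definitions of θ_B and θ_J; the values of J, B, and ∆t are illustrative placeholders, not the Table I parameters:

```python
import numpy as np

# illustrative parameters (NOT the Table I values)
J, B, dt = 1.0, 1.0, 0.1

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

ZZ = np.kron(Z, Z)
Xsum = np.kron(X, I2) + np.kron(I2, X)
H = J * ZZ + B * Xsum  # two-spin transverse-field Ising Hamiltonian

def evolve(M, t):
    # exp(-i t M) for Hermitian M via its eigendecomposition
    lam, W = np.linalg.eigh(M)
    return W @ np.diag(np.exp(-1j * t * lam)) @ W.conj().T

exact = evolve(H, dt)
# symmetric second-order Trotter step: half-angle X layers around the ZZ layer
trotter = evolve(Xsum, B * dt / 2) @ evolve(ZZ, J * dt) @ evolve(Xsum, B * dt / 2)

err = np.linalg.norm(exact - trotter, 2)
assert err < 1e-2  # per-step error of the symmetric splitting scales as O(dt^3)
```

Because the splitting is symmetric, the leading error term is third order in ∆t, which is why the ideal Trotterized evolution in the Results section is nearly perfect for small ∆t.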
The parameterized circuits we used as ansätze for the diagonal unitary D and basis transformation W are shown in Fig. 3. A general diagonal unitary D ∈ SU(2^n) on n qubits contains 2^n − 1 variational parameters. In our experiment we implement a two-qubit version of this exact D containing 3 variables. In general an arbitrary unitary W can be constructed from any general n-qubit parameterized quantum circuit with poly(n) variables. The expressive power of different options has been investigated in [69,70]. The two-qubit circuit we use to implement the arbitrary unitary W consists of 3 layers of X, Y rotations and phase gates on each qubit, separated by 2 CNOT gates.

Cost function. To compile the target unitary into a diagonal form we use the local Hilbert-Schmidt test cost function defined in [38]. For learning a 4 × 4 unitary matrix, this cost can be written as

C = 1 − (1/2) [Pr(00)_{1,2} + Pr(00)_{3,4}], (4)

where Pr(00)_{1,2} and Pr(00)_{3,4} are the probabilities of observing the outcome 00 on qubits (1,2) and (3,4) on running the circuits shown in Fig. 2(a) and Fig. 2(b) respectively.
The probabilities Pr(00)_{1,2} and Pr(00)_{3,4} are measures of the entanglement fidelity of the unitary channel UV† with V = WDW†. As a result, this cost is faithful, vanishing if and only if the diagonalization WDW† matches the target unitary U (up to a global phase). Furthermore, the cost is operationally meaningful for non-zero values by virtue of upper bounding the average gate fidelity between U and V. Hence a small value of C guarantees that the diagonalization WDW† is an accurate approximation of the target unitary U. We note that this cost, Eq. (4), involves only local measurements and hence mitigates trainability issues associated with barren plateaus [71][72][73][74][75][76][77][78][79][80][81][82]. For more details on the cost see Ref. [38].
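For intuition about faithfulness, a closely related quantity is the global Hilbert-Schmidt-test cost, C_HST = 1 − |Tr(UV†)|²/d², which is related to the local cost used here (see Ref. [38] for the precise relation). The numpy sketch below illustrates its key properties; it is a classical stand-in, not the measured circuit quantity:

```python
import numpy as np

def c_hst(U, V):
    """Global Hilbert-Schmidt-test cost 1 - |Tr(U V^dagger)|^2 / d^2.

    Faithful: zero if and only if V matches U up to a global phase."""
    d = U.shape[0]
    return 1 - abs(np.trace(U @ V.conj().T)) ** 2 / d ** 2

# sanity checks on a random 4x4 unitary (QR of a complex Gaussian matrix)
rng = np.random.default_rng(2)
z = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
U, _ = np.linalg.qr(z)

assert np.isclose(c_hst(U, U), 0)                     # exact compilation
assert np.isclose(c_hst(U, np.exp(1j * 0.7) * U), 0)  # global phase is ignored
assert c_hst(U, np.eye(4)) > 0                        # generic mismatch
```

The insensitivity to a global phase is why the Results section always minimizes distances over e^{iϕ} when comparing the learnt and exact unitaries.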
The circuits are trained to minimize the cost using gradient descent. At each step of the training, we measure the cost and the gradients ∂C/∂θ_k and ∂C/∂γ_l at a particular point (θ, γ) in the parameter space, and use this to update (θ, γ) according to

θ_k → θ_k − η ∂C/∂θ_k,   γ_l → γ_l − η ∂C/∂γ_l, (5)

where η is the learning rate. The gradients ∂C/∂θ_k and ∂C/∂γ_l are measured using the parameter-shift rule derived in Ref. [56]. In each optimization step the 21 components of the gradient are measured, requiring the measurement of 156 distinct circuits. For each cost evaluation we took 8000 measurement shots. The learning rate was decreased as the optimization progressed according to a schedule in which j ∈ {0, 1, 2, …, 16} is the optimization step number; the hyperparameters η_0 = 1.1, κ = 0.5, and δ = 12 of this schedule were optimized by classical simulation. Additional details about the training are provided in the Supplementary Information.
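The update rule and the parameter-shift gradient can be illustrated on a toy one-parameter example. The cost below is a hypothetical single-qubit infidelity, not the 21-parameter experimental cost, but the shift rule has the same form: for gates generated by a Pauli operator, two cost evaluations shifted by ±π/2 give the exact derivative:

```python
import numpy as np

def Ry(t):
    # single-qubit rotation exp(-i t Y / 2)
    return np.array([[np.cos(t / 2), -np.sin(t / 2)],
                     [np.sin(t / 2),  np.cos(t / 2)]])

def cost(t):
    # toy cost: infidelity of Ry(t)|0> with |0>, i.e. sin^2(t/2); minimum at t = 0
    return 1 - abs(Ry(t)[0, 0]) ** 2

def grad(t):
    # parameter-shift rule: exact gradient from two shifted cost evaluations
    return (cost(t + np.pi / 2) - cost(t - np.pi / 2)) / 2

# plain gradient descent with a fixed learning rate eta
theta, eta = 2.0, 0.5
for _ in range(100):
    theta -= eta * grad(theta)

assert np.isclose(grad(1.3), np.sin(1.3) / 2)  # matches d/dt sin^2(t/2)
assert cost(theta) < 1e-6                      # converged to the minimum
```

Unlike finite differences, the shift rule involves no small-step extrapolation, which makes it well suited to shot-noise-limited cost evaluations on hardware.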

RESULTS
To assess the quality of the optimization, the parameters found at each step of the training were used to evaluate C both on the quantum computer and classically. The results are shown in Fig. 4. The classical cost, which we call the noise-free or ideal cost, reflects the true accuracy of the optimized circuits. We successfully reduced the raw cost to 0.1099, corresponding to an ideal cost of 0.013. The raw cost from the quantum computer is higher than the ideal cost because of gate errors.
The inset of Fig. 4 confirms that the errors in D and WDW† are both iteratively reduced as the cost is trained. Here the Frobenius distance between U and V is plotted, minimized over the arbitrary global phase e^{iϕ}. The ideal and learnt diagonals are also compared, accounting for the global phase e^{iϕ} and for a possible reordering, specified by a permutation matrix χ. Specifically, we plot the Frobenius distance min_{ϕ,χ} ‖D_exact − e^{iϕ} χDχ†‖, where D_exact is a diagonal matrix with {λ_i^exact}, the ordered exact spectrum of U, along the diagonal. The square of this distance equals the sum of the eigenvalue errors Σ_i |λ_i^exact − λ_i e^{iϕ_opt}|², where {λ_i} is the ordered learnt spectrum and ϕ_opt accounts for the global phase shift.
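For a 4 × 4 diagonal this phase- and permutation-minimized error can be evaluated by brute force over the 4! orderings, with the optimal phase found analytically. The numpy sketch below is illustrative (the spectra are synthetic, not the experimental eigenvalues):

```python
import itertools
import numpy as np

def eig_error(lam_exact, lam):
    """Squared Frobenius distance between exact and learnt diagonals,
    minimized over a global phase and a reordering of the eigenvalues."""
    best = np.inf
    for perm in itertools.permutations(range(len(lam))):
        lp = lam[list(perm)]
        z = np.vdot(lp, lam_exact)  # sum_i conj(lp_i) * lam_exact_i
        # the optimal global phase aligns z with the positive real axis, so
        # min_phi sum_i |exact_i - e^{i phi} lp_i|^2 = |exact|^2 + |lp|^2 - 2|z|
        err = np.sum(np.abs(lam_exact) ** 2) + np.sum(np.abs(lp) ** 2) - 2 * abs(z)
        best = min(best, err)
    return max(best, 0.0)  # guard against tiny negative rounding

# a learnt spectrum matching the exact one up to phase and order has zero error
lam_exact = np.exp(-1j * np.array([0.1, 0.7, 1.9, 2.5]))
lam_learnt = np.exp(1j * 0.3) * lam_exact[[2, 0, 3, 1]]
assert np.isclose(eig_error(lam_exact, lam_learnt), 0)
```

Brute force over permutations is only feasible for small spectra such as this two-qubit case; for larger n a matching over phases would be needed.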
It is also interesting to monitor the training by using the measured gradient g meas at each step to calculate the angle between g meas and the exact gradient simulated classically. This is plotted in the Supplementary Information. The data confirms that the optimization is correctly moving downhill in the cost function landscape.
Having learned the spectral decomposition, we use the result to fast-forward the Hamiltonian simulation of the Ising model (1) for 2 spins on ibmq bogota. In this experiment we prepare an initial state |+⟩^{⊗2} and propagate it forward in time by both Trotterization and variational fast-forwarding (with backend circuit optimization disabled in both cases). The Trotterized evolution at times t = K∆t is obtained by applying K = 0, 8, 16, …, 96 consecutive Trotter steps from Fig. 1. After each step we experimentally measure the fidelity of the Trotterized evolution with the perfect evolution e^{−iHt}|+⟩^{⊗2}, which contains no Trotter error. The variational fast-forwarding evolution at time t is obtained by applying the optimized variational circuits for W†, D^{t/∆t}, and W to the initial state.

FIG. 4. In the main plot we show the cost as a function of the iteration step during training. The blue curve is the measured cost function C and the green curve is its noise-free 'ideal' value. In the inset, the yellow curve indicates the total error in the learnt unitary, defined as the Frobenius distance between U and V minimized over a global phase e^{iϕ}, i.e., the quantity min_ϕ ‖U − e^{iϕ}V‖. The red curve indicates the eigenvalue error, defined as the Frobenius distance between the learnt and exact diagonals, also minimized over a permutation χ, i.e., min_{ϕ,χ} ‖D_exact − e^{iϕ} χDχ†‖.
The state fidelity with the perfect evolution e^{−iHt}|+⟩^{⊗2} is also measured in this case. The results for this experimental fast-forwarding and experimental Trotter simulation are indicated by the green and blue solid lines respectively in Fig. 5. We compare the experimental fast-forwarding to the ideal classical fast-forwarding by also computing the noise-free fidelities via classical simulation. In this ideal simulation, the initial state |+⟩^{⊗2} is prepared perfectly and the Trotterized evolution includes Trotter errors but no gate errors. The measurement is also assumed to be ideal. The fidelities of these ideal simulations are indicated by the dashed lines in Fig. 5.
The ideal Trotterized evolution is nearly perfect in Fig. 5 due to the small value of ∆t and the use of a second-order Trotter step. The ideal variational fast-forwarding evolution is less accurate, due to the imperfect learning of U's spectral decomposition. However, the real data taken on ibmq bogota exhibit the opposite behavior. Whereas the variational fast-forwarding evolution is only slightly degraded by gate errors (because the circuit depth is independent of t), the Trotterized evolution quickly loses fidelity. Namely, while the fidelity of the variationally fast-forwarded simulation remains above 0.7 for all 20 time steps, the fidelity of the Trotterized simulation is less than 0.4 after only 6 time steps. Thus Figure 5 demonstrates a striking example of fast-forwarding and provides further evidence that the spectral decomposition of U has been well learnt.
In conclusion, we have experimentally demonstrated the entanglement-enhanced quantum learning of a 4 × 4 unitary using a diagonal ansatz containing a total of 6 CNOTs and 21 variational parameters. A single input-output pair was used for training, compared to the 2^2 = 4 input-output pairs necessary for training without entangling ancillas. We learn both the unitary and its spectral decomposition, enabling the fast-forwarding of an Ising model. This four-qubit experiment took 8000 shots for each of the 156 independent circuit evaluations, at each of the 16 steps of the optimization algorithm, a total of 2.0 × 10^7 circuit evaluations. Thus, this experiment is among the most complex quantum machine learning demonstrations to date [14][15][16][17][18][19][20] and constitutes an important primitive in experimental quantum information science.

Supplementary Information for "Experimental Quantum Learning of a Spectral Decomposition"
This document provides additional details about the experimental results. In Sec. 1 we describe the online superconducting qubits used in the experiment and give calibration results (gate errors, coherence times, and single-qubit measurement errors) provided by the backend. In Sec. 2 we provide additional details about the circuit training.

QUBITS
Data was taken on the IBM Q processor ibmq bogota using the BQP software package developed by Geller and colleagues. BQP is a Python package developed to design, run, and analyze complex quantum computing and quantum information experiments using commercial backends. We demonstrate the learning of a spectral decomposition using the qubits shown in Fig. S1. Calibration data supplied by the backend is summarized in Table I. Here T_1 and T_2 are the standard Markovian decoherence times, and the SPAM column gives the single-qubit state-preparation and measurement (SPAM) error, averaged over initial states. The U2 error column gives the single-qubit gate error measured by randomized benchmarking. The CNOT errors are also measured by randomized benchmarking.

TRAINING
The variational circuits for W and D were trained by gradient descent,

θ_k → θ_k − η ∂C_LHST/∂θ_k,   γ_l → γ_l − η ∂C_LHST/∂γ_l, (S2)

where θ_k and γ_l are the variational parameters for the W and D circuits, respectively, and η is the learning rate. We used the variable learning rate plotted in Fig. S2(a), which was optimized by classical simulation. At each step of the training, we measure the cost C_LHST and the gradients ∂C_LHST/∂θ_k and ∂C_LHST/∂γ_l at a particular point (θ_1, …, θ_18, γ_1, …, γ_3) in parameter space, and use this to update (θ_1, …, θ_18, γ_1, …, γ_3) according to (S2). We also calculate the angle between the measured gradient and the exact gradient simulated classically, which is plotted in Fig. S2(b). This data shows that the measured gradient is correctly pointing uphill until the end of the training, when a local minimum is reached and the gradient becomes small and noisy.
FIG. S1. Layout of IBM Q device ibmq bogota. In this work we use qubits Q0, Q1, Q2 and Q3 for the training and qubits Q1 and Q2 for the Trotter and VFF comparison.

FIG. S2. (a) Learning rate schedule used during training, where j ∈ {0, 1, 2, …, 16} is the optimization step number and the hyperparameters η_0 = 1.1, κ = 0.5, and δ = 12 were optimized by classical simulation. (b) Error in the measured gradient direction during training. Here we plot the angle between the gradient measured on ibmq bogota during training and the true gradient calculated classically for the same set of parameters, for each optimization step.