Generation of High-Resolution Handwritten Digits with an Ion-Trap Quantum Computer

Generating high-quality data (e.g. images or video) is one of the most exciting and challenging frontiers in unsupervised machine learning. Utilizing quantum computers in such tasks to potentially enhance conventional machine learning algorithms has emerged as a promising application, but poses big challenges due to the limited number of qubits and the level of gate noise in available devices. In this work, we provide the first practical and experimental implementation of a quantum-classical generative algorithm capable of generating high-resolution images of handwritten digits with state-of-the-art gate-based quantum computers. In our quantum-assisted machine learning framework, we implement a quantum-circuit based generative model to learn and sample the prior distribution of a Generative Adversarial Network. We introduce a multi-basis technique that leverages the unique possibility of measuring quantum states in different bases, hence enhancing the expressivity of the prior distribution. We train this hybrid algorithm on an ion-trap device based on $^{171}$Yb$^{+}$ ion qubits to generate high-quality images and quantitatively outperform comparable classical Generative Adversarial Networks trained on the popular MNIST data set for handwritten digits.


I. INTRODUCTION
In the last decades, machine learning (ML) algorithms have significantly increased in importance and value due to the rapid progress in ML techniques and computational resources [1,2]. However, even state-of-the-art algorithms face significant challenges in learning and generalizing from an ever increasing volume of unlabeled data [3][4][5]. With the advent of quantum computing, quantum algorithms for ML arise as natural candidates in the search of applications of noisy intermediate-scale quantum (NISQ) devices, with the potential to surpass classical ML capabilities [6]. Among the top candidates to achieve a quantum advantage in ML are generative models [7], i.e. probabilistic models aiming to capture the most essential features of complex data and to generate similar data by sampling from the trained model distribution. Although there has been promising progress towards demonstrating a quantum supremacy for specific quantum computing tasks [8,9], and quantum generative models have been proven to learn distributions which are outside of classical reach [10][11][12][13][14], it is not clear whether these theoretical guarantees hold in practice, or whether such enhancements provided by a quantum generative model are limited to cases where one can prove a theoretical gap between classical and quantum algorithms.
In particular, quantum resources offer a divergent set of tools for tackling various challenges and could instead lead to a practical quantum advantage by avoiding pitfalls of conventional classical algorithms, for example, by improving training and consequently enhancing performance on generative tasks. Despite all promises, applying and scaling quantum models on small quantum devices to tackle real-world data sets remains a big challenge for quantum ML algorithms. Ref. [7] proposes to enable quantum models for practical application by exploiting the known dimensionality-reduction capabilities of deep neural networks [15] and compressing classical data before it is handed to a small quantum device (see Ref. [16] for a framework utilizing tensor networks for data compression). Having a quantum model learn the so-called latent representation of data and take part in a joint quantum-classical training loop opens up hybrid models to leverage quantum resources and potentially enhance performance when compared to purely classical algorithms. This synergistic interaction between a quantum model and classical deep neural networks is at the heart of the proposed quantum-assisted Helmholtz machine [7,17] and more recent hybrid proposals [18,19] for enhancing Associative Adversarial Networks (AAN) [20]. In the specific case of Ref. [19], the authors propose to use a Quantum Boltzmann Machine (QBM) [21], while Refs. [17,18] experimentally demonstrated this concept with a D-Wave 2000Q annealing device. A similar adoption of this hybrid strategy with quantum annealers has been explored with variational autoencoders [22]. Despite these efforts, a definitive demonstration utilizing truly quantum resources on NISQ devices and with full-size ML data sets, e.g. the MNIST data set of handwritten digits [23], has remained elusive to date. Recent experimental results on gate-based quantum computers [24] illustrate that current proposals are far from generating high-quality MNIST digits.
* alejandro@zapatacomputing.com
In this work, we introduce the Quantum Circuit Associative Adversarial Network (QC-AAN): a framework combining capabilities of NISQ devices with classical deep learning techniques for generative modelling to learn relevant full-scale data sets (see Fig. 1). The framework applies a Quantum Circuit Born Machine (QCBM) [25] to model and re-parametrize the prior distribution of a Generative Adversarial Network (GAN) [26]. Furthermore, we introduce a multi-basis technique for the QCBM and argue that the use of a quantum generative model could enhance deep generative algorithms by providing them with non-classical distributions and quantum samples from a variety of measurement bases.
Finally, to demonstrate the readiness of this framework, we train the QC-AAN with an experimental implementation of 8 qubits to generate the first high-resolution handwritten digits with end-to-end training of the quantum component on an ion-trap quantum device. Our results imply that near-term quantum devices could effectively be employed for generative modelling and flexibly assist classical GANs in their learning task.
In Section II we discuss GANs, which are the classical component of the QC-AAN, followed by the AAN framework in Section III, which is aimed at solving common issues related to conventional GANs by providing them with an informed prior distribution. In Section IV we introduce QCBMs and our multi-basis technique, which can be used to possibly enhance the prior of a GAN with quantum measurements. Lastly, in Section V we present simulation results and demonstrate the QC-AAN algorithm on hardware.

II. GENERATIVE ADVERSARIAL NETWORKS (GANs)
GANs [26] are one of the most popular recent generative machine learning algorithms, able to generate remarkably realistic images and other data [27,28]. The algorithm consists of a generator G and a discriminator D, both of which are commonly implemented as deep artificial neural networks. The goal of a GAN is to train G until its generated data is of satisfactory quality. This is achieved by using D as a proxy for estimating the training loss and backpropagating the gradients to G. To that effect, D performs a binary classification task to distinguish between data $x$ from the training dataset $p_{\mathrm{data}}$ and data $\tilde{x}$ which was generated by G. The adversarial min-max loss function of a GAN can be written as

$$C_{\mathrm{GAN}} = \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[\log D(x)\right] + \mathbb{E}_{\tilde{x} \sim G}\left[\log\left(1 - D(\tilde{x})\right)\right], \qquad (1)$$

where $\mathbb{E}_{x \sim p_{\mathrm{data}}}$ and $\mathbb{E}_{\tilde{x} \sim G}$ denote expectations over data sampled from the training dataset $p_{\mathrm{data}}$ and the generator G, respectively. The outputs of D are in the domain $D(x), D(\tilde{x}) \in (0, 1)$, and it is trained to maximize $C_{\mathrm{GAN}}$. Conversely, G is trained to minimize $C_{\mathrm{GAN}}$ by generating data which D cannot classify with certainty ($D(x), D(\tilde{x}) \to 0.5$). While GANs have been shown to be able to generate remarkable data, common challenges in training a GAN lie in mode-collapse and non-convergence [26,28], which are natural consequences of the delicately balanced adversarial game. One critical component of a GAN, which we argue is under-studied and which we aim to enhance with a quantum model, is the source of randomness in G, i.e., the prior distribution $q(z)$ in Fig. 1. G takes prior samples $z$ as input to generate $\tilde{x} = G(z)$. In other words, the neural network of G learns a transformation between $q(z)$ and a high-quality data space. Because the model needs to be sampled efficiently on a classical computer, the $N$-dimensional prior is conventionally implemented as a continuous uniform distribution (i.e., $z \in (0, 1)^{\otimes N}$) or a normal distribution (i.e., $z \sim \mathcal{N}(\mu, \sigma)^{\otimes N}$ with mean $\mu = 0$ and width $\sigma$). Discrete Bernoulli priors (i.e., $z \in \{0, 1\}^{\otimes N}$) have also been shown empirically to be competitive [28] and are used throughout this work for the classical GANs. For such uninformed prior choices, a prior with large $N$ requires a notably expressive neural network architecture to be able to map the full prior space to high-quality outputs, whereas a small $N$ could potentially lead to the algorithm not learning a good approximation of the full target data [29]. Consequently, ML practitioners often rely on sufficiently large $N$ and scale the number of parameters in G for their purpose. Clearly, an uninformed prior distribution is unlikely to be ideal for any given dataset and GAN architecture, and the prior distribution $q(z)$ should, if possible, be adapted such that G can effectively map prior samples to a high-quality output space. The AAN framework [20] aims to address all of these challenges by implementing a non-trivial prior distribution for G.
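As a rough illustration of the adversarial objective and the uninformed Bernoulli prior described in this section, the following NumPy sketch estimates the loss from discriminator outputs. It assumes the standard GAN form $\mathbb{E}[\log D(x)] + \mathbb{E}[\log(1 - D(\tilde{x}))]$; the function names `gan_loss` and `sample_bernoulli_prior` are hypothetical and not taken from this work's implementation.

```python
import numpy as np

def gan_loss(d_real, d_fake, eps=1e-12):
    """Monte Carlo estimate of E[log D(x)] + E[log(1 - D(x_tilde))].

    d_real / d_fake are discriminator outputs in (0, 1) for training data
    and generated data (assumed standard GAN objective).
    """
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1.0 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

def sample_bernoulli_prior(batch_size, n_dims, rng):
    """Uninformed discrete prior: each entry of z is 0 or 1 with probability 1/2."""
    return rng.integers(0, 2, size=(batch_size, n_dims))

rng = np.random.default_rng(0)
z = sample_bernoulli_prior(4, 8, rng)  # four 8-bit prior samples for G

# A discriminator that cannot tell the two data sources apart outputs 0.5.
undecided = gan_loss(np.full(4, 0.5), np.full(4, 0.5))
# A confident and correct discriminator pushes the loss towards its maximum of 0.
confident = gan_loss(np.full(4, 0.99), np.full(4, 0.01))
```

In this toy setting `undecided` equals $-2\log 2$, the value of the objective when D outputs 0.5 for every input, while `confident` is close to zero.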

III. ASSOCIATIVE ADVERSARIAL NETWORKS (AANs)
A schematic overview of the AAN framework can be viewed in Fig. 1. In an AAN, the prior is modelled by a smaller generative model with distribution $q_\theta(z)$ which adapts to the training data, the GAN architecture, and the current stage of training. The parameters $\theta$ of the prior are tuned to model the node activations in a deep layer $l$ of the discriminator D with matching size $N$. This layer $l$ is called the latent space layer and it captures highly condensed features of the training data and generated data (see Fig. 1). The prior is trained to maximize the likelihood of its parametrized distribution $q_\theta(z)$ given latent data samples $z$ from the latent data distribution $p_l(z)$:

$$C_q = \mathbb{E}_{z \sim p_l(z)}\left[\log q_\theta(z)\right], \qquad (2)$$

where $\mathbb{E}_{z \sim p_l(z)}$ indicates the expectation over the observed latent data samples. By training $q_\theta(z)$ to maximize $C_q$, G receives explicit access to information which D deems to be important for its classification task.
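The likelihood objective for the prior can be sketched numerically as follows. This is a minimal example under stated assumptions: the latent activations are binarized by thresholding at 0.5 (the binarization rule is not specified in this section), the prior is given as an explicit probability vector over all $2^N$ bitstrings, and the helper names are hypothetical.

```python
import numpy as np

def bits_to_index(z):
    """Map rows of bits (most significant bit first) to integers in [0, 2**N)."""
    z = np.asarray(z, dtype=int)
    weights = 2 ** np.arange(z.shape[1] - 1, -1, -1)
    return z @ weights

def prior_log_likelihood(q_theta, latent_samples, eps=1e-12):
    """Estimate E_{z ~ p_l}[log q_theta(z)] from binary latent samples."""
    idx = bits_to_index(latent_samples)
    return float(np.mean(np.log(np.asarray(q_theta, dtype=float)[idx] + eps)))

# Hypothetical activations of a 2-node latent layer l, thresholded to bits.
activations = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.4], [0.2, 0.9]])
latent_bits = (activations > 0.5).astype(int)  # rows: 10, 10, 10, 01

uniform_prior = np.full(4, 0.25)                      # uninformed prior
informed_prior = np.array([0.05, 0.25, 0.65, 0.05])   # mass on '01' and '10'

c_uniform = prior_log_likelihood(uniform_prior, latent_bits)
c_informed = prior_log_likelihood(informed_prior, latent_bits)
```

A prior that places more probability on the bitstrings actually observed in the latent layer yields a higher likelihood than the uniform prior, which is the quantity the prior model is trained to increase.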
Although the original AAN work proposed using Restricted Boltzmann Machines (RBMs) [30] to model the prior, RBMs have been shown to be outperformed by comparable QCBMs in learning and sampling probability distributions constructed from real-world data [31]. Therefore, in this work, we implement a QCBM in the prior of a GAN to tackle common challenges of GANs and to potentially enhance them with measurements from quantum distributions. As GANs typically employ 16- to 128-dimensional priors (see Appendix E in Ref. [28]), we expect our approach to be able to produce very competitive results on practical datasets by using near- to mid-term quantum devices.

IV. QUANTUM CIRCUIT BORN MACHINES & THE MULTI-BASIS TECHNIQUE
A QCBM is a circuit-based generative model which encodes a data distribution in a quantum state. This approach allows for sampling of the QCBM by repeatedly preparing and measuring its corresponding wavefunction

$$|\psi(\theta)\rangle = U(\theta)\,|0\rangle^{\otimes n}. \qquad (3)$$

Here, $U(\theta)$ is a parametrized quantum circuit acting on an initial qubit state $|0\rangle^{\otimes n}$, with $U$ chosen according to the capabilities and limitations of NISQ devices. The probabilities for observing any of the $2^n$ bitstrings $s$ in the $n$-bit (qubit) target probability distribution are modeled using the Born probabilities such that

$$p_\theta(s) = \left|\langle s | \psi(\theta)\rangle\right|^2. \qquad (4)$$

Importantly, QCBMs can be implemented on most NISQ devices (see e.g. Refs. [32][33][34][35][36]) and additionally open our algorithm up to exploit unique features of quantum circuit-based approaches, like measuring in a range of bases.
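To make the sampling procedure concrete, the following NumPy sketch simulates a toy two-qubit circuit, computes its Born probabilities, and draws bitstrings from them. The circuit (one layer of $R_Y$ rotations followed by a CNOT) is an illustrative assumption and is not the ansatz used in this work; all function names are hypothetical.

```python
import numpy as np

def ry(theta):
    """Single-qubit rotation about the Y axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

# CNOT with the first (most significant) qubit as control.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

def toy_qcbm_state(theta):
    """|psi(theta)> = CNOT (RY(theta[0]) x RY(theta[1])) |00>."""
    psi0 = np.zeros(4, dtype=complex)
    psi0[0] = 1.0
    layer = np.kron(ry(theta[0]), ry(theta[1]))
    return CNOT @ layer @ psi0

def born_probabilities(psi):
    """Probability of each computational-basis bitstring."""
    return np.abs(psi) ** 2

def sample_bitstrings(probs, n_qubits, shots, rng):
    """Emulate repeated preparation and measurement of the state."""
    probs = probs / probs.sum()
    outcomes = rng.choice(len(probs), size=shots, p=probs)
    return [format(int(o), f"0{n_qubits}b") for o in outcomes]

rng = np.random.default_rng(1)
psi = toy_qcbm_state([np.pi / 2, 0.0])   # prepares (|00> + |11>) / sqrt(2)
probs = born_probabilities(psi)
samples = sample_bitstrings(probs, 2, 1000, rng)
```

For these parameters the state is a Bell state, so only the bitstrings `00` and `11` appear among the samples, each with probability 1/2.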
By training a QCBM on computational basis samples, families of sample distributions, i.e. projections of the wavefunction, become accessible in a range of other basis sets. This is information which is present in the QCBM but which is conventionally not used when measuring only in the computational basis. Thus, we propose a multi-basis technique for the QCBM which provides the QC-AAN with a prior space consisting of quantum samples in flexible bases. The multi-basis technique can potentially enhance the performance of the generator by providing it with quantum samples in different measurement bases which have no classical analog. Follow-up work has indeed indicated that quantum measurements in different bases can lead to a separation of quantum and classical generative models [37]. In principle, one could construct and measure many different basis sets, although this implies measuring increasingly redundant bases. The optimal number of basis sets likely depends on the given learning task and available quantum resources. In this work, we restrict ourselves to measurements in one additional basis set, and to bases which can be reached by one layer of single-qubit rotations. This prevents adding significant depth to the QCBM circuit which is later implemented on quantum hardware. One could, however, measure in bases which are generated by more complex rotations.
Fig. 1 displays how the multi-basis technique is applied in practice. For each measurement $s$ in the computational basis according to Eq. 4, a second measurement is prepared by applying parametrized single-qubit rotations $R_X(\phi_i)$ to the wavefunction for each qubit $i$ and with parameters $\phi$. For $\phi = \pi/2$, each qubit is rotated into the Y-basis, which we refer to as the 'orthogonal basis' and denote with $o$ throughout this work. The more general case defines what we call the 'trained basis' approach, which we denote with $t$. Here, the parameters $\phi$ are trained for each qubit along with other circuit parameters to optimize the information extracted from the quantum state. When preparing the QCBM wavefunction in Eq. 3 and applying the corresponding single-qubit post-rotations, we obtain a sample $s_{o/t}$ in the orthogonal or trained basis, respectively. Both measurements $s$ and $s_{o/t}$ are then forwarded through a fully-connected neural network layer and into G to learn an effective utilization of the provided quantum resources. These multi-basis variants of the QC-AAN are called QC$^{+o}$-AAN and QC$^{+t}$-AAN, respectively. For implementation and training details of the multi-basis QCBM, we refer to Appendix B and Appendix G.
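The multi-basis sampling step can be sketched with a small statevector simulation. The example below is an illustration under assumptions, not the implementation used in this work: it takes an arbitrary state, draws one sample in the computational basis and one after a layer of single-qubit $R_X(\phi_i)$ rotations, and concatenates both into a $2n$-bit prior sample. The toy state (both qubits in the $+1$ eigenstate of Y) and all function names are hypothetical.

```python
import numpy as np
from functools import reduce

def rx(phi):
    """Single-qubit rotation about the X axis."""
    c, s = np.cos(phi / 2), np.sin(phi / 2)
    return np.array([[c, -1j * s], [-1j * s, c]], dtype=complex)

def rotate_basis(psi, phis):
    """Apply one layer of single-qubit R_X(phi_i) rotations to the state."""
    layer = reduce(np.kron, [rx(phi) for phi in phis])
    return layer @ psi

def multi_basis_sample(psi, phis, rng):
    """Return one concatenated sample (s, s_rot) of 2n bits from n qubits."""
    n = len(phis)

    def draw(state):
        p = np.abs(state) ** 2
        p = p / p.sum()
        return format(int(rng.choice(len(p), p=p)), f"0{n}b")

    s = draw(psi)                           # computational basis
    s_rot = draw(rotate_basis(psi, phis))   # phi = pi/2: orthogonal basis
    return s + s_rot

# Toy state: each qubit in (|0> + i|1>) / sqrt(2), an eigenstate of Y.
plus_i = np.array([1, 1j]) / np.sqrt(2)
psi = np.kron(plus_i, plus_i)
phis = [np.pi / 2, np.pi / 2]

rng = np.random.default_rng(2)
samples = [multi_basis_sample(psi, phis, rng) for _ in range(200)]
rotated_probs = np.abs(rotate_basis(psi, phis)) ** 2
```

For this toy state the first two bits of each sample are uniformly random, while the last two bits are always `00`: the second basis reveals structure that is invisible in the computational basis. In the trained-basis variant, `phis` would be treated as additional trainable parameters.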

V. APPLYING THE QC-AAN FRAMEWORK
As a first step towards showcasing the QC-AAN and our multi-basis technique, we numerically simulate training on the canonical MNIST data set of handwritten digits [23], a standard data set for benchmarking a variety of ML and deep learning algorithms. All implementations in this work are performed using the Orquestra™ platform. To isolate the effect of modelling the prior with a QCBM, we compare our quantum-classical models to purely classical Deep Convolutional GANs (DCGANs) with precisely the same neural network architecture (see Appendix F for details) and with uniform prior distribution. The QCBM prior in the QC-AAN adds between 31 and 52 trainable parameters, which equates to an increase of approximately 0.02% in the total number of parameters relative to the reference DCGAN used in this work.
The QCBM is initiated with a warm start such that the prior distribution is uniform and thus QC-AANs and DCGANs are equivalent at the beginning of training. This initialization provides us with a good baseline for comparison and should additionally avoid complications related to barren plateaus [38].
In our hybrid framework, the QCBM prior slowly follows changes in the latent space during training of G and D in a smooth-transition training protocol (see Appendix G). As shown in Appendix H, the robust global sampling abilities of the QCBM have proven very useful in this training protocol, where we have observed instabilities with the RBM.
To quantitatively assess performance, we calculate the Inception Score (IS) (see Appendix I), which evaluates the quality and diversity of generated images in GANs. The IS is high for a model which produces diverse images of high-quality handwritten digits. However, even the MNIST dataset does not achieve a perfect IS of 10, as some digits are not clearly identifiable. Fig. 2 shows results of handwritten digits generated by our models. For each model type, we pre-selected the best-performing models in terms of the IS and, among those, chose a single representative based on appearance of the images for a human observer. The generated digits themselves are random sub-samples of the selected models. It is apparent that all models presented here can achieve good performance and output high-resolution handwritten digits. This is generally expected as the MNIST dataset is considered to be a simple dataset to learn for modern network architectures.
In a quantitative evaluation of average model performance (see also Appendix E), we find that the 8 qubit QC-AAN without multi-basis technique typically does not outperform comparable 8 bit DCGANs under any of the hyperparameters explored. For low-dimensional priors in general, we were not able to improve the performance of uniform prior distributions. In contrast to that, both multi-basis QC-AAN models, the QC$^{+o}$-AAN and the QC$^{+t}$-AAN, generate visibly better images and achieve higher scores than the 8 bit and 8 qubit models without additional basis samples. In fact, Fig. 3 shows that, with an average IS of 9.28 and 9.36, respectively, both multi-basis models outperform the 16 bit DCGAN with an average IS of 9.20. Considering that the values are expected to saturate towards the IS of the training data (IS = 9.81), these are remarkable results. Additionally, we observe that an 8 qubit multi-basis QCBM does not require full access to a 16 qubit Hilbert space to outperform a 16 bit random prior, and that the trained-basis approach generally enhances the algorithm even more compared to the fixed orthogonal-basis approach. We emphasize that these differences in performance were achieved merely by implementing a multi-basis QCBM as an informed prior for the generator.
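For readers unfamiliar with the Inception Score, the following sketch computes it in its commonly used form, $\mathrm{IS} = \exp\left(\mathbb{E}_x\, \mathrm{KL}\left(p(y|x)\,\|\,p(y)\right)\right)$, from an array of class probabilities. The classifier that produces these probabilities is assumed to exist and is not shown; the exact evaluation procedure used in this work is described in Appendix I and may differ in detail.

```python
import numpy as np

def inception_score(class_probs, eps=1e-12):
    """IS = exp(mean_x KL(p(y|x) || p(y))) for an array of shape (images, classes)."""
    p_yx = np.asarray(class_probs, dtype=float)
    p_y = p_yx.mean(axis=0, keepdims=True)  # marginal class distribution
    kl = np.sum(p_yx * (np.log(p_yx + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))

# Ten images, each confidently classified as a different digit: diverse and sharp.
perfect = inception_score(np.eye(10))
# Ten images all confidently classified as the same digit: sharp but not diverse.
collapsed = inception_score(np.tile(np.eye(10)[0], (10, 1)))
# Ten images the classifier cannot identify at all: no recognizable digits.
unsure = inception_score(np.full((10, 10), 0.1))
```

With ten classes the score ranges from 1 (`collapsed` and `unsure`) up to 10 (`perfect`), which is why unidentifiable digits in the training data keep even MNIST itself below the maximum.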
To provide final confirmation that the QC-AAN framework is fit for implementation on NISQ devices, we train both QC$^{+o/t}$-AAN algorithms on an 11-qubit quantum device from IonQ which is based on $^{171}$Yb$^{+}$ ion qubits. For details on the device, we refer to Fig. 1, Appendix A, and Ref. [39]. To achieve high fidelity and to attend to limitations of the quantum hardware, we consider the following adaptations as compared to the simulation setup. While this device allows for all-to-all connectivity, long-range gates are generally more noisy. We thus employ a nearest-neighbor entangling topology in our quantum circuits. Additionally, we utilize parametrized Mølmer-Sørensen entangling gates (see Eq. A1) in the circuit ansatz of the QCBM, which are native to the ion-trap device. This reduces gate overhead and thus noise introduced from gate re-compilation. Finally, to reduce the overall number of calls to the quantum device, as well as wall time between evaluations of the same quantum circuit, we aggregate a large number of measurements in-between training steps of the quantum prior. To optimize run time of the algorithm, one could simultaneously perform and buffer measurements on the quantum device while the classical networks are training with samples from a constant prior distribution. Only during training of the quantum circuit do all other components need to halt. For more information on the training protocol of the QCBM, we refer to Appendix G.
The experimental results for the training on hardware can be viewed in Fig. 3. Every image generated by the GAN during and after training has been produced exclusively utilizing hardware measurements from multiple bases. Note that we demonstrate the results of a single run on hardware per model type. No averaging or post-processing was performed. To the best of our knowledge, this is the first practical implementation of a quantum-classical algorithm capable of generating high-resolution digits on a NISQ device.

VI. DISCUSSION & OUTLOOK
With as few as 8 qubits, we show signs of positively influencing the training of GANs and indicate general utility in modelling their prior with a multi-basis QCBM on NISQ devices. Learning the choice of the measurement bases through the quantum-classical training loop, i.e. our QC$^{+t}$-AAN algorithm, appears to be the most successful approach in simulations and also in the experimental realization on the IonQ device. This is a great example of how quantum components in a hybrid quantum ML algorithm are capable of effectively utilizing feedback coming from classical neural networks and a testament to the general ML approach of learning the best parameters rather than pre-determining them. Unlike many other use-case implementations of quantum algorithms on NISQ devices, our models do not underperform compared to noise-free simulations. It is reasonable that significant re-parametrization of the prior space, paired with a modest noise floor, provides GANs with an improved trade-off between exploration of the target space and convergence to high-quality data.
Our QC-AAN framework also extends flexibly to more complex data sets such as data with higher resolution and color, for which we expect refinement of the prior distribution to become more vital for performance of the algorithm. Besides extending to these more challenging data sets, we could adapt the learning strategy of the quantum prior to follow a different objective function as compared to the AAN framework, potentially one that directly ties into improving the generator's performance in the adversarial loss function in Eq. 1.
Because of the distributed nature of the quantum-classical optimization loop, one may additionally adapt the training to follow a secure learning protocol [40][41][42] for the current scenario where quantum and classical components are owned by separate entities.
Our work has been driven by the challenge of finding a practical generative learning task which is more successfully accomplished with a quantum generative model [7]. This could for example be achieved by encoding classically out-of-reach quantum distributions [10][11][12][13][14]. In fact, recent work has used the idea introduced here of using measurements from multiple bases to prove one kind of such classical-quantum model separation [37]. However, our work also emphasizes another form of practical quantum advantage which is usually dismissed, but arguably equally important towards reaching a high-performing generative model: the training trajectory in model space that needs to be traversed from the initial distribution to the target distribution. For example, we have observed an improvement in robustness of the QCBM prior as compared to the RBM, which we showcase in Appendix H. Both quantum enhancement scenarios are schematically discussed and illustrated in Appendix J. Consequently, it is essential that we better understand the capabilities of our multi-basis technique, which kinds of quantum distributions can be built from families of basis measurements, and their practical impact on GAN training. Particularly, one needs to consider the trade-offs of the multi-basis technique against doubling the number of qubits in the QCBM. While the multi-basis model is less expressive, it has the advantage that it can be implemented on earlier quantum devices, has lower circuit depth, employs significantly fewer parameters, and may thus be more effectively trainable. Additionally, the multi-basis QCBM exhibits natural synergy with the subsequent neural network that transforms the samples from different measurement bases. The seemingly rigid and structured bi-partition of the effective prior space is significantly mitigated by a fully-connected layer which has the potential to reorder and learn non-trivial correlations among the bits in the prior during training.
Moving forward, the use of a quantum generative model like a QCBM as a building block in the prior space of a larger classical architecture could unlock the path to reliably investigating the impact of certain prior shapes during training of large generative ML models. This is a field which is rather under-studied in the context of classical deep learning algorithms but has shown promising results [18][19][20][28]. Given that they are highly flexible and can provide true global samples from the encoded distribution, quantum generative models may offer the tools required to enable this research more effectively. Implementing prior distributions from quantum-inspired models such as the tensor-network-based Born machines [43] is another exciting research direction that we will be exploring. In terms of generative applications outside of image generation, we will be studying the QC-AAN as a generator in the quantum-enhanced framework for combinatorial optimization problems presented in Ref. [44]. There, the QC-AAN allows quantum-enhanced optimizers to explore combinatorial problems with a larger number of variables than those considered with the quantum-inspired tensor network models, and it allows the framework to extend to non-discrete variables.
Finally, a full-fledged quantitative comparison between quantum and classical versions of machine learning algorithms can be challenging. For instance, classical resources are currently much cheaper and more accessible than quantum resources. For the hardware experiments performed in this work, getting access to and training the quantum component takes significantly more time than sampling pseudo-random numbers for the conventional random GAN prior. Therefore, factors such as cost and training time, particularly for time-sensitive applications, need to be considered when deciding whether a model has reached practical quantum advantage. Our focus has been on showcasing our new quantum-assisted algorithm, which leverages quantum circuits and their unique capability of being measured in multiple bases. Although for this task one might be able to achieve better classical ML performance by choosing more sophisticated neural networks or training techniques, our QC-AAN framework is adaptable to any GAN architecture and could potentially enhance even the best classical GANs with NISQ devices on full-scale data sets. With the recent metrics for evaluating the generalization capabilities of generative models [45], this is a question that can be studied in a grounded and quantitative manner.

ACKNOWLEDGMENTS
The authors would like to thank Coleman Collins and Algert Sula for their support with the experimental images and the design of the hardware illustrations. The authors also thank Yudong Cao, Max Radin, Marta Mauri, Matthew Beach, Dax Enshan Koh, and Jérôme F. Gonthier for their feedback on an early version of this manuscript. The authors would like to acknowledge Zapata Computing's Platform Team for all the support with Orquestra™: the software platform used during the execution of all the simulations and experiments shown here. M.S.R. would like to acknowledge Zapata Computing for hosting his Quantum Applications Internship.
The ansatz is inspired by the capabilities of current ion-trap quantum devices and is structured in layers, where the expressivity of the model increases as layers are added. Although the QCBM equipped with this ansatz can become a powerful generative model, one needs to consider important trade-offs in the choice of ansatz hyperparameters. For NISQ devices, shallow quantum circuits are generally desired, as deeper circuits can significantly decrease the fidelity of the quantum states. Additionally, deep circuits often come with an excess number of parametrized quantum gates that enhance expressivity but can compromise trainability and create a model that strongly overfits the training data. For our work, we limit ourselves to 2 layers to minimize the number of gates used while still introducing entanglement into the quantum state. In order to increase state fidelity for the experimental implementation on the IonQ quantum device, we additionally reduce the all-to-all connectivity of the XX gates shown in Fig. 4 to a linear chain of entangling operations. The circuit parameters are initialized with a warm start such that the QCBM encodes a uniform distribution in the computational basis as well as in the o/t bases discussed in the main text and Appendix C.
In our application, the QCBM is trained by minimizing the clipped negative log-likelihood, where p(x) is the probability over all training data samples x. Minimizing this function is equivalent to maximizing the expression in Eq. 2. A regularization constant prevents singularity of the logarithm for samples with zero probability. The probability distribution $q_\theta(x)$ of samples x is estimated by sampling the prepared state and accumulating the measurements.
FIG. 4. Quantum circuit ansatz for the QCBM. The ansatz is structured in layers to control expressivity of the model and fidelity of the prepared state. Throughout this work, we use a layer depth of 2 in order to maximize fidelity in the experimental implementation and enforce some interpolation between learned samples. The red numbers indicate the layer counting convention. For simulations, we used the all-to-all entangling layer as shown here, whereas for the experimental implementation, we instead adopted linear nearest-neighbor connectivity.
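As a hedged sketch, a clipped negative log-likelihood consistent with the description above can be written as follows. The symbols $C_{\mathrm{nll}}$ and $\epsilon$ (the regularization constant), and the Born-rule expression for $q_\theta(x)$ in terms of a prepared state $|\psi(\theta)\rangle$, are assumptions chosen for illustration and need not match the authors' exact notation.

```latex
C_{\mathrm{nll}}(\theta) = -\sum_{x} p(x)\,\log\!\big(\max\{\epsilon,\; q_\theta(x)\}\big),
\qquad
q_\theta(x) = \left|\langle x \,|\, \psi(\theta) \rangle\right|^{2}
```

In this form the clipping at $\epsilon$ keeps the logarithm finite whenever a training sample x has zero estimated model probability.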

Appendix C: The Multi-Basis Technique
In this work, we introduce a multi-basis technique for quantum circuit-based models to expand the repertoire of quantum machine learning researchers.
Commonly, when referring to sampling a generative model, one means generating instances of data that follow the encoded probability distribution. For classical models, one is limited to one basis: the computational basis. In quantum models, this is not the case. When encoding a probability distribution into a qubit wavefunction, as is the case with a Quantum Circuit Born Machine (QCBM), the learned wavefunction contains a potential family of sample distributions which are accessible by measuring in different bases. These additional distributions, or more specifically, projections of the wavefunction, can be evaluated by applying arbitrary post-rotations to the quantum registers before measurement. In this work, we explore whether we can enhance a generative model by including measurements in additional bases and how we can maximize the benefit of measuring additional basis sets. For this purpose, we quantitatively compare performance inside the QC-AAN framework using samples from an orthogonal measurement basis, i.e., the Y-basis, and flexible bases where the post-rotation angles are trained together with the ansatz parameters. This is done by doubling the latent space in the discriminator and training the samples of the multi-basis QCBM on the respective latent activations. While we do not explore a variety of possible basis choices, we argue that the trainable multi-basis model should converge to a learned constellation of measurement bases which is close to optimal for this specific training instance. Future work will need to study a wider range of variations of the multi-basis technique to uncover how to most efficiently utilize near-term quantum devices.
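To make the idea of a post-rotation concrete, the following minimal sketch simulates measuring a small statevector in the computational basis and in the Y-basis. It assumes one standard choice of post-rotation (an inverse phase gate followed by a Hadamard on every qubit); the function names are hypothetical and the snippet is not the authors' implementation.

```python
import numpy as np

# Single-qubit gates: Hadamard and S-dagger (inverse phase gate).
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S_DG = np.array([[1, 0], [0, -1j]], dtype=complex)
# Assumed post-rotation: maps Y-basis eigenstates onto computational basis states.
Y_POST_ROTATION = H @ S_DG


def kron_all(gates):
    """Tensor product of a list of single-qubit gates."""
    out = np.array([[1.0 + 0j]])
    for gate in gates:
        out = np.kron(out, gate)
    return out


def basis_probabilities(state, post_rotation=None):
    """Born probabilities of an n-qubit state, optionally after rotating every qubit."""
    n_qubits = state.size.bit_length() - 1
    if post_rotation is not None:
        state = kron_all([post_rotation] * n_qubits) @ state
    probs = np.abs(state) ** 2
    return probs / probs.sum()


def sample_bitstrings(probs, shots, rng):
    """Draw measurement outcomes as bitstrings from a probability vector."""
    n_qubits = probs.size.bit_length() - 1
    outcomes = rng.choice(probs.size, size=shots, p=probs)
    return [format(int(i), f"0{n_qubits}b") for i in outcomes]
```

The same wavefunction can look uniform in one basis and sharply peaked in another, which is the additional information the multi-basis technique draws on.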
As an explicit example of the sample concatenation, assume samples s = 1010 and s_o/t = 1100, which are sampled in the computational and o/t bases, respectively. Together, they define s* = 10101100. For a series of measurements in the computational and o/t bases, the assignment of which pair of measurements is forwarded to the neural network is arbitrary, as there is no direct correlation between the computational basis and o/t basis distributions other than that they obey the normalization constraint of the QCBM wavefunction. This technique generalizes to the measurement and concatenation of samples of any observable that can be measured on a quantum device.
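The concatenation step can be sketched in a few lines. This is an illustrative example only: the function name is hypothetical, and shuffling the second list is just one way to express that the pairing between the two bases is arbitrary.

```python
import random


def concatenate_multibasis(comp_samples, ot_samples, rng=None):
    """Pair computational-basis and o/t-basis bitstrings and concatenate each pair.

    The pairing is arbitrary, so the o/t samples are shuffled before pairing.
    """
    if len(comp_samples) != len(ot_samples):
        raise ValueError("need the same number of samples from each basis")
    rng = rng or random.Random()
    shuffled = list(ot_samples)
    rng.shuffle(shuffled)
    return [s + s_ot for s, s_ot in zip(comp_samples, shuffled)]


print(concatenate_multibasis(["1010"], ["1100"]))  # ['10101100']
```

For n qubits, each concatenated sample has 2n bits, which is the prior dimension seen by the generator network.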
A more subtle advantage of this multi-basis technique in the context of generative modelling with a QCBM is that the prior size N (see Sec. II) is doubled from n to 2n, where n is the number of qubits. The effective sample space for the binary samples of the QCBM scales like $2^N$, which thus increases from $2^n$ to $2^{2n}$. Although the sample s* lives in the $\{0, 1\}^{2n}$ space, the multi-basis QCBM does not have access to the full 2n-qubit space because of the normalization constraint of the wavefunction. However, increasing the sample dimension, and with it the amount of information encoded in and utilized from small near-term quantum computers, proved to be of immense value here by enhancing the expressivity and the robustness of the hybrid QC-AAN model considered.
For this work, we only compare models with the same prior dimension to isolate the effect that re-parametrization of the prior distribution has on GAN training. In Fig. 5 we show that for 6- and 8-qubit QC-AANs, we could not achieve an advantage in learning a non-trivial prior. For a 6 (8) qubit QCBM, there are only 64 (256) distinct samples available for the neural network to map to high-quality images. Since the IS is very sensitive to class imbalance of the generated images, we see that modeling those priors does not lead to meaningful improvements. In fact, the results shown in Fig. 5 for the QC-AAN with 6 and 8 qubits were obtained by only minimally disturbing the uniform distribution. The QC+o/t-AANs with the multi-basis technique show interesting results: for 6 qubits, we almost reach the performance of a 12-bit DCGAN, whereas with 8 qubits, we outperform a 16-bit DCGAN (see Fig. 3 in main text). This is despite the fact that the QC+o/t-AAN models are restricted to an effective sub-space compared to a Hilbert space with double the number of qubits. Note that these results are specific to our neural network architecture. It is possible that more expressive and computationally intensive neural networks face fewer challenges in learning a great model with a uniform prior distribution. For a general learning task and network architecture, the QC-AAN algorithm allows one to treat the hyperparameters of the trainable prior as hyperparameters of the overall generative algorithm, tuned towards a more successful model.
The 8-qubit QC+o/t-AANs operate on a space with $2^{16} = 65{,}536$ potential samples. Although the multi-basis QCBM is a binary model, with this number of different images, an Inception Score of occasionally over 9.5 can be considered comparable to state-of-the-art GANs with continuous priors. In fact, Ref. [28] argues that binary units may perform at least as well as continuous uniform or normal distributions.

Appendix F: Neural Network Architectures and Training
Fig. 6 shows a schematic overview of the network architectures of the generator G and discriminator D neural networks used throughout this work. We emphasize that these architectures are employed for both our hybrid QC-AANs and the classical DCGANs. The networks have an approximately inverse structure with three convolutional layers, although this is not generally required for stable GAN training. The second-to-last layer in D (latent space) has the same size as the first layer in G (prior space) so that the quantum model in the QC-AAN can be trained on the latent activations of D. The total number of parameters in each network amounts to approximately $2.77 \times 10^5$. The multi-basis QCBM in the QC-AAN adds between 31 and 52 trainable parameters for the 8-qubit model, equating to an increase of 0.02% in the total number of parameters.
All convolutional layers have batch normalization [46] and leaky rectified linear unit (ReLU) activation functions [47]. In training D, a small percentage (3%) of training samples have their label flipped, and label smoothing is applied to the training images. The optimizer for both networks is the ADAM optimizer with parameters $[\beta_1, \beta_2] = [0.5, 0.9]$. Generally, our neural network architectures and hyperparameter choices are inspired by successful practices in modern DCGANs [27].
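The label noise used when training D can be illustrated with a short sketch. Several details here are assumptions, since the text does not specify them: the smoothed target value of 0.9 for real images, the choice to flip labels of both real and generated samples, and the function name itself.

```python
import numpy as np


def discriminator_targets(n_real, n_fake, rng, flip_fraction=0.03, real_target=0.9):
    """Build noisy discriminator targets for one batch of real and generated images.

    real_target=0.9 is an assumed label-smoothing value, not taken from the text.
    """
    targets = np.concatenate([np.full(n_real, real_target), np.zeros(n_fake)])
    is_real = np.concatenate([np.ones(n_real, dtype=bool), np.zeros(n_fake, dtype=bool)])
    # Each sample has its label flipped independently with probability flip_fraction.
    flip = rng.random(targets.size) < flip_fraction
    targets[flip & is_real] = 0.0           # flipped real sample is labeled as fake
    targets[flip & ~is_real] = real_target  # flipped fake sample is labeled as real
    return targets, flip
```

Both tricks weaken the discriminator slightly, which is a common way to keep the generator's training signal from vanishing.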

Appendix G: Training Details of QCBM as Model Prior
The QCBM in this work implements a hardware-efficient ansatz inspired by the capabilities of ion-trap quantum computers (see Fig. 4). The layer depth of the ansatz is chosen to be shallow, with only one layer of single-qubit gates and one layer of entangling gates. For the numerical simulations, we chose an all-to-all connectivity between qubits, whereas in the hardware implementation, we used linear connectivity to improve the state fidelity. For the case of the MNIST training set, we did not observe, on average, significant negative effects from reducing the ansatz connectivity. We expect that for more challenging generative modeling tasks, the circuit ansatz will play a more crucial role.
Over one QC-AAN training epoch on the entire MNIST data set with N = 60,000 images, we perform an update of the QCBM parameters every 100 batches for the simulation and every 600 batches for the experimental implementation. The latter implies one training step per training epoch. We use the Simultaneous Perturbation Stochastic Approximation (SPSA) algorithm [48] for training the parametrized quantum circuit to adapt the model distribution of the QCBM while minimizing calls to the quantum device. The gradients are evaluated with 1000 readout measurements (shots). For the experimental implementation on the quantum device, we sample the 8-qubit distributions with $10^4$ shots per measured basis and are able to construct multi-basis samples appropriately by resampling those measurements until the next training step. For the numerical simulations, this resampling was not performed and circuits were evaluated with as many shots as required for the GAN, i.e., for each image generated by the generator.
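The appeal of SPSA for hardware training is that each update needs only two cost evaluations, independent of the number of circuit parameters. The following is a minimal sketch of one textbook SPSA step with constant gain values `a` and `c`; these values and the function name are assumptions, and the cost function stands in for the shot-based QCBM cost.

```python
import numpy as np


def spsa_step(cost, theta, rng, a=0.1, c=0.1):
    """One SPSA update using two cost evaluations for all parameters at once."""
    # Perturb every parameter simultaneously by a random +/-1 direction.
    delta = rng.choice([-1.0, 1.0], size=theta.shape)
    diff = cost(theta + c * delta) - cost(theta - c * delta)
    # For +/-1 entries, dividing by delta equals multiplying by delta.
    grad_estimate = diff / (2.0 * c) * delta
    return theta - a * grad_estimate
```

On a device, each call to `cost` would correspond to preparing the circuit and estimating the cost from a finite number of shots, so the estimate is noisy in practice.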
One technique that has been shown to stabilize training for the QC-AAN is to freeze the prior, i.e., to fix the QCBM parameters, after a certain number of training epochs. Altering the prior distribution significantly in the later stages of training has been shown to destabilize training and lead to visibly worse images. This is likely due to the converging generator network, whose parameters are slowly settling. Altering the prior at these stages introduces a new impulse for training, which is not matched by the decreasing step sizes of the ADAM optimizer of the networks (see Appendix F). Throughout this work, we freeze the prior after 10 epochs.
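The update schedule and the freezing rule can be summarized in a small sketch. The batch size of 100 is inferred from the statement that 600 batches make up one epoch of 60,000 images; the function name and its defaults are otherwise hypothetical.

```python
def qcbm_update_steps(n_images=60_000, batch_size=100, update_every=100,
                      n_epochs=3, freeze_after_epochs=10):
    """Return the (epoch, batch) positions after which the QCBM prior is updated."""
    batches_per_epoch = n_images // batch_size
    steps = []
    for epoch in range(n_epochs):
        if epoch >= freeze_after_epochs:
            break  # prior is frozen; only the neural networks keep training
        for batch in range(1, batches_per_epoch + 1):
            if batch % update_every == 0:
                steps.append((epoch, batch))
    return steps


print(len(qcbm_update_steps(n_epochs=1)))                    # 6 updates per epoch (simulation)
print(len(qcbm_update_steps(update_every=600, n_epochs=1)))  # 1 update per epoch (hardware)
```

Under these assumptions, the hardware runs perform at most 10 QCBM updates in total before the prior is frozen.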

Appendix H: QCBM and RBM in the AAN-Framework
Quantum Circuit Born Machines (QCBMs) are promising quantum generative models that offer global sampling capabilities of the encoded distribution and additionally provide access to quantum measurements that may be beneficial in learning a strong model. One of those properties, which we leverage in this work, is the possibility to measure in additional bases. Still, one needs to weigh the costs and benefits of such a quantum model when Restricted Boltzmann Machines (RBMs) are light-weight generative models with efficient but local sampling algorithms. Ref. [19] shows that the AAN framework with an RBM modeling the generator's prior could be improved by instead implementing a Quantum Boltzmann Machine. Finding the best hyperparameters for their respective models is a notoriously difficult task, and comparing models across all possible hyperparameter combinations is in general unfeasible. In this work, we argue that, given a poor choice of hyperparameters and the same number of model parameters, RBMs can easily become unstable in our smooth learning protocol while QCBMs in the QC-AAN retain their stability. Fig. 7 shows average training performances of 10 individual QC-AANs and AANs, both with an 8-dimensional prior, on the MNIST training set. The priors are trained in a smooth transition protocol where, every 100 batches, their distributions are altered by 1 or 5 training steps of stochastic gradient descent with a step size of 0.01. A priori, this step size seems small enough that neither of the hyperparameter options of 1 or 5 training steps seems unreasonable. However, Fig. 7 shows that 5 training steps is in fact a sub-optimal hyperparameter choice. Notably, the RBM suffers from its native local sampling technique and can become very unstable. We also provide the sampled prior distributions of the worst performing models across 10 training runs. It is apparent that the RBM has reached a distribution which it cannot effectively sample. If this happens at any stage of our smooth transition training protocol, further training of the AAN becomes impossible. For the QCBM, the change in performance between the two hyperparameter choices is more continuous. This example indicates a potentially improved robustness of the QC-AAN framework as compared to the AAN framework.
classes y of the original data set. Quantitatively, this is done by calculating the KL divergence between the posterior probability distribution p(y|x) and the prior probabilities p(y) for each class label. For details, we refer to Ref. [49]. The IS is a human-readable metric with values between 1 and the total number of classes in the data set, i.e., 10 for the MNIST data set of handwritten digits. Although it has proven to be very useful [49], one of the main criticisms of the IS is that it does not capture the realism of generated images for a human observer, because it is calculated with classifiers which are trained to search for and find exactly the class labels that they have been trained on. Images can be warped or noisy and still achieve very high classification certainty [50]. In this work specifically, we achieve a surprisingly high IS with models that implement 6- and 8-dimensional priors, resulting in only 64 and 256 distinct images in total. Although the IS is high for those models, a human observer does not judge them as being particularly clear images. For such limited models, there arises an interesting effect where the discriminator can remember all images generated by the generator, constantly pushing it away, preventing further convergence and thus introducing noisy artifacts. Another classifier will clearly identify the digits as members of their particular class, regardless of the noise. Nevertheless, the IS is a straightforward quantitative performance measure for GANs that commonly correlates well with human perception.
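The score described above can be computed from classifier outputs in a few lines. This sketch uses the commonly used form of the Inception Score, the exponential of the mean KL divergence between p(y|x) and p(y), with p(y) estimated as the average prediction over the generated batch; the function name and the small constant `eps` are assumptions.

```python
import numpy as np


def inception_score(class_probs, eps=1e-12):
    """Exponential of the mean KL divergence between p(y|x) and the marginal p(y).

    class_probs has one row per generated image and one column per class.
    """
    class_probs = np.asarray(class_probs, dtype=float)
    marginal = class_probs.mean(axis=0)  # estimate of p(y) from the generated batch
    kl = np.sum(class_probs * (np.log(class_probs + eps) - np.log(marginal + eps)), axis=1)
    return float(np.exp(kl.mean()))
```

Perfectly confident and perfectly balanced predictions over 10 classes give a score close to 10, while uninformative predictions give a score of 1, matching the range stated in the text.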
The IS is commonly calculated using the pretrained Inception-v3 network [51] as a proxy to calculate the probabilities in Eq. I1. Since the Inception-v3 network can only be applied to colored data, we instead utilize a convolutional classifier with approximately 99.3% accuracy on the MNIST data set to calculate Inception Scores.
FIG. 8. Sketch of two potential scenarios for practical quantum advantage in generative modeling. The large overlapping circles indicate the sets of probability distributions which could be expressed within two given parametrized classical or quantum generative models. Note that this does not represent all distributions which could possibly be expressed by any classical or quantum model. a) The target distribution lies outside of the space that can be expressed by the classical model, similar to Fig. 5 in Ref. [13]. This might be due to the distribution not being tractable classically, or because this specific parametrized model cannot reach it. b) The target distribution lies in a region which both models can reach. However, the two models do not traverse the space equally, and the classical model might be hindered, for example, by the presence of more local minima or by pitfalls in conventional heuristics. As indicated in Fig. 7, training success could be significantly different for the classical and quantum algorithms.

FIG. 2. Simulated results of digits generated by our 8-qubit QC-AAN models, with and without basis extension, and comparable DCGANs trained on the MNIST data set of handwritten digits. The models have been selected to have a high Inception Score for their respective model type as well as being pleasant looking. The extension of QCBM samples in the QC+o/t-AAN with measurements in bases other than the computational basis clearly enhances the final results.

FIG. 2. Digits generated by our QC-AANs with 8 simulated qubits and comparable classical DCGANs. The differences in performance are only due to replacing the uninformed random prior of the DCGANs with a multi-basis QCBM. All neural network architectures are equivalent. The models shown here were selected among several training repetitions to have a high Inception Score for their respective model type. The multi-basis technique in the QC+o/t-AAN enhances the algorithm compared to the 8- and 16-bit DCGANs and the "plain vanilla" QC-AAN (i.e., without measurements in a second basis).

FIG. 3. Left: Quantitative comparison between DCGANs with a 16-bit random prior distribution and our 8-qubit QC+o/t-AAN algorithm. The experimental realization on the IonQ device includes complete implementation of the multi-basis QCBM on hardware. Where present, error bars indicate the standard deviation of 10 independent training repetitions. The 8-qubit hybrid models generally outperform the classical DCGAN with uninformed prior distribution while the neural network architectures are equivalent. Right: Images of handwritten digits generated by the experimental implementation of the QC+o/t-AAN models on the IonQ device. The QC+o-AAN model achieves a maximal Inception Score of over 9.4, while the QC+t-AAN scores over 9.5 with overall better diversity.

FIG. 6. Schematic neural network architectures of the generator G and discriminator D used throughout this work for the MNIST data set of handwritten digits. Purple color indicates a 1d layer of nodes, whereas the orange blocks represent 2d convolutional layers with 128, 64 and 32 channels. The dark color change inside each block indicates the application of a non-linear activation function after every linear transformation. G and D are approximately inverse, although this is not strictly required. Note that the second-to-last layer in D represents the latent space, which contains n bits, the same size chosen for the prior of G.

FIG. 7. Example comparison of two hyperparameter configurations for an AAN and a QC-AAN, which implement an RBM and a QCBM as the model prior, respectively. A training protocol of 5 training steps per training instance appears to introduce too much variation in the GAN prior for stable training, but unlike a QCBM, an RBM is prone to becoming unstable under sub-optimal configurations and to sampling only one distribution mode. When that happens, training of the algorithm fails.
FIG. 5. Simulation results of QC-AANs with 6 and 8 qubits relative to comparable DCGANs with uniform prior distribution. Depicted are average Inception Scores and standard deviations of 10 independent training repetitions per model. We observe no significant improvement for the 6- and 8-qubit QC-AANs, or for the QC+o/t-AANs with 6 qubits. The first improvements were observed for the 8-qubit QC+o/t-AAN, for which we refer to Fig. 3 in the main text.