Quantum State Tomography with Conditional Generative Adversarial Networks

Quantum state tomography (QST) is a challenging task in intermediate-scale quantum devices. Here, we apply conditional generative adversarial networks (CGANs) to QST. In the CGAN framework, two duelling neural networks, a generator and a discriminator, learn multi-modal models from data. We augment a CGAN with custom neural-network layers that enable conversion of output from any standard neural network into a physical density matrix. To reconstruct the density matrix, the generator and discriminator networks train each other on data using standard gradient-based methods. We demonstrate that our QST-CGAN reconstructs optical quantum states with high fidelity orders of magnitude faster, and from less data, than a standard maximum-likelihood method. We also show that the QST-CGAN can reconstruct a quantum state in a single evaluation of the generator network if it has been pre-trained on similar quantum states.

Introduction. The ability to manipulate and control small quantum systems opens up promising directions for research and technological applications: quantum information processing and computation [1][2][3][4][5], simulations of quantum chemistry [6][7][8][9][10], secure communication [11,12], and much more [6,[13][14][15][16][17][18][19][20][21][22]. A prominent example is the recent demonstration of a 53-qubit quantum computer performing a computational task in a few hundred seconds that was anticipated to take much longer on a classical supercomputer [5]. Such speedup is possible partly due to the exponentially large state space that can be used for storage and manipulation of information in quantum systems [23][24][25]. However, this large size of the state space also brings challenges for the characterization and description of these systems.
One interesting recent development in machine learning is generative adversarial networks (GANs) [74,75]. Such networks have led to an explosion of new results that were previously thought futuristic: generation of photorealistic images [76][77][78], conversion of sketches to images [76], text generation in different styles [79,80], text-to-image generation [81], generating and defending against fake news [82,83], and even game design learned from observing video [84]. An improvement of standard GANs that led to many of these results is conditional generative adversarial learning [85], which enabled increased control of the output of generative models. Recently, such GANs have been applied to tomography of materials structure with synchrotron radiation [86,87] and computed tomography of soft tissue in medicine [88].
In this Letter, we introduce QST with conditional GANs (QST-CGAN). Leveraging a CGAN architecture, complemented by custom layers for representing a quantum state in the form of a density matrix, we show that adversarial learning can be a powerful tool for QST. The QST-CGAN is different from RBM-based methods since it learns a map between the data and the quantum state instead of a probability distribution. The custom layers we introduce bridge a gap between ML and quantum information processing; they enable many further applications beyond the QST-CGAN presented here. We benchmark the QST-CGAN on reconstruction of various simulated optical quantum states, and show an example with real experimental data. The QST-CGAN performance is superior to that of a standard maximum-likelihood reconstruction method in terms of reconstruction fidelity, convergence time, and amount of measurement data required. We also show that a QST-CGAN can reconstruct quantum states in a single pass through the network if it has been pre-trained on simulated data.
Our reconstruction method is versatile, general, and ready to be applied for QST of intermediate-scale quantum systems, which are widely explored in current experiments [4]. In Refs. [89,90], we provide more details on our implementation (including data and code) and also discuss classification of quantum states with neural networks.
Quantum state tomography with maximum likelihood estimation. Quantum state tomography estimates the quantum state (a state vector |ψ⟩ or a density matrix ρ) from measurements of Hermitian operators O [28,91]. The operators are usually positive-operator-valued measures (POVMs), a set of positive semi-definite matrices {O_i} that sum to the identity, ∑_{i=1}^{k} O_i = I, representing a measurement with k possible outcomes. The probability of each outcome is given by tr(O_i ρ). A set of operators that allows for the complete characterization of a quantum state is called informationally complete (IC) [92].
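As a minimal numerical sketch (not the Letter's implementation), the POVM completeness condition and the Born-rule probabilities tr(O_i ρ) can be checked directly; the two-outcome qubit POVM and the state below are hypothetical examples:

```python
import numpy as np

# Hypothetical two-outcome projective POVM on a qubit.
O = [np.array([[1, 0], [0, 0]], dtype=complex),   # O_0 = |0><0|
     np.array([[0, 0], [0, 1]], dtype=complex)]   # O_1 = |1><1|

# Completeness: the POVM elements sum to the identity.
assert np.allclose(sum(O), np.eye(2))

# Example density matrix: the pure state (|0> + |1>)/sqrt(2).
psi = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho = np.outer(psi, psi.conj())

# Born rule: probability of each outcome is tr(O_i rho).
probs = np.array([np.trace(Oi @ rho).real for Oi in O])
assert np.allclose(probs.sum(), 1.0)  # outcome probabilities sum to 1
print(probs)                          # -> [0.5 0.5]
```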
In an experiment, single-shot measurements are repeated over an ensemble of identical states to collect statistics: the frequencies d_i of POVM outcomes. These frequencies give an estimate of the expectation values tr(O_i ρ), where ρ is the density matrix describing the state. The outcomes of many different POVMs can be combined to form a linear system of equations d = A ρ_f, where ρ_f is the flattened density matrix and A is the "sensing matrix" determined by the choice of POVMs [93]. Solving this system of equations by linear-inversion methods to obtain ρ can fail, either due to the statistical nature of the (noisy) measurement or due to a high condition number of A [93].
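A small sketch of the linear system d = A ρ_f, under the assumption that each row of A is the flattened (transposed) measurement operator, so that tr(O_i ρ) = A[i] · ρ_f. The Pauli operator set below is a hypothetical informationally complete choice for a single qubit:

```python
import numpy as np

N = 2  # qubit example

# Hypothetical informationally complete operator set: Pauli basis.
I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
ops = [I2, sx, sy, sz]

# Sensing matrix: row i is the flattened transpose of O_i, so that
# tr(O_i rho) = A[i] @ rho.flatten().
A = np.array([O.T.flatten() for O in ops])

psi = np.array([1, 1j], dtype=complex) / np.sqrt(2)
rho = np.outer(psi, psi.conj())
d = A @ rho.flatten()  # noiseless "data": the values tr(O_i rho)

# Linear inversion via least squares recovers rho in the noiseless case;
# with noisy d, a large np.linalg.cond(A) makes this inversion unreliable.
rho_f, *_ = np.linalg.lstsq(A, d, rcond=None)
assert np.allclose(rho_f.reshape(N, N), rho)
```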
An alternative to linear-inversion methods is maximum likelihood estimation (MLE). In MLE, the likelihood function [40,94] L(ρ|d) = ∏_i [tr(ρ O_i)]^{d_i} is maximized to find the best estimate ρ for reproducing the experimental data. In this Letter, we take a different approach by applying CGANs to find ρ.
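In practice one maximizes the logarithm of L(ρ|d). A minimal sketch, with hypothetical qubit data, showing that the log-likelihood prefers the state whose predicted probabilities match the observed frequencies:

```python
import numpy as np

def log_likelihood(rho, ops, counts):
    """log L(rho|d) = sum_i d_i log tr(rho O_i), the log of the MLE objective."""
    probs = np.array([np.trace(O @ rho).real for O in ops])
    return float(np.sum(counts * np.log(probs)))

# Hypothetical data: a qubit measured in the z basis, 70/30 outcome counts.
ops = [np.diag([1.0, 0.0]).astype(complex), np.diag([0.0, 1.0]).astype(complex)]
counts = np.array([70, 30])

rho_good = np.diag([0.7, 0.3]).astype(complex)  # matches the frequencies
rho_bad = np.diag([0.5, 0.5]).astype(complex)   # does not
assert log_likelihood(rho_good, ops, counts) > log_likelihood(rho_bad, ops, counts)
```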
Conditional generative adversarial networks. In generative adversarial learning, a generator G and a discriminator D compete to learn a mapping from some prior noise distribution to a data distribution [74]. The generator and the discriminator are parameterized nonlinear functions [parameters (θ D , θ G )], usually multi-layered neural networks. The generator takes an input z ∼ p z (z) from the noise distribution p z (z) and generates an output G(z; θ G ). The discriminator takes an input q and outputs a probability D(q; θ D ) that it belongs to the data distribution p data .
The parameters of G and D are optimized alternately such that the generator produces outputs that resemble the data and thus fool the discriminator, and the discriminator becomes better at detecting fake (generated)

Figure 1. Illustration of the CGAN architecture for QST. Data d sampled from measurements of a set of measurement operators {O_i} on a quantum state is fed into both the generator G and the discriminator D. The other input to D is the generated statistics from G. The next-to-last layer of G outputs a physical density matrix and the last layer computes measurement statistics using this density matrix. The discriminator compares the measurement data and the generated data for each measurement operator and outputs a probability that they match.
output. In each optimization step, θ_D is updated to maximize the expectation value

E_{y∼p_data}[log D(y; θ_D)] + E_{z∼p_z}[log(1 − D(G(z; θ_G); θ_D))],   (1)

where y denotes samples from the data. Then, θ_G is updated to minimize

E_{z∼p_z}[log(1 − D(G(z; θ_G); θ_D))].   (2)

In this way, the generator learns to map elements from a noise distribution to data as G : z → y [74,76]. However, since the generator input is random, we have no control over the output. This issue is solved by using a conditional generative adversarial network (CGAN) [76,85]. In a CGAN, the generator and discriminator outputs are conditioned on some variable x. This conditioning allows the generator to learn the mapping G : x, z → y [76]. The optimization of parameters for the CGAN is done as before, by maximizing Eq. (1) and minimizing Eq. (2); the only difference is that the outputs now are D(x, y; θ_D) and G(x, z; θ_G). This CGAN approach is very flexible and can be used to find complex maps between inputs and outputs. The flexibility stems from using the discriminator network for evaluation instead of, or in addition to, a simpler loss function.
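As a concrete illustration, the two adversarial objectives can be evaluated on a toy one-dimensional example; the logistic discriminator, affine generator, and all parameter values below are hypothetical choices for illustration only, not the networks used in the Letter:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def D(q, theta_D):
    """Toy discriminator: logistic probability that q came from the data."""
    w, b = theta_D
    return sigmoid(w * q + b)

def G(z, theta_G):
    """Toy generator: affine map applied to noise samples z."""
    return theta_G[0] * z + theta_G[1]

theta_D, theta_G = (1.0, 0.0), (1.0, 2.0)
y = rng.normal(2.0, 1.0, size=256)  # "data" samples y ~ p_data
z = rng.normal(0.0, 1.0, size=256)  # noise samples z ~ p_z

# Discriminator objective (maximized over theta_D), cf. Eq. (1):
disc_obj = np.mean(np.log(D(y, theta_D))) + \
           np.mean(np.log(1 - D(G(z, theta_G), theta_D)))
# Generator objective (minimized over theta_G), cf. Eq. (2):
gen_obj = np.mean(np.log(1 - D(G(z, theta_G), theta_D)))
assert np.isfinite(disc_obj) and np.isfinite(gen_obj)
```

In an actual training loop these two expectations are estimated per batch and the parameters are updated alternately by gradient steps, as described above.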
Quantum state tomography using conditional generative adversarial networks. We now adapt the CGAN framework to the problem of QST. In our approach, illustrated in Fig. 1, the conditioning input to the generator is the measurement statistics and the measurement operators (x → d, {O_i}). The generated output is a density matrix ρ_G. We find that we do not need to provide any input noise z, consistent with the results in Ref. [76].
The discriminator takes as input the experimental measurement statistics d (as the conditioning variable) and generated measurement statistics calculated from tr(O i ρ G ). The output from the discriminator is a set of numbers describing how well the generated measurement statistics match the data. This partitioning of the evaluation of the generated statistics is inspired by the PatchGAN architecture of Ref. [76]. If the generator has managed to learn the correct density matrix, the discriminator will not be able to distinguish the generated statistics from the true data.
The adaptation of the CGAN architecture to QST requires us to introduce two custom layers at the end of the generator neural network. First, we add a DensityMatrix layer, which takes the unconstrained intermediate output of the generator, moulds it into a lower-triangular complex-valued matrix T_G with real entries on the diagonal, constructs T†_G T_G, and normalizes the resulting matrix to have unit trace. This method is inspired by the Cholesky decomposition [40]. It ensures that the output ρ_G is a valid density matrix: Hermitian, positive, and having unit trace. A similar idea was found independently in Ref. [73].
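A minimal numpy sketch of such a Cholesky-style layer (a standalone function here, rather than a trainable-framework layer; the packing of the raw parameters into T is one possible convention, assumed for illustration):

```python
import numpy as np

def density_matrix_layer(raw, N):
    """Map N*N unconstrained reals to a valid density matrix rho = T^dag T / tr.

    `raw` holds N real diagonal entries of T followed by the real and
    imaginary parts of its N(N-1)/2 strictly-lower-triangular entries.
    """
    T = np.zeros((N, N), dtype=complex)
    T[np.diag_indices(N)] = raw[:N]              # real diagonal
    il = np.tril_indices(N, k=-1)
    n_off = il[0].size
    T[il] = raw[N:N + n_off] + 1j * raw[N + n_off:]
    rho = T.conj().T @ T                         # Hermitian, positive by construction
    return rho / np.trace(rho).real              # normalize to unit trace

rng = np.random.default_rng(0)
N = 4
rho = density_matrix_layer(rng.normal(size=N * N), N)
assert np.allclose(rho, rho.conj().T)            # Hermitian
assert np.isclose(np.trace(rho).real, 1.0)       # unit trace
assert np.linalg.eigvalsh(rho).min() >= -1e-12   # positive semi-definite
```

Because the map is differentiable in `raw`, gradients can flow through it during training, which is what allows the layer to sit at the end of a standard neural network.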
Secondly, we add an Expectation layer that combines the output ρ G with the given measurement operators {O i } to compute the generated measurement statistics for each measurement outcome as tr(O i ρ G ). These two custom layers do not have any trainable parameters. They are only present to enforce the rules of quantum mechanics in the neural networks. This is akin to regularization [95] and normalization [96] in neural networks. We note that our two custom layers could be used to augment any deep-learning neural-network architecture for QST, e.g., Refs. [72,73].
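The Expectation layer is simply the Born rule applied operator by operator; a sketch (again as a plain function, with a maximally mixed state as a hypothetical sanity check):

```python
import numpy as np

def expectation_layer(rho, ops):
    """Generated measurement statistics tr(O_i rho), one value per operator."""
    return np.array([np.trace(O @ rho).real for O in ops])

# Sanity check: for the maximally mixed state and a projective POVM,
# every outcome is equally likely.
N = 4
rho_mixed = np.eye(N, dtype=complex) / N
projectors = [np.diag((np.arange(N) == n).astype(complex)) for n in range(N)]
stats = expectation_layer(rho_mixed, projectors)
assert np.allclose(stats, 1 / N)
```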
We train the QST-CGAN using standard gradient-based optimization techniques, e.g., Adam [97] with learning-rate scheduling, starting from random initial values for the parameters (θ_D, θ_G). In this way, data from one experiment can be used to estimate the density matrix of the state in that experiment. However, when reconstructing ρ from another experiment, the QST-CGAN must start from zero again. We can avoid this reset by pre-training on simulated data corresponding to the type of state(s) and noise that is expected to be present in the experiment. The reconstruction from experimental data then requires less additional training; it even becomes possible to do single-shot reconstruction with a single evaluation by the pre-trained generator.
We note that adding an L1 loss to Eq. (2), as suggested in Ref. [76], proved helpful in training the QST-CGAN [89], and was used for all results displayed below, but was not necessary to obtain good results. Similarly, adding a gradient penalty [98] to Eq. (1) improved results for single-shot reconstruction.
Benchmarking CGAN quantum state tomography. To benchmark the QST-CGAN method, we test it on reconstruction of optical quantum states and compare its performance to a standard MLE method: iterative MLE (iMLE) [41]. In iMLE, projection operators determined by the measurement statistics are iteratively applied to a random initial density matrix until convergence. The final result is an estimated density matrix ρ that maximizes the likelihood function L(ρ |d).
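A minimal numpy sketch of one common iMLE variant (the R ρ R update; this is an illustration of the general scheme, not necessarily the exact algorithm of Ref. [41], and the qubit data below are hypothetical):

```python
import numpy as np

def imle(ops, freqs, N, iters=200):
    """Iterative MLE: repeatedly apply R = sum_i (f_i / p_i) O_i to rho."""
    rho = np.eye(N, dtype=complex) / N  # random/mixed initial density matrix
    for _ in range(iters):
        probs = np.array([np.trace(O @ rho).real for O in ops])
        R = sum(f / max(p, 1e-12) * O for f, p, O in zip(freqs, probs, ops))
        rho = R @ rho @ R               # projection-operator update
        rho /= np.trace(rho).real       # renormalize to unit trace
    return rho

# Qubit demo: projective measurements along z and x, ideal frequencies
# generated from the target state |+><+|.
P = lambda v: np.outer(v, v.conj())
z0, z1 = np.array([1, 0], complex), np.array([0, 1], complex)
x0, x1 = np.array([1, 1], complex) / np.sqrt(2), np.array([1, -1], complex) / np.sqrt(2)
ops = [P(z0), P(z1), P(x0), P(x1)]
true = P(x0)
freqs = [np.trace(O @ true).real for O in ops]

est = imle(ops, freqs, 2)
fidelity = np.trace(est @ true).real    # tr(rho_est |+><+|) for a pure target
assert fidelity > 0.99
```

Each iteration requires computing all probabilities tr(O_i ρ) and a few matrix products, which is the per-iteration cost the convergence comparison below refers to.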
Optical quantum states describe quantized single-mode electromagnetic fields (harmonic oscillators). Our choice of optical quantum states for testing the QST-CGAN was motivated by the existence of visual representations, e.g., Wigner functions, for these states, since CGANs have mainly been applied to image processing. However, we stress that the QST-CGAN approach is general and can be applied to any type of quantum system with any type of observable [89].
Some of the common observables for optical quantum states are instances of a displace-and-measure technique. For example, the photon-number distribution obtained after applying a displacement β is the generalized Q function [99]: Q^β_n = tr[|n⟩⟨n| D(−β) ρ D†(−β)], where |n⟩ is the Fock state with n photons, D(β) = exp(βa† − β*a) is the displacement operator, and a (a†) is the bosonic annihilation (creation) operator of the electromagnetic mode. The Husimi Q function (photon field quadratures) is (1/π)Q^β_0 and the Wigner function (photon parity) is W(β) = (2/π) ∑_n (−1)^n Q^β_n. The measurement data we consider in the following are samples of Q^β_0 and W(β) at certain β, as illustrated in Fig. 2.
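These quantities can be computed numerically in a truncated Fock space; the sketch below (with an assumed cutoff, and truncation errors that grow with |β|) implements D(β) via eigendecomposition of its Hermitian generator and checks the vacuum-state values:

```python
import numpy as np

N_cut = 40  # assumed Fock-space cutoff for this illustration
a = np.diag(np.sqrt(np.arange(1, N_cut)), k=1)  # annihilation operator

def displacement(beta):
    """D(beta) = exp(beta a^dag - beta* a); the exponent is anti-Hermitian,
    so we exponentiate via the eigendecomposition of H = -i * exponent."""
    M = beta * a.conj().T - np.conj(beta) * a
    w, V = np.linalg.eigh(-1j * M)
    return V @ np.diag(np.exp(1j * w)) @ V.conj().T

def generalized_q(rho, beta):
    """Q^beta_n = <n| D(-beta) rho D(-beta)^dag |n> for all n."""
    Dm = displacement(-beta)
    return np.real(np.diag(Dm @ rho @ Dm.conj().T))

def wigner(rho, beta):
    """W(beta) = (2/pi) sum_n (-1)^n Q^beta_n (displaced photon parity)."""
    q = generalized_q(rho, beta)
    return (2 / np.pi) * np.sum((-1) ** np.arange(N_cut) * q)

# Vacuum state: W(0) = 2/pi and the Husimi value (1/pi) Q^0_0 = 1/pi.
rho_vac = np.zeros((N_cut, N_cut), dtype=complex)
rho_vac[0, 0] = 1.0
assert np.isclose(wigner(rho_vac, 0.0), 2 / np.pi)
assert np.isclose(generalized_q(rho_vac, 0.0)[0], 1.0)
```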
A state ρ in a truncated Hilbert space of size N is specified by up to N^2 − 1 real numbers [93,100] (we use N = 32). Thus, in general, IC requires displacements and measurements to be carried out such that d has at least N^2 − 1 elements. However, note that the number of elements in d required for reconstruction can be lower, ∝ rN, if ρ has low rank r [101].
Results. In Fig. 3(a), we compare the reconstruction fidelity for the QST-CGAN and iMLE methods as a function of the number of iterations. One iteration is one update of all the weights (θ_D, θ_G) for the QST-CGAN (a single gradient-descent step) and one application of the projection operators in iMLE. We find that the QST-CGAN converges to a fidelity > 0.999 in about two orders of magnitude fewer iterations (approximately one order of magnitude less time) than the iMLE. Note that the choice of network architecture and training parameters will affect the speed of convergence and the computational cost of one iteration for the QST-CGAN. Next, we investigate, in Fig. 3(b), how many data points are required as input to reach high reconstruction fidelity. We find that the QST-CGAN approach starts outperforming the iMLE around N = 32 data points and reaches fidelities close to unity already with < 100 data points, while the iMLE requires ∼ 1000 data points to attain good fidelity (an RBM-based reconstruction of a similar state also requires thousands of data points to reach high fidelity [68]). Note that the rank r = 1, since ρ is a pure state.
Experimental state reconstruction from parity measurements. The benchmarking of the QST-CGAN so far has been on simulated data. We now demonstrate, in Fig. 4, that our QST-CGAN can reconstruct a noisy state from experimental data. In this particular experiment, a superconducting transmon qubit was used to generate a Wigner-negative state in a resonator [102], by applying a selective number-dependent arbitrary phase (SNAP) [103,104] of π to |0⟩ and |1⟩ of a coherent state |α = 1⟩. Despite significant state-preparation-and-measurement (SPAM) noise, the QST-CGAN still manages to reconstruct the data well from measurements of the Wigner function, even when using only ∼ 15 % of the measurement data.
Figure 4 (caption excerpt). The data outside the circle, e.g., the Wigner-negative region in the top left, is not as reliable due to measurement calibration problems at higher photon numbers. We also attempt reconstruction with a subset of the data points inside the circle, and find that ∼ 600 data points are enough to achieve a fidelity ∼ 0.9 with the full reconstruction.

Single-shot reconstruction with pre-training. We now pre-train the QST-CGAN on a data set with several thousand cat states similar to Fig. 2, selecting |α| ∈ [1, 3] randomly with up to six coherent states in superposition. As shown in Fig. 5(a), this pre-trained network is then able to perform single-shot reconstructions for different cat states with a high average fidelity ∼ 0.98. It turned out to be difficult to find a learning strategy enabling further improvement of the fidelity with just a few more iterations for each state, but with tens of iterations a clear improvement is observed [Fig. 5(b)]. The pre-trained network thus does not have to iterate many times from an initial random guess for each state, as is the case for the results in Fig. 3 and most other reconstruction methods in use today, resulting in a reconstruction four orders of magnitude faster than in Fig. 3(a).
Conclusion and outlook. In this Letter, we have adapted the CGAN architecture for use in quantum state tomography. The adaptation relies on the introduction of two custom layers, which enforce the properties of a density matrix and allow calculation of expectation values of measurements. We showed that our QST-CGAN clearly outperforms the standard reconstruction method iMLE: the QST-CGAN consistently reconstructs states with higher fidelity, needing ∼ 100× fewer iterations and ∼ 10× fewer data points to do so in the examples we showed. Furthermore, we showed that we can pre-train the QST-CGAN on classes of quantum states and achieve high fidelity for single-shot reconstruction.
Looking to the future, we note that the custom layers we introduced could be included into other types of neural networks, e.g., Transformers [72], for both QST and other applications in quantum information processing. The CGAN approach has potential for denoising measurement data by pre-training on simulated noisy data. We further envisage the application of QST-CGAN for adaptive tomography [30,59], by choosing next measurements around the points where the discriminator finds that the reconstructed data does not match the experimental data well [67].