Variational Denoising for Variational Quantum Eigensolver

The variational quantum eigensolver (VQE) is a hybrid algorithm that has the potential to provide a quantum advantage in practical chemistry problems that are currently intractable on classical computers. VQE trains parameterized quantum circuits using a classical optimizer to approximate the eigenvalues and eigenstates of a given Hamiltonian. However, VQE faces challenges in task-specific design and machine-specific architecture, particularly when running on noisy quantum devices. This can have a negative impact on its trainability, accuracy, and efficiency, resulting in noisy quantum data. We propose variational denoising, an unsupervised learning method that employs a parameterized quantum neural network to improve the solution of VQE by learning from noisy VQE outputs. Our approach can significantly decrease energy estimation errors and increase fidelities with ground states compared to noisy input data for the H2 and LiH molecular Hamiltonians, and surprisingly only requires noisy data for training. Variational denoising can be integrated into quantum hardware, increasing its versatility as an end-to-end quantum processing for quantum data.

Introduction.- The term "quantum advantage" is derived from the promise of a quantum computer to solve practical problems significantly faster than even the best classical computer. While fault-tolerant quantum computers are expected to provide a quantum advantage, noisy effects in relevant quantum hardware necessitate a large number of qubits for useful and reliable computation. Nevertheless, noisy intermediate-scale quantum (NISQ) devices [1] continue to excite the research community by promising to outperform classical computers in certain mathematical tasks [2-5]. Following the success of classical neural networks (NNs) with noise-robust trainability for various practical machine learning problems, there has recently been renewed optimism that quantum NNs on NISQ computers can lead to quantum advantage. Indeed, a quantum speedup has been demonstrated in an impractical supervised learning problem using an artificial and engineered data set [6]. There has also been progress in identifying complex distributions that classical NNs are provably inefficient in expressing but that quantum NNs can express efficiently [7].
Training parameterized quantum circuits (PQCs) with variational quantum algorithms (VQAs) is an important factor in NISQ devices [8]. In a VQA, an optimization problem is designed with a loss function computed from the expectation values of observables in a PQC. Classical computers are used to minimize this loss function by adjusting the circuit's parameters. However, the trainability of PQCs can be hampered by factors such as the instability of stochastic optimizers in complex loss landscapes, quantum errors, and systematic errors in experimental setups. The well-known barren plateau phenomenon [9], where the loss landscape is effectively flat, can occur under certain conditions such as deep quantum circuits, nonlocality of the observable defining the loss [10], and excessive quantum entanglement [11]. Even when shallow and local quantum models with local cost functions are used to avoid the barren plateau, a concentration of poor local optima far from the global optimum can lead to untrainability [12,13]. Furthermore, factors such as quantum noise, task-specific circuit design, and hyperparameter settings can have a significant impact on training performance.
Two representative methods in VQAs are quantum approximate optimization algorithms (QAOAs) [14] and variational quantum eigensolvers (VQEs) [15]. QAOAs are proposed for combinatorial optimization problems, such as finding the maximum cut or the largest complete subgraph of a graph. VQEs are designed for applications in quantum chemistry and material science to determine the lowest energy level and ground state of a Hamiltonian. Since identifying exact solutions to such problems on classical computers incurs a computational cost that grows exponentially with the system's size, VQEs hold the promise of demonstrating the quantum advantage. Despite the successful implementation of VQE on small-scale problems in quantum devices [16], large-scale implementations remain a challenge [17].
VQE generates quantum states $|\psi(\zeta)\rangle = V(\zeta)|\psi\rangle$ by applying a unitary $V(\zeta)$ with parameters $\zeta$ to some arbitrary starting state $|\psi\rangle$. Given a Hermitian operator $\hat{H}$, VQE provides an estimate of the ground state $|\psi_{\min}\rangle$, which corresponds to the unknown minimum eigenvalue (lowest energy) $E_{\min}$ of $\hat{H}$. The energy estimate $E_\zeta$ of VQE is bounded as
$$E_{\min} \le E_\zeta = \langle\psi(\zeta)|\hat{H}|\psi(\zeta)\rangle,$$
and is then iteratively minimized over $\zeta$ by classical optimizers. However, the estimation of $E_\zeta$ can be a barrier to achieving near-term practical quantum advantages, since the number of quantum measurements required can be prohibitive [18]. This can prevent VQE from being run for a large number of iterations [12]. Furthermore, stochasticity in the optimizer and noise in quantum devices can result in different final states $|\psi(\zeta)\rangle$, denoted $|\psi_1\rangle, \ldots, |\psi_N\rangle$. These states are considered noisy data providing only partial information about the target ground state.
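The variational bound above can be checked numerically. The following is a minimal NumPy sketch, not the VQE circuit of this work: the Hermitian matrix `H` is a randomly generated stand-in for a qubit Hamiltonian, and random normalized vectors play the role of the trial states $|\psi(\zeta)\rangle$. Both are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-qubit Hermitian operator standing in for a qubit Hamiltonian.
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (A + A.conj().T) / 2

# Exact lowest eigenvalue (eigvalsh returns eigenvalues in ascending order).
E_min = np.linalg.eigvalsh(H)[0]

def energy(state, hamiltonian):
    """Expectation value <psi|H|psi> of a normalized state vector."""
    return float(np.real(state.conj() @ hamiltonian @ state))

# Random normalized trial states play the role of |psi(zeta)>.
trial_energies = []
for _ in range(100):
    psi = rng.normal(size=4) + 1j * rng.normal(size=4)
    psi /= np.linalg.norm(psi)
    trial_energies.append(energy(psi, H))

# Every trial energy lies at or above the exact minimum eigenvalue.
print(E_min, min(trial_energies))
```

No trial state falls below `E_min`, which is why minimizing the measured energy over the circuit parameters is a sensible training objective.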
FIG. 1. Variational denoising by Quantum Autoencoder (QAE). Quantum states prepared by VQE circuits are considered "noisy" states. QAE trains encoding and decoding parameterized quantum circuits to compress each noisy state into a subsystem so that the state recovered from the reduced states is as close as possible to other noisy states. QAE is generalized from the classical feedforward NN, where each neuron corresponds to a qubit while unitary circuits connect neurons in subsequent coupled layers (upper panel). The number of neurons is reduced in the hidden layer to compress the input state. The lower panel describes an example of a circuit-based implementation for a QAE [4,2,1,2,4].
Here, we propose a method for learning the target state from noisy VQE data. This idea can be realized using the quantum extension of the autoencoder framework [19], in which feedforward NNs are trained to reproduce the input at the output layer. In the autoencoder, the number of nodes in the hidden layer is typically less than the number of nodes in the input and output layers, indicating that some essential information is retained for the reconstruction. The quantum autoencoder (QAE) is described in Ref. [20] with applications in compressing and denoising a specific quantum state [21-24]. Our proposal is the first hardware-efficient QAE for learning from noisy quantum data to adapt and correct errors in VQEs.
Quantum Autoencoder (QAE).- The aim of a QAE is to generate an encoder $\mathcal{E}$ to map the $n$-qubit input state to a $k$-qubit ($k < n$) reduced state, and a decoder $\mathcal{D}$ to reconstruct the input state from the reduced state. $\mathcal{E}$ and $\mathcal{D}$ are implemented using PQCs with parameters $\theta_e$ and $\theta_d$, respectively. They are trained to maximize the expected fidelity of the input state $\rho$ with the output of the QAE over the input data distribution $\mathcal{C}$:
$$\max_{\theta_e,\theta_d} \; \mathbb{E}_{\rho\sim\mathcal{C}}\left[F\big(\rho, \mathcal{D}(\mathcal{E}(\rho))\big)\right]. \tag{1}$$
Here, the fidelity between two density matrices $\rho$ and $\sigma$ is defined as
$$F(\rho,\sigma) = \left(\mathrm{tr}\sqrt{\sqrt{\rho}\,\sigma\sqrt{\rho}}\right)^2.$$
In practice, given the training data $\rho_1, \ldots, \rho_N$ sampled from $\mathcal{C}$, Eq. (1) reduces to maximizing the empirical cost
$$\frac{1}{N}\sum_{i=1}^{N} F\big(\rho_i, \mathcal{D}(\mathcal{E}(\rho_i))\big).$$
The upper panel of Fig. 1 shows an example of a QAE based on a dissipative quantum NN [25,26]. Here, a dissipative quantum NN generalizes the classical feedforward NN, with each neuron representing a qubit and unitary circuits connecting neurons in subsequent coupled layers. Qubits in each layer are dissipated (traced out) after the forward process to the next layer, allowing the reduced space for the QAE to be constructed. Consider the QAE with structure $[m_1, m_2, \ldots, m_M]$ ($M \ge 2$). For the $j$th neuron in layer $i+1$, we denote by $U^i_j$ the parameterized unitary acting on its own qubit and the neurons in the preceding layer. The unitary between layers $i$ and $i+1$ is denoted by $U^i = \prod_j U^i_j$. We denote by $\beta^i$ the state at the $i$th layer with $m_i$ neurons. Therefore, the transformation between $\beta^i$ and $\beta^{i+1}$ can be expressed via the channel
$$\beta^{i+1} = \mathcal{Q}^i(\beta^i) = \mathrm{tr}_{i}\!\left[U^i\left(\beta^i \otimes |0\cdots0\rangle_{i+1}\langle 0\cdots0|\right)U^{i\dagger}\right].$$
Here, $\mathcal{Q}^i$ maps an $m_i$-qubit state to an $(m_i + m_{i+1})$-qubit state before reducing it to an $m_{i+1}$-qubit state by tracing out the qubits in layer $i$. For example, the structure [4,2,1,2,4] corresponds to a QAE with $n = 4$ and $k = 1$.
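A single layer-to-layer channel of the dissipative network can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions rather than the circuit used in this work: the layer unitary is one Haar-like random unitary on all $m_i + m_{i+1}$ qubits (instead of a product of trained unitaries $U^i_j$), and the helper names `random_unitary` and `layer_channel` are hypothetical.

```python
import numpy as np

def random_unitary(dim, rng):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # rescale columns by unit-modulus phases

def layer_channel(beta, U, m_in, m_out):
    """Attach m_out qubits in |0...0>, apply U, then trace out the m_in input qubits."""
    d_in, d_out = 2 ** m_in, 2 ** m_out
    zero = np.zeros((d_out, d_out), dtype=complex)
    zero[0, 0] = 1.0
    joint = U @ np.kron(beta, zero) @ U.conj().T
    # Indices after reshape: (in, out, in', out'); the partial trace sums in == in'.
    joint = joint.reshape(d_in, d_out, d_in, d_out)
    return np.einsum("iaib->ab", joint)

rng = np.random.default_rng(1)
m_in, m_out = 2, 1  # e.g. the step from layer [2] to layer [1]

# Random pure input state on the m_in qubits.
psi = rng.normal(size=2 ** m_in) + 1j * rng.normal(size=2 ** m_in)
psi /= np.linalg.norm(psi)
beta = np.outer(psi, psi.conj())

U = random_unitary(2 ** (m_in + m_out), rng)
beta_next = layer_channel(beta, U, m_in, m_out)
print(beta_next.shape, np.trace(beta_next).real)
```

The output is a valid density matrix on the smaller layer, which is the compression step that the encoder repeats layer by layer.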
The encoder $\mathcal{E}$ comprises two channels applied in sequence: the first channel $\mathcal{Q}^1$ extends a 4-qubit state (layer 1) into a 6-qubit state and then reduces it to a 2-qubit state (layer 2), and the second channel $\mathcal{Q}^2$ extends the 2-qubit state into a 3-qubit state and then reduces it to a 1-qubit state (layer 3). The decoder $\mathcal{D}$, built from $\mathcal{Q}^3$ and $\mathcal{Q}^4$, works inversely on this 1-qubit state to reconstruct the final 4-qubit state. The QAE can be implemented in a quantum circuit whose number of qubits is $\max_i (m_i + m_{i+1})$ (lower panel in Fig. 1). Implementations of QAE in Refs. [21,22,24] optimize over the space of unitary operators, limiting the scale of experimentation. We present an ansatz-based implementation of QAE that is hardware efficient and scales well with the number of qubits. Here, each unitary $U^i_j$ consists of L blocks and final parameterized rotation gates R on every qubit. Each block contains parameterized rotation gates on each qubit as well as two-qubit entangling gates (Fig. 2). The entangling gates can be nonparametric controlled gates or parametric controlled-rotation gates, which are arranged circularly according to the qubit indices. The number of parameters scales only linearly with the number of qubits and ansatz blocks.
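The block structure of one ansatz unitary can be illustrated with a small NumPy construction. This is a sketch under assumptions, not the authors' circuit code: it uses $R_Y$ rotations with nonparametric controlled-Z entanglers, takes the circular arrangement to mean pairs $(q, q+1 \bmod n)$, and uses a single CZ for two qubits (a literal two-qubit ring would apply CZ twice and cancel). The function names are hypothetical.

```python
import numpy as np

def ry(theta):
    """Single-qubit rotation R_Y(theta) = exp(-i * Y * theta / 2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def ry_layer(thetas):
    """Tensor product of R_Y rotations, one per qubit."""
    layer = np.eye(1, dtype=complex)
    for theta in thetas:
        layer = np.kron(layer, ry(theta))
    return layer

def cz_ring(n):
    """Controlled-Z gates on pairs (q, q+1 mod n), returned as one diagonal matrix."""
    if n < 2:
        pairs = []
    elif n == 2:
        pairs = [(0, 1)]  # assumption: a 2-qubit ring uses one CZ, not two
    else:
        pairs = [(q, (q + 1) % n) for q in range(n)]
    diag = np.ones(2 ** n, dtype=complex)
    for idx in range(2 ** n):
        bits = [(idx >> (n - 1 - q)) & 1 for q in range(n)]
        for a, b in pairs:
            if bits[a] and bits[b]:
                diag[idx] *= -1
    return np.diag(diag)

def ansatz_unitary(params):
    """params has shape (L + 1, n): L blocks plus the final rotation layer."""
    n_blocks, n = params.shape[0] - 1, params.shape[1]
    entangler = cz_ring(n)
    U = np.eye(2 ** n, dtype=complex)
    for block in range(n_blocks):
        # One block: rotations on every qubit, then the circular entanglers.
        U = entangler @ ry_layer(params[block]) @ U
    return ry_layer(params[-1]) @ U

rng = np.random.default_rng(0)
n_qubits, L = 3, 2
params = rng.uniform(0, 2 * np.pi, size=(L + 1, n_qubits))
U = ansatz_unitary(params)
print(U.shape, params.size)
```

With nonparametric entanglers, this sketch has $(L+1)\,n$ rotation angles for an $n$-qubit unitary, which illustrates the linear scaling in the number of qubits and blocks.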
Denoising QAE.- Since QAE discards irrelevant information from the input in the compressed state, it can be used for denoising quantum data. This concept is inspired by the classical denoising autoencoder [28], which is trained to reconstruct a clean input from a corrupted version. In Ref. [23], noise-free logical states are used as target states to train the QAE to remove noise from states in a predefined codespace. However, because of the no-cloning theorem, preparing copies of the target state as a reference state in quantum training is often challenging. We can overcome this problem by using pairs of input and target states from the same noisy data source, such as a quantum state with bitflip errors and small random unitary transformations [21]. However, current QAE denoising applications are limited to a few specific quantum states, such as GHZ or W states [21]. In this study, we show how QAE can be applied to VQEs, where quantum noise and the stochastic behavior of optimizers can result in noisy final states.
FIG. 2. The ansatz to implement each unitary $U^i_j$ in the circuit in Fig. 1. Each unitary includes L ansatz blocks and final parameterized rotation gates R on every qubit. Each block includes parameterized rotation gates on every qubit and two-qubit entangling gates, which are arranged circularly according to the qubit indices. The entangling gates can be nonparametric controlled gates or parametric controlled-rotation gates [27].
Let us consider a final quantum state $\rho$ in a VQE running with a specific setting, such as initialization, the number of iterations for the optimizer, and a quantum noise condition. We can assume that $\rho$ is generated by a distribution $\mathcal{C}_{\rho^*}$ based on the target state $\rho^*$. For example, $\rho^*$ is the ground state, and $\mathcal{C}_{\rho^*}$ is the process that yields the final VQE solution after a fixed number of iterations in a stochastic optimizer. The training procedure attempts to maximize the empirical loss
$$\frac{1}{N}\sum_{i=1}^{N} F\big(\rho^B_i, \mathcal{D}(\mathcal{E}(\rho^A_i))\big) \tag{4}$$
given N paired training states $(\rho^A_i, \rho^B_i)$, where $\rho^A_i$ and $\rho^B_i$ are drawn from $\mathcal{C}_{\rho^*}$. We consider $\mathcal{C}_{\rho^*}$ to be a noisy data source, which can generate different realizations of the states. If these states share essential features of $\rho^*$, the trained QAE can output a state with a high overlap with the target state. In our study, we employ simultaneous perturbation stochastic approximation (SPSA) [29] optimization to train Eq. (4). SPSA is efficient because the number of measurements is independent of the total number of training parameters (see more details of the training in [27]).
Results.- We apply QAE to denoise states generated by the VQE algorithm. The noisy data can be considered as the state generated from a noisy quantum channel applied to the VQE circuits (the first experiment) or the result of early stopping in the optimization process of VQE (the second experiment). In [27], we also consider noisy data generated from the combination of quantum noise and early-stopping optimization or from different ansatzes of the VQE circuit. We perform numerical experiments using Qiskit [30] simulation for H$_2$ and LiH molecules, in which the molecular Hamiltonians presented in the Slater-type orbital (STO-3G) basis are mapped to qubit Hamiltonians with 2 (for H$_2$) and 4 (for LiH) qubits, respectively [31,32]. For VQE circuits, we construct hardware-efficient circuits comprising single-qubit operations spanned by SU(2) and two-qubit controlled-X entangling gates [33]. Here, we use rotation operators of Pauli Y and Z as single-qubit gates [27].
In the first experiment, we consider the process $\mathcal{C}_{\rho^*}$ as a noise channel applied to a VQE circuit after performing SPSA optimization for 250 iterations for H$_2$ and 1000 iterations for LiH. We consider two noise channels with relatively large bitflip and depolarization errors. In the bitflip channel, every qubit is flipped with a probability of 0.2. In the depolarization channel, a single-qubit depolarizing channel is applied to all qubits with a probability of 0.2. We construct the QAE [2,1,2] with L = 1 ansatz block (for H$_2$) and the QAE [4,1,4] with L = 3 ansatz blocks (for LiH), where the rotation gate R is the parameterized Pauli Y rotation $R_Y(\theta) = e^{-iY\theta/2}$ and the nonparametric two-qubit entangling gates are controlled-Z. At each bond length in the molecule, we train the QAE with 200 pairs of noisy VQE data. We generate 1000 noisy samples for testing, and test with the 200 samples of the highest energy. Figure 3 shows the energy of noisy and denoised data for H$_2$ and LiH molecules at various bond lengths and noise types. The inset figures depict the energy difference $|\Delta E|$ between these energies and the ground state energy. Interestingly, even when using noisy data for training, our method improves the ground state energy estimation of H$_2$ and LiH on average, with small fluctuations, for both types of noise in VQE circuits.
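The two noise models can be written as single-qubit Kraus channels acting on a density matrix. The NumPy sketch below is illustrative only and is not the Qiskit noise model used in the experiments; in particular, the depolarizing convention $\rho \mapsto (1-p)\rho + p\,I/2$ is an assumption, since conventions differ between references, and the helper names are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def apply_on_each_qubit(rho, kraus_ops, n):
    """Apply the same single-qubit Kraus channel independently to each of n qubits."""
    for target in range(n):
        new_rho = np.zeros_like(rho)
        for K in kraus_ops:
            full = np.eye(1, dtype=complex)
            for q in range(n):
                full = np.kron(full, K if q == target else I2)
            new_rho += full @ rho @ full.conj().T
        rho = new_rho
    return rho

def bitflip(rho, p, n):
    """Flip each qubit with probability p."""
    return apply_on_each_qubit(rho, [np.sqrt(1 - p) * I2, np.sqrt(p) * X], n)

def depolarize(rho, p, n):
    """Assumed convention: rho -> (1 - p) * rho + p * I/2 on each qubit."""
    kraus = [np.sqrt(1 - 3 * p / 4) * I2] + [np.sqrt(p / 4) * P for P in (X, Y, Z)]
    return apply_on_each_qubit(rho, kraus, n)

n = 2
rho0 = np.zeros((4, 4), dtype=complex)
rho0[0, 0] = 1.0  # the state |00><00|

rho_bf = bitflip(rho0, 0.2, n)
rho_dp = depolarize(rho0, 0.2, n)

# Population remaining in |00>: 0.8**2 for bitflip, 0.9**2 for this depolarizing convention.
print(rho_bf[0, 0].real, rho_dp[0, 0].real)
```

Even for two qubits, a per-qubit error probability of 0.2 removes roughly a third of the population from the original basis state, which gives a sense of how strong the noise in this experiment is.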
In the second experiment, we consider $\mathcal{C}_{\rho^*}$ as the early-stopping process for VQE, taking the output of the VQE circuit after 10 and 500 SPSA iterations for H$_2$ and LiH, respectively. Similar to the previous experiment, we construct the QAE [2,1,2] for H$_2$ and the QAE [4,1,4] for LiH with L = 1, 3, 5 ansatz blocks, where L and the type of entangling gates (controlled-X or controlled-Z) are chosen to maximize the training cost. For each bond length, we train the QAE with 200 pairs of noisy VQE data generated with different random seeds of the SPSA optimizer. We then generate 1000 noisy samples and test with the 200 samples of the highest energy to see the denoising effect. Figure 4 shows the energy of the noisy and denoised data, as well as the fidelities with the ground state at each bond length. Even with noisy data from the early termination of the VQE routine, QAE can reduce the energy error and the deviation in energy estimation while increasing the fidelity to the ground state. The variance in the fidelity and energy estimation can be further reduced by using the parametric controlled-rotation ansatz in the entangling gates [27]. The decreased fidelity for bond lengths longer than 2 Å is attributed to the small energy gap between the ground state and the first excited state, which makes it difficult for the original VQE circuits to isolate the ground state.
Finally, to comprehend the mechanism of the denoising QAE, we investigate the fidelities between the outputs of subsystems of the QAE and the corresponding parts of the target state as data is forwarded. Here, we consider $\mathcal{C}_{\rho^*}$ as the early-stopping process for the VQE at 50 and 100 SPSA iterations for LiH, and the target state is the ground state of LiH at bond length 1.4 Å. The QAE [4,1,4] with L = 1 ansatz block is trained on 200 noisy data pairs and tested on 1000 noisy data samples. As shown in Fig. 5, when the input state is forwarded through the QAE, a portion of the target state is reconstructed in the subsystem's output. The dissipative property of the QAE allows it to retain partial information of the target state at the bottleneck neuron and gradually reconstruct other parts of the target state. For example, at subsystem [4,1], the state at the bottleneck (red) neuron has a high overlap with the last qubit in the target state. At a low noise level, for example, for the VQE output states at 100 SPSA iterations, at subsystems [4,1,1], [4,1,2], [4,1,3], and [4,1,4], the QAE can reconstruct the subsystem with the first qubit, the first two qubits, the first three qubits, and finally all qubits of the target state (orange lines). However, suppose we train the QAE with high-noise VQE data, where the VQE output states (at 50 SPSA iterations) have a low overlap with the ground state. In that case, irrelevant information leaks through the bottleneck, preventing the decoder from reconstructing the ground state (blue lines).
FIG. 5. The subsystems [4,1], [4,1,1], [4,1,2], and [4,1,3] correspond to the subsystem with the fourth qubit, the first qubit, the first two qubits, and the first three qubits of the ground state, respectively. As the noise level increases, the noisy parts leak out from the bottleneck layer, preventing the decoder from reconstructing the ground state. Solid lines and error bars describe the average value and standard deviation over 1000 test samples.
Conclusion.- We proposed a method for learning and separating noise from noisy VQE data. It learns to discard noisy and irrelevant information while compressing relevant information into a reduced space to improve the overlap between the output of the VQE and the true target state. Our method is especially suited to NISQ devices, where noisy quantum data is common and access to noiseless data is restricted. Furthermore, our method is efficient in terms of quantum resources, paving the way for further research into the advantages of quantum models in learning quantum data sets [34,35].
This supplementary material describes in detail the calculations, the experiments presented in the main text, and the additional figures. The equation, figure, and table numbers in this section are prefixed with S (e.g., Eq. (S1), Fig. S1, or Table S1), while numbers without the prefix (e.g., Eq. (1) or Fig. 1) refer to items in the main text.
This section explains the quantum circuit-based implementation of the Quantum Autoencoder (QAE) and the entire circuit for training. The concept of QAE employed in this study is taken from Ref. [1], with the circuit implementation based on the Qiskit library [2] in Ref. [3]. However, that implementation does not provide a hardware-efficient ansatz that can be run on real NISQ computers. In that implementation, the number of measurements for training scales with the total number of parameters. Furthermore, the number of parameters increases exponentially with the number of qubits used in the system. In our implementation, we provide a hardware-efficient ansatz circuit for QAE, which can significantly reduce the number of parameters for training. We further use simultaneous perturbation stochastic approximation (SPSA) [4] optimization to train these circuits, where the number of measurements is independent of the total number of training parameters.

A. Circuit Design
The upper panel of Fig. S1 describes the circuit design used to train the QAE. Since the QAE in our study takes a noisy state ρ as the input and trains the parameterized circuits to make the output state as close as possible to another noisy state (the target state), the whole circuit for training consists of four parts: (I) the circuit for state preparation of both the input state and the target state (in our experiments, the input and target states are prepared by VQE circuits), (II) the circuit of the QAE, (III) the circuit to compute the fidelity between the output of the QAE and the target state, and (IV) the measurement part. The fidelity between the output of the QAE and the target state is calculated as $2p_0 - 1$, where $p_0$ is the probability of obtaining 0 in the measurement.
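The relation between the measured probability $p_0$ and the fidelity can be verified with a statevector simulation of a swap test. The sketch below is a one-qubit illustration with pure states, assuming the standard Hadamard, controlled-SWAP, Hadamard construction on an ancilla; the four-qubit circuit of Fig. S1 repeats the same idea with more SWAP gates. The function names are hypothetical.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)

def swap_test_p0(a, b):
    """Probability of measuring 0 on the ancilla in a one-qubit swap test."""
    ancilla = np.array([1, 0], dtype=complex)
    # Qubit order: ancilla (most significant), then state a, then state b.
    state = np.kron(ancilla, np.kron(a, b))
    h_on_ancilla = np.kron(H, np.eye(4))
    cswap = np.block([[np.eye(4), np.zeros((4, 4))],
                      [np.zeros((4, 4)), SWAP]])
    state = h_on_ancilla @ cswap @ h_on_ancilla @ state
    # The first four amplitudes belong to ancilla = 0.
    return float(np.sum(np.abs(state[:4]) ** 2))

def random_state(rng):
    """Random normalized single-qubit state vector."""
    v = rng.normal(size=2) + 1j * rng.normal(size=2)
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
a, b = random_state(rng), random_state(rng)

p0 = swap_test_p0(a, b)
fidelity_from_test = 2 * p0 - 1
fidelity_exact = abs(np.vdot(a, b)) ** 2
print(fidelity_from_test, fidelity_exact)
```

In an experiment, $p_0$ is estimated from a finite number of shots, so the fidelity estimate carries sampling noise that the exact simulation above does not show.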
Figure S1 shows the circuit implementation for the QAE structure with [4, 2, 1, 2, 4] nodes in layers to denoise 4-qubit states (bottom-right panel). Each neuron corresponds to a qubit, and unitary circuits connect neurons in subsequent coupled layers. Qubits in each layer are dissipated (traced out) after the forward process to the next layer, making it possible to construct the reduced space for the QAE. Consider the QAE with structure $[m_1, \ldots, m_M]$ ($M \ge 2$). For the $j$th neuron in layer $i+1$, we denote by $U^i_j$ the parameterized unitary acting on its qubit and the neurons in the preceding layer. These unitary matrices are subject to optimization. The unitary between layers $i$ and $i+1$ can be expressed as $U^i = \prod_j U^i_j$. We denote by $\beta^i$ the state at the $i$th layer with $m_i$ neurons. Consequently, the transformation between $\beta^i$ and $\beta^{i+1}$ can be expressed through the quantum channel
$$\beta^{i+1} = \mathcal{Q}^i(\beta^i) = \mathrm{tr}_{i}\!\left[U^i\left(\beta^i \otimes |0\cdots0\rangle_{i+1}\langle 0\cdots0|\right)U^{i\dagger}\right]. \tag{S1}$$
The authors in Refs. [1,3] consider the optimization over all possible values of each unitary $U^i_j$ (applied to the $j$th neuron in layer $i+1$ and the neurons in the preceding layer) in circuit part (II). Here, $U^i_j$ is represented as
$$U^i_j = e^{iK^i_j}, \qquad K^i_j = \sum_{\sigma} k_\sigma\, \sigma, \tag{S2}$$
where the Hermitian matrix $K^i_j$ is uniquely defined by $4^{m_i+1}$ coefficients $k_\sigma$ of the expansion in the Pauli basis $\sigma \in \{I, X, Y, Z\}^{\otimes(m_i+1)}$ for the real vector space of $2^{m_i+1} \times 2^{m_i+1}$ Hermitian matrices. Thus, for the QAE with structure $[m_1, m_2, \ldots, m_M]$, the total number of training parameters is
$$\sum_{i=1}^{M-1} m_{i+1}\, 4^{m_i+1}, \tag{S3}$$
which scales exponentially with the number of qubits used in the system. Consequently, we need to explore a large number of training parameters, which makes the training phase time-consuming and challenging. Furthermore, in principle, the implementation of the representation in Eq. (S2) on real NISQ devices is inefficient.
TABLE I. The number of parameters used in the earlier implementation in Refs. [1,3] and in our implementation of the QAE. Here, L is the number of ansatz blocks used in each unitary of the QAE (see the bottom-right panel of Fig. S1).
* tran.quochoan@fujitsu.com
arXiv:2304.00549v1 [quant-ph] 2 Apr 2023
In our research, we provide an ansatz-based implementation for QAE, where each unitary block $U^i_j$ is implemented by a parameterized quantum circuit comprising L blocks and final parametric rotation gates R on every qubit (bottom-right panel of Fig. S1). Here, each block includes parametric rotation gates on every qubit and two-qubit entangling gates. In our experiments, we use the Pauli Y rotation $R_Y(\theta) = e^{-iY\theta/2}$. The nonparametric two-qubit entangling gates are controlled-X or controlled-Z (with control qubits drawn as gray circles and target unitaries as gray squares), which are arranged circularly according to the qubit indices. Our implementation is hardware efficient and can reduce the number of parameters for training (see Table I). The number of parameters scales only linearly with the number of qubits and the number of ansatz blocks in each unitary, given the same depth of QAE.
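To make the scaling comparison concrete, the two parameter counts can be evaluated for a given layer structure. The following sketch rests on two stated assumptions: the general-unitary count takes $4^{m_i+1}$ Pauli coefficients for each of the $m_{i+1}$ unitaries between layers $i$ and $i+1$, and the ansatz count takes $(L+1)(m_i+1)$ rotation angles per unitary with nonparametric entanglers. The second formula is a count derived here from the block description and may differ from the values tabulated by the authors.

```python
def general_unitary_params(structure):
    """Pauli-expansion count: each unitary on (m_i + 1) qubits has 4**(m_i + 1) coefficients."""
    return sum(m_next * 4 ** (m_in + 1)
               for m_in, m_next in zip(structure[:-1], structure[1:]))

def ansatz_params(structure, L):
    """Assumed count: (L + 1) rotation layers on (m_i + 1) qubits for each unitary."""
    return sum(m_next * (L + 1) * (m_in + 1)
               for m_in, m_next in zip(structure[:-1], structure[1:]))

for structure, L in [([2, 1, 2], 1), ([4, 1, 4], 3), ([4, 2, 1, 2, 4], 1)]:
    print(structure, general_unitary_params(structure), ansatz_params(structure, L))
```

For the structure [4, 2, 1, 2, 4], the general-unitary count is 2400, whereas the assumed ansatz count with L = 1 is 58, which shows the gap between exponential and linear scaling in the layer width.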

B. Training Algorithm
The training in denoising QAE can be formulated as maximizing the empirical cost
$$\mathcal{L}(\theta) = \frac{1}{N}\sum_{i=1}^{N} F\big(\rho^B_i, \mathcal{D}(\mathcal{E}(\rho^A_i))\big) \tag{S4}$$
given N paired training states $(\rho^A_i, \rho^B_i)$, where $\theta = (\theta_1, \ldots, \theta_P)$ is the parameter vector with P parameters used to construct $\mathcal{E}$ and $\mathcal{D}$. The paired training states $(\rho^A_i, \rho^B_i)$ are sampled from a noisy process $\mathcal{C}_{\rho^*}$, where we do not know the true target state $\rho^*$. This differs from the training of a classical autoencoder, where each training pair consists of identical data.
FIG. S1. Circuit implementation for training QAE. The quantum circuit used to train the QAE using noisy training data is described in the upper panel. We adopt the concept of the implementation in Refs. [1,3]. The circuit is composed of four parts separated by vertical gray lines: (I) the circuit to prepare the input state ρ and the target state (note that these states are noisy states obtained from other VQE circuits); (II) the circuit to implement the abstract structure of the QAE with encoder $\mathcal{E}$ and decoder $\mathcal{D}$ (here, this circuit represents the bottom-left structure of the QAE with [4, 2, 1, 2, 4] nodes in layers); (III) the circuit with two Hadamard gates and four SWAP gates to calculate the fidelity between the output of the QAE and the target state; and (IV) the measurement part. In the QAE, each unitary block $U^i_j$ is implemented by a parameterized ansatz circuit consisting of L blocks and final parametric rotation gates R on every qubit, as illustrated in the bottom-right panel. Here, each block includes parametric rotation gates on every qubit and two-qubit entangling gates. In our experiments, we employ the Pauli Y rotation $R_Y(\theta) = e^{-iY\theta/2}$. The entangling gates can be nonparametric controlled gates (controlled-X or controlled-Z) or parametric controlled-rotation gates (controlled-RX or controlled-RZ), which are arranged circularly according to the qubit indices. Control qubits are drawn as gray circles and target unitaries as gray squares.
A typical method to maximize the cost in Eq. (S4) is the vanilla gradient descent (GD) method, which has been employed to train QAE in Ref. [3]. Here, parameters are updated over iterations, where the update follows the gradient of the cost function to increase the cost. At the $(t+1)$-th iteration step, GD updates the parameter $\theta_p$ as
$$\theta_p^{(t+1)} = \theta_p^{(t)} + \eta\, \frac{\partial \mathcal{L}}{\partial \theta_p}\big(\theta^{(t)}\big), \tag{S5}$$
where $\eta$ is the learning rate. At each iteration, the gradient $\frac{\partial \mathcal{L}}{\partial \theta_p}(\theta)$ is evaluated for every index $p$. Considering a sufficiently small perturbation step $\epsilon$, we can use the finite difference method to approximate the gradient as follows:
$$\frac{\partial \mathcal{L}}{\partial \theta_p}(\theta) \approx \frac{\mathcal{L}(\theta + \boldsymbol{\epsilon}_p) - \mathcal{L}(\theta - \boldsymbol{\epsilon}_p)}{2\epsilon}, \tag{S6}$$
where $\boldsymbol{\epsilon}_p$ is an indicator vector with $\epsilon$ at the $p$-th position in the parameter vector $\theta$, and 0 otherwise. From Eq. (S5) and Eq. (S6), we can estimate the complexity in terms of the number of measurements for each iteration as
$$2NPS,$$
where N is the number of training data, P is the number of training parameters, and S is the number of shots to run the circuit to estimate the probability of obtaining 0 in the measurement part to compute the fidelity.
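The per-parameter finite-difference gradient and its evaluation count can be sketched as follows. This is an illustration under assumptions: a central difference with two cost evaluations per parameter is assumed, and a classical quadratic `toy_cost` with a hypothetical `target` stands in for the fidelity-based cost, which in practice is estimated from circuit measurements.

```python
import numpy as np

target = np.array([0.5, -1.0, 0.2, 1.5, -0.3])  # hypothetical optimum of the toy cost

def toy_cost(theta):
    """Concave quadratic stand-in for the empirical cost (maximized at `target`)."""
    return -float(np.sum((theta - target) ** 2))

def finite_difference_gradient(cost, theta, eps=0.1):
    """Central-difference gradient; returns the gradient and the number of cost evaluations."""
    grad = np.zeros_like(theta)
    n_evals = 0
    for p in range(theta.size):
        shift = np.zeros_like(theta)
        shift[p] = eps  # perturb only the p-th parameter
        grad[p] = (cost(theta + shift) - cost(theta - shift)) / (2 * eps)
        n_evals += 2
    return grad, n_evals

theta = np.zeros(5)
grad, n_evals = finite_difference_gradient(toy_cost, theta)

# Each cost evaluation would need N * S circuit runs (N data pairs, S shots each).
N, S = 200, 1000
measurements_per_iteration = n_evals * N * S
print(grad, n_evals, measurements_per_iteration)
```

The evaluation count grows as 2P, so the measurement budget per iteration grows linearly with the number of parameters, which motivates replacing this gradient with a stochastic estimator.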
Since N and S are often fixed, the number of quantum measurements depends heavily on the number of parameters P in the model, which was the major constraint of the prior approach in Ref. [3].
To overcome this limitation, we employ the SPSA [4] optimizer, which replaces the dimension-by-dimension gradient estimation with a stochastic one that needs only two evaluations of the cost function regardless of the number of parameters. SPSA does not provide a stepwise unbiased gradient estimation but is effective for large-scale optimization problems in the presence of noise. Thus, it is recommended for optimization routines on NISQ devices or noisy quantum simulators.
In SPSA, a P-dimensional random direction $b$ is sampled from the discrete uniform distribution on $\{-1, 1\}^P$. Along this sampled direction, the gradient component is approximated by a finite difference with two evaluations of the cost function as
$$|\nabla_b \mathcal{L}(\theta)| = \frac{\mathcal{L}(\theta + \epsilon b) - \mathcal{L}(\theta - \epsilon b)}{2\epsilon}.$$
Therefore, the stochastic estimator of the gradient is constructed as $|\nabla_b \mathcal{L}(\theta)|\, b$, which gives the update rule
$$\theta^{(t+1)} = \theta^{(t)} + \mu\, \big|\nabla_{b^{(t)}} \mathcal{L}\big(\theta^{(t)}\big)\big|\, b^{(t)},$$
where $\mu$ is the learning rate and $b^{(t)}$ is sampled at the $t$-th iteration. Furthermore, to reduce the fluctuation in the stochastic estimator $|\nabla_b \mathcal{L}(\theta)|\, b$, we sample $b$ multiple times at each iteration and take the average of the estimator. In our experiments, we use two samples of $b$ at each iteration. Therefore, the number of measurements for each iteration is $4NS$, where we use N = 200 training data and S = 1000 shots in all simulations.
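The SPSA estimator and update can be sketched in a few lines of NumPy. This is an illustrative sketch, not the Qiskit SPSA implementation: the cost is a classical quadratic `toy_cost` with a hypothetical optimum `target` instead of a measured fidelity, the step size is held constant, and the function name `spsa_gradient` is an assumption of this example.

```python
import numpy as np

def spsa_gradient(cost, theta, eps, rng, n_directions=2):
    """Average the SPSA estimator over random +/-1 directions b."""
    estimate = np.zeros_like(theta)
    for _ in range(n_directions):
        b = rng.choice([-1.0, 1.0], size=theta.size)
        # Two cost evaluations per direction, independent of the number of parameters.
        slope = (cost(theta + eps * b) - cost(theta - eps * b)) / (2 * eps)
        estimate += slope * b
    return estimate / n_directions

target = np.array([0.3, -1.2, 0.7, 2.0])  # hypothetical optimum of the toy cost

def toy_cost(theta):
    """Concave quadratic stand-in for the empirical cost (maximized at `target`)."""
    return -float(np.sum((theta - target) ** 2))

rng = np.random.default_rng(0)
theta = np.zeros(4)
initial_cost = toy_cost(theta)

mu, eps = 0.1, 0.1  # constant step size and perturbation for this sketch
for t in range(200):
    # Ascent step, because the cost is maximized.
    theta = theta + mu * spsa_gradient(toy_cost, theta, eps, rng)
final_cost = toy_cost(theta)

# 2 evaluations x 2 directions x N data pairs x S shots per iteration.
measurements_per_iteration = 2 * 2 * 200 * 1000
print(initial_cost, final_cost, measurements_per_iteration)
```

The number of cost evaluations per iteration stays at four regardless of how many parameters `theta` contains, in contrast to the per-parameter finite-difference gradient.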
In our implementation, we further introduce mini-batch learning and a schedule for the learning rate $\mu$ for more efficient learning. Mini-batch learning trains a model on small batches of data rather than on the entire data set at once: the data are divided into smaller batches, and the model is trained on each batch in turn. Because mini-batch learning exposes the model to a greater variety of data during training, it can improve the model's generalization. In our training process, we randomly split the data set of 200 instances into four batches of 50 instances at each epoch. We calculate the cost function and gradient on each batch to update the parameters. Consequently, each training epoch updates the parameters four times. We run all experiments with 100 epochs of training. Furthermore, the small perturbation is set to $\epsilon = 0.1$ for gradient estimation. The learning rate at the first epoch is set to $\mu = \mu_0 = 0.2$ and is scaled by a factor of 0.8 every ten training epochs.
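The batching and learning-rate schedule can be sketched as a training-loop skeleton. This is a sketch under one stated assumption: the reduction every ten epochs is interpreted as multiplying the rate by 0.8. The function names are hypothetical, and the parameter update itself is left as a placeholder comment.

```python
import numpy as np

def learning_rate(epoch, mu0=0.2, decay=0.8, period=10):
    """Step schedule: multiply the rate by `decay` every `period` epochs (assumed reading)."""
    return mu0 * decay ** (epoch // period)

def minibatches(n_data, batch_size, rng):
    """Randomly split the indices 0..n_data-1 into consecutive batches."""
    order = rng.permutation(n_data)
    return [order[i:i + batch_size] for i in range(0, n_data, batch_size)]

rng = np.random.default_rng(0)
n_updates = 0
for epoch in range(100):
    mu = learning_rate(epoch)
    for batch in minibatches(200, 50, rng):
        # An SPSA update on the training pairs indexed by `batch`,
        # with step size `mu`, would be performed here.
        n_updates += 1

print(n_updates, learning_rate(0), learning_rate(10), learning_rate(99))
```

With 100 epochs and four batches per epoch, the loop performs 400 parameter updates, and under this reading the step size has decayed to $0.2 \times 0.8^9$ by the final epoch.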

II. VQE IMPLEMENTATION
In our experiments, we create training data for QAE from VQE in the electronic structure problem in quantum chemistry. First, the H$_2$ and LiH molecular Hamiltonians are mapped to qubit Hamiltonians with Q = 2 and Q = 4 qubits, respectively, via the parity transformation from the second-quantized fermionic Hamiltonian. The VQE quantum states are prepared by a two-local variational ansatz [5] with parameters $\zeta$ for Q qubits, consisting of D entangling blocks of CNOT gates alternating with Q(D + 1) rotation gates. In our experiments, we compare two versions of ansatz circuits in the VQE method: VQE-YZ and VQE-Y. VQE-YZ uses single-qubit rotations about the Pauli Y and Pauli Z axes, whereas VQE-Y uses only Pauli Y rotations. The number of parameters in these circuits is Q(2D + 1) and Q(D + 1), respectively. In the main text, we construct the variational circuits with D = 1 entangling block, resulting in 6 parameters for VQE-YZ on H$_2$ (Q = 2) and 8 parameters for VQE-Y on LiH (Q = 4). We optimize the circuits using Qiskit's implementation of SPSA. In the first experiment described in the main text, we introduce noise to the well-trained VQE circuits by applying a quantum noise channel to the variational circuit given by the VQE-YZ ansatz. Specifically, we use SPSA for 250 iterations and 1000 iterations to train the circuit for the H$_2$ and LiH molecules, respectively.
In the second experiment described in the main text, we suppose that a lack of computational resources has prevented the VQE variational circuits from receiving adequate training. For example, we employ SPSA for 10 iterations and 500 iterations to train VQE for the H$_2$ and LiH molecules with the VQE-YZ and VQE-Y ansatz, respectively. Because of the stochastic nature of SPSA, different random seeds result in different final VQE states. Consequently, the overlap of these states with the target ground state has a high variance, resulting in noisy states.
FIG. S2. The ansatz for each unitary $U^i_j$ in the QAE. Each unitary includes L ansatz blocks and final parametric rotation gates R on every qubit. Each block includes parametric rotation gates on every qubit and two-qubit entangling gates, which are arranged circularly according to the qubit indices. We introduce two types of two-qubit entangling gates: (a) nonparametric gates such as controlled-X or controlled-Z and (b) parametric controlled-rotation gates such as controlled-RX or controlled-RZ.
In the third experiment described in the main text, we investigate the mechanism and effectiveness of our QAE in training with varying amounts of data. Specifically, we use output data from VQE variational circuits with the VQE-YZ ansatz trained for 50 and 100 SPSA iterations as the training data.

III. PARAMETRIC CONTROLLED-ROTATION ANSATZ
In the experiments in our main text, we introduce the ansatz block with nonparametric entangling gates such as controlled-X or controlled-Z gates [Fig. S2(a)]. These structures can reduce the number of rotation gates in the hardware implementation but have limitations in expressing the entanglement properties. To improve the expressivity of the entanglement, we can introduce variational parameters into these gates. We modify them to parametric controlled-rotation gates, e.g., the controlled-RX (controlled Pauli X rotation) or controlled-RZ (controlled Pauli Z rotation) gates, to adjust the degree of entanglement more flexibly [Fig. S2(b)].
We present the denoising results with noisy VQE data of H2 and LiH molecules in Fig. S3 and Fig. S4, respectively. Here, the noisy data are generated from the VQE-YZ circuit at 10 SPSA iterations for H2 and the VQE-Y circuit at 500 SPSA iterations for LiH. We use QAE[2,1,2] for H2 with one ansatz block, and QAE[4,1,4] for LiH with three ansatz blocks, using the rotation of Pauli Y as the parametric gate on every qubit together with two-qubit entangling gates. We then compare the performance of two types of entangling gates in the ansatz: CZ with nonparametric controlled-Z gates [Figs. S3(a) and S4(a)] and CRZ with parametric controlled-rotation Pauli Z gates [Figs. S3(b) and S4(b)]. We train the QAE with 200 pairs of noisy data and test with the 200 samples of highest energy, which we regard as the noisiest data. Figures S3 and S4 describe the fidelity with the ground state of the noisy and denoised data. The parametric controlled-rotation ansatz can reduce the variance in the fidelity of the test data, which largely improves the denoising performance.

IV. DENOISING RESULTS
This section discusses the additional results for denoising the VQE of H2 and LiH molecules. First, we consider the noisy training data as the output of VQE-YZ circuits at 10 and 100 SPSA iterations for H2 and LiH, respectively. The noisy VQE data are generated at different random seeds of the SPSA optimizer. We train the QAE with 200 pairs and test with 1000 samples of noisy data. Here, we employ QAE[2,1,2] for H2 with L = 1 ansatz block, and QAE[4,1,4] for LiH with L = 5 ansatz blocks in Fig. S2. The parametric rotation gate R on each qubit is the rotation of Pauli Y, and the entangling gates are nonparametric controlled-Z gates. Figures S5 and S6 depict the histogram of the energy difference |∆E| between the estimated energy and the ground state energy, and the histogram of the fidelity with the ground state of the noisy and denoised data for H2 and LiH molecules, respectively. The energy estimation and the overlap with the ground state are largely improved in both cases.
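The two figures of merit used throughout this section, the energy difference |∆E| and the fidelity with the ground state, can be computed for a pure state as in the sketch below. The two-qubit Hamiltonian is a hypothetical toy model (not the H2 or LiH Hamiltonian), the helper name `energy_error_and_fidelity` is introduced only for illustration, and a nondegenerate ground state is assumed.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

# Hypothetical toy Hamiltonian: H = -Z⊗Z - 0.5 (X⊗I + I⊗X)
H = -np.kron(Z, Z) - 0.5 * (np.kron(X, I2) + np.kron(I2, X))

def energy_error_and_fidelity(hamiltonian, state):
    """Return (|dE|, fidelity) of a normalized pure state w.r.t. the exact ground state."""
    eigvals, eigvecs = np.linalg.eigh(hamiltonian)  # eigenvalues in ascending order
    ground_energy, ground_state = eigvals[0], eigvecs[:, 0]
    energy = np.real(np.vdot(state, hamiltonian @ state))
    fidelity = abs(np.vdot(ground_state, state)) ** 2
    return abs(energy - ground_energy), fidelity

# A "noisy" candidate state: |00>
noisy_state = np.array([1.0, 0.0, 0.0, 0.0])
delta_e, fid = energy_error_and_fidelity(H, noisy_state)
print(f"|dE| = {delta_e:.4f}, fidelity = {fid:.4f}")
```

Histograms such as those in Figs. S5 and S6 would then be built by evaluating these two quantities over all test samples, before and after denoising.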
In addition, we analyze the effect of the number of training pairs on the denoising performance of our method for VQE data of LiH. Figure S7 shows the fidelity of the denoised states for 1000 test samples as a function of the number of training pairs. We use QAE[4,1,4] for LiH with L = 3 ansatz blocks, as shown in Fig. S2. Notably, our method achieves high fidelity even with a small number of training pairs, suggesting the potential for efficient learning.
Next, we consider the experiment where we apply bitflip and depolarizing noise channels to the output of VQE-YZ circuits at 10 iterations (early stopping) to generate noisy data for H2, and at 500 iterations to generate noisy data for LiH. At each bond length, we train the QAE with 200 pairs of noisy VQE data generated at different random seeds. We further generate 1000 samples of noisy data for testing and test with the 200 samples of highest energy to see the denoising effect in ground-state estimation. Figures S8-S11 show the energy of the noisy and denoised data and the fidelities with the ground state at each bond length for the H2 and LiH molecules. Here, we construct the QAE[2,1,2] for H2 with L = 1 ansatz block and the QAE[4,1,4] for LiH with L = 3 ansatz blocks, where the rotation gate R is the parametric rotation gate of the Pauli Y gate, and the entangling gates are nonparametric CZ gates or parametric CRZ gates.
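As an illustration of how such noisy data could be produced, the sketch below applies single-qubit bitflip and depolarizing channels independently to every qubit of a density matrix using Kraus operators. The depolarizing convention (X, Y, and Z errors each with probability p/3) is an assumption, since other parameterizations are also in use, and the helper names are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def embed(op, qubit, n_qubits):
    """Place a single-qubit operator on `qubit` of an n-qubit register."""
    full = np.array([[1.0 + 0j]])
    for q in range(n_qubits):
        full = np.kron(full, op if q == qubit else I2)
    return full

def apply_channel(rho, kraus_ops, n_qubits):
    """Apply the same single-qubit Kraus channel independently to every qubit."""
    for qubit in range(n_qubits):
        ops = [embed(k, qubit, n_qubits) for k in kraus_ops]
        rho = sum(k @ rho @ k.conj().T for k in ops)
    return rho

def bitflip_kraus(p):
    """Each qubit is flipped (X applied) with probability p."""
    return [np.sqrt(1 - p) * I2, np.sqrt(p) * X]

def depolarizing_kraus(p):
    """Assumed convention: X, Y, Z errors each occur with probability p / 3."""
    return [np.sqrt(1 - p) * I2] + [np.sqrt(p / 3) * P for P in (X, Y, Z)]

# Stand-in for a VQE output state: the two-qubit state |00>.
psi = np.zeros(4, dtype=complex)
psi[0] = 1.0
rho = np.outer(psi, psi.conj())

rho_bitflip = apply_channel(rho, bitflip_kraus(0.2), n_qubits=2)
rho_depol = apply_channel(rho, depolarizing_kraus(0.2), n_qubits=2)

fid_bitflip = np.real(np.vdot(psi, rho_bitflip @ psi))
fid_depol = np.real(np.vdot(psi, rho_depol @ psi))
print(f"fidelity after bitflip: {fid_bitflip:.4f}, after depolarizing: {fid_depol:.4f}")
```

Both channels are trace preserving, so the output remains a valid density matrix whose overlap with the noiseless state is reduced.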
Finally, we demonstrate that our method can work with noisy data generated from two different VQE ansatzes. Considering VQE for the H2 molecule, we use the data of the VQE-YZ circuit at 10 SPSA iterations as the input data, and the data of the VQE-Y circuit at 10 SPSA iterations as the target data in training the QAE (Fig. S1). For each bond length, we train the QAE using 200 pairs of noisy VQE data generated at different random seeds. We also generate 1000 samples of noisy data for testing and test with the 200 samples of highest energy to see the denoising effect in ground-state estimation. We validate the performance of the QAE[2,1,2] for H2 with L = 1 ansatz block, where the rotation gate R is the parametric rotation of the Pauli Y gate and the entangling gates are the CZ [Fig. S12(a

FIG. S7. The fidelities of noisy VQE and denoised states with the ground state as a function of the number of training data. The QAE is trained with 200 pairs and tested with 1000 noisy samples. We consider the VQE states at the early stopping of 100 SPSA iterations for the VQE-YZ ansatz as noisy data. We employ QAE[4,1,4] with L = 3 ansatz blocks, where the parametric rotation gate R on each qubit is the Pauli Y rotation, and the entangling gates are nonparametric controlled-Z gates (see Fig. S2).

FIG. S8. The energy, the energy difference |∆E| with the ground state energy, and the fidelity with the ground state of the noisy VQE data (with an early stopping of SPSA and bitflip noise) and denoised data for H2 at each bond length. Here, VQE data at 10 SPSA iterations for the VQE-YZ ansatz are applied to the bitflip channel, where each qubit is flipped with a probability of 0.2. We use one ansatz block in Fig. S2 for QAE[2,1,2], where the parametric rotation gate R on each qubit is the rotation of the Pauli Y gate, and the entangling gates are nonparametric CZ gates (a) or parametric CRZ gates (b). Solid lines and error bars describe the average value and standard deviation over 200 test samples.

FIG. S10. The energy, the energy difference |∆E| with the ground state energy, and the fidelity with the ground state of the noisy VQE data (with an early stopping of SPSA and bitflip noise) and denoised data for LiH at each bond length. Here, VQE data at 10 SPSA iterations for the VQE-YZ ansatz are applied to the bitflip channel, where each qubit is flipped with a probability of 0.2. We use 3 ansatz blocks in Fig. S2 for QAE[4,1,4], where the parametric rotation gate R on each qubit is the rotation of the Pauli Y gate, and the entangling gates are nonparametric CZ gates (a) or parametric CRZ gates (b). Solid lines and error bars describe the average value and standard deviation over 200 test samples.

FIG. 3. The energy of noisy VQE data and denoised data for H2 and LiH molecules at each bond length and each type of noise (bitflip, depolar). Inset figures describe the energy difference |∆E| of these energies with the ground state energy. Solid lines and error bars describe the average value and standard deviation over 200 test samples.

FIG. 4. The energy and the fidelity with the ground state of the noisy data at early steps of VQE optimization and denoised data for (a) H2 and (b) LiH molecules. Inset figures describe the energy difference |∆E| of these energies with the ground state energy. Solid lines and error bars describe the average value and standard deviation over 200 test samples.

FIG. S3. The fidelity with the ground state of the noisy VQE data and denoised data for H2 at each bond length. Here, VQE data at the early stopping of 10 SPSA iterations for the VQE-YZ ansatz are used for training (200 pairs) and testing (200 pairs with the highest energy as the noisiest data). We use one ansatz block in Fig. S2, where the parametric rotation gate R on each qubit is the rotation of Pauli Y. We compare two types of entangling gates in the ansatz: (a) CZ with nonparametric controlled-Z gates and (b) CRZ with parametric controlled-rotation Pauli Z gates.

FIG. S6. The histogram of the energy difference |∆E| with the ground state energy and the histogram of the fidelity with the ground state of the noisy and denoised VQE data for the LiH molecule at the bond length 1.4 Å. The QAE is trained with 200 pairs and tested with 1000 noisy samples. VQE states at the early stopping of 100 SPSA iterations for the VQE-YZ ansatz are considered noisy data. We employ QAE[4,1,4] with L = 5 ansatz blocks in Fig. S2, where the parametric rotation gate R on each qubit is the rotation of Pauli Y and the entangling gates are nonparametric controlled-Z gates.
FIG. S11. The energy, the energy difference |∆E| with the ground state energy, and the fidelity with the ground state of the noisy VQE data (with an early stopping of SPSA and depolarization noise) and denoised data for LiH at each bond length. Here, VQE data at 500 SPSA iterations for the VQE-YZ ansatz are applied to the depolarization channel, with single-qubit depolarizing noise applied to all qubits with a probability of 0.2. We use 3 ansatz blocks in Fig. S2 for QAE[4,1,4], where the parametric rotation gate R on each qubit is the rotation of Pauli Y, and the entangling gates are nonparametric CZ gates (a) or parametric CRZ gates (b). Solid lines and error bars describe the average value and standard deviation over 200 test samples.
, Table 1) refer to items in the main text.
I. TRAINING QUANTUM AUTOENCODER

Here, $Q_i$ extends an $m_i$-qubit state into an $(m_i + m_{i+1})$-qubit state, then reduces it to an $m_{i+1}$-qubit state by tracing out the qubits in layer $i$. In this way, with topology [4,2,1,2,4], the encoder $E = Q_1 Q_2$ comprises three unitaries in two channels: the first channel $Q_1$ extends a 4-qubit state (layer 1) into a 6-qubit state and then reduces it to a 2-qubit state (layer 2), and the second channel $Q_2$ extends this 2-qubit state into a 3-qubit state and then reduces it to a 1-qubit state (layer 3). The decoder $D = Q_3 Q_4$ reverses this process on the 1-qubit state to reconstruct the final 4-qubit state with six unitaries. Consequently, this dissipative design of the QAE requires $\max_i\{m_i + m_{i+1}\}$ qubits, and the training circuit requires $1 + m_1 + \max_i\{m_i + m_{i+1}\}$ qubits, of which $m_1 + 1$ qubits are used to calculate the fidelity.
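The resource counts quoted above can be checked with a short sketch. The function name `qae_resources` is hypothetical, and the code assumes one unitary per output qubit of each channel, which reproduces the three encoder unitaries and six decoder unitaries stated for topology [4,2,1,2,4].

```python
def qae_resources(topology):
    """Count unitaries and qubits for a dissipative QAE with the given layer widths."""
    pairs = list(zip(topology[:-1], topology[1:]))
    # Assumption: channel Q_i uses one unitary per output qubit (m_{i+1} unitaries).
    unitaries_per_channel = [m_next for _, m_next in pairs]
    # Each channel holds its input and output layers at the same time.
    qae_qubits = max(m + m_next for m, m_next in pairs)
    # Training adds m_1 + 1 qubits for the fidelity calculation.
    training_qubits = 1 + topology[0] + qae_qubits
    return unitaries_per_channel, qae_qubits, training_qubits

per_channel, qae_qubits, training_qubits = qae_resources([4, 2, 1, 2, 4])
encoder_unitaries = sum(per_channel[:2])   # channels Q_1, Q_2
decoder_unitaries = sum(per_channel[2:])   # channels Q_3, Q_4
print(encoder_unitaries, decoder_unitaries, qae_qubits, training_qubits)
# prints: 3 6 6 11
```

For the smaller topologies used in the experiments, the same function gives 3 QAE qubits and 6 training qubits for [2,1,2], and 5 QAE qubits and 10 training qubits for [4,1,4].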