Variational Quantum Anomaly Detection: Unsupervised mapping of phase diagrams on a physical quantum computer

One of the most promising applications of quantum computing is simulating quantum many-body systems. However, there is still a need for methods to efficiently investigate these systems in a native way, capturing their full complexity. Here, we propose variational quantum anomaly detection, an unsupervised quantum machine learning algorithm to analyze quantum data from quantum simulation. The algorithm is used to extract the phase diagram of a system with no prior physical knowledge and can be performed end-to-end on the same quantum device that the system is simulated on. We showcase its capabilities by mapping out the phase diagram of the one-dimensional extended Bose Hubbard model with dimerized hoppings, which exhibits a symmetry protected topological phase. Further, we show that it can be used with readily accessible devices nowadays and perform the algorithm on a real quantum computer.


I. INTRODUCTION
Since the discovery of Shor's algorithm [1], there have been many attempts to leverage the power of quantum computers to outperform classical computers [2]. One of the most promising near-and far-term applications is simulating quantum many-body systems. Universal quantum simulation algorithms like phase estimation [2,3], or adiabatic quantum computation [4] require quantum fault-tolerance [5], that is, the ability to reliably correct errors that occur during the quantum computation. With recent experimental advancements in quantum error correction [6,7], there is hope that quantum fault-tolerance can one day be reached. Currently, we only have access to noisy intermediate-scale quantum (NISQ) devices [8]. There are several proposals for algorithms on these devices [9,10] like the variational quantum eigensolver (VQE) [11], the quantum approximate optimization algorithm (QAOA) [12], or the quantum autoencoder [13,14], that employ parameterized circuits which are optimized through a classical feedback loop, typically with gradient based methods [15][16][17]. These approaches can suffer from so-called Barren Plateaus, the phenomenon of an exponentially vanishing gradient of the loss function [18]. In practice, this issue occurs for large systems but can be avoided by using shallow circuits and local cost functions [19]. While gradient-free optimization does not solve the Barren Plateau problem [20], other proposals give hope for large-scale and deep variational quantum circuits [21,22].
With the rise of deep learning in the 2010s, the term quantum machine learning was mostly used to refer to leveraging quantum computers for linear algebra tasks such as matrix inversion in sub-polynomial time via the Harrow-Hassidim-Lloyd algorithm [23,24]. One famous use-case was the quantum recommendation system algorithm with an exponential quantum speed-up at the time [25], which inspired classical analogs of the algorithm with the same, sub-polynomial, complexity (termed as quantum-inspired machine learning algorithms) [26]. Today, quantum machine learning refers to using quantum circuits as neural networks [27], or kernel functions [28] to perform classical machine learning tasks like supervised learning [29,30]. There are cases, where quantum models have provable advantages over classical models [31], but it has been argued that these instances are special cases and no quantum speed up is to be expected for quantum machine learning with classical data [32].
On the other hand, applying classical machine learning to quantum physics has been a great success story [33], most prominently for the classification and mapping of phase diagrams [34][35][36]. These methods rely on classical data and are therefore restricted by the available classical simulation methods. With physical devices surpassing system sizes that are classically tractable [37], there is need for methods to investigate physical quantum states with quantum computers.
In this paper, we propose a quantum machine learning algorithm for quantum data. The data are ground states of quantum many-body systems that are prepared by a quantum simulation subroutine and serve as the input for Variational Quantum Anomaly Detection (VQAD). Our quantum anomaly detection scheme belongs to the category of variational quantum algorithms where the circuit learns characteristic features of the input state [38]. This can in principle be leveraged for obtaining physical insights of the system from training [39] and is in contrast to previous proposals that are based on kernel methods (one-class support vector machines) [40,41]. In the present study, we use it to map out an unknown phase diagram of a system without requiring knowledge about the order parameter or the number and location of the different phases.
In anomaly detection, the task is to differentiate normal data from anomalous data, opposed to supervised Figure 1: Overview of our proposal. First, the quantum states are prepared via VQE. Then, they are processed through the anomaly syndrome, consisting of a parameterized unitary U (θ) and a measurement of a subset of qubits, referred to as trash qubits. R y indicates a parameterized y-axis rotation and CZ a (fixed) controlled-z gate.
learning tasks, where a fixed set of classes with labels for training are differentiated. On the other hand, the task of anomaly detection requires an anomaly syndrome, i.e., an observable that is trained to be of a certain value (typically 0) when normal data is input, and be significantly larger for anomalous data it is tested on. In classical machine learning, anomaly detection has already been used to extract phase diagrams in an unsupervised fashion from simulated and experimental data [42][43][44]. VQAD allows us to perform anomaly detection directly on a quantum computer, and, with programmable devices readily available, we demonstrate it experimentally on a real device.

II. PROPOSAL
The task of detecting anomalies in ground states of quantum many-body Hamiltonians can be loosely divided into two sub tasks: Preparing the ground state for specific Hamiltonian parameters, and computing an anomaly syndrome indicating whether the state corresponds to a normal example or an anomaly. An overview of our proposed algorithm is shown in Fig. 1. The problem of state preparation on quantum computers is one of ongoing research, and in principle, one can use any state preparation subroutine for preparing the ground state. Here, we choose the Variational Quantum Eigensolver (VQE) as it has the lowest hardware requirements while achieving reliable results on current devices [11,45]. The VQE algorithm iteratively minimizes the expectation value of a Hamiltonian with the ansatz circuit to find the ground state by optimizing the parameters of the circuit via a quantum-classical feedback loop. We choose a minimal ansatz as depicted in Fig. 1   simulation, and the quantum anomaly detection on real noisy devices. For more complex systems, the problem of finding a suitable hardware efficient ansatz can be addressed for example by the adaptive VQE algorithm [46]. In this work we employed the VQE implementation provided by the Qiskit library [47] and optimized it using simultaneous perturbation stochastic approximation (SPSA) [48]. For all technical details we refer to App. A.
Once the ground state is prepared on the quantum device a subsequent circuit serves as the anomaly syndrome. Our circuit ansatz is inspired by the recently proposed quantum auto-encoder, which similar to its classical counterpart can be used for compression of classical and quantum data [13,14]. It is composed of several layers each consisting of parameterized single qubit yrotations and controlled-z gates. After the final layer a predefined number n t of trash qubits is measured in the computational basis. The objective is to decouple the trash qubits from the rest of the system, effectively compressing the original ground state into a smaller number of qubits. The circuit parameters are then optimized to faithfully compress states that are considered normal. However, when the optimized circuit is tested on anomalous states not seen during training, it is expected that the circuit fails to decouple the trash qubits from the rest of the system. To quantify the degree of decoupling we use the Hamming distance d H of the trash qubit measurement outcomes to the |0 ⊗nt state, i.e., the num-ber of 1s in a bit-string of measurement outcomes [14]. The cost function C can then be defined as the Hamming distance averaged over several circuit evaluations where N is the number of performed measurements or shots. The cost function can also be rewritten in terms of expectation values of local Pauli-z operators Z j The VQAD circuit achieves perfect compression if the trash qubits are fully disentangled from the remaining qubits and mapped into the pure |0 ⊗nt state resulting in a cost equal to zero. The specific circuit ansatz for the anomaly syndrome is shown in Fig. 1 for the case of n t = 2 trash qubits. Each layer of the circuit starts with parameterized single-qubit y-rotations applied to every qubit followed by a sequence of entangling controlled-z gates. The currently available NISQ devices are inherently noisy and the computations are subject to gate errors. To minimize the number of two-qubit gates we apply the controlled-z gates only between trash qubits and non-trash-qubits as well as between trash qubits themselves instead of an all-to-all entangling map [14]. This entangling map is physically motivated as the goal of the circuit is to disentangle the trash qubits from the rest, with the trash qubits resulting in the |0 ⊗nt state. In a single layer each non-trash qubit will be coupled to exactly one trash qubit. This entangling scheme is repeated in the subsequent layers until every non-trash qubit has been coupled to each trash qubit exactly once, i.e. the number of layers of the circuit is equal to n t . After the final layer, additional single-qubit y-rotations act on the trash qubits before they are measured.
Barren Plateaus are the fundamental obstacle prohibiting training of variational circuits with increasing numbers of qubits [18]. It was previously shown that using local cost functions and circuits featuring a number of layers scaling at most logarithmically in the system size can prevent the occurrence of Barren Plateaus [19]. Additionally for realistic devices, gate errors lead to decoherence, making quantum simulation on real devices a challenging task even for small systems and low depths [49]. The former calls for a minimal number of layers while the latter calls for a minimal number of gates overall. Therefore, we seek a minimal solution for our variational circuit that we want to implement on a readily available NISQ-era quantum computer. On the other hand, it is desirable to have an ansatz as general as possible to be able to capture a wide range of problems (see circuit complexity [50,51]).
For the anomaly syndrome in this paper, we propose an ansatz that aims at compromising between being general enough to compress the ground states of the investigated systems while still being trainable. One way to make our circuit scalable for larger systems is to choose the num-ber of trash qubits n t = log 2 L , where L is the total number of qubits. Together with the fact that our cost function is composed of only local operators, the training is expected to not suffer from Barren Plateaus. We empirically confirm successful trainability, i.e., achieving a cost of 0.0 for ground states of the systems discussed later in the manuscript, for L ∈ {3, 4, 8, 16, 32}, corresponding to n t ∈ {1, 2, 3, 4, 5}, respectively. In Fig. 2, a ground state of the Ising model Eq. (5) at g x , g z , J = (0.3, 0, 1) is taken as a realistic example and we can confirm successful trainability in all cases.
Note that in principle the trash qubits can be placed anywhere in the circuit, however, when performing computations on a real quantum device it proved advantageous to explicitly take the qubit connectivity structure of the device into account in order to reduce the number of required SWAP operations. Specifically here, we placed the trash qubits in the middle of the IBMQ devices.
The training and inference procedure is identical to the classical anomaly detection schemes for mapping out phase diagrams [42]. In the first step, one randomly chooses a training region in the phase diagram that represents normal data, which is an arbitrary definition. Note that no prior knowledge about the phase diagram is therefore required. The circuit representing the anomaly syndrome is then trained on ground states of the training region, and tested on the whole phase diagram. States in the same phase as the training data are normal and can be disentangled, leading to a low cost. Anomalous states can be inferred through an increase in the cost function signaling that the corresponding ground state cannot be disentangled by the optimized circuit. From the resultant cost profile, we can deduce the phase boundary between the phase the circuit has been trained on and any other phases in the diagram. This procedure is then repeated by training in the anomalous region from the previous iteration until all phase boundaries are found. An example is provided in Fig. 3.
Anomaly detection is a semi-supervised learning task. The setting is typically that one is provided with one class of data that is well known, normal data, and aims at finding outliers of that distribution, anomalous data. An archetypical example is credit card fraud where a big database of normal transactions is provided and one aims at finding fraudulent ones. We consider anomaly detection semi-supervised as labeled data (x, "normal") is provided for training while (x, "anomalous") is to be inferred. Here, however, we arbitrarily define (x, "normal") and iteratively find the different classes (phases of matter). The definition of (x, "normal") is arbitrary and does not necessitate prior knowledge. Furthermore, it is merely a means to an end to find the different classes. In that sense, the way anomaly detection is used to map out the phase diagram can be regarded as an unsupervised learning method.
Note that in previous works, where the same task has been tackled with classical machine learning techniques, it has been shown that a single ground state was sufficient to successfully train the model [44]. This feature stems from the fact that ground states within the same phase share similar properties and there is very little variance when changing the physical parameters inside one phase. We observe this feature also in the training of the VQAD.

A. Simulations with ideal quantum data
In order to test the performance of VQAD, we first study the one-dimensional extended Bose Hubbard model with dimerized hoppings (DEBHM) [52], where b † i (b i ) is the bosonic operator representing the creation(annihilation) of a particle at site i of a lattice of length L. The tunneling amplitudes J − δJ (J + δJ) indicate hopping processes on odd (even) links connecting nearest-neighbor sites, while V represents the nearestneighbor (NN) repulsion. Here, we take the hardcore boson limit, i.e. the on-site repulsion U/J → ∞, such that the local Hilbert space is two-dimensional and each site can only accommodate 0 or 1 bosons. This model can be effectively mapped into a spin-1/2 system [53].
Previous studies of the DEBHM model at half filling (n = 0.5) have demonstrated the existence of three distinct phases [52]. For small and intermediate values of V /J and δJ > 0, we find a topological Mott insulator (TMI) displaying features analogous to a symmetry protected topological phase appearing in the dimerized spin-1/2 bond-alternating Heisenberg model [53]. On the other hand, for negative values of δJ we expect a trivial Mott insulator (MI), while in the regime where the nearest-neighbor repulsion dominates, a charge density wave (CDW) appears.
In Fig. 3(a)-(b), we study the phase diagram of the model in Eq. (2) in terms of the parameters δJ and V /J, using the density matrix renormalization group algorithm (DMRG) [54][55][56]. In order to differentiate between the Mott insulating phases and the CDW, one can compute the CDW order parameter which detects staggered patterns in the density. In Fig. 3(a) we report a vanishing value of O CDW everywhere but in the region with large values of V /J, which corresponds to the CDW [57]. To characterize the TMI we study the entanglement spectrum (ES), which is expected to be doubly degenerate in a topologically nontrivial phase [58] due to the existence of edge states. The entanglement spectrum {λ i } is defined in terms of the positive real-valued Schmidt coefficients {α i } of a bipartite decomposition of the system by α 2 i = exp(−λ i ). We determine its degeneracy using In Fig. 3(b), we show that the quantity D ES vanishes only for small NN interaction strengths V and positive values of δJ, which correponds to the TMI. The trivial MI and CDW phases do not show a degeneracy and hence do not host topological edge states. In the following, we test the capabilities of the VQAD with ideal states obtained from DMRG simulations. The anomaly syndrome is trained using a single representative ground state within one of the phases such that the cost measured at the trash qubits is minimised and the states of this phase can be efficiently compressed by the circuit. Afterwards, the trained circuit processes all states from the full phase diagram, ideally with similarly low cost in the same phase and significantly higher cost in other phases.
In Fig. 3(c)-(e) we show the resultant cost diagram for three circuits, each optimized at a different point in the phase diagram. Indeed, ground states outside of the training phase give rise to a large cost and hence are correctly classified by the VQAD as anomalous. Surprisingly, a single ground state example (indicated by the cross) was sufficient to successfully train the VQAD and infer all three phases. Similar results were recently reported for the case of classical anomaly detection using neural network auto-encoders [44].
To demonstrate the robustness of the VQAD against noise present in currently available NISQ devices we apply a depolarizing noise channel after each gate with error probabilities p err = 0.001 (single-qubit gates) and p err = 0.01, 0.07 (two-qubit gates) and show two exem-plary cost profiles of the trained anomaly detector in Fig. 4. Since the noise becomes more prominent with larger circuit depths, we used the two-layer VQAD circuit ansatz with only two trash qubits in this case. While it is not possible to reach a cost of zero in the training phase, the optimization still converges and all three phases can be successfully inferred. Hence, this suggests that even if the VQAD is not able to fully disentangle the trash qubits, the phase diagram can still be recovered from the resultant cost profile.

B.
Experiments on a real quantum computer We have seen that with ideal quantum data, VQAD can map out non-trivial phase diagrams including topologically non-trivial phases with and without noise in the anomaly syndrome. Next, we discuss its performance in real-noise simulations, that is with noise profiles and qubit connectivities from a real quantum device. Furthermore, we perform the quantum simulation subroutine, i.e., the ground state preparation via VQE, on the same circuit. For this task, we consider the paradigmatic transverse longitudinal field Ising (TLFI) model [59] where X i , Z i are the Pauli matrices on site i, J is the coupling strength, and g x , g z are the transverse and longitudinal fields, respectively. For g z = 0 the model is exactly solvable and shows a quantum phase transition from a ferromagnetic (antiferromagnetic) phase for g x /J < 1 and J negative (positive) to a paramagnetic one for g x /J > 1 [60]. In the following we set J = 1 and vary the longitudinal and transverse fields. In this regime the model is not exactly solvable and the phase diagram has been extensively studied numerically [61,62]. The antiferromagnet-paramagnet quantum phase transition is best characterized by the order parameter which in this case is the staggered magnetization We simulate the ground states of the Hamiltonian in Eq. (5) using VQE for L = 5. On a noisy device, longrange entangling gates are performed by consecutive local two-qubit gates (SWAP operation), increasing the actual circuit depth. A large number of consecutive gates leads to decoherence due to gate errors and destroys the results. With the circuit presented in Fig. 1 for the VQE subroutine, we found a trade-off between expressibility and noise tolerance with a circular entanglement distribution and only one layer. Additionally, we performed measurement error mitigation [63], which can further improve the results of the cost function as seen in Fig. 7 in App. A. For small values of g x and g z , in the ferromagnetic ordered phase, the ground states ψ |10101 ( Ŝ = 1) and ψ |01010 ( Ŝ = −1) have a similar energy, which is why the optimization can get stuck in local minima. Hence, in the ordered phase, VQE can converge to both a state with positive or negative staggered magnetization, or an equal superposition of the two as can be seen in Fig. 5(a). The VQAD simulation results in Fig. 5(b) show a perfect correlation between positive Ŝ and low cost, and vice versa, negative Ŝ and high cost -which, intuitively, can be expected [64]. The disordered phase is detected from the plateau of high cost (∼ 1).
We see that VQAD also performs well under realistic conditions, so we next test the algorithm on a physical device. For this task, we use the L = 5 qubits on ibmq_jakarta [63]. To avoid jumps in the staggered magnetization in the ordered phase and improve convergence of the VQE optimization, we reuse already optimized parameters at neighboring points in the phase diagram as a good initial guess. Due to a large computation time overhead per execution on the real device, we additionally prepared pre-optimized parameters for both subroutines from a realistic noisy simulation, and use these as initial guesses for the optimization on the device. We found that for computing the staggered magnetization it is actually not necessary to re-run the VQE optimization on the physical device, and we can achieve faithful results by directly using the optimized parameters from the simulation as seen in Fig. 6. The resulting cost values for the optimized circuit, plotted in Fig. 6, clearly distinguish the two phases, with the cost from the experiment showing solely an almost constant offset compared to the noisy simulation. |Ŝ| ibmq jakarta Training point Figure 6: Real device VQAD experiments: We show the order parameterŜ compared to the VQAD results both for execution on ibmq_jakarta and noisy simulators with the same noise profile. We trained on a single ground state in the ordered (a) and paramagnetic (b) phase. For samplingŜ, we use the same parameters for the VQE circuit in simulation and experiment. All values forŜ in the paramagnetic phase are negative, hence, for better visualization we plot its absolute value |Ŝ|. For training the anomaly syndrome, the optimized parameters from the simulation are taken as an initial guess.

IV. OUTLOOK
We showed that our proposed algorithm is capable of mapping out complex phase diagrams, including topologically non-trivial phases. We further demonstrated that the algorithm also works in realistic scenarios for both real-noise simulations and on a real quantum computer. Hence, we provide a tool to experimentally explore phase diagrams in future quantum devices, which will be especially useful when physical devices surpass the limit of what can be classically computed.
Currently, the main bottleneck of VQAD is the presence of noise in real devices. We were able to improve our anomaly detection scheme by employing measurement error mitigation and adopting the circuits according to the physical device. These results are promising, and with current efforts on enhancing device performances, error mitigation and circuit optimization strategies in the community, we are hopeful to see even further improvements soon.
In this work we focused on using VQAD to extract the phase diagram of quantum many-body systems. A possible future extension would be to apply it to the problem of entanglement witnessing and certification in manybody scenarios without tomography. Furthermore, the use of an autoencoder-like architecture has the advantage over kernel-based schemes in that there exists tools of interpreting the feature space in classical autoencoders to gain physical insights [39], which can be a possible future extension for the quantum case discussed here.