Entanglement-Induced Barren Plateaus

Open Access

Carlos Ortiz Marrero, Mária Kieferová, and Nathan Wiebe

PRX Quantum 2, 040316 – Published 25 October 2021

Abstract

We argue that an excess in entanglement between the visible and hidden units in a quantum neural network can hinder learning. In particular, we show that quantum neural networks that satisfy a volume law in the entanglement entropy will give rise to models that are not suitable for learning with high probability. Using arguments from quantum thermodynamics, we then show that this volume law is typical and that there exists a barren plateau in the optimization landscape due to entanglement. More precisely, we show that for any bounded objective function on the visible layers, the Lipschitz constants of the expectation value of that objective function will scale inversely with the dimension of the hidden subsystem with high probability. We show how this can cause both gradient-descent and gradient-free methods to fail. We note that similar problems can happen with quantum Boltzmann machines, although stronger assumptions on the coupling between the hidden and/or visible subspaces are necessary. We highlight how pretraining such generative models may provide a way to navigate these barren plateaus.

Received 15 February 2021
Revised 21 July 2021
Accepted 13 October 2021

DOI:https://doi.org/10.1103/PRXQuantum.2.040316

Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article's title, journal citation, and DOI.

Physics Subject Headings (PhySH)

Machine learning Quantum algorithms

Quantum Information, Science & Technology

Authors & Affiliations

Carlos Ortiz Marrero¹,*, Mária Kieferová²,†, and Nathan Wiebe³,⁴,‡

¹Data Sciences and Analytics Group, Pacific Northwest National Laboratory, Richland, Washington 99354, USA
²Centre for Quantum Computation and Communication Technology, Centre for Quantum Software and Information, University of Technology Sydney, New South Wales 2007, Australia
³Department of Computer Science, University of Toronto, Ontario M5S 1A1, Canada
⁴High Performance Computing Group, Pacific Northwest National Laboratory, Richland, Washington 99354, USA

*carlos.ortizmarrero@pnnl.gov
†maria.kieferova@uts.edu.au
‡nwiebe@cs.toronto.edu

Popular Summary

The promise of quantum machine learning is that by incorporating quantum effects, such as entanglement, into machine-learning models, researchers can improve model performance and understand more complex data sets. This hope is particularly pronounced in the design of deep quantum neural networks, which attempt to boost the performance of existing deep-learning models by allowing entanglement between the visible and hidden variables in the model. In our work, we show that applying this approach to quantum deep learning is problematic, given that an excess of entanglement between the hidden and visible layers can destroy the predictive power of these models. Our key insight is that barren plateaus, i.e., vanishing gradients as the model scales in the number of qubits, can occur as a result of an excess of entanglement between visible and hidden units in deep quantum neural networks.

This surplus of entanglement to some extent defeats the purpose of deep learning by causing information to be nonlocally stored in the correlations between the layers rather than in the layers themselves. As a result, when one tries to remove the hidden units, as is customary in deep learning, one finds that the resulting state is close to the maximally mixed state. Indeed, we show that such situations are generic and that gradient-descent methods are unlikely to allow the user to escape from such a plateau at a low cost. This suggests that if quantum effects are to be used, then they must be used surgically.
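The claim that removing the hidden units leaves a nearly maximally mixed state can be checked numerically on small systems. The following is a minimal sketch, not the authors' code: it assumes Haar-random pure states as a stand-in for a highly entangled network output, and the function names (`haar_random_state`, `reduced_visible_state`, `mean_distance_to_mixed`) are hypothetical. It traces out the hidden qubits and reports the average trace distance to $I / D_{v}$ as the hidden register grows.

```python
import numpy as np


def haar_random_state(dim, rng):
    """Sample a Haar-random pure state as a normalized complex Gaussian vector."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)


def reduced_visible_state(psi, d_v, d_h):
    """Trace out the hidden subsystem (visible index assumed to come first)."""
    m = psi.reshape(d_v, d_h)
    return m @ m.conj().T


def trace_distance(rho, sigma):
    """T(rho, sigma) = half the sum of |eigenvalues| of the Hermitian difference."""
    return 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()


def mean_distance_to_mixed(n_visible, n_hidden, samples=200, seed=0):
    """Average trace distance between the visible state and I / D_v."""
    rng = np.random.default_rng(seed)
    d_v, d_h = 2 ** n_visible, 2 ** n_hidden
    mixed = np.eye(d_v) / d_v
    dists = []
    for _ in range(samples):
        psi = haar_random_state(d_v * d_h, rng)
        dists.append(trace_distance(reduced_visible_state(psi, d_v, d_h), mixed))
    return float(np.mean(dists))


if __name__ == "__main__":
    # One visible qubit; the distance should shrink as hidden qubits are added.
    for n_hidden in range(1, 9):
        avg = mean_distance_to_mixed(1, n_hidden)
        reference = 0.5 * np.sqrt(2 / 2 ** n_hidden)  # (1/2) sqrt(D_v / D_h)
        print(f"n_hidden={n_hidden}  mean T={avg:.4f}  reference={reference:.4f}")
```

The printed reference column is the scaling $\frac{1}{2} \sqrt{D_{v} / D_{h}}$ quoted in the Figure 3 caption; the sketch only illustrates the trend for this particular random-state ensemble and is not a reproduction of the paper's numerics.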

Issue

Vol. 2, Iss. 4 — October - December 2021

Images

Figure 1
Examples of QNNs. (a) A quantum unitary network characterized by a circuit with parametrized unitaries $U_{j} = e^{- i θ_{j} H_{j}}$, where the $θ_{j}$ are the parameters that we aim to learn and the $H_{j}$ are Hamiltonians that specify the QNN. The output is then $U (θ_{1}, \dots, θ_{n}) | ψ_{0} ⟩$, where $| ψ_{0} ⟩$ can be taken to be $| 0 \dots 0 ⟩$ for generative learning. In this model, visible units correspond to the qubits on which we evaluate the objective function, in this case the last two registers. The remaining qubits are called hidden units. (b) Quantum Boltzmann machines (QBMs) defined on a graph. Each edge and each vertex corresponds to a weight on a local Hamiltonian acting on the corresponding pair of qubits or single qubit. The top layer of units (circles) corresponds to visible units and the bottom layer (rectangles) to hidden units. QBMs model data as a thermal state $e^{- H (θ)} / Z (θ) := e^{- \sum_{i} θ_{i} H_{i}} / Tr (e^{- \sum_{i} θ_{i} H_{i}})$. Without loss of generality, we take $Tr (H) = 0$ for all quantum Boltzmann machines. The aim when training a quantum Boltzmann machine is to learn a vector $θ$ such that, for a training objective function given by $O_{obj}$ that acts on the visible subsystem, we maximize $Tr (O_{obj} {Tr}_{h} [e^{- H (θ)} / Z (θ)])$.
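The QBM objective in the caption above can be evaluated directly for a toy system. The sketch below is illustrative only: the two-qubit layout (one visible, one hidden), the choice of Pauli terms in `TERMS`, and the function names are assumptions made here, not taken from the article. It builds $H (θ) = \sum_{i} θ_{i} H_{i}$ from traceless terms, forms the thermal state $e^{- H (θ)} / Z (θ)$ by diagonalization, traces out the hidden qubit, and evaluates $Tr (O_{obj} {Tr}_{h} [e^{- H (θ)} / Z (θ)])$.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PAULI_X = np.array([[0, 1], [1, 0]], dtype=complex)
PAULI_Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Hypothetical traceless terms H_i; the visible qubit is the first tensor factor.
TERMS = [
    np.kron(PAULI_Z, I2),       # field on the visible qubit
    np.kron(I2, PAULI_Z),       # field on the hidden qubit
    np.kron(PAULI_Z, PAULI_Z),  # visible-hidden coupling
    np.kron(PAULI_X, I2),       # transverse field on the visible qubit
]


def thermal_state(theta, terms):
    """Return exp(-H) / Tr(exp(-H)) for H = sum_i theta_i * H_i."""
    h = sum(t * term for t, term in zip(theta, terms))
    evals, evecs = np.linalg.eigh(h)
    weights = np.exp(-(evals - evals.min()))  # shift for numerical stability
    weights /= weights.sum()                  # division by Z(theta)
    return (evecs * weights) @ evecs.conj().T


def trace_out_hidden(rho, d_v, d_h):
    """Partial trace over the hidden factor of a (d_v * d_h)-dimensional state."""
    return np.einsum("ajbj->ab", rho.reshape(d_v, d_h, d_v, d_h))


def objective(theta, terms, o_obj, d_v, d_h):
    """Tr(O_obj Tr_h[exp(-H(theta)) / Z(theta)])."""
    rho_v = trace_out_hidden(thermal_state(theta, terms), d_v, d_h)
    return float(np.real(np.trace(o_obj @ rho_v)))


if __name__ == "__main__":
    theta = np.array([0.3, -0.2, 0.5, 0.1])
    print(objective(theta, TERMS, PAULI_Z, d_v=2, d_h=2))
```

Diagonalizing $H (θ)$ is only practical for a handful of qubits; it is used here because it keeps the example dependency-free and makes the normalization by $Z (θ)$ explicit.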
Figure 2
For an area law, the entanglement entropy scales as the number of qubits on the boundary (in the dashed rectangle).
Figure 3
(a) A log-log plot showing the trace-distance data in relation to our bound. The blue and orange marked values correspond to the estimated maximum peaks of the data histograms of $1000$ model instances, and the width of the shaded area corresponds to two standard deviations for a fixed $D_{v} = 2^{1} = 2$. The green marked values are our bound results, i.e., $E [T (ρ, I / D)] \leq \frac{1}{2} \sqrt{D_{v} / D_{h}}$. (b),(c) Semilog plots highlighting the decay in the expected value of the $\infty$ norm of the gradient vector over an ensemble of initialized models. The dashed blue line represents the average of $1000$ model instances. The dashed green line represents the best fit obtained from least squares, with the standard error of the estimated slope under the assumption of residual normality: (b) gradient estimates for the unitary model; (c) gradient estimates for the normalized quantum Boltzmann machine.
Figure 4
Computation of the trace distance between the reduced density matrices of our models and the maximally mixed state for $1000$ instances. The models considered have only one visible unit, i.e., $D_{v} = 2^{1} = 2$. (a) The empirical trace-distance distribution of a real-time evolution ($e^{- i H t}$, for $t = 10$) of Hamiltonians drawn from the Gaussian unitary ensemble (GUE). (b) The empirical trace-distance distribution of the unitary model. All coefficients are drawn from a uniform distribution over $[0, 1)$. (c) The empirical trace-distance distribution of the quantum Boltzmann machine. The on-site coefficients, $J_{a}^{i}$, are drawn from $N (0, 0.01)$. The off-site coefficients, $J_{a, b}^{i, j}$, are drawn from $N (0, 1)$. Moreover, the Hamiltonian is normalized by its operator norm.
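An experiment in the spirit of panel (a) can be sketched as follows. This is a hedged illustration rather than the authors' setup: the GUE normalization in `sample_gue`, the initial state $| 0 \dots 0 ⟩$, the number of instances, and all function names are assumptions made here. Each instance draws a Hermitian matrix $H$, applies $e^{- i H t}$ with $t = 10$, keeps a single visible qubit, and records the trace distance to $I / 2$.

```python
import numpy as np


def sample_gue(dim, rng):
    """Draw a Hermitian matrix with Gaussian entries (normalization assumed)."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (g + g.conj().T) / 2


def evolve_zero_state(h, t):
    """Apply exp(-i h t) to |0...0> using the eigendecomposition of h."""
    evals, evecs = np.linalg.eigh(h)
    psi0 = np.zeros(h.shape[0], dtype=complex)
    psi0[0] = 1.0
    return evecs @ (np.exp(-1j * evals * t) * (evecs.conj().T @ psi0))


def visible_trace_distance(psi, d_v):
    """Trace distance between the reduced visible state and I / d_v."""
    m = psi.reshape(d_v, -1)  # visible index first, hidden indices flattened
    rho_v = m @ m.conj().T
    return 0.5 * np.abs(np.linalg.eigvalsh(rho_v - np.eye(d_v) / d_v)).sum()


def gue_distances(n_hidden, instances=100, t=10.0, seed=0):
    """Trace distances for one visible qubit over random GUE instances."""
    rng = np.random.default_rng(seed)
    dim = 2 ** (1 + n_hidden)
    return np.array([
        visible_trace_distance(evolve_zero_state(sample_gue(dim, rng), t), 2)
        for _ in range(instances)
    ])


if __name__ == "__main__":
    for n_hidden in (2, 4, 6):
        d = gue_distances(n_hidden)
        print(f"n_hidden={n_hidden}  mean={d.mean():.4f}  std={d.std():.4f}")
```

A histogram of the returned array gives the kind of empirical distribution the caption describes; the exact shape depends on the assumed normalization and on the number of hidden qubits.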
Figure 5
(a),(b) Semilog plots highlighting the decay in the variance of the $\infty$ norm of the gradient vector over an ensemble of initialized models. The dashed blue line represents the variance over $1000$ model instances. The dashed green line represents the best fit obtained from least squares, with the standard error of the estimated slope under the assumption of residual normality: (a) gradient estimates for the unitary model; (b) gradient estimates for the normalized quantum Boltzmann machine.

PRX Quantum

a Physical Review journal