Is quantum advantage the right goal for quantum machine learning?

Machine learning is frequently listed among the most promising applications for quantum computing. This is in fact a curious choice: today's machine learning algorithms are notoriously powerful in practice, but remain theoretically difficult to study. Quantum computing, in contrast, does not offer practical benchmarks on realistic scales, and theory is the main tool we have to judge whether it could become relevant for a problem. In this perspective we explain why it is so difficult to say something about the practical power of quantum computers for machine learning with the tools we are currently using. We argue that these challenges call for a critical debate on whether quantum advantage and the narrative of "beating" classical machine learning should continue to dominate the literature the way they do, and highlight examples of how other perspectives in existing research provide an important alternative to the focus on advantage.

The average number of papers on the arXiv's quantum section that relate to machine learning has increased from a handful of contributions per year in the early 2000s to a few papers per day in 2021 (see footnote 1). A large share of this literature can be attributed to the field of quantum machine learning, which investigates how quantum computers can be used to solve machine learning problems [1][2][3][4][5], stemming from both conventional and "quantum" [6,7] data. The dominant goal in quantum machine learning is to show that quantum computers, with their properties like entanglement and interference, offer advantages for machine learning tasks of practical relevance [8]. This question is particularly important to the emerging quantum technology industry, which has been driving the discipline right from the start [9,10], and which often names machine learning as one of the core application areas for quantum computers.
In this perspective, we want to put the goal of beating classical machine learning under critical scrutiny and argue that the scale of progress we seek may require at least a partial liberation from the "tunnel vision of quantum advantage". First, in Section I we explain why, contrary to commercial expectations, machine learning may turn out to be one of the hardest applications to show a practical quantum advantage for (see also Table I): (a) machine learning is famous for notoriously powerful algorithms that set a challenging baseline for quantum algorithms, (b) the inputs to training algorithms are increasingly big and therefore hard to handle by early quantum computers, (c) the problems tend to stem from the human domain and are much messier than the tasks solved by standard quantum algorithms, (d) machine learning theory provides a shifting ground to work with, since past assumptions and intuitions are currently being upended by deep learning, and (e) we only have limited options to practically evaluate our methods with benchmarks. To state it in simple terms, quantum machine learning research is trying to beat large, high-performing algorithms for problems that are conceptually hard to study. At the same time, the tools that quantum computing offers to think about advantages (essentially, experiments on prototype quantum devices and over 30 years' worth of knowledge on provable asymptotic speedups) are severely limited. Consequently, showing that quantum can beat classical machine learning may only be possible in highly abstract settings or on very small scales at this stage. Focusing on quantum advantage therefore means focusing only on a biased subset of models, datasets and theoretical approaches, namely the ones we can tackle under these difficult conditions, a fact that we should discuss more critically. It is important to be clear that these challenges do not mean we should stop trying to figure out what quantum computers can offer for machine learning. But judging the value of research through the limited lens of speedups could prevent important research areas from emerging, and in the worst case, it may even hinder the innovation needed to find use cases for quantum computers in the future. This is what we argue in Section II.
* maria@xanadu.ai
Footnote 1: This observation is based on an arXiv API search in the quant-ph category for papers using the term "learn" as substring in title or abstract. A similar trend is found for usage of the term "neural network".
However, if we decide to let go of the goal of beating classical machine learning for a moment, what other meaningful questions can we ask? In Section III we discuss how existing research already yields rich results without having to only look at quantum advantage. First, we use the discussion around quantum perceptrons [11][12][13][14][15][16][17][18] to motivate the question "What are good building blocks for quantum models?", where "good" can have much more diverse interpretations than speedups and beating benchmarks. Second, we explain how the connection between a large class of quantum machine learning models and kernel methods probes the important question "How can we bridge quantum computing and classical learning theory to gain a better understanding of quantum machine learning?", rather than only finding classically intractable quantum kernels. Third, we use the technique of computing gradients on a quantum computer as an example of a successful subfield that has not been purely driven by an advantage from the very beginning, but by the question "How can we make quantum software ready for machine learning applications?". We believe that all three ingredients (the right quantum models to study, theoretical tools with which we can study them, and software solutions to scale experiments) are important for a meaningful attempt to explore the benefit of quantum computing for machine learning in the future. But, somewhat paradoxically, limiting what we deem worth researching in these and other areas by whether or not a paper can demonstrate that "quantum is better" may actually prevent us from laying these much-needed foundations.

I. WHY MACHINE LEARNING IS SUCH A CHALLENGING PROBLEM
To be more precise about why machine learning may be a challenging application for the state that quantum computing is in, we will have to become a bit more technical, and look at how a machine learning task can actually be formulated as a mathematical problem using the framework of empirical risk minimisation. This may contain familiar material to some readers, but will help us to make the argument of the following section more explicit.

A. How to formalize learning
Intuitively, learning is the acquisition of skills from examples (see footnote 2; some useful textbooks are [20][21][22]). In machine learning, computers are the "agents" that learn, and examples are represented by data. Skills can be as diverse as navigating a physical body in an environment, playing chess, generating artificial images, or translating languages. These situations have been captured by the famous distinction between supervised, unsupervised and reinforcement learning. It is a bit surprising at first that most of machine learning theory only focuses on supervised learning, which is not so much a reflection of importance, but of the fact that supervision, or the provision of some information on the "ground truth", makes it easier to define what it means for a problem to be solved. At the same time, supervised learning does not have to deal with interaction between the learner and the data as is common in reinforcement learning.
A rather general version of a supervised learning problem can be stated as follows (see footnote 3):
Definition 1 (Supervised learning task) Consider a suitable data input domain X and a label domain Y, as well as a probability distribution p(x) over inputs x ∈ X. We assume that there is some ground truth mapping f* : X → Y of inputs to target labels. We are given a finite set of inputs sampled from p(x), together with their target labels,
$$ D = \{ (x^m, y^m) \}, \qquad y^m = f^*(x^m), $$
as well as a loss l : Y × Y → R that tells us how well a label predicted by a function f : X → Y compares to the target label. The task is to find a model f from a class of model functions F that minimizes the expected loss over the data distribution,
$$ \min_{f \in F} \int_X l(f(x), f^*(x)) \, p(x) \, dx. \tag{1} $$
For image recognition, which was one of the early success stories of modern machine learning, the inputs are numerical representations of images, and the labels could be a binary tag that indicates whether the image contains harmful content. The distribution p(x) describes the probability with which we can expect to be given certain images in the problem, but is necessarily unknown in a real-life task. Instead, we are given a subset of example images drawn from this hypothetical distribution (see Fig. 1), as well as information on which contain harmful content (y = 1) and which do not (y = 0). A typical loss function is simply an indicator function,
$$ l(f(x), y) = \begin{cases} 0 & \text{if } f(x) = y, \\ 1 & \text{otherwise.} \end{cases} \tag{2} $$
Minimising the expected loss over the data distribution is another way of saying that we want our model f to do well with regards to the loss over all data we can expect to see.
We can now pinpoint more precisely why machine learning applications are so hard to access from a theory point of view: in all but pathological examples, the probability distribution p(x) as well as the target function f* in Definition 1 (and hence an important part of Eq. (1)) are unknown. Even if we could model them, the integral in Eq. (1) will be hard to compute for all but special cases. In other words, even a very basic formalization of machine learning translates to a mathematical problem that is usually unsolvable.
Footnote 2: See [19] for an interesting debate that is challenging modern machine learning, arguing that learning is the efficient acquisition of skills from examples, rather than just fitting massive models to massive datasets.
Footnote 3: There are many different ways to formalise the notion of learning, and as usual in science, there were strong trends in what machine learning research considered to be a "relevant" setting. For example, in the past learning from membership queries [23], where we can actively influence which data we are given, was often considered. A popular alternative flavour to supervised learning as we set it up here is PAC learning, where learning translates to finding a model so that, with high probability, the loss on an input x sampled from p(x) is smaller than a threshold.
FIG. 1. A central problem in machine learning is how to find a model that performs well with regards to a distribution p(x) over datapoints x if only a small set of samples from that distribution is given. In supervised learning, the data samples are labeled (red and green dots) and the goal is to label a new sample (blue dot). Especially in high dimensions, the samples will not be able to provide information on the entire data space (here indicated by the region of high density with no samples). Learning is only possible if the distribution, model and/or model selection strategy contains a lot of structure, which is not always easy to analyse theoretically.
TABLE I.
| Property | Problems studied in quantum computing | Problems solved by machine learning |
| --- | --- | --- |
| classical performance | low: problems are carefully selected to be provably difficult for classical computers | high: machine learning is applied on an industrial scale and many algorithms run in linear time in practice |
| size of inputs | small: near-term algorithms are limited by small qubit numbers, while fault-tolerant algorithms usually take short bit strings | very large: may be millions of tensors with millions of entries each |
| problem structure | very structured: often exhibiting a periodic structure that can be exploited by interference | "messy": problems are derived from the human or "real-world" domain and naturally complex to state and analyse |
| theoretical accessibility | high: there is a large bias towards problems about which we can theoretically reason | shifting: theory is currently being rebuilt around the empirical success of deep learning |
| evaluating performance | computational complexity: the dominant measure to assess the performance of an algorithm is asymptotic runtime scaling | practical benchmarks: machine learning research puts a strong emphasis on empirical comparisons between methods |

B. Solving the problem in practice
Even though surprisingly few beginners to machine learning are aware of this correspondence, the standard way to deal with this predicament is to solve a proxy problem to Definition 1 and hope that it translates well to the original one. The proxy problem is known as empirical risk minimization, and prescribes evaluating the model performance using the finite set of data samples D:
$$ \min_{f \in F} \frac{1}{|D|} \sum_{(x^m, y^m) \in D} l(f(x^m), y^m), \tag{3} $$
where the sum runs over the labeled samples (x^m, y^m) in D. Much of learning theory tries to find guarantees on how solving the empirical proxy will generalize to the original problem, or how solutions found with a finite sample size perform on the original distribution. The performance of a model on unseen data is usually measured on a test set of further data samples that have not been used for training, and most papers in the machine learning literature report the error on the test set by running benchmarks on famous datasets. While this sounds straightforward, getting high-quality results that do not depend on implementation details is hard. For quantum machine learning research, parts of which try to adopt the culture of benchmark comparisons, the current limitations of hardware size make benchmarking an even more challenging tool to use and interpret.
In summary, while an important component of machine learning is optimization, its central aim is generalization, which is non-trivial to formalize and measure, even more so when we want to add quantumness into the mix.
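The gap between the expected loss and its empirical proxy can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than part of the argument above: the ground truth is a hypothetical threshold rule, p(x) is taken to be uniform on [0, 1], the model class F contains eleven threshold models, and the loss is the 0/1 indicator loss.

```python
import random

def indicator_loss(prediction, target):
    """Return 0 for a correct label and 1 for a wrong one."""
    return 0 if prediction == target else 1

def empirical_risk(model, data):
    """Average loss of `model` over a finite list of (x, y) samples."""
    return sum(indicator_loss(model(x), y) for x, y in data) / len(data)

def ground_truth(x):
    """Hypothetical f*: unknown in a real task, fixed here for illustration."""
    return 1 if x > 0.6 else 0

def threshold_model(t):
    """One member of the model class F: predict 1 when x exceeds t."""
    return lambda x: 1 if x > t else 0

def sample(n, rng):
    """Draw n labeled samples, with p(x) assumed uniform on [0, 1]."""
    return [(x, ground_truth(x)) for x in (rng.random() for _ in range(n))]

rng = random.Random(0)      # fixed seed so the run is repeatable
train = sample(20, rng)     # the finite data set D
test = sample(1000, rng)    # held-out samples, unused during training

# Empirical risk minimisation: pick the model in F with the lowest training loss.
thresholds = [i / 10 for i in range(11)]
best_t = min(thresholds, key=lambda t: empirical_risk(threshold_model(t), train))
model = threshold_model(best_t)

print("chosen threshold:", best_t)
print("training risk:", empirical_risk(model, train))
print("test risk:", empirical_risk(model, test))
```

Because this toy model class happens to contain the ground truth, the training risk reaches zero; the test risk is only an estimate of the expected loss, and how close the two are depends on the sample that was drawn.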

C. Deep learning turns learning theory upside down
Many of the standard tools in machine learning, such as cross-validation and regularization, are trying to fulfil the balancing act of not solving Eq. (3) "too well": We want to use the information provided by the finite data sample, but we do not want to pick up its particularities (which may not be present if we were given a different data set D sampled from p(x)). For example, if coincidentally all images in our data set that have a black pixel in one position are images with harmful content, we do not want to learn the spurious relation that if the pixel is black the image is harmful.
For the longest time, "picking up too much information" was thought to be identical to interpolating the training data perfectly well (i.e., getting a zero average loss over the data). Examples of mitigation strategies are to choose a simple function class F, to add terms to the loss that penalise non-smooth models from that class, or to stop iterative optimization before it converges to a minimum. But for more than a decade, we have consistently been getting empirical evidence that challenges this assumption: very large models can fit any function perfectly well, but still generalize beyond the data used for training, even in the presence of noise. This phenomenon was first attributed to a kind of hierarchical model called a deep neural network, but has been observed in other settings as well, and is now understood to be a main characteristic of the regime of so-called deep learning [24,25].
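One of the classical mitigation strategies named above, adding a penalty term to the loss, can be sketched in a few lines. The setting is an assumption chosen for brevity and is not taken from the text: a one-parameter linear model w*x, a squared loss instead of the indicator loss, a penalty proportional to w squared, and three made-up data points.

```python
def fit_linear_1d(data, penalty=0.0):
    """Minimise (1/M) * sum((w*x - y)**2) + penalty * w**2 over the weight w.

    M is the number of samples. Setting the derivative with respect to w
    to zero gives w = sum(x*y) / (sum(x*x) + penalty * M).
    """
    m = len(data)
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / (sxx + penalty * m)

def training_error(w, data):
    """Average squared error of the model w*x on the training samples."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2)]   # made-up toy samples

w_plain = fit_linear_1d(data)                  # fits the sample as well as possible
w_penalised = fit_linear_1d(data, penalty=1.0) # deliberately fits it less well

print("plain weight:", w_plain, "training error:", training_error(w_plain, data))
print("penalised weight:", w_penalised, "training error:", training_error(w_penalised, data))
```

The penalised fit accepts a higher training error in exchange for a smaller weight, which is the "not solving the problem too well" trade-off in its simplest form. The deep learning evidence discussed here is precisely that this intuition does not carry over unchanged to very large models.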
One of the most important goals in machine learning research today is to unite the evidence presented by deep learning with learning theory. This is a formidable challenge due to the mathematical structure of neural networks as long sequences of linear and nonlinear transformations, which makes them unwieldy to model mathematically. Furthermore, it is by now largely uncontested that the algorithm with which neural networks are trained, as well as the data itself, plays a crucial role in the phenomena we observe in deep learning [24,26,27]. A viable theory therefore cannot just make statements about the model class F, but has to describe the solutions f to an optimization problem, as well as the data distribution p. This means that even the simplest of toy models has to capture many moving parts, each of which is already difficult to analyse in the first place.
This ongoing revolution in machine learning theory building, as well as the practical success of deep learning itself, obviously poses even more challenges for a theory of quantum machine learning, where we want to add quantum theory as another moving part. At the same time we have only little access to empirical results from "just running the algorithm". And even if few-qubit proof-of-principle circuits can be simulated (or even run on real hardware), the learning regimes we are trying to understand are not observed on these small scales, which means that we cannot say much about the behaviour that quantum models will exhibit on a realistic problem scale.

II. A CRITICAL LOOK AT QUANTUM ADVANTAGE
The previous section explained why machine learning is a challenging problem for quantum computers to improve on: existing algorithms perform well, inputs are large in many applications, the basic problems have a complex mathematical structure, and we know little about why the best models perform so well, which forces us to gather evidence through benchmarks rather than be guided by theory.
In this section we will (after a short overview of the research field itself) motivate why in the context of machine learning the tools we currently use to investigate quantum advantage substantially limit and bias the statements we can make about the practical use of quantum computers.

A. Progress in quantum machine learning
While sporadic papers at the intersection of quantum computing and machine learning have been published since the 1990s [23, 28-35], quantum machine learning (here defined as research on how to use quantum computers for machine learning tasks from the classical or quantum domain) only gained momentum around 2013 (see references in [1,36]).
Since then one can distinguish two popular approaches to quantum machine learning. In the first years, a common goal was to speed up existing machine learning algorithms by solving (sub)tasks such as matrix inversion [37][38][39], Gibbs sampling [40,41], singular value estimation [42] or search [11,43] on a quantum computer. Since this agenda is pretty much borrowed from the modus operandi of traditional quantum computing, it may not be surprising that the studies are firmly rooted in this parent discipline, and touch upon the intricacies of machine learning research only in the most basic strokes.
The advent of near-term quantum computers led to a growing popularity of the second approach, which considers parametrized or variational quantum circuits as machine learning models [44][45][46][47]. In these proposals, training is done similarly to neural networks: gradient-descent-type algorithms iteratively find better physical parameters of the "quantum model". Central questions in this branch of research are what architectures to choose [48,49], how to compute gradients [50,51], as well as the trainability [52,53], expressivity [54,55] and generalization power [56][57][58][59] of such models using insights from machine learning.
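To make the training loop described above concrete, the following sketch trains a one-parameter "quantum model" by gradient descent. It uses the parameter-shift rule, one widely used way of obtaining gradients from circuit evaluations alone; the text does not say which technique the cited works use, so this choice is an assumption. The circuit (a single RX rotation on |0> followed by a Z measurement), the classical simulation of it, the step size and the number of steps are all illustrative.

```python
import math

def expectation(theta):
    """<Z> after applying RX(theta) to |0>, simulated classically."""
    amp0 = math.cos(theta / 2)          # amplitude of |0>
    amp1 = -1j * math.sin(theta / 2)    # amplitude of |1>
    return abs(amp0) ** 2 - abs(amp1) ** 2

def parameter_shift_gradient(f, theta):
    """Exact derivative of f for a Pauli rotation, from two shifted evaluations."""
    shift = math.pi / 2
    return 0.5 * (f(theta + shift) - f(theta - shift))

# Gradient descent on the circuit parameter, treating <Z> as the cost to minimise.
theta = 0.4          # arbitrary starting value
step_size = 0.2      # arbitrary learning rate
for _ in range(100):
    theta -= step_size * parameter_shift_gradient(expectation, theta)

print("trained parameter:", theta)
print("final cost:", expectation(theta))
```

For this circuit the expectation equals cos(theta), so the shifted evaluations reproduce the analytic derivative -sin(theta) and the loop drives the parameter towards pi, where the cost is minimal. On hardware each call to `expectation` would be an estimate from repeated measurements rather than an exact number.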
Apart from these two active fields of research there are many other contributions that try to formulate quantum versions of classical learning problems and analyse their scaling. For example we can ask how quantum data distributions change the sample complexity of learning [7,23,60], how classification problems change in a quantum setting [59,61,62], how quantum agents learn from interacting with an environment [63,64], or how quantum Ising models compare to Ising-based machine learning models such as Boltzmann machines [10] or Hopfield networks [65].

B. Quantum advantage
Almost all branches of quantum machine learning research have been heavily framed by the question of "beating" classical machine learning in some figure of merit, such as:
• the asymptotic runtime of a particular machine learning algorithm, for example an optimiser used to solve the empirical risk minimisation problem in Eq.
The positive publication bias known from other areas of science [71] is strongly prevalent in most areas of quantum machine learning, and a number of "positive" results have been put forward which either "prove" theoretically or "show" empirically that quantum computers are better at something. A few examples of the typical phrasing in abstracts and introductions are these:
• "we establish a rigorous quantum speed-up for supervised classification" [66]
• "[w]e prove that [...] quantum machines can learn from exponentially fewer experiments than those required in conventional experiments." [69]
• "we prove that PQCs with a simple structure already outperform any classical neural network for generative tasks" [68]
• "for achieving accurate prediction on all inputs, we prove that exponential quantum advantage is possible" [7]
• "[w]e show that our quantum-inspired generative models [..] generalize to unseen candidates with lower cost function values than any of the candidates seen by the classical solvers." [72]
• "[o]ur simulation results show that our quantum-inspired models have up to a 68x enhancement in generating unseen [..] samples compared to GANs" [70]
Some areas, most notably the trainability of quantum models and the sample complexity in certain learning frameworks, actively discuss results that show which approaches do not lead to advantages, or that could be problematic for practical quantum machine learning:
• "[o]ur main result is that quantum and classical sample complexity are in fact equal up to constant factors in both the PAC and agnostic models" [60].
If so much progress is being made in understanding quantum advantage, why do we think there is a problem? Will this research not eventually narrow down on areas where quantum computers could have a practical impact on machine learning applications -or show by overwhelming evidence that the case is hopeless? We believe that there is a deeper structural issue: the tools we currently have in quantum computing are not sufficient to make meaningful statements about this question. Let us motivate this statement with a few points.
Proving exponential speedups for artificially constructed settings [46,67], on the one hand, is interesting from an academic point of view, but does not say much about possible quantum applications. In the language of machine learning, this approach picks problems that have a heavy bias in the data distribution p(x) and/or the ground truth of the problem towards what quantum computers solve well. Or as remarked in [73], "quantum machine learning models can offer speed-ups only if we manage to encode knowledge about the problem at hand into quantum circuits, while encoding the same bias into a classical model would be hard". But as we explained in the previous section, the success of machine learning does not stem from solving a structured, well-understood problem and hand-coding it into the solution method. On the contrary, machine learning is famous for being agnostically applied to a range of problems for which we do not know the exact inductive bias that would suit the data. Furthermore, the principle of no free lunch [74] in machine learning states that for any algorithm performing well on one problem there will always be another problem on which it does not perform well. Good performance on selected hand-crafted examples therefore does not tell us anything about quantum computers as general learning tools. Note that relaxing the need for exponential speedups can lead to provable advantages in more general settings (see, e.g., [75,76]), but it is currently questioned whether such advantages will have an impact considering the overhead of error correction in fault-tolerant quantum computers [77].
The "traditional approach" to quantum machine learning mentioned in the previous section follows a different logic and looks for exponential speedups to widely applicable algorithms like support vector machines and neural nets [38,39]. Since these algorithms evidently have efficient runtimes, the goal is usually to reach sub-linear scaling. Here we find another issue, namely that we need extreme assumptions about data loading and read-out, and the fairness of the comparisons to classical models has been challenged in the past [78,79].
Another potential issue of proving or disproving whether quantum machine learning "works" is the tendency to make statements about average or worst-case properties of extremely large model families, such as the class of models we can express as f(x) = tr{ρ(x)M} (where ρ is a quantum state depending on x and M any observable), or models constructed from circuits sampled according to the Haar measure [52,58,73]. Such statements do not preclude a more specific subclass of quantum models from having entirely different statistical properties. As a comparison, we may be able to prove that all models we can express on a classical computer have certain average/worst-case properties for learning, which does not prevent specific models like boosting or GANs from performing in an entirely different manner.
Empirical studies, on the other hand, tend to compare to very specific classical models on (necessarily) small datasets [45,57], and it is consequently hard to tell if advantages are due to the careful selection of the hyperparameters, benchmarks and comparisons, or if they reflect a structural observation. Small changes in the (often ad hoc) architecture of the circuits can change results significantly [80]. Only a few studies try to reproduce existing results [81], or critically ask what measures we should apply [70] in our benchmarks other than borrowing concepts from classical machine learning. We also know very little about the scaling of empirical results to larger problem sizes, which will still be a challenge for experiments in years to come.
In our view, the question about whether quantum computers can really play a role in identifying practical machine learning applications is therefore still wide open, and unlikely to be decided by theoretical proofs or small-scale experiments. These tools should be considered more as a means to foster our understanding and test hypotheses in a well-defined setting. This is very relevant at the current state of quantum machine learning, where we observe an increasing resignation in informal conversations with colleagues and students as quantum machine learning fails to produce immediate commercial use-cases. The frequently repeated solution is to discard quantum computers for classical data processing [73], and instead see the future of quantum machine learning in analysing data in the form of quantum states [6,69]. But following the thoughts laid out in this section, we should ask ourselves if switching our attention to "quantum data" is subconsciously motivated by the hope that it suits our traditional proof techniques better, rather than providing a mature use-case.

III. ALTERNATIVE RESEARCH AGENDAS
Acknowledging the current difficulty of proposing quantum algorithms that improve the performance of machine learning does not mean that quantum machine learning research is at a dead end. Quite the contrary: recent years have shown a lot of interesting and fruitful research areas which have grown our understanding of the intersection without focusing on advantages only. We want to illustrate this now with three examples. The first two examples, the search for a quantum perceptron and the link between quantum circuits and kernel methods, show how a research area can be or has been framed from both an "advantage" and a "non-advantage" perspective; the two approaches lead to different kinds of investigations that can benefit from each other. The third example, the training of quantum circuits using gradients and automatic differentiation software, highlights an area that enabled quantum applications research without directly trying to improve classical algorithms in the first place.

A. Quantum perceptrons or the search for building blocks of quantum models
A perceptron [82] is a simple function
$$ f(x) = \phi(w^T x), \tag{4} $$
where x is an input vector, w a vector of trainable weights, and ϕ a nonlinear scalar function. The perceptron has a long history that connects machine learning with biological models of the brain. It is the basic building block of neural networks, and hence most of the modern deep learning models used in practice today. Ways of constructing quantum versions of perceptrons have sparked the imagination of researchers for more than 25 years, and quantum machine learning consequently contains a huge variety of proposals (see for example [11, 12, 15-18, 83-86], to only mention a few). Implicitly, quantum perceptrons are motivated by the success of classical perceptrons, and the desire to port this success over to the quantum domain. Depending on whether we want to prove a quantum advantage or not, very different study designs emerge. An advantage focus would require a comparison of quantum and classical versions with respect to runtime or performance in learning tasks. The design would have to focus on enabling this advantage (a feat that to our knowledge has not been convincingly performed yet).
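For readers less familiar with the classical side, the perceptron described above (a nonlinear scalar function applied to the inner product of weights and input) fits in a few lines. The two activation functions and the example numbers are illustrative assumptions; the text does not prescribe a particular choice of ϕ.

```python
import math

def perceptron(x, w, phi=math.tanh):
    """Apply the nonlinearity phi to the inner product of weights w and input x."""
    return phi(sum(wi * xi for wi, xi in zip(w, x)))

def step(z):
    """A hard threshold, one classical choice of nonlinearity."""
    return 1 if z >= 0 else 0

x = [1.0, 2.0]            # example input vector
w = [0.5, -0.25]          # example weights; in practice these are trained

print(perceptron(x, w))               # smooth activation (tanh)
print(perceptron(x, [1.0, 1.0], step))  # hard-threshold activation
```

The simplicity of this unit is part of the point made in this section: much of its value as a building block comes from being easy to analyse and to compose, which is one of the alternative figures of merit a quantum analogue could be designed for.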
Shifting the motivation to other figures of merit allows us to shed a different light on the search for a quantum perceptron. As done in many studies, one could ask what the most natural equivalent of a non-linear activation function would be in quantum algorithms. But we could also try to find an efficiently trainable unit for quantum machine learning models that quantum hardware can easily implement. Other figures of merit are the simplicity of the model to allow theoretical investigations into training and generalisation behaviour, or whether it allows us to pinpoint "non-classicality" or "quantumness", so we can directly study its influence on learning. All these alternative figures of merit lead to very different design choices.
We want to remark in passing that without critical reflection the role of a universal building block has been filled by the ubiquitous Pauli rotations that we are so used to from quantum computing textbooks. But are we able to do better? Is there another "unit" that can provide a playground for theoretical insight and direct us towards the right practical implementations, such as the Ising model did for many-body physics [87], or linear models for deep learning [88,89]? Ironically, this change of perspective is not unlike the development of the perceptron itself: while researchers originally wanted to mimic a powerful concept for learning, namely the brain, porting it over to the computational domain required finding the right abstraction rather than emulating the original. Likewise, quantum researchers are trying to mimic the perceptron that has proven to be a powerful concept in classical machine learning, but it may turn out that rather than emulation, we ought to distill the crucial properties of this model to make it suitable for the quantum computing domain.

B. Quantum kernels as a bridge between quantum computing and learning theory
The second example we want to bring forward is that of quantum kernel methods. The research area of quantum machine learning grew out of the realization that data encoding is what machine learning researchers call a "feature map" [38,90], which means that many quantum circuits can be understood as a linear model in a feature space of the data [91,92]. Again, part of this research area has been framed by (and used for) the search for quantum advantage [56,66,92]. But there is a complementary angle: we can see this research area as an attempt to find formal connections between quantum and machine learning theory, connections which help us to apply results from one field to the other (see also [93]). We want to briefly introduce the basic concepts of quantum kernel research (see Fig. 2) to compare these two angles in more detail.
FIG. 2. Many quantum circuits used as supervised machine learning models can be understood as mapping data to quantum states and then distinguishing these quantum states via hyperplanes defined by measurement observables. Such linear models in high-dimensional spaces are known as kernel methods in classical machine learning, and connect quantum machine learning to a rich set of tools to analyse optimization, learning and generalization.
In a nutshell, quantum kernel research is based on the insight that if we encode a data input x ∈ X into a quantum state ρ(x) (for example via a quantum state preparation routine), the expectation of an observable M can be interpreted as a machine learning model of the form
$$f(x) = \mathrm{tr}\{\rho(x) M\}. \tag{5}$$
Realising that the trace is an inner product (known as the Hilbert-Schmidt inner product) in the space of complex-valued matrices, and that ρ(x) maps the input x into this space, we can state that the "quantum model" in Eq. (5) is a linear model of the form
$$f(x) = \langle \varphi(x), w \rangle, \tag{6}$$
where φ(x) is a feature map from the data space to a feature space H, w a weight vector, and ⟨·, ·⟩ the inner product in H. Most often, H is simply $\mathbb{R}^N$. The weight vector then contains trainable parameters and defines a linear hyperplane that can be used to separate classes of data in a supervised learning problem. Likewise, in many variational quantum models M = M(θ) from Eq. (5) is trainable: by optimising a parametrized circuit before a fixed measurement, we effectively choose a measurement basis (and hence the discriminating hyperplane) via optimization. This innocent link has immense consequences. Linear models in high-dimensional spaces are the core of one of the richest corners of machine learning theory, namely kernel theory, which we can now use to understand "quantum models" [93]. For example, kernel theory tells us that quantum models of the form in Eq. (5) can be rewritten as a linear combination of the "distances" between quantum states encoding the training data points $x_m$ and the quantum state encoding the input x we seek to classify,
$$f(x) = \sum_m a_m \, \mathrm{tr}\{\rho(x_m) \rho(x)\}. \tag{7}$$
Instead of learning the parameters θ in a variational circuit, we can learn the coefficients $a_m$ and only need the quantum computer to evaluate the trace term (which for pure states reduces to the overlap $|\langle \psi(x) | \psi(x_m) \rangle|^2$). Furthermore, we are guaranteed that the optimal coefficients $a_m$ construct a model that is also the global minimum of the empirical risk minimization problem in Eq. (3).
In other words, while Eq. (7) may define a smaller function class compared to Eq. (5), it still contains the solution we want to find. If the loss used to compare predictions with target labels is convex, the entire optimization problem is convex and hence conceptually simple to analyze. It also guarantees that we can find the optimal solution.
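The relation between the measurement form in Eq. (5) and the kernel expansion in Eq. (7) can be checked numerically on a toy example. The sketch below assumes a hypothetical single-qubit encoding, |ψ(x)⟩ = cos(x/2)|0⟩ + sin(x/2)|1⟩, and arbitrary illustrative coefficients; all function names are introduced here and do not come from the works cited above.

```python
import math


def encode(x):
    """Assumed feature map: real amplitudes (cos(x/2), sin(x/2)) of a single qubit."""
    return (math.cos(x / 2), math.sin(x / 2))


def density_matrix(x):
    """rho(x) = |psi(x)><psi(x)| as a 2x2 nested list (amplitudes are real here)."""
    psi = encode(x)
    return [[psi[i] * psi[j] for j in range(2)] for i in range(2)]


def trace_of_product(a, b):
    """tr{a b} for 2x2 matrices; the Hilbert-Schmidt inner product for Hermitian a."""
    return sum(a[i][j] * b[j][i] for i in range(2) for j in range(2))


def kernel(x1, x2):
    """k(x1, x2) = tr{rho(x1) rho(x2)}, the squared overlap of the two pure states."""
    return trace_of_product(density_matrix(x1), density_matrix(x2))


def observable_from_coefficients(train_x, coeffs):
    """M = sum_m a_m rho(x_m): the measurement implied by the kernel expansion."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for x_m, a_m in zip(train_x, coeffs):
        rho = density_matrix(x_m)
        for i in range(2):
            for j in range(2):
                m[i][j] += a_m * rho[i][j]
    return m


def model_as_measurement(x, observable):
    """Eq. (5): f(x) = tr{rho(x) M}."""
    return trace_of_product(density_matrix(x), observable)


def model_as_kernel_expansion(x, train_x, coeffs):
    """Eq. (7): f(x) = sum_m a_m tr{rho(x_m) rho(x)}."""
    return sum(a_m * kernel(x_m, x) for x_m, a_m in zip(train_x, coeffs))


# Illustrative training inputs and coefficients (not fitted to any data).
train_x = [0.1, 1.2, 2.5]
coeffs = [0.7, -0.4, 1.1]
M = observable_from_coefficients(train_x, coeffs)
```

For this encoding the kernel reduces to cos²((x − x′)/2), and both model functions return the same value for any input x. This is simply linearity of the trace; the stronger statement in the text, that the optimal model can always be written in this form, is a result of kernel theory that the sketch does not prove.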
Understanding quantum computers as "kernel evaluators" can tell us something about quantum advantage. In situations where this link holds, potential speedups have to be located in the evaluation of the kernel tr{ρ(x)ρ(x′)} [56]. We can investigate kernels based on circuits that are believed to be classically intractable [92], and prove end-to-end quantum advantages for very specific learning problems [66]. At the same time, the necessity to estimate the value of the kernel function using finite shots introduces an overhead [94,95]. While this research is certainly valuable, a classically intractable kernel that is useful for practical machine learning tasks has yet to be found.
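The finite-shot overhead mentioned above can be illustrated with a small simulation. The sketch assumes that each shot is a binary outcome whose success probability equals the kernel value (as in an overlap-type test), and it reuses the hypothetical single-qubit kernel cos²((x − x′)/2); both choices are illustrative assumptions and are not taken from [94,95].

```python
import math
import random


def exact_kernel(x1, x2):
    """Overlap kernel of an assumed single-qubit encoding: cos^2((x1 - x2) / 2)."""
    return math.cos((x1 - x2) / 2) ** 2


def estimate_kernel(x1, x2, shots, rng):
    """Estimate the kernel from `shots` simulated binary measurement outcomes.

    Each shot is modelled as a Bernoulli trial that succeeds with
    probability k(x1, x2).
    """
    p = exact_kernel(x1, x2)
    hits = sum(1 for _ in range(shots) if rng.random() < p)
    return hits / shots


def standard_error(p, shots):
    """Standard deviation of the estimate; it shrinks only as 1 / sqrt(shots)."""
    return math.sqrt(p * (1 - p) / shots)


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
k_true = exact_kernel(0.4, 1.9)
for shots in (100, 10_000):
    k_hat = estimate_kernel(0.4, 1.9, shots, rng)
    print(shots, abs(k_hat - k_true), standard_error(k_true, shots))
```

Because the statistical error falls only with the square root of the number of shots, each additional digit of precision in a kernel entry costs roughly a hundred times more circuit runs.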
On the other hand, we can view quantum kernel theory purely as a tool for theory-building. For example, the link connects quantum circuits to linear representations of neural networks such as neural tangent kernels [96] and random Fourier features [97], which are central to current investigations of deep learning, a fact that has been explored in a series of recent papers [98][99][100]. The theory of kernel methods also allows us to study generalization by making statements about the margin between the data-encoding states ρ(x) for two different classes of data [59] or the regularization properties of a model [101]. Finally, it allows us to port over insights from quantum state discrimination as to what optimal decision boundaries are [102]. None of these studies directly tries to answer the question of whether or not quantum computers could be superior for learning, and each leads to very different kinds of results.
There are many similar points of contact between quantum and machine learning theory, such as the interpretation of quantum measurements as samples from a generative model [103], the proximity of quantum computers to machine learning models inspired by many-body physics [10], and the usefulness of neural networks in representing quantum states [104].
C. Quantum gradients and making quantum software ready for machine learning applications
The last example highlights an area of research that massively increased our capability of performing experiments and building software around quantum machine learning without having quantum advantages as an immediate goal. It is the study of gradients of quantum computations, and how to retrieve them from performing other, efficient, quantum computations. In fact, it is well known that so-called parameter-shift rules [50,51] put forward for this task are less efficient than classical backpropagation, since they require a full model estimation per model parameter (while each estimation requires many shots or runs of the model circuit).
Historically, the dominant representation of quantum computations involved static algorithms that were hand-designed by expert theorists to maximally leverage coherent effects. More recently, there has been growing recognition that adding free parameters to quantum circuits allows them to represent an entire family of functions, while retaining the unique coherence properties that make quantum algorithms distinct [105]. The best value of these parameters for a particular task can then be determined variationally. This expansion makes it easier for researchers to quickly test out new ideas and discover new quantum algorithms (as evidenced by the recent explosion of works on variational quantum circuits), but comes with the caveat that such classes of circuits may be harder to pin down theoretically (compared to, for example, kernel methods discussed in the previous section). Notably, this dichotomy mirrors the present-day situation in deep learning.
In the variational framework, a quantum circuit implements a function of the form$^{6}$
$$f(x, \theta) = \mathrm{tr}\{\rho(x, \theta) M(\theta)\},$$
where in contrast to Eq. (6) we included the free parameters θ in the measurement, and also allowed for a trainable state ρ(x, θ). Typically, the free parameters correspond to rotation angles of gates in a quantum circuit. This presents us with a new task: given a parametrized circuit, how should we adjust the parameter values to "train" the circuit to minimize some loss function l that measures the quality of f(x, θ)? While many options are available for training, there are very intriguing links with the workhorse algorithm used to train deep learning models: gradient descent. In gradient descent, we optimize a loss function by computing its gradient with respect to the free parameters,
$$\nabla_\theta \, l(f(x, \theta)),$$
and iteratively updating the parameters in the direction of the negative gradient. From the chain rule, we must therefore determine the gradient of the model function, $\nabla_\theta f(x, \theta)$, with respect to the circuit's free parameters θ. Modern software tools like TensorFlow [106] or PyTorch [107] largely automate the gradient computation of deep learning models using the backpropagation algorithm [108]. These libraries even let a user optimize custom functions, such as an expectation value produced from calling quantum computing hardware, provided the user also supplies the gradient of this function.
FIG. 3. Parametrized quantum circuits can be trained as parts of larger machine learning pipelines by making use of automatic differentiation and the fact that we know in many settings how to estimate analytic gradients of cost functions with respect to circuit parameters.
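The division of labour described above, where the optimizer applies the chain rule and the user supplies the gradient of the model function, can be sketched without any deep learning library. In the hypothetical example below the "circuit" is a stand-in with f(θ) = cos θ (the expectation of Z after a single-qubit Y rotation), and the squared loss, learning rate and target value are assumptions chosen for illustration.

```python
import math


def model(theta):
    """Stand-in for a circuit expectation value f(theta); here cos(theta)."""
    return math.cos(theta)


def model_gradient(theta):
    """User-supplied derivative of the model.

    On quantum hardware this value would come from additional circuit runs
    rather than from a closed-form expression.
    """
    return -math.sin(theta)


def loss(theta, target):
    """Squared error between the model output and a target value."""
    return (model(theta) - target) ** 2


def loss_gradient(theta, target):
    """Chain rule: dl/dtheta = 2 * (f(theta) - target) * df/dtheta."""
    return 2.0 * (model(theta) - target) * model_gradient(theta)


def gradient_descent(theta, target, learning_rate=0.2, steps=200):
    """Repeatedly step against the gradient of the loss."""
    for _ in range(steps):
        theta -= learning_rate * loss_gradient(theta, target)
    return theta


theta_opt = gradient_descent(theta=0.5, target=-0.3)
```

The optimizer only ever calls `model` and `model_gradient`, so the same loop would work unchanged if both were backed by a quantum device instead of closed-form expressions.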
Drawing on insights originally developed from quantum optimal control [109], it turns out to be remarkably simple to compute the gradients of (many) quantum circuits. Using a technique now known as the parameter-shift rule [50,51], we can evaluate the derivatives $\partial f / \partial \theta_i$ of a parametrized circuit$^{7}$ (and hence the gradient as well) by running the same circuit with parameter $\theta_i$ shifted forward and backward by a fixed amount,
$$\frac{\partial f}{\partial \theta_i} = \frac{f(x, \theta + s e_i) - f(x, \theta - s e_i)}{2 \sin(s)},$$
where $e_i$ denotes the unit vector along the i-th parameter. This technique, which has since been generalized to more and more cases [110][111][112][113][114][115][116][117], has a similar form to the numerical finite-difference approximator, but in fact provides an analytically exact expression$^{8}$ for any shift value s ≠ 0, π. And although it does not match the efficiency of backpropagation$^{9}$, the simplicity of the parameter-shift rule makes it a very hardware-friendly mechanism for computing quantum circuit gradients. Parameter-shift rules, quantum gradients and the resulting surge in quantum software for automatic differentiation are a prime example of research that thrives by enabling quantum machine learning applications, rather than demanding superiority of quantum algorithms. Armed with the ability to evaluate quantum models and compute their gradients, we can directly "plug and play" with existing deep learning tools and train quantum circuits the same way as we train neural networks. We can connect differentiable quantum subroutines into larger hybrid quantum-classical models and train the whole pipeline end-to-end using any of the specialized gradient-based optimizers developed in deep learning, such as Momentum or Adam [118] (see Fig. 3). Finally, the links unveiled through the study of quantum gradients and training quantum models open up a rich opportunity for cross-pollination of ideas between quantum computing and deep learning.
For example, we have already seen the arrival of "quantum-aware" optimizers [119][120][121][122] which tweak ideas from deep learning to make them more native to the quantum setting. On the theory side, we can leverage the latest (admittedly, still evolving) theoretical insights on optimization landscapes and generalization coming from deep learning, and potentially adapt them to better understand phenomena such as barren plateaus [52]. As our understanding increases, ideas and techniques from quantum computing can even find their way back into deep learning. A recent example is the use of tensor-network-based models in place of standard neural networks [123].
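The claim in this section that the parameter-shift rule is exact, while a finite-difference quotient is only approximate, can be verified on the simplest possible circuit. The sketch below assumes a single-qubit rotation about the Y axis applied to |0⟩, followed by a measurement of Z, so that f(θ) = cos θ; the helper names are hypothetical and the example is not taken from [50,51].

```python
import math


def expectation_z(theta):
    """<Z> after rotating |0> by angle theta about the Y axis, from the amplitudes."""
    amp0 = math.cos(theta / 2)  # amplitude of |0>
    amp1 = math.sin(theta / 2)  # amplitude of |1>
    return amp0 ** 2 - amp1 ** 2


def parameter_shift(f, theta, shift=math.pi / 2):
    """Derivative from two shifted evaluations; shift must not be a multiple of pi."""
    return (f(theta + shift) - f(theta - shift)) / (2.0 * math.sin(shift))


def central_difference(f, theta, step):
    """Ordinary finite-difference quotient, shown for comparison."""
    return (f(theta + step) - f(theta - step)) / (2.0 * step)


theta = 0.8
exact = -math.sin(theta)  # analytic derivative of cos(theta)

# The shift rule agrees with the exact derivative for every admissible shift,
# up to floating-point rounding.
shift_errors = [
    abs(parameter_shift(expectation_z, theta, s) - exact)
    for s in (0.1, 1.0, math.pi / 2)
]

# The finite-difference quotient with the same step of 0.1 carries a visible bias.
difference_error = abs(central_difference(expectation_z, theta, 0.1) - exact)
```

In practice a large shift such as π/2 is attractive on hardware because the two evaluations differ by a large amount, whereas a finite-difference quotient needs a small step and is therefore more exposed to shot noise.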

IV. MOVING FORWARD
This perspective advocated a shift in the research agenda of quantum machine learning away from investing all our resources into the notion of "beating" classical algorithms. Sections I and II tried to motivate such a shift by arguing that the goal of showing quantum advantages forces us to limit our analytical focus to the very few problems we can actually study in a setting as complex as machine learning, while Section III showcased existing areas framed by alternative research questions. Until quantum computers become available to do large-scale benchmarks, asking more fundamental questions may be a very good use of our time, but requires a bit of courage to withstand the narrative of trying to find the billion-dollar quantum "supremacy", or to resist catchy expressions like "the power of deep quantum neural networks".
A paradigm shift is never easy, and will require the community to make subtle but crucial adjustments, for example to the way that supervisors guide students, how science journalists portray the topic, how companies formulate their deliverables, and how reviewers judge publication-worthiness. However, in the end this may be exactly what is needed to push quantum machine learning research to the level that leads to future industrial-scale applications.