Quantum Kernel Methods for Solving Differential Equations

We propose several approaches for solving differential equations (DEs) with quantum kernel methods. We compose quantum models as weighted sums of kernel functions, where variables are encoded using feature maps and model derivatives are represented using automatic differentiation of quantum circuits. While previously quantum kernel methods primarily targeted classification tasks, here we consider their applicability to regression tasks, based on available data and differential constraints. We use two strategies to approach these problems. First, we devise a mixed model regression with a trial solution represented by kernel-based functions, which is trained to minimize a loss for specific differential constraints or datasets. Second, we use support vector regression that accounts for the structure of differential equations. The developed methods are capable of solving both linear and nonlinear systems. Contrary to prevailing hybrid variational approaches for parametrized quantum circuits, we perform training of the weights of the model classically. Under certain conditions this corresponds to a convex optimization problem, which can be solved with provable convergence to global optimum of the model. The proposed approaches also favor hardware implementations, as optimization only uses evaluated Gram matrices, but require quadratic number of function evaluations. We highlight trade-offs when comparing our methods to those based on variational quantum circuits such as the recently proposed differentiable quantum circuits (DQC) approach. The proposed methods offer potential quantum enhancement through the rich kernel representations using the power of quantum feature maps, and start the quest towards provably trainable quantum DE solvers.


I. INTRODUCTION
Solvers of differential equations (DEs) are essential for all areas of science [1,2].These include fluid dynamics, ecology, finance, medical science, and many more.While some simple instances of differential equations can be solved analytically, in majority of cases numerical solvers are required.Existing numerical methods heavily rely on finite differencing methods on finely discretized grids [3].Other classical methods include global spectral methods which effectively fit a function basis set to the differential equation problem considered [4].Classical numerical solvers often suffer from instabilities that emerge in highly nonlinear systems.Another problem is the curse of dimensionality caused by an increase of grid for multidimensional systems.Thus, developing new techniques to solve DEs remains a hot area of contemporary research [5], and increasingly requires new computational architectures [6,7].
Quantum computing offers advantages in performing certain computational tasks [8][9][10].Enabled by quantum principles, the use of superposition and entanglement can lead to fundamentally different scaling for quantum algorithms as compared to classical approaches [11].One example is for solving linear systems of equations [9], and quantum speed-up of matrix-vector multiplication [12].When using finite-differencing, this translates directly to associated systems of DEs [13][14][15][16][17][18].However, previously described techniques rely on large-scale implementation of quantum phase estimation, and require resource overheads that render their implementation infeasible in fore-seeable future [19].This prompts to develop different approaches that can potentially help solving nonlinear DEs with near-term devices.
Recently, rapid improvement of quantum computing hardware has called for algorithms that can operate in the noisy regime [20].In this case the hybrid quantumclassical workflow is often used.One possibility is to formulate a problem such that solution can be searched variationally [21], where parameterized quantum circuits play a role similar to deep neural networks in classical machine learning (ML).This approach was coined as a quantum machine learning (QML) [22][23][24], and has triggered the development of various ML protocols for quantum hardware [25][26][27][28][29][30][31][32][33][34].Variational approaches were also used for describing quantum evolution [35,36] and linear algebra [37][38][39].In the field of nonlinear differential equations variational algorithms were used together with amplitude encoding [40], where multiple quantum registers are required for encoding nonlinearity.Another approach was proposed in [41], where QML-type workflow is used.There a DE solution is represented by a differentiable quantum circuit (DQC), with nonlinear dependence being introduced via feature map encoding [42] and cost function based readout, while function derivatives are introduced with the automatic circuit differentiation [25,43].Similar solutions were developed for continuous variable QML [44], stochastic differential equations [45], and generative modelling [46,47].
Another facet of quantum machine learning was revealed when formulating models in terms of kernelssimilarity functions that define a distance between two data points [24,48].The core concept of kernel methods is the so-called 'kernel trick' that maps data into a highdimensional space [49].Kernel methods are frequently used in classical machine learning, and aim to rewrite the ML task as a convex optimization problem [50].In the quantum domain, kernels are conveniently defined as overlaps between parametrized quantum states that represent data [24,48], or any similar measure [51].It was conjectured that many supervised QML models can be considered as kernel methods that are well-suited to near term devices [52].Currently these methods have mainly been considered for classification purposes [53,54].
In this paper, we propose to use quantum kernel methods for regression problems, including solvers of nonlinear differential equations [55].In classical ML kernel methods are used for support vector regression (SVR) [56], where the kernel trick and convex optimization lead to expressive and provably trainable models.Kernel methods can also be applied to solve differential equations [57][58][59], while being limited by the expressivity of classical kernels.We describe two approaches that express solutions of DEs as quantum kernel-based models, and describe the rules for their automatic differentiation.We refer to the two as mixed model regression (MMR) and support vector regression (SVR) protocols.The protocols are applied to test problems of regression on quantum data, linear DEs, and nonlinear DEs in the form of Duffing equation [60].We discuss the cases where the proposed workflow for DEs may provide advantage over existing methods.

II. QUANTUM KERNEL METHODS
We start by introducing the concept of a quantum kernel function.A kernel function is a conjugate-symmetric positive definite function κ mapping two variables x, y ∈ X to the complex space, κ : X × X → C. Quantum kernel function refers to a function which fulfils the requirements of a kernel function and can be evaluated by a quantum computer.An important result concerning kernel functions is known as the kernel trick.The kernel trick relies on the fact that any kernel function can be written as an inner product in a potentially high dimensional feature space, κ(x, y) = ϕ † (x)ϕ(y).Conversely ϕ † (x)ϕ(y) always represents a valid kernel function.This a consequence of Mercers theorem.Informally, it corresponds to the statement that for any symmetric positive definite function f (s, t) there exists a countable set of functions {φ i } i such that f (s, t) can be expressed as f (s, t) = i φ i (s)φ i (t) [61].An example of a quantum kernel function is an overlap κ(x, y) = ψ(x)|ψ(y) where |ψ(x) denotes a state encoded by the variable x.This is an inner product which fulfils the requirements of a kernel function.Later we will consider other forms of the quantum kernel function, showing how to encode the variable into the state and how to evaluate the kernel.
Our goal is to use the quantum kernel functions to solve differential equations.We consider two main methods -mixed model regression (MMR) and support vector regression (SVR).Let us first consider using these methods to solve data-driven regression problems.This is a simpler case than solving differential equations yet still requires representing a solution function via quantum kernel function, and can be built upon to solve differential equations.For this regression problem we have a set of values {x i , f i } i and we want to find a function f (x) that fits these points such that f i = f (x i ).We consider how both MMR and SVR approach the described problem.

A. MMR
When using the mixed model regression we represent a trial function as where y = {y i } i is a set of evaluation points, and α = {α i } i and b are tunable coefficients.We then write the problem defined by a loss function This loss function is chosen such that when optimized with respect to α and b the corresponding f α solves the problem.The loss function requires the evaluation of {f α (x i )} i , which in turn requires the evaluation of {κ(x i , y j )} i,j .These evaluations are independent of α -the variable which is adjusted during optimization.This means that the kernel function will only need to be evaluated once for each point in {x i , y j } i,j at the start of the optimization procedure.Any suitable optimisation method may be used to optimise L(α).We can also see that the considered loss function is convex.
We consider the general case L(α) = |x| i=1 L(x i ; α) 2 with L being a linear function of α, and represents a distance.A sufficient condition for convexity of the loss function is ∂ 2 L/∂α 2 j ≥ 0 everywhere for all α j ∈ α.We can write the second-order derivatives as where passing from Eq. (2) to Eq. ( 3) we use the linearity of L in α.When a loss function is convex its minimum is global, and there are bounds on convergence for various optimization methods [62].
The workflow to solve an MMR problem is as follows: 1. Choose setup for training, including the kernel function, optimizer, x, y.Once the model is trained, we can also evaluate it at a grid of points different from the training grid, learning the solution in the full domain of x.

B. SVR
For support vector regression we represent a trial function as f (x) = w † ϕ(x) + b, where w and b are tunable parameters, and ϕ(x) is a set of functions we later use the kernel trick upon.The first step is to write the problem as a primal (original) optimization model.This reads min w,b,e {w T w + γe T e}, subject to where e is the set of error variables which relax the constraints from f i = w T ϕ(x i ) + b + e i , and γ is a tunable hyperparameter that changes the emphasis on minimising the error.
The process that follows is to write the model in its Lagrangian form, introducing a set of variables (known as dual variables) to implement each constraint.The Karush-Kuhn-Tucker (KKT) optimality conditions are then found, which emerge from equating the first derivative of the Lagrangian with respect to each of the primal and dual variables to zero [63].These conditions are then used to eliminate a subset of the primal variables.This can intuitively be understood as turning the variable into a constraint.This leads to a system of equations which have terms of ϕ(x i ) T ϕ(x j ), and by using the kernel trick these terms can be changed to κ(x i , x j ).Now the problem is written in a dual form as a system of equations to solve with coefficients involving kernel evaluations.Similar to the MMR method these have to be evaluated once at the start.The resulting system of equations is where Ω i,j = κ(x i , x j ), Ω = {Ω i,j } i,j , and α are a set of introduced dual variables.The system of equations can now be solved with any available method to solve such a problem.Once solved the relevant KKT conditions can be substituted into the expression f (x; α) = w † ϕ(x) + b, and the kernel trick applied to get an expression for f (x) in terms of the dual variables, which have been solved for and kernel evaluations.We thus write our model as Although we started considering ϕ as our fitting functions, the resulting function to this process is based on kernel evaluations and we never need knowledge of what ϕ are or to directly evaluate them.The workflow to prepare an SVR problem is as follows: 1. Write model with minimization function and constraints.
3. Find the KKT optimality conditions.4. Eliminate subset of original optimization variables.5. Use the kernel trick to realise problem in terms of kernels.6. Write out remaining relationships as system of equations.7. Use KKT conditions and kernel trick to express function in terms of kernel functions.
In the Appendix this process is worked through in more detail for a specific (DE) example.The prepared SVR model can then be used for any problem of the form assumed in preparing the original model.The workflow for solving an SVR problem is as follows: 1. Choose setup for training, including the kernel function, system of equations solver, x, γ. 2. Identify suitable SVR model for the problem considered.3. Calculate set of kernel function evaluated over x ⊗ x. 4. Solve system of equations.
We also note that SVR method results in a form that can still be considered as an optimization problem to be solved with an optimizer.The system of equations Ax = b can be translated into the loss function Here we use MSE loss but other forms can be employed.This formulation can especially useful when considering problems resulting in nonlinear systems of equations.
Comparing the MMR and the SVR methods we note that the solving workflow for the two are similar.Namely, we choose a setup, identify what to solve based on method and problem, calculate the set of kernel function evaluations, and solve the model identified in step two.However, identifying the model for the SVR method is a more involved process.
Both MMR and SVR result in a function approximation to the solution of the problem considered.For regression this is Eq. ( 1) and Eq. ( 7), respectively.These two functions look very similar with the difference being the kernel evaluation at y i for MMR versus x i for SVR.This is a consequence of using the kernel trick when formulating the SVR model, which necessarily results in y = x.Also to be highlighted is that the form of Eq. ( 7) depends on the problem considered.For example, later we see that when solving differential equations, evaluations of the kernel derivative are involved in the function expression.However, for MMR the form of the model remains the same no matter what problem considered.
One benefit of using the MMR model is the simpler identifying of the model to solve.Another is the convexity when considering certain problems.The benefit of SVR is that when linear the resulting system of equations to solve is also linear and thus has deterministic solution.Furthermore, the initial trial function is in terms of ϕ which can be of higher dimensionality than the ker-... ... ...

𝑥 𝑥
FIG. 1: Circuit diagram showing a general form of function encoding circuit U(x) used to implement quantum kernel.This is formed by layers of static circuits Vi and data reuploading circuits U φ i (x) parametrized by x.
nel function yet never needs to be evaluated directly.

III. QUANTUM KERNEL FUNCTION EVALUATION
We now look further into the specifics of quantum kernel functions.In particular, we consider their structure, where feature maps encode dependence on a variable into a state.We also consider how to evaluate them and their derivatives.Earlier we mentioned a quantum kernel function of the form κ(x, y) = ψ(x)|ψ(y) , being an inner product that is generally complex for quantum states.In the following, we consider κ(x, y) = | ψ(x)|ψ(y) | 2 as an absolute value square of the overlap.This also corresponds to a valid kernel function [52].We consider this kernel function as it is real valued -an advantage when expressing real valued functions.

A. Encoding
The kernel functions we consider contain states |x , which are encoded by a classical variable x.To create such states we use feature map encoding, where x is embedded into the state by parametrizing gates preparing the state, |ψ(x) = U(x)|0 .A simple example of U(x) is the product feature map U φ (x) = N j=1 R α,j (φ(x)), where R α,j (φ(x)) is rotation on qubit j of angle φ(x) about a Pauli operator α.Other more complicated feature map encodings can be considered.The generalization may include the re-uploading technique [64] where action of feature maps can be layered with (non-variational) entangling circuits, 1).This layered form terminates with a circuit encoded by a variable as a final entangling circuit cancels out for kernels based on U † (x)U(y).As with many variational algorithms, when choosing feature maps it is important to have a map expressible enough to represent the solution to the problem whilst also being trainable [65].

B. Evaluation
We now discuss how to implement the quantum kernel function κ(x, y) = | ψ(x)|ψ(y) | 2 .One way is to use the coherent SWAP test [66,67].This test requires 2N + 1 qubits, where N is the number of qubits used to express |ψ(x) .|ψ(x) and |ψ(y) are both prepared on separate registers then via Hadamard gates and controlled operations an ancillary qubit can then be measured to read We can also employ other methods.For this, we use the fact that the kernel evaluation can be written as The measurement in Eq. ( 8) can be implemented naively by the circuit in Fig. 2(a).The circuit is initialized in the zero state.Then U(y) is applied, followed by U † (x).The probability of remaining in the zero state and thus the kernel is then calculated my measuring all qubits and finding the ratio of times |0 is measured.
Another possible implementation is two evaluations of the Hadamard test with N + 1 qubits as shown in Fig. 2(c) [68].This can be used to evaluate the real and imaginary parts of 0|U † (x)U(y)|0 which can then be used to evaluate the kernel as Re( 0|U † (x)U(y)|0 ) 2 + Im( 0|U † (x)U(y)|0 ) 2 .

C. Derivatives
As our goal is to solve differential equations, we need to be able to evaluate derivatives of the kernel function.We introduce notation for the derivatives as follows, To implement derivative evaluation, one way is to consider the kernel as written in Eq. ( 8) and the parameter shift rule [25,43].With this method we take the kernel evaluation method as in Fig. 2(a) but shift x and y up and down depending on what derivative is being calculated in each gate that they parametrize.For example for the first order derivative with respect to x the number of evaluations of Fig. 3(a) is 2n with n being the number of gates parametrized by x.Using the parameter shift rule means we calculate the analytic derivative though it does place some requirements on the gates parametrized by x/y such as being involutory.Generalized parameter shift rules are possible, where such requirements are relaxed [69][70][71][72][73].
We can also implement derivatives via the Hadamard test.First, we note the form of the first-order derivative of the kernel in x by using the product rule in Eq. ( 8) as and thus we can evaluate this derivative by evaluating 0|U † (y)d/dx(U(x))|0 and 0|U † (x)U(y)|0 (real and imaginary parts).The second term can be evaluated as for evaluating the kernel shown in Fig. 2(b), and calculations can be reused because the derivatives are evaluated over the same set of points as the kernel itself.To calculate the first term a modified Hadamard test can be used.
We consider the generalized layered form of kernel encoding We can now assume that the generators G j are unitary.When G j are unitary we can calculate each overlap term in the derivative expansion with two Hadamard tests.However, if this is not the case, one can decompose them into sums of unitary terms and evaluate them separately with increased number of Hadamard tests [43].Once the procedure for evaluating derivatives being set up, we generalize to higher-order derivatives.By using the product rule in Eq. ( 8) whatever derivative is required, one can express it as sums of products of overlaps with U(x) and U(y) differentiated to different orders.These overlaps can be calculated with two (when generators unitary) overlap tests for each gate with x and/or y as a parameter (see Fig. 3).These overlap evaluations can be reused for calculating different derivatives where the same overlap occurs.

IV. SOLVING DIFFERENTIAL EQUATIONS
In the following, we collect the described tools for model and derivative evaluations, and apply them to solve differential equations.While there are many possible choices, we start by considering a simple class given by the differential constraint with initial condition f (x 0 ) = f 0 , and g a smooth function of x and f which in general can be nonlinear in either of those arguments.We now use both MMR and SVR to solve this type of DEs in next subsections.

A. MMR
When solving DEs of the type (11) via MMR, we choose a loss function in the form We remind that the trial function reads f α (x) = b + |y| i=1 α i κ(x, y i ).Therefore, kernels κ and their derivatives ∇ 0 1 κ are evaluated over {x i , y j } i,j , leading to corresponding f α and df α /dx evaluations.These values are independent of α, and only need to be evaluated once at the start, then being reused throughout optimization.The loss function can the be optimized via any appropriate optimizer for getting optimal weights α opt .The resulting function is then a suitable approximation to the solution of the differential equation, mainly being limited by expressivity of the model and generalization bounds.
When the differential equation is linear [i.e., g is linear in f in Eq. ( 11)] the considered loss function is convex.This is true when the differential equation is linear and f α (and consequently f α ) is linear in α, meaning we are in the situation as described by Eq. ( 3).When the differential equation is nonlinear this is not necessarily the case.In order to determine that one needs to check for the convexity of the loss function.One possibility is a numerical check by sampling the second derivatives of the loss with respect to the optimizable parameters at many locations.If this value is ever negative then the problem is non-convex.

B. SVR
When considering solving DEs with support vector regression, the formulation of the problem changes depending on the form of differential equation considered [57][58][59].The steps for the problem formulation however remain the same: state a model, write out Lagrangian, find KKT optimality conditions, eliminate subset of prime variables by using the KKT conditions, use the kernel trick, and finally write out remaining equations in matrix form.
We follow the SVR formulation procedure for problems of the form DE(x, f, df /dx) = df /dx − g(x, f ) = 0 with initial condition f (x 0 ) = f 0 .We provide the details in the Appendix, and here provide the resulting set of equations in the matrix form: Here, we introduced dummy variables y i .η and β are dual variables introduced along with α corresponding to the dummy variable constraint and the initial variable constraint, respectively.The remaining notation is as follows [g] i = g(x i , y i ). ( 18) We now have a set of generally nonlinear equations, which can be solved for finding a vector of optimized weights.By substituting the relevant KKT optimality conditions into f (x) = b+ |y| i=1 α i κ(x, y i ) and employing the kernel trick, we get an expression for f in terms of kernel functions where optimized variables (weights) are used.Note that if the differential equation is linear, the dummy variable constraints of y i are not required.This leads to a system of linear equations with lower dimension.

C. Other forms of DEs
Many practical problems are not of the form DE(x, f, df /dx) = df /dx − g(x, f ) = 0 considered above.For instance, they may include terms of higher order, higher dimension, or indeed many other different variations.When considering the MMR method, one can readily generalize is to any other form of DE simply relying on generalized optimization.For this, a suitable loss function needs to be formulated for the chosen equation.Additionally, we shall be able to evaluate each term of the differential equation.For systems of DEs the overall loss becomes the sum of the loss of each individual differential equation within the system.For PDEs with domains of more than one dimension, the kernel function can be considered as κ(x, y) = | 0|U † (x)U(y)|0 | 2 , where the feature maps now encode a vector of domain variables.The simplest form is When the SVR method is used, the considered problem needs to be formulated into the SVR form, resulting in a different form of matrix equation.Higher order derivative SVRs [57,59] and SVRs for PDEs [58] are possible, as well as their generalizations for systems of differential equations.

V. RESULTS
Having established quantum kernel approaches for solving DEs and learning from data, we apply them to specific problems and show the results.
Regression on quantum data.We start by considering the case of regression.We generate a quantum dataset that corresponds to dynamics of total magnetization of a biased honeycomb Kitaev model [74,75].The Hamiltonian of the system reads where X , Y, Z are sets of bonds.We choose antiferromagnetic coupling and set h z /J = 0.2.Specifically, we simulate nonequilibrium effects by performing time evolution of M z = j Z j /N for N = 12 qubits on a lattice with periodic boundary conditions [76], starting from the uniform initial state.The choice of quantum dataset with strong magnetic correlations may be especially suitable for kernel-based regression, given recent advances in learning from experiments [77].Choosing a subset of evolved magnetization values labeled by x values (here corresponds to time), we proceed to perform MMR.
When implementing the MMR method we consider x with 51 values of x between 1 and 10, associated to the data, and y = x.We use and compare the results from a classical kernel and a quantum kernel.The classical kernel used is a commonly used radial basis function (RBF) kernel κ(x, y) = exp (x − y) 2 /(2σ 2 ) , with σ being a hyperparameter that describes a width of the kernel.In calculations we choose σ = 0.2 as that shows favorable performance.For the quantum kernel, we use layers of depth-five HEA and feature maps based on parametrized X rotations, R x (φ(x)), acting on each qubit.We set φ(x) = qx/2, where q is the qubit index, and consider a register of eight qubits.For the loss function MSE is used with a pinned boundary formulation (see [41] for the details of boundary handling).The loss function for data regression is convex and is optimized via Newton's method.In this case just three epochs is enough for converging to low loss values.We model the system with full state simulation using the Julia's package Yao.jl [78].The error of the results of this are shown in Fig. 4(b) with associated loss in Fig. 4(a).As can be seen, both kernel types are able to closely approximate the considered function.Moreover, we note that for complicated quantum data coming from spin-spin correlation one can benefit from specifically-designed quantum kernels that account for the structure of the problem.
When implementing the SVR method, we use the same points x, kernel functions and the simulation package.The resulting SVR system of equations to solve for this form of problem are as shown in Eq. (6).The results of this with the quantum kernel are shown in Fig. 4(a).The error of the results is shown in Fig. 4(b).It can be seen that both kernel types are able to closely approximate the considered function, and that SVR outperforms the MMR method.
Linear DEs.Next, we consider solvers of linear differential equations.In particular, we solve the equation where parameters are chosen as λ = 20 and κ = 0.1, along with initial condition f (0) = 1.The analytic solution to the differential equation ( 21) is f sol (x) = exp(−λκx)cos(λx), being a fading oscillatory dependence.When implementing the MMR method we consider x and y of 20 points uniformly spaced over [0, 1].We use and compare the results from a classical RBF kernel with σ = 0.2 and a quantum kernel with two layers of HEA circuits (depth equal to five) followed by feature map of R x (φ(x)) on each qubit with φ(x) = qx/2, where q is the qubit index.We consider eight qubits in the register.For the loss function MSE is used with a pinned boundary.This loss function is convex, as the DE is linear, and is optimized via Newtons method.The error of the results are shown in Fig. 5(b) with corresponding loss in Fig. 5(c).As can be seen both kernel types are able to closely approximate the considered function with the error less than 0.002 in magnitude, the quantum kernel slightly outperforms the classical kernel although we did not further explore hyperparameter optimization.
When implementing the SVR method, we use the same x and kernel functions.The corresponding SVR system of equations to solve for a problem of form df /dx + g(x)f + r(x) = 0 reads where the notation is as follows: We choose γ = 10 5 and this system is then solved with Julia's built-in matrix-defined linear equation solver.
The error of these results is shown in Fig. 5(b), with the result from using the quantum kernel shown explicitly in Fig. 5(a).Again it can be seen that both kernel types are able to closely approximate the considered function, though not as closely as the MMR method, with quantum kernel outperforming the classical kernel.Solving nonlinear DEs.We now move on to consider solving nonlinear differential equations.We show the results of solving the non-damped Duffing equation [60] We consider a = 1, b = 1, c = 3, d = 3 as well as initial conditions f (0) = 1 and f (0) = 1, and solve with both MMR and SVR methods.To compare our solutions we also solve the problem with a classical numerical technique with the Julia's package DifferentialEquations.jl[5] specifying to solve the problem with a fifth-order Tsitouras method [Tsit5(...)].
When implementing the MMR method we consider x and y of 13 points uniformly spaced over [0, 1].We use and compare the results from the classical RBF kernel (σ = 0.2 as before) and the quantum kernel with two layers of HEA circuits (depth equal to five) followed by feature map of R x (φ(x)) on each qubit with φ(x) = qx/4, where q is the qubit index.As before, we consider eight qubits in the register.For the loss function MSE is used with a pinned boundary.This loss is optimized via Newtons method with 200 epochs.The error of the results is shown in Fig. 6(b) with associated loss functions in Fig. 6(c).As can be seen both kernel types are able to closely approximate the solution to the differential equation.
When implementing the SVR method the same x and simulation package are used.Different kernel functions  are used however with less expressibility better suited to this method due to the differing function expression.Used are a classical kernel -the RBF kernel with σ = 0.8 and a quantum kernel -two layers of HEA depth five followed by feature map based on R x (φ(x)) on each qubit with φ(x) = qx/4, where q is the qubit index and we consider four qubits in the register.γ = 10 6 is chosen.The resulting SVR system of equations to solve for a problem of the form d 2 f /dx 2 = g(x, f ) with initial conditions where the notation is as follows: [g] i = g(x i , y i ).
The resulting variational function is of the form f (x) = β 0 κ(x 0 , x) + β 1 ∇ 0 1 κ(x 0 , x) + j α j ∇ 0 2 κ(x j , x) + j η j κ(x j , x) + b.This system of nonlinear differential equations is trained using the ADAM optimizer with learning rate 0.003 and 20,000 epochs.The error of the results of this are shown in Fig. 6(b) with associated loss functions in Fig. 6(c).For this problem the quantum kernel is able to achieve a close solution, though not as close as MMR methods, the RBF kernel struggled however with maximum error magnitude over 0.2.

VI. DISCUSSION AND CONCLUSION
In this work, we proposed quantum protocols for solving differential equation with kernel methods.We represent potential solutions as quantum models that are based on weighted sums of kernel functions, corresponding to overlaps of quantum states.The adjustable weights are optimized such that for many problems the optimization is convex, leading to fast convergence to the potential solution.Specifically, we propose two approaches, being mixed model regression (MMR) and support vector regression (SVR), where optimization workflow is different.An important element of our approach is the automatic differentiation of quantum kernels with respect to encoded feature variables using quantum circuit differentiation.We applied both MMR and SVR for several toy problems.First, we presented regression for a quantum dataset, corresponding to nonequilibrium dynamics of quantum spin liquids.In this case, the use of quantum kernels may offer advantage, as native quantum operations are used.Second, we solve linear DEs, showing that nontrivial solutions can be routinely found with few epochs.Finally, applying our approaches to some nonlinear problems, the optimization becomes non-convex, thus requiring largely increased number of epochs.At the same time, we note that by kernelizing quantum models we modify the landscape of optimization.This raises the question of convergence difference between parameterized quantum circuits [79,80] and kernel models.
While this work presents a first step towards quantum kernel-based differential equation solving, many aspects are left unexplored.This includes the quantum feature map design, which should be chosen appropriately and could potentially be problem-motivated for each specific case.Finding conditions for which non-linear equations are guaranteed to result in a convex loss landscape is another open question.Making use of a high-dimensional feature space without full tomography of the quantum wavefunction allows quantum kernel methods to potentially provide tangible advantage beyond classification.
Finally, let us discuss the comparison of quantum kernel-based approaches to solving DEs as compared to those based on differentiable quantum circuits [41], which in many ways reflect the difference between classical kernel methods and deep learning [52].When considering the training stage, kernel methods require evaluating a Gram matrix of kernels with O(|x| 2 ) points, increasing measurement budget with the grid size |x| as compared to O(|x|) scaling of DQC evaluations for the loss function.
At the same time, kernel methods do not rely on additional function evaluations, and optimization is straightforward for both linear and nonlinear problems.Deep learning instead needs to evaluate gradients at each iteration, leading to large overhead if the convergence is slow (for instance, when dealing with barren plateaus).This is a known trade-off for variational vs non-variational methods for ground state search [81].However, for trainable QML circuits it may be beneficial to do the iterative training, if the number of points in a dataset |x| is large.When considering model evaluation (reading out solution of DEs), kernel methods in principle require evaluating Gram matrix once again for a different grid, adding another O(|x| 2 ) computational steps.At the same time, deep learning and DQC only require evaluating the trained model at points of interest.Finding the optimum strategy between the two methods thus becomes crucially dependent on the problem and available quantum hardware.We expect that future studies will shed light on cases where one or another is preferred, both contributing to the emergent field of quantum DE solvers.
Ethics declaration.A patent application for the method described in this manuscript has been submitted by Pasqal.
These conditions are now used to eliminate a subset of the primal variables w, e, ξ leaving 3|x| + 2 equations:   j [α j ϕ (x j ) + ν j ϕ(x j )] + βϕ(x 0 )   † ϕ (x i ) (A.17) −g(x i , y i ) + α i /γ = 0, (A.For these equations we then expand out the brackets and use the kernel trick, introducing the kernel function κ as κ(x, y) = ϕ † (x)ϕ(y) and corresponding derivatives.We remember that this is a consequence of Mercers theorem, given that ϕ † (x)ϕ(y) is a kernel for any ϕ.Now we are able to write the resulting equations in matrix form as where the notation is as follows [g] i = g(x i , y i ). (A.31) We now have a set of nonlinear equations that can be solved for a set of variable, representing solution to the original stated problem.These equations are written in terms of κ, and not ϕ.Also note that these equations are true for any valid kernel function, and we can choose our kernel function freely.We need not know what the corresponding ϕ are, we simply know from Mercers theorem that such functions exist.Therefore the formulation of these equations (in particular the use of the kernel trick to introduce the kernel) is valid.The remaining step is to write f (x) = w † ϕ(x) + b in a form that is instead dependent on the variables solved for.We find it to be η i κ(x i , x) + βκ(x 0 , x) + b, (A.32) by using the w KKT condition and then the kernel trick.
We have now formulated an SVR method for the form of problem considered.

2 .
Identify the loss function for problem considered.3. Calculate set of kernel function evaluated over x ⊗ y. 4. Optimize the loss function.

FIG. 2 :
FIG. 2: Circuit diagrams for evaluating the kernel κ(x, y) = | ψ(x)|ψ(y) | 2 , where in all circuits U and H represent the kernel feature map and the Hadamard gate, respectively.(a) Naive kernel evaluation based on consecutive application of U circuits, followed by measuring each qubit.The kernel value is inferred from a probability of returning to the initial state.(b) SWAP test measuring | ψ(x)|ψ(y) | 2 .The controlled SWAP onto the size 2N register is composed of qubitwise controlled SWAP on the n th qubit pair, repeated for n ∈ 1 : N .(c) Hadamard test measuring Re( 0|U † (x)U(y)|0 ) and Im( 0|U † (x)U(y)|0 ) for b = 0 and b = 1, respectively.S denotes the phase gate, exp(−πZ/4).

FIG. 3 :
FIG.3: Circuit diagrams used for evaluation of derivatives.(a) Generic circuit for differentiating kernels shown in Fig.2(a) where a parameter shift rule is used.Depending on which derivative is calculated, gates parametrized by x and/or y have their parameters shifted up and down.Contributions from all parametrized gates are then summed for overall derivative.(b) Using Hadamard test for evaluation of the overlap 0|d n /dx n U † k (x)d m /dy m Uj(y)|0 , where k and j index over which gates with x and/or y as parameters are differentiated.When b = 0 and b = 1 are used the real and imaginary part is evaluated.By summing over j, k the full overlap 0|d n /dx n U † (x)d m /dy m U(y)|0 can be evaluated.These overlaps can then be used to evaluate kernel derivatives.

FIG. 4 :
FIG. 4: MMR and SVR used to solve a regression problem.Data to fit is for time-evolved magnetization of a biased honeycomb Kitaev model.(a) Solution via SVR method for quantum kernel with two layers and N = 8 shown by dashed purple line.Data plotted as solid light blue curve with points used for training highlighted with green circles.(b) Error between results and underlying data plotted over x as [f (x)−f data (x)], and we additionally normalize data by the range of magnetization values.Error plotted for result shown in (a) and SVR method with classical RBF kernel with σ = 0.2.Also plotted results from MMR method with same kernels considered.Newton optimizer with just 3 epochs is used for MMR method.(c) Loss value over epoch number plotted for MMR results shown in (a) and (b).

FIG. 5 :
FIG. 5: MMR and SVR used to solve a linear differential equation (21).λ = 20, κ = 0.1 and f (0) = 1.(a) Solution via SVR method for quantum kernel with two layers and N = 8 shown by dashed purple line.Known analytic solution plotted with solid light blue line.(b) Error between results and analytic solution plotted over x as (f (x)−f sol (x))/range(f sol ).Error plotted for result shown in (a) and SVR method with classical RBF kernel with σ = 0.2.Also plotted results from MMR method with same kernels considered.Newton optimiser with 100 epochs used for MMR method.(c) Loss value over epoch number plotted for MMR results shown in (a) and (b).

FIG. 6 :
FIG. 6: MMR and SVR used to solve Duffing equation (31) with a = 1, b = 1, c = 3, d = 3, f (0) = 1 and f (0) = 1.(a) Solution via SVR method for two-layered quantum kernel with feature map of Rx(qx/4) on each qubit with q qubit index and N = 4 (purple dashed line).Numerical solution from classical solver plotted with solid light blue curve.(b) Error between results and analytic solution plotted over x as f (x) − fnum(x) (normalized by the range of values).Error plotted for result shown in (a) and SVR method with classical RBF kernel with σ = 0.8.Also plotted results from MMR method.The same form of kernels based on N = 8 register, σ = 0.8.(c) Associated loss functions over epoch number for results shown in (a) and (b).SVR method uses 100 times more total epochs as compared to MMR.