Robust quantum gates using smooth pulses and physics-informed neural networks

The presence of decoherence in quantum computers necessitates the suppression of noise. Dynamically corrected gates via specially designed control pulses offer a path forward, but hardware-specific experimental constraints can cause complications. Existing methods to obtain smooth pulses are either restricted to two-level systems, require an optimization over noise realizations or limited to piecewise-continuous pulse sequences. In this work, we present the first general method for obtaining truly smooth pulses that minimizes sensitivity to noise, eliminating the need for sampling over noise realizations and making assumptions regarding the underlying statistics of the experimental noise. We parametrize the Hamiltonian using a neural network, which allows the use of a large number of optimization parameters to adequately explore the functional control space. We demonstrate the capability of our approach by finding smooth shapes which suppress the effects of noise within the logical subspace as well as leakage out of that subspace.


I. INTRODUCTION
There has been significant progress in the past decades towards the realization of a physical quantum computer. The greatest obstacle presently hampering the current efforts, particularly for solid-state qubits such as Josephson junction based qubits or semiconductor spin qubits, is the presence of unwanted couplings between the qubit and its hosting environment [1]. An established way of suppressing the effects of such non-Markovian noise (as well as calibration errors in the Hamiltonian) since the early days of NMR is dynamical correction [2] -the application of carefully designed control fields such that the effect of low-frequency noise can be arranged to cancel out when integrated over the entire evolution. This approach is especially relevant for noisy intermediate-scale quantum (NISQ) devices [3] whose limited number of qubits precludes implementation of quantum error correction codes, which can require as many as thousands of physical qubits to protect just one logical qubit [4]. Dynamical correction instead addresses noise without any overhead in the number of qubits at the cost of lengthening and complicating the control pulses. For a robust quantum control protocol to be practical, however, it is important to take the limitations of the control hardware into account. Experimentally realistic control fields must have bounded amplitude and bandwidth, and hence be smooth pulses [5,6].
For an ideal qubit with no noise, smooth pulses can be obtained analytically [7] or numerically [6]. For dynamical correction, smooth pulses that are robust against errors within a qubit's logical subspace have also been proposed to address quasistatic noise in two-level systems [5,8,9] by reducing sensitivity to noise in a perturbative analysis. Extending these (semi-)analytic approaches beyond two-level systems remains a challenge. This is an important problem because although two-level systems are useful for basic demonstrations, a quantum computer requires at least pairwise interaction of qubits, entailing dynamics in a larger Hilbert space. Furthermore, even for single qubit control, systems such as the transmon [10] and resonator-coupled spin qubits [11] contain not only pairs of low lying energy levels used to encode a qubit, but also slightly higher lying "leakage" states, requiring navigation of a larger Hilbert space that is outside the scope of these robust smooth pulses.
Other existing approaches numerically search for pulses that maximize gate fidelity for an ensemble of simulated noise sampled from an assumed distribution function, and are not restricted to two-level systems. The control is typically (although not necessarily [12]) split into piecewise-constant segments [13][14][15][16][17][18][19]. In the piecewise-continuous approach, when the number of segments is not sufficiently large, a secondary optimization may be required to account for the filtering effects of the bandwidth-limited waveform generator on the sudden jumps in the pulse [6]. Although one may strive to approach a smooth pulse by increasing the number of segments and penalizing sudden jumps at the cost of computational complexity, in our work we instead use continuous functions which enable the efficient use of adaptivetime ordinary differential equation (ODE) solvers. For broadband noise, a smooth pulse approach to improve gate fidelity by suppressing the leading order effects of an assumed noise power spectral density (PSD) was reported in Ref. 20. A common theme in all these approaches is that the optimization of fidelity is performed over the combined effect of the noise and control, which requires assumptions regarding the spectrum or the statistics of the noise and averaging over noise realizations.
Here, we propose a new approach that addresses all these issues simultaneously and allows us to find smooth pulses for implementing robust quantum gates in any qubit platform. The generic approach described here is not limited to two-level systems and can suppress the effects of both leakage and quasistatic noise, regardless of the underlying statistics of the noise, by minimizing the sensitivity to arbitrary quasistatic noise (as opposed to maximizing fidelity for a given noise sampling) through the exact functional relation between noise and its effect on the resulting quantum gate. This is enabled by recently introduced physics-informed neural network (PINN) frameworks [21] which implement deep neural networks (DNNs) that can be used to minimize a cost function while at the same time respecting a given set of differential equations (see also . Furthermore, as we will show, the resulting robust pulses are practical and constitute a new state-of-the-art in quantum control. From a computational point of view, sampling-based approaches which focus on maximizing the gate fidelity for a given stochastic perturbation require additional optimization cycles with different noise realizations whose values are sampled from a given distribution function, which comes at a significant computational cost. By focusing on the noise sensitivity, our approach eliminates the necessity for averaging over noise traces. Furthermore, operating over smooth functions rather than a large number of piecewise-continuous pulses with fixedwidth segments allows additional speed up when using adaptive ODE solvers [6]. Another advantage of our functional approach is that one can impose exact boundary conditions and functional constraints (such as explicit bandwidth constraints, maximum drive amplitude, boundary conditions for the drive, or even functional constraints such as DRAG [10] and filtering effects [25]) from the outset. This is in contrast with the usual approach of penalizing any deviation of such constraints by including them in the cost function, which makes it more difficult to find solutions by making the optimization landscape more complicated, leading to additional false minima and stiffness. Furthermore, as we demonstrate below, it is also feasible to suppress higher order effects of noise within our approach, which are important for strong errors and long pulse durations.
The parameterization of the smooth control Hamiltonian using a DNN, instead of the commonly used Fourier harmonics amplitudes [6,12,20], is a distinguishing feature of our approach which enables the use of a very large number of optimization parameters to exhaustively probe the functional space of smooth control fields. This is impractical with a Fourier series, since increasing the number of harmonics necessarily introduces faster oscillations which slow down adaptive ODE solvers, and no method to compute the parameter gradients with improved efficiency (such as the backpropagation algorithm for DNNs) exists.
Finally, in terms of methodology, within the wider context of DNN based quantum control [26][27][28][29], our approach is a new direction beyond sampling based learning protocols for finding robust and nonrobust shaped pulses.

II. METHOD
We start from a generic definition of a quantum system described by a control and error Hamiltonian as where p = p(t) is a time-dependent control parameter characterizing the driving fields h i , i are the quasistatic stochastic noise strengths, Λ i are traceless generators of the Lie algebra su(n) that obey tr(Λ i Λ j ) = nδ ij , and χ i (t, p) represent any dependence of the noise Hamiltonian H on the control fields. The latter becomes relevant when a term in the Hamiltonian is a nontrivial function of a noisy parameter, e.g., single-qubit microwave driving Ω(t) with multiplicative amplitude error δΩ = Ω(t) or tunable qubit-qubit coupling J(V ) susceptible to fluctuations δV in the voltage or flux tuning parameter as δJ = δV ∂ V J(V ).
By treating H as the interaction Hamiltonian, the solution of the time-dependent Schrödinger equation can be expressed as ]U (t) accounts for the effects of the noise. For weak noise ( , U can be calculated perturbatively using Magnus expansion as where A robust quantum gate is one that is insensitive to the first order effects of i , i.e. ∂ i U (T )| i=0 = 0. This condition can be achieved by choosing a control field for which ||E i (T )|| is negligiblly small. The control field must also be chosen such that within the logical subspace U c (T ) has the same result as a given target operation, U 0 . These requirements are equivalent to stating that the following cost function needs to vanish: where D ≤ n is the dimension of the logical subspace used for quantum computation, N P (E i (T )) = ||PE(T ) − tr[PE(T )]/D|| is the noise sensitivity of the logical subspace, P denotes the projection operator onto the logical subspace, ||A|| 2 = tr(AA † ), max i is the maximum tolerable value of the stochastic noise strength i , w i 1 are hyperparameters for the optimization step which can be tuned or annealed with a schedule to avoid local minima. Backpropagation efficiently works in conjunction with the adjoint sensitivity analysis of the coupled ODE system, and all derivatives with respective to optimization parameters θj (weights, biases and time-scaling parameter) are computed using automatic differentiation [30].
The first term in parentheses is similar to the gate infidelity in the absence of stochastic noise which ensures that the final unitary is U 0 , and the second term is the sensitivity which ensures that the implemented gate is immune to the first order effects of the stochastic noise i ; this is different from the existing approaches where the cost function is taken to be the noisy gate infidelity which also necessitates an averaging over a particular noise distribution [12-14, [16][17][18][19]. The exponents k and l determine the relative weighting of the fidelity and the sensitivity terms in the cost function. In fact, the two terms could be wrapped inside any monotonically increasing functions, but in our results for simplicity we will simply raise to a power of either 1 or 2.
A neural network is a network of "neurons" (see Fig. 1), where a vertical column of neurons form a layer. The overall action of the ith layer of the neural network is where W i is a matrix and b i a vector respectively containing the "weights" and "biases" of the ith layer, and σ i is known as the "activation function." Our DNN represents a function t → p(t), where p(t) is a smooth curve parameterized by the internal degrees of freedom of the DNN, b i and W i , as p(t) = L N • . . . • L 1 (t). Intermediate layers 1 < i < N are commonly referred to as hidden layers. For concreteness, we will take σ i to be the element-wiseacting tanh function, although functions which are sufficiently smooth and do not significantly affect the bandwidth requirements of the resulting pulse shape are also viable.
The goal of a DNN optimizer is to vary the weights and biases until the cost function C is minimized. This is achieved by using local gradient based optimization in conjunction with the backpropagation algorithm [31]. This requires calculation of C in addition to its parameter gradient ∂C, i.e., the partial derivatives of C with respect to each of the elements of W i and b i at each iteration, which can be computationally prohibitive. For this reason, we differentiate E i (t) analytically and transform the problem of calculating C into solving a coupled system of ODEs Similarly, ∂C is calculated by taking partial derivatives of Eq. (4) with respect to the elements of W i and b i as in Ref. [6]. This form allows a more efficient calculation of the cost function and its parameter gradient, as required by the backpropagation algorithm, by using the adjoint sensitivity method [32,33] on Eq. (4). In practice, we perform the straightforward but unwieldy task of calculating the parameter gradients of the coupled ODE system Eq. (4) by using reverse mode automatic differentiation [26,28,30]. The above analysis can be extended to higher order error correction, which is relevant when the noise is not sufficiently weak for a first order treatment. For example, second order error correction can be achieved by ij (T ))/D 2 ) l to the cost function, which ensures that the second order term in the Magnus expansion for U (T ) vanishes, and comes at only minor additional computational cost. [34] We remark that we are performing a search in the space of functions. In general, a large number of parameters are required to adequately explore a functional space. Parametrizing a functional space in a way that leads to convergent and experimentally feasible solutions is a nontrivial problem [5,8,9,35]. For example, a set of a few sinusoidal harmonics [6,12,36] explores a very limited portion of the functional space; this can be remedied by increasing the number of harmonics but results in unrealistic bandwidth requirements and significantly slows down the optimization problem due to inefficient computation of gradients and reduced timesteps in adaptive ODE solvers.
In contrast, our parametrization of the timedependence of the Hamiltonian eliminates the problem of finding a well-behaved hand-crafted ansatz form to optimize over, and makes it straightforward to explore a larger functional space by simply increasing the number of neurons and layers, owing to the fact that DNNs are universal function approximators [37], and produces wellbehaved Hamiltonians while remaining computationally efficient. We emphasize that our parameterization with DNNs, t → p(t) → H c (t, p), differs from earlier works: instead of using the output of the DNN to represent discrete ansatz parameters (such as the amplitudes of harmonics [12] or segments of piecewise control fields [13][14][15][16][17][18][19]), we directly use p(t) as a smooth differentiable function of time to construct H c (t), which is made possible by the recently introduced PINNs [21]. Our approach has the added benefit that functional constraints can be specified as a part of the parametrization p(t) → H c (t, p) (e.g., H c (t, p) = A sin(p 1 (t))σ x +Bσ z constrains the drive amplitude to be less than |A|) which limits the search to the space of functions that satisfy the constraints from the outset. This is in contrast to enforcing constraints via cost function by penalizing any deviations, which complicates the optimization landscape and can introduce false minima, leading to inefficiency. Since both the adjoint sensitivity method and backpropagation algorithm (which uses the directed graph structure of DNNs to compute the gradients more efficiently) are well established computational methods, we here focus on presenting our results on robust quantum control, and refer the reader to the literature [31][32][33] for their details. For our numerical results, we use Dif-fEqFlux.jl [33], which is a Julia package implementing a PINN optimizer, with optional support for hardware DNN accelerators. Before moving on to explicit solutions, we remark that the gate time T needs to be decided prior to optimization. For Hamiltonians that can contain non-adjustable terms (such as drift terms), a solution will generally not exist for an arbitrary T , and finding a suitable value can be laborious. This problem can be solved by introducing a new optimization parameter α, a time-independent scaling factor, to the right hand side of Eq. (4) and to the noise terms i in Eq. (3); α can be seen as a scaling factor for time or the total Hamiltonian. To avoid long gate times approaching to inverse of the incoherent decay rates Γ i , one can also introduce a factor e − i ΓiT /α to the trace fidelity term in the cost function C.
To train the neural networks, we use local gradient based optimization algorithms. We start from a random initial internal state for the neural network, and use a variant of stochastic gradient descent algorithm with limited iterations, followed by Broyden-Fletcher-Goldfarb-Shanno (BFGS) passes. A representative plot for the decay of the cost function is shown in Fig. 2, which was obtained during the training process for the pulse shown in Fig. 6. During the first 200 iterations, AMSGrad [38] was used with the learning rate η = 10 −3 . The result was further refined using a BFGS pass of 350 iterations with an initial step norm of 10 −3 . The fully connected DNN layer structure with two hidden layers of length 32 provides ∼ 10 3 optimization parameters, which we found to be sufficient for all the problems we have studied in this work.

III. EXAMPLES
A. Exchange-coupled spin qubits As our first proof-of-concept example, we consider a pair of spin qubits in a semiconductor double quantum dot, with one electron in each dot. The overlap between the electron wavefunctions is determined by the gate voltages, which provides voltage tunable exchange coupling between the spin degrees of freedom of the electrons. Single-qubit operations are realized by modulation of the magnetic field that is generated by an on-chip wire. In a rotating frame, the spin Hamiltonian of this system can be written as [35] H whose Lie algebra is isomorphic to a two-level su(2) generated by {σ z ⊗ σ z , σ x ⊗ 1 1, σ y ⊗ σ z }. The exchange coupling, J, which in some devices is essentially fixed due to bandwidth limitations [35,39], is susceptible to charge noise induced fluctuations, which can be modelled by the noise Hamiltonian H = J Jσ z ⊗ σ z /4. An entangling CZ-equivalent gate (e −iσz⊗σzπ/4 ) can be produced naively by setting B x = 0 and waiting a time π/J. However, instead we search for a smooth pulse B x (t) which can correct both first and second order quasistatic fluctuations in J by parameterizing the magnetic field as gµ B B x (t)/2 = (J/4)A(t)p 1 (t) sin(p 2 (t)). Here, A(t) = coth(κT )[tanh(κt) − tanh(κ(t − T ))] − 1 is a smoothed unit square pulse (κ determines the degree of the smoothing), and its purpose is to enforce the condition that the magnetic field is turned on only for the duration of the pulse, B x (0) = B x (T ) = 0. The resulting smooth pulse shown in Fig. 3 produces a CZ gate with extraordinarily high gate fidelities (defined as F = |tr(U (T )U † 0 )/4| 2 ) above 99.99% for exchange errors as large as 24%, and is faster than the smooth pulse reported in Ref. [7]. A bandwidth limitation of ∆f ≈ 20/T leads an infidelity ≈ 10 −5 , which corresponds to a very conservative value of ≈ 4MHz for J/h = 1MHz [40].
For completeness, we now turn to full SU(4) control in such systems. A robust universal set of gates in exchangecoupled spin qubits can be achieved with the addition of error-free virtual Z rotations [41] and the robust onequbit rotation X π/2 [35] implemented by the pulse shown in Fig. 4. Either qubits can be targeted by changing the modulation frequency of the driving field to that of the target qubit [35]. We can also use this method to find robust pulses for arbitrary elements of SU(2) ⊂ SU(4) generated by {σ z ⊗ σ z , σ x ⊗ 1 1, σ y ⊗ σ z } or {σ z ⊗ σ z , 1 1 ⊗ σ x , σ z ⊗ σ y }, in a direct manner provided that the gate time is sufficiently long (for example, an entangling operation requires at least T π/4J). Arbitrary SU(4) rotations can also be implemented using a two-tone drive, given by The generators of this Hamiltonian fully span su(4), and hence can generate any SU(4) gate directly. As a representative example, the pulse shown in Fig. 5 generates exp −i π 8 [σ y ⊗ σ y + σ z ⊗ σ z ] which is equivalent to an √ iSWAP gate up to local unitaries. We note that this rotation cannot be generated directly using the Hamiltonian in Eq. (5).

B. Transmon qubit
To illustrate simultaneous noise and leakage suppression, we next turn to the transmon qubit, although we No infidelity curve for a naive implementation is shown, because one-qubit gates with always-on J coupling are a nontrivial problem even without any robustness requirements [39].
remark in passing that this scenario is also relevant in the context of resonator-coupled spin qubits [11] and encoded exchange-only spin qubits [42]. The effective Hamiltonian for the transmon can be written as [43] H c (t) ≈ δ(t)a † a + ∆ 2 a † a(a † a − 1 1) + Ω(t)a + Ω * (t)a † 2 (7) in a rotating frame, where ∆ is the anharmonicity and Ω(t) is the complex envelope of the microwave drive capacitively coupled to the transmon via a resonator, whose frequency is detuned from the qubit frequency by δ(t). The first two levels encode a logical qubit, and we consider the first four levels in our calculations [44]. Thus, when calculating C, we project U (T ) onto the qubit subspace. The single-sided projection PE i (T ) in N P (E i (T )) allows us to leave out the effects of the noise solely on leakage subspace [45]; that being said, it is possible to protect the leakage subspace as well by simply omitting this projection. Our goal is to find a smooth pulse for implementing a gate that can suppress both leakage and shifts in detuning δ(t) → δ(t) + which can be caused by calibration errors [46], stochastic phase errors [47]. At this point, we recall that an established method of suppressing leakage in Josephson junction based qubits (1) x (t)/J g1µBB (1) y (t)/J g2µBB (2) x (t)/J g2µBB is to use pulse shapes that obey a particular family of differential relations between Ω(t), ∆ and δ(t), known as DRAG [10]. These relations ensure that the leakage inducing terms remain small throughout the pulse. It is possible to enforce DRAG conditions by construction, for instance by augmenting Eq. (4) with the relatioṅ Ω P (t) = p 1 (t) sin(p 2 (t)) and parameterizing the drive as Ω(t) = A(t)Ω P (t) − i∂ t [A(t)Ω P (t)]/2∆, δ(t) = 0. However, we will proceed without doing so in order to avoid limiting the search space: DRAG is a sufficient condition for suppressing leakage, but it is not a necessary one since what matters is whether the qubit subspace timeevolution operator is equal to the target unitary at the final time t = T , regardless of any leakage that may be present during intermediate times 0 < t < T . We thus parameterize the driving field as =4∆A(t)(2/π)[arctan(p 1 (t)) sin(p 2 (t))+ i arctan(p 3 (t)) sin(p 4 (t))], δ(t) =2∆A(t)(2/π) arctan(p 5 (t)) sin(p 6 (t)), where the use of arctan clamps the field amplitudes, and we target a X π/2 gate [48]. This is in particular a useful example, because when used together with error-free virtual Z rotations implemented by shifting the rotating frame [41], X π/2 gates are sufficient to implement arbitrary single qubit rotations. The resulting pulse shape is shown in Fig. 6. For a typical anharmonicity value of ∆/h ∼ −200MHz [43], the pulse duration is T ∼ 10ns. Under this assumption, the resulting gate fidelities, defined by F = |tr([PU (T )P † ]U † 0 )/2| 2 , remain above 99.99% as long as the shift in detuning remains below 3.5% of ∆. When the effect of two additional higher leakage states are taken into account for this pulse shape, the baseline fidelity remains the same. Compared to a nonrobust pulse based on DRAG [10, 43,46], which can take at least T ≈ 2.1h/|∆| = 10.5ns and reaches the same infidelity threshold at 0.03% of ∆, this new shaped pulse improves the error threshold against detuning errors by two orders of magnitude. A bandwidth limitation ∆f ≈ 4.73/T leads to 10 −5 infidelity, which corresponds to ≈ 473MHz for ∆ = −200MHz, approximately twice the bandwidth required by DRAG ≈ 255MHz.
Although dephasing effects associated with quasistatic fluctuations are corrected by a robust pulse, relaxation processes can limit the fidelity of gate operations in transmon. These effects can be quantified using the master equatioṅ where D[A]ρ(t) ≡ Aρ(t)A † − {A † A, ρ(t)}/2 and 1/T φ = 1/T 2 − 1/2T 1 is the pure dephasing rate. We compute the state-averaged gate fidelity [49] where Q represents the quantum channel associated with the master equation as ρ 0 → Q(ρ 0 ). For this robust pulse, we find that the effects of relaxation on the fidelity is less than 10 −4 when T 1 |∆|/h ≥ 3554, and T φ ≥ 10T 1 (we remark, however, that this represents a higher bar than necessary, given that our pulse shape already protects against pure quasistatic dephasing errors). These limits are well accessible in recent experiments; for example in Ref. [50], the value of T 1 |∆|/h is 13200 with |∆|/h = 220MHz, T 1 = 60µs and T φ 727µs (lower limit, obtained from T Hahn 2 = 103µs which is smaller than T 2 ) which is greater than 10T 1 .
We remark that when drive amplitude limitations or the robustness constraints are removed, our method easily produces faster nonrobust gates that well outperform the fastest DRAG pulses, ultimately limited by the bandwidth of the drive. More stringent experimental bandwidth constraints can be accommodated by running the search with an appropriately increased gate time T ; although it is harder to find solutions for shorter gate times, one can always find solutions at longer times.

IV. CONCLUSION
In summary, we have introduced a method for performing dynamically corrected quantum gates with practical smooth pulses that is broadly applicable to large Hilbert spaces beyond two-level systems, which is necessary for robust control of multi-qubit devices and leakage into excited states, and is practically extensible to correction of errors beyond first order. Our approach is the first generally applicable robust noise-sensitivity based smooth pulse shaping method, does not require sampling or assumption with regards to the nature of the noise, and leverages physics-informed deep neural networks for computational advantages. In addition to noise cancellation, our generic approach can also be used to reduce or eliminate the need for careful recalibration cycles during experiments. A future direction is the extension of this approach to suppress time-dependent broadband noise. The dimensionless pulse shapes shown in the main text can be approximated using a truncated Chebyshev series in the form where T n (x) is nth the Chebyshev polynomial of the first kind, τ (τ 0 ) is the dimensionless (total) time shown along the x-axis. The Chebyshev coefficients c n for gµ B B x (τ )/J shown in Fig. 3 are tabulated in Table I Similarly, the Chebyshev series coefficients for gµ B B x (τ )/J shown in Fig. 4 are tabulated in Table II, and for g 1 µ B B (1) x (τ )/J, g 1 µ B B (1) x (τ )/J and g 2 µ B B  Table III. Finally, for Ω x (τ )/∆ Ω y (τ )/∆, and δ(τ )/∆ in Table  IV. Approximate pulse shapes reconstructed from these Chebyshev coefficients result in similar fidelity curves which reach the 10 −4 threshold around the same value, closely approximating the fidelity curves given in the main text. The code used to produce the robust pulses shown in Fig. 3 and Fig. 6, and the resulting internal state of the neural networks can be found in the supplemental files 51.  Fig. 4.