Optimally Band-Limited Noise Filtering for Single Qubit Gates

We introduce a quantum control protocol that produces smooth, experimentally implementable control sequences optimized to combat temporally correlated noise for single qubit systems. The control ansatz is specifically chosen to be a functional expansion of discrete prolate spheroidal sequences, a discrete time basis known to be optimally concentrated in time and frequency, and quite attractive when faced with experimental control hardware constraints. We leverage the filter function formalism to transform the control problem into a filter design problem, and show that the frequency response of a quantum system can be carefully tailored to avoid the most relevant dynamical contributions of noise processes. Using gradient ascent, we obtain optimized filter functions and exploit them to elucidate important details about the relationship between filter function design, control bandwidth, and noise characteristics. In particular, we identify regimes of optimal noise suppression and in turn, optimal control bandwidth directly proportional to the size of the frequency bands where the noise power is large. In addition to providing guiding principles for filter design, our approach enables the development of controls that simultaneously yield robust noise filtering and high fidelity single qubit logic operations in a wide variety of complex noise environments.


I. INTRODUCTION
The ability to perform fast and robust operations on multiqubit quantum systems is a necessity for realizing reliable quantum computation [1]. Unfortunately, the inevitable interaction between a quantum system and its environment presents an obstacle for achieving such operations. Unwanted system-environment interactions lead to noise processes that cause quantum gates to deviate from their intended evolution, consequently leading to a loss of coherence and computational errors. Quantum control is an approach that seeks to address this challenge through the design of control protocols that implement desired quantum operations, while simultaneously achieving robustness against noise [2]. Quantum control can be particularly advantageous for combating spatio-temporally correlated noise, which is known to be detrimental to quantum error correction [3][4][5][6][7].
Various control techniques have been developed to carry out robust quantum gates in the presence of systematic and environmental noise sources. Pulse-based techniques such as dynamically corrected gates [8,9] leverage features of dynamical decoupling [10,11] to effectively average out noise while implementing a logical operation. Despite their ability to account for practical limitations, such as bounded control amplitudes, they are limited to static noise models.
Smooth control methods based on quantum optimal control theory, such as open-system Gradient Ascent Pulse Engineering (GRAPE) [12], extend beyond the traditional closed system GRAPE approaches [13][14][15] to enable the construction of quantum gates in the presence of time-dependent noise. Open-system GRAPE performs local updates to the control waveform in accordance with typical GRAPE approaches; however, it requires averaging over dynamical simulations of quantum trajectories to optimize control waveforms in the time domain.
* Current affiliation: Goldman Sachs & Co.
The filter function formalism (FFF) offers an alternative perspective on optimized quantum control in the presence of time-dependent noise processes. Capable of accommodating a wide range of spatio-temporally correlated noise models, the FFF captures a quantum system's sensitivity to noise in the frequency domain via control-dependent filter functions (FFs) [16][17][18]. Within the FFF, gate fidelity can be expressed in terms of an overlap integral between the FFs and the noise power spectral densities (PSDs). This relationship gives rise to a highly intuitive perspective on robust quantum gates, namely, minimizing spectral overlap between the FFs and noise PSDs is essential for realizing noise-optimized gates.
Minimization of the overlap integral has become a guiding principle for gate optimization via FF design. Proposed approaches have made direct use of the overlap integral as an objective function [19,20], where a priori knowledge of the noise PSD is assumed, or focused on the minimization of the FF over a specified low-frequency band [19,21]. In practice, both objective functions require estimation of the noise PSD (e.g., through quantum noise spectroscopy (QNS) [22][23][24]), with the latter potentially requiring only knowledge of the noise cutoff frequencies. Regardless of the choice of objective function, a majority of the approaches have been numerically oriented, utilizing either gradient-based [20] or non-gradient-based optimization [19,21]. While analytical solutions for optimal FFs are difficult to ascertain due to the non-linear relationship between the control and FF, numerical approaches have offered little intuition into the design of optimal FFs. Furthermore, questions remain regarding the interplay between control parameters, such as bandwidth and amplitude, noise parameters (e.g., noise cutoff frequencies), and optimal FF design.
In this work, we provide analytical insight into FF design and introduce an optimization protocol that sheds light on the relationship between control, noise, and optimal FFs. With focus on a single qubit system subject to additive dephasing, we show that relatively simple control schemes can provide FF tunability in both single and multi-axis noise scenarios. Moreover, we show that such schemes can be straightforwardly designed based on the properties of the noise PSD. Analytically designed controls are used to inform the initialization of a FF optimization approach we refer to as Filter GRadient Ascent in Function Space (F-GRAFS). As an extension of the GRAFS method [25], F-GRAFS seeks to minimize the spectral support of the FF within a specified frequency band while simultaneously performing a non-trivial quantum gate. F-GRAFS is shown to be highly versatile and adaptable to a variety of multi-axis noise scenarios, including non-uniform high-pass and band-pass gates. Furthermore, it proves to be a valuable tool for examining the dependence of FF design on control and noise parameters.
Following the GRAFS approach, we utilize the discrete prolate spheroidal sequences (DPSS) or so-called "Slepians" [26][27][28][29][30][31] as a functional basis for expressing the control waveform. The DPSS possess intrinsic bandwidth tunability that enables the study of optimized FFs as a function of control bandwidth. Additionally, the DPSS bases constitute an optimal description of the subspace of functions limited in both time and bandwidth. Using the DPSS as a functional basis restricts the controls to the space of physically realizable functions, while substantially reducing the dimensionality of the optimization problem. It is through the use of F-GRAFS in conjunction with the DPSS that we arrive at conditions on optimal control bandwidth for filter design. In particular, we find clear indications that the optimal control bandwidth is lower-bounded by twice the size of the frequency band over which the FF is to be suppressed. Interestingly, this result generally holds for both single and multi-axis noise.
Together, our analytical and numerical results provide a relatively comprehensive guide for FF design in a variety of relevant noise scenarios. Fig. 1 summarizes the F-GRAFS workflow. First, the region of frequencies where the noise is strongest is identified. Then, this information is used to construct the DPSS basis and tailor initial conditions, both key ingredients in the gradient-based optimization.
The manuscript is organized as follows. In Sec. II we describe the relevant background necessary to understand the numerical optimization, namely the system models and the FFF. We also describe the DPSS, their definition, and main properties. In Sec. III, we introduce the F-GRAFS method, defining the optimization problem and explicitly computing the gradients. Sec. IV presents the main results. Here, we describe analytical control schemes used to initialize F-GRAFS. We then showcase optimized control waveforms and FFs obtained from F-GRAFS for various control and noise scenarios. F-GRAFS is then employed to examine the connection between control parameters, noise characteristics, and optimal FF design. We conclude in Sec. VI with a summary of the main results and an outline of future investigations.

II. BACKGROUND
We begin by describing the theory necessary to understand and implement F-GRAFS. First, we define the control and noise Hamiltonians relevant to this study. Then, we describe the FFF, the framework used to study the system dynamics in the frequency domain.

A. System Model
We consider the problem of controlling a single qubit system in the presence of temporally correlated additive dephasing noise. The dynamics of the system are governed by a Hamiltonian $H(t)$ that can be partitioned as
$$H(t) = H_C(t) + H_N(t), \tag{1}$$
where $H_C(t)$ denotes the control contribution that acts solely on the system and $H_N(t)$ encapsulates the contributions of the noise processes. In the reference frame rotating with the frequency of the qubit, and under the rotating wave approximation, the control Hamiltonian is given by (in units of $\hbar = 1$)
$$H_C(t) = \frac{1}{2}\left[\Omega_x(t)\,\sigma_x + \Omega_y(t)\,\sigma_y\right]. \tag{2}$$
Motivated by a variety of system architectures [32][33][34], we consider control functions along the axes transverse to the quantization direction. While the control Hamiltonian defines amplitude control along the Pauli operators $\sigma_x$ and $\sigma_y$, one can move from Cartesian to polar coordinates to express the control functions in terms of time-dependent amplitude and phase. We focus on the former representation, yet note that our approach is effectively agnostic to the choice of control representation. Formally, the controls are assumed to be expressed as a weighted sum
$$\Omega_\nu(t) = \sum_k \alpha_{\nu,k}\,\phi_k(t), \qquad \nu = x, y. \tag{3}$$
The expansion coefficients $\alpha_{\nu,k}$ are real and weight the basis functions $\phi_k(t)$. At this stage, we assume the $\phi_k(t)$ to be arbitrary and specify them more concretely in Sec. II C. The single qubit system is subject to semiclassical noise generically described by the Hamiltonian
$$H_N(t) = \vec{\beta}(t)\cdot\vec{\sigma}, \tag{4}$$
where $\vec{\beta}(t) = (\beta_x(t), \beta_y(t), \beta_z(t))$ and the Pauli operators are given by $\vec{\sigma} = (\sigma_x, \sigma_y, \sigma_z)$. Each noise component $\beta_\mu(t)$ defines a random Gaussian process considered to be wide sense stationary with zero mean, $\langle\beta_\mu(t)\rangle = 0$, $\mu = x, y, z$, where $\langle\cdots\rangle$ denotes classical ensemble averaging.
In addition, the functions $\beta_\mu(t)$ are characterized by two-point correlation functions $\langle\beta_\mu(t)\beta_\nu(t')\rangle$, related to the noise PSD $S_{\mu\nu}(\omega)$ via a Fourier transform:
$$\langle\beta_\mu(t)\beta_\nu(t')\rangle = \frac{1}{2\pi}\int_{-\infty}^{\infty} d\omega\, S_{\mu\nu}(\omega)\, e^{i\omega(t-t')}. \tag{5}$$
As will be discussed below, the frequency domain representation provides a convenient language for analyzing the dynamical contributions of the noise to the time evolution of the qubit. This representation gives rise to a powerful framework known as the FFF. Capturing the effective dynamics of a quantum system in terms of the spectral properties of the noise and the control-driven frequency response of the system, the FFF has been employed in settings such as quantum noise spectroscopy [22][23][24] and noise mitigation [19,33,35,36]. In the following section, we will review this formalism, following closely the work by Green et al. [37].

B. Filter Function Formalism
Due to the time-dependent nature of the noise, the Hamiltonian in Eq. (1) will in general not commute with itself at different times, $[H(t), H(t')] \neq 0$ if $t \neq t'$, and therefore the time evolution it induces will be given by the time-ordered propagator
$$U(t) = \mathcal{T}_+ \exp\left[-i\int_0^t H(s)\,ds\right], \tag{6}$$
where $\mathcal{T}_+$ is the time-ordering operator. In general, for an arbitrary Hamiltonian $H(t)$, the time propagator $U(t)$ will not be analytically tractable and hence we will not have access to a closed analytical description of the time evolution. Instead, by moving to the interaction picture with respect to the control, and assuming that the control is dominant with respect to the noise, the noise dynamics can be treated as a time-dependent perturbation.
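Moving into the rotating reference frame with respect to the control propagator $U_C(t) = \mathcal{T}_+ \exp\left[-i\int_0^t H_C(s)\,ds\right]$, the time evolution operator $U(t)$ can be expressed as $U(t) = U_C(t)\tilde{U}_N(t)$. The rotated-frame error propagator
$$\tilde{U}_N(t) = \mathcal{T}_+ \exp\left[-i\int_0^t \tilde{H}_N(s)\,ds\right]$$
is generated by the rotated-frame noise Hamiltonian $\tilde{H}_N(t) = U_C^\dagger(t) H_N(t) U_C(t)$. Since SU(2) is homomorphic to SO(3), the rotated-frame Hamiltonian can be written in terms of the control matrix components $R_{\mu\nu}(t)$ as $\tilde{H}_N(t) = \sum_{\mu,\nu} \beta_\nu(t) R_{\mu\nu}(t)\,\sigma_\mu$. Each component can be expressed using the Hilbert-Schmidt inner product as
$$R_{\mu\nu}(t) = \frac{1}{2}\mathrm{Tr}\left[\sigma_\mu U_C^\dagger(t)\,\sigma_\nu U_C(t)\right]. \tag{8}$$
Using a perturbative Magnus expansion [38], it is convenient to parametrize the error propagator as $\tilde{U}_N(t) = \exp[-i\vec{a}(t)\cdot\vec{\sigma}]$ in terms of the error vector $\vec{a}(t) = \sum_{l=1}^{\infty} \vec{a}^{(l)}(t)$. In general, a closed form for $\vec{a}(t)$ does not exist; however, when the noise is sufficiently weak and the time scale of the dynamics is sufficiently short, one can truncate the expansion [17,24]. Under these conditions, the error vector can be approximated to leading (first) order such that $\vec{a}(t) \approx \vec{a}^{(1)}(t)$, where
$$a^{(1)}_\mu(t) = \sum_\nu \int_0^t ds\, \beta_\nu(s)\, R_{\mu\nu}(s),$$
with $\mu, \nu = x, y, z$.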
The error vector representation proves to be convenient for examining the efficacy of a control protocol via the average operational fidelity
$$F(T) = \frac{1}{4}\left\langle \left|\mathrm{Tr}\left[U_G^\dagger U(T)\right]\right|^2 \right\rangle.$$
This particular measure utilizes the Hilbert-Schmidt inner product to quantify how well a given (noisy) propagator $U(T)$ approximates a target gate $U_G$ after a total controlled evolution time $T$. In the case where $U_C(T) = U_G$, this measure can be expressed as
$$F(T) = \frac{1}{2}\left[1 + e^{-\chi(T)}\right], \tag{12}$$
where $\chi(T)$ is determined by the ensemble-averaged error vector $\vec{a}(T)$ [17]. Typically, $\chi(T)$ is referred to as the overlap due to its frequency domain representation. More specifically, $\chi(T)$ can be generically expressed as a sum of integrals, where each integrand is a product between a noise PSD and a filter function (FF). In the weak noise limit, $\chi(T) \approx \langle |\vec{a}^{(1)}(T)|^2 \rangle$ and the overlap conveniently reduces to
$$\chi(T) = \frac{1}{2\pi}\sum_{\mu} \int_{-\infty}^{\infty} d\omega\, S_\mu(\omega)\, F_\mu(\omega, T). \tag{13}$$
The FFs are defined as
$$F_\mu(\omega, T) = \sum_{\nu} \left|R_{\nu\mu}(\omega, T)\right|^2, \tag{14}$$
where the frequency domain control matrix elements are
$$R_{\nu\mu}(\omega, T) = \int_0^T dt\, R_{\nu\mu}(t)\, e^{i\omega t}. \tag{15}$$
Note that in this formulation, cross-correlations are neglected; thus, $S_{\mu\nu}(\omega) = S_\mu(\omega)\delta_{\mu\nu}$. The FFF offers an alternative perspective that can be exploited for characterization and control problems. For example, in the case of optimized control, the objective is to find control functions $\Omega_\nu(t)$ that minimize the overlap [Eq. (13)], and thus maximize the operational fidelity [Eq. (12)]. This is the overarching principle leveraged by F-GRAFS to tailor FFs and achieve optimized gates.
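To make the FF definition concrete, the following Python sketch evaluates the dephasing FF numerically for single-axis control. It is an illustration rather than the implementation used in this work: the function name `dephasing_filter_function`, the convention $H_C(t) = \Omega(t)\sigma_x/2$, the sign of the Fourier kernel, and the midpoint-rule integration are all assumptions made here.

```python
import numpy as np

def dephasing_filter_function(amps, dt, freqs, substeps=20):
    """Estimate F_z(w) for piecewise-constant x-axis control and z dephasing.

    Assumes H_C(t) = Omega(t) * sigma_x / 2, so the control rotates the
    noise axis by theta(t) = int_0^t Omega(s) ds and the two nonzero
    control-matrix elements are cos(theta) and sin(theta) (up to a sign
    that does not affect the filter function).
    """
    amps = np.asarray(amps, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    h = dt / substeps                              # fine integration step
    fine = np.repeat(amps, substeps)               # amplitude on the fine grid
    t_mid = (np.arange(fine.size) + 0.5) * h       # midpoints of the fine steps
    theta = np.cumsum(fine) * h - 0.5 * fine * h   # rotation angle at midpoints
    kernel = np.exp(1j * np.outer(freqs, t_mid))
    r_zz = kernel @ np.cos(theta) * h              # midpoint-rule Fourier integrals
    r_yz = kernel @ np.sin(theta) * h
    return np.abs(r_zz) ** 2 + np.abs(r_yz) ** 2

# Constant drive as a test case: the FF should be centered near w = omega0.
T, N = 10.0, 100
omega0 = 4 * np.pi / T
freqs = np.linspace(0.0, 3 * omega0, 601)
ff = dephasing_filter_function(np.full(N, omega0), T / N, freqs)
peak_frequency = freqs[np.argmax(ff)]
```

Under these assumptions the peak sits at the drive amplitude, which is the property exploited when a filter is steered away from a noisy band.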

C. Time-Band-Limited Sequences for Quantum Control
Generally speaking, there is no designated protocol for choosing a parametrization of the control function Ω ν (t). In GRAPE approaches, the control profiles are typically assumed to be piecewise constant in time [13]. The optimization proceeds by locally updating each control amplitude for each timestep such that the overall profile generates a controlled evolution that converges towards the desired operation. This approach becomes increasingly computationally expensive as the number of timesteps increases. Furthermore, GRAPE methods typically require additional bandwidth and amplitude constraints to enforce physical limitations in control hardware or low-pass filtering to generate smooth control [15].
Functional expansions of the control waveform offer advantages over piecewise control. When expressed as a weighted sum of basis functions, control optimization algorithms focus their attention on optimizing the basis function weights, rather than the individual control amplitudes. This alternative approach leads to global, as opposed to local updates to the waveform. While GRAPE based methods [39] and other nongradient based methods [21] have sought to leverage functional expansions for optimized control, basis selection is to some degree unmotivated.
In this study, we employ a functional expansion parametrized by the DPSS [31]. With their rich history in classical signal processing, DPSS offer an optimal compromise to the time-bandwidth uncertainty relation. More specifically, they form a basis with optimal spectral concentration for time-limited signals. From a control perspective, DPSS are attractive for designing optimized controls that inherently account for physical limitations of control hardware. Timing resolution and control bandwidth are specified prior to basis generation. As a result, intrinsic bandwidth constraints are imposed within the basis prior to optimization rather than as an additional constraint appended to the objective function.
As discrete analogs of prolate spheroidal wave functions, DPSS are parametrized by the sequence length $N$ and the dimensionless bandwidth parameter $W \in (0, 0.5)$. A $k$th order Slepian sequence $\{v^{(k)}_m(N, W)\}_{m=0}^{N-1}$ is generated as a solution to the Toeplitz matrix eigenvalue equation
$$\sum_{m=0}^{N-1} \frac{\sin[2\pi W(n-m)]}{\pi(n-m)}\, v^{(k)}_m(N, W) = \lambda_k(N, W)\, v^{(k)}_n(N, W), \tag{16}$$
where $k, n = 0, \ldots, N-1$ and the diagonal ($n = m$) kernel entries are understood as the limiting value $2W$. The DPSS form an orthonormal basis of the vector space $\mathbb{R}^N$, satisfying $\sum_n v^{(k)}_n(N, W)\, v^{(l)}_n(N, W) = \delta_{k,l}$. The eigenvalues $\{\lambda_k(N, W)\}$ determine the order $k$ of the DPSS and decrease monotonically with $k$, such that $1 > \lambda_0(N, W) > \lambda_1(N, W) > \cdots > \lambda_{N-1}(N, W) > 0$. Moreover, the $\{\lambda_k(N, W)\}$ are a measure of the spectral concentration of the DPSS within the frequency band $(-2\pi W/\delta t, 2\pi W/\delta t)$, where $\delta t = T/N$ designates the time resolution of the control. DPSS of order $k < 2NW$ are the most spectrally concentrated, possessing eigenvalues very close to unity. In contrast, DPSS with $k \geq 2NW$ are characterized by $\lambda_k(N, W) \approx 0$. This property has been previously used to establish an approximate dimension $K$ of the space of band-limited functions: $K = \lfloor 2NW \rfloor$. Lastly, we note that the order $k$ of the DPSS determines its number of zero-crossings and characterizes the even-odd symmetry of the sequence about the midpoint.
Below, the DPSS are used to parametrize the space of control functions available to the optimization algorithm. The size of the basis is dictated by K and therefore, the timing resolution and bandwidth parameter. However, as we will discuss, by leveraging the spectral information contained within the DPSS eigenvalues, we can consider a basis smaller than K to introduce additional control constraints, e.g., endpoint constraints. We further exploit the DPSS to establish a relationship between the bandwidth W and the noise suppression characteristics of our optimized gates.
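The DPSS properties quoted above can be checked numerically. The sketch below diagonalizes the sinc-kernel Toeplitz matrix directly with NumPy; the function name `dpss_basis` and the parameter values are illustrative assumptions, not part of this work. Direct diagonalization is chosen here only for transparency: the leading eigenvalues are nearly degenerate, so individual low-order sequences may be poorly resolved, and a dedicated routine such as `scipy.signal.windows.dpss` is likely preferable in practice.

```python
import numpy as np

def dpss_basis(N, W):
    """Return (eigenvalues, sequences) of the N x N sinc-kernel Toeplitz matrix.

    Eigenvalues are sorted from most to least spectrally concentrated and
    sequences[k] is the k-th order sequence with unit Euclidean norm.
    """
    n = np.arange(N)
    diff = n[:, None] - n[None, :]
    # sin(2 pi W d) / (pi d), with the d -> 0 limit equal to 2W;
    # np.sinc(x) = sin(pi x) / (pi x) handles the diagonal safely.
    kernel = 2.0 * W * np.sinc(2.0 * W * diff)
    lam, vecs = np.linalg.eigh(kernel)      # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]           # reorder: largest eigenvalue first
    return lam[order], vecs[:, order].T

N, W = 128, 0.05
lam, v = dpss_basis(N, W)
K = int(np.floor(2 * N * W))                # approximate dimension of the band-limited space
gram = v @ v.T                              # orthonormality check: should be the identity
num_concentrated = int(np.sum(lam > 0.5))   # count of well-concentrated sequences
```

The number of eigenvalues above one half tracks $2NW$ closely, which is the sense in which $K = \lfloor 2NW \rfloor$ acts as an approximate dimension.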

III. FILTER GRAFS
F-GRAFS is a gradient-based optimization method for constructing noise-robust quantum operations via the FFF. Inspired by our previous work on closed system optimized control [25], F-GRAFS utilizes a functional expansion of the control in terms of DPSS. Below, we further elaborate on F-GRAFS, providing detailed information about the objective function, gradient expressions, and the optimization procedure.

A. Optimization Problem
F-GRAFS is designed to engineer noise-optimized control profiles that minimize the distance between a target gate $U_G$ and a noisy controlled evolution described by the unitary $U(T) = U_C(T)\tilde{U}_N(T)$. This is accomplished by casting the optimized control problem as a constrained optimization problem. The objective function aims to minimize the spectral overlap between the noise PSDs and FFs, while the constraint works to enforce a targeted fidelity for the logic gate. Formally, the F-GRAFS optimization problem is defined as
$$\min_{\{\alpha_{\nu,k}\}} \Gamma(T) \quad \text{subject to} \quad 1 - F_G(T) \leq \epsilon_G, \tag{17}$$
where $\Gamma$ quantifies the spectral leakage of the FFs within the frequency bands $B_j$. The constraint is defined relative to the ideal gate fidelity $F_G(T) = \frac{1}{4}\left|\mathrm{Tr}[U_G^\dagger U_C(T)]\right|^2$, which determines how well $U_C(T)$ approximates the desired target gate within an infidelity tolerance $\epsilon_G$.
The spectral leakage is dictated by the spectral null-bands (NBs) of the noise PSDs and the FFs resulting from candidate control profiles. The NBs are defined as the regions $B_\mu$, $\mu = x, y, z$, where the fractional noise power is small. More rigorously, for a desired fractional noise power $\epsilon_\mu \ll 1$, the NB is defined according to
$$\frac{\int_{B_\mu} S_\mu(\omega)\,d\omega}{\int_0^{\infty} S_\mu(\omega)\,d\omega} \leq \epsilon_\mu.$$
It is sufficient to consider equivalent fractional powers across all Pauli channels, and therefore, $\epsilon_\mu = \epsilon$ will be assumed throughout the remainder of this study. In general, the NB is selected based on three guidelines. (1) The NB should achieve maximum connectedness, i.e., minimize the number $L$ of disjoint sets that compose the NB. (2) The size of the NB $|B|$ should be maximal. As we will discuss later in this study, the control bandwidth requirements decrease with $|B|$. (3) The NB should be chosen to maximize the presence of high-frequency contributions. In practice, most time-correlated noise processes are characterized by PSDs that are concentrated at low frequency. A choice of $\epsilon$ will then determine a high-frequency cutoff $\omega_H$, producing $B = [\omega_H, \pi/\delta t)$, within which the noise is sufficiently weak.
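As a rough illustration of how a fractional-power threshold translates into a high-frequency cutoff, the sketch below scans a sampled PSD for the smallest $\omega_H$ whose tail power falls below a fraction $\epsilon$ of the total. The function `high_frequency_cutoff`, the Lorentzian PSD, and its corner frequency are hypothetical choices made here, and the tail-power criterion is an assumed reading of the NB definition, restricted to the band $[0, \pi/\delta t]$.

```python
import numpy as np

def high_frequency_cutoff(freqs, psd, eps):
    """Smallest grid frequency w_H such that the noise power in
    [w_H, freqs[-1]] is at most a fraction eps of the total power.

    freqs must be a uniform, increasing grid; power is a Riemann sum.
    """
    psd = np.asarray(psd, dtype=float)
    total = psd.sum()
    # tail[i] = summed power at grid points i, i+1, ..., end
    tail = np.cumsum(psd[::-1])[::-1]
    idx = int(np.argmax(tail <= eps * total))   # first index meeting the bound
    return freqs[idx]

dt = 0.1                                        # control time resolution
freqs = np.linspace(0.0, np.pi / dt, 4001)
psd = 1.0 / (1.0 + (freqs / 0.5) ** 2)          # hypothetical Lorentzian, corner at 0.5
w_h_loose = high_frequency_cutoff(freqs, psd, eps=0.10)
w_h_tight = high_frequency_cutoff(freqs, psd, eps=0.01)
```

A tighter tolerance pushes the cutoff to higher frequency and therefore enlarges the band that the FF must avoid.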
It is within the NBs that the FFs would ideally reside. Thus, an optimized control scheme strives to maximize the FF support within the NB, or equivalently, minimize the leakage of the FFs within the complement of the NBs (CNBs) $\bar{B}_\mu$. The F-GRAFS approach operates within the context of the latter, seeking to minimize the spectral leakage
$$\Gamma(T) = \frac{1}{3T}\sum_{\mu} p_\mu \int_{\bar{B}_\mu} \frac{d\omega}{\pi}\, F_\mu(\omega, T).$$
Here we have introduced the estimated weight of the noise power in the $\mu$-th direction, $p_\mu$; see Appendix A for further details. Observe that we have included a factor of $1/(3T)$ to normalize $\Gamma(T)$ with respect to the total power of the FFs. Each component FF possesses a total power of $T$, while the factor of three appears due to the presence of noise along all three single qubit Pauli channels. While our focus will be on the spectral leakage, we show that there is a connection between the minimization of $\Gamma(T)$ and the size of the CNB. Formally, we define the size of the CNB $|\bar{B}|$ as the integral
$$|\bar{B}| = \sum_{\mu} \int_{\bar{B}_\mu} d\omega.$$
Each integral is bounded between 0 and $\pi/\delta t$ and thus, $0 \leq |\bar{B}| \leq 3\pi/\delta t$. The lower bound is saturated in the noiseless case, while the upper bound is achieved in the white noise case. In Sec. IV, the size of the CNB will emerge as an important quantity in the discussion of optimized spectral leakage and optimal control bandwidth.
The F-GRAFS optimization problem is solved via Sequential Least SQuares Programming (SLSQP) [40]. In practice, we find that SLSQP offers faster convergence rates than alternative numerical optimizers, such as the interior point method [41], when solving Eq. (17). Variants of this optimization problem, for example, utilizing an effective "leakage fidelity" $F_\Gamma(T) = \frac{1}{2}\left[1 + \exp\left(-\frac{P T}{\delta\omega}\Gamma(T)\right)\right]$, where $P = \sum_\mu \int_0^\infty S_\mu(\omega)\,d\omega$ is the total power, lead to improved convergence for both the interior point method and L-BFGS-B [42]. However, the latter approach requires knowledge of the total power and therefore more detailed estimates of the noise PSDs.
This is in contrast to the spectral leakage, which may only require rough estimates of noise PSDs to determine CNBs. For this reason, and its simplicity, we utilize Eq. (17) for filter design and gate optimization.
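The idea of spectral leakage can be illustrated with a constant-drive FF and a low-frequency CNB. The sketch below is a simplified stand-in for $\Gamma(T)$: it reports the fraction of FF power (on a truncated, uniform frequency grid) that falls inside the CNB, without the weights $p_\mu$. The closed-form FF assumes the convention $H_C = \Omega_0\sigma_x/2$, and the function names and parameter values are hypothetical.

```python
import numpy as np

def cd_filter_function(w, omega0, T):
    """Closed-form constant-drive FF, assuming H_C = omega0 * sigma_x / 2."""
    # np.sinc(x) = sin(pi x) / (pi x); rescale so that s(d) = sin(u) / u with u = d * T / 2
    s = lambda d: np.sinc(d * T / (2.0 * np.pi))
    return 0.5 * T ** 2 * (s(w - omega0) ** 2 + s(w + omega0) ** 2)

def leakage_fraction(ff, w, band):
    """Fraction of the FF power (on a uniform grid w) inside band = [lo, hi)."""
    lo, hi = band
    inside = (w >= lo) & (w < hi)
    return ff[inside].sum() / ff.sum()

T, w_h = 10.0, 2.0
w = np.linspace(0.0, 40.0, 40001)
cnb = (0.0, w_h)                     # low-frequency CNB [0, w_H)
# Drive amplitude at the band edge versus well inside the null band.
leak_edge = leakage_fraction(cd_filter_function(w, w_h, T), w, cnb)
leak_far = leakage_fraction(cd_filter_function(w, 4 * w_h, T), w, cnb)
```

Centering the FF at the band edge leaves roughly half of its main lobe inside the CNB, whereas a larger amplitude reduces the leakage at the cost of stronger controls; this trade-off is what the optimization is meant to improve upon.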
While presented in a rather axiomatic fashion, the optimization problem given in Eq. (17) can be shown to be related to the global phase invariant metric $D$ between unitaries [43][44][45][46]. This metric is defined in terms of the Frobenius norm $\|A\|_2 = \sqrt{\mathrm{Tr}(A^\dagger A)}$. As a distance metric, $D$ naturally satisfies the properties of symmetry and the identity of indiscernibles. In addition, $D$ satisfies the triangle inequality, which can be used to establish an upper bound on the average squared distance between the target gate and the noisy controlled evolution. The bound contains a term $K(T)$ that is a function of $F_G$ and $F_N$ and goes to zero as these quantities approach unity, while its last term signifies a dependence on the fractional power $\epsilon$ in the NB. Note that as long as $\epsilon$ can be kept sufficiently small, the bound on $D$ is effectively minimized by minimizing $\Gamma(T)$ subject to $\epsilon_G \ll 1$.
Additional details regarding the derivation of the bound can be found in Appendix A. Lastly, we address the potential practical advantage of defining the objective function in terms of NB/CNB regions as it pertains to reducing overhead required by QNS protocols. Noise characterization techniques, like QNS, are used to provide estimates of noise spectra by utilizing the quantum system as a dynamical probe. Such estimates can be critical to the design of noise-informed gates, as the noise suppression characteristics of the control are directly related to the spectral overlap between the FFs and the noise PSDs. Thus, in general, one requires reasonably sufficient characterization of the complete noise PSD in order to design gates to minimize the overlap described in Eq. (13). However, we find this condition to be too stringent and argue that it is sufficient to only require knowledge of key features, such as noise cutoff frequencies in order to define CNBs and estimates of the fractional power. Unconcerned with knowledge of the complete PSD, but rather just the "flavor" of the noise, this approach potentially reduces the overhead required to provide sufficient estimates of noise PSDs via QNS.

B. Gradients
In this section, we derive analytical gradient expressions for the objective function in Eq. (A11) and the ideal gate fidelity F G (T ). Our derivation makes use of gradients originally introduced in Ref. [25] for closed system DPSS-based optimized control. Note that similar analytical FF gradients have also been derived in Ref. [20].
Under the parametrization of Eq. (3), where the DPSS are selected to form the functional basis, the controls generating the pure control evolution $U_C(t)$ are piecewise constant. The control profiles resulting from the DPSS expansion inherit properties of the basis, namely, they are discrete sequences. Thus, for a given control sequence of $N$ timesteps, each of duration $\delta t$, the control amplitude will take constant values $\Omega_\nu(t) = \Omega_{\nu,n}$, where $t \in [t_n, t_{n+1})$ and $t_n = n\delta t$ for $n = 0, 1, \ldots, N-1$. Equivalently, projecting into the DPSS basis,
$$\Omega_{\nu,n} = \sum_{k} \alpha_{\nu,k}\, v^{(k)}_n,$$
where we have dropped the explicit dependence on $N$ and $W$ for the DPSS. The piecewise-constant control assumption permits the control propagator $U_C(t)$ to be written as the time-ordered product (later time steps acting to the left)
$$U_C(t) = \prod_{j=0}^{n-1} U_C(t_{j+1}, t_j),$$
where $n = \lfloor t/\delta t \rfloor$. Each constituent propagator $U_C(t_{j+1}, t_j)$ implemented over the $j$-th time step is generated by the control Hamiltonian [Eq. (2)],
$$U_C(t_{j+1}, t_j) = \exp\left[-\frac{i\,\delta t}{2}\left(\Omega_{x,j}\,\sigma_x + \Omega_{y,j}\,\sigma_y\right)\right],$$
for $j = 0, 1, \ldots, N-1$ and $\vec{\Omega}_j = (\Omega_{x,j}, \Omega_{y,j})$. F-GRAFS optimizes the control waveform via optimization of the expansion coefficients $\{\alpha_{\nu,k}\}$. As a result, the coefficients $\alpha^{(r)}_{\nu,k}$ are updated along the gradient of the objective function at the $(r+1)$-th iteration of F-GRAFS. The initial values $\alpha^{(0)}_{\nu,k}$ can be chosen randomly or tailored to the noise characteristics, as we will discuss in detail in Sec. IV. The parameter $\gamma$ is the learning rate that is determined adaptively by the SLSQP algorithm.
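The gradient of the objective function is proportional to the gradient of the FFs. The controls are finite duration and bounded by construction, and therefore the integral in Eq. (A11) converges. Hence, the derivatives with respect to the expansion coefficients commute with the integral over frequencies and can be applied directly to the FFs as follows:
$$\frac{\partial F_\mu(\omega)}{\partial \alpha_{\eta,k}} = 2\sum_{\nu} \mathrm{Re}\left[R^{*}_{\nu\mu}(\omega)\,\frac{\partial R_{\nu\mu}(\omega)}{\partial \alpha_{\eta,k}}\right]$$
for $\mu, \nu = x, y, z$ and control axes $\eta = x, y$. Note that we have dropped the explicit dependence on $T$ for brevity. By virtue of Eq. (15) and subsequently Eq. (8), the derivatives propagate from the frequency-domain representation of the control matrices to its time-domain counterpart according to
$$\frac{\partial R_{\nu\mu}(\omega)}{\partial \alpha_{\eta,k}} = \int_0^T dt\, e^{i\omega t}\, \frac{\partial R_{\nu\mu}(t)}{\partial \alpha_{\eta,k}}, \qquad \frac{\partial R_{\nu\mu}(t)}{\partial \alpha_{\eta,k}} = \frac{1}{2}\mathrm{Tr}\left[\sigma_\nu \frac{\partial U_C^\dagger(t)}{\partial \alpha_{\eta,k}}\,\sigma_\mu U_C(t) + \sigma_\nu U_C^\dagger(t)\,\sigma_\mu \frac{\partial U_C(t)}{\partial \alpha_{\eta,k}}\right].$$
Employing the chain rule, and noting that $\partial \Omega_{\nu,n}/\partial \alpha_{\eta,k} = v^{(k)}_n \delta_{\nu\eta}$, we arrive at the derivatives for the control propagator,
$$\frac{\partial U_C(t)}{\partial \alpha_{\eta,k}} = \sum_{n} v^{(k)}_n\, \frac{\partial U_C(t)}{\partial \Omega_{\eta,n}},$$
where the sum runs over the time steps completed before time $t$. We can again exploit the piecewise constant control assumption to determine the derivative of the control propagator with respect to the control amplitude. Letting
$$U_C(t_m, t_n) = \prod_{j=n}^{m-1} U_C(t_{j+1}, t_j)$$
denote the partial control propagator, the derivative can be expressed as
$$\frac{\partial U_C(t_m)}{\partial \Omega_{\eta,n}} = U_C(t_m, t_{n+1})\, \frac{\partial U_C(t_{n+1}, t_n)}{\partial \Omega_{\eta,n}}\, U_C(t_n, t_0), \qquad n < m.$$
Finally, each derivative of the controlled evolution during $t \in [t_n, t_{n+1})$ can be computed via exact diagonalization [47], where the matrix elements of $\partial U_C(t_{n+1}, t_n)/\partial \Omega_{\eta,n}$ in the eigenbasis $\{|a\rangle\}$ of the step Hamiltonian $H_{C,n}$, with eigenvalues $E_a$, are given by
$$\langle a|\frac{\partial U_C(t_{n+1}, t_n)}{\partial \Omega_{\eta,n}}|b\rangle = \langle a|\frac{\partial H_{C,n}}{\partial \Omega_{\eta,n}}|b\rangle \times \begin{cases} -i\,\delta t\, e^{-iE_a \delta t}, & E_a = E_b, \\ \dfrac{e^{-iE_a \delta t} - e^{-iE_b \delta t}}{E_a - E_b}, & E_a \neq E_b. \end{cases}$$
The final expression required is the gradient of the ideal gate fidelity $F_G(T)$. Using Eqs. (29) and (30), we find
$$\frac{\partial F_G(T)}{\partial \alpha_{\eta,k}} = \frac{1}{2}\mathrm{Re}\left\{\mathrm{Tr}\left[U_G^\dagger U_C(T)\right]^{*}\, \mathrm{Tr}\left[U_G^\dagger \frac{\partial U_C(T)}{\partial \alpha_{\eta,k}}\right]\right\}.$$
Together, the spectral leakage and ideal gate fidelity gradient expressions are used by F-GRAFS to generate controls for specified gate operations with tailored FFs. Below, we showcase the capabilities of F-GRAFS for a variety of noise and control scenarios.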
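The chain rule from control amplitudes to expansion coefficients can be checked in the simplest setting. The sketch below treats single-axis control, where all step propagators commute and the ideal gate fidelity for an x rotation has a closed form; it is not the general multi-axis gradient. The convention $H_C = \Omega(t)\sigma_x/2$, the function names, and the monomial basis standing in for the DPSS are assumptions made for illustration.

```python
import numpy as np

def x_gate_fidelity(alpha, basis, dt, target_angle=np.pi):
    """Ideal gate fidelity for single-axis control, assuming H_C = Omega(t) sigma_x / 2.

    basis[k, n] plays the role of the k-th basis sequence at time step n.
    All step propagators commute, so U_C(T) is an x rotation by
    theta = dt * sum_n Omega_n and F_G = cos^2((theta - target_angle) / 2).
    """
    theta = dt * np.sum(alpha @ basis)
    return np.cos(0.5 * (theta - target_angle)) ** 2

def x_gate_fidelity_grad(alpha, basis, dt, target_angle=np.pi):
    """Analytic gradient dF_G / d alpha_k via the chain rule."""
    theta = dt * np.sum(alpha @ basis)
    dF_dtheta = -0.5 * np.sin(theta - target_angle)
    dtheta_dalpha = dt * basis.sum(axis=1)    # uses d Omega_n / d alpha_k = basis[k, n]
    return dF_dtheta * dtheta_dalpha

# Finite-difference check with a hypothetical monomial basis standing in for the DPSS.
N, K, dt = 64, 4, 0.1
x = (np.arange(N) + 0.5) / N
basis = np.array([x ** k for k in range(K)])
alpha = np.array([0.3, -0.2, 0.5, 0.1])
analytic = x_gate_fidelity_grad(alpha, basis, dt)
step = 1e-6
numeric = np.array([
    (x_gate_fidelity(alpha + step * e, basis, dt)
     - x_gate_fidelity(alpha - step * e, basis, dt)) / (2 * step)
    for e in np.eye(K)
])
```

Agreement between `analytic` and `numeric` is the kind of sanity check one would also apply to the full FF gradients before handing them to an optimizer.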

IV. NOISE-OPTIMIZED GATES
In this section, we demonstrate F-GRAFS's ability to discover noise-optimized controls in two different control and noise scenarios. First, we consider the case of single axis control along σ x and dephasing noise only along σ z . In the second case, we study the more complex case of multi-axis control along σ x and σ y , with dephasing noise along σ µ , µ = x, y, z. In each subsection, we illustrate how to initialize F-GRAFS based on analytical expressions that can be tuned to the specifications of the CNB. Subsequent optimization is then used to produce optimized controls that simultaneously provide significant suppression of the FFs within the CNB and high-fidelity non-trivial single qubit operations. Lastly, we explore the relationship between the DPSS bandwidth parameter and the post-F-GRAFS residual spectral leakage. This analysis provides key insight into the interplay between the control bandwidth and the characteristics of the noise PSD.

A. Single Axis Control and Dephasing
We begin by studying a system driven via single axis control applied in the x direction and subject to time-correlated dephasing noise along the z axis. This scenario is compatible with the highly asymmetric case where the fractional power estimates are p z = 1 and p x = p y = 0. At the Hamiltonian level, we impose Ω y (t) = 0 and ⃗ β(t) = (0, 0, β z (t)) in Eqs. (2) and (4), respectively. Control along σ x and noise along σ z gives rise to two control matrix components R yz and R zz that ultimately contribute to F z (ω), the only non-trivial FF for this case. The objective function [Eq. (A11)] therefore reduces to a single integral focused solely on the spectral leakage of F z (ω) within the CNB B z . Below, we consider two types of CNBs defining two distinct gate types: (1) high-pass gates, where the noise is assumed to be low frequency and the CNB is described by a properly chosen high frequency cutoff dependent upon the characteristics of the dephasing noise PSD S z (ω) and (2) band-pass gates, where the noise PSD possesses both low and high frequency components. In the latter case, the CNB is determined by multiple cutoff frequencies to adequately capture the characteristics of S z (ω).

1. Analytically-Informed Initial Conditions
The FF design problem constitutes a non-convex optimization problem that strongly depends on the initial conditions used for the gradient-based optimization procedure. One approach is to randomly initialize the expansion coefficients α ν,k ; however, as we show in Appendix C, we find that this typically results in unstable solutions and unnecessarily large control amplitudes. We overcome this issue by utilizing primitive controls with straightforwardly and intuitively tunable FFs as initial conditions.
In particular, we employ constant drive (CD) control as an initial guess for the optimized control waveform. Known in noise characterization [48] and quantum signal detection [49], CD represents a simple control scheme in which the system is driven at a constant rate $\Omega(t) = \Omega_0$ for a time $T$. The amplitude $\Omega_0$ dictates the center frequency of the FF, while the total duration of the drive determines the spectral width; see Fig. 2(a) for example. Functionally simple, the CD FF is a sum of two terms of the form $F(\omega, T) \propto T^2\,\mathrm{sinc}^2[(\omega \pm \Omega_0)T/2]$, and in the limit $T \to \infty$, this FF converges to delta functions centered around $\pm\Omega_0$. The tunability and localization of the CD FF are key features that we exploit to initialize the F-GRAFS optimization.
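For reference, the sinc-shaped CD FF can be derived in a few lines. The derivation below is a sketch under assumed conventions, namely $H_C = \Omega_0\sigma_x/2$, dephasing along $z$, a Fourier kernel $e^{i\omega t}$, and $\mathrm{sinc}(x) = \sin(x)/x$; the auxiliary symbol $A_\pm$ is introduced here only for bookkeeping.

```latex
\begin{align}
R_{zz}(t) &= \cos(\Omega_0 t), \qquad R_{yz}(t) = \pm\sin(\Omega_0 t), \\
A_\pm(\omega) &\equiv \int_0^T e^{i(\omega \pm \Omega_0)t}\,dt, \qquad
|A_\pm(\omega)|^2 = T^2\,\mathrm{sinc}^2\!\left[\frac{(\omega \pm \Omega_0)T}{2}\right], \\
F_z(\omega, T) &= |R_{yz}(\omega)|^2 + |R_{zz}(\omega)|^2
  = \frac{|A_+(\omega)|^2 + |A_-(\omega)|^2}{2} \\
&= \frac{T^2}{2}\left\{\mathrm{sinc}^2\!\left[\frac{(\omega - \Omega_0)T}{2}\right]
  + \mathrm{sinc}^2\!\left[\frac{(\omega + \Omega_0)T}{2}\right]\right\}.
\end{align}
```

The cross terms between $A_+$ and $A_-$ cancel when the two control matrix elements are added, which is why the result is a pair of independent lobes of width $\sim 2\pi/T$ centered at $\pm\Omega_0$.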
Upon construction, the initial control waveform is projected into the DPSS basis and then optimized via F-GRAFS. First, the amplitude of the CD initial condition is determined by the CNB, while $T$ is dictated by the desired gate time. The subsequent (initial) control coefficients $\alpha^{(0)}_{\nu,k} = \sum_n \Omega_\nu(n\delta t)\,\phi_k(n\delta t)$ are determined via the DPSS orthogonality relation $\sum_n \phi_k(n\delta t)\,\phi_{k'}(n\delta t) = \delta_{k,k'}$. F-GRAFS proceeds by optimizing these coefficients to reduce the spectral leakage of the FFs outside of the NB. CD affords some spectral concentration and reduced spectral leakage; however, as we will show in Sec. IV A 2, F-GRAFS can further lessen residual leakage while abiding by the desired bandwidth constraints of the control.
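The projection of a CD initial condition into a truncated DPSS basis can be sketched as follows. The code assumes unit-norm sequences obtained by directly diagonalizing the sinc-kernel Toeplitz matrix; the helper `dpss_basis`, the cutoff value `w_h`, and the remaining parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def dpss_basis(N, W, K):
    """First K eigenvectors of the sinc-kernel Toeplitz matrix, as rows."""
    d = np.arange(N)[:, None] - np.arange(N)[None, :]
    lam, vecs = np.linalg.eigh(2.0 * W * np.sinc(2.0 * W * d))
    order = np.argsort(lam)[::-1][:K]      # keep the K most concentrated sequences
    return vecs[:, order].T

N, W = 64, 0.1
K = int(np.floor(2 * N * W))               # approximate dimension of the band-limited space
basis = dpss_basis(N, W, K)

w_h = 2.0                                  # hypothetical high-frequency cutoff
omega_cd = np.full(N, w_h)                 # constant-drive initial condition
alpha0 = basis @ omega_cd                  # alpha_k = sum_n Omega_n * phi_k(n dt)
omega_proj = alpha0 @ basis                # band-limited waveform handed to the optimizer
captured = np.sum(alpha0 ** 2) / np.sum(omega_cd ** 2)
```

Because the basis is truncated, `omega_proj` is a smoothed version of the constant drive, and `captured` reports how much of the waveform's energy the truncated basis retains.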

Optimized Control Waveforms and Filter Functions
Here, we illustrate the utility of F-GRAFS for constructing high-pass and band-pass gates using single axis control. Fig. 2 serves as the focal point of this discussion, where we consider a π-rotation about the x-axis of the Bloch sphere, U_C(T) = X_π, as a representative case. Control profiles are displayed in the left column, while the right column shows the corresponding FFs and CNBs (shaded regions). Each row indicates a different control and/or noise scenario. We begin with a discussion focused on how the initial conditions are informed by the CNB and then move to examining the optimized controls and FFs resulting from F-GRAFS. Optimizations are performed assuming a tolerance of 10^{-10}, which typically requires O(100) iterations of the SLSQP algorithm.
The amplitude of the CD initial condition is set by either the high frequency cutoff or the center frequency of the pass band. When the noise PSD predominantly resides at low frequencies, the CNB is defined according to the high frequency cutoff ω_H: B = [0, ω_H). Note that the size of a low frequency noise CNB equals its high-frequency cutoff, i.e., |B| = ∫_B dω = ω_H. We set Ω_x(t) = ω_H to center the FF at the edge of the NB; see Fig. 2. In principle, any Ω_x(t) ≥ ω_H would suffice; however, centering a CD FF at higher frequency comes at the cost of higher amplitude controls. Furthermore, in practice, we find Ω_x(t) = ω_H to be a sufficient initial condition for achieving an optimized control within approximately 100 iterations of F-GRAFS.
A similar approach is used when defining initial conditions for band-pass gates. Noise PSDs that have significant support at both low and high frequencies require CNBs to be defined as unions of disjoint regions in frequency space. For simplicity, we consider the case where there are two such regions: one at low frequency, B_1, and another at intermediate frequencies, B_2. The pass band existing between these CNBs can be defined according to a lower end ω_L and a noiseless band of width ∆ω. In terms of these parameters, the CNB is given by

$\mathcal{B} = \mathcal{B}_1 \cup \mathcal{B}_2 = [0, \omega_L) \cup (\omega_L + \Delta\omega, \omega_H).$

Initial conditions defined from B are dependent upon the lower cutoff frequency of the pass band, where Ω_x(t) = ω_L can be used to initialize the optimizer.
F-GRAFS offers improved suppression of the FF over CD within the CNB. In Fig. 2(c) and (d), the resulting F-GRAFS optimized control and FF, respectively, are shown for a high-pass gate. The normalized bandwidth of the DPSS basis is set to W = 2ω_H × δt/2π. This choice is based on a bandwidth analysis of the FF suppression within the CNB; see Sec. IV A 3 for further details. Comparing Fig. 2(a) and (c), we observe that while the CD offers some initial suppression of the FF within the CNB, F-GRAFS further reduces the FF contributions by approximately 4 orders of magnitude. Interestingly, the control profiles required to achieve this enhancement are qualitatively similar to the CD control. The distinctions lie in the high frequency oscillations centered about the CD-like control profile and the control boundaries Ω(0) and Ω(T) that differ from zero.

FIG. 2 (caption, continued). Band-pass optimized controls are shown in (g) and (i), with their associated FFs shown in (h) and (j). The passband is determined by B = B_1 ∪ B_2, where B_1 = [0, ω_L) and B_2 = (ω_L + ∆ω, ω_H). The low-frequency cutoff is given by ω_L = 0.004 × 2π/δt, the spectral width of the passband is ∆ω = 0.01 × 2π/δt, and the high-frequency cutoff of B_2 is ω_H = 0.018 × 2π/δt. Panels (g) and (h) use a DPSS basis of size K and bandwidth W = 2|B| × δt/2π, while panels (i) and (j) utilize K′ DPSSs and W′ = 2W to impose near-zero control boundary conditions. All scenarios described above use a total number of N = 1000 timesteps. In all cases, the optimized FFs attain several orders of magnitude improvement in cancellation within the CNB regions over CD.
Through the DPSS basis, conditions on the endpoint to maximum control amplitude ratio can be imposed intrinsically, bypassing the need to append additional constraints to the objective function. From a practical perspective, control boundary conditions are desirable for ensuring the creation of viable optimized control profiles that abide by control hardware slew-rate limitations. Such conditions can be included in optimized control schemes via additional constraints [15] or by analytically enforcing the constraints prior to optimization [39]. The F-GRAFS approach essentially straddles the two approaches by imposing constraints on the DPSS basis elements prior to the optimization. More specifically, control boundary constraints can be indirectly enforced by imposing a minimum tolerance on the DPSS eigenvalues described in Eq. (16). The highest order DPSSs within K ≤ 2NW are the least spectrally concentrated and have nonzero amplitudes at the boundaries [25]. By enforcing a spectral concentration constraint of λ_k ≥ η, where η is the desired tolerance, one can circumvent this issue and impose approximate boundary conditions on the DPSS basis. In order to maintain the same number of basis functions, the bandwidth must be artificially increased from W to W′.
In practice, we find that boundary conditions can be sufficiently maintained by demanding 99% concentration (η = 0.99). Basis cardinality is preserved by doubling the normalized bandwidth, W′ = 2W, and taking the first K′ = 2⌊NW′⌋ − 4 DPSSs to form the new basis. An example of the control profiles and FFs resulting from the truncated basis optimization is shown in Fig. 2(e) and (f) for the high-pass filter case. FF suppression within the CNB is comparable to the results shown in panels (c) and (d). The control endpoint amplitudes are Ω(0) = 0.01 × ω_H and Ω(T) = 0.05 × ω_H, compared to Ω(0) = 1.13 × ω_H and Ω(T) = 1.32 × ω_H obtained without the eigenvalue concentration constraint. Furthermore, we observe an improvement in spectral concentration. Controls obtained from the truncated basis with bandwidth W′ possess greater than 99% concentration within (−W, W), while optimized control waveforms constructed with a DPSS basis of bandwidth W typically achieve between 70% and 90% concentration. Thus, F-GRAFS is able to achieve similar noise suppression capabilities, while approximately satisfying boundary conditions and attaining improved spectral concentration. Note that the endpoints can be further reduced by increasing the spectral tolerance (and increasing the bandwidth, if one wishes to preserve basis cardinality).
F-GRAFS supports band-pass gate design for more complex noise scenarios. In Fig. 2(g), F-GRAFS optimized controls are displayed for the X_π gate and the CNB B = B_1 ∪ B_2, as described earlier in this section. The spectral width of the pass band is ∆ω = 0.01 × 2π/δt and the high frequency cutoff of B_2 is ω_H = 0.018 × 2π/δt. The control bandwidth is set to W = 2|B| × δt/2π, where |B| = ω_H − ∆ω. This choice is based on an analysis of the spectral leakage in the CNB as a function of bandwidth; see Sec. IV A 3 for further elaboration. Noise suppression afforded by the optimized controls is depicted in Fig. 2(h), where the FF is shown to have spectral nulls within the CNB regions. In comparison to the CD initial condition, the F-GRAFS solutions achieve approximately four orders of magnitude greater FF suppression. Similar performance characteristics are observed when leveraging a truncated basis to satisfy control boundary conditions; see Fig. 2(i) and (j).
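The band-pass parameters quoted above can be turned into a CNB size and a normalized DPSS bandwidth with a few lines of Python. This is only a sketch of the arithmetic: the function names are hypothetical, frequencies are expressed in units of 2π/δt so that the factor δt/2π drops out, the lower cutoff 0.004 is the value quoted in the Fig. 2 caption, and 2NW is used as the upper bound on the number of DPSSs stated in the text (K ≤ 2NW).

```python
def cnb_size(intervals):
    """Total width |B| of a CNB given as disjoint (low, high) intervals."""
    return sum(high - low for low, high in intervals)


def normalized_bandwidth(intervals, factor=2.0):
    """W = factor * |B| * dt / (2*pi), with frequencies in units of 2*pi/dt."""
    return factor * cnb_size(intervals)


# Values quoted for the band-pass example (units of 2*pi/dt), N = 1000 timesteps.
N = 1000
omega_low, delta_omega, omega_high = 0.004, 0.01, 0.018
band_pass_cnb = [(0.0, omega_low), (omega_low + delta_omega, omega_high)]

size = cnb_size(band_pass_cnb)            # equals omega_high - delta_omega
W = normalized_bandwidth(band_pass_cnb)   # W = 2|B| in these units
max_basis_size = 2 * N * W                # upper bound K <= 2NW on the basis size
```

With these numbers the two CNB regions each have width 0.004, so |B| = 0.008 = ω_H − ∆ω, in agreement with the expression used for the control bandwidth.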

Optimal Control Bandwidth for Single-Axis Noise
The connection between control bandwidth and noise characteristics plays an important role in achieving noise-robust quantum gates. Control hardware is often subject to constraints on amplitude and bandwidth. In Sec. IV A 1, we alluded to the dependence of the FF on control amplitude when selecting initial conditions. Namely, the spectral cutoffs of the noise determined the control amplitude. Here, we examine the dependence of the FF on control bandwidth. In particular, we exploit the intrinsic tunability afforded by the DPSS to study the dependence of the spectral leakage Γ(T) on the control bandwidth W; see Fig. 3.

Numerical experiments indicate a uniform dependence of Γ(T) on control bandwidth for both high-pass and band-pass gates. Both gate types exhibit a distinct phase transition in spectral leakage at the critical bandwidth W_c = 2|B| × δt/2π. Control bandwidths below the critical bandwidth result in significant spectral leakage. In contrast, we observe convergence in Γ(T) for W > W_c, likely due to saturation of the designated optimization tolerance. Despite potential optimizer-dependent features at high control bandwidth, the emergence of the critical bandwidth appears to depend only on the size of the CNB. This seemingly universal behavior suggests that for a single qubit subject to single axis control and dephasing, the optimal control bandwidth is W_c.
Establishing an analytical justification for this relationship between W_c and properties of the noise is challenging. This is primarily due to the non-linear relationship between the control and the FFs. That said, it is worth noting that the expression for W_c is strongly reminiscent of the Nyquist-Shannon sampling theorem [50]. This theorem states that in order to effectively reconstruct a signal of a given bandwidth B, it suffices to sample at a frequency of f_s = 2B. Without further constraints imposed on the signal, the theorem determines that this sampling frequency is both sufficient and necessary. It is within this context of classical signal reconstruction that we propose the following intuition. Rather than considering F-GRAFS as a method for optimally filtering noise, let us treat it as an approach for finding the controls required to "reconstruct" an ideal FF that optimally filters noise within the CNB B. According to Fig. 3, such a reconstruction requires a minimum control bandwidth of W_c = 2|B| × δt/2π in units of 2π/δt. This value is, in turn, also the one needed to sample and reconstruct a signal of bandwidth |B|. This suggests that the problem of filtering a noise with a CNB of size |B| is equivalent (in the sense of resources needed) to sampling and performing signal reconstruction of a function of bandwidth |B|.
The results obtained from F-GRAFS allow for an alternative interpretation in terms of the Landau-Pollak theorem [27]. This theorem states that the dimensionality of a signal of bandwidth W (in units of Hz) and total duration T is 2WT, or 2NW as expressed in the present units and quantities. This is obtained by showing that the DPSSs can optimally approximate any time and bandlimited function of bandwidth W with a DPSS basis of only 2NW elements. Through F-GRAFS, we find that the optimal DPSS basis uses a bandwidth W_c and 2NW_c elements. This basis is the one capable of approximating any time and bandlimited function of bandwidth W_c as well, and F-GRAFS shows that the controls capable of noise filtering belong to this set of functions. This seems hardly a coincidence: the amount of resources (degrees of freedom) needed to filter a noise with a CNB of size |B| is the same as that required to approximate a signal of bandwidth 2|B|, with effective dimensionality 4N|B| as given by the Landau-Pollak theorem. Let us reiterate that this signal of bandwidth 2|B| is the one capable of filtering a noise with CNB size |B|.
Based on these arguments, we claim that the FFs bridge the gap between the noise and the controls. Based on the sampling theorem, classically one could think that if the noise functions β(t) reach a maximum bandwidth ω_H ≥ |B|, the frequency 2ω_H ≥ W_c would have to act as an absolute minimum for the bandwidth of the control functions. The results from F-GRAFS show that the non-linear FF transformation is able to capture the essential degrees of freedom of the noise needed for cancellation, compressing it into a space of dimension 2NW_c, with W_c = 2|B| × δt/2π.

B. Multi-axis Control and Dephasing
In situations where noise contributions are not limited to a single axis, F-GRAFS can be employed to simultaneously suppress noise along multiple axes. We illustrate this feature by considering two axis control, Ω_ν(t) ≠ 0, ν = x, y, and noise along all three Pauli axes: β⃗(t) = (β_x(t), β_y(t), β_z(t)). We assume cylindrical symmetry and therefore require β_x(t) = β_y(t) = β_xy(t). To simplify the analysis, in the present section we consider the symmetric case, where the fractional power estimates are equal, p_x = p_y = p_z = 1/3, and vary the relative sizes of the CNBs.
In the absence of cross-correlations, six control matrix components give rise to three unique FFs F_µ(ω) and PSDs S_µ(ω), µ = x, y, z. The objective function [Eq. (A11)] now contains three contributions over CNBs B_µ. Below, we investigate the efficacy of F-GRAFS for arbitrary single qubit gates in both the high-pass and band-pass case.

Analytically-Informed Initial Conditions
The increased complexity in the control, FFs, and desired gate operations poses new challenges for the F-GRAFS optimization problem in the multi-axis setting. In particular, the ability of the algorithm to converge on viable solutions is strongly dependent upon the initial conditions. In many cases, random initial conditions are not sufficient. Hence, we take an approach similar to the single-axis case and rely on analytically-informed initial conditions to improve algorithmic stability and convergence.
Initial conditions for multi-axis control optimization are constructed from the high-pass filtering scenario. More specifically, we consider the case where B_x = B_y = [0, ω_xy) and B_z = [0, ω_z). Initial conditions are derived based on a simplistic control ansatz: CD along one axis and a square wave along the second control axis. This approach is inspired by the single axis case and canonical dynamical decoupling sequences [11,51] that leverage rapidly fluctuating control to mitigate noise. Despite its simplicity, this control ansatz proves to be amenable to more general scenarios beyond the conditions under which it is derived.
The analytical form of the initial conditions is most conveniently expressed in polar coordinates. The controls are thus represented as Ω⃗(t) = Ω (cos φ(t), sin φ(t)), where φ(t) = φ_0 s_λ(t). The modulation function s_λ(t) = ±1 defines a square wave with unit amplitude and control frequency λ = 2π/T_c. T_c is the control period, and it satisfies T = M T_c for some positive integer M. An example of this control is given in Fig. 4(a). The values of the control parameters required to filter low frequency noise up to ω_z = n_z δω and ω_xy = n_xy δω are found by setting M and θ as defined in Eq. (35) and solving the accompanying system of equations. Solving the third equation numerically using standard optimization libraries, one finds a family of potential parameter choices. Motivated by efficient use of resources, we select the solution that minimizes ξ and, as a consequence, the control amplitude. Note that, much like the single axis case, this control ansatz affords intuition in parameter selection. Namely, the control amplitude is proportional to the sum of the noise cutoff frequencies M, up to a factor determined by ξ. Furthermore, the single-axis initial control ansatz can be recovered by setting ω_xy = 0, which in turn yields φ_0 = 0 and single axis CD control. Further details on the derivation of the system of equations given in Eq. (35) can be found in Appendix C 2.

An example of initial conditions for the controls and their associated FFs is shown in panels (a) and (b) of Fig. 4, respectively. Note that Ω_x(t) is chosen to drive the single qubit system according to a CD, while Ω_y(t) utilizes the fluctuating square wave. Results are shown for ω_z = ω_xy = 0.02 × 2π/δt. For this specific case of noise parameters, n_xy = ω_xy/δω = n_z = ω_z/δω = 8. This yields control parameter values of M = 16 and θ = π. Additionally, we obtain the smallest value of ξ ≈ 1.937 after solving the third equation in Eq. (35) and φ_0 ≈ 0.711 after inverting the last one.
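The structure of this ansatz can be illustrated with a short Python sketch that builds Ω_x(t) and Ω_y(t) from the polar form with a square-wave phase. The values M = 16 and φ_0 ≈ 0.711 are those quoted above; the overall amplitude Ω is left as a placeholder because the expression fixing it is not reproduced here, and the function name and the choice that the square wave starts at +1 are assumptions made for illustration.

```python
import math


def square_wave_controls(amplitude, phi0, n_periods, n_steps):
    """Sample Omega * (cos(phi), sin(phi)) with phi(t) = phi0 * s(t), s(t) = +/-1.

    Assumption: the square wave starts at +1 and flips every half period T_c / 2,
    with T = n_periods * T_c; samples are taken at midpoints t_n = (n + 1/2) * dt.
    """
    omega_x, omega_y = [], []
    for n in range(n_steps):
        # Index of the half period containing t_n, in exact integer arithmetic:
        # floor(2 * M * t_n / T) = floor(M * (2n + 1) / N).
        half_index = (n_periods * (2 * n + 1)) // n_steps
        s = 1 if half_index % 2 == 0 else -1
        phi = phi0 * s
        omega_x.append(amplitude * math.cos(phi))
        omega_y.append(amplitude * math.sin(phi))
    return omega_x, omega_y


# M and phi0 as quoted in the text; amplitude = 1.0 is a placeholder value.
omega_x, omega_y = square_wave_controls(amplitude=1.0, phi0=0.711, n_periods=16, n_steps=1000)
sign_flips = sum(1 for a, b in zip(omega_y, omega_y[1:]) if a * b < 0)
```

Because cos is even, the x component is constant (a CD), while the y component alternates in sign with the square wave, which matches the description of Ω_x(t) and Ω_y(t) given for Fig. 4(a).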

Optimized Control Waveforms and Filter Functions
F-GRAFS optimized control waveforms and FFs for the multi-axis case are displayed in Fig. 4. The left column contains the controls, while the right column shows the FFs. All cases perform the optimization of the same arbitrary single qubit gate using N = 1000 timesteps. We illustrate the utility of F-GRAFS in two scenarios: the fully high-pass case and a hybrid band-pass/high-pass case. The former showcases F-GRAFS' ability to uniquely tailor FFs based on distinct CNBs and therefore noise properties. The latter further conveys this message with a more complex noise scenario.
Here, we show that F-GRAFS can provide significant suppression of multi-axis, non-uniform low-frequency noise. Prior to optimization, the initial conditions are set according to the procedure described in the previous section. Thereafter, the controls are projected into the DPSS basis using a control bandwidth W = 2|B| × δt/2π. We elaborate on this choice of bandwidth in Sec. IV B 3. In Fig. 4, panels (c) and (d), we show F-GRAFS optimized controls and FFs for a particular non-uniform high-pass gate scenario. Panel (d) includes F_xy(ω, T) = F_x(ω, T) + F_y(ω, T) in the top half of the plot, while the lower half displays F_z(ω, T). The example scenario is described by a CNB B_xy = [0, ω_xy), where ω_xy = 0.02 × 2π/δt, and B_z = [0, ω_z), with ω_z = 2ω_xy. Optimized control profiles maintain many of the qualitative features of the initial conditions. Yet, through small, optimized fluctuations in the controls, F-GRAFS controls yield many orders of magnitude improvement in FF suppression within the CNBs.

Lastly, we explore the hybrid case where optimized controls generate both high-pass and band-pass FFs. We consider the case where B_xy = [0, ω_xy) and B_z = B_z1 ∪ B_z2, thus requiring high-pass filtering along σ_x and σ_y and band-pass filtering along σ_z. The CNB B_xy is bounded by ω_xy = 0.02 × 2π/δt, with the low-frequency CNB for the σ_z channel also being determined by ω_xy: B_z1 = [0, ω_xy). The passband is chosen to reside between B_z1 and B_z2, where B_z2 = (ω_xy + ∆ω, ω_z). The high-frequency CNB is characterized by the width ∆ω = ω_xy and high frequency cutoff ω_z = 4ω_xy. Upon optimization, we find that F-GRAFS offers substantial FF suppression within the CNBs; again, approximately ten orders of magnitude.
In examining the optimized control waveforms in Fig. 4(c) and (e), a notable observation is the apparent resemblance between Ω_y(t) and a sinusoidal function. As we discuss in Appendix C 2, one can consider an alternative initial control ansatz, where the square wave is replaced with a sine function: Ω_ν(t) = A_ν + B_ν sin(λt). Although it lacks an analytical proof of its effectiveness, it can be shown numerically to perform as well as the initial conditions presented in Sec. IV B 1.

Optimal Control Bandwidth for Multi-axis Noise and Control
Here, we investigate the relationship between the properties of multi-axis noise and the control bandwidth. In Fig. 5, the spectral leakage Γ(T) is shown as a function of normalized control bandwidth. We consider 13000 high-pass gate scenarios using non-uniform CNBs and the Clifford+T gate set S = {I, X, Y, Z, H, S, T} as the desired operations. As in the single-axis case, a strong relationship between control bandwidth and the high-frequency cutoffs is observed. More specifically, Γ(T) decreases with increasing bandwidth, where the most rapid decline occurs near the critical bandwidth W_c = 1.5 × |B| × δt/2π, where |B| = ω_z + 2ω_xy. The abrupt transition thereafter manifests due to the saturation of the optimizer to the specified gradient tolerance. Note that this critical proportionality factor of 1.5 between W and |B| × δt/2π is lower than the critical value of 2 found in Sec. IV A 3 for the case of single axis noise and control.
We conjecture that the reduction in the optimal bandwidth condition is due to the additional degree of freedom in the control. As such, we investigate the dependence of Γ(T) on W by reducing the noise degrees of freedom to single-axis dephasing along σ_z and maintaining multi-axis control. The initial conditions for multi-axis control are determined by the properties of the noise and return to the single-axis CD case along σ_x when n_xy = 0. Despite the single-axis initial condition, the optimizer has freedom to activate the control along σ_y. Optimized y controls are in general non-zero, but typically remain smaller in amplitude than the optimized x controls. The inset in Fig. 5 shows the results of the F-GRAFS optimization for the single-axis noise and multi-axis control setting using the S gate set. Black dots represent the minimum spectral leakage over the gate set, while the crosses represent the mean values. Interestingly, the mean values revert to the single-axis noise and control critical bandwidth with a proportionality factor of 2, with minimum values being consistent with a proportionality factor of 1.5. Examining the control profiles, we find that the optimized controls more closely resemble single-axis optimized control along σ_x, with a small fluctuating component along σ_y.
Control power remains relatively constant despite the additional control degree of freedom. A reduction in control bandwidth could imply an increase in an alternative control resource, such as control power. In order to eliminate this possibility, we investigate the dependence of the optimized control power on control bandwidth. We find that no distinguishing features appear for W > W c . Furthermore, the power of the optimized controls remains close to those of the initial conditions. This suggests that the improvement in performance is not provided by an increase in control amplitude, but rather due to the additional availability of control along σ y ; see Appendix D 1 for further details.
Although the increased degree of control appears to play a role in determining the critical control bandwidth, there are other factors within the optimization problem that can also alter W_c. The F-GRAFS optimization problem is parameterized by the noiseless gate fidelity tolerance ε_G. While W_c does not appear to vary with ε_G in the single-axis control case, we observe dependence on this parameter in the multi-axis case. In particular, increasing ε_G in the SLSQP optimization facilitates a reduction in W_c from 2 to 1.5 times the size of the CNB for the multi-axis control and single-axis noise scenario. The magnitude of the spectral leakage does not suffer from the lower fidelity tolerance, and degradation in ideal gate fidelity appears to be rather negligible. Thus, we suspect that the critical bandwidth coinciding with the single-axis control is due merely to the optimizer rather than being an intrinsic property of the control problem.
An additional interesting feature of the multi-axis control setting is that W_c is gate dependent. By adjusting the fidelity tolerance, one can achieve a near-1.5 proportionality for single-axis noise for a subset of gates in S. Among the gates that typically require bandwidth closer to twice the CNB is the identity. An adjustment of the initial conditions can yield reductions in critical bandwidth for the identity gate as well as other gates; however, the subset of gates that exhibit bandwidth improvements depends predominantly on the initial conditions. This behavior indicates that while the initial conditions shown here possess intuitive features, they are not universally favorable for all gates.

C. On Optimal Bandwidth and Reachability
The relationship between the characteristic timescales of the control and noise has been key to understanding the effectiveness of a control strategy in quantum control. For example, in pulse-based dynamical error suppression, the typical statement is: the interaction between the system and its environment can be effectively averaged out by utilizing pulses with inter-pulse delays much shorter than the characteristic timescale (or equivalently, the inverse of the high frequency cutoff) of the noise [10,52,53]. Albeit qualitatively instructive, this guiding principle is quite nebulous in that it is specific to ideal, instantaneous, pulse-based schemes and does not encompass more generic, smooth control. Moreover, it does not speak to optimality when striving to minimize control resources (such as bandwidth and power) while maximizing the effectiveness of the control.
The numerical studies in Secs. IV A 3 and IV B 3 address these issues and provide quantitative insight into the interplay between control bandwidth (and, in the multi-axis control case, control power as well) and the spectral properties of the noise. Empirical bounds enable the identification of optimal control bandwidth conditions for a variety of single and multi-axis control and noise scenarios, including those where the noise has significant spectral support at low and high frequencies. As a result, we supplement general criteria for pulse-based error suppression with explicit conditions that apply to a wide range of smooth control strategies and complex noise environments.

Furthermore, our numerical analysis of optimal bandwidth speaks to notions of reachability when subject to limited control resources. In control theory, reachability refers to the ability to drive a system from a given initial state to a set of final states, i.e., a reachable set. Equivalent notions of reachability have been developed in the quantum domain, where the reachable set can be described by a set of achievable unitaries [54,55]. Gate fidelity measures such as Eq. (11) are commonly used to quantify distance between the target and controlled unitaries and determine reachable sets that can be achieved within a specified tolerance [55,56]. The connection between Eq. (11) and the spectral leakage via the FFF suggests that Γ(T) can act as a proxy for investigating reachability. It is within this context that we associate the minimum spectral leakage with attaining the reachable set. Thus, we find empirical evidence for saturation in reachability for control bandwidths beyond W_c in both the single-axis and multi-axis control settings. The reachable set for single-axis control corresponds to arbitrary X rotations, while S serves as the reachable set for multi-axis control.
Note that in the latter case, by the Solovay-Kitaev theorem [57], the reachable set provides access to the full SU(2) group and therefore, speaks to notions of controllability with limited control resources as well. Lastly, we note that while this approach does not supply rigorous analytical insight, it can be quite informative for identifying regimes where one expects to achieve a reachable set of logic operations with high fidelity when subject to control bandwidth constraints and a variety of control and noise scenarios.

V. F-GRAFS EFFICACY IN SIMULATIONS
In Sec. III A, the F-GRAFS optimization problem is defined through the spectral leakage as opposed to a distance metric. However, we argue that solving the F-GRAFS problem can be viewed as optimizing the upper bound on the phase invariant distance D(U G , U (T )) in Eq. (A1). We substantiate this claim in this section by comparing the upper bound calculated through the F-GRAFS objective function to full dynamics simulations of a noisy single qubit driven by optimized control.
The efficacy of F-GRAFS is examined for a single qubit subject to multi-axis additive dephasing. Each noise component β_µ(t), µ = x, y, z, is defined as an Ornstein-Uhlenbeck (OU) process. The PSD of the OU process is given by

$S_{\mathrm{OU}}(\omega) = \frac{2\sigma^2\gamma}{\gamma^2 + \omega^2},$

where σ denotes the standard deviation and the parameter γ is effectively related to the correlation time of the noise, τ ∼ 1/γ. For simplicity, we assume uniform noise along all three Pauli channels, i.e., β_µ(t) is generated by a process with an equivalent standard deviation and correlation time for all µ. Note that cross-correlations are not permitted by construction, a condition enforced throughout this study. Each noise process is simulated by β_{n+1} = (1 − γδt)β_n + σ√(2γ) w_n, where w_n and β_0 are drawn from zero-mean normal distributions with standard deviations √δt and σ, respectively. The parameter δt again denotes the resolution of the control.
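The update rule for the noise trajectories can be sketched in a few lines of Python. This is a minimal illustration rather than the simulation code used for the results: it assumes that the increments w_n have zero mean and standard deviation √δt and that β_0 has standard deviation σ, and the function name, seed, and parameter values are hypothetical.

```python
import math
import random


def simulate_ou(gamma, sigma, dt, n_steps, rng):
    """Generate one OU trajectory via beta_{n+1} = (1 - gamma*dt)*beta_n + sigma*sqrt(2*gamma)*w_n.

    Assumptions: w_n ~ Normal(0, std=sqrt(dt)) and beta_0 ~ Normal(0, std=sigma).
    """
    beta = rng.gauss(0.0, sigma)
    path = [beta]
    for _ in range(n_steps - 1):
        w = rng.gauss(0.0, math.sqrt(dt))
        beta = (1.0 - gamma * dt) * beta + sigma * math.sqrt(2.0 * gamma) * w
        path.append(beta)
    return path


# Hypothetical parameters; the sample variance of the final values should stay near sigma**2.
rng = random.Random(1234)
gamma, sigma, dt = 1.0, 0.5, 0.01
finals = [simulate_ou(gamma, sigma, dt, 100, rng)[-1] for _ in range(2000)]
sample_var = sum(b * b for b in finals) / len(finals)
```

Starting each trajectory from a draw with standard deviation σ keeps the process close to stationarity from the first time step, so no burn-in period is needed in this sketch.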
The control bandwidth and CNB are determined by the spectral features of the noise. The OU process defines a Lorentzian spectrum that is primarily concentrated at low frequency. As such, the objective of F-GRAFS is to engineer high-pass gates with minimal spectral support in the CNB B_µ = [0, ω_H), µ = x, y, z. A relationship between the high-frequency cutoff ω_H and γ can be determined analytically through explicit integration of the PSD. Denoting the total noise power as P(σ) = ∫_0^∞ S_OU(ω) dω = πσ^2, it can be shown that the fractional power in the CNB, ∫_{B_µ} S_OU(ω) dω / P = 1 − ε, can be used to derive ω_H = γ tan[(1 − ε)π/2]. We demand that 99% of the noise power be concentrated within the CNB and therefore choose ε = 0.01 for the optimization. Note that the specification of ω_H also dictates the optimal DPSS bandwidth W = 2|B| = 6ω_H used in this example.
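The cutoff relation can be verified with a brief Python sketch. It assumes a Lorentzian PSD of the form 2σ²γ/(γ² + ω²), which is consistent with the quoted total power πσ² and gives a fractional power of (2/π) arctan(ω_H/γ) in [0, ω_H); the function names are illustrative.

```python
import math


def high_frequency_cutoff(gamma, eps):
    """omega_H = gamma * tan[(1 - eps) * pi / 2], as quoted in the text."""
    return gamma * math.tan((1.0 - eps) * math.pi / 2.0)


def fractional_power(omega_h, gamma):
    """Fraction of Lorentzian noise power in [0, omega_h): (2/pi) * arctan(omega_h / gamma)."""
    return (2.0 / math.pi) * math.atan(omega_h / gamma)


gamma = 1.0                                     # hypothetical value; omega_H scales linearly with gamma
omega_h = high_frequency_cutoff(gamma, 0.01)    # roughly 63.7 * gamma for eps = 0.01
captured = fractional_power(omega_h, gamma)     # recovers 1 - eps
```

For ε = 0.01 the cutoff sits at roughly 64 times the inverse correlation time, which shows how strongly the long Lorentzian tail inflates the CNB when a high fractional power is demanded.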
Confirmation of the upper bound for F-GRAFS optimized controls is displayed in Fig. 6. Full dynamics simulations are averaged over 1000 noise realizations and the single qubit Clifford+T gate set. The solid lines correspond to the averages of the phase invariant distance, computing U (T ) [Eq. (6)] from simulations. Dashed lines denote the upper bound from Eq. (A1) computed using the F-GRAFS minimized spectral leakage. The discrepancy between the curves, and therefore the tightness of the bound, is dictated by omitted contributions from both K(T ) and the 1% leakage outside of the CNB; see Appendix A for further insight.
The upper bound is maintained over a wide range of noise powers, justifying the F-GRAFS approach. As a surrogate objective function for the phase invariant distance, the spectral leakage (subject to an ideal gate constraint) proves to be sufficient for designing temporally correlated noise-robust gates for single qubit systems. Furthermore, the upper bound supports the use of NBs/CNBs rather than the complete noise spectrum. This observation provides an alternative perspective and potential focus for quantum noise spectroscopy protocols that may substantially reduce the typical cost of estimating the full noise PSD.

VI. CONCLUSION
In summary, we have introduced a method for optimizing control in the presence of temporally correlated noise based on the FFF. Known as F-GRAFS, this approach seeks to simultaneously tailor FFs to minimize spectral overlap with a noise PSD and achieve non-trivial single qubit operations. Motivated by the need for improved F-GRAFS algorithmic convergence, we develop analytical control ansatze that are intuitively tunable based on the spectral cutoffs of the noise PSD. Their structure, albeit simplistic, is highly versatile and applicable to a variety of multi-axis noise scenarios. Furthermore, these analytical control schemes prove to be key to achieving fast F-GRAFS algorithmic convergence.
F-GRAFS accommodates practical limitations in control hardware through the use of the DPSS basis. Characterized by an intrinsic bandwidth parameter, the DPSS provide a natural approach to constructing optimized controls that inherently abide by control hardware restrictions. We show that F-GRAFS can produce optimized control waveforms that significantly reduce spectral support of the FFs in designated frequency bands, while maintaining spectrally concentrated control.
Together, F-GRAFS and the DPSS basis provide key insights into the connection between control parameters, noise characteristics, and optimal FF design. We leverage the intrinsic tunability of the DPSS to examine the noise suppression capabilities of optimized control protocols as a function of the DPSS bandwidth. We show that in both single axis and multi-axis noise scenarios there exists an identifiable optimal bandwidth, proportional to twice and one and a half times the size of the region over which the FF is to be suppressed, respectively.
Follow-on work would focus on providing an analytical understanding of the optimal control bandwidth condition and extending F-GRAFS to the multi-qubit regime. The nonlinear relationship between the control and FF poses challenges for analytically deriving the optimal bandwidth condition. However, an analytical proof may shed light on key features of optimal control in the presence of temporally correlated noise processes. Extensions of F-GRAFS beyond the single qubit case could aid in expanding and generalizing the relationship between optimal control and the spectral properties of the noise.

In this section, we derive the F-GRAFS objective function. As described in the main text, to quantify the performance of the optimized gates, we use the phase-invariant distance, Eq. (21). Taking the square and using the triangle inequality, an upper bound on the average squared distance can be established, in which $\langle\cdot\rangle$ denotes an average over noise realizations and $K(T)$ is the cross term resulting from expanding the square. We now focus our attention on the average noise fidelity $F_N(T)$ and relate it to the spectral leakage $F_\Gamma(T)$. Through Eq. (12), $F_N(T)$ can be related to the overlap $\chi(T)$. As seen in Eq. (13), $\chi(T)$ is defined as the overlap integral between the PSDs and the FFs, summed over all axes $\mu = x, y, z$. In the last line, we imposed the restriction that since the total pulse time $T$ is finite, the frequency domain will be discretized in steps of $\delta\omega = 2\pi/T$. In going from the first to the second line, we separated the frequency domain into two disjoint regions: the null-band (NB) $B$ and its complement (CNB) $\bar{B}$. As described in the main text (see Sec. III A), the NB is defined as the largest (not necessarily connected) subset of frequencies over which the PSD has fractional power $\epsilon_\mu \ll 1$, i.e., $\frac{1}{P_\mu}\sum_{n\in N_\mu} S_{\mu,n}\,\delta\omega = \epsilon_\mu$, where $P_\mu = \sum_{n\in N_\mu\cup\bar{N}_\mu} S_{\mu,n}\,\delta\omega$ is the power along the $\mu$-th channel.
The NB and CNB are normalized and discretized into their discrete versions $N_\mu = \lfloor B_\mu/\delta\omega \rfloor$ and $\bar{N}_\mu = \lfloor \bar{B}_\mu/\delta\omega \rfloor$. The sets $N_\mu$, $\bar{N}_\mu$ are disjoint subsets of natural numbers satisfying $N_\mu \cup \bar{N}_\mu = [0, \ldots, N-1]$, where $N = T/\delta t$ is the number of time steps for the control functions. The time step $\delta t$ depends on hardware limitations. The discretized FFs $F_{\mu,n}$ are defined as the averages of the filter functions over the frequency windows $[n\delta\omega, (n+1)\delta\omega)$, $F_{\mu,n} = \delta\omega^{-1}\int_{n\delta\omega}^{(n+1)\delta\omega} F_\mu(\omega)\,d\omega$, for $n = 0, \ldots, N-1$. The discrete PSD $S_{\mu,n}$ is defined analogously.
Interpreting the sums (or integrals in the continuous case) over frequencies as 1-norms, $\|\cdot\|_1 = \sum_{n\in X}|\cdot|$ for some subset of integers $X$, we can use Hölder's inequality $\|fg\|_1 \le \|f\|_2\,\|g\|_2$ [58] to bound these expressions, where the 2-norm is $\|\cdot\|_2 = (\sum_{n\in X}|\cdot|^2)^{1/2}$. Additionally, since in practice these are finite dimensional spaces, we know that a positive sequence $f_n$ will satisfy the inequality $\sum_n f_n^2 \le (\sum_n f_n)^2$. Consequently, for positive sequences we have $\sum_{n\in X} f_n g_n \le (\sum_{n\in X} f_n)(\sum_{n\in X} g_n)$, where the summation is performed over the discretized frequency regions $X = \{N_\mu, \bar{N}_\mu\}$. We can use this inequality to bound the sums over $\{N_\mu, \bar{N}_\mu\}$, where additionally we have used $\sum_{n=0}^{N-1} F_{\mu,n}\,\delta\omega = T$, i.e., the integral of the FF over all frequencies is equivalent to the total time. Note that in the optimal case, where all of the spectral weight of the PSD is in the CNB ($\epsilon_\mu = 0$), these terms in Eq. (A5) all converge to zero.
Similarly, using Eq. (A4), we bound the integral over the CNB with $\Gamma_\mu(T) = T^{-1}\sum_{n\in\bar{N}_\mu} F_{\mu,n}\,\delta\omega$, where the factor of $T^{-1}$ is added explicitly to keep the functions $\Gamma_\mu$ dimensionless. Combining the bounds from Eqs. (A5) and (A6) and requesting the same level of noise spectral concentration along all axes, i.e., $\epsilon_\mu = \epsilon\ \forall\mu$, we obtain a combined bound in which we have used $P = \sum_{\mu=x,y,z} P_\mu$ and introduced the spectral leakage of Eq. (A8), where $p_\mu = P_\mu/P$. This implies that, to zeroth order in the fractional power, the noise fidelity is bounded, and with it the distance, which, as long as $\epsilon$ can be kept small, justifies $\Gamma(T)$ in Eq. (A11) as our choice of objective function. In practice, the gate fidelity $F_G(T)$ can be kept as close to 1 as desired by setting it as a constraint in a constrained optimization using an optimizer such as SLSQP. In the main text, we show in simulation how this assumption yields good noise filtering controls.
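As a numerical illustration of the discretized quantities used in this appendix, the following Python sketch evaluates a discretized FF and the leakage $\Gamma_\mu(T)$ for single-axis control along x with dephasing along z. It is a minimal sketch under stated assumptions, not the F-GRAFS implementation: the FF is assumed to take the standard form $|\int_0^T e^{i\omega t}\cos\Theta(t)\,dt|^2 + |\int_0^T e^{i\omega t}\sin\Theta(t)\,dt|^2$, the $1/2\pi$ normalization is chosen so that $\sum_n F_{\mu,n}\,\delta\omega = T$, the CNB is taken to be the lowest `n_h` frequency bins, and all function and variable names are hypothetical.

```python
import numpy as np

def discrete_filter_function(omega, dt):
    """Discretized dephasing FF for piecewise-constant x-axis control (assumed form)."""
    omega = np.asarray(omega, dtype=float)
    # Accumulated rotation angle at the start of each step: Theta_k = sum_{j<k} Omega_j * dt
    theta = dt * (np.cumsum(omega) - omega)
    # Discrete Fourier transforms of the control-matrix elements cos(Theta), sin(Theta)
    y = dt * np.fft.fft(np.cos(theta))
    z = dt * np.fft.fft(np.sin(theta))
    # 1/(2*pi) normalization (assumption) so that sum_n F_n * d_omega = T
    return (np.abs(y) ** 2 + np.abs(z) ** 2) / (2.0 * np.pi)

def spectral_leakage(ff, cnb_bins, total_time):
    """Gamma = T^-1 * sum_{n in CNB} F_n * d_omega (dimensionless)."""
    d_omega = 2.0 * np.pi / total_time
    return float(np.sum(ff[cnb_bins]) * d_omega / total_time)

n_steps, dt = 200, 1.0
total_time = n_steps * dt
d_omega = 2.0 * np.pi / total_time
n_h = 5                      # illustrative cutoff: CNB = bins 0..n_h-1
cnb = np.arange(n_h)

# Constant drive (CD) above and below the cutoff, aligned with frequency bins
omega_fast = np.full(n_steps, 10 * d_omega)
omega_slow = np.full(n_steps, 2 * d_omega)

ff_fast = discrete_filter_function(omega_fast, dt)
ff_slow = discrete_filter_function(omega_slow, dt)

ff_norm = float(np.sum(ff_fast) * d_omega)           # should equal T
leak_fast = spectral_leakage(ff_fast, cnb, total_time)
leak_slow = spectral_leakage(ff_slow, cnb, total_time)
print(ff_norm, leak_fast, leak_slow)
```

With these choices, a CD amplitude above the cutoff places essentially no FF weight in the CNB, whereas a CD amplitude inside the CNB leaves half of the total weight there (the other half sits at the mirrored negative-frequency bin, which this sketch does not count as part of the CNB).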
In the bound above, $F_\Gamma(T)$ is dependent upon the fractional noise power within the CNB, which can place additional requirements on noise characterization protocols. The spectral leakage in Eq. (A8) is composed of a sum of terms, each weighted by the power weight $p_\mu$ in the $\mu$-th direction. While refined estimates of noise power may require more QNS resources than those required to determine spectral cutoffs, even rough estimates of noise PSDs can provide sufficient information to identify dominant noise channels. We use these estimates as the weights $p_\mu$ within the F-GRAFS objective function, Eq. (A11), whose final expression denotes the continuous frequency representation. Two distinct scenarios arise from this definition. When such estimates reveal a highly asymmetric noise scenario, one can approximate the spectral leakage by the FF corresponding to the dominant noise source. We denote this configuration as the single-axis noise case. In the main text we discuss the case where noise is dominant along the z direction, i.e., $p_z \approx 1$ and $p_x \approx p_y \approx 0$. On the other hand, the symmetric case can be considered, where the noise power along all channels is nearly equivalent, i.e., $p_\mu \approx 1/3$ for all $\mu = x, y, z$. In the main text we refer to this noise configuration as the multi-axis noise case. In practice, the fractional noise estimates will lie between 0 and 1, with the condition that $\sum_\mu p_\mu = 1$.
Lastly, one could consider choosing an alternative objective function when a reliable estimate of the total power of the noise is available. When this is the case, it is possible to use the combined fidelity as an objective function instead of Γ(T ). The gradient can be computed following the same steps as described in the main text Sec. III B and using the chain rule. The combined fidelity allows us to optimize without constraints, for example utilizing an optimizer like L-BFGS-B. See Appendix B for further information on the choice of optimization methods.

Appendix B: Optimizer comparison
The F-GRAFS optimization algorithm described in Sec. III B can be executed using different numerical optimization methods. In this section, we compare the following algorithms: SLSQP, L-BFGS-B, trust-region constrained (TRC), and Nelder-Mead (NM). We study the performance of each algorithm as a function of wall time. Algorithmic performance is examined using Φ(T) or Γ(T) as an objective function, the latter being applicable only to SLSQP and TRC. We denote algorithms using Φ(T) as unconstrained, e.g., unconstrained SLSQP is denoted by U-SLSQP. Similarly, algorithms utilizing Γ(T) are denoted as constrained, e.g., constrained SLSQP is designated by C-SLSQP.
In Fig. 7, we present a summary of our results for single-axis (left) and multi-axis (right) noise and control. Here, we see that the methods C-SLSQP (blue), L-BFGS-B (green), and U-TRC (purple) are the only algorithms that consistently achieve the desired levels of objective function reduction within the expected timeframe. In the single-axis case, L-BFGS-B and U-TRC are indistinguishable within error bars, and present only a slight advantage with respect to C-SLSQP. In the multi-axis case, L-BFGS-B is the fastest method by approximately a factor of 3, while U-TRC and C-SLSQP are indistinguishable within error bars. From this, it can be concluded that the fastest method to optimize Φ(T) is L-BFGS-B, which should be used if knowledge of the total noise power is available. On the other hand, C-SLSQP performs the best for the constrained optimization of Γ(T) subject to $F_G(T) > 1 - \epsilon_G$, for some $\epsilon_G \ll 1$.
Appendix C: Initial conditions and control ansatz

Single-axis: Constant Drive
The convenience of utilizing CD as the initial condition for the optimization can be seen by analytically computing the associated FF. For the present analysis, we consider the scenario with single-axis control along x with dephasing noise along z. In this case, the FF $F(\omega, T)$ takes the form given in Eq. (C1), where $T$ is the total time. CD control is achieved by setting $\Omega(t) = \Omega_0$, from which we obtain $\Theta(t) = \int_0^t \Omega(s)\,ds = \Omega_0 t$. Using $\Theta(t)$ in Eq. (C1), it is possible to compute the integral explicitly, obtaining Eq. (C2), where in the second line we consider the infinite-$T$ limit. It is straightforward to see that in this limit, the FF converges to delta functions centered around $\pm\Omega_0$, normalized such that $\delta(0) = 1$. From Eq. (C2), it follows that Eq. (13) leads to an overlap $\chi(T) = T S(\Omega_0)$, where we used the fact that semiclassical noise PSDs are even around $\omega = 0$. In order to minimize the overlap, the driving frequency should be tuned to the minimum of the PSD, i.e., $\Omega_0 = \mathrm{argmin}_\omega S(\omega)$. It is worth studying the case of monotonically decreasing $S(\omega)$, e.g., $1/f$ noise, where no global minimum exists. In order to reduce the overlap between the FF and the PSD, $\Omega_0$ should be chosen as large as allowed by hardware, provided that it does not violate additional constraints on the control.

An additional argument in favor of the CD control ansatz comes from noting that CD is the solution with minimum power for a given rotation angle $\Theta(T) = \int_0^T \Omega(t)\,dt = \Omega_0 T$. Suppose another control function $\tilde{\Omega}(t)$ produces the same rotation angle, i.e., $\int_0^T \tilde{\Omega}(t)\,dt = \Theta(T)$; then the power of this new control is

$$\int_0^T \tilde{\Omega}(t)^2\,dt = \Omega_0^2 T + 2\Omega_0\int_0^T [\tilde{\Omega}(t) - \Omega_0]\,dt + \int_0^T [\tilde{\Omega}(t) - \Omega_0]^2\,dt \overset{(1)}{=} \Omega_0^2 T + \int_0^T [\tilde{\Omega}(t) - \Omega_0]^2\,dt \geq \Omega_0^2 T.$$

In (1) we used the assumption that the area of the difference is zero, $\int_0^T [\tilde{\Omega}(t) - \Omega_0]\,dt = 0$.

Lastly, we can see in Fig. 8 that using CD (dashed lines) as the initial condition of F-GRAFS provides a qualitative advantage over random initialization (RN, solid lines). Each curve represents the value of the objective function Γ(T) as a function of the optimization steps. The optimizations produce high-pass filters implementing identity gates, with high-frequency cutoffs of $\omega_H = 0.02 \times 2\pi/\delta t$ (blue) and $0.04 \times 2\pi/\delta t$ (orange). While CD initialization is run once, random initializations are averaged over 20 different realizations. For the lower frequency noise, we see that CD finds a solution with $\Gamma(T) < 10^{-14}$ in approximately 30 steps. Random initial conditions, on the other hand, take about 150 steps. For the higher cutoff case, CD initialization reaches the desired solution in about 200 steps, while RN is not capable of converging within 1000 steps. This example highlights the importance of using CD in minimizing the computational cost of the F-GRAFS optimization.

Fig. 7. […] $[0, 2\pi)$, and the Clifford+T gates in the latter. The shaded regions represent the standard error over these averages. In both cases, the high-frequency noise cutoff was set to $\omega_H = 0.08 \times 2\pi/\delta t$. The single-axis cases undergo only dephasing (z axis) noise and x axis control, while the multi-axis cases see x, y, z noise and x, y control. For all methods considered, optimizations using Φ(T) were performed (U-SLSQP, L-BFGS-B, U-TRC, NM). For the methods that allow for constrained optimizations, additional optimizations using Γ(T) as objective function (C-SLSQP, C-TRC labels) and treating the gate fidelity as a constraint were studied. In the single-axis case, it is clear that the methods achieving the best performance are C-SLSQP, L-BFGS-B, and U-TRC. In the multi-axis case, L-BFGS-B presents the best performance. All runs were performed with an Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz 2.30 GHz processor.

a. Derivation of Initial Conditions
In the multi-axis control and noise scenario, the quality of F-GRAFS solutions varies extensively when employing random initial conditions. In order to get consistent cancellation over the CNB and therefore improve algorithmic stability, it is necessary to narrow down the search space. Although CD control is an effective solution for the single-axis case, any combination of CD in the x and y axes will lead to FFs with non-zero DC contributions. The multi-axis CD condition is given by $\vec{\Omega}(t) = (\Omega_x, \Omega_y, 0) = \Omega_0(\cos\varphi_0, \sin\varphi_0, 0)$, where the amplitude $\Omega_0$ and phase $\varphi_0$ correspond to the control representation in polar coordinates in the xy plane. The presence of the DC component can be more easily interpreted in the case where noise is symmetric along the x and y axes, implying $S_x(\omega) = S_y(\omega)$. In this case, the two resulting FFs are $F_z(\omega, T)$ and $F_x(\omega, T) + F_y(\omega, T)$. Here, $F_z(\omega, T)$ is the same as in the single-axis case, but the longitudinal FFs $F_x(\omega, T)$, $F_y(\omega, T)$ have an extra term involving a sinc function. Note that this combination of FFs is independent of the angle $\varphi_0$ and depends only on the amplitude $\Omega_0$. The reason behind this is that multi-axis CD control, although acting on both axes simultaneously, is still a single-axis control along the rotated direction given by $\varphi_0$.

Fig. 8. […] Note how for the lower noise cutoff, the optimization using CD as initial condition improves exponentially until it stops before step 50 at about $\Gamma_n = 10^{-14}$, while the optimizations using random initial conditions averaged over ten realizations converge to similarly good solutions but take about 150 steps. For the higher noise cutoff, the random initial conditions are not able to find good noise filtering solutions on average. This shows how CD can correct the instability of choosing random initial conditions, adding robustness while at the same time improving the running time, since it not only satisfies the tolerance earlier, but also does not require selecting from multiple realizations.
As shown in Sec. C 1, it is possible to choose the value of $\Omega_0$ such that $F_z(\omega, T)$ does not present relevant low frequency contributions. On the other hand, the $\mathrm{sinc}^2(\omega T/2)$ term in $F_x(\omega, T) + F_y(\omega, T)$ is concentrated around $\omega = 0$ and is independent of the control. Consequently, there is no choice of multi-axis CD parameters that provides DC filtering along the x and y axes simultaneously. This DC component makes multi-axis CD alone a poor choice for initial conditions when noise exists along multiple directions.
Another single-axis solution for low frequency noise that provides an intuitive way of shaping the FFs is oscillating control $\Omega(t) = \Omega_0\cos(\lambda t)$, where $\lambda$ can typically be chosen as the high frequency cutoff $\omega_H$. These controls produce FFs that can be thought of as a frequency comb with teeth at $\omega = k\lambda$, $k \in \mathbb{Z}$, modulated by the Bessel functions $J_k$ evaluated at $\Omega_0/\lambda$ [59]. In order to cancel DC noise contributions, the control amplitude is chosen as $\Omega_0 = \lambda x_0$, where $x_0$ is the first zero of the zeroth order Bessel function.
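The amplitude condition for the oscillating control can be evaluated numerically. The sketch below uses `scipy.special.jn_zeros` to obtain the first zero $x_0$ of $J_0$ and then sets $\Omega_0 = \lambda x_0$. The numerical value of $\lambda$ (a cutoff of $0.02 \times 2\pi/\delta t$ with $\delta t = 1$) and the function name are illustrative assumptions, not parameters taken from the text.

```python
import numpy as np
from scipy.special import j0, jn_zeros

def oscillating_drive_amplitude(lam):
    """Return (Omega_0, x_0) with Omega_0 = lam * x_0 and x_0 the first zero of J_0."""
    x0 = jn_zeros(0, 1)[0]   # first positive zero of the zeroth-order Bessel function
    return lam * x0, x0

lam = 0.02 * 2 * np.pi       # illustrative choice: lambda = omega_H with dt = 1
omega0, x0 = oscillating_drive_amplitude(lam)

# The k = 0 (DC) tooth of the comb is weighted by J_0(Omega_0 / lambda), which vanishes
dc_weight = j0(omega0 / lam)
print(x0, omega0, dc_weight)
```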
Motivated by oscillating and CD single-axis controls, as well as work with Walsh synthesized filters [19], we analyze the FFs obtained from CD control along one axis and an oscillating square wave on the other axis. That is, we study the Hamiltonian in Eq. (2), with $\vec{\Omega}(t) = (\Omega_{CD}, \Omega_{SW}\,s_\lambda(t), 0)$, where $s_\lambda(t)$ is a square wave with unit amplitude and frequency $\lambda$, which defines the control period $T_c = 2\pi/\lambda$.
The polar control coordinates are $\Omega_0 = \sqrt{\Omega_{CD}^2 + \Omega_{SW}^2}$, $\cos\varphi_0 = \Omega_{CD}/\Omega_0$, and $\sin\varphi_0 = \Omega_{SW}/\Omega_0$, which will be the representation used in the following discussion. In these coordinates, the control can be interpreted as having constant amplitude $\Omega_0$ and alternating phase $\varphi(t) = \pm\varphi_0$, depending on whether $mT_c < t < (m + \tfrac{1}{2})T_c$ or $(m + \tfrac{1}{2})T_c < t < (m + 1)T_c$, for $m = 0, 1, \ldots, M-1$. Here, $M = T/T_c$ denotes the number of control periods that fit in the full control time duration. Note that in terms of the control frequency, $M = \lambda T/2\pi = \lambda/\delta\omega$, meaning that $M$ is the discrete, normalized control frequency.
Hence, in what follows we will refer to M as a control parameter, rather than λ. The goal is, for fixed total time T , to find optimal parameters Ω 0 , φ 0 , M that minimize the FFs over a given CNB. We show next that it is possible to derive optimal values for control parameters that produce high-pass filters in all three axes.
The argument is as follows: first, we construct the noise Hamiltonian in the toggling frame with respect to the control; then we use this to extract the control matrix R(t) whose Fourier transform R(ω) is directly related to the FFs through Eq. (14). Lastly, we show that by rewriting R(ω) in a convenient way, it is possible to cancel all elements of the frequency space control matrix for some frequencies ω, and hence cancel the FFs within the CNBs.
In general, the time evolution operator will be hard to compute analytically. However, in the piece-wise constant control case, it is possible to arrive at a closed expression. Given the noise Hamiltonian in Eq. (4), the toggling frame Hamiltonian is obtained by transforming it with $U_C(t)$, the control time propagator. Due to the periodicity of the control, the control propagator takes a closed form on each control period $m = 0, \ldots, M-1$, expressed in terms of the unit vectors $\hat{n}_\pm = (\Omega_{CD}, \pm\Omega_{SW}, 0)/\Omega_0 = (\cos\varphi_0, \sin(\pm\varphi_0), 0)$.
The toggling frame Hamiltonian in the interval $0 < t < T_c/2$ is obtained using Rodrigues' rotation formula [60]. Additionally, we define the half-period rotation matrix $R_{\Omega_0,\varphi_0}(t)$, which captures the evolution induced by the control within a constant section of the control functions. The matrix $R_{\Omega_0,\varphi_0}(t)$ will be the building block of the full evolution. Note that it can be conveniently written, using Euler decomposition, in terms of basic rotations as $R_{\Omega_0,\varphi_0}(t) = R_z(\varphi_0)R_x(-\Omega_0 t)R_z(-\varphi_0)$. Next, we compute the toggling frame Hamiltonian for the interval $T_c/2 < t < T_c$. From here it becomes clear that the action of the control consists of successive applications of the constant section rotation matrix, with variations in the parameters controlling the rotation.
The toggling frame Hamiltonian can then be constructed iteratively over the $M$ control periods. Since the FFs depend on the Fourier transform of the control matrix via Eq. (14), we compute $R(\omega)$ and express it in terms of two matrices, $R_\Sigma(\omega)$ and the Fourier transform matrix $R_\Phi(\omega)$. Note that $R(\omega)$ is now a matrix product of the matrices $R_\Sigma(\omega)$ and $R_\Phi(\omega)$. The first one is a geometric sum over rotation matrices multiplied by a complex exponential. The second one consists essentially of the Fourier transforms over a single control period. We have thus arrived at an expression for the frequency domain representation of the control matrix in terms of the constant-section control matrix $R_{\Omega_0,\varphi_0}(t)$ and its Fourier transform. The matrix $R_\Sigma(\omega)$ can be summed explicitly by diagonalizing $R_{\Omega_0,\varphi_0}(\tfrac{T}{2M}) \cdot R_{\Omega_0,-\varphi_0}(\tfrac{T}{2M})$, which, being a combination of rotation matrices, is another rotation matrix. The diagonal form of the rotation matrix resulting from the product is $D_{\Omega_0,\varphi_0}(\tfrac{T}{2M}) = \mathrm{diag}(1, e^{i\theta}, e^{-i\theta})$, where $\theta$ depends on $\Omega_0, \varphi_0, T, M$ and is the rotation angle performed by the combined rotation. The relationship between $\theta$ and the control parameters can be found using the following property: for any rotation matrix $R$ with rotation angle $\theta$, its trace is $\mathrm{Tr}\,R = 1 + 2\cos\theta$. Additionally, we can find the vector $\vec{u}$ parallel to the rotation axis, which satisfies $R\vec{u} = \vec{u}$. Together, these relations yield expressions for the rotation angle and the rotation axis $\vec{u}_{\Omega_0,\varphi_0} = (1, 0, \sin\varphi_0\tan(\tfrac{T\Omega_0}{4M}))$, in terms of the control parameters $\Omega_0, \varphi_0, T, M$.
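The rotation angle and axis of the one-period rotation can be checked numerically. The following sketch builds $R_{\Omega_0,\varphi_0}(t) = R_z(\varphi_0)R_x(-\Omega_0 t)R_z(-\varphi_0)$ from elementary rotations, forms the product over the two half periods, extracts $\theta$ from the trace, and compares the invariant axis with $(1, 0, \sin\varphi_0\tan(T\Omega_0/4M))$. The parameter values and function names are illustrative assumptions; the ordering of the two half-period factors follows the product as written above.

```python
import numpy as np

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def half_period_rotation(omega0, phi0, t):
    """R_{Omega0,phi0}(t) = R_z(phi0) R_x(-Omega0 t) R_z(-phi0)."""
    return rot_z(phi0) @ rot_x(-omega0 * t) @ rot_z(-phi0)

def period_rotation(omega0, phi0, total_time, n_periods):
    """Product of the two half-period rotations with phases +phi0 and -phi0."""
    tau = total_time / (2 * n_periods)
    return half_period_rotation(omega0, phi0, tau) @ half_period_rotation(omega0, -phi0, tau)

def rotation_angle(rot):
    """Rotation angle from Tr R = 1 + 2 cos(theta)."""
    return float(np.arccos(np.clip((np.trace(rot) - 1.0) / 2.0, -1.0, 1.0)))

# Illustrative parameter values (not taken from the text)
omega0, phi0, total_time, n_periods = 1.3, 0.4, 10.0, 4

R = period_rotation(omega0, phi0, total_time, n_periods)
theta = rotation_angle(R)
u = np.array([1.0, 0.0, np.sin(phi0) * np.tan(total_time * omega0 / (4 * n_periods))])

print(theta, np.allclose(R @ u, u))
```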
In the basis where $R_{\Omega_0,\varphi_0}(\tfrac{T}{2M}) \cdot R_{\Omega_0,-\varphi_0}(\tfrac{T}{2M})$ is diagonal, $R_\Sigma(\omega)$ is diagonal as well, and each diagonal entry is a geometric sum that can be summed explicitly. With finite total time $T$, frequency values are discretized in steps of $\delta\omega = 2\pi/T$, taking the continuous frequencies to discrete frequencies $\omega \to \omega_n = n\delta\omega = 2\pi n/T$. In general, we aim to cancel the resulting diagonal matrix $R_\Sigma(\omega_n)$ as much as possible over the range of frequencies corresponding to the CNB, which is equivalent to reducing its rank. Imposing the condition that $\theta M = 2\pi\ell$, $\ell \in \mathbb{Z}$, ensures that this matrix has rank 1 for $n = 0$, the value corresponding to the lowest frequency noise contribution $\omega = 0$. Here, the matrix $R_\Sigma(\omega)$ becomes $\mathrm{diag}(M, 0, 0)$, where the non-zero eigenvalue corresponds to the eigenvector parallel to the axis of rotation.
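The effect of the condition $\theta M = 2\pi\ell$ on the geometric sums can be illustrated numerically. The sketch below assumes that the three diagonal entries of $R_\Sigma(\omega_n)$ are $\sum_{m=0}^{M-1} e^{i m(2\pi n/M + s\theta)}$ for $s = 0, +1, -1$, where $2\pi n/M = \omega_n T_c$; this phase convention, the values of $M$ and $\ell$, and the function name are assumptions made for illustration.

```python
import numpy as np

def r_sigma_diagonal(n, n_periods, theta):
    """Diagonal geometric sums at discrete frequency n (assumed phase convention)."""
    m = np.arange(n_periods)
    base = 2.0 * np.pi * n / n_periods      # omega_n * T_c
    return np.array([np.sum(np.exp(1j * m * (base + s * theta))) for s in (0, 1, -1)])

n_periods, ell = 8, 3                        # illustrative values
theta = 2.0 * np.pi * ell / n_periods        # enforces theta * M = 2 * pi * ell

# Frequencies within one comb period at which any diagonal entry survives
support = [n for n in range(n_periods)
           if np.any(np.abs(r_sigma_diagonal(n, n_periods, theta)) > 1e-9)]
dc_entries = r_sigma_diagonal(0, n_periods, theta)
print(support, np.round(dc_entries, 12))
```

For these values the surviving frequencies are $n = 0$, $M\theta/2\pi = 3$, and $M(1 - \theta/2\pi) = 5$, and the $n = 0$ matrix reduces to $\mathrm{diag}(M, 0, 0)$.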
The diagonal elements of the matrix $R_\Sigma(\omega)$ will have nonzero values only for those frequencies $\omega_n = n\delta\omega$ with $n \in \{M(\ell \pm \theta/2\pi), \ell M\}$, which for the first few values of $\ell \in \mathbb{Z}$ means $n = 0$, $M\theta/2\pi$, $M(1 - \theta/2\pi)$. Thus, by choosing the control parameters $M$, $\theta$ from the noise cutoffs as in Eq. (C19), with $n_z = \omega_z/\delta\omega$ and $n_{xy} = \omega_{xy}/\delta\omega$, we ensure that $R_\Sigma(\omega)$ (and consequently the FF) will be cancelled for all low frequencies except $\omega \in \{0, \omega_z, \omega_{xy}\}$. Next, we will show how it is possible to cancel the product between $R_\Sigma(\omega)$ and $R_\Phi(\omega)$ by properly choosing $\Omega_0$, $\varphi_0$.

The zero frequency case is of particular interest, since in most practical applications, low frequency noise will be dominant. Having already shown that it is possible to choose $M$, $\theta$ such that $R_\Sigma(\omega)$ is of rank 1 for $\omega = 0$, we examine $R(\omega = 0)$ and impose the condition that it vanishes. The goal is to find values of $\Omega_0$, $\varphi_0$ for which the matrix product between $R_\Sigma$ and $R_\Phi$ is identically zero. From a linear algebra perspective, this means setting the values of $\Omega_0$, $\varphi_0$ such that the image of $R_\Phi$ is contained in the kernel of $R_\Sigma$. Assuming that the control parameters are set as in Eq. (C19), for $\omega = 0$ the kernel of $R_\Sigma$ is the subspace orthogonal to the axis of rotation $\vec{u}_{\Omega_0,\varphi_0}$. In other words, we look for $\Omega_0$, $\varphi_0$ that restrict the image of $R_\Phi(0)$ to this orthogonal subspace. We can impose this condition by requesting that the inner product between $\vec{u}_{\Omega_0,\varphi_0}$ and $R_\Phi(0)\cdot v$ is 0 for all $v \in \mathbb{R}^3$. It is enough to show this for a set of basis vectors. Choosing the Cartesian basis $\{v_1 = (1, 0, 0), v_2 = (0, 1, 0), v_3 = (0, 0, 1)\}$, we see that this is automatically satisfied for $v_2$, $v_3$, i.e., $\langle\vec{u}_{\Omega_0,\varphi_0}, R_\Phi(0)\cdot v_2\rangle = \langle\vec{u}_{\Omega_0,\varphi_0}, R_\Phi(0)\cdot v_3\rangle = 0$. On the other hand, applying this for $v_1$ yields a non-trivial condition for $\Omega_0$, $\varphi_0$ in terms of $M$, $T$.

Next, from Eq. (C16) we can see that this condition admits a solution for $\Omega_0$, $\varphi_0$ in terms of the other parameters which, combined with the previous condition, results in an equation for $\Omega_0 T/4M$. Positive solutions to this equation in terms of $\Omega_0 T/4M$ can be easily found numerically using standard tools for each value of $n_z$, $n_{xy}$, from which it is straightforward to determine $\Omega_0$. The value of $\varphi_0$ can be set by inverting Eq. (C24). Additionally, it can be shown that with this choice of parameters, $F_{xy}(\omega < \omega_{xy}) = 0$ and $F_z(\omega < \omega_z) = 0$.
To summarize, we use the multi-axis CD and square wave scheme to define a low frequency noise filtering problem. We start with the control Hamiltonian [Eq. (2)] with $\vec{\Omega}(t) = \Omega_0(\cos\varphi(t), \sin\varphi(t), 0)$. Here, $\varphi(t) = \varphi_0 s_\lambda(t)$ is a square wave of amplitude $\varphi_0$ and frequency $\lambda$. The problem is defined as finding control parameters $\Omega_0$, $\varphi_0$, $M$ that filter noise along all three axes up to high frequency noise cutoffs $\omega_z = n_z\delta\omega$, $\omega_{xy} = n_{xy}\delta\omega$. We find analytical solutions to this problem by solving the system of equations in Eqs. (C27). The third equation can be easily solved numerically with standard tools such as the optimize.fsolve function from the SciPy Python library, using the value $\xi_0 = 2$ as a seed for $\xi$. In order to choose among the family of solutions found by solving this equation, it is possible to use an argument of efficiency of resources and choose the smallest solution, i.e., the value that minimizes the control amplitude $\Omega_0$.

b. Alternative choices of initial conditions
To conclude this section, we add that solutions achieving equivalent levels of cancellation were found numerically by using a sinusoidal control rather than a square wave. That is, Ω x (t) = Ω CD and Ω y (t) = Ω SW sin(λt), for given Ω CD , Ω SW values, where λ can be set as above from M = n z + n xy . The disadvantage of this control is that we lack an analytical derivation of the control parameters, since the piece-wise constant assumption of the square wave control used in the previous derivation is no longer valid. Nevertheless, it is straightforward to perform a numerical exploration to find which values of the Ω CD , Ω SW parameters yield the desired FF features. For example, by using the F-GRAFS gradients in the functional basis of {1, sin(λt)} one can find that the FFs obtained are equivalent to those described in the previous section.
Interestingly, the values of these controls that produce the desired FFs coincide with the ones obtained from solving Eqs. (C27). The reason behind this is that these controls only differ from those described in Eqs. (C27) by a $\pi/4$ rotation about the z-axis. The benefit of choosing these rotated controls is that they are symmetric with respect to the xy-plane, and hence the FFs along xy that they produce are equal, $F_x(\omega, T) = F_y(\omega, T)$. When symmetry of the xy FFs is required, these controls can be conveniently used to achieve this feature.

Alternative IQ-control representation
The widespread use of IQ-control suggests that, in some cases, it will be convenient to perform the optimization in this space. The polar IQ-control coordinate representation $\vec{\Omega}(t) = \Omega(t)(\cos\varphi(t), \sin\varphi(t), 0)$ captures the degrees of freedom of the controls in the amplitude $\Omega(t)$ and phase $\varphi(t)$ parameters. The first step towards adapting F-GRAFS to work in the IQ-control framework is to expand $\Omega(t)$, $\varphi(t)$ in terms of Slepians. Let us consider the objective function, Eq. (A11), and proceed as in Sec. II B. The calculation is analogous, with Eq. (29) modified accordingly. The derivatives with respect to the IQ-control parameters are then obtained using the chain rule, where $\Omega^x_n = \Omega_n\cos\varphi_n$ and $\Omega^y_n = \Omega_n\sin\varphi_n$. The derivatives of $U_C(t)$ with respect to $\Omega^x_n$, $\Omega^y_n$ can be computed following Eq. (32).
Numerical investigations using F-GRAFS were performed, comparing the IQ-control optimization scheme with the Cartesian one described in the main text. Results show that utilizing IQ-control in F-GRAFS yields solutions that achieve equivalent levels of cancellation in the CNB when the controls are initialized as described in Appendix C.

In the main text, we analyze the single-axis noise optimization problem using both single- and multi-axis control F-GRAFS. The single-axis control (see Sec. IV A 3) presents a sharp improvement in optimization performance at a critical bandwidth of $W_c = 2 \times B \times \delta t/2\pi$. In the multi-axis control and multi-axis noise scenarios (see Sec. IV B 3), a sharp improvement in performance can be observed at a smaller control bandwidth, namely $1.5 \times B \times \delta t/2\pi$. The single-axis noise configuration serves as a probe to analyze the advantages of multi-axis control, since it can be analyzed effectively with both control schemes. Since the multi-axis initial conditions described in Sec. C 2 reduce to CD when there is no control noise, CD initial conditions were used along the x axis, while the y axis was initialized with zero control amplitude.
In the present section, we study the optimized control powers compared to the initial conditions, and find no distinguishing increase in power at the critical bandwidth. Hence, we propose that the improvement in critical control bandwidth is due to the increase in control capabilities. Additionally, we find that the critical bandwidth is gate-dependent, and argue that CD is not the optimal initial condition for the single-axis noise and multi-axis control configuration.

Optimized Power Analysis
In principle, the sharp improvement of noise cancellation performance at the critical bandwidths could be caused by a significant change in control amplitude. This could allow the control waveforms to reach different, possibly better, solutions in the objective function landscape. As we show below, no significant change in power is observed, suggesting that no additional complexity is gained by increasing the control amplitude. Fig. 9 presents the optimized control powers of different noise and control configurations, averaged over the Clifford+T gate set, as a function of $W$. The directional control powers along $\sigma_x$ and $\sigma_y$ are defined as $P_i = \int_0^T \Omega_i(t)^2\,dt$ for $i = x, y$, and the full control power as $P = \sum_{i=x,y} P_i$. Optimizations are initialized with CD as $\Omega_{CD}(t) = \omega_z$, and the power values shown in the figure are normalized by its power $P_{CD} = \omega_z^2 T$. Note that the y-axis controls (green crosses) need to be activated in order to produce general single-qubit rotations, but for $W > 2B\,\delta t/2\pi$, the amplitudes along $\sigma_x$ are larger by over an order of magnitude. For $W > 2B\,\delta t/2\pi$, the control powers deviate little from the initial condition values (normalized power close to 1). This means that there is no significant change in control power between low and high bandwidth solutions. Since the initial conditions are kept the same, this suggests that the improvement in performance is due entirely to the increased control capabilities given by the bandwidth.

Fig. 9. Average power for single-axis noise with high-frequency cutoff $\omega_z$, solved using multi-axis control F-GRAFS. Each point represents a different combination of $\omega_z$ and $W$, averaged over the Clifford+T gate set, generating high-pass filters. The initial conditions used correspond to CD along the x direction and zero control along y. The figure displays the full optimized control power (blue, circles) as well as directional powers along x (orange, squares) and y (green, crosses), normalized by the initial condition power corresponding to each noise configuration.
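The power definitions used in this subsection can be sketched numerically for piecewise-constant controls, where the integral $P_i = \int_0^T \Omega_i(t)^2\,dt$ reduces to a sum over time steps. The number of steps, the time step, the cutoff value, and the function name below are illustrative assumptions; the example evaluates the CD initial condition itself, for which the normalized power is 1 by construction.

```python
import numpy as np

def control_power(omega_samples, dt):
    """P = integral of Omega(t)^2 dt for a piecewise-constant control."""
    omega_samples = np.asarray(omega_samples, dtype=float)
    return float(np.sum(omega_samples ** 2) * dt)

n_steps, dt = 400, 1.0
total_time = n_steps * dt
omega_z = 0.02 * 2 * np.pi / dt            # illustrative noise cutoff

# Initial condition used in the text: CD along x, zero control along y
omega_x = np.full(n_steps, omega_z)
omega_y = np.zeros(n_steps)

p_cd = omega_z ** 2 * total_time           # normalization P_CD = omega_z^2 * T
p_x = control_power(omega_x, dt) / p_cd
p_y = control_power(omega_y, dt) / p_cd
p_total = p_x + p_y
print(p_x, p_y, p_total)
```

Applying the same functions to optimized waveforms gives the normalized directional and total powers of the kind reported in Fig. 9.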

Optimization Dependence on Ideal Gate Fidelity Constraints
Another factor influencing the optimization performance is the tolerance $\tau_G$ set for the ideal gate fidelity, i.e., $F_G > 1 - \tau_G$. Throughout the work described in the main text, $\tau_G$ was set to $10^{-10}$. In this section, we show how in some cases relaxing this requirement leads to improvements in optimal bandwidth conditions. Fig. 10 presents the bandwidth dependence of F-GRAFS performance when optimizing single-axis noise with multi-axis control. The figure illustrates how the critical bandwidth $W_c$ changes with the tolerance for the ideal gate fidelity. For all studied tolerances $\tau_G = 10^{-10}, 10^{-8}, 10^{-6}$, the means over the Clifford+T gate set (dashed lines) exhibit a critical bandwidth $W_c = 2 \times \omega_z \times \delta t/2\pi$. The medians (solid lines), on the other hand, present a lower critical bandwidth, notably $W_c = 1.5 \times \omega_z \times \delta t/2\pi$ for $\tau_G = 10^{-8}, 10^{-6}$, as in the multi-axis noise with multi-axis control case. This difference between mean and median behavior implies that there exist numerous gates for which a bandwidth of $W > 1.5 \times \omega_z \times \delta t/2\pi$ is sufficient to achieve the desired noise cancellation.
The fact that this trade-off between gate fidelity and bandwidth constraints is gate-dependent suggests that the CD initial conditions are not universally optimal for all gates. In the presence of low-bandwidth constraints, e.g., given by hardware, more gate-aware initial conditions are necessary in order to achieve a high degree of noise cancellation. To summarize, we observe that a degree of improvement in optimal bandwidth conditions can be obtained, albeit gate dependent, by relaxing the tolerance constraints on ideal gate fidelity.

Fig. 10. Dependence of F-GRAFS performance on the tolerance $\tau_G$ for the ideal gate fidelity, for the case of single-axis noise along z with multi-axis control along x, y. Solid (dashed) lines represent the means (medians) over the Clifford+T gate set. In each case, data shown represent averages over bandwidth windows of size 0.1. Optimized objective function values are normalized by $\tau_G$. Note that after $W_c$ these values converge to 1, implying that the noise is filtered effectively and most of the infidelity is given by the gate constraints.

Appendix E: Effects of Noise PSD Specifications on F-GRAFS Optimized Controls
In Sec. III we describe the general procedure for finding a CNB from a given PSD, which involves the choice of a noise fractional power $\epsilon$ [Eq. (18)]. Intuitively, $\epsilon$ is the spectral leakage in the NB, meaning that $1 - \epsilon$ corresponds to the amount of power that F-GRAFS-optimized controls will filter in the CNB. Consequently, there is strong interest in choosing $\epsilon \ll 1$ as small as possible. In this section, we address the effect of $\epsilon$ on the optimization performance. More specifically, we investigate the effect that different choices of $\epsilon$ have on the operational fidelity given in Eq. (11). We show analytically and numerically that the loss in fidelity is to first order proportional to $\epsilon$.
In Fig. 11 we show the values of the infidelity $I(T) = 1-F(T)$ (dashed lines) for a qubit with single-axis noise along z and control along x. The average fidelity $F(T)$ is obtained from simulation by averaging over 1000 different noise realizations as well as two different gates: the identity and X gates. The qubit is subject to F-GRAFS-optimized controls and dephasing noise characterized by a Lorentzian spectrum with standard deviation $\sigma$ and correlation time $\tau \sim 1/\gamma$. The error bars represent the standard deviations over these different processes.
Throughout this analysis, the FF is kept constant with $\omega_H = 0.01 \times 2\pi/\delta t$ and a tolerance of $10^{-15}$. The fractional power $\epsilon$ was varied in the range $[10^{-3}, 10^{-1}]$, which modified the degree to which the noise affects the qubit. To change the fractional power, the noise was modified by adapting $\gamma$ through the relationship $\gamma = \omega_H \tan(\pi\epsilon/2)$ (see Sec. V for further details).
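The relationship between the Lorentzian width γ and the fractional power can be checked numerically. The sketch below writes ε for the fractional power and assumes a zero-centered Lorentzian PSD, normalized to total power P, with ε defined as the fraction of power outside |ω| ≤ ω_H. The normalization convention, function names, and time step are assumptions made for illustration. It also evaluates the PSD at the band edge ω_H, which is the quantity a filter concentrated near ω_H would sample, and compares it with the small-ε estimate Pε/(2ω_H).

```python
import math

def gamma_from_epsilon(omega_h, eps):
    """Lorentzian width that leaves a fraction eps of the power outside |omega| <= omega_h."""
    return omega_h * math.tan(math.pi * eps / 2.0)

def lorentzian_psd(omega, gamma, power=1.0):
    """Zero-centered Lorentzian whose integral over all omega equals `power` (assumed convention)."""
    return (power / math.pi) * gamma / (gamma**2 + omega**2)

def out_of_band_fraction(omega_h, gamma):
    """Closed-form fraction of the Lorentzian power outside [-omega_h, omega_h]."""
    return 1.0 - (2.0 / math.pi) * math.atan(omega_h / gamma)

dt = 1.0                               # hypothetical time step
omega_h = 0.01 * 2.0 * math.pi / dt    # band edge, as in the text

for eps in (1e-3, 1e-2, 1e-1):
    gamma = gamma_from_epsilon(omega_h, eps)
    leak = out_of_band_fraction(omega_h, gamma)   # recovers eps
    s_edge = lorentzian_psd(omega_h, gamma)       # PSD sampled at the band edge
    linear = eps / (2.0 * omega_h)                # small-eps estimate of s_edge
    print(eps, leak, s_edge / linear)
```

Under these assumptions the band-edge value has the closed form S(ω_H) = P sin(πε)/(2πω_H), so the printed ratio equals sin(πε)/(πε) and approaches 1 as ε decreases, consistent with a loss in fidelity that is linear in ε to first order.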
The solid lines in Fig. 11 represent an analytical estimate of the fidelity. This estimate was obtained by approximating the FF as a delta function centered at $\omega_H$, namely $F(\omega) \approx (T/2)\,\delta(\omega-\omega_H)$. Using the explicit formula for the Lorentzian PSD, we compute the overlap [...]

[...] features in the FF that achieve greater cancellation over those frequencies where the noise is stronger. On the other hand, the F-GRAFS solution aims for average cancellation over the entire band and is considerably flatter in the CNB. Additionally, both optimizations achieve equivalent objective function values, $\Gamma_{\text{PSD}}(T) \sim P \times \Gamma_{\text{F-GRAFS}}(T, S) \sim 10^{-10}$, where $P$ is the total power of the noise $S(\omega)$. The F-GRAFS method took about 25 steps to achieve this level of cancellation, while the PSD-F-GRAFS method took only 6 iterations. This highlights that although finer resolution of the PSD can reduce the computational cost, it is not necessary, and F-GRAFS can find equally good solutions for all the cases we studied. This is further substantiated in Fig. 12(a), where we can see that the control functions obtained through both methods are very similar.

Target Filter Functions
Alternatively, one could ask whether FFs can be specifically shaped to match a given target FF $F_{\text{target}}(\omega)$. For this application, we define the objective function as the quadratic difference between the FF and the target function, [...] As in the previous section, the computation of the gradient proceeds in the same way as outlined in Sec. II B, where the derivative of the FF is obtained by [...] In Fig. 12, panels (c) and (d), we present an example of this optimization for a Gaussian target function $F_{\text{target}}(\omega) = A \exp(-(\omega-\omega_H)^2/2\sigma^2)$. The F-GRAFS initial conditions were set such that the FF is centered around the same value $\omega_H$ where the target Gaussian is located. In panel (d), the black curve represents the target Gaussian function. We can see that F-GRAFS is capable of reshaping the initial constant-drive condition into a Gaussian function with good agreement. Note that the target functions need to be normalized such that the total area of the FF is the total time $T$, which means that $A = T/(\sigma\sqrt{2\pi})$.

FIG. 12. [...] The noise PSD, shown in black, is defined as $1/f$-type noise with a hard cutoff at $\omega_H = 0.08 \times 2\pi/\delta t$, with $S(\omega) = A/\omega^2$, where the constant $A$ is chosen to match the amplitude of the filter function. The PSD $S(\omega)$ presents an additional cutoff at low frequencies. Both the F-GRAFS (blue) and the PSD-F-GRAFS (orange) FFs achieve greater cancellation than the initial condition with CD (green). The PSD-F-GRAFS FF presents more features in frequency, owing to its access to higher PSD resolution. (Bottom row) Optimization results of controls (a) and FFs (b) obtained with Target-F-GRAFS (red), with target function $F_{\text{target}}(\omega) = A \exp(-(\omega-\omega_0)^2/2\sigma^2)$. In (b), it is clear that the Target-F-GRAFS FF (red) approximates the target function (black) well. In green we again show the initial condition resulting from CD control.
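The quadratic-difference objective and the Gaussian normalization described in this subsection can be sketched numerically. The sketch assumes the objective is the integral of the squared difference between the FF and the target over a uniform frequency grid. That discretization, the function names, and all numerical values are illustrative assumptions. The gradient helper only applies the chain rule to this assumed objective and expects the derivative of the FF with respect to a control parameter to be supplied from elsewhere (for instance, from the procedure of Sec. II B).

```python
import math

def gaussian_target(omega, omega_h, sigma, total_time):
    """Gaussian target FF with A = T / (sigma * sqrt(2*pi)), so that its area is T."""
    amp = total_time / (sigma * math.sqrt(2.0 * math.pi))
    return amp * math.exp(-((omega - omega_h) ** 2) / (2.0 * sigma**2))

def trapezoid(values, step):
    """Trapezoidal rule on a uniform grid."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))

def target_objective(ff, ff_target, step):
    """Assumed objective: integral of (F - F_target)^2 over the frequency grid."""
    return trapezoid([(f - g) ** 2 for f, g in zip(ff, ff_target)], step)

def target_gradient(ff, ff_target, dff_dalpha, step):
    """Chain rule for one control parameter: 2 * integral of (F - F_target) * dF/dalpha."""
    return 2.0 * trapezoid(
        [(f - g) * d for f, g, d in zip(ff, ff_target, dff_dalpha)], step
    )

# Hypothetical numbers, chosen only for illustration.
total_time, omega_h, sigma = 100.0, 0.5, 0.05
step = 1e-3
grid = [i * step for i in range(1001)]   # omega from 0 to 1
target = [gaussian_target(w, omega_h, sigma, total_time) for w in grid]

area = trapezoid(target, step)           # close to total_time
flat = [total_time] * len(grid)          # a deliberately poor trial FF
print(area, target_objective(target, target, step), target_objective(flat, target, step))
```

The objective vanishes when the trial FF coincides with the target and is large for the flat trial FF, while the computed area confirms that the stated amplitude normalizes the Gaussian to the total time T.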