Number Partitioning with Grover’s Algorithm in Central Spin Systems

,


I. INTRODUCTION
Many quantum algorithms that offer a provable speedup over their best classical counterparts rely on the ability to query an oracle: a black box that knows the answer to the problem that the quantum computer is to solve.A paradigmatic example is Grover's search algorithm [1,2], which theoretically speeds up the time to search through an unstructured database of N entries, requiring only O( √ N ) queries of the oracle rather than the classical O(N ) queries.By extension, Grover's algorithm can in principle speed up the search for solutions to a wide range of decision problems, including NP-complete problems [3] such as boolean satisfiability, the clique problem, and the number partitioning problem [4,5], with applications from cryptography to finance [6][7][8][9].
Formally, any instance of a search or decision problem is represented by an oracle function f (x) that acts on a string x of n bits and returns either 0 (failure) or 1 (success).The search aims to find a value X such that f (X) = 1, while the decision problem asks whether such an X exists at all.In experimental demonstrations to date of Grover's search [10][11][12][13][14][15][16][17][18][19][20][21][22], implementing the oracle-a unitary operation controlled by f (x)-requires knowing the solution(s) X.To obtain a true benefit from a quantum algorithm involving an oracle, one requires a physical system that directly encodes the function f in a manner that is agnostic to the solution [23].
In this paper, we propose a genuine application of Grover's algorithm to solving the NP-complete number partitioning problem: Given n objects with integer weights, does there exist a bipartition that balances a * These authors contributed equally to this work.The weights wi are encoded by couplings of system spins (red) to an ancilla (blue), which can be either (i) a central spin (e.g., Rydberg atom); or (ii) a bosonic mode (e.g., cavity).
scale?Our approach can be implemented in physical systems that take the form of either a central spin or central boson model, featuring n qubits interacting with an ancilla spin or photon that plays the role of the oracle.Crucially, the decision problem is encoded in the couplings of the qubits to the ancilla, allowing the oracle to be implemented without a priori knowledge of the solution.Numerical simulations of the quantum algorithm illustrate physical manifestations of a known phase tran-sition in the computational complexity of number partitioning, including an exponential scaling of the spectral resolution required to solve hard problem instances.A recursive variant of our algorithm avoids this exponential resource requirement, providing improved scalability.By analyzing proposed implementations with Rydberg atoms and in cavity-QED systems, we show that a speedup is attainable in near-term experiments.

II. ALGORITHM AND IMPLEMENTATION
We begin with a brief review of Grover's algorithm [Fig.1(a)].The algorithm starts by initializing a collection of n = log 2 N qubits in an equal superposition of all possible standard basis states labeled by n-bit numbers x, with c x,0 = 1/ √ N .The objective is to amplify the amplitude c X of the solution state(s) |X .To this end, the oracle U first marks the solution(s) by applying a π phase shift (c X → e iπ c X ) for all X with f (X) = 1.The marked states are then amplified by inversion about the average: c x → c − (c x − c) for all x, where c = x c x /N .This inversion operation V is accomplished by combining single-qubit Hadamard gates with an n-qubit controlled phase gate that is similar to the oracle but less technically demanding (see App. B), or can alternatively be replaced by single-qubit rotations only [24].Thus, we focus on the challenge of realizing the oracle.
We will show a natural physical incarnation of the oracle for a class of decision problems known as subset sum problems [25], focusing on the special case of number partitioning.We specify each problem instance by a list of n weights w i ∈ (0, 1] of finite bit depth k, and search for a partition into two sublists of equal total weight.To encode the partition problem using n qubits representing the objects with weights w i , we let each qubit state indicate which subset (|0 or |1 ) an object is in [Fig.1(b)], so that the weighted collective spin represents the imbalance between the subsets.Implementing the oracle then requires applying a π phase shift to any n-qubit basis state |x satisfying S z |x = 0.The quantum oracle thus requires implementing a collective phase gate U = e iπf (x) = e iπδ (Sz) , where δ(•) denotes the Kronecker delta function.To design a physical implementation of this gate, it is helpful to define a generalized oracle U γ = e iΦγ (Sz) in terms of an S z -dependent phase shift Φ γ (S z ) = 2 arctan (2S z /γ) + π, which steps from zero to 2π as a function of S z and provides a π phase shift at S z = 0 [Fig.1(c)].The ideal oracle is obtained in the limiting case U ≡ U γ→0 of an infinitely steep phase step.The collective phase gate U γ can be enabled by coupling the qubits to an ancilla, which may take the form of an auxiliary qubit or a bosonic mode.We consider either a central spin model featuring an ancilla qubit represented by a spin-1/2 operator I z , or a central boson model featuring a cavity mode with annihilation operator c.In both cases, the ancilla couples to n system spins in the starlike graph of Fig. 1(d), and hence to the weighted collective spin S z .The maximum coupling between a system spin and the ancilla is parameterized by J max .
For concreteness, we describe representative implementations of the central boson and central spin models with cold atoms [Fig.1(d)].The system spins are encoded in two internal states |0 , |1 and coupled to either a cavity mode [26][27][28][29][30][31][32][33] or an auxiliary atom that can be excited to a Rydberg state [34][35][36][37][38][39][40][41][42][43][44][45][46].Each coupling w i J max represents the energy shift of the |0 → |1 transition in atom i when either a photon enters the cavity or the auxiliary atom is excited.In the cavity implementation, the photon imparts an ac Stark shift [26][27][28][29][30][31].In the Rydberg implementation, the excited ancilla suppresses an ac Stark shift induced by classical control fields that couple the system atoms' state |1 to a Rydberg state.In both cases, the weights w i can be programmed via the atomic positions or control fields.The net effect of the couplings w i J max on the ancilla is a frequency shift J max S z that depends on the weighted collective spin S z .
The S z -dependent resonant frequency of the ancilla is crucial to enabling the oracle.In the central boson model, the oracle relies on the phase response of a driven harmonic oscillator.For a one-sided cavity of linewidth κ, the output field is phase shifted by π for a resonant drive compared with the off-resonant case.Having the drive field consist of a single photon that is resonant if and only if S z = 0 yields precisely the oracle operation U γ , with a phase step of dimensionless width γ = κ/J max , where we set = 1.In the central spin model, the oracle U γ is implemented by attempting to drive a 2π rotation of the ancilla with a field that is resonant if the weighted spin S z is zero.For a suitably shaped drive pulse, the ancilla atom ends up in its initial state irrespective of S z [47], and the entire system acquires a π geometric phase shift only when S z = 0.The width γ = κ/J max of the phase step is now set by the bandwidth κ = 2π/τ of the pulse with temporal width τ .
To examine the performance of the generalized oracle, we first introduce a convenient visualization of Grover's algorithm [48].We define the solution space A = {|X : S z |X = 0} as the set of states that solve the partition problem and let denote the equal superposition of all solutions (assuming their existence) where N A is the number of solutions.We additionally define an orthogonal state where |ψ 0 is the initial state of Eq. 1.The states |A and |B span an SU(2) subspace that can be visualized on a Bloch sphere with |A and |B as poles.
Grover's algorithm ideally takes place entirely within this subspace of the full 2 n -dimensional Hilbert space, iteratively rotating the initial state |ψ 0 towards the solution state |A .Each iteration comprises the oracle U and inversion about the average V .The net effect of these two operations is a rotation about the Ŷ axis [Fig.2(a)].For N A /N 1, a nearunity success probability is achieved after an optimal number of iterations The generalized oracle with a nonzero step width introduces an error that, to lowest order, is correctable by spin echo.To visualize how, we consider a simplified scenario where there exist only two possible values of the phase Φ γ ∈ { , π}.For nonzero , the combination of the generalized oracle and inversion about the average induces the state to rotate about a tilted axis [Fig. 2

(b)].
To mitigate accumulation of error, we alternate between applying the oracle U γ and its Hermitian conjugate U † γ .A pair of two Grover iterations then takes the form where is implemented by a spinecho sequence involving two global π rotations R π (x) about the individual qubits' x axes.The result is the trajectory shown in Fig. 2(c), which achieves similar performance to the ideal oracle in Fig. 2(a).
Even with spin echo, the step width will ultimately limit the resolution of the generalized oracle: selectively amplifying only spin configurations with S z = 0 requires a narrow step.Further, producing a narrow step requires a long coherence time, so that dissipation will place physical limits on the performance of the algorithm.We elaborate on both of these considerations in Secs.III and IV.First, however, we examine the application of Grover's algorithm to number partitioning using a phase step narrow enough to resolve even the least significant bit of the weights.

III. SPEEDUP IN NUMBER PARTITIONING
To analyze the performance for number partitioning, we generate sets of n random k-bit weights and postselect for instances where at least one perfect partition exists.For each such instance, we calculate the success probability as a function of the number T of calls to the oracle, applied with spin echo (Eq.10). Figure 3(a) shows examples of P T for n = 8 spins, bit depths k = 4, 8, 12, and a step width γ = 2 −k just narrow enough to resolve changes in the least significant bit of S z .As expected from the Bloch-sphere picture, the success probability oscillates as a function of T .The maximum probability and the time to reach it combine to determine the effectiveness of the algorithm.
As a single figure of merit, we calculate the total number of calls to the oracle required to reach a specified (near-unity) success probability P. For a search procedure with fixed success probability P per trial, the number of trials M needed to reach a probability P = 1 − ε of finding a solution is Thus, reaching the target error ε with Grover's algorithm requires querying the oracle a total of T total = M (P T , ε)T times.To minimize this quantity, we first calculate its median value as a function of T over many instances of weights at a given (n, k, γ).We then define T opt as the number of Grover iterations that minimizes the median total number of queries T total .Note that T opt is independent of the target error ε. is shown in Fig. 3(b.ii) for a cut at n = k (black squares), where the number of perfect partitions is typically of order one [4].We additionally plot T opt for instances of the weights postselected according to the number of solutions N A = 2, 4, 6 (red triangles, green circles, and yellow stars).In each case, the optimal number of iterations approaches the prediction of Eq. 9 (dashed lines) at large N = 2 n , scaling as T opt ∝ √ N .Quantifying the resulting speedup requires additionally examining P opt , the success probability after T opt iterations [Fig.3(c)].
The dependence of success probability P opt on (n, k) reflects a known phase transition in the computational complexity of the number partitioning problem [4,49].For small bit depth k n (the "easy" phase), there typically exist many perfect partitions.For large bit depth k n (the "hard" phase), perfect partitions are rare and thus-even when postselecting for their existence-the probability of finding them by random guessing is exponentially small in n.By contrast, in our quantum search [Fig.3(c)], the success probability P opt is everywhere of order unity and highest in the "hard" phase, since Grover's algorithm is most effective when solutions are few.The phase boundary lies at a critical bit depth [4] shown by the black curve in Fig. 3(c), where the average number of perfect partitions is N A ∼ 6/(πn)2 n−k = 1 [50].
We quantify the advantage of the algorithm by calculating the limited quantum speedup Q, defined as in Ref. [51] by comparing the quantum search with an algorithmically similar classical search.The most analogous classical algorithm is a memoryless search, which at each trial samples (with replacement) a random partition with success probability P 0 = N A /N .The number of memoryless search trials needed to reach a target success probability P also follows from Eq. 12.For each problem instance, we define speedup Q as the ratio of memoryless trials to total Grover iterations: This speedup is independent of the target error ε, thanks to the algorithmic similarity of the two memoryless search algorithms, as further discussed in App. C. A caveat is that physical limitations might preclude successfully implementing the algorithm in cases requiring a narrow step width γ.We have so far assumed a step width γ = 2 −k , motivated by the intuition that γ sets a capture range of S z values amplified by Grover's algorithm.To test this intuition, we plot the normalized probability distribution P (S z ) ≡ P (S z )/P (S z = 0) after T opt (γ) Grover iterations as a function of step width γ [Fig.4(a.i)], for n = k = 6 without postselecting on the existence of perfect partitions.Consistent with our expectation, the width of the distribution is approximately set by the step width γ.An analytic derivation of this capture range is given in App.E.
To capture only true solutions S z = 0, the step width γ should be smaller than the smallest nonzero |S z | value.In the easy regime k n, a step width γ 2 −k is required to distinguish S z = 0 from S z = ±2 −k .However, with increasing k, the typical size of the smallest residue approaches a finite value |S z | ≈ 2 −kc [4].Thus, the critical bit depth k c (n) in Eq. 13 represents the resolution required to discriminate the smallest typical residue |S z | in the large-k limit.For arbitrary (n, k), we can choose the oracle to have resolution coarser than we have so far assumed in the hard regime.We verify Eq. 15 by plotting the resolution − log 2 γ required to reach a fixed success probability P opt , averaging over all pairs (n, k) with 3 ≤ n, k ≤ 16, in Fig. 4(a.ii).
For each of three different values of P opt = 0.4, 0.6, 0.8, the required step width γ is within a constant factor of γ c .We plot the quantum speedup for this less stringent choice of step width γ c in Fig. 4(b.i).The speedup exhibits a maximum along the phase boundary k = k c (n) iii. ii.
(b) i. (solid black curve).In Fig. 4(b.ii),we examine the scaling of the speedup along an approximation to this curve chosen to ensure integer values of (n, k), namely, the n = k cut (dotted blue line).We plot the speedup [Q] q versus N for different quantiles q (blue circles) and find good agreement with an asymptotic scaling Q ∝ √ N (solid lines) for all quantiles.Thus, the generalized oracle with the critical step width γ c suffices to achieve an O( √ N ) speedup, the same scaling that is achieved by the ideal oracle and proven to be optimal for an unstructured search [52][53][54].
The phase transition in computational complexity manifests in a sharp peak in the speedup at the phase boundary k = k c (n).We observe this peak in Fig. 4(b.iii)along a cut of fixed problem size nk, i.e., fixing the total number of bits encoding the set of n weights.The peak in the speedup reflects the known result that the hardest problem instances are not deep in the hard regime, but rather near the phase transition [4,55].In particular, the hardest problems are those with the largest ratio N/N A of the size of the search space to the number of solutions, after postselecting for the existence for solutions.This ratio reaches a maximum near the phase boundary, explaining the peak in Q ∝ N/N A .
Even in the experimentally relevant case where the weights are not restricted to a finite bit depth, the resolution of the oracle sets an effective bit depth k eff = − log 2 γ that can reveal the complexity phase transition.For real-numbered weights w i ∈ (0, 1], we consider the optimization problem of minimizing |S z |, defining the success probability P opt as that of finding the system in a configuration of minimal |S z | after an optimal number of Grover iterations.We plot the median speedup [Q(n, γ)] 0.5 in Fig. 4(b.iv).As a function of k eff at fixed n, the speedup first rises to a maximum at k eff ≈ k c before declining precipitously for k eff > k c due to the narrowness of the capture range, providing a striking signature of the complexity phase transition.

IV. EFFECTS OF DISSIPATION
A key challenge for experimental implementations is that producing a narrow phase step requires a long coherence time.Specifically, at fixed interaction strength J max , the step width γ determines the physical time κ −1 ∼ 1/(γJ max ) to implement the oracle operation U γ .Even a single error occurring during this time thwarts the amplification process.For concreteness, we consider an error model in which the excited ancilla decays-or, equivalently, the ancilla photon is lost-at rate Γ a .In terms of the interaction-to-decay ratio ρ ≡ J max /Γ a , the error rate per query of the oracle is then approximately Γ a /κ = 1/(ργ).Thus, on average T max ∼ ργ Grover iterations can be implemented before incurring an error.For ργ c T opt , the algorithm must be run at an increased step width γ > γ c that reduces the speedup.
Figure 5(a) shows the speedup calculated at finite interaction-to-decay ratio ρ = 10 3 .We model the decay by modifying the frequency shift of the ancilla's excited state (Sec.II) with an imaginary component, J max S z + iΓ a /2, as detailed in App.G.We choose the step width γ for each (n, k) to maximize the speedup, accounting for a reduction in success probability due to the chance of ancilla decay.While the speedup no longer achieves O( √ N ) scaling, we preserve an advantage Q ≈ 10 compared with the classical search.The dependence of the speedup on interaction-to-decay ratio ρ is shown in Fig. 5(b) for n = k at different system sizes n.The speedup scales as Q ∼ ρ 1/3 , consistent with an analytic model derived in App.G, before saturating to the value expected for the ideal Grover's algorithm.
An interaction-to-decay ratio ρ 10 3 is experimentally accessible in an implementation of the central spin model using Rydberg atoms, as detailed in App.H 1. In this implementation, the dominant dissipative process is decay of the ancilla from the Rydberg state, whereas decay of the system spins is suppressed by coupling to their Rydberg states off-resonantly [38][39][40]42].In terms of the maximum attainable Rabi frequency Ω max of this coupling, the interaction-to-decay ratio is limited to ρ < Ω max /(2 √ nΓ), which permits values of order ρ ∼ 10 3 for realistic laser powers and high-lying Rydberg states.
At lower interaction-to-decay ratios, the optimum speedup is obtained by performing only a single Grover iteration.In the absence of dissipation, this single-cycle speedup Q 1 is identical to the amplification factor P 1 /P 0 , assuming P 0,1 1. Figure 5(c) shows Q 1 = P 1 /P 0 as a function of step width γ for n = k = 12 with no dissipation (red circles), corroborating an analytical model derived in App.E in the large-N limit (dashed curve).The model shows that the gain is set by γ/ √ n, which parameterizes the ratio of the step width to the width of the initial S z distribution, and saturates at a maximum value Ancilla decay reduces the amplification Q 1 below this ideal curve, becoming significant for interaction-to-decay ratios ρ 1/γ.The optimum speedups in Fig. 5(b) are obtained from a single amplification cycle for interaction-to-decay ratios ρ 10 2 .
A single amplification cycle could be performed in near-term realizations of the central boson model with atoms in a cavity (App.H 2), by driving with a weak coherent field and heralding on the detection of a photon.The coherence of the atom-cavity coupling is quantified by the cooperativity η = 4g 2 /κΓ e , where g is the vacuum Rabi frequency and (κ, Γ e ) are the linewidths of the cavity and an atomic excited state to which it couples.The resulting interaction-to-decay ratio scales as ρ ∝ ηγ/n, reflecting the fact that decreasing the dimensionless step width γ = κ/J max comes at the cost of increasing the photon loss probability by atomic scattering.Achieving amplification requires reaching a step width γ < n/12 narrower than the initial S z distribution while keeping ργ > 1 to avoid photon absorption, and hence requires strong coupling η 1.The full dependence of amplification Q 1 on step width γ and cooperativity η is shown by the solid curves in Fig. 5(c).Notably, the amplification at an optimal step width [Fig.5(c) inset] is independent of the number of spins n, depending only on the cooperativity η.A stateof-the-art optical cavity with demonstrated cooperativity η ∼ 200 [56] thus allows for amplifying solutions to the partition problem at scalable system size.Stronger amplification could be attained by coupling Rydberg atoms or superatoms [57,58] to a high-cooperativity millimeterwave cavity [29,59,60].For the parameters of Ref. [29], the cooperativity η = 4 × 10 8 is no longer the limiting factor.Instead, finite lifetime Γ −1 of the Rydberg states places a limit ρ < g/(n 3/2 Γ) = 5 × 10 3 /n 3/2 on the interaction-to-decay ratio, which permits near-maximal Q 1 for up to n ∼ 30 atoms.Rydberg-based implementations might be further enhanced by inhibition of spontaneous emission [61,62].

V. SCALABLE ALGORITHM
The requirement of an exponentially fine resolution of the oracle poses challenges for scalability in the simple application of Grover's algorithm presented so far.Specifically, we showed in Sec.III that the required step width γ ∼ 2 −k becomes exponentially small with increasing system size n ∼ k for the hardest problem instances.As a result, if we scale the system-ancilla couplings such that the energy grows extensively with system size by fixing the maximum coupling J max , then the time required for each query of the oracle grows as 2 k ∼ 2 n .Alternatively, to keep the query time fixed, the energy must be chosen to grow exponentially with increasing system size.
The exponential resource requirement can be avoided by a more sophisticated version of our algorithm that operates at a fixed resolution γ ∼ 2 −m of the oracle for arbitrary (n, k).This scalable algorithm begins by identifying candidate solutions of the number partitioning problem by searching for spin configurations in which the m least significant bits of S z are zero.To this end, we first perform Grover amplification with each coupling J i set to a value given by the m least significant bits of the weights.We thereby amplify only spin configurations for which 2 k S z is a multiple of 2 m , thus producing a superposition state with a sparser distribution of S z values than the initial state |ψ 0 (Fig. 6).We subsequently consider increasing numbers m of bits in successive layers = 1, 2, 3, . . . of the algorithm, setting couplings while keeping the resolution of the oracle fixed.This scalable algorithm retains the benefit of an efficient encoding in a central spin system, but does place additional technical demands compared with our standard algorithm.First, the system-ancilla couplings must be changed between layers of the algorithm (App.F 1), a capability that is naturally present in our proposed implementation schemes.A second new ingredient is a modular oracle that can detect the imbalance 2 k S z modulo a specified power of 2 (App.F 2).This modular oracle can readily be implemented by applying a multifrequency drive to the ancilla.Finally, successive layers of the algorithm require increasingly complex operators V to in-Amplitude Amplitude Amplitude Layer Layer Final layer FIG. 6. Sketch of the scalable algorithm, showing amplitudes of basis states versus Sz.Each layer of the algorithm consists of T amplification cycles, each comprising the modular oracle U and recursive inversion operator V .The modular oracle acts on states spaced in energy by Jmax with a spectral resolution γJmax illustrated by the shaded blue curves.The operator V inverts the amplitudes of the states amplified by the previous layer of the algorithm about their average (dashed purple line).Implementing V requires recursion to the lower layers of the algorithm.In the final layer = k/m, the standard nonmodular oracle is used.vert about the average amplitude of previously amplified states.In fact, as we explain in App.F 3, the inversion step in layer involves repeating the entire algorithm up through layer − 1.For this reason, we call our scalable algorithm the recursive algorithm.
We describe and analyze the recursive algorithm in detail in App.F, showing that it solves the number partitioning problem in O(2 n/2+cn/m ) queries, where c = log 2 (π/2).Thus, in the limit of a high but fixed resolution of m 1 bits, we recover the ideal Grover speedup.Importantly, we now attain this speedup not only in query complexity but also in the physical time to implement the algorithm in a scalable manner, in the sense that the total interaction energy required to encode the problem grows linearly with the problem size.
A comparison of the recursive algorithm with the standard algorithm of Secs.II-IV is shown in Fig. 7.We simulate both algorithms with the same 1000 instances of weights to examine the scaling of their physical runtimes, which is proportional to T total /γ for a fixed maximum system-ancilla coupling strength J max [Fig.7(a)].The physical runtime of the standard algorithm with γ = γ c scales as O(2 1.5n ) due to the exponential narrowing of the step width γ ∼ 2 −k with system size n = k, whereas the scaling of the recursive algorithm for m = 6 and γ = 2 −m−1 is consistent with the theoretical prediction O(2 0.5n+0.65n/m ) derived in App.F. Thus, the recursive algorithm exhibits a scalable quantum speedup.The performance of the recursive algorithm in realistic implementations with finite interaction-to-decay ratio shows an advantage over the standard algorithm.In Fig. 7(b), we plot the speedup Q versus n = k for different interaction-to-decay ratios, with the step width γ chosen to minimize the total number of Grover queries T total .In the recursive algorithm, the number of amplification cycles per layer of the algorithm (App.F 3) is additionally optimized to minimize T total .Whereas the speedup of the standard algorithm plateaus with increasing system size because dissipation limits the resolution of the oracle, the recursive algorithm achieves a higher speedup because it is designed to operate at fixed resolution of the oracle.

VI. DISCUSSION AND OUTLOOK
In this paper, we have described practical implementations of Grover's algorithm for the number partitioning problem, relying on a natural encoding in spin systems with a starlike coupling graph.The problem offers an ideal setting for examining the physical manifestations of computational complexity, thanks to a well-understood phase diagram including easy and NP-hard regimes.Numerical simulations of our quantum algorithm show clear signatures of the complexity phase transition, yet even in the hard phase we are able to find an advantage over an analogous classical search.
Specifically, we compared our quantum algorithm to a probabilistic classical search, with query complexity O(2 n ) equivalent to that of a brute-force search (see App. C).While there exist classical algorithms that match [63,64] and surpass [65,66] our algorithm's query complexity, they do so at the expense of exponential memory requirements [67,68].To the best of our knowledge, the leading classical algorithm of polynomial space complexity is that by Esser and May, which achieves a time complexity of O 2 0.645n [69].Our proposed implementation achieves an improved O 2 0.5n runtime while remaining hardware efficient, underscoring the significance of attaining a Grover speedup.Further, the possibility of using our algorithm as a subroutine in more sophisticated classical algorithms [64,66,70] opens several directions for future work [5,[71][72][73][74].
In quantifying speedup, we have defined the runtime of the quantum algorithm in terms of the query complexity, i.e., the number of queries to the oracle.An additional consideration is the physical time required to implement a single query.In our standard algorithm, for a fixed maximum pairwise interaction strength J max , the spectral resolution γJ max required of the oracle results in a query time that scales as γ −1 ∼ 2 kc ≈ 2 n along the phase boundary k = k c .This exponential scaling highlights the importance of considering not only query complexity, but also the time required to implement the oracle given physical limitations of the hardware (the finite interaction energy).At finite interaction-to-decay ratio, this scaling limits the speedup of the standard algorithm in our simulations, whereas the recursive algorithm achieves higher performance limited only by the increase in optimal number of queries with system size.
In near-term experiments, despite fragility to dissipation, even the standard algorithm could produce a speedup in few-qubit systems in the hard regime, and in scalable systems in the easy phase.In the hard regime and along the phase boundary, achieving the ideal performance at scalable system size is precluded by the exponential decrease of the energy gap with n.If instead we vary n at fixed bit depth k, the time to implement each query saturates to a fixed value set by γ −1 ∼ 2 k as we cross the transition into the easy regime, allowing the ideal performance to be maintained at fixed interactionto-decay ratio.Irrespective of k, if we fix the duration of each query, the standard algorithm samples from a probability distribution P (S z ) of fixed effective temperature set by J max γ, which may enable extensions to Boltzmann sampling [75].
Our hardware-efficient approach to implementing the Grover oracle enables near-term realizations in cold-atom systems, as well as comparisons with alternative proposed methods for solving NP-hard problems in similar platforms [76,77].Our approach also generalizes to other platforms, including trapped ions [20] or superconducting qubits coupled to phononic [78,79] or microwave [80] resonators.The algorithms presented here might be further optimized by a variational approach that adapts the resolution of the oracle and the number of queries over multiple trials [81].Grover amplification could also be applied to engineer entangled states, e.g., to produce squeezed or Dicke states by amplifying a particular S z value.For more versatile quantum control, arbitrary superpositions of Dicke states might be amplified by shaping the drive pulse [27,28,82].The number partitioning problem is a special case of the more general class of decision problems known as subset sum problems.These problems answer the question: given a set of n objects with positive weights w i ∈ (0, 1] of finite bit depth k, does there exist a subset X ⊂ {w i } of total weight X w j = W * for a specified value W * ?The entire class of problems are naturally implemented with the two experimental realizations that we present in detail in Apps.H 1-H 2.
For general subset sum problems, implementing the oracle requires applying a π phase shift to the system of qubits if and only if the total weight of the qubits in state |1 is a specified target weight W * , i.e., if the system is in an eigenstate of with eigenvalue W * .Experimentally, the target weight is set by the frequency of a field that drives the ancilla.
For the special case of the partition problem, the target weight is set to W * = i w i /2, and the condition in Eq.A1 then reduces to the condition S z |x = 0 of the main paper.More generally, the oracle phase shift in terms of W 1 is given by Appendix B: Inversion about the average The operator V that performs inversion about the average, also known as the diffusion operator, requires a multiqubit controlled phase gate similar to the Grover oracle.In particular, the operator is a diagonal matrix in the basis of spin configurations |x , with matrix elements R 00 = 1 and R xx = −1 for x = 0. Thus, R applies a phase shift of π to all basis states except for |0 .The multiqubit controlled phase gate R can be implemented by adapting the protocol used for the generalized oracle.Specifically, the phase gate R is equivalent, up to a global phase, to the Grover oracle for a subset sum problem (Eq.A1) with target weight zero.A generalized version R γ can be implemented by setting all of the weights to the maximum value w i = 1, and simultaneously choosing the detuning to set the target weight W * = 0. We expect the resulting generalized diffusion operator H n R γ H n to produce the desired amplification for a relatively broad diffusion step width, requiring only γ < 1. Notably, the step width permissible for diffusion is much broader than that required for the oracle, allowing inversion about the average to occur with negligible added dissipation even at finite interaction-to-decay ratio ρ.
We verify that added dissipation due to the generalized diffusion operator has negligible effect by examining the quantum speedup.In Fig. 8, we compare the achievable quantum speedup between the perfect diffusion operator and the generalized diffusion operator, in the latter case including effects of decay during diffusion as well as the nonzero step width.The speedup is reduced by at most 23% over a wide range of ρ values, thanks to the less stringent requirement on the step width during the generalized diffusion transform compared with the oracle.Thus, for simplicity, we directly apply the ideal diffusion operator V in the calculations presented in Figs.2-5 of the main paper.
It is also possible to replace the diffusion operator with only single-qubit rotations, e.g., a global transverse field as in Ref. [24].While a detailed analysis of this alternative is beyond the scope of the present work, we have simulated the application of a transverse field for a time t = π/n in lieu of inversion about the average, finding success probabilities approximately half as large as those achieved with the multiqubit diffusion operator.The transverse field thus enables a technically convenient scheme in which the only multiqubit gate is the oracle.

Appendix C: Classical search algorithms
In the main text, we evaluate our implementations of Grover's algorithm by comparing them to the most analogous classical algorithm, memoryless search.We begin this section by quantifying that relationship, discussing the expected and worst-case performance for each method.We then consider increasingly more complex classical number partitioning algorithms and identify their benefits and drawbacks.This allows us to consider how our algorithm compares with the best classical algorithms, and indicates prospects for more sophisticated versions of our quantum algorithm.Both Grover's algorithm and the classical memoryless search have a probability of success p that is the same for every trial.For such search algorithms, the number of trials M to obtain a solution is a random variable with expected value E[M ] = 1/p.For Grover's algorithm the success probability is P (T opt ) = P opt , as defined in the main text, so the expected number of Grover readout measurements M G is For memoryless search with N = 2 n possible partitions and N A exact solutions, p = N A /N .The expected number of memoryless trials M M is then Incidentally, when T = 0, Grover's algorithm reduces to measuring an equal superposition of configuration states.The success probability is then P 0 = N A /N , equivalent to memoryless search.We also consider the worst-case performance of both algorithms.This is equivalent to the number of queries required to reach P = 1 − ε probability of having found a solution, in the limit ε → 0. For both algorithms, Speedup scaling over all (n, k), shown by plotting median total Grover iterations versus memoryless search trials required to reach P = 0.99 probability of success.Each point represents a particular (n, k), where n and k each take values over the range [3,16].Black and red lines denote linear and square root dependences, respectively.
even after an arbitrarily large number of queries, there remains an exponentially small probability that a perfect partition exists but has not been found.We quantify this worst-case performance when N N A by allowing ε to remain finite, so that the P quantile of M G T opt can be written as and the P quantile of M M is Figure 9 shows the relative median scaling of M M and M G T opt , each calculated according to Eq. 12, with Grover's algorithm showing the expected √ N speedup.While memoryless search follows the same probability distribution as Grover readout measurements, it is not as efficient as linear search through an unsorted list.The expected number of trials M L for linear search is For N, N A 1, the expected trial scaling of both memoryless and linear search algorithms is O(N/N A ).The largest difference occurs with postselection in the hard regime, where E[N A ] ≈ 2 and memoryless search is expected to take 1.5 times as many trials as linear search.
For the linear search, the worst-case performance is N − N A .More generally, we can take ε arbitrarily close to 0 such that [M L ] P converges to N − N A , while retaining the scaling of [M M ] P in Eq.C4.Thus, while both algorithms are both worst-case linear in N , worstcase memoryless search requires O[ln (1/ε)/N A ] times as many queries as worst-case linear search in the hard regime.Further, because both algorithms are unstructured, they do not need to precalculate a potentially exponential number of values before performing queries.Thus their memory scaling is O(n), set by the number of values to be partitioned.
Improving upon memoryless and linear search requires us to consider a variety of structured search algorithms, which can be grouped based on the difficulty of the problem instance they aim to solve.An instance's difficulty is related to its density, defined for a set of integer weights a = (a 1 , ..., a n ) as the ratio d = n/ log 2 (max i a i ) of the number of weights to the number of bits needed to represent the largest weight [83,84].Thus, d < 1 corresponds to the "hard phase" and d > 1 to the "easy phase" [4].In the easy phase there are typically many perfect partitions and a problem instance can generally be solved efficiently by various classical methods, with the best based on the Karmarkar-Karp differencing algorithm [50,70].In the hard phase, classical algorithms have been demonstrated to solve "almost all" problems of density d < d c < 1 in polynomial time, with subsequently published algorithms pushing d c closer to 1 [83][84][85][86].In such "low-density attack" algorithms, the number partitioning problem is reduced to the shortest vector problem, for which there exist algorithms that produce good approximations in polynomial time.
Indeed, the hardest instances of the number partitioning problem are not deep into the hard phase, but near the phase transition at a density close to 1 [4,55].For such instances, classical algorithms with the best known time complexity are subject to a space-time trade-off; improvements in runtime come at the cost of exponential memory requirements [67,68].However, Esser and May devised a classical algorithm that achieves a time complexity of O 2 0.645n while maintaining polynomial space complexity [69].This algorithm offers a compelling comparison to our proposed Grover implementations, as each algorithm is hardware efficient in its use of memory or qubits.With our Grover implementation requiring O 2 0.5n queries, a direct comparison would yield a speedup O 2 0.145n .
Finally, an interesting open question is whether one can design hardware-efficient quantum algorithms that exploit the problem structure of number partitioning.Answers to this question would build on recent work that combined quantum and classical methods to produce hybrid algorithms with exponential time, memory, and qubit trade-offs [5,[71][72][73][74].One avenue to explore is the use of our Grover search as a subroutine in a differencing algorithm, in which a pair of large weights w i , w j is replaced by their difference to reduce the size of the search space.Such differencing could be performed either classically (representing w i − w j by a single spin) or quantumly (representing w i − w j by an entangled state |01 + |10 of two spins).While classical differencing has the potential benefit of reducing the dynamic range of the weights, quantum differencing generalizes to initializing the system in a superposition state that reflects classically computed probabilities of finding certain pairs of spins on opposite sides of a perfect partition.

Appendix D: Numerical methods
The simulations of Grover's algorithm are performed numerically, by matrix multiplication according to Eq. 10.Each simulation for a specific problem size is performed on an ensemble of lists of randomly selected weights.To postselect on the existence of solutions, for each list of weights we first use the classical complete Karmarkar-Karp differencing algorithm [70] to search for solutions, and simulate the quantum algorithm only for instances with solutions.The number of problem instances in an ensemble, after postselection where applicable, ranges from 1000 to 5000 for all datasets except that used for Fig. 4(a.i), in which each probability distribution P (S z ) is determined from 5 × 10 4 instances.
To find a sufficiently narrow step width γ to reach a specified success probability P opt in Fig. 4(a.ii),we generate an ensemble of weights and numerically optimize γ using the Nelder-Mead algorithm to reach the specified value P opt .To find the optimal step width γ in the presence of decay [Fig.5 and Fig. 7(b)], we similarly optimize γ to minimize the median total number of Grover iterations using a gradient-descent algorithm.

Appendix E: Capture range and amplification
The interpretation of the step width γ as a capture range for S z values is illustrated in Fig. 4(a.i) of the main text, where we plot the amplification factor after T opt Grover iterations.Here, we additionally present an analytic derivation of the amplification factor after a single Grover iteration.Specifically, for a given spin configuration |x , we show that the amplification factor after the first Grover cycle is of the Lorentzian form with width γ set by the width of the phase step.While the amplitude A and offset B depend on the set of weights, we analytically derive their values averaged over instances of the weights to determine the amplification factor at S z = 0 as a function of step width.We first consider the combined effect of the generalized oracle and inversion about the average on a generic state In terms of phasors χ(x) = e iΦγ (x) and the average phasor χ = x χ(x)/N , the gain in probability of finding the system in state |x is then given by We now proceed to account for the specific functional form Φ γ (x) = 2 arctan(2S z /γ) + π of the oracle's phase response.Defining µ(x) ≡ 2S z (x)/γ as the weighted spin normalized by the step width, we have Furthermore, since for each spin configuration |x with weighted spin S z there exists a complementary spin configuration with weighted spin −S z , the average phasor χ is always real.Equation E5 then reduces to This result is of the Lorentzian form in Eq.E1, with amplitude A = 8χ and offset B = (1 − 2χ) 2 .The gain in the first Grover cycle for a solution state (µ = 0) is bounded above by G max = 9, which is achieved if χ = 1 and approached in the limit where the number of solutions is small and the step is narrow.
For illustration, we examine a single iteration of Grover's algorithm applied to number partitioning with n = 12 random weights of bit depth k = 12. Figure 10 shows the amplification factor G averaged over all spin configurations |x with the same value of the weighted spin S z , as a function of step width γ.Cuts at fixed γ are well fit by the Lorentzian form in Eq.E7 with µ = 2S z /γ, confirming that the step width γ sets the capture range for amplification.The peak amplification G 0 ≡ G(0) remains near its maximum possible value G max = 9 until the width γ grows to roughly σ Sz /G max , where σ Sz denotes the width of the initial S z distribution, which we plot for comparison in the bottom panel of Fig. 10.
The amplification G 0 of solution states depends to lowest order only on the ratio of γ to the width σ Sz ∝ √ n of the S z distribution.To calculate the dependence of G 0 on γ/ √ n from Eq. E7, we express χ in terms of the number of partitions g(µ) with a given value of the imbalance µ: Here, we have used the relation g(µ) = g(−µ) to eliminate the term that is odd in µ.Assuming a large number N 1 of spin configurations, we approximate the average multiplicity g(µ) over many instances of the weights using a normal distribution of standard deviation σ = w rms √ n/γ, where w rms ≡ w 2 i = 1/ √ 3 for weights chosen from a uniform distribution on (0, 1].In terms of p(µ)dµ ≈ g(µ) /N , we have where erfc is the complementary error function.
The average amplification over many instances of the weights is given in terms of χ by This bound is tight in the large-N limit, where the variance in χ over different instances of the weights is small.We plot the lower bound in Eq.E11 as the dashed red curve in Fig. 5(c).There, we denote the amplification as Q 1 ≡ G 0 to emphasize its equivalence to the quantum speedup for a single Grover cycle.We compare our model with the amplification calculated at n = k for n = 12, in each case averaging over 10 3 instances of the weights with postselection.We observe excellent agreement between the model and the simulation.

Appendix F: Scalable algorithm
In Sec.V, we outline a recursive version of our algorithm that allows for operating at a fixed resolution γ ∼ 2 −m of the oracle for arbitrary problem size.The essence of our approach is to consider only the m least significant bits of the weights for successive values = 1, 2, 3, . . . .These truncated weights suffice to amplify, in each layer of the algorithm, candidate solutions satisfying the condition mod(2 k S z , 2 m ) = 0.In the final layer of the algorithm, the fixed m-bit resolution of the oracle suffices to identify only true solutions satisfying S z = 0, thanks to the preamplification of a sparse distribution of S z values in prior layers.
In this appendix, we elaborate on the details of the scalable algorithm, including the encoding of the weights, the modular oracle, and the recursive implementation of inversion about the average.Finally, we derive the asymptotic scaling of the query complexity and present numerical simulations corroborating our analysis.

Encoding the weights
Key to our approach is the ability to dynamically change the mapping from weights w i to system-ancilla couplings J i between successive queries of the oracle.We define the following set of mappings from weights to system-ancilla couplings: with = 1, 2, . . .k/m.Here, we scale the couplings to a fixed maximum value J max as usual, but for a given value we use only the m least significant bits of the weights.(For > 1, we are keeping more bits than the oracle can actually resolve, to avoid subtleties of accounting for carry bits that can add up.It should never be necessary to program the weights with a resolution of more than m + log 2 (n) bits, but the higher precision assumed in Eq.F1 also does no harm.)

Grover amplification with modular oracle
In each layer of our algorithm, our objective is to amplify spin configurations satisfying the condition mod(2 k S z , 2 m ) = 0, i.e., spin configurations for which the m least significant bits of the imbalance S z are zero.
To check this condition a given value of , we need only to know the m least significant bits of each weight, so we use the couplings J i, defined in Eq.F1 for each layer .We further define representing the imbalance at layer using only the m least significant bits of the weights.We wish to design the oracle to produce a π phase shift if mod(2 m S z, , 2 m ) = 0, which is equivalent to the ancilla resonance energy shift being an integer multiple of J max .This modular oracle can be implemented in the central spin or central boson model by subjecting the ancilla to a multifrequency drive field, consisting of a comb with spacing J max .Since S z, ≤ n, there are ∼ n different possible values 2 m S z, that are equivalent to zero modulo 2 m , and correspondingly only approximately n drive frequencies are needed.Each tooth of the comb of drive fields has a spectral width γJ max , which we will choose to be independent of , with a value γ ∼ 2 −m 1.The separation of scales between the width and the spacing of the teeth ensures that the phase response of the oracle is well approximated as where mod(•, d, b) denotes the modulo with divisor d and offset b.In terms of the phase shift Φ at layer , we define the modular oracle U = exp(iΦ ).In the final layer of the algorithm, where m = k, we want to amplify only the partitions with S z = 0 without taking the modulus, so we apply our usual oracle U k/m = exp(iΦ) with resolution γ.

Recursive algorithm
The first layer of our algorithm consists simply of applying the modular oracle in alternation with the diffusion operator V = H n RH n , where H n is the n-qubit Hadamard and R is a multiqubit controlled phase gate (App.B).For an imperfect oracle, we apply the usual spin-echo sequence to produce a state where |ψ 0 = H n |0 ⊗n is the equal superposition of all spin configurations.We assume the number of amplification cycles T 1 to be even for notational simplicity, but Eq.F4 can also be generalized to allow an odd number of cycles.Since only a fraction 2 −m of the spin configurations satisfy the condition mod(2 m S z,1 , 2 m ) = 0, we expect to require approximately T ≈ (π/4)2 m/2 amplification cycles in the first layer = 1.Upon completion of this layer of the algorithm, the state |ψ 1 ≡ G 1 |ψ 0 is approximately an equal superposition of all spin configurations that are candidate solutions to the number partitioning problem based on the m least significant bits of S z .
Naively one might expect subsequent layers of our algorithm to be analogous to Eq. F4 with the replacement U 1 → U .However, an important subtlety is that the diffusion operator V must be modified so that at layer it rotates by π about the state |ψ −1 , i.e., In particular, it is important to rotate about |ψ −1as opposed to |ψ 0 -so that the amplitude of the solution states is inverted about the average amplitude in the sparse superposition of S z values produced in the preceding layer, while ignoring the near-zero amplitudes of the spin configurations that have already been suppressed.
To understand how to implement the generalized diffusion operator V , we first recall how our usual diffusion operator is constructed (Eqs.B1-B2).We can perform a π rotation about any state |ψ by a combination of (1) the operator R that rotates about the state |0 ≡ |0 ⊗n and (2) a unitary operator O that transforms |ψ to |0 .In terms of these ingredients, the rotation about |ψ is implemented by applying the compound operator O † RO.For the usual Grover's algorithm, the n-qubit Hadamard O = H n = O † is the operator that transforms |ψ to |0 and back.
To construct the diffusion operator V for any layer of our algorithm, we thus require an operator O that transforms the state |ψ −1 to state |0 .Conveniently, we know exactly how to perform this transformation for arbitrary -by applying Grover's algorithm all the way up to layer −1.If we define the Grover operator at level as Thus, the diffusion operator needed in layer of the algorithm is Note that Eq.F7 correctly reduces to V 1 = V for the first layer of our algorithm.

Query complexity
Due to the recursive nature of the algorithm, the query complexity grows exponentially with k and hence with n in the hard regime.This should not surprise us, since Grover's algorithm cannot produce an exponential speedup.The key performance metric, then, is the coefficient α in the exponent of the O(2 αn ) query complexity.
The query complexity is given by where T is the number of amplification cycles at layer and τ is the number of calls to the oracle required in each amplification cycle, including the queries involved in implementing the diffusion operator V for > 1.We expect to need T ≈ (π/4)2 m/2 amplification cycles at each layer except the final one, by the same argument given above for = 1.The final layer takes a factor of √ n more steps, but this factor will only introduce a subexponential correction to the query complexity so we can ignore it in the following analysis.The number of calls to the oracle in each amplification cycle of the th layer is based on Eq.F7.Put another way, we have Since the first layer requires only τ 1 = 1 call to the oracle per amplification cycle, for general we have as can readily be verified by induction.The total number of calls to the oracle given by Eq.F8 thus takes the form of a finite geometric series.Evaluating the geometric series yields where c = log 2 (π/2) ≈ 0.65.For n ≈ k, we obtain a query complexity O(2 αn ) with α = 0.5 + 0.65/m.We thus need m ≈ 5 bits of resolution to outperform the best scalable classical algorithm [69].
While the query complexity of the recursive algorithm is at best (i.e., for large m) the same as that of the standard algorithm, the recursive algorithm offers the benefit that the actual runtime in a scalable implementation Lines denote the 2 −0.5n scaling for the standard algorithm and 2 (0.5−0.65/m)n scaling for the recursive algorithm.
with fixed J max is directly proportional to the query complexity, and thus exhibits a Grover speedup.We thus eliminate the exponential overhead that is present in the simplest algorithm, and we have do so without compromising on hardware efficiency.

Simulation
A representative comparison of the standard algorithm and the recursive algorithm is shown in Fig. 11(a), where we simulate a single instance of number partitioning with n = k = 12.The instance is selected to have exactly one pair of solutions.The recursive algorithm was performed with m = 4 bits of resolution, and the amplification steps per layer (T 1 , T 2 , T 3 ) = (2, 3, 2) are chosen to maximize the probability at each layer.The resolution of the oracle is set to γ = 2 −m−1 for the recursive algorithm (orange squares), compared with γ = 2 −k for the standard algorithm (blue circles).While the recursive algorithm required approximately 3 times as many queries as the standard algorithm, the physical time per query at fixed J max is a factor of 2 k−m−1 = 2 7 times longer for the standard algorithm than for the recursive algorithm.Thus, in this example the recursive algorithm produces a significant reduction in runtime for a fixed maximum system-ancilla coupling.
To compare the time complexity of the algorithms, we simulate both algorithms for a range of problem sizes (n, k) with 1000 instances of weights [Fig.11(b,c)].For each layer of the recursive algorithm with m = 6 and γ = 2 −m−1 , the number of amplification cycles T is optimized to minimize the median total number of Grover oracle queries T total .We expect the total number of Grover queries T total in the recursive algorithm to fol-low the query complexity derived in the preceding section (App.F 4).For m = 6, the expected scaling is T total ∈ O(2 0.61n ), leading to a Q ∈ O(2 0.39n ) scaling of the speedup with system size, which is confirmed by the simulation in Fig. 11(b).While the speedup Q in query complexity for the recursive algorithm at finite bit depth m is slightly lower than that of the standard algorithm, the benefit of the recursive algorithm becomes apparent when we plot the speedup Qγ in physical runtime at fixed J max [Fig.11(c)].The growth in Qγ with system size in the recursive algorithm confirms its scalability.

Appendix G: Effects of decoherence
Two forms of decoherence that can limit the performance of our algorithm in realistic implementations are decay of the ancilla and decay of the system spins.In this section, we first provide an analytic estimate of the scaling of the quantum speedup with a generic interactionto-decay ratio in the standard algorithm (App.G 1).We then describe how we calculate the speedup in the numerical simulations of Fig. 5, focusing on decay of the ancilla, which is the dominant decay channel in the near-term experimental implementations proposed and analyzed in App.H.

Quantum speedup in presence of decay
Decay during the generalized Grover's oracle limits the maximum achievable quantum speedup.Here, we analytically derive the scaling of optimal quantum speedup with the interaction-to-decay ratio for the standard algorithm presented in Secs.II-III.The speedup is maximized at a step width γ opt set by a competition between the reduction in capture range at narrower step widths, which ideally increases the success probability, and the accompanying increase in decay.Figure 12 shows the optimal step width and the optimal number of Grover iterations T opt that produce the speedup shown in Fig. 5(b) of the main text.At small interaction-to-decay ratios, it is optimal to use a single amplification cycle with a wide phase step, while at larger interaction-to-decay ratios, the optimal step width is narrower, allowing for a performance closer to that of the ideal Grover's algorithm.
To estimate the optimal step width, we observe that the number of partitions N eff A (γ) within the capture range |S z | γ sets the behavior of the generalized Grover's algorithm in roughly the same way as the number of perfect partitions N A sets the behavior of the ideal Grover's algorithm.With increasing step width, in the absence of dissipation, the number of iterations required to maximize the success probability decreases as in analogy to Eq. 9. (In defining T * opt to maximize the success probability, we are choosing a slightly different definition from that of T opt in the main text.)For large step widths, where we capture a larger number N eff A of spin configurations than the actual number of solutions N A , we can approximate N eff A ≈ γN 6 πn from the theoretical distribution of total weights in the partition problem [87,88].Thus, in tems of the step width γ, we have The decrease in the optimal number of iterations T * opt with increasing step width comes at the cost of a reduced success probability P opt ≈ N A /N eff A , even before accounting for dissipation.Thus, employing a narrower step for a larger number of iterations T is preferable unless decay results in an appreciable reduction in P opt .To estimate the optimal number of Grover iterations at finite interaction-to-decay ratio ρ, we first determine the maximum number T C of iterations that can be performed with a given probability e −C of incurring no error.Here, C is a constant that we choose to optimize the speedup.The error rate per iteration is D/(ργ), where D is an orderunity factor that is derived in App.G 2 for the case of the first amplification step and, more generally, can be obtained from a fit to numerical data.We thus estimate the maximum number of iterations as T C ≈ Cργ/D.
We expect the optimum number of iterations in the presence of decay to be given by T * opt = T C for some order-unity value C. Combining the expression for T C and the relationship between T * opt and γ opt (Eq.G2), the optimal step width is then To estimate the speedup Q opt , we approximate P opt in the presence of dissipation as P opt ≈ e −C N A /N eff A .The speedup Q opt is then given by where P 0 = N A /N and we assume P opt 1 and P 0 1.Finally, collecting the expressions, we find The scaling of the optimal speedup as a function of interaction-to-decay ratio is given by Q opt ∼ ρ 1/3 .For high values of ρ, the optimal speedup will start to saturate to the quantum speedup of the ideal Grover's algorithm.This saturation occurs when the optimal step width becomes smaller than the smallest nonzero |S z | values, which for n = k is at γ opt ≈ √ n/N , with N = 2 n .Thus, the interaction-to-decay ratio where the speedup starts to saturate scales as ρ ∼ N 3/2 / √ n.This scaling exemplifies the fact that reaching the ultimate quantum speedup allowed by Grover's algorithm requires exponentially increasing the interaction-to-decay ratio with problem size.
The numerical results of the generalized Grover's algorithm with ancilla decay in Fig. 5(b) are well described by the model of Eq.G5 with constants C = 1/3 and D = 1.2.This equation is applicable in a region between 100 ρ N 3/2 / √ n.The upper limit of this regime of validity comes from the saturation of the speedup to the ideal Grover's algorithm limit, while the lower limit is reached when T * opt = 1.

Generalized oracle with ancilla decay
The effect of ancilla decoherence during the generalized Grover's oracle can be modeled as an imaginary term in the oracle phase shift (Eq.3).A particular system spin configuration |x will shift the ancilla excited state from resonance by ∆ x = (W * −W 1 )J max , where W 1 and W * are the actual and target weights in the subset sum problem as defined in Appendix A. To include the effect of ancilla decoherence, we make a substitution ∆ x − → ∆ x + iΓ a /2, where Γ a is the linewidth of the ancilla excited state [89].Thus, the oracle phase shift applied to the spin configuration |x is given by Here, µ = 2(W * − W 1 )/γ in an analogy to the definition in App.E and parameterizes the decay rate per query of the oracle, assuming the decay is dominated by the ancilla decay.
The effect of the oracle on the amplitudes of the spin states is given by χ(W 1 ) = exp[iΦ γ (W 1 )].Using Eq.G6 we derive This full form of the oracle including dissipation modifies the single-cycle amplification formula given in App.E. To see how, we rewrite χ in terms of its real and imaginary components, where we use the fact that both r and µ are real.The expression for the amplification in Eq.E5 now reduces to As before, χ is real and thus depends only the real components of χ, weighted by the density of states g(µ): Taking the continuum limit and using the probability distribution p(µ) derived in App.E yields the updated expectation value, From Eqs. (G10) and (G12) we compute the average amplification over many instances: The amplification in Eq.G13 is a lower bound both due to the substitution of χ 2 for χ 2 and due to the small additional probability, which we elsewhere neglected, that the spins end up in a solution state following a dissipation event.The inequality becomes exact in the limit of large N and low dissipation r 1.To estimate the reduction in amplification due to dissipation in this limit, we assume a phase step sufficiently narrow that χ ≈ 1. Expanding Eq.G13 to lowest order in r then yields Appendix H: Experimental implementations

Central spin model with Rydberg atoms
As a central spin system for encoding subset sum problems, we consider an array of atoms that can be optically coupled to Rydberg states to controllably turn on the interaction Hamiltonian H q (Eq.4).The implementation is illustrated in Fig. 13 The system is initialized with the ancilla in state |g and the system spins in state |ψ 0 .
To turn on the system-ancilla interactions, the system atoms are individually addressed by control fields that off-resonantly couple state |1 of the i th atom to the Rydberg state |R with Rabi frequency Ω i and detuning |∆ s | Ω i .In this regime, the lowest-order effect of the light on the atomic states is an ac Stark shift given by Ω 2 i /(4∆ s ).Thus we can write the interaction Hamiltonian as where and V R (r i ) is the Rydberg pair potential between the i th system atom and the ancilla.We choose V R and ∆ s to have opposite signs.If the ancilla is in the Rydberg state, the interaction energy V R then increases the detuning with weights w i = J i /J max , where J max is the largest of the system-ancilla couplings J i .The oracle is implemented by simultaneously turning on the couplings J i and attempting to drive a 2π pulse on the |g → |R transition of the ancilla.The ancilla is driven with a field of Rabi frequency Ω a (t), with the pulse shape chosen to ensure that the qubit ends up in its ground state irrespective of whether the pulse is resonant.This condition is satisfied for a pulse shape [47] Ω a (t) = 2π τ sech πt τ (H3) where τ sets the width of the oracle phase step.In practice, we must restrict the pulse to a finite window −t p /2 < t < t p /2, where a duration t p 3τ suffices to provide a smooth turn on.The detuning ∆ a of the ancilla's control field sets the target weight W * = ∆ a /J max in the subset sum problem (Eq.A1): for configurations of the system spins with weight W 1 = W * in state |1 , the ancilla undergoes a 2π rotation that imparts a geometric phase of π.
More generally, this protocol produces a unitary transformation where we set = 1, T denotes time ordering, and where I x = (I + + I − )/2.For the hyperbolic secant pulse in Eq.H3, we obtain a W 1 -dependent phase shift U R = e iΦγ where and the width of the phase step is given by γ = 2π/(J max τ ) [90].Two effects that can limit the performance of the Rydberg implementation are the finite lifetime 1/Γ R of the Rydberg state and residual interactions among the system spins.The residual interactions between the system spins are smaller than the system-ancilla couplings by a factor of order (Ω If necessary, these interactions can furthermore be cancelled by an echo procedure in which the control fields Ω i are applied again with the signs of ∆ s and V R reversed, the latter by tuning the electric field near a Förster resonance [91].We therefore neglect residual interactions in our analysis and focus on the limits set by Rydberg decay. To estimate the requirements for implementing Grover's algorithm while keeping the probability of Rydberg decay small, we define the maximum Ω max of the Rabi frequencies Ω i and the dressing amplitude = Ω max /(2 |∆ s |).Our perturbative analysis of the dressing assumes that 2 < 1/n, where n is the number of system spins.Let us furthermore assume that the most strongly weighted atom is sufficiently close to the ancilla that |V R | |∆ s |, such that its coupling is During the oracle pulse, the probability of decay for a system atom due to the coupling to the Rydberg state will be t p 2 Γ R .The worst-case decay probability of the system spins when each spin is in state |1 is 3πn 2 /ργ, based on the pulse time t p = 3π/γJ max .In addition, the error rate due to ancilla decay during the generalized oracle is approximately Γ R /J max γ.In the weak dressing limit n 2  1, the decay due to the ancilla dominates over the decay of the dressed system spins.
We now present concrete experimental parameters for implementing the central spin model with cesium atoms.Coupling to high-lying Rydberg states is beneficial as the lifetime scales as the cube of the principal quantum number.By coupling to the 80P 3/2 state, we can achieve Ω max ≈ 2π × 10 MHz with realistic laser parameters [42,92].The Rydberg interaction strength is given by V R (r) = −C 6 /r 6 , where C 6 ≈ 2π × 7000 GHz µm 6 for 80P 3/2 [93].For a typical distance between neighboring atoms in an optical tweezer array r 0 ≈ 4 µm, the interaction shift will be V R (r 0 ) ≈ 2π × 1.7 GHz.
The achievable interaction strength in the Rydberg implementation will depend on system size n, as the weak dressing condition 2 < 1/n puts an upper limit on J max < Ω max /(2 √ n).To give a particular example, for a system size n = 6, with n 2 = 0.1 and Ω max = 2π × 10 MHz, the interaction strength is J max ≈ 650 kHz for ∆ s ≈ 2π × 39 MHz.The interaction shift |V R (r 0 )| > |∆ s | is large enough to extinguish the light shift of the most strongly coupled atom as we assumed in the preceding analysis.For the state 80P 3/2 in cesium, Γ R ≈ 2π × 0.5 kHz, giving the interaction-to-decay ratio ρ ≈ 1200.

Central boson model with atoms in a cavity
As a central boson system for encoding subset sum problems, we consider n spins that are coupled to a cavity of linewidth κ.We require a dispersive atom-light interaction described by a Hamiltonian Here, J i = g 2 i /∆ i is the shift of the cavity resonance when the i th spin is flipped, in terms of the vacuum Rabi frequency g i and the detuning ∆ i Γ e of the cavity from resonance with a transition |1 → |e of linewidth Γ e [Fig.13(b)].
To implement the oracle, the cavity is driven by a weak, narrow-band coherent field |α of frequency ω = ω c + δ, where ω c is the resonance frequency of the bare cavity.The output and input modes b out = χb in (H9) are related by the cavity response function [94] where δ = δ − J max W 1 , assuming that cavity losses are negligible compared with transmission.The weighted sum W 1 is defined as in App.A using weights determined by the couplings of each spin to the cavity, w i = J i /J max .The choice of detuning of the drive field from bare cavity resonance δ sets the target weight W * = δ/J max for the subset sum problem.This can be tuned to specifically implement the partition problem (see App. A).
More generally, we can also account for a photon loss rate Γ a , including any absorption by the atoms, by letting δ = δ − J max W 1 + iΓ a /2. (H11) The magnitude and phase of the cavity response function χ determine, respectively, the probability |χ| 2 of successfully detecting the ancilla photon and the resulting oracle phase shift.On resonance, the magnitude of the response function is which yields a detection probability |χ| 2 ≈ 1 − 4Γ a /κ for small Γ a /κ.The phase shift is given by Φ(W 1 ) ≡ arg [χ] = 2 arctan(2δ /κ) + π. (H13) The phase Φ increases from 0 to 2π in a step of characteristic width κ, assuming low losses Γ a κ/2, as a function of the atom-dependent detuning between the drive and cavity resonance.We parameterize the step width by the dimensionless value γ = κ/J max .
To apply the oracle U γ , we initialize the system in a product state of the atoms, the vacuum field in the cavity, and a weak, narrow-band coherent state in the input mode: (H14) The coherent field leaks through the input mirror into the cavity mode, where the light and atoms interact according to Eq. H8, then leaks into the output mode b out .After a time t 1/(∆ω) 1/κ, where ∆ω is the bandwidth of the input field, the state evolves to The action of e iαχb † out displaces the vacuum state of the output mode |0 bout such that the detection of a single photon in the output mode heralds the state thus applying the oracle.
As an alternative to the coherent drive and heralding, an ancilla atom can be used as an intracavity singlephoton source.By coupling the ancilla to the cavity via a two-photon transition, with the first leg being a classical field, the cavity can be controllably excited from the vacuum to the single-photon state.The bosonic mode is thus reduced to two levels |0 c , |1 c that are coupled by the control field on the ancilla, so that we effectively recover a central spin model.The implementation of the oracle then proceeds much as in App.H 1, by driving a shaped 2π pulse that returns the ancilla atom to its initial state and the cavity to the vacuum state.The width τ of this pulse now controls the step width γ = 2π/(J max τ ), subject to the requirement that the pulse be short compared to the cavity lifetime.
We now proceed to estimate the cavity parameters required to observe Grover amplification [as in Eq. (E7)], as well as the attainable interaction-to-decay ratio.Amplifying the probability of solution states requires a phase step narrower than the initial probability distribution P (W 1 ), which in turn requires strong atom-light coupling.In particular, we will show that the single-atom cooperativity η = 4g 2 /(κΓ e ) sets an upper bound on the dispersive cavity shift J max achievable at low photon loss rate Γ a < κ, and hence a lower bound on the dimensionless step width γ = κ/J max in the driven cavity.
The lower bound on the step width γ arises because increasing the dispersive coupling J max comes at the cost of increased chance of atomic absorption.In the worst-case scenario where all n atoms are in state |1 in the scheme of Fig. in terms of the weights w i .While each weight can be tuned via either the atom-cavity coupling g i or the detuning ∆ i , the latter is preferable because it allows all atoms to benefit from the maximum cavity cooperativity.Thus we set g i ≡ g to be maximal for all atoms, reducing Eq.H17 to where w 2 rms represents the mean-squared value of weights and is given by w 2 rms = 1/3 for weights drawn from a uniform distribution w i ∈ (0, 1].Thus, keeping photon loss small (Γ a /κ 1) requires a step width γ n/η.Equation (H18) gives the decay parameter r = Γ a /κ necessary to determine the single-cycle amplification in Eq.G13.Notably, we can re-express the decay parameter in terms of the variance σ 2 = nw 2 rms /γ 2 of the normalized weighted spin µ = 2(W * − W 1 )/γ and the cooperativity: Achieving amplification requires σ 2 > 1, i.e., the probability distribution of W 1 should be broader than the width γ of the phase step.To achieve this condition at low loss r < 1, we require strong coupling η 1.This requirement is corroborated by plots of the amplification versus step width for various cooperativities in Fig. 5.The maximum achievable single-cycle amplification, shown in Fig. 5(c), becomes larger than 1 for η 50.This condition can be satisfied in state-of-the-art optical cavities, where the highest cooperativities achieved are η ∼ 10 2 [56, 95], at scalable system size n.
Achieving substantial quantum speedups requires operating in the ultrastrong coupling regime η n to reach step widths γ 1.A cooperativity as high as η = 4×10 8 has been achieved by coupling circular Rydberg atoms to a superconducting millimeter-wave cavity [59], with (g, κ, Γ) = 2π×(2.5×10 4, 1.4, 4.4) Hz.To access this high cooperativity, both spin states |0 , |1 must be Rydberg states with finite lifetime Γ −1 , and the dominant decay channel is then atomic decay rather than photon loss, resulting in an interaction-to-decay ratio ρ ≈ J max /(nΓ).The detunings ∆ i should be set to maximize the couplings, up to J max = g, where ≡ g/min(∆ i ) is limited by the requirement n 2 < 1 to avoid absorption of the photon.Fixing n 2 = 0.1 allows an interaction-to-decay ratio ρ ≈ 2 × 10 3 /n 3/2 for the parameters of Ref. [59].

FIG. 1 .
FIG. 1.(a) Sketch of Grover's algorithm, showing the amplitude of each basis state |x in the system state |ψ .One iteration consists of the oracle U marking the solution states (red) with a π phase shift, followed by inversion about the average V .(b) Number partitioning: a set of weighted spins is partitioned, if possible, into two sets of equal total weight.(c) Phase shift Φγ(Sz) applied by generalized oracle with step width γ.(d) The weights wi are encoded by couplings of system spins (red) to an ancilla (blue), which can be either (i) a central spin (e.g., Rydberg atom); or (ii) a bosonic mode (e.g., cavity).

FIG. 2 .
FIG. 2. Visualization of Grover's algorithm and generalized oracle.(a) Grover's algorithm with ideal oracle for N = 2 10 and NA = 1.Over repeated iterations (blue), the state |ψ0 (red) approaches the solution state |A .(b) Grover's algorithm with naive application of the generalized oracle for = 0.25.(c) The spin-echo sequence compensates for the imperfection of the oracle, allowing similar performance to the ideal case.

Figure 3 (FIG. 3 .
FIG. 3. Number partitioning with generalized oracle of step width γ = 2 −k .(a) Success probability PT for n = 8, k = 4, 8, 12 (blue squares, orange circles, and green diamonds).Shading indicates standard deviation over 5000 instances of the weights.(b.i) Optimal number of iterations Topt versus (n, k).(b.ii) Topt for n = k, grouped by number of solutions NA = 2, 4, 6 (red triangles, green circles, and yellow stars) and compared with asymptotic theory Topt ∝ √ N (dashed lines).Black squares show average over all instances.Dotted gray line indicates linear scaling Topt ∝ N .Blue diamonds show median speedup [Q] 0.5 for n = k, with error bars indicating interquartile range.(c) Probability Popt versus (n, k).Black line shows critical bit depth kc(n).
Figure 3(b.ii)shows the median speedup [Q] 0.5 , where [Q] q denotes the qth quantile over problem instances.We observe the expected scaling Q ∝ √ N of the speedup in query complexity.

ACKNOWLEDGMENTS
This work is supported by the ONR under Grant No. N00014-17-1-2279 and the AFOSR under Grant No. FA9550-20-1-0059.O. M. acknowledges support from the ARO under Grant No. W911NF-16-1-0490.J.A.H., M.S.-S., and A.S.-N.acknowledge support from the DOE Office of Science, Office of Basic Energy Sciences, under Grant No. DE-SC0019174.E.S.C. and A.P. were supported by the NSF under Grant No. PHY-1753021, the NSF GRFP (E.S.C.), and the NDSEG Fellowship (A.P.).We thank V. Vuletić and T. Zhang for helpful discussions.
Appendix A: Generalization to subset sum problem FIG.9.Speedup scaling over all (n, k), shown by plotting median total Grover iterations versus memoryless search trials required to reach P = 0.99 probability of success.Each point represents a particular (n, k), where n and k each take values over the range[3,16].Black and red lines denote linear and square root dependences, respectively.

FIG. 10 .
FIG.10.Top: Amplification factor G after one Grover iteration a single representative instance of weights at n=k=12.Fits with the Lorentzian model of Eq.E1 are shown as solid lines.Offset B is the sole free fit parameter, which is related to A by Eq.E7.Bottom: Initial Sz probability density distribution for n=k=12, averaged over 10 4 instances (yellow points with shading).Expected initial distribution, a Gaussian with σS z = 1, shown as a solid orange line.

FIG. 11 .
FIG. 11.Comparison of the standard (blue circles) and recursive (orange squares) algorithms.(a) Success probability P versus number of oracle queries T for a single instance of number partitioning at n = k = 12.Recursive algorithm is performed with m = 4 and γ = 2 −m−1 .The physical time per query (at fixed Jmax) is longer by a factor of 2 k−m−1 = 2 7 for the standard algorithm than for the recursive algorithm.(b) Median speedup [Q] 0.5 in query complexity, plotted versus n = k for the standard algorithm with γ = γc and the recursive algorithm with m = 6 and γ = 2 −m−1 .Lines denote the 2 0.5n scaling for the standard algorithm and 2 (0.5−0.65/m)n scaling for the recursive algorithm.(c) Speedup [Q] 0.5 γ in physical runtime at fixed Jmax, plotted versus n = k for the standard algorithm with γ = γc and the recursive algorithm with m = 6 and γ = 2 −m−1 .Lines denote the 2 −0.5n scaling for the standard algorithm and 2 (0.5−0.65/m)n scaling for the recursive algorithm.

10 1 10 2 10 3 10 4 10
FIG. 12.(a) Ratio of optimal step width γ to critical step width γc versus the interaction-to-decay ratio ρ for n = k = (4, 6, 8, 10) denoted by markers shaded from darkest to lightest.(b) Optimal number of Grover iterations Topt versus the interaction-to-decay ratio ρ for n = k = (4, 6, 8, 10) denoted by markers shaded from darkest to lightest.Dashed lines represent the number of iterations Topt at which the speedup is maximized in the ideal Grover's algorithm.
(a).The spins of the system atoms are encoded in two ground states |0 , |1 .The ancilla qubit is encoded using a ground state |g and a Rydberg state |R , in terms of which we define the spin raising operator I + = |R g| and lowering operator I − = |g R|.

FIG. 13 .
FIG. 13.(a) Central spin model realized by Rydbergdressed atoms (red) interacting with ancilla qubit encoded on a ground-to-Rydberg transition (blue).(b) Central boson model realized by driving one-sided cavity coupled to system spins and heralding on photodetection.
13(b), atomic absorption produces a photon loss rate Γ a = Γ e 10 1 10 2 10 3 10 4 10 5 10 6 ρ [2]IG.8.Comparison of speedups Q between two diffusion operator methods: Q versus the interaction-to-decay ratio ρ for the perfect diffusion operator (full lines) and the diffusion operator using the generalized oracle (dashed lines) for n = k = (4, 6, 8, 10), shaded from darkest to lightest.canbedecomposed into two n-qubit Hadamard transforms H n and a multiqubit controlled phase gate R[2].The operation H n is performed by applying a single-qubit Hadamard gate to each qubit.The operator