Implementing a Fast Unbounded Quantum Fanout Gate Using Power-Law Interactions

The standard circuit model for quantum computation presumes the ability to directly perform gates between arbitrary pairs of qubits, which is unlikely to be practical for large-scale experiments. Power-law interactions with strength decaying as $1/r^\alpha$ in the distance $r$ provide an experimentally realizable resource for information processing, whilst still retaining long-range connectivity. We leverage the power of these interactions to implement a fast quantum fanout gate with an arbitrary number of targets. Our implementation allows the quantum Fourier transform (QFT) and Shor's algorithm to be performed on a $D$-dimensional lattice in time logarithmic in the number of qubits for interactions with $\alpha \le D$. As a corollary, we show that power-law systems with $\alpha \le D$ are difficult to simulate classically even for short times, under a standard assumption that factoring is classically intractable. Complementarily, we develop a new technique to give a general lower bound, linear in the size of the system, on the time required to implement the QFT and the fanout gate in systems that are constrained by a linear light cone. This allows us to prove an asymptotically tighter lower bound for long-range systems than is possible with previously available techniques.

In the standard circuit model for quantum computation, the size of a quantum circuit is measured in terms of the number of gates it contains. In typical quantum systems, coherence times are a limitation, so low-depth ("shallow") quantum circuits are more desirable, particularly in the regime of noisy intermediate-scale quantum computers [1]. Various proposed models of quantum computation are equivalent up to polynomial overhead, making the definition of the complexity class BQP insensitive to the model of computation [2][3][4][5].
However, these models can differ in the precise complexity of operations. As a drastic example, suppose we are given access to a fast unbounded fanout gate represented by the map $|x\rangle|y_1\rangle|y_2\rangle\cdots \to |x\rangle|y_1 \oplus x\rangle|y_2 \oplus x\rangle\cdots$, where the $\oplus$ operator denotes bitwise XOR (bit $y_i$ is flipped if $x = 1$ and not flipped otherwise). This operation is a reversible analog of a gate that copies $x$ to registers $y_1, y_2, \dots$. By "unbounded," we mean that there is no limit on the number of bits that can be targeted by this operation.
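As a minimal illustration of the map above, the following Python sketch applies fanout to computational-basis bits only. The function name `fanout` and the sample inputs are illustrative assumptions, not part of the original work; on superpositions the gate acts linearly on these basis states.

```python
def fanout(x, ys):
    """Apply the fanout map to basis bits: every target y_i becomes y_i XOR x."""
    return x, [y ^ x for y in ys]

# Control bit 1 flips every target; control bit 0 leaves the targets alone.
flipped = fanout(1, [0, 1, 1, 0])
unchanged = fanout(0, [0, 1, 1, 0])

# Applying the map twice restores the input, so the gate is its own inverse.
x, ys = fanout(*fanout(1, [0, 1, 1, 0]))
```

The number of targets is simply the length of `ys`, which mirrors the "unbounded" character of the gate.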
The unbounded fanout gate makes it possible for constant-depth quantum circuits to perform a number of fundamental quantum arithmetic operations [5]. Furthermore, unbounded fanout can also reduce the quantum Fourier transform (QFT), a subroutine of a large class of quantum algorithms including, most famously, Shor's algorithm for integer factorization [6], to constant depth as well. In fact, it enables implementing the entirety of Shor's algorithm by constant-depth quantum circuits with access to a polynomial amount of classical pre- and post-processing [7].
While the unbounded fanout gate is clearly a powerful resource for quantum computation, its efficient implementation in physically realizable architectures has not been studied in great depth. In the standard circuit model of digital quantum computation, where one may apply single-qubit and two-qubit gates from a standard gate set on arbitrary nonoverlapping subsets of the qubits, a fanout gate on $n$ qubits can be implemented optimally in depth $\Theta(\log n)$ [8,9]. We also consider the Hamiltonian model, in which one may apply single-qubit and two-qubit Hamiltonian terms. In particular, in the Hamiltonian model with all-to-all unit-strength interactions, one can implement the fanout gate in constant time [10,11]. However, the assumption of being able to directly apply an interaction between two arbitrarily distant qubits does not hold in practice for large quantum computing architectures [12][13][14][15]. Mapping these circuits to restricted architectures inevitably leads to overheads and potentially even different asymptotic scaling. In $D$-dimensional nearest-neighbor architectures, for example, the unbounded fanout gate can only be implemented unitarily in depth $\Theta(n^{1/D})$ [16]. While there exist protocols that can implement the fanout gate in constant depth on these architectures [17], these proposals require intermediate measurements along with classical control, a resource that may be inaccessible in certain near-term experimental systems [18]. The overheads resulting from such physical restrictions could therefore limit the potential asymptotic speed-up from a fast quantum fanout.
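To make the logarithmic-depth claim for the circuit model concrete, here is a hedged Python sketch of one CNOT-only construction. It is an illustration and not necessarily the construction of Refs. [8,9]. It undoes a "doubling" tree on the targets, injects the control bit into one target, and then redoes the tree. The helper names `doubling_layers`, `fanout_circuit`, and `run` are assumptions made for this example, and the simulation tracks computational-basis bits only.

```python
from itertools import product

def doubling_layers(wires):
    """CNOT layers (control, target) that spread the bit on wires[0] over all wires.

    In each layer every wire reached so far controls one new wire, so the
    number of reached wires doubles and each layer acts on disjoint pairs.
    """
    layers, reached = [], 1
    while reached < len(wires):
        new = min(reached, len(wires) - reached)
        layers.append([(wires[c], wires[reached + c]) for c in range(new)])
        reached += new
    return layers

def fanout_circuit(n_targets):
    """Layers of CNOTs implementing fanout from wire 0 onto wires 1..n_targets."""
    spread = doubling_layers(list(range(1, n_targets + 1)))
    # Undo the spread, add x to wire 1, then redo the spread: x reaches every target.
    return spread[::-1] + [[(0, 1)]] + spread

def run(layers, bits):
    """Apply the CNOT layers to a list of classical bits."""
    bits = list(bits)
    for layer in layers:
        for control, target in layer:
            bits[target] ^= bits[control]
    return bits

# Exhaustive check on 4 targets, and the depth for 8 targets (2 * 3 + 1 layers).
ok = all(run(fanout_circuit(4), b) == [b[0]] + [y ^ b[0] for y in b[1:]]
         for b in product([0, 1], repeat=5))
depth = len(fanout_circuit(8))
```

The design choice here is that a CNOT network is linear over GF(2): conjugating a single CNOT by the doubling tree turns "add $x$ to one target" into "add $x$ to every target", at a depth of $2\lceil \log_2 n \rceil + 1$ layers for $n$ targets.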
Systems with power-law interactions present an opportunity for realizing these speed-ups. Specifically, for a lattice of qubits in $D$ dimensions, suppose the interaction strengths between pairs of qubits separated by a distance $r$ are weighted by a power-law decaying function $1/r^\alpha$. These long-range interactions are native to many experimental quantum systems and have attracted interest as potential resources for faster quantum information processing. Examples of long-range interactions include dipole-dipole and van der Waals interactions between Rydberg atoms [19,20], and dipole-dipole interactions between polar molecules [21] and between defect centers in diamond [20,22]. Previous works have explored the acceleration of quantum information processing using strong and tunable power-law interactions between Rydberg states [23][24][25][26][27][28][29], which can implement $k$-local gates that simultaneously control or target on the order of ten qubits. Those gates still have a finite spatial range and can therefore only give a constant-factor speed-up over nearest-neighbor architectures. Recently, Refs. [30][31][32] gave protocols that take advantage of power-law interactions to quickly transfer a quantum state across a lattice. As we will show, it is also possible to leverage the power of these interactions to implement quantum gates asymptotically faster than is possible with finite-range interactions [33].
In this Letter, we describe a method of implementing the unbounded fanout gate using engineered Hamiltonians with power-law interactions. As an application of this protocol, we show that simulating strongly long-range systems with $\alpha \le D$ for logarithmic time or longer is classically intractable, if factoring is classically hard. We also develop a new technique that allows us to prove the tightest known lower bounds for the time required to implement the QFT and unbounded fanout in general lattice architectures.
Protocol for fast fanout using long-range interactions.-We use a modified version of the state transfer protocol from Ref. [30] to perform a fanout gate on $n$ logical qubits in $O(\log n)$ time using long-range interactions with $\alpha = D$ and in $O(1)$ time for $\alpha < D$.
As an intermediate step, the state-transfer protocol "broadcasts" a single-qubit state into the corresponding Greenberger-Horne-Zeilinger (GHZ)-like state:

$$(\psi_0|0\rangle + \psi_1|1\rangle) \otimes |0\rangle^{\otimes (n-1)} \to \psi_0|0\rangle^{\otimes n} + \psi_1|1\rangle^{\otimes n}, \tag{1}$$

where $\psi_0, \psi_1 \in \mathbb{C}$ and $|\psi_0|^2 + |\psi_1|^2 = 1$. This long-range broadcast is achieved by performing a sequence of cascaded controlled-NOT (CNOT) gates, similar to the standard gate-based implementation of the unbounded fanout gate. The CNOT gate from qubit $i$ to qubit $j$ can be implemented by a Hamiltonian $H_{ij} = h_{ij}|1\rangle\langle 1|_i \otimes X_j$ acting for time $t = \pi/(2h_{ij})$, up to a local unitary [30]. Applying a Hamiltonian $H(t) = \sum_{ij} H_{ij}(t)$, which turns interactions between pairs of qubits on and off at different times, allows one to implement the broadcast in Eq. (1).
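The statement that the two-qubit Hamiltonian generates a CNOT "up to a local unitary" can be checked numerically. The sketch below is a small pure-Python verification under stated assumptions: the coupling value `h = 0.37` is arbitrary, the matrix exponential is a truncated Taylor series written for this example, and the local correction is taken to be the phase gate $\mathrm{diag}(1, i)$ on the control qubit, which is one valid choice rather than a convention fixed by the text.

```python
import math

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def expm(m, terms=40):
    """Matrix exponential via a truncated Taylor series (fine for small norms)."""
    size = len(m)
    result = [[complex(i == j) for j in range(size)] for i in range(size)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(size)] for i in range(size)]
    return result

h = 0.37                   # arbitrary coupling strength, chosen only for the demo
t = math.pi / (2 * h)      # evolution time quoted in the text

# H = h |1><1| (x) X in the basis |control, target> = |00>, |01>, |10>, |11>.
H = [[0, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, h],
     [0, 0, h, 0]]
U = expm([[-1j * t * v for v in row] for row in H])

# Assumed local correction: diag(1, i) on the control, identity on the target.
S_control = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1j, 0], [0, 0, 0, 1j]]
gate = matmul(S_control, U)

CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
max_error = max(abs(gate[i][j] - CNOT[i][j]) for i in range(4) for j in range(4))
```

On the control-on subspace the evolution is $e^{-i(\pi/2)X} = -iX$, so the extra phase $-i$ is removed by the single-qubit gate on the control and `max_error` is at the level of floating-point rounding.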
By using Hamiltonians with long-range interactions $h_{ij}$ satisfying $h_{ij} \le 1/r_{ij}^\alpha$, it is possible to implement the broadcast operation in sublinear time. For a system of $n$ qubits, this broadcast time depends on the power-law exponent $\alpha$ and the dimension of the system $D$ as follows [30]:

$$t_{\mathrm{GHZ}} = \begin{cases} O(1), & \alpha < D, \\ O(\log n), & \alpha = D, \\ \mathrm{poly}(n), & \alpha > D. \end{cases} \tag{2}$$

We term the broadcast time $t_{\mathrm{GHZ}}$, since it corresponds to the GHZ-state-construction time when $\psi_0 = \psi_1 = 1/\sqrt{2}$. Long-range broadcast is not the same as fanout because it requires that all intermediary qubits (besides the first qubit) be initialized in the $|0\rangle$ state. However, as we now show, it is possible to adapt the broadcast protocol to implement the fanout gate in time $t_{\mathrm{GHZ}}$ using $n$ ancillary qubits.
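Consider a system of $n$ qubits arranged on a $D$-dimensional lattice. Furthermore, assume there are $n$ ancillary qubits, each located adjacent to one of the $n$ original data qubits. Let us describe the qubits as $|d_1\rangle, |d_2\rangle, \dots, |d_n\rangle$ and $|a_1\rangle, |a_2\rangle, \dots, |a_n\rangle$ for data and ancilla, respectively. Suppose we want to perform fanout with $|d_1\rangle$ as control, and that all ancillae are guaranteed to be in state $|0\rangle$. Then the following sequence of operations (depicted graphically in Fig. 1) implements the fanout operation:

Algorithm 1.
1. Apply a local CNOT gate from $|d_1\rangle$ to $|a_1\rangle$.
2. Apply the long-range broadcast from $|a_1\rangle$ to the remaining ancillae $|a_2\rangle, \dots, |a_n\rangle$.
3. In parallel, apply CNOT gates from $|a_i\rangle$ to $|d_i\rangle$ for $i = 2, \dots, n$.
4. Reverse step 2.
5. Reverse step 1.

In addition to accomplishing fanout, this protocol returns the ancillary qubits to the $|0\rangle$ state. Modulo $O(n)$ short-range operations that can be done in parallel in one time step, the protocol requires time $2t_{\mathrm{GHZ}}$. Hence, it can implement the fanout gate in time that is constant for $\alpha < D$, logarithmic for $\alpha = D$, and polynomial for $\alpha > D$.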
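The logic of the ancilla-assisted protocol described above and in Fig. 1 can be traced on computational-basis states, where every step is a CNOT and hence a permutation of bit strings. The following Python sketch is an illustration under assumptions: the function names are hypothetical, the broadcast is modeled as an abstract cascade of CNOTs from the first ancilla (ignoring its timing and the underlying Hamiltonian), and the parallel CNOTs in step (c) are taken to act on data qubits $2, \dots, n$ so that the control is left unchanged.

```python
from itertools import product

def broadcast(anc):
    """CNOTs from anc[0] onto anc[1:]; copies anc[0] when the others start in 0."""
    for i in range(1, len(anc)):
        anc[i] ^= anc[0]

def fanout_with_ancillas(data):
    """Trace the protocol on classical bits; data[0] plays the role of d_1."""
    data = list(data)
    anc = [0] * len(data)            # ancillas a_1..a_n, all initialised to 0
    anc[0] ^= data[0]                # (a) CNOT d_1 -> a_1
    broadcast(anc)                   # (b) broadcast a_1 to the other ancillas
    for i in range(1, len(data)):    # (c) CNOT a_i -> d_i for i >= 2
        data[i] ^= anc[i]
    broadcast(anc)                   # reverse (b): the same CNOTs undo themselves
    anc[0] ^= data[0]                # reverse (a)
    return data, anc

# Exhaustive check on four data qubits: targets are XORed with the control,
# and every ancilla returns to 0.
all_correct = all(
    fanout_with_ancillas(d) == ([d[0]] + [y ^ d[0] for y in d[1:]], [0] * 4)
    for d in product([0, 1], repeat=4)
)
```

Because the protocol is a fixed sequence of CNOTs, correctness on all basis states carries over to arbitrary superpositions by linearity.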
We briefly comment on the constant-depth implementation of the QFT and Shor's algorithm using the unbounded fanout gate. An $n$-qubit QFT circuit can be performed with $O(n \log n)$ gates to $1/\mathrm{poly}(n)$ precision [34]. Using unbounded fanout, the circuit can be reduced to constant depth with $O(n \log n)$ ancillary qubits [5]. We note that including these ancillae in the lattice would not change the asymptotic scaling of our protocol for $\alpha \le D$, since $t_{\mathrm{GHZ}}$ is either $O(1)$ or $O(\log n)$ in this regime.
Intractability of classical simulation of strongly long-range systems.-As a corollary, the protocol shows that strongly long-range interacting systems with $\alpha \le D$ evolving for time logarithmic in $n$ or longer are difficult to simulate classically. By this we mean that to approximately sample from the time-evolved state to within constant total-variation-distance error $\varepsilon$, a classical computer requires time at least superpolynomial in the worst case [35]. The argument operates by a complexity-theoretic reduction from integer factoring, a problem that is assumed to be difficult for classical computers with the ability to use random bits (FACTORING $\notin$ BPP). The time required to implement the fanout gate using Algorithm 1 is $t_{\mathrm{FO}} = O(1)$ for $\alpha < D$ and $O(\log n)$ for $\alpha = D$. It is possible to implement Shor's order-finding algorithm in time $O(t_{\mathrm{FO}})$ using a small amount of classical pre-processing (polynomial in $n$) [5,36]. Using the ability to sample from the output of the order-finding algorithm to error $\varepsilon < 0.4 < 4/\pi^2$, classically efficient post-processing can output a factor of an $n$-bit integer with probability $\Omega(1)$ [6]. Therefore, if it were possible to efficiently sample from the output distribution in strongly long-range systems for evolution time $t = O(\log n)$, then it would be possible to factor $n$-bit integers efficiently as well. The best classical algorithm currently known for factoring an $n$-bit integer takes runtime $\exp[O(\sqrt{n \log n})]$ [37], and the problem is widely believed to be classically intractable. This stands in contrast to systems with finite-range interactions in 1D, for which efficient classical simulation is possible up to any time satisfying $t \le O(\log n)$ [38]. Under the complexity assumption mentioned above (FACTORING $\notin$ BPP), we have shown that this result is not fully generalizable to strongly long-range interacting systems.
Lower bounds on the time required to implement QFT and fanout.-As a way to benchmark our long-range protocol for fanout, we discuss circuit-depth lower bounds in general lattice systems. Recall that the protocol in Algorithm 1 can implement fanout, and by corollary the QFT, in time $t_{\mathrm{GHZ}}$, which scales as $O(\log n)$ for long-range systems with $\alpha = D$ and as $O(1)$ for $\alpha < D$. In this section, we show that such fast asymptotic runtimes cannot be achieved in architectures with strict locality constraints.
In Ref. [39], Maslov showed that a specific way of implementing the QFT requires $\Omega(n)$ depth on the 1D nearest-neighbor architecture, though this did not rule out other QFT implementations with smaller depth. Here, we devise a technique that yields a lower bound of $\Omega(n^{1/D})$ for the time required to perform a QFT in the Hamiltonian model. This result strengthens and generalizes Maslov's bound to higher dimensions and to the Hamiltonian model. In addition, we show that the same lower bound applies even to circuits that perform the QFT approximately.
Our lower bound holds for any lattice system with finite velocities of information spreading, which include short-range interactions (i.e., finite-ranged or exponentially decaying) and power-law interactions with α > 5/2 in one dimension or α > 2D + 1 for D > 1 [32,40,41].Combined with our results above, this implies that systems with strongly long-range interactions can implement the QFT and fanout asymptotically faster than more weakly interacting systems.
The intuitive idea behind our proof is that the QFT unitary can spread out operators in a certain precise sense, a task that can be bounded by the "Frobenius-norm light cone" of Ref. [31]. The fact that this light cone imposes a finite speed limit on information propagation in short-range interacting systems implies that the minimum time $t_2(r)$ required for operator spreading is proportional to the distance $r$ between qubits. This bounds the computation time for the QFT, denoted $t_{\mathrm{QFT}}$, from below by $\Omega(n^{1/D})$, from which the evolution-time (and hence circuit-depth) lower bound follows. The same Frobenius-norm bound also constrains the time required to implement the approximate QFT (AQFT).
We consider the $4^n$-dimensional vector space of $n$-qubit operators, for which the set of Pauli operators $\{I, X, Y, Z\}^{\otimes n}$ forms a basis. We quantify operator spreading outside a region of radius $r$ as follows. Taking an operator $|O)$ initially supported on site 1, we measure the weight of its time-evolved version, $|O(t))$, on sites at distance $r$ (and beyond) using a projection operator $Q_r$, which projects onto strings of Pauli operators that act nontrivially on at least one site at distance $r$ or greater. We measure the weight of this projected operator $|O_r) := Q_r|O(t))$ via the (squared) normalized Frobenius norm $\|O_r\|_F^2 := \mathrm{Tr}(O_r^\dagger O_r)/2^n$, which coincides with the Euclidean norm over the operator space, $(O_r|O_r)$ [42].
We show in the following Lemma that operators spread by the action of the QFT can have high weight on distant regions.

Lemma. Let $U_{\mathrm{QFT}}$ be the QFT operator on $n$ qubits arranged in $D$ dimensions such that the first and $n$th qubits are a distance $r = \Theta(n^{1/D})$ apart. Then $U_{\mathrm{QFT}}^\dagger Z_1 U_{\mathrm{QFT}} =: Z_1'$ is an operator with at least constant weight at distance $r$.
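Proof. We explicitly compute the weight of the operator $Z_1' := U_{\mathrm{QFT}}^\dagger Z_1 U_{\mathrm{QFT}}$ on site $n$. Define $\omega := e^{2\pi i/2^n}$. The QFT operation on $n$ qubits is defined as

$$U_{\mathrm{QFT}} = \frac{1}{\sqrt{2^n}} \sum_{x,y=0}^{2^n-1} \omega^{xy} |x\rangle\langle y|, \tag{3}$$

where we interpret the bit string $y_1, y_2, \dots, y_n$ as a binary representation of a number $y \in \{0, 1, \dots, 2^n - 1\}$ in the canonical ordering, i.e., $y = y_1 2^{n-1} + y_2 2^{n-2} + \cdots + y_n$. The inverse of the QFT is obtained simply by taking $\omega \to \omega^{-1}$. First, we compute

$$Z_1' = U_{\mathrm{QFT}}^\dagger Z_1 U_{\mathrm{QFT}} = \frac{1}{2^n} \sum_{x,y,z} (-1)^{y_1} \omega^{y(z-x)} |x\rangle\langle z|. \tag{4}$$

We divide the sum over $y$ into two cases, $y_1 = 0$ and $y_1 = 1$:

$$\sum_{y} (-1)^{y_1} \omega^{y(z-x)} = \sum_{y=0}^{2^{n-1}-1} \omega^{y(z-x)} - \sum_{y=2^{n-1}}^{2^n-1} \omega^{y(z-x)} = \left[1 - (-1)^{z-x}\right] \sum_{y=0}^{2^{n-1}-1} \omega^{y(z-x)},$$

where we used $\omega^{2^{n-1}} = -1$. We can compute these sums separately, giving

$$Z_1' = \frac{1}{2^n} \sum_{x \neq z} \frac{\left[1 - (-1)^{z-x}\right]^2}{1 - \omega^{z-x}} |x\rangle\langle z|. \tag{5}$$

The nonzero terms in the sum on the right-hand side of Eq. (5) occur when $x - z$ is odd, i.e., when $x_n - z_n = 1 \bmod 2$. Therefore, the only terms that remain are off-diagonal on qubit $n$ or, equivalently, contain only the $X_n$ or $Y_n$ Pauli operators. This implies that $Z_1'$ has all its weight on operators acting nontrivially at distance $r$; formally, $\|Q_r|Z_1')\| = \||Z_1')\| = 1$.

As a result of the Lemma, $t_{\mathrm{QFT}}$ follows the light cone defined by the normalized Frobenius norm, which is at least as stringent as the Lieb-Robinson light cone. This leads directly to the following theorem:

Theorem. For systems with finite-range or exponentially decaying interactions in $D$ dimensions, the time required to implement the QFT unitary is lower bounded by $t_{\mathrm{QFT}} = \Omega(r)$, where $r = \Theta(n^{1/D})$ is the distance between the first and $n$th qubits.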
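The Lemma can be spot-checked numerically for a small register. The Python sketch below is a hedged illustration, not part of the original proof. It assumes the convention $U_{\mathrm{QFT}} = 2^{-n/2}\sum_{x,y}\omega^{xy}|x\rangle\langle y|$ with qubit 1 as the most significant bit, picks $n = 3$ arbitrarily, and uses helper names (`msb`, `frob2`, `identity_part_on_last`) invented for this example. It computes the conjugated operator $U_{\mathrm{QFT}}^\dagger Z_1 U_{\mathrm{QFT}}$, removes the component that acts as the identity on qubit $n$, and measures the remaining normalized Frobenius weight.

```python
import cmath

n = 3                                # small register size, chosen for the demo
N = 2 ** n
omega = cmath.exp(2j * cmath.pi / N)

def msb(y):
    """Bit y_1 (most significant), the qubit on which Z_1 acts."""
    return (y >> (n - 1)) & 1

# Matrix elements <x| U^dagger Z_1 U |z> under the assumed QFT convention.
Zp = [[sum((-1) ** msb(y) * omega ** (y * (z - x)) for y in range(N)) / N
       for z in range(N)] for x in range(N)]

def frob2(op):
    """Squared normalized Frobenius norm Tr(O^dagger O) / 2^n."""
    return sum(abs(v) ** 2 for row in op for v in row) / N

def identity_part_on_last(op):
    """Component of op acting as the identity on qubit n: (Tr_n op / 2) (x) I."""
    half = [[(op[2 * a][2 * b] + op[2 * a + 1][2 * b + 1]) / 2
             for b in range(N // 2)] for a in range(N // 2)]
    return [[half[x >> 1][z >> 1] if (x & 1) == (z & 1) else 0
             for z in range(N)] for x in range(N)]

total_weight = frob2(Zp)
# Pauli strings are orthogonal, so the weight on strings acting nontrivially
# on qubit n is the total weight minus the weight of the identity component.
far_weight = total_weight - frob2(identity_part_on_last(Zp))

# Closed form under the same convention: 4 / (2^n (1 - omega^(z-x))) when
# x - z is odd, and zero otherwise.
closed_form_error = max(
    abs(Zp[x][z] - (4 / (N * (1 - omega ** (z - x))) if (x - z) % 2 else 0))
    for x in range(N) for z in range(N))
```

In this small example `far_weight` equals the total weight up to rounding, in line with the statement that the conjugated operator is entirely off-diagonal on qubit $n$.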
For systems with long-range interactions, the Lieb-Robinson light cone gives the bounds in Eq. (6) [32,43,44], and for one-dimensional long-range systems, the Frobenius light cone gives the tighter bounds in Eq. (7) [31]. We note that the lower bounds in the Theorem also apply to the fanout time, $t_{\mathrm{FO}}$, through the observation that fanout also performs operator spreading (using $X_1$ instead of $Z_1$). We emphasize that these bounds pertain to the Hamiltonian model, where commuting terms can be implemented simultaneously and state transfer could in theory be done in $o(1)$ time for sufficiently small $\alpha$.
We observe that the QFT can implement quantum state transfer as well. The goal of state transfer is to find a unitary $V$ such that $V|\psi\rangle \otimes |0\rangle^{\otimes (n-1)} = |0\rangle^{\otimes (n-1)} \otimes |\psi\rangle$ [30,45]. The unitary $V = H^{\otimes n} U_{\mathrm{QFT}}$ (where $H$ represents the single-qubit Hadamard gate) satisfies this definition of state transfer.
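The state-transfer property of $H^{\otimes n} U_{\mathrm{QFT}}$ can be verified on a small register. The sketch below is illustrative and rests on assumptions: $n = 3$, the convention $U_{\mathrm{QFT}}|y\rangle = 2^{-n/2}\sum_x \omega^{xy}|x\rangle$ with qubit 1 as the most significant bit, arbitrary amplitudes `psi0` and `psi1`, and helper names chosen for this example.

```python
import cmath
import math

n = 3                                # small register size, chosen for the demo
N = 2 ** n
omega = cmath.exp(2j * cmath.pi / N)

def qft(state):
    """Apply the QFT to a state vector under the assumed convention."""
    return [sum(omega ** (x * y) * state[y] for y in range(N)) / math.sqrt(N)
            for x in range(N)]

def hadamard_all(state):
    """Apply a Hadamard gate to every qubit (Walsh-Hadamard transform)."""
    state = list(state)
    step = 1
    while step < N:
        for i in range(0, N, 2 * step):
            for j in range(i, i + step):
                a, b = state[j], state[j + step]
                state[j] = (a + b) / math.sqrt(2)
                state[j + step] = (a - b) / math.sqrt(2)
        step *= 2
    return state

psi0, psi1 = 0.6, 0.8j               # arbitrary normalized amplitudes
start = [0] * N
start[0] = psi0                      # |0>|0...0> sits at index 0
start[N // 2] = psi1                 # |1>|0...0> sits at index 2^(n-1)
final = hadamard_all(qft(start))
# If the transfer works, final holds psi0 at index 0 and psi1 at index 1,
# which is the state |0...0>|psi> with |psi> on the last qubit.
```

The check works because the QFT maps $|b\rangle|0\rangle^{\otimes(n-1)}$ to $|+\rangle^{\otimes(n-1)} \otimes (|0\rangle + (-1)^b|1\rangle)/\sqrt{2}$, which the layer of Hadamard gates turns into $|0\rangle^{\otimes(n-1)}|b\rangle$.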
For the AQFT, the lower bound follows in a similar fashion. The circuit that implements the QFT approximately with error $\varepsilon$ can be represented by a unitary $\tilde{U}_{\mathrm{QFT}}$ such that [36]

$$\|U_{\mathrm{QFT}} - \tilde{U}_{\mathrm{QFT}}\| \le \varepsilon. \tag{8}$$

Consider the operator $\tilde{U}_{\mathrm{QFT}}^\dagger Z_1 \tilde{U}_{\mathrm{QFT}}$. We argue that this operator is spread out as well. From Eq. (8), it follows that

$$\|U^\dagger Z_1 U - \tilde{U}^\dagger Z_1 \tilde{U}\| \le \|(U^\dagger - \tilde{U}^\dagger) Z_1 U\| + \|\tilde{U}^\dagger Z_1 (U - \tilde{U})\| \le 2\varepsilon, \tag{9}$$

where $\|\cdot\|$ indicates the operator norm and we let $U := U_{\mathrm{QFT}}$ and $\tilde{U} := \tilde{U}_{\mathrm{QFT}}$ for simplicity. Since the normalized Frobenius norm is upper-bounded by the operator norm, we have

$$\|U^\dagger Z_1 U - \tilde{U}^\dagger Z_1 \tilde{U}\|_F \le 2\varepsilon. \tag{10}$$

Moving to the vector space of operators and applying the projector $Q_r$ onto operators with support beyond radius $r$ yields

$$\|Q_r|U^\dagger Z_1 U) - Q_r|\tilde{U}^\dagger Z_1 \tilde{U})\| \le 2\varepsilon, \tag{11}$$

where $\|\cdot\|$ is now the Euclidean norm and we used $\|Q_r\| = 1$. By the triangle inequality, we have

$$\|Q_r|\tilde{U}^\dagger Z_1 \tilde{U})\| \ge \|Q_r|U^\dagger Z_1 U)\| - 2\varepsilon = 1 - 2\varepsilon. \tag{12}$$

Equation (12) implies that the operator $Z_1$ after conjugating by the approximate QFT has large support on sites beyond distance $r$ as well, implying that the lower bounds in Eqs. (6) and (7) also hold for the approximate QFT.

Conclusions and Outlook.-In summary, we have developed a fast protocol for the unbounded fanout gate using power-law interactions. For $\alpha \le D$, the protocol can perform the gate asymptotically faster than is possible with short-range interactions. In particular, for experimentally realizable dipole-dipole interactions with $\alpha = 3$, it allows the quantum Fourier transform and Shor's algorithm to be performed in logarithmic time in three dimensions. As a corollary, we showed that classical simulation of strongly long-range systems with $\alpha \le D$ for time $t = O(\log n)$ is at least as difficult as integer factorization, which is believed to be intractable in polynomial time. Currently, the question of whether the fanout protocol is optimal remains open. The best lower bound gives $\Omega(n^{\alpha/D-1} \log n)$ for $\alpha < D$ and $\Omega(1)$ for $\alpha = D$ [46]. We conjecture that the broadcast time $t_{\mathrm{GHZ}} = O(\log n)$ is indeed the tightest that can be achieved for $\alpha = D$.
In addition, we gave a general $\Omega(n^{1/D})$ lower bound on the time to implement fanout, as well as the exact or approximate QFT, for all systems constrained by a linear light cone. In doing so, we used the state-of-the-art Frobenius bound from [31], which has been shown to be tighter than the Lieb-Robinson bound for certain long-range interacting systems in one dimension. For higher dimensions, the conjectured critical value of $\alpha$ above which a linear light cone exists is $3D/2 + 1$. If this generalization of the Frobenius bound were to hold, it would immediately tighten our lower bounds on the QFT and fanout. Among other applications, this would imply the impossibility of implementing fanout in $o(n)$ time in cold-atom systems with van der Waals interactions ($\alpha = 6$ in $D = 3$ dimensions). We also note the room for improvement in the catalogue of Lieb-Robinson bounds in Eq. (6), especially for the power-law light cone for $\alpha \in (2D, 2D + 1]$. The range of validity likely extends below $\alpha = 2D$, and the exponent is suboptimal: at $\alpha = 2D + 1$, it is still a factor of $1/(D + 1)$ from giving a linear light cone.
As a final remark, we have derived our lower bounds on $t_{\mathrm{QFT}}$ under the assumption that the first and last qubits of the QFT are separated by a distance of $r = \Theta(n^{1/D})$. However, other mappings of computational qubits to lattice qubits could potentially lead to faster implementations. For example, consider the mapping onto a one-dimensional chain of qubits wherein the second half of the chain is interleaved in reverse order with the first half [47]. Applying the QFT to a product state in this layout results in a state with two-qubit correlations that decay exponentially in the distance between the qubits. In this case, our lower bound techniques cannot rule out the possibility of $t_{\mathrm{QFT}} = o(n)$ for short-range interacting Hamiltonians. This suggests that $t_{\mathrm{QFT}}$ could depend strongly on qubit placement. Given that the QFT is typically used as a subroutine for more complex algorithms, it may not always be possible to reassign qubits without incurring costs elsewhere in the circuit. Still, it would be interesting to investigate whether careful qubit placement could yield a faster QFT.

FIG. 1. A protocol for a fast unbounded quantum fanout gate using long-range interactions, depicted here for a 1D lattice. The layout consists of a chain of data qubits, along with their adjacent ancillary qubits that are initialized to $|0\rangle$. (a) The first step is a local controlled-NOT (CNOT) gate from $|d_1\rangle$ to $|a_1\rangle$. (b) The long-range "broadcast" from $|a_1\rangle$ to the rest of the ancillary qubits $|a_i\rangle$ creates the GHZ-like state in Eq. (1) for the ancillary qubits together with the first data qubit. (c) We apply CNOT gates from ancillary qubit $|a_i\rangle$ to the data qubit $|d_i\rangle$, which can be done in parallel. After this step, we reverse process (b) and process (a) to return the ancillary qubits to $|0\rangle$ (not redrawn here).