Preparing Bethe Ansatz Eigenstates on a Quantum Computer

Several quantum many-body models in one dimension possess exact solutions via the Bethe ansatz method, which has been highly successful for understanding their behavior. Nevertheless, there remain physical properties of such models for which analytic results are unavailable, and which are also not well-described by approximate numerical methods. Preparing Bethe ansatz eigenstates directly on a quantum computer would allow straightforward extraction of these quantities via measurement. We present a quantum algorithm for preparing Bethe ansatz eigenstates of the spin-1/2 XXZ spin chain that correspond to real-valued solutions of the Bethe equations. The algorithm is polynomial in the number of T gates and circuit depth, with modest constant prefactors. Although the algorithm is probabilistic, with a success rate that decreases with increasing eigenstate energy, we employ amplitude amplification to boost the success probability. The resource requirements for our approach are lower than other state-of-the-art quantum simulation algorithms for small error-corrected devices, and thus may offer an alternative and computationally less-demanding demonstration of quantum advantage for physically relevant problems.

Quantum computers hold the promise of transformative applications in a variety of fields including cryptanalysis [1], quantum chemistry [2, 3], materials science [4,5], and potentially combinatorial optimization [6,7]. To realize the full potential of quantum computing, large-scale, fault-tolerant devices will ultimately be necessary. As these do not yet exist, much recent work has studied possible near-term applications in the present era of noisy, intermediate-scale quantum computers (NISQ) [8,9]. In this context, a key question concerns the demonstration of 'quantum advantage' -that is, the ability to perform computations that can not be done efficienly with classical methods. Recently, quantum advantage was shown for a superconducting processor sampling random quantum circuits [10] and photonic-based Gaussian boson sampling [11]. Although these are important achievements, the specific tasks performed were not closely related to the practical applications mentioned above, but were essentially designed for the purpose of demonstrating advantage. Thus, the realization of quantum advantage for a problem of practical interest remains open.
A question of increasing interest is what applications become feasible with small-scale error-corrected devices, i.e., in the intermediate era between NISQ and faulttolerant quantum computers with many logical qubits and a high clock speed for non-Clifford gates. Recent estimates suggest that algorithms that provide only a quadratic speedup over classical methods may have difficulty achieving quantum advantage on problem sizes accessible with small devices [12][13][14]. The simulation of quantum systems, on the other hand, can yield exponential improvement over conventional approaches. These still require formidable resources, despite recent algorithmic advances [15][16][17][18]. This motivates the search for physically-interesting problems and algorithms that can lead to quantum advantage with fewer resources in the near future.
We propose the study of Bethe ansatz (BA) states on a quantum computer as a computationally less-demanding route to the demonstration of quantum advantage for problems relevant to physics, including quantum magnetism [19,20], ultracold atoms [21], and unconventional superconductivity [22]. BA methods allow for deep insight into the static and dynamic properties of these many-body systems, and are able to explore not only ground states, but also interactions between complex collective excitations, such as magnons, and the response to quantum quench experiments [23]. More specifically, the BA technique yields exact solutions to a class of one-dimensional quantum many-body models, including the spin-1/2 Heisenberg and Hubbard models, among others [24][25][26][27]. The resulting wave functions depend on algebraic equations that can be efficiently solved classically. The exponential growth of the Hilbert space with the system size L has historically limited the direct computational studies of the eigenstates to small systems. Instead, various mathematical techniques have been extensively developed to access physical quantities in the thermodynamic limit L → ∞, bypassing the calculation of the wave function itself. While many different quantities can be determined, the difficulty of their calculation varies widely. In particular, arbitrary-range and higher-order correlation functions have been very challenging to access, and remain an active area of research [28][29][30]. Quantum computers, however, can compute such correlation functions [5,31] straightforwardly, thus suggesting the possibility of quantum advantage for this task. The importance of higher-order correlation functions for strongly correlated systems has recently been emphasized [32]. Calculating such observables using a quantum computer in turn hinges on the possibility of efficiently preparing the Bethe ansatz states.
To this end, we demonstrate an efficient quantum algorithm that can prepare a subset of the Bethe ansatz eigenstates of the one-dimensional XXZ chain, a model which is fundamental to the study of quantum mag-netism. Our algorithm uses the so-called coordinate Bethe ansatz method, and has polynomial scaling in the circuit depth and T-gate complexity, along with low constant prefactors. As we show with explicit gate counts for the corresponding circuits, the approach scales to large enough systems for the calculation of classically inaccessible quantities in near-term error-corrected devices. While the algorithm we present is probabilistic, we also show that amplitude amplification can be used to increase the success rate [33].
Algorithms have been previously given (and also implemented) that prepare exact eigenstates of quantum many-body models [34][35][36][37]. However, these were largely limited to cases that are equivalent to non-interacting fermions, for instance, under the Jordan-Wigner mapping. In contrast, the XXZ model corresponds to an interacting fermionic problem, which is computationally much more difficult. We note that the possibility of constructing circuits to diagonalize Bethe ansatz-solvable models and measure challenging correlation functions was previously suggested, though without an indication of how this could be done [34].
The importance of Bethe ansatz-solvable models for benchmarking NISQ devices has been previously recognized [37][38][39], as they provide exact values for quantities (such as the energy) to compare against the results of noisy quantum computations. On the other hand, the direct preparation of Bethe ansatz states has been relatively unexplored. This question was recently studied in Ref. [40], which considered treating Bethe ansatz states variationally (using the algebraic Bethe ansatz) and concluded that the approach was not scalable. Furthermore, that work did not make a connection to the possibility of quantum advantage. Apart from direct preparation of Bethe ansatz eigenstates, other works have considered variational approaches using generic ansatzes [38,39,[41][42][43]. The comparison of the computational complexity of these methods to that of the direct construction is an interesting question for future studies, as are probabilistic algorithms for preparing other strongly-correlated states [44].
The paper is organized as follows. Section I introduces the XXZ model and the elements of the Bethe ansatz solution needed for the construction of the algorithm. Section II describes the Bethe ansatz state preparation algorithm. Section III presents numerical results that validate the method and studies its success probability. This section also includes resource estimates for classically intractable problem sizes. Section IV describes the amplitude amplification procedure for our algorithm, and presents numerical calculations that confirm its success. Section V compares our algorithm with conceptually simpler but less efficient approaches to the same task, explicitly verifying the enormous speedup of our method. Section VI argues that quantum advantage can be achieved with Bethe state preparation by comparison with classical computational methods, and presents additional applications of the algorithm. Finally, we conclude in Sec-tion VII.

I. MODEL AND SOLUTION
We consider the one-dimensional spin-1/2 XXZ chain on L sites with periodic boundary conditions, whose Hamiltonian is given by with S α L ≡ S α 0 (α = x, y, z). Here S α j are the spin operators with eigenvalues ±1/2. The exact solution of this model via the Bethe ansatz method was presented in Ref. [45], and many introductions to the problem (in both the coordinate and algebraic formulations) exist [46][47][48][49][50]. We follow the account of the Bethe ansatz method given in Ref. [48]. The eigenstates of the above Hamiltonian are given by where x 1 , . . . , x M label the positions of the M down spins in the chain (the Hamiltonian conserves the z component of the total spin, S z tot = i S z i ) and the momenta k i label the different states. The wave function of Eq. (2) gives the amplitude for the M down spins to occur on the sites x j = 0, . . . , L−1. The summation here is over the M ! permutations of the down spin sites. These permutations arise from the fact that, within Bethe ansatz models, scattering processes exchange momenta between particles, but do not alter their magnitudes. The coefficients A P are related by A P A P = − 1 + e i(k P l +k P l ) − 2∆e ik P l 1 + e i(k P l +k P l ) − 2∆e ik P l ≡ −e −iΘ(k P l ,k P l ) . ( To fix the coefficients, we take A I = 1, where I is the identity permutation. Here ∆ = J z /J xy is the anisotropy in the interactions, and P and P are permutations that differ by a single transposition between adjacent elements, P (l + 1) = P (l) and P (l) = P (l + 1). The momenta k i are constrained by the quantization conditions, with I i an integer (half-integer) for M odd (even). Physically, these constraints on k i arise from imposing periodic boundary conditions on the model. Eqs. (3) and (4) are a set of algebraic equations (the Bethe equations) for the quantum numbers {k i }. In general, these equations admit complex solutions, but for the special case when all {k i } are real, Θ(k i , k j ) is also real and is given by . (5) In the spirit of hybrid quantum-classical algorithms, we solve the Bethe equations classically to obtain the momenta {k i } and phases Θ(k i , k j ). These values are then used as input to our quantum algorithm for generating the corresponding eigenstate. Our algorithm allows for the preparation of Bethe ansatz eigenstates for which the {k i } are real, such that A P and e ik P j xj amount to complex phases applied to the second-quantized basis states of the system.

II. BETHE ANSATZ STATE PREPARATION ALGORITHM
The quantum algorithm for preparing a Bethe ansatz state consists of several steps, and is summarized in Algorithm 1. The general structure of our approach is based on the linear combination of unitaries (LCU) method (which also finds application in the Taylor series approach to Hamiltonian simulation) [51,52]. However, a key difference is that our algorithm aims to generate specific quantum states starting from a particular initial state, rather than compiling a generic unitary evolution operator. In addition to the L qubits representing the system, a register of M 2 ancilla qubits are used to label the different permutation terms in Eq. (2). By preparing a superposition of the allowed label values on these ancillas, using these to apply controlled operations on the system, and finally disentangling the label and system registers, we perform the summation over all permutations present in Eq. (2). This process is facilitated by introducing a second ancillary register of M qubits that we call the "faucet register", along with one additional ancilla work qubit. Thus, the algorithm requires a total of M 2 + M + 1 ancilla qubits.
Algorithm 1 Bethe state preparation 1: Prepare the Dicke state |DL,M on the system qubits 2: Create permutation labels while applying pieces of AP 3: Apply e ik P j x j using the "faucet" method 4: Reverse permutation label (without phases) 5: Measure permutation label, with success on |00 · · · 0 The algorithm begins by preparing the Dicke state on L sites with M down spins, |D L,M . Relabeling |↑ ≡ |0 , |↓ ≡ |1 , |D L,M is the equal superposition (that is, without relative phases) of all basis states on L qubits with Hamming weight M . This state forms the underlying "canvas" on which the phases in Eq. (2) are applied. Dicke state preparation can be accomplished using the recent deterministic algorithm of Ref. [53], for which the gate count was improved in Ref. [54]. This algorithm uses an inductive method to prepare smaller Dicke states which are subsequently combined to yield the desired one. We have used this algorithm in our explicit circuit constructions, though any other deterministic method of preparing |D L,M would also work.
As discussed above, the amplitudes that must be applied to |D L,M to generate a Bethe ansatz state depend on the permutations {P } of M objects. We use the permutation label register to create the different permutations and their associated phases A P . Naively, one could use an integer encoded in a binary representation to label each of the permutations. The difficulty with this approach is that the number of permutations is M !, so that imprinting the phases A P and e ik P j xj onto |D L,M would require combinatorially many operations. This leads to circuit depths and complexities that are superexponential in M , quickly becoming unfeasible as M grows (we explore a concrete realization of this approach in Section V). To overcome this fundamental limitation of this method and design an efficient algorithm, we introduce a conceptually distinct approach for labeling the permutations. Rather than assigning an arbitrary number to a given permutation, we implement its explicit action on the string consisting of the numbers 1, . . . , M . As described below, this allows for an efficient generation of the permutation labels, while also generating the distinct A P simultaneously.
The permutation label register consists of M subregisters, each of which can store an integer value k ∈ {1, . . . , M }. To represent a valid permutation, the subregisters must contain distinct values (for instance, |213 is valid whereas |233 is not). We use a one-hot encoding such that each subregister consists of M qubits, and the number k is represented by a 1 on the kth qubit and 0s on the rest. Thus, for M = 3 the allowed states on each subregister are |1 ≡ |001 , |2 ≡ |010 , and |3 ≡ |100 . This one-hot encoding requires M 2 qubits to represent the complete label. The use of the one-hot encoding facilitates a trade-off between time and space resources [15,55,56] by reducing the number of controls required to implement the necessary phase gates.
The goal of step 2 of Algorithm 1 is to create the state 1 √ M ! P A P |P on the permutation label register. The phases A P are kicked back onto the system qubits, while the |P are used to apply the conditional gates needed in step 3, as explained below. For clarity, we first describe the construction of the equal superposition of all permutation labels. We then show how to slightly modify this procedure to simultaneously generate the phases A P for all M ! permutations. We use an iterative method to construct the permutation label state starting from the vacuum state |00 . . . 0 on M 2 qubits. The complete label superposition state is built up sequentially from the first (rightmost) subregister to the last (leftmost) using a series of exchange-type gates. We describe the method inductively as follows. The zeroth sublabel is prepared by setting the zeroth qubit of the zeroth subregister to 1 (in the following, the index k is enumerated starting from 0). Assume the kth sublabel (i.e., an equal super-position of permutations of integers 1 through k + 1) has been constructed on the k + 1 rightmost subregisters. Set the (k + 1)th qubit of the (k + 1)th subregister to 1, thus introducing the next integer value to be included in the permutation label state. Perform the exchange-type aswap gate [57,58], between the (k + 1)th qubits of subregisters k + 1 and k, with θ = arccos(1/ √ k + 2), φ = 0. This generates a superposition state consisting of two sets of terms: those in which the 1 remains in the (k + 1)th subregister and those in which it is transferred to the kth. In the latter case, the (k + 1)th subregister now contains all 0s, while the kth has two qubits with 1, which is not valid. This is fixed by applying controlled-swap gates between all qubits l < k + 1 in subregisters k and k + 1, controlled on the state of qubit k + 1 in subregister k. Taken together, these operations produce a partial swap between subregisters k + 1 and k, where |i j is the one-hot encoded state for i on subregister j, and m < k by construction. One repeats this partial swapping process, now between subregisters k and k − 1, then between k−1 and k−2, and so on, until the last register has been swapped. By implementing the inductive process up to the (M − 1)th subregister, the complete equally-weighted superposition of permutation labels is formed. As an example, for M = 3 the above algorithm generates the following sequence of state transformations: The explicit circuit for M = 3 is shown in Fig. 1, where the gates inside the red dashed rectangles are excluded at this point. Furthermore, gates acting on distinct qubits have been pushed to the left, thereby reducing the circuit depth. This produces a different sequence of intermediate states than above, but the final state is the same. A slight modification of the above procedure allows one to simultaneously apply the phases A P to the appropriate terms in the superposition. After each partial subregister swap, one applies a controlled phase gate with angle Θ(k i , k j ) + π, where i, j are the integer values that have been swapped, and where the additional phase π implements the signature of the permutation (red dashed rectangles in Fig. 1). Suppose, as in Eq. (7), that the value (k + 1) is swapped to the right. Then the (k + 1)th qubit of the right subregister can serve as the target qubit in the required controlled phase. Since the left subregister can store any value m < k + 1, a separate controlled phase is used for each possibility, where the control bits are given by the values m. This leads to a total of M 3 /3 − M 2 /2 + M/6 controlled phase gates for this part of the algorithm.
By the end of the construction, the phases A P have been applied to the corresponding permutation label state, having been successively built-up from the elementary transpositions of which they are composed. The explicit circuit for M = 3 is displayed in Fig. 1, where now the gates in the red dashed rectangles are included, in order to produce the phases A P . At this point in the construction, the total state of the physical system and the permutation label is where |· · · p is a state of the permutation label qubits and |· · · s is a state of the system qubits.
In the next step, one applies the position-dependent phase factors e ik P j xj to the relevant basis states on the system qubits (step 3 of Algorithm 1). To do this, we introduce an efficient method that acts on all the L M basis states in |D L,M simultaneously, yielding an enormous speedup over classical approaches. The technique, which we call the "faucet" method, is based on the observation that the positions x j take integer values x j = 0, . . . , L−1, so that the total phase e ik P j xj can be produced by x j repetitions of the phase e ik P j .
For this part of the algorithm, we use the M additional ancilla qubits comprising the faucet register. Each of these new qubits is initialized to |1 . In the outer loop of the faucet subroutine, one traverses the register of the system qubits site by site from x = 0 to x = L − 1. At each site, if it is occupied by a down spin (i.e. the bit is 1), one turns off the next faucet ancilla qubit , |1 → |0 . This is achieved through a sequence of multi-controlled X gates, which are controlled on the previous ancilla being in the state |0 and the next one being in the state |1 , along with the additional control that the current system site is a |1 . Since the meaning of the "next faucet ancilla" at a given site depends on the bitstring, one must generically apply a multi-controlled X gate for each ancilla at every step. We note that one can decrease the number of gates for sites near the edges of the chain. For instance, at site 2 at most 2 faucets could have been turned off, so that the later ones do not need to be checked. Phases e ik P j are applied for each system qubit while traversing the bitstring. When a '1' is encountered, the next faucet in the list is turned off, so that no more phases with the given kP j value are applied.
Next, the phases e ik P j (j = 1, . . . , M ) are applied to the system qubits, each being controlled on the state of one of the faucet ancillas. For the jth ancilla, this gate is also controlled on the state of the permutation label subregister j (since the value of k P j is permutationdependent). By the end of the system bitstring, all the faucet register qubits are in state |0 . Thus, the subroutine can be compared to a set of M running faucets (that correspond to applying the phases e ik P j ), which are turned off at the appropriate times (upon encountering a '1' in the traversal of the system register). This analogy is illustrated in Fig. 2. In total, this step requires M 2 L doubly-controlled phase gates and a number of Toffoli gates that scales like ∼ M L (as mentioned above, some gates can be skipped near the edge of the system). In our numerical implementation of the method, we introduce additional work qubits to facilitate the construction of these gates using chains of Toffolis [59].
After the relevant phases have been applied, it remains to disentangle the system from the permutation label (the entanglement having been generated during the faucet method, since the phases there are permutation dependent). This is accomplished by applying the inverse of the circuit that generates the permutation label superposition, without the additional controlled-phase gates that were used to produce the A P phases. The fact that the phases A P and e ik P j xj have been applied to the initial Dicke state implies that the permutation label reversal will not completely disentangle the system qubits from the permutation label register. However, it turns out that the |00 . . . 0 component of the permutation state corresponds precisely with the occurrence of the target Bethe ansatz state on the system qubits. That is, the full state vector takes the form where |ψ B s is the normalized target Bethe ansatz state on the system qubits, |φ j is a junk state with (|00 . . . 0 00 . . . 0| p ⊗ 1 s )|φ j = 0. Thus, by measuring the permutation label qubits, the target Bethe ansatz state is successfully prepared on the outcome |00 . . . 0 . This result is essentially that obtained from LCU methods [52], though here we have the additional construction of A P during the label preparation step,which is not present in the standard LCU. To illustrate the full algorithm, the complete circuit to construct a BA eigenstate with L = 4, M = 2 is given in Appendix A. As discussed in our numerical simulations below, the success probability |α| 2 depends on the system parameters, and also varies between different eigenstates. Thus, in general one must repeat the procedure multiple times to obtain the correct state, which can then be used to calculate physical quantities or in other applications, as we discuss in Section VI. We also show in Section IV that amplitude amplification can be used to boost the success rate, thereby reducing the overall resource requirements.

III. NUMERICAL SIMULATIONS
To calculate the momenta {k i } defining the Bethe eigenstates, we have solved the Bethe equations iteratively using the approach presented in Ref. [48]. We then performed numerical simulations of Algorithm 1 using the IBM Qiskit library's state vector simulator [60]. These calculations verify the correctness of our algorithm, and reveal its success probabilities for the sufficiently small systems that can be studied on a classical computer. However, we can also explicitly construct the circuits that would need to be run for much larger instances, far beyond what is classically tractable. The corresponding circuit depths and gate counts in these cases indicate that our algorithm is feasible for near-term error-corrected quantum computers. We stress that our analysis does not rely on asymptotic resource scaling arguments, but rather provides exact gate counts, since the corresponding circuits are precisely known. Although we have not compiled our algorithm down to an errorcorrecting code such as the surface code, we estimate the required number of T gates below.
In Fig. 3(a) we present the numerically-calculated success probability of the algorithm for preparing selected eigenstates when L = 2M and M = 2, 3, 4, with J xy = 1, J z = −1/2. The interaction strength in this case corresponds to the critical regime of the ferromagnetic model. We note the eigenstates included in Fig. 3 are not meant to comprise the complete set of real-valued solutions, but rather are simply those for which we obtained numerical solutions of the Bethe equations by using the algorithm presented in Ref. [48]. Two general trends are apparent: a significant suppression of the success rate with increasing M , and a more moderate suppression as a function of the eigenstate energy, within each set of system parameters L, M . The worst-case probabilities are roughly consistent with 1/M !, although we have only limited values of M to support this (larger M being outside of our computational resources for classical simulation). This value is further supported by Fig. 3(b), which shows the success probabilities for various eigenstates when M = 3 and for different values of L, with the clear trend that increasing L tends to flatten the success probability across the spectrum. Although the low energy states enjoy less of an advantage over the higher energy ones in this case, the lowest probabilities are still around 1/M ! on average. However, the behavior of the success rate changes significantly depending on the value of J z . These effects are considered Appendix B.
These results suggest one can go to very large system sizes L, while still preserving a reasonably large success probability, if M is sufficiently small. We note that the case of small M is particularly interesting for physics applications in the ferromagnetic regime of the model. In this case, the all-up state |0 ⊗L is a ground state of the model, while small M states are low-lying excited states of interacting magnons (spin waves). The ability to study these states as a function of M , as enabled by our algorithm, would yield deeper insight into the development of strong correlations in these systems as the number of interacting quasiparticles grows. While the ferromagnetic regime is especially natural for our algorithm, we note that interesting physics in the paramagnetic and antiferromagnetic regimes can also be explored at small values of M . These correspond to highly-excited eigenstates, which are relevant, for instance, in the study of quantum thermalization [61,62].
For these applications (and others discussed below), it appears feasible to access values of L and M that would not be classically simulable (even by approximate methods), while maintaining relatively modest resource requirements for the algorithm. This is demonstrated in Fig. 4, which provides circuit depths [ Fig. 4(a)], Toffoli gate counts [ Fig. 4(b)], controlled phase gate counts [ Fig. 4(c)], and the number of qubits required [ Fig. 4(d)] for the Bethe state preparation algorithm. The linear scaling of all these metrics in L is immediately apparent. Apart from the asymptotic scaling behavior, the absolute values of the circuit depths and gate counts are seen to be very low, on the order of 10 3 -10 4 , even for large systems of L ∼ 100 sites. Furthermore, the total number of qubits required (∼ 10 2 ) is also quite reasonable for small error-corrected devices. In Fig. 5 we show the total gate and measurement counts for the case M = 5 as L is varied. This indicates that the controlled-phase, R y , and Toffoli gates are the most prevalent non-Clifford operations in the algorithm. To estimate the fault-tolerant resources needed, we therefore convert the counts for these gates into the corresponding numbers of T gates. Following Ref. [63], we assume a worst-case scenario for the number of T gates needed to realize an arbitrary z rotation to be given by 4 log 2 (1/ )+11, where is the rotation synthesis error [64]. Similarly, arbitrary y rotations can be performed by conjugation with Clifford operations. As in Ref. [16], we replace each Toffoli gate with two T gates [65]. For L = 100, M = 5 this leads to a T count of ∼ 6.2 × 10 5 for a single run of the state preparation algorithm, with = 10 −10 . Assuming a worst-case success probability of 1/M !, approximately 120 attempts would need to be performed on average to correctly generate the target eigenstate. This yields ∼ 7.4 × 10 7 T gates overall, which is comparable to the estimates for simulating the Hubbard model using the methods of Ref. [16]. We note that the estimates in that work involve optimizing an error budget between multiple sources (Trotterization, phase estimation, and rotation synthesis), and do not appear to include the cost of preparing a good initial state for the phase estimation routine. Furthermore, the above estimate for our algorithm assumes the seemingly worst-case scenario in the number of repetitions (∼ M !), whereas the results in Fig. 3 suggest that lower energy states require fewer repetitions in general. To reduce the number of repetitions required for our algorithm, we implement amplitude amplification in the following section.

IV. AMPLITUDE AMPLIFICATION
Amplitude amplification, a generalization of the wellknown Grover search algorithm, is a quantum subroutine which can boost the probability of a desired measurement outcome, leading in general to a square root improvement in the number of repetitions required for the success of a probabilistic algorithm [33]. For our problem, we use B to denote Algorithm 1 with the measurement step removed. Amplitude amplification defines an operator where S B changes the relative sign of the "good" states in the Hilbert space, while S 0 changes the relative sign of the vacuum state |00 . . . 0 . In the present case, the good states are the components of the Bethe ansatz state, which correspond to |00 . . . 0 p on the permutation label qubits. S B can therefore be implemented using a OR circuit on the permutation label, followed by Z on the work qubit that stores the result, after which the OR is uncomputed. In our numerical calculations of the success probability, we use the ancilla-free implementation of OR in Qiskit's standard circuit library. The ancillafree approach is used here to decrease the number of qubits needed for the simulation, allowing us to study larger system sizes. Below we examine an ancilla-based method for which the gate counts at large sizes are reduced. To produce S 0 we use the same approach, with the OR circuit extended to include the system qubits (we do not implement the −1 in Eq. (11), as it is an overall phase). We present numerical results for amplitude amplification in Fig. 6(a). This confirms the clear enhancement of the algorithm success probability using this method. Although only one round of amplification has been applied here, the protocol can be repeated in the standard way to further increase the success rate.
Applying this improvement to the resource estimate of the previous section, the M = 5 worst-case eigenstates should require on average √ 120 ≈ 11 repetitions of the algorithm B to achieve success, leading to an overall T count of ∼ 4.1 × 10 6 (neglecting the costs of S B and S 0 ).  To estimate the resources needed for amplitude amplification more precisely, we implement the multicontrolled OR operation using elementary gates and ancillas [59]. The Toffoli count, controlled-phase count, and number of qubits are shown in Fig. 6(b) for a single round of amplification when M = 5. Since the S 0 reflection depends on the state of both the system and the permutation label, the implementation requires an additional L + M 2 − M ancillas (we can re-use the M faucet ancillas for the amplification). The additional gates required are dominated by the cost of executing B three times. Although the total number of qubits is approximately doubled in this approach, we note that it may be possible to reduce this number by acting with the OR operation on only a subset of the qubits that are nominally necessary to identify the state. For example, although S B is an OR circuit acting on the M 2 permutation label qubits, in practice the computational basis states appearing in the junk state |φ j can be distinguished from |00 . . . 0 p by only acting on a reduced number of qubits. While the particular subset of qubits needed will depend on the state under consideration, it is possible to confirm the success of this approach by measuring the energy or other quantities that can be compared to exact analytic expressions.
We have also implemented a different version of amplitude amplification, which is a modified form of the oblivious amplitude amplification of Ref. [52]. Unfortunately, this method leads to a reduced fidelity of the actually prepared state with the exact target state, though in some cases the overlap remains quite high (> 0.99). We attribute this reduced fidelity to the non-unitarity of summing exponentials with unit modulus, since |e ia +e ib | = 1 in general. We note, however, that nearly deterministic success of the oblivious amplitude amplification procedure was obtained for the application of Ref. [52] (simulation of Hamiltonian dynamics with Taylor series expansions). For this reason, it is less clear how the approach will fare for the Bethe state preparation problem at larger values of M . In addition, further modification to the method of Ref. [52] may ameliorate some of the difficulties with applying it to Bethe state preparation in its present form.

V. COMPARISON WITH ALTERNATIVE ALGORITHMS
To highlight the advantages of Algorithm 1, we compare it with conceptually simpler, but much less efficient, methods of preparing Bethe ansatz states on a quantum computer. First, one could imagine applying controlled phase gates directly to each term in the Dicke state superposition to generate the desired eigenstate. This approach still requires permutation label ancillas to generate the linear combination of phases needed in Eq. (2). However, it has the seeming advantage of allowing one to combine the phases A P and e ik P j xj into a single controlled-phase rotation, whereas the former term required order M 3 and the latter one order M 2 controlledphases to implement using Algorithm 1. Nevertheless, it is clear that this benefit is vastly outweighed by the large number of terms in the superposition that must be separately addressed, M ! L M . For the case L = 100, M = 5 this amounts to ∼ 9.0 × 10 9 controlled phases, compared to the 2530 of Algorithm 1.
A more promising approach is to use the "faucet" method of Algorithm 1 to handle the e ik P j xj phases while still applying the full A P in a single multi-controlledphase gate, rather than decomposing it into its elementary transpositions. This replaces the M ! L M dependence above with M !LM . While the scaling is still inferior to that of Algorithm 1 for large M , it is conceivable that for small M this alternative method may be competitive. In particular, one can replace the complicated permutation label of Algorithm 1, which required M 2 qubits to construct A P in terms of individual transpositions, with a compressed label that simply assigns a num- ber to each permutation. This approach uses significantly fewer qubits, at the expense of requiring more controls for the relevant phase gates. Since the permutation label construction still needs to be reversed to disentangle the system from the ancillas, it is important that it can still be executed in a unitary fashion. This in turn requires an efficient method for generating an equal superposition of M ! states. We implement such states using the prime factorization M ! = 2 n2 3 n3 · · · . We then construct the permutation label as the tensor product of the binary representation of 2 n2 (using n 2 qubits) and W n states for the odd prime factors. The equal superposition for the binary part of the label is easily generated by applying H gates to the relevant qubits, while various efficient algorithms exist for W n state preparation [66][67][68]. We implemented this algorithm numerically and verified that it successfully prepares Bethe ansatz eigenstates. Since the fundamental approach for creating the linear combi- nation of phases is the same between this method and Algorithm 1, their success probabilities are equal. Unfortunately, explicit construction of the circuits for the alternative method indicates that the resource requirements are significantly higher, even for small M . This is shown in Fig. 7, which reveals that the number of controlled phase and Toffoli gates required for the alternative method vastly exceeds those of Algorithm 1, even for M = 5.
It is also useful to compare Algorithm 1 with other more standard techniques, such as adiabatic state preparation [2, 69,70]. As is well-known, the evolution time to produce a large overlap with the desired eigenstate is expected to scale as the inverse square of the gap between the energy of the given state and that of the next closest one. This time can be short, for instance, in the ordered regime of the XXZ chain, where the gap between the ground and first excited state remains finite in the large system limit. However, in the critical regime of the model the spectrum is gapless, and so the evolution time is expected to diverge. Furthermore, even in the ordered regime the spectrum at higher energies above the ground state generically has dense regions, preventing an efficient preparation of those eigenstates by the adiabatic approach. Our algorithm does not suffer from such complications, as it implements the exact analytic expression for the wave function, irrespective of the gap between the target eigenstate and the other ones.

VI. DISCUSSION
To achieve quantum advantage for a physically relevant problem, it should be the case that no classical method could deliver results of a comparable accuracy. Since the present quantum algorithm exactly prepares eigenstates of the XXZ chain, it is reasonable to compare it to the numerical exact diagonalization of finite-size systems (i.e. using classical computers). In a recent study, a matrix-free approach was used to investigate Heisenberg spin chains up to length L = 26 [71]. This work had vastly reduced the memory requirements compared to conventional methods, though the scaling remained exponential with system size. Specifically the S z = 0 subspace (M = 13) was considered, for which the dimension is ∼ 10 7 . In contrast, the L = 100, M = 5 subspace is roughly seven times larger (dimension ∼ 7.5 × 10 7 ). Although state vectors of this size can still be stored in memory, we note that the computation time is also exponential in the system size, ultimately limiting the practicality of exact diagonalization.
In addition to numerically exact calculations, approximate tensor network methods have been highly successful for studying one-dimensional quantum many-body systems with the matrix product state (MPS) ansatz [72,73]. However, these approaches are best-suited for states with a relatively low amount of entanglement, such as gapped ground states obeying an area law for the entanglement entropy. This makes simulation of long-time dynamics challenging, due to the growth of entanglement from, for instance, an initial product state. Our Bethe ansatz algorithm can prepare eigenstates throughout the spectrum at the same computational cost, including highly excited states whose entanglement entropy grows more quickly than logarithmically, even in the small M limit [74]. This yields an advantage over MPS methods for large systems when targeting these strongly entangled states. An explicit link between the Bethe ansatz and MPS was developed in Refs. [75,76], which used the algebraic Bethe ansatz to produce exact tensor network representations for generic eigenstates. These networks have a PEPS-like structure (but with fewer physical indices), which underscores the computational intractability of such states for large systems.
In addition to computing arbitrary-range and higherorder correlation functions that are inaccessible with traditional Bethe ansatz methods, our algorithm has a number of other applications. Simulation of the real-time dynamics of many-body systems is widely recognized as a task allowing for quantum advantage. Such simulations often take the form of quench experiments, for which the system is initialized in an easy-to-prepare product state, then allowed to evolve under the influence of an interacting many-body Hamiltonian. Our algorithm would enable interesting variations on this approach, for instance by initializing the system in an eigenstate of a given value of the interaction strength, then subsequently evolving it with a different value. The evolution here can be per-formed using any of the known algorithms for quantum simulation, whether by Trotterization [16,77,78], Taylor expansions [52,79], or other approaches [51,80].
In a different direction, one could use our algorithm as a starting point to explore integrability-breaking perturbations. Thus, we consider a Hamiltonian of the form H = H 0 + H p , where H 0 is solvable by the Bethe ansatz and H p includes perturbations that break the integrability of the total Hamiltonian H (such as disorder in the coupling strengths). In this case, the Bethe state prepration algorithm is used to prepare an eigenstate of H 0 , which then serves as an initial state for quantum annealing or phase estimation on H. For sufficiently weak perturbations, the overlap of this state with the corresponding exact eigenstate of H should be much greater than that of a mean-field or non-interacting trial state.
Although we have focused on deploying our algorithm on small error-corrected quantum computers, one may also consider implementing it on present-day or nearterm NISQ devices. Many of the controlled-phase gates in our algorithm involve repetitions of the same basic rotation angles, of which there are only (M 2 + M )/2 distinct values. This suggests replacing the exact values with variational parameters, similarly to Ref. [40]. The resulting variational form can then be optimized under the cost function |E − E B |, where E is the energy calculated on the quantum computer and E B is the exact value, known analytically from the Bethe ansatz solution. We note that the present optimization problem should be significantly easier than that of a standard VQE, since the ideal values of the rotation angles can serve as a good initial guess. Updates to the parameters then serve to directly mitigate systematic errors due to over-or under-rotation in the controlled-phase gates.

VII. CONCLUSIONS
We have presented a quantum algorithm for the efficient preparation of Bethe ansatz eigenstates of the XXZ model. To our knowledge, this is the first quantum algorithm for the direct preparation of eigenstates of an interacting many-body problem. The circuit depth and gate counts of the algorithm scale linearly in the system size, for a fixed number of down spins. Our algorithm is feasible to perform on small error-corrected devices of order 100 qubits, provided that the number of down spins is small. The usefulness of the approach can be extended through amplitude amplification. In particular, quantum advantage over classical computational methods appears to be achievable, with resource estimates that are comparable to the most efficient known quantum simulation algorithms. Our work suggests directions for future research, including the modification of the algorithm for the case of complex-valued {k i }, and its generalization to other Bethe ansatz-solvable models, such as the onedimensional Hubbard model.

ACKNOWLEDGMENTS
Numerical simulations were performed with IBM Qiskit and the QuSpin Python library [81]. We would like to thank Chandra Sekhar Mukherjee, Jesko Sirker, and Rafael Nepomechie for helpful discussions.