Randomized adaptive quantum state preparation

We develop an adaptive method for quantum state preparation that utilizes randomness as an essential component and that does not require classical optimization. Instead, a cost function is minimized to prepare a desired quantum state through an adaptively constructed quantum circuit, where each adaptive step is informed by feedback from gradient measurements in which the associated tangent space directions are randomized. We provide theoretical arguments and numerical evidence that convergence to the target state can be achieved for almost all initial states. We investigate different randomization procedures and develop lower bounds on the expected cost function change, which allows for drawing connections to barren plateaus and for assessing the applicability of the algorithm to large-scale problems.


I. INTRODUCTION
Methods for preparing quantum states are an integral component of any quantum technology. For example, the preparation of states that encode ground, excited, and thermal states of many-body systems is a key element in quantum simulation [1,2]. Ground state preparation can also be leveraged to solve combinatorial optimization problems, with a variety of applications including routing and scheduling [3]. Although the task of ground state preparation is known to be hard, including for quantum computers, there is nonetheless significant interest in algorithms for quantum state preparation [4,5]. In particular, the growing availability of noisy, intermediate-scale quantum [6] devices has inspired immense interest in variational methods for preparing desired quantum states [7][8][9][10]. These variational quantum algorithms (VQAs) are heuristics that function by classically optimizing over a set of parameters that enter into a quantum circuit whose structure is typically fixed. The parameterized quantum circuit is executed on a quantum device and serves as an ansatz to minimize a cost function, J, whose global minimum is achieved for the desired target state.
Even in the absence of noise, a variety of challenges are present in VQAs. On the quantum device, for example, one must select a quantum circuit ansatz and associated initial state out of a formidably large design space. Meanwhile, the difficulty of the cost function minimization means that the challenges on the classical side can be even more significant [11], often involving the navigation of optimization landscapes that contain barren plateaus and suboptimal local minima.
To overcome these challenges, approaches have been proposed that utilize feedback from qubit measurements to adaptively construct a quantum circuit to minimize J. The first such algorithm was the Adaptive Derivative-Assembled Problem-Tailored Variational Quantum Eigensolver (ADAPT-VQE) [12,13], which was also adapted to combinatorial optimization problems [14]. Instead of relying on a predefined ansatz, ADAPT-VQE grows the ansatz in a layer-wise manner tailored to the problem, in tandem with classical optimization over the quantum circuit parameters.
Other methods have considered defining the structure of the ansatz a priori and then performing layer-wise optimization to adaptively set the circuit parameter values [15][16][17][18]. Adaptive methods that do not require any classical optimization have also been developed, including the feedback-based algorithm for quantum optimization [19][20][21] and methods based on Riemannian gradient flows [22,23]. In methods where each adaptive step is efficiently implementable, e.g., [12][13][14][15][16][17][18][19][20], and the approximate method in [22], the circuit growth can get stuck when the gradient vanishes. This means that these methods can face similar issues as conventional VQAs that are prone to converge to suboptimal solutions.
Here, we propose randomized adaptive quantum state preparation as a generic, adaptive quantum algorithm that minimizes up-front design choices, does not require classical optimization, and allows for preparing target quantum states from arbitrary (random) initial states. As a consequence of the latter point, the algorithm, or its key subroutine depicted in Fig. 1, can be readily combined with other state preparation methods to improve convergence and state preparation fidelities. We provide theoretical arguments and numerical evidence that substantiate our claims about convergence, explore different methods for achieving randomization in practice, and develop lower bounds on the expected change in J at each adaptive step. These bounds give guarantees for how much the cost function value changes when randomization is used, thereby allowing us to relate the efficiency of this randomized approach to the existence of barren plateaus [24]. We go on to discuss how this approach can be applied to cooling in open quantum systems and mixed state preparation in general.

II. RANDOMIZED ADAPTIVE QUANTUM ALGORITHMS
We consider minimizing cost functions of the form

J_k = ⟨ψ_k| H_p |ψ_k⟩,    (1)

by creating states |ψ_k⟩ = U_k |ψ_0⟩, where |ψ_0⟩ is a fixed initial state and H_p is a Hermitian operator whose ground state |E_min⟩, with corresponding eigenvalue E_min, is taken to be the target state. We consider a quantum circuit that is adaptively created according to

U_{k+1} = e^{−iθ_k H_k} U_k,    (2)

where in each step k we move into the negative direction of the gradient of J with respect to θ_k by setting θ_k ← −γ dJ(0)/dθ_k, where dJ(0)/dθ_k denotes the gradient evaluated at θ_k = 0. Alternatively, in situations where J_k can be estimated via repeated measurements of H_p, dJ(0)/dθ_k can be estimated via a finite difference approximation, by estimating J_k for different perturbations of θ_k. For sufficiently small learning rates γ, this ensures that J_{k+1} ≤ J_k. References [12,14,19,20,22] consider adaptive procedures similar to Eq. (2), and in cases where the circuit growth gets stuck in suboptimal solutions, i.e., when the gradient vanishes, the utility of incorporating randomness into the circuit structure to escape these suboptima has been observed numerically in [22].
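The update rule above can be checked numerically. The following is a minimal NumPy sketch on a toy instance (the random H_p, random initial state, dimensions, and seed are all hypothetical, chosen only for illustration): it compares the analytic gradient i⟨ψ|[H_k, H_p]|ψ⟩ against the finite difference estimate described in the text, and confirms that the update θ_k ← −γ dJ(0)/dθ_k does not increase the cost.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # Hilbert-space dimension for n = 3 qubits (toy size)

# Hypothetical problem instance: random Hermitian H_p and random |psi_0>.
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
Hp = (A + A.conj().T) / 2
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)

# A traceless Hermitian generator H_k, normalized so that ||H_k||_2 = 1.
B = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
Hk = (B + B.conj().T) / 2
Hk -= (np.trace(Hk).real / d) * np.eye(d)
Hk /= np.linalg.norm(Hk, 2)

# J(theta) = <psi| e^{+i theta H_k} H_p e^{-i theta H_k} |psi>, with the
# exponential evaluated through the eigendecomposition of H_k.
evals, evecs = np.linalg.eigh(Hk)
def J(theta):
    phi = evecs @ (np.exp(-1j * theta * evals) * (evecs.conj().T @ psi))
    return float(np.real(phi.conj() @ Hp @ phi))

# Analytic gradient at theta = 0: dJ(0)/dtheta = i <psi|[H_k, H_p]|psi>.
grad = float(np.real(1j * psi.conj() @ (Hk @ Hp - Hp @ Hk) @ psi))

# Finite-difference estimate of the same gradient.
eps = 1e-6
fd = (J(eps) - J(-eps)) / (2 * eps)

# Gradient step with the learning rate used later in the text.
gamma = 1 / (4 * np.linalg.norm(Hp, 2))
J_before, J_after = J(0.0), J(-gamma * grad)
```

The final comparison `J_after <= J_before` illustrates the guarantee J_{k+1} ≤ J_k for this choice of γ.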
In this work, we utilize randomness to overcome challenges associated with convergence through the introduction of an intrinsically randomized framework for quantum state preparation in which the H k 's are selected at random.In the following, we provide theoretical arguments that this randomization enables convergence from almost all initial states to arbitrary target states.
We first note that the cost function gradient can be expressed as

dJ(0)/dθ_k = ⟨iH_k, gradJ[U_k]⟩,    (3)

where ⟨•, •⟩ denotes the Hilbert-Schmidt inner product and gradJ[U_k] = [|ψ_k⟩⟨ψ_k|, H_p] is the Riemannian gradient of J [23,25,26]. Both iH_k and gradJ[U_k] belong to the special unitary algebra su(2^n) consisting of all traceless and anti-Hermitian 2^n × 2^n matrices, where n is the number of qubits. From this geometric perspective we can now deduce two different cases (i) and (ii) for when dJ(0)/dθ_k vanishes [27]. In case (i), the gradient vanishes when iH_k is orthogonal to gradJ[U_k]; in case (ii), it vanishes because gradJ[U_k] = 0 itself. We first discuss case (i). Typically, no assumptions can be made on whether gradJ[U_k] moves into a lower dimensional subspace of su(2^n) when growing the circuit. As such, the situation that iH_k has no overlap with gradJ[U_k] can occur, e.g., when iH_k is an element of a subspace over which gradJ[U_k] has no support. To overcome this issue, we propose to create each H_k at random. This can be achieved by conjugating a traceless Hermitian operator H by a Haar random unitary transformation V_k in each adaptive step, i.e., H_k = V_k† H V_k, such that e^{−iθ_k H_k} = V_k† e^{−iθ_k H} V_k. Since H_k is created uniformly randomly according to the Haar measure, the probability that iH_k is orthogonal to gradJ[U_k] is zero. That is, for almost all iH_k, but a set of measure zero, case (i) does not occur. While Haar random unitaries are not efficiently implementable, below we discuss the efficient implementation via approximate unitary 2-designs [28].
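The identity between the Hilbert-Schmidt form of the gradient (3) and the expectation value i⟨ψ_k|[H_k, H_p]|ψ_k⟩, and the fact that a Haar-randomized direction has nonzero overlap with gradJ[U_k] almost surely, can both be illustrated with a small NumPy sketch (the diagonal H_p, the choice of H as a Pauli X on the first qubit, and the seed are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8  # n = 3 qubits

def haar_unitary(d, rng):
    """Haar-random unitary from the QR decomposition of a Ginibre matrix."""
    Z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))  # fix column phases

# Hypothetical H_p and state |psi_k>; rho_k = |psi_k><psi_k|.
Hp = np.diag(rng.normal(size=d)).astype(complex)
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())

# Fixed traceless Hermitian H (Pauli X on the first qubit), randomized
# by conjugation: H_k = V_k^dag H V_k.
X = np.array([[0, 1], [1, 0]], dtype=complex)
H = np.kron(X, np.eye(d // 2))
V = haar_unitary(d, rng)
Hk = V.conj().T @ H @ V

# Riemannian gradient gradJ[U_k] = [rho_k, H_p] (traceless, anti-Hermitian).
G = rho @ Hp - Hp @ rho

# Eq. (3): dJ(0)/dtheta_k = <iH_k, gradJ[U_k]> = Tr((iH_k)^dag gradJ[U_k]).
g_hs = float(np.real(np.trace((1j * Hk).conj().T @ G)))
# The same number from the expectation value i <psi_k|[H_k, H_p]|psi_k>.
g_comm = float(np.real(1j * psi.conj() @ (Hk @ Hp - Hp @ Hk) @ psi))
```

For a Haar-random V, the overlap `g_hs` is nonzero with probability one, which is exactly how case (i) is avoided.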
We now focus on case (ii). For cost functions of the form (1), the set of critical points where gradJ[U_k] vanishes consists of global optima and saddle points only [29,30]. Under mild assumptions on the nature of the saddle points (strict saddles), relevant works from the classical machine learning and optimization literature have found that saddle points are avoided for almost all initial conditions [31][32][33]. We thus expect that randomized adaptive quantum state preparation will almost surely converge to the ground state of H_p for almost all initial states |ψ_0⟩. We remark that the convergence result cannot hold for all initial states, as we immediately see that for eigenstates of H_p, gradJ[U_0] = 0. However, the situation that gradJ[U_0] = 0 can be avoided with probability one when the initial state is randomized too.
Each step k of randomized adaptive quantum state preparation can now be summarized as follows. First, the unitary transformation V_k† e^{−iθ_k H} V_k, whose generator H is randomized through conjugation with a random unitary V_k, is applied to the state |ψ_k⟩. Second, the gradient, dJ(0)/dθ_k, is estimated, e.g., using the parameter shift rule [34][35][36][37][38]. Third, the parameter θ_k is updated in the negative direction of the gradient, i.e., θ_k ← −γ dJ(0)/dθ_k.
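The three steps above can be iterated into a full state-preparation loop. The sketch below runs the loop on a deliberately tiny, hypothetical instance (two qubits, a diagonal toy H_p, fixed seed); it only illustrates the mechanics and the monotone decrease of J_k, not the scaling behavior discussed later.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4  # n = 2 qubits, kept small so the loop runs quickly

def haar_unitary(d, rng):
    """Haar-random unitary from the QR decomposition of a Ginibre matrix."""
    Z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))

# Toy problem Hamiltonian with a unique ground state of energy E_min = 0.
Hp = np.diag(np.array([0.0, 1.0, 2.0, 3.0])).astype(complex)
E_min = 0.0

# Fixed traceless Hermitian generator with ||H||_2 = 1 (Pauli X tensor I).
X = np.array([[0, 1], [1, 0]], dtype=complex)
H = np.kron(X, np.eye(d // 2))

psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)

gamma = 1 / (4 * np.linalg.norm(Hp, 2))
energies = []
for k in range(5000):
    V = haar_unitary(d, rng)
    Hk = V.conj().T @ H @ V                       # randomized direction
    g = np.real(1j * psi.conj() @ (Hk @ Hp - Hp @ Hk) @ psi)
    theta = -gamma * g                            # move against the gradient
    evals, evecs = np.linalg.eigh(Hk)
    psi = evecs @ (np.exp(-1j * theta * evals) * (evecs.conj().T @ psi))
    energies.append(float(np.real(psi.conj() @ Hp @ psi)))
```

For this choice of γ the energy sequence is non-increasing by construction, and on this toy instance it approaches the ground-state energy.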

III. EFFICIENCY AND RELATION TO BARREN PLATEAUS
Randomized adaptive quantum state preparation is not expected to be efficient in general, as ground state preparation is QMA-complete [39,40]. Here, we show that for a given problem, the efficiency can be related to the scaling of the gradient (3) with the system size, and therefore, to the existence of barren plateaus [24], i.e., exponentially flat regions in the optimization landscape where the variance of the gradient vanishes exponentially in the number of qubits n.
In Appendix A, we show that if we select γ = 1/(4∥H_p∥_2) and assume ∥H_k∥_2 = 1, where ∥•∥_2 denotes the spectral norm, then the cost function change ∆J_k = J_k − J_{k+1} is lower bounded by

∆J_k ≥ (dJ(0)/dθ_k)² / (8∥H_p∥_2).    (4)

If we assume that it takes M steps to create the ground state up to an error ϵ, i.e., J_M = E_min + ϵ, we find from (4) that M is upper bounded by

M ≤ C / min_{k<M} (dJ(0)/dθ_k)²,    (5)

where the constant in the numerator is given by C = 8∥H_p∥_2 (J_0 − E_min − ϵ). Thus, if the minimum in the denominator does not vanish faster than 1/poly(n), the ground state of H_p can be prepared up to precision ϵ (in the corresponding eigenvalue) in polynomially many steps. We remark that the minimum in the denominator of Eq. (5) explicitly depends on M, and thereby on the random path taken. It is interesting to note that similar expressions are obtained in adiabatic state preparation [41], where the scaling of the adiabatic state preparation time T is determined by the smallest value of the spectral gap ∆(t), taken over all times, i.e., min_{t∈[0,T]} ∆(t) [42].

We proceed by investigating the efficiency of different randomization strategies. At each step k, the expected cost function change E_{H_k}[∆J_k] is lower bounded by the variance of the gradient, up to the prefactor in (4), assuming that the expectation of the gradient vanishes. Since this variance involves only the first two moments of V_k, sampling V_k from unitary 2-designs suffices to obtain convergence to the ground state. This observation is further substantiated by the numerical simulations in Fig.
2, which considers the task of preparing the ground state of an Ising Hamiltonian. We specifically consider a model in which we map each spin to a vertex on a 3-regular graph with 8 vertices, and couplings are present between spins whose corresponding vertices are
connected by an edge. This is equivalent to solving the combinatorial optimization problem MaxCut on an unweighted, 3-regular graph [43]. We consider the approximation ratio α = J_k/E_min as our figure of merit. Fig. 2 compares results of randomized adaptive quantum state preparation when V_k is sampled at random from the Haar measure, with results when V_k is sampled from a unitary 2-design. The results that are obtained are nearly identical, with the difference shown in the inset.
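The problem instance just described can be constructed directly. The sketch below builds the diagonal of an Ising Hamiltonian H_p = Σ_{(i,j)∈E} Z_i Z_j on one concrete 3-regular graph with 8 vertices (the 3-dimensional hypercube; the paper does not specify which 3-regular graph was used, so this particular graph is an assumption for illustration) and evaluates the ground-state energy entering the approximation ratio α = J_k/E_min:

```python
import numpy as np

n = 8
# A 3-regular graph on 8 vertices: the 3-dimensional hypercube (12 edges),
# with an edge between integers differing in exactly one of their 3 bits.
edges = [(v, v | (1 << b)) for v in range(n) for b in range(3)
         if not v & (1 << b)]

# Diagonal of H_p = sum_{(i,j) in E} Z_i Z_j in the computational basis;
# spin(b, i) = +/-1 is the i-th spin of the bitstring b.
def spin(b, i):
    return 1 - 2 * ((b >> i) & 1)

diag = np.array([sum(spin(b, i) * spin(b, j) for (i, j) in edges)
                 for b in range(2 ** n)], dtype=float)
E_min = diag.min()

# The hypercube is bipartite, so all 12 edges can be cut simultaneously and
# E_min = -12; the ground state attains approximation ratio alpha = 1.
alpha_ground = diag.min() / E_min
```

Minimizing ⟨H_p⟩ is equivalent to maximizing the cut, which is why α = J_k/E_min serves as the MaxCut figure of merit.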
The appearance of the variance in the bound (4) means that we additionally expect the efficiency and practical utility of this method to be closely related to the existence of barren plateaus. While barren plateaus pose a major challenge to the scalability of VQAs [24,45,46], examples have been found where the variance of the gradient does not vanish faster than 1/poly(n) [47]. Leveraging these instances for efficient realizations of randomized adaptive quantum state preparation will be the subject of future studies.
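The per-step guarantee (4) can be verified numerically for a single update. The sketch below uses a random toy instance (all operators, dimensions, and the seed are illustrative assumptions) with γ = 1/(4∥H_p∥_2) and ∥H_k∥_2 = 1, and checks that the realized decrease ∆J_k is at least (dJ(0)/dθ_k)²/(8∥H_p∥_2):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 8

# Hypothetical instance: random H_p, random state, random unit-norm H_k.
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
Hp = (A + A.conj().T) / 2
B = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
Hk = (B + B.conj().T) / 2
Hk -= (np.trace(Hk).real / d) * np.eye(d)
Hk /= np.linalg.norm(Hk, 2)           # enforce ||H_k||_2 = 1
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)

evals, evecs = np.linalg.eigh(Hk)
def J(theta):
    phi = evecs @ (np.exp(-1j * theta * evals) * (evecs.conj().T @ psi))
    return float(np.real(phi.conj() @ Hp @ phi))

g = float(np.real(1j * psi.conj() @ (Hk @ Hp - Hp @ Hk) @ psi))
norm_p = float(np.linalg.norm(Hp, 2))
gamma = 1 / (4 * norm_p)

# One update step; compare Delta J_k against the bound (4).
delta_J = J(0.0) - J(-gamma * g)
bound = g ** 2 / (8 * norm_p)
```

The inequality `delta_J >= bound` follows from a standard descent-lemma argument, since the second derivative of J(θ) is bounded by 4∥H_k∥_2²∥H_p∥_2.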
We now consider lower bounds for E_{H_k}[∆J_k] to derive guarantees for how much J_k can be improved when randomization is used. When H_k is created through conjugation by a unitary V_k sampled from a unitary 2-design, we show in Appendix B that E_{V_k}[∆J_k] is lower bounded by

E_{V_k}[∆J_k] ≥ Tr(H²) Var_{ψ_k}(H_p) / (4∥H_p∥_2 (2^{2n} − 1)),    (6)

where Var_{ψ_k}(H_p) = ⟨ψ_k|H_p²|ψ_k⟩ − ⟨ψ_k|H_p|ψ_k⟩² is the variance of H_p with respect to the state |ψ_k⟩. Since Var_{ψ_k}(H_p) is bounded from above by a constant that is independent of the system dimension, we see that the bound in (6) vanishes exponentially in n.
Another way of creating random H_k's is by sampling uniformly from an operator pool A whose size we denote by |A|. While in this case, situation (i) can occur when an operator is selected that is orthogonal to gradJ[U_k], on average, we have that

E_{H_k}[∆J_k] ≥ (1/(8∥H_p∥_2 |A|)) Σ_{H∈A} ⟨iH, gradJ[U_k]⟩²,    (7)

which suggests that situation (i) can be avoided on average for sufficiently large operator pools, e.g., when span{A} = su(2^n). Note that in this case, convergence to the ground state can also be obtained by simply continuing to sample from A when (i) occurs. However, in this situation we expect an exponential runtime, as the expected improvement (7) scales as 1/|A|. For |A| = poly(n), on the other hand, convergence to the ground state is no longer guaranteed, as the adaptive procedure can get stuck in suboptima where the gradient vanishes, due to (i). Furthermore, while the situation that |A| = poly(n) implies polynomial scaling of the denominator of Eq. (7), it does not necessarily imply that randomized adaptive quantum state preparation would be efficient in this setting, as it does not imply polynomial scaling of the numerator.
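The pool-based randomization is simple to realize. The sketch below constructs the full pool of all 2^{2n} − 1 non-identity Pauli strings on n qubits (here n = 2 for brevity) and samples one H_k uniformly, as one adaptive step would; the seed is arbitrary.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)
n = 2

# Single-qubit Paulis; index 0 is the identity.
paulis = [np.eye(2, dtype=complex),
          np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]]),
          np.array([[1, 0], [0, -1]], dtype=complex)]

# Operator pool A: all 2^{2n} - 1 non-identity Pauli strings on n qubits.
pool = []
for combo in product(range(4), repeat=n):
    if all(c == 0 for c in combo):
        continue                      # skip the identity string
    P = np.array([[1.0 + 0j]])
    for c in combo:
        P = np.kron(P, paulis[c])
    pool.append(P)

# Each adaptive step samples H_k uniformly from the pool.
Hk = pool[rng.integers(len(pool))]
```

Every pool element is traceless, Hermitian, and has unit spectral norm, so it satisfies the normalization ∥H_k∥_2 = 1 assumed in the bounds above; together the pool spans su(2^n).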
In Fig. 3, we numerically investigate this tradeoff and study the convergence of randomized adaptive quantum state preparation for the problem of preparing the ground state of an Ising Hamiltonian with equal couplings present between all n spins, which is equivalent to solving the combinatorial optimization problem MaxCut on an unweighted, complete graph [43]. In Fig. 3(a), we plot the approximation ratio, α, as a function of the number of adaptive steps M for the different randomization strategies described above. In Fig. 3(b), we consider fixed M = 35000 and investigate α as a function of the pool size |A|. The pool size is increased by adding successively higher weight Pauli operators to A, until the full pool used in Fig. 3(a) containing all 2^{2n} − 1 terms is formed. We observe that the curves in Fig. 3(a) are almost identical, suggesting that the three different randomization strategies converge in the same manner to the ground state for which α = 1. Furthermore, the curves in Fig. 3(b) suggest that a full Pauli operator pool is not needed to obtain convergence to the ground state. However, the inset semilog plots in Figs. 3(a) and 3(b) do suggest an exponential scaling of the number of adaptive steps, M, and the number of operators in the pool, |A|, with respect to n.

Figure 3. Performance of randomized adaptive quantum state preparation when H_p is an Ising Hamiltonian with all-to-all couplings between n spins. Each data point corresponds to the average taken over 100 different algorithm realizations and initial states. In (a), the approximation ratio, α, is plotted as a function of the number of adaptive steps, M, shown on a logarithmic scale. In each step, the randomization of H_k is implemented by conjugating the Pauli operator X_1 I_2 ⋯ I_n with a Haar random unitary transformation (right triangles) and with a unitary transformation sampled from an approximate unitary 2-design (left triangles), as depicted in Fig. 1, created using the sequence in [44] with ℓ = 1, and also by sampling H_k uniformly from an operator pool A containing all 2^{2n} − 1 Pauli operators (circles). In (b), α is plotted as a function of the number of pool elements, |A|, shown on a logarithmic scale, for fixed M = 35,000. Insets show semilog plots of the scaling with respect to n for both cases to achieve α > 0.99 in (a) and α > 0.9 in (b).

IV. MIXED STATES
Thus far, we have discussed the preparation of pure (ground) states from initial pure states. Here, we explore generalizations to preparing a target pure state from an initially mixed state, and vice versa. Since it is not possible to create pure quantum states from mixed quantum states in a closed quantum system through unitary transformations, we consider an extended, "dilated" space by coupling a set of system qubits, S, to a set of auxiliary qubits, A. We assume that the combined system is initially in a separable state ρ^{SA}_0 = ρ^S_0 ⊗ ρ^A_0, where we denote the initial states of S and A by ρ^S_0 and ρ^A_0, respectively. We then consider growing a quantum circuit over the full composite system according to (2), in order to adaptively create the state ρ^{SA}_k = U_k ρ^{SA}_0 U_k†. The state ρ^S_k at the k-th adaptive step is then obtained by tracing over the degrees of freedom of the auxiliary qubits in subsystem A, i.e., ρ^S_k = Tr_A{ρ^{SA}_k}. In this setting, we now consider the task of preparing the system qubits in a target pure state |ψ_T⟩. The cost function (1) becomes J_k = 1 − ⟨ψ_T| ρ^S_k |ψ_T⟩, and an adaptive change is described by ρ^{SA}_{k+1} = e^{−iθ_k H_k} ρ^{SA}_k e^{iθ_k H_k}. This yields the gradient

dJ(0)/dθ_k = ⟨iH_k, [ρ^{SA}_k, H_p]⟩,    (8)

where we have made use of the fact that the cost function above can be obtained from (1) by setting H_p = 1 − |ψ_T⟩⟨ψ_T| ⊗ 1_A, where 1_A denotes the identity operator on subsystem A. Decreasing J_k in each step can now be achieved by moving into the negative direction of the gradient given by Eq. (8).
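The dilated-space construction can be sketched on the smallest possible example: one system qubit and one auxiliary qubit (these dimensions, the target state |ψ_T⟩ = |0⟩, and the seed are illustrative assumptions). The sketch verifies that J_k = 1 − ⟨ψ_T|ρ^S_k|ψ_T⟩ equals Tr(ρ^{SA}_k H_p), and illustrates the point made below: for a fully mixed system state the gradient vanishes for every H_k, while a random unitary applied to ρ^{SA}_0 removes this degeneracy.

```python
import numpy as np

rng = np.random.default_rng(5)
dS, dA = 2, 2                  # one system qubit S, one auxiliary qubit A
d = dS * dA

# Target state for S and lifted operator H_p = 1 - |psi_T><psi_T| (x) 1_A.
psi_T = np.array([1.0, 0.0], dtype=complex)
P = np.kron(np.outer(psi_T, psi_T.conj()), np.eye(dA))
Hp = np.eye(d) - P

# Initial separable state: fully mixed system, pure auxiliary qubit.
rho = np.kron(np.eye(dS, dtype=complex) / dS,
              np.diag([1.0, 0.0]).astype(complex))

def partial_trace_A(rho):
    """rho_S = Tr_A{rho_SA} for a (dS*dA)-dimensional density matrix."""
    return np.einsum('iaja->ij', rho.reshape(dS, dA, dS, dA))

# Cost J_k = 1 - <psi_T|rho_S|psi_T>, which equals Tr(rho_SA H_p).
rho_S = partial_trace_A(rho)
J = 1 - float(np.real(psi_T.conj() @ rho_S @ psi_T))
J_trace = float(np.real(np.trace(rho @ Hp)))

# For the fully mixed rho_S the Riemannian gradient [rho_SA, H_p] vanishes,
# so dJ(0)/dtheta_k = 0 for every H_k ...
G = rho @ Hp - Hp @ rho

# ... but a Haar-random unitary applied to rho_SA makes it generically nonzero.
Z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
Q, R = np.linalg.qr(Z)
W = Q * (np.diag(R) / np.abs(np.diag(R)))
rho2 = W @ rho @ W.conj().T
G2 = rho2 @ Hp - Hp @ rho2
```

Gradient (8) is then the Hilbert-Schmidt overlap of iH_k with `G2`, exactly as in the closed-system case.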
If the auxiliary qubits are initially in a pure state, then due to the Stinespring dilation [48], there exists a unitary transformation over the composite system that allows for creating every state of the system qubits in subsystem S, with an auxiliary subsystem A of dimension at most d_S² always sufficing, where d_S is the dimension of subsystem S. For uniformly randomized H_k, the gradient can only vanish (with probability 1) when gradJ[U_k] = [ρ^{SA}_k, |ψ_T⟩⟨ψ_T| ⊗ 1_A] = 0, i.e., at critical points that are given by saddle points and global optima, as in the closed system case [30]. Thus, when the initial system state ρ^S_0 does not commute with the target system state |ψ_T⟩⟨ψ_T|, we expect to obtain convergence almost surely to the target state. Although the fully mixed state ρ^S_0 = 1_S/d_S trivially commutes with every target state, if we additionally apply a random unitary transformation to ρ^{SA}_0, we can almost surely obtain convergence to a generic pure state. Thus, randomized adaptive quantum state preparation allows for "cooling" the system from a state of infinite temperature, i.e., the fully mixed state, to a pure state of zero temperature by adaptively "dumping" entropy into the auxiliary system.

V. CONCLUSION
We have introduced an algorithm for preparing quantum states that has favorable convergence properties and is applicable to almost all initial states. Knowledge of the initial and target quantum states is not required. The algorithm leverages randomization as the primary innovation, and operates by minimizing a cost function through an adaptively constructed quantum circuit. Each adaptive update step is informed by gradient measurements in which the associated tangent space directions are randomized. We have presented lower bounds on the average improvement that is obtained in each step, and have numerically studied the behavior of the algorithm for different randomization methods. We additionally discussed a generalization to mixed states that could be leveraged for thermal state preparation, an application area which is currently receiving significant interest [49][50][51][52][53][54][55].
A tradeoff is that, on one hand, the consideration of a uniformly random tangent space direction in each step allows for convergence to the target state, while on the other hand, the gradients may become exponentially small in the system size, thereby causing the convergence time to diverge and making the algorithm impractical for large-scale problems. This should not be surprising, as selecting directions at random is far from being optimal. The largest gradient, and thereby the largest (guaranteed) cost function change, is obtained when iH_k = gradJ[U_k]. This situation corresponds to the full gradient flow, which is, in general, not efficiently implementable. It is an interesting question how gradient flows can be efficiently approximated [22] while maintaining convergence. To that end, the algorithm we present could be modified to project into random subspaces, rather than into a single random direction.
Furthermore, we emphasize that the algorithm possesses significant design flexibility that can be harnessed to increase its practicality, for example, by developing application-specific adaptations that tailor the randomness to the problem instance, e.g., by taking into account the symmetries of the target state [56][57][58][59], and by studying how much the rate of convergence can be improved by incorporating classical optimization in different ways [12]. More fundamentally, we view the randomized adaptive circuit update, which can be applied to arbitrary input states and is depicted in Fig. 1, as a subroutine that could be incorporated into or appended onto other algorithms [41,60] in a flexible way, e.g., those utilizing problem-specific ansätze that are not random, to improve convergence and state preparation fidelities.
We conclude by noting that we have focused this work on randomized adaptive quantum state preparation in the circuit model of quantum computing. However, randomness and adaptive constructions can also be created outside the circuit model, e.g., through random fields [61,62]. Extensions to this latter setting could open up new approaches for state preparation in both closed and open quantum systems that can be driven by an applied field, including qubit systems, analog quantum simulators, molecules, and materials.

C. A. and S. E. acknowledge support from the National Science Foundation (Grant No. 2231328). The employee owns all right, title and interest in and to the article and is solely responsible for its contents. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this article or allow others to do so, for United States Government purposes. The DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan https://www.energy.gov/downloads/doe-public-access-plan. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.

Figure 1. Schematic representation of randomized adaptive quantum state preparation. Each adaptive step k involves first estimating the gradient dJ(0)/dθ_k at θ_k = 0 of a cost function J with respect to a parameter θ_k, e.g., a rotation angle, whose corresponding direction is randomized through conjugation with a random unitary transformation V_k (brown). The parameter θ_k is then updated in the negative direction of the gradient (blue arrow), which moves the system closer to the target state, and the quantum circuit U_k is extended accordingly.

Figure 2. Performance of randomized adaptive quantum state preparation when H_p is an Ising Hamiltonian with n = 8 spins. We map each spin to a vertex on a 3-regular graph with 8 vertices, and couple spins whose corresponding vertices are connected by an edge. Each data point corresponds to the average of the approximation ratio, α, taken over 100 different algorithm realizations and initial states. This is plotted as a function of the number of adaptive steps, M, on a logarithmic scale. In each step, the randomization of H_k is implemented by conjugating the Pauli operator X_1 I_2 ⋯ I_n with a Haar random unitary transformation (blue circles) and with a unitary transformation sampled from an approximate unitary 2-design (orange triangles), as depicted in Fig. 1, created using the sequence in [44] with ℓ = 1. The two curves are nearly superimposed. Inset shows the difference between the two curves, computed as |α_Haar − α_2-design|, which never exceeds 4 × 10⁻³.
search and Development Program under the Truman Fellowship. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA0003525. This article has been authored by an employee of National Technology & Engineering Solutions of Sandia, LLC under Contract No. DE-NA0003525 with the U.S. Department of Energy (DOE).