Layering and subpool exploration for adaptive Variational Quantum Eigensolvers: Reducing circuit depth, runtime, and susceptibility to noise

Adaptive variational quantum eigensolvers (ADAPT-VQEs) are promising candidates for simulations of strongly correlated systems on near-term quantum hardware. To further improve the noise resilience of these algorithms, recent efforts have been directed towards compactifying, or layering, their ansatz circuits. Here, we broaden the understanding of the algorithmic layering process in three ways. First, we investigate the non-commutation relations between the different elements that are used to build ADAPT-VQE ansätze. Doing so, we develop a framework for studying and developing layering algorithms, which produce shallower circuits. Second, based on this framework, we develop a new subroutine that can reduce the number of quantum-processor calls by optimizing the selection procedure with which a variational quantum algorithm appends ansatz elements. Third, we provide a thorough numerical investigation of the noise-resilience improvement available via layering the circuits of ADAPT-VQE algorithms. We find that layering leads to an improved noise resilience with respect to amplitude-damping and dephasing noise, which, in general, affect idling and non-idling qubits alike. With respect to depolarizing noise, which tends to affect only actively manipulated qubits, we observe no advantage of layering.


I. INTRODUCTION
Quantum chemistry simulations of strongly correlated systems are challenging for classical computers [1]. While approximate methods often lack accuracy [1-5], exact methods become infeasible when the system size exceeds 34 spin orbitals, the largest system for which a full configuration interaction (FCI) calculation has been conducted [5]. For this reason, simulations of many advanced chemical systems, such as enzyme active sites and surface catalysts, rely on knowledge-intense, domain-specific approximations [6]. Therefore, developing general chemistry simulation methods for quantum computers could prove valuable.
Variational quantum eigensolvers (VQEs) [1, 7-16] are a class of quantum-classical methods intended to perform chemistry simulations on near-term quantum hardware. More specifically, VQEs calculate upper bounds to the ground state energy $E_0$ of a molecular Hamiltonian $H$ using the Rayleigh-Ritz variational principle
$$E_0 \leq \mathrm{Tr}\left[ H \Lambda(\vec{\theta})[\rho_0] \right] \equiv E(\vec{\theta}). \tag{1}$$
A quantum processor is used to apply a parametrized quantum circuit to an initial state. In the presence of noise, the quantum circuit can, in general, be represented by the parametrized completely positive trace-preserving (CPTP) map $\Lambda(\vec{\theta})$, and the initial state can be represented by the density matrix $\rho_0$. We will use square brackets to enclose a state acted upon by a CPTP map. This generates a parametrized trial state $\rho(\vec{\theta}) \equiv \Lambda(\vec{\theta})[\rho_0]$ that is hard to represent on classical computers. The energy expectation value $E(\vec{\theta})$ of $\rho(\vec{\theta})$ gives a bound on $E_0$, which can be accurately sampled using polynomially few measurements [1, 8]. A classical computer then varies $\vec{\theta}$ to minimize $E(\vec{\theta})$ iteratively. Provided that the ansatz circuit is sufficiently expressive, $E(\vec{\theta})$ converges to $E_0$ and returns the ground state energy. Initial implementations of VQEs on near-term hardware have been reported in [7, 17-21]. Despite these encouraging results, several refinements are needed to alleviate trainability issues [22-25] and to make VQEs feasible for molecular simulations with larger numbers of orbitals. Moreover, recent results indicate that the noise resilience of VQE algorithms must be improved to enable useful simulations [14, 25, 26].
Adaptive VQEs (ADAPT-VQEs) [10] are promising VQE algorithms, which partially address the issues of trainability and noise resilience. They operate by improving the ansatz circuits in $t_{\max}$ consecutive steps
$$\Lambda_t(\theta_t, \ldots, \theta_1) = A_t(\theta_t) \circ \Lambda_{t-1}(\theta_{t-1}, \ldots, \theta_1), \tag{2}$$
starting from the identity map $\Lambda_0 = \mathrm{id}$. Here, $t = 1, \ldots, t_{\max}$ indexes the step and $\circ$ denotes functional composition of the CPTP maps. An ansatz element $A_t(\theta_t)$ is added to the ansatz circuit in each step. The ansatz element $A_t(\theta_t)$ is chosen from an ansatz-element pool $\mathcal{P}$ by computing the energy gradient for each ansatz element and picking the ansatz element with the steepest gradient. Numerical evidence suggests that such ADAPT-VQEs are readily trainable and can minimize the energy landscape [27]. In the original proposal of ADAPT-VQE, the ansatz-element pool was physically motivated, comprising single and double fermionic excitations. Since then, different types of ansatz-element pools have been proposed to minimize the number of CNOT gates in the ansatz circuit and thus improve the noise resilience of ADAPT-VQE [11-13, 28].
ADAPT-VQEs still face issues. Compared with other VQE algorithms, ADAPT-VQEs make more calls to quantum processors. This is because in every iteration, finding the ansatz element with the steepest energy gradient requires at least $O(|\mathcal{P}|)$ quantum-processor calls. This makes more efficient pool-exploration strategies desirable. Moreover, noise poses serious restrictions on the maximum depth of useful VQE ansatz circuits [14]. This makes shallower ansatz circuits desirable. A recent algorithm called TETRIS-ADAPT-VQE compresses VQE ansatz circuits into compact layers of ansatz elements [15]. This yields shallower ansatz circuits. However, it has not yet been demonstrated that shallower ansatz circuits lead to improved noise resilience. It is, therefore, important to evaluate whether such shallow ansatz circuits boost the noise resilience of ADAPT-VQEs.
In this paper, we broaden the understanding of TETRIS-like layering algorithms. First, we show how non-commuting ansatz elements can be used to define a topology on the ansatz-element pool. Based on this topology, we present Subpool Exploration: a pool-exploration strategy to reduce the number of quantum-processor calls when searching for ansatz elements with large energy gradients. We then investigate several flavors of algorithms to layer and shorten ansatz circuits. Benchmarking these algorithms, we find that alternative layering strategies can yield ansatz circuits as shallow as those of TETRIS-ADAPT-VQE. Finally, we investigate whether shallow VQE circuits are more noise resilient. We do this by benchmarking both standard and layered ADAPT-VQEs in the presence of noise. For amplitude damping and dephasing noise, which globally affect idling and non-idling qubits alike, we observe an increased noise resilience due to shallower ansatz circuits. On the other hand, we find that layering is unable to mitigate depolarizing noise, which acts locally on actively manipulated qubits.
The remainder of this paper is structured as follows: In Sec. II, we introduce notation and the ADAPT-VQE algorithm. In Sec. III and Sec. IV, subpool exploration and layering for ADAPT-VQE are described and benchmarked, respectively. We study the runtime advantage of layering in Sec. V. In Sec. VI, we investigate the effect of noise on layered VQE algorithms. Finally, we conclude in Sec. VII.

II. PRELIMINARIES: NOTATION AND THE ADAPT-VQE
In what follows, we consider second-quantized Hamiltonians on a finite set of $N$ spin orbitals:
$$H = \sum_{p,q=1}^{N} h_{pq}\, a_p^\dagger a_q + \sum_{p,q,r,s=1}^{N} h_{pqrs}\, a_p^\dagger a_q^\dagger a_r a_s. \tag{3}$$
Here, $a_p^\dagger$ and $a_p$ denote fermionic creation and annihilation operators of the $p$th spin orbital, respectively. The coefficients $h_{pq}$ and $h_{pqrs}$ can be efficiently computed classically; we use the Psi4 package [29].
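The Jordan-Wigner transformation [1] is used to represent creation and annihilation operators by
$$a_p^\dagger \mapsto Q_p^\dagger Z_{<p}, \qquad a_p \mapsto Q_p Z_{<p}, \tag{4}$$
respectively. Here,
$$Q_p^\dagger := \tfrac{1}{2}\left(X_p - iY_p\right), \qquad Q_p := \tfrac{1}{2}\left(X_p + iY_p\right) \tag{5}$$
are the qubit creation and annihilation operators and $X_p$, $Y_p$, $Z_p$ denote Pauli operators acting on qubit $p$. The fermionic phase is represented by
$$Z_{<p} := \prod_{r<p} Z_r. \tag{6}$$
Anti-Hermitian operators $T$ generate ansatz elements that form Stone's-encoded unitaries parametrized by one real parameter $\theta$:
$$A(\theta)[\rho] := \exp(\theta T)\, \rho\, \exp(-\theta T). \tag{7}$$
Different ADAPT-VQE algorithms choose $T$ from different types of operator pools. There are three common types of operator pools. The fermionic pool $\mathcal{P}_{\mathrm{Fermi}}$ [10] contains fermionic single and double excitations generated by anti-Hermitian operators:
$$T_{q}^{p} := a_p^\dagger a_q - a_q^\dagger a_p, \tag{8}$$
$$T_{rs}^{pq} := a_p^\dagger a_q^\dagger a_r a_s - a_s^\dagger a_r^\dagger a_q a_p, \tag{9}$$
where $p, q, r, s = 1, \ldots, N$. The QEB pool $\mathcal{P}_{\mathrm{QEB}}$ [11] contains single and double qubit excitations generated by anti-Hermitian operators:
$$T_{q}^{p} := Q_p^\dagger Q_q - Q_q^\dagger Q_p, \tag{10}$$
$$T_{rs}^{pq} := Q_p^\dagger Q_q^\dagger Q_r Q_s - Q_s^\dagger Q_r^\dagger Q_q Q_p. \tag{11}$$
The qubit pool $\mathcal{P}_{\mathrm{qubit}}$ [13] contains parametrized unitaries generated by strings of Pauli operators $\sigma_p \in \{X_p, Y_p, Z_p\}$:
$$T_{pq} := i \sigma_p \sigma_q, \tag{12}$$
$$T_{pqrs} := i \sigma_p \sigma_q \sigma_r \sigma_s. \tag{13}$$
Further definitions and discussions of all three pools are given in Appendix F. It is worth noting that all ansatz elements have quantum-circuit representations composed of multiple standard single- and two-qubit gates [28]. All three pools contain $O(N^4)$ elements.
ADAPT-VQEs optimize several objective functions. At iteration step $t$, the energy landscape is defined by
$$E_t(\theta_t, \ldots, \theta_1) := \mathrm{Tr}\left[ H \Lambda_t(\theta_t, \ldots, \theta_1)[\rho_0] \right]. \tag{14}$$
A global optimizer may repeatedly evaluate $E_t$ and its partial derivatives at the end of the $t$th iteration to return a set of optimal parameters:
$$(\theta_t^*, \ldots, \theta_1^*) := \arg\min_{(\theta_t, \ldots, \theta_1)} E_t(\theta_t, \ldots, \theta_1). \tag{15}$$
These parameters set the upper energy bound of the $t$th iteration:
$$\mathcal{E}_t := E_t(\theta_t^*, \ldots, \theta_1^*). \tag{16}$$
A loss function $L_t : \mathcal{P} \to \mathbb{R}$ is used to pick an ansatz element from the operator pool $\mathcal{P}$ at each iteration $t$:
$$A_t := \arg\min_{A \in \mathcal{P}} L_t(A). \tag{17}$$
Throughout this paper, we use the standard gradient loss of ADAPT-VQEs, defined in Eq. (20). We denote the state after $t-1$ iterations with optimized parameters $\theta_{t-1}^*, \ldots, \theta_1^*$ by
$$\rho_{t-1} := \Lambda_{t-1}(\theta_{t-1}^*, \ldots, \theta_1^*)[\rho_0]. \tag{18}$$
Further, we define the energy expectation after adding the ansatz element $A \in \mathcal{P}$ as
$$E_{t,A}(\theta) := \mathrm{Tr}\left[ H A(\theta)[\rho_{t-1}] \right]. \tag{19}$$
Then, the loss is defined by
$$L_t(A) := -\left| \left. \frac{\partial E_{t,A}(\theta)}{\partial \theta} \right|_{\theta = 0} \right|. \tag{20}$$
We consider alternative loss functions in Appendix D.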
The ADAPT-VQE starts by initializing a state $\rho_0$. Often, $\rho_0$ is the Hartree-Fock state $\rho_{\mathrm{HF}}$. The algorithm then builds the ansatz circuit $\Lambda_t$ by first adding ansatz elements $A_t \in \mathcal{P}$ of minimal loss $L_t$, according to Eq. (17). Then, the algorithm optimizes the ansatz circuit parameters according to Eq. (15). This generates a series of upper bounds $\mathcal{E}_t$, until the improvement of consecutive bounds drops below a threshold $\varepsilon$ such that
$$\mathcal{E}_{t-1} - \mathcal{E}_t < \varepsilon. \tag{21}$$
Algorithm 1 (ADAPT-VQE), Line 4: Select ansatz element: $A_t \leftarrow \arg\min_{A \in \mathcal{P}} L_t(A)$.
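The control flow described above (select by gradient loss, append, re-optimize, stop when the bound improves by less than $\varepsilon$) can be sketched as follows. This is only an illustration under stated assumptions: `toy_energy`, `WEIGHTS`, and the gradient-descent `optimize` routine are hypothetical classical stand-ins for the quantum-processor energy evaluation and the global optimizer, and dropping the last, non-improving element is a choice made here, not something taken from the text.

```python
import math

# Hypothetical toy "energy": each pool element e can lower the energy by at most
# WEIGHTS[e]. It stands in for Tr[H Lambda(theta)[rho_0]] measured on hardware.
WEIGHTS = {"a": 1.0, "b": 0.5, "c": 0.0}

def toy_energy(circuit):
    """Energy of a circuit given as a list of [element, theta] pairs."""
    angle = {}
    for element, theta in circuit:
        angle[element] = angle.get(element, 0.0) + theta
    return -sum(WEIGHTS[e] * math.sin(th) for e, th in angle.items())

def gradient_loss(energy, circuit, element, h=1e-6):
    """Negative gradient magnitude at theta = 0, via a central finite difference."""
    e_plus = energy(circuit + [[element, +h]])
    e_minus = energy(circuit + [[element, -h]])
    return -abs((e_plus - e_minus) / (2 * h))

def optimize(energy, circuit, lr=0.5, steps=300, h=1e-6):
    """Plain gradient descent on all parameters (assumed stand-in optimizer)."""
    for _ in range(steps):
        grads = []
        for entry in circuit:
            theta = entry[1]
            entry[1] = theta + h
            e_plus = energy(circuit)
            entry[1] = theta - h
            e_minus = energy(circuit)
            entry[1] = theta
            grads.append((e_plus - e_minus) / (2 * h))
        for entry, g in zip(circuit, grads):
            entry[1] -= lr * g

def adapt_vqe(pool, energy, eps=1e-6, t_max=10):
    """Grow the circuit one element per iteration until the bound stalls."""
    circuit = []
    bound = energy(circuit)
    for _ in range(t_max):
        # Selection step: element of minimal loss, i.e. steepest gradient.
        best = min(pool, key=lambda a: gradient_loss(energy, circuit, a))
        circuit.append([best, 0.0])
        optimize(energy, circuit)
        new_bound = energy(circuit)
        if bound - new_bound < eps:
            circuit.pop()  # assumption: discard the element that did not help
            break
        bound = new_bound
    return circuit, bound

circuit, bound = adapt_vqe(list(WEIGHTS), toy_energy)
print([element for element, _ in circuit], round(bound, 6))
```

In this toy model every selection step evaluates the loss of the whole pool, which is the $O(|\mathcal{P}|)$ cost that subpool exploration aims to reduce.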

III. SUBPOOL EXPLORATION AND LAYERING FOR ADAPT-VQES
In this section, we present two subroutines to improve ADAPT-VQEs. The first subroutine optimally layers ansatz elements, as depicted in Fig. 1. We call the process of producing dense (right-hand side) ansatz circuits instead of sparse (left-hand side) ansatz circuits "layering". This subroutine can be used to construct shallower ansatz circuits, which may make ADAPT-VQEs more resilient to noise. The second subroutine is subpool exploration. It searches ansatz-element pools in successions of non-commuting ansatz elements. Subpool exploration is essential for layering and can reduce the number of calls an ADAPT-VQE makes to a quantum processor. Combining both subroutines results in algorithms similar to TETRIS-ADAPT-VQE [15]. Our work focuses on developing and understanding layering algorithms from the perspective of non-commuting sets of ansatz elements.

FIG. 1. Layering: A sparse ansatz circuit (left), as produced by standard ADAPT-VQEs, can be compressed to a dense structure (right) by layering. Boxes denote ansatz elements. Each line represents a single qubit. Note that ansatz circuit elements entangle two or four qubits.

III A. Commutativity and Support
Commutativity of ansatz elements is a central notion underlying our subroutines:
Definition 1 (Operator commutativity) Two ansatz elements $A, B \in \mathcal{P}$ are said to "operator commute" iff $A(\theta)$ and $B(\phi)$ commute for all $\theta$ and $\phi$, that is, iff $[A(\theta), B(\phi)] = 0$ for all $\theta, \phi$. Conversely, two ansatz elements $A, B \in \mathcal{P}$ do not operator commute iff there exist parameters $\theta$ and $\phi$ for which the corresponding operators do not commute, $[A(\theta), B(\phi)] \neq 0$.
Definition 2 (Operator non-commuting set) Given an ansatz-element pool $\mathcal{P}$ and an ansatz element $A \in \mathcal{P}$, we define its operator non-commuting set as the set of all ansatz elements in $\mathcal{P}$ that do not operator commute with $A$.
Operator commutativity is central to layering. The operator non-commuting set is central to subpool exploration. Structurally similar and more intuitive notions can be defined using qubit support:
Definition 3 (Qubit support) Let $\mathcal{B}(\mathcal{H})$ denote the set of superoperators on a Hilbert space $\mathcal{H}$. Let $\mathcal{H}_Q := \bigotimes_{q \in Q} \mathcal{H}_q$ denote the Hilbert space of the set of all qubits $Q \equiv \{1, \ldots, N\}$, where $\mathcal{H}_q$ is the Hilbert space corresponding to the $q$th qubit. Consider a superoperator $A \in \mathcal{B}(\mathcal{H}_Q)$. First, we define the superoperator subset that acts on a qubit subset $W \subseteq Q$ as the set of superoperators in $\mathcal{B}(\mathcal{H}_Q)$ that act as the identity on all qubits outside $W$. Then, we define the qubit support of a superoperator $A$ as its minimal qubit subset $W$, that is, the smallest $W$ for which $A$ belongs to this subset. The notion of support extends to a parametrized ansatz element $B$ by collecting the qubit supports of $B(\theta)$ over all parameter values $\theta$.
Intuitively, the qubit support of an ansatz element $A$ is the set of all qubits the operator $A$ acts on nontrivially (see Fig. 2). The concept of qubit support allows one to define support commutativity of ansatz elements as follows.
Definition 4 (Support commutativity) Two ansatz elements $A, B \in \mathcal{P}$ are said to "support-commute" iff their qubit supports are disjoint. Conversely, two ansatz elements $A, B \in \mathcal{P}$ do not support-commute iff their supports overlap.
Definition 5 (Support non-commuting set) Given an ansatz-element pool $\mathcal{P}$ and an ansatz element $A \in \mathcal{P}$, we define its support non-commuting set as the set of all ansatz elements in $\mathcal{P}$ whose qubit support overlaps with that of $A$.
Operator commutativity and support commutativity are not equivalent (see Fig. 2). However, the following properties hold. Elements supported on disjoint qubit sets operator commute. Conversely, operator non-commuting ansatz elements act on at least one common qubit, which implies that they are support non-commuting. The last relation also implies that the operator non-commuting set of $A$ is contained in its support non-commuting set. We further generalize the notions of operator and support commutativity in Appendix J. Henceforth, we will use generalized commutativity to denote either operator or support commutativity or any other type of commutativity specified in Appendix J. Further, $N_G$ and $\{\cdot, \cdot\}_G$ will be used to denote the generalized non-commuting set and the generalized commutator, respectively. For later reference, we note that generalized non-commuting sets induce a topology on $\mathcal{P}$ via a discrete metric.
With this metric, the generalized non-commuting elements $N_G(\mathcal{P}, A)$ form a ball of distance one around each ansatz element $A \in \mathcal{P}$. The metric is represented diagrammatically in Fig. 3. This allows us to identify an element $A \in \mathcal{P}$ as a local minimum if there is no element with lower loss within $A$'s generalized non-commuting set: if $L(A) \leq L(B)$ for all $B \in N_G(\mathcal{P}, A)$, then $A$ is a local minimum on $\mathcal{P}$ with respect to $L$.
This property is important as we will later show that subpool exploration always returns local minima.
To gain intuition about the previously defined notions, we consider the ansatz elements of the QEB and the Pauli pools, Eqs. (10) to (13). The ansatz elements of these pools have qubit support on either two or four qubits, as is illustrated in Fig. 1. Commuting ansatz elements with disjoint support can be packed into an ansatz-element layer, which can be executed on the quantum processor in parallel. This is the core idea of layering, which helps to reduce the depths of ansatz circuits. Moreover, as generalized non-commuting ansatz elements must share at least one qubit, we conclude that the generalized non-commuting set $N_G(\mathcal{P}, A)$ has at most $O(N^3)$ ansatz elements. This is a core component of subpool exploration. Analytic expressions for the cardinalities of the generalized non-commuting sets are given in Appendix H. In Appendix G, we prove that two different fermionic excitations operator commute iff they act on disjoint or equivalent sets of orbitals. The same is true for qubit excitations. Pauli excitations operator commute iff the generating Pauli strings differ in an even number of places within their mutual support.
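As a small illustration of these notions, the sketch below builds the support non-commuting set for a toy pool and checks the Pauli-string commutation rule quoted above. It rests on simplifying assumptions: each pool element is represented only by its qubit support (real pools contain several elements per support), and the function names are hypothetical.

```python
from itertools import combinations

def support_noncommuting_set(pool, a):
    """All elements of `pool` whose qubit support overlaps with that of `a`."""
    return [b for b in pool if set(a) & set(b)]

def paulis_commute(p, q):
    """Pauli strings such as 'XIZY' commute iff they differ in an even number
    of places where both act nontrivially (their mutual support)."""
    clashes = sum(1 for s, t in zip(p, q) if s != "I" and t != "I" and s != t)
    return clashes % 2 == 0

# Toy pool: one element per two- or four-qubit support on N = 6 qubits.
N = 6
pool = [frozenset(c) for k in (2, 4) for c in combinations(range(N), k)]
a = frozenset({0, 1})
neighbours = support_noncommuting_set(pool, a)

print(len(pool), len(neighbours))
print(paulis_commute("XX", "YY"), paulis_commute("XI", "ZI"))
```

Elements outside `neighbours` have support disjoint from `a` and could therefore share a layer with it.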

III B. Subpool exploration
In this section, we introduce subpool exploration, a strategy to explore ansatz-element pools with fewer quantum-processor calls. Subpool exploration differs from the standard ADAPT-VQE as follows. Standard ADAPT-VQEs evaluate the loss of every ansatz element in the ansatz-element pool $\mathcal{P}$ in every iteration of ADAPT-VQE (Algorithm 1, Line 4). This leads to $O(|\mathcal{P}|)$ quantum-processor calls to identify the ansatz element of minimal loss. Instead, subpool exploration evaluates the loss of a reduced number of ansatz elements by exploring a sequence of generalized non-commuting ansatz-element subpools. This can lead to a reduced number of quantum-processor calls and returns an ansatz element which is a local minimum of the pool $\mathcal{P}$. The details of subpool exploration are as follows.
Algorithm: Let $\mathcal{P}$ denote a given pool and $L$ a given loss function. Instead of naïvely computing the loss of every ansatz element in $\mathcal{P}$, our algorithm explores $\mathcal{P}$ iteratively by considering subpools, $S_m \subsetneq \mathcal{P}$, in consecutive steps. During this process, the algorithm successively determines the ansatz element with minimal loss within subpool $S_m$ as
$$A_m := \arg\min_{A \in S_m} L(A).$$
Meanwhile, the corresponding loss value $L(A_m)$ is stored. Iterations are halted when loss values stop decreasing.
The key point of subpool exploration is to update the subpools $S_m$ using the generalized non-commuting set generated by $A_m$:
$$S_{m+1} := N_G(\mathcal{P}, A_m) \setminus S_{\leq m},$$
where $S_{\leq m} := \bigcup_{l=0}^{m} S_l$. A pseudocode summary of subpool exploration is given in Algorithm 2, and a visual summary is displayed in Fig. 4. We now discuss aspects of subpool exploration.
Efficiency: Let $m_s$ denote the index of the final iteration and define the set of searched ansatz elements as $S := S_{\leq m_s}$. We note that this pool-exploration strategy ignores certain ansatz elements. In particular, it may miss the optimal ansatz element with minimal loss. Nevertheless, as explained in the following paragraphs, it will always return ansatz elements which are locally optimal. This ensures that the globally optimal ansatz element can always be added to the ansatz circuit later in the algorithm.
FIG. 4. Subpool exploration, seeking a local minimum. 1. Initialize a subpool. 2. Find the element with minimum loss in the subpool. 3. Create a subpool that includes all elements that do not commute with that element ($A_0$, then $A_1$, and so on), and repeat step 2. 4. Subpool exploration halts when the loss stops decreasing.
Optimality: As the set of explored ansatz elements $S$ is a subset of $\mathcal{P}$, the ansatz element $A_S^*$ returned by subpool exploration may be sub-optimal compared with the ansatz element returned by exploring the whole pool. That is, its loss $L(A_S^*)$ may exceed the minimal loss over $\mathcal{P}$. Yet, there are a couple of useful properties that pertain to the output $A_S^*$ of subpool exploration. First, the outputs of subpool exploration are local minima.
Proof. We prove this property by contradiction. Assume that there is an ansatz element $B$ with $L(B) < L(A_S^*)$ such that $\{A_S^*, B\}_G \neq 0$. This implies that $B$ is in the generalized non-commuting set $N_G(\mathcal{P}, A_S^*)$, and exploring the corresponding subpool would have produced $L(B) < L(A_S^*)$, leading to the exploration of $N_G(\mathcal{P}, B)$. This, in turn, can only return an ansatz element with a loss $L(B)$ or smaller. This would contradict $A_S^*$ having been the final output of the algorithm. Finally, we use Eq. (31) to show Eq. (44).
Property 3 is useful as it ensures that subpool exploration can find better ansatz elements, which were missed at first, in subsequent iterations. To see this, suppose a first run of subpool exploration returns a local minimum $A \in \mathcal{P}$. Further, suppose there is another local minimum $B \in \mathcal{P}$ such that $L(B) < L(A)$. Property 3 ensures that $A$ and $B$ generalized commute. Hence, by running subpool exploration repeatedly on the remaining pool, we are certain to discover the better local minimum eventually. Ultimately, this will allow for restoring the global minimum.
Initial subpool: So far, we have not specified any strategy for choosing the initial set $S_0$. This can, for example, be done by taking the subpool of a single random ansatz element $A_0 \in \mathcal{P}$. Alternatively, one can compose $S_0$ of random ansatz elements enforcing an appropriate pool size, e.g., $|S_0| = O(N^3)$ for QEB and qubit pools.
We will refer to the ADAPT-VQE with subpool exploration as the Explore-ADAPT-VQE. This algorithm is realized by replacing Line 4 in Algorithm 1 with subpool exploration, Algorithm 2, with $L \to L_t$.
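The subpool-exploration loop described in this section can be sketched in a few lines. The sketch assumes support commutativity, represents each ansatz element by a tuple of qubit indices, and uses a made-up loss function; none of these choices is prescribed by the text, and the function names are hypothetical.

```python
from itertools import combinations

def support_noncommuting(pool, a):
    """Elements of `pool` whose qubit support overlaps with that of `a`."""
    return {b for b in pool if set(a) & set(b)}

def subpool_exploration(pool, loss, noncommuting, initial_subpool):
    """Follow non-commuting subpools downhill until the loss stops decreasing.

    Returns the selected element, its loss, and the number of elements whose
    loss was evaluated.
    """
    searched = set()
    subpool = set(initial_subpool)
    best, best_loss = None, float("inf")
    while subpool:
        losses = {a: loss(a) for a in subpool}  # one loss evaluation per element
        searched |= subpool
        candidate = min(losses, key=losses.get)
        if losses[candidate] >= best_loss:
            break  # loss stopped decreasing: `best` is a local minimum
        best, best_loss = candidate, losses[candidate]
        # Next subpool: non-commuting set of the new best element, minus
        # everything that has already been searched.
        subpool = noncommuting(pool, best) - searched
    return best, best_loss, len(searched)

# Toy pool: all two-qubit supports on five qubits, with an arbitrary loss.
pool = list(combinations(range(5), 2))

def toy_loss(a):
    return -(3 * a[0] + a[1])

best, best_loss, n_searched = subpool_exploration(
    pool, toy_loss, support_noncommuting, initial_subpool=[(0, 1)]
)
print(best, best_loss, n_searched)
```

In this tiny example the search ends up visiting the whole pool, so it shows the mechanics rather than any saving; a reduction in loss evaluations is only expected when the non-commuting sets are much smaller than the pool.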

III C. Layering
Below, we describe two methods for arranging generalized non-commuting ansatz elements into ansatz-element layers.
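Definition 7 (Ansatz-element layer) Let $\mathcal{A}$ be a subset of $\mathcal{P}$. We say that $\mathcal{A}$ is an ansatz-element layer iff all pairs of distinct ansatz elements $A, B \in \mathcal{A}$ generalized commute, $\{A, B\}_G = 0$. We denote the operator corresponding to the action of the mutually generalized-commuting ansatz elements of an ansatz-element layer $\mathcal{A}$ with $\mathcal{A}(\vec{\theta})$, the product of the elements $A(\theta_A)$ over all $A \in \mathcal{A}$. Here, $\vec{\theta}$ is the parameter vector for the layer, which collects one parameter $\theta_A$ per ansatz element. We note that for support commutativity, the product can be replaced by the tensor product.
Since ansatz-element layers depend on parameter vectors, the update rule is
$$\Lambda_t(\vec{\theta}_t, \vec{\vartheta}) = \mathcal{A}_t(\vec{\theta}_t) \circ \Lambda_{t-1}(\vec{\vartheta}),$$
where $\vec{\vartheta}$ collects the parameters of all previous layers. As before, the algorithm is initialized with $\Lambda_0 = \mathrm{id}$ and $\vec{\vartheta} = ()$. To make the dependence on the ansatz circuit explicit, we denote the energy landscape as
$$E_\Lambda(\vec{\vartheta}) := \mathrm{Tr}\left[ H \Lambda(\vec{\vartheta})[\rho_0] \right].$$
The energy landscape of the $t$th iteration is denoted as $E_t(\vec{\vartheta}) := E_{\Lambda_t}(\vec{\vartheta})$ and its optimal parameters are
$$\vec{\vartheta}^* := \arg\min_{\vec{\vartheta}} E_t(\vec{\vartheta}). \tag{51}$$
Further, the gradient loss (cf. Eq. (20)) is
$$L_\Lambda(A) := -\left| \left. \frac{\partial}{\partial \theta} \mathrm{Tr}\left[ H A(\theta)\left[ \Lambda(\vec{\vartheta}^*)[\rho_0] \right] \right] \right|_{\theta = 0} \right|, \tag{52}$$
where the definitions in Eqs. (20) and (52) satisfy the following relation:
$$L_t(A) = L_{\Lambda_{t-1}}(A). \tag{53}$$
With this notation in place, we proceed to describe two methods to construct ansatz-element layers.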

III C 1. Static layering
Our algorithm starts by initializing an empty ansatz-element layer $\mathcal{A}$ and the remaining pool $\mathcal{P}'$ to be the entire pool $\mathcal{P}$. Further, the loss is set such that $L \leftarrow L_{\Lambda_{t-1}}$ for the $t$th layer. The algorithm proceeds to fill the ansatz-element layer by successively running subpool exploration to pick an ansatz element $A_n$ in $n = 0, \ldots, n_{\max}$ iterations. This naturally induces an ordering on the layer. At every step of the iteration, the corresponding generalized non-commuting set $N_G(\mathcal{P}', A_n)$ is removed from the remaining pool $\mathcal{P}'$. If the loss of the selected ansatz element $A_n$ is smaller than a predefined threshold, $L(A_n) < \ell$, it is added to the ansatz-element layer $\mathcal{A}$. The layer is completed once the pool is exhausted ($\mathcal{P}' = \emptyset$) or the maximal iteration count $n_{\max}$ is reached. A pseudocode summary of static layering is given in Algorithm 3.

Algorithm 3 (static layering), Line 8: Reduce pool $\mathcal{P}' \leftarrow \mathcal{P}' \setminus N_G(\mathcal{P}', A)$.
In Static-ADAPT-VQE, static layering is used to grow an ansatz circuit iteratively. In each iteration, the layer is appended to the ansatz circuit, and the ansatz-circuit parameters are re-optimized. Iterations halt once the decrease in energy falls below $\varepsilon$, the energy accuracy per ansatz element. A summary of Static-ADAPT-VQE is given in Algorithm 4.
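The static-layering procedure can be sketched as follows. The example assumes support commutativity, represents elements by tuples of qubit indices, uses a made-up loss that stays fixed while the layer is built, and starts each exploration from the smallest remaining tuple for reproducibility; these are illustrative choices, and the function names are hypothetical.

```python
from itertools import combinations

def overlaps(pool, a):
    """Support non-commuting set of `a` within `pool` (includes `a` itself)."""
    return {b for b in pool if set(a) & set(b)}

def explore(pool, loss, start):
    """Compact subpool exploration: follow non-commuting subpools downhill."""
    searched, subpool = set(), {start}
    best, best_loss = None, float("inf")
    while subpool:
        searched |= subpool
        candidate = min(subpool, key=loss)
        if loss(candidate) >= best_loss:
            break
        best, best_loss = candidate, loss(candidate)
        subpool = overlaps(pool, best) - searched
    return best

def static_layer(pool, loss, ell=0.0, n_max=None):
    """Fill one layer with mutually support-commuting elements of low loss."""
    remaining = set(pool)
    layer = []
    n = 0
    while remaining and (n_max is None or n < n_max):
        a = explore(remaining, loss, start=min(remaining))
        if loss(a) < ell:
            layer.append(a)  # accepted: loss is below the threshold
        # Remove the element together with everything that overlaps with it.
        remaining -= overlaps(remaining, a)
        n += 1
    return layer

# Toy pool: all two-qubit supports on five qubits, with an arbitrary loss.
pool = list(combinations(range(5), 2))

def toy_loss(a):
    return -(3 * a[0] + a[1])

layer = static_layer(pool, toy_loss)
print(layer)
```

For this toy loss, a greedy pass from lowest to highest loss that skips overlapping elements selects the same two elements.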

FIG. 5. Visualization of layer construction and successive pool reduction, iteratively seeking local minima. Gray areas indicate the removal of generalized non-commuting sets corresponding to ansatz elements $A_n$ added to the layer $\mathcal{A}$. Parameters can either be optimized once the whole layer is fixed (static layering) or after adding each ansatz element (dynamic layering).
Property 4 Assume that all ansatz elements $A, B \in \mathcal{P}$ have distinct loss, $L(A) \neq L(B)$. Using support commutativity, and provided that $\ell = 0$ and that $n_{\max}$ is sufficiently large to ensure that the whole layer is filled, Static-ADAPT-VQE and TETRIS-ADAPT-VQE will produce identical ansatz-element layers.
This property is proven by induction. Assume that the previous iterations of ADAPT-VQE have yielded a specific ansatz circuit $\Lambda_{t-1}(\vec{\vartheta}^*)$. The next layer of ansatz elements $\mathcal{A}_t$ can be constructed either by Static-ADAPT-VQE or TETRIS-ADAPT-VQE. For both algorithms, the equivalence of $\Lambda_{t-1}(\vec{\vartheta}^*)$ implies that the loss function, Eq. (53), of any ansatz element is identical throughout the construction of the layer $\mathcal{A}_t$. First, by picking $\ell = 0$, we ensure that both TETRIS-ADAPT-VQE and Static-ADAPT-VQE only accept ansatz elements with a nonzero gradient. Next, we note that if an ansatz element is placed on a qubit by Static-ADAPT-VQE, then by Property 3, there exists no ansatz element that acts nontrivially on this qubit and generates a lower loss. Moreover, there exists no ansatz element with identical loss that acts nontrivially on this qubit, as we assume that all ansatz elements have a distinct loss. Similarly, TETRIS-ADAPT-VQE places ansatz elements from lowest to highest loss and ensures that no two ansatz elements have mutual support. Thus, if an ansatz element is placed on a qubit by TETRIS-ADAPT-VQE, there exists no ansatz element with a lower loss that acts nontrivially on this qubit. Again, there also exists no ansatz element with identical loss supported by this qubit, as we assume that all ansatz elements have a distinct loss. Combining these arguments, both Static- and TETRIS-ADAPT-VQE will fill the ansatz-element layer $\mathcal{A}_t$ with equivalent ansatz elements. The ansatz elements may be chosen in a different order. By induction, the equivalence of $\Lambda_{t-1}$ and $\mathcal{A}_t$ implies the equivalence of the ansatz circuit $\Lambda_t$.

III C 2. Dynamic layering
In static layering, ansatz-circuit parameters are optimized after appending a whole layer with several ansatz elements. In dynamic layering, on the other hand, ansatz-circuit parameters are re-optimized every time an ansatz element is appended to a layer. The motivation for doing so is to simplify the optimization process. The price is having to run the global optimization more times. We now describe how to perform dynamic layering.
The starting point is a given ansatz circuit $\Lambda$, a set of optimal parameters $\vec{\vartheta}^*$ [Eq. (51)], and their corresponding energy bound $\mathcal{E} \equiv E_\Lambda(\vec{\vartheta}^*)$. The remaining pool $\mathcal{P}'$ is initialized to be the entire pool $\mathcal{P}$. Starting from an empty layer $\mathcal{A}'$ and a temporary ansatz circuit $\Lambda' = \Lambda$, a layer is constructed dynamically by iteratively adding ansatz elements $A$ to $\mathcal{A}'$ and $\Lambda'$ while simultaneously re-optimizing the ansatz-circuit parameters $\vec{\vartheta}^*$. Based on the loss $L'$ induced by the currently optimal ansatz circuit $\Lambda'(\vec{\vartheta}^*)$, subpool exploration is used to select ansatz elements $A$. Simultaneously, the pool of remaining ansatz elements $\mathcal{P}'$ is shrunk by the successive removal of the generalized non-commuting sets $N_G(\mathcal{P}', A)$. Finally, ansatz elements are only added to the layer $\mathcal{A}'$ if their loss is below a threshold $\ell$ and the updated energy bound $\mathcal{E}'$ improves on the previous bound by more than the gain threshold $\varepsilon$. A pseudocode summary is given in Algorithm 5.
Dynamic-ADAPT-VQE iteratively builds dynamic layers $\mathcal{A}_t$ and appends those to the ansatz circuit $\Lambda_{t-1}$. The procedure is repeated until an empty layer is returned; that is, no ansatz element is found that reduces the energy by more than $\varepsilon$. Alternatively, the algorithm halts when the (user-specified) maximal iteration count $t_{\max}$ is reached. A pseudocode summary is given in Algorithm 6.
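A minimal sketch of one dynamic layer is given below. It is a control-flow illustration under several assumptions: selection uses an exhaustive search over the remaining pool instead of subpool exploration, the energy gain is compared against the most recent accepted bound, and `make_loss`, `optimize`, and `GAINS` are hypothetical classical stand-ins for the loss and the re-optimization of the circuit.

```python
def overlaps(pool, a):
    """Support non-commuting set of `a` within `pool` (includes `a` itself)."""
    return {b for b in pool if set(a) & set(b)}

def dynamic_layer(pool, circuit, bound, make_loss, optimize, ell=0.0, eps=1e-6):
    """Build one layer, re-optimizing after every accepted ansatz element."""
    remaining = set(pool)
    layer = []
    while remaining:
        loss = make_loss(circuit)         # loss induced by the current circuit
        a = min(remaining, key=loss)      # exhaustive stand-in for exploration
        remaining -= overlaps(remaining, a)
        if loss(a) >= ell:
            continue                      # loss threshold not met
        trial = circuit + [a]
        trial_bound = optimize(trial)     # re-optimize all circuit parameters
        if bound - trial_bound > eps:     # keep only sufficiently useful elements
            circuit, bound = trial, trial_bound
            layer.append(a)
    return layer, circuit, bound

# Hypothetical toy model: element `a` lowers the optimized energy by GAINS[a].
GAINS = {(0, 1): 0.5, (0, 2): 0.3, (0, 3): 0.1,
         (1, 2): 0.1, (1, 3): 0.1, (2, 3): 1e-9}

def make_loss(circuit):
    return lambda a: 0.0 if a in circuit else -GAINS[a]

def optimize(circuit):
    return -sum(GAINS[a] for a in set(circuit))

layer, circuit, bound = dynamic_layer(list(GAINS), [], 0.0, make_loss, optimize)
print(layer, bound)
```

Here the element `(2, 3)` passes the loss threshold but is rejected because it improves the bound by less than `eps`, which mirrors the two acceptance conditions described above.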

IV. BENCHMARKING NOISELESS PERFORMANCE
In this section, we benchmark various aspects of subpool exploration and layering in noiseless settings. To this end, we use numerical state-vector simulations to study a wide variety of molecules summarized in Table I. We demonstrate the utility of subpool exploration in reducing quantum-processor calls. Further, we show that when compared to standard ADAPT-VQE, both Static- and Dynamic-ADAPT-VQE reduce the ansatz circuit depths to similar extents. All simulations use the QEB pool because it gives a higher resilience to noise than the fermionic pool and performs similarly to the qubit pool [14]. Moreover, unless stated otherwise, we use support commutativity to ensure that Static-ADAPT-VQE produces ansatz circuits equivalent to TETRIS-ADAPT-VQE.

IV A. Efficiency of subpool exploration
We begin by illustrating the ability of subpool exploration to reduce the number of loss-function calls when searching for a suitable ansatz element $A$ to append to an ansatz circuit. To this end, we present Explore-ADAPT-VQE (ADAPT-VQE with subpool exploration) using the QEB pool, Eq. (10), and operator commutativity. We set the initial subpool, $S_0$, such that it consists of a single ansatz element selected uniformly at random from the pool. To provide evidence of a reduction in the number of loss-function calls, we track the number of subpools searched, $m_s$, to find a local minimum. The results are depicted in Fig. 6. There is a tendency to terminate subpool exploration after visiting two or three subpools. This should be compared with the maximum possible QEB-pool values of $m_s$: $N - 2 = 6, 10, 10, 12, 12$ for H$_4$, LiH, H$_6$, BeH$_2$, and H$_2$O, respectively. Thus, Fig. 6 shows that subpool exploration reduces the number of loss-function calls in the cases tested.

IV B. Reducing ansatz-circuit depth
Next, we compare the ability of Static-(TETRIS)- and Dynamic-ADAPT-VQE to reduce the depth of the ansatz circuits as compared to standard and Explore-ADAPT-VQE. The data is depicted in Fig. 7. Here, we depict the energy error, given as the distance of the VQE predictions $\mathcal{E}_t$ from the FCI ground state energy $E_{\mathrm{FCI}}$, as a function of (left) the ansatz-circuit depths and (right) the number of ansatz-circuit parameters. The left column shows that layered ADAPT-VQEs achieve lower energy errors with shallower ansatz circuits. Meanwhile, the right column demonstrates that all ADAPT-VQEs achieve similar energy accuracy with respect to the number of ansatz-circuit parameters.

IV C. Reducing runtime
In this section, we provide numerical evidence that subpool exploration and layering reduce the runtime of ADAPT-VQE.A mathematical analysis of asymptotic runtimes will follow in Section V. To provide evidence of a runtime reduction in numerical simulations, we show that layered ADAPT-VQEs require fewer expectation value evaluations (and thus shots and quantum processor runtime) to reach a given accuracy.Our numerical results are depicted in Figs. 8 and 9 for expectation-value evaluations related to calculating losses and parameter optimizations, respectively.We now discuss our results.
To convert data accessible in numerical simulations (such as loss function and optimizer calls) into runtime data (such as expectation values and shots), we proceed as follows. For our numerical data, we evaluate runtime in terms of the number of expectation value evaluations rather than processor calls or shots. This is justified as the number of shots (or processor calls) is directly proportional to the number of expectation values in our simulations, as detailed in Appendix B. Next, we evaluate the runtime requirements associated with loss-function evaluations by tracking the number of times a loss function is called. The evaluation of the loss function over a subpool $S$ is recorded as $|S| + 1$ expectation-value evaluations, assuming the use of a finite difference rule. Thus we produce the data presented in Fig. 8. Finally, we evaluate the runtime requirements of the optimizer by tracking the number of energy expectation values or gradients it requests. The gradient of $P$ variables is then recorded as $P + 1$ energy expectation value evaluations, assuming the use of a finite difference rule. This gives the data in Fig. 9.
In Fig. 8, we show that layered ADAPT-VQEs require fewer loss-related expectation-value evaluations to reach a given energy accuracy. We attribute this advantage to subpools gradually shrinking during layer construction. They thus require fewer loss function evaluations per ansatz element added to the ansatz-element circuit. We further notice that Explore-ADAPT-VQE does not reduce the loss-related expectation values required for standard ADAPT-VQE. We attribute this result to our examples' small pool sizes, with only 8 to 14 qubits. As the number of qubits increases, we expect a more noticeable advantage for Explore-ADAPT-VQE, as discussed in Sec. V.
In Fig. 9 (left), we show that Static-ADAPT-VQE reduces the number of optimizer calls needed to reach a given accuracy. The left column shows that Static-ADAPT-VQE calls the optimizer a factor of $O(N)$ fewer times than any other algorithm. This is expected, as standard, Explore-, and Dynamic-ADAPT-VQE call the optimizer each time a new ansatz element is added to the ansatz-element circuit. Meanwhile, Static-ADAPT-VQE calls the optimizer only after adding a whole layer of $O(N)$ ansatz elements to the ansatz-element circuit. In Fig. 9 (right), we analyze how the reduced number of optimizer calls translates to the number of optimizer-related expectation values required to reach a given accuracy. The data was obtained using a BFGS optimizer with a gradient norm tolerance of $10^{-12}$ Ha and a relative step tolerance of zero. Compared to the optimizer calls on the left of the figure, we notice two trends. Dynamic-ADAPT-VQE, while being on par with standard and Explore-ADAPT-VQE for optimizer calls, tends to use a higher number of expectation value evaluations. Similarly, Static-ADAPT-VQE, while having a clear advantage over standard and Explore-ADAPT-VQE for optimizer calls, tends to have a reduced advantage (and for LiH, even a disadvantage) when it comes to optimizer-related expectation value evaluations. These observations hint towards an increased optimization difficulty for layered ADAPT-VQEs. They may be highly optimizer dependent and should be further investigated in the future.

IV D. Additional benchmarks
We close this section by referring the reader to additional benchmarking data presented in the appendices. In Appendix C, we compare support to operator commutativity for the qubit pool. In Appendix D, we compare the steepest-gradient loss to the largest-energy-reduction loss. We also compare the QEB pool to the qubit pool in Appendix D.

V. RUNTIME ANALYSIS
In this section, we analyze the asymptotic runtimes of standard, Explore-, Dynamic-, and Static-ADAPT-VQE. We find that, under reasonable assumptions, Static-ADAPT-VQE can run up to a factor of O(N^2) faster than standard ADAPT-VQE. In what follows, we quantify asymptotic runtimes using O(x), Ω(x), or Θ(x) to state that a quantity scales at most, at least, or exactly with x, respectively. For definitions, see Appendix A. We begin our runtime analysis by listing some observations, assumptions, and approximations.
(g) For each energy expectation value in (e) and (f), we assume that a constant number of shots N_S = Θ(1) is needed to reach a given accuracy. This is a standard assumption in VQE [1,8], and further details justifying this assumption are discussed in Appendix B.
(h) For each shot in (g), one must execute an ansatz circuit with p ansatz elements.Here, we assume that the runtime C(p) of an ansatz circuit with p ansatz elements is proportional to its depth d(p), i.e., C(p) = Θ (d(p)).
Combining (e,g,h) and (f,g,h), we can estimate the runtime each algorithm spends on evaluating losses and performing the optimization, respectively:
Below, we further analyze these runtime estimates for standard, Explore-, Dynamic-, and Static-ADAPT-VQE.
In standard ADAPT-VQE, we re-evaluate the loss of each ansatz element in every iteration. Thus, N_L(p) = |P|. Moreover, the circuit depth d(p) is upper bounded by d(p) = O(p). In the best-case scenario, ADAPT-VQE may arrange ansatz elements into layers accidentally (an effect more likely for large N). This can compress the circuit depths down to d = Ω(p/N). We summarize this range of possible circuit depths using the compact expression d(p) = Θ(p N^{-γ}), with γ ∈ [0, 1]. In numerical simulations, we typically observe that γ ≈ 0, i.e., the depth of an ansatz circuit is proportional to the number of ansatz elements. These expressions for N_L(p) and d(p) allow us to estimate the runtime of standard ADAPT-VQE algorithms:
Explore-ADAPT-VQE results in circuits of the same depths as ADAPT-VQE, i.e., d = Θ(p N^{-γ}). However, the use of subpool exploration in Explore-ADAPT-VQE may reduce the number of loss-function evaluations N_L(p). As discussed in Section III B (paragraph on Efficiency), in the best-case scenario, the number of loss-function evaluations per iteration is lower bounded by N_L(p) = Ω(|P|/N). In the worst-case scenario, subpool exploration may explore the whole pool of ansatz elements, such that N_L(p) = O(|P|). Based on these relations, we can estimate the runtime of Explore-ADAPT-VQE:
The analysis of Static-ADAPT-VQE's runtime is more straightforward with respect to the layer count t than to the parameter count p. Therefore, we revisit and modify our previous observations, assumptions, and approximations. Combining the updated (e,g,h) and (f,g,h), we find the loss- and optimization-related runtimes of Static-ADAPT-VQE, respectively:
Since P = Θ(N) t_max implies t_max = P Θ(N^{-1}), we can simplify these runtime estimates:
We summarize this section by listing the ratios of asymptotic runtimes for Explore-, Dynamic-, and Static-ADAPT-VQE divided by the asymptotic runtime of standard ADAPT-VQE in Table II. Here, we assume equal polynomial scaling (α_A = α_E = α_D = α_S) of the optimization runtime for standard, Explore-, Dynamic-, and Static-ADAPT-VQE. As expected from our numerical runtime analysis in Section IV C, for typical ADAPT-VQE circuit depths (where γ = 0), Static-ADAPT-VQE can provide the largest runtime reduction. This reduction is quadratic in the number of qubits: Θ(N^{-2}). Further improvements to bounding the number of losses in Explore- and Dynamic-ADAPT-VQE are discussed in Appendix I.
TABLE II. The ratio of the runtimes of the listed algorithms to the runtime of standard ADAPT-VQE. N is the qubit number, and P is the number of parameters in the final ansatz circuit. See text for further explanation.
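The quadratic runtime reduction quoted for Static-ADAPT-VQE can be checked numerically with a toy cost model. The sketch below is not the paper's derivation: it assumes unit prefactors, γ = 0 for standard ADAPT-VQE (depth equal to the parameter count), exactly N ansatz elements per static layer, a depth equal to the layer index t, and one sweep of the pool per layer. The function names and the example values of N, P, |P|, and α are hypothetical.

```python
def standard_runtimes(num_params: int, pool_size: int, alpha: float, shots: int = 1):
    """Loss and optimizer runtimes for standard ADAPT-VQE, assuming depth d(p) = p."""
    loss = sum(2 * pool_size * shots * p for p in range(1, num_params + 1))
    optimizer = sum((p ** alpha) * shots * p for p in range(1, num_params + 1))
    return loss, optimizer


def static_runtimes(
    num_params: int, num_qubits: int, pool_size: int, alpha: float, shots: int = 1
):
    """Assumed Static-ADAPT-VQE model: N elements per layer, depth equal to layer index t."""
    num_layers = num_params // num_qubits  # t_max = P / N
    loss = sum(2 * pool_size * shots * t for t in range(1, num_layers + 1))
    optimizer = sum(
        ((t * num_qubits) ** alpha) * shots * t for t in range(1, num_layers + 1)
    )
    return loss, optimizer


# Hypothetical example values, chosen only to illustrate the scaling.
N, P, POOL, ALPHA = 10, 1000, 500, 2
std_loss, std_opt = standard_runtimes(P, POOL, ALPHA)
sta_loss, sta_opt = static_runtimes(P, N, POOL, ALPHA)

# If the ratio scales as N^-2, multiplying by N^2 should give a number close to 1.
loss_ratio_scaled = sta_loss / std_loss * N**2
opt_ratio_scaled = sta_opt / std_opt * N**2
print(loss_ratio_scaled, opt_ratio_scaled)
```

Under these assumptions both rescaled ratios stay close to one, which is consistent with the Θ(N^{-2}) entry discussed in the text.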

VI. NOISE
In this section, we explore the benefits of reducing ADAPT-VQEs' ansatz-circuit depths with respect to noise. Our main finding is that the use of layering to reduce ansatz-circuit depths mitigates global amplitude-damping and global dephasing noise, where idling and non-idling qubits are affected alike. However, reduced ansatz-circuit depths do not mitigate the effect of local depolarizing noise, which exclusively affects qubits operated on by noisy (two-qubit, CNOT) gates. The explanation for this, we show, is that the ansatz-circuit depth is a good predictor for the effect of global amplitude-damping and dephasing noise. On the other hand, we show that the errors induced by local depolarizing noise are approximately proportional, not to the depth, but to the number of (CNOT) gates. For this reason, a shallower ansatz circuit with the same number of noisy two-qubit gates will not reduce the sensitivity to depolarizing noise.

VI A. Noise models
Our noise models focus on superconducting architectures, where the native gates are arbitrary single-qubit rotations and two-qubit CZ or iSWAP gates [30]. Further, we assume all-to-all connectivity. We tune our analysis towards the ibmq_quito (IBM Quantum Falcon r4T) processor. For this processor, the quoted two-qubit gate times are for CNOT gates. Thus, we will take CNOT gates to be our native two-qubit gate. To this end, our simulations use one- and two-qubit gate-execution times of 35.5 ns and 295.1 ns, respectively (these values were taken from the IBM Quantum services). Similar native gates and execution times apply to silicon quantum processors [31].
In our simulations, we model amplitude damping of a single qubit by the standard amplitude-damping channel. (For detailed expressions of the amplitude-damping channel and the other noise channels we use, see Appendix K.1.) Its decay constant is determined by the inverse T_1 time: ω_1 = 1/T_1. Similarly, we model dephasing of a single qubit by the standard dephasing channel. The T_1 and T_2^* times determine its phase-flip probability via the decay constant ω_z = 2/T_2^* − 1/T_1. Finally, we model depolarization of a single qubit by a symmetric depolarizing channel with depolarization strength p ∈ [0, 1], where p = 0 leaves a pure qubit pure and p = 1 brings it to the maximally mixed state.
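A minimal sketch of the two global channels may help make the role of the decay constants concrete. The exact expressions used in the paper are in Appendix K.1, which is not reproduced here, so the parametrization below is an assumption: the amplitude-damping probability over a duration τ is taken as 1 − exp(−ω_1 τ), and the phase-flip probability is chosen so that coherences decay as exp(−ω_z τ/2). The coherence times `t1` and `t2_star` are hypothetical values, and the function names are illustrative.

```python
import numpy as np


def amplitude_damping_kraus(omega_1: float, tau: float):
    """Standard amplitude-damping Kraus operators for a duration tau."""
    gamma = 1.0 - np.exp(-omega_1 * tau)  # assumed decay probability
    k0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1.0 - gamma)]])
    k1 = np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])
    return [k0, k1]


def dephasing_kraus(omega_z: float, tau: float):
    """Phase-flip channel; assumed to damp coherences by exp(-omega_z * tau / 2)."""
    p_flip = 0.5 * (1.0 - np.exp(-0.5 * omega_z * tau))
    return [np.sqrt(1.0 - p_flip) * np.eye(2), np.sqrt(p_flip) * np.diag([1.0, -1.0])]


def apply_channel(kraus, rho):
    """Apply a channel given by Kraus operators to a density matrix."""
    return sum(k @ rho @ k.conj().T for k in kraus)


t1, t2_star = 100e-6, 80e-6  # hypothetical coherence times in seconds
tau = 295.1e-9               # two-qubit gate time quoted in the text
omega_1 = 1.0 / t1
omega_z = 2.0 / t2_star - 1.0 / t1

rho_plus = 0.5 * np.ones((2, 2))  # density matrix of the |+> state
rho_noisy = apply_channel(
    dephasing_kraus(omega_z, tau),
    apply_channel(amplitude_damping_kraus(omega_1, tau), rho_plus),
)
print(rho_noisy)
```

With this parametrization, the excited-state population decays as exp(−τ/T_1) and the coherence as exp(−τ/T_2^*), which is the behavior one would expect from the decay constants defined above.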
In our simulations, we model the effects of amplitude damping, dephasing, and depolarizing noise on the ansatz circuits Λ_t in a layer-by-layer approach. We decompose the ansatz circuit Λ_t into L_t ansatz-element layers {A_l}, l = 1, ..., L_t, each consisting of support-commuting ansatz elements:
For amplitude damping and dephasing noise, each ansatz-element layer A^o_l is transpiled into columns of native gates that can be implemented in parallel (see Ref. [28] for more details). The native gate with the longest execution time of each native-gate column sets the column execution time. The sum of the column execution times then gives the execution time τ_l of the ansatz-element layer A^o_l. After each ansatz-element layer A^o_l, amplitude damping is implemented by applying an amplitude-damping channel to every qubit r = 1, ..., N in an amplitude-damping layer. This results in an amplitude-damped ansatz circuit Λ_t(ω_1). Similarly, after each ansatz-element layer A^o_l, dephasing is implemented by applying a dephasing channel to every qubit r = 1, ..., N in a dephasing layer. This results in a dephased ansatz circuit Λ_t(ω_z). Finally, for depolarizing noise, we apply the whole ansatz-element layer and then a depolarizing channel to each qubit. The strength of a qubit's depolarizing channel is determined by the exact number of times that the qubit was the target in a CNOT gate in the preceding layer. This results in a depolarized ansatz circuit Λ_t(p). For a visualization of our layer-by-layer noise model, see Fig. 10. For detailed mathematical expressions, see Appendix K.2.
We note that applying the noise channels after each ansatz-element layer could be refined by applying the noise channels after each gate in the ansatz-element layer. However, as shown in Ref. [14], such a gate-by-gate noise model, as opposed to our layer-by-layer noise model, would increase computational costs and has limited effect on the results. In what follows, we collectively refer to amplitude-damped ansatz circuits Λ_t(ω_1), dephased ansatz circuits Λ_t(ω_z), and depolarized ansatz circuits Λ_t(p) as Λ_t(α). Here, α refers to the key noise parameters ω_1, ω_z, or p of each respective noise model.
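The rule for the layer execution time τ_l can be sketched in a few lines. The gate times are the ones quoted above; the representation of a transpiled layer as a list of columns, the labels "1q" and "2q", and the example layer are hypothetical choices made for illustration.

```python
import math
from typing import List

# Gate-execution times quoted in the text, in nanoseconds.
GATE_TIME_NS = {"1q": 35.5, "2q": 295.1}


def layer_execution_time(columns: List[List[str]]) -> float:
    """Sum over columns of the slowest gate in each column of parallel native gates."""
    return sum(max(GATE_TIME_NS[gate] for gate in column) for column in columns)


# Hypothetical transpiled layer: each inner list is one column of parallel gates.
example_layer = [["1q", "1q"], ["2q"], ["1q", "2q"], ["1q"]]
tau_l = layer_execution_time(example_layer)
print(tau_l)  # total layer execution time in nanoseconds
```

A column containing at least one CNOT costs a full two-qubit gate time, so single-qubit gates scheduled in parallel with a CNOT do not add to τ_l.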

VI B. Energy error and noise susceptibility
FIG. 10. Circuit diagram visualizing the layer-by-layer noise model (panels: noiseless ansatz-element layer, left; noise model, right). On the left, a noiseless ansatz-element layer with two support-commuting ansatz elements, A_1(θ_1) and A_2(θ_2), is decomposed into columns of native gates. On the right, noise is added to the ansatz-element layer. For global amplitude-damping (or dephasing) noise, the channel F (or C) is applied to each qubit. For local depolarizing noise, the channels D are applied to the target of the noisy (two-qubit, CNOT) gate.

Going forward, we analyze the effect of noise on the energy error [cf. Eq. (54)]
Δ_t(α) now depends not only on the iteration step t but also on the noise parameter α, via the noise-dependent expectation value
To analyze the energy error, we expand the methodology of Ref. [14]. More specifically, we decompose the energy error into two contributions: Δ_t(α) = Δ_t(0) + [Δ_t(α) − Δ_t(0)]. The first term, Δ_t(0), is the energy error of the noiseless ansatz circuit. The second term, Δ_t(α) − Δ_t(0), is the energy error due to noise. Subsequently, we Taylor expand the energy error due to noise to first order: Δ_t(α) − Δ_t(0) ≈ α ∂Δ_t(α)/∂α |_{α=0}. As depicted in Fig. 11, in the regime of small noise parameters α (where energy errors are below chemical accuracy), the linear approximation is an excellent predictor for the energy error. Conveniently, this allows us to summarize the effect of noise on the energy error through the noise susceptibility χ_t, defined as χ_t := ∂Δ_t(α)/∂α |_{α=0}.
In Appendix L, we calculate the noise susceptibility χ_t of amplitude damping F, dephasing C, and depolarizing D noise:
respectively. Here, N denotes the number of qubits; L_t is the number of ansatz-element layers in the ansatz circuit Λ_t; N_II is the number of noisy (two-qubit, CNOT) gates in the ansatz circuit Λ_t; and the dE's denote the average energy fluctuations, defined in Eqs. (L11) of Appendix L. As discussed further in Appendix L, the average energy fluctuations can be calculated from noiseless expectation values. This allows us to compute the noise susceptibility with faster state-vector simulations rather than computationally demanding density-matrix simulations.
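The idea of summarizing noise by a first-order coefficient can be illustrated on a single-qubit toy problem. This is not the paper's calculation (which uses the expressions of Appendix L): the sketch estimates the susceptibility by a forward finite difference, and the Hamiltonian H = Z, the state |1⟩, the duration `TAU`, and the function names are all hypothetical choices for illustration.

```python
import math
from typing import Callable


def noise_susceptibility(energy_error: Callable[[float], float], step: float = 1e-6) -> float:
    """Forward-difference estimate of the derivative of the energy error at zero noise."""
    return (energy_error(step) - energy_error(0.0)) / step


TAU = 1.0  # hypothetical circuit duration (arbitrary units)


def toy_energy_error(omega_1: float) -> float:
    """Energy error of |1> under H = Z after amplitude damping for a time TAU."""
    excited_population = math.exp(-omega_1 * TAU)
    energy = (1.0 - excited_population) - excited_population  # <Z> of the damped state
    return energy - (-1.0)  # the exact ground-state energy of Z is -1


chi = noise_susceptibility(toy_energy_error)

# First-order prediction of the energy error at a small, nonzero noise parameter.
alpha = 1e-3
linear_prediction = toy_energy_error(0.0) + chi * alpha
print(chi, linear_prediction, toy_energy_error(alpha))
```

For this toy model the energy error is 2 − 2 exp(−ω_1 τ), so the susceptibility approaches 2τ and the linear prediction tracks the exact error closely for small ω_1.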

VI C. Benchmarking layered circuits with noise
In this section, we compare the noise susceptibility of standard, Static-(TETRIS-), and Dynamic-ADAPT-VQE in the presence of noise. As before, we showcase these algorithms on a range of molecules (summarized in Table I) using the QEB pool with support commutativity. When performing our comparison, we grow the ansatz circuits Λ_t and optimize their parameters in noiseless settings, as previously discussed in [14]. We then compute the noise susceptibility of Λ_t as described in the previous section. The results for amplitude damping, dephasing, and depolarizing noise are depicted in Fig. 12, Fig. 13, and Fig. 14, respectively. In all three figures, we plot the noise susceptibility as a function of (left) the noiseless energy accuracy Δ_t(0) or (right) the number of parameters. The rows of each plot depict different molecules in order of increasing spin orbitals from bottom to top: H_4, H_6, LiH, BeH_2, and H_2O.
Layering benefits:-From Fig. 12, it is evident that layering is successful in mitigating the effect of amplitude-damping noise. Here, we observe that the noise susceptibility of Static- and Dynamic-ADAPT-VQE is approximately half that of standard ADAPT-VQE. This is a clear indication that layering can reduce the effect of noise. In Fig. 13, we observe that layering also tends to reduce the noise susceptibility in the presence of dephasing noise. However, in this scenario, the advantage is less consistent across different ansatz circuits and molecules. Finally, in Fig. 14, we observe that for depolarizing noise, all algorithms tend to produce similar noise susceptibilities. Which algorithm shows an advantage depends on the ansatz circuit and molecule. Our simulations indicate no clear disadvantage of using layering in the presence of depolarizing noise. In summary, our numerical simulations suggest that layering is useful for mitigating global amplitude-damping and dephasing noise. Moreover, layering seems to have neither a beneficial nor a detrimental effect in the presence of local depolarizing noise. In order to explain these findings, we further investigate the dependence of noise susceptibility on several circuit parameters.
Gate-fidelity requirements:-We now use the noise-susceptibility data in Fig. 12, Fig. 13, and Fig. 14 to estimate the fidelity requirements for operating ADAPT-VQEs. For this estimation, recall that quantum chemistry simulations of energy eigenvalues target an accuracy of 1.6 mHa. To achieve this chemical accuracy, we require the energy error due to noise to be smaller than ≈ 1 mHa: χ_t α ≲ 1 mHa. Applying this condition to amplitude damping (where α = 1/T_1), dephasing (where α ≈ 1/T_2^*), and depolarizing noise (where α = p), we find a set of gate-fidelity requirements:
The data presented in Figs. 12, 13, and 14 suggest the following requirements for the gate operations to enable chemically accurate simulations:
A more detailed breakdown of the maximal p and minimal T_1 and T_2^* times for each algorithm and molecule is presented in Fig. 15. These requirements are beyond the current state-of-the-art quantum processors [31,32]. How much these requirements can be improved by error-mitigation techniques [33] remains an open question for future research.
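The condition χ_t α ≲ 1 mHa translates directly into bounds on the noise parameters. The following sketch shows the arithmetic only; the susceptibility values are hypothetical placeholders rather than values read from Figs. 12 to 14, and the function names are illustrative.

```python
import math

ERROR_BUDGET_HA = 1e-3  # noise-induced energy-error budget of about 1 mHa, as in the text


def max_noise_parameter(chi: float, budget: float = ERROR_BUDGET_HA) -> float:
    """Largest alpha satisfying chi * alpha <= budget (e.g. depolarizing strength p)."""
    return budget / chi


def min_coherence_time(chi: float, budget: float = ERROR_BUDGET_HA) -> float:
    """For alpha = 1/T, the condition chi / T <= budget gives T >= chi / budget."""
    return chi / budget


# Hypothetical susceptibilities, used only to illustrate the conversion.
chi_depolarizing = 50.0    # in Ha (p is dimensionless)
chi_amplitude_damping = 1e-5  # in Ha * s (alpha = 1/T_1 has units of 1/s)

p_max = max_noise_parameter(chi_depolarizing)
t1_min = min_coherence_time(chi_amplitude_damping)
print(p_max, t1_min)
```

Because the requirement is linear in χ_t, halving the noise susceptibility (as observed for amplitude damping with layering) halves the minimal coherence time needed under this estimate.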

VI D. Noise-susceptibility scalings
In this section, we investigate the dependence of noise susceptibility on basic circuit parameters, such as the number of qubits N, circuit depth d ∝ L_t, or the number of noisy (two-qubit, CNOT) gates N_II. Our analysis will help in understanding why layering can mitigate global amplitude-damping and dephasing noise but not local depolarizing noise.
We study numerically how noise susceptibility scales with circuit depth and the number of noisy (two-qubit, CNOT) gates N_II. The data is presented in Fig. 16. The top panels show the noise susceptibility in the presence of amplitude damping (left), dephasing (center), and depolarizing noise (right) for various algorithms and molecules. The noise-susceptibility data is presented on a log-log plot as a function of circuit depth d (left and center) and of N_II (right). From Fig. 16, we find that the noise susceptibility scales roughly linearly with the plotted parameters. To further analyze this rough linearity, we produce a log-log plot in the bottom panels of χ^F_t/d (left), χ^C_t/d (center), and χ^D_t/N_II (right) as a function of d, d, and N_II, respectively. Had the scalings of interest been linear, the bottom panels would have depicted constant curves. This is not entirely the case. However, the curves' deviations from constants are sufficiently sublinear to support our claim that the curves in the upper plots are roughly linear.
The scalings observed in Fig. 16 confirm our previous intuition. Based on Eq. (67), and using the assumption that dE is roughly constant, we would expect that the noise susceptibility in the presence of amplitude damping or dephasing noise is proportional to the circuit depth and the number of qubits: χ^F_t, χ^C_t ∝ N d. This claim is supported by Fig. 16. Moreover, previous studies [14] have found that the noise susceptibility scales linearly with the number of depolarizing two-qubit gates: χ^D_t ∝ N_II. This claim is also supported by Fig. 16.
Thus, for global (amplitude damping and dephasing) noise, which affects idling and non-idling qubits alike, our analysis indicates that circuit depth is a good predictor of noise susceptibility.On the other hand, for local (depolarizing) noise, which affects only the qubits which are nontrivially operated on, N II is a good predictor of the noise susceptibility.Consequently, we expect that compressing the depth of an ansatz circuit by layering can mitigate noise in the former, but not the latter, of these settings.

VII. SUMMARY AND CONCLUSION
In this paper, we introduced layering and subpool-exploration strategies for ADAPT-VQEs that reduce circuit depth, runtime, and susceptibility to noise. In noiseless numerical simulations, we demonstrated that layering reduces the depths of ansatz circuits when compared to standard ADAPT-VQE. We further showed that our layering algorithms achieve circuits that are as shallow as those of TETRIS-ADAPT-VQE. The reduction in ansatz-circuit depth is achieved without increasing the number of ansatz elements, circuit parameters, or CNOT gates in the ansatz circuit. The noiseless numerical simulations further provide evidence that layering and subpool exploration can reduce the runtime of ADAPT-VQE by a factor of up to O(N^2), where N is the number of qubits in the simulation. Finally, we benchmarked the effect of reducing the depth of ADAPT-VQEs on the algorithms' noise susceptibility. For global noise models, which affect idling and non-idling qubits alike (such as our amplitude-damping and dephasing models), we showed that the noise susceptibility is approximately proportional to the ansatz-circuit depth. For these noise models, reduced circuit depth due to layering is beneficial in reducing the noise susceptibility of ADAPT-VQEs. For local noise models, where only non-idling qubits are affected by noise (as with our depolarizing noise model), we showed that the noise susceptibility is approximately proportional to the number of noisy (two-qubit, CNOT) gates. For these noise models, layering strategies are neither useful nor harmful, as they hardly change the CNOT count of ADAPT-VQEs. We finish our paper by stating three conclusions from our work.
To layer or not to layer?:-Depending on the dominant noise source of a quantum processor, layering may or may not lead to improved noise resilience.For processors where global noise dominates, we recommend layering.
Static or dynamic layering?:-Our paper considered static and dynamic layering. Which of the two should be used? Static layering optimizes each layer once, while dynamic layering optimizes the ansatz after adding each ansatz element. Both layering strategies lead to ansatz circuits of similar depths and require a similar number of parameters and CNOT gates to reach a certain energy accuracy. However, static layering calculates significantly fewer energy expectation values on the quantum processor. Therefore, we recommend static layering for the small molecules studied in this work. For larger molecules, dynamic layering could be preferable.
How useful is subpool exploration?:-Our paper introduced a new pool-exploration strategy that reduces the number of loss-function evaluations and, thereby, the number of calls to the quantum processor. However, in the examples studied in this work, the number of loss-function evaluations was exceeded by the number of energy-expectation-value calls. Thus, subpool exploration had little impact on the algorithms. Again, this could change when larger molecules are studied.
FIG. 18. Energy accuracy (Energy − FCI, in Ha) against ansatz-circuit depths (left) and the number of ansatz-circuit parameters (ansatz elements, right), for qubit-Dynamic-ADAPT-VQE using support and operator commutation (dashed and solid lines, respectively). Each row shows data for a specific molecule, with the number of orbitals increasing up the page. Energy accuracies better than chemical accuracy are shaded in cream.
the Hamiltonian H can be bounded using Chebyshev's inequality to give
where s is the number of independent samples of the observable H and \overline{⟨H⟩} is the sample-mean estimator for the expectation value ⟨H⟩. However, our Hamiltonian H is represented by a sum of g Pauli strings, so the total number of shots is S = gs. Thus, we find the number of shots required is bounded:
where we have picked some desired confidence q = P(|\overline{⟨H⟩} − ⟨H⟩| ≥ ϵ).
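As a rough illustration of how a Chebyshev-type bound turns into a shot count, the sketch below uses the textbook form of Chebyshev's inequality for a sample mean, P(|mean − μ| ≥ ϵ) ≤ Var/(s ϵ²), together with S = gs from the text. The precise form of Eq. (B2) is not reproduced here, so this should be read as an assumption, and the function names and example numbers are hypothetical.

```python
import math


def samples_per_term(variance: float, epsilon: float, failure_probability: float) -> int:
    """Smallest s for which variance / (s * epsilon**2) <= failure_probability."""
    return math.ceil(variance / (failure_probability * epsilon**2))


def total_shots(
    variance: float, epsilon: float, failure_probability: float, num_pauli_strings: int
) -> int:
    """Total shots S = g * s when the Hamiltonian is a sum of g Pauli strings."""
    return num_pauli_strings * samples_per_term(variance, epsilon, failure_probability)


# Hypothetical numbers chosen only to make the arithmetic easy to follow.
shots = total_shots(variance=2.0, epsilon=0.5, failure_probability=0.25, num_pauli_strings=4)
print(shots)
```

The bound scales linearly with the variance and with g, and inversely with q and ϵ², which is why the evolution of Var H during the algorithm matters for the runtime.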
Two factors contribute to variation in expectation-value runtimes. Assuming equality in Eq. (B2) allows us to read off the first factor. That is, the runtime is directly proportional to
which will generally vary throughout the algorithm through the dependence on ρ_t. The evolutions of the variance Var H throughout the four algorithms benchmarked in Secs. IV B and IV C are presented in Fig. 17. We observe that the variance is well predicted by the energy accuracy independent of the algorithm (left column). This leads to the variance, like the energy accuracy, being well predicted by the number of ansatz parameters independent of the algorithm (right column).
The second factor is that each shot will have a runtime directly proportional to the ansatz-circuit depth (this assumes that initialization and readout are negligible). Thus, we treat each expectation-value evaluation as an oracle. The cost of each oracle call is approximately the same for each algorithm for equal energy accuracies or equal numbers of ansatz parameters, provided the algorithms produce ansatz circuits with approximately equivalent depths. This is not true when comparing Static- and Dynamic-ADAPT-VQE to standard and Explore-ADAPT-VQE. That is, the number of shots required per expectation value, S(x(P)) (see Sec. V), as a function of the number of parameters is approximately the same for the four algorithms considered.
Further, comparisons of total expectation-value evaluations (see Sec. IV C) for a given energy convergence are a good proxy for directly comparing the runtimes, provided the algorithms produce ansatz circuits with approximately equivalent ansatz-circuit depths. In the layered versus non-layered comparison, the layering-based algorithms will have a faster runtime given by the ratio of the depths, at best O(N).
Definition 8 (Generalized Fermionic Pool) All the distinct fermionic excitations that act on q ∈ V distinct qubits are given by:
where
This is the fermionic pool over N qubits.
Definition 9 (Generalized QEB Pool) All the distinct qubit excitations that act on q ∈ V distinct qubits are given by:
This is the QEB pool over N qubits.
The cardinality of both the fermionic and QEB pools is:
where the second choice is the number of sets of q distinct qubits out of N, and the first is the number of permutations of these qubits that give distinct excitations.
Definition 10 (Generalized Qubit Pool) The qubit pool is the set of all the Pauli excitations with an odd number of Y gates in the generator that act on q ∈ V distinct qubits.
The cardinality of the qubit pool is:
where the combinatorics follow as for the fermionic and QEB pools, but the first choice is replaced with the factor 2^{q−1}.
Lemma 1 For each P ∈ {P_Fermi, P_QEB, P_qubit}, there exists a finite constant V, which will depend on the pool definition, such that |P(N)| is a logarithmically concave function of N for N ≥ V.
Proof. Consider the polynomial f_q(x) = x(x − 1)···(x − q + 1), which has q integer roots: 0, 1, ..., q − 1. We note that the kth derivative can be expressed as
Further, let
Using this notation, we will consider the following products
and
Now note that the first terms from each cancel in the difference of these products:
which is non-negative for x ≥ q.
Note that f_q(x) is logarithmically concave within some convex domain iff
within the convex domain. Thus, f_q(x) is logarithmically concave for x ≥ q. As the discrete function \binom{N}{q} is given by f_q(N)/q!, it must also be logarithmically concave for N ≥ q. However, |P(N)| is a linear combination with non-negative coefficients of such functions with q ∈ V. Therefore, there will be a finite constant depending on these coefficients for which the function \binom{N}{max V} dominates sufficiently that |P(N)| is logarithmically concave.
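Lemma 1 can be probed numerically for the qubit pool. The sketch below assumes the cardinality takes the form of a sum over q of 2^{q−1} times the number of ways to choose q of N qubits, as described in the text, and it assumes that excitations act on 2 or 4 qubits (a hypothetical choice of the set of supports, made only for illustration). It then lists the qubit numbers at which the discrete log-concavity condition |P(N)|² ≥ |P(N−1)| |P(N+1)| fails.

```python
from math import comb
from typing import Tuple


def qubit_pool_size(num_qubits: int, supports: Tuple[int, ...] = (2, 4)) -> int:
    """Assumed qubit-pool cardinality: sum over q of 2^(q-1) * C(N, q)."""
    return sum(2 ** (q - 1) * comb(num_qubits, q) for q in supports)


def is_log_concave_at(num_qubits: int, supports: Tuple[int, ...] = (2, 4)) -> bool:
    """Discrete log-concavity condition a_N^2 >= a_(N-1) * a_(N+1)."""
    left = qubit_pool_size(num_qubits, supports) ** 2
    right = qubit_pool_size(num_qubits - 1, supports) * qubit_pool_size(
        num_qubits + 1, supports
    )
    return left >= right


# Qubit numbers in a finite test range where log-concavity is violated.
violations = [n for n in range(3, 60) if not is_log_concave_at(n)]
print(violations)
```

In this example the condition is violated only at a small qubit number and holds for all larger N in the tested range, in line with the statement that log-concavity sets in beyond a finite, pool-dependent constant.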
Representing the set of operators Q_l := {Q, Q†}^{⊗l} in the computational basis is an injection Q_l → {M_ij ∈ M_l : i ≠ j} for l ≥ 1. Thus, we consider the following skew-Hermitian operator T = M_ij + a M_ji, which could represent a qubit excitation generator in the computational basis when a = −1.
Theorem 1 (Singleton Matrix Excitation Commutation) Consider the following two operators acting on a tripartite vector space:
Proof. Condition 1 follows trivially, as operators with disjoint support always commute. Thus, henceforth, we will assume M_kl, M_γδ ∈ M_l for l ≥ 1 (the complement of Condition 1). Now consider the product
Thus, we find the commutator using Property 2 of Definition 11:
First, suppose the tensor factors in parentheses are non-zero and M_ij ≠ 1. Line G7 cannot cancel with Line G8 or G10 due to the first tensor factor: M_ji ≠ M_ij. Further, Line G7 cannot cancel with Line G9 due to the bracketed tensor factor, as γ ≠ δ. Thus, T_1 and T_2 do not commute if M_ij ≠ 1 and the first condition is not met. By symmetry, the same argument can be applied if we suppose the tensor factors in parentheses are non-zero and M_αβ ≠ 1. Therefore, if the first condition is not met, then we require M_ij = M_αβ = 1 in order for T_1 and T_2 to commute; that is, we require equivalent support.
Suppose now that T 1 and T 2 do have equivalent support.We can simplify the commutator to: If two such generators have equivalent support, then either Condition 2 or 3 must hold.Thus, two-qubit excitation generators commute iff they have disjoint or equivalent support.

G.3. Fermionic Excitations
A fermionic excitation generator generalizes to an operator of the form:
Definition 13 (Generalized Fermionic Excitation)
and x is a bit-string and \vec{i} is a tuple of unique orbital indices.
In the Jordan-Wigner encoding, a_i = Q_i ⊗ Z_i, where Z_i = ⊗_{j=1}^{i−1} Z_j is a Pauli string of Z operators. Now we will define the commutation product
and use it to express the generalized fermionic excitation generators as the tensor product of a singleton matrix excitation generator and a Pauli string of Z operators:
Note that Q and Q† anti-commute with Z (i.e., the commutation product of Q^{(†)} and Z equals −1).
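The Jordan-Wigner form a_i = Q_i ⊗ Z_i and the anti-commutation of Q with Z can be checked with small matrices. The sketch below assumes Q = |0⟩⟨1| (the qubit lowering operator) and uses zero-based qubit indices, so that the Z string covers all qubits j < i; the helper names are illustrative. It verifies the canonical anti-commutation relations on three qubits.

```python
from functools import reduce

import numpy as np

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])
Q = np.array([[0.0, 1.0], [0.0, 0.0]])  # assumed convention: Q = |0><1|


def annihilation(i: int, num_qubits: int) -> np.ndarray:
    """Jordan-Wigner operator: Z on qubits j < i, Q on qubit i, identity elsewhere."""
    factors = [Z] * i + [Q] + [I2] * (num_qubits - i - 1)
    return reduce(np.kron, factors)


def anticommutator(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    return a @ b + b @ a


num_qubits = 3
ops = [annihilation(i, num_qubits) for i in range(num_qubits)]
identity = np.eye(2**num_qubits)

# Q anti-commutes with Z, which is what produces the fermionic sign structure.
q_z_anticommute = np.allclose(anticommutator(Q, Z), 0.0)

# Canonical relations: {a_i, a_j^dagger} = delta_ij and {a_i, a_j} = 0.
canonical = all(
    np.allclose(anticommutator(ops[i], ops[j].T), identity if i == j else 0.0)
    and np.allclose(anticommutator(ops[i], ops[j]), 0.0)
    for i in range(num_qubits)
    for j in range(num_qubits)
)
print(q_z_anticommute, canonical)
```

Since all matrices here are real, the transpose plays the role of the Hermitian conjugate.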
Lemma 2 (Generalized Fermionic Excitation Tensor Factorisation) Generalized Fermionic excitation generators can be expressed as follows: where T s is a singleton matrix excitation generator and Z ∈ {1, Z} ⊗m for some m ∈ N. Proof.
Now note that which can be combined with the factor of 1 or −1 from earlier to produce a factor ±1. Therefore we find: Finally, we can substitute this into the form of T : and note that the tensor factor in brackets is a singleton matrix excitation generator.
Moving forward with this form of generalized fermionic excitation generators allows us to use Theorem 1 to show that the following commutation relations hold for generalized fermionic excitations:
Theorem 2 (Generalized Fermionic Excitation Commutation) Two generalized fermionic excitations T_1 and T_2 commute iff any of the following conditions are satisfied:
1. T_s1 and T_s2 have disjoint support and |Supp T_s1 ∩ Supp Z_2| + |Supp T_s2 ∩ Supp Z_1| is even,
2. T_s1 ∝ T_s2,
3. T_s1 and T_s2 have equivalent support and can be written as
where r, s, t and u are all distinct.
Proof. Consider the product
where tensor products with the identity are implied, c_1, c_2 ∈ {1, −1}, and we have used the fact that the commutation product of Z_m and T_sn lies in {1, −1} for all m, n ∈ {1, 2}. Therefore, we can write the commutator as
The entries of T_sn are drawn from the set {0, 1}. Thus, if T_s1 and T_s2 do not commute, then T_s2 T_s1 and [T_s1, T_s2] are linearly independent, so for [T_1, T_2] = 0 we require both terms to vanish:
Thus, Condition G43 requires that we satisfy at least one of the three conditions from Theorem 1. Therefore, each condition in Theorem 2 corresponds to the condition with the same number in Theorem 1. However, Condition G42 requires us to strengthen the conditions in Theorem 2.
First, consider when T_2 T_1 = 0. The Z_n factors will only produce phase factors, and so T_2 T_1 = 0 ⇐⇒ T_s2 T_s1 = 0. Using the identities
where we note this asymptotic scaling does not necessarily hold for m > m′_max, but we no longer have any terms of this form.
However, in our proposed pool-exploration strategy, the A_m are drawn from a dependent distribution. We take A_m as the ansatz element with the minimum loss in S_≤m. Assuming this dependent sequence is a better exploration strategy than independent random sampling, this strategy should drive the sequence to a local minimum faster than the independent proposal strategy. Thus, one would expect the cumulative probability distribution of termination to shift to larger probabilities:
where m is the random variable for the number of steps taken to find a local minimum using this dependent sequence. That is, m is the random variable for the number of steps taken such that the minimum ansatz element in S_≤m−1 is a local minimum. We can write the mean of a monotonically increasing function f with respect to m as
Indeed, M can depend on N, and so will have an asymptotic scaling M = Ω(1), O(N). Thus, the asymptotic scaling of the mean is as m s = O(N). Following a similar method, one can show
Further, inequalities (I26) and (I27) yield
As these scalings are independent of \vec{v}, conditioning our probabilities and means with some prior distribution for \vec{v} will leave the scalings invariant:

I.5. Statistical Analysis of Dynamic-ADAPT-VQE
Finally, when we additionally apply layering, we have the complication that the pool size reduces throughout the layer.Thus, the upper bound on the means becomes where P (⃗ q) is the frequency at which the algorithm had already placed dim ⃗ q gates in a layer each acting on q i qubits prior to a given iteration.P (⃗ q) will decay with dim ⃗ q due to early termination or use of larger ansatz elements.

FIG. 2. A diagram comparing support and operator commutativity. (a) The elements both support and operator commute. Note that R_y(θ) only has qubit support on the top qubit line. (b) The elements operator commute but do not support commute. (c) The elements neither support nor operator commute.

FIG. 3. A diagram of the distance d from an element A ∈ P under the pool metric, Definition 6. The ansatz element A (black dot) is surrounded by ansatz elements of the noncommuting set NG(P, A) of distance 1 (white circle). All other elements P \ NG(P, A) have distance 2 (gray shaded region).

Property 1 ( let L : P → R denote a loss function. Then, any element A ∈ P for which L(A) = min_{B ∈ NG(P,A)}

FIG. 7. The energy accuracy is plotted as a function of ansatz-circuit depth (left) and the number of ansatz-circuit parameters (ansatz elements, right), for QEB standard-, Explore-, Static-(TETRIS)-, and Dynamic-ADAPT-VQE. These simulations use support commutation. Each row shows data for a specific molecule. The region of chemical accuracy is shaded.

FIG. 8. Energy accuracy against the number of loss function calls. Using the QEB pool and support commutation, we compare standard-, Explore-, Static-(TETRIS)-, and Dynamic-ADAPT-VQE. Each row shows data for a specific molecule, with the number of orbitals increasing up the page. Energy accuracies better than chemical accuracy are shaded in cream.

FIG. 9. Energy accuracy against the number of times the ansatz is optimized (left column) and the number of expectation values calculated during optimizer calls (right column). Using the QEB pool and support commutation, we compare standard-, Explore-, Static-(TETRIS)-, and Dynamic-ADAPT-VQE. Each row shows data for a specific molecule, with the number of orbitals increasing up the page. Energy accuracies better than chemical accuracy are shaded in cream.
(a) Each algorithm operates on N qubits.
(b) Ansatz circuits are improved by successively adding ansatz elements with a single parameter to the ansatz circuit. This results in iterations p = 1, ..., P, where the pth ansatz circuit has p parameters.
(c) In each iteration p, the algorithm spends runtime on evaluating N_L(p) loss functions.
(d) In each iteration p, the algorithm spends runtime on optimizing p circuit parameters.
(e) Using the finite difference method, we approximate each of the N_L(p) loss functions in (c) by using two energy-expectation values. This results in evaluating at most 2N_L(p) energy expectation values on the quantum computer in the pth iteration.
(f) We assume that the optimizer in (d) performs a heuristic optimization of ansatz circuits with p parameters in polynomial time. Thus, in the pth iteration, a quantum computer must conduct N_O(p) = Θ(p^α) evaluations of the energy landscape and N_O(p) = Θ(p^α) evaluations of energy expectation values.
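
Assumptions (c) to (f) can be turned into a simple tally of energy-expectation evaluations. The sketch below is illustrative only: the function name, the callable `loss_calls` standing in for N_L(p), and the prefactor `optimizer_constant` hidden by the Θ notation are all hypothetical choices, and the loss-function count is the upper bound 2N_L(p) from (e).

```python
from typing import Callable, Tuple


def expectation_value_counts(
    num_iterations: int,
    loss_calls: Callable[[int], int],
    alpha: float,
    optimizer_constant: float = 1.0,
) -> Tuple[int, float]:
    """Total energy-expectation evaluations over iterations p = 1, ..., P.

    Returns (loss-function part, optimizer part).
    """
    loss_total = 0
    optimizer_total = 0.0
    for p in range(1, num_iterations + 1):
        # (e): two expectation values per finite-difference loss function.
        loss_total += 2 * loss_calls(p)
        # (f): optimizer cost modelled as c * p**alpha (c is an assumption).
        optimizer_total += optimizer_constant * p ** alpha
    return loss_total, optimizer_total


# Example: a pool of 10 elements searched in full at every iteration,
# three iterations, quadratic optimizer cost.
counts = expectation_value_counts(3, lambda p: 10, alpha=2.0)
```

Replacing `loss_calls` by a function that returns fewer evaluations per iteration shows directly how reducing N_L(p) lowers the first term while leaving the optimizer term unchanged.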
Dynamic-ADAPT-VQE has the same scaling of the number of loss function evaluations per iteration, N_L(p), as Explore-ADAPT-VQE. Thus, N_L(p) = Ω(|P|/N) in the best case and N_L(p) = O(|P|) in the worst case. The circuit depth of Dynamic-ADAPT-VQE scales as d(p) = Θ(p/N). One can observe a clear benefit from layering. The upper bound, d(p) = O(p), in standard and Explore-ADAPT-VQE becomes d(p) = O(p/N) in Dynamic-ADAPT-VQE. Using these relations for N_L(p) and d(p), we can estimate the runtime of Dynamic-ADAPT-VQE:
(a) Static-ADAPT-VQE operates on N qubits.
(b) Static-ADAPT-VQE builds ansatz circuits in layers indexed by t = 1, ..

FIG. 11. Energy error ∆_t(α) for an ansatz circuit Λ_t of H4 as a function of noise strength α. Connected dots are calculated using full density-matrix simulations. Dashed lines show the corresponding extrapolation using noise susceptibility.
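
The extrapolation in Fig. 11 can be mimicked with a toy calculation. The sketch assumes that the noise susceptibility is the first-order derivative of the energy error with respect to the noise strength α at α = 0, estimated here by a forward finite difference; the function names and the quadratic toy error model are hypothetical and stand in for a density-matrix simulation.

```python
from typing import Callable


def noise_susceptibility(energy_error: Callable[[float], float],
                         step: float = 1e-6) -> float:
    """Forward finite-difference estimate of d(error)/d(alpha) at alpha = 0."""
    return (energy_error(step) - energy_error(0.0)) / step


def extrapolate(chi: float, alpha: float) -> float:
    """First-order (linear) extrapolation of the energy error."""
    return chi * alpha


def toy_error(alpha: float) -> float:
    # Hypothetical error model: linear term plus a small quadratic correction.
    return 3.0 * alpha + 5.0 * alpha ** 2


chi = noise_susceptibility(toy_error)
linear_estimate = extrapolate(chi, 0.01)
exact_value = toy_error(0.01)
```

For small α the linear estimate tracks the toy model closely, and the two separate once the quadratic term becomes comparable to the linear one.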

FIG. 12. Amplitude-damping noise susceptibility as a function of (left) energy accuracy and (right) the number of parameters in ansatz circuits Λ_t. QEB-ADAPT is compared to support-based Dynamic-ADAPT-VQE. Each row shows data for a specific molecule, with the number of orbitals increasing up the page. Energy accuracies better than chemical accuracy are shaded in cream.

FIG. 14. Same as Fig. 12, but for depolarizing noise. In addition, black crosses correspond to density-matrix simulations corroborating noise susceptibility via finite differences. The black crosses are discussed further in Appendix E.

FIG. 15. The ratio of the minimal T_1 and T_2^* times and maximal depolarizing probability p for Dynamic- and Static-ADAPT-VQE to standard ADAPT-VQE required to reach an accuracy of ∆ = 10^−7 mHa for each molecule.

FIG. 16. Noise susceptibility for (a) amplitude-damping, (b) dephasing, and (c) depolarizing noise as a function of (a, b) ansatz-circuit depth d and (c) the number of noisy CNOT gates N_II. The top panels show noise susceptibility as a function of d (a, b) and N_II (c). The bottom panels of (a) and (b) show noise susceptibility divided by d, as a function of d. The bottom panel of (c) shows noise susceptibility divided by N_II, as a function of N_II. The colored lines correspond to the simulation of Dynamic-ADAPT-VQE using support commutation for a range of molecules. The black lines represent simulations based on QEB-ADAPT-VQE.

FIG. 19. Energy accuracy against ansatz-circuit depth (left), the number of ansatz-circuit parameters (ansatz elements, center), and the number of CNOT gates in the ansatz circuit (right). We compare qubit-Dynamic-ADAPT-VQE using support commutation, as well as QEB standard- and Dynamic-ADAPT-VQE using support commutation, with both the steepest-gradient and largest-energy-reduction decision rules. Each row shows data for a specific molecule, with the number of orbitals increasing up the page. Energy accuracies better than chemical accuracy are shaded in cream.

FIG. 20. Energy accuracy against the number of loss function calls (left), the number of times the ansatz is optimized (center), and the number of expectation values calculated during optimizer calls (right). We compare qubit-Dynamic-ADAPT-VQE using support and operator commutation, as well as QEB standard- and Dynamic-ADAPT-VQE using support commutation, with both the steepest-gradient and largest-energy-reduction decision rules. Each row shows data for a specific molecule, with the number of orbitals increasing up the page. Energy accuracies better than chemical accuracy are shaded in cream.

and γ ≠ δ. T_1 and T_2 commute if and only if any of the following three conditions holds:
1. T_1 and T_2 have disjoint support (i.e. M_kl = M_γδ = 1),
2. T_1 ∝ T_2,
3. T_1 and T_2 have equivalent support and k, l, γ and δ are all distinct.

TABLE I. Table of molecular conformations and the corresponding number of spin-orbitals N used in numerical simulations.

Get: Pool P, accuracy ε, max. loss ℓ, n_max
Initialize: pool P′ ← P, layer A′ ← ∅, ansatz circuit Λ′ ← Λ
for n = 0, ..., n_max do

While BeH2 and H2O are among the larger molecules to be benchmarked, H4 and H6 are prototypical examples of strongly correlated systems. Our simulations

Histograms of the relative frequencies of the number of subpools searched for identifying a suitable ansatz element, m_s, with Explore-ADAPT-VQE. The mean and uncertainty in the mean are indicated by solid and dashed lines, respectively.
..., t_max. The tth layer contains ∑_{t′=1}^{t} n_tot(t′) = Θ(N) t circuit parameters.
(e) Using the finite difference method, we approximate each of the N_L(t) loss functions in (c) by using two
... assume that the runtime C(t) of an ansatz circuit with p(t) ansatz elements is proportional to its depth d(p(t)), i.e., C(t) = Θ(d(p(t))). Due to layering, the circuit depth of Static-ADAPT-VQE scales as d(p) = Θ(p/N). (This scaling is identical for Dynamic-ADAPT-VQE.) This results in C(t) = Θ(p(t)/N). Further, using p(t) = Θ(N) t from (d), we find that each shot in (g) requires a circuit runtime of C(t) = Θ(t).