Modular Parity Quantum Approximate Optimization

The parity transformation encodes spin models in the low-energy subspace of a larger Hilbert space with constraints on a planar lattice. When applying the Quantum Approximate Optimization Algorithm (QAOA), the constraints can be enforced either explicitly, by energy penalties, or implicitly, by restricting the dynamics to the low-energy subspace via the driver Hamiltonian. While the explicit approach allows for parallelization with a system-size-independent circuit depth, the implicit approach shows better QAOA performance. Here we combine the two approaches in order to improve the QAOA performance while keeping the circuit parallelizable. In particular, we introduce a modular parallelization method that partitions the circuit into clusters of subcircuits with fixed maximal circuit depth, relevant for scaling up to large system sizes.

The recently introduced parity architecture [23,24] addresses this mismatch between the connectivity of the problem and hardware graphs by mapping problem-defining interactions onto single-body terms, while restricting the enlarged Hilbert space via quasilocal constraint terms. In particular, the parity architecture allows one to tackle generic optimization problems, i.e., problems with long-range and higher-order couplings, on a problem-independent and fixed qubit layout utilizing only quasilocal interactions.
In previous works [25,26], QAOA implementations for the parity architecture have been proposed where the constraints are explicitly enforced through an energy penalty. However, it has been shown that preserving the constraint conditions can be achieved implicitly by making the involved operators commute with the constraint operators [27][28][29]. While the former enables a parallelizable implementation with low circuit depth in the QAOA, the latter can lead to significantly improved success probabilities.
In this work we suggest a hybrid approach, which keeps the required circuit depth constant while reducing the number of constraints to be enforced explicitly and thus improving performance. We do this by partitioning the constraints into a set that is enforced explicitly and a set where the constraints are preserved implicitly by adapting the driver Hamiltonians. Fig. 1c shows an example layout of encoded qubits and constraints for a problem described by the subgraph highlighted in blue in Fig. 1a. In this layout a single constraint is enforced explicitly (yellow) while the others are implicitly preserved by the driver, which acts on qubits in each of the shown lines simultaneously. By choosing which constraints are in which set, we can divide bigger layouts into smaller modules (see Fig. 1b), enabling a parallel implementation of all required unitaries with an adjustable maximal circuit depth.
FIG. 1. (a) Problem graph to be implemented, with a subgraph highlighted in blue. (b) Implementation layout of the parity-encoded problem. Blue dots represent parity qubits, gray (yellow) squares and triangles represent implicitly preserved (explicitly enforced) three- and four-body constraints. Explicitly enforced constraints are used to divide the layout into modules. (c) One module of the layout, corresponding to the highlighted subgraph in (a), with terms of the driver Hamiltonian illustrated by red and blue lines. The modularization leads to a highly parallelizable circuit implementation of the required driver terms. The green highlighting illustrates the correspondence of interactions in (a) to parity qubits in (b) and (c).

II. PARITY QAOA
Finding solutions to generic combinatorial optimization problems can be formulated as energy minimization of general (classical) $N$-spin Hamiltonians of the form
$$H_{\mathrm{problem}} = \sum_{i} J_i s_i + \sum_{i<j} J_{ij} s_i s_j + \sum_{i<j<k} J_{ijk} s_i s_j s_k + \dots, \tag{1}$$
where $s_i = \pm 1$ denote spin variables and the coefficients $\{J_i, J_{ij}, J_{ijk}, \dots\}$ describe long-range and potentially higher-order interactions between spins. We denote the number of non-zero coefficients by $K$ and the number of spin-flip symmetries in the Hamiltonian by $n_s$. Note that we do not consider problems with side conditions in this work.
On state-of-the-art hardware platforms, the available inter-qubit couplings are typically two-body and limited in distance. Therefore, interactions as occurring in Eq. (1) can be challenging to implement directly. Instead, we utilize the parity architecture [23,24], which allows one to encode arbitrary $k$-body terms on a square lattice requiring only nearest-neighbor interactions. This involves mapping the product of $k$ problem spins $s_i$ onto a single physical parity qubit (denoted by $\hat{\sigma}_z$), e.g., $J_{ijk} s_i s_j s_k \to J_m \hat{\sigma}_z^{(m)}$, where we label each parity qubit with the corresponding $k$-tuple $m$ of problem-spin indices. As a result, the $K$ problem-defining interaction terms of Eq. (1) are represented by local fields of strength $J_m$ acting on $K \geq N$ parity qubits. This gives rise to a physical Hamiltonian of the form $\hat{H}_{\mathrm{phys}} = \hat{H}_Z + \hat{H}_C$, where
$$\hat{H}_Z = \sum_{m} J_m \hat{\sigma}_z^{(m)} \tag{2}$$
encodes the combinatorial optimization problem and $\hat{H}_C$ contains constraints to ensure that the code space corresponds to the low-energy subspace of the enlarged Hilbert space $\mathcal{H}_{\mathrm{phys}}$. The constraint Hamiltonian $\hat{H}_C$ is constructed as
$$\hat{H}_C = \sum_{l} c_l \hat{C}_l, \qquad \hat{C}_l = \frac{1}{2}\left(1 - \hat{\sigma}_z^{(l_1)} \hat{\sigma}_z^{(l_2)} \hat{\sigma}_z^{(l_3)} \left[\hat{\sigma}_z^{(l_4)}\right]\right),$$
with three- or four-body interactions (the square brackets indicate the optional factor) acting on $2 \times 2$ plaquettes of physical qubits (cf. Fig. 1) and a constraint strength $c_l > 0$ [30]. Here, the $l_i$ are labels of physical qubits with the property that all problem-spin indices involved in constraint $\hat{C}_l$ appear an even number of times across all the $l_i$. Following this construction, constraint-satisfying states are characterized by an even number of qubits in the $|{\downarrow}\rangle$ state per constraint (with $\hat{\sigma}_z |{\downarrow}\rangle = -|{\downarrow}\rangle$), and the code space coincides with the constraint-fulfilling subspace
$$\mathcal{H}_{\mathrm{CF}} = \mathrm{span}\left\{ |\psi\rangle : \hat{C}_l |\psi\rangle = 0 \ \ \forall\, l \right\}.$$
Figure 2a shows the parity implementation of an all-to-all connected Ising spin-glass model, where all constraints are explicitly enforced by three- and four-body interactions.
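As a concrete illustration of the encoding, the following minimal Python sketch maps all spin configurations of a 4-spin complete graph to parity qubits and checks that the encoded states are exactly the constraint-fulfilling ones. The qubit labels and the particular choice of three constraints are assumptions made for this example (each spin index appears an even number of times per constraint); they are not taken from a specific layout in the figures.

```python
from itertools import product

# Parity qubits of a 4-spin complete graph, labeled by the pair of problem spins.
QUBITS = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

# Illustrative constraints (assumed): every spin index occurs an even number of times.
CONSTRAINTS = [
    [(0, 1), (1, 2), (0, 2)],            # three-body
    [(1, 2), (2, 3), (1, 3)],            # three-body
    [(0, 2), (0, 3), (1, 2), (1, 3)],    # four-body
]

def encode(spins):
    """Map problem spins s_i = +/-1 to parity-qubit values s_i * s_j."""
    return {(i, j): spins[i] * spins[j] for (i, j) in QUBITS}

def constraint_value(config, constraint):
    """Eigenvalue of C_l = (1 - prod sigma_z) / 2 on a computational basis state."""
    prod_z = 1
    for q in constraint:
        prod_z *= config[q]
    return (1 - prod_z) // 2

def is_code_state(config):
    """True if all constraints evaluate to 0, i.e. the state lies in H_CF."""
    return all(constraint_value(config, c) == 0 for c in CONSTRAINTS)

# All distinct encoded configurations (a global spin flip gives the same encoding).
encoded = {tuple(encode(s)[q] for q in QUBITS) for s in product([1, -1], repeat=4)}

# All computational basis states of the 6 physical qubits that fulfill the constraints.
code_states = {z for z in product([1, -1], repeat=len(QUBITS))
               if is_code_state(dict(zip(QUBITS, z)))}

print(len(encoded), len(code_states), encoded == code_states)
```

The 16 spin configurations collapse to 8 encoded states because of the global spin-flip symmetry ($n_s = 1$), which matches the $2^{N - n_s}$ basis states of the code space.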
A. Explicit Parity QAOA
The QAOA [15] attempts to find low-energy solutions of $H_{\mathrm{problem}}$ by evolving a quantum state alternately with a driver Hamiltonian $\hat{H}_B = \sum_{i=1}^{N} \hat{\sigma}_x^{(i)}$ and the (quantum-mechanical) problem Hamiltonian $\hat{H}_{\mathrm{problem}}$ for variable durations. To implement the QAOA in the parity architecture [25], the single-qubit driver Hamiltonian $\hat{H}_X = \sum_{m} \hat{\sigma}_x^{(m)}$ now acts on $K$ physical qubits, while the problem Hamiltonian $\hat{H}_{\mathrm{problem}}$ is replaced by the two components of $\hat{H}_{\mathrm{phys}}$, i.e., $\hat{H}_Z$ and $\hat{H}_C$. A parity-QAOA sequence of depth $p$ thus corresponds to variationally evolving the system with Hamiltonians $\hat{H}_X$, $\hat{H}_Z$ and $\hat{H}_C$ as
$$|\psi\rangle = \prod_{j=1}^{p} e^{-i\beta_j \hat{H}_X} e^{-i\Omega_j \hat{H}_C} e^{-i\gamma_j \hat{H}_Z} |\psi_0\rangle, \tag{6}$$
where $|\psi_0\rangle$ is the initial state and the variational parameters $\beta_j$, $\gamma_j$ and $\Omega_j$ are optimized in a quantum-classical feedback loop in order to minimize $\langle\psi|\hat{H}_{\mathrm{phys}}|\psi\rangle$. As a consequence, during optimization, the constraints introduced by the parity mapping are treated on the same footing as the problem-encoding single-body terms, and the QAOA unitary $e^{-i\Omega \hat{H}_C}$ needs to be implemented explicitly in order to steer the dynamics into $\mathcal{H}_{\mathrm{CF}}$.
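To make the explicit protocol concrete, here is a small state-vector sketch in Python for the smallest nontrivial case: a 3-spin complete graph with $K = 3$ parity qubits and a single three-body constraint. The local fields `J`, the constraint strength `c`, and the choice $|\psi_0\rangle = |+\rangle^{\otimes K}$ are assumptions made for this illustration. Since $\hat{H}_Z$ and $\hat{H}_C$ are both diagonal, their unitaries are applied together.

```python
import numpy as np
from itertools import product

K = 3                                   # parity qubits 01, 02, 12 of a 3-spin complete graph
J = np.array([0.7, -0.4, 0.2])          # hypothetical local fields J_m
c = 2.0                                 # hypothetical constraint strength

# Computational basis as sigma_z eigenvalues (+1 = up, -1 = down); shape (2**K, K).
z = np.array(list(product([1, -1], repeat=K)))
h_z = z @ J                             # diagonal of H_Z
h_c = c * (1 - z.prod(axis=1)) / 2      # diagonal of H_C (one three-body constraint)
h_phys = h_z + h_c

def apply_x_driver(psi, beta):
    """Apply exp(-i beta sum_m sigma_x^(m)) as a product of single-qubit rotations."""
    rx = np.array([[np.cos(beta), -1j * np.sin(beta)],
                   [-1j * np.sin(beta), np.cos(beta)]])
    psi = psi.reshape([2] * K)
    for m in range(K):
        # Contract the rotation with axis m and move the new axis back to position m.
        psi = np.moveaxis(np.tensordot(rx, psi, axes=([1], [m])), 0, m)
    return psi.reshape(-1)

def parity_qaoa_state(betas, omegas, gammas):
    """State after p cycles, starting from |+>^K (assumed initial state)."""
    psi = np.full(2**K, 2 ** (-K / 2), dtype=complex)
    for b, o, g in zip(betas, omegas, gammas):
        psi = np.exp(-1j * (g * h_z + o * h_c)) * psi   # diagonal unitaries commute
        psi = apply_x_driver(psi, b)
    return psi

def energy(psi):
    """Expectation value <psi|H_phys|psi>."""
    return float(np.real(np.vdot(psi, h_phys * psi)))

psi = parity_qaoa_state([0.3, 0.1], [0.5, 0.2], [0.2, 0.4])
print(energy(psi))
```

In a full implementation the three parameter lists would be handed to a classical optimizer that minimizes `energy`.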

B. Implicit Parity QAOA
An alternative approach to perform parity QAOA is to start with a state in $\mathcal{H}_{\mathrm{CF}}$ and restrict the dynamics to that subspace by adapting the driver Hamiltonian [28,29]. The constraint conditions are then preserved implicitly throughout the QAOA sequence, which implies that we are looking for a driver Hamiltonian $\hat{H}_X^{\mathrm{imp}}$ fulfilling
$$\left[\hat{H}_X^{\mathrm{imp}}, \hat{C}_l\right] = 0 \quad \forall\, l. \tag{7}$$
FIG. 2. [...] (b) The colored lines denote sets of qubits, each of which can be flipped simultaneously without leaving the constraint-fulfilling subspace (constraint-preserving driver lines). All constraints are implicitly enforced via the driver Hamiltonian, and no energy penalty for constraints is needed. (c) Only the bottom-most row of three-body constraints is enforced explicitly, while the others are satisfied implicitly due to the restriction of dynamics via the driver Hamiltonian. In this setting, the blue and red driver lines (hybrid driver lines) can be implemented in parallel, respectively.
We choose $\hat{H}_X^{\mathrm{imp}}$ to be a sum over products of $\hat{\sigma}_x$ operators. As the constraints are products of $\hat{\sigma}_z$ operators, each constraint term commutes with individual terms of $\hat{H}_X^{\mathrm{imp}}$ whenever they share an even number of qubits. For Eq. (7) to hold, each term in $\hat{H}_X^{\mathrm{imp}}$ must commute with each constraint term independently. Fig. 2b shows the qubits in such terms of $\hat{H}_X^{\mathrm{imp}}$ as colored lines for the example of an all-to-all connected problem graph [27]. In this example, the qubit labels contributing to a certain line share a common problem-spin index and can therefore be associated with this particular problem spin. Thus, the sum over all such products is the parity-mapped analogue of the "standard" driver Hamiltonian $\hat{H}_B$ acting on the problem spins. In the following we formalize the aforementioned considerations and define the elements of constraint-preserving driver Hamiltonians.
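The even-overlap rule can be checked with a few lines of Python. In this sketch, a driver term and a constraint are represented only by the sets of qubits they act on; the qubit labels and constraints are the same illustrative 4-spin example as an assumption, not a layout taken from the figures.

```python
def commutes(x_support, z_support):
    """A product of sigma_x on x_support commutes with a product of sigma_z on
    z_support iff the two supports share an even number of qubits."""
    return len(set(x_support) & set(z_support)) % 2 == 0

def preserves_all(line, constraints):
    """True if the sigma_x product on `line` commutes with every constraint."""
    return all(commutes(line, c) for c in constraints)

# Illustrative constraints for a 4-spin complete graph (assumed example).
CONSTRAINTS = [
    {(0, 1), (1, 2), (0, 2)},
    {(1, 2), (2, 3), (1, 3)},
    {(0, 2), (0, 3), (1, 2), (1, 3)},
]

line_spin_1 = {(0, 1), (1, 2), (1, 3)}   # all qubits whose label contains spin 1
single_qubit = {(0, 1)}                  # flipping one physical qubit alone

print(preserves_all(line_spin_1, CONSTRAINTS))   # every overlap is even
print(preserves_all(single_qubit, CONSTRAINTS))  # odd overlap with the first constraint
```

The first call returns `True` and the second `False`: flipping all qubits that share problem spin 1 preserves every constraint, whereas a single-qubit flip violates the first one.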
First, we consider a set of physical qubits that can be flipped simultaneously without changing which constraint conditions are preserved. These qubits are typically arranged on the layout along a line (see, for example, the colored lines in Fig. 2b), or in more general cases manifest as a tree graph of adjacent qubits. In the following, we refer to these sets as constraint-preserving driver lines $Q_\mu$, with the index $\mu$ enumerating the driver lines for a given problem.
With each driver line, we associate a driver term
$$\hat{X}^{(\mu)} = \prod_{k \in Q_\mu} \hat{\sigma}_x^{(k)}$$
with the property
$$\left[\hat{X}^{(\mu)}, \hat{C}_l\right] = 0 \quad \forall\, l.$$
We refer to the number of qubits in a driver line as its length. Our goal is to explore the full code space with a set of driver terms that allows for independent flips of any problem spin, which requires the following properties for driver lines: A set $D$ of driver terms is independent iff no element $Q_\mu \in D$ can be obtained as a product of (multiple) other elements in $D$. Furthermore, we call the set $D$ valid iff $D$ is independent and $|D| = N - n_s$ holds. This ensures that each of the $N - n_s$ independent spins of the original problem can be flipped. Two driver lines $Q_\mu$ and $Q_\nu$ are said to overlap iff $Q_\mu \cap Q_\nu \neq \emptyset$. In the following, we use $D$ to refer to a set of driver lines as well as to the set of its associated driver terms. Note that the product of driver terms translates to the symmetric difference of the associated driver lines.
In contrast to the standard parity-QAOA approach enforcing all constraints explicitly as discussed in [25], we now investigate the performance of parity QAOA utilizing driver Hamiltonians
$$\hat{H}_X^{\mathrm{imp}} = \sum_{Q_\mu \in D} \hat{X}^{(\mu)}$$
consisting of the operators associated with a valid set of constraint-preserving driver lines. Provided that we start from a constraint-fulfilling state, such a driver only introduces transitions to other constraint-fulfilling states and therefore restricts the dynamics to $\mathcal{H}_{\mathrm{CF}}$. Using the constraint-preserving driver Hamiltonian, the corresponding QAOA protocol is given by
$$|\psi\rangle = \prod_{j=1}^{p} e^{-i\beta_j \hat{H}_X^{\mathrm{imp}}} e^{-i\gamma_j \hat{H}_Z} |\psi_0\rangle,$$
with $|\psi_0\rangle$ being an appropriately chosen initial state fulfilling all parity constraints. Usually, $|\psi_0\rangle$ is chosen to be the equal superposition of all constraint-fulfilling computational states (for details on the initialization see Sec. IV). Note that compared to the QAOA protocol described in Eq. (6), the step involving $\hat{H}_C$ is no longer required, since all constraints are now implicitly preserved and do not have to be enforced by the constraint Hamiltonian.
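Because the product of driver terms corresponds to the symmetric difference of driver lines, independence is a linear-algebra statement over GF(2). The following Python sketch tests independence and validity by computing a GF(2) rank; the function names and the 4-spin example lines are assumptions made for illustration.

```python
def gf2_rank(lines):
    """Rank over GF(2) of driver lines given as sets of qubit labels."""
    qubits = sorted(set().union(*lines))
    index = {q: i for i, q in enumerate(qubits)}
    # Encode every line as a bit mask; XOR of masks = symmetric difference of lines.
    rows = [sum(1 << index[q] for q in line) for line in lines]
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot                       # lowest set bit of the pivot
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def is_independent(lines):
    """No line is a symmetric difference of other lines in the set."""
    return gf2_rank(lines) == len(lines)

def is_valid(lines, n_spins, n_sym):
    """Valid set in the fully implicit case: independent and |D| = N - n_s."""
    return is_independent(lines) and len(lines) == n_spins - n_sym

# Driver lines of a 4-spin complete graph: line mu contains all qubits with index mu.
QUBITS = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
all_lines = [{q for q in QUBITS if mu in q} for mu in range(4)]

print(gf2_rank(all_lines))                        # the four lines are dependent
print(is_valid(all_lines[1:], n_spins=4, n_sym=1))
```

Each physical qubit belongs to exactly two of the four lines, so their symmetric difference is empty and one line is redundant; dropping any one of them leaves a valid set of $N - n_s = 3$ lines.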
Apart from saving one variational parameter per QAOA cycle, the intrinsic fulfillment of parity constraints also results in an exponential reduction of the size of the accessible Hilbert space, decreasing the probability of populating undesired states and thus significantly enhancing the performance of the algorithm.
Hamiltonians with multi-qubit terms of the form $\hat{X}^{(\mu)}$ can in general not be implemented directly on quantum hardware. The unitary operator $\hat{U} = e^{-i\beta \hat{X}^{(\mu)}}$, however, can be readily implemented as a sequence of CNOT gates and single-qubit rotations [31], with a circuit depth scaling linearly in the length of the driver line $Q_\mu$ (see Appendix A).
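One common way to realize such a unitary is a CNOT ladder that maps a single-qubit $\hat{\sigma}_x$ onto the full product, followed by one single-qubit rotation and the inverse ladder. The NumPy sketch below verifies this identity numerically for a line of length 4. It is a generic textbook construction shown under that assumption and is not necessarily the circuit used in Appendix A; all function names are illustrative.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
P0 = np.array([[1, 0], [0, 0]], dtype=complex)   # projector on control = 0
P1 = np.array([[0, 0], [0, 1]], dtype=complex)   # projector on control = 1

def single(op, k, n):
    """Operator `op` on qubit k of an n-qubit register."""
    ops = [I2] * n
    ops[k] = op
    return reduce(np.kron, ops)

def cnot(control, target, n):
    """CNOT gate as a dense matrix."""
    return single(P0, control, n) + single(P1, control, n) @ single(X, target, n)

def driver_unitary_exact(beta, n):
    """exp(-i beta X...X) = cos(beta) 1 - i sin(beta) X...X, since (X...X)^2 = 1."""
    x_all = reduce(np.kron, [X] * n)
    return np.cos(beta) * np.eye(2**n) - 1j * np.sin(beta) * x_all

def driver_unitary_circuit(beta, n):
    """Same unitary from 2(n-1) nearest-neighbor CNOTs and one X rotation."""
    ladder = np.eye(2**n, dtype=complex)
    for q in range(n - 1):
        ladder = cnot(q, q + 1, n) @ ladder      # CNOT(0,1) acts first
    # Conjugation by the ladder maps X on qubit 0 to X on all qubits.
    rx = np.cos(beta) * np.eye(2**n) - 1j * np.sin(beta) * single(X, 0, n)
    return ladder @ rx @ ladder.conj().T

beta, length = 0.37, 4
print(np.allclose(driver_unitary_circuit(beta, length),
                  driver_unitary_exact(beta, length)))
```

The ladder consists of $n - 1$ sequential CNOT gates on either side of the rotation, which illustrates why the depth grows linearly with the length of the driver line.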

III. HYBRID APPROACH AND MODULARIZATION
A drawback of the fully implicit QAOA implementation described in Sec. II B is that the driver lines can become arbitrarily long or overlap, which limits the ability to perform gates in parallel to achieve a low overall circuit depth. In particular, a fully implicit implementation of the complete graph requires a driver unitary with a circuit depth scaling at least linearly with the system size $N$ [footnote 1]. In order to keep the circuit depth feasible, especially for non-error-corrected quantum devices, we introduce a hybrid implementation as a way to balance between the advantages of the fully explicit and the fully implicit approaches. The main idea is to shorten or split driver lines to obtain a parallelizable implementation that requires a minimal number of explicitly enforced constraints (cf. Fig. 2c). This can be achieved by separating the required $n_C^{\mathrm{tot}}$ constraints into $n_C$ explicitly enforced constraints and $n_C^{\mathrm{tot}} - n_C$ implicitly preserved constraints. The resulting hybrid Hilbert space $\mathcal{H}_{\mathrm{hyb}}$ is thus spanned by the computational basis states fulfilling all implicitly preserved constraints, with $\dim(\mathcal{H}_{\mathrm{hyb}}) = 2^{N + n_C - n_s}$.
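The dimension formula can be verified by brute force on a small instance. The Python sketch below counts the computational basis states that satisfy the implicitly preserved constraints for a 4-spin complete graph; the constraint set and the convention that the first $n_C$ constraints in the list are the explicitly enforced ones are assumptions made for this example.

```python
from itertools import product

QUBITS = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
CONSTRAINTS = [                                   # illustrative choice (assumed)
    [(0, 1), (1, 2), (0, 2)],
    [(1, 2), (2, 3), (1, 3)],
    [(0, 2), (0, 3), (1, 2), (1, 3)],
]
N, n_s = 4, 1                                     # spins and spin-flip symmetries

def satisfied(config, constraint):
    """A constraint is satisfied if the product of sigma_z eigenvalues is +1."""
    prod_z = 1
    for q in constraint:
        prod_z *= config[q]
    return prod_z == 1

def hybrid_dimension(implicit_constraints):
    """Number of basis states fulfilling all implicitly preserved constraints."""
    dim = 0
    for z in product([1, -1], repeat=len(QUBITS)):
        config = dict(zip(QUBITS, z))
        if all(satisfied(config, c) for c in implicit_constraints):
            dim += 1
    return dim

for n_C in range(len(CONSTRAINTS) + 1):
    implicit = CONSTRAINTS[n_C:]                  # first n_C constraints are explicit
    print(n_C, hybrid_dimension(implicit), 2 ** (N + n_C - n_s))
```

Each constraint moved from the implicit to the explicit set doubles the dimension, from $2^3$ in the fully implicit case to $2^6 = 2^K$ in the fully explicit case.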
Similar to the implicit approach, a hybrid driver line $Q_\mu$ is given by a set of physical qubits that can be simultaneously flipped without changing the population in the hybrid subspace $\mathcal{H}_{\mathrm{hyb}}$. The length, overlap, independence and driver terms for the hybrid case are defined analogously to the implicit approach introduced in Sec. II B, but with respect to $\mathcal{H}_{\mathrm{hyb}}$. A set $D$ of hybrid driver terms is valid iff it is independent and any computational basis state in the constraint-fulfilling Hilbert space $\mathcal{H}_{\mathrm{CF}}$ can be transformed into any other by applying operators in $D$ only. Note that this requirement is less strict than for fully constraint-preserving driver lines, since $D$ can contain $N - n_s \leq |D| \leq N + n_C - n_s$ driver terms. In the present work we focus on $|D| = N + n_C - n_s$, since in all other cases there are explicitly enforced constraints which are naturally preserved by the driver lines.
As a consequence, for a problem with $N$ spin variables and $n_C^{\mathrm{tot}}$ constraints of which $n_C$ are enforced explicitly, we can choose the hybrid driver Hamiltonian as
$$\hat{H}_X^{\mathrm{hyb}} = \sum_{Q_\mu \in D} \hat{X}^{(\mu)},$$
with driver terms $\hat{X}^{(\mu)}$ associated with a valid set of hybrid driver lines. In contrast to the fully implicit implementation, we can no longer associate individual driver terms with single-qubit operations on the original problem spins. Note that the fully implicit and the fully explicit approach correspond to the limiting cases of the hybrid approach with $n_C = 0$ and $n_C = n_C^{\mathrm{tot}}$, respectively. The QAOA protocol is now given by
$$|\psi\rangle = \prod_{j=1}^{p} e^{-i\beta_j \hat{H}_X^{\mathrm{hyb}}} e^{-i\Omega_j \hat{H}_C^{\mathrm{hyb}}} e^{-i\gamma_j \hat{H}_Z} |\psi_0\rangle,$$
with the replacements $\hat{H}_X \to \hat{H}_X^{\mathrm{hyb}}$ and $\hat{H}_C \to \hat{H}_C^{\mathrm{hyb}}$ compared to the protocol described in Eq. (6). The initial state $|\psi_0\rangle$ is typically chosen to be the equal superposition of all computational basis states in $\mathcal{H}_{\mathrm{hyb}}$ (see Sec. IV for details on the initialization).
[Footnote 1] In this case, the length of a single driver line and therefore also its implementation depth is already proportional to $N$.
In the following, we demonstrate our hybrid approach on the example of the complete graph and then show how this can be applied to arbitrary graphs. At the end of this section we introduce the concept of modularization in order to extend our approach to large system sizes with system-size independent circuit depths.

A. Example: Complete graph
In this section we illustrate the concepts introduced above using the example of a parity-encoded problem graph with all-to-all connectivity, as pictured in Fig. 2. Starting from a constraint-fulfilling state, flipping a single physical qubit leads to the violation of at least one constraint. When flipping more qubits until all constraints are fulfilled again, the minimal set of flipped qubits will correspond to a constraint-preserving driver line as in Fig. 2b. The corresponding driver terms, however, cannot be implemented in parallel and result in impractical circuit depths.
A particular way to render the circuit shorter and parallelizable is to "break" each long constraint-preserving driver line into two shorter driver lines by enforcing all three-body constraints explicitly as depicted in Fig. 2c.
Note that in such a setting, the original driver lines (cf. Fig. 2b) also remain valid, even though the bottom constraints are explicitly enforced. Switching a constraint from an implicit to an explicit implementation doubles the dimension of the reachable subspace by including states that violate the corresponding explicitly enforced constraint. This increased flexibility allows one to split the original driver line into two shorter driver lines, such that each of these two lines violates the switched constraint and the symmetric difference of the two lines restores the original driver line. The resulting driver Hamiltonian can be easily parallelized by classifying the lines into two groups by their orientation in the layout. In Fig. 2c, this classification is represented by the red and blue coloring of the lines. As none of the lines within a group overlap, their corresponding gate sequences can be executed at the same time. Hence, the implementation of the total driver unitary $\exp(-i\beta \hat{H}_X^{\mathrm{hyb}})$ takes a circuit depth of at most $2N$. This can be further reduced to a constant depth by modularization of the layout, as explained in Sec. III C.

B. Arbitrary (hyper-)graphs
Compiling more general graphs, and in particular hypergraphs, to the parity architecture leads to a variety of placements of three- and four-body constraints among qubits in a square lattice geometry (cf. Fig. 3 and Ref. [24]). In the simplest case, requiring only four-body constraints, we can construct a driver Hamiltonian which preserves all constraints from only straight horizontal and vertical lines (cf. Fig. 3a). This is still true for most layouts with mixed three- and four-body constraints where all three-body constraints are enforced explicitly (see Fig. 3b). The only exceptions are layouts with isolated groups of three-body constraints, which are not connected to the boundary of the layout through adjacent explicitly enforced constraints. Enforcing isolated constraints explicitly can require more complicated driver lines, including turns and branches. This can be circumvented by explicitly enforcing additional constraints until all isolated explicitly enforced constraints are connected to the boundary via other explicitly enforced constraints. Hence, a simple strategy to partition the constraints is to explicitly enforce all three-body constraints, and all four-body constraints required to connect them to the boundary, while the remaining four-body constraints are implicitly preserved by the drivers. The full driver circuit can then be implemented in two steps, where all horizontal and all vertical driver lines are implemented in parallel, respectively. In many cases, the number of explicitly enforced constraints can be further reduced since some of the three-body constraints are automatically preserved by the above-mentioned horizontal and vertical driver lines. In other cases, small adjustments to the driver lines by adding or removing qubits (which may introduce turns and branches) are sufficient to preserve even more three-body constraints. An example of minimizing the number of explicitly enforced constraints at a fixed maximal circuit depth can be seen in Fig. 4.
FIG. [...] The optimization aims at increasing the number of implicitly preserved three-body constraints (gray triangles) which cause the driver lines shown in green to deviate from straight line shapes. The yellow square is a four-body constraint which is kept explicitly enforced to connect the adjacent explicitly enforced three-body constraint to the boundary, simplifying the driver lines. One line has been omitted as it can be obtained via symmetric difference of the others.

C. Modularization
With the procedure described in the previous section, the average length of hybrid driver lines (and therefore the depth of the QAOA circuit) grows linearly with the dimensions of the layout. We now utilize the concept of implicitly preserved and explicitly enforced constraints to restrict the driver circuit depth to an adjustable and system-size-independent value, while minimizing the number of explicitly enforced constraints.
Given a compiled problem layout, i.e., the distribution of three- and four-body constraints, we subdivide the entire square lattice into modules involving at most $l_{\max} \times l_{\max}$ qubits, separated by rows and columns of explicitly enforced constraints (see Fig. 5). As a consequence, this limits the length of the driver lines within a module, and each module can be treated separately when constructing driver lines. In particular, if all three-body constraints within a module are enforced explicitly, i.e., there are only vertical and horizontal lines, the maximal length of a driver line is given by $l_{\max}$. Therefore, the circuit depth of the driver Hamiltonian implementation scales linearly with $l_{\max}$, which is a user-determined and problem-independent quantity that can be chosen in accordance with device-specific needs. Even in the more general case of conserving some of the three-body constraints within a module implicitly, the problem of finding appropriate hybrid driver lines now reduces to smaller, separate problems for each module, and the maximal length of the driver lines will still be approximately $l_{\max}$. In any case, the circuits implementing the respective driver terms for each module can be executed simultaneously. Considering that $e^{-i\gamma \hat{H}_Z}$ and $e^{-i\Omega \hat{H}_C^{\mathrm{hyb}}}$ can also be implemented with constant-depth circuits, we see that a single QAOA cycle [see Eq. (15)] for arbitrary problem sizes can be implemented with a constant circuit depth.
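The effect of the subdivision on the driver-line length can be sketched in one dimension. The following Python snippet assumes a simplified rectangular layout with only straight driver lines and places a row (or column) of explicitly enforced constraints after every $l_{\max}$ qubits; the function names and this regular cutting rule are assumptions made for illustration, not the compilation procedure of an actual layout.

```python
def cut_positions(n_qubits, l_max):
    """Indices r such that the constraints between qubit rows (or columns)
    r and r + 1 are enforced explicitly."""
    return list(range(l_max - 1, n_qubits - 1, l_max))

def segment_lengths(n_qubits, l_max):
    """Lengths of the straight driver-line segments between consecutive cuts."""
    cuts = [-1] + cut_positions(n_qubits, l_max) + [n_qubits - 1]
    return [b - a for a, b in zip(cuts, cuts[1:])]

for n in (4, 10, 100):
    lengths = segment_lengths(n, l_max=4)
    # The longest segment never exceeds l_max, independent of the layout size n.
    print(n, cut_positions(n, 4)[:3], max(lengths), sum(lengths))
```

The longest segment stays at $l_{\max} = 4$ while the layout grows from 4 to 100 qubits per side, which is the mechanism behind the system-size-independent driver depth.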

IV. INITIAL STATE PREPARATION
In order to obtain a suitable initial state $|\psi_0\rangle$ for the QAOA protocol described in Eq. (11), we want to prepare the system in an equal superposition of all computational states spanning $\mathcal{H}_{\mathrm{hyb}}$, which includes the limiting cases $\mathcal{H}_{\mathrm{phys}}$ and $\mathcal{H}_{\mathrm{CF}}$. Consider the hybrid driver Hamiltonian $\hat{H}_X^{\mathrm{hyb}}$ involving a valid set of driver lines $D$. The desired state $|\psi_0\rangle$ is the simultaneous eigenstate of all driver terms in $\hat{H}_X^{\mathrm{hyb}}$ and all implicitly preserved constraints, with eigenvalues $+1$ and $0$, respectively. While this can be easily achieved in the purely explicit approach by preparing each physical qubit in the $|+\rangle$ state, the initial state preparation is more challenging in the implicit and especially the hybrid approach. There are known methods to construct circuits generating such a stabilizer state from a trivial product state [32,33]; however, the resulting circuits are not necessarily straightforward to implement and might result in large circuit depths on architectures with limited connectivity.
In the following, we propose a simple initialization procedure with a low circuit depth. Resembling the concept of relating a constraint-preserving driver line to a logical qubit (cf. Sec. II B), we now introduce a conceptual driver qubit for each hybrid driver line, such that the associated driver term $\hat{X}^{(\mu)}$ acts as the bit-flip operator on that driver qubit. These $|D| = \log_2 \dim(\mathcal{H}_{\mathrm{hyb}})$ driver qubits represent the states of the considered Hilbert space (satisfying all implicitly enforced constraints).
FIG. 5. Modularization of a larger layout with additional explicitly enforced constraints (yellow) arranged in a grid for constant circuit depth implementation of driver terms. All blue lines, and all red lines can be implemented in parallel, respectively. Green lines, which are caused by implicitly enforced three-body constraints, only add a small contribution to the depth and are partially parallelizable with the other steps. In each submodule, one driver line has been omitted as it can be obtained via symmetric difference of the others.
Thus, the desired initial state corresponds to all driver qubits being in the $|+\rangle$ state, which can be obtained from the $|{\uparrow}\rangle$ state through consecutive rotations around the $x$- and $z$-axis. To this end, we also define the phase-flip operator $\hat{Z}^{(\mu)}$ acting on driver qubit $\mu$, analogous to the stabilizer formalism introduced in Ref. [27]. The newly defined operators must fulfill the Pauli commutation relations
$$\left\{\hat{X}^{(\mu)}, \hat{Z}^{(\mu)}\right\} = 0, \qquad \left[\hat{X}^{(\mu)}, \hat{Z}^{(\nu)}\right] = 0 \quad \text{for } \mu \neq \nu.$$
For a single driver line $Q_\mu$, it is easy to show that any operator $\hat{\sigma}_z^{(k)}$, acting on a physical qubit $k \in Q_\mu$, fulfills the desired commutation relations with the corresponding $\hat{X}^{(\mu)}$-rotation. As long as this qubit is not involved in any other driver lines, this remains a valid choice. For example, in the fully implicit implementation shown in Fig. 2b, it is possible to do the $\hat{\sigma}_z$-rotations on the physical qubits involving the index 0, since each of them is only involved in a single driver line.
If none of the physical qubits involved in a driver line $Q_\nu$ is exclusively part of this line, this construction fails, as any possible $\hat{\sigma}_z$-rotation will introduce crosstalk to other driver qubits [footnote 2]. However, this rotation does not affect the other driver qubits if they are still in a $\hat{Z}$-eigenstate. Thus, with an appropriate order of rotations on the driver qubits, it is still possible to use the same state preparation protocol. Additional details for arbitrary driver configurations can be found in Appendix B.
Having defined the bit- and phase-flip operators for the driver qubits, we can now prepare all driver qubits in the $|+\rangle$ state. We start with the constraint-fulfilling state $|{\uparrow}\rangle^{\otimes K}$ (in the basis of physical qubits), which corresponds to all driver qubits being in the $|{\uparrow}\rangle$ state as well. To prepare $|\psi_0\rangle$ from this state, we have to perform physical operations corresponding to a $\pi/2$-rotation around the $y$-axis on all driver qubits. These operations can be decomposed into consecutive rotations $e^{-i\frac{\pi}{4}\hat{X}^{(\mu)}}$ and $e^{-i\frac{\pi}{4}\hat{Z}^{(\mu)}}$, and thus be implemented with the previously defined operators. The circuit depth of the resulting initialization procedure scales in the same way as the implementation of $\exp(-i\beta \hat{H}_X^{\mathrm{hyb}})$.
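The preparation procedure can be checked numerically for the fully implicit 4-spin complete graph. In the NumPy sketch below, the driver lines of spins 1, 2 and 3 form the valid set, and the phase-flip operator of each line is a $\hat{\sigma}_z$ on the physical qubit that also carries index 0, following the example given above. Applying the $\hat{X}$-rotation before the $\hat{Z}$-rotation is an assumption about the ordering that yields $|+\rangle$ up to a global phase; the helper names are illustrative.

```python
import numpy as np
from functools import reduce

QUBITS = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
K = len(QUBITS)
I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])

def pauli_product(op, support):
    """Tensor product with `op` on the qubits in `support`, identity elsewhere."""
    return reduce(np.kron, [op if q in support else I2 for q in QUBITS])

def rotation(pauli, angle):
    """exp(-i * angle * P) for an operator P with P @ P = identity."""
    return np.cos(angle) * np.eye(2**K) - 1j * np.sin(angle) * pauli

# Valid set of driver lines (spins 1, 2, 3) and one exclusive qubit per line.
lines = {mu: {q for q in QUBITS if mu in q} for mu in (1, 2, 3)}
phase_qubit = {1: (0, 1), 2: (0, 2), 3: (0, 3)}

psi = np.zeros(2**K, dtype=complex)
psi[0] = 1.0                                       # |up>^K, fulfills all constraints
for mu, line in lines.items():
    x_mu = pauli_product(X, line)
    z_mu = pauli_product(Z, {phase_qubit[mu]})
    psi = rotation(z_mu, np.pi / 4) @ rotation(x_mu, np.pi / 4) @ psi

# Reference: equal superposition of all states reachable with the driver terms.
target = np.zeros(2**K, dtype=complex)
target[0] = 1.0
for line in lines.values():
    target = (target + pauli_product(X, line) @ target) / np.sqrt(2)

overlap = abs(np.vdot(target, psi))
print(round(overlap, 12))
```

An overlap of 1 confirms that the prepared state equals the uniform superposition over the $2^{N - n_s} = 8$ code states up to a global phase.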

V. NUMERICAL RESULTS
A. Circuit depth scaling
Figure 6 shows the required circuit depth to implement a single step of the QAOA protocol for a complete problem graph (see Fig. 2) as a function of the relative amount of explicitly enforced constraints $n_r = n_C / n_C^{\mathrm{tot}}$. In the fully implicit case ($n_r = 0$, see Fig. 2b) the circuit depth grows linearly with the system size, and the large prefactor in the circuit depth scaling is due to excessive overlap of driver lines. The circuit depth can be reduced by increasing the number of explicitly enforced three-body constraints until, at the points marked by crosses in Fig. 6, all three-body constraints are explicitly enforced (see Fig. 2c). In this situation the circuit depth still scales linearly with the system size, but with a more favorable prefactor. All points from there on correspond to modularized layouts of decreasing module size $l_{\max}$, further improving the circuit depth. Initially this can lead to a small increase of the circuit depth due to the additional implementation cost of the constraint Hamiltonian. This depth increase is independent of the system size, as all explicitly enforced constraints can be implemented in parallel. For sufficiently large lattices, the relation between reachable circuit depth and relative amount of explicitly enforced constraints becomes independent of the system size. The points at $n_r = 1$ correspond to the fully explicit implementation (see Fig. 2a).
[Footnote 2] An operator $\hat{\sigma}_z^{(k)}$ on a physical qubit $k$ translates to the product of $\hat{Z}$-operators acting on all driver qubits whose driver lines include the physical qubit $k$.

B. QAOA performance
In order to demonstrate the advantages of this new approach, we compare the QAOA performance of the fully implicit ($n_r = 0$), the hybrid ($0 < n_r < 1$) and the fully explicit ($n_r = 1$) parity-QAOA protocol. Subsequent to noiseless QAOA simulations to find optimal parameters, we simulate the resulting optimal QAOA circuits (including the respective state-preparation circuit) under varying noise levels of the required CNOT gates using Qiskit [34]. The single-qubit gate error rate is kept constant at $10^{-3}$. The simulations were done for various problem instances of a complete graph with $N = 6$, corresponding to $K = 15$ physical qubits (see Fig. 2).
Each data point in Fig. 7 represents the median performance of 96 problem instances for complete graphs with random local fields $J_m$ [cf. Eq. (2)], drawn from a uniform distribution $U_{[-1,1]}$. For the noiseless parameter optimization of each problem instance, we repeatedly initialize the QAOA parameters for $p = 3$, equally distributed in the range $[0, 2\pi)$, and search for a local optimum of the energy expectation value. Note that for the fully implicit approach there is one QAOA parameter less per cycle as the constraint unitary has been removed. For each initialization we perform consecutive updates of random QAOA parameters until the energy expectation value converges to a local minimum. If the energy of the system decreases after a parameter update, the new parameter is accepted, otherwise it is rejected. After repeating the initialization and optimization 100 times, the lowest energy expectation value $E = \langle\psi|\hat{H}_{\mathrm{phys}}|\psi\rangle$ [cf. Eq. (6)] for each instance is kept. We subsequently calculate the residual energy $E_{\mathrm{res}}$ of the system, defined as
$$E_{\mathrm{res}} = \frac{E - E_{\min}}{E_{\max} - E_{\min}},$$
where $E_{\max}$ and $E_{\min}$ denote the maximal and minimal eigenvalues of $\hat{H}_{\mathrm{phys}}$. Furthermore, the corresponding fidelity is given by the ground-state population with respect to $\hat{H}_{\mathrm{phys}}$.
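The optimization loop described above can be sketched as a random local search with restarts. In the Python snippet below, the step-size distribution, the stopping rule (`patience` consecutive rejections, capped by `max_iter`) and the toy cost function are assumptions, since the text does not specify them; in practice `cost` would return the simulated energy expectation value.

```python
import numpy as np

def local_search(cost, theta0, rng, step=0.3, patience=200, max_iter=10_000):
    """Update one random parameter at a time; accept only if the cost decreases."""
    theta = np.array(theta0, dtype=float)
    energy = cost(theta)
    stall = 0
    for _ in range(max_iter):
        trial = theta.copy()
        k = rng.integers(len(theta))
        trial[k] = (trial[k] + rng.normal(scale=step)) % (2 * np.pi)
        e_trial = cost(trial)
        if e_trial < energy:
            theta, energy, stall = trial, e_trial, 0
        else:
            stall += 1
            if stall >= patience:          # assumed convergence criterion
                break
    return theta, energy

def multi_start(cost, n_params, n_starts=100, seed=0):
    """Repeat the local search from random initial parameters in [0, 2 pi)."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        theta0 = rng.uniform(0, 2 * np.pi, size=n_params)
        result = local_search(cost, theta0, rng)
        if best is None or result[1] < best[1]:
            best = result
    return best

def residual_energy(e, e_min, e_max):
    """Energy rescaled to [0, 1] between the extremal eigenvalues."""
    return (e - e_min) / (e_max - e_min)

# Toy cost with minimum 0 and maximum 6 (placeholder for the QAOA energy).
toy_cost = lambda t: float(np.sum(1 - np.cos(t - 1.0)))
theta_best, e_best = multi_start(toy_cost, n_params=3, n_starts=5)
print(residual_energy(e_best, e_min=0.0, e_max=6.0))
```

For $p = 3$ the hybrid and explicit protocols would use `n_params = 9`, and the fully implicit protocol `n_params = 6`, because the constraint parameter $\Omega_j$ is absent.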
Clearly, the QAOA performance increases with a decreasing relative number of explicitly enforced constraints $n_r$. This is related to the fact that with increasing $n_r$, the search space grows and thus additional terms complicate the cost function to be minimized. Moreover, we observe that up to error rates of about $10^{-2}$, the different approaches show a similar noise dependency with respect to the QAOA performance. Note that for all values of $n_r$ the absolute CNOT-gate counts are similar and show the same $N$-scaling.

VI. CONCLUSION AND OUTLOOK
In summary, we have shown how to improve the parity-QAOA performance by interpolating between the standard single-qubit driver Hamiltonian and a driver Hamiltonian tailored to the parity architecture. In particular, the proposed hybrid approach keeps the parallelizability of the fully explicit parity QAOA while gaining performance by reducing the overall search space. As the key point of our approach, the trade-off between circuit depth and QAOA performance can be dynamically chosen according to hardware-specific needs by adjusting the size of the implicitly driven submodules. The presented ideas can be readily realized on any hardware platform providing a regular grid of qubits connected via nearest-neighbor gate operations. This is crucial for addressing questions about the practical QAOA performance of modularized layouts for problem sizes inaccessible to classical simulations.
While the present work focuses on improving the quantum implementation part of the parity QAOA, there are additional opportunities to improve its performance via the classical part, for example different decoding strategies that result in smarter cost functions. More generally, further improvements of the parity QAOA might also involve exploiting recently investigated phenomena regarding QAOA parameters [35][36][37] and utilizing other types of mixing Hamiltonians [38].

APPENDIX B
The case where a physical qubit is involved in multiple driver lines has to be treated with care. Implementing a physical $\hat{\sigma}_z$-operation on such a qubit has an effect on all involved driver lines and thus can introduce unwanted crosstalk. Whenever possible, we must therefore choose a qubit which is not involved in any other driver lines to perform the phase operation on. If this is not possible, the $\hat{Z}^{(\mu)}$-operation, performed on a qubit $k$ for a driver line $Q_\mu \ni k$, can still be used, as long as all driver qubits associated with other driver lines $Q_\nu$ involving qubit $k$ are in an eigenstate of $\hat{Z}^{(\nu)}$ and thus not affected by the rotation. In the initially prepared state $|{\uparrow}\rangle^{\otimes K}$, all driver qubits are in a $\hat{Z}$-eigenstate. That enables us to find a sequence of driver rotations such that for every $\hat{Z}^{(\mu)}$-rotation there is at least one qubit of the corresponding driver line which is either not included in any other driver lines, or only involved in driver lines whose state has not been rotated yet. After initializing all $K$ physical qubits in $|{\uparrow}\rangle$, we assign a priority to every driver line, such that $\hat{X}$- and $\hat{Z}$-rotations on the driver qubits, applied in descending order of their priority, will transform all driver qubits into the $|+\rangle$ state,
$$e^{-i\frac{\pi}{4}\hat{Z}^{(\lambda)}} e^{-i\frac{\pi}{4}\hat{X}^{(\lambda)}} |{\uparrow}\rangle_\lambda \propto |+\rangle_\lambda.$$
Here $\lambda$ enumerates the driver lines and the $\hat{X}^{(\lambda)}$/$\hat{Z}^{(\lambda)}$ describe bit-flip/phase-flip operations on the corresponding driver qubit. The priorities of the lines can be found iteratively; we call every line "unassigned" until it has been assigned a priority:
1. Assign all lines $Q_\mu$ which contain at least one qubit which is not in any other line the priority $P_\mu = 0$.
2. Assign all unassigned lines $Q_\nu$ which overlap at least one line with priority $P_\mu$ (and do not overlap other, unassigned lines at the same qubit) the priority $P_\nu = P_\mu + 1$.
3. Repeat step 2 until all lines have a priority.
The initial state $|\psi_0\rangle$ can then be prepared as
$$|\psi_0\rangle = \prod_{\kappa=0}^{P_{\max}} \prod_{Q_\lambda \in D_\kappa} e^{-i\frac{\pi}{4}\hat{Z}^{(\lambda)}} e^{-i\frac{\pi}{4}\hat{X}^{(\lambda)}} |{\uparrow}\rangle^{\otimes K},$$
where $P_{\max}$ is the highest assigned priority and $D_\kappa \subseteq D$ is the subset of driver lines with priority $\kappa$. Note that the order of the products must be such that the terms with higher priority are applied first. Equal priorities can be implemented in any order; their required gate sequences can be performed in parallel (or as parallel as possible, if there are qubit overlaps of the driver lines). If, with this procedure, not all lines can be assigned a priority, the partitioning of constraints can be changed to include more explicitly enforced constraints. At the latest, a valid prioritization of all lines can be found once all three-body constraints are explicitly enforced. Fig. 9 shows an example of two sets of connected driver lines in a submodule with assigned priorities.