Unitary Partitioning and the Contextual Subspace Variational Quantum Eigensolver

The contextual subspace variational quantum eigensolver (CS-VQE) is a hybrid quantum-classical algorithm that approximates the ground-state energy of a given qubit Hamiltonian. It achieves this by separating the Hamiltonian into contextual and noncontextual parts. The ground-state energy is approximated by classically solving the noncontextual problem, followed by solving the contextual problem using VQE, constrained by the noncontextual solution. In general, computation of the contextual correction needs fewer qubits and measurements compared with solving the full Hamiltonian via traditional VQE. We simulate CS-VQE on different tapered molecular Hamiltonians and apply the unitary partitioning measurement reduction strategy to further reduce the number of measurements required to obtain the contextual correction. Our results indicate that CS-VQE combined with measurement reduction is a promising approach to allow feasible eigenvalue computations on noisy intermediate-scale quantum devices. We also provide a modification to the CS-VQE algorithm; the CS-VQE algorithm previously could cause an exponential increase in Hamiltonian terms, but with this modification now at worst will scale quadratically.


I. INTRODUCTION
One of the fundamental goals of quantum chemistry is to solve the time-independent non-relativistic Schrödinger equation. The eigenvalues and eigenvectors obtained allow different molecular properties to be studied from first principles. Standard methods project the problem onto a Fock space (η electrons distributed in M orbitals) and solve. Under this approximation, the problem scales exponentially with system size, where the number of Slater determinants (configurations) scales as M η making the problem classically intractable [1]. Quantum computers can efficiently represent the full configuration interaction (FCI) Hilbert space and offer a potential way to efficiently solve such molecular problems [2,3]. This use case is often the canonical example of where the first quantum computers will be advantageous over conventional computers [4,5].
The constraints on present-day devices have given rise to a family of quantum-classical algorithms that leverage as much classical processing as possible to reduce the quantum resources required to solve the problem at hand. Common examples of NISQ algorithms are the variational quantum eigensolver (VQE) [9], quantum approximate optimization algorithm (QAOA) [10] and variational quantum linear solver (VQLS) [11]. A good example is the recently proposed entanglement forging method [12], where the electronic structure problem for H 2 O was reduced from a 10-qubit problem to multiple 5 qubit problems that were each studied using conventional VQE and classically combined. Recently another novel approach known as the quantum-classical hybrid quantum Monte Carlo (QC-QMC) method was used to unbias the sign problem in the projector Monte Carlo (PMC) method, which implements imaginary time evolution [13]. At a high level, the accuracy of a constrained PMC calculation is determined by the quality of trial wave functions. Quantum computers offer a way to efficiently store highly entangled trial wave functions and measure certain overlaps, which would require exponential resources classically. Huggins et al. performed QC-QMC simulations of different chemical systems on Google's Sycamore processor and obtained results competitive with state-ofthe-art classical methods [13].
The contextual-subspace VQE algorithm is another hybrid quantum-classical approach [14]. It gives an approximate simulation method, where the quantum resources required can be varied for a trade off in accuracy. This allows problems to be studied where the full Hamiltonian would normally be too large to investigate on current NISQ hardware. This was shown in the original CS-VQE paper, where chemical accuracy for various molecular systems was reached using significantly fewer qubits compared with the number required for VQE on the full system [14]. As CS-VQE reduces the number of qubits required for simulation, the number of terms in a Hamiltonian requiring separate measurements is also reduced.
This paper is structured as follows. Section II summarizes the CS-VQE algorithm. Here we provide a modification to the unitary partitioning step of the CS-VQE algorithm the CS-VQE algorithm previously could cause the number of terms in a Hamiltonian to exponentially increase, but with this modification now will at worst cause a quadratic increase. Section III is split into a description of the method in Section III A and two main parts, Sections III B and III C. Section III B examines a model problem to exemplify each step of the CS-VQE algorithm. Section III C gives the numerical results of applying unitary partitioning measurement reduction to a test bed of different molecular structure Hamiltonians, where the contextual subspace approximation has been employed.

II. BACKGROUND
To keep our discussion self-contained and establish notation, we summarize the necessary background theory of the contextual-subspace VQE algorithm in this section.

A. Contextuality
The foundation of quantum contextuality is the Bell-Kochen-Specker (BKS) theorem [31]. In lay terms, every measurement provides a classical probability distribution (via the spectral theorem) and a joint distribution can be built as a product over all possible measurements [32]. The BKS theorem proves that it is impossible to reproduce the probabilities of every possible measurement outcome for a quantum system as marginals of this joint probability distribution [33]. This is related to how quantum mechanics does not allow models that are locally causal in a classical sense [34]. Contextuality is a generalization of nonlocality [34,35]. This means that quantum measurement cannot be understood as simply revealing a pre-existing value of some underlying hidden variable [36,37]. Bell's theorem also reaches a similar conclusion against hidden variables [38], but in a different way.
A good example of this phenomenon is the "Peres-Mermin square" [37,39], where no state preparation is involved and only observables are considered. We include an example in Appendix A and remark on the relation to VQE. Colloquially, for a noncontextual problem it is possible to assign deterministic outcomes to observables simultaneously without contradiction; however, for a contextual problem this is not possible [14].
The following subsections set out the contextual subspace VQE algorithm and we provide an alternate way to construct U W (defined below) compared to the original work [14]. This modification addresses the exponential scaling part of the method. Further background on the full CS-VQE algorithm is given in the Supplemental Material [40].
where c a are real coefficients. Each Pauli operator P a is made up of an n-fold tensor product of single qubit Pauli matrices σ j ∈ {I, X, Y, Z}, where j indexes the qubit the operator acts on. The CS-VQE algorithm is based on separating such a Hamiltonian into a contextual and noncontextual part [14]: As it is possible to assign definite values to all terms in H noncon without contradiction, a classical hidden variable model (or quasiquantized model) can be used to represent this system [41].
In [42], such a model is constructed along with a classical algorithm to solve it. This was based on the work of Spekkens [43,44]. Solving this model yields a noncontextual ground-state.
Once that solution is obtained, the remaining contextual part of the problem is solved. Solutions to H con must be consistent with the noncontextual ground-state, which defines a subspace of allowed states [14]. By projecting the problem into this subspace the overall energy is given by: E( θ; q, r ) =E noncon ( q, r ) + E con ( θ; q, r ) =E noncon ( q, r )+ ψ con ( θ)| Q † W U † W H con U W Q W |ψ con ( θ) ψ con ( θ)| Q † W Q W |ψ con ( θ) = ψ con ( θ)| Q † W U † W H full U W Q W |ψ con ( θ) ψ con ( θ)| Q † W Q W |ψ con ( θ) .
(3) We have written U W rather than U W ( q, r) to simplify our notation.
The vector ( q, r) should be thought of as parameters that define a particular noncontextual state: normally, this will be a parameterization for the noncontextual ground-state [14,42]. This vector has a size of at most 2n + 1 for a Hamiltonian defined on n qubits [42]. From ( q, r), we define a set of stabilizers W which stabilize that particular noncontextual state [14]. The unitary U W maps each of these stabilizers to a distinct single-qubit Pauli matrix; details of this are covered in II E. By enforcing the eigenvalue of these single-qubit Pauli operators we define a subspace of allowed quantum states that are consistent with the noncontextual state. To constrain the problem to this subspace, we use the projector Q W . Note that this is not a unitary operation, hence the renormalization in equation 3. By projecting our contextual Hamiltonian into this subspace H con → H W con = Q † W U † W H con U W Q W , we ensure that solutions to H W con remain in the subspace consistent with the noncontextual solution [14]. In other words, this operation means that solutions to H W con will remain consistent with the noncontextual solution. Section II E goes into detail on this.
The contextual trial or ansatz state is prepared as is the parameterized operator that prepares it. The projector Q W , in equation 3, then projects this state into the subspace of possible states consistent with the noncontextual groundstate. Again this depends on which stabilizer eigenvalues are fixed. Note that as Q W is not a unitary operation the state must be renormalized. Further analysis of the contextual subspace VQE projection ansatz is provided in [45]. In section II C we discuss how to solve the noncontextual problem.

C. Noncontextual Hamiltonian
For a given noncontextual Hamiltonian, we define S Hnoncon to be the set of P i present in H noncon . This set can be expanded as two subsets denoted as Z and T , representing the set of fully commuting operators and its complement respectively [14,42]. The set T can be expanded into N cliques, where operators within a clique must all commute with each other and operators between cliques must pairwise anticommute. This is because com-mutation forms an equivalence relation on T if and only if S Hnoncon is noncontextual [14,42]. A hidden variable model for such a system can be built, where the set of observables R that define the phasespace points of the hidden variable model is [14,42,46]: The set G represents an independent set of Pauli operators that generates the set of commuting observables Z. Each P (j) 0 corresponds to a chosen Pauli operator in the j-th clique of T : by convention we say that this is the first operator in the set, but this can be any operator in the jth clique.
With respect to the phase-space model given in [42], a valid noncontextual state is defined by the parameters ( q, r), which set the expectation value of the operators in R (Equation 4). Each operator in G is assigned the value G i = q i = ±1. The operators in T are assigned the values P (j) 0 = r j , where r is a unit vector (| r| = 1) [14,42]. The number of elements in q and r are |G| and N respectively. For n qubits the size of |R| is bounded by 2n + 1, which bounds the size of ( q, r) [42,46].
The observables for the N anticommuting P (j) 0 operators in R can be combined into the observable [42]: We denote the set of Pauli operators making up this operator as A ≡ {P (j) 0 |j = 0, 1 . . . , N − 1}. The expectation value of A( r) is assigned by the hidden variable model to always be +1, due to: using P (j) 0 = r j and | r| = 1. The expectation value for H noncon can be induced, by setting the expectation values of operators in R (Equation 4), as this set generates S Hnoncon [14,42]. To find the ground-state of H noncon , we perform a brute force search over this space. For each possible ±1 combination of expectation values for each G j (2 |G| possibilities), the energy is minimized with respect to the unit vector rthat sets the expectation values P (j) 0 = r j . The Supplemental Material provides further algorithmic details [40]. The vector ( q, r) that was found to give the lowest energy defines the noncontextual ground-state. Each noncontextual state ( q, r), corresponds to subspaces of quantum states, which we will describe in subsection II D.

D. Contextual subspace
In [15] and [29] it was shown that an operator constructed as a normalized linear combination of pairwise anticommuting Pauli operators, such as A( r) (Equation 5), is equivalent to a single Pauli operator up to a unitary rotation R. We can therefore write A( r) → P We write the set R r (Equation 4) under this transformation: It will be shown later that the unitary R is constructed from the operators in A. This means that the terms in G are unaffected by this transformation, as operators in G and so must universally commute so must commute with R. A given noncontextual state ( q, r) is equivalent to the joint expectation value assignment of G i = q i = ±1 and A( r) = +1. This defines a set of stabilizers: which by definition must stabilize that noncontextual state ( q, r) or more precisely, the subspace of quantum states corresponding to it 1 . Note that A( r) is not a conventional stabilizer, but is unitarily equivalent to a single qubit operator P (k) 0 [15, 29]. We can consider this problem under the unitary transform defined in Equation 7. The stabilizers in W all become: which defines a regular set of stabilizers for the noncontextual state ( q, r), which defines a subspace of quantum states. Here ξ = ±1, is determined by RA( r)R † = ξP (k) 0 and can always be chosen to be +1, which we do throughout this paper. Altogether, when certain noncontextual stabilizers are fixed (by the noncontextual state) they specify a subspace of allowed quantum states that will be consistent with that noncontextual state and thus define the constraints for the contextual part of the problem. We refer to this subspace as the contextual subspace [14].

E. Mapping a contextual subspace to a stabilizer subspace
In CS-VQE, the expectation value of the full Hamiltonian is obtained according to Equation 3. First, the noncontextual problem is solved yielding the noncontextual state ( q, r) -normally the ground-state ( q 0 , r 0 ). The Supplemental Material shows how ( q 0 , r 0 ) can be obtained via a brute force approach [40]. Next the contextual Hamiltonian is projected into the subspace of allowed quantum states consistent with the defined noncontextual state. This constraint is imposed via: where the expectation value is then found on a quantum device.
The unitary operation U W is defined by the set of contextual stabilizers W ⊆ W all (equation 8), whose eigenvalue we fix according to the noncontextual state. If A( r) ∈ W, meaning that A( r) is fixed to be +1, then the steps summarized in Equation 7 must first be performed to reduce A( r) to a single Pauli operator. Clifford operators V i (P ) are then used to map each P ∈ W to a single-qubit Z operator. Each V i is made up of at most two π 2 Clifford rotations, generated by Pauli operators, per element in W. In [14] it was shown that at most there will be 2n of these rotations, where n is the number of qubits the problem is defined on [47]. We can write this operator as: (10) Applying U † W WU W = W Z results in a set of singlequbit Z Pauli operators. An implementation note is that each operator V i (P i ) in U W depends on the others. This can be seen by expanding U † W WU W . Therefore each V i operator is dependent on the stabilizers in W and the order in which they occur. We recursively define each V i as follows: 2. Find the unitary V 0 mapping the first Pauli operator P 0 ∈ W to a single qubit Pauli operator.
3. Apply this operator to each operator in the set: 5. Apply this operator to all operators in the set: Repeat this procedure from step (3) until all the operators are mapped to single qubit Z Pauli operators: W → W Z .
Finally, the eigenvalue of each single-qubit Z Pauli stabilizer in W Z is defined by the vector q of the noncontextual ground-state ( q, r), note that A( r) is fixed to +1 and thus r is not important here. U W can flip the sign of these assignments, but it is efficient to classical determine by tracking how U W affects the sign of the operators in W.
To project the Hamiltonian into the subspace consistent with the noncontextual state, we first perform the following rotation H full → H full = U † W H full U W . As this is a unitary transform, the resultant operator has the same spectrum as before. We then restrict the rotated Hamiltonian to the correct subspace by enforcing the eigenvalue of the operators in W Z , where the outcomes are defined by the noncontextual state. As each operator in W Z only acts nontrivially on a unique qubit, each stabilizer fixes the state of that qubit to be either |0 or |1 . We write this state as: where v indexes the qubit a given single-qubit stabilizer acts on and P v is defined by the noncontextual state. We can write the projector onto this state as: where I (n−|W Z |) is the identity operator acting on the (n − |W Z |) qubits not fixed by the single-qubit P v stabilizers. The action on a general state |φ is: where Q W has only fixed the state of qubits v and thus each stabilizer P v removes 1 qubit from the problem. As the states of these qubits are fixed, the expectation values of the single-qubit Pauli matrices indexed on qubits v are known. Thus the Pauli operators in the rotated Hamiltonian H full → H full = U † W H full U W acting on these qubits can be updated accordingly and the Pauli matrices on qubits v can be dropped. Any term in the rotated Hamiltonian that anticommutes with a fixed generator P v is forced to have an expectation value of zero and can be completely removed from the problem Hamiltonian. The resultant Hamiltonian acts on |W Z | fewer qubits. We denote this operation as The noncontextual approximation will be stored in the identity term of the problem and therefore does not need to be tracked separately.
The choice of which stabilizer eigenvalues to fix (i.e. what is included in W) and which to allow to vary remains an open question of the CS-VQE algorithm. The number of possible stabilizer combinations will be Rather than searching over all 2 |W all | −1 combinations of stabilizers to fix, in this paper we use the heuristic given in [14]. This begins at the full noncontextual approximation, where W contains all possible stabilizers. We then add a qubit to the quantum correction, by removing an operator from W and greedily choosing each pair that gives the lowest ground-state energy estimate [14]. Alternative strategies on how to do this remain an open question of CS-VQE. A possible way to approach this problem is to look at the priority of different terms in H con [48]. Note that the quality of the approximation is sensitive to which stabilizers are included in W. When fewer stabilizers are considered (included in W), the resultant rotated Hamiltonians will act on more qubits and approximate the true groundstate energy better.
In . Additional structure between R S and H full can make the base of the exponent slightly lower; however, the scaling still remains exponential in the number of qubits n, where |A| ≤ 2n + 1 [42]. The only case in which there is not an exponential increase in terms is for the trivial instance that R S commutes with H full . In the next section, we provide an alternative construction of R via a linear combination of unitaries (LCU) that results in only a quadratic increase in the number of terms of the Hamiltonian when transformed. This avoids the need to apply the unitary partitioning operator R (via a sequence of rotations) coherently in the quantum circuit after the ansatz circuit, which was proposed in [14].

F. Linear combination of unitaries construction of R
In the unitary partitioning method [15,29], it was shown that R could also be built as a linear combination of Pauli operators [15,30]. We provide the full construction in the Supplemental Material [40]. We denote the operator as R LCU . Rotating a general Hamiltonian H full by this operation R LCU results in: The Pauli operators P j , P k and P l are operators in A, further details are covered in the Supplemental Material [40]. Overall, this unitary transformation causes the number of terms in the Hamiltonian to scale as O |H full | · |A| 2 . This scaling is quadratic in the size of A and as |A| ≤ 2n + 1 [42], the number of terms in the rotated system will at worst scale quadratically with the number of qubits n. In a different context, this scaling result was also obtained for involutory linear combinations of entanglers [49]. Overall, unlike the sequence of rotations approach, this non-Clifford operation doesn't cause the number of terms in a Hamiltonian to increase exponentially.
The transformation given in Equation 14 H full → H LCU full = R † LCU H full R LCU is performed classically in CS-VQE. This is efficient to do because it just involves Pauli operator multiplication, which can be done symbolically or via a symplectic approach [50]. This operation could be applied within the quantum circuit. However, in contrast to the deterministic sequence of rotations approach, this implementation would be probabilistic as it requires post selection on an ancillary register [15,30,51,52]. Amplitude amplification techniques could improve this, but would require further coherent resources [53][54][55][56]. Performing this transformation in a classical pre-processing step therefore reduces the coherent resources required and at worst increases the number of terms needing measuring quadratically with respect to the number of qubits.

G. CS-VQE implementation
In [14], U W ( q, r) was fixed to include all the stabilizers of the noncontextual ground-state W ≡ W all (Equation 8), rather than possible subsets W ⊆ W all . The whole Hamiltonian was mapped according to H full → H full = U † W all H full U W all . In general, A( r) ∈ W and U W all will therefore normally include the unitary partitioning operator R. The problem with this approach is that the unitary R is not a Clifford operation and the transformation can cause the number of terms in the Hamiltonian to increase. This increase is exponential if R S is used and quadratic if R LCU is employed. As this step can generate more terms, R should only be included in U W if the eigenvalue of A( r) is fixed to +1, otherwise it is a redundant operation as the spectrum of the operator rotated by R is unchanged. We therefore modify the CS-VQE algorithm to construct U W from the CS-VQE noncontextual generator eigenvalues that are fixed. This means that W ⊆ W all and ensures that the number of terms can only increase if the eigenvalue of A( r) is fixed.

III. NUMERICAL RESULTS
We describe the method in Section III A and then split our results into the two remaining sections. First, we ex-plore a toy problem, showing the steps of the CS-VQE algorithm. We show how classically applying R without fixing the eigenvalue of A( r) to +1 can unnecessarily increase the number of terms in a Hamiltonian without changing its spectrum. Finally, in Section III C we apply measurement reduction combined with CS-VQE to a set of electronic structure Hamiltonians and show that this can significantly reduce the number of terms requiring separate measurement. The raw data for these results are supplied in the Supplemental Material [40].

A. Method
We investigated the same electronic structure Hamiltonians considered in the original CS-VQE paper [14]. All molecules considered had a multiplicity of 1 and thus a singlet ground-state. The same qubit tapering was performed to remove the Z 2 symmetries [57]. For each tapered Hamiltonian, we generate a set of reduced Hamil- where the size of W varies from 1 to |W all |, representing differing noncontextual approximations, as summarized in Section II G. To generate the different CS-VQE Hamiltonians, we modify the original CS-VQE source code used in [14,58]. The code was modified to implement the unitary partitioning step of CS-VQE if and only if the eigenvalue of A( r) was fixed. This ensured that the number of terms in the rotated Hamiltonian did not increase unnecessarily, as described in Section II G.
For each electronic structure Hamiltonian generated in this way, we then apply the unitary partitioning measurement reduction scheme to further reduce the number of terms requiring separate measurement [15,29,30]. Partitioning into anticommuting sets was performed using NETWORKX [59]. A graph of the qubit Hamiltonian is built, where nodes represent Pauli operators and edges are between nodes that commute. A graph coloring can be used to find the anticommuting cliques of the graph. This searches for the minimum number of colors required to color the graph, where no neighbors of a node can have the same color as the node itself. The "largest first" coloring strategy in NETWORKX was used in all cases [59,60].
We calculate the ground-state energy of each Hamiltonian in this paper by directly evaluating the lowest eigenvalues. This was achieved by diagonalizing them on a conventional computer.

B. Toy example
We consider the qubit Hamiltonian: and use it to exemplify the steps of the CS-VQE algorithm. The results are reported to three decimal places and full numerical details can be found in the Supplemental Material [40].
Following the CS-VQE procedure [14], we first split the Hamiltonian into its contextual and noncontextual parts (Equation 2):   each operator in H noncon can therefore be inferred without contradiction. To find the ground-state of H noncon , we checked all possible ±1 expectation values for each G j (2 3 = 8 possibilities). For each possible ±1 combination, the energy was minimized with respect to the unit vector r, which sets the expectation value for each P (j) 0 = r j . The vector ( q, r) that was found to give the lowest energy defines the noncontextual ground-state. In this case the ground-state is: This noncontextual state defines the operator A( r 0 ): or linear combination of unitaries [15], These operators perform the following reduction: (23) Equations 18, 20 and 23 define the noncontextual stabilizers: (24) Next, we define different U W (Equation 10), depending on which stabilizers W we wish to fix. For this problem we found the optimal ordering of which stabilizers to fix to be This was achieved by a brute force search over all |W all | i = 2 4 − 1 = 15 possibilities for W. The members of the resulting set of four different W each represent different noncontextual approximations. These give four different U W built according to Equation 10. The full definition of each operator is given in the Supplemental Material [40].
Taking a specific example, for ). This operator transforms W as The eigenvalues of the operators in W Z are fixed by the noncontextual state to be IZII = +1, IIZI = +1, IIIZ = −1. This defines the projector: (25) The reduced Hamiltonian is therefore (26) The Supplemental Material gives further details about this operation and provides the specifics for the other projected Hamiltonians [40].
Overall four Hamiltonians are generated, representing different levels of approximation, that act on 0, 1, 2 and 3 qubits respectively. The 4 qubit case represents the standard VQE on the full Hamiltonian. Figure 1 summarizes the error ∆E of each of these compared with the true ground-state energy (scatter plot). The number of terms in each Hamiltonian is given by the bar chart. The green and orange results have W ≡ W all for all cases and represent the old CS-VQE implementation. For these results, in the 3 and 4 qubit Hamiltonians have an increased number of terms due to R S/LCU being implemented, even though the eigenvalue of A( r 0 ) is not being fixed to +1. On the other hand, the gray and blue results in Figure  1 build U W according to Equation 10, where W ⊆ W all . This approach ensures that R S/LCU is only applied when necessary.

C. Measurement reduction
Figures 2 and 3 summarize the results of applying the unitary partitioning measurement reduction strategy to a set of electronic structure Hamiltonians We report the number of terms and number of qubits in each Hamiltonian required to achieve chemical accuracy compared with the original problem. The Supplemental Material gives further information about each result, where the different levels of noncontextual approximation are shown [40]. As previously discussed in [14], even though CS-VQE in general is an approximate method, chemical accuracy can still be achieved using significantly fewer qubits. Applying unitary partitioning on-top of the reduced CS-VQE Hamiltonians required to achieve chemical accuracy can further reduce the number of terms by roughly an order of magnitude. This is consistent with the previous results in [30].
To actually obtain a measurement reduction, one needs to show that the number of measurement required to measure the energy of a molecular system, to a certain precision , is reduced. Currently, Figure 2 only shows that we have reduced the number of Pauli terms being measured. We have not commented on the variance. In the Supplemental Material [40], we prove that simultaneous measurement of normalized anticommuting cliques can never do worse than performing no measurement reduction and will more often than not give an improvement. The proof given is state independent. There are other measurement strategies based on grouping techniques, such as splitting a Hamiltonian into commuting or qubit-wise commuting cliques [16, 17, 21, 24, 25]. The measurement reduction obtained from these methods is more complicated, as the covariance of operators within a clique must be carefully accounted for [24,61]. This is one of the reasons we do not analyze the performance of these strategies in this paper. Many other measurement methods have also been proposed [18-20, 22, 23, 25-28, 62,63] and their effect on the number of measurements would be interesting to investigate.
In Table I, we report the upper bound on the gate count required to implement measurement reduction as a sequence of rotations. The LCU method would require ancilla qubits and analysis of the circuit depth is more complicated. Further analysis can be found in [30]. The number of extra coherent resources required to implement unitary partitioning measurement reduction is proportional to the size of each anticommuting clique a Hamiltonian is split into [15,30]. The sequence of rotations circuit depth scales as O N s (|C| − 1) single qubit and O N s (|C|−1) CNOT gates, where N s is the number of system qubits and |C| is the size of the anticommuting clique being measured. Table I   The heuristic used to determine the operators in H noncon selected terms in the full Hamiltonian greedily by coefficient magnitude, while keeping the set noncontextual [42]. The Hamiltonians studied here had weights dominated by diagonal Pauli operators, as the Hartree-Fock approximation accounts for most of the energy. This heavily constrains the operators allowed in A. For the electronic structure Hamiltonians considered in this paper, we found in all cases that |A| = 2. In general, we do expect more commuting terms in H noncon than anticommuting terms. This is because there are more possible commuting Pauli operators defined on n qubits compared with anticommuting operators ( 2 n vs 2n + 1). G will therefore in general be the larger contributor to the superset R (Equation 4).
In Figure 2, the CS-VQE bars have not been split into two for the case when R is constructed as R LCU or R S . This is due to |A| being 2 in all cases, which is the special case when these operators (R LCU and R S ) end up being identical. In this instance R has the form R = αI + iβP and thus the number of terms will only increase for every term in the Hamiltonian that P anticommutes with. However, in general |A| will be greater than 2 and the effect of R can dramatically affect the number of terms in the resultant rotated Hamiltonian. We observe this in Fig. 1 of the toy example, where the 2 and 3 qubit CS-VQE Hamiltonians have had U W all applied to them even though the eigenvalue of A( r) is not fixed. In that example, for the 3 qubit approximation the sequence-ofrotations rotated Hamiltonian (green) actually has fewer terms than the LCU rotated operator (orange). This result is an artifact of the small problem size. In the Supplemental Material we show that the scaling will favor the LCU implementation, where the number of terms in a Hamiltonian can only increase quadratically, not exponentially, when performing the unitary partitioning rotation as a LCU rather than a sequence of rotations [40].
In the Supplemental Material, we show the convergence of CS-VQE at different noncontextual approximations. The results illustrate that CS-VQE can converge to below chemical accuracy well before the case when no noncontextual approximation is made (full VQE). Results beyond convergence are included to show the different possible levels of approximation. In practice knowledge of the true ground-state energy is not known a priori and so using chemical precision to motivate the noncontextual approximation will not be possible. In this setting, a way to approach quantum advantage is to note that CS-VQE is a variational method. The quantum resources required can be expanded until the energy obtained by CS-VQE is lower than that coming from the best possible classical method. At this point, either the algorithm can be terminated or further contextual corrections can be added until the energy converges, at which point the algorithm should be stopped.

IV. CONCLUSION
The work presented here shows that combining the unitary partitioning measurement reduction strategy with the CS-VQE algorithm can further reduce the number of terms in the projected Hamiltonian requiring separate measurement by roughly an order of magnitude for a given molecular Hamiltonian. The number of qubits needed to achieve chemical accuracy in most cases was also dramatically decreased, for example the H 2 S (STO-3G singlet) problem was reduced to 7 qubits from 22.
We also improve two parts of the CS-VQE algorithm. First, we avoid having to apply the unitary partitioning operator R after the ansatz which averts the potential exponential increase in the number of Pauli operators of the CS-VQE Hamiltonian caused by classically computing the non-Clifford rotation of the full Hamiltonian when R defined as a sequence of rotations [14,15]. We show that applying this operation as a linear combination of unitaries [15]: results in the number terms at worst increasing quadratically with the number of qubits. This result makes classically precomputing this transformation tractable and R no longer needs to be performed coherently after the ansatz. Secondly, we define the unitary U W , which maps each stabilizer in W all (equation 9) to a distinct singlequbit Pauli matrix, according to which stabilizer eigenvalues are fixed by the noncontextual state. This ensures that the non-Clifford rotation required by CS-VQE is only applied when necessary and also reduces the number of redundant Clifford operations that are classically performed.
There are still several open questions for the CS-VQE algorithm. We summarize a few here. (1) What is the best optimization strategy to use when minimizing the energy over ( q, r) in the classical noncontextual problem? (2) What heuristic is best to construct the largest |H noncon |? (3) How can we efficiently determine which noncontextual stabilizers to fix while maintaining low errors? In this paper, the size of each electronic structure problem allowed us to classically compute the groundstate energies at each step, but if this is not possible then VQE calculations would be required. However, as each run requires fewer qubits and decreases the number of terms requiring separate measurement this approach may overall still be less costly than performing VQE over the whole problem, especially when combined with further measurement reduction strategies. (4) What are the most important terms to include in H con or equivalently in H noncon ? Currently, it is not known whether |H noncon | should be maximized or whether selecting high priority terms [48] from the whole Hamiltonian results in a better approximation for a given problem. We leave these questions to future work.
We have written an open-source CS-VQE code that includes all the updated methodology discussed in this paper. We welcome readers to make use of this, which is freely available on GitHub [64].  The "Peres-Mermin square" [37, 39] involves the construction of nine measurements arranged in a square. In this appendix we follow the construction given in [33]. Each measurement has only two possible outcomes (dichotomic) +1 and −1. In a realistic interpretation, performing each measurement on an object reveals whether the property is present (+1) or absent (−1), yielding nine properties.
We take three measurements along a column or row to form a "context" -a set of measurements whose values can be jointly measured i.e. the observables commute and thus share a common eigenbasis. Table II gives an example.
In a classical (noncontextual) model for this system, the nine measurements {IZ, ZI, ZZ, XI, IX, XX, XZ, ZX, Y Y } can be assigned a definite value independent of the context the measurement is obtained in.
For example if all measurements are assigned +1 in Table II, then c 0 = c 1 = c 2 = r 0 = r 1 = r 2 = +1 and six positive products are obtained. If a single entry in Table II is changed it will affect two products (a row and column product). We consider the following Equation in this setting: We find that classically we get an inequality P M ≤ 4.
We reiterate that this is the setting of eight +1 assignments and a single −1 assignment. This inequality is saturated when the −1 value is assigned to one of the observables in the last column of Table II. The significance of this inequality is that it can be violated by quantum systems. Thinking of this in a quantum setting, the operators in rows and columns of Table  II commute. If we multiply along the rows and columns we get +II apart from the last column where c 2 = −II (see Table III). This is the case regardless of what quantum state is considered . Using the expectation values of the product of these operators in Equation A1, we find P M = 6, violating the classical bound.
Classically Equation A1 is bounded as P M ≤ 4 due to the assumption that the nine observables of the object can be assigned a value consistently. Violation of this bound implies that either the value assignment must depend on which context (row or column) the observable appears in or there is no value assignment. This phenomenon is known as quantum contextuality [33].
In VQE, a Hamiltonian is defined by a linear combination of Pauli operators. The expectation value is obtained by measuring each Pauli operator in a separate experiment and combining the results. Different groups of commuting operators form contexts. In general there will be incompatible contexts where it is impossible to consistently assign joint outcomes. In other words, different inference relations will lead to contradictions. Outcomes assigned to individual measurements are therefore context-dependent and the problem is contextual. If not, then the problem is noncontextual and a noncontextual (classical) hidden variable model can be used to solve such systems.

Obtaining Noncontexual Hamiltonian
To begin CS-VQE, we first need to define the contextual and noncontextual parts. The task of finding the largest noncontextual subset of Pauli operators in S H full is a generalization of the disjoint cliques problem [? ? ], which is NP-complete. However, different heuristics can be used to approximately solve this problem.
To date, VQE experiments have mainly focused on chemistry Hamiltonians, where Hartree-Fock accounts for most of the energy. Such Hamiltonians contain Pauli operators that l 1 norms are dominated by diagonal terms -Pauli operators made up of tensor products of single qubit I and Pauli Z matrices. To find a noncontextual set in such a scenario, a greedy heuristic selecting high weight terms from the full Hamiltonian first can be used, while checking the set remains noncontextual using algorithm 1. This gives a noncontextual set containing mainly diagonal terms, with some additional operators [? ]. Alternative procedures to find the largest noncontextual subsets remain an open question for the CS-VQE algorithm.

Noncontextual hidden variable model
Once the noncontextual Hamiltonian H noncon is determined, we can define the set S Hnoncon to be the Pauli operators in H noncon . We split S Hnoncon into two subsets Z and T -representing the set of universally commuting Pauli operators Z and their complement respectively [? ? ]: Slight modifications to Algorithm 1 achieve this -where P would be set to be S Hnoncon and both Z, T should be returned.
The operators in Z are noncontextual, as by definition they are universally commuting and represent symmetries of H noncon . For the overall super-set S Hnoncon to be noncontextual, the remaining operators in T must be made up of N disjoint cliques C j [? ], where operators within a clique must all commute with each other and operators between cliques pairwise anticommute. This is because commutation forms an equivalence relation on T if and only if S Hnoncon is noncontextual [? ? ? ]. We can write T as: We re-define each clique C j using the identity operation defined by the first operator of the jth clique, P The new operators A (S.5b) So far we have considered the noncontextual set of Pauli operators S Hnoncon , which in general will be a dependent set. By this we mean that some operators in the set can be written as a product of other commuting operators in the set. We need to reduce this set S Hnoncon to an independent set of Pauli operators, where all operators in the noncontextual Hamiltonian can be inferred from the values of other operators in the set under the Jordan product.
To obtain an independent set from S Hnoncon , we first take the completely commuting Pauli operators: Inspecting the properties of R, one can bound its size. The set G has size at most n − 1, as n independent commuting Pauli operators form a complete commuting set of observables for n qubits. In other words, as G is a universally commuting set, if its size was n (or more) then taking G and one operator P = r j . We summarise this as: (S.11) The expectation value of all the operators in S Hnoncon are generated from some finite combination of terms in R under the Jordan product. This by extension will induce the expectation value for H noncon . Explicitly, let P Z i ∈ Z ⊆ S Hnoncon then if we let J G P Z i be the set of indices such that P Z i = i∈J G P Z i G i ; then [? ]: In words, we combine the expectation value of some finite set of Pauli operators in the independent set G -given by J G P Z i -to reproduce the expectation value for P Z i .
Similarly, the expectation value for each A (j) k P (j) 0 ∈ T ⊆ S Hnoncon (Equation S.5) term is given by: We can write the noncontexual Hamiltonian as: and find the energy by: where β i and β k are real coefficients and each expectation value is given by Equation S.12 and S.13 [? ].

Solving the Noncontexual Hamiltonian
To find the ground state of H noncon , we minimize Equation S.15 via a brute-force search as described in [? ]. Algorithm 2 summarises the steps. This could be done in the work presented here, because R was small for all molecular systems considered. First a trial q is defined. This is a set of ±1 expectation values for each G j . An initial guess of the amplitudes r i of the unit vector r is made and the energy (Equation S.15) is minimized over this continuous parameterization of r for a fixed trial q, until the energy converges to a minimum. These steps are repeated for all the 2 |G| assignments of q. The ( q, r) combination that gives the lowest overall energy represents the noncontextual ground state of the physical system. In the main text we denote this parameterization as ( q 0 , r 0 ). Note for a fixed q, we optimize over r. This can be thought of as optimizing a function defined on a hypersphere. Currently we haven't explored the properties of this function.
It remains an open question for the CS-VQE algorithm if alternate optimization strategies are possible, for example using chemical intuition during optimization. This brute force approach of searching over all 2 |G| possibilities for q may not be necessary. In the next section, we discuss how to map the contextual problem into a subspace consistent with a defined noncontextual state ( q, r). In section ?? of the main text, the full Hamiltonian is mapped contextual subspace consistent with the noncontextual ground state by implementing U W (Equation ??) followed by projecting the rotated Hamiltonian with Q W . However, the definitions for the operators making up U W were omitted. This subsection gives these details and is split into three parts. The first two parts consider how R is constructed. This is the problem of mapping a linear combination of pairwise anticommuting Pauli operators to a single Pauli operator and is known as unitary partitioning [? ? ]. In the context of this work, we use this to define R such that RA( r)R † → P (k) 0 . In the original formulation of CS-VQE only the sequence of rotations construction of R is used in the algorithm. We provide an alternative approach using the linear combination of unitaries construction proposed in [? ], which results in superior scaling. We show the effect each conjugate rotation R has on the number of terms in a given qubit Hamiltonian. The last subsection gives the unitary rotations required to map a commuting set of Pauli operators to single qubit Pauli Z operators.
In each of these subsections, we use the notation that Pauli operators with multiple indices represent the multiplication of Pauli operators: P a P b P c = P abc . These terms will also be Pauli operators up to a complex phase.

Unitary partitioning via a sequence of rotations
In this subsection, we show how A( r) → P where P (k) 0 ∈ A. To simplify the notation we drop the subscript 0 (denoting the first operator in a clique) and write each P as P k and P j respectively. The adjoint rotation generated by one of these operator X kj operators will be: e (−i θ kj 2 X kj ) A( r)e (+i θ kj 2 X kj ) = R S kj (θ kj )A( r)R † S kj (θ kj ) = r j cos θ kj − r k sin θ kj P j + β j sin θ kj + r k cos θ kj P k + P l ∈A ∀l =k,j β l P l . (S.17) The coefficient of P j can be made to go to 0, by setting r j cos θ kj = r k sin θ kj . This approach removes the term with index j and increases the coefficient of P k from r k → r 2 k + r 2 j [? ]. This process is repeated over all indices excluding j = k until only the P k term remains. This procedure can be concisely written using the following operator [? ]: which is simply a sequence of rotations. The angle θ kj is defined recursively at each step of the removal process, as the coefficient of P k increases at each step and thus must be taken into account. The correct solution for θ kj must be chosen given the signs of r k and r k [? ]. The overall action of this sequence of rotations is: Looking at Equation S.18, expanding the product of rotations results in R S containing O(2 |A|−1 ) Pauli operators. We write this operator as: (S.20) The adjoint rotation of R S on a general Hamiltonian H q = |Hq| a c a P a is:

(S.21)
We see that the number of terms increases as O(2 |A| |H q |) which was previously shown in [? ]. What we show next is additional structure in R S -due to the X kj operators -means the base of the exponent can be slightly lower; however, it still remains exponential in |A|.
Consider the adjoint rotation of a particular X kj in R S (Equation S.18): Performing the adjoint rotation on H q results in the following: a c a P a α kj I + β kj P jk = a c a α kj P a + β kj P kj P a α kj I + β kj P jk = a c a α 2 kj P a + α kj β kj P a P jk + α kj β kj P kj P a + β 2 kj P kj P a P jk = a c a α 2 kj P a + α kj β kj P a P jk − α kj β kj P jk P a + β 2 kj P kj P a P jk = a c a α 2 kj P a + α kj β kj [P a , P jk ] + β 2 kj P kj P a P jk = a c a α 2 kj P a + β 2 kj P kj P a P jk , if [P a , P jk ] = 0 α 2 kj P a + 2α kj β kj P a P jk + β 2 kj P kj P a P jk , else {P a , P jk } = 0 (S.23) When [P a , P jk ] = 0, we get: a c a α 2 kj P a + β 2 kj P kj P a P jk = a c a α 2 kj P a + β 2 kj P kj P jk P a = a c a α 2 kj P a + β 2 kj P a = a c a α 2 kj + β 2 kj P a = a c a P a .

(S.24)
When {P a , P jk } = 0, we find: a c a α 2 kj P a + 2α kj β kj P a P jk + β 2 kj P kj P a P jk = a c a α 2 kj P a + 2α kj β kj P a P jk − β 2 kj P kj P jk P a = a c a α 2 kj P a + 2α kj β kj P a P jk − β 2 kj P a = a c a α 2 kj P a + sin(θ kj )P a P jk − β 2 kj P a = a c a (α 2 kj − β 2 kj )P a + sin(θ kj )P a P jk = a c a cos(θ kj )P a + sin(θ kj )P a P jk .

(S.25)
Both cases use the following identities: where α kj = cos θ kj 2 and β kj = sin θ kj 2 . Using these results Equation S.23 reduces to: c a P a + a ∀{Pa,P jk }=0 c a cos(θ kj )P a + sin(θ kj )P a P jk = a η a P a + a ∀{Pa,P jk }=0 η a P jk P a , where η a represent the new real coefficients.
Consider the application of the next rotation operator R kl in R S (note k index represents the same Pauli operator P k ): Focusing on the last term in Equation S.28: R S kl a ∀{Pa,P jk }=0 η a P jk P a R † S kl = γ kl I + δ kl P kl a ∀{Pa,P jk }=0 η a P jk P a γ kl I + δ kl P lk = a ∀{Pa,P jk }=0 η a γ kl P jk P a + δ kl P kl P jk P a γ kl I + δ kl P lk = a ∀{Pa,P jk }=0 η a γ 2 kl P jk P a + γ kl δ kl P jk P a P lk + γ kl δ kl P kl P jk P a + δ 2 kl P kl P jk P a P lk = a ∀{Pa,P jk }=0 η a γ 2 kl P jk P a + γ kl δ kl P jk P a P lk − γ kl δ kl P kj P lk P a + δ 2 kl P kl P jk P a P lk = a ∀{Pa,P jk }=0 η a γ 2 kl P jk P a + γ kl δ kl P jk P a P lk + γ kl δ kl P jk P lk P a + δ 2 kl P kl P jk P a P lk = a ∀{Pa,P jk }=0 η a γ 2 kl P jk P a + γ kl δ kl P jk {P a , P lk } + δ 2 kl P kl P jk P a P lk kl P jk P a + 2γ kl δ kl P jk P a P lk + δ 2 kl P kl P jk P a P lk , if [P a , P lk ] = 0 For the case {P a , P lk } = 0: a ∀{Pa,P jk }=0 η a γ 2 kl P jk P a + δ 2 kl P kl P jk P a P lk = a ∀{Pa,P jk }=0 η a γ 2 kl P jk P a − δ 2 kl P kl P jk P lk P a = a ∀{Pa,P jk }=0 η a γ 2 kl P jk P a + δ 2 kl P jk P kl P lk P a = a ∀{Pa,P jk }=0 η a γ 2 kl P jk P a + δ 2 kl P jk P a = a ∀{Pa,P jk }=0 η a P jk P a .

(S.30)
We observe that there is no increase in the number of terms and the weight of each Pauli operator changes.
For the case [P a , P lk ] = 0: a ∀{Pa,P jk }=0 η a γ 2 kl P jk P a + 2γ kl δ kl P jk P a P lk + δ 2 kl P kl P jk P a P lk = a ∀{Pa,P jk }=0 η a γ 2 kl P jk P a + 2γ kl δ kl P jk P a P lk + δ 2 kl P kl P jk P lk P a = a ∀{Pa,P jk }=0 η a γ 2 kl P jk P a + 2γ kl δ kl P jk P a P lk − δ 2 kl P jk P kl P lk P a = a ∀{Pa,P jk }=0 η a γ 2 kl P jk P a + 2γ kl δ kl P jk P a P lk − δ 2 kl P jk P a = a ∀{Pa,P jk }=0 η a (γ 2 kl − δ 2 kl )P jk P a + 2γ kl δ kl P jk P a P lk = a ∀{Pa,P jk }=0 η a cos θ kl P jk P a + sin θ kl P jk P a P lk .
(S.31) The number of terms in the resulting operator has increased for each case where [P a , P lk ] = 0. The action of two rotations of R S on the whole Hamiltonian results in: η a cos θ kl P jk P a + sin θ kl P jk P a P lk =R S kl a η a P a R † S kl + a ∀{Pa,P jk }=0 µ a P jk P a + a ∀{Pa,P jk }=0 [Pa,P lk ]=0 µ a P jk P a P lk = a ν a P a + a ∀{Pa,P lk }=0 ν a P lk P a + a ∀{Pa,P jk }=0 µ a P jk P a + a ∀{Pa,P jk }=0 [Pa,P lk ]=0 µ a P jk P a P lk .
.32. From these results we can infer how the terms in H q will scale for a general sequence of rotations of size |R S | (Equation S.18), which in general change as: This operation increases the number of terms in H q to O 2 (|A|−1) |H q | . However, the structure of the sequence of rotation operator actually requires 2 g commuting/anticommuting conditions to be met for new Pauli operators to be generated by subsequent rotations. We therefore need to consider the probability that a given Pauli operator will either commute or anticommute with another. For the case of single qubit Pauli matrices σ a , σ b ∈ {I, X, Y, Z} by a simple counting argument P ([σ a , σ b ] = 0) = 5 8 and P ({σ a , σ b } = 0) = 3 8 , for Pauli matrices selected uniformly at random. Generalising this to tensor products of Pauli matrices on n qubits, for a Pauli operator to anticommute with another there needs to be an odd number of anticommuting tensor factors. First consider the binomial distribution: where n is the number of trials (repeated experiments), p is the probability of success -here the probability a single Pauli matrix anticommutes with another (p = 3 8 ) -and q is the probability of failure -here the probability a single Pauli matrix commutes with another (q = 5 8 ). Under these conditions, P (x) gives the probability that two n-fold Pauli operators, selected uniformly at random, anticommute in x-many tensor factors. Therefore, the probability of two uniformly random Pauli operators anticommuting (commuting) is given as a sum over odd (even) values of x ≤ n: Now, the binomial theorem states for any p, q ∈ R and define the following difference: Overall we find the probability that two n-fold Pauli operators anticommute to be: can be seen that the probability of two n-fold Pauli operators anticommuting quickly converges to 0.5 as the number of qubits n increases. The motivation for S.37 arises from observing that the quantity we subtract, (1/4) n , is the probability of obtaining an n-fold identity operator, which has the unique property of commuting universally. The complement 1 − (1/4) n therefore corresponds with the probability of selecting uniformly at random a Pauli operator with at least one non-trivial tensor factor. After discounting identity operators from consideration, the probabilities of anticommuting or commuting coincide, hence each occurs half of the time, explaining the 1/2 factor in S.38; the probability bias towards commutation is a consequence of the identity operator commuting universally, whereas there is no such operator that can anticommute universally. If we consider how the number of terms in H q changes upon the sequence of rotations transformation: H q → R S H q R † S where terms either commute or anticommute with a probability of 0.5, then the scaling is as follows: .33 is modified to have a constant factor of 2 −g , where g represents the number of commuting or anticommuting conditions required for operators in H q to obey in order to increase the number of terms upon a rotation of R S . Here each condition is assumed to occur with a probability of 0.5. This operation increases the number of terms in H q to O 1.5 (|A|−1) |H q | . Note |R S | = |A| − 1. In general, the scaling will be O x (|A|−1) |H q | where 1 ≤ x ≤ 2, depending on how each rotation in the sequence of rotations commutes with terms in H q . The x = 1 case occurs if each rotation in R S commutes with the whole Hamiltonian. Apart from this special case, the number of terms in H q will increase exponentially with the size of A or equivalently with the number of qubits n (as |A| ≤ 2n + 1 [? ]) when R is defined by a sequence of rotations.

Unitary partitioning via a linear combination of unitaries
Here we show how A( r) → P where R LCU is defined by a linear combination of Pauli operators. We consider the set of anticommuting Pauli operators making up A( r) (Equation ??). We can re-write this Equation, with the term we are reducing to (r k P (k) 0 ) outside the sum: (S.40) To simplify the notation we drop the subscript 0 (denoting the first operator in a clique) and write each P (k) 0 , P (j) 0 as P k and P j respectively.
A re-normalization can be performed on the remaining sum yielding: where: Using the Pythagorean trigonometric identity: sin 2 (x) + cos 2 (x) = 1, A( r) can be re-written as: Comparing Equations S.41 and S.43, it is clear that cos(φ k ) = r k and sin(φ k ) = Ω. It was shown in [? ] that one can consider rotations of A( r) around an axis that is Hilbert-Schmidt orthogonal to both H A\{r k P k } and P k : X anticommutes with A and is self-inverse [? ]: and has the following action [? ]: One can also define the rotation [? ? ]: The conjugate rotation will be: Note the different order of j and k for R LCU and R † LCU . The adjoint action of R LCU on A( r) is: By choosing α = φ k , the following transformation occurs R LCU A( r)R † LCU = P k [? ? ]. This fully defines the R LCU operator required by unitary partitioning. Next we need to consider the use of this operator in CS-VQE.
The adjoint action of R LCU on a general Hamiltonian H q = |Hq| i c i P i is: We can rewrite the final term (Equation S.50e) as: Here we have applied the identity of conjugating a Pauli operator P u with another Pauli operator P v resulting in two cases: Focusing on the last term of Equation S.51, we can simplify S.51c as j and l run over the same indices we can re-write each l = j sum as l > j and expand into two terms: (δ j c i δ l ) P jk P i P kl + P lk P i P kj (S.53) We can expand then expand this equation into the four cases for when: For the first case and last case: Whereas, for the second and third cases: We can rewrite Equation S.51 using this result: where we have combined the second and third conditions into a single condition of {P a , P jk P kl } = {P a , P j P l } = 0 and combined the new coefficients into one coefficient denoted ν.
Next consider the S.50d term of equation S.50. One can use the fact that j and l run over the same indices: Overall we can re-write equation S.50 using these results, yielding: 58) We observe that the number of terms in R LCU H q R † LCU at worst scales as The total number of qubits n bounds the size of |A| ≤ 2n + 1 [? ], and thus the number of terms in H q will increase quadratically with the size of A or number of qubits n when R is defined by a linear combination of unitaries.

Mapping Pauli operators to single-qubit Pauli Z operators
In Appendix A of [? ] a proof is given on how to map a completely commuting set of Pauli operators to a single qubit Pauli Z operator. We summarise the operation required and omit the proof. We denote a given Pauli operator on n qubits as: P = n−1 i=0 σ P i , where σ i are a single qubit Pauli operators. There are two cases we need to consider (diagonal and non-diagonal), with the goal to reduce the operators in R ≡ {P For a non-diagonal Pauli operators P a ∈ R , there must be at least one single qubit Pauli operator indexed by qubit k such that: σ Pa k ∈ {X, Y }. We can use this to define operator P b that must anticommute with P a : These two Pauli operators differ by exactly one Pauli operator on qubit index k. We can define the rotation: Conjugating P a with this operator results in: and P a has been mapped to a single qubit Pauli Z operator. For diagonal operators P c ∈ R , all the n-fold tensor products of single qubit Pauli operators must be either Z or Since R is an independent set, for all the rotated P a there must be at least one index l such that σ P a l = I and σ Pc l = Z. We denote this operator P c . We also define a new operator P d from this, which only acts non-trivially on the l-th qubit with a single qubit Y . To summarise: We can define the rotation: Conjugating P c with this operator results in: P c is now a non-diagonal Pauli operator (contains a single qubit X acting on qubit l). This operator P c can now be mapped to a single qubit Z operator using a further π 2 -rotation following the previously given procedure for non-diagonal Pauli operators.
The operators V i in the main text (equation ??) are defined by these π 2 -rotations, such that each q i G i and P (k) 0 is mapped to a single qubit Pauli Z term. At worst, two π 2 -rotations are needed for every operator in R (Equation ??), which occurs when all operators in R are diagonal.

II. UNITARY PARTITIONING MEASUREMENT REDUCTION
In unitary partitioning, the Hamiltonian is partitioned into groups of operators that's linear combination are unitary Hermitian operators. This is done by forming normalized groups of Pauli operators that pairwise anticommute. We can write this as: This can be thought of as each clique is of size one. The subscript u is to denote no grouping. A natural metric to evaluate the the measurement cost of a particular grouping of Pauli operators is therefore given by the ratio R of these two terms: where the greater the value of R, the better the measurement saving is by assembling these operators into a particular group. Next, our analysis diverges from Crawford et al., where we consider groups of anticommuting operators (rather than commuting operators) [? ]. First we consider the covariance of two anticommuting Pauli operators.
The amount two random variables vary together (co-vary) is measured by their covariance. Consider the results of random variables x and y, one can obtain a set of M paired measurements: A positive covariance indicates that higher than average values of one variable tend to be paired with higher than average values of the other variable. A negative covariance indicates that a higher than average value of one variable tend to be paired with lower than average values of the other. If two random variables are independent, then their covariance will be zero. However, a covariance of zero does not mean two random variables are independent, as nonlinear relationships can result in a covariance of zero. In the context of measuring a quantum state in the Pauli basis on a quantum computer this would be a set of paired single shot samples {s a i , s b i |i = 0, 1, . . . , N − 1}, where s a i , s b i ∈ {−1, +1}. Experimentally, each pair is the (single shot) measurement outcome for P a followed by the (single shot) measurement outcome for P b . Taking simultaneous projective measurements, without re-preparing the quantum state is a meaningful operation if the operators share a common eigenbasis. The order of measurement does not effect measurement outcomes, but the paired samples will be statistically correlated and have a certain covariance. However, for anticommuting Pauli operators this is not the case, as these operators do not share a common eigenbasis. Projective measurement means the expectation value of these operators cannot be known simultaneously. We consider the covariance in this scenario.
Consider the the spectral decomposition of two anticommuting Pauli operators {P a , P b }: and where for Pauli operators: Without loss of generality, assume P a is measured first on a general normalized quantum state |ψ = γ |κ 0 + δ |κ 1 . The only possible post measurement outcomes are |κ 0 or |κ 1 , with probabilities |γ| 2 or |δ| 2 respectively. Consider the result of subsequently measuring P b . The expectation value in each scenario will be: Overall, we find the probabilities of all possible combinations of measurement outcomes to be: This result shows that the probability of obtaining |Ω 0 or |Ω 1 is not affected by the probability of obtaining |κ 0 or |κ 1 in the first measurement. The variables are therefore statistically independent 1 . We find the covariance of P a and P b , where {P a , P b } = 0, to be: where under independence: P a P b = P a P b . Intuitively, this result makes sense. The projective measurement of the first Pauli operator maximally randomizes the expectation value of the other Pauli operator and thus the covariance will be zero. Interestingly, the projective measurement causes the underlying distribution of the quantum state to change and so subsequent measurements generating paired samples are not well defined in this setting (for anticommuting operators). This phenomenon is not present in classical experiments. However, the same statistical analysis can be done if we just take pairs of subsequent measurements and only do a statistical analysis on these random variables. We note that our analysis did not have to account for P a P b not being a valid observable, as for anticommuting Pauli operators this operator is not Hermitian. Given the covariance of two anticommuting Pauli operators is zero, we find the variance of a normalized anticommuting clique γ j C j to be: We use this to obtain the following R ratio (equation S.68): |Cj |−1 ). Minkowski inequality ensures x j 2 ≤ x j 1 . At worst unitary partitioning will achieve the same number of measurements as no grouping and will more often achieve an improvement. However, we can actually bound the improvement in general as: where the Cauchy-Schwarz inequality has been utilized. Overall, we find u 2 ≤ u 1 ≤ √ n u 2 and thus: (S.79)

III. NUMERICAL DETAILS OF THE TOY EXAMPLE
This section provides all the details for the Toy problem described in Section ??. The full noncontextual ground state is: Their action results in: R S A( r 0 )R † S = R LCU A( r 0 )R † LCU = Y XY I. We then defined U depending on which generators we wish to fix. We found the optimal ordering of stabilizers (supplied in Equation ??) to fix via a brute force search over all |W all | i = 2 4 − 1 = 15 possibilities for W. The following optimal ordering was obtained: This defines all the information required to implement CS-VQE.  Here W has been set to W all , which defines U † W all = e 1i π 4 XIY I e 1i π 4 IY Y I R S/LCU . R S and R LCU are defined in Equations S.82 and S.83. The two left columns (H SeqRot and H LCU ) give H rotated by U W . Each projected Hamiltonian is generated from these, where the eigenvalue of certain stabilizers are fixed according to the projector Q W . For the last two rows, the eigenvalue of A( r 0 ) has not been fixed, but the non-Clifford operator R S/LCU is still included within U † W all . This leads to an unnecessary increase in the number of Pauli operators for these two cases, as these transformed operators are isospectral with associated Hamiltonians in  All the subplots in Figure S.1 give the simulation results of each molecular Hamiltonian at different levels of noncontextual approximations. This is equivalent to how many contextual stabilizers W eigenvalues are fixed. In each plot, the leftmost data represents the case when all the noncontextual stabilizer eigenvalues are fixed and is the case for the full noncontextual approximation to a given problem [? ]. Moving right, we remove a single stabilizer from W and thus don't fix the eigenvalue of that stabilizer. This reintroduces a qubits worth degree of freedom into the problem. At the limit that no stabilizer eigenvalues are fixed (W = {}) we return to standard VQE over the full problem and no noncontextual approximation is made. In each plot this scenario is represented by the far right data point (excluding the data for the full non tapered Hamiltonian that is supplied for reference only). The raw data for these results is supplied in the Supplemental Material (see the zipped file). We include data beyond Hamiltonians achieving chemical accuracy, to show the different possible approximations, rather than stopping once chemical accuracy was achieved.