Nearly Optimal Measurement Scheduling for Partial Tomography of Quantum States

Many applications of quantum simulation require one to prepare and then characterize quantum states by performing an efficient partial tomography to estimate observables corresponding to $k$-body reduced density matrices ($k$-RDMs). For instance, variational algorithms for the quantum simulation of chemistry usually require that one measure the fermionic 2-RDM. While such marginals provide a tractable description of quantum states from which many important properties can be computed, their determination often requires a prohibitively large number of circuit repetitions. Here we describe a method by which all elements of $k$-body qubit RDMs acting on $N$ qubits can be directly measured with a number of circuits scaling as ${\cal O}(3^{k} \log^{k-1}\! N)$, an exponential improvement in $N$ over prior art. Next, we show that if one is able to implement a linear depth circuit on a linear array prior to measurement, then one can directly measure all elements of the fermionic 2-RDM using only ${\cal O}(N^2)$ circuits. We prove that this result is asymptotically optimal, thus establishing an exponential separation between the number of circuits required to directly measure all elements of qubit versus fermion RDMs. We further demonstrate a technique to estimate the expectation value of any linear combination of fermionic 2-RDM elements using ${\cal O}(N^4 / \omega)$ circuits, each with only ${\cal O}(\omega)$ gates on a linear array where $\omega \leq N$ is a free parameter. We expect these results will improve the viability of many proposals for near-term quantum simulation.


I. INTRODUCTION
Extracting data from an exponentially complex quantum state is a critical bottleneck for near-term quantum computing applications. The advent of variational methods, most notably the variational quantum eigensolver [1,2], inspires hope that useful contributions to our understanding of strongly-correlated physical and chemical systems might be achievable in pre-error-corrected quantum devices [3]. Following this initial work, much progress has gone into lowering the coherence requirements of variational methods [4], calculating system properties beyond ground state energies [5-7], and experimental implementation [8-11]. However, initial estimates for the number of measurements required to accurately approximate the energy of a variationally generated quantum state were astronomically large, with initial bounds on the number of repetitions required for $N$-qubit instances of the quantum chemistry problem as high as $O(N^8)$ [12,13]. Although these results might seem discouraging for variational approaches, until recently, little effort has been devoted to lowering the scaling of the number of measurements needed.
The most common way to estimate the energy of a quantum state during a variational quantum algorithm is to perform an efficient partial tomography [2] on a set of observables which comprise a $k$-body reduced density matrix ($k$-RDM) [13]. For instance, in the case of quantum chemistry one typically estimates the energy from the fermionic 2-RDM. These reduced density matrices catalogue important correlations between particles from which one can compute a large number of important properties, thus offering a useful and tractable description of an otherwise complex state.
A simple approach to measurement of a quantum state is to group the degrees of freedom of interest (often elements of the RDMs) into sets of mutually commuting operators, or 'cliques', which may then be simultaneously measured. One then wishes to find the smallest 'clique cover': a set of cliques that contains every measurement required [14]. The general problem of finding such cliques in an unstructured setting is known to be NP-hard [15], and recent work has focused on applying approximate algorithms for clique finding or graph colouring to these problems, achieving constant or empirically determined linear scaling improvements over an approach that measures each term individually [14,16-19]. However, often the operators of interest are local fermionic or qubit terms which have significant internal structure not considered in previous clique-finding approaches. Leveraging this structure is critical to optimizing clique covers for measurement of quantum states, and to proving bounds on what improvements may be obtained via the clique-covering approach. For instance, a promising approach based on measuring the quantum chemistry Hamiltonian in a factorized form [20] was recently introduced in [21]; however, that technique focused on estimating the energy rather than the 2-RDM. In addition to determining the energy, the 2-RDM allows one to predict other important observables such as energy derivatives [7,22] and multipole moments [23], and enables techniques for relaxing orbitals to reduce basis error [5,24].

TABLE I.

| Ref. | Partitioning method | Circuits based on | Partitions | Gate count | Depth | Connect. | RDM | Sym. |
|------|---------------------|-------------------|------------|------------|-------|----------|-----|------|
| [2] | commuting Pauli heuristic | - | $O(N^4)$ | - | - | - | - | - |
| [8] | compatible Pauli heuristic | single rotations | $O(N^4)$ | $N$ | 1 | linear | yes | no |
| [13] | n-representability constraints | single rotations | $O(N^4)$ | $N$ | 1 | linear | no | no |
| [25] | mean-field partitioning | fast feed-forward | | | | | | |
| [14] | compatible Pauli clique cover | single rotations | $O(N^4)$ | $N$ | 1 | linear | yes | no |
| [16] | commuting Pauli graph coloring | stabilizer formalism | $O(N^3)$ | - | - | full | no | no |
| [19] | anticommuting Pauli clique cover | Pauli evolutions | | | | | | |
| [17] | commuting Pauli clique cover | symplectic subspaces | $O(N^3)$ | $O(N^2/\log N)$ | - | full | no | no |
| [21] | basis rotation grouping | Givens rotations | $O(N)$ | $N^2/4$ | $N/2$ | linear | no | number |
| [18] | commuting Pauli clique cover | | | | | full | yes | no |
| [26] | commuting Pauli clique cover | | | | | | | |

Gate counts and depths are given in terms of arbitrary 1- or 2-qubit gates restricted to the geometry of 2-qubit gates specified in the connectivity column. In the "RDM" column we report whether the technique is able to measure the entire fermionic 2-RDM with the stated scaling, or just a single expectation value (e.g., of the Hamiltonian). In the "sym." column we report whether any symmetries of the system commute with all measurements made; this allows for simultaneous measurement, enabling strategies for error mitigation by post-selection at zero additional cost. The term "compatible" above is used to mean that the operators are grouped so that the operators within a group commute on each tensor factor on which the operator acts. The number of partitions refers to the number of unique term groupings which can each be measured with a single circuit; thus, this reflects the number of unique circuits required to generate at least one sample of each term in the Hamiltonian. However, we caution that one cannot infer the total number of measurements required from the number of partitions, and often this metric is highly misleading.
The overall number of measurements required is also critically determined by the variance of the estimator of the energy. As explained in the earliest reference contained in this table, when terms are measured simultaneously one must also consider the covariance of those terms. In some cases, a grouping strategy can decrease the number of partitions but increase the total number of measurements required by grouping terms with positive covariances. Alternatively, certain strategies, such as the third reference in this table, actually increase the number of partitions while reducing the number of measurements required overall by lowering the variance of the estimator.
In this work, we find lower bounds for the clique cover sizes in $k$-local fermionic and qubit systems for $k \leq 4$, and provide schemes which saturate these bounds while obtaining estimates of the fermionic 2-RDMs. Via a binary partition strategy, we construct a clique cover of all $k$-local qubit operators in an $N$-qubit system with size $O(3^k \log^{k-1} N)$ (here and throughout this paper, all logarithms are base two). We then prove a lower bound of $\Omega(N^2)$ on the size of a clique cover of 4-index fermionic operators (such as the 2-RDM), and describe a protocol that saturates this bound. We detail measurement circuits for all cliques in this cover with a circuit depth of $O(N)$ and gate count of $O(N^2)$ (requiring only linear connectivity), which additionally allow for error mitigation by symmetry verification [29,30]. Finally, we detail an alternative measurement scheme for fermionic systems, based around finding cliques of anti-commuting operators, which requires $O(N^4/\omega)$ measurements but with a gate count for the measurement circuits of only $O(\omega)$ on a linear array, for a free parameter $\omega < N$.
In Tab. I, we provide a history of previous art in optimizing measurement schemes for the electronic structure problem, and include the new results found in this work. We further include the lower bounds on the number of partitions required for anti-commuting and commuting clique cover approaches that are presented in this work.

II. BACKGROUND
Physical systems are almost always characterized by local observables. However, the definition of locality depends on the exchange statistics of the system in question. In an $N$-qubit system, data about all $k$-local operators within a state $\rho$ is given by the (qubit) $k$-reduced density matrices, or $k$-RDMs [13],
$${}^k\rho_{i_1,\ldots,i_k} = \mathrm{Tr}_{j \neq i_1,\ldots,i_k}[\rho].$$
Here, the trace is over all other qubits in the system. To estimate ${}^k\rho$, we need to estimate expectation values of all tensor products of $k$ single-qubit Pauli operators $P_i \in \{X, Y, Z\}$. We call such operators $k$-qubit operators (as opposed to $k$-local operators), and label the set of all $k$-qubit operators $S^{(k)}_P$. In an $N$-fermion system, data about all $k$-body operators is contained in the (fermionic) $k$-body reduced density matrices, which are obtained from $\rho$ by integrating out all but the first $k$ particles [13],
$${}^kD = \mathrm{Tr}_{k+1,\ldots,N}[\rho].$$
Estimating ${}^kD$ requires estimating the expectation values of all products of $k$ fermionic creation operators $c^\dagger_j$ with $k$ fermionic annihilation operators $c_j$. For instance, the 2-RDM catalogues all 4-index expectation values of the form $\langle c^\dagger_p c^\dagger_q c_r c_s \rangle$. Equivalently, this information is contained in the expectation values of products of $2k$ Majorana operators $\gamma_j$, defined by
$$\gamma_{2j} = c_j + c^\dagger_j, \qquad \gamma_{2j+1} = i(c^\dagger_j - c_j).$$
We call a product of $k$ Majorana operators a $k$-Majorana operator, and label the set of such operators $S^{(k)}_M$; the fermionic $k$-RDM is thus determined by expectation values of $2k$-Majorana operators. Simultaneous estimation of the expectation values in either set may be performed by repeated direct measurement of the relevant operators, so long as they commute. As the commutativity structure of these sets is significantly different, measurement scheduling schemes for fermions and qubits need to be designed significantly differently as well. One may schedule a set of measurements to obtain expectation values of all operators in any given set $S$ by defining a 'clique cover' $\bigcup_\alpha C_\alpha = S$ of cliques $C_\alpha$ of mutually commuting operators, i.e. $[A, B] = 0$ for all $A, B \in C_\alpha$ and for all $\alpha$ [14].
One may measure all elements of each $C_\alpha$ in parallel (given an appropriate measurement circuit), but afterwards the state $|\psi\rangle$ needs to be re-prepared, implying that the amount of time for measurement scales with the size of the clique cover (i.e. the number of cliques). In this work, we focus on finding bounds for the size of clique covers of fermionic and qubit systems, and providing schemes to achieve these bounds.
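The clique-cover definition above can be made concrete for Pauli strings with a few lines of Python. This is only an illustrative sketch: the function names (`commute`, `is_clique`, `is_clique_cover`) and the string encoding of Pauli operators over the alphabet `I, X, Y, Z` are assumptions introduced here, not notation from the text.

```python
from itertools import combinations

def commute(p: str, q: str) -> bool:
    """Two Pauli strings commute iff they clash (both non-identity and
    different) on an even number of qubits."""
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 0

def is_clique(ops) -> bool:
    """True if every pair of operators in `ops` commutes."""
    return all(commute(p, q) for p, q in combinations(ops, 2))

def is_clique_cover(cliques, targets) -> bool:
    """True if each set in `cliques` is a commuting clique and their
    union contains every operator in `targets`."""
    covered = set().union(*cliques)
    return all(is_clique(c) for c in cliques) and set(targets) <= covered
```

For example, `{"XX", "YY", "ZZ"}` is a single commuting clique on two qubits, whereas `"XI"` and `"ZI"` cannot share a clique.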

III. CLIQUE COVERS FOR LOCAL QUBIT OPERATORS
To achieve a clique cover of the set $S^{(k)}_P$ of all $k$-qubit operators, we aim to associate a Pauli word $W \in \{X, Y, Z\}^N$ to each clique, and measure the $i$th qubit in the $W_i$ basis. Such a clique covers every tensor product of the individual Pauli operators $W_i$. It thus suffices to determine a set of words such that every $k$-qubit operator is contained within at least one word. We construct such a set through a $k$-ary partitioning scheme, which we first demonstrate for $k = 2$. As motivation, consider that the set of 9 words
$$W_{A,B} = \bigotimes_{i < N/2} A_i \bigotimes_{i \geq N/2} B_i$$
(with $A, B = X, Y, Z$) covers all 2-qubit operators that act on qubits $j < N/2$ and $k \geq N/2$. We may generalize this to obtain all other 2-qubit operators by finding a set of binary partitions $S_{n,0} \cup S_{n,1} = \{0, \ldots, N-1\}$ such that for any pair $0 \leq i \neq j \leq N-1$ there exist $n, a$ such that $i \in S_{n,a}$, $j \in S_{n,1-a}$. Let us define $L = \lceil \log N \rceil$, and write each qubit index $i$ in a binary representation,
$$i = \sum_{n=0}^{L-1} [i]_n 2^n, \qquad [i]_n \in \{0, 1\}.$$
Then, for $n = 0, \ldots, L-1$ we define
$$S_{n,a} = \{i : [i]_n = a\}, \qquad W^{(n)}_{A,B} = \bigotimes_{i \in S_{n,0}} A_i \bigotimes_{i \in S_{n,1}} B_i.$$
All $0 \leq i \neq j \leq N-1$ differ by at least one of their first $L$ binary digits (as shown in Fig. 1(a)), so the set of words $W^{(n)}_{A,B}$ defines a clique cover of 2-qubit operators. As $W^{(n)}_{A,A}$ is the same word for every $n$, we need only choose this word once, and so the size of the cover may be reduced from $9L$ to $6L + 3$.
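The $k = 2$ construction can be checked by brute force for small $N$. The sketch below is an illustration under stated assumptions: qubits are indexed from 0, Pauli words are encoded as Python strings, and the helper names `two_qubit_cover` and `covers_all_pairs` are hypothetical.

```python
from itertools import combinations, product

def two_qubit_cover(num_qubits: int) -> set:
    """Pauli words from the binary partition scheme: for each binary digit n
    and each (A, B), qubits with digit 0 get A and qubits with digit 1 get B."""
    num_digits = max(1, (num_qubits - 1).bit_length())  # L = ceil(log2 N)
    words = set()  # a set, so the three uniform words are only kept once
    for n in range(num_digits):
        for a, b in product("XYZ", repeat=2):
            words.add("".join(b if (i >> n) & 1 else a for i in range(num_qubits)))
    return words

def covers_all_pairs(words, num_qubits: int) -> bool:
    """Check that every qubit pair sees all 9 two-qubit Pauli combinations."""
    for i, j in combinations(range(num_qubits), 2):
        if len({(w[i], w[j]) for w in words}) < 9:
            return False
    return True
```

For `num_qubits = 8` this gives $L = 3$ and a cover of $6L + 3 = 21$ words, each of which is measured with single-qubit rotations only.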
To see how the above may be extended to $k > 2$, let us consider $k = 3$. We wish to find 3-ary partitions $\bigcup_{a=1}^{3} S_{n,a} = \{0, \ldots, N-1\}$ such that, given any set $i_1, i_2, i_3$, we can find some index $n$ for which $i_a \in S_{n,a}$ (allowing for permutation of the $i_a$). Then, by running over all combinations of $X, Y, Z$ on the three parts of each partition, we will obtain a set of words that cover all 3-qubit operators. We illustrate a scheme that achieves this in Fig. 1(b). We iterate first over $n = 0, \ldots, L-1$, and find the largest $n$ such that $i_1$, $i_2$ and $i_3$ are split into two subsets by a binary partition (i.e. where $S_{n,a} \cap \{i_1, i_2, i_3\}$ is non-empty for $a = 0$ and $a = 1$). This implies that two of the indices lie in one part, and one in the other. Without loss of generality, let us assume $i_1 \in S_{n,1}$ and $i_2, i_3 \in S_{n,0}$ (following Fig. 1). It now suffices to find a set of partitions for $S_{n,0}$ so that we guarantee $i_2$ and $i_3$ are split in one such partition. We could imagine repeating the binary partition scheme over all of $S_{n,0}$, i.e. generating the $\log N$ sets $S_{n,0} \cap S_{n',a}$. However, we can do better than this. As $i_1$, $i_2$ and $i_3$ are not split by any binary partition $S_{n',0}, S_{n',1}$ with $n' > n$, $i_2$ and $i_3$ must lie in a contiguous block of length $2^n$ within $S_{n,0}$. This means that we need only iterate over $n' = 0, \ldots, n-1$. We must also iterate over the same number of partitions of $S_{n,1}$, and so the total number of partitions we require is
$$\sum_{n=0}^{L-1} 2n = L(L-1).$$
The above generalizes relatively easily to $k > 3$. Given a set $I = \{i_1, \ldots, i_k\}$, we find the binary partition $S_{n,0}, S_{n,1}$ with the largest $n$ that splits $I$ into non-empty sets $I_0 = I \cap S_{n,0}$ and $I_1 = I \cap S_{n,1}$. Then, we iterate over $|I_0|$-ary partitions of the contiguous blocks of $S_{n,0}$ and $|I_1|$-ary partitions of the contiguous blocks of $S_{n,1}$. In total there are $k - 1$ possible ways of dividing $I$ (up to permutations of the elements).
This implies that at each $n$ we have to iterate over $k - 1$ different sub-partitioning possibilities. This fixes the leading-order contribution to the number of cliques, and the total number of cliques is $O(3^k \log^{k-1} N)$.
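The $k = 3$ step of the argument can be tested numerically. The following sketch builds, for each binary digit $n$ and each lower digit $n' < n$, the two 3-ary partitions obtained by sub-dividing either $S_{n,0}$ or $S_{n,1}$, and checks that every triple of qubits is separated by at least one of them. It assumes 0-based qubit indices; the function names and the labelling of the three parts as `0, 1, 2` are illustrative choices, not notation from the text.

```python
from itertools import combinations

def three_part_partitions(num_qubits: int):
    """Return 3-ary partitions as tuples of part labels (0, 1 or 2) per qubit."""
    num_digits = max(1, (num_qubits - 1).bit_length())  # L = ceil(log2 N)
    partitions = []
    for n in range(num_digits):          # digit used for the first binary split
        for n_sub in range(n):           # lower digit used to split one half again
            for side in (0, 1):          # which half of the first split is sub-divided
                labels = tuple(
                    0 if ((i >> n) & 1) != side else 1 + ((i >> n_sub) & 1)
                    for i in range(num_qubits)
                )
                partitions.append(labels)
    return partitions

def separates_all_triples(partitions, num_qubits: int) -> bool:
    """True if every triple of qubits lands in three different parts at least once."""
    return all(
        any(len({p[i], p[j], p[k]}) == 3 for p in partitions)
        for i, j, k in combinations(range(num_qubits), 3)
    )
```

With `num_qubits = 16` this yields $L(L-1) = 12$ partitions; assigning $X, Y, Z$ independently to the three parts of each partition then gives 27 Pauli words per partition.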

IV. CLIQUE COVERS FOR LOCAL FERMIONIC OPERATORS
One can prove a lower bound for the size of the clique cover of the set $S^{(k)}_M$ of all $k$-Majorana operators to be $\Omega(N^{\lceil k/2 \rceil})$. This bound follows from the fact that no clique of commuting $k$-Majorana operators may contain more than $O(N^{\lfloor k/2 \rfloor})$ unique elements (up to minus signs), while $S^{(k)}_M$ contains $\Theta(N^k)$ operators. The clique-size bound may in turn be seen by induction. All 1-Majorana operators anti-commute, so the bound clearly holds for $k = 1$. Then, let us consider the cases where $k$ is even and where $k$ is odd separately. Suppose we have a clique of $O(N^d)$ $k$-Majorana operators with $k$ even. As there are only $2N$ individual Majorana operators, by counting there must be a set of $O(N^{d-1})$ of these operators that share a single Majorana $\gamma_0$, which we may write in the form $\pm\gamma_0\Gamma_i$. As $[\gamma_0\Gamma_i, \gamma_0\Gamma_j] = 0$ if and only if $[\Gamma_i, \Gamma_j] = 0$, this gives a clique of $O(N^{d-1})$ commuting $(k-1)$-Majorana operators. Then, by our inductive assumption (and recalling that $k$ is even),
$$d - 1 \leq \lfloor (k-1)/2 \rfloor = k/2 - 1 \quad\Rightarrow\quad d \leq k/2.$$
Now, let $k$ be odd, and suppose we have a clique of $O(N^d)$ $k$-Majorana operators. Two products of an odd number of Majorana operators anticommute unless they share at least one term in common, so this implies that there must be a set of $O(N^d)$ of these operators that share a single Majorana $\gamma_0$. Following the same argument as above, this generates a clique of $O(N^d)$ $(k-1)$-Majorana operators, and so by our inductive assumption (recalling that $k$ is odd)
$$d \leq (k-1)/2 = \lfloor k/2 \rfloor,$$
which completes the proof. This implies that the minimum number of measurements required to obtain all fermionic $k$-RDM elements is $\Omega(N^k)$ (recalling that fermionic $k$-RDM terms are expectation values of $2k$-Majorana operators). We now construct clique covers for $S^{(2)}_M$ and $S^{(4)}_M$ that saturate these lower bounds, allowing for asymptotically optimal estimation of the fermionic 1-RDM and 2-RDM respectively. 2-Majorana operators that share any term do not commute, so to cover $S^{(2)}_M$ we need cliques built from disjoint pairs of Majorana operators. To this end, we divide the $2N$ Majorana indices into contiguous blocks $B^n_m$ of $2^n$ elements each, so that $B^{n+1}_m = B^n_{2m} \cup B^n_{2m+1}$. Then, our cover [Fig. 2(a)] is given by pairing the $i$th element of $B^n_{2m}$ with the $(i+a)$th element of $B^n_{2m+1}$ (modulo $2^n$), as $n$ runs over $0, \ldots, \log(2N) - 1$ and $a$ runs over $0, \ldots, 2^n - 1$. Formally, this gives the set of cliques
$$C_{a,n} := \{\gamma_\alpha\gamma_\beta :\ \alpha = m2^{n+1} + i,\ \beta = (2m+1)2^n + (i + a \bmod 2^n),\ m = 1, \ldots, N2^{-n-1},\ i = 1, \ldots, 2^n\},$$
with a total cover size of
$$\sum_{n=0}^{\log(2N)-1} 2^n = 2N - 1.$$
As all operators in one of the above cliques $C_{a,n}$ commute, their products commute, and the set of these products is clearly a clique of commuting 4-Majorana operators. However, each 2-Majorana operator is guaranteed to be in only one of the cliques $C_{a,n}$, so this will not yet cover all 4-Majorana operators. To fix this, we aim to construct a larger set $\{C_\alpha\}$ of cliques of commuting 2-Majorana operators, such that for every set $\gamma_{i_1}, \gamma_{i_2}, \gamma_{i_3}, \gamma_{i_4}$ there exists one $C_\alpha$ containing both $\gamma_{i_a}\gamma_{i_b}$ and $\gamma_{i_c}\gamma_{i_d}$ (for some permutation $a, b, c, d$ of $1, 2, 3, 4$). This may be achieved by the strategy illustrated in Fig. 2(b). For each $I = \{i_1, i_2, i_3, i_4\}$, choose the smallest $n$ such that $I \subset B^n_m$ for some $m$. This implies that the next level of blocks splits $I$ into two parts, $I_a = I \cap B^{n-1}_{2m+a}$ for $a = 0, 1$, with $|I_0| = 1$, 2 or 3. Suppose first $|I_0| = 2$ (case 1 in Fig. 2(b)). In this case, by iterating over all pairs of elements in $B^{n-1}_{2m}$ and subsequently all pairs of elements in $B^{n-1}_{2m+1}$, we will at some point simultaneously pair the elements of $I_0$ and the elements of $I_1$, as required. This may be performed in parallel for each $m$, making the total number of cliques generated at each $n$ equal to $|B^{n-1}_{2m}|^2 = 4^{n-1}$. Now, suppose $|I_0| = 3$ (case 2 in Fig. 2), or $|I_0| = 1$, as the two situations are equivalent. Let $n' < n$ be the smallest number such that $I_0 \subset B^{n'}_{m'}$ for some $m'$, and split $I_0$ into two sets $I_{0,a} = I_0 \cap B^{n'-1}_{2m'+a}$ for $a = 0, 1$. Of the three elements in $I_0$, two must lie in either $I_{0,0}$ or $I_{0,1}$; suppose without loss of generality that $|I_{0,0}| = 2$.
Then, by iterating over all pairs within $B^{n'-1}_{2m'}$, and all pairs between elements of $B^{n'-1}_{2m'+1}$ and $B^{n-1}_{2m+1}$, we will at some point pair both elements in $I_{0,0}$ and both elements in $I_{0,1} \cup I_1$.
This pairing needs to occur for all $n > n'$, which implies we need to iterate over all combinations of pairs between elements of $B^{n'-1}_{2m'+1}$ and $\{1, \ldots, 2N\} \setminus B^{n'-1}_{2m'}$ (while iterating over pairs within $B^{n'-1}_{2m'}$). This may be performed in parallel for each $m'$ at each $n'$. First, iterate over all possible pairings of $B^{n'}_{m_0}$ and $B^{n'}_{m_1}$ (which requires $O(N2^{-n'})$ iterations). Then, iterate over all pairs between $B^{n'-1}_{2m_0+a_0}$ and $B^{n'-1}_{2m_1+a_1}$ for all combinations of $a_0, a_1 = 0, 1$ (requiring $4 \times 2^{n'-1}$ iterations). Simultaneously, iterate over all pairs within $B^{n'-1}_{2m_0+1-a_0}$ and $B^{n'-1}_{2m_1+1-a_1}$ (requiring again $2^{n'-1}$ iterations). This generates $4 \times 4^{n'-1}$ cliques at each $n'$. The total number of cliques we then require to cover all 4-Majorana operators using this scheme is $O(N^2)$. Direct measurement of products of Majorana operators is a more complicated matter than measurement of Pauli words (which requires only single-qubit rotations). However, when the fermionic system is encoded on a quantum device via the Jordan-Wigner transformation [31], a relatively easy measurement scheme exists. Within this encoding, we have
$$\gamma_{2n}\gamma_{2n+1} = iZ_n,$$
so if we can permute all Majorana operators such that each pair $(\gamma_i, \gamma_j)$ of Majoranas within a given clique is mapped to the form $(\gamma_{2n}, \gamma_{2n+1})$, they may be easily read off. To achieve such a permutation, we note that the Majorana swap gate $U_{i,j} = e^{\frac{\pi}{4}\gamma_i\gamma_j}$ satisfies
$$U^\dagger_{i,j}\gamma_iU_{i,j} = \gamma_j, \qquad U^\dagger_{i,j}\gamma_jU_{i,j} = -\gamma_i, \qquad U^\dagger_{i,j}\gamma_kU_{i,j} = \gamma_k \ \ (k \neq i, j),$$
and so repeated iteration of these unitary rotations may be used to 'sort' the Majorana operators into the desired pattern. This may be performed in an odd-even sort format [32]: at each step $t = 1, \ldots, N$ we decide for each $n = 1, \ldots, N$ whether to swap Majoranas $2n$ and $2n+1$, and then whether to swap Majoranas $2n$ and $2n-1$. Within the Jordan-Wigner transformation these gates are local: $e^{\frac{\pi}{4}\gamma_{2n}\gamma_{2n+1}} = e^{i\frac{\pi}{4}Z_n}$ is a single-qubit rotation, while $e^{\frac{\pi}{4}\gamma_{2n-1}\gamma_{2n}}$ is a two-qubit rotation acting only on the neighbouring qubits $n-1$ and $n$. Each timestep is therefore depth 3, for a total maximum circuit depth of $3N$ and total maximum gate count of $3N^2$.
(To see that only $N$ timesteps are necessary, note that each Majorana can travel up to 2 positions per timestep.) Following the Majorana swap circuit, all pairs of Majoranas that we desire to measure will be rotated to neighbouring positions and may then be locally read out. As each Majorana swap gate commutes with the global parity $\prod_{i=1}^{2N}\gamma_i$, this will be measurable alongside the clique as the total qubit parity $\prod_{i=1}^{N}Z_i$, allowing for error mitigation by symmetry verification [29,30]. As the above circuit corresponds just to a basis change, for many VQEs it may be pre-compiled into the preparation itself, negating the additional circuit depth entirely.
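The scheduling of Majorana swaps can be sketched classically as an odd-even transposition sort over the $2N$ Majorana positions. The code below is an illustration only: it tracks where each Majorana sits, ignores the signs picked up under each swap, and uses hypothetical helper names (`targets_from_pairs`, `odd_even_swap_layers`) with 0-based positions, so that positions $(2t, 2t+1)$ belong to the same qubit.

```python
def targets_from_pairs(pairs):
    """Given a clique as disjoint pairs (a, b) covering all 2N Majorana indices,
    return a list whose entry p is the target position of Majorana p, chosen so
    that the t-th pair ends up at neighbouring positions (2t, 2t + 1)."""
    target = {}
    for t, (a, b) in enumerate(pairs):
        target[a], target[b] = 2 * t, 2 * t + 1
    return [target[p] for p in range(2 * len(pairs))]

def odd_even_swap_layers(order):
    """Odd-even transposition sort. Returns (layers, final_order), where each
    layer lists the neighbouring positions swapped in that layer. Even layers
    act on positions (2t, 2t+1), odd layers on positions (2t+1, 2t+2)."""
    order = list(order)
    size = len(order)
    layers = []
    for phase in range(size):            # `size` phases suffice for `size` items
        layer = []
        for pos in range(phase % 2, size - 1, 2):
            if order[pos] > order[pos + 1]:
                order[pos], order[pos + 1] = order[pos + 1], order[pos]
                layer.append((pos, pos + 1))
        layers.append(layer)
    return layers, order
```

Each swap recorded in a layer would correspond to one Majorana swap gate $U_{i,j}$ on neighbouring Majoranas, and the $2N$ layers correspond to the $N$ two-part timesteps described above.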
As an alternative to the above ideas, it is possible to extend the partitioning scheme for measuring all $k$-qubit operators to a scheme to measure all fermionic 2-RDM elements via the Bravyi-Kitaev transformation [33,34]. This transformation maps local fermion operators to $k = O(\log N)$ qubit operators, and so using our approach the resulting scheme would require a number of measurements that is $O(3^k \log^{k-1} N) = (3\log N)^{O(\log N)}$. Although this is superpolynomial, it is a slowly growing function for small $N$ and also has the advantage that the measurement circuits themselves are just single-qubit rotations. Furthermore, as the set of fermion operators is very sparse in the sense that it has only $O(N^4)$ terms rather than $N^{O(\log N)}$ terms, the scheme may be able to be further sparsified. Understanding this in more detail is a clear target for future work.
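Returning to the commuting-clique construction at the start of this section, the pairing cover $C_{a,n}$ of 2-Majorana operators can be generated and checked by brute force. The sketch below assumes 0-based Majorana indices and that the number of Majoranas $2N$ is a power of two; the function name `majorana_pair_cover` is a hypothetical choice.

```python
def majorana_pair_cover(num_majoranas: int):
    """Cliques of disjoint Majorana index pairs (alpha, beta). At level n the
    indices are cut into blocks of size 2**(n+1); element i of the lower half
    of each block is paired with element (i + a) mod 2**n of the upper half."""
    if num_majoranas < 2 or num_majoranas & (num_majoranas - 1):
        raise ValueError("this sketch assumes a power-of-two number of Majoranas")
    cliques = []
    half = 1                                   # half = 2**n
    while 2 * half <= num_majoranas:
        for a in range(half):                  # cyclic offset within the block
            clique = []
            for m in range(num_majoranas // (2 * half)):
                base = m * 2 * half
                for i in range(half):
                    clique.append((base + i, base + half + (i + a) % half))
            cliques.append(clique)
        half *= 2
    return cliques
```

For 16 Majoranas ($N = 8$) this produces $2N - 1 = 15$ cliques of $N = 8$ disjoint pairs each, and every one of the 120 index pairs appears in exactly one clique.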

V. MEASURING LOCAL FERMIONIC OPERATORS BY GROUPING INTO ANTI-COMMUTING CLIQUES
Products of Majorana and Pauli operators have the special property that any two either strictly commute or strictly anti-commute. This raises the question of whether there is any use in finding cliques of mutually anti-commuting Pauli operators. Such cliques may be found in abundance when working with Majoranas; e.g. for fixed $0 \leq j, k, l \leq 2N$, the set $A_{j,k,l} = \{\gamma_i\gamma_j\gamma_k\gamma_l : i \neq j, k, l\}$ is a clique of $2N - 3$ mutually anti-commuting operators. Curiously, it turns out that asymptotically larger anti-commuting cliques are not possible: the largest set of mutually anti-commuting Pauli or Majorana operators contains at most $2N + 1$ terms (see App. A for a proof). The number of anti-commuting cliques required to contain all 4-Majorana operators is thus bounded below by $\Omega(N^3)$, matching the numerical observations of [19].
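The commutation structure used here follows from a simple counting rule: two products of distinct Majorana operators with index sets $A$ and $B$ satisfy $AB = (-1)^{|A||B| - |A \cap B|} BA$. The sketch below applies this rule to check the anti-commuting clique $A_{j,k,l}$; representing operators as index tuples (ignoring overall signs) and the function names are assumptions made for illustration.

```python
from itertools import combinations

def majoranas_commute(a, b) -> bool:
    """Products of distinct Majoranas with index sets a and b commute iff
    |a| * |b| - |a & b| is even."""
    a, b = set(a), set(b)
    return (len(a) * len(b) - len(a & b)) % 2 == 0

def anticommuting_clique(j: int, k: int, l: int, num_majoranas: int):
    """Index sets of all 4-Majorana operators gamma_i gamma_j gamma_k gamma_l
    with j, k, l fixed and i running over the remaining indices."""
    fixed = {j, k, l}
    return [tuple(sorted(fixed | {i})) for i in range(num_majoranas) if i not in fixed]
```

For example, with 10 Majoranas ($N = 5$) the clique has $2N - 3 = 7$ elements, and any two of them share exactly three indices and therefore anti-commute.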
Although direct measurement of each term in an anti-commuting clique $A$ of size $L$ must take $O(N)$ time, it is possible to measure a (real) linear combination $O = \sum_{i=1}^{L} c_iP_i$ of clique elements in a single shot. Since all elements of $A_{j,k,l}$ share three of the same four indices, here we can associate each $P_i$ in the sum over the elements of $A_{j,k,l}$ with the Majorana product $P_i = \gamma_i\gamma_j\gamma_k\gamma_l$. We will argue that $\tilde{O} = \big(\sum_i c_i^2\big)^{-1/2} O$ is unitarily equivalent to a single element of $A$. First, we note that for $P_i, P_j \in A$ the (anti-Hermitian) product $P_iP_j$ commutes with every element in $A$ but $P_i$ and $P_j$ themselves. This implies that the unitary rotation $e^{\theta P_iP_j}$ may be used to rotate between $P_i$ and $P_j$ without affecting the rest of $A$:
$$e^{-\theta P_iP_j} P_i e^{\theta P_iP_j} = \cos(2\theta) P_i + \sin(2\theta) P_j.$$
This rotation may be applied to remove the support of $O$ on $P_1$ by choosing $\tan(2\theta_1) = c_1/c_2$, which maps $c_1P_1 + c_2P_2$ to $\sqrt{c_1^2 + c_2^2}\,P_2$. We extend this to remove the support of $O$ on each $P_i$ in turn by choosing $\tan(2\theta_i) = \sqrt{\sum_{j \leq i} c_j^2}\,/\,c_{i+1}$ (with the quadrant of $2\theta_i$ fixed so that the new coefficient of $P_{i+1}$ is positive), and then
$$\Big(\prod_{i=L-1}^{1} e^{-\theta_iP_iP_{i+1}}\Big)\, O\, \Big(\prod_{i=1}^{L-1} e^{\theta_iP_iP_{i+1}}\Big) = \Big(\sum_{i=1}^{L} c_i^2\Big)^{1/2} P_L.$$
Following this measurement circuit, $O$ may be measured by reading all qubits in the basis of the final Pauli $P_L$. Intriguingly, for $P_i, P_{i+1} \in A_{j,k,l}$, we have that $P_iP_{i+1} = \gamma_i\gamma_{i+1}$, which maps to a 2-qubit operator under the Jordan-Wigner transformation (as noted previously). This implies a measurement circuit for these sets may be achieved with only linear gate count and depth, linear connectivity, and no additional ancillas. We can slightly reduce the depth by simultaneously removing the $P_i$ from the "top" and "bottom"; i.e., we remove $P_{2N-3}$ by rotating with $P_{2N-4}$ at the same time as removing $P_1$ by rotating with $P_2$, until after exactly $N - 2$ layers, we have only the term $P_N$ remaining. All generators in this unitary transformation commute with the global parity $\prod_{i=1}^{2N}\gamma_i$. It is possible to reduce the depth further by removing Majoranas from the set: if we restrict ourselves to subsets of $\omega$ elements of $A_{j,k,l}$, the measurement circuit will have $\omega$ gates and be depth $\omega$, but $O(N^4/\omega)$ such sets will be needed to estimate a linear combination of 4-Majorana operators.
This makes this scheme very attractive in the near-term, where complicated measurement circuits may be prohibited by low coherence times in NISQ devices.
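Because conjugation by $e^{\theta P_iP_{i+1}}$ only mixes the coefficients of $P_i$ and $P_{i+1}$, the sequence of rotation angles can be computed classically on the coefficient vector alone. The sketch below assumes the convention $e^{-\theta P_iP_j} P_i e^{\theta P_iP_j} = \cos(2\theta)P_i + \sin(2\theta)P_j$, which holds for any anti-commuting Hermitian $P_i, P_j$ squaring to the identity; the function name `rotate_onto_last` and the use of `atan2` to fix the quadrant are illustrative choices rather than the authors' implementation.

```python
from math import atan2, cos, sin

def rotate_onto_last(coeffs):
    """Return (angles, final_coeffs). Rotation i moves all weight on P_i onto
    P_{i+1}, so the final coefficient vector is (0, ..., 0, ||c||)."""
    c = [float(x) for x in coeffs]
    angles = []
    for i in range(len(c) - 1):
        # tan(2 * theta) = c_i / c_{i+1}; atan2 keeps the new coefficient >= 0
        theta = 0.5 * atan2(c[i], c[i + 1])
        ci, cn = c[i], c[i + 1]
        # action of the conjugation on the two affected coefficients
        c[i] = ci * cos(2 * theta) - cn * sin(2 * theta)
        c[i + 1] = ci * sin(2 * theta) + cn * cos(2 * theta)
        angles.append(theta)
    return angles, c
```

For coefficients `[3, -4, 12]` the two rotations leave a single term of weight $\sqrt{9 + 16 + 144} = 13$ on the last operator, which is then read out in its own basis.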

VI. CONCLUSION
Experimental quantum devices are already reaching the stage where the time required for partial state tomography is prohibitive without optimized scheduling of measurements. This makes our work developing new and more-optimal schemes for partial tomography of quantum states exceedingly timely. In this work, we have shown that a binary partition strategy allows one to measure all $k$-local qubit operators in poly-log($N$) time, an exponential improvement over previous art. By contrast, in fermionic systems we have found a lower bound of $\Omega(N^{\lceil k/2 \rceil})$ on the number of independent preparations required to directly measure all $k$-local (i.e. $k$-Majorana) operators, an exponential separation. We have developed schemes to achieve this lower bound for $k = 2$ and $k = 4$, allowing estimation of the entire fermionic 2-RDM to constant error in $O(N^2)$ time. Additionally, we have demonstrated that one can leverage the anti-commuting structure of fermionic systems to measure all 4-Majorana operators in $O(N^4/\omega)$ time with a gate count and circuit depth of only $\omega$, allowing one to trade off a decrease in coherence time requirements for an increase in the number of measurements required.
We note that during the final stages of preparing this manuscript a preprint was posted to arXiv which independently develops a similar scheme for measuring $k$-qubit RDMs [28]. This scheme seems to be identical to ours for $k = 2$ but uses insights about hash functions to generalize the scheme to higher $k$ with scaling of $e^{O(k)}\log N$, which improves over our bound of $O(3^k \log^{k-1} N)$ by polylogarithmic factors in $N$. While Ref. [28] does not contemplate schemes for measuring fermionic RDMs, our suggestion to extend our $k$-qubit clique cover scheme with $k = \log(N)$ via the Bravyi-Kitaev transformation is made more attractive by the reduced scaling in $k$. Unfortunately, this is still asymptotically unfavourable: the largest 4-Majorana operator under the Bravyi-Kitaev mapping has size $O(4\log(N))$, and as the lower bound on the size of a set of perfect hash functions is $\Omega(e^k\log(N)k^{-1/2})$ [35], which corresponds to a set of $O(3^ke^k\log(N)k^{-1/2})$ cliques, the resulting clique cover size scales as approximately $O(N^{12})$. Whether this may be improved further, and how this method performs at small system sizes, are interesting questions for future research.
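The exponent quoted above can be checked with a short calculation. This is a worked restatement only, assuming $k = 4\log N$ exactly and dropping the subleading factor $\log(N)k^{-1/2}$:

```latex
\begin{align*}
3^{k} e^{k}\Big|_{k = 4\log_2 N} &= (3e)^{4\log_2 N} = N^{4\log_2(3e)}, \\
4\log_2(3e) &= 4\,(\log_2 3 + \log_2 e) \approx 4\,(1.585 + 1.443) \approx 12.1 .
\end{align*}
```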
The clique cover bounds for qubit and fermionic systems in isolation are perhaps not so surprising, but taken together they give a somewhat interesting result: it requires exponentially more measurements to estimate all terms in a fermionic RDM than in a qubit RDM to a constant error in the system size. As far as we can tell, this gap will not be able to be overcome by any transformation of the Hamiltonian, and it will definitely not be overcome by a unitary transformation or embedding of the system -such a transformation could be inverted on the measurements to give a scheme for simultaneous measurement of non-commuting observables in the original system, which is a clear contradiction.
In particular, our results provide a sort of no-go theorem which argues against the existence of any fermion-to-qubit mapping which can transform all elements of the 2-RDM into qubit operators with locality less than $k = O(\log N)$. This holds even for schemes which introduce auxiliary degrees of freedom while mapping to qubits, such as those explored in [33,36-38]. For instance, it is impossible to achieve locality $k = O(\log\log N)$. This is the case because such a mapping could be combined with the $e^{O(k)}\log N$ qubit partitioning technique to provide a scaling that violates our lower bound of needing at least $\Omega(N^2)$ circuits to measure the fermionic 2-RDM. Whether this no-go theorem has physical significance, i.e. by limiting the internal structure possibilities for 0-dimensional fermionic or spin Hamiltonians, is an interesting question left open by this work.
The exponential gap described above exists only when we are interested in estimating the entire RDM of a qubit or fermionic system. In most situations, we are instead interested in bounding the error in estimating some linear combination of poly(N ) RDM terms (e.g. to calculate the ground state energy of a system). The variance of such a combination typically grows as poly(N )/M , where M is the number of repetitions of each measurement, and so the total time required for estimation (at fixed variance) should grow polynomially in N for both qubit and fermionic systems. This variance estimation is made more complicated by constraints between the expectation values of different operators (the so-called n-representability constraints), the study of which is an active field of research [13,21].