Circuit connectivity boosts by quantum-classical-quantum interfaces

High-connectivity circuits are a major roadblock for current quantum hardware. We propose a hybrid classical-quantum algorithm to simulate such circuits without swap-gate ladders. As main technical tool, we introduce quantum-classical-quantum interfaces. These replace an experimentally problematic gate (e.g. a long-range one) by single-qubit random measurements followed by state-preparations sampled according to a classical quasi-probability simulation of the noiseless gate. Each interface introduces a multiplicative statistical overhead which is remarkably independent of the on-chip qubit distance. Hence, by applying interfaces to the longest range gates in a target circuit, significant reductions in circuit depth and gate infidelity can be attained. We numerically show the efficacy of our method for a Bell-state circuit for two increasingly distant qubits and a variational ground-state solver for the transverse-field Ising model on a ring. Our findings provide a versatile toolbox for error-mitigation and circuit boosts tailored for noisy, intermediate-scale quantum computation.


I. INTRODUCTION
Quantum computation promises a major disruption in high-performance computing, with applications on diverse fields ranging from many-body physics and chemistry to machine learning, finance, automation, or logistics, to name a few [1][2][3]. However, the current paradigm of noisy, intermediate-scale quantum (NISQ) devices limits quantum algorithms to circuits of low qubit numbers, low depth, and low connectivity [4]. This poses serious concerns on the actual usefulness of quantum computers in the near term and has thus ignited a both experimental and theoretical quest for ways to unleash the potential of quantum algorithms with NISQ hardware [5][6][7].
A large class of NISQ algorithms are based on hybrid quantum-classical approaches. One of the most succesful of these consists of parametrized quantum circuits variationally optimized through a classical optimizer aimed at approximating a target ground state [8,9]. To combat the noise in these systems, subsequent variants incorporated the idea of quantum error mitigation [10][11][12][13]. This refers to schemes whereby noisy experimental implementations (e.g., at different noise regimes or with different gate choices), together with suitable classical post-processing, are used to simulate a target, noiseless quantum circuit of limited size. This offers a NISQ alternative to quantum error correction (which requires large-scale quantum circuits), where full fault tolerance is achieved by actively correcting errors on the quantum hardware during the execution of the computation. * rwiersema@uwaterloo.ca More recently, a different type of hybrid method has been put forward [14][15][16][17][18]. There, a classical algorithm calls a quantum computer as a sub-routine to simulate a larger quantum circuit. However, the cost of this is that both the number of queries to the quantum subroutine and the classical post-processing runtime unavoidably grow exponentially with the size of the target circuit. Moreover, a particularly challenging aspect of NISQ devices is their inability to run algorithms that require high, long-range connectivity among the constituent qubits. In most NISQ hardwares, long-range gates are synthesized by a long sequence of nearest-neighbor gates. This drastically inflates the circuit depth and causes large infidelity due to noise accumulation incurred during the syntheses. This is a crucial limitation in the NISQ era.
Here, we take a conceptually different direction from previous hybrid schemes: instead of assembling a large quantum circuit from small pieces, we simulate a highconnectivity circuit from circuits with low connectivity and depth. To that end, we introduce the notion of quantum-classical-quantum (QCQ) interfaces. A QCQ interface for a gate U corresponds to a local measurement on the qubits on which U acts followed by a re-preparation of those same qubits in a random product state that depends on U . In other words, the interface performs a hybrid quantum-classical simulation of U . Each interface introduces a multiplicative statistical overhead that, as we prove below, is independent of the on-chip distance between the qubits. Hence, for fixed number of interfaces, e.g., the longer the range of the target gates is, the more drastic the reduction in depth attained is at the expense of a constant overall statistical overhead.
More technically, our interfaces combine state-of-theart state estimation based on single-qubit random meas-arXiv:2203.04984v1 [quant-ph] 9 Mar 2022 urements [19,20] with quasi-probability representations based on frames [21,22]. Such representations have been used for classically simulating a quantum circuit with Monte Carlo sampling techniques [23][24][25]. In particular, our algorithm can be seen as a hybrid version of the scheme of Ref. [25] where everything is quantum except for a subset of gates that one wishes to "cut out" of the experimental circuit. Here, we choose such subset in terms of the on-chip qubit distance. However, other relevant choices may be due simply to error mitigation or hardware-specific limitations. As most quasi-probability schemes, our method suffers from the infamous sign problem [26][27][28]. Remarkably, the severity of the problem depends only on the number of interfaces and not the on-chip distance between the qubits. Moreover, as byproduct contribution, in order to minimize the average sign of our quasi-probability representation, we develop a Metropolis-Hastings simulated-annealing algorithm based on random walks in the space of dual frames. We implement such walks through a convenient, long-known parametrization of generalized inverse matrices [29]. This allows us to decrease the sample complexity overhead per interface by almost a factor of two relative to the canonical-frame choice, constituting a practical tool of general relevance for sign-problem mitigation [30,31].
The paper is organized as follows. In Sec. II we introduce our notation and the necessary concepts from frame theory to understand QCQ interfaces. We then present our algorithm in detail in Sec. III. In Sec. IV we perform numerical experiments to show the efficacy of our method on two illustrative circuits, namely the preparation of a Bell state between two increasingly distant qubits and a variational ground-state solver for the 1D transverse-field Ising model with periodic boundary conditions. We end with a discussion of our results in Sec. V and provide a perspective on other potential applications of our method.

II. PRELIMINARIES
We consider an N -qubit system S of Hilbert space H S , and denote the space of bounded, linear operators on H S by L(H S ). We now consider the notion of a frame, which generalizes the notion of basis [21,22]. For our purposes, a frame F S for L(H S ) is any set F S := {M a } a of Hermitian operators M a that spans L(H S ). Such a (in general linearly-dependent) spanning set is sometimes referred to as over-complete basis of L(H S ). In turn, a frame D S := {M a } a s.t.
where I is the identity map on L(H S ), is called dual to F S (and we then refer to F S as the primal to D S ). The round kets and bras appearing in Eq. (1) are a shorthand notation to denote operators in L(H S ) and their adjoints, respectively. Accordingly, we denote by (A| B) the Hilbert-Schmidt inner product Tr A † B in L(H S ). This is a popular notation in quantum information [21,22,32] that will used here interchangeably with the (more usual) operator notation upon convenience. We take throughout M a ≥ 0 for all a and a M a = 1 1 S , with 1 1 S the identity operator on H S , so that F S is a positive operator-valued measure (POVM) on H S . POVMs define generalized (i.e. beyond von Neumann) measurements [33,34]. This, together with Eq. (1), allows us to express any density operator ∈ L(H S ) as where P (a) := (M a | ) is the probability of measurement outcome a on . Eq. (2) is the basis of classical-shadow tomography, a powerful technique to get compact classical representations of states from measurements [20,35]. Note that M a ≥ 0 for all a impliesM a 0 in general [21,22]. In addition, it will be useful to express the dual frame elements as affine combination of elements of F S , for some adequately chosen T. With this parametrization, the primal-and dual-frame overlap matrices T andT , respectively defined as T a,a := (M a | M a ) andT a,a := M a M a , are related asT = T T T. An experimentally convenient choice of F S and D S is for a := (a 1 , . . . a N ). Here, M aj is the j-th element of a single-qubit POVM frame andM aj that of the corresponding dual frame. We refer to these as factorable frames. By virtue of Eqs. (2) and (3), these allow one to express any as an affine combination of product states σ a := M a /t a , where t a := Tr[M a ] [19]. This fact has been used to reconstruct quantum states [19], processes [36], and overlaps [37] from single-qubit measurements. Additionally, this has been used to simulate quantum circuits [38] with generative machine learning models, where T was taken as the canonical pseudo-inverse of T . However, other choices of T are possible. It can be seen (see App. B) that Eq. (3) defines a dual to F S iff T a,a ∈ R, a T a,a = 1, and In general, the elements of T can be positive or negative. As shown below, the negativity of T governs the sample complexity of Monte Carlo estimations of expectation values of observables. Finally, note also that if T fulfills Eq. (4), necessarily so doesT = T T T (T and T collapsing to each other for the canonical choice of T being a pseudoinverse of T ).

III. INTERFACES FOR HYBRID CLASSICAL-QUANTUM CIRCUITS
Our goal is to simulate quantum circuits using hybrid classical-quantum ones. More precisely, we are given an observable O, an N -qubit input state 0 := |0 0|, and a target circuit C := {U k } k∈[f ] , with f ∈ N single-or twoqubit unitary gates U k . We denote by s k ⊂ S the subset of qubits on which U k acts, and by a s k a corresponding sub-string of measurement outcomes on s k . In addition, we use the short-hand notations s k := S \s k for the qubits on which U k does not act and 1 1 s k for the identity on H s k . From the f gates, l < f are particularly experimentally demanding for NISQ implementations, and they are marked by the set of labels L := {k 1 , k 2 , . . . k l }. The case we explicitly study below is that of two-qubit gates on qubits far apart in the connectivity graph in question. However, other relevant cases may be due to error mitigation convenience or other hardware-specific limitations, e.g. Either way, our goal is to estimate the expectation value Tr . . U † f by substituting every U k with k ∈ L by a classical simulation of it.
Our main tool to achieve this are interfaces between quantum objects and their (classical) frame representations. The first one is based on Eq. (2). Definition 1 (Quantum-classical interfaces). We refer as a QC interface on s k to the assignment of a classical snap-shotM as k to s k according to the measurement outcome a s k of a factorable POVM frame F s k on a state ∈ H S , occurring with probability P (a s k ) = (1 1 s k |(M as k | ).
The second one is the reverse interface, which simulates M as k as a linear combination of states σ bs k := M bs k /t bs k . This is done by importance-sampling b s k fromT (Is k ) , given a s k , withT (Is k ) the dual-frame overlap matrix on s k . To see this, we apply on |M as k ) the Hermitian conjugate of Eq. (1) and get |M as k ) = bs kT (Is k ) as k ,bs k t bs k |σ bs k ). Then, using a standard trick, we rewritẽ whereT (Is k ) as k is a short-hand notation for the vector given by the a s k -th row ofT (Is k ) , T (Is k ) By construction, P Is k (•|a s k ) is a valid probability distribution, from which b s k can be sampled. This can be used to quantum Monte Carlo simulateM as k [25].
Definition 2 (Classical-quantum interface). We refer as CQ interface on s k to the repreparation of s k in the state The third and final ingredient integrates QC and CQ interfaces with a classical simulation of U k . Multiplying U k from the right by Eq. (1) and from the left by the Hermitian conjugate of Eq. (1), we get That is, the action of U k is absorbed into the repreparation by sampling fromT (U k ) instead ofT (Is k ) (see Fig. 1). This leads to: We refer as a QCQ interface for U k on s k to the measurement of F s k , with outcome a s k , followed by the repreparation of σ bs k with probability as k ,bs k ; and the corresponding interface realized in such experimental run is thus mathematically represented by the operator V k (a s k , b s k ) := v as k ,bs k |σ bs k )(M as k |.
Our hybrid-circuit simulation then applies on k−1 the and using the fact that O is Hermitian, we can express the target expectation value Tr with the short-hand notation Eq. (8) can be experimentally estimated through an averageÔ M over M ∈ N runs. M is chosen to guarantee that the statistical error and significance level (failure probability) of the estimation are respectively given by target values ε and δ. We refer to M as sample complexity of the protocol and its explicit value is given in Theo. 1 below.
The procedure is sketched by the following pseudo-code.
Apply the gate U k on qubits s k .
To quantify the runtime of the algorithm, we define the interface negativity of the gate U k and the total forward interface negativity of the entire circuit C respectively as This allows us to state the following theorem.
with O the operator norm of O, then, with probability at least 1 − δ, the statistical error of O * M is at most ε.
The proof follows straightforwardly from the Hoeffding bound. We note that the factor 2 O 2 log (2/δ) ε 2 in Eq. (10) is the equivalent sample complexity bound one would obtain if Tr[ f O] was estimated from measurements on the actual state f . Hence, n 2 → quantifies the runtime overhead introduced by the interfaces. In that regard, the interface negativities play the same role in our hybrid classical-quantum simulation as the negativities of Ref. [25] in fully classical simulations with quasi-probability representations. An innovative and advantageous feature of Eq. (9) is the presence of the POVM-element trace t bs k in n U k , which comes from the state repreparation. Indeed, since t bs k < 1, the n U k 's (and therefore also n → ) are significantly smaller than their counterparts for fully classical simulations [25]. This is consistent with the intuition that hybrid classical-quantum Monte Carlo simulations should cause lower sample-complexity increases than fully classical ones.
Either way, the most relevant property for our purposes is that n 2 → (and therefore also M ) is independent not only of the numbers of gates f or qubits N but also, and most importantly, of the connectivity-graph distance between the qubits on which the interfaces act. In other words, for a fixed budget of measurement runs, simulating a gate U k with a QCQ interface increases the statistical error at most by a constant factor n U k , regardless how far apart in the circuit the qubits s k are. In contrast, experimentally synthesizing U k with noisy nearest-neighbor gates would give a systematic error due to infidelity accumulation that grows linearly with the distance between those qubits. Clearly, the drawback is that n 2 → grows exponentially with the number l of interfaces used. However, for a many circuits, Alg. 1 constitutes a better alternative than the bare NISQ implementation. We study relevant exemplary circuits with such trade-offs in the next sections.
Finally, note that n 2 → is frame-dependent. This is crucial to the efficiency of classical simulations [26][27][28]. For instance, in quantum Monte Carlo, it is known that the statistical overhead due to negative (quasi-)probabilities can be ameliorated [31] or even removed [30] by local base changes. Something similar applies here: the interface negativities depend not only on the primal frame but also on the choice of dual to it.

IV. NUMERICAL EXPERIMENTS
Here, we provide numerical experiments to validate the procedure outlined in Algorithm 1. Throughout the rest of this work, we take {M a } a to be the Pauli-6 Informationally Complete POVM (IC-POVM), which can be implemented in an experimental setting without the usage of ancilla qubits (see App. A). For our simulations, we make use of full density matrix simulations and Locally Purified Density Operator tensor networks [39] (see also App. E). For the latter, we choose the Kraus and Bond dimensions such that the simulation errors are under control and we end up with a high fidelity (> 99.9%) state approximation. To simulate realistic experimental settings, we apply noise to the two-qubit gates in our circuit. In particular we implement noisy CNOTs throughout our circuits by applying single-qubit depolarizing channels E : → E( ) to both the control and target qubit of the CNOT gate. We apply depolarizing noise in the CNOTs with λ unit = 0.005. This values correspond to experimentally realistic values [40]. At the end of the circuit we estimate observables Tr{ O} exactly, i.e. without further sampling bitstrings but relying on the full state representation. To improve the sample complexity of our algorithm, we use a Monte Carlo algorithm to minimize the interface negativities. We first note that Eq. (4) defines a domain over which to optimize such negativity. Similar optimizations (but for bases instead of frames) have been used for alleviating [30,31] the sign problem in partitionfunction estimations. In our setting, we use a convenient parametrization of generalized inverse matrices by Rao [29] to propose new dual frames for an adaptive random walk Metropolis-Hastings algorithm. This allows us to decrease the multiplicative sample complexity overhead per interface by almost a factor of four relative to the canonical dual frame (corresponding to T = T −1 , with T −1 the pseudo-inverse of T ), which reduces the number of samples required by a factor of four (see App. F for details).

A. Simulation of long-range maximal Bell violations
As a proof of principle experiment, we show that a maximally entangled state simulated with our method attains the maximal violation of the Clauser-Horne-Shimony-Holt (CHSH) inequalities (see App. D) as expected. Specifically, we create the Bell state |Φ + = 1 2 (|00 + |11 ) which has the maximum CHSH violation S(A, B) = 2 √ 2. We consider the case where the state is prepared on two qubits separated by a distance d. Applying the CNOT between these distant qubits requires implementing a swap chain to bring the two states close together. In Fig. 2 we compare the CHSH violation of the Bell state simulated with our algorithm and one prepared with a circuit containing a noisy swap chain. We see that the CHSH violation is only affected by the statistical fluctuations of our method and therefore approximates the maximum value independent of the distance between the qubits.

B. The Transverse Field Ising-model circuit
As a practical example of implementing our method in an experimentally realistic setting, we investigate the ground state of a prototypical model for quantum magnetism: the transverse field Ising-model (TFIM) on a one where we assume periodic boundary conditions and set g = 1. The ground state of H can be approximated reliably with a depth p = N/2 circuit ansatz called the Hamiltonian Variational Ansatz [41][42][43]. This circuit for the ground state is given in Fig. 3. To evaluate the accuracy of the state reconstruction, we compare the finite statistics estimator of the energy Ĥ M from our algorithm with the ground-state energy E gs = ψ gs | H |ψ gs from exact diagonalization. We consider three setups: First, we consider the N = 4 and N = 8 qubit TFIM chains where the last long range ZZ gate (in the 2nd and 4th layer respectively) is classically simulated with our algorithm (See Fig. 4). Next, we apply our method twice for the same circuits, with simulation of both the last and first-to-last long range ZZ gate (See Fig. 5). Finally, we consider the ground state of a N = 20 TFIM chain, where we only apply the first two layers of the circuit and simulate the second long range ZZ gate (See Fig. 6). For all experiments, we confirm that we can greatly improve the final energy . . , p can be found with a Variational Quantum Eigensolver optimization [9]. Each layer in the circuit contains a long range two qubit ZZ rotation. We assume that the distance between the first and last qubit is N −2. Implementing the nearest neighbor ZZ gates come at a cost of 2(N − 1) CNOTs. The long range ZZ rotations require 4(N − 2) + 1 CNOTs since we must use a swap chain to bring the first and last qubit together. The total number of CNOT gates per layer is therefore dominated by the implementation of long range ZZ.
estimates by making use of QCQ interfaces at the cost of M measurement-and-reprepare steps.

V. FINAL DISCUSSION
We have introduced a rigorous framework of hybrid quantum-classical interfaces for quantum-circuit simulations. We applied a specific variant of these gadgetswhich we dub quantum-classical-quantum (QCQ) interfaces -to simulate long-range gates in low-connectivity devices without using swap-gate ladders. QCQ interfaces replaces an experimentally problematic gate (e.g. a very long-range one) by single-qubit random measurements and state-preparations sampled according to a classical quasi-probability simulation of the ideal target gate. This procedure eliminates long swap-gate ladders which would otherwise be required to physically synthesize the target gate. This results in a drastic increase in gate fidelity. The final output of the scheme is an estimate of the expectation value of a given observable on the output of the target high-connectivity circuit.
The quasi-probability distribution used is given by a frame representation of the gate simulated at each interface. As any sampling scheme based on non-positive quasiprobabilities, our method suffers from the sign problem. Because of this, the overall sample complexity grows exponentially with the number of interfaces applied. However, the statistical overhead per interface is independent of the on-chip distance between the qubits on which the interface acts. To ameliorate the sign problem, we developed a Metropolis-Hastings simulated-annealing algorithm based on random walks in the space of dual frames. This allowed us to decrease the statistical overhead per interface by almost a factor of two over that of the canonical dual frame. This is potentially interesting on its own beyond the current scope and further optimization is possible. All together, we show that any circuit with a limited number of gates to cut out can be simulated at the expenses of a moderate overall overhead in sample complexity. As examples, we explicitly considered a Bell-state preparation circuit for two qubits increasingly far apart and variational ground-state solvers for the transverse-field Ising model on ring lattices. The former involves a single long-range gate, whereas the latter one such gate per variational layer.
Importantly, our method requires platforms supporting mid-circuit measurements and state preparations, which are readily provided by some quantum hardware companies such as, e.g., IBM and Honeywell [44,45]. This may pave the way to implement our method in a practical setting in the near future.
Finally, we emphasize that our framework is not restricted to connectivity boosts only. It could also be applied to any gate that is too noisy for a given platform or combined with error-correcting codes to remove a gate that is particularly difficult to implement fault-tolerantly by the code. Another interesting application that will be studied elsewhere is circuit-depth boosts, where a deep circuit is simulated by shallower experimental circuits together with classical simulations of entire slices of the target circuit. In conclusion, our framework provides a versatile toolbox for both error-mitigation and circuit boots well suited for noisy, intermediate-scale quantum hardware.

VI. ACKNOWLEDGEMENTS
The authors thank Ingo Roth for helpful insights.
LG and LA acknowledge support by the Serrapilheira Institute (grant number Serra1709-17173)     . This scaling only depends on the mean negativity, which differs between the two circuits because we apply a different ZZ rotation on each circuit. The energy of the noiseless circuit (orange dashed line) corresponds to the ground-state energy Egs. The noisy circuit (green dashed line) shows the energy obtained when we apply depolarizing channels with λunit = 0.005 to the CNOT gates in the circuit. We see that for both the 4 and 8 qubit our algorithm provides a significant improvement on the final estimated energy of the circuit for a reasonable number of measure-and-reprepare steps. In (b) we observe that the large number of number of noisy CNOTs dominates the simulation, hence the improvement is not as significant as for 4 qubit.  These results were obtained with a full density matrix simulation. In (a), we see that we can almost approximate the true ground-state energy of the 4 qubit state, because the only noisy operations are the 12 CNOTs required for implementing the 6 nearest neighbor ZZ gates in layers 1 and 2. In (b) we see a more significant improvement over the energies from Figure 4b, but still the noise dominates. Since we apply the QCQ method twice, the standard deviation σ =σ/ N samples of the error bars increases quadratically, as per Eq. (9). We findσ ≈ 333.1 andσ ≈ 856.8 for 4 and 8 qubits respectively.

M
where o λ (i) ,α (i) s L is the eigenvalue obtained from the singleshot i obtained from a state that is measured and reprepared according to α with a (i) Importantly, O * M is an unbiased estimator.

D. THE CLAUSER-HORNE-SHIMONY-HOLT INEQUALITIES
The CHSH inequalities constrain a set of four correlators in an Alice (A) and Bob (B) type experiment and provide a condition to check if the correlations between the observations of Alice and Bob can be explained by a local theory, or necessitate a non-local theory such as quantum mechanics [46]. Consider the quantity where

E. LOCALLY PURIFIED DENSITY OPERATORS
Numerical simulations with the full density matrix of size 2 N × 2 N quickly become prohibitive due to the large memory requirements. Hence we have to resort to tensor networks to find efficient representations of mixed quantum states. The canonical choice for representing operators with tensor networks are matrix product operators (MPO) [47]. A drawback of this approach is that applying completely positive maps to the state can still lead to the MPO becoming non-positive due to truncation errors. The Locally Purified Density Operator tensor network solves this issue by representing the state as = χχ † , where the purification operator χ is given by a tensor network with 1 ≤ p l ≤ P , 1 ≤ k l ≤ K and 1 ≤ b l ≤ D [39].
Here, P is called the physical dimension, K is the Kraus dimension and D is the bond dimension.
Analogous to the bond dimension truncation in MPOs, truncating the Kraus dimension after applying a channel leads to errors in our state representation that can affect the accuracy of numerical simulations. However, we can control the accuracy of the simulation by increasing D and K and keeping track of a runtime lower bound estimate of the state fidelity. Let = χ † χ, σ = η † η, then the fidelity is given by From Lemma 1 in [39] we know that, Let χ be a locally purified description of a quantum state with local tensors {A [N ] } that is in mixed canonical form with respect to a local tensor A [lcp] . If a single tensor A [l] is compressed by discarding singular values in either the Kraus or bond dimensions, then by Lemma 6 of [39] we know that and subsequently where χ is the compressed tensor. By the triangle inequality, the two norm errors introduced by the discarded weights can at most sum up. Hence the true operator norm is lower bounded by the sum of all discarded weight errors With d the number of truncations and δ k the discarded weights. This brings the final runtime fidelity estimate to In all our experiments, we apply depolarizing channels to both qubits only after applying a two qubit gate, since single qubit gate noise tends to be small in experimental settings. The single-qubit depolarizing channel is given by where {K m } is a set of Kraus operators with Here, {X, Y, Z} are the Pauli matrices and 1 1 is the identity. The scalar λ ∈ [0, 1] controls the strength of the depolarization. With these channels, illustrate the bound of Eq. (36) by comparing the final state overlap of an exact full density matrix simulation and a LPDO simulation for a random 4 qubit circuit with a varying number of CNOT gates. In Fig. 7, we see that the runtime estimate of the fidelity is about two orders of magnitude above the true fidelity. The circuit consists of an initial state |+ ⊗4 to which we apply a varying number of CNOT gates with random control and target qubits. We set the noise to λ = 0.005 and take D = 4 and K = 16. The red line indicates the true accuracy of the LPDO simulation by comparing with the exact full density matrix simulation. The orange line gives the runtime fidelity estimate. We see that the accuracy of the simulation degrades as we add more two-qubit gates and depolarizing channels. The runtime fidelity gives an estimate two orders of magnitude above the exact error, indicating that for this example, the bound is a conservative estimate of the simulation error.

F. RANDOM WALK METROPOLIS-HASTINGS FOR NEGATIVITY MINIMIZATION
In this appendix, we present a method to minimize the sample-complexity overhead by the interface of a unitary gate U exploiting the freedom in the choice of dual frame, namely the choice of T subject to Eq. (4). For concreteness, we focus on the case where all POVM elements have the same trace, so t b = 1/D for all b, with D the number of POVM elements. Moreover, we optimize a modified version of the interface negativity n U where, instead of maximizing T (U ) a 1 over a [as in Eq. (9)], we average T (U ) a 2 1 over a. Such an average is the sample-complexity overhead directly given by the Hoeffding bound for when the sampled random variables can lie within segments of different lengths. The reason for this modification is that, while in Theo. 1 we are interested in the worst-case complexity, here we are interested in the more practical problem of the average case.
For optimizingT (U ) over T, we express it asT (U ) = T 1 T (U ) T 2 , with T (U ) given by T (U ) b,a := (M b | U |M a ). Note that by not enforcing that T 1 = T 2 , we are explicitly allowing for the more general case of possibly different input and output dual frames. Hence, we wish to solve the constrained non-convex optimization where T 1 T (U ) T 2 a is a short-hand notation for the a-th row of T 1 T (U ) T 2 and T 1 T (U ) T 2 a 1 its l 1 -norm. Eq. (41) is a necessary but not sufficient condition for T i to be the Penrose-Moore pseudo-inverse of T . Indeed, such condition implies that T i is a so-called generalized inverse of T [48,49]. So, the first question we need to consider is how to variationally explore the space of generalized inverses of T in a practical way. Fortunately, this question has been previously studied. In particular, in Ref. [29] it was shown that for an arbitrary matrix A ∈ R m×n and given any particular generalized inverse A − of it, every generalized inverse B − can be obtained from some C ∈ R m×n by the map That is, the entire space of generalized inverses is parametrized by C. This leads us to a practical way to obtain a random walk across the space of generalized inverses: In the first iteration, take the Penrose-Moore pseudoinverse A −1 as starting generalized inverse and a randomly sampled C. This produces the first B − . As inputs for the second iteration, use the firsts iteration's output B − as generalized inverse and a fresh, independently sampled C. This produces a new B − . Then continue to iterate. Using this recipe for A = T and A −1 = T −1 , we can ergodically explore the space of generalized inverses T i of T . In turn, the resulting random walk can be used as Markov Chain Monte Carlo dynamics for a simulated-annealing optimization [50,51] that approximates a solution to Eq. (40). More precisely, for each random walk iteration, we (probabilistically) accept or reject the newly produced T i via a standard Metropolis-Hastings algorithm with 2 . Hence the search-space dimension is 4 × 6 × 6 = 142. For the simulated-annealing schedule, we take random matrices C ∼ N (0, σ 2 ) 6×6 . We set the initial temperature to be T = 10 and decrease it with a factor 0.999 at each Monte Carlo step. In addition to the temperature, the Monte dynamics are controlled by the variance σ 2 of the normal distribution N (0, σ 2 ) 6×6 for C. We start with a large initial σ 2 = 0.1 to coarsely explore the search space. However, as the temperature decreases, we want to refine the search without freezing the Monte Carlo dynamics. Therefore, we use an adaptive scheme where σ 2 is decreased according to the acceptance ratio. Specifically, we halve the value of σ if the acceptance ratio per 100 MCS is smaller than 0.23, a well-known heuristic for continuous-variable MCMC [52]. The search is terminated if the negativity decreases less than 10 −2 after 100 accepted steps.
As a result, we consistently find dual frames whose averaged squared negativites are about half the value of the canonical dual frame from the pseudo-inverse (see Fig.  8). This is also observed to greatly improve the sample complexity in practice (see Fig. 9).