Cross-verification of independent quantum devices

Quantum computers are on the brink of surpassing the capabilities of even the most powerful classical computers. This naturally raises the question of how one can trust the results of a quantum computer when they cannot be compared to classical simulation. Here we present a verification technique that exploits the principles of measurement-based quantum computation to link quantum circuits of different input size, depth, and structure. Our approach enables consistency checks of quantum computations within a device, as well as between independent devices. We showcase our protocol by applying it to five state-of-the-art quantum processors, based on four distinct physical architectures: nuclear magnetic resonance, superconducting circuits, trapped ions, and photonics, with up to 6 qubits and 200 distinct circuits.

Quantum computers represent a fundamental shift in the way we think about computation. By harnessing quantum interference effects between different possible branches of a computation, quantum processors have the potential to drastically outperform conventional computers for a range of tasks [1][2][3][4][5][6]. Potential applications of quantum computation range from cryptanalysis to the simulation of physical systems and even to machine learning. Extraordinary experimental efforts in recent years have enabled demonstrations of the technology's potential in a growing number of physical systems [7][8][9][10]. For certain simulation [11,12] and sampling [13] tasks, these devices are already pushing the limits of classical supercomputers, and it is foreseeable that the next generation of quantum processors will vastly outperform their classical counterparts.
Building such devices, however, remains challenging, with environmental interactions inducing noise that leads to potentially unreliable results for complex computations. This naturally leads to the question of whether we can trust the output of a quantum computation, and, more concretely, whether we can certify the output of a computation as correct. The current standard approach is to benchmark the individual quantum gates that make up the computation [14,15] to obtain an indication of how well the full system can perform. In practice, however, such an extrapolation is typically unreliable due to effects such as non-Markovian behaviour, spatially and temporally correlated noise, or unmodelled stray interactions [16]. This highlights the need for a complementary technique to establish full system performance.
In order to fill this gap, significant work has been devoted to the development of cryptographically secure verification protocols [17][18][19][20][21][22][23][24], with one such technique having been experimentally demonstrated [25]. However, existing provably secure verification techniques require either quantum communication [17][18][19][20] or shared entanglement between devices [21][22][23][24], leaving them out of reach for existing quantum processors, which are typically unable to exchange quantum information with one another. A promising technique has recently been proposed to allow verification of a single isolated processor based on computational hardness assumptions [26]. However, due to large key sizes, implementing this approach would require an extremely sophisticated processor.

Fig. 1: A cartoon representation of the quantum processing architectures used. a) An NMR device at the University of Oxford [7]; b) superconducting circuits at IBM [27] and Rigetti Computing [28]; c) a trapped-ion quantum processor at the University of Innsbruck [8]; and d) a photonic quantum processor at the University of Vienna [29].
Here we address the question whether one noisy device can be used to efficiently check another noisy device, without relying on quantum communication or entanglement between devices. We introduce a cross-check procedure that is inherently agnostic to the underlying hardware, sensitive to systematic errors in the implementation, and applicable to any digital quantum computation. In contrast to related work on comparing output states from different quantum devices [30], our approach aims to verify the device rather than a certain output of it. The protocol is built on the framework of measurement-based quantum computation (MBQC) [31,32], which has proven a powerful tool for blind and verifiable computing protocols [33]. By exploiting the intrinsic symmetries of quantum circuits when mapped to an MBQC, our approach allows us to quantitatively compare the outputs of quantum circuits with different size and structure, performed on independent physical devices in any architecture, thus building a high level of trust in the outputs of the device.
We demonstrate our protocol by running 200 distinct circuits of different width and depth on five state-of-the-art quantum processors, using four primary technologies for digital quantum computation: 1) a nuclear magnetic resonance (NMR) device [7] at the University of Oxford, 2) cloud-accessible superconducting systems from IBM [27] and Rigetti [28], 3) a trapped-ion quantum processor [8] at the University of Innsbruck, and 4) a photonic cluster state quantum device [29] at the University of Vienna, see Fig. 1.

FROM MEASUREMENT-BASED QUANTUM COMPUTING TO CORRELATED SAMPLING PROBLEMS
In order to verify the correctness of a quantum computation we make use of independent runs of several different yet related sampling problems, obtained from a measurement-based implementation of the computation. In contrast to the standard circuit model of quantum computing, where a unitary operation is described by a sequence of gates applied to a reference input, in MBQC a computation is realized as a sequence of single-qubit measurements performed on highly entangled multi-qubit states. These states are also known as graph states for their one-to-one correspondence with simple graphs $G = \{V, E\}$. Graph states are represented by a set of vertices $V$, corresponding to single qubits initialized in the state $|+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$, and a set of edges $E$, corresponding to pairwise controlled-Z entangling gates applied to the respective vertices, see Fig. 2 and Supplementary Material for details. The MBQC model is computationally equivalent to the circuit model for appropriate families of graphs [31], even when measurements are restricted to the XY-plane of the Bloch sphere [34], as considered here.
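As a minimal sketch of this definition, the following Python snippet builds the state vector of a graph state by starting from the uniform superposition and applying a controlled-Z sign for every edge. The function name `graph_state`, the zero-based qubit labels, and the edge list `H_EDGES` (two three-vertex chains joined at their middle vertices) are illustrative assumptions; the labelling used in Fig. 2 may differ.

```python
import numpy as np

def graph_state(n, edges):
    """State vector of the graph state on n qubits (qubit 0 is the leftmost bit)."""
    # |+>^n is the uniform superposition over all 2^n bit strings.
    psi = np.full(2**n, 2.0 ** (-n / 2), dtype=complex)
    idx = np.arange(2**n)
    for a, b in edges:
        bit_a = (idx >> (n - 1 - a)) & 1
        bit_b = (idx >> (n - 1 - b)) & 1
        # Controlled-Z flips the sign of every basis state with both bits set.
        psi = psi * (1 - 2 * (bit_a & bit_b))
    return psi

# Assumed labelling of an H-shaped graph: chains 0-2-4 and 1-3-5 joined by edge 2-3.
H_EDGES = [(0, 2), (2, 4), (1, 3), (3, 5), (2, 3)]
psi_H = graph_state(6, H_EDGES)
```

Every amplitude of the resulting state has magnitude $2^{-n/2}$, and its sign is determined by how many edges have both endpoints set to 1 in the corresponding bit string.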
One way to visualize a quantum computation as an MBQC pattern is to select an (arbitrary) set of input vertices, which represents the initial state of the computation, and an equal-sized output set, which will contain the final state. Fixing these sets determines a unique set of paths connecting each input qubit to an output qubit, giving rise to the notions of flow [35] and generalized flow (g-flow) [36]. A deterministic computation then proceeds by sequentially subjecting each non-output qubit along the flow to a projective measurement $P_\alpha = \frac{1}{\sqrt{2}}\left(\langle 0| \pm e^{-i\alpha}\langle 1|\right)$, at an angle $\alpha$ in the XY-plane of the Bloch sphere, and applying outcome-dependent corrections to the neighbouring qubits. By convention, a measurement outcome of zero requires no correction, whereas a measurement outcome of one requires an update of the measurement angles for subsequent measurements. These flow structures thus specify the possible circuits over a graph by determining the appropriate corrections for non-zero measurement outcomes and their order. The choice of measurement angles $\alpha = \{\alpha_1, \ldots, \alpha_n\}$ specifies an instance of this circuit.
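A single measurement step can be illustrated on two qubits. The sketch below entangles an arbitrary input qubit with a fresh $|+\rangle$ qubit via controlled-Z, projects the input qubit with the projector written above, and compares the state left on the second qubit with a Hadamard applied after the phase $\mathrm{diag}(1, e^{-i\alpha})$, followed by a Pauli-X byproduct for outcome one. All function and variable names are hypothetical, and how this operator maps onto the gate $\hat{J}(\alpha)$ depends on the sign convention chosen for the Z-rotation, which is not fixed here.

```python
import numpy as np

HADAMARD = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
PAULI_X = np.array([[0, 1], [1, 0]])
CZ = np.diag([1, 1, 1, -1])

def measure_input_qubit(psi_in, alpha, outcome):
    """Project qubit 1 of CZ(|psi_in>|+>) onto (<0| +/- e^{-i alpha}<1|)/sqrt(2).

    Returns the outcome probability and the normalized state of qubit 2.
    """
    plus = np.array([1, 1]) / np.sqrt(2)
    state = CZ @ np.kron(psi_in, plus)
    sign = 1 if outcome == 0 else -1
    bra = np.array([1, sign * np.exp(-1j * alpha)]) / np.sqrt(2)
    # Rows of the reshaped state index qubit 1, columns index qubit 2.
    out = bra @ state.reshape(2, 2)
    prob = float(np.vdot(out, out).real)
    return prob, out / np.sqrt(prob)

def expected_output(psi_in, alpha, outcome):
    """Hadamard after diag(1, e^{-i alpha}), with an X byproduct for outcome 1."""
    out = HADAMARD @ (np.array([1, np.exp(-1j * alpha)]) * psi_in)
    return PAULI_X @ out if outcome == 1 else out

psi_in = np.array([0.6, 0.8j])  # arbitrary normalized test input
alpha = 0.7                      # arbitrary measurement angle
```

Both outcomes occur with probability 1/2 regardless of the input, which is the single-qubit version of the unbiased outcome statistics used later in the text.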
The key insight that we make use of is that, although MBQC performs a deterministic computation between a specific choice of input and output set, there are always multiple such choices for a given graph. Consequently, there are multiple possible information flows, which is a concept known as flow ambiguity [37]. These alternate flows give rise to computations with different structure, different number of logical qubits and different flow-dependent corrections, which effectively insert random Z-gates into the circuit, see Supplementary Material for details. Thus an MBQC implementing a specific computation can also be seen as providing the outcomes of a random set of other computations, each related to a unique computation in the circuit model. This provides a natural means for testing different devices against each other in a reliable fashion.
We now use this insight to generate MBQC-related sampling problems by converting a given MBQC with angles $\alpha$ into the circuit model for different choices of output sets with $n_O$ qubits. To illustrate this, consider the 6-qubit H-shaped graph of Fig. 2, where one choice of flow gives rise to the $n_O = 2$ qubit circuit $C_a$ (Fig. 2a), while a different choice gives rise to the $n_O = 3$ qubit circuit $C_b$ (Fig. 2b). The angles for the elementary single-qubit gates are determined by the measurement angles of the underlying MBQC, and potential additional randomization, see Supplementary Material for details. Our goal is now to relate the outputs of these very different computations.
In the MBQC picture, the measurement outcomes over non-output qubits occur with equal probability of $2^{-(n-n_O)}$ [38], while the probabilities for the output qubits (Pr) depend on the choice of measurement angles $\alpha$. Hence, there is no bias, and we can without loss of generality focus on the case of all-zero outcomes for the non-output qubits, where no flow-dependent corrections are necessary. We emphasize that this choice is arbitrary and merely affects the relation between the measurement angles of the two circuits, but has no effect on the success probability in the circuit picture, where these qubits do not exist. Hence, in the MBQC picture, the probability for obtaining all-zero outcomes using the flow corresponding to $C_a$ is $2^{-4}\,\mathrm{Pr}(0,0)_{C_a}$, where $\mathrm{Pr}(0,0)_{C_a}$ is the probability of obtaining zero outcomes when measuring only the 2 output qubits. Similarly, the probability for all-zero outcomes using the flow of $C_b$ is $2^{-3}\,\mathrm{Pr}(0,0,0)_{C_b}$. Since these two probabilities are obtained from the same graph, they must agree on shared outputs.
Using this fact, we find that the outcome probabilities obtained from the two circuits are related as $\mathrm{Pr}(0,0)_{C_a} = 2\,\mathrm{Pr}(0,0,0)_{C_b}$. Similarly, for the other output combinations we obtain $\mathrm{Pr}(0,1)_{C_a} = 2\,\mathrm{Pr}(0,0,1)_{C_b}$, $\mathrm{Pr}(1,0)_{C_a} = 2\,\mathrm{Pr}(0,1,0)_{C_b}$, and $\mathrm{Pr}(1,1)_{C_a} = 2\,\mathrm{Pr}(0,1,1)_{C_b}$. Note that the case where the remaining qubit of circuit $C_b$ is in state 1 can also be used for verification, but is related to a computation with different angles in circuit $C_a$, see Supplementary Material for more details.
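The same factor of two can be reproduced numerically on a smaller example. The sketch below uses a three-qubit chain rather than the H-shaped graph of Fig. 2, with hand-picked output sets {2} and {1, 2} (zero-based labels); these choices, the angles, and all function names are assumptions made for illustration. It computes the joint outcome distribution of the MBQC and rescales it by the uniform probability of the non-output outcomes to obtain the two circuit-level distributions.

```python
import numpy as np
from itertools import product

def graph_state(n, edges):
    """State vector of the graph state on n qubits (qubit 0 is the leftmost bit)."""
    psi = np.full(2**n, 2.0 ** (-n / 2), dtype=complex)
    idx = np.arange(2**n)
    for a, b in edges:
        both = ((idx >> (n - 1 - a)) & 1) & ((idx >> (n - 1 - b)) & 1)
        psi = psi * (1 - 2 * both)
    return psi

def outcome_probabilities(psi, n, angles):
    """Joint distribution when qubit k is measured in the XY-plane at angles[k]."""
    tensor = psi.reshape([2] * n)
    probs = {}
    for m in product([0, 1], repeat=n):
        amp = tensor
        for k in range(n):
            sign = 1 - 2 * m[k]
            bra = np.array([1, sign * np.exp(-1j * angles[k])]) / np.sqrt(2)
            # Always contract the current leading axis, i.e. the next qubit.
            amp = np.tensordot(bra, amp, axes=(0, 0))
        probs[m] = float(abs(amp) ** 2)
    return probs

angles = [0.3, 1.1, 2.0]  # arbitrary measurement angles
p = outcome_probabilities(graph_state(3, [(0, 1), (1, 2)]), 3, angles)

# Output set {2}: two non-output qubits, each all-zero branch has weight 2^-2.
pr_a = {b: 4 * p[(0, 0, b)] for b in (0, 1)}
# Output set {1, 2}: one non-output qubit, weight 2^-1.
pr_b = {(b1, b2): 2 * p[(0, b1, b2)] for b1 in (0, 1) for b2 in (0, 1)}
```

Both rescaled dictionaries sum to one, which reflects the unbiased outcomes on the non-output qubits, and `pr_a[b]` equals `2 * pr_b[(0, b)]` for each output bit `b`.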
The central observation here is that we can use this technique to establish a connection between the outcome probabilities from two quantum circuits with different width, depth, and structure, but with MBQC-related angles for the single-qubit gates. Implementing these circuits ($C_a$ and $C_b$ in the case of the presented experiments) on a single device provides a means for self-verification of the device, while implementing them on different devices provides a pathway to cross-validate the two devices. More generally, all output strings over shared output qubits can be related across circuits, as we describe in detail in the Supplementary Material. Moreover, one can randomize the output strings by adding a random multiple of $\pi$ to the measurement angles for the qubits in the output set. This would allow us to create two distinct, but related circuits such that the probability of obtaining particular (non-zero) strings as outputs is correlated.

Fig. 2 (caption fragment): The qubits are measured according to the order of the labelling numbers. In (a) the input state of the circuit $C_a$ is $|{+}{+}\rangle_{12}$, associated with qubits 1 and 2 of the cluster state, whereas in (b) the input state of the circuit $C_b$ is $|{+}{+}{+}\rangle_{251}$, associated with qubits 1, 2, and 5 of the cluster state. The qubit ordering in the circuit was chosen for more intuitive comparison and the detailed procedure for going from the graph and flow to the circuit can be found in the Supplementary Material. Note that both quantum circuits on the right correspond to the same MBQC graph state on the left, albeit with a different flow. The basic gate $\hat{J}(\alpha_i) = \hat{H}\hat{R}_z(\alpha_i)$ (for brevity $\hat{J}_i$ with $i = 1, \ldots, 6$) can be decomposed into a Hadamard gate $\hat{H}$ and a rotation $\hat{R}_z(\alpha)$ around the Z-axis of the Bloch sphere, see Supplementary Material for details.

CROSS-VERIFICATION
In order to formally turn this approach into a test of consistency between quantum devices, we now consider two quantum processors, implementing computations derived from the same MBQC, but with different output sets. For output sets of sizes $n_{O_1}$ (processor 1) and $n_{O_2}$ (processor 2), with $n_c$ qubits that are in both output sets, we are left with $n_v = n_{O_1} + n_{O_2} - n_c$ variable bits and thus $2^{n_v}$ different measurement strings $m$ that can be obtained from the two circuits. The output of each quantum processor is a subset of these $n_v$ qubits, allowing for a direct comparison of the devices on this larger space. We denote the vector of probabilities for obtaining the strings $m$ from the quantum circuit $C_j$ performed on the $j$th device (normalized as above) by $p_j$. We can now compare the two devices by computing the squared $\ell_2$-distance between the vectors $p_j$ (see Supplementary Materials for details):

$$\|p_1 - p_2\|_2^2 = p_1 \cdot p_1 + p_2 \cdot p_2 - 2\, p_1 \cdot p_2. \qquad (1)$$

The terms $p_j \cdot p_j$ in Eq. (1) are the probabilities of obtaining the same result twice when sampling from the $j$th device. These probabilities can be estimated from the minimum number of runs required to obtain a collision among output strings (i.e. obtain the same string twice). The MBQC picture then straightforwardly relates the probability for a collision within the $2^{n_v}$ possible strings $m$ to a collision within the $2^{n_{O_j}}$ strings obtained at the circuit output, see Supplementary Materials for details. Consequently, estimating $p_j \cdot p_j$ requires at most $O(2^{n_{O_j}/2})$ runs, independent of the probability distribution, due to a generalization of the birthday paradox [39]. Notably, the values $p_j \cdot p_j$ also provide a sanity check to detect systems that merely produce uniformly random samples, in which case $p_j \cdot p_j$ reaches a global minimum. In contrast, deep random quantum circuits are expected to lead to a Porter-Thomas distribution [40], resulting in values twice as large as in the uniform case. Finally, the term $p_1 \cdot p_2$ in Eq. (1) can be estimated in a similar way, as the minimum number of runs to obtain a collision between devices. This is achieved by randomly fixing the value of non-output qubits while sampling over the two devices, to which the same argument as above applies. Hence, while the exact scaling depends on the problem at hand, in cases where there is a significant number of output qubits in common between the instances ($n_c \sim n_{O_1}, n_{O_2}$), or where the output distribution for either computation is far from uniform ($\max p_j \gg 2^{-n_{O_j}}$), the quantity $\|p_1 - p_2\|_2^2$ provides a measure of similarity between the outputs of the devices, which can be estimated with exponentially fewer resources than conventional classical simulation techniques, which typically scale as $2^{n_{O_j}}$ [13,41].
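To make the sampling-based comparison concrete, the sketch below estimates the squared $\ell_2$-distance from two lists of measured bit strings by expanding it as $p_1 \cdot p_1 + p_2 \cdot p_2 - 2\, p_1 \cdot p_2$ and estimating each term from collisions. Note that it swaps in a simple pair-counting estimator (the fraction of coinciding sample pairs) rather than the minimum-runs-to-first-collision estimate described in the text, and it assumes that both sample lists have already been mapped onto the common space of $n_v$-bit strings. The function names are hypothetical.

```python
from collections import Counter

def self_collision(samples):
    """Unbiased estimate of p.p: fraction of ordered sample pairs that coincide."""
    n = len(samples)
    counts = Counter(samples)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

def cross_collision(samples_1, samples_2):
    """Estimate of p1.p2: fraction of cross-device sample pairs that coincide."""
    c1, c2 = Counter(samples_1), Counter(samples_2)
    # Counter returns 0 for strings never seen on the second device.
    return sum(c1[s] * c2[s] for s in c1) / (len(samples_1) * len(samples_2))

def squared_l2_distance(samples_1, samples_2):
    """Collision-based estimate of the squared l2-distance between two devices."""
    return (self_collision(samples_1) + self_collision(samples_2)
            - 2 * cross_collision(samples_1, samples_2))
```

For example, `squared_l2_distance(["00"] * 4, ["11"] * 4)` evaluates to 2.0, the maximum possible distance, while two identical deterministic sample lists give 0.0. With finite statistics the estimate can come out slightly negative, since each term is estimated separately.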

EXPERIMENTAL RESULTS
We experimentally performed MBQC-related 2- and 3-qubit circuits with different depth on five independent state-of-the-art quantum processors, covering four of the major quantum computing architectures, see Fig. 1. Using these systems, we experimentally implemented sampling instances for the six-qubit H-shaped graph shown in Fig. 2 by generating 200 random sets of angles $\{\alpha_i\}_{i=1}^{6}$, with $\alpha_i \in \{0, \pi/4, \pi/2, 3\pi/4, \pi, 5\pi/4, 3\pi/2, 7\pi/4\}$. We ran 2-qubit circuits of type $C_a$ on the Oxford and Innsbruck systems, 3-qubit circuits of type $C_b$ on the Innsbruck, IBM, and Rigetti systems and the 6-qubit H-shaped MBQC on the Vienna system.
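Generating such instances is straightforward; a minimal sketch is given below. The seed, the variable names, and the use of NumPy's random generator are assumptions, and the snippet does not check that the 200 angle sets are pairwise distinct.

```python
import numpy as np

# The eight allowed angles: multiples of pi/4 from 0 to 7*pi/4.
ALLOWED = np.arange(8) * np.pi / 4

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for reproducibility
# One row per instance, one column per qubit of the six-qubit graph.
angle_sets = rng.choice(ALLOWED, size=(200, 6))
```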
After taking into account the relation between the outputs of the different implementations and additional randomization (see Supplementary Material for details), we compute the squared $\ell_2$-distances of Eq. (1) between pairs of devices implementing different size computations for cross-verification. These values are averaged over 34 random instances, which is consistent with the results from the full data set, see Supplementary Information. Additionally, we compute the squared $\ell_2$-distance between the 2- and 3-qubit circuits implemented on the Innsbruck device for self-verification. Crucially, in either case we do not compare the output distributions to some "ideal theory" (e.g. from simulations), which would not be possible for future devices, but rather compare pairs of MBQC-related instances from different devices.
The key results of these comparisons are the estimated squared $\ell_2$-distances for each pair of devices performing computations of different size and depth, shown in Fig. 3a. A value close to 0, for example between Vienna and Oxford or Vienna and Innsbruck, indicates agreement between the devices, whereas any systematic error or statistical noise will lead to a larger value. For example, comparing an ideal 2-qubit circuit to a fully depolarized circuit for the instances considered in our experiments would return a value of $\sim 0.428$. In the limit of large deep random circuits, the squared $\ell_2$-distance between an ideal and fully depolarized circuit converges to $2^{-n_v}$. Of course, the noise in real experiments is much more complicated and the exact dependence of the squared $\ell_2$-distance on such physical noise models remains an interesting question for future research.
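The limiting value $2^{-n_v}$ quoted above can be checked in a few lines, under the assumption (taken from the discussion of collision probabilities in the cross-verification section) that a deep random circuit has a collision probability twice that of the uniform distribution. Here $N = 2^{n_v}$, $p$ denotes the ideal output distribution and $u$ the uniform distribution produced by a fully depolarized circuit; the symbols $N$ and $u$ are introduced only for this sketch.

```latex
\begin{align}
u \cdot u &= \sum_m \frac{1}{N^2} = \frac{1}{N}, \qquad
p \cdot u = \frac{1}{N} \sum_m p_m = \frac{1}{N}, \qquad
p \cdot p \approx \frac{2}{N}, \\
\|p - u\|_2^2 &= p \cdot p + u \cdot u - 2\, p \cdot u
\approx \frac{2}{N} + \frac{1}{N} - \frac{2}{N} = \frac{1}{N} = 2^{-n_v}.
\end{align}
```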
When more than two devices are used, one can furthermore make a prediction about the performance of the individual devices by averaging the squared $\ell_2$-distance with all other devices, see Fig. 3b. For the small number of qubits involved here, we can still compute the ideal output distribution of each circuit classically. Computing the $\ell_2$-distance with this ideal theory prediction quantifies the true accuracy of each device.
Although these values will not be available for future devices, we find that they qualitatively agree with the average squared $\ell_2$-distance over all experimental comparisons. This indicates that the latter provide a good estimate for the true system performance, while the values from individual comparisons accurately capture the relative performance of the devices, thus enabling verification of the underlying quantum processors.

Fig. 3 (caption fragment): Also shown are the squared $\ell_2$-distances of each device against the (non-scalable) theory prediction (red). These two quantities are not expected to coincide. However, they show some qualitative agreement, in that arranging devices according to either metric yields the same order in our experiments. Averages are taken over the squared $\ell_2$-distances between each device and all other devices, not just the ones in a), in order to avoid bias.
Besides cross-verification between dissimilar quantum devices, our method also provides an intriguing pathway towards self-verification of a single device. Using the Innsbruck trapped-ion system, we implemented MBQC-related instances of the 2- and 3-qubit circuits $C_a$ and $C_b$ of Fig. 2, and evaluated the squared $\ell_2$-distance between their outputs. This result indicates very good (relative to the results in Fig. 3a) agreement between the two circuits, which is confirmed by direct comparison to theory. Notably, by virtue of sampling from multiple instances of vastly different circuits, even systematic errors would be detected, as they manifest very differently in the two circuits. This demonstrates that our method can be used for independent verification of a single quantum processor.

VERIFICATION FOR NISQ DEVICES
In order to gain some insight into the measured $\ell_2$-distances, correlation plots between the MBQC-related outcomes on all pairs of devices computing different size and depth circuits are shown in Fig. 4. We emphasize that it is not scalable to produce such plots for larger computations, as they require sampling the full output distribution for multiple computations. However, for noisy intermediate-scale quantum (NISQ) devices, this remains feasible and provides useful additional information to aid the interpretation of the $\ell_2$-distances used for scalable cross-verification in Fig. 3.
While these correlation plots only provide a crude indication of the strength and direction of the correlations between the devices, there are some notable features. In the ideal case, one expects the two MBQC-related circuits to produce identical outcome probabilities, resulting in clustering around the 45° line. On the other hand, depolarizing noise affecting the device on the vertical (horizontal) axis would result in a mean correlation with slope smaller (larger) than 1. Large scattering around the mean further indicates higher levels of noise in one or both devices, since each point corresponds to a different output or instance of the computation. In addition, Fig. 5 shows correlation plots of the experimental outcomes per individual device against the respective ideal theory prediction obtained via direct simulation of the corresponding circuits. Fig. 6 shows correlation plots of experimental outcomes between the 2- and 3-qubit circuits, respectively $C_a$ and $C_b$, performed on the Innsbruck device for self-verification.
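The effect of depolarizing noise on the slope can be illustrated with a toy model. The sketch below assumes a global depolarizing channel that mixes an output distribution with the uniform one with weight `lam`; this noise model, the randomly drawn test distribution, and all names are assumptions for illustration and are not meant to describe any of the devices.

```python
import numpy as np

def depolarize(p, lam):
    """Mix distribution p with the uniform distribution with weight lam."""
    return (1 - lam) * p + lam / p.size

rng = np.random.default_rng(1)
p_ideal = rng.dirichlet(np.ones(8))   # arbitrary 3-qubit output distribution
p_noisy = depolarize(p_ideal, 0.3)

# Noisy device on the vertical axis, ideal device on the horizontal axis.
slope, intercept = np.polyfit(p_ideal, p_noisy, 1)
```

In this model the fitted slope equals `1 - lam`, i.e. 0.7 here, and the intercept equals `lam / 8`. Swapping the two axes gives the reciprocal slope, which is larger than 1.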

DISCUSSION
As quantum processors start to surpass their classical counterparts, verification by direct comparison to theory will no longer be an option. The technique we present here provides a feasible alternative by validating quantum devices against each other. While not providing a complete toolkit for characterization of individual quantum processors, our method takes a crucial step away from the dependence on classical methods. By sampling from different physical devices implementing circuits that differ in the number of qubits, depth and structure, our method is robust against systematic, as well as statistical errors. By implementing these dissimilar circuits on a single device, our method also provides an avenue towards internal self-verification of single quantum devices.
A particularly intriguing feature of our approach is the way in which it allows us to compare devices using radically different implementations. Recently, a detailed comparison of a trapped ion system and a superconducting processor highlighted the advantages of each system for certain, identical problems [42], concluding that each processor was well suited to different tasks. In this work we overcome the heterogeneity of quantum physical systems, introducing a verification model which links computational circuits with different sizes and depths, and consequently can be run on many types of quantum computer. The building block of our cross-check scheme is measurement-based quantum computation, which has already proven essential for quantum computation security [43], quantum error correction [44], as well as quantum simulation [45]. This will prove useful in providing consistent benchmarks across the increasingly diverse range of quantum processors.

Competing interests: The authors declare no competing interests. None of the affiliated commercially-oriented companies have been partners or collaborators in the context of this scientific work.

Data and materials availability: Correspondence and requests for materials should be addressed to JFF (email: joe@horizonquantum.com).

Supplementary Information: Cross-verification of independent quantum devices
Here we provide full details on the conversion between MBQC and circuit model computations for different choices of flow. We illustrate this using an explicit example from the main text and also provide complementary results on other graph states. Finally, we discuss some additional experimental details of the single-photon implementation.
EXPERIMENTAL METHODS
NMR (Oxford) experiments [7] were performed on a Varian Unity Inova spectrometer with a nominal $^{1}$H frequency of 600 MHz using a H{CN} probe with a single pulsed field gradient. The NMR sample comprised $^{13}$C-labelled sodium formate dissolved in D$_2$O at 25 °C, providing a heteronuclear two-spin system. With both spins on resonance, the Hamiltonian took the form of a spin-spin ZZ coupling of 194.7 Hz, and the $B_1$ field strengths were measured to give nutation rates of approximately 25 kHz for $^{1}$H and 17 kHz for $^{13}$C.
Pseudo-pure two-qubit states were prepared using the method of Ref. [46]. Single-qubit rotations in the XY-plane were implemented using simple pulses, while two-qubit rotations were implemented as delays. Fixed Z-rotations were implemented as frame rotations [47] which were propagated through the pulse sequence [48] to points where they could be dropped. The variable small-angle Z-rotations were implemented using a pair of $\pi$ pulses with phases separated by $\theta/2$, with the phase of the first pulse chosen to partially cancel with the preceding Hadamard gate.
At the end of the algorithm a crush gradient was applied to project the density matrix onto the computational basis, and the $^{1}$H NMR spectrum was observed after a $\pi/2$ pulse. NMR spectra were processed using custom software and the intensities of the two components of the $^{1}$H doublet were determined by integration and normalized to a reference spectrum. Corresponding measurements on the second qubit were performed by repeating the experiment with the reverse assignment of qubits to physical spins. From the collection of these measurements the populations of the computational basis states can be estimated. Due to imperfect calibration these populations do not quite sum to one, and some can be slightly negative. This was resolved by subtracting the most negative population found in any group of experiments from all the populations in that group, and then normalizing the populations for each experiment.
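The population correction described above can be sketched as follows. The array layout (one row per experiment, one column per computational basis state), the function name, and the choice to apply no shift when a group contains no negative population are assumptions; the custom software used for the experiments is not reproduced here.

```python
import numpy as np

def correct_populations(group):
    """Shift a group of raw populations to be non-negative, then normalize each row.

    group: array-like of shape (n_experiments, n_basis_states).
    """
    group = np.asarray(group, dtype=float)
    # Most negative population found anywhere in the group (assumed 0 if none).
    most_negative = min(group.min(), 0.0)
    shifted = group - most_negative
    # Normalize the populations of each experiment separately.
    return shifted / shifted.sum(axis=1, keepdims=True)

raw = [[0.50, 0.30, 0.25, -0.05],
       [0.40, 0.30, 0.20, 0.10]]
corrected = correct_populations(raw)
```

In this example both rows are shifted by 0.05 before normalization, so the single negative entry becomes exactly zero and every row sums to one.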
Photons (Vienna) experiments are based on the generation of the maximally-entangled six-qubit H-shaped cluster state. Three polarization-entangled pairs of photons are produced via three identical Sagnac-PPKTP pulsed down-conversion sources and later entangled by using partial fusion gates at polarizing beam splitters [49,50]. The qubits are encoded by the polarization of the six photons.
The laser repetition rate is set to 152 MHz, by doubling the original rate with a passive multiplexing scheme [51], to reduce multi-photon noise for an average power of 220 mW per source. The pump photons have a wavelength of 772.9 nm and a pulse-width of 2.1 ps. The crystals' temperature is stabilized at 24 °C. The single-qubit measurements are implemented with an optics tomographic unit of three motorized waveplates and a polarizing beam splitter per photon. Twelve multi-element superconducting-nanowire single-photon detectors, composed of 4 channels each and kept at T = 0.9 K, enable pseudo-number-resolving detection, with an average quantum efficiency of 0.87. Due to technical problems, two of the multi-element detectors had to be later replaced with two single-element detectors. A customized time tagging and logic module for 48 input channels counts the six-fold photon events. After postselection we obtain a total six-fold coincidence rate of 0.08 Hz. The purity of the single photons, measured with four-fold HOM interference, is 0.94 [29]. We characterize the six-photon cluster state by using a subset of stabilizer operators, the so-called identity products [52], giving a lower bound on the state fidelity of $F_{\mathrm{exp}} \geq 0.64 \pm 0.04$, and by using a technique based on a probabilistic protocol for entanglement detection [50,53], estimating a fidelity of $0.75 \pm 0.06$ (see SM).
Superconducting (IBM and Rigetti) qubits are used independently via the two cloud-accessible quantum processors: the ibmqx2 (also known as IBM Q 5 Yorktown) from IBM [27,54] and the Rigetti 19Q from Rigetti [28,55]. Both devices use transmon qubits, i.e. charge qubits that are insensitive to charge noise thanks to an additional large capacitor in the circuit. The two devices differ in circuit wiring, readout, and fabrication materials: the ibmqx2 has a star-shaped connected circuit, based on fixed-frequency transmons [56], with three qubits available as control qubits, whereas the Rigetti 19Q has tunable-frequency transmon qubits [57,58], each coupled to three fixed-frequency transmon qubits.
Trapped ions (Innsbruck) experiments are performed with qubits encoded in the electronic states of a string of $^{40}$Ca$^{+}$ ions confined in a linear Paul trap [8]. Each ion encodes a qubit in the ground state $S_{1/2}(m = -1/2) = |1\rangle$ and the meta-stable state $D_{5/2}(m = -1/2) = |0\rangle$, which determines the qubit lifetime of $\sim 1$ s. Single-qubit Z-rotations are implemented via Stark shifts induced by tightly focused laser beams, while collective rotations around any equatorial axis of the Bloch sphere are achieved by resonant illumination of the whole ion string. Entangling operations are implemented via global Mølmer-Sørensen interactions using a bi-chromatic laser field [59]. Local gates as well as two-qubit entangling gates achieve fidelities greater than 99% and operate on a timescale of 20-30 µs (80 µs for entangling gates), much faster than the coherence time, which is on the order of $\sim 100$ ms and dominated by laser phase noise. Every run of the experiment consists of Doppler and sideband cooling of the ion string, followed by a gate sequence, and finally projection onto the computational subspace via fluorescence detection on the $P_{1/2}$-$S_{1/2}$ transition with a CCD camera. One such run takes $\sim 15$ ms and each experiment is repeated at least 100 times to gather statistics.

PHOTONIC H-SHAPED CLUSTER STATE CHARACTERIZATION
We first characterize the six-photon cluster state using a technique based on subsets of stabilizer operators, referred to as Identity Products (ID) [52]. The method exploits the entanglement of the operators to obtain a lower bound on the fidelity of the state and a proof of a Bell-type inequality with a minimal number of measurement settings. There exist a large, unquantified number of equivalent minimal subsets of stabilizers for the 6-qubit states. Here we repeat the characterization procedure with two equivalent ID sets, composed of 7 measurements: where we have omitted the tensor product symbols for compactness. From the ID measurements we extract an ID-Bell parameter $\alpha^{\mathrm{ID}}_{\mathrm{exp}}$, where $\alpha^{\mathrm{ID}}$ =
The non-ideal results are mainly due to the unbalanced losses present at the polarizing beam splitter stage, the imperfect polarization compensation along the single-mode fibers connecting the three sources, and the non-unity purity of the single photons.
Furthermore we follow a probabilistic protocol for entanglement detection [50,53] in order to estimate the experimental fidelity of the state. This method entails a significant reduction of resources, that is, it needs in our case only a very low number of detection events (around 100) to verify the presence of entanglement in our cluster state with more than 99% confidence. We obtain a fidelity of 0.75 ± 0.06, which is comparable to fidelities obtained in state-of-the-art photonic experiments [60]. More details about the cluster state characterization can be found in Ref. [50].
The underlying graph for the MBQC pattern can be constructed by decomposing a generic unitary computation on a fixed initial state, $|+\rangle^{\otimes N}$, in terms of $\hat{J}(\alpha)$ gates and $\widehat{CZ}$ entangling gates. For each $\hat{J}(\alpha)$ gate we add a vertex, and draw an edge to connect this vertex to the vertex that represents the preceding $\hat{J}(\alpha)$ gate as dictated by the circuit. This is done recursively, hence creating $N$ wires, which represent the unitary evolution of each initial qubit state. The last step is drawing an edge for each $\widehat{CZ}$ gate, by connecting the two vertices representing the $\hat{J}$ gates that immediately follow the $\widehat{CZ}$ gate in the quantum circuit representation. These few steps give us the adjacency matrix of a graph $G = \{V, E\}$, with vertex set $V$ and edge set $E$. The cardinality of the vertex set is $|V| = N + M$, where $M$ is the total number of $\hat{J}(\alpha)$ gates in the circuit.
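The construction above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the gate encoding (`("J", wire, alpha)` and `("CZ", wire_a, wire_b)` tuples) and the function name `mbqc_adjacency` are hypothetical, and we read "the vertices representing the $\hat{J}$ gates that immediately follow the $\widehat{CZ}$ gate" as the vertices sitting at the current end of the two wires when the $\widehat{CZ}$ is applied.

```python
import numpy as np

def mbqc_adjacency(n_wires, gates):
    """Adjacency matrix of the MBQC graph for a circuit of J and CZ gates.

    gates: sequence of ("J", wire, alpha) or ("CZ", wire_a, wire_b) tuples
    in circuit order (a hypothetical encoding chosen for this sketch).
    """
    n_j = sum(1 for g in gates if g[0] == "J")
    size = n_wires + n_j                      # |V| = N + M
    adj = np.zeros((size, size), dtype=int)
    current = list(range(n_wires))            # vertex at the end of each wire
    next_vertex = n_wires
    for gate in gates:
        if gate[0] == "J":
            wire = gate[1]                    # the angle does not affect the graph
            prev = current[wire]
            adj[prev, next_vertex] = adj[next_vertex, prev] = 1
            current[wire] = next_vertex       # extend the wire by one vertex
            next_vertex += 1
        elif gate[0] == "CZ":
            a, b = current[gate[1]], current[gate[2]]
            adj[a, b] ^= 1                    # toggle: two CZs on a pair cancel
            adj[b, a] ^= 1
        else:
            raise ValueError(f"unknown gate type: {gate[0]!r}")
    return adj

# Two wires, two J gates and one CZ: |V| = 2 + 2 = 4 vertices, 3 edges.
example = mbqc_adjacency(2, [("J", 0, 0.3), ("CZ", 0, 1), ("J", 1, 0.7)])
print(example)
```

The measurement angles are carried in the gate list only for completeness; they label the vertices of the pattern but play no role in the adjacency matrix.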

CIRCUITS FOR THE 6-QUBIT H-SHAPED CLUSTER
We consider the two quantum circuits shown in Fig. 2 of the main text, associated with the six-qubit H-shaped cluster state. The unitary evolutions of the two g-flows correspond to: in a circuit with more qubits, this is implicitly understood as acting on the first set of qubits, unless a subscript indicates which qubits are acted on. The angles $\alpha$ can be randomly chosen within a specific set. The relationships between MBQC-related outcomes of the circuits $C_a$ and $C_b$ are $\Pr(1, 0)_{C_a} = 2 \cdot \Pr(0, 1, 0)_{C_b}$, where the labels of the outcomes are $\Pr(b_5, b_6)_{C_a}$, and
The MBQC protocols can be expressed in terms of the stabilizer formalism [62]. A graph state is invariant under stabilizer operations: Given a graph state on $n$ qubits $|G\rangle = \left(\prod_{G} \widehat{CZ}\right)|+\rangle^{\otimes n}$ we have: We can rewrite the computation by applying a stabilizer operator on each vertex of the graph state. We consider the stabilizers in their most general form, not restricting to the Pauli group, and a random bit-string $k = \{k_i\}_{i=1}^{6}$, $k_i \in \mathbb{Z}_2$, associated with the six stabilizers. Then the measurement angles $\alpha$ can be rewritten as and can be used to mask the real outcomes of the computation. Note that finding these relations, and thus identifying MBQC-related sampling problems, is computationally efficient because of the graph structure of the problem.
The strings we have to compare are the following By checking the outcomes above, we confirm the correctness of the relations.

THE SQUARED $\ell_2$-DISTANCE
The central figure of merit for scalably comparing MBQC-related circuits is the squared $\ell_2$-distance introduced in the main text, Eq. (1). In general, the numbers of output qubits $n_{O_1}$ and $n_{O_2}$ for the two circuits will differ, and a subset of $n_c$ of these qubits will be in the output set of both computations. For example, the circuits in Fig. S2 have $n_{O_a} = 2$ and $n_{O_b} = 5$ output qubits, respectively, and $n_c = 1$ qubit that is in both output sets. There are therefore $n_v = n_{O_a} + n_{O_b} - n_c = 6$ qubits (in the underlying MBQC) that must be considered for estimating the squared $\ell_2$-distance in Eq. (1) in the main text. Expanding the square, $\|p_1 - p_2\|_2^2 = p_1 \cdot p_1 + p_2 \cdot p_2 - 2\, p_1 \cdot p_2$, and taking the expectation value, we can estimate the three resulting terms independently.
A crucial observation is that, due to a generalization of the birthday paradox [39], the term $p_j \cdot p_j$ is related to the probability of obtaining the same output string from the $j$th device twice. Note that from the experiment we obtain strings of length $n_{O_j} < n_v$. Using the fact that in the MBQC picture the outcomes of non-output qubits are uniformly random, the probability of a collision between the experimental strings of length $n_{O_j}$ can trivially be related to the probability of a collision between the strings of length $n_v$ in the underlying MBQC. This requires a number of runs that scales as $O(2^{n_{O_j}/2})$.
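The relation between the two collision probabilities can be written out under one simplifying assumption, which is ours rather than a statement from the text: the $k = n_v - n_{O_j}$ non-output bits $s$ are uniform and independent of the output string $o$, so that the $n_v$-bit distribution factorizes as $2^{-k} q_j(o)$. Here $q_j$ is our notation for the measured distribution over output strings of length $n_{O_j}$.

```latex
\begin{align}
p_j \cdot p_j
  &= \sum_{s \in \{0,1\}^{k}} \sum_{o \in \{0,1\}^{n_{O_j}}}
     \left( 2^{-k}\, q_j(o) \right)^{2} \\
  &= 2^{k} \cdot 2^{-2k} \sum_{o} q_j(o)^{2}
   = 2^{-(n_v - n_{O_j})}\; q_j \cdot q_j .
\end{align}
```

Under this assumption the collision probability of the padded strings is the measured collision probability rescaled by a known power of two, so no additional sampling is needed for the self-collision terms.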
Estimating the term $p_1 \cdot p_2$, on the other hand, requires us to consider collisions among the strings of length $n_v$. This is achieved by randomly fixing the values of non-output qubits for either circuit (which corresponds to running different computations due to MBQC-corrections for non-zero outcomes) for each sampling run. In the example of Fig. S2, circuit a requires randomly fixing 4 of the 6 output qubits and measuring the others, while circuit b requires fixing 1 of the 6 output qubits and measuring the rest. Using this technique, one can estimate the probability for a collision among the $2^6$ possible strings $m$ in the underlying MBQC between the two devices with a number of runs that scales as $O(2^{(n_{O_1}+n_{O_2}-n_c)/2})$.
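As an illustration of collision-based estimation, the sketch below computes the three terms of the squared $\ell_2$-distance from lists of sampled bit strings of equal length $n_v$. The text does not specify which estimator was used; the pair-counting estimators here are one standard unbiased choice, and the function names are hypothetical.

```python
from collections import Counter

def self_collision(samples):
    """Estimate p.p as the fraction of unordered pairs of runs that coincide."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    counts = Counter(samples)
    colliding_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    return colliding_pairs / (n * (n - 1) / 2)

def cross_collision(samples_1, samples_2):
    """Estimate p1.p2 as the fraction of cross-device pairs that coincide."""
    c1, c2 = Counter(samples_1), Counter(samples_2)
    hits = sum(c1[m] * c2[m] for m in c1.keys() & c2.keys())
    return hits / (len(samples_1) * len(samples_2))

def squared_l2_distance(samples_1, samples_2):
    """Estimate |p1 - p2|^2 = p1.p1 + p2.p2 - 2 p1.p2 from samples."""
    return (self_collision(samples_1)
            + self_collision(samples_2)
            - 2 * cross_collision(samples_1, samples_2))

# Toy data: bit strings of length n_v from two hypothetical devices.
device_1 = ["00", "00", "01", "01"]
device_2 = ["00", "00", "00", "00"]
print(squared_l2_distance(device_1, device_2))
```

Because each term is estimated separately, the combined estimate can come out slightly negative for small sample sizes even though the true distance is non-negative.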

SUB-SAMPLING
The data presented in the main text were obtained by averaging the squared $\ell_2$-distances obtained from 34 out of 200 randomly chosen instances of the related circuits. Averaging over more instances would increase the confidence in the final estimate, but comes with an additional resource overhead. To investigate this trade-off, we estimate the squared $\ell_2$-distances from subsets of varying sizes up to the full dataset. The results, shown in Fig. S3, indicate quick convergence to the mean value over the full dataset. In particular, the choice of 34 instances is sufficient to clearly distinguish the different devices.
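A sub-sampling analysis of this kind can be sketched as follows. This is a minimal illustration, not the procedure used to produce Fig. S3: the function name, the number of random draws per subset size, and the choice to report the spread of the subset means are all assumptions.

```python
import random
import statistics

def subsample_means(distances, sizes, trials=200, seed=0):
    """For each subset size, draw random subsets of the per-instance
    squared l2-distances and summarize the resulting subset means.

    Returns {size: (mean of subset means, std. dev. of subset means)}.
    """
    rng = random.Random(seed)                 # fixed seed for reproducibility
    summary = {}
    for k in sizes:
        means = [statistics.fmean(rng.sample(distances, k))
                 for _ in range(trials)]
        summary[k] = (statistics.fmean(means), statistics.pstdev(means))
    return summary

# Placeholder values standing in for per-instance distances.
toy_distances = [0.10, 0.20, 0.30, 0.40]
print(subsample_means(toy_distances, sizes=[2, 4], trials=50, seed=1))
```

The spread of the subset means shrinks as the subset size approaches the full dataset, which is the convergence behaviour the analysis is meant to expose.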

COMPLEMENTARY RESULTS
Here we report data related to the evaluation of quantum circuits equivalent to closed lattice cluster states, without performing the respective measurement-based quantum computation. Specifically, we consider the closed 2D cluster states involving 8 and 10 qubits shown in Fig. S2. We refer to those as Box Cluster 2x4 and Box Cluster 2x5, respectively, where 2xj (j = 4, 5) denotes the height and width of the cluster.
Different types of circuits are performed on pairs of quantum devices. The following table reports, for all MBQC-related devices, the specifications of the implemented circuits (input qubits and circuit depth). Note that the 5x2 cluster was measured using a different IBM device, namely the ibmqx3 [63] (IBM Q 16 Rueschlikon), which we refer to as IBM$_2$. This device has similar specifications to the first IBM quantum processor used here, but allows for computations with up to 16 qubits.
We ran 100 $C_{2\times 4}$ circuits on the Oxford and Innsbruck machines, with the 100 MBQC-related $C_{4\times 2}$ circuits run on the IBM processor. For the 2x5 case we ran 100 $C_{2\times 5}$ circuits on the Oxford machine and the 100 MBQC-related $C_{5\times 2}$ circuits on the IBM processor. In each case, pair-wise cross-check verification was performed between all devices, as well as individual comparisons to theory. Scatterplots of the outcome probabilities are shown in Fig. S4 and all relevant numerical values are given in the caption of that figure. As in the main text, we find that the squared $\ell_2$-distance provides a very good estimate of the true performance of the devices, in agreement with the theory simulation.