Efficient Long-Range Entanglement using Dynamic Circuits



I. INTRODUCTION
Quantum systems present two distinct modes of evolution: deterministic unitary evolution, and stochastic evolution as the consequence of quantum measurements. To date, quantum computations predominantly utilize unitary evolution to generate complex quantum states for information processing and simulation. However, due to inevitable errors in current quantum devices [1], the computational reach of this approach is constrained by the depth of the quantum circuits that can realistically be implemented on noisy devices. The introduction of non-unitary dynamic circuits, also called adaptive circuits or LAQCC (local alternating quantum classical computation) circuits [2], can not only implement more general quantum channels, but may also be able to overcome some of these limitations by employing mid-circuit measurements and feedforward operations. As classical computation and communication are viewed as essentially free compared to quantum operations, such conditional operations are a necessary ingredient for quantum error correction (see, e.g., Ref. [3]). In the near term, dynamic circuits present a promising avenue for generating long-range entanglement, a task at the heart of quantum algorithms [4,5]. This includes both the implementation of long-range entangling gates that, due to local connectivity among the qubits in many quantum platforms, can require deep unitary quantum circuits, and the preparation of many-qubit entangled [6,7] and topologically ordered quantum states [8-16].
From a physical standpoint, the entanglement needs to propagate across the entire range between the qubits. Given that the entanglement cannot spread faster than its information light cone [17,18], entangling two qubits that are a distance n apart requires a minimum two-qubit gate depth that scales as O(n), and even when assuming all-to-all connectivity, the generation of entanglement over n qubits necessitates a minimum two-qubit gate depth of O(log n). Thus, the task becomes challenging when applying only unitary gates. Using dynamic circuits, the spread of information can be mostly conducted by classical calculations, which can be faster and of higher fidelity than the quantum gates, and long-range entanglement can be created in a shallow quantum circuit [19-21], i.e., the depth of quantum gates is constant for any n.

* eba@zurich.ibm.com
While dynamic circuits have been explored in small-scale experiments [22-26], only recently have there been experimental capabilities on large-scale quantum devices. However, most demonstrations (with the exception of, e.g., Refs. [27-30]) have utilized post-selection [31] or post-processing [32,33] instead of feed-forward to prepare entangled states. Such approaches enable the study of properties of the state prepared in isolation, but have limited applicability when the state preparation is part of a larger quantum information processing task.
Here, we explore the utility of shallow dynamic circuits for creating long-range entanglement on large-scale superconducting quantum devices. In Section II, we demonstrate an advantage with dynamic circuits by teleporting a long-range entangling CNOT gate over up to 101 locally connected superconducting qubits. We also discuss how this approach can be generalized to more complex gates, such as the three-qubit Toffoli gate. Then, in Section III, we prepare a long-range entangled state, the GHZ state [6], with a dynamic circuit. We show that, with a composite error mitigation stack customized for the hardware implementation of dynamic circuits, we can prepare genuinely entangled GHZ states, but fall short of the state-of-the-art system sizes achieved with unitary gates due to hardware limitations. Based on our error budget calculation, we predict conditions under which dynamic circuits should be advantageous over unitary circuits.

II. CNOT GATE TELEPORTATION
The limited connectivity between qubits in many quantum computational platforms can result in the compilation of nonlocal unitary circuits into deep and error-prone unitary circuits. A potential solution is the use of shallow dynamic circuits. The crucial ingredient for such protocols is a long-range CNOT gate from the first to the nth qubit, as shown on the left in Fig. 1(a). In the following, we demonstrate a regime in which dynamic circuits enable higher-fidelity long-range CNOT gates via gate teleportation. We first describe the dynamic circuit and compare it to its equivalent unitary counterpart. We argue, using a simple error budget, that there exists a regime in which the dynamic circuit implementation has an advantage over the unitary one; see Fig. 1(b). Then, using up to 101 qubits on a superconducting processor, we demonstrate a crossover in the fidelity of CNOT gate teleportation, where dynamic circuits perform better for entangling qubits over longer ranges; see Fig. 1(c). This gate teleportation scheme enables an effective all-to-all connectivity in devices with a more limited connectivity, such as those on a heavy-hexagonal lattice. By using some of the qubits as ancillas for measurement and classical feed-forward operations, the ancilla qubits form a bus that connects all system qubits with each other. Therefore, by sacrificing some of the qubits in a large device with limited connectivity, we gain effective access to an all-to-all connected device with fewer qubits; see Fig. 1(d). As this effective all-to-all connectivity limits the parallelization of gates, the orange system qubits could be sacrificed as ancilla qubits as well to further parallelize gate execution with increased connectivity. In addition, a clever compilation could increase parallelization, as shown, e.g., in Appendix E in Fig. 10, where a long-range CCZ gate could be implemented with two feed-forward operations rather than teleporting all six CNOT gates separately.
We describe the dynamic circuit for CNOT gate teleportation, shown on the right in Fig. 1(a) and derived in Appendix A 1. Importantly, this dynamic circuit can be straightforwardly extended to any number of qubits n (where n is the number of ancillas) such that the depth remains constant for any initial states $|\phi_1\rangle$ ($|\phi_2\rangle$) of the control (target) qubit. We expect the error to be dominated by the n mid-circuit measurements, the n + 1 CNOT gates parallelized over 2 gate layers, and the idle time, which is mostly spent waiting for the classical feed-forward. Note that in this particular realization, each of the n ancilla qubits between the two system qubits must be in the state |0⟩. Therefore, during the course of the gate teleportation, the ancillas cannot also be used as memory qubits, further motivating the division of qubits into system and sacrificial ancilla qubits in Fig. 1(d).
We also present an equivalent, low-error unitary counterpart in the middle of Fig. 1(a). (In Appendix B, we propose several different unitary implementations of the long-range CNOT gate. Based on experimental results, as well as the noise model described in Appendix G that gives rise to the error budget described in Appendix B 2, we select this one.) In this unitary realization, the system qubits are connected by a bus of ancilla qubits that are initialized in and returned to the |0⟩ state, just as in its dynamic counterpart. In our particular compilation, throughout the execution of the circuit, qubits that are not in the $|\phi_1\rangle$ or $|\phi_2\rangle$ state are in the |0⟩ state. Doing so minimizes both decoherence and cross-talk errors intrinsic to our superconducting qubit design: heuristically, we learned that the noise affecting our qubits is primarily limited to amplitude damping, dephasing, and ZZ crosstalk errors on neighboring qubits, which implies essentially no idling errors on qubits in the |0⟩ state. Therefore, relative to the dynamic version, there is no error due to idle time or mid-circuit measurements, although there are ∼4 times more CNOT gates.
A summary of the error budgets for the dynamic and unitary circuits is given in Fig. 1(b). Based on this table, we expect that dynamic circuits should be advantageous over unitary circuits if the additional n mid-circuit measurements in the dynamic circuit introduce less error than the 3n extra CNOT gates in the unitary circuit, assuming n is large enough that the idling error µ incurred during measurement and classical feedforward in the dynamic circuit is relatively small. Importantly, these error analyses consider only the gate error on the two respective qubits, not the error introduced on other qubits, which we expect to be much larger in the unitary case due to the linear depth. Thus, the constant-depth dynamic circuit might be even more advantageous than what we can see from the gate fidelity.
To determine the experimental gate fidelity, let our ideal unitary channel be $\mathcal{U}(\rho) := U\rho U^\dagger$ and its noisy version be $\tilde{\mathcal{U}}(\rho) := \mathcal{U}(\Lambda(\rho))$, where Λ is the effective gate noise channel and ρ is a quantum state. The average gate fidelity of the noisy gate is

$$F_{\mathrm{avg}}(\mathcal{U}, \tilde{\mathcal{U}}) = \int d\psi\, \mathrm{Tr}\left[\mathcal{U}(\rho_\psi)\,\tilde{\mathcal{U}}(\rho_\psi)\right],$$

where the Haar average is taken over the pure states $\rho_\psi = |\psi\rangle\langle\psi|$. This fidelity can be faithfully estimated from Pauli measurements on the system, using Monte Carlo process certification [34,35], as detailed in Appendix C 2.
The results from a superconducting quantum processor are shown in Fig. 1(c). The implementation details can be found in Appendix D 1. As expected, for a small number of qubits n ≲ 10 the unitary implementation yields the best fidelities. However, for increasing n it converges much faster to the fidelity of a random gate (0.25) than the dynamic circuit implementation, which converges to a value slightly below 0.4. These results align well with the error budget analysis in Appendix B 2 and the noise model predictions depicted in Appendix G. Note that, in the limit of large n, the fidelities of the measurement-based scheme are limited by the Z and X corrections on $|\phi_1\rangle$ and $|\phi_2\rangle$ (see Fig. 1(a)). A straightforward derivation using this noise model shows that the minimum possible process fidelity due to only incorrect Z and X corrections (without the fixed infidelity from the idle time and CNOT gates) is 0.25, which converts to a gate fidelity of 0.4.
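The two limiting values quoted above can be checked numerically. The sketch below assumes the standard relation between average gate fidelity and process fidelity for a d-dimensional system, F_avg = (d F_proc + 1)/(d + 1), and additionally assumes that a completely random gate has process fidelity 1/d²; the function name is illustrative only.

```python
def gate_fidelity_from_process(f_proc: float, d: int) -> float:
    """Average gate fidelity for a d-dimensional system, given the process fidelity."""
    return (d * f_proc + 1.0) / (d + 1.0)


d = 4  # Hilbert-space dimension of a two-qubit gate

# Process fidelity 0.25 (only the Z/X corrections fail) -> gate fidelity 0.4
f_corrections = gate_fidelity_from_process(0.25, d)

# Assumed process fidelity 1/d^2 of a random gate -> gate fidelity 0.25
f_random = gate_fidelity_from_process(1.0 / d**2, d)

print(f_corrections, f_random)
```

Both plateaus seen in the data (slightly below 0.4 for dynamic circuits, 0.25 for the unitary circuit) are therefore consistent with this conversion.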
FIG. 1. (b) Error model inputs for unitary, measurement-based, and dynamic-circuit CNOT protocols comprise the total number of: non-zero idle-block times, CNOT gates, and additional measurements. (c) Experimental results, where dynamic circuits offer improved fidelity for CNOT gate teleportation across a qubit chain ≳ 10 qubits. (d) Map of a 127-qubit heavy-hexagonal processor, ibm_sherbrooke, overlaid with system configurations for long-range gate teleportation across a locally connected bus. To establish an effective all-to-all connectivity, we show one possible strategy of dividing the qubits into system (purple and orange) and sacrificial ancilla (turquoise and blue for extra connections) qubits. To parallelize gate execution with increased connectivity, orange qubits can be used as ancillas. We show how a particular long-range CNOT can be implemented through an ancilla bus marked as turquoise spins.

The measurement-based protocol with post-processing performs slightly better than the dynamic circuits, as the former does not incur errors from the classical feed-forward. This allows us to isolate the impact of classical feed-forward from other errors, such as the n + 1 intermediate CNOT gates and mid-circuit measurements. Note, however, that the post-processing approach is generally not scalable if further circuit operations follow the teleported CNOT, due to the need to simulate large system sizes, further emphasizing the advantage of dynamic circuits as errors rooted in classical feedforward are reduced. Overall, we find that CNOT gates over large distances are more efficiently executed with dynamic circuits than with unitary ones.
In Appendix E we show that these ideas can be generalized to teleporting multi-qubit gates, such as the Toffoli or CCZ gate.Compiling them more efficiently than simply implementing multiple teleported CNOT gates, we expect their shallow implementation with dynamic circuits to be even more advantageous over their unitary counterpart, especially for large n.

III. STATE PREPARATION: GHZ
Dynamic circuits can also be used to prepare long-range entangled states. A prototypical example is the GHZ state [6], shown schematically in Fig. 2(a). While it can be created using only Clifford gates and thus can be simulated efficiently on a classical computer [36], it becomes non-simulatable when followed by a sufficient number of non-Clifford gates in a larger algorithm, or when inserted as a crucial ingredient in, e.g., efficient compilation of multi-qubit gates [37,38].
Here, we show that GHZ states with long-range entanglement can be prepared with dynamic circuits. Although we do not see a clear advantage of dynamic circuits over unitary ones in this case, we provide a detailed description of the challenges that must be addressed to realize such an advantage.
For preparation of a GHZ state on a 1D n-qubit chain, in Fig. 2, we show the equivalence between the unitary circuit (left) and dynamic circuit (right). (For a detailed derivation, see Appendix A 2.) Notably, the unitary equivalent has a two-qubit gate depth that scales as O(n) with quadratically increasing idle time and n − 1 total CNOT gates, while the depth of the dynamic circuit remains constant with linearly increasing idle time, 3n/2 − 1 total CNOT gates, and n/2 − 1 mid-circuit measurements (see Fig. 2(c)). The dynamic circuit incurs less idle time and a lower two-qubit gate depth at the cost of more CNOT gates and mid-circuit measurements. Therefore, we expect dynamic circuits to be advantageous for large system sizes n and low errors in mid-circuit measurement. For a more detailed analysis of the error budget, see Appendix F 1.
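The role of measurement and feed-forward can be illustrated with a small state-vector simulation. This is a minimal sketch of a simplified textbook variant, not the exact circuit of Fig. 2: it assumes k data qubits prepared in |+⟩ and models each ancilla-mediated mid-circuit measurement as a direct projective measurement of the parity of two neighboring data qubits. The X corrections are then the running XOR of the outcomes. All function and variable names are hypothetical.

```python
import numpy as np


def ghz_via_parity_measurements(k, rng):
    """Prepare a k-qubit GHZ state from |+>^k by measuring neighbor parities.

    Returns the corrected state vector and the list of measurement outcomes.
    """
    dim = 2 ** k
    state = np.full(dim, 1.0 / np.sqrt(dim))  # |+>^k
    # bits[x, q] is the value of qubit q (q = 0 is the leftmost bit) in basis state x
    bits = (np.arange(dim)[:, None] >> np.arange(k - 1, -1, -1)) & 1

    outcomes = []
    for q in range(k - 1):
        parity = bits[:, q] ^ bits[:, q + 1]       # parity of qubits q and q+1
        prob_one = float(np.sum(state[parity == 1] ** 2))
        m = int(rng.random() < prob_one)           # sample the mid-circuit outcome
        state = np.where(parity == m, state, 0.0)  # project onto that outcome
        state = state / np.linalg.norm(state)
        outcomes.append(m)

    # Classical feed-forward: qubit j is flipped if the XOR of outcomes 0..j-1 is 1.
    flips = np.concatenate(([0], np.cumsum(outcomes) % 2)).astype(int)
    mask = int("".join(str(b) for b in flips), 2)
    corrected = np.zeros(dim)
    corrected[np.arange(dim) ^ mask] = state       # apply the X corrections
    return corrected, outcomes


rng = np.random.default_rng(7)
psi, outcomes = ghz_via_parity_measurements(5, rng)
print(outcomes, psi[0], psi[-1])
```

Whatever outcomes are sampled, the corrected state has amplitude 1/√2 on |00…0⟩ and |11…1⟩ only, and the quantum depth does not grow with k; only the classical XOR chain does.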
We explore whether current large-scale superconducting quantum devices enable an advantage with dynamic circuits for preparation of the entangled GHZ state. To efficiently verify the preparation of a quantum state σ, we use Monte Carlo state certification, which samples from Pauli operators with non-zero expectation values, as implemented in Ref. [31] and described in detail in Appendix C 1. As the n-qubit GHZ state is a stabilizer state, we can randomly sample m of the $2^n$ stabilizers $\{S_i\}_{i=1..2^n}$ and approximate the fidelity by

$$F \approx \frac{1}{m}\sum_{k=1}^{m} \langle S_k \rangle_\sigma .$$

The experimental results of GHZ state preparation with unitary and dynamic circuits are shown in Fig. 2(d). They all include measurement error mitigation on the final measurements [39]. The implementation details can be found in Appendix D 2. On the left, we show the results without dynamical decoupling. In the unitary case, we observe genuine multipartite entanglement, defined as state fidelity F > 0.5 [40], within a confidence interval of 95% up to 7 qubits, with a rapid decay in fidelity with increasing system size mainly due to errors in two-qubit gates and ZZ crosstalk errors during idling time [41]. As these errors are mostly coherent, they lead to an oscillation of the fidelity such that it increases again for higher qubit numbers. To suppress the coherent ZZ errors we apply dynamical decoupling (DD) pulses, as described below.
In the dynamic case, we observe genuine entanglement up to 6 qubits. Here, we do not find a crossover point after which dynamic circuits have an advantage over unitary circuits. We attribute the performance of dynamic circuits to several factors, including the fact that the current implementation results in an average classical feedforward time that scales with the number of potential mid-circuit measurement bitstring outcomes, which itself grows exponentially with system size. This limitation appears because the switch operator currently tests each possible case of measurement outcomes sequentially, so on average it checks half of the cases until it finds the correct one. With our future control software we expect to implement the correct feed-forward operations in constant time. By reducing the error induced by idle time during classical feedforward, we expect dynamic circuits to surpass unitary circuits at ≳10 qubits; we can see this by studying the post-processing case, which is equivalent to the dynamic circuit implementation except that the classical logic is executed in post-processing, not during execution of the quantum circuit itself.
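To make the scaling of the sequential case search concrete, the following sketch counts how many cases are tested on average. It assumes that all 2^k outcome bitstrings of k mid-circuit measurements are equally likely and that cases are tested one by one in a fixed order; it is an illustration of the argument, not a model of the actual control software.

```python
def average_sequential_checks(num_measurements: int) -> float:
    """Mean number of cases tested by a sequential switch over 2^k outcomes.

    If the matching case sits at position j (1-based), j comparisons are needed;
    averaging over uniformly distributed outcomes gives (2^k + 1) / 2.
    """
    cases = 2 ** num_measurements
    return sum(range(1, cases + 1)) / cases


for k in (1, 2, 4, 8):
    print(k, average_sequential_checks(k))
```

The average grows as roughly half the number of cases, i.e. exponentially in the number of measurements, whereas a direct evaluation of the correction from the measured bits would take a fixed number of steps per corrected qubit.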
On the right of Fig. 2(d), we show the results using dynamical decoupling (DD) [42,43]. We observe improved fidelities for both the unitary and dynamic circuit cases, but not for the post-processing case, as there is little error induced by idle times to quench with dynamical decoupling in the first place. For the unitary case, we observe genuine multipartite entanglement up to 17 qubits, more than twice as many as in the unmitigated unitary case. This result is close to the state of the art on superconducting quantum processors and is limited by the fact that we do not leverage the 2D connectivity of the device, as in Ref. [44]. While the fidelities are improved with DD for dynamic circuits, the improvement is less dramatic. We attribute this difference to two reasons. First, the unitary circuit has a quadratic idling error term in contrast to a leading linear term for dynamic circuits, resulting in a comparatively smaller improvement for dynamic circuits with dynamical decoupling. Second, with the current controls, we are not able to apply DD pulses during the classical feedforward time, which is the main source of idling error in the dynamic circuit. As in the unmitigated case, we observe a rapid decay of the fidelity with increasing system size. This can again be partially attributed to the exponential growth of the classical feedforward time. In the future, we expect to reduce this scaling to constant, in which case we expect drastically improved performance and genuine entanglement up to ∼15 qubits. Even then, however, we do not expect to observe an advantage with dynamic circuits over unitary ones for preparation of GHZ states. To realize an advantage with dynamic circuits, we require a scenario where the quadratically scaling idle error of the unitary circuit dominates over sufficiently small CNOT and mid-circuit measurement errors; see Appendix F 2 for a more detailed analysis. We anticipate that these conditions can be realized through a combination of hardware improvements and the extension of error mitigation techniques, such as probabilistic error cancellation [45,46], toward mid-circuit measurements.

IV. CONCLUSION AND OUTLOOK
Dynamic circuits are a promising feature toward overcoming connectivity limitations of large-scale noisy quantum hardware. Here, we demonstrate their potential for efficiently generating long-range entanglement with two useful tasks: teleporting entangling gates over long ranges to enable effective all-to-all connectivity, and state preparation with the GHZ state as an example. For CNOT gate teleportation, we show a regime in which dynamic circuits result in higher fidelities on up to 101 qubits of a large-scale superconducting quantum processor. We leave incorporating this more efficient implementation of long-range entangling gates as a subroutine in another quantum algorithm to future work; potential studies can include simulating many-body systems with non-local interactions. As we demonstrate theoretically, gate teleportation schemes can be extended beyond CNOT gates to multi-qubit ones, such as the CCZ gate. Its experimental implementation is also a promising project for the future. For state preparation, based on both unmitigated and mitigated hardware experiments, we expect to see the value of dynamic circuits once the classical post-processing becomes more efficient and the mid-circuit measurement errors can be reduced. We plan on revisiting the experiments as soon as these capabilities are available. We anticipate that further experiments with dynamic circuits and the development of noise models describing them will be vital contributions toward efficient circuit compilation, measurement-based quantum computation, and fault-tolerant quantum computation.

GHZ state preparation
In Fig. 5 we illustrate a derivation of the GHZ state preparation for the example of 7 qubits; it can be straightforwardly extended to an arbitrary number of qubits. In the following, we provide explanations for each step of the derivation, labeled by roman numerals in the figure:
(i) Pushing every second CNOT gate to the very right introduces the extra pink CNOT gates.
(ii) We can omit CNOT gates that are conditioned on state |0⟩.
(iii) As every second qubit is only involved at the very end, we can use those before and reset them.
(iv) A Bell state followed by a CNOT gate results in two uncorrelated qubits in states |+⟩ and |0⟩.
(v) We move the pink CNOT gates along the Bell states to the respective qubits above (they commute with the other CNOTs they are "passing").
(vi) Pushing the pink CNOT gates to the left through the purple CNOT gates introduces the extra orange CNOT gates.
(vii) We make use of the principle of deferred measurement.
(viii) In a final step we merge the classically-conditioned gates. As the classical calculation can be done extremely fast compared to quantum gates, we draw it as a vertical line. The orange ⊕ correspond to XOR gates, i.e., addition mod 2. We also represent the initial Bell states again with their circuit representation.

In order to compare the dynamic circuit implementation of the long-range CNOT gate to a solely unitary one, let us first consider different unitary strategies that might be more or less powerful in different regimes:

Strategy I: Ancilla-based implementation
We can consider a similar setting as for dynamic circuits, where we place the system qubits such that they are connected by a bus of empty ancilla qubits. In this case, we need to swap the system qubits towards each other and back, so that the ancillas are empty again in the end. The swaps can be simplified since the ancillas are empty in the beginning. Here we can distinguish different scenarios:
• Circuit Ia: To minimize the number of CNOT gates, we could swap the control qubit all the way to the target qubit and back, which results in the circuit depicted in Fig. 6. Here, many gates cancel, so given n ancilla qubits, the number of CNOT gates is $2n + 1$. However, the idle time of the qubits while they are not in state |0⟩ equals $n^2 + 2n$ times the CNOT gate time.
• Circuit Ib: In order to decrease the idle time, we could swap both the control qubit and the target qubit half-way and back, as illustrated in Fig. 6 (similar to some circuits presented in [47] and [48]). In that case, fewer gates cancel, so for n ancilla qubits we get $3n + 1$ CNOT gates, but the idle time reduces to $n^2/4 + n$ times the CNOT gate time.
• Circuit Ic: If we wanted to reduce the idle time even further, it might be beneficial not to cancel the CNOT gates in Circuit Ib, but to keep them to bring the swapped qubits back to state |0⟩, as shown in Fig. 6. In that case, we have essentially no idle time (as qubits in state |0⟩ are not prone to idling errors). The number of CNOT gates, however, increases to $4n + 1$.
Strategy II: SWAP-based implementation without ancillas
This is what happens if we simply feed our circuit to the transpiler. Here we do not use any ancilla qubits, but only system qubits, and apply swaps to move them around. The qubits can be at a different location in the end, so we do not need to swap back. The corresponding circuit is shown in Fig. 6. In this case we require $3\tilde{n} + 1$ CNOT gates and the idle time is $\frac{3}{2}\tilde{n}^2 - 2\tilde{n}$ times the CNOT gate time. However, it is important to note that the number $\tilde{n}$ of qubits lying between the two qubits of interest is on average much smaller than the number n of ancillas between two system qubits in the first scenario. Considering the connectivity illustrated in Fig. 1(d), the relation is approximately $n = 2\tilde{n} + 3$.
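The resource counts of the four unitary strategies can be tabulated directly from the expressions above. The sketch below only restates those formulas; the function names and the returned tuple layout (CNOT count, idle time in units of the CNOT gate time) are choices made here for illustration.

```python
def ancilla_based_resources(n):
    """(CNOT count, idle time) for Circuits Ia, Ib, Ic with n ancilla qubits."""
    return {
        "Ia": (2 * n + 1, n**2 + 2 * n),
        "Ib": (3 * n + 1, n**2 / 4 + n),
        "Ic": (4 * n + 1, 0),
    }


def swap_based_resources(n_tilde):
    """(CNOT count, idle time) for Strategy II with n_tilde intermediate qubits."""
    return (3 * n_tilde + 1, 1.5 * n_tilde**2 - 2 * n_tilde)


def equivalent_ancilla_count(n_tilde):
    """Approximate bus length n corresponding to n_tilde, using n = 2*n_tilde + 3."""
    return 2 * n_tilde + 3


n_tilde = 2
n = equivalent_ancilla_count(n_tilde)
print(n, ancilla_based_resources(n), swap_based_resources(n_tilde))
```

For example, two intermediate qubits in the SWAP-based layout correspond to roughly seven bus ancillas, so a fair comparison between Strategies I and II has to be made at different values of n and ñ.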

Error budget
Let us now compare the regimes in which we expect the different implementations to be most useful, in order to demonstrate the benefit of dynamic circuits. In Appendix G we derive a simple noise model that allows us to compute the combined effect of different sources of decoherence as a single Pauli noise rate:

$$\lambda_{\mathrm{tot}} = t_{\mathrm{idle}}\,\lambda_{\mathrm{idle}} + N_{\mathrm{CNOT}}\,\lambda_{\mathrm{CNOT}} + N_{\mathrm{meas}}\,\lambda_{\mathrm{meas}}. \quad \text{(B1)}$$

In Lemma 1 we show that the final process fidelity is loosely lower-bounded by $e^{-\lambda_{\mathrm{tot}}}$. The quantity $\lambda_{\mathrm{tot}}$ combines the following noise sources:
• The total amount of time $t_{\mathrm{idle}}$ that qubits spend idle within the circuit, and a conversion factor $\lambda_{\mathrm{idle}}$ that quantifies the strength of decoherence. $t_{\mathrm{idle}}$ is expressed in multiples of the CNOT gate time (i.e., $t_{\mathrm{idle}} = 3$ for 3 CNOT gate times). The time for a mid-circuit measurement, including the additional time waiting for feedback, is defined as µ times the time for a CNOT gate.
• The total number of CNOT gates $N_{\mathrm{CNOT}}$ and an average Pauli noise rate $\lambda_{\mathrm{CNOT}}$ per CNOT.
• The total number of mid-circuit measurements $N_{\mathrm{meas}}$ and an average Pauli noise rate $\lambda_{\mathrm{meas}}$ per measurement.
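A small numerical sketch shows how these three ingredients enter the error budget. It assumes that the contributions add linearly, uses the counts stated in the text for the dynamic circuit (n + 1 CNOT gates, n measurements) and for Circuit Ic (4n + 1 CNOT gates, no idle time), and assumes a constant idle time µ for the dynamic circuit. All numerical rates are hypothetical placeholders, not measured values.

```python
import math


def lambda_tot(t_idle, n_cnot, n_meas, lam_idle, lam_cnot, lam_meas):
    """Total Pauli noise rate, assuming the three contributions add linearly."""
    return t_idle * lam_idle + n_cnot * lam_cnot + n_meas * lam_meas


def fidelity_bound(lam):
    """Loose lower bound exp(-lambda_tot) on the process fidelity."""
    return math.exp(-lam)


# Hypothetical noise rates and feed-forward time, chosen only for illustration.
RATES = dict(lam_idle=0.001, lam_cnot=0.01, lam_meas=0.02)
MU = 20.0  # measurement + feed-forward time in units of the CNOT gate time


def unitary_ic(n):
    """Circuit Ic: 4n + 1 CNOT gates, essentially no idle time, no measurements."""
    return lambda_tot(0.0, 4 * n + 1, 0, **RATES)


def dynamic(n):
    """Dynamic circuit: n + 1 CNOT gates, n measurements, assumed idle time MU."""
    return lambda_tot(MU, n + 1, n, **RATES)


for n in (1, 10, 50):
    print(n, fidelity_bound(unitary_ic(n)), fidelity_bound(dynamic(n)))
```

With these placeholder rates the measurement error per qubit is below three CNOT errors, so the dynamic circuit has the larger bound once n is large enough to amortize the constant feed-forward idle cost.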
The resulting error budgets are collected in Table I. Comparing the different unitary cases, it becomes clear that for large n the unitary implementation Ic will be the best, as all other implementations have an idling error that scales quadratically. This might be slightly counter-intuitive for two reasons. First, it tells us that the extra 2n CNOT gates required for implementation Ic compared to Ia are worth keeping rather than cancelling, as the full swap leaves the other qubits unentangled, resulting in a drastically decreased idling error. Second, even without measurement and feed-forward, it can still be beneficial to use ancilla qubits and thereby increase the distances. For small n we need to keep in mind that for the swap-based implementation (unitary II) the number of involved qubits $\tilde{n}$ is smaller than the number of qubits n needed for the same task in the ancilla-based implementation. Given the qubit division illustrated in Fig. 1(d), we obtain 31 qubits connected to the bus, 30 qubits not connected to the bus, and 66 bus qubits, which is a ratio of roughly 1:2 for all-to-all connected qubits to bus qubits. Respecting the rescaled errors, unitary II would be the most promising implementation for small n. In addition to the CNOT errors and idling errors, for dynamic circuits we also need to consider the error from the additional measurements, as well as a constant term µ that comes from the idling error during measurement and feed-forward.
Given this rough error analysis in Table I, we can infer that for large n dynamic circuits will be beneficial if the measurement of n qubits introduces less error than 3n CNOT gates, that is, when $\lambda_{\mathrm{meas}} < 3\lambda_{\mathrm{CNOT}}$. A sketch of how the fidelities for the different cases decrease with n is illustrated in Fig. 7. Note, however, that these error analyses take into account only the error on the involved qubits. Considering also that there are potentially many other qubits, say m of them, "waiting" for this operation to be performed would add another idling error of $m \cdot (2n + 1)$. So the fact that dynamic circuits can perform entangling gates between arbitrary qubits in constant depth, instead of the linear depth required with only unitary operations, speeds up the whole algorithm and therefore might be much more powerful than what we can see in the error on the respective qubits.

Assuming one of the two states is pure, σ = |ϕ⟩⟨ϕ|, the general expression for the state fidelity simplifies to

$$F(\rho, \sigma) = \langle\phi|\rho|\phi\rangle = \mathrm{Tr}(\rho\sigma).$$

If ρ is also a pure state, ρ = |ψ⟩⟨ψ|, the expression reduces to a simple overlap, $F(\rho, \sigma) = |\langle\psi|\phi\rangle|^2$. We note that some authors define the square root of this as the fidelity.
Pauli decomposition. To connect to experimental measurements, let us decompose the quantum states in the standard Pauli basis. The set of all Pauli operators on n qubits, $\{I, X, Y, Z\}^{\otimes n}$, forms an orthogonal Hermitian operator basis. The inner product in operator space $L(H)$ between two Pauli operators $P_i, P_j \in L(H)$ is $\langle P_i, P_j\rangle = \mathrm{Tr}(P_i P_j) = d\,\delta_{ij}$, where the dimension of the pure-state Hilbert space is $d := \dim H = 2^n$. In terms of this basis, any quantum state $\rho \in D(H)$ can be decomposed into

$$\rho = \frac{1}{d}\sum_i \rho_i P_i,$$

where $\rho_i$ is the Pauli expectation value of the state with respect to the i-th Pauli, an easily measurable quantity. We can similarly define the expectation values of the Pauli $P_i$ with respect to the prepared state σ and the desired state ρ as $\sigma_i := \langle P_i\rangle_\sigma = \mathrm{Tr}(\sigma P_i)$ and $\rho_i := \langle P_i\rangle_\rho = \mathrm{Tr}(\rho P_i)$, respectively.

Fidelity in terms of Pauli expectation values. The state fidelity between the measured state σ and the ideally expected pure state ρ, see Eq. (C1), in terms of the Pauli decomposition of each is

$$F(\rho, \sigma) = \mathrm{Tr}(\rho\sigma) = \frac{1}{d}\sum_i \rho_i \sigma_i,$$

where $\sigma_i$ is an experimentally measured expectation value and $\rho_i$ is a theoretically calculated one. Given this, we can now define the relevance distribution $r(P_i) := \rho_i^2/d$, such that $F(\rho, \sigma) = \sum_{i:\,\rho_i \neq 0} r(P_i)\,\sigma_i/\rho_i$.

Random sampling of expectation values. When sampling m random operators $\{P_k\}_{k=1..m}$ according to the relevance distribution $r(P_k)$ and determining their expectation values $\sigma_k$, the estimated fidelity $\tilde{F} := \frac{1}{m}\sum_{k=1}^{m} \sigma_k/\rho_k$ approximates the actual fidelity F with an uncertainty that decreases as $1/\sqrt{m}$. Note that there is also an uncertainty in estimating each $\sigma_k$, where for an additive precision ϵ roughly $(\epsilon\rho_k)^{-2}$ shots are required.
Random sampling of GHZ stabilizers. As the GHZ state is a stabilizer state, for each n there are exactly $2^n$ Pauli operators $P_i$ with non-zero expectation value, each of which has the GHZ state as an eigenstate with eigenvalue ±1. Note that some stabilizers of the GHZ state carry a minus sign, e.g. −YYX. For the n-qubit GHZ state, by defining the set of stabilizers $\{S_i\}_{i=1..2^n}$, we can express the fidelity in terms of only expectation values of the stabilizers,

$$F(\rho, \sigma) = \frac{1}{2^n}\sum_{i=1}^{2^n} \langle S_i\rangle_\sigma .$$

This expression can be approximated by randomly sampling m of the $2^n$ stabilizers, defining the unbiased estimator $\hat{F} = \frac{1}{m}\sum_{k=1}^{m} \langle S_k\rangle_\sigma$, which converges to the ideal fidelity with the number of random samples chosen.
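The stabilizer-sampling estimator can be tried out on a small example. The sketch below assumes that the GHZ stabilizer group is generated by X on all qubits and ZZ on neighboring pairs, builds the (signed) stabilizers as explicit matrices, and uses a hypothetical test state, a GHZ state mixed with white noise, for which the exact fidelity is known. Expectation values are computed exactly here, whereas an experiment would estimate each from a finite number of shots.

```python
from functools import reduce

import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])


def kron_all(ops):
    return reduce(np.kron, ops)


def ghz_stabilizer(n, index):
    """Stabilizer number `index` in [0, 2^n): bit 0 selects X on all qubits,
    bit i+1 selects Z_i Z_{i+1}. Signs such as -YY arise from the matrix product."""
    s = np.eye(2**n)
    if index & 1:
        s = s @ kron_all([X] * n)
    for i in range(n - 1):
        if (index >> (i + 1)) & 1:
            ops = [I2] * n
            ops[i] = Z
            ops[i + 1] = Z
            s = s @ kron_all(ops)
    return s


def noisy_ghz(n, p):
    """Hypothetical test state: p * |GHZ><GHZ| + (1 - p) * identity / 2^n."""
    psi = np.zeros(2**n)
    psi[0] = psi[-1] = 1.0 / np.sqrt(2.0)
    return p * np.outer(psi, psi) + (1.0 - p) * np.eye(2**n) / 2**n


def estimate_fidelity(sigma, n, m, rng):
    """Average of <S_k> over m uniformly sampled stabilizers."""
    indices = rng.integers(0, 2**n, size=m)
    return float(np.mean([np.trace(sigma @ ghz_stabilizer(n, k)) for k in indices]))


n, p = 4, 0.8
sigma = noisy_ghz(n, p)
exact = p + (1.0 - p) / 2**n
print(exact, estimate_fidelity(sigma, n, 200, np.random.default_rng(0)))
```

Averaging over all 2^n stabilizers reproduces the exact fidelity, while the sampled estimate fluctuates around it with a spread that shrinks as the number of samples m grows.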

Average gate fidelity
Similarly to the state fidelity, we use the Monte Carlo process certification following [35] to determine the average gate fidelity of our noisy CNOT gate.
Average gate fidelity. Consider the case in which we want to implement an ideal gate $\mathcal{U}(\rho) := U\rho U^\dagger$. However, instead we can implement only a noisy gate $\tilde{\mathcal{U}}(\rho) := \mathcal{U}(\Lambda(\rho))$, where Λ is some effective noise channel and ρ is a quantum state. What is the gate fidelity of the noisy $\tilde{\mathcal{U}}$ relative to the ideal $\mathcal{U}$? For a single given pure state ρ = |ϕ⟩⟨ϕ|, the state fidelity of the output of the ideal and noisy channels is

$$F\left(\mathcal{U}(\rho), \tilde{\mathcal{U}}(\rho)\right) = \mathrm{Tr}\left[\mathcal{U}(\rho)\,\tilde{\mathcal{U}}(\rho)\right],$$

which can be used to obtain the average gate fidelity, defined as a uniform Haar average over the fidelity of the ideal and noisy output states, with $\rho_\psi = |\psi\rangle\langle\psi|$,

$$F_{\mathrm{avg}}(\mathcal{U}, \tilde{\mathcal{U}}) = \int d\psi\, \mathrm{Tr}\left[\mathcal{U}(\rho_\psi)\,\tilde{\mathcal{U}}(\rho_\psi)\right].$$

To estimate $F_{\mathrm{avg}}(\mathcal{U}, \tilde{\mathcal{U}})$, we will use the process (or entanglement) fidelity as a more experimentally accessible quantity.

Process fidelity. Compared to the gate fidelity, the process fidelity is more readily estimated. It can in turn serve as a direct proxy for the gate fidelity. To make the connection, recall that the Choi-Jamiolkowski isomorphism [49] maps every quantum operation Λ on a d-dimensional space to a density operator $\rho_\Lambda = (\mathbb{I} \otimes \Lambda)\,|\phi\rangle\langle\phi|$, where $|\phi\rangle = \frac{1}{\sqrt{d}}\sum_{i=1}^{d} |i\rangle \otimes |i\rangle$. For a noise-free, ideal unitary channel $\mathcal{U}$ and its experimental, noisy implementation $\tilde{\mathcal{U}}$, the process fidelity $F_{\mathrm{proc}}$ is the state fidelity of the respective Choi states $\rho_{\mathcal{U}}$ and $\rho_{\tilde{\mathcal{U}}}$:

$$F_{\mathrm{proc}}(\mathcal{U}, \tilde{\mathcal{U}}) := F\left(\rho_{\mathcal{U}}, \rho_{\tilde{\mathcal{U}}}\right).$$

From this fidelity, the gate fidelity can be extracted using the following relation derived in [50]:

$$F_{\mathrm{avg}} = \frac{d\,F_{\mathrm{proc}} + 1}{d + 1}.$$

Estimating the process fidelity. As described in Ref. [35], instead of a direct implementation of $(\mathbb{I} \otimes \tilde{\mathcal{U}})\,|\phi\rangle\langle\phi|$ followed by measuring random Pauli operators on all qubits, we follow the more practical approach, where $\tilde{\mathcal{U}}$ is applied to the complex conjugate of a random product of eigenstates of local Pauli operators $P_i \otimes P_j$, followed by a measurement of random Pauli operators $P_k \otimes P_l$. This leads to the same expectation values, denoted $\rho_{ijkl}$. The operators are then sampled according to the relevance distribution $r_{ijkl} := \rho_{ijkl}^2/d$. Note that here d corresponds to the dimension of the Choi state, i.e., d = 16. For $\Lambda(\rho) = \mathrm{CNOT}\,\rho\,\mathrm{CNOT}^\dagger$, there are only 16 combinations of Pauli operators with a non-zero expectation value $\rho_{ijkl}$: $\rho_{ijkl} = -1$ for $P_i P_j P_k P_l \in \{YYXZ, XZYY\}$ and $\rho_{ijkl} = 1$ for the remaining 14. Thus, the relevance distribution is uniform amongst those with $r = 1/16$, and we can simply take the average expectation value of those 16 operators.

FIG. 9. Implementation details. In (a), we show the device layout of ibm_peekskill, with the 21 qubits chosen for our dynamic circuits marked in black. In (b) and (c), we plot the cumulative distribution of the T1 and T2 coherence times, the single-qubit gate (SX), readout (Meas.), and two-qubit controlled-X (CX) error rates of the chosen qubits, as well as the corresponding median values.
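The count of non-zero Pauli expectation values for the ideal CNOT gate, and their signs, can be verified by brute force. The sketch below assumes the convention ρ_ijkl = Tr[(P_k ⊗ P_l) CNOT (P_i ⊗ P_j) CNOT†]/4 with the control on the first qubit; this convention is an assumption made here for illustration, chosen because it reproduces the signs quoted in the text.

```python
import itertools

import numpy as np

PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

# CNOT with the first qubit as control and the second as target.
CNOT = np.array(
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex
)


def cnot_pauli_expectations():
    """Map 'ijkl' labels to the non-zero values of Tr[(Pk x Pl) CNOT (Pi x Pj) CNOT^dag] / 4."""
    values = {}
    for i, j, k, l in itertools.product("IXYZ", repeat=4):
        p_in = np.kron(PAULIS[i], PAULIS[j])
        p_out = np.kron(PAULIS[k], PAULIS[l])
        val = np.trace(p_out @ CNOT @ p_in @ CNOT.conj().T).real / 4
        if abs(val) > 1e-12:
            values[i + j + k + l] = int(round(val))
    return values


values = cnot_pauli_expectations()
negative = sorted(label for label, v in values.items() if v == -1)
print(len(values), negative)
```

Because the CNOT gate is a Clifford gate, each of the 16 input Paulis maps to exactly one signed output Pauli, which is why only 16 of the 256 combinations contribute and why the relevance distribution is uniform over them.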

Appendix E: Toffoli or CCZ
Dynamic circuits can also be applied to more efficiently compile multi-qubit gates. As an example, we describe how the CCZ gate, which is equivalent to the Toffoli gate up to two single-qubit Hadamard gates, can be implemented by optimizing multiple teleported CNOT gates. Compilation of the unitary circuit on a 1D chain of n + 3 qubits using CNOT gates naïvely requires a two-qubit gate depth of O(n). Using dynamic circuits, we can implement this long-range entangling gate in shallow depth. Naïvely, one could successively implement each CNOT gate of the typical Toffoli decomposition (shown at the top of Fig. 10(a)) using the gate teleportation described previously. However, involving an ancillary qubit between the three system qubits to merge the teleported gates, as shown at the bottom of Fig. 10(a), allows for a more efficient implementation with the dynamic circuit; see Fig. 10(b). In total, this formulation requires n + 1 measurements, n + 6 CNOT gates, and 5 feed-forward operations divided across two sequential steps. Notably, as most qubits are projectively measured early in the circuit, the idling error should be low. Thus, we expect this shallow implementation with dynamic circuits to be advantageous over its unitary counterpart, especially for large n.
Appendix F: Error analysis for GHZ states

1. Error budget
As in Appendix B 2, we leverage (B1) to estimate the total noise λ_tot of a quantum circuit, as motivated by the model discussed in Appendix G. There, it is derived that e^{−λ_tot} gives a lower bound on the process fidelity of the circuit. For GHZ states, however, we are interested in the state fidelity, so the bound from Lemma 1 no longer applies in a rigorous sense. Nevertheless, we find that the same model can still provide useful intuition if we accept that the model parameters λ_CNOT and λ_meas no longer have a direct interpretation in terms of worst-case Pauli-Lindblad noise or a combination of amplitude- and phase-damping noise, respectively. See Appendix G for details.
For the unitary approach we require n − 1 CNOT gates to entangle n qubits. For simplicity we assume (and implement) only a one-dimensional connectivity chain in our protocols, and the following numbers correspond to an even number n (only constant terms change when considering odd n). To minimize the idling time, we start in the middle and apply CNOT gates simultaneously towards both ends. This leads to an idle time of n²/4 − (3/2)n + 2 times the CNOT gate time, as displayed in Table II. In the dynamic circuits approach we require (3/2)n − 2 CNOT gates in total, while the idling time is (µ/2)n + 1 times the CNOT gate time, where µ corresponds to the measurement and feed-forward time (as a multiple of the CNOT gate time). However, here we also need to consider the errors of the additional n/2 − 1 measurements. As the error coming from the CNOT gates and the measurements is usually substantially larger than the error from the idling time, we expect the standard unitary preparation to perform better for small n. However, as the idling time there scales as O(n²), in contrast to all errors in the measurement-based approach scaling only as O(n), we expect a crossover for large n, where the implementation with dynamic circuits will become more beneficial. The error budget is summarized in Table II.

In Fig. 11 we determine the expected crossover in performance from unitary to dynamic circuits for varying mid-circuit measurement and CNOT gate errors. We use the values of t_idle, N_CNOT, and N_meas shown in Table II to predict how many qubits are required to see a cross-over, i.e., the point where the performance of dynamic circuits becomes higher than that of its unitary counterpart, and the state fidelity at the cross-over, as a function of the mid-circuit measurement errors. Note that in this noise model we assume that we can eliminate all ZZ errors by applying dynamical decoupling. We keep the idling error constant at λ_idle = 0.001 and consider different CNOT errors λ_CNOT ∈ {0.001, 0.01, 0.02}. We can reach a fidelity > 0.5 for a CNOT error of λ_CNOT = 0.01 with mid-circuit measurement errors λ_meas ≲ 0.003, and for a CNOT error of λ_CNOT = 0.001 with mid-circuit measurement errors λ_meas ≲ 0.012.

In this section we present a simple framework for computing lower bounds on fidelities using the Pauli-Lindblad noise model discussed in [46]. Pauli-Lindblad noise channels have several nice properties that we can use to simplify calculations, and they also allow us to reduce estimates of the noise properties of our hardware to relatively few parameters.
Normally, Pauli-Lindblad noise is the workhorse of probabilistic error cancellation, an error mitigation scheme that leverages characterization of noise in order to trade systematic uncertainty for statistical uncertainty. But we are more interested in using Pauli-Lindblad noise as a tool for capturing the behavior of fidelity as a function of circuit size with an appropriate balance of rigor and simplicity.
As such, our central goal in this section is to develop mathematical tools that allow us to develop a Pauli-Lindblad representation of various noise sources, such as decoherence and gate noise, and to find a method to combine all of this noise into a fidelity for the entire process. In particular, we aim to give a justification for modeling noise via the quantity λ_tot as in (B1). This is achieved by Lemma 1, which states that e^{−λ_tot} gives a lower bound on the process fidelity.
We leave the majority of our mathematical exposition without proof for the sake of brevity, but present the proof of Lemma 1 at the end of this section.
Pauli-Lindblad noise. Pauli-Lindblad noise is a quantum channel defined as follows. Let 𝒫 be the n-qubit Pauli group modulo phase, and consider some P ∈ 𝒫. Then for some noise rate λ ∈ R+ the noise channel Γ^λ_P is given by:

$$\Gamma^{\lambda}_{P}(\rho) = (1 - \omega)\,\rho + \omega\, P \rho P, \qquad \omega := \frac{1 - e^{-2\lambda}}{2}.$$

This is essentially applying P with probability ω. Pauli noise channels also have a representation as time evolution with respect to a simple Lindbladian: for P ∈ 𝒫, let L_P(ρ) := PρP − ρ. This way Γ^λ_P = e^{λ L_P}.

The main justification for why we can restrict to Pauli noise channels is twirling. Conjugating an arbitrary noise channel by a random Pauli matrix yields a channel that is always expressible as a product of Pauli noise. Although our experiments do not feature twirling, even for untwirled circuits we expect the Pauli-Lindblad noise to capture the first-order noise behavior.
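The equivalence between the probabilistic form of Γ^λ_P and the Lindbladian form e^{λ L_P} can be verified numerically. The following is a minimal sketch for a single qubit; it assumes the flip probability ω = (1 − e^{−2λ})/2, which follows from L_P² = −2 L_P, and evaluates the exponential by a truncated Taylor series. All function names are illustrative.

```python
import numpy as np

Z = np.array([[1, 0], [0, -1]], dtype=complex)


def flip_probability(rate):
    """omega = (1 - exp(-2 * rate)) / 2, the probability of applying P."""
    return 0.5 * (1.0 - np.exp(-2.0 * rate))


def pauli_lindblad(rho, pauli, rate):
    """Apply Gamma_P^lambda in its probabilistic form."""
    omega = flip_probability(rate)
    return (1.0 - omega) * rho + omega * pauli @ rho @ pauli


def lindblad_exponential(rho, pauli, rate, terms=60):
    """Apply exp(rate * L_P) via its Taylor series, with L_P(rho) = P rho P - rho."""
    result = np.zeros_like(rho)
    term = rho.copy()
    for k in range(terms):
        result = result + term
        # Next Taylor term: (rate / (k + 1)) * L_P(previous term).
        term = rate / (k + 1) * (pauli @ term @ pauli - term)
    return result


# Test state |+><+|, which is maximally sensitive to Z noise.
plus = 0.5 * np.array([[1, 1], [1, 1]], dtype=complex)

same_channel = np.allclose(
    pauli_lindblad(plus, Z, 0.3), lindblad_exponential(plus, Z, 0.3)
)
rates_add = np.allclose(
    pauli_lindblad(pauli_lindblad(plus, Z, 0.1), Z, 0.2),
    pauli_lindblad(plus, Z, 0.3),
)
print(same_channel, rates_add)
```

Both checks should evaluate to true: the two forms agree, and applying the same Pauli noise twice adds the rates. For large λ the flip probability saturates at 1/2 rather than 1, which is why a single Pauli noise source can never fully scramble a state.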
Another reason why we expect our noise model to only capture the behavior to first order is that we assume the noise rates are the same for all qubits. All CNOT gates and idle times are assumed to contribute the same amount of noise. But this is not a realistic representation of our hardware: in actuality, different qubits have different coherence times, and gate qualities also vary. When we consider circuits on many qubits we expect these differences to average out.
Let Λ be a quantum channel. Then let Λ̄ be its Pauli-twirled version given by:

$$\bar{\Lambda}(\rho) = \frac{1}{4^{n}} \sum_{P \in \mathcal{P}} P\, \Lambda(P \rho P)\, P.$$

For Q ∈ 𝒫, twirled channels Λ̄ satisfy Λ̄(Q) = c_Q Q for some coefficients c_Q. For every Λ̄ there exist noise rates λ_P for P ∈ 𝒫/{I} such that Λ̄ = ∏_P Γ^{λ_P}_P. These noise rates satisfy:

$$c_Q = \exp\Big(-2 \sum_{P \,:\, PQ = -QP} \lambda_P\Big).$$

A central convenience of Pauli noise channels is that they do not interfere with each other when propagated: Pauli noise channels commute, Γ^{λ1}_P Γ^{λ2}_Q = Γ^{λ2}_Q Γ^{λ1}_P, and the noise rates can be added together when the Pauli is the same, Γ^{λ1}_P Γ^{λ2}_P = Γ^{λ1+λ2}_P.

Combining noise channels into a single fidelity. Say we are trying to compute the overall amount of noise in a particular quantum circuit that has been appropriately twirled. Gates and idle time of the qubits all contribute some amount of Pauli noise. We propagate all of the Pauli noise to the end of the circuit, thereby removing any noise that does not affect certain mid-circuit measurements. Finally, we must tally up the noise Paulis on the resulting quantum state.
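To illustrate how a product of Pauli noise channels acts on Pauli operators, the sketch below applies three single-qubit Pauli-Lindblad channels in sequence and extracts the coefficients c_Q. It assumes the relation c_Q = exp(−2 Σ λ_P), with the sum running over the Paulis P that anticommute with Q, which follows from Γ^λ_P(Q) = e^{−2λ} Q for anticommuting P and Q. The rates are hypothetical values chosen only for illustration.

```python
import numpy as np

PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}


def pauli_channel(op, pauli, rate):
    """Apply Gamma_P^lambda to an operator."""
    omega = 0.5 * (1.0 - np.exp(-2.0 * rate))
    return (1.0 - omega) * op + omega * pauli @ op @ pauli


def apply_product_channel(op, rates):
    """Apply the product of Pauli-Lindblad channels; the order is irrelevant."""
    for label, rate in rates.items():
        op = pauli_channel(op, PAULIS[label], rate)
    return op


def predicted_coefficient(q_label, rates):
    """exp(-2 * sum of the rates of all Paulis that anticommute with Q)."""
    q = PAULIS[q_label]
    total = sum(
        rate
        for p_label, rate in rates.items()
        if not np.allclose(PAULIS[p_label] @ q, q @ PAULIS[p_label])
    )
    return np.exp(-2.0 * total)


rates = {"X": 0.02, "Y": 0.05, "Z": 0.10}  # hypothetical noise rates

coefficients = {}
for q_label, q in PAULIS.items():
    out = apply_product_channel(q, rates)
    coefficients[q_label] = np.trace(q @ out).real / 2  # c_Q = Tr[Q Lambda(Q)] / 2
    print(q_label, coefficients[q_label], predicted_coefficient(q_label, rates))
```

Each Pauli Q is returned rescaled by its coefficient, and the numerically extracted and predicted values should coincide, which is what allows the noise of an entire circuit to be summarized by a list of rates.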
One metric for measuring the error on the final state is the trace distance, or the diamond norm if we are considering a channel. For a single Pauli noise source, we have the simple relation that for any P, ‖Γ^λ_P − I‖⋄ = 1 − e^{−2λ}. To generalize this to multiple Paulis, a simple approach could be to just apply the triangle inequality to all of the different Paulis. But it turns out we can do much better using the following bound on the process fidelity:

Lemma 1. Consider a channel Λ = ∏_P Γ^{λ_P}_P for some rates λ_P. Then F_proc(Λ, I) ≥ exp(−Σ_P λ_P).

This bound is still fairly loose, but it is very simple and does better than adding up diamond norms. This can be seen by, for example, looking at the channel […]. The latter is looser for N ≥ 2 and for any c. Lemma 1 also has the key advantage that it makes computation of the overall noise rate very simple: just add up all the noise rates. This allows us to simply tally the total idle time and count the number of CNOTs to obtain the total amount of noise, as in Appendix B 2.
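Lemma 1 can be checked on small examples by tracking the probability of each net Pauli error. The sketch below is a single-qubit illustration under two assumptions: Paulis modulo phase are encoded as 2-bit labels whose product is a bitwise XOR, and the process fidelity of a Pauli channel relative to the identity equals the probability that the net error is the identity. The rates and function names are hypothetical.

```python
import math

# Single-qubit Paulis modulo phase as 2-bit labels; multiplication is bitwise XOR.
I, X, Z, Y = 0, 1, 2, 3


def compose_pauli_lindblad(rates):
    """Probability of each net Pauli error after applying prod_P Gamma_P^{lambda_P}."""
    probs = [1.0, 0.0, 0.0, 0.0]
    for pauli, rate in rates.items():
        omega = 0.5 * (1.0 - math.exp(-2.0 * rate))
        # With probability omega the extra Pauli is applied, mapping q ^ pauli to q.
        probs = [(1.0 - omega) * probs[q] + omega * probs[q ^ pauli] for q in range(4)]
    return probs


def process_fidelity_to_identity(rates):
    """For a Pauli channel, F_proc relative to the identity is the identity weight."""
    return compose_pauli_lindblad(rates)[I]


def lemma1_bound(rates):
    """Lower bound exp(-sum_P lambda_P) from Lemma 1."""
    return math.exp(-sum(rates.values()))


rates = {X: 0.02, Y: 0.05, Z: 0.10}  # hypothetical noise rates
print(process_fidelity_to_identity(rates), lemma1_bound(rates))
```

In this picture the exact fidelity exceeds the bound for two reasons: each factor satisfies 1 − ω ≥ e^{−λ}, and errors such as X followed by Y followed by Z can cancel to the identity, a contribution the bound ignores.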
An issue with using Lemma 1 is that it becomes increasingly loose in the limit of large Σ_P λ_P. The quantity exp(−Σ_P λ_P) vanishes in this limit, but in general we have F_proc(Λ, Λ′) ≥ 1/d for all Λ, Λ′. When we only have one source of Pauli noise Γ^λ_P, then not even the lower limit of 1/d can be reached as λ → ∞. Unfortunately, we see no way of overcoming this limitation while preserving the mathematical elegance of this tool: we would like to simply consider the quantity Σ_P λ_P. The reason for this shortcoming is that we do not account for cancellations between Pauli errors; we discuss the details of the derivation at the end of this section.
Another limitation of this analysis is that it completely ignores crosstalk. Every gate is assumed to behave independently. Assuming independent errors corresponds to a worst-case analysis analogous to the union bound, so we would expect the bounds resulting from Lemma 1 to still roughly capture average error from crosstalk by accounting for it as T2 dephasing noise, an error that we include when modeling experiments without dynamical decoupling.
Propagating noise to the end of the circuit. Next, we discuss how to move all the noise sources to the end of the circuit. This is particularly easy since we are considering Clifford circuits. Once all the noise is in one place, we can use Lemma 1 to combine it into a single fidelity.
With 𝒰(·) := U · U† as before, elementary calculation shows that 𝒰 Γ^λ_P = Γ^λ_{𝒰(P)} 𝒰, so Pauli-Lindblad noise propagated through a unitary Clifford circuit is still Pauli-Lindblad noise.
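For circuits built from CNOT gates, this propagation rule reduces to simple bookkeeping on Pauli labels. The sketch below is an illustration rather than the implementation used for the experiments: it tracks Paulis modulo phase in the binary (x, z) representation, where conjugation by a CNOT copies X from control to target and Z from target to control. The noise rate λ attached to the Pauli is carried along unchanged. Function names are hypothetical.

```python
TO_BITS = {"I": (0, 0), "X": (1, 0), "Y": (1, 1), "Z": (0, 1)}
FROM_BITS = {bits: label for label, bits in TO_BITS.items()}


def propagate_through_cnot(pauli, control, target):
    """Conjugate a Pauli string (modulo phase) by a CNOT from control to target."""
    x = [TO_BITS[p][0] for p in pauli]
    z = [TO_BITS[p][1] for p in pauli]
    x[target] ^= x[control]  # X on the control spreads to the target
    z[control] ^= z[target]  # Z on the target spreads to the control
    return "".join(FROM_BITS[(xi, zi)] for xi, zi in zip(x, z))


def propagate_through_circuit(pauli, cnots):
    """Push a Pauli error past a sequence of CNOTs given as (control, target) pairs."""
    for control, target in cnots:
        pauli = propagate_through_cnot(pauli, control, target)
    return pauli


# An X error on the first qubit spreads along a chain of CNOTs.
print(propagate_through_circuit("XII", [(0, 1), (1, 2)]))  # prints: XXX
```

Because only the Pauli label changes, all noise sources of a Clifford circuit can be moved to its end and then tallied with Lemma 1.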
Our circuits also feature several adaptive gates, propagation through which can be achieved as follows. Let Λ_disc be the channel that traces out the first of two qubits. Then Λ_disc Γ^λ_{P⊗Q} = Γ^λ_Q Λ_disc. Similarly, let Λ_corr,P be the channel that measures the first qubit and applies a correction P onto the second qubit. If P and Q commute, then Λ_corr,P Γ^λ_{Q⊗R} = Γ^λ_R Λ_corr,P. Otherwise, Λ_corr,P Γ^λ_{Q⊗R} = Γ^λ_{PR} Λ_corr,P.

Now that we have established how to move noise to the end of the circuit and to tally it into a bound on the fidelity, all that remains is to show how to bring various noise sources into Pauli-Lindblad form.
Decoherence noise. We begin with decoherence noise that affects idling qubits. We consider depolarizing, dephasing, and amplitude damping noise.
Conveniently, depolarizing and dephasing noise are already Pauli noise channels. A depolarizing channel Λ_dep,q replaces the input ρ with the maximally mixed state with probability 1 − q:

$$\Lambda_{\mathrm{dep},q}(\rho) = q\rho + (1 - q)\frac{I}{2^{n}}. \tag{G4}$$

Convergence to 0.4. In the main text, we remarked that the fidelities of the measurement-based CNOT experiments converge to a value slightly below 0.4, as is observed in Figure 1 (c). As discussed, this is due to the structure of the measurement-based circuit in Figure 1 (a). While the circuit also experiences infidelity on the top and bottom qubits due to idle time and some CNOTs, the only infidelity that actually scales with n is due to incorrect Z and X corrections on the top and bottom qubits, respectively.
We can model this noise as Γ^{λ_ZI}_{ZI} Γ^{λ_IX}_{IX} in the limit of large λ_ZI, λ_IX, in which case ω_ZI, ω_IX approach 1/2. We proceed as in (G8). Since these Pauli errors cannot cancel, the calculation is exact.
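The limiting value of 0.4 can be reproduced with a few lines of arithmetic. The sketch below is an illustration under stated assumptions: since ZI and IX are distinct Paulis, the probability of no net error is taken to be the product (1 − ω_ZI)(1 − ω_IX), and the resulting process fidelity is converted to an average gate fidelity with the standard relation F_avg = (d F_proc + 1)/(d + 1) for d = 4. The function names are hypothetical.

```python
import math


def flip_probability(rate):
    """omega = (1 - exp(-2 * rate)) / 2."""
    return 0.5 * (1.0 - math.exp(-2.0 * rate))


def teleported_cnot_gate_fidelity(rate_zi, rate_ix, dim=4):
    """Average gate fidelity when only ZI and IX correction errors are present."""
    # ZI and IX cannot cancel each other, so the identity weight is the
    # product of the two probabilities that no error is applied.
    process_fidelity = (1.0 - flip_probability(rate_zi)) * (
        1.0 - flip_probability(rate_ix)
    )
    return (dim * process_fidelity + 1.0) / (dim + 1.0)


for rate in (0.0, 0.1, 1.0, 10.0):
    print(rate, teleported_cnot_gate_fidelity(rate, rate))
```

As both rates grow, the process fidelity tends to 1/4 and the gate fidelity to (4 · 1/4 + 1)/5 = 0.4 from above; the additional idle and CNOT errors on the outer qubits then push the measured value slightly below this limit.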

FIG. 1. Teleporting a CNOT gate for long-range entanglement. (a) Left: Circuit for a long-range CNOT gate spanning a 1D chain of n qubits subject to nearest-neighbor connections only. Middle: Equivalent unitary decomposition into implementable CNOT gates; circuit depth O(n). Right: Equivalent circuit employing measurements with feed-forward operations; circuit depth O(1). If the post-measurement state is unused, feed-forward operations can be handled in post-processing, eliminating the need for their experimental implementation. Yellow regions indicate the idle time during CNOT gates on other qubits as well as during measurement and feed-forward (which is denoted by duration µ). (b) Error model inputs for unitary, measurement-based, and dynamic-circuit CNOT protocols comprise the total number of: non-zero idle-block times, CNOT gates, and additional measurements. (c) Experimental results, where dynamic circuits offer improved fidelity for CNOT gate teleportation across a qubit chain ≳ 10 qubits. (d) Map of a 127-qubit heavy-hexagonal processor, ibm_sherbrooke, overlaid with system configurations for long-range gate teleportation across a locally connected bus. To establish an effective all-to-all connectivity, we show one possible strategy of dividing the qubits into system (purple and orange) and sacrificial ancilla (turquoise and blue for extra connections) qubits. To parallelize gate execution with increased connectivity, orange qubits can be used as ancillas. We show how a particular long-range CNOT can be implemented through an ancilla bus marked as turquoise spins.

FIG. 2. Preparing long-range entangled states. (a) Illustration of a GHZ state with chosen qubit spins (spheres) in a superposition of "all up" and "all down" polarizations (arrows), overlaid on a quantum processor. (b) Circuits to prepare an n-qubit GHZ state using either a unitary (left) or dynamic (right) circuit. For a 1D qubit chain, the depth of the unitary (resp., dynamic) circuit scales as O(n) (resp., O(1)). If the final state is not directly used, the feed-forward operations can be implemented in classical post-processing on the output bits (classically controlled X gates and resets can be omitted). Yellow regions indicate the idle time during CNOT gates on other qubits as well as during measurement and feed-forward (which is denoted by duration µ). (c) Error model inputs for the GHZ preparation circuits. The model incorporates the noisy components of the circuits: non-zero idle circuit periods (yellow), number of CNOT gates (pink), and the number of mid-circuit measurements (green). These parameters are used to derive an error model that yields a lower bound on the protocol fidelity, shown in the following panel. (d) Fidelity of preparing the GHZ state on quantum hardware using unitary, measurement-based post-processing, or dynamic circuits in the absence or presence of dynamical decoupling (DD). Data are shown with dots. Theory curves based on the error model parameters of panel (c) are shown as dashed lines.

FIG. 4. Graphical derivation for reducing a long-range CNOT gate into gate teleportation executed with measurements and feed-forward operations, i.e., a dynamic circuit. Roman numerals indicate sequential step numbers described in the main text.

FIG. 5. Graphical derivation for the preparation of a GHZ state by converting its canonical but deep unitary circuit into a constant-depth circuit utilizing measurement and feed-forward operations, i.e., a dynamic circuit. Roman numerals indicate sequential step numbers described in the main text.

FIG. 6. Comparison of the different unitary implementations of a long-range CNOT gate. While the circuits in panels (Ia), (Ib), and (Ic) realize ancilla-based implementations, the circuit of panel (II) realizes a SWAP-based implementation without ancillas. The shaded regions indicate idle periods that accumulate errors.

FIG. 10. CCZ with (a) a unitary circuit and (b) a dynamic circuit over long ranges.

FIG. 11. Noise-model predictions that indicate how many qubits are required to see a cross-over and what the corresponding fidelity would be as a function of the mid-circuit measurement errors.

In Table I, we have summarized the error budget for each of the cases.

TABLE I. Comparison of the error budget of the unitary and the dynamic circuits implementation in terms of idle time, number of CNOT gates and mid-circuit measurements, and two-qubit gate depth. Note that, as the number of involved qubits ñ needed for the unitary implementation (II) is in general much smaller, we rescale it for the error budget with the relation n ≈ 2ñ + 3.

TABLE II. Comparison of the error budget of the unitary and the dynamic circuits implementation in terms of idle time, number of CNOT gates and mid-circuit measurements, and two-qubit gate depth.

2. Expected cross-over for lower mid-circuit measurement errors
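The cross-over analysis for GHZ states can be sketched numerically from the error budget of Appendix F. The code below is a minimal illustration, not the procedure used to produce Fig. 11. It assumes that (B1) has the additive form λ_tot = t_idle λ_idle + N_CNOT λ_CNOT + N_meas λ_meas, that the fidelity is estimated as e^{−λ_tot}, and that the unitary chain uses n − 1 CNOT gates; the value µ = 3 and all function names are hypothetical.

```python
import math


def unitary_budget(n):
    """Idle time (in CNOT durations), CNOT count and measurement count, even n."""
    return {"t_idle": n**2 / 4 - 1.5 * n + 2, "n_cnot": n - 1, "n_meas": 0}


def dynamic_budget(n, mu):
    """Same quantities for the dynamic circuit; mu is the measurement plus
    feed-forward time in units of the CNOT gate time."""
    return {"t_idle": mu * n / 2 + 1, "n_cnot": 1.5 * n - 2, "n_meas": n / 2 - 1}


def total_noise(budget, lam_idle, lam_cnot, lam_meas):
    """Assumed additive form of (B1)."""
    return (
        budget["t_idle"] * lam_idle
        + budget["n_cnot"] * lam_cnot
        + budget["n_meas"] * lam_meas
    )


def crossover(lam_idle, lam_cnot, lam_meas, mu, n_max=1000):
    """Smallest even n at which the dynamic circuit has less total noise,
    together with the estimated fidelity exp(-lambda_tot) at that n."""
    for n in range(4, n_max + 1, 2):
        lam_unitary = total_noise(unitary_budget(n), lam_idle, lam_cnot, lam_meas)
        lam_dynamic = total_noise(dynamic_budget(n, mu), lam_idle, lam_cnot, lam_meas)
        if lam_dynamic < lam_unitary:
            return n, math.exp(-lam_dynamic)
    return None


# lambda_idle = 0.001 and lambda_CNOT = 0.01 as in the text; mu = 3 is hypothetical.
print(crossover(lam_idle=0.001, lam_cnot=0.01, lam_meas=0.003, mu=3.0))
```

Scanning λ_meas for each of the CNOT errors considered in the text gives curves of the kind described for Fig. 11, although the exact numbers depend on the assumed value of µ.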