Suppressing quantum circuit errors due to system variability

We present a quantum circuit optimization technique that takes into account the variability in error rates that is inherent across present-day noisy quantum computing platforms. This method can be run after qubit routing or post-compilation, and consists of computing subgraphs isomorphic to the input circuits and scoring each using heuristic cost functions derived from system calibration data. Using an independent standard algorithmic test suite, we show that it is possible to recover on average nearly 40% of missing fidelity through better qubit selection via efficient-to-compute cost functions. We demonstrate additional performance gains by considering qubit placement over multiple quantum processors. The overhead from these tools is minimal with respect to other compilation steps, such as qubit routing, as the number of qubits increases. As such, our method can be used to find qubit mappings for problems at the scale of quantum advantage and beyond.


I. INTRODUCTION
Given the limitations of present day noisy quantum hardware, implementing quantum circuit error suppression techniques [1][2][3][4] is vital for obtaining high-fidelity results over non-trivial circuit space-time volumes [5][6][7][8]. Indeed, when compiling a quantum circuit to a given quantum system the choice of physical qubits, basis gates, swap mapping [9], gate optimization [10], and dynamical decoupling [11][12][13] routines all play a role in helping to reduce errors when executing the circuit.
Consideration of the noise characteristics of the target quantum system, e.g. gate and measurement errors and coherence times, is nominally taken into account at the beginning of the compilation process via the choice of virtual to physical qubit mapping. As shown in Fig. (1), present-day quantum systems have substantial variation in their important metrics, making virtual to physical qubit mapping a crucial step in the compilation pipeline. The choice of qubits is even more critical when applying error mitigation techniques, where the mitigation process is exponentially sensitive to the fidelity of qubit operations [14][15][16].
For small-width circuits targeting hardware of O(10) qubits, it is usually possible to hand-select a low-noise subset of qubits with reasonable accuracy. However, as hardware quality improves, and qubit counts begin to approach O(10^2), finding optimal layouts manually becomes exceedingly difficult. While automated noise-aware qubit placement routines do exist [3,17], they often suffer from the fact that the best initial qubit layout in terms of noise characteristics is often not ideal for later qubit routing via swap mapping to the target topology; the error from additional swap gates dominates the savings from noise-aware qubit selection. Moreover, the total number and type of gates in the final compiled circuit are not known ahead of time, further complicating qubit placement early in the compilation pipeline. Additionally, it is possible to incorporate quantum processor noise into the qubit routing [2] and approximate local gate compilation [18]. However, these methods may lead to additional swap gates and poorly approximated global unitary operators, respectively.
* E-mail: paul.nation@ibm.com
Here we take a different approach, remapping quantum circuits after qubit routing, or post-compilation, to matching low-noise subgraphs using device calibration data. Using output circuits generated with swap-efficient layout and routing routines that are not sensitive to device parameters, e.g. the Sabre method [9], we compute the possible equivalent qubit mappings ranked by a heuristic cost function. This can be done at the single device level, or across multiple machines of a similar topology. Circuits are then remapped to the lowest error subgraph before execution. A related technique was considered in Ref. [19] using Google Quantum AI processors [20], but it was found that the system calibration data failed to capture real-world performance details. In marked contrast, we will show via common independently defined algorithmic test circuits that meaningful improvements in quantum circuit fidelity can be obtained on IBM Quantum hardware using even the most trivial, and efficient to compute, cost functions. We extend this technique across multiple quantum processors and highlight that additional gains can be found by relaxing the requirement that all circuits be executed on the same quantum system. Although subgraph isomorphism is NP-complete [21], we highlight that the cost of computing these graphs is much smaller than the cost of circuit compilation as a whole, and thus our method adds little in terms of relative cost. Related machine learning algorithms have also been put forth [22], but come with substantial overhead in the requirement of gate set tomography and circuit learning runtimes.
This paper is organized as follows. In Sec. (II) we detail how quantum processor subgraphs are determined from the entangling gate topology of a given quantum circuit and discuss how each subgraph is heuristically scored. Section III discusses implementation considerations when integrating our method into complete compilation pipelines. In Sec. (IV) we look at the performance gains that are achievable with our technique using standard suites of algorithm test circuits, while Sec. (V) details additional performance gains that come when looking for optimal layouts across multiple quantum systems. Finally, Sec. (VI) summarizes our results and looks at possible future improvements. Appendix A looks at the effect of including T1 and T2 information in the cost function, while App. B details the importance of qubit routing when using our method.

II. METHOD
Our algorithm, called mapomatic [23], is a post-routing routine that assumes that one or more circuits have been compiled to match the native gate set (basis gates) and entangling gate topology (coupling map) of a target device. This guarantees that the graph structure of the circuits, as defined by their entangling gate connectivity, is a subgraph of the full device coupling map. Finding an optimal set of qubits then becomes a two-step process. First, a search over the system coupling map is performed to identify subgraphs that are isomorphic to each input circuit. Second, a heuristic cost function is used to score the resultant mappings to find subgraphs with the lowest error. Once identified, the input circuits are remapped to their corresponding minimal-cost subgraphs before execution.
In step one, we iterate over all the instructions [24] in the circuit after qubit routing, and possibly after additional gate optimization (post-compilation), and build an interaction graph. Edges in this graph represent unique two-qubit gates in the circuit, and single-qubit gates are treated as standalone nodes. Typically, a simple graph that is undirected and with no parallel edges is used. We then search for isomorphic subgraphs on the quantum processor's connectivity graph, a graph where each node represents a physical qubit and each edge indicates support for two-qubit gates between those qubits. Subgraphs isomorphic to the original mapping are bijective layouts between virtual circuit qubits and physical qubits on the quantum processor. Although here we assume all entangling gates are of the same type, this is not a limitation of our routine. Figure 2 shows an example of the graph construction used for the subgraph isomorphism problem. Because the routine runs after the circuit has been mapped to the system topology, at least one mapping is always available, as the compiler must rewrite the circuit to match the device. In our implementation we use the rustworkx [25,26] library's VF2 [27] implementation, which includes support for using the search order heuristic from VF2++ [28]. This node ordering heuristic, which mapomatic takes advantage of, orders the search tree based on the degree of each node, and in some situations can make the isomorphism search faster, minimizing the cost of the search. Figure 3 shows the run time of the rustworkx library's VF2 mapping function to find all the isomorphic subgraphs of the coupling map, with and without the VF2++ ordering heuristic, compared with the time it takes to execute the full Qiskit compilation pipeline without mapomatic. With subgraph finding being three orders of magnitude faster at the largest qubit counts, it is clear that its overhead is minimal compared to the rest of the compilation workflow.
This difference widens if, unlike the constant circuit depth used in Fig. (3), we consider situations where the circuit depth increases with width. In these cases, circuit routing and optimization time increases with depth, whereas subgraph finding is only minimally affected because the device topology remains unchanged. Additionally, limitations can be set on the number of internal state visits used in VF2 to bound the overall runtime for finding a set of isomorphic mappings [29].
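The two ingredients of step one, building the interaction graph and enumerating matching layouts, can be illustrated with a minimal sketch. This is not the VF2 algorithm used by rustworkx; it is a brute-force stand-in that is only practical for a handful of qubits, and the function names and the `max_visits` parameter are hypothetical choices made for this illustration (the latter loosely mimics bounding the number of state visits).

```python
from itertools import permutations


def interaction_graph(two_qubit_gates):
    """Collapse a list of two-qubit gates into unique undirected edges."""
    return {tuple(sorted(pair)) for pair in two_qubit_gates}


def find_layouts(circuit_edges, num_virtual, coupling_edges, num_physical,
                 max_visits=None):
    """Yield every layout, where layout[v] is the physical qubit assigned
    to virtual qubit v, such that each circuit edge lands on a device edge.

    Brute-force stand-in for a VF2 subgraph search (illustration only).
    """
    device = {frozenset(edge) for edge in coupling_edges}
    visits = 0
    for candidate in permutations(range(num_physical), num_virtual):
        if max_visits is not None and visits >= max_visits:
            return  # stop early once the visit budget is exhausted
        visits += 1
        if all(frozenset((candidate[a], candidate[b])) in device
               for a, b in circuit_edges):
            yield candidate


# Routed three-qubit circuit: CX(0,1), CX(1,2), and a repeated CX(1,0).
gates = [(0, 1), (1, 2), (1, 0)]
circuit_edges = interaction_graph(gates)

# Hypothetical five-qubit device with linear connectivity 0-1-2-3-4.
coupling_map = [(0, 1), (1, 2), (2, 3), (3, 4)]

layouts = list(find_layouts(circuit_edges, 3, coupling_map, 5))
print(layouts)
```

In this toy case the three-qubit chain fits onto the five-qubit line in three positions, each in two orientations, so six layouts are found; the circuit's original placement is always among them.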
For the second step we apply an efficient heuristic scoring function to each found mapping, Alg. (1). The scoring function bases its output on the reported calibration data for the processor, from which we can define an error map E that maps physical instructions provided by a layout mapping M to error rates. For IBM Quantum systems this includes, amongst other information, single- and two-qubit gate errors, measurement infidelities, and T1 and T2 times across the device. This data is nominally updated daily, and thus the scoring can change on similar timescales. For each isomorphic mapping we estimate the overall fidelity of the circuit with that layout applied by taking the product of instruction fidelities corresponding to the physical qubits over which those instructions are applied. That the resultant fidelity approaches zero in the large error limit is not a concern here as we are looking for relative differences, not absolute values (see Ref. [30] for a discussion). The returned scores for each layout are then used to rank all the possible mappings in order of their estimated error, and the layout with the least error is used. Note that the choice of cost function is not hardcoded into mapomatic, and users are free to define cost functions based on arbitrary input information.
Missing from Alg. (1) are T1 and T2 times, from which one can define approximate error rates associated with qubit idle times. We do not include this information in our default cost function as it was empirically found not to have a large impact on the layout order from scoring; it nominally permutes qubits within a given mapping, but does not modify the mapping ordering, see App. A. Additionally, adding timing information to quantum circuits is currently an unoptimized transformation in Qiskit [31], and greatly increases the runtime of layout selection. We do however include the cost function with idle errors as an example of a custom scoring function at the mapomatic website [23].
It is also important to note that scoring returns floating-point values for each layout. The difference between these values can be smaller than the uncertainty from device fluctuations, and ultimately finite-sampling statistics, effectively leading to a tie between two or more scored layouts. In cases such as these, more complex cost functions and/or information beyond device calibration data is needed to break the scoring degeneracy.
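The scoring step described above can be sketched as follows: multiply the fidelities of every instruction on the physical qubits it lands on, and return one minus that product. The data layout (an error map keyed by instruction name and physical qubits), the function names, and all error values are assumptions made for this illustration; they are not mapomatic's actual data structures or real calibration data, and idle-time (T1, T2) errors are omitted as in the default cost function.

```python
def score_layout(instructions, layout, error_map):
    """Estimated circuit error: 1 - product of instruction fidelities."""
    fidelity = 1.0
    for name, qubits in instructions:
        # Translate virtual qubits to the physical qubits of this layout.
        physical = tuple(layout[q] for q in qubits)
        fidelity *= 1.0 - error_map[(name, physical)]
    return 1.0 - fidelity


def rank_layouts(instructions, layouts, error_map):
    """Return (error, layout) pairs sorted from lowest to highest error."""
    scored = [(score_layout(instructions, layout, error_map), layout)
              for layout in layouts]
    return sorted(scored)


# Two-qubit circuit: (instruction name, virtual qubits).
instructions = [("sx", (0,)), ("cx", (0, 1)),
                ("measure", (0,)), ("measure", (1,))]

# Made-up error rates for a three-qubit line 0-1-2 (illustration only).
error_map = {
    ("sx", (0,)): 0.001, ("sx", (1,)): 0.0005, ("sx", (2,)): 0.0002,
    ("cx", (0, 1)): 0.02, ("cx", (1, 0)): 0.02,
    ("cx", (1, 2)): 0.008, ("cx", (2, 1)): 0.008,
    ("measure", (0,)): 0.03, ("measure", (1,)): 0.01,
    ("measure", (2,)): 0.015,
}

layouts = [(0, 1), (1, 0), (1, 2), (2, 1)]
ranked = rank_layouts(instructions, layouts, error_map)
best_error, best_layout = ranked[0]
print(best_layout, best_error)
```

With these made-up numbers the two layouts on the lower-error edge (1, 2) rank first, and they differ only slightly from each other, which illustrates how near-ties between scored layouts can arise.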

III. INTEGRATION INTO A COMPILER
While mapomatic was originally designed to run as a standalone post-compilation routine, it is also possible to directly integrate it into a compilation pipeline. When doing this, the only additional constraint is to ensure that any changes in layout do not prevent the circuit from running on the device. When running as a standalone post-compilation routine this is not a constraint, because one can always re-run the compilation with a fixed optimal layout to update the circuit. There are two techniques for integrating the algorithm into a compiler: the first is to perform mapomatic immediately after routing, and the second is to run mapomatic after all optimization routines but prior to any physical scheduling of the circuit. There are tradeoffs between the two approaches.
Running immediately after routing works with looser constraints, typically ignoring directionality (by using an undirected interaction graph and coupling map), and with no guarantee that the circuit has been converted to the native gate set. This means that for error evaluation one can only apply an inexact scoring, typically using average error rates for the different types of operations available on the device (i.e. 1-qubit gates, 2-qubit gates, etc.) instead of using the exact error rates for each instruction from the calibration data. However, running with looser constraints provides the flexibility to evaluate more potential layouts, and can potentially yield better results as later compilation stages in the pipeline will be able to transform the circuit as needed.
Algorithm 1: Algorithm for scoring layout mappings
    ...
    return 1 − fidelity
end procedure
Running the algorithm after physical optimization (but before scheduling) means that the circuit is guaranteed to match the directionality of the entangling gate topology and that all operations are in the native gate set. This allows Alg. (1) to apply a more exact scoring, as the exact instructions used in the output circuit have already been determined, leading to a potentially more accurate selection of the best performing layout. However, running in this mode places more constraints on the available layouts, as at this point in a typical compilation pipeline the algorithm must conform to all the constraints of the target device. This means the algorithm needs to use directed graphs for the interaction graph and the coupling map, and must only evaluate isomorphic mappings where the instructions performed on the circuit qubits are available on the physical qubits selected, as selecting a layout that violates these constraints would result in an invalid output from the compilation.
We have integrated the mapomatic algorithm into the Qiskit compiler as the VF2PostLayout pass [32]. The VF2PostLayout pass supports running in both modes; however, it is integrated into the default compilation pipelines only in the first mode, which runs immediately after routing. While in most cases when compiling for current hardware there is no difference between the two techniques, there are some situations where being able to evaluate more potential mappings can yield better results.

IV. BENCHMARKING RESULTS
In order to validate the performance benefit of our method we take advantage of the wide variety of test circuit libraries that are available [33][34][35][36]. In particular, here we use circuits from the Quantum Economic Development Consortium (QED-C) [34], comparing runs of this suite with and without mapomatic. We point out that this test suite is developed independently of this work and, unlike a few hand-selected examples, represents a true test of our technique. Rather than focusing on all the test results, some of which are well beyond what today's quantum processors are capable of, we only look at those algorithmic tests and numbers of qubits where at least one of the two fidelities used in the comparison is ≥ 1/e. This prevents inflated claims of success when dealing with differences in fidelity values whose overall magnitude is not indicative of meaningful experimental outcomes. In addition, rather than looking at fidelities directly, we ask what fraction of the fidelity missing in the baseline result can be recovered through better qubit selection. That is to say, given a baseline fidelity $f_{\rm base}$ we compute the value
$$\frac{f - f_{\rm base}}{1 - f_{\rm base}},$$
where $f$ is the comparison fidelity. In Fig. (4a) we use the fidelities obtained on the QED-C test suite using Qiskit as the baseline, and look at the fraction of recoverable fidelity when using mapomatic after compilation to remap the same circuit on the IBM Quantum Peekskill device. Specifically, all baseline circuits were transpiled once in Qiskit using optimization_level=3 and approximation_degree=0, settings that enable Sabre layout and routing and disable approximate unitary synthesis. We see that qubit remapping provides a substantial performance benefit over the qubit assignment of Sabre layout. Across the benchmarking results, an average (median) of 37% (34%) of the fidelity is recoverable on the system via simple qubit remapping.
This emphasizes the importance of not only choosing the correct quantum system to execute circuits, but also the correct qubits on that system. The small number of test results with negative values indicates a loss in fidelity from the remapping process. This happens when the calibration data no longer faithfully represents a subset of qubits (device parameter drift) and/or when the simple cost function used here does not capture enough of the underlying contributions to circuit errors. However, it is clear that in the vast majority of cases there are marked performance gains from using our technique to remap quantum circuits.
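As a worked illustration of the metric used in this section, the sketch below assumes the recovered fraction is the gain f − f_base divided by the missing baseline fidelity 1 − f_base, and keeps only comparisons where at least one fidelity is ≥ 1/e. The function names and the fidelity pairs are hypothetical and are not measured data.

```python
import math
from statistics import mean, median


def recovered_fraction(f, f_base):
    """Fraction of the fidelity missing from the baseline that is recovered.

    Negative values mean the comparison run did worse than the baseline.
    """
    if not 0.0 <= f_base < 1.0:
        raise ValueError("baseline fidelity must lie in [0, 1)")
    return (f - f_base) / (1.0 - f_base)


def is_meaningful(f, f_base, threshold=1.0 / math.e):
    """Keep a comparison only if at least one fidelity is >= 1/e."""
    return max(f, f_base) >= threshold


# Made-up (baseline, remapped) fidelity pairs, for illustration only.
pairs = [(0.60, 0.80), (0.90, 0.95), (0.50, 0.45), (0.10, 0.20)]

kept = [(fb, f) for fb, f in pairs if is_meaningful(f, fb)]
fractions = [recovered_fraction(f, fb) for fb, f in kept]
print(mean(fractions), median(fractions))
```

The last pair is dropped because both fidelities fall below 1/e, and the third pair yields a negative fraction, corresponding to a loss in fidelity from remapping.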
While Fig. (4a) highlights fidelity improvements versus Qiskit without mapomatic, it is also beneficial to look at comparisons against other compilation pipelines. To this end, in Fig. (4b) we look at the differences in fidelity when using Qiskit with mapomatic integrated as a transpiler pass, with the compiler in PyTket [37] used as the base fidelity. In this investigation we used PyTket version 1.3.0 with default_compilation_pass(2). This comparison is of interest not only because PyTket is a commonly used alternative to Qiskit, but also because its qubit layout and routing methods are deterministic rather than stochastic. As with Fig. (4a), we see that a sizable portion of the fidelity missing from the PyTket results is recoverable with mapomatic utilized in the Qiskit compilation stack.
Although the results presented in Fig. (4) highlight the possible gains when using mapomatic, it is not a panacea, and other steps of the compilation process still play an important role in the final output fidelity. In particular, the stochastic nature of the qubit routing (swap mapping) methods used in Qiskit gives rise to a variable number of SWAP gates in the final compiled circuits. The variance in the number of added gates means that a poorly routed circuit may not be remappable to a fidelity as high as that achievable with a more careful routing selection utilizing repeated routing attempts, or with a non-stochastic routing routine such as that found in PyTket. An example highlighting the effect of poor routing on the overall fidelity is shown in App. (B). As a corollary, our results show the challenge when evaluating the performance of quantum systems; the fidelity of final results depends greatly on the circuit rewriting pipeline used before execution. This applies even more so to cross-platform comparisons where different compilation workflows, both client-side and server-side, further complicate the interpretation.

V. BEST DEVICE SELECTION
To date, standard workflows for quantum computation involve first selecting a good target system on which to compile and execute a set of quantum circuits. Until now, we have followed this procedure in this work as well, choosing the IBM Quantum Peekskill system based on prior knowledge of its performance characteristics. However, as the number of available quantum systems grows, and the complexity of algorithms that they faithfully execute increases, it becomes challenging to ascertain which of the myriad of system and qubit layout combinations are good candidates. Moreover, when running many different algorithms, or the same algorithm over a varying number of qubits, the optimal system and qubit layout can span several devices. Therefore, it can be valuable to look for optimal circuit layouts across multiple quantum processors.
It is possible to use the tools presented here for determining good initial device and layout candidates across multiple quantum systems in the following manner. To begin, circuits must be compiled against one of the systems within the set of possible target processors. To obtain the largest number of matching subgraphs, it is ideal for the systems to share a common entangling gate topology, e.g. all IBM Quantum systems are based on the heavy-hex architecture [38]. The mapomatic routine can then be run across the set of quantum processors, returning the device name, best layout, and the associated error value for each processor. The device and layout corresponding to the lowest overall error value is then selected for remapping and execution. The matching subgraphs for systems with the same topology need only be computed once, whereas the cost function for each layout must be evaluated on each individual system. If a quantum system does not have enough qubits to accommodate a circuit, then there are no matching subgraphs and the device is skipped.
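The multi-device procedure above can be sketched in a few lines. This is a simplified illustration under stated assumptions: layouts are found by brute force rather than VF2, the cost function counts two-qubit gate errors only, and the device names, dictionary fields, and error rates are all hypothetical. Layouts are cached per coupling map so that systems sharing a topology are searched only once, while the cost is evaluated per device.

```python
from itertools import permutations


def find_layouts(gates, num_virtual, coupling, num_physical):
    """Brute-force search for layouts placing every gate on a device edge."""
    device = {frozenset(edge) for edge in coupling}
    return [c for c in permutations(range(num_physical), num_virtual)
            if all(frozenset((c[a], c[b])) in device for a, b in gates)]


def layout_error(gates, layout, cx_error):
    """1 - product of two-qubit gate fidelities (simplified cost)."""
    fidelity = 1.0
    for a, b in gates:
        fidelity *= 1.0 - cx_error[frozenset((layout[a], layout[b]))]
    return 1.0 - fidelity


def best_overall(gates, num_virtual, devices):
    """Return (error, device name, layout) per device, lowest error first."""
    cache = {}    # layouts are shared by devices with the same coupling map
    results = []
    for name, dev in devices.items():
        key = (dev["num_qubits"],
               frozenset(frozenset(e) for e in dev["coupling"]))
        if key not in cache:
            cache[key] = find_layouts(gates, num_virtual,
                                      dev["coupling"], dev["num_qubits"])
        layouts = cache[key]
        if not layouts:
            continue  # device too small or no matching subgraph: skip it
        err, lay = min((layout_error(gates, l, dev["cx_error"]), l)
                       for l in layouts)
        results.append((err, name, lay))
    return sorted(results)


line3 = [(0, 1), (1, 2)]
devices = {  # hypothetical devices with made-up error rates
    "dev_a": {"num_qubits": 3, "coupling": line3,
              "cx_error": {frozenset((0, 1)): 0.02, frozenset((1, 2)): 0.01}},
    "dev_b": {"num_qubits": 3, "coupling": line3,
              "cx_error": {frozenset((0, 1)): 0.004, frozenset((1, 2)): 0.03}},
    "dev_small": {"num_qubits": 2, "coupling": [(0, 1)],
                  "cx_error": {frozenset((0, 1)): 0.001}},
}

gates = [(0, 1), (1, 2), (1, 2)]  # three-qubit chain, one repeated gate
results = best_overall(gates, 3, devices)
print(results[0])
```

Here the two-qubit device is skipped, and the best choice is the reversed layout on the second device, which places the repeated gate on that device's lowest-error edge.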
To demonstrate the value in looking across multiple systems, we repeat the Hamiltonian simulation test circuits from the QED-C suite, and use mapomatic to find the system and layout with minimal cost across the entire fleet of IBM Quantum systems, ranging from 5 to 127 qubits. The experiments presented in Fig. (5) show the resultant fidelity from the quantum system selected from the entire IBM Quantum processor lineup compared to the best layout on the IBM Quantum Peekskill system used in Fig. (4). Although the Peekskill system is a newer high-coherence (avg. T1 ∼ 300 µsec) 27-qubit Falcon R8 system, see Fig. (1), we see that the previous generation Kolkata and Kawasaki Falcon R5 systems are selected and give better results for several circuit widths. The same is true for the two-qubit case where the 65-qubit Ithaca system does better, but with gains limited by the overall high fidelities at such small circuit space-time volumes. We also see that the estimated fidelities returned by mapomatic do not match the results from hardware execution. There are two reasons for this. First, as mentioned previously, we do not include qubit idle time errors in the cost function. Although they contribute to the overall circuit error, we have empirically found that they do not greatly change the ordering of layouts; contributions from T1 and T2 in the instruction errors already capture much of these effects in the scoring process. Second, the calibration data returned does not include sources of error such as spectator qubits or cross-talk. Thus the fidelities corresponding to scored layouts should be loosely interpreted as upper bounds. The performance gains shown in Fig. (5) are non-negligible given that the Peekskill system is one of the best performing IBM Quantum systems to date, and was selected because of this. This highlights that even with detailed device knowledge, looking across multiple quantum systems with mapomatic can provide performance gains that would otherwise be overlooked.

VI. CONCLUSION
We have shown that it is possible to account for system variability in near-term quantum processors using circuit remapping applied post-compilation, or after qubit routing. Using simple, quick-to-evaluate cost functions we have demonstrated that sizable fractions (∼ 40%) of result fidelity can be recovered on a wide variety of standard quantum application test circuits, and as compared to other circuit transformation pipelines, using our mapomatic method. Using performant subgraph isomorphism routines, our technique adds little in terms of additional cost to the overall compilation process. Given that much of the overall wall clock time of executing circuits is spent waiting in a queue, this remapping overhead can likely be amortized over this duration. This low overhead also allows for layout scoring across multiple quantum processors, alleviating the need for users to identify a target system ahead of time, and uncovering additional performance improvements that would be missed if device selection were done ahead of time. mapomatic is easy to integrate into quantum workflows and, because of the performance advantages it offers, is incorporated into the transpilation process by default in Qiskit Terra 0.21+. Thus users are capable of leveraging these techniques immediately, and in many cases have been doing so without knowing it.
There are several possible future improvements to mapomatic that should be investigated. Chief among them is looking for improved heuristics for scoring layouts that are more accurate while at the same time being efficient to evaluate. In particular, is there information outside of standard device calibration data that can yield more accurate cost analysis and break ties between layouts? Additionally, qubit selection over multiple quantum systems has received little attention to date, but is likely beneficial for algorithms that can be executed in parallel over multiple systems, for example variational algorithms [39,40] and circuit cutting [41,42]. More broadly, as the performance improvements seen with mapomatic apply across the board, it is possible to integrate it into cloud-based workflows that would allow for the abstraction of device selection away from users. This would not only lead to performance gains, but would also remove the need for users not interested in device characterization to understand detailed system information.
As the field of quantum computation approaches the frontier of Quantum Advantage, and users increasingly make use of error mitigation techniques with runtimes exponentially sensitive to qubit quality, mapomatic and related qubit selection methods will undoubtedly play a pivotal role in early demonstrations of practical quantum applications.

FIG. 5: Fidelity of QED-C Hamiltonian simulation circuits for the Peekskill system used in Fig. (4) compared to execution of the same circuits on the optimal target system as selected by mapomatic from the entire set of IBM Quantum devices. Markers show the estimated fidelity for the Peekskill (circle) and mapomatic chosen (square) systems. Circuits are executed in the same manner as Fig. (4).

TABLE I:
Num. qubits    Default cost function
2              [5,8]

[...] this is due to two reasons. First, it is currently inefficient to repeatedly generate timing information in Qiskit, and doing so would greatly increase the runtime of the scoring process. Second, even when this information is included, the resulting qubit layouts are, to a large extent, nominally simple permutations of the layouts computed without these error processes; effects from T1 and T2 are, to a large extent, already accounted for in the gate and measurement errors included in the device calibration data. In Table I we give an explicit example of this, looking at the resulting optimal qubit layouts generated by mapomatic, both with and without idle errors included in the cost function, from the Hamiltonian simulation test circuits from Sec. V.
Appendix B: SWAP mapping variability
mapomatic will remap any circuit passed to it that matches the hardware topology and is written in terms of the supported basis gates. For compilation routines that contain stochastic components, such as the qubit routing methods in Qiskit, this means that the properties of the output circuits will follow a distribution of values. For routing, this is a distribution in the total number of SWAP gates added to the circuit in order to satisfy the topology constraints of the target system. Because each SWAP gate (equal to 3 CNOT gates) is costly to execute, there is a corresponding distribution in output fidelity when executing the same input circuit multiple times. It can be the case that the variance in result fidelities is greater than the benefit of remapping, so optimizing routing before remapping with mapomatic is still an important part of the compilation workflow. To overcome this, circuits should nominally be routed multiple times, which can be done in parallel, with the output circuit comprising the fewest CNOT gates then passed on to mapomatic.
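The "route several times, keep the circuit with the fewest CNOT gates" recommendation can be sketched generically. The router here is a toy stand-in that appends a random number of SWAP gates; it is not Sabre or any Qiskit routine, and the function names and circuit representation (a list of gate name and qubit tuples) are hypothetical. Each SWAP is counted as 3 CNOT gates, as stated above.

```python
import random


def count_cnots(circuit):
    """Count CNOTs, with each SWAP contributing 3 CNOT gates."""
    total = 0
    for name, _qubits in circuit:
        if name == "cx":
            total += 1
        elif name == "swap":
            total += 3
    return total


def pick_best_routing(route, circuit, seeds):
    """Route once per seed and keep the output with the fewest CNOTs."""
    candidates = [route(circuit, seed) for seed in seeds]
    return min(candidates, key=count_cnots)


def toy_router(circuit, seed):
    """Toy stochastic router: appends 0 to 4 SWAPs depending on the seed."""
    rng = random.Random(seed)
    return list(circuit) + [("swap", (0, 1))] * rng.randint(0, 4)


circuit = [("h", (0,)), ("cx", (0, 1)), ("cx", (1, 2))]
best = pick_best_routing(toy_router, circuit, seeds=range(10))
print(count_cnots(best))
```

In a real workflow the selected circuit would then be handed to the remapping step; the independent routing attempts are natural candidates for parallel execution.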
To highlight the effect that routing has on the overall fidelity we do the opposite of what is proposed above and compile circuits multiple times using Qiskit and the Sabre routing method, taking the one with the greatest number of CNOT gates as the chosen circuit. Figure (6) shows the effect this has on the QED-C test circuits using the deterministic routing method in PyTket as the baseline result. The detrimental consequence that added SWAP gates have on the resultant fidelity is clear when comparing against the results in Fig. (4b).