Qubit assignment using time reversal

As the number of qubits available on noisy quantum computers grows, it will become necessary to efficiently select a subset of physical qubits to use in a quantum computation. For any given quantum program and device there are many ways to assign physical qubits for execution of the program, and assignments will differ in performance due to the variability in quality across qubits and entangling operations on a single device. Evaluating the performance of each assignment using fidelity estimation introduces significant experimental overhead and will be infeasible for many applications, while relying on standard device benchmarks provides incomplete information about the performance of any specific program. Furthermore, the number of possible assignments grows combinatorially in the number of qubits on the device and in the program, motivating the use of heuristic optimization techniques. We approach this problem using simulated annealing with a cost function based on the Loschmidt Echo, a diagnostic that measures the reversibility of a quantum process. We provide theoretical justification for this choice of cost function by demonstrating that the optimal qubit assignment coincides with the optimal qubit assignment based on state fidelity in the weak error limit, and we provide experimental justification using diagnostics performed on Google's superconducting qubit devices. We then establish the performance of simulated annealing for qubit assignment using classical simulations of noisy devices as well as optimization experiments performed on a quantum processor. Our results demonstrate that the use of Loschmidt Echoes and simulated annealing provides a scalable and flexible approach to optimizing qubit assignment on near-term hardware.


I. INTRODUCTION
With the increased availability of NISQ [1] devices there is a growing need for tools to efficiently deploy quantum circuits on hardware in a noise-aware manner. Part of this task is qubit assignment [2,3], where the goal is to assign a logical circuit [4] to the set of physical qubits on noisy hardware that maximizes the circuit performance. This task requires access to a performance metric for circuits implemented on hardware that is both efficient to evaluate and faithful to standard fidelity metrics, as well as a means of efficiently optimizing that performance metric with respect to the set of all possible qubit assignments.
Performing qubit assignment in a way that satisfies these requirements faces a number of challenges. For instance, an experimentalist might choose physical qubits according to which subset maximizes the fidelity of the output of the quantum program, but this introduces an experimental overhead that is exponential in the system size on near-term devices. On the other hand, choosing physical qubits based on standard device benchmarks such as randomized benchmarking [5,6] or cross-entropy benchmarking [7,8] can result in poor-performing assignments, since these benchmarks capture average error behavior that may differ from the noise occurring in the context of a specific quantum program. Once a performance metric is chosen, there still remains the challenge of efficiently exploring a space of qubit assignments that grows combinatorially both in the size of the quantum program and the hardware device. Implementing qubit assignment techniques can improve device performance whenever the total number of qubits on the device is larger than the number required for an application. This scenario is typical for near- and intermediate-term devices, particularly in the context of quantum error correction experiments [9-13].
To overcome these challenges we study the Loschmidt Echo [14], a tool for probing reversibility in quantum systems. We then demonstrate that this metric can be used with simulated annealing (SA) [15] to effectively perform qubit assignment on hardware. The large state space of potential hardware circuits combined with high variance in qubit error rates naturally leads to an uneven cost landscape, for which SA is particularly well suited. Combining SA with Loschmidt Echoes then provides a scalable technique for qubit assignment that does not rely on potentially inaccurate hardware diagnostics.
Prior works have used variants of the Loschmidt Echo to assess the performance of qubit assignments. Refs. [16,17] introduced techniques for benchmarking quantum devices using "mirror circuits" composed entirely of Clifford operations, for which the fidelity of the output state may be computed with the aid of efficient classical simulations of Clifford circuits [18,19]. While this technique provides a benchmark for the performance of a set of qubits with respect to a general set of operations, we are interested in the ability of the device to accurately prepare some specific state |ψ⟩. Furthermore, while it is possible to use mirror circuits to evaluate the performance associated with specific circuits, their application to large circuits is limited by the lack of efficient classical simulation of general non-Clifford circuits. Moreover, we will demonstrate theoretically how generic benchmarks (which assess qubit and gate performance independently of circuit structure) have limited ability to predict the performance of a given qubit assignment. We will provide experimental evidence to support this claim for a specific choice of generic benchmark.
Similarly, our work differs from prior qubit assignment experiments [2,20] since we do not rely on a set of gate fidelities characterizing hardware performance as a proxy for assessing the performance of a specific circuit on hardware. Therefore, while Ref. [2] used a hybrid algorithm involving SA to search over "sub-allocations" of progressively larger qubit subsets, our approach avoids annealing over a graph weighted by the gate fidelities of (partial) circuits altogether.
Qubit assignment [21-25] (often referred to as qubit routing, qubit allocation, or quantum compilation) has been extensively studied as a tool for improving the performance of circuits executed on hardware. Qubit assignment typically includes modifying logical circuits to run on hardware when the gateset and connectivity constraints of the device do not match those of the logical program, usually with the goal of minimizing the number of additional operations introduced to the program. For instance, Ref. [26] introduced a technique for circuit compilation using SA with a cost function based on CNOT count. In this work, however, we simplify qubit assignment on hardware by only considering a fixed circuit that already satisfies hardware connectivity, with the goal of maximizing the performance of a quantum program with respect to the choice of hardware qubit subsets satisfying this connectivity.
This work proceeds as follows. In Sec. II A we overview the problem of qubit assignment on a hardware device. In Sec. II B we provide theoretical evidence that the Loschmidt Echo is useful for ranking qubit assignments when the goal is to maximize the fidelity of a prepared state, and demonstrate shortcomings of other existing methods based on benchmark data or random circuit fidelities in this same context. Specifically, we show that scoring qubit assignments based on gate fidelities taken from device benchmark data fails to capture general performance trends. In Sec. II C we then develop a framework for performing qubit assignment using SA, and in Sec. III we verify the performance of the Loschmidt Echo and demonstrate optimization results that outperform competitive techniques on both simulated and experimental datasets.

A. Qubit assignment on hardware
We describe the task of qubit assignment in terms of a hardware topology and a set of gates [27]. We will provide a graph-based description of a quantum circuit as a sequence of gates applied to single qubits or pairs of qubits in a logical circuit, and a representation of qubit assignment as a transformation subject to connectivity constraints on a hardware device.
We describe the connectivity of a quantum device by an undirected graph G_p = (V_p, E_p), where each vertex i ∈ V_p describes a physical qubit and each edge {i, j} ∈ E_p indicates that entangling operations can be executed between qubits i and j. We assume |V_p| = N qubits are available on hardware and that the hardware supports a gateset G_p consisting of single- and two-qubit gates, which allows for universal computation [28] and is a common choice on superconducting qubit platforms [29,30]. We similarly define a logical graph G_L = (V_L, E_L) and a logical circuit C_L over n qubits as a sequence of m gate operations [g_1(v_1, e_1), . . . , g_m(v_m, e_m)], with e_k ∈ E_L, v_k ∈ V_L, and g_k ∈ G_L for k = 1 . . . m. Taking every gate g_k in C_L to act on either (i, ∅) (single-qubit gate) or (∅, {i, j}) (two-qubit gate), we can summarize the logical circuit C_L ∈ (G_L^m, V_L^m, E_L^m) by three sequences: one of gate descriptions, one of target qubits, and one of target edges. Let U(C_L) be the unitary representation of C_L acting on n unique qubits, and let U(C_p) be the unitary representation of a hardware program C_p ∈ (G_p^m, V_p^m, E_p^m) acting on up to N ≥ n qubits. Then, defining the state |ψ⟩ = U(C_L)|0⟩^⊗n, we are interested in searching over the range of a logical-to-physical circuit map of the form

M : (G_L^m, V_L^m, E_L^m) → (G_p^m, V_p^m, E_p^m),

subject to the constraint

Tr_A[ U(C_p) |0⟩⟨0|^⊗N U(C_p)† ] = |ψ⟩⟨ψ|

for some subsystem A consisting of up to N − n qubits, in which case we will say that M satisfies M(C_L) = C_p. In general, the effects of noise on the quantum device will interfere with perfect realization of the unitary operation U(C_p). We therefore assume that there is a quantum channel E(ρ; C_p) that describes the effect of C_p applied to an input state ρ on hardware.
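As a concrete illustration of these constraints, the following minimal sketch checks whether a candidate logical-to-physical assignment respects the device edge set E_p. The 2×2-grid device and three-qubit circuit here are hypothetical toy data, not taken from the paper:

```python
# Check that a logical -> physical qubit assignment respects device
# connectivity: every two-qubit gate in the logical circuit must map
# onto an edge of the hardware graph G_p. Graph and circuit data below
# are illustrative, not from any real device.

def assignment_is_valid(two_qubit_gates, assignment, device_edges):
    """two_qubit_gates: list of logical qubit pairs (i, j) used by C_L.
    assignment: dict mapping logical qubit -> physical qubit.
    device_edges: set of frozensets, the edge set E_p of G_p."""
    for i, j in two_qubit_gates:
        phys_edge = frozenset((assignment[i], assignment[j]))
        if phys_edge not in device_edges:
            return False
    return True

# A 2x2 grid device: qubits 0..3 with nearest-neighbor edges.
E_p = {frozenset(e) for e in [(0, 1), (0, 2), (1, 3), (2, 3)]}
gates = [(0, 1), (1, 2)]      # logical entangling gates on a 3-qubit line
good = {0: 0, 1: 1, 2: 3}     # maps the line onto a grid path 0-1-3
bad = {0: 0, 1: 3, 2: 1}      # 0-3 is not a grid edge
print(assignment_is_valid(gates, good, E_p))  # True
print(assignment_is_valid(gates, bad, E_p))   # False
```

In this restricted setting the map M reduces to such a dictionary; the full formalism also permits ancilla qubits in the subsystem A.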
We will consider implementing a specific unitary U on noisy hardware, and therefore restrict our attention to circuits C_p for which U(C_p) = U up to some permutation of subsystems. We will hereafter implicitly assume dependence on some hardware circuit C_p (and therefore some qubit assignment) using the notation E_U(ρ) = E(ρ; C_p).

B. Loschmidt Echoes for evaluating qubit assignments
We are interested in selecting the best qubits for preparing a state |ψ⟩ = U|0⟩ over n qubits given a choice of N hardware qubits with constrained topology. Ideally we would be able to directly compare the state prepared by a noisy implementation E_U to the perfect preparation of |ψ⟩ = U|0⟩. One way to characterize the hardware performance is to employ the fidelity function [31], which provides a measure of closeness between two quantum states ρ and σ given as F(ρ, σ) = (Tr[(ρ^{1/2} σ ρ^{1/2})^{1/2}])^2. Using F we can compare the prepared state ρ = E_U(|0⟩⟨0|) (which implicitly depends on a hardware circuit C_p) to the ideal state prepared by U in a noiseless environment:

F(C_p) = F(E_U(|0⟩⟨0|), |ψ⟩⟨ψ|) = ⟨ψ| E_U(|0⟩⟨0|) |ψ⟩.  (3)

In general, evaluating F on a near-term device can incur significant resource overhead. Computing F using Direct Fidelity Estimation (DFE) [32,33] requires significant experimental overhead for each choice of qubits, and is therefore impractical for qubit assignment on large quantum devices. Similarly, classical shadows [34] can be used to evaluate F in constant time, but evaluating fidelities in this way requires significant overhead and the ability to execute operations drawn randomly from the n-qubit Clifford group, a requirement that is out of reach for near-term devices due to the depth requirements of such circuits. We are therefore motivated to find a proxy for circuit fidelity that can be evaluated quickly and efficiently, at the expense of accurately characterizing the true state fidelity or process fidelity. Given the ability to implement unitaries U and V, the fidelity between the states |ψ⟩ = U|0⟩ and |φ⟩ = V|0⟩ can be determined in constant time, since the quantity |⟨0|U†V|0⟩|^2 can be estimated as the empirical probability of observing the all-zeros bitstring 0^n at the output of the circuit U†V. By extension, to test the effects of noise we evaluate a performance metric for C_p defined as

F_LE(C_p) = ⟨0| E_{U†}(E_U(|0⟩⟨0|)) |0⟩,  (5)

where E_{U†} represents the process resulting from a noisy implementation of the unitary U† on hardware. Eq. 5 computes the probability that a state prepared by implementing U can be successfully returned to |0⟩ by implementing U†, though this will not provide an accurate estimate of the state fidelity F in general. This process is referred to as a Loschmidt Echo [14,35], and serves as a general measure of reversibility in a quantum process (see Ref. [36] for a thorough review). In the special case where E_{U†}(ρ) = U†ρU can be implemented with perfect fidelity, comparison to Eq. 3 shows that this procedure exactly recovers F(C_p). From this perspective, the application of E_{U†} can be understood as preparing an imperfect measurement of the projector |ψ⟩⟨ψ|, a comparison that suggests that F_LE may be an effective tool for analyzing the implementation of U on hardware. Fig. 1 shows a circuit which estimates F_LE as the probability of sampling the all-zeros bitstring at the circuit output.
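The definition above can be illustrated with a small density-matrix sketch. Assuming a toy global depolarizing channel of strength eps applied once after the noisy U and once after the noisy U† (an illustrative noise model, not hardware data), the Loschmidt Echo is the ⟨0|·|0⟩ element of the final state:

```python
import numpy as np

# Minimal density-matrix sketch of a Loschmidt Echo: apply a noisy U,
# then a noisy U^dagger, and read out the probability of returning to
# the all-zeros state. Noise here is a global depolarizing channel.

def depolarize(rho, eps):
    d = rho.shape[0]
    return (1 - eps) * rho + eps * np.trace(rho) * np.eye(d) / d

def loschmidt_echo(U, eps):
    d = U.shape[0]
    rho = np.zeros((d, d), dtype=complex)
    rho[0, 0] = 1.0                                 # |0><0|
    rho = depolarize(U @ rho @ U.conj().T, eps)     # noisy U
    rho = depolarize(U.conj().T @ rho @ U, eps)     # noisy U^dagger
    return rho[0, 0].real                           # <0| . |0>

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
print(loschmidt_echo(H, 0.0))   # 1.0: perfect reversal
print(loschmidt_echo(H, 0.1))   # 0.905 = (1 - eps)^2 + (1 - (1 - eps)^2)/2
```

For this noise model the echo is independent of U, matching the global-depolarizing special case discussed in Appendix A 2.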
While F_LE will generally not provide an estimate for the state fidelity F, we now show that, given some assumptions about device noise, the optimal qubit assignment with respect to maximizing F_LE will result in a state with high fidelity as computed by Eq. 3. We first provide a concrete statement of this result by assuming a noise model consisting of one-qubit and two-qubit local depolarizing channels parameterized by the error rate ε_i for each single-qubit gate acting on qubit i and the error rate η_{ij} for each two-qubit gate acting on qubits {i, j}. This model is applied to the hardware implementation of each hardware circuit C_p = (g, v, e) with a fixed gate sequence g of length m. We then define the sets of optimal qubit assignments with respect to F and F_LE respectively as:

S* = argmax_{v, e} F(C_p),  (6)
S*_LE = argmax_{v, e} F_LE(C_p),  (7)

where the fixed gate sequence means that each maximum is computed with respect to all possible valid assignments on the hardware device, i.e. sequences of qubits v ∈ (V′)^m and sequences of edges e ∈ (E′)^m subject to V′ ⊂ V_p and E′ ⊂ E_p for a device with connectivity described by (V_p, E_p). Under the assumed noise model, we can then compare these optimal assignments in a low-error limit by the following proposition:

Proposition 1. For a graph G_p = (V_p, E_p) describing hardware qubits, consider an error model consisting of Markovian noise channels, each characterized by an error rate ε, inserted after each gate. Suppose each gate U and its inverse implementation U† are augmented by the same noise channel. Let the optimal qubit assignments S*_LE and S* be defined as in Eqs. 6-7. Then in the weak-error limit ε ≪ 1,

S*_LE = S*.

We first illustrate this claim by considering the special cases of local depolarizing noise (Appendix A 1) and global depolarizing noise (Appendix A 2), followed by a complete proof of the proposition in Appendix A 3. A simple counterexample involving unitary errors, for which F_LE is independent of F, is also provided in Appendix A 2.
In Appendix B we show that F_LE is strongly biased by readout error compared to the true fidelity of the prepared state, and so it is necessary to implement readout error mitigation [37-41] and, in particular, readout correction to recover the all-zeros bitstring [42] when selecting qubits based on this metric.
Under the assumption of the 2-local depolarizing error model discussed in Appendix A 1, it is natural to consider simpler metrics that might be used to estimate the performance of a qubit assignment without the need for circuit-specific hardware experiments. For instance, the fidelity of random circuits exposed to discrete errors can be well approximated using device diagnostic data [43], and so in addition to computing F_LE and F for each qubit assignment, we also consider a heuristic that can be computed by combining the structure of the executed circuit with device benchmark data:

F_0(ε, η) = ∏_{i ∈ V_p} (1 − ε_i)^{n_i} ∏_{{i,j} ∈ E_p} (1 − η_{ij})^{n_{ij}},  (9)

where n_i counts the number of single-qubit gates acting on qubit i, n_{ij} counts the two-qubit gates acting on edge {i, j}, the length-|V_p| vector ε contains weights for the graph vertices, and the length-|E_p| vector η contains weights for the graph edges. These weights describe error rates for single-qubit gates and two-qubit gates respectively. We omit a term characterizing measurement error from F_0 since we assume that readout error correction will be employed when estimating F and F_LE. Assuming the 2-local depolarizing noise model, F_0 is the probability that no error occurs during the execution of U on noisy hardware with intrinsic polarization error rates defined by ε and η. This suggests that an experimentalist could perform device diagnostics such as randomized benchmarking (RB) [5,6] or cross-entropy benchmarking (XEB) [7,8] to estimate the gate error rates as ε̂ and η̂, and then compute F_0(ε̂, η̂) to evaluate the performance of any given qubit assignment using Eq. 9.
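The product formula for F_0 is straightforward to compute from gate counts and estimated error rates. A minimal sketch, with error-rate values invented purely for illustration:

```python
# Sketch of the benchmark-based heuristic F_0: a product of per-gate
# success probabilities built from gate counts and estimated error
# rates. The rates below are made up for illustration.

def f0(single_counts, two_counts, eps, eta):
    """single_counts: {qubit: n_i}, two_counts: {(i, j): n_ij},
    eps: {qubit: single-qubit error rate}, eta: {(i, j): two-qubit rate}."""
    score = 1.0
    for q, n in single_counts.items():
        score *= (1.0 - eps[q]) ** n
    for edge, n in two_counts.items():
        score *= (1.0 - eta[edge]) ** n
    return score

eps = {0: 1e-3, 1: 2e-3}       # hypothetical RB error per gate
eta = {(0, 1): 1e-2}           # hypothetical XEB error per cycle
print(f0({0: 4, 1: 4}, {(0, 1): 3}, eps, eta))  # ~0.9587
```

Note that F_0 depends only on how many gates touch each qubit and edge, not on which gates they are, which is exactly the circuit-structure blindness criticized below.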
This strategy has been used in previous works for qubit assignment [2,44,45] (the approaches of Refs. [46,47] involved finding a shortest path on a graph with weights {η_{ij}}, though the operational meaning of a sum of error rates is not as clear compared to the product formula for F_0). However, we argue that this strategy must be approached with caution since it neglects information about how device noise affects a specific circuit. Even in the scenario of a circuit exposed to depolarizing noise where an experimentalist is given perfect knowledge of depolarizing error rates, we show in Appendix A that the optimal qubit assignment with respect to F_0 will not necessarily maximize F. Errors in experimental characterization of device performance (e.g. due to drift [48]) will then only compound this limitation. Our experimental results in Sec. III B will demonstrate that F_0 is almost useless as a proxy for determining the relative performance of qubit assignments on actual hardware.
In our experiments we also consider performing qubit assignment with respect to an average Loschmidt survival rate F_LE^rand computed over a series of random circuits. Random circuits have been used to successfully evaluate the average fidelity of operations in theory and on quantum hardware [6,43,49-51], and therefore F_LE^rand will be sensitive to coherent errors that may not be captured by F_LE. Our results will show that optimizing a qubit assignment with respect to F_LE^rand will serve as a weak proxy for determining the optimal qubit assignment for a circuit U, unless U is itself a random circuit.

C. Simulated annealing for qubit selection
Simulated Annealing (SA) [15] is an optimization technique motivated by the tendency of physical systems to approach their ground state when cooled sufficiently slowly. We now describe the task of SA on finite sets [52] in the context of qubit assignment. Given a fixed logical circuit C_L to execute and the set S = {C_p : M(C_L) = C_p} of possible hardware circuits to evaluate, accompanied by a cost function C : S → R to be minimized, our goal is to find the optimal subset S* ⊆ S corresponding to the minimum value of C. For hardware experiments we make a number of assumptions about M that guarantee that S will be finite. Namely, we restrict the sequence of gates in C_p to be identical to the sequence of gates in C_L, and fix the logical connectivity to be the same as the physical device connectivity, so that M can be understood as a one-to-one map between logical and physical qubits (see Appendix C). For a physical circuit C_p implementing a corresponding noisy process E_U, we use the Loschmidt Echo performance metric of Sec. II B to define a cost function as

C(C_p) = 1 − F_LE(C_p),  (10)

where F_LE is implicitly related to the set of qubits and entangling edges output by M. By analogy with annealing in physical systems, we define a temperature T(t) at each iteration t of the algorithm that dictates how likely the annealer is to accept a worse state in the interest of exploration. Then, given a choice between a current physical circuit C_p and a new circuit C_p′ at each step, the algorithm will probabilistically select C_p′ depending on C(C_p′) and T: if C(C_p′) < C(C_p), then the algorithm always chooses C_p′. Otherwise, if C(C_p′) > C(C_p), SA will still allow for exploration of the state space with probability

exp(−[C(C_p′) − C(C_p)] / T(t)),

and will therefore be more likely to explore disadvantageous configurations in the early stages of optimization to avoid becoming trapped in a local minimum.
The implementation of SA for qubit assignment depends strongly on the set of new circuits that the algorithm can sample from at each step. Given a physical circuit C_p, at any given iteration we define the neighborhood of circuits S(C_p) ⊆ S as the set of all circuits C_p′ ≠ C_p that the algorithm may sample from. The definition of a neighborhood is important for understanding and tuning the convergence properties of finite SA. If the set of neighbors is too large, such as the extreme example S_rand(C_p) = S \ {C_p} (i.e., S_rand contains all circuits in S except C_p), then SA performs a random walk over S, and convergence to S* requires the maximum number of iterations over all possible choices of S(C_p) [53]. Conversely, if the neighborhood is too small, then SA may require a prohibitive runtime to reach S*. For each experiment we will implicitly or explicitly define S(C_p) as it relates to the logical circuit connectivity. In Appendix C 1 we provide Algorithm 1 with pseudocode to implement SA for qubit assignment.
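The annealing loop described above can be sketched compactly. In this minimal Python sketch the cost function and neighborhood are synthetic stand-ins for 1 − F_LE and S(C_p) (on hardware, each cost evaluation would trigger a Loschmidt Echo experiment); the schedule parameters mirror the values quoted for the simulated runs (T_0 = 0.10, α = 0.987, 150 steps):

```python
import math
import random

# Minimal simulated-annealing loop following the Metropolis acceptance
# rule: always accept improvements, accept worse moves with probability
# exp(-delta / T) so early iterations can escape local minima.

def anneal(initial, cost, neighbors, t0=0.10, alpha=0.987, steps=150, seed=0):
    rng = random.Random(seed)
    current, best = initial, initial
    for i in range(steps):
        temp = t0 * alpha ** i                     # exponential schedule T_i = T_0 a^i
        candidate = rng.choice(neighbors(current))
        delta = cost(candidate) - cost(current)
        if delta < 0 or rng.random() < math.exp(-delta / max(temp, 1e-12)):
            current = candidate
        if cost(current) < cost(best):             # track best-so-far assignment
            best = current
    return best

# Toy state space: integers 0..19 on a ring, optimum at x = 7.
cost = lambda x: abs(x - 7) / 10.0
neighbors = lambda x: [(x - 1) % 20, (x + 1) % 20]
best = anneal(0, cost, neighbors)
print(best, cost(best))
```

Because the best-visited state is tracked separately from the current state, the returned assignment is never worse than the initial one even if the walk ends in a poor configuration.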

D. Logical and physical connectivity
For hardware experiments on a device containing a grid of qubits with nearest-neighbor connectivity, we consider logical circuits with nearest-neighbor line connectivity, thereby restricting S and S(C_p). Each circuit C_L = (g, v_L, e_L) is defined over n logical qubits with edges between nearest neighbors:

v_L = [v_1, . . . , v_n], e_L = [{v_1, v_2}, . . . , {v_{n−1}, v_n}],

where v_k ∈ V_p for k = 1 . . . n and {v_ℓ, v_{ℓ+1}} ∈ E_p for ℓ = 1 . . . n − 1, subject to the restriction that {v_1, . . . , v_n} compose a length-n simple path on G_p. This restriction on C_L to satisfy the hardware connectivity graph (V_p, E_p) simplifies the implementation of M without sacrificing access to qubit assignments with higher fidelity: for any fixed v_L only two of the n! orderings of {v_1, . . . , v_n} will respect nearest-neighbor connectivity on hardware, while the remainder would require additional overhead to implement SWAP networks that reduce the fidelity of the program and are therefore ignored here. Given the correspondence between C_p and a simple path described by an array v ∈ V_p^n, we explicitly define the neighborhood around C_p as

S(C_p) = { C_p′ ≠ C_p : min(|v − v′|, |v − R(v′)|) ≤ k },

where R is the reversal operation sending [x_1, . . . , x_n] → [x_n, . . . , x_1], |A − B| denotes a set difference over elements of A and B, and k ∈ Z is a tunable parameter that enforces that two neighboring assignments differ by no more than k qubits and edges. By construction, every state in the state space of length-n simple paths over G_p may be reached by performing updates to C_p according to S(C_p).
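On small graphs the state space of line-connectivity assignments can be enumerated explicitly. A sketch using a hypothetical 2×3-grid device (not a real chip), with a depth-first search over simple paths:

```python
# Enumerate the length-n simple paths on a device graph by depth-first
# search; these paths form the state space S for line-connectivity
# qubit assignment. The 2x3 grid below is illustrative only.

def simple_paths(adjacency, n):
    paths = []

    def extend(path):
        if len(path) == n:
            paths.append(tuple(path))
            return
        for nxt in adjacency[path[-1]]:
            if nxt not in path:          # keep the path simple
                path.append(nxt)
                extend(path)
                path.pop()

    for start in adjacency:
        extend([start])
    return paths

# 2x3 grid, vertices 0..5:
#   0 - 1 - 2
#   |   |   |
#   3 - 4 - 5
adj = {0: [1, 3], 1: [0, 2, 4], 2: [1, 5],
       3: [0, 4], 4: [1, 3, 5], 5: [2, 4]}
paths = simple_paths(adj, 3)
print(len(paths))   # 20: each path and its reversal counted separately
```

Note that each undirected path appears twice (once per direction), matching the observation that only a path and its reversal realize the same line circuit.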
In simulations we also considered the more general case of unconstrained logical circuit connectivity, thereby extending the state space to include qubit routing or transpilation, the task of introducing SWAP networks into a circuit so that circuit entangling operations respect the hardware connectivity. An n-qubit circuit without connectivity constraints mapped onto N hardware qubits will result in a state space of size at least N!/(N − n)!, greatly complicating the optimization procedure compared to the case of constrained circuits. The outcome of transpilation varies significantly depending on the initial qubit assignment [54], and may involve choosing (deterministically or stochastically) between SWAP networks of equal gate count without any consideration for the device noise. This hinders the ability of an SA algorithm to iteratively explore the full state space of qubit assignments using an "out-of-the-box" transpiler (see Appendix C) and emphasizes the need for noise-aware transpilation. In Appendix C 1 we provide Algorithm 2, which implicitly defines the neighborhood S(C_p) around C_p in the case of unconstrained logical connectivity.

A. Simulations
To test our techniques in a controlled setting we constructed a connectivity graph G_p = (V, E) encoding information about the (simulated) fidelities of single- and two-qubit gates executed at each vertex and edge in G_p. We used the same noise model described in Proposition 1, consisting of a series of symmetric depolarizing channels D_d acting on d-dimensional density matrices according to

D_d(ρ; ε) = (1 − ε)ρ + ε Tr[ρ] I/d,

which can be simulated on any subsystem of a state ρ using appropriately defined local Kraus operators. After assigning the weight ε_i to each qubit i ∈ V and η_{ij} to each {i, j} ∈ E, we construct a noise model by inserting a depolarizing channel D_2(ε_i) after each single-qubit operation on i and D_4(η_{ij}) after each two-qubit operation acting on {i, j}. Circuit simulations were performed using the qiskit [29] and cirq [30] software libraries.
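For the single-qubit case (d = 2), the mixture form of the depolarizing channel is equivalent to a standard Kraus decomposition over Pauli operators, which is the form a local Kraus-operator simulator would use. A small numerical check of that equivalence:

```python
import numpy as np

# Verify that the single-qubit depolarizing channel
# D_2(rho; eps) = (1 - eps) rho + eps Tr[rho] I/2 admits the Kraus
# decomposition {sqrt(1 - 3 eps/4) I, sqrt(eps/4) X, sqrt(eps/4) Y,
# sqrt(eps/4) Z}.

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)

def mixture_form(rho, eps):
    return (1 - eps) * rho + eps * np.trace(rho) * I / 2

def kraus_form(rho, eps):
    kraus = [np.sqrt(1 - 3 * eps / 4) * I] + \
            [np.sqrt(eps / 4) * P for P in (X, Y, Z)]
    return sum(K @ rho @ K.conj().T for K in kraus)

rng = np.random.default_rng(0)
M = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = M @ M.conj().T
rho /= np.trace(rho)                     # random density matrix
print(np.allclose(mixture_form(rho, 0.1), kraus_form(rho, 0.1)))  # True
```

The equivalence follows from the identity XρX + YρY + ZρZ = 2 Tr[ρ] I − ρ; the Kraus form is what allows the channel to act locally on a subsystem of a larger state.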
We first tested the performance of Loschmidt Echoes in simulation by preparing noise graphs with a predetermined optimal qubit assignment. Each noise graph was uniquely determined by its connectivity and the arrays ε and η.
[Fig. 2 caption: a. A simulated 5-qubit GHZ circuit exposed to a 2-local depolarizing noise model shows a strong monotonic relationship between F_LE and F, compared to a weaker relationship between F_0 and F. Also shown is the extrapolation F ≈ (F_LE F_0^{−1} + F_0)/2, which closely matches F in the low-error limit F → 1, well outside of the single-error limit (compare to the dashed line y = x, indicating perfect prediction of F(C_p)). b. Computing the difference in the cost function for every pair of qubit assignments in G_p, as a function of the number of shared qubits and edges, demonstrates that the cost function is local in S. Note that even when all edges and qubits are shared (k = 0), any linear path and its reversal may differ in the cost function.]

We find good agreement with the single-error extrapolation F ≈ (F_LE F_0^{−1} + F_0)/2 derived in Appendix A to approximate F in the low-error regime, validating our analysis of the Loschmidt Echo.
We also considered the locality of the SA cost function, which determines how much the cost function can change at each iteration of annealing. Given a current qubit assignment (V, E), a neighborhood size of k for the line-connectivity state update rule enforces that any new assignment differs from the current one by no more than k qubits and edges. The ability of SA to navigate the space of qubit assignments non-randomly depends on the locality of the cost function defined in Eq. 10, which we choose to measure using the quantity

⟨|F_LE − F_LE′|⟩_{G_p},  (16)

the average absolute difference in F_LE between neighboring assignments on G_p. Eq. 16 describes the typical change in cost moving between assignments for a given definition of neighborhood S(C_p). Fig. 2b shows that ⟨|F_LE − F_LE′|⟩_{G_p} consistently decreases with k, indicating that the cost function C_LE is local in the space of qubit assignments.
We then tested the performance of SA for qubit assignment using Loschmidt Echoes. Fig. 3 demonstrates example performance of SA qubit assignment using F_LE as a cost function for a simulated noisy device where the logical and physical connectivity and gatesets were equivalent, C_L = C_p for all assignments. Our technique reliably finds high-performing and optimal assignments in noisy simulations. SA consistently outperforms a comparable baseline which keeps the highest-performing assignment from a set of n_s random assignments, where n_s is the number of unique assignments attempted by the simulated annealer. We also considered SA for all-to-all logical connectivity circuits which were then transpiled onto nearest-neighbor connectivity as a step in the qubit mapping M. We found SA to consistently outperform random qubit assignment, demonstrating that SA for qubit assignment is a viable tool for the assignment of unconstrained logical connectivity circuits (Appendix C 1). However, the optimality guarantees for SA using a transpiler are somewhat weaker due to the mismatch between the size of the set of logical circuits and the set of all possible transpiled circuits, emphasizing the need for the development of noise-aware transpilers that can be integrated into a search over qubit assignments.

B. Hardware Experiments
We performed experiments using the 23-qubit Rainbow and 52-qubit Weber devices accessed via Google Quantum Cloud Services. Both devices are based on 2D grid nearest-neighbor connectivity (see Fig. 8 in Appendix D). Since readout error significantly biases the estimation of F_LE, we implemented readout error correction for all experiments (see Appendix B).
[Fig. 3 caption: a. Sample simulated noise graph engineered to contain an optimal 5-qubit assignment, used to validate SA qubit assignment in simulation. Each vertex and edge is color coded by the weight corresponding to the depolarization probability for channels acting after each one- and two-qubit gate, respectively. b. Sample score history for a simulated annealing optimization run using the simulated noise model on the engineered noise map, with exponential temperature decay T_i = T_0 α^i (α = 0.987, T_0 = 0.10) and neighborhood size k_max = 2 for 150 steps. The score 1 − C_i and the number of unique states #(i) queried initially change quickly at high temperatures, before stabilizing as T approaches zero (S* is often found and maintained early). The dotted curve for T_i/T_0 is plotted on a linear scale from 0 (bottom of the y-axis) to 1 (top of the y-axis). c. The distribution p(n) describing the number of unique assignments sampled by 500 trials of SA is generally smaller than the total number of iterations. d. We compare SA to assignment based on random sampling: given the set of scores S, for 10,000 trials of n ∼ p(n) we sample a subset of scores K ⊂ S with |K| = n and compute Y = max(K). The resulting distribution p(Y) is consistently outperformed by SA. Also shown is P(F_LE), the distribution of scores over all qubit assignments.]

We first validated the use of the Loschmidt Echo metric F_LE introduced in Sec. II B as a tool for determining the relative performance of qubit assignments. To test the performance of F_LE as a proxy for maximizing the final state fidelity, we estimated both F_LE and the fidelity F for a subset of possible assignments on the Rainbow device. We executed three kinds of circuits. The first type
of circuit prepares a Greenberger-Horne-Zeilinger (GHZ) state |ψ GHZ = 1 √ 2 (|0 ⊗n + |1 ⊗n ) using n = 8 qubits. F was computed using O(n) experiments per qubit assignment using DFE [55] (see Appendix D 2). The second circuit was a SWAP network (SWAPnet): We first prepared an arbitrary single qubit state |ψ SW AP net = α|0 + β|1 and then moved the state to a different position on the hardware using 8 sequential pairs of √ iSWAP gates. The fidelity of the SWAPnet can be evaluated in constant time by performing a single-qubit projective measurement onto |ψ SW AP net ψ SW AP net |. The final circuit we considered was a Clifford conjugation circuit (or mirror circuit in Refs. [16,17]), which has the form where C ∈ Cl(2 n ) is a Clifford circuit over n qubits, P = σ p1 ⊗ · · · ⊗ σ pn is an n-local Pauli string with each p k ∈ {I, X, Y, Z}, and we used n = 8 for this experiment. While computing F for a Clifford circuit using DFE can incur significant overhead in the general case, here the effect of U simplifies to a Pauli operator acting on a computational basis state. Therefore the output of this circuit is uniquely described by a computational basis bitstring that can be efficiently simulated according to the Gottesman-Knill theorem [18] such that F can be determined in constant time.
We estimated each of the metrics F_LE, F_LE^rand, and F_0 on a subset of qubit assignments satisfying linear connectivity on the Rainbow device, for each choice of circuit described above. Fig. 4 demonstrates results for diagnostic runs for the GHZ state (corresponding figures for the other circuits can be found in Appendix D). Each fidelity F was computed via DFE using t = 1.5 × 10^4 repetitions for each choice of circuit. The corresponding values of F_LE for each qubit assignment were estimated using t repetitions each, and values of F_LE^rand were computed by averaging F_LE over different random circuits, with each estimate again using t repetitions. F_0(ε̂, η̂) was computed for each circuit assignment using gate counts n_1, n_2 for the corresponding circuit, taking ε̂ to be the average RB error per gate and η̂ to be the average √iSWAP XEB error per cycle [56,57] (all circuits were implemented using the hardware-native √iSWAP entangling gate). All F_0 values were calculated using RB and XEB calibration data estimated in parallel for separate qubits, and we have verified that calculating F_0 using non-parallel calibration data has no significant effect on our analysis.

[Figure caption: The diagnostic results can be interpreted by computing the conditional probability of the fidelity F exceeding the k-th percentile of fidelities given that an observed metric X ∈ {F_LE, F_LE^rand, F_0} falls above the k-th percentile X_k of its own distribution. Shading indicates standard deviation obtained via bootstrapping. For reference, the dashed line describes uniformly random sampling of F, which gives P(Y_k|X_k) = 1 − k/100.]
The goal of using F_LE is to determine by proxy the qubit assignment corresponding to the highest value of F, which will only be possible if there is a strong monotonic relationship between F_LE and F. We chose to compute Kendall's τ_b coefficient [58], which captures the concordance between two random variables X ∼ p_X and Y ∼ p_Y according to

τ_b = (n_c − n_d) / √((n_c + n_d + t_X)(n_c + n_d + t_Y)) ,

where n_c and n_d count the concordant and discordant pairs ((X_i, Y_i), (X_j, Y_j)) evaluated over all pairs (i, j) with i, j = 1 … N, and t_X and t_Y count pairs tied only in X or only in Y, respectively. This coefficient has a simple interpretation in terms of the joint behavior of changing X and Y: a value τ_b = k indicates that when the variable X increases (decreases), the event that Y also increases (decreases) is a factor of k more likely than the event that Y decreases (increases), while τ_b = 0 indicates no monotonic relationship between the variables. Table I shows τ_b with respect to the fidelity F for each diagnostic on each circuit implemented. In addition, we consider a quantity describing the conditional performance of a metric,

P(Y_k | X_k) = Pr(Y > Y_k | X > X_k) ,

which provides the probability of a signal Y exceeding the k-th percentile Y_k of p_Y when an input X drawn from p_X satisfies X > X_k, where X_k is the k-th percentile of p_X. Practically, this quantity predicts the likelihood of a qubit assignment having high fidelity relative to other assignments when the assignment is chosen using only knowledge of the distribution of a diagnostic X ∈ {F_LE, F_LE^rand, F_0}. Both τ_b and P(Y_k|X_k) are consistently the highest between F_LE and F compared to the other metrics considered, indicating that the Loschmidt Echo is the most reliable proxy when direct measurement of F is unavailable.
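Both quantities are straightforward to estimate from paired samples. The sketch below implements τ_b by exhaustive pair counting and P(Y_k|X_k) by empirical percentile thresholding; the synthetic arrays `f` and `f_le` are illustrative stand-ins for measured fidelities and their proxy, not experimental data.

```python
import numpy as np

def kendall_tau_b(x, y):
    """Kendall's tau-b via exhaustive pair counting (O(N^2); fine for small N)."""
    x, y = np.asarray(x), np.asarray(y)
    nc = nd = tx = ty = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = np.sign(x[i] - x[j]), np.sign(y[i] - y[j])
            if dx == 0 and dy == 0:
                continue                 # tied in both: excluded from all counts
            elif dx == 0:
                tx += 1                  # tied only in x
            elif dy == 0:
                ty += 1                  # tied only in y
            elif dx == dy:
                nc += 1                  # concordant pair
            else:
                nd += 1                  # discordant pair
    return (nc - nd) / np.sqrt((nc + nd + tx) * (nc + nd + ty))

def conditional_percentile_prob(x, y, k):
    """Empirical P(Y > Y_k | X > X_k) for the k-th percentiles of x and y."""
    x, y = np.asarray(x), np.asarray(y)
    mask = x > np.percentile(x, k)
    return np.mean(y[mask] > np.percentile(y, k))

rng = np.random.default_rng(1)
f = rng.uniform(0.5, 1.0, size=200)        # stand-in "fidelities" F
f_le = f + rng.normal(0, 0.02, size=200)   # noisy proxy correlated with F
print(kendall_tau_b(f_le, f), conditional_percentile_prob(f_le, f, 90))
```

A strongly correlated proxy yields τ_b near 1 and a conditional probability well above the 1 − k/100 random-sampling baseline.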
We now present the results of SA for qubit selection on a hardware device. For demonstration purposes we consider the case of offline qubit assignment, in which the value of F_LE was determined for every possible assignment and then cached in a lookup table to allow for many SA experiments to be repeated. For n ≳ 4, the number of qubit assignments grows super-exponentially and it becomes infeasible to construct a comprehensive lookup table, in which case SA would need to be performed online to optimize the qubit assignment.
To test the performance of Simulated Annealing for qubit assignment on hardware, we prepared a 4-qubit GHZ state restricted to logical line connectivity for each of the 1116 possible length-4 simple paths on the 53-qubit Weber grid layout. We evaluated both orderings of qubit assignments over each simple path, with the expectation that each ordering may produce a different fidelity.

[Table I caption: … [59] are provided in parentheses. Each experiment was run for t = 1.5 × 10^4 repetitions on each of m distinct, randomly drawn assignments consisting of n qubits composing a simple path sampled from the Rainbow device using networkx [60]. For n = 8 and n = 9 there are 2984 and 4972 such paths, respectively (including reversed paths), highlighting the need for optimization heuristics. In each case, F_LE exhibits a stronger monotonic relationship to F than the other metrics considered, though F_LE^rand is an effective method for assessing the performance of other random circuits.]
Roughly 27% of assignments were rejected for exceeding our maximum qubit visibility threshold of 0.15 (see Appendix B), with SA being performed over the remaining dataset of 808 simple paths. Fig. 5 shows the results of SA for qubit selection using F_LE. SA optimization runs are divided according to the number of unique states n_s for which F_LE would need to be computed on hardware during an online run. SA frequently found the qubit assignment corresponding to the optimal F*_LE using n_s as small as 30, fewer than 5% of the assignments available in the dataset. Since prior knowledge of F_0 computed from diagnostic data could not be relied on to find a high-scoring qubit assignment, we chose best-of-n_s random sampling as a competitive baseline optimization technique on the dataset. We found that SA consistently outperformed this scheme for all choices of n_s.
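An SA run of the kind described above can be sketched as follows, using the hyperparameters quoted earlier (T_0 = 0.10, α = 0.987, k_max = 2, 150 steps). The score table and the index-shift move set are toy assumptions standing in for cached F_LE values and local rewiring of simple paths on the device graph.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical cached lookup table: one F_LE score per candidate assignment.
# In an online run, reading scores[s] would instead trigger a hardware estimate.
n_states = 808
scores = rng.beta(8, 2, size=n_states)   # synthetic stand-in for F_LE values

def propose(state, kmax, rng):
    """Toy move set: jump up to kmax indices away on a ring of assignments.
    On a device this would be a local rewiring of the simple path."""
    step = int(rng.integers(1, kmax + 1)) * int(rng.choice([-1, 1]))
    return (state + step) % n_states

def anneal(scores, steps=150, t0=0.10, alpha=0.987, kmax=2, rng=rng):
    state = int(rng.integers(n_states))
    best, queried = state, {state}
    for i in range(steps):
        t = t0 * alpha ** i                    # exponential temperature decay
        cand = propose(state, kmax, rng)
        queried.add(cand)
        delta = scores[cand] - scores[state]   # maximizing F_LE
        if delta >= 0 or rng.random() < np.exp(delta / t):
            state = cand                       # Metropolis acceptance rule
        if scores[state] > scores[best]:
            best = state
    return best, len(queried)

best, n_s = anneal(scores)
print(scores[best], scores.max(), n_s)
```

The count `n_s = len(queried)` is the quantity plotted in Fig. 5: the experimental cost of an online run is set by the number of unique assignments whose F_LE must actually be measured.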
We also performed SA for qubit assignment on the 23-qubit Rainbow device for an n = 3 Quantum Fourier Transform (QFT) circuit. Each circuit was executed with a specific input basis state |j⟩, j ∈ [0, 2^n − 1], so that DFE could be performed using O(1) experiments. Due to the much smaller state space (only 148 possible length-3 simple paths) we found that random assignment was competitive with SA (see Appendix D 1). This highlights the importance of considering circuit size and device layout when approaching the qubit assignment problem.

[Fig. 5 caption: Columns are renormalized since the typical number of unique states queried, n_s, concentrates around 23 for this choice of SA hyperparameters. SA optimization frequently finds the best possible value of F_LE from the state space (contained in the maximum y-bin shown), but the n_s necessary to do so varies. Given the poor predictive capabilities of device diagnostic data for GHZ state fidelity, we compare this method to a random sampling scheme that consists of drawing n_s circuit assignments without replacement (10,000 trials per n_s) and keeping the assignment corresponding to the maximum F_LE observed. SA consistently outperforms random sampling for the n = 4 GHZ dataset. Averaged over all choices of n_s, the final F_LE achieved by SA outperformed the random sampling scheme by 2.8%.]

IV. CONCLUSIONS
Even with the availability of detailed error rates for a quantum device, the task of assigning a logical circuit to a subset of hardware qubits that maximizes the fidelity of the prepared state can be extremely difficult. In this work we have demonstrated instances of circuits for which the Loschmidt Echo provides an effective stand-in for the fidelity function in qubit assignment, and have provided theoretical justification for this relationship in the small-error limit. Conversely, we have shown that even excellent knowledge of device error rates (e.g. computed using Randomized Benchmarking) can be insufficient to predict the performance of a circuit mapped onto hardware. Furthermore, we provided analytical arguments that the advantage of F_LE over F_0 for determining relative performance would persist even if F_0 could be estimated to higher precision using scalable techniques (e.g. Refs. [61,62]). We have also shown that Simulated Annealing can be used to find performant qubit assignments using significantly fewer resources than a complete exploration of possible qubit assignments would require.
Our theoretical arguments rely on a low-error limit, restricting this analysis to circuits with either few qubits or low gate depth. However, our experimental results indicate that a strong connection between the Loschmidt Echo and the state fidelity may still hold in a regime of higher error rates. Future work may determine whether such a relationship exists for a broader class of noise models or outside of the perturbative regime.
In this work we have evaluated the use of a Loschmidt Echo diagnostic on a specific selection of circuits employing only readout error mitigation, and so the optimal qubit assignments determined in this work may not remain optimal when employing platform-specific error mitigation. In particular, control errors that may go undetected by F_LE (see Appendix A 2 b) can be mitigated using Floquet calibration [63], thereby strengthening the relationship between F_LE and F. Therefore such device-specific calibration procedures should be combined with qubit assignment based on F_LE to maximize the final assignment performance. Other error mitigation strategies may be found, for example, in Refs. [64,65].
However, as the number of qubit assignments grows exponentially with respect to the number of available qubits on the device and the number of qubits in the circuit, it becomes essential to minimize the experimental overhead associated with device calibration and computation of F LE for each assignment. Future work will be necessary to determine the appropriate balance of implementing error mitigation so that F LE remains faithful to F while minimizing overhead such that a significant number of qubit assignments can be probed during the optimization procedure.
While we have focused on maximizing the fidelity of a state prepared by a specific unitary U, our method can be readily adapted to analyze more general situations. For instance, a Hilbert-Schmidt test [66], which computes |Tr(U†V)|² for unitaries U, V, could be used to compute Tr(E_U† ∘ E_U(|0⟩⟨0|)) as a proxy for the process fidelity [67,68],

F(U, E_U) = ∫ F(UψU†, E_U(ψ)) dψ .

However, such a modification will introduce significant two-qubit gate overhead, for which our weak-error analysis of the Loschmidt Echo technique is less applicable.
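For concreteness, the normalized Hilbert-Schmidt overlap |Tr(U†V)|²/d² between two unitaries can be computed directly; the diagonal "noisy" V below is an arbitrary illustrative choice, not an actual device error.

```python
import numpy as np

def hs_overlap(U, V):
    """Normalized Hilbert-Schmidt overlap |Tr(U^dag V)|^2 / d^2 of unitaries."""
    d = U.shape[0]
    return np.abs(np.trace(U.conj().T @ V)) ** 2 / d ** 2

d = 4
U = np.eye(d)
theta = 0.1
V = np.diag(np.exp(1j * theta * np.arange(d)))   # slightly rotated "noisy" U
print(hs_overlap(U, U), hs_overlap(U, V))        # 1.0 for identical unitaries
```

The overlap equals 1 exactly when V = U (up to a global phase) and decays smoothly as V rotates away from U, which is what makes it usable as a process-level analogue of F_LE.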

A. Data and Code availability
Code and data to reproduce the analyses in this manuscript are available online [69].

V. ACKNOWLEDGEMENTS
We thank the Google Quantum AI team for access to their quantum computing hardware and for all of the support they provided for experiments. We thank Alan Ho, Ryan LaRose, Xiao Mi, and Matthew Harrigan for thoughtful conversations about our experiments.
We thank Khalil Guy and Andy Sun for contributing to an earlier version of this project. This work was partially supported by the DOE/HEP QuantISED program grant "HEP Machine Learning and Optimization Go Quantum," identification number 0000240323 and the DOE/HEP QuantISED program grant "QCCFP-QMLQCF Consortium," identification number DE-SC0019219. Access to the Google QPU was supported under the Fermilab LDRD project FNAL-LDRD-2018-025 "Towards a Quantum Computing Science Center at Fermilab." EP is partially supported through Achim Kempf's Google Faculty Award. This manuscript has been authored by Fermi Research Alliance, LLC under Contract No. DE-AC02-07CH11359 with the U.S. Department of Energy, Office of Science, Office of High Energy Physics.
Appendix A: Optimal qubit assignments with respect to Loschmidt Echo and state fidelity

1. Local depolarizing noise

A significant claim of this work is that under some assumptions on error rates and circuit sizes, performing qubit assignment with respect to the Loschmidt Echo metric F_LE will provide reasonable outcomes compared to performing qubit assignment with respect to the fidelity F of a state prepared by applying a noisy implementation of U to an initial state |0⟩. As described in the main text, we use the definitions

F(C_p) = ⟨ψ| E_U(|0⟩⟨0|) |ψ⟩ ,    F_LE(C_p) = ⟨0| E_U† ∘ E_U(|0⟩⟨0|) |0⟩ ,

where |ψ⟩ = U|0⟩ and E_V is a noisy implementation of the unitary V which implicitly depends on the parameters of the circuit C_p. We wish to compare the optimal qubit set S*_LE that maximizes F_LE with respect to a hardware connectivity graph (weighted by error rates) to the qubit set S* that maximizes F:

S*_LE = argmax_{C_p ∈ S} F_LE(C_p) ,    S* = argmax_{C_p ∈ S} F(C_p) ,

where S describes the complete state space of hardware circuits that realize the operation U in the absence of noise, S = {C_p : U(C_p) = U}. For notational convenience we will typically write F = F(C_p), F_LE = F_LE(C_p). Each metric will be evaluated with respect to a unitary U = ∏_{i=1}^m U_i acting on an input state prepared to |0⟩. We assume without loss of generality that each U_i is at most two-local. We will consider a discrete noise model that can be described by insertion of CPTP maps within an existing circuit, with each map parameterized by error rates that depend only on the choice of qubit or qubit pair involved in the execution of a gate. We emphasize that an assumption about the device noise is necessary for any comparison between F_LE and F, since there exists a family of simple noise models for which these two quantities are completely independent, as demonstrated in Appendix A 2 b.
We therefore choose a noise model that inserts error channels after each single- and two-qubit gate occurring in a circuit. Fix a device connectivity graph G_p = (V_p, E_p) and a circuit C_p acting on n qubits chosen from N = |V_p| qubits, with entangling operations subject to the edges provided in E_p. The unitary U implemented by C_p can be described by a sequence of single-qubit gates acting on individual nodes p and two-qubit gates acting on edges {p, q} ∈ E_p. The circuit contains a total of n_p gates applied to each node p (where n_p = 0 if C_p does not apply any single-qubit gates to p) and n_pq gates applied to each pair of qubits {p, q}.
In the noisy implementation E_U, each single-qubit gate acting on qubit p ∈ V_p is followed by some fixed CPTP single-qubit map N_p also acting on p, and each two-qubit gate on {p, q} is followed by a two-qubit channel M_pq acting on {p, q}. For convenience in the derivation, we define a single-qubit error rate ε_p such that N_p decomposes as

N_p(ρ) = (1 − ε_p) ρ + ε_p N_p^sub(ρ) ,

so that ε_p is the probability that N_p applies a nontrivial error operation and N_p^sub is the (renormalized) mixture over the nontrivial Kraus operators of N_p. We define η_pq similarly to be the probability that M_pq acts with a nontrivial error operation on edge {p, q}. In this sense, ε_p and η_pq describe weights in the hardware connectivity graph G_p. Fig. 6 illustrates an example construction of this noise model. Assuming that the operation U_k is the i-th operation applied to qubit p, we can then define the state resulting from a nontrivial error at the i-th channel N_p acting on qubit p during execution of U:

ρ^(p,i) := U_m ∘ ⋯ ∘ U_{k+1} ∘ N_p^sub ∘ U_k ∘ ⋯ ∘ U_1 (|0⟩⟨0|) ,   (A9)

where we have used the shorthand U(ρ) = UρU†. The state ρ^(pq,i) can be defined analogously as the result of inserting M_pq^sub after the i-th operation applied to edge {p, q}, yielding the mixture over the nontrivial Kraus operators of the i-th channel M_pq, and |ψ⟩ = U|0⟩. This permits a compact analytical expression for the effect of E_U and E_U† for this form of noise model. To first order in the number of errors, the state prepared by E_U(|0⟩⟨0|) is given by

E_U(|0⟩⟨0|) ≈ F_0 [ |ψ⟩⟨ψ| + Σ_{p∈V_p} Σ_{i=1}^{n_p} (ε_p/(1 − ε_p)) ρ^(p,i) + Σ_{{p,q}∈E_p} Σ_{i=1}^{n_pq} (η_pq/(1 − η_pq)) ρ^(pq,i) ] ,   (A10)

where we have omitted terms corresponding to two or more errors and substituted the expression F_0 = ∏_p (1 − ε_p)^{n_p} ∏_{pq} (1 − η_pq)^{n_pq} originally presented in Eq. 9 of the main text. The appearance of F_0 here motivates its interpretation as a zero-error contribution to the fidelity F. The remainder of this derivation will consider contributions only up to the single-error limit (which will generally still involve high-order terms in ε and η).
The fidelity of U can be computed directly by projecting Eq. A10 onto |ψ⟩:

F = ⟨ψ| E_U(|0⟩⟨0|) |ψ⟩ = F_0 [ 1 + Σ_p Σ_i (ε_p/(1 − ε_p)) ⟨ψ|ρ^(p,i)|ψ⟩ + Σ_pq Σ_i (η_pq/(1 − η_pq)) ⟨ψ|ρ^(pq,i)|ψ⟩ ] .

Defining E_U†^(p,i) as the quantity that describes drawing a nontrivial error for the i-th instance of N_p acting on the reverse circuit, and defining E_U†^(pq,i) analogously for M_pq, we can similarly compute F_LE by applying E_U† to Eq. A10, arriving at the analogous single-error expansion for F_LE (Eq. A15).

[FIG. 6 caption: The two-local noise model here appends channels to each gate depending on which vertex or edge the preceding unitary was applied to. Rounded boxes indicate completely positive trace-preserving operators that are not unitary in general (so that the diagram on the right represents maps acting on n-qubit density matrices). If we consider N to be a channel that applies a non-identity operator with probability ε, then the state |ψ^(i,j)⟩ describes the output of the noisy operator in the case where a non-identity operator was drawn from the Kraus representation of N. While a more general model could consider N_p and M_pq to be dependent on the gate prior to their insertion point, we do not consider this level of detail in this work.]
While there is evidence that F_LE contains some of the information about the errors involved in the computation of F, this relationship is still too general to draw any specific comparisons. So to prove Proposition 1 relating F_LE and F, we instantiate a noise model for which the local channels N_p and M_pq are symmetric depolarizing channels (Eq. 15) over one and two qubits, respectively:

N_p(ρ) = (1 − ε_p) ρ + ε_p (I/2) ⊗ tr_p(ρ) ,    M_pq(ρ) = (1 − η_pq) ρ + η_pq (I/4) ⊗ tr_pq(ρ) .

This simple model involving local depolarization is sometimes used in the analysis of random circuits [7,49] and will be sufficient to demonstrate interesting behavior in the more general problem of qubit assignment. Each depolarizing operation may be commuted with the preceding unitary gate without affecting the behavior of the global state. Specifically, writing D_d for the d-dimensional symmetric depolarizing channel, for each single-qubit gate U_k and channel we have

D_2 ∘ U_k = U_k ∘ D_2 ,

and a similar relation holds for each two-qubit gate acted on by D_4. From the freedom in the ordering of each error and each gate occurring in the noisy adjoint circuit E_U†, it follows that all depolarizing channels may be collected together (Eq. A20). Rescaling each ε → (3/4)ε and η → (15/16)η to account for a trivial Pauli string in the Kraus representation of D_d, and substituting Eq. A20 into Eq. A15, yields a relationship between F_LE and F in the single-error limit assuming local depolarizing noise. The final step to arrive at Proposition 1 is to take the small-error-rate limit. We recall that F_LE, F, and F_0 are implicitly dependent on a hardware circuit C_p = (g, v_L, e_L) containing complete information about which qubits and pairs of qubits are acted on by any gate in the gate sequence g. We let n be the length-|V_p| vector of gate counts applied to each logical qubit p ∈ V_L, with (n)_p = n_p = |{k : 1 ≤ k ≤ m, (v_L)_k = p}|, and m be the length-|E_p| vector of gate counts applied to each available edge {p, q} ∈ E_p, with (m)_pq = n_pq = |{k : 1 ≤ k ≤ m, (e_L)_k = {p, q}}|.
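The commutation of a symmetric depolarizing channel with a unitary gate is easy to verify numerically; the sketch below checks D ∘ U = U ∘ D for a random single-qubit gate and an arbitrary test state.

```python
import numpy as np

rng = np.random.default_rng(3)

def depolarize(rho, p):
    """Single-qubit symmetric depolarizing channel D(rho) = (1-p) rho + p I/2."""
    return (1 - p) * rho + p * np.trace(rho) * np.eye(2) / 2

def random_unitary(rng):
    # Haar-random 2x2 unitary via QR decomposition.
    z = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

U = random_unitary(rng)
rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])   # arbitrary test state
p = 0.13

lhs = depolarize(U @ rho @ U.conj().T, p)   # D applied after the gate
rhs = U @ depolarize(rho, p) @ U.conj().T   # D applied before the gate
assert np.allclose(lhs, rhs)                # depolarization commutes with U
```

This is the property that lets every depolarizing error in the echo circuit be collected at a single point, independent of the gate ordering.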
We similarly define (ε)_p = ε_p and (η)_pq = η_pq to contain the corresponding single- and two-qubit error rates. For some fixed ordering of ε and η, the circuit assignment problem (without introducing SWAP overhead) can then be understood as the procedure of ordering the elements of n and m under the constraint that the resulting circuit corresponds to the unitary U and respects the hardware device connectivity.
We now study this optimization by taking the limit (n_p ε_p) ≪ 1 and (n_pq η_pq) ≪ 1 for all p ∈ V_p, {p, q} ∈ E_p, and arrive at the perturbative approximation

F_0 ≈ 1 − n · ε − m · η .

To repeat this procedure for F and F_LE, we define

(f)_p = Σ_{i=1}^{n_p} ⟨ψ| ρ^(p,i) |ψ⟩ ,    (g)_pq = Σ_{i=1}^{n_pq} ⟨ψ| ρ^(pq,i) |ψ⟩ .

The arrays f and g are explicitly dependent on n and m respectively, and are therefore also dependent on the choice of qubit assignment. Noting from Eq. A8 that |(f)_p| ≤ n_p, and similarly |(g)_pq| ≤ n_pq, terms of the form (Σ_p ε_p (f)_p)² vanish at least as quickly as (Σ_p n_p ε_p)², which permits the perturbative expansions

F ≈ 1 − n · ε − m · η + f · ε + g · η ,
F_LE ≈ 1 − 2(n · ε + m · η) + 2(f · ε + g · η) .

Then under the conditions we imposed, it is clear that under any choice of qubit assignment constraints we have

argmax_{C_p} F = argmax_{C_p} F_LE ,

and therefore, under any specific constraints such that each choice of n, m respects hardware connectivity and realizes the unitary U, F and F_LE share a maximum. This demonstrates a specific case of Proposition 1 for the local depolarizing noise model.
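The quality of this linearization is easy to check numerically. The sketch below compares the exact product form of F_0 against the first-order approximation 1 − n · ε − m · η for randomly drawn gate counts and error rates; all numbers are illustrative.

```python
import numpy as np

def f0_exact(n, eps, m, eta):
    """Zero-error fidelity F_0 = prod_p (1-eps_p)^n_p * prod_pq (1-eta_pq)^n_pq."""
    return np.prod((1 - eps) ** n) * np.prod((1 - eta) ** m)

def f0_perturbative(n, eps, m, eta):
    """First-order approximation F_0 ~ 1 - n.eps - m.eta."""
    return 1 - n @ eps - m @ eta

rng = np.random.default_rng(4)
n = rng.integers(1, 6, size=5)        # single-qubit gate counts per qubit (toy)
m = rng.integers(1, 4, size=4)        # two-qubit gate counts per edge (toy)
eps = rng.uniform(0, 2e-3, size=5)    # per-qubit error rates
eta = rng.uniform(0, 8e-3, size=4)    # per-edge error rates

print(f0_exact(n, eps, m, eta), f0_perturbative(n, eps, m, eta))
```

In the weak-error regime (n_p ε_p ≪ 1) the two values agree to second order in the total error budget, which is the regime assumed throughout this appendix.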
A number of remarks are necessary to put this result into context. First, the result of Eq. A30 would not necessarily hold without the assumption of local depolarizing noise, re-emphasizing that this result holds only for a restricted family of noise models. Notably, this kind of noise model can be enforced on a given hardware circuit using randomized compiling techniques [70], by which the insertion of some redundant single-qubit operations has the effect of converting systematic errors into stochastic depolarizing errors. Furthermore, this result was attained under a restrictive choice of perturbative limit in which the typical number of errors (n · ε) is very small (though a similar assumption is typically necessary for the fault tolerance required by most forms of quantum error correction). We also note that terms of the form f · ε may contribute significantly less to the magnitude of F and F_LE than n · ε, depending on how the intermediate states in the implementation of U are affected by the various Pauli operators imposed by D.
However, even under these restrictions, an optimization based on F_0 will not necessarily arrive at S*. Even in the case where an experimentalist has perfect knowledge of the depolarizing error rates in a device (computed using RB, for example), more detailed knowledge about the effects of the errors (contained in the terms f and g) is necessary to probe for the optimal qubit assignment. This shortcoming can be demonstrated with a simple example where we suppose G_N has exactly n − 1 noiseless qubits to choose from (with the remainder being acted on by maximally depolarizing channels) and U is the circuit preparing a state of the form

|ψ⟩ = (1/√2)(|0⟩^⊗(n−1) + |1⟩^⊗(n−1)) ⊗ |0⟩ .

In this case, F_0 is flat over the space of qubit assignments that contain the n − 1 noiseless qubits, but F is maximized only when the n-th (unentangled) logical qubit is assigned to a noisy device qubit, since depolarizing any of the qubits in the entangled subsystem of the state will completely eliminate the coherence of the entangled state. The truncated expansions for F, F_LE, and F_0 can be validated via numerical simulation. Comparing Eq. A22 to Eq. A15 gives a single-error approximation for F as

F ≈ (F_LE + F_0²) / (2 F_0) ,   (A32)

which allows one to approximate F having knowledge of only F_LE and F_0 given this specific noise model. This relation suggests that it may be possible to estimate F via extrapolation techniques if one can reliably compute F_0, for instance using randomized circuits constructed using the same number of gates acting on each qubit and edge as U. As this extrapolation assumes an underlying noise model consisting of depolarizing noise, an accurate characterization of performance using the relationship of Eq. A32 might require techniques such as randomized compiling [70].

2. Alternative noise models
We now discuss the optimization behavior of F, F_0, and F_LE in the context of other noise models. Appendix A 2 a describes a simple noise model for which all three metrics share an optimal qubit assignment, demonstrating that under relaxed conditions knowledge of F_0 is sufficient to optimize a qubit assignment. Appendix A 2 b describes a noise model for which F and F_LE are entirely uncorrelated. Appendix A 3 extends the results of Appendix A 1 to a family of general Markovian noise models in the weak-error limit.

a. Global depolarizing noise
In contrast to Appendix A 1, here we study a noise model for which the optimal qubit assignment according to F_0, F_LE, and F is identical. We consider global depolarizing noise acting on the system with probability proportional to the error rate of the most recently executed gate. While such a model is highly unrealistic for most choices of U, it is well motivated in the case where U is a random circuit drawn uniformly with respect to the Haar measure on U(2^n). Without loss of generality we may decompose U as U = ∏_{i=1}^m U_i where, for simplicity, each U_i ∈ G_p is a member of the hardware gate set. Each gate U_i is understood to act on the vertices and edges (v_i, e_i). Then for each U_i we apply a depolarizing channel D_i (Eq. 15 with d = 2^n) acting over all n qubits, so that the noisy implementation of U acting on |0⟩⟨0| is given by

E_U(|0⟩⟨0|) = D_m ∘ U_m ∘ ⋯ ∘ D_1 ∘ U_1 (|0⟩⟨0|) ,

where the map U_i : ρ → U_i ρ U_i† is adopted for convenience. Each channel D_i has a depolarizing probability given by the weight w_V or w_E of the vertex or edge acted on by U_i, according to a fixed noise graph G_N. We can now directly compute F and F_LE. Starting from an initial state |0⟩⟨0|, the result of applying the noisy process E_U can be computed recursively as

E_U(|0⟩⟨0|) = F_0 |ψ⟩⟨ψ| + (1 − F_0) I/2^n ,   (A36)

where we have substituted F_0 defined in Eq. 9 with η_pq = 0:

F_0 = ∏_k (1 − ε_k)^{n_k} ,

where n_k counts the multiplicity of each qubit or edge occurring in the instruction set (so that Σ_k n_k = m) and |V| = n. This is a slight generalization of the result derived in [71]. Assuming that each operation U_i and U_i† is depolarized by the same amount, we apply E_U† and then measure the probability of recovering the all-zeros bitstring to be

F_LE = F_0² + (1 − F_0²)/2^n ,

whereas the fidelity of the state prepared by E_U, using Eq. A36, is

F = F_0 + (1 − F_0)/2^n .

Then, defining S* = argmax_{S⊆V_p} F_0 and using the monotonicity of x² for x > 0, we immediately find

S* = argmax F = argmax F_LE ,

which is the desired result.
Furthermore, the fact that S * was defined as the qubit assignment maximizing F 0 means that for this noise model, the optimal qubit assignment can be found by maximizing any one of F , F LE , or F 0 . This result is expected since the noise model does not incorporate any information about the structure of U (only the number of gates acting on each qubit at some point in time), and so the assumption of a digital error model holds and F 0 is an effective proxy for the circuit fidelity. This result further highlights that in the case of sufficiently random circuits F 0 may be an effective optimizer over qubit assignments.
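This monotonic relationship can be checked numerically: under the global depolarizing model, F and F_LE are both increasing functions of F_0, so all three metrics rank candidate assignments identically. The per-gate depolarizing probabilities below are illustrative.

```python
import numpy as np

def metrics(probs, n_qubits):
    """Global depolarizing model: F_0 = prod_i (1 - p_i), with
    F = F_0 + (1 - F_0)/2^n and F_LE = F_0^2 + (1 - F_0^2)/2^n."""
    f0 = np.prod(1 - probs)
    dim = 2 ** n_qubits
    return f0, f0 + (1 - f0) / dim, f0 ** 2 + (1 - f0 ** 2) / dim

rng = np.random.default_rng(5)
n_qubits, n_assignments, n_gates = 4, 50, 30
# Each candidate assignment induces its own list of per-gate depolarizing
# probabilities, drawn here at random as a stand-in for a weighted noise graph.
probs = rng.uniform(0, 0.01, size=(n_assignments, n_gates))

f0s, fs, f_les = np.array([metrics(p, n_qubits) for p in probs]).T
assert np.argmax(f0s) == np.argmax(fs) == np.argmax(f_les)
assert np.all(np.argsort(f0s) == np.argsort(f_les))   # identical rankings
print(f0s.max(), fs.max(), f_les.max())
```

Because the map F_0 ↦ F and F_0 ↦ F_LE are strictly increasing, not just the argmax but the entire ranking of assignments coincides for this model.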
In the above analyses we have defined F_0 in terms of depolarizing channel parameters ε_k and η_jk, even though it is not immediately clear that this is the correct parameter to optimize over. In contrast, a depolarizing model can be viewed in terms of a Pauli error e_p by defining the Kraus operators as [43]

E_ℓ = √(e_p/(4^n − 1)) P_ℓ ,  ℓ ≠ 0 ,   (A41)

where P_ℓ is an n-qubit Pauli string indexed by ℓ ∈ {0, 1, 2, 3}^n. Comparing the action of this channel to a global depolarizing channel with probability p (Eq. 15 for d = 2^n) gives the relationship

e_p = (1 − 4^{−n}) p .

Therefore computation of F_0 from experimentally measured diagnostic data is subject to ambiguities arising from the interpretation of error rates. The quantity F_0(ε̂, η̂) is equally valid when its arguments represent individual gate infidelities (which are combined in a multiplicative fashion into a total circuit fidelity by F_0) or when its arguments represent probabilities for depolarization after each gate. The parameters in these two error models are related by a constant scalar, but it is straightforward to show that the ranking of qubit assignments with respect to F_0(cε̂, cη̂) will not necessarily be preserved with respect to a rescaling factor c applied to the error rates. Define the function

f(x, c) = ∏_{i=1}^d (1 − c x_i)^{n_i}

for any x ∈ R_+^d, n ∈ Z_+^d and 0 ≤ c ≤ 1 (the upper bound on c may be loosened as long as c x_i ≤ 1 ∀ i). Then letting the vectors ε and ε′ be the per-qubit error rates for assignments S and S′ respectively, it is easy to show that there exist choices ε, ε′, c, and c′ such that f(ε, c) > f(ε′, c) but f(ε, c′) < f(ε′, c′). Then letting c = 1, c′ = (1 − 4^{−n}), one can generally find a distribution of errors such that the optimal assignment differs depending on whether F_0 is computed with respect to the Pauli error rate e_p or the depolarization parameter p.
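A concrete counterexample is easy to construct: with gate counts n = (1, 1), errors concentrated on one qubit lose to evenly spread errors at c = 1 but win after rescaling by c = 0.1. The numbers below are chosen purely for illustration.

```python
import numpy as np

def f0(x, n, c):
    """F_0 with error rates rescaled by c: prod_i (1 - c * x_i)^n_i."""
    return np.prod((1 - c * np.asarray(x)) ** np.asarray(n))

# Two hypothetical assignments: errors concentrated on one qubit vs. spread.
n = [1, 1]
eps_concentrated = [0.9, 0.0]
eps_spread = [0.5, 0.5]

# At c = 1 the spread assignment wins; at c = 0.1 the ranking flips.
assert f0(eps_concentrated, n, 1.0) < f0(eps_spread, n, 1.0)   # 0.10 < 0.25
assert f0(eps_concentrated, n, 0.1) > f0(eps_spread, n, 0.1)   # 0.91 > 0.9025
```

The flip occurs because the product form of F_0 is nonlinear in the error rates, so a uniform rescaling reweights concentrated against distributed error budgets.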
This further complicates the problem of optimal assignments using F 0 computed from calibration data, as there is no a priori choice of scaling that best suits the assignment problem with respect to a given unitary U .

b. Gate-dependent unitary noise
In contrast to Appendix A 1, it is straightforward to construct a noise model for which S* ≠ S*_LE. Consider the case where E_U is strictly unitary, such that E_U(ρ) = Ũ ρ Ũ† with Ũ = ∏_i W_i U_i for unitary error operators {W_i}. If the (unitary) noise affecting the reverse circuit is its own adjoint, i.e. E_U†(ρ) = Ũ† ρ Ũ, then F_LE = 1 for any choice of {W_i} while F(U) can take on arbitrary values in [0, 1]. Such a model might be relevant to some forms of unitary control error and highlights the need for some contextual understanding of the noise processes involved in a specific hardware platform when using the technique introduced in this work. For instance, in [72] it was observed that similar unitary error occurred for both (iSWAP)^{1/2} and (iSWAP)^{−1/2}, which would result in accumulated errors reflected in F being possibly cancelled out during the computation of F_LE.
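This failure mode can be reproduced in a few lines: if the noisy inverse is the exact adjoint of the noisy forward propagator, the echo is perfect regardless of how badly the coherent error degrades the prepared state. The error unitary W below is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(6)
dim = 8   # n = 3 qubits

def random_unitary(rng, d):
    # Haar-random unitary via QR decomposition.
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

U = random_unitary(rng, dim)
W = random_unitary(rng, dim)     # coherent control error (illustrative)
U_noisy = W @ U                  # forward propagator applies W U
U_noisy_dag = U_noisy.conj().T   # reverse noise is the exact adjoint

psi0 = np.zeros(dim, dtype=complex)
psi0[0] = 1.0
psi_target = U @ psi0

f_le = np.abs((U_noisy_dag @ (U_noisy @ psi0))[0]) ** 2   # echo survival
f = np.abs(psi_target.conj() @ (U_noisy @ psi0)) ** 2     # state fidelity
print(f_le, f)   # F_LE = 1 exactly, while F can be far below 1
```

Here F_LE is blind to W by construction, so ranking assignments by F_LE alone would never penalize this error.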

3. Perturbative limit with Markovian noise
We consider a more general case in which the infidelity can be modeled as the result of weak Markovian noise and time-independent control errors. The time evolution of the qubit density matrix ρ is governed by a Lindblad master equation [73],

dρ/dt = −i[H, ρ] + Σ_j γ_j ( A_j ρ A_j† − (1/2){A_j† A_j, ρ} ) ,

where H is the qubit Hamiltonian representing the qubit gates and control errors, and the Markovian noise channels are represented by the jump operators A_j and the decay rates γ_j. To make the discussion more convenient, we introduce superoperators, which map operators to operators. As with operators, addition, multiplication, Hermitian transpose, and inverse are well-defined for superoperators. In particular, we can define Hρ = −i[H, ρ] and Dρ = Σ_j γ_j (A_j ρ A_j† − (1/2){A_j† A_j, ρ}) to express the master equation as

dρ/dt = (H + D) ρ .   (A49)

A quantum program can be modeled as a series of m gates sequentially applied to the QPU. Each gate is represented by a noiseless propagator U_j = e^{−i H_{j;0} Δt_j} generated by the gate Hamiltonian H_{j;0}, where j = 1, 2, ⋯, m and Δt_j is the gate duration. In the presence of noise, the propagator has to be modified according to the master equation (A49), which has the formal solution ρ(t) = e^{(H + D)t} ρ(0). The noisy propagator E_U of a quantum program U can hence be written as

E_U = E_{U_m} ∘ ⋯ ∘ E_{U_1} ,

where E_{U_j} is the noisy propagator of the j-th qubit gate. The Hamiltonian H_j = H_{j;0} + H_{e;U_j} consists of the gate Hamiltonian H_{j;0} and the static control-error Hamiltonian H_{e;U_j} associated with the gate U_j. In principle, the control error could be different for the gate U_j and its conjugate U_j†, and we have

E_{U_j} = e^{(H_{j;0} + H_{e;U_j} + D_j) Δt_j} ,    E_{U_j†} = e^{(−H_{j;0} + H_{e;U_j†} + D_j) Δt_j} .

We put a negative sign in front of H_{j;0} for E_{U_j†} since the conjugate gate U_j† can be generated by −H_{j;0}. In the above equations, we also assume that the Markovian noise channels are the same for U_j and U_j†.
To simplify the following discussion, we define the symmetric and antisymmetric parts of the control error, H_{j;e±} = (H_{e;U_j} ± H_{e;U_j†})/2. The fidelity F and the Loschmidt Echo metric F_LE, as in the previous discussions, are given by

F = Tr(ρ_T ρ_1) ,    F_LE = Tr(ρ_0 ρ_2) .

Here, ρ_T = |ψ⟩⟨ψ| is the target pure state prepared by the noiseless U, ρ_0 = |0⟩⟨0| is the initial pure state, ρ_1 is the state prepared by the noisy propagator E_U, and ρ_2 is the state prepared by the noisy propagator E_U followed by the noisy inverse propagator E_U†. We can also express these states in terms of the superoperators,

ρ_1 = ∏_{j=m}^{1} e^{(H_{j;0} + H_{j;e+} + H_{j;e−} + D_j) Δt_j} ρ_0 ,   (A57)
ρ_2 = ∏_{j=1}^{m} e^{(−H_{j;0} + H_{j;e+} − H_{j;e−} + D_j) Δt_j} ρ_1 ,   (A58)

where in each product the factor that acts first on the state stands rightmost. We assume that the gate durations Δt_j are small enough to separate the noiseless evolution from the error terms, such that

ρ_1 ≈ ∏_{j=m}^{1} e^{(H_{j;e+} + H_{j;e−} + D_j) Δt_j} e^{H_{j;0} Δt_j} ρ_0 ,
ρ_2 ≈ ∏_{j=1}^{m} e^{−H_{j;0} Δt_j} e^{(H_{j;e+} − H_{j;e−} + D_j) Δt_j} ρ_1 .

This decomposition is valid as long as the gate duration is much smaller than the characteristic time scale of the errors. We then introduce λ as an order-counting parameter such that H_{j;e+} → λ H_{j;e+}, H_{j;e−} → λ H_{j;e−}, and D_j → λ D_j. Expanding the exponentials in the above equations and keeping the leading error terms allows us to compute F and F_LE to first order in λ in the perturbative limit. Comparing the two resulting expressions, we arrive at the relation

F_LE = 2F − 1 + E_{e−} ,   (A65)

where E_{e−} is a control-error contribution, dependent on H_{j;e−}, arising from the difference between the pulse-sequence realizations of U and U†.
If the control error of any gate is the same as that of its inverse implementation, $E_{e-}$ vanishes. Rearranging equation (A65) for $F_{\mathrm{LE}}$ then results in the relationship

$$F_{\mathrm{LE}} = 2F - 1.$$

This means that in the weak error limit, $F_{\mathrm{LE}}$ is a linear function of $F$. Then in general, the ranking of $F_{\mathrm{LE}}$ and the ranking of $F$ are the same for noise models satisfying the following conditions:

1. The environmental noise is Markovian, such that the time evolution of the system density matrix can be described by a Lindblad master equation [73];
2. The control error is the same for any gate $U$ and its inverse $U^\dagger$;
3. The system is in the weak-error limit;
4. The gate duration is much smaller than the characteristic time scale of the errors.
In other words, given these conditions, the optimal assignments with respect to $F$ and $F_{\mathrm{LE}}$ are the same, i.e.

$$\underset{Q}{\mathrm{arg\,max}}\; F = \underset{Q}{\mathrm{arg\,max}}\; F_{\mathrm{LE}},$$

which completes the proof of Proposition 1. One can show that the depolarizing noise models discussed in Appendix A 1 and Appendix A 2 are special cases of these conditions, with no control errors and with error rates given by $\gamma_j \Delta t_j$ for the specific depolarizing channels. One can also check that the gate-dependent unitary noise defined in Appendix A 2 b violates the second condition, and thus $F_{\mathrm{LE}}$ no longer serves as a good approximation of $F$.
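As a quick sanity check on the weak-error relationship, a global depolarizing channel acting after each gate (in the spirit of the depolarizing models of Appendix A 1) admits closed-form expressions for both quantities. The toy parameters below are illustrative:

```python
import numpy as np

def depol_fidelities(p, m, n):
    """Global depolarizing channel with rate p after each of m gates on n qubits.

    F:    state fidelity after the forward circuit U (m noisy gates).
    F_LE: survival probability after U followed by U^+ (2m noisy gates).
    """
    d = 2 ** n
    F = (1 - p) ** m + (1 - (1 - p) ** m) / d
    F_LE = (1 - p) ** (2 * m) + (1 - (1 - p) ** (2 * m)) / d
    return F, F_LE

p, m, n = 1e-4, 20, 4
F, F_LE = depol_fidelities(p, m, n)
# deviation from F_LE = 2F - 1 is second order in the total error m*p
print(abs(F_LE - (2 * F - 1)), 1 - F)
```

The residual is $O((mp)^2)$, far below the first-order infidelity $1-F \sim mp$, consistent with the linear relation holding in the weak error limit.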

Appendix B: Readout error mitigation
We discuss the effect of readout error on $F_{\mathrm{LE}}(U)$ and techniques for efficient readout error mitigation. In this case let $\mathcal{E}_U \to U$ and $\mathcal{E}_{U^\dagger} \to U^\dagger$, but replace the perfect measurement operation with a noisy one. A noisy computational basis measurement can be represented by the POVM $\{P_j\}$ with $j = 0, \ldots, 2^n - 1$, with each element given by

$$P_j = \sum_k p_{jk} |k\rangle\langle k|, \quad \text{(B1)}$$

where the restriction $\sum_j p_{jk} = 1$ ensures that $\sum_j P_j = I$. The choice $P_j = |j\rangle\langle j|$ recovers a noiseless computational basis measurement. For a fixed $j$, each $p_{jk}$ in Eq. B1 can be interpreted as the probability of measuring computational basis state $|k\rangle$ but then recording that result as $|j\rangle$. Substituting $P_0$ for $|0\rangle\langle 0|$ in Eq. 5 of the main text, we find that $F_{\mathrm{LE}}(U) = p_{00}$, which shows that this fidelity estimate will only depend on the probability that the bitstring $0^n$ is measured correctly by the noisy measurement, and the qubit set $\tilde{Q}^*$ resulting from optimization of $F_{\mathrm{LE}}(U)$ in the presence of readout error will be biased towards maximizing $p_{00}$. Clearly this does not fully characterize the effect of readout error on a measurement applied to $\mathcal{E}_U(|0\rangle\langle 0|)$, and so $\tilde{Q}^*$ will generally differ from a qubit set $Q^*$ that minimizes the effect of readout error after preparing, say, $\mathcal{E}_U(|0\rangle\langle 0|)$. In an experimental setting this bias can be removed by applying readout error mitigation, which can efficiently recover the probability of the all-zeros bitstring from the output of a quantum device [42].
In exploratory experiments we determined that readout error correction assuming uncorrelated bitflip errors for each qubit closely matched the performance of readout error correction assuming correlated bitflips. We performed readout error correction for all experiments by first determining, for every qubit, the conditional error probability $p_{01} = p(0|1)$ of observing "0" given a pre-measurement state $|1\rangle$ and $p_{10} = p(1|0)$ of observing "1" given a pre-measurement state $|0\rangle$. We then ran experiments to empirically estimate the probability $p'(j)$ of observing each bitstring $j = 0, \ldots, 2^n - 1$ with readout error. Since $p'(j)$ represents a joint probability over single-bit outcomes, we then estimated the true probability $p(j)$ of measuring bitstring $j$ using the tensor product of a set of linear maps used to correct each pair of single-bit marginal probabilities $p'(0)$ and $p'(1)$, each having the form

$$\begin{pmatrix} p'(0) \\ p'(1) \end{pmatrix} = \begin{pmatrix} 1 - p_{10} & p_{01} \\ p_{10} & 1 - p_{01} \end{pmatrix} \begin{pmatrix} p(0) \\ p(1) \end{pmatrix}.$$

The error propagated by a linear equation of the form $p = Q^{-1} p'$ increases with increasing condition number $\kappa(Q) = \|Q\| \|Q^{-1}\|$, and therefore greater readout error rates $p(x|\neg x)$ will lead to a breakdown in the performance of readout error correction. We therefore rejected qubit assignments for which $\max(p(0|1), p(1|0)) > 0.15$, typically fewer than 5% of assignments on Rainbow but $\sim 27\%$ of assignments on Weber. We found that the relationship between $F$ and $F_{\mathrm{LE}}$ was consistently strengthened by performing readout error correction. This is partly due to the tendency of asymmetric readout error rates (i.e. $p(0|1) > p(1|0)$) to increase $F_{\mathrm{LE}}$ (since the experimentalist infers a greater survival likelihood from a larger population of observed $0^n$ bitstrings) while at the same time decreasing $F$ (in which case the effect of mostly random readout error with high probability increases the distance of the observed bitstring distribution from the bitstring distribution associated with the output of DFE).
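A minimal sketch of the uncorrelated-bitflip correction described above. Note the experiment corrects single-bit marginals; for brevity this sketch inverts the full tensor-product confusion matrix, which is exponential in $n$ but fine for small registers. The rejection threshold mirrors the 0.15 cutoff used in the text:

```python
import numpy as np

def confusion_matrix(p01, p10):
    """Single-qubit confusion matrix Q: column k holds p(observe j | prepared k)."""
    return np.array([[1 - p10, p01],
                     [p10, 1 - p01]])

def correct_counts(p_observed, error_rates, reject_above=0.15):
    """Invert the tensor-product confusion matrix, p = Q^-1 p', qubit by qubit.

    error_rates: list of (p01, p10) = (p(0|1), p(1|0)) per qubit. Assignments
    whose worst error rate exceeds the threshold are rejected, since kappa(Q)
    grows with the error rate and amplifies sampling noise in the inversion.
    """
    if max(max(r) for r in error_rates) > reject_above:
        raise ValueError("readout error too large to correct reliably")
    Qinv = np.eye(1)
    for p01, p10 in error_rates:
        Qinv = np.kron(Qinv, np.linalg.inv(confusion_matrix(p01, p10)))
    return Qinv @ p_observed

# two qubits, true distribution concentrated on bitstring 00
true_p = np.array([0.9, 0.05, 0.05, 0.0])
rates = [(0.03, 0.01), (0.02, 0.04)]
Q = np.kron(confusion_matrix(*rates[0]), confusion_matrix(*rates[1]))
recovered = correct_counts(Q @ true_p, rates)
print(np.allclose(recovered, true_p))  # True
```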
We present a more detailed description of the qubit assignment problem. We consider a map acting on a logical circuit that returns a hardware circuit performing the same operation as the logical circuit,

$$M : C_L \mapsto C_p,$$

and assume that $C_L \in (G_L^m, V_L^m, E_L^m)$ uses exactly $n$ unique qubits taken from $V_L$. We further subject $M$ to the constraint that $U(C_p)|0\rangle$ executed in a noiseless setting will prepare the same state as executing the logical circuit. We should allow for the use of additional operations or qubits in the execution of $C_p$ that do not affect the unitary $U(C_L)$, and so we let $C_p \in (G_p^m, V_p^m, E_p^m)$ use exactly $N$ qubits taken from $V_p$ and constrain the map $M$ such that

$$U(C_p)|0\rangle = U(C_L)|0\rangle \otimes |0\rangle_A \quad \text{(C2)}$$

for some register $A$ consisting of at most $(N - n)$ qubits. We will consider implementing a specific unitary $U$. To allow for a layer of abstraction between the logical circuit and physical implementations, we fix a map $M$ and consider the sets of permissible logical and physical circuits to be

$$\mathbb{C}_L = \{ C_L : U(C_L) = U \}, \qquad \mathbb{C}_p = \{ M(C_L) : C_L \in \mathbb{C}_L \}.$$

To implement Simulated Annealing we fix the set of possible states of the annealer to be $S = \mathbb{C}_p$, and the algorithm proceeds by iteratively evaluating the cost function for elements drawn from $S$ and updating to another element of $S$ accordingly. We now discuss how the definition of $\mathbb{C}_p$ with respect to a mapping $M$ complicates the implementation of SA.
In our hardware experiments we chose M to be the identity mapping and set V p = V L , E p = E L , and G p = G L . As a result, C p = C L . This effectively separates compilation of logical operations into physical operations from the qubit assignment. If we define an update rule such that every element of C L may be reached in a finite number of update steps from any other element, then every element of S may similarly be reached after a finite number of steps and so the algorithm is guaranteed to converge by the conditions presented in Ref. [52].
However, in general, compilation of logical circuits to hardware will modify the contents of a logical circuit $C_L$ such that $C_L \neq C_p$. Alternatively, error mitigation strategies such as dynamical decoupling [74,75] or zero noise extrapolation [76,77] may introduce additional operations to $C_p$ that do not affect the realized operation $U(C_p)$. Similarly, "wait gates", which implement the identity operation with fidelity $\leq 1$ on hardware, can be used to probe $T_1$ and $T_2^*$. Under these considerations, a reasonable choice of $M$ might allow any of the following mappings: Oftentimes logical circuits use elements of $V_L$ with higher degree than those available in $V_p$, sometimes resulting in the need for SWAP networks to realize an operation $U(C_L)$ on hardware, for example: Allowing for mappings of this form complicates the implementation of SA. Compiling operations from $G_L$ to $G_p$, or simplifying the implementation of $C_L$ (e.g. Eq. C6) to reduce the depth or number of operations in a hardware circuit, involves a compilation problem that is generally NP-hard [78]. On the other hand, allowing for arbitrary maps of the form shown in Eqs. C5 and C7 will result in a state space with infinitely many configurations. In practice this means that the set $S^*$ of optimal qubit assignments may not be accessible by SA, or by any optimization over qubit assignments.

Assignment of circuits with unconstrained logical connectivity
Algorithm 1 describes the implementation of SA for qubit assignment given a neighborhood function S : C p → C p and a fixed temperature schedule T : Z → R.
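A sketch of the SA loop in the shape of Algorithm 1, assuming a neighborhood sampler and an $F_{\mathrm{LE}}$-style cost to be maximized. The logarithmic cooling schedule matches the form used in the simulated experiments; the toy line topology and per-qubit quality values are purely illustrative, not taken from the paper:

```python
import math
import random

def simulated_annealing(initial, neighbor, cost, n_steps, T0=0.08):
    """Simulated annealing over qubit assignments, maximizing cost (e.g. F_LE).

    neighbor: draws a random element of S(c), the neighborhood of assignment c.
    Cooling follows T(i) = T0 / (1 + log(1 + i)).
    """
    best = current = initial
    f_current = cost(current)
    f_best = f_current
    for i in range(n_steps):
        T = T0 / (1 + math.log(1 + i))
        candidate = neighbor(current)
        f_candidate = cost(candidate)
        # accept uphill moves always, downhill moves with Boltzmann probability
        if f_candidate >= f_current or random.random() < math.exp((f_candidate - f_current) / T):
            current, f_current = candidate, f_candidate
        if f_current > f_best:
            best, f_best = current, f_current
    return best, f_best

# toy example: choose 3 adjacent qubits on a 10-qubit line with random quality
random.seed(0)
q = [random.uniform(0.9, 1.0) for _ in range(10)]

def neighbor(c):
    start = min(max(c[0] + random.choice([-1, 1]), 0), 7)
    return tuple(range(start, start + 3))

best, f = simulated_annealing((0, 1, 2), neighbor,
                              lambda c: math.prod(q[i] for i in c), 200)
print(best, round(f, 4))
```

The toy cost (a product of per-qubit "fidelities") stands in for an experimentally estimated $F_{\mathrm{LE}}$; in the hardware experiments each cost evaluation is itself a circuit execution.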
In our simulated experiments involving the assignment of logical circuits with all-to-all logical connectivity to a hardware graph with restricted connectivity, we first assumed $V_L = V_p$ and used a transpiler [29] to construct an intermediate logical circuit $C_L$ respecting (simulated) hardware connectivity. We then defined the neighborhood $S(C_L)$ around $C_L$ for unconstrained logical connectivity (Sec. II D) according to Algorithm 2. Despite not having access to the complete state space of qubit assignments, this update scheme was able to reliably find high-scoring qubit assignments in simulation. Fig. 7 demonstrates the results of running SA for all-to-all connectivity logical circuits transpiled onto a nearest-neighbor connectivity simulated hardware device. While this approach was successful over random assignment of similar circuits, we emphasize that transpilation by default represents a one-to-many mapping, and so further work is necessary to constrain the map $M$ induced by transpilation such that $S^*$ remains accessible by SA.

FIG. 7. a. Each $F_{\mathrm{LE}}$ is computed for a qubit assignment initially placed onto four simulated qubits before being transpiled [29] into a circuit configuration satisfying nearest-neighbor connectivity on a $5 \times 5$ 2D grid. A maximum score $F^*_{\mathrm{LE}}$ was not identified due to the number of assignments in the state space (the $\binom{25}{4}4!$ possible all-to-all connectivity initial assignments on the grid result in an unknown number of possible transpiled circuits satisfying nearest-neighbor connectivity). Experiments were performed with a tuned logarithmic cooling schedule (arbitrarily rescaled for presentation), $T(i) = T_0/(1 + \log(1 + i))$ with $T_0 = 0.08$. b. 100 trials of SA with transpiled all-to-all connectivity significantly outperform best-of-$n_s$ random assignment, where $n_s$ represents the number of unique states queried by each annealer.
One potential approach is to add a counting argument to $M$ that enumerates over all choices of SWAP networks such that $U(C_p) = U$, and to incorporate this counter into the SA update step. More generally, the task of noise-aware quantum compilation involving optimization over qubit assignments on noisy hardware with a state space $\mathbb{C}_p$ remains an open problem.

We provide detail on the circuits implemented and the hardware devices used for hardware execution. We accessed the Weber and Rainbow superconducting qubit devices through Google's Quantum Computing Service. Fig. 8 shows the layout and connectivity of the qubits on each of these devices, colored according to gate fidelity metrics queried from the processor.
Each estimator accumulates error due to finite sample effects, gate infidelity in the DFE measurement subcircuit, and residual readout error after readout error correction has been applied. Without detailed knowledge of these effects we cannot completely characterize the error in each diagnostic, and so we assume every experimental diagnostic is affected equally by these systematic errors. Figs. 11-12 show the results of diagnostic runs for the (random) Clifford conjugation circuits and SWAP network circuits, respectively. For a specific family of random Clifford circuits we find that $F_{\mathrm{LE}}$ and $F^{\mathrm{rand}}_{\mathrm{LE}}$ capture the behavior of the fidelity $F$ equally well when the target circuit is itself a random circuit. In contrast, the fidelity of a state transported through a SWAP network is only poorly modeled by $F_{\mathrm{LE}}$ computed on random circuits. In both cases, attempting to find a qubit assignment that maximizes $F$ using knowledge of $F_0$ alone is generally no better than random guessing, and in some cases this strategy is worse than random guessing (Fig. 12c).

Figure caption (panels b-d): b. The SWAP network circuit prepares a state $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$ that is then swapped into a new register using $4m$ iSWAP gates for some integer $m$. The fidelity of this circuit can be computed in constant time by performing the projective measurement with elements $\{I - |\psi\rangle\langle\psi|, |\psi\rangle\langle\psi|\}$. c. A sample Clifford conjugation circuit: an arbitrary random Clifford circuit will typically require significant resources to certify, but the output of circuits of the form $UPU^\dagger$ with $U \in \mathrm{Cl}(2^n)$ and $P \in \{I, X, Y, Z\}^n$ can be verified in constant time. d. Decompositions of entangling gates using the hardware-native gate set for both the Rainbow and Weber devices. CNOT gates are decomposed into $\sqrt{\mathrm{iSWAP}}$ and the PhasedXZ$(a, b, c)$ gate, defined as $R_{XZ} = R_z(a)R_x(b)R_z(c)$ and implemented with depth 1 on hardware.
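The phase bookkeeping behind the choice of $4m$ iSWAP gates in the SWAP network can be checked in a few lines: transporting $\alpha|0\rangle + \beta|1\rangle$ by one hop multiplies the $|1\rangle$ amplitude by $i$, so only multiples of four hops return the state exactly. This is a noiseless sketch with arbitrary illustrative amplitudes:

```python
import numpy as np

def iswap():
    """Two-qubit iSWAP: swaps |01> and |10> with an extra factor of i."""
    U = np.eye(4, dtype=complex)
    U[1, 1] = U[2, 2] = 0
    U[1, 2] = U[2, 1] = 1j
    return U

def transport_fidelity(alpha, beta, n_hops):
    """Move alpha|0> + beta|1> down a chain one iSWAP hop at a time and return
    |<psi_target|psi_out>|^2 against the original state (noiseless)."""
    psi = np.array([alpha, beta], dtype=complex)
    for _ in range(n_hops):
        # each hop: (alpha|00> + beta|10>) -> alpha|00> + i*beta|01>
        out = iswap() @ np.kron(psi, [1, 0])
        psi = np.array([out[0], out[1]])  # state now lives on the next qubit
    target = np.array([alpha, beta], dtype=complex)
    return abs(np.vdot(target, psi)) ** 2

a, b = 1 / np.sqrt(3), np.sqrt(2 / 3)
print(transport_fidelity(a, b, 4))  # i^4 = 1: fidelity ~1.0
print(transport_fidelity(a, b, 2))  # i^2 = -1 flips the |1> phase: fidelity < 1
```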

Quantum Fourier Transform
In addition to the circuits described in the main text, we performed SA for qubit assignment for an $n$-qubit Quantum Fourier Transform (QFT) circuit $U_{\mathrm{QFT}}$ using $n = 3$ nearest-neighbor connectivity qubits. In general, the state resulting from applying $U_{\mathrm{QFT}}$ to an arbitrary initial superposition state will have support on an exponential number of Pauli strings when represented in the Pauli basis, so that DFE cannot be performed with better than exponential cost. To work around this, we chose to implement $U_{\mathrm{QFT}}$ applied to a specific computational basis state $|j\rangle$, $j \in [0, 2^n - 1]$. This requires only a constant number of experiments since the result of this computation is a separable state:

$$U_{\mathrm{QFT}}|j\rangle = |a_1\rangle \otimes |a_2\rangle \otimes \cdots \otimes |a_n\rangle, \qquad |a_p\rangle = 2^{-1/2}\left( |0\rangle + e^{-i 2\pi j 2^{-p}} |1\rangle \right).$$

Setting the measurement parameters $M_p = \cos(2\pi j 2^{-p})\sigma_x - \sin(2\pi j 2^{-p})\sigma_y$ gives the Pauli decomposition of the state over each local system as

$$|a_p\rangle\langle a_p| = \frac{1}{2}(I + M_p). \quad \text{(D7)}$$

Then, given a state $\rho$ output by running QFT on hardware with input $|j\rangle$, we can directly estimate the fidelity of the prepared state as

$$F = \frac{1}{2^n} \sum_{\ell \in \{0,1\}^n} \mathrm{Tr}\left[ \rho\, M^\ell \right],$$

where $M^\ell = M_1^{\ell_1} \otimes M_2^{\ell_2} \otimes \cdots \otimes M_n^{\ell_n}$. Therefore a single measurement configuration for the operator $M_1 \otimes M_2 \otimes \cdots \otimes M_n$ contains all of the information needed to estimate $F$ up to postprocessing. For each qubit assignment, $F_{\mathrm{LE}}$ and $F$ were evaluated for $j \in \{0, 3, 4, 7\}$ for the $n = 3$ $U_{\mathrm{QFT}}|j\rangle$ states on all possible qubit assignments on the Rainbow device. As with other experiments, $F_{\mathrm{LE}}$, with $\tau_b(F_{\mathrm{LE}}, F) = 0.72$, outperforms the other metrics considered ($\tau_b(F^{\mathrm{rand}}_{\mathrm{LE}}, F) = 0.45$, $\tau_b(F_0, F) = 0.15$) when used as a proxy for maximizing $F$ over qubit assignments. Fig. 10 shows SA for qubit assignment on the state space of length-three simple paths on the 23-qubit Rainbow device (148 possible assignments). The performance of SA over a comparable random sampling scheme was less significant compared to the $n = 4$ GHZ qubit assignment.
This is likely due to the small size of the state space, for which random guessing is much more likely to produce a performant assignment.
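The separability of the QFT applied to a computational basis state can be verified numerically. The sketch below uses the convention $\mathrm{QFT}|j\rangle \propto \sum_k e^{2\pi i jk/N}|k\rangle$; the text's $e^{-i2\pi j 2^{-p}}$ phases correspond to the conjugate convention, for which the same identity holds:

```python
import numpy as np

def qft(n):
    """n-qubit QFT matrix, F[k, l] = omega^(k*l) / sqrt(N) with omega = exp(2*pi*i/N)."""
    N = 2 ** n
    k, l = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return np.exp(2j * np.pi * k * l / N) / np.sqrt(N)

def qft_product_state(n, j):
    """Claimed separable form: tensor over p=1..n of (|0> + exp(2*pi*i*j/2^p)|1>)/sqrt(2),
    with the p = 1 factor on the most significant qubit."""
    psi = np.array([1.0])
    for p in range(1, n + 1):
        factor = np.array([1.0, np.exp(2j * np.pi * j / 2 ** p)]) / np.sqrt(2)
        psi = np.kron(psi, factor)
    return psi

n, j = 3, 5
print(np.allclose(qft(n)[:, j], qft_product_state(n, j)))  # True
```

Because the output is a product of single-qubit equatorial states, each local projector $\frac{1}{2}(I + M_p)$ is measured in a single setting, which is what makes the constant-cost fidelity estimate possible.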

GHZ State Direct Fidelity Estimation
Here we provide measurement setting specifications for performing DFE on the GHZ state according to [55]. The state $|\psi\rangle = \frac{1}{\sqrt{2}}\left( |0\rangle^{\otimes n} + |1\rangle^{\otimes n} \right)$ can be expressed as a linear combination of Pauli matrices using the equalities

$$|0\rangle\langle 0| = \frac{1}{2}(\mathbb{1} + \sigma_z), \qquad |1\rangle\langle 1| = \frac{1}{2}(\mathbb{1} - \sigma_z). \quad \text{(D10)}$$

The goal is to measure $F = \langle\psi|\rho|\psi\rangle$ for some experimentally prepared $\rho$.

FIG. 11. In the case of random Clifford conjugation circuits, computing $F_{\mathrm{LE}}$ for the specific instance of random circuit does not provide significant advantage over computing $F^{\mathrm{rand}}_{\mathrm{LE}}$ using a different kind of random circuit, shown in b. Notably, both metrics still greatly outperform $F_0$, which was computed using RB and XEB diagnostic error rates. $F_0$ computed using RB error rates ostensibly describes the fidelity of a random circuit, since the effect of Clifford twirling is understood to result in local depolarizing errors, though XEB fidelities do not have a similarly straightforward interpretation. c. $P(F > k | F_{\mathrm{LE}} > k)$ and $P(F > k | F^{\mathrm{rand}}_{\mathrm{LE}} > k)$ do not differ significantly, demonstrating that either metric is equally useful for picking a qubit assignment to optimize $F$.

FIG. 12. Comparison of $F$ to $F_{\mathrm{LE}}$ for a 9-qubit SWAP network experiment. Similarly to the GHZ experiment, the fidelity of the state transmitted by the SWAP network was strongly correlated with $F_{\mathrm{LE}}$ computed on the same qubit assignment. b. $F^{\mathrm{rand}}_{\mathrm{LE}}$ computed using random circuits and $F_0$ computed using calibration data were not strongly correlated with actual device performance. c. The conditional success probability $P(Y_k | X_k)$ further emphasizes the performance of $F_{\mathrm{LE}}$ compared to $F_0$ and $F^{\mathrm{rand}}_{\mathrm{LE}}$.
This fidelity may be expressed as the sum of two terms, $F \equiv f_{XY} + f_Z$. The diagonal part is

$$f_Z = \frac{1}{2}\mathrm{Tr}\left[ \rho \left( |0\rangle\langle 0|^{\otimes n} + |1\rangle\langle 1|^{\otimes n} \right) \right] \quad \text{(D14)}$$

$$= \frac{1}{2}\mathrm{Tr}\left[ \rho\, \frac{1}{2^n}(\mathbb{1} + \sigma_z)^{\otimes n} \right] + \frac{1}{2}\mathrm{Tr}\left[ \rho\, \frac{1}{2^n}(\mathbb{1} - \sigma_z)^{\otimes n} \right]. \quad \text{(D15)}$$

Here, $Z^\ell = \sigma_z^{\ell_1} \otimes \sigma_z^{\ell_2} \otimes \cdots \otimes \sigma_z^{\ell_n}$ with $\ell \in \{0,1\}^n$ is a Pauli operator, and the prefactor of $1/2$ arises because $|\psi\rangle\langle\psi|$ contains each of the projectors $|0\rangle\langle 0|^{\otimes n}$ and $|1\rangle\langle 1|^{\otimes n}$ with weight $1/2$. Following [55], the off-diagonal part can be estimated through parity oscillations,

$$f_{XY} = \frac{1}{2n}\sum_{k=1}^{n}(-1)^k\, \mathrm{Tr}\left[ \rho\, M_k^{\otimes n} \right], \qquad M_k = \cos\left(\frac{k\pi}{n}\right)\sigma_x + \sin\left(\frac{k\pi}{n}\right)\sigma_y.$$

Therefore, to estimate $f_{XY}$ it is sufficient to estimate $M_k^{\otimes n}$ using the pre-measurement rotations $HU_k^\dagger$ for each choice of $k = 1, \ldots, n$. As a consequence, we are able to estimate the fidelity of the GHZ state $|\psi\rangle$ using $n + 1$ experiments.
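The decomposition $F = f_{XY} + f_Z$ can be checked as an operator identity: the $n$ parity-oscillation settings plus the computational basis measurement reconstruct the GHZ projector exactly. A NumPy sketch (the $1/2n$ coefficient and the $M_k$ angles follow the standard parity-oscillation construction attributed above to [55]):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)
I2 = np.eye(2, dtype=complex)

def kron_all(ops):
    out = np.array([[1.0 + 0j]])
    for op in ops:
        out = np.kron(out, op)
    return out

n = 3
ghz = np.zeros(2 ** n, dtype=complex)
ghz[0] = ghz[-1] = 1 / np.sqrt(2)
proj = np.outer(ghz, ghz.conj())  # |psi><psi|

# f_Z part: 1/2 (|0><0|^n + |1><1|^n) via the sigma_z expansions (D10)
P0, P1 = (I2 + Z) / 2, (I2 - Z) / 2
fz_op = 0.5 * (kron_all([P0] * n) + kron_all([P1] * n))

# f_XY part: parity-oscillation settings M_k = cos(k*pi/n) X + sin(k*pi/n) Y
fxy_op = np.zeros_like(proj)
for k in range(1, n + 1):
    Mk = np.cos(k * np.pi / n) * X + np.sin(k * np.pi / n) * Y
    fxy_op += (-1) ** k * kron_all([Mk] * n) / (2 * n)

print(np.allclose(fz_op + fxy_op, proj))  # True
```

Since the two operator sums add up to $|\psi\rangle\langle\psi|$ exactly, estimating each expectation value with the stated $n + 1$ measurement settings yields an unbiased estimate of $F = \langle\psi|\rho|\psi\rangle$.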