Large-scale quantum approximate optimization on non-planar graphs with machine learning noise mitigation

Quantum computers are increasing in size and quality but are still very noisy. Error mitigation extends the size of the quantum circuits that noisy devices can meaningfully execute. However, state-of-the-art error mitigation methods are hard to implement, and the limited qubit connectivity in superconducting qubit devices restricts most applications to the hardware's native topology. Here we show a quantum approximate optimization algorithm (QAOA) on non-planar random regular graphs with up to 40 nodes enabled by machine-learning-based error mitigation. We use a swap network with a careful decision-variable-to-qubit mapping and a feed-forward neural network to demonstrate optimization of a depth-two QAOA on up to 40 qubits. We observe a meaningful parameter optimization for the largest graph, which requires running quantum circuits with 958 two-qubit gates. Our work emphasizes the need to mitigate samples, and not only expectation values, in quantum approximate optimization. These results are a step towards executing quantum approximate optimization at a scale that is not classically simulable. Reaching such system sizes is key to properly understanding the true potential of heuristic algorithms like QAOA.

In ZNE, multiple logically equivalent copies of a circuit are run under different noise amplification factors c. Based on the noisy results, extrapolation to the zero-noise limit produces a biased estimation of the noiseless expectation value. ZNE can be performed with pulses, which results in small stretch factors c close to one [11]. However, pulse-based ZNE is almost impossible for users of a cloud-based quantum computer to implement due to the onerous calibration. As an alternative, digital ZNE folds gates such as CNOTs, which produces large stretch factors c = 2m + 1, where m is the number of times a gate is folded [12]. If the original circuit is deep compared to the noise, then even the first fold m = 1 amplifies the noise so much that the extrapolation becomes useless. Partial folding prevents this by folding only a subset of the gates in a circuit [13].
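As an illustration, digital folding can be sketched on a toy gate-list representation of a circuit. The list-of-tuples circuit format below is a stand-in for illustration, not any particular library's API:

```python
def fold_two_qubit_gates(circuit, m):
    """Digitally fold each CNOT: replace CX by CX (CX^dagger CX)^m.
    Since CX is self-inverse, this appends 2m extra CX gates per
    original CX, giving a noise stretch factor c = 2m + 1."""
    folded = []
    for name, qubits in circuit:
        folded.append((name, qubits))
        if name == "cx":
            for _ in range(2 * m):  # CX^dagger = CX, so simply repeat the gate
                folded.append((name, qubits))
    return folded

# Two CX gates folded once (m = 1) become 2 * (2*1 + 1) = 6 CX gates.
circuit = [("h", (0,)), ("cx", (0, 1)), ("cx", (1, 2))]
stretched = fold_two_qubit_gates(circuit, m=1)
n_cx = sum(1 for name, _ in stretched if name == "cx")
```

Partial folding would apply the same expansion to only a chosen subset of the CX gates.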
PEC learns a sparse model of the noise [14]. The non-physical inverse of the noise channel is applied through a quasi-probability distribution to recover an unbiased expectation value. However, the large shot overhead of PEC can be prohibitive. This is why large experiments resort to probabilistic error amplification (PEA), a form of ZNE in which the learnt error channels are amplified [15]. PEA avoids pulse calibration but requires an onerous noise-learning step like PEC.
Pulse-based approaches can also error mitigate variational algorithms, enabled by, e.g., OpenPulse [16]. Scaled cross-resonance gates [17, 18] can implement ZNE [19] and reduce the schedule duration without calibrating pulses [20]. Other approaches, inspired by optimal control [21], leave it up to the classical optimizer to shape the pulses, resulting in shorter schedules [22, 23].
Supervised learning benefits a wide range of scientific fields, including quantum physics [24]. In particular, it can mitigate hardware noise in quantum computations. Kim et al. [25] adjust the probabilities estimated from measurements of quantum circuits with neural networks. They show an effective reduction in errors with a method that scales exponentially with system size. Czarnik et al. [26] propose a scalable method to error mitigate observables, rather than the full state vector, with linear regression. They efficiently generate training data by computing expectation values of Clifford circuits on noiseless simulators and noisy quantum hardware. Similarly, Strikis et al. [27] present a method that learns noise mitigation from Clifford data. They error mitigate a quantum circuit by simulating multiple versions of it in which non-Clifford gates are replaced with gates that are efficient to simulate classically. These methods successfully mitigate noise on both real quantum hardware and simulations of imperfect quantum computers.
Combinatorial problems are regularly encountered in practical settings such as finance and vehicle routing. The quantum approximate optimization algorithm (QAOA) [28] may help solve such problems by mapping the cost function to a spin Hamiltonian and finding its ground state with a suitable variational Ansatz [29].
Crucially, many problems of practical interest are non-planar [30], but common superconducting qubit architectures have a grid [31] or heavy-hexagonal [32] coupling map. Recently, QAOA experiments with a connectivity matching the hardware coupling map have been reported for 27 [20] and 127 [33] qubits with up to QAOA depth two. In contrast to industry-relevant problems, these instances are very sparse. Moreover, classical solvers perform well, especially on sparse problems [34]. While brute-force classical simulation methods of quantum circuits can handle up to around 50 qubits [35, 36], tensor-product-based methods are capable of simulating much larger QAOA circuits. For example, Lykov et al. [37] report simulating a single depth-one QAOA amplitude with up to 210 qubits and 1785 gates on a supercomputer. There is therefore a dire need to implement denser and larger problems than those in current demonstrations on hardware.
In this work, we make two contributions. First, inspired by Refs. [26, 27], we present an error mitigation strategy based on a neural network that uses measurements of noisy observables and compares them to their ideal values. Second, we go one step beyond the hardware-native topology by implementing in hardware random three-regular graphs with up to forty nodes. We achieve this by combining swap networks [20, 31] and the SAT-based initial mapping of Matsuo et al. [38], which was so far only studied numerically.
This paper is structured as follows. In Sec. II we introduce the QAOA and discuss its implementation on hardware. Sec. III discusses machine-learning-assisted quantum error mitigation. In Sec. IV we combine the QAOA implementation advances of Sec. II and the error mitigation approach of Sec. III to train depth-two QAOA circuits on hardware. We discuss our results and conclude in Sec. V.

II. QUANTUM APPROXIMATE OPTIMIZATION ALGORITHM
The QAOA was initially developed to solve the maximum cut (MaxCut) problem [28], but it also applies to any quadratic unconstrained binary optimization (QUBO), as exemplified by Refs. [39-41]. MaxCut requires cutting the set of nodes V of a given undirected graph G = (V, E) into two groups to maximize the number of edges in E traversed by the cut. This problem, like many others, is equivalent to finding the ground state of an Ising Hamiltonian for an n-qubit system, where n = |V| is the number of decision variables [29].
A depth-p QAOA for an unweighted MaxCut minimizes the expectation value of the cost function Hamiltonian H_C = Σ_{(i,j)∈E} σ^z_i σ^z_j under the variational state

|β, γ⟩ = Π_{k=1}^{p} e^{-iβ_k H_B} e^{-iγ_k H_C} |+⟩^{⊗n}.    (1)

The initial product state |+⟩^{⊗n} is an equal superposition of all possible solutions. It is also the ground state of the mixer Hamiltonian H_B = -Σ_{i=1}^{n} σ^x_i [28]. The circuit depth, controlled by p, determines the number of applications of the Hamiltonians. A classical optimizer varies the angles β = (β_1, ..., β_p) and γ = (γ_1, ..., γ_p) to minimize the energy expectation value E(β, γ) = ⟨H_C⟩ in a closed loop with the quantum computer until the parameters β, γ converge. We denote the optimized parameters by θ* = (β*, γ*) = arg min_{β,γ} E(β, γ).
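As a concrete example, the Ising energy of a sampled bitstring under H_C can be evaluated classically. This is a minimal sketch of our own, not code from the experiment:

```python
def maxcut_energy(bitstring, edges):
    """Energy of one bitstring under H_C = sum_{(i,j) in E} sigma^z_i sigma^z_j.
    Bit b maps to spin (-1)^b; a cut edge contributes -1 and an uncut
    edge +1, so minimizing the energy maximizes the number of cut edges."""
    spins = [1 - 2 * int(b) for b in bitstring]
    return sum(spins[i] * spins[j] for i, j in edges)

# Triangle graph: at most 2 of the 3 edges can be cut.
triangle = [(0, 1), (1, 2), (0, 2)]
energy = maxcut_energy("010", triangle)  # cuts edges (0,1) and (1,2)
```

Averaging this energy over measured bitstrings yields the estimate of E(β, γ) that the classical optimizer minimizes.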

A. Implementation on superconducting hardware
In hardware, e^{-iβ_k H_B} is trivially implemented by single-qubit R_X rotations applied to all qubits. The cost operator, however, creates a network of R_ZZ gates that matches the graph connectivity. Noisy quantum hardware can run graphs with many nodes if their topology matches the connectivity of the qubits [33]. However, SWAP gates must be inserted in the circuit when the structure of G does not match the native coupling map between the qubits. This severely limits the number of nodes that can be considered [20, 42].
Transpiler passes are responsible for routing quantum circuits, i.e., inserting SWAP gates. Transpilers that do not account for gate commutativity in e^{-iγ_k H_C} are suboptimal [20]. Commutation-aware transpiler passes have thus been developed [43, 44]. Predetermined networks of SWAP gates quickly transpile blocks of commuting two-qubit gates and produce low-depth circuits compared to other methods [20, 31]. However, for problems that are not fully connected, such as MaxCut on random-regular-three (RR3) graphs, predetermined swap networks produce even shallower quantum circuits if the initial mapping from the decision variables to the physical qubits is optimized to minimize the number of swap layers [38].
In this work, we map the quantum circuits to the best line of qubits on the hardware using alternating layers of SWAP gates [20, 31]. The line of qubits is chosen according to the fidelity of the CNOT gates as reported by the backend, see App. A. Furthermore, since RR3 graphs are sparse, we reorder the decision variables of the problem to minimize the number of SWAP layers. This is done with a SAT description of the initial mapping problem [38]. Details on the graph generation, transpilation, and SAT mapping are in App. A. We first consider two RR3 graphs with 30 and 40 nodes that can be mapped to the hardware with a total of six and seven swap layers, respectively, once the SAT initial mapping is solved. The resulting circuits are transpiled to the hardware-native gate set {X, √X, R_Z(θ), ECR}. Here, X and √X are the Pauli X gate and its square root, R_Z(θ) is a rotation around the Z axis by angle θ, and ECR is the echoed cross-resonance gate [45, 46]. The ECR gate is equivalent to the standard two-qubit entangling CNOT gate up to single-qubit rotations.
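The alternating swap layers on a line can be sketched as follows. The code is a simplified model of the routing that tracks which decision-variable pairs become adjacent in each layer, not the transpiler pass itself:

```python
def swap_network(n, n_layers):
    """Alternating layers of SWAPs on the even and odd bonds of a line of
    n qubits.  Returns the set of decision-variable pairs that are
    adjacent at some point (and can thus host an R_ZZ interaction),
    together with the final variable-to-qubit permutation."""
    perm = list(range(n))  # decision variable held by each physical qubit
    adjacent_pairs = set()
    for layer in range(n_layers):
        for q in range(layer % 2, n - 1, 2):
            adjacent_pairs.add(frozenset((perm[q], perm[q + 1])))
            perm[q], perm[q + 1] = perm[q + 1], perm[q]
    return adjacent_pairs, perm

# On 4 qubits, 4 layers already make every one of the 6 variable pairs
# adjacent; sparse graphs need far fewer layers with a good initial mapping.
pairs, final_perm = swap_network(4, 4)
```

This illustrates why minimizing the number of layers matters: each layer adds a full row of SWAP gates, so a sparse graph routed with few layers yields a much shallower circuit.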
RR3 graphs with 30 and 40 nodes result in large and dense quantum circuits. For example, a depth-one QAOA creates circuits with 305 and 479 ECR gates for |V| = 30 and |V| = 40, respectively, see Fig. 1(e), which also shows that these circuits leave little space for error suppression methods such as dynamical decoupling [47]. We run the circuits on ibm_brisbane and scan the values (γ_1, β_1) from π/2 to π in 25 steps to investigate if there is a signal without error mitigation. We compare the hardware results to an efficient simulation of depth-one QAOA as described in App. F of Ref. [48]. The structure of the measured landscape matches the simulations, compare Fig. 1(a) and (c) to (b) and (d), respectively. For the 30 and 40 node graphs, the contrast, i.e., maximum less minimum, of the hardware-measured landscape is 43.0% and 33.8% of the contrast of the simulations, respectively, see the color scales in Fig. 1. The locations of the hardware and simulation minima are identical in γ and shifted in β by one grid point, i.e., 65 mrad. Crucially, these results indicate that, despite the large gate count, the quantum computer produces a signal that we can further error mitigate to optimize the parameters of QAOA circuits with p > 1.

III. MACHINE LEARNING ASSISTED ERROR MITIGATION
Inspired by Refs. [25-27], we mitigate errors in the energy expectation value with supervised machine learning. We explore a machine-learning approach based on a neural network to error mitigate QAOA circuits with p > 1 during the optimization of γ and β.

A. Supervised machine learning
A supervised machine learning model requires input data X = {X_i}_{i=1}^{M} and target data Y = {Y_i}_{i=1}^{M} to learn the relation between X and Y and make predictions on unseen data. Here, M is the size of the data set. We build X from noisy local expectation values and Y from the corresponding exact, noise-free, expectation values. The machine learning model learns the relation between the noisy and the noise-free data. Our proposed method has three steps. First, we generate noisy input data X on a quantum computer. Second, we simulate the quantum circuits classically to obtain noise-free target data Y. Finally, we train a machine learning model to learn the mapping from noisy to noise-free data. The trained model then error mitigates new, i.e., unseen, data.

B. Feed-forward neural network
There is a large number of sophisticated supervised machine learning models. Here, we use a standard fully connected feed-forward neural network (FFNN) [49] due to its simplicity and ease of use. A FFNN is a series of layers. Each layer has multiple neurons that are fully connected to all the neurons in the subsequent layer. This architecture allows the FFNN to model complex non-linear relationships between the input and output data. We construct our FFNN with an input layer, a single hidden layer, and an output layer. Variational algorithms typically minimize the expectation value of a Hamiltonian built from a linear combination of Pauli expectation values Σ_i α_i ⟨P_i⟩, with α_i a coefficient and P_i a Pauli operator. To error mitigate a variational algorithm with a FFNN, the output layer must yield quantities that can be optimized. We therefore choose as output layer the correlators that build up the cost function to minimize. The input is a set of noisy observables measured on the quantum computer. The FFNN thus maps noisy observables ⟨P'_i⟩_N, measured on hardware, to error mitigated observables ⟨P_i⟩_M. The subscripts N and M indicate noisy and error mitigated observables, respectively.
In the following, we apply the general ideas outlined above to QAOA on a graph G = (V, E) with |V| = n nodes. We choose an input layer with n(n + 1)/2 neurons. n of these neurons correspond to the n noisy local Pauli-Z observables ⟨σ^z_i⟩_N. The other n(n - 1)/2 neurons correspond to all possible ⟨σ^z_i σ^z_j⟩_N correlators, where i, j = 1, 2, ..., n. The output layer is made of |E| neurons, one for each correlator ⟨σ^z_i σ^z_j⟩_M corresponding to an edge (i, j) ∈ E. Therefore, a RR3 graph uses a FFNN with 3n/2 output neurons. The number of neurons in the hidden layer is the average of the input and output number of neurons. This construction is illustrated in Fig. 2. A trained FFNN helps us run the QAOA on a quantum computer. Noisy observables are fed into the FFNN for error mitigation. The values of the output neurons are summed to produce an error mitigated estimate of the energy expectation value E(β, γ)_M = Σ_{(i,j)∈E} ⟨σ^z_i σ^z_j⟩_M. This helps optimize γ and β.
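The layer sizing described above can be sketched with a plain NumPy forward pass. This is a minimal illustration of the architecture; the weight values and the choice of ReLU activation are our own assumptions, not details taken from the experiment:

```python
import numpy as np

def ffnn_layer_sizes(n, n_edges):
    # n single-qubit <Z> inputs plus n(n-1)/2 correlators -> n(n+1)/2 inputs;
    # one output neuron per graph edge; hidden width is their average.
    n_in = n * (n + 1) // 2
    n_out = n_edges
    n_hidden = (n_in + n_out) // 2
    return n_in, n_hidden, n_out

def ffnn_forward(x, W1, b1, W2, b2):
    # One hidden layer with ReLU and a linear output layer, so the
    # mitigated correlators can take any value in [-1, 1].
    h = np.maximum(0.0, x @ W1 + b1)
    return h @ W2 + b2

# Sizes for a 10-node RR3 graph (|E| = 3n/2 = 15 edges).
n_in, n_hid, n_out = ffnn_layer_sizes(10, 15)
```

Summing the 15 outputs then gives the error mitigated energy estimate E(β, γ)_M.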

C. Efficient training data generation
Training the FFNN requires input data X and target data Y. We generate X and Y by transforming the circuits to error mitigate into classically efficiently simulable circuits. This can be done in multiple ways. According to Ref. [26], it is advantageous to bias the training data towards the state of interest. The QAOA seeks the ground state of H_C, typically a classical product state. It applies the unitaries e^{-iβ_k H_B} and e^{-iγ_k H_C} to drive the initial equal superposition |+⟩^{⊗n} towards the ground state of H_C. To generate the training data we could restrict the angles β_k and γ_k to reduce e^{-iβ_k H_B} and e^{-iγ_k H_C} to Clifford circuits. This would, however, result in a small training data set and may not be possible if the edges in E have non-integer weights. Alternatively, we could randomly replace each R_Z rotation in the transpiled circuit of e^{-iγ_k H_C} by a Clifford gate such as I, S, or Z. However, this alters the graph G by giving the edges in E random weights. This may be undesirable as the structure of the QUBO of interest is changed.
These considerations motivate us to train the FFNN on data obtained by sampling over random product states that have undergone a noise process qualitatively similar to the QAOA without altering G. First, we change the initial state from an equal superposition to a random partition of V by randomly applying X gates to the qubits. This initial state is followed by circuit instructions that generate noise similar to the noise in the QAOA. The cost operator (up to SWAP gates, which we omit in the following for simplicity) is

e^{-iγ_k H_C} = Π_{(i,j)∈E} CX_{i,j} R^j_Z(2γ_k) CX_{i,j},    (2)

where CX_{i,j} is a CNOT gate between qubits i and j and R^j_Z is a rotation around the Z axis of qubit j. By setting γ_k = 0 the operator e^{-iγ_k H_C} reduces to the identity (up to SWAP gates) and the QAOA circuit produces product states that we efficiently simulate classically. To retain the noise characteristics, we replace the R^j_Z gates with barriers to prevent the transpiler from removing the CNOT gates, see Fig. 3. Since R^j_Z is implemented by virtual phase changes [50], the duration and magnitude of all pulses played on the hardware are unchanged. This preserves the effect of T_1, T_2, cross-talk, and other forms of errors. In detail, we generate training data with a set of M random states that are used to measure the observables for the input data X. Here, each β_i is a uniform random variable in [0, 2π] and p_j is a Bernoulli random binary variable that applies an X gate on qubit j if successful. We choose a 1/2 probability of success for p_j. To compute the target data we use CX²_{k,l} = I; the resulting state is a trivial product state for which it is straightforward to efficiently compute the exact expectation values required for the target data Y.
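Because the γ_k = 0 circuits produce product states, the noise-free targets Y factorize and can be computed in closed form. A minimal sketch, under the assumption that one mixer angle is drawn per QAOA layer:

```python
import numpy as np

def exact_targets(edges, n, p, rng):
    """Noise-free training targets for one random product state.
    X gates flip qubits with probability 1/2; with gamma_k = 0 the mixer
    layers compose into a single X rotation of total angle sum_k beta_k,
    so <sigma^z_j> = (-1)^{p_j} cos(2 sum_k beta_k) and the two-qubit
    correlators factorize into products of single-qubit values."""
    flips = rng.integers(0, 2, size=n)             # Bernoulli(1/2) X gates
    betas = rng.uniform(0.0, 2.0 * np.pi, size=p)  # one angle per layer
    z = (-1.0) ** flips * np.cos(2.0 * betas.sum())
    return {(i, j): z[i] * z[j] for i, j in edges}

rng = np.random.default_rng(7)
targets = exact_targets([(0, 1), (1, 2), (0, 2)], n=3, p=2, rng=rng)
```

The matching noisy inputs X come from running the same random circuits, with R_Z gates replaced by barriers, on the hardware.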

IV. MACHINE LEARNING ERROR MITIGATED QAOA
We now apply the FFNN error mitigation discussed in Sec. III and the QAOA execution methods discussed in Sec. II to run depth-two QAOA. We first exemplify the error mitigation in a small ten-qubit simulation and then turn to larger RR3 graphs with ten, twenty, thirty, and forty nodes executed on hardware.

A. Simulations
We build a noise model with short-lived qubits. Their T_1 and T_2 times are sampled from a Gaussian distribution with 10 µs mean and 10 ns standard deviation. Based on these durations, a thermal relaxation noise channel is applied to the CNOT gates, which last τ_CNOT = 300 ns. This is a strong noise model for the 102 CNOT gates in the QAOA circuit, as understood, e.g., from e^{-τ_CNOT/T_1} ≈ 97% as a proxy for the gate fidelity. The other circuit instructions are noiseless.
We sample 300 random cuts to create the training data following Sec. III C. We train the FFNN with 90% of this data and the other 10% serves as validation data [51]. In this example, the FFNN achieves a mean squared error (MSE) of 3.3% on the training data and an R² score of 71.8% on the validation data, see Fig. 4(a). The FFNN thus captures 71.8% of the variation in the validation data. Furthermore, we generate an additional 20 test data points. The corresponding non-error-mitigated correlators are damped towards an expectation value of zero, see Fig. 4(b). The MSE between these 20 × |E| = 300 non-error-mitigated and ideal ⟨σ^z_i σ^z_j⟩ correlators is 11%. This number drops to 7% after the correlators are error mitigated with the FFNN. Furthermore, we observe that the error-mitigated correlators better follow the trend set by their ideal values, see Fig. 4(c).
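For reference, the two metrics quoted above follow the standard definitions; a small self-contained sketch:

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error between ideal and predicted correlators.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean((y_true - y_pred) ** 2))

def r2_score(y_true, y_pred):
    # Coefficient of determination: the fraction of the variance in
    # y_true captured by the predictions (1 is perfect; 0 means the
    # model does no better than predicting the mean).
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

An R² of 71.8% thus means the residual variance on the validation set is about 28% of the total variance.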
In a separate simulation, we increase the strength of the noise by lengthening the CNOT gates. At a duration of 400 ns the FFNN cannot learn an error mitigation since the noise is too strong. We observe that the squared error does not reach low values and the predicted correlators are close to zero (data not shown).
We now optimize a p = 2 QAOA with 300 ns CNOT gates twice; once by optimizing the error mitigated cost function E_M = Σ_{(i,j)∈E} ⟨σ^z_i σ^z_j⟩_M, blue data in Fig. 5(a), and once by optimizing the non-error mitigated cost function E_N = Σ_{(i,j)∈E} ⟨σ^z_i σ^z_j⟩_N, red data in Fig. 5(a). We use COBYLA with θ = (γ_1, γ_2, β_1, β_2) initialized from a Trotterized quantum annealing schedule [52]. Each circuit is run with 4096 shots.
The error mitigated energy reaches lower values than the non-error mitigated energy, see Fig. 5(a). Both optimizations converge within 40 iterations, see Fig. 5(b). Furthermore, when the error mitigated cost function is optimized, the corresponding non-error mitigated cost function, dotted blue curve in Fig. 5(a), reaches lower values than a direct optimization of the non-error mitigated cost function, red dashed curve in Fig. 5(a). This shows that with the error mitigation on, the optimizer finds better values of θ. To further illustrate this, we compute the energy distribution of sampled bitstrings. We compare the distributions of the cost function of each sampled bitstring at the initial and final values of θ, labeled θ_0 and θ*, respectively. The sampling is done with the noisy simulator, Fig. 6(a) and (c), and a noiseless simulator, Fig. 6(b) and (d). Sampling from a noisy |θ*⟩ produces a distribution that is near identical to the one obtained by sampling from a noisy |θ_0⟩, see Fig. 6(a) and (c). However, sampling from a noiseless |θ*⟩ produces a better distribution than sampling from a noiseless |θ_0⟩, see Fig. 6(b) and (d). This suggests that the error mitigation helps find better values of θ even though we cannot see this by sampling bitstrings from noisy QAOA states.
Finally, we repeat these simulations 20 times. In each simulation we train a FFNN and optimize θ. This produces different optimization results due to the randomness of the noise. The optimization is carried out twice, once on E_M and once on E_N. After the optimization we sample 4096 bitstrings from a noiseless simulation of the QAOA circuit with the optimized parameters θ*. We compute the energy distribution of these bitstrings and report the expectation value. This expectation value is -3.78 ± 2.12 and -2.63 ± 2.01 when E_M and E_N is optimized, respectively. These results indicate that error mitigation tends to help the classical optimizer find better QAOA parameter values. However, we observe in seven out of twenty simulations that an optimization of the noisy QAOA cost function E_N produces better parameters, as measured by a noiseless sampling of bitstrings, than an optimization of the error mitigated cost function E_M. In all simulations except one, the error mitigated cost function E_M has a lower energy than the non-error mitigated cost function E_N.

B. Hardware
We now use COBYLA to optimize the parameters γ_1, γ_2, β_1, and β_2 of a depth-two QAOA for RR3 graphs on superconducting qubit hardware. As cost function we minimize the energy E_M = Σ_{(i,j)∈E} ⟨σ^z_i σ^z_j⟩_M computed with error mitigated correlators produced by FFNNs. Before each run, the FFNN is trained, as described in Sec. III, with 3000 training points evaluated with 1024 shots each.
For RR3 graphs with 30 nodes the quantum circuits have a total of 610 ECR gates, 1297 X and √X gates, and 1577 R_Z gates, see Fig. 7. Graphs with 40 nodes have a total of 958 ECR gates. While these circuits are extremely deep and wide, they still contain a significant signal. When running the QAOA optimization on hardware we observe a minimization of E_M for all graphs, see the dark purple curves in Fig. 8(a), (d), (g), and (j). The non-error mitigated cost function E_N (light purple curves) also decreases. In all cases the optimization of γ_i and β_i converges in about 20 to 40 iterations of COBYLA, see Fig. 8(b), (e), (h), and (k). We compare the distribution of the sampled bitstrings obtained from QAOA circuits with the initial points θ_0 to the distribution obtained with the optimized θ*. We see an improvement in the distribution, i.e., a bias towards lower values, for all RR3 graphs, compare the dark blue and light teal curves in Fig. 8(c), (f), (i) and (l). This is consistent with the interpretation that there is a meaningful signal in the corresponding circuits. We report the mean µ of each distribution (vertical lines in Fig. 8) as an approximation ratio α(µ). The optimized parameters produce an α(µ) of 71.6%, 64.0%, 59.6%, and 58.3% for the 10, 20, 30, and 40 node graphs.
We optimize the parameters θ with ECR-based circuits since all parameters are in virtual R_Z gates. This preserves the amplitude and duration of all pulses in the schedule, thus facilitating noise mitigation. Pulse-efficient transpilation moves the parameters from the R_Z gates into the cross-resonance pulses [17, 18]. This shortens the pulse schedule but changes its noise properties. For example, a pulse-efficient transpilation of an R_ZZ(θ)-SWAP pair, as shown in Fig. 3(a), reduces their duration by up to 20%, depending on θ. The shorter schedules produce better bitstrings than the fixed-duration schedules with parameters in R_Z gates. We run the pulse-efficient circuits for the last points θ*. This results in an improved α(µ) of 76.0%, 65.5%, 60.8%, and 58.6% over the same circuit without pulse-efficient transpilation for the 10, 20, 30, and 40 node graphs, respectively, compare the dash-dotted red line to the solid teal line in Fig. 8(c), (f), (i) and (l).
To distinguish the impact of hardware noise on the bitstring distribution from limitations of the depth-two QAOA Ansatz, we compute the noiseless expectation value of the cost Hamiltonian H_C = Σ_{(i,j)∈E} σ^z_i σ^z_j. We evaluate H_C at the last point θ* obtained from the noisy hardware optimization. This computation is made fast, even for a 40 node graph, with quantum circuits based on the light-cone of each correlator σ^z_i σ^z_j. This method is detailed in App. B. The noiseless expectation value is indicated as a dotted line in Fig. 8(c), (f), (i) and (l). The corresponding approximation ratios α(µ) are 82.8%, 74.2%, 72.6%, and 72.4% for the 10, 20, 30, and 40 node graphs, respectively. These values show the potential improvement in the bitstring value distribution if hardware noise could be reduced.

V. DISCUSSION AND CONCLUSION
Many machine learning tools can error mitigate an expectation value. The first contribution of this work is a user-friendly FFNN-based error mitigation strategy with a problem-inspired methodology to generate the training data. Here, the FFNN is trained once before the variational optimization. It is trained to match data acquired at randomly sampled values of β, with all values of γ set to zero and the R_Z gates replaced by barriers. This preserves the circuit structure and makes it easy to simulate classically. We observe that the FFNN performs better on validation data than a linear regression as the circuit size is increased, see App. C. The data from the optimization with error mitigation show that the FFNN reduces the effect of noise on the cost function at unseen values of β and γ. Other data generation approaches are possible and could be investigated in future work, which may also explore other machine learning tools such as random forests, as done in Ref. [54].
Our second contribution is to implement non-planar RR3 graphs on hardware by leveraging the SAT mapping of Matsuo et al. [38] and swap networks [20, 31]. We observe a meaningful signal for a depth-two QAOA with up to 40 nodes. The swap networks with 2, 4, 6, and 7 layers that we implement on 10, 20, 30, and 40 qubits can generate graph densities of up to 40%, 30%, 27%, and 22%, respectively. The corresponding circuits have impressive gate counts. We attribute this success in part to hardware improvements of the Eagle quantum processors, shown in Fig. 9. For example, the T_1 times of ibm_brisbane are more than twice as large as those of ibmq_mumbai, see Fig. 9(a), which was used in the 27 qubit experiment in Ref. [20]. The cumulative distribution functions of the gate errors of ibm_brisbane and ibmq_mumbai are approximately the same, see Fig. 9(c). However, ibm_brisbane has 127 qubits while ibmq_mumbai only has 27 qubits. This allows us to run the 40 qubit RR3 graph on the best line of qubits. For example, the product of the fidelities of the 39 gates on the best line with 40 qubits on ibm_brisbane is 76.5%, see App. A. By contrast, the product of all 28 two-qubit gate fidelities on ibmq_mumbai only reaches 72.8%, despite there being fewer gates.
Circuits with 40 nodes can be simulated classically. In particular, computing ⟨H_C⟩ of a depth-p QAOA for a RR3 graph produces an effective light cone with at most Σ_{k=0}^{p} 2^{k+1} qubits in the circuits, i.e., 14 in our p = 2 case. This allows us to confirm that optimizing the error mitigated correlators produces good variational parameters. As quantum computers increase in size and quality, such a classical verification will no longer be possible. Crucially, our hardware demonstration is much larger than many current experiments, which typically employ up to 20 qubits [55]. Furthermore, reproducing even depth-one QAOA samples is classically intractable [56]. Our work is thus a step towards implementing QAOA on hardware that cannot be classically simulated. Future work must focus on implementing deeper circuits on hardware with more connectivity. Our work also serves as a benchmark to track quantum hardware progress, as done, e.g., with complete graphs [57]. We anticipate that hardware improvements, e.g., increasing T_1 times, and novel architectures based on, e.g., tunable couplers [58, 59], will enable larger simulations.
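The light-cone bound can be checked with a short breadth-first search; a sketch on a small 3-regular example graph (the cube graph, our own test case, not one of the instances from the experiments):

```python
from collections import deque

def light_cone(adj, edge, p):
    """Nodes within graph distance p of either endpoint of `edge`.  Only
    gates acting on these qubits influence <sigma^z_i sigma^z_j> in a
    depth-p QAOA; for a 3-regular graph the size is bounded by
    sum_{k=0}^{p} 2^{k+1}, i.e., 14 for p = 2."""
    dist = {edge[0]: 0, edge[1]: 0}
    queue = deque(edge)
    while queue:
        v = queue.popleft()
        if dist[v] < p:
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
    return set(dist)

# Cube graph Q3: vertices are 3-bit integers, edges flip one bit (3-regular).
adj = {v: [v ^ 1, v ^ 2, v ^ 4] for v in range(8)}
cone = light_cone(adj, (0, 1), p=2)
bound = sum(2 ** (k + 1) for k in range(0, 3))  # 14 for p = 2
```

Simulating only the light-cone qubits of each correlator is what makes the noiseless verification of ⟨H_C⟩ cheap even for the 40-node graph.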
The depth-one QAOA results, shown in Sec. II, exhibit the parameter concentration already observed in the literature [60-64]. This may be important to quickly generate good yet sub-optimal solutions to combinatorial optimization problems without having to optimize the variational parameters for each problem instance [20].
The objective of QAOA is to sample good or even optimal solutions from a quantum state |θ*⟩ that minimizes the energy of H_C, i.e., to find x_opt = arg min_x H_C(x). The state is obtained by minimizing the expectation value of H_C. The error mitigation method we present helps find good parameters γ and β. However, quantum approximate optimization needs tools that error mitigate samples. Noiseless simulations of ⟨H_C⟩ computed with the optimal parameters show that a significant gain is obtainable if samples could be error mitigated. Recently, Barron et al. showed that a noise-dependent sampling overhead produces good solution samples [65]. This makes sense in the context of optimization as long as the total number of samples drawn is less than the 2^n cost of a brute-force search and ideally less than the sampling complexity of the best classical benchmark. Hardware improvements in, e.g., the fidelity of a layer of gates [66], and noise-aware transpilers [67, 68] will reduce this sampling overhead. Furthermore, proper QAOA benchmarks must compare to state-of-the-art solvers [34], such as Gurobi and CPLEX, and randomized rounding algorithms [69]. For example, for RR3 graphs there is an approximation algorithm that achieves an approximation ratio of 0.9326 [70]. Such benchmarking is an important task in itself, which our methods enable on hardware.
For variational algorithms, like the variational eigensolver applied in a chemistry setting [71, 72], error mitigating expectation values is often sufficient, e.g., to compute the energy spectrum of molecules [73, 74]. The FFNN-based error mitigation method we present is directly transferable to such settings, which increases its applicability. Furthermore, the transpiler methodology we leverage works with any non-hardware-native block of commuting two-qubit gates. It thus applies to circuits other than QAOA, such as graph states [75], and algorithms that implement e^{-iH_C t}, including Ising simulations [19].

ACKNOWLEDGMENTS

IBM, the IBM logo, and ibm.com are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. The current list of IBM trademarks is available at https://www.ibm.com/legal/copytrade. S.H.S. acknowledges support from the IBM Ph.D. fellowship 2022 in quantum computing. The authors also thank M. Serbyn, R. Kueng, R. A. Medina, and S. Woerner for fruitful discussions. The code for these results is available at https://github.com/eggerdj/large_scale_qaoa.

Appendix A: RR3 graph transpilation
Random regular graphs are sparse. When their corresponding QAOA circuit is transpiled to hardware with predetermined swap layers, certain edges may require a large number of swap layers. By carefully choosing the initial mapping between decision variables and physical qubits we reduce the number of swap layers needed. A SAT-based approach to this "initial mapping" problem is proposed by Matsuo et al. [38]. Here, the initial mapping problem is formulated as a SAT problem that is satisfiable if e^{−iγH_C} can be routed to hardware with ℓ swap layers. A binary search over ℓ finds the initial mapping that minimizes the number of layers. We label the minimum number of swap layers ℓ*.
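The binary search over ℓ can be sketched as follows. The `routable` predicate is a hypothetical placeholder for the SAT feasibility check of Matsuo et al. [38], assumed monotone in ℓ (if ℓ layers suffice, so do ℓ + 1):

```python
def min_swap_layers(routable, lo=0, hi=38):
    """Binary search for the smallest l such that routable(l) is True.
    Assumes monotonicity: if l swap layers suffice, so do l + 1."""
    while lo < hi:
        mid = (lo + hi) // 2
        if routable(mid):
            hi = mid  # mid layers suffice; try fewer
        else:
            lo = mid + 1  # mid layers are infeasible; need more
    return lo

# Toy stand-in for the SAT feasibility check: suppose 7 layers are needed.
l_star = min_swap_layers(lambda l: l >= 7)
```

Each call to the predicate solves one SAT instance, so only O(log n) SAT problems are needed to find ℓ*.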

Graph generation
We generate 100 RR3 graphs with n nodes for each n ∈ {10, 20, 30, 40}. Each graph is mapped to a line of n qubits with the SAT approach. The distribution of the number of swap layers at the different sizes n is shown in Tab. I. With the SAT mapping, there are graph instances with 10, 20, 30, and 40 nodes that can be implemented with 2, 4, 6, and 7 swap layers, respectively. This is a large reduction compared to a trivial mapping, which typically requires n − 2 swap layers [38]. The experiments in the main text are done on the graphs that require the smallest number of swap layers. For instance, out of the 100 random RR3 graphs with 40 nodes, three could be mapped to a line of qubits with 7 swap layers.
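As an illustration of the graph generation step, random regular graphs can be sampled with the configuration (pairing) model. This plain-Python sketch is a stand-in for standard routines such as networkx's `random_regular_graph`, which we do not reproduce here:

```python
import random

def random_regular_graph(n, d=3, seed=0):
    """Sample a d-regular simple graph on n nodes with the configuration
    (pairing) model, rejecting pairings with self-loops or multi-edges."""
    assert (n * d) % 2 == 0, "n * d must be even"
    rng = random.Random(seed)
    while True:
        # Give each node d "stubs" and pair them up at random.
        stubs = [v for v in range(n) for _ in range(d)]
        rng.shuffle(stubs)
        pairs = [tuple(sorted(stubs[i:i + 2])) for i in range(0, n * d, 2)]
        edges = set(pairs)
        # Reject self-loops and duplicate edges; retry otherwise.
        if len(edges) == n * d // 2 and all(u != v for u, v in edges):
            return sorted(edges)

g = random_regular_graph(10)
degree = [0] * 10
for u, v in g:
    degree[u] += 1
    degree[v] += 1
```

Rejection sampling terminates quickly for the sparse d = 3 case considered here.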

Qubit selection
ibm brisbane has 127 qubits, i.e., 87 more than the largest graph we study. We therefore select the best line of qubits on which to execute the quantum circuits. For each pair of nodes i and j in the backend's coupling map, we enumerate all paths of length |V| connecting them. Next, we compute the path fidelity of each path p_k as ∏_{(i,j)∈p_k} (1 − E_ECR,i,j) and select the best one. Here, E_ECR,i,j is the error of the ECR gate between qubits i and j. On ibm brisbane there are 1336, 15814, 125918, and 754462 lines of 10, 20, 30, and 40 qubits, respectively. The best measured respective path fidelities are 95.9%, 89.5%, 82.8%, and 76.5%.

For low-depth QAOA and sparse graphs, such as RR3, we can efficiently compute the expectation value ⟨H_C⟩ = Σ_{(i,j)∈E} ⟨σ^z_i σ^z_j⟩ by considering the light-cone of each correlator σ^z_i σ^z_j. Indeed, for depth-one QAOA each correlator σ^z_i σ^z_j is only affected by the gates applied to nodes in the direct neighborhood of i and j, i.e., nodes at distance one, see Fig. 11. For depth-two QAOA, we must consider all nodes that are at most a distance of two away from i and j. Therefore, to compute ⟨H_C⟩ for depth-two QAOA we create |E| circuits, each with at most 14 nodes. In the circuit corresponding to ⟨σ^z_i σ^z_j⟩ we only measure the qubits that map to nodes i and j, see Fig. 11(b).
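The path selection can be sketched as follows, with a hypothetical five-qubit coupling map and made-up ECR error rates in place of the calibration data of ibm brisbane:

```python
# Hypothetical toy coupling map with ECR error rates (not real device data).
ecr_error = {(0, 1): 0.010, (1, 2): 0.008, (2, 3): 0.015,
             (1, 4): 0.005, (4, 3): 0.007}

adj = {}
for u, v in ecr_error:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

def err(u, v):
    """ECR error of the edge (u, v), regardless of orientation."""
    return ecr_error.get((u, v), ecr_error.get((v, u)))

def paths(length):
    """Enumerate all simple paths visiting exactly `length` qubits."""
    found = []
    def dfs(path):
        if len(path) == length:
            found.append(list(path))
            return
        for w in adj[path[-1]]:
            if w not in path:
                path.append(w)
                dfs(path)
                path.pop()
    for start in adj:
        dfs([start])
    return found

def fidelity(path):
    """Path fidelity: product of (1 - ECR error) over the path's edges."""
    f = 1.0
    for u, v in zip(path, path[1:]):
        f *= 1.0 - err(u, v)
    return f

best = max(paths(4), key=fidelity)  # best 4-qubit line on the toy map
```

On real heavy-hex devices the path counts quoted above make exhaustive enumeration feasible, so this brute-force search suffices.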

Appendix C: Model comparison
Here, we compare the FFNN to a linear regression. The comparison is done on the hardware-measured data presented in Fig. 8 of the main text. We resample the 3000 training data points ten times to generate ten training data sets made of 80% of the data, i.e., 2400 sets of |V|(|V| + 1)/2 expectation values, and ten validation sets made of the remaining 20% of the data. We train both a FFNN and a linear model on the ten training data sets. Here, we employ the MLPRegressor and the LinearRegression from sklearn. Next, we compute the mean squared error (MSE) between the 3|V|/2 predicted ZZ correlators and their ideal values. This results in a distribution of MSEs across the 600 validation data points for each of the ten validation data sets. We observe an increase in the MSE as the graph size increases. Furthermore, the MSE of the linear model increases faster than the MSE of the FFNN as the graph size increases, as exemplified by comparing Figs. 12(a) and (b) for one of the ten validation sets. Next, we compute the average over all MSEs at each graph size and the associated standard deviation of this mean, see Fig. 12(c), and their difference in Fig. 12(d). As the number of nodes in the graph increases, the error of the FFNN becomes significantly lower than the error of the linear model. This test suggests that, for the particular circuits we employed, the FFNN outperforms the linear model as the size of the quantum circuit increases. This may be due to effects such as cross-talk and unitary gate errors, e.g., over- and under-rotations, that are not captured by a linear model like the depolarizing channel [26].
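The resampling protocol can be sketched as follows. For brevity we use synthetic data and an ordinary least-squares fit in place of sklearn's LinearRegression; an MLPRegressor would be evaluated with exactly the same loop. All sizes below are toy values, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_in, n_out = 300, 10, 6        # toy sizes, not the paper's
X = rng.uniform(-1, 1, (n_samples, n_in))  # stand-in for noisy correlators
Y = np.tanh(1.4 * X[:, :n_out])            # synthetic "ideal" targets

per_point_mses = []
for rep in range(10):                      # ten 80/20 resamples
    idx = rng.permutation(n_samples)
    train, val = idx[:240], idx[240:]
    # Linear model with intercept, fit by ordinary least squares.
    A = np.hstack([X[train], np.ones((len(train), 1))])
    W, *_ = np.linalg.lstsq(A, Y[train], rcond=None)
    pred = np.hstack([X[val], np.ones((len(val), 1))]) @ W
    # MSE of each validation point over its predicted correlators.
    per_point_mses.append(np.mean((pred - Y[val]) ** 2, axis=1))
mean_mse = float(np.mean(np.concatenate(per_point_mses)))
```

Collecting the per-point MSEs, rather than a single score per split, is what yields the distributions shown in Figs. 12(a) and (b).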

Appendix D: Additional hardware runs
Here, we present depth-two QAOA data acquired on ibm nazca and ibm kyiv in addition to the data acquired on ibm brisbane. The data acquired on ibm nazca were gathered under the same settings as the data on ibm brisbane. The data acquired on ibm kyiv were produced with a smaller number of training circuits, i.e., 300 instead of 3000, and the hidden layer of the FFNN had 100 nodes for each graph size. By contrast, the FFNNs trained for ibm brisbane and ibm nazca had a number of hidden neurons equal to the average of the number of input and output neurons.

Figure 1 .
Figure 1. (a) and (b) are the depth-one energy landscapes, measured on ibm brisbane and an ideal simulator, respectively, of a 30-node RR3 graph. The circuit (not shown) has 305 ECR gates, 639 X and √X gates, and 804 virtual RZ gates. (c) and (d) are the depth-one energy landscapes, measured on ibm brisbane and an ideal simulator, respectively, of a 40-node RR3 graph. The white stars indicate a minimum of the noiseless simulations. They reveal a small shift in the corresponding minimum of the hardware-measured data. (e) The quantum circuit of the depth-one QAOA of a 40-node RR3 graph transpiled to a line of qubits with seven layers of SWAP gates and a SAT-based initial mapping. The circuit has 479 ECR gates (dark blue), 1021 X and √X gates (blue), and 1275 virtual RZ gates (light green).


Figure 2 .
Figure 2. Schematic of the error mitigation with a trained FFNN. The QAOA circuits of graph G = (V, E) are run and the noisy expectation values ⟨σ^z_i⟩_N ∀ i ∈ V and ⟨σ^z_i σ^z_j⟩_N ∀ i, j ∈ V are computed from the sampled counts. These expectation values are the input to a FFNN which outputs the noise-mitigated correlators ⟨σ^z_i σ^z_j⟩_M ∀ (i, j) ∈ E. The noise-mitigated cost function to optimize is Σ_{(i,j)∈E} ⟨σ^z_i σ^z_j⟩_M.

Figure 3 .
Figure 3. (a) Part of a QAOA cost operator. The dotted and dashed gates correspond to an RZZ gate with and without a SWAP gate, transpiled to CNOT gates. (b) The RZ gates are replaced by barriers to prevent CNOT gate cancellation. This preserves the noise structure of the circuit. (c) Training data generation. M random input cuts are created by randomly applying X gates to the qubits. These states are propagated through p alternating networks of CNOT gates, corresponding to simplified e^{−iγH_C} operators, and mixer layers e^{−iβH_B}. These circuits are run on hardware to create the input data X and efficiently simulated classically to generate the ideal output data Y.
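Because a network of CNOT and X gates maps computational-basis states to computational-basis states, the CNOT-network part of the training circuits can be simulated classically by simple bit tracking. The sketch below (a toy network with hypothetical gates, not our circuits) illustrates this; it ignores the mixer layers, whose classical simulation requires more general techniques:

```python
import random

def apply_cnots(bits, cnots):
    """Propagate a computational-basis state through a CNOT network:
    each CNOT flips the target bit when the control bit is 1."""
    bits = list(bits)
    for ctrl, targ in cnots:
        bits[targ] ^= bits[ctrl]
    return bits

def zz(bits, i, j):
    """Ideal ZZ correlator of a basis state: z_i z_j = (-1)^(b_i XOR b_j)."""
    return 1 - 2 * (bits[i] ^ bits[j])

rng = random.Random(42)
n, M = 6, 4                                       # toy sizes
cnots = [(0, 1), (2, 3), (4, 5), (1, 2), (3, 4)]  # toy CNOT network
inputs = [[rng.randint(0, 1) for _ in range(n)] for _ in range(M)]
ideal = [apply_cnots(b, cnots) for b in inputs]   # ideal output data Y
```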

Figure 4 .
Figure 4. Training of a FFNN on simulated data. (a) Loss function (dark blue) and R² score (light green) as a function of the training iteration. The inset is the ten-node RR3 graph with node colors indicating the MaxCut. (b) Correlators corresponding to edges in the inset graph in (a) before error mitigation versus their ideal values. If the device were noiseless, all the correlators would lie on the dashed line. (c) The correlators of (b) after error mitigation by the FFNN. If the error mitigation were perfect, all the correlators would lie on the dashed line.

Figure 5 .
Figure 5. Optimization of a noisy QAOA. (a) The solid and dotted lines show the error-mitigated cost function and the non-error-mitigated cost function, respectively, during the optimization of the former. The dashed line shows the non-error-mitigated cost function as it is being optimized. (b) QAOA parameters for the error-mitigated optimization (solid lines) and the non-error-mitigated optimization (dashed lines).

Figure 6 .
Figure 6. Distribution of the cost function of each individual sampled bitstring for the optimization in Fig. 5. (a) and (b) correspond to the optimization with the error-mitigated cost function (solid lines in Fig. 5). (c) and (d) correspond to the optimization with the non-error-mitigated cost function (dashed lines in Fig. 5).

Figure 8 .
Figure 8. Depth-two QAOA data acquired on ibm brisbane. The graphs are shown as insets with node colors indicating the MaxCut found with CPLEX [53]. (a), (d), (g), and (j) show the error-mitigated cost function E_M obtained from the FFNNs at each iteration of COBYLA. The non-mitigated cost function E_N is also shown as E_M is optimized. E_N and E_M thus represent the same quantities as the blue dotted and blue solid lines in Fig. 5(a), respectively. (b), (e), (h), and (k) show the QAOA parameters during the optimization. (c), (f), (i), and (l) show the distribution of the energy of the sampled bitstrings. The dark blue and light teal lines correspond to bitstrings sampled from the QAOA circuits with the initial and final parameters θ_0 and θ*, respectively. The red lines correspond to sampling from a pulse-efficient circuit. The solid black lines in (c) and (f) indicate the energy of the MaxCut. In (i) and (l) this energy lies outside of the x-axis range; we indicate it as a triangle with the energy as a number. The black dotted vertical lines indicate the noiseless expectation value obtained with θ*.

Figure 9 .
Figure 9. Properties of the Eagle quantum processors used in this paper, ibm brisbane and ibm kyiv, compared to the Falcon processor used in Ref. [20], i.e., ibmq mumbai. The data are presented as cumulative distribution functions (CDF).

Figure 10 .
Figure 10. Coupling map of ibm brisbane showing the qubit connectivity. The black qubits form the 40-qubit line with the best path fidelity as measured by the quality of the ECR gates.

Figure 11 .
Figure 11. (a) Sub-graph of a RR3 graph with all nodes up to a distance of two from the nodes i and j for which we want to compute the correlator ⟨σ^z_i σ^z_j⟩. (b) Quantum circuit for the graph in (a). The dumbbells represent parameterized RZZ rotations and the dark and light grey boxes are Hadamard and RX gates, respectively. The wire colors in (b) correspond to the colors of the nodes in the graph in (a).

Figure 12 .
Figure 12. Comparison of a FFNN and a linear regression. (a) and (b) show the distribution of the MSE of a validation data set for the 10- and 40-node graphs, respectively. (c) MSE averaged over all ten validation data sets as a function of graph size. (d) Difference between the MSE of the FFNN and the linear regression.

Figure 13 .
Figure 13. Run on ibm kyiv. The training data comprised 300 circuits and the FFNN had one hidden layer with 100 nodes in all instances. The underlying graphs and other displayed quantities are identical to those in Fig. 8 of the main text.

Figure 14 .
Figure 14. Run on ibm nazca. The training data comprised 3000 circuits and the FFNN had one hidden layer with a number of nodes equal to the average of the number of input and output neurons. The underlying graphs and other displayed quantities are identical to those in Fig. 8 of the main text.

Table I .
Number of RR3 graphs that required ℓ* swap layers after an initial mapping found by solving SAT problems.