Multicore Quantum Computing

Any architecture for practical quantum computing must be scalable. An attractive approach is to create multiple cores, computing regions of fixed size that are well-spaced but interlinked with communication channels. This exploded architecture can relax the demands associated with a single monolithic device: the complexity of control, cooling and power infrastructure, as well as the difficulties of cross-talk suppression and near-perfect component yield. Here we explore interlinked multicore architectures through analytic and numerical modelling. While elements of our analysis are relevant to diverse platforms, our focus is on semiconductor electron spin systems in which numerous cores may exist on a single chip. We model shuttling and microwave-based interlinks and estimate the achievable fidelities, finding values that are encouraging but markedly inferior to intra-core operations. We therefore introduce optimised entanglement purification to enable high-fidelity communication, finding that $99.5\%$ is a very realistic goal. We then assess the prospects for quantum advantage using such devices in the NISQ era and beyond: we simulate recently proposed exponentially powerful error mitigation schemes in the multicore environment and conclude that these techniques impressively suppress imperfections in both the inter- and intra-core operations.


I. INTRODUCTION AND OVERVIEW
It is widely believed that the era of practical quantum computing is now dawning. With an increasing tempo, new records for qubit count are set and then surpassed [1][2][3][4][5][6][7]. However, as any given platform scales, it is likely that there will be critical device sizes beyond which further scaling proves difficult. For example, it is generally expected that linear ion traps will be difficult to scale beyond the 50-100 ion scale due to decrystallisation, spectral crowding, etc. [8][9][10]. Moreover, as monolithic (single-chip and single-core) solid-state arrays scale, they face increasing challenges in providing power, control or cooling infrastructure to the qubit lattice. Furthermore, cross-talk and frequency crowding become increasingly problematic, and there is a growing probability that a crucial component somewhere within the array will have a fabrication error (due to finite yield). It is to be expected that it will prove possible to create a plurality of distinct, moderate-scale devices (a multicore system) long before monolithic structures become truly scalable.
The multicore approach may be a key evolutionary stage even if the long-term goal is to realise a massive monolithic array of qubits. The progression is natural: a single core is realised (arguably, the 'supremacy' class devices already reported are of this kind [1,4,6]); multiple cores are realised without quantum interlinks (achieving simple parallelism that is already valuable for sampling-based tasks including variational quantum algorithms [13]); then multiple cores are realised with interlinks as discussed in this paper. Subsequently, as cores become larger and interlinks denser, the progression reaches large-scale regular lattice patterns that constitute the fabric for topological error correcting codes [14][15][16].
* hamza.jnane@materials.ox.ac.uk † balint.koczor@materials.ox.ac.uk
We begin by considering the hardware-level realisation of the interlinks required for the multicore concept. Specifically, we consider single-electron qubits in semiconductor (e.g. silicon) devices and study two promising solutions: electron shuttling and microwave-based remote interactions, the latter operated in the fast (resonant) regime. While other solutions have been suggested, notably optical links [17,18], we find that these two linking modes have the potential to realise strong entanglement over relevant length scales when we assume that components perform at current or near-future levels. In particular, whereas shuttling is well suited for linking multiple cores on a single chip, cavities are potentially useful both for intra-chip and inter-chip linkage. It is worth noting that other approaches to the multicore concept on different platforms have been studied [19][20][21].
Although we consider our models' predictions of interlink performance to be encouraging, the fidelities that we obtain are well below those expected of short-range, intra-core operations. One might regard this as a fundamental constraint given the lessons of classical computing: a very mature technology in which links are inevitably inferior at each scale of distance. Intra-CPU operations can be an order of magnitude faster than calls to memory subsystems, which in turn are an order of magnitude faster than fibre-based links between adjacent 'boxes' in a cluster. It is therefore reasonable to assume that longer-range links may always be inferior to local links, even as quantum computing matures. Our modelling suggests this is true of the fidelity, rather than the speed, of inter- versus intra-core links. This was anticipated as early as 2003 by Dür and Briegel [22], who proposed the use of efficient entanglement purification to upgrade the channel fidelity: low-quality Bell pairs are distributed via the channel and then combined using local operations to yield high-fidelity pairs; each such pair can be consumed to enact a desired two-qubit remote gate. Schemes of this kind have been much discussed [22][23][24][25][26] and indeed have been experimentally realised [27]. We adopt an approach of this kind, purifying Bell pairs to create the resource needed for high-fidelity remote operations using an architecture shown schematically in Fig. 1. By suitably tailoring purification protocols, we find that fidelities (with respect to the closest of the four canonical Bell states) of 99.5% can be achieved after a few successive uses of the interlink. Higher values are achievable in principle, but, as anticipated in [22], the noise in intra-core operations becomes the limitation.
FIG. 1. Such an architecture is well-suited for implementing the ESD/VD error mitigation technique [11,12]: multiple quantum cores perform the same quantum computation on N qubits and these copies are used to verify each other via a derangement operation. This exponentially reduces the impact of imperfections in both the local operations and the Bell pairs. The derangement operation is implemented by teleporting individual qubits into a buffer qubit (magenta dot) in one of the cores, thus consuming N Bell pairs overall. In this work we consider distributing Bell pairs via a cavity (b) and by shuttling (c) (not to scale). (b) Schematic of two spin registers connected via a superconducting cavity. By hybridising the spin and charge degrees of freedom of electrons in DQDs, we can couple distant qubits through microwave photons (red squiggles) in a cavity. Charge measurements are used to speed up the process. A successful measurement outcome will lead to a raw Bell pair, whose quality can be improved through distillation. (c) Schematic of two spin registers connected via a quantum dot shuttling channel. A Bell pair consisting of two electrons (small orange dots) is created locally via high-fidelity local gates; one of the electrons is then shuttled through the few-micron-long chain on a 100 ns timescale. This chain does not use micromagnets, since we need not control the spin degree of freedom during shuttling.
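To illustrate how a single purified Bell pair can be consumed to enact a remote two-qubit gate, the following NumPy sketch simulates the standard one-Bell-pair remote CNOT (a textbook gate-teleportation circuit, not necessarily the exact circuit the authors use). It assumes an ideal $|\phi^+\rangle$ pair and noise-free local operations; the qubit ordering and all function names are hypothetical. Each of the four measurement branches, after the classically conditioned corrections, should leave Alice's and Bob's data qubits in exactly the state a local CNOT would produce.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)  # Hadamard
PROJ = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]      # |0><0|, |1><1|
N_QUBITS = 4  # order: c (Alice's data), a (Alice's half of pair), b (Bob's half), t (Bob's data)


def op(single, q):
    """Embed a single-qubit operator acting on qubit q into the 4-qubit space."""
    mats = [I2] * N_QUBITS
    mats[q] = single
    return reduce(np.kron, mats)


def cnot(ctrl, targ):
    return op(PROJ[0], ctrl) + op(PROJ[1], ctrl) @ op(X, targ)


def remote_cnot(psi_c, psi_t):
    """Return [(probability, normalised 2-qubit state of c,t)] for each measurement branch."""
    bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)  # |phi+> shared by a and b
    state = reduce(np.kron, [psi_c, bell, psi_t])
    state = cnot(0, 1) @ state  # Alice: CNOT from c onto her half a
    branches = []
    for m1 in (0, 1):  # Alice measures a in the Z basis and sends m1 to Bob
        s1 = op(PROJ[m1], 1) @ state
        s1 = np.linalg.matrix_power(op(X, 2), m1) @ s1  # Bob applies X^m1 to b
        s1 = cnot(2, 3) @ s1  # Bob: CNOT from b onto t
        s1 = op(H, 2) @ s1    # Hadamard, so the next Z measurement is an X-basis one
        for m2 in (0, 1):  # Bob measures b and sends m2 to Alice
            s2 = op(PROJ[m2], 2) @ s1
            s2 = np.linalg.matrix_power(op(Z, 0), m2) @ s2  # Alice applies Z^m2 to c
            prob = np.vdot(s2, s2).real
            ct = s2.reshape(2, 2, 2, 2)[:, m1, m2, :].reshape(4) / np.sqrt(prob)
            branches.append((prob, ct))
    return branches


rng = np.random.default_rng(0)


def random_qubit():
    v = rng.normal(size=2) + 1j * rng.normal(size=2)
    return v / np.linalg.norm(v)


psi_c, psi_t = random_qubit(), random_qubit()
CNOT_CT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)
target = CNOT_CT @ np.kron(psi_c, psi_t)
for prob, ct in remote_cnot(psi_c, psi_t):
    print(round(prob, 3), round(abs(np.vdot(target, ct)) ** 2, 6))  # 0.25 and 1.0 per branch
```

The pair is consumed (both halves are measured), which is why purified pairs must be generated continuously for repeated remote gates.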
Having thus estimated the achievable speed-fidelity curve for the interlinks, we conclude by examining the potential of this multicore architecture for achieving quantum advantage. While we recognise its value for long-term, fault-tolerant quantum computing, by distributing an error correcting code over the cores, we focus on a NISQ-era application: a recently discovered error mitigation technique that allows one to suppress errors as an exponentially decreasing function of hardware size. The idea was introduced in Ref. [11] as 'Error Suppression by Derangements' (ESD) and in Ref. [12] as 'Virtual Distillation' (VD). In the present paper we model the performance of two interlinked cores and present it in the context of the in-principle performance of three- and four-core systems. Our conclusion is that the approach should be profoundly enabling for NISQ-era tasks and for the prospects of real quantum advantage, in the sense of meaningful tasks that are infeasible on classical systems.
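The exponential suppression behind ESD/VD can be illustrated with a minimal numerical sketch. Assuming the derangement circuit ideally estimates $\mathrm{tr}(\rho^n O)/\mathrm{tr}(\rho^n)$ for $n$ copies, the code below evaluates this ratio directly for a toy single-qubit state with an orthogonal error component. It ignores noise in the derangement circuit and in the Bell pairs, and the state, observable and error rate are hypothetical.

```python
import numpy as np


def mitigated_expectation(rho, obs, n_copies):
    """Ideal ESD/VD estimator tr(rho^n O) / tr(rho^n) for n copies of rho."""
    rho_n = np.linalg.matrix_power(rho, n_copies)
    return float(np.real(np.trace(rho_n @ obs) / np.trace(rho_n)))


p_err = 0.1                        # hypothetical error probability
rho = np.diag([1 - p_err, p_err])  # ideal |0> mixed with an orthogonal error state |1>
Z = np.diag([1.0, -1.0])           # observable; the ideal value <0|Z|0> is 1
for n in (1, 2, 3):
    est = mitigated_expectation(rho, Z, n)
    print(n, est, 1 - est)  # error = 2 p^n / ((1-p)^n + p^n), roughly 2 (p/(1-p))^n
```

For this toy state the bias falls from 0.2 ($n = 1$) to about 0.024 ($n = 2$) and 0.003 ($n = 3$), showing why adding cores (copies) is so effective.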

II. DISTRIBUTING BELL PAIRS USING CAVITIES
In this section we first consider the possibility of generating Bell pairs using cavity-mediated interactions. We explicitly model a specific setup assuming parameters that are achievable with near-term hardware and conclude that fidelities on the order of 95% are realistic. Our numerical models also confirm that optimised purification protocols can increase this fidelity to levels that are comparable to intra-core operations.

A. Overview
In the context of superconducting qubits, the coupling of distant qubits can be achieved by mediating quantum information through a superconducting cavity within the circuit quantum electrodynamics (cQED) framework [28][29][30][31]. Using this method, one can generate a coherent coupling between the resonator's modes and qubits separated by a few millimetres. For simplicity and for consistency with the shuttling alternative considered in Section III, we restrict ourselves to single-spin qubits, while recognising that singlet-triplet qubits may have advantages in cavity coupling [32].
Despite their long coherence times and the possibility of manufacturing them with current industry-standard processes [33][34][35][36], applying the above procedure to single-electron silicon spin qubits is challenging. The reason is that electron spins have a small magnetic dipole, so the spin-photon magnetic dipole coupling rate is smaller than the spin's decoherence rate. Ref. [37] gives a comprehensive review of cQED with single-electron spin qubits.
In recent years numerous experimental works have explored the possibility of coherently coupling a semiconductor spin qubit with microwave photons [38][39][40]. By placing a cobalt micromagnet above a single electron confined in a double quantum dot (DQD), Refs. [38,39] leveraged the following two-step process to effectively couple electron spins with a cavity. First, the large magnetic field gradient generated by the micromagnet induces an effective spin-orbit coupling, which leads to a hybridisation of the spin and charge degrees of freedom. Then, the dipole-dipole interaction between the cavity and the charge indirectly couples the spin to the cavity, resulting in a significant coupling rate (on the order of several MHz).
Despite the fact that hybrid spin qubits are more prone to charge noise, Refs. [38,39] have achieved the strong coupling regime, i.e., the spin-photon coupling rate is greater than both the spin decoherence rate and the cavity loss rate. Recently, Borjans et al. [41] demonstrated that one can coherently couple two spin qubits with a resonator by extending the architecture in [38], and Harvey-Collard et al. [42] reported the coupling of distant spins through virtual photons. The latter represents a significant step towards the experimental realisation of cavity-mediated two-qubit gates.
The success of these experiments has stimulated further theoretical work towards implementing long-range two-qubit gates. While Ref. [43] showed how an iSWAP gate can be implemented with fidelities above 90% by optimising the spin-charge hybridisation with current hardware, Warren et al. [44] have reported $\sqrt{\mathrm{iSWAP}}$ gates with 99.5% fidelity by exploiting the low-energy dynamics of the system. More recently, the same authors proposed a photon-mediated cross-resonance gate that is more robust to charge noise [45]. It is important to note that all these works assume a dispersive parameter regime: this regime may reduce cavity loss, but it results in relatively slow gates.
FIG. 2. Sketch of a single electron confined in a DQD. The DQD is defined by gates on top of a material heterostructure (for instance, a Si/SiGe heterostructure). Cobalt micromagnets are placed above the DQD to generate the transverse magnetic field $b_x$, which hybridises the charge and spin degrees of freedom. By lifting the spin degeneracy, the parallel magnetic field $(0, 0, B)^T$ allows us to define a qubit on the electron's spin.
In the present work we limit ourselves to generating specific Bell states using cavity mediated interactions, in contrast to long-range general unitary quantum gates in the dispersive regime which were the focus of the aforementioned prior works [43,44]. This allows us to work in a resonant parameter regime that generates entanglement on a significantly faster timescale at the cost of introducing small (coherent) errors. We propose to apply powerful purification techniques to reduce such errors in our raw Bell pairs thereby non-deterministically generating high-fidelity Bell pairs. After purification, these Bell pairs can be used as a resource to perform arbitrary two-qubit gates via gate teleportation. As we explain below in Section V, more generally these Bell pairs can be used as valuable resources for a range of specific practical tasks beyond gate teleportation.

B. The model
By confining a single electron in a DQD, we obtain a spin qubit encoded in the spin of the electron, with basis states $|{\uparrow}\rangle$, $|{\downarrow}\rangle$, and a charge qubit encoded in the position (orbital) of the electron, with basis states $|L\rangle$, $|R\rangle$. The spin and charge qubits are coupled by a magnetic field gradient across the two dots. Note that in our modelling we consider neither higher excited states of the charge degree of freedom nor the possible coupling with the valley degree of freedom [46]. To first order these would not change our results in the present section, because other imperfections are more significant, as explained below (whereas we will need to take the valley coupling into account for the shuttling-based alternative). As such, the full Hamiltonian of this system can be written as
$$H_{\mathrm{DQD}} = \frac{1}{2}\left(\epsilon\,\tau_z + 2t_c\,\tau_x\right) + \frac{1}{2}B\,s_z + \frac{1}{2}\left(\mathbf{b}\cdot\mathbf{s}\right)\tau_z. \qquad (1)$$
Above and throughout the paper we set $\hbar = 1$ and express energy in angular frequency units unless stated otherwise. Here $\tau_i$ and $s_i$ are Pauli matrices $\sigma_i$ with $i \in \{x, y, z\}$ acting on the charge and spin qubits, respectively, and $\mathbf{s} := (s_x, s_y, s_z)^T$. The first term above represents the Hamiltonian of a charge qubit, with $\epsilon$ and $t_c$ being the detuning and tunnel coupling between the two dots. The second term represents the Zeeman splitting $B$ of the spin qubit due to a magnetic field that defines the spin $z$ direction. The last term represents the coupling between the spin and charge qubits due to the difference $\mathbf{b} := (b_x, b_y, b_z)^T$ in magnetic fields between the two dots. In this section, we assume the main gradient of the magnetic field is along the $x$-direction, i.e. $\mathbf{b} = (b_x, 0, 0)^T$. This gradient can be generated using micromagnets as illustrated in Fig. 2. While using micromagnets when coupling spins to cavities may present technical challenges [41,42], we do not expect this to be a fundamental bottleneck for the cavity-based approach, given that only two micromagnets per pair of connected cores need to be implemented.
In our modelling we consider a resonator described by the usual quantum-harmonic-oscillator Hamiltonian $H_r = \omega_r \hat{a}^\dagger \hat{a}$, where $\omega_r$ is the resonator frequency and $\hat{a}^\dagger$, $\hat{a}$ are the corresponding (bosonic) creation and annihilation operators. Furthermore, we model the coupling between the resonator and the charge qubit via the Hamiltonian
$$H_{c\text{-}r} = g_c\,\tau_z\left(\hat{a} + \hat{a}^\dagger\right), \qquad (2)$$
where $g_c$ is the charge-photon dipole coupling factor. Hence, via interactions mediated through the charge qubit, we can couple two spin qubits in two distant DQDs via the resonator. We write the full Hamiltonian for such a system as
$$H = \omega_r \hat{a}^\dagger \hat{a} + \sum_{i\in\{1,2\}} \left[ H_{\mathrm{DQD},i} + g_{c,i}\,\tau_{z,i}\left(\hat{a} + \hat{a}^\dagger\right) \right], \qquad (3)$$
where $i \in \{1, 2\}$ labels the two distant DQDs, each containing a spin qubit and a charge qubit. The resonant condition corresponds to the Zeeman splittings of the two DQDs matching the resonator's frequency:
$$B_1 = B_2 = \omega_r.$$
We also note that in our numerical simulations we model up to six photons per resonator mode; this guarantees a very good approximation of the exact dynamics.
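To make the model concrete, the following NumPy sketch assembles the two-DQD-plus-cavity Hamiltonian and confirms the Hilbert-space dimension $2^4 \times N_{\mathrm{cav}}$ used in the simulations. It assumes the per-DQD form $\frac{1}{2}(\epsilon\tau_z + 2t_c\tau_x) + \frac{1}{2}Bs_z + \frac{1}{2}b_x s_x\tau_z$ and a charge-photon coupling $g_c\tau_z(\hat a + \hat a^\dagger)$. The subsystem ordering, the parameter names and the numerical values (arbitrary angular-frequency units, chosen only to respect $B = \omega_r$ and $2t_c = B$) are hypothetical and are not the optimised values of Appendix A 1.

```python
import numpy as np
from functools import reduce

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)


def embed(slot_ops, n_cav):
    """Tensor product over (tau_1, s_1, tau_2, s_2, cavity); missing slots are identities."""
    dims = [2, 2, 2, 2, n_cav]
    mats = [slot_ops.get(k, np.eye(d, dtype=complex)) for k, d in enumerate(dims)]
    return reduce(np.kron, mats)


def two_dqd_cavity_hamiltonian(dqds, omega_r, n_cav=7):
    """Hamiltonian of two DQDs coupled through one truncated cavity mode (hbar = 1).

    dqds: two dicts with hypothetical keys 'eps', 'tc', 'B', 'bx', 'gc'.
    n_cav = 7 keeps Fock states |0>, ..., |6> (up to six photons).
    """
    a = np.diag(np.sqrt(np.arange(1, n_cav)), k=1).astype(complex)  # annihilation operator
    cav = 4  # index of the cavity slot
    ham = embed({cav: omega_r * a.conj().T @ a}, n_cav)  # resonator term
    for i, p in enumerate(dqds):
        tau, s = 2 * i, 2 * i + 1  # slots of this DQD's charge and spin qubits
        ham = ham + embed({tau: 0.5 * p["eps"] * SZ + p["tc"] * SX}, n_cav)  # charge qubit
        ham = ham + embed({s: 0.5 * p["B"] * SZ}, n_cav)                     # Zeeman splitting
        ham = ham + embed({tau: SZ, s: 0.5 * p["bx"] * SX}, n_cav)           # (b_x/2) s_x tau_z
        ham = ham + embed({tau: p["gc"] * SZ, cav: a + a.conj().T}, n_cav)   # g_c tau_z (a + a^dag)
    return ham


omega_r = 1.0  # arbitrary units
dqd = {"eps": 0.0, "tc": 0.5 * omega_r, "B": omega_r, "bx": 0.1, "gc": 0.05}
H = two_dqd_cavity_hamiltonian([dqd, dict(dqd)], omega_r)
print(H.shape)  # 2^4 x 7 = 112
```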

C. Bell pair generation
In this section, we explicitly simulate the model introduced above and numerically search for a set of optimal evolution parameters. We explicitly take into account hardware noise that is comparable to that of current devices. We find that our cavity-based system is capable of generating raw Bell pairs of sufficiently high fidelity that we can then use them as input for purification.
FIG. 3. Evolution of the concurrence between the distant spins. Both plots were computed using the optimised evolution parameters. The optimal initial state is a product of the optimised states of the first and second electrons and the cavity vacuum state $|0\rangle$. We account for potential difficulties in stopping the interaction exactly at the maximal probability by averaging the final density matrix over different stopping times (shaded area), weighted by a Gaussian distribution centred at 15 ns (see text). Here, $T_{2,s} = 120$ µs and $T_{2,c} = 400$ ns.
In particular, we numerically simulate the evolution of the system's state under the Hamiltonian defined in Eq. (3) while also accounting for different noise processes via the general Lindblad master equation
$$\frac{d\rho}{dt} = -i\left[H, \rho\right] + \sum_i \gamma_i \left( \hat{A}_i \rho \hat{A}_i^\dagger - \frac{1}{2}\left\{ \hat{A}_i^\dagger \hat{A}_i, \rho \right\} \right).$$
Here $\gamma_i$ are positive rates and $\hat{A}_i$ are Lindblad operators. Hereafter we denote each noise process using a tuple $(\gamma_i, \hat{A}_i)$. In this work, we focus on three sources of noise. First, we consider cavity loss $(\kappa, \hat{a})$, as defined by a cavity loss rate $\kappa$ and the annihilation operator $\hat{a}$. Then, we consider spin dephasing $(1/(2T_{2,s}), s_z)$, characterised by the spin decoherence time $T_{2,s}$ and the Pauli operator $s_z$. Finally, we consider charge noise $(1/(2T_{2,c}), \tau_z)$, as defined by the charge decoherence time $T_{2,c}$ and the Pauli operator $\tau_z$. Our model assumes a time-independent Hamiltonian, and the high-frequency fluctuations of the detunings are accounted for in the charge noise model [47]. While this charge dephasing ($T_{2,c} \leq 400$ ns) is significantly faster than spin dephasing, we note that in our protocol quantum information is never stored in the charge state: the charge degree of freedom merely acts as an interaction mediator for a brief period of time (≈ 15 ns).
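The noise model above can be illustrated with a small fixed-step RK4 integrator for the Lindblad equation in plain NumPy (for the full model a dedicated solver, for example QuTiP's `mesolve`, is more practical). As a sanity check of the rate convention $(1/(2T_2), s_z)$, this sketch evolves a single spin prepared in $|+\rangle$: with this choice the coherence $|\rho_{01}|$ should decay as $\frac{1}{2}e^{-t/T_2}$. The units and parameter values are hypothetical.

```python
import numpy as np

SZ = np.array([[1, 0], [0, -1]], dtype=complex)


def lindblad_rhs(rho, ham, jumps):
    """Right-hand side of the Lindblad equation for a jump list [(gamma_i, A_i), ...]."""
    drho = -1j * (ham @ rho - rho @ ham)
    for gamma, op in jumps:
        op_dag = op.conj().T
        drho = drho + gamma * (op @ rho @ op_dag - 0.5 * (op_dag @ op @ rho + rho @ op_dag @ op))
    return drho


def evolve(rho, ham, jumps, t_final, steps):
    """Fixed-step fourth-order Runge-Kutta integration from t = 0 to t_final."""
    dt = t_final / steps
    for _ in range(steps):
        k1 = lindblad_rhs(rho, ham, jumps)
        k2 = lindblad_rhs(rho + 0.5 * dt * k1, ham, jumps)
        k3 = lindblad_rhs(rho + 0.5 * dt * k2, ham, jumps)
        k4 = lindblad_rhs(rho + dt * k3, ham, jumps)
        rho = rho + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return rho


T2 = 1.0                           # spin dephasing time (arbitrary units)
B = 2 * np.pi                      # Zeeman splitting (arbitrary units)
ham = 0.5 * B * SZ
jumps = [(1.0 / (2.0 * T2), SZ)]   # the (1/(2 T_2), s_z) dephasing channel
rho0 = 0.5 * np.ones((2, 2), dtype=complex)  # |+><+|
rho_T2 = evolve(rho0, ham, jumps, t_final=T2, steps=1000)
print(abs(rho_T2[0, 1]), 0.5 * np.exp(-1.0))  # the two numbers should agree closely
```

The factor of two in $1/(2T_2)$ arises because the dissipator $\gamma(s_z\rho s_z - \rho)$ damps off-diagonal elements at rate $2\gamma$.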
As described above, our goal is to entangle the two remote spins by generating a specific Bell state, which may be noisy due to imperfections. We stress that our model in Eq. (3), implemented in the resonant regime, leads to highly non-trivial evolutions and, even with no decoherence, it cannot perfectly generate a Bell pair, as detailed in Appendix A 5. Intuitively, the reason is that we assume we only have the ability to abruptly alter the interaction Hamiltonian, and in the full Hamiltonian we only have control of at most 10 individual parameters, whereas at the end of the evolution a number of restrictive conditions need to be satisfied. First, the two spins must have attained the highest degree of entanglement; second, the spins must be separable from the charge degrees of freedom and from the cavity; third, the cavity must be nearly in the vacuum state. Given that the dimensionality of the Hilbert space is $2^4 \times N_{\mathrm{cav}}$ (allowing for the fact that only the first $N_{\mathrm{cav}} \approx 7$ energy levels of the cavity are populated), our dynamical system is clearly under-parametrised. We therefore optimise the system's trajectory to come as close as possible to the desired condition, but do not expect to meet it perfectly.
In order to obtain the best possible fidelity, we perform projective measurements at a given optimal time on the interaction-mediating charge degrees of freedom, thereby also increasing the entanglement in our raw Bell pairs at the cost of a slight probability of failure, as explained in Appendix A 5. Our ultimate goal is, given a sufficiently high level of entanglement in our raw Bell pairs, to improve their fidelity to arbitrarily high levels using purification techniques.
To find an optimal set of parameters in Eq. (3), we performed a gradient-based optimisation that maximises the entanglement between the spins. The optimal parameter values can be found in Appendix A 1. As a figure of merit for the optimisation, we use the concurrence [48] of the joint state of the two spins given a fixed charge measurement outcome. Averaging this concurrence over the charge measurement outcomes that we accept then defines our cost function. We also introduce a set of constraints for the optimisation: firstly, the resonant condition $B_1 = B_2 = B = \omega_r$; secondly, $2t_{c,1} = 2t_{c,2} = 2t_c = B$ (achievable by tuning the tunnelling barrier of the DQD [49]), which gives us the maximum coupling between the spin and the charge [50]. Apart from these fixed parameters, we optimise all other parameters that define our model and the initial state of each electron. It is interesting to note that, given a real device, one could perform a similar optimisation procedure (e.g. post fabrication) to find the device's ideal mode of operation. In Fig. 3, we report the evolution of our simulated system under an optimal set of parameters. In particular, Fig. 3(a) shows the probabilities of different charge measurement outcomes, while Fig. 3(b) shows the corresponding evolution of the concurrence between the spins. The concurrence peak closely matches a probability peak for charge measurement outcomes in the odd-parity subspace (spanned by the states $|LR\rangle$ and $|RL\rangle$). We discuss in Appendix A 5 that discarding the other charge measurement outcomes eliminates the dominant coherent error, in which the charge state $(|LL\rangle - |RR\rangle)/\sqrt{2}$ is coupled to the unentangled spin state $|{\downarrow\downarrow}\rangle$ due to energy conservation on resonance; this explains the nearly zero concurrence of the discarded measurement outcomes in Fig. 3(b) (black and red lines).
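Since the optimisation's figure of merit is the concurrence, a minimal NumPy sketch of Wootters' formula may be useful; the function names and test states are illustrative. For Bell-diagonal states the fidelity with the dominant Bell state satisfies $F = (1 + C)/2$ whenever $C > 0$; the raw pairs considered here are not necessarily Bell-diagonal, so this relation is only a rough guide for reading concurrence values.

```python
import numpy as np

SY = np.array([[0, -1j], [1j, 0]])
YY = np.kron(SY, SY)


def concurrence(rho):
    """Wootters concurrence of a two-qubit density matrix (4x4 array)."""
    rho_tilde = YY @ rho.conj() @ YY  # spin-flipped state
    eigs = np.linalg.eigvals(rho @ rho_tilde)
    # Square roots of the (real, non-negative) eigenvalues, in decreasing order.
    lam = np.sort(np.sqrt(np.clip(eigs.real, 0.0, None)))[::-1]
    return max(0.0, float(lam[0] - lam[1] - lam[2] - lam[3]))


def proj(v):
    """Projector |v><v| onto a state vector."""
    v = np.asarray(v, dtype=complex)
    return np.outer(v, v.conj())


phi_p = np.array([1, 0, 0, 1]) / np.sqrt(2)
psi_p = np.array([0, 1, 1, 0]) / np.sqrt(2)
phi_m = np.array([1, 0, 0, -1]) / np.sqrt(2)
bell_diag = 0.9 * proj(phi_p) + 0.05 * proj(psi_p) + 0.05 * proj(phi_m)
print(concurrence(proj(phi_p)))         # maximally entangled: 1
print(concurrence(proj([1, 0, 0, 0])))  # product state |00>: 0
print(concurrence(bell_diag))           # Bell-diagonal: 2 * 0.9 - 1
```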
As we perform a charge measurement at the maximal peak, i.e., at around $t \approx 15$ ns, we find a high success probability (≈ 0.9), while the resulting state achieves a relatively high concurrence (≈ 0.89). In our simulations we also took into account the possible uncertainty in stopping the interaction instantaneously by averaging over a Gaussian distribution of different stopping times with σ = 0.5 ns (grey shaded area in Fig. 3). Given that state-of-the-art technology achieves jitter smaller by orders of magnitude, ours is a very conservative estimate that accounts for the possibility of integrating (likely lower-quality) pulse-control electronics within the chip. Furthermore, the purified fidelities we observe are nearly insensitive to the jitter time: we observe fidelities of 99.60%, 99.59% and 99.51% with jitter times of 0 ps, 50 ps and 500 ps, respectively, which speaks for the robustness of our approach. It is also worth noting that, as we accept measurement outcomes in the odd-parity subspace, there is a high probability (≈ 0.96) of finding the cavity in the vacuum state $|0\rangle$ (see Appendix A 2). This guarantees with high probability that we can promptly reuse the cavity to generate another Bell pair.
It is expected that in typical experiments the coupling between each spin and the cavity is electrically controllable [38,50]. This allows us to rapidly switch off their interactions at our optimal evolution time, which corresponds to the peaks in Fig. 3. However, in practice it might be difficult to stop the interaction exactly at the maximal probability. We take this into account by averaging the final density matrix over different stopping times weighted by a Gaussian distribution centred at 15 ns (with σ = 0.5 ns). As an artefact, the present dynamics might introduce a deterministic phase into our quantum state; however, if this phase is stable over consecutive generations of Bell pairs, it is automatically removed by our distillation process.
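The stopping-time averaging described above can be sketched as a weighted sum of density matrices over a discretised, truncated Gaussian grid of stopping times (centre 15 ns, σ = 0.5 ns). The grid size, truncation width and the toy state (a Bell-like state whose relative phase drifts at a hypothetical rate ω) are illustrative assumptions; for this toy model the averaged fidelity should approach $(1 + e^{-\omega^2\sigma^2/2})/2$.

```python
import numpy as np


def jitter_average(rho_of_t, t0=15.0, sigma=0.5, n_points=41, width=4.0):
    """Average rho(t) over stopping times t ~ Normal(t0, sigma^2) (times in ns).

    The Gaussian is truncated at +/- width * sigma and discretised on a uniform
    grid; the weights are renormalised to sum to one.
    """
    ts = np.linspace(t0 - width * sigma, t0 + width * sigma, n_points)
    w = np.exp(-0.5 * ((ts - t0) / sigma) ** 2)
    w /= w.sum()
    return sum(wi * rho_of_t(ti) for wi, ti in zip(w, ts))


omega = 2 * np.pi * 0.1  # hypothetical relative-phase drift in rad/ns


def rho_toy(t):
    """Toy Bell-like state (|01> + e^{i omega (t - 15)} |10>)/sqrt(2)."""
    psi = np.array([0, 1, np.exp(1j * omega * (t - 15.0)), 0]) / np.sqrt(2)
    return np.outer(psi, psi.conj())


psi_ideal = np.array([0, 1, 1, 0]) / np.sqrt(2)
fidelity = float(np.real(psi_ideal.conj() @ jitter_average(rho_toy) @ psi_ideal))
print(fidelity, (1 + np.exp(-(omega * 0.5) ** 2 / 2)) / 2)  # should agree closely
```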
We also note that the quality of the entanglement in the raw Bell pairs (without purification) might not be sufficient to perform high-fidelity long-range operations, as we report in Table I. In particular, we expect that charge dephasing noise (which deteriorates our hybridised states) is the dominant source of error, and we report fidelities in Table I for values of the charge decoherence time $T_{2,c}$ ranging from tens to hundreds of nanoseconds. For each simulation we assumed a spin decoherence time of $T_{2,s} = 120$ µs. An ensemble-averaged decoherence time $T^*_{2,s} = 120$ µs has been achieved experimentally [51]; hence we expect our $T_{2,s}$ to be at least equal to this value. It is worth noting that $T_{2,s}$ is not a bottleneck: while it could be reduced without significantly affecting our results, $T_{2,s}$ will still have to be reasonably large, given that any practical application requires at least hundreds of quantum gates.

D. Entanglement purification
As noted in the introduction, as early as 2003 entanglement purification was explored theoretically as a means to distribute a quantum task over subsystems with non-optimal links [22][23][24][25][26]. We briefly recapitulate the basic two-node version of entanglement purification (or 'distillation'; the terms are now typically used interchangeably). The 'Alice' and 'Bob' nodes require a means to create Bell pairs between them, as well as a modest set of local operations. Purification involves stages, or rounds, in which two Bell pairs are locally entangled and then one pair is measured out. Alice and Bob compare their measurement outcomes over a classical channel. If the Bell pairs were both noise-free, then only certain measurement outcomes are possible; if the allowed outcome is not seen, Alice and Bob discard the remaining Bell pair, otherwise they store it. Thus, a surviving Bell pair has passed a form of validation and its fidelity is higher: in the circuits used here, one noise channel (say, phase) will have been suppressed from order $p$ to $p^2$, while the other (flip) increases from $p$ to $2p$. One must therefore create a second validated Bell pair and then combine the two pairs in a new, higher round of purification that targets the remaining noise channel. In this way, four 'raw' Bell pairs with infidelity of order $p$ can be combined to give rise to a single pair with infidelity of order $p^2$ (with a modest prefactor). Of course, further rounds can be employed to suppress noise to order $p^3$ or higher, but the cost in Bell pairs increases exponentially.
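The recurrence described above can be made concrete for Bell-diagonal states. The sketch below tracks the probabilities of the four Bell states through ideal rounds built from bilateral CNOTs followed by a $Z$-basis measurement of the target pair, which detects flip errors; a round targeting phase errors is obtained by applying Hadamards on both sides before and after, which exchanges the two error types. This is a simple textbook-style recurrence with noise-free local operations, not the authors' optimised variant, and the raw error rate $p$ is hypothetical. Here the first round targets flips; which channel is suppressed first is only a labelling choice.

```python
from itertools import product

# Bell-diagonal state: dict (phase_bit, flip_bit) -> probability.
# (0, 0) = |phi+> (target), (0, 1) = |psi+> (flip error),
# (1, 0) = |phi-> (phase error), (1, 1) = |psi-> (both errors).


def flip_round(a, b):
    """Ideal round: bilateral CNOT (pair a controls pair b), then both halves of b
    are measured in the Z basis; pair a is kept if the outcomes agree.

    The outcomes agree iff the flip bits of a and b are equal, so single flip
    errors are detected; the kept pair inherits the XOR of both phase bits.
    Returns (success_probability, normalised output state).
    """
    out = {k: 0.0 for k in product((0, 1), repeat=2)}
    for (pa, fa), qa in a.items():
        for (pb, fb), qb in b.items():
            if fa == fb:
                out[(pa ^ pb, fa)] += qa * qb
    p_succ = sum(out.values())
    return p_succ, {k: v / p_succ for k, v in out.items()}


def swap_bits(state):
    """Local Hadamards on both sides exchange phase and flip errors."""
    return {(f, p): q for (p, f), q in state.items()}


def phase_round(a, b):
    """The same round in the Hadamard-rotated frame, so it detects phase errors."""
    p_succ, out = flip_round(swap_bits(a), swap_bits(b))
    return p_succ, swap_bits(out)


p = 0.05  # hypothetical raw flip and phase error probabilities
raw = {(0, 0): 1 - 2 * p, (0, 1): p, (1, 0): p, (1, 1): 0.0}
p1, level1 = flip_round(raw, raw)         # consumes 2 raw pairs
p2, level2 = phase_round(level1, level1)  # consumes 2 level-1 pairs (4 raw in total)
print(f"level 1: success {p1:.3f}, flip err {level1[(0, 1)]:.4f}, phase err {level1[(1, 0)]:.4f}")
print(f"level 2: success {p2:.3f}, infidelity {1 - level2[(0, 0)]:.4f}")
```

With $p = 0.05$ the first round pushes the flip error down to about $p^2$ while roughly doubling the phase error, and the second round leaves a final infidelity of about 0.018, i.e. of order $p^2$ with a prefactor of roughly 7.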
Noise in the local operations is relevant, but the purification process itself combats it, so that only noise in the final round is significantly impactful. Importantly, if there is significant structure in the noise of the raw Bell pairs, then the purification process can be tailored to exploit this structure; structureless full-rank noise (white noise) is the least desirable.
FIG. 6. Distillation process flow chart for two rounds of distillation. The blue-shaded box represents the steps for one round of purification. It requires the generation of two raw Bell pairs (steps 1 and 2), a process that fails with probability $p_{\mathrm{raw}} \approx 0.10$. The distillation process (step 3) fails with probability $p_1 \approx 0.08$. After having successfully generated two level-1 Bell pairs, we perform a second round of distillation to obtain the final purified Bell pair, which fails with probability $p_2 \approx 0.06$. The probabilities are given for the $T_{2,s} = 120$ µs and $T_{2,c} = 400$ ns case.
We use the controlled interaction between the spin qubits and the cavity described in the previous section as a process to generate noisy Bell pairs. In Fig. 5 we plot the concurrence (entanglement) of the Bell pairs after various rounds of purification as a function of the dominant noise source, the charge decoherence time $T_{2,c}$, while we also model spin decoherence, imperfections in the local single- and two-qubit gates (via depolarising noise) and measurement errors. Here, we use the variant of the standard purification circuit in Fig. 4 that gives the best concurrence after the purification process for the particular structure of noise in our raw Bell pairs. Furthermore, in Appendix A 6 we report that by adding further single-qubit gates we can obtain better concurrence in the ideal purification case. For simplicity, in the following we mostly focus on the $T_{2,s} = 120$ µs and $T_{2,c} = 400$ ns case. Such a charge decoherence time can be reached with current hardware [52].
Since in entanglement purification we need to discard certain measurement patterns, it is a non-deterministic process. In our case, the purification protocol (in both rounds) succeeds if both measured qubits are found in the same state. Fig. 6 shows the main steps of the distillation process. On average, one needs to generate $N_{\mathrm{avg}} \approx 5.2$ noisy Bell pairs to obtain a 99.5% fidelity Bell pair. Assuming typical gate and measurement times, this results in an average generation rate $g_{\mathrm{Bell}} \approx 0.14$ MHz (see Appendix A 7), whereas in the limit of very fast gates and measurements, entanglement generation is dominated by the cavity interaction time, with a generation rate of $g_{\mathrm{Bell}} \approx 12.9$ MHz.
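These numbers can be roughly cross-checked. The sketch assumes a simple geometric model in which every raw-pair attempt and every purification round is retried independently until it succeeds, using the failure probabilities quoted in Fig. 6 ($p_{\mathrm{raw}} \approx 0.10$, $p_1 \approx 0.08$, $p_2 \approx 0.06$) and a cavity interaction time of 15 ns per attempt; the detailed rate calculation in Appendix A 7 may differ. It gives about 5.1 attempts and about 13 MHz, consistent with the quoted $N_{\mathrm{avg}} \approx 5.2$ and 12.9 MHz within the rounding of the input probabilities. The function name is hypothetical; more rounds are handled by extending the list of failure probabilities.

```python
def expected_raw_attempts(p_raw_fail, round_fail_probs):
    """Expected raw-pair generation attempts per final purified pair.

    Simple geometric model (an assumption): each raw-pair attempt and each
    purification round is retried independently until it succeeds, and every
    round consumes two pairs of the previous level.
    """
    n = 1.0 / (1.0 - p_raw_fail)  # attempts per accepted raw pair
    for p_fail in round_fail_probs:
        n = 2.0 * n / (1.0 - p_fail)  # two lower-level pairs per attempt at this level
    return n


n_avg = expected_raw_attempts(0.10, [0.08, 0.06])  # failure probabilities from Fig. 6
t_cavity_ns = 15.0  # cavity interaction time per attempt
rate_mhz = 1e3 / (n_avg * t_cavity_ns)  # fast-gate limit: only cavity time counts (1/ns = 1e3 MHz)
print(f"N_avg ~ {n_avg:.2f}, cavity-limited rate ~ {rate_mhz:.1f} MHz")
```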
Fortunately, this rate of 0.14 MHz can still suffice for powerful applications, including the derangement-based error mitigation described in Section V. Prior experiments demonstrated that coherence can be maintained with respect to phase noise for as long as 28 ms using a Carr-Purcell-Meiboom-Gill sequence [51], and indeed even devices fabricated on 300 mm commercial wafers have achieved a $T_2$ time of 3.7 ms [53]. We therefore expect our approach to allow the generation of tens or even hundreds of purified Bell pairs, to be used for 'comparing qubits' between cores with the derangement method of Section V, before dephasing would negate the advantage of the process.
For applications where this rate is insufficient, note that the cavity is in use for only a very small fraction of the procedure (≈ 15 ns); we can thus significantly improve generation rates by parallelising the purification step. The extended architecture would include $n$ 'purification stations' on both sides of the cavity: while the cavity is used exclusively to generate a single Bell pair at a time, these raw Bell pairs are fed in an alternating way into the different purification stations. By choosing a modest parallelisation $n \leq 5$, we expect the generation rate to be effectively increased by the same factor $n$. We also remark that our particular purification protocol transforms the initial $|\psi^-\rangle$ into $|\phi^+\rangle$.
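As a rough consistency check, assume each of the $n$ stations runs at the single-station rate $g_{\mathrm{Bell}}$ and that the cavity, used exclusively for about 15 ns per attempt, is the only shared resource, so that throughput is capped by the fast-gate limit $g_{\mathrm{cav}}$ quoted above ($g_{\mathrm{eff}}$ and $g_{\mathrm{cav}}$ are illustrative notation):
$$g_{\mathrm{eff}}(n) \approx \min\!\left(n\,g_{\mathrm{Bell}},\; g_{\mathrm{cav}}\right), \qquad g_{\mathrm{eff}}(5) \approx 5 \times 0.14~\mathrm{MHz} = 0.7~\mathrm{MHz} \ll g_{\mathrm{cav}} \approx 12.9~\mathrm{MHz}.$$
Under these assumptions the cavity is far from saturated for $n \leq 5$, so the linear scaling with $n$ is self-consistent.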
For charge noise worse than $T_{2,c} = 400$ ns (i.e. shorter charge decoherence times), four rounds of purification are necessary to obtain an adequate fidelity. In Table II we collect the fidelities for different charge noise parameters for two and four rounds of purification. The fidelities reported in parentheses are obtained without errors in the local gates and measurements of the purification process. Let us define the Bell pair obtained after the $i$th successful purification round as a Bell pair of level $i$. Four rounds of distillation is rather demanding, since going from the $i$th round to the $(i+1)$th requires two level-$i$ Bell pairs. As purification is a stochastic process, performing more rounds leads to a decrease in the success probability and hence to a smaller generation rate. See Appendix A 7 for its computation for $T_{2,c} = 100$ ns and $T_{2,c} = 50$ ns.
With better hardware, the number of purification rounds will decrease, while the generation rate and the fidelity will increase. Indeed, with negligible decoherence, one obtains a fidelity of ≈ 99.9% with $g_{\mathrm{Bell}} \approx 0.15$ MHz after two purification rounds ($g_{\mathrm{Bell}} \approx 13.9$ MHz with fast gates and measurements). However, we will see in Section V that for many applications one does not need to generate perfect Bell pairs: a fidelity of 99.5% is already enabling for powerful NISQ applications.

III. SHUTTLING
Perhaps the most straightforward means of transporting quantum information on-chip is the direct movement of the electrons themselves through a series of quantum dots. So-called shuttling has featured in many spin-based quantum computing architecture proposals as an efficient means of transporting spins over micron-scale distances without creating a time bottleneck [54][55][56][57]. In the following we focus on a "bucket-brigade" shuttling protocol, which is perhaps the best understood alternative in silicon from a theoretical standpoint, in contrast to "conveyor-mode" or "surface-acoustic-wave" protocols.
The unit operation of a shuttling protocol is the coherent tunnelling of an electron from one quantum dot to the next. This may be actuated via a time-dependent electrostatic detuning $\epsilon(t)$ between the left (L) and right (R) dots, such that the charge state is adiabatically tipped into the target dot. The detuning sweep must be slow enough to minimise diabatic transitions to excited valley-orbit states, yet fast enough to limit the time during which charge noise can induce transitions near avoided crossings. Such restrictions depend on device design and the degree of pulse optimisation, though speeds on the order of 1 nanosecond per tunnelling event are considered feasible [58][59][60]. With a dot-to-dot spacing of 50-100 nm, a distance of a few microns can be traversed in tens to hundreds of nanoseconds, comparable to single- and two-qubit gate times in silicon quantum dot processors [61].
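A back-of-envelope check of the "tens to hundreds of nanoseconds" estimate, assuming one tunnelling event per dot pitch at a fixed 1 ns per event and no additional dwell time; `shuttle_time_ns` is a hypothetical helper.

```python
import math


def shuttle_time_ns(distance_um: float, dot_pitch_nm: float, hop_time_ns: float = 1.0) -> float:
    """Bucket-brigade transit time: one tunnelling event per dot pitch."""
    hops = math.ceil(distance_um * 1000.0 / dot_pitch_nm)
    return hops * hop_time_ns


for distance in (1, 3, 10):
    for pitch in (50, 100):
        print(f"{distance:2d} um at {pitch:3d} nm pitch: {shuttle_time_ns(distance, pitch):5.0f} ns")
```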
Coherent spin shuttling has been realised over effective distances of several micrometres in a GaAs quantum dot circuit [62]. Reliable charge shuttling has also been shown in multi-dot Si/SiGe arrays [63,64], and repeated coherent spin tunnelling between two Si-MOS dots has been demonstrated [65]. This places shuttling as a top candidate for micron-scale on-chip quantum information transport in near-term devices.
Previous theoretical treatments have given substantial attention to the ability of a single shuttling event to preserve an arbitrary input state's fidelity [58,66-69]. However, a shuttling channel does not need to be able to shuttle arbitrary states to constitute a useful resource. In accord with the theme of the present paper, shuttling along arrays of quantum dots can provide a means of on-chip entanglement distribution and may be a component of gate teleportation or error correction schemes between many computational cores. In many ways this is a less demanding task, as unitary transformations induced by the shuttling channel can be mitigated through calibration or distillation protocols. Rather than risk losing information in a shuttled data qubit, entanglement may be continuously generated on a timescale similar to native physical gates and then be used on demand by spin registers separated by micron-scale distances.
We present a schematic of two spin registers connected via a quantum dot shuttling channel in Fig. 1(c). A Bell pair consisting of two electrons (small orange dots) may be created using a high-fidelity two-qubit interaction near one spin register, made possible by a local micromagnet. One electron of the pair may then be shuttled down the few-micron-long chain on a 100 ns timescale. A measurement dot (M) at the very end of the chain may be used to probabilistically project the electron's state into the ground valley. The resulting spin states of the Bell pair may be stored in the ancilla dots (A) via a SWAP interaction or state-preserving tunnelling. A second Bell pair may be initiated and transported in the same way. A single-round purification scheme may then be run on the two Bell pairs using local two-qubit interactions. Charge measurement can again be carried out using the ancilla dots, and a successful outcome yields a high-fidelity Bell pair on the memory qubit. This entangled pair may be used as part of an algorithm requiring interaction between the two spin registers.
To investigate the dynamics of shuttling, it suffices to focus on the process of shuttling an electron from one QD to another in a DQD structure [68,69]. The Hamiltonian is similar to the static DQD Hamiltonian in Eq. (1), but the detuning between the two dots can now be varied in time ($\epsilon \to \epsilon(t)$) to carry out the shuttling. Without the strong magnetic field gradient induced by the micromagnet in the cavity-mediated interaction of Section II, it becomes essential to take small magnetic fields into account. We consider the intrinsic spin-orbit interaction arising from the Rashba and Dresselhaus effects [70], which we model via a spin-orbit Hamiltonian in which $\tau_y$ and $s_i$ are Pauli matrices as explained below Eq. (1), with the spin coupling to an effective magnetic field $\Omega$ [70].
As opposed to our modelling of the cavity-based distribution in Section II, here we need to take into account the coupling to the valley degree of freedom, which we identify as potentially the leading source of error in shuttling. In particular, the time-dependent variation (sweeping) of the detuning can couple the charge qubit to higher excited states of the valley degree of freedom, which we model as an effective valley qubit whose two basis states are the bulk valley states $|z\rangle$ and $|\bar{z}\rangle$. Using $v_i$ to denote the Pauli matrices $\sigma_i$ acting on the valley qubit, we define $v_+ = \frac{1}{2}(v_x + i v_y)$. Furthermore, we define the projectors acting on the charge degree of freedom as $\tau_d = |d\rangle\langle d|$ with $d \in \{L, R\}$, and can thus write the valley-charge coupling term as

$$H_V = \sum_{d \in \{L,R\}} \tau_d \left( \Delta_d\, v_+ + \Delta_d^*\, v_- \right), \qquad v_- = v_+^\dagger.$$

Here we denote the site-dependent valley couplings $\Delta_d = |\Delta_d| e^{-i\phi_d}$, corresponding to the valley splitting energies $E_{V,d} = 2|\Delta_d|$; see Ref. [46] for more details. The full Hamiltonian $H$ of Eq. (6) that we consider for the shuttling process is then the sum of the DQD Hamiltonian of Eq. (1) with $\epsilon \to \epsilon(t)$, the spin-orbit term and $H_V$. Due to the local magnetic field variation, the local natural spin basis differs from the global basis defined by $B$ in Eq. (1), and we transform the Hamiltonian into the local spin basis via a unitary transformation defined in Ref. [69]. Similarly, the local valley eigenstates differ from the bulk valley states $\{|z\rangle, |\bar{z}\rangle\}$. Denoting the valley eigenstates at site $d \in \{L, R\}$ by $|-\rangle_d$ (ground state) and $|+\rangle_d$ (excited state), we can express them as

$$|\pm\rangle_d = \frac{1}{\sqrt{2}}\left( |z\rangle \pm e^{i\phi_d} |\bar{z}\rangle \right).$$

FIG. 7. As the detuning is swept, an electron state traverses from left to right. Both adiabatic (black arrows) and diabatic (red arrows) transitions may occur, and the initial spin state will coherently evolve into a superposition of output states. Leakage to the excited orbital sector will result in a mixed spin-valley state.
When the valley-orbit sector of Eq. (5) is rewritten in the $\{|\pm\rangle_d\}$ basis, the valley-conserving and valley-flipping tunnel couplings can be identified specifically [68]; their magnitudes are $t_c|\cos(\delta\phi/2)|$ and $t_c|\sin(\delta\phi/2)|$ respectively, where $t_c$ is the bare tunnel coupling. Here $\delta\phi = \phi_L - \phi_R$ is the difference in phase between the two valley coupling parameters $\Delta_L$ and $\Delta_R$. This phase difference depends precisely on the overlap of the electron wavefunction with the quantum well interface [46]. When $\delta\phi = 0$, states of opposite valleys cross entirely, and when $\delta\phi = \pi$, valley flips occur deterministically regardless of the magnitude of the bare tunnel coupling, making this microscopic parameter crucial in the description of electron shuttling in silicon. Fig. 7 illustrates the spin-valley-orbit energy level diagram of $H$ for a realistic set of parameters that may be encountered during shuttling, as detailed in Appendix B. Initially, an electron will populate the lowest two states $|L{-}{\downarrow}\rangle$ and $|L{-}{\uparrow}\rangle$. Population in the former (spin ground) state will transfer adiabatically and reliably to $|R{-}{\downarrow}\rangle$, as the unique avoided crossing near zero detuning is large. Population in the latter (spin excited) state encounters a much smaller effective tunnel coupling $t_{vc}$ near zero detuning, possibly allowing diabatic Landau-Zener transitions into the higher energy level. Subsequent avoided crossings provide further opportunities for population leakage into excited spin-valley-orbit states.
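To see why a small effective coupling opens the door to diabatic transitions, the sketch below combines the $|\cos(\delta\phi/2)|$, $|\sin(\delta\phi/2)|$ coupling magnitudes with the textbook Landau-Zener formula. It assumes a two-level convention $H = (\epsilon/2)\tau_z + a\,\tau_x$ with a linear sweep, and a hypothetical sweep rate of 1 meV of detuning per ns; the helper names are illustrative and the numbers are not taken from Appendix B.

```python
import math

HBAR_EV_S = 6.582119569e-16  # reduced Planck constant in eV s


def valley_couplings(t_c, delta_phi):
    """Magnitudes of the valley-conserving and valley-flipping tunnel couplings."""
    return t_c * abs(math.cos(delta_phi / 2)), t_c * abs(math.sin(delta_phi / 2))


def landau_zener_diabatic(coupling_ev, sweep_rate_ev_per_s):
    """P_diabatic = exp(-2 pi a^2 / (hbar |d eps / dt|)) for H = (eps/2) tau_z + a tau_x."""
    return math.exp(-2 * math.pi * coupling_ev ** 2
                    / (HBAR_EV_S * abs(sweep_rate_ev_per_s)))


t_c = 30e-6          # bare tunnel coupling: 30 ueV
sweep = 1e-3 / 1e-9  # hypothetical sweep rate: 1 meV per ns, in eV/s
for dphi in (0.0, math.pi / 4, math.pi / 2, 0.97 * math.pi):
    t_conserve, t_flip = valley_couplings(t_c, dphi)
    print(f"dphi = {dphi:.2f}: conserving {t_conserve * 1e6:5.1f} ueV, "
          f"flipping {t_flip * 1e6:5.1f} ueV, "
          f"P_LZ(conserving) = {landau_zener_diabatic(t_conserve, sweep):.2e}")
```

For a fixed sweep, the diabatic probability rises from negligible to near unity as the relevant coupling shrinks, which is the mechanism behind leakage at the smaller avoided crossings.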
If the quality of the shuttled state is measured by fidelity, population loss into higher-energy states will degrade the quality of our Bell pairs unless recovered through appropriate calibrations [58,68]; such a unitary evolution does not inherently destroy information but rather introduces a coherent error. Although the coherent error introduced by unitary evolution under our Hamiltonian in Eq. (6) is local to the shuttled spin, it may increase the spin-valley entanglement and thus, owing to the monogamy of entanglement, decrease the entanglement between the spins of the Bell pair.
Let us now identify two additional mechanisms of central importance for information loss; a further discussion of our assumptions and of the impact of other noise sources is included in Appendix B.
First, at the end of the detuning sweep, the state may populate the excited orbital state $|L\rangle$, corresponding to the charge "bouncing back" to the initial dot. In any practical shuttling implementation, the detuning will be plunged much further, such that this excited orbital state will inevitably cross higher-energy orbital states in the target dot. For a sufficiently large detuning, charge transfer will occur deterministically but can populate these excited orbitals, which are not included in Eq. (6). Nevertheless, excited orbitals in silicon quantum dots relax on a sub-nanosecond timescale [71], and therefore the ground orbital state $|R\rangle$ of the target dot will be fully occupied before a subsequent shuttling operation.
Second, a finite population of the excited valley state $|+\rangle_R$ may also lead to decoherence. Although the bare valley degree of freedom is believed to be long-lived, hybridisation with the orbital state may substantially decrease the coherence time as a result of quantum well interface roughness [71,72]. Therefore, in a manner similar to the population of the excited orbital states, excited valley population and decoherence can cause overall information loss. However, the spin-valley state may plausibly be well-defined during the entire duration of a 100 ns shuttling operation. While spin-valley computation is universal [73], and coherent valley control has been demonstrated [74,75], fault-tolerant fidelities have only been achieved with spins [76,77]. Therefore, among many creative possibilities, we find it prudent to projectively measure the valley state of the shuttled electron and post-select on ground state measurements. This leaves the spin state intact while introducing a slight probability of failure, as we may need to discard certain measurement outcomes. We outline how this may be accomplished in Appendix B.
While it is the decoherence properties of the excited valley-orbit states that ultimately lead to information loss, it is the parameters of Eq. (6) that determine the extent to which these states become populated during shuttling. The electrostatic and magnetic environment of the shuttling chain can be engineered, or at least characterised, to good accuracy. However, the variation in the valley parameters $\Delta_d$ is believed to be large on account of their sensitivity to microscopic interface details [78], and experiments are just beginning to probe the variation in inter- and intra-valley tunnel couplings in silicon quantum dot arrays [79].
In Fig. 8, we emphasise the paramount importance of valley phase uniformity for Bell pair distribution via shuttling. We consider chains of 25 quantum dots, with each chain having a uniform bare tunnel coupling $t_c$ between adjacent sites. A protocol as described in Fig. 7 is run, while the spin-spin concurrence between a stationary and a shuttled spin is evaluated down the chain as a measure of entanglement [80], as if the shuttling protocol were halted at each location in separate experiments. For valley phase differences drawn randomly from a Gaussian distribution with a modest standard deviation of $\mathrm{SD}_\phi = \pi/4$, preserving high concurrence requires adjacent dots to have a sufficiently large bare tunnel coupling $t_c$. The threshold is roughly given by the condition for a vanishing anticrossing between $|L{-}{\uparrow}\rangle$ and $|R{-}{\downarrow}\rangle$ near zero detuning, $B \approx 2|t_{vc}|$, corresponding to maximal mixing of the spin and valley states. For larger variation in the valley phase differences, the higher probability of spin-valley mixing at each tunnel coupling results in a faster decrease of spin-spin concurrence with chain length for all bare tunnel couplings.
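Since concurrence is the figure of merit in this analysis, here is a minimal sketch of the standard Wootters formula for a two-qubit density matrix. The function name `concurrence` is illustrative; it is not the code used to produce Fig. 8.

```python
import numpy as np

SIGMA_Y = np.array([[0, -1j], [1j, 0]])
YY = np.kron(SIGMA_Y, SIGMA_Y)


def concurrence(rho: np.ndarray) -> float:
    """Wootters concurrence C = max(0, l1 - l2 - l3 - l4) of a two-qubit state.

    The l_i are the square roots of the eigenvalues of rho @ rho_tilde in
    decreasing order, with rho_tilde = (Y x Y) rho* (Y x Y).
    """
    rho_tilde = YY @ rho.conj() @ YY
    eigenvalues = np.linalg.eigvals(rho @ rho_tilde)
    lambdas = np.sort(np.sqrt(np.clip(eigenvalues.real, 0.0, None)))[::-1]
    return float(max(0.0, lambdas[0] - lambdas[1] - lambdas[2] - lambdas[3]))


psi_plus = np.array([0, 1, 1, 0], dtype=complex) / np.sqrt(2)
print(concurrence(np.outer(psi_plus, psi_plus.conj())))
```

For a Werner state with weight $p$ on a Bell state the formula reduces to $\max(0, (3p-1)/2)$, which makes a convenient sanity check.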
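In the Appendix, in Fig. 15 we also show a representative Pauli Transfer Matrix (PTM) for shuttling chains with parameters that result in high spin-spin concurrences. From this PTM, the shuttling superoperator $\mathcal{S}$ can be interpreted as a coherent $z$-rotation together with amplitude damping towards the $|{\downarrow}\rangle$ state, as indicated by the nonzero $I_{\rm in} \to Z_{\rm out}$ element [81]. Such an effect can be understood as the result of the $|{\uparrow}\rangle$ state being principally involved in diabatic transitions during the tunnelling operations. As such, let us consider the action of the PTM on an initial Bell state $\rho = |\psi^+\rangle\langle\psi^+|$, which to a good approximation yields

$$\rho' = \mathcal{S}(\rho) \approx (1-\epsilon)\,|\psi'\rangle\langle\psi'| + \epsilon\,|{\downarrow\downarrow}\rangle\langle{\downarrow\downarrow}|, \qquad (9)$$

where $|\psi'\rangle$, a superposition of $|{\uparrow\downarrow}\rangle$ and $|{\downarrow\uparrow}\rangle$ with unequal weights, is an asymmetric Bell state which has accumulated some phase $\phi$.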
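The link between amplitude damping and the $I_{\rm in} \to Z_{\rm out}$ element can be checked on a toy channel. The sketch below builds the PTM $R_{ij} = \mathrm{Tr}[P_i\,\mathcal{E}(P_j)]/2$ of a hypothetical channel (amplitude damping of strength gamma towards $|{\downarrow}\rangle$ followed by a $z$-rotation by theta), using the assumed convention $|{\uparrow}\rangle \equiv |0\rangle$, $|{\downarrow}\rangle \equiv |1\rangle$. The channel and its parameters are assumptions, not fitted to Fig. 15.

```python
import numpy as np

PAULIS = [np.eye(2, dtype=complex),
          np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]], dtype=complex),
          np.array([[1, 0], [0, -1]], dtype=complex)]


def toy_shuttling_channel(rho, gamma, theta):
    """Hypothetical toy channel: damping |up>=|0> -> |down>=|1>, then Rz(theta)."""
    k0 = np.array([[np.sqrt(1 - gamma), 0], [0, 1]], dtype=complex)
    k1 = np.array([[0, 0], [np.sqrt(gamma), 0]], dtype=complex)
    damped = k0 @ rho @ k0.conj().T + k1 @ rho @ k1.conj().T
    rz = np.diag([np.exp(-0.5j * theta), np.exp(0.5j * theta)])
    return rz @ damped @ rz.conj().T


def pauli_transfer_matrix(channel):
    """R[i, j] = Tr[P_i channel(P_j)] / 2 in the Pauli basis (I, X, Y, Z)."""
    return np.array([[np.trace(p_out @ channel(p_in)).real / 2 for p_in in PAULIS]
                     for p_out in PAULIS])


R = pauli_transfer_matrix(lambda rho: toy_shuttling_channel(rho, gamma=0.02, theta=0.3))
print(np.round(R, 4))
```

With this convention the $I \to Z$ entry equals $-\gamma$ (damping pushes population towards the $-1$ eigenstate of $Z$), while the rotation only mixes the $X$ and $Y$ rows.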
When an ideal single-round purification circuit is applied to two copies of $\rho'$, a "11" ↔ $|{\downarrow\downarrow}\rangle$ outcome occurs with probability $(1-\epsilon)^2(\tfrac{1}{2}-\epsilon)$, and the resulting entangled state is the ideal Bell pair $|\psi^+\rangle$. Entangled states distributed via shuttling are therefore highly amenable to purification, provided concurrence is mostly preserved. Of course, other decoherence processes do manifest beyond those captured in Eq. (9), so that even an ideal purification circuit does not completely restore the initial state. For example, over many instances of the 25-dot chain with $t_c = 30$ µeV and $\mathrm{SD}_\phi = \pi/4$, concurrences after a single ideal round of purification average 99.5%. Unlike for the cavity-based alternative, we do not report rigorous success probability estimates here, since these depend strongly on the valley phase differences of the particular device.
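The key claim, that the "11" branch returns the ideal $|\psi^+\rangle$ regardless of the damping and the phase, can be checked numerically. The sketch assumes a concrete (hypothetical) form of the shuttled pair: $|\psi^+\rangle$ after amplitude damping with $\gamma = 2\epsilon$ on the shuttled qubit plus a phase $\phi$, and an ideal bilateral-CNOT purification round. Conventions ($|{\uparrow}\rangle \equiv |0\rangle$, qubit order A1, B1, A2, B2) are assumptions; the success probability the sketch reports depends on this parametrisation.

```python
import numpy as np

UP, DOWN = 0, 1  # |up> = |0>, |down> = |1>


def ket(*bits):
    """Computational basis state; the first bit is the most significant."""
    v = np.zeros(2 ** len(bits), dtype=complex)
    v[int("".join(str(b) for b in bits), 2)] = 1.0
    return v


def shuttled_pair(eps, phi):
    """Hypothetical model: |psi+> after damping (gamma = 2*eps) on the shuttled
    qubit, with a phase phi on the |down, up> component."""
    psi = ket(UP, DOWN) + np.sqrt(1 - 2 * eps) * np.exp(1j * phi) * ket(DOWN, UP)
    psi /= np.linalg.norm(psi)
    dd = ket(DOWN, DOWN)
    return (1 - eps) * np.outer(psi, psi.conj()) + eps * np.outer(dd, dd.conj())


def cnot(n, control, target):
    """Permutation matrix of a CNOT on n qubits (qubit 0 is the most significant)."""
    u = np.zeros((2 ** n, 2 ** n))
    for i in range(2 ** n):
        bits = [(i >> (n - 1 - k)) & 1 for k in range(n)]
        if bits[control]:
            bits[target] ^= 1
        u[int("".join(map(str, bits)), 2), i] = 1.0
    return u


def purify_once(rho_pair):
    """Bilateral CNOT from pair 1 to pair 2; keep pair 1 if pair 2 reads '11'."""
    rho = np.kron(rho_pair, rho_pair)  # qubit order: A1, B1, A2, B2
    u = cnot(4, 1, 3) @ cnot(4, 0, 2)
    rho = u @ rho @ u.T
    proj = np.kron(np.eye(4), np.outer(ket(DOWN, DOWN), ket(DOWN, DOWN)))
    rho = proj @ rho @ proj
    p_success = np.trace(rho).real
    kept = np.einsum("ijkj->ik", rho.reshape(4, 4, 4, 4)) / p_success
    return kept, p_success


psi_plus = (ket(UP, DOWN) + ket(DOWN, UP)) / np.sqrt(2)
rho_out, p = purify_once(shuttled_pair(eps=0.05, phi=0.7))
print(p, np.real(psi_plus.conj() @ rho_out @ psi_plus))
```

Both surviving terms pick up the same phase $e^{i\phi}$, so it factors out globally and the post-selected pair is exactly $|\psi^+\rangle$.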

IV. COMPARISON OF ENTANGLEMENT DISTRIBUTION MODES
Let us now compare the imperfect Bell pairs that we obtain via the cavity- and shuttling-based approaches. We show their corresponding density matrices before and after two (cavity) and one (shuttling) rounds of entanglement purification in Fig. 9.
As such, we conclude that both mechanisms can provide shared Bell pairs of sufficiently high grade that subsequent entanglement purification yields final fidelities of about 99.5% or better. Interestingly, the low-rank nature of the noise in the case of shuttling (only 2 brown bars on the diagonal of the density matrix in Fig. 9, not 3) means that the purification process can be simpler: a single stage suffices. The full-rank noise predicted for the cavity-mediated link (3 brown bars on the diagonal in Fig. 9) does, however, require multi-round purification to reach high-grade final Bell states. A second distinction is that a shuttling channel could simultaneously transport a number of spins in a 'pipeline' mode, whereas multiple simultaneous use of a cavity is possible in principle but may be more difficult to achieve in practice. While these comparison points provide an interesting perspective, in reality a chip architect is unlikely to select freely between these mechanisms; rather, the choice would be determined by the desired length of the link. Shuttling is likely to be relevant in the 1-10 µm range, with cavity-based links appropriate for longer distances of up to several mm (or even between chips).

FIG. 9. Comparing output density matrices in the case of cavity-based and shuttling-based approaches (panels: before and after purification, for cavity and shuttling). Absolute values of matrix elements in the Bell basis assuming imperfection parameters $T_{2,c}$ = 400 ns (top) and $\mathrm{SD}_\phi = \pi/4$ (bottom). The matrix associated with shuttling before purification corresponds to an average over multiple instances of 25-dot chains and is given up to local rotations. We need 2 rounds for the cavity-based approach, and only 1 for the shuttling-based one. The fidelity with respect to the closest of the four canonical Bell states increases from 94.5% to 99.5% for the cavity-based method, and from 95.7% to 99.6% for shuttling.

V. APPLICATIONS SUITED TO MULTICORE
Let us now consider a number of potential applications building on the above interlinked multicore model. The key characteristic here is that inter-core operations can be assumed to be of high fidelity but are a limited or 'expensive' resource in comparison to 'cheaper' intra-core operations, which will be faster and (we suppose) capable of parallel operation over the core. We identify several important applications that are compatible with these features, and they may be of particular relevance to near-term quantum computers. We begin by arguing that recently proposed, exponentially powerful error mitigation techniques are eminently compatible with the multicore paradigm, and we simulate a VQE task enabled by such mitigation.

FIG. 10. A derangement operation $D_n$ is then applied immediately prior to measurement, using the copies to validate each other. Errors in estimating the observable $\sigma$ are suppressed exponentially with the number $n$ of cores. As each core consists of $N$ computational qubits, we show that the derangement circuit can be implemented efficiently by distributing $N(n-1)$ Bell pairs between the quantum cores and performing $N(n-1)$ controlled-SWAP operations locally. This figure has been adapted from Ref. [11].

A. Exponential Error Suppression
In the following we focus on the recently introduced Error Suppression by Derangements (ESD) and Virtual Distillation (VD) approaches [11,12]. These error mitigation techniques can achieve exponential suppression of hardware noise, which is reminiscent of true quantum error correction. However, the technique requires significantly fewer resources than quantum error correction and is compatible with near term quantum devices. As such, we achieve exponential suppression by preparing n identical copies of a computational quantum state, which fits well with our proposal of utilising multi-core quantum processors.
In particular, we use n quantum cores to perform the same quantum computation in parallel. We use these (near) identical copies of the computational state to effectively verify each other via the derangement circuit as illustrated in Fig. 10. Following Ref. [11], if we entangle these copies with the derangement circuit and estimate the probability $\mathrm{Prob}_0$ that the ancilla qubit collapses into state $|0\rangle$, we can formally obtain the expectation value $\mathrm{Tr}[\rho^n \sigma]/\mathrm{Tr}[\rho^n]$. This allows us to suppress errors in estimating the expectation value of an observable $\sigma$ exponentially in n. The main limitation of the approach is that a small coherent mismatch in the dominant eigenvector of the state $\rho$ may ultimately bias our estimates; however, this mismatch is exponentially less severe than the incoherent decay of the fidelity [82]. Furthermore, the derangement circuits used to entangle the two registers are considerably shallower than typical computational quantum circuits used even in the context of near-term quantum algorithms [11]. As such, the following three properties of the approach make it particularly relevant for on-chip multicore architecture designs. First, the two (or more) input quantum states $\rho_1$ and $\rho_2$ can be prepared completely independently in two physically separate cores. Second, they are entangled with weak quantum links between the cores immediately prior to the ancilla measurement. Third, the entangling derangement operation is shallow: it decomposes into N elementary controlled-SWAP operations between pairs of qubits, where N is the number of computational qubits in the individual cores (registers).
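The exponential suppression can be seen directly from the estimator $\mathrm{Tr}[\rho^n\sigma]/\mathrm{Tr}[\rho^n]$. The sketch below evaluates it with dense matrices (not via the derangement circuit) for a hypothetical two-qubit state under global depolarising noise of strength 0.2; the helper `vd_estimate` and all parameters are illustrative assumptions.

```python
import numpy as np


def vd_estimate(rho, sigma, n):
    """ESD/VD estimator Tr[rho^n sigma] / Tr[rho^n], evaluated directly."""
    rho_n = np.linalg.matrix_power(rho, n)
    return (np.trace(rho_n @ sigma) / np.trace(rho_n)).real


a = 0.3
psi = np.array([np.cos(a), 0, 0, np.sin(a)])        # cos(a)|00> + sin(a)|11>
sigma = np.kron(np.diag([1.0, -1.0]), np.eye(2))    # observable Z (x) I
p = 0.2                                             # assumed depolarising strength
rho = (1 - p) * np.outer(psi, psi) + p * np.eye(4) / 4
exact = psi @ sigma @ psi

for n in range(1, 5):
    print(f"n = {n}: error = {abs(vd_estimate(rho, sigma, n) - exact):.2e}")
```

Global depolarising noise leaves the dominant eigenvector equal to the ideal state, so each extra copy suppresses the error by roughly the ratio of the small to the large eigenvalue of $\rho$.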
We note that Ref. [11] provides decompositions of derangement circuits for arbitrary n into local, elementary entangling gates whereas Ref. [12] focused on the scenario of n = 2 copies without requiring an ancilla qubit. In the following we outline an alternative implementation that is compatible with both the above techniques and utilises macroscopically separate quantum cores (which could be on a single chip for both methods, but could also be on multiple chips for the cavity-based method) with weak entangling links between them.
Let us first note that distributing Bell pairs in principle enables universal quantum computation, as it allows us to implement arbitrary long-range quantum gates (via quantum gate teleportation). However, we take a different route and propose an approach that merely uses N distributed Bell pairs to teleport single-qubit states between the two quantum cores. Let us now explain the technique for the specific example of n = 2 copies. Refer also to Fig. 1 for a schematic illustration of the process.
(1) Teleport single-qubit state: we aim to implement the quantum teleportation protocol in Fig. 11(a) to formally swap the state of a single computational qubit from core 2 into a buffer qubit in core 1. This requires the preparation and distribution of a single Bell pair between the two cores as well as local operations and classical communication (LOCC).
(2) Apply controlled-SWAP: once the single-qubit state has been teleported into core 1, we can implement the controlled-SWAP operation (the elementary building block of the derangement circuit) locally in core 1, as illustrated in Fig. 11(b).
The process of quantum-state teleportation (1) and consequent application of the controlled-SWAP operation (2) is then repeated N times for all computational qubits. Note that during this process the buffer qubit in core 1 is always reset, while the computational qubits in core 2 are all measured out. Hence one of the copies of the computational state is destroyed, but this does not affect the measurement outcome of the ancilla qubit.
Our technique can be naturally generalised to the case of n copies in a number of ways. For example, we may distribute Bell pairs between cores (1−2), (1−3), ..., (1−n) and use them to teleport n − 1 single computational qubit states to buffer qubits in core 1. We can then implement the controlled-derangement operator locally on the n copies of the single-qubit state and repeat the procedure N times for all computational qubits.

FIG. 11. (a) Circuit for teleporting a single-qubit state from core B to core A by consuming a Bell pair (wavy line) that had been distributed between the two cores. (b) After teleporting the single-qubit state, we can locally apply the derangement circuit in core A, i.e., the controlled-SWAP operation between the two single qubits. The lower qubit is reset, and repeating this procedure for all computational qubits k implements the entire derangement circuit. Imperfections in the Bell pairs have a formally equivalent effect to a depolarising channel applied during the computation, and therefore the derangement circuit mitigates these imperfections. (c) Numerical simulation of a spin-ring Hamiltonian. Error in estimating the ground state energy as a function of the expected number of errors in the state-preparation circuit (ξ). The unmitigated errors (red squares) are impressively removed by the error suppression technique (assuming noiseless derangement circuits) using n = 2, 3, 4 and n → ∞ copies of the computational state (dashed lines). In the practically relevant regime 0.1 ≤ ξ ≤ 5, the performance of the ESD/VD technique is similar to the ideal one even when taking into account imperfections in the derangement circuit (brown stars), imperfections in the Bell pair distribution (black crosses), or both (magenta circles).
The above approach has one significant advantage in the context of error mitigation: we prove in Appendix C1 that imperfections in the Bell-state preparation or measurement errors are guaranteed to result in a formal application of a single-qubit depolarising error channel to the individual computational qubits, as illustrated in Fig. 11(b). This is highly advantageous, since imperfections in the long-range quantum teleportations are formally part of the state-preparation process, and the derangement circuit is guaranteed to exponentially suppress these incoherent error contributions.
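A special case of this statement is easy to verify numerically. The sketch below teleports a qubit through a Werner-form pair of fidelity f (an assumed noise model; Appendix C1 treats the general case), with ideal local gates and measurements, and averages over the four measurement outcomes. The output matches a single-qubit depolarising channel $(1-p)\rho + \frac{p}{3}(X\rho X + Y\rho Y + Z\rho Z)$ with $p = 1 - f$. All helper names are hypothetical.

```python
import itertools

import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
P = [np.diag([1.0, 0.0]).astype(complex), np.diag([0.0, 1.0]).astype(complex)]


def kron(*ops):
    out = np.eye(1, dtype=complex)
    for op in ops:
        out = np.kron(out, op)
    return out


def werner_pair(f):
    """Assumed link model: |Phi+> with weight f, rest spread over the other Bell states."""
    phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
    bell = np.outer(phi_plus, phi_plus.conj())
    return f * bell + (1 - f) / 3 * (np.eye(4) - bell)


def teleport(rho_in, rho_pair):
    """Teleport qubit 0 onto qubit 2, averaging over the Bell-measurement outcomes."""
    rho = np.kron(rho_in, rho_pair)  # order: input, sender's half, receiver's half
    u = kron(H, I2, I2) @ (kron(P[0], I2, I2) + kron(P[1], X, I2))  # CNOT(0->1), then H
    rho = u @ rho @ u.conj().T
    out = np.zeros((2, 2), dtype=complex)
    for m1, m2 in itertools.product((0, 1), repeat=2):
        proj = kron(P[m1], P[m2], I2)
        fix = kron(I2, I2, np.linalg.matrix_power(Z, m1) @ np.linalg.matrix_power(X, m2))
        branch = fix @ proj @ rho @ proj @ fix.conj().T
        out += np.einsum("ijkijl->kl", branch.reshape(2, 2, 2, 2, 2, 2))
    return out


f = 0.995
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
out = teleport(np.outer(plus, plus.conj()), werner_pair(f))
print(np.real(plus.conj() @ out @ plus), (2 * f + 1) / 3)
```

For f = 99.5% this corresponds to depolarising noise of probability 0.5% on the teleported qubit, the figure used in the simulations below.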
Practical questions that arise in our setting are the following. (a) Is the Bell-pair generation rate high enough that distributing N Bell pairs immediately prior to measurement is sufficiently fast compared with the main computation? We can answer this affirmatively provided that distributing a Bell pair takes a time comparable to local operations, which we expect to be the case for both the cavity-based and shuttling alternatives. (b) Can 'memory qubits' be manufactured that buffer all N Bell pairs, thereby enabling them to be generated in parallel with the main computation?
We numerically simulate the present protocol in the following using realistic error rates of Bell pairs as we have established above.

B. Numerical simulations
We simulate a variational quantum eigensolver (VQE) application and consider a spin-ring Hamiltonian with a constant coupling J = 0.1 and uniformly randomly generated on-site interaction strengths $\omega_k \in [-1, 1]$, as relevant in the context of many-body localisation [83][84][85]. We explicitly simulate N = 6 qubits and n = 2 copies (equivalent to a 26-qubit pure-state simulation), and we assume the ground state is prepared via a variational Hamiltonian ansatz of l = 20 layers to a precision of $10^{-4}$ [86][87][88][89][90]. Refer to Appendix C2 for more details.
We simulate quantum cores that can locally implement parametrised controlled-Z entangling gates as well as single-qubit rotation gates, and we assume that the error rate of single-qubit gates is 5 times smaller than that of the entangling gates. While such a ratio is very common in experimental systems, we do not intend to capture the exact noise characteristics of state-of-the-art entangling gates and simply note that the literature is evolving rapidly [76]. Independently of the gate error rates, the main computation requires overall $3Nl = 360$ applications of local entangling gates. Furthermore, we adapt techniques of [11] for implementing the derangement circuit using local entangling gates: implementing the derangement circuit requires $5N = 30$ applications of local entangling gates as well as the distribution of N = 6 Bell pairs. We can therefore expect that, owing to its modest resource requirements, the derangement circuit is much less affected by gate noise than the main computation.
We simulate the approach assuming noisy Bell pairs of fidelity f that have been prepared via the long-range entangling links outlined above. In the following we assume f = 99.5%, which we have shown above is reachable with current state-of-the-art technologies, and we can expect this figure to improve with future hardware developments. As discussed above, we efficiently model the process of teleporting single-qubit states by formally applying single-qubit depolarising noise of probability 0.5% after every qubit in one of the copies of the computational quantum state. Fig. 11 shows energy estimation errors as a function of the expected number of errors (circuit error rate ξ) in the state-preparation circuit. Since the measurement cost of error mitigation techniques generally increases exponentially with the circuit error rate, we focus on the practically most important regime 0.1 ≤ ξ ≤ 5. In this regime, the unmitigated errors (red squares) are significantly reduced even when we take into account the imperfections in the derangement circuit (brown stars), the imperfections in the Bell-state preparation (black crosses), or both (magenta circles). The error due to imperfect Bell pairs (black crosses) approaches a very small constant error for ξ → 0. This error is due to a small coherent shift in the dominant eigenvector of the quantum state ρ introduced by the formal application of N single-qubit depolarising noise channels immediately after the main computation.
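The origin of the residual floor can be reproduced in a two-qubit toy model. In the sketch below (a hypothetical example, not the VQE simulation), local depolarising noise of probability 0.5% is applied to each qubit of a non-maximally entangled state; this reduces the coherence relative to the populations, rotates the dominant eigenvector, and so the estimator $\mathrm{Tr}[\rho^n\sigma]/\mathrm{Tr}[\rho^n]$ converges to a slightly wrong value as n grows. The state angle and observable are assumptions.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0 + 0j, -1.0])
I2 = np.eye(2, dtype=complex)


def depolarise_qubit(rho, qubit, p, n_qubits):
    """(1 - p) rho + p/3 (X rho X + Y rho Y + Z rho Z) on one qubit."""
    out = (1 - p) * rho
    for pauli in (X, Y, Z):
        ops = [I2] * n_qubits
        ops[qubit] = pauli
        full = ops[0]
        for op in ops[1:]:
            full = np.kron(full, op)
        out = out + (p / 3) * full @ rho @ full.conj().T
    return out


def vd_estimate(rho, sigma, n):
    rho_n = np.linalg.matrix_power(rho, n)
    return (np.trace(rho_n @ sigma) / np.trace(rho_n)).real


a, p = 0.3, 0.005  # hypothetical state angle; 0.5% noise per qubit
psi = np.array([np.cos(a), 0, 0, np.sin(a)], dtype=complex)
rho = np.outer(psi, psi.conj())
for q in range(2):
    rho = depolarise_qubit(rho, q, p, 2)
sigma = np.kron(Z, I2)
exact = np.real(psi.conj() @ sigma @ psi)
print("unmitigated error:", abs(vd_estimate(rho, sigma, 1) - exact))
print("large-n VD error: ", abs(vd_estimate(rho, sigma, 20) - exact))
```

In this toy the floor is smaller than the unmitigated error but does not vanish with more copies, mirroring the behaviour of the black crosses for ξ → 0.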
While these simulations confirm that the ESD/VD technique is impressively robust to imperfections in the long-range entangling links, they come at an increased measurement cost. In particular, since imperfections in the N distributed Bell pairs are equivalent to local depolarising channels, we estimate that the probability that no error occurs during the long-range teleportation is $f^N$, where f is the fidelity of the Bell pairs. While the ESD/VD technique filters out erroneous contributions, the probability of an error-free outcome is attenuated, and therefore more samples are required to resolve the expectation value to sufficient accuracy. For example, in the present case of N = 6 qubits the probability of no errors occurring during teleportation is $0.995^6 \approx 0.97$, which is not significantly attenuated. However, when scaling up computations this probability decreases exponentially with N. Nevertheless, even with, e.g., 100 qubits, we still obtain the encouraging probability $0.995^{100} \approx 0.61$.
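The $f^N$ estimate and its scaling are simple to tabulate. The helper names below are hypothetical, and `max_qubits` is an illustrative extra: the largest N for which the error-free probability stays above a chosen threshold.

```python
import math


def no_error_probability(f, n_qubits):
    """Probability that none of the N teleportations introduces an error: f**N."""
    return f ** n_qubits


def max_qubits(f, p_min):
    """Largest N with f**N >= p_min."""
    return math.floor(math.log(p_min) / math.log(f))


for n_qubits in (6, 100):
    print(n_qubits, round(no_error_probability(0.995, n_qubits), 3))
print("largest N with f**N >= 0.5:", max_qubits(0.995, 0.5))
```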
In summary, our numerical simulations confirm that the ESD/VD technique is indeed compatible with the weakly connected multicore concept and imperfections in the long-range links are impressively well mitigated by the derangement circuits.

C. Other Applications
Besides exponential error suppression, there are a number of other problems that can be efficiently implemented using our modular architecture.
The simplest example is the SWAP-test, whereby we prepare two different states $|\psi_1\rangle$ and $|\psi_2\rangle$ in two input registers (quantum cores) and measure their overlap $|\langle\psi_1|\psi_2\rangle|^2$ using the controlled-SWAP operation. This operation is directly analogous to what we have done for the 2-copy error mitigation scheme in Fig. 10. The SWAP-test is an elementary subroutine crucial for the implementation of a number of important algorithms, including the following: (a) finding excited states of quantum systems, such as in quantum chemistry [91,92]; (b) simulating the quantum dynamics of mixed quantum states and general processes [93,94]; (c) implementing the quantum natural gradient optimisation approach in variational quantum eigensolvers and in other variational quantum algorithms [95][96][97].
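A statevector sketch of the single-qubit SWAP-test: Hadamard on an ancilla, controlled-SWAP of the two registers, Hadamard again, then the probability of reading the ancilla as $|0\rangle$, which equals $(1 + |\langle\psi_1|\psi_2\rangle|^2)/2$. This is the textbook circuit with single-qubit registers for brevity; the helper names are hypothetical.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)


def cswap_matrix():
    """Controlled-SWAP on |ancilla, q1, q2> (ancilla is the most significant qubit)."""
    u = np.zeros((8, 8))
    for i in range(8):
        a, q1, q2 = (i >> 2) & 1, (i >> 1) & 1, i & 1
        if a:
            q1, q2 = q2, q1
        u[(a << 2) | (q1 << 1) | q2, i] = 1.0
    return u


def swap_test_prob0(psi1, psi2):
    """Probability of measuring the ancilla in |0>; equals (1 + |<psi1|psi2>|^2) / 2."""
    state = np.kron(np.array([1.0, 0.0]), np.kron(psi1, psi2)).astype(complex)
    h_anc = np.kron(H, np.eye(4))
    state = h_anc @ cswap_matrix() @ h_anc @ state  # H, then CSWAP, then H
    return float(np.sum(np.abs(state[:4]) ** 2))


zero, plus = np.array([1.0, 0.0]), np.array([1.0, 1.0]) / np.sqrt(2)
print(swap_test_prob0(zero, plus))
```

Repeating the test and inverting $\mathrm{Prob}_0 = (1 + |\langle\psi_1|\psi_2\rangle|^2)/2$ yields the overlap, with only the controlled-SWAPs requiring inter-core resources.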
Our modular architecture is also well-suited for performing simulations for problems with a Hamiltonian that is modular in nature (has clusters of subsystems). In these cases, the ansatz for the variational algorithm of such problems can be implemented natively on our architecture for efficient simulations. There are many interesting physical problems of this kind [98], including problems in chemistry [99,100], many-body physics [101,102], quantum field theory [103,104], and quantum gravity [105,106]. Supporting the prospects for successful quantum advantage in such tasks, there are studies anticipating the challenges of compiling onto target hardware with specific topologies [107][108][109] and examinations of whether modules of modest size can 'punch above their weight' in simulating more complex quantum systems [3,110,111]. However, the requirements on the quality of the communication channels may depend entirely on the structure of the problem.
It is also possible to implement quantum error correction codes on such a modular architecture. The simplest example would be code concatenation, in which each module will implement a base code, and the long range links among the modules will implement another code at the logical level of the base code. A more concrete example would be the modular way to implement large scale surface code studied in Ref. [112]. They found a high error threshold for the noise in the long-range connections among the modules even without purification. In the limit of small modules of a few qubits or a few tens of qubits, there are detailed studies of the most efficient means of supporting fault tolerant codes [24,26]. More recently, Ref. [113] established that it is possible to perform constant overhead quantum error correction in a 2D architecture with non-crossing long range connections. Though this is not a modular architecture, the long range gates that we have studied in this article will still be highly relevant to their practical implementation in silicon.

VI. DISCUSSION AND CONCLUSION
In this work we have investigated the practicality and utility of an interlinked multicore architecture for quantum computing. We have focused on silicon-based devices, where the paradigm is particularly relevant: it is natural to consider a large number of independent, noisy quantum cores due to their inexpensive fabrication process and the small surface area required by a given modest-sized processor. Recognising that inter-core operations may be fundamentally noisier than intra-core processes, we eschewed the goal of performing direct unitary two-qubit quantum gates between remote qubits. Instead, we focused on distributing Bell pairs between the processors. This allowed us to use 'weak', noisy entangling links to generate raw Bell pairs, which can then be upgraded to far higher fidelities using powerful entanglement distillation techniques.
We briefly reviewed the experimental literature on physical realisations of potential 'quantum links' that have already been demonstrated in state-of-the-art experiments, and identified cavity- and shuttling-based techniques as the most promising ones for our proposal. We have comprehensively explored the advantages and limitations of both techniques through numerical simulations. In our modelling we identified and accounted for the leading error and noise contributions in the long-range entangling links. Assuming parameter regimes that are realistic in current experiments, our numerical simulations confirm that both techniques (in combination with entanglement distillation) would support the generation of Bell pairs between distant quantum cores with fidelities of about 99.5%. Our simulations also confirmed that slightly increasing the number of rounds of entanglement distillation allows us to obtain near-perfect Bell pairs approaching the error levels of local, intra-core operations. To inform our models with realistic parameters, such as relaxation rates or gate and measurement times, we have used the best reported experimental values; even though no single experiment achieves all those values at once, we note that they are very much moving targets and the numbers are constantly improving.
We reviewed a number of potential applications which build on a modular computational architecture. Most importantly, we identify that the recently introduced Error Suppression by Derangements [11] and Virtual Distillation [12] techniques fit well with our proposal, since these error mitigation schemes assume that n copies of a quantum computation are performed independently in separate cores. Through our long-range links, the copies are entangled via a derangement operation which bridges the n computational cores and this allows us to mitigate errors exponentially in n. The great advantage of our proposal is that the long-range links are formally part of the main computation and thereby all experimental imperfections are mitigated by the derangement operation.
We numerically simulated such a practically relevant application: a VQE that finds the ground state of a spin system. These simulations confirm that even with Bell pairs of fidelity 99.5%, as achievable with current technology, we can impressively suppress the errors of local quantum gates in the main computation and thereby obtain accurate expectation values of observables in practical settings. We have also identified a number of other potential applications, such as the SWAP test, which are highly relevant in the context of both near-term and error-corrected quantum computations.
We finally conclude that long-range entangling operations that have already been realised in experiments can be used for linking a multitude of silicon-based computational quantum cores. Such a modular architecture would enable a variety of near-term applications and, in particular, would enable powerful error mitigation techniques to be implemented in silicon devices. While in the present work we considered silicon architectures, we remark that these concepts naturally generalise to other platforms as well: distributing and purifying Bell pairs between quantum devices will enable powerful error mitigation and other applications on those platforms via the techniques outlined in this work.
Note added - We note that a paper recently appeared [arXiv:2202.11793] that investigates shuttling arrays, as in the present work, in great detail and reaches even more encouraging fidelity estimates. While we were finalising this publication, a paper appeared [arXiv:2208.11151] that investigates cavity-based links between silicon spin qubits in the resonant regime, similarly to the present work, and reaches similar conclusions.

Parameter regime
In our modelling we have explicitly, and numerically exactly, simulated the unitary dynamics generated by the Hamiltonian $H$ in Eq. (3) in a resonant parameter regime, i.e., we did not use a rotating-frame approximation. To ensure numerical stability we exponentiated the matrix $-itH$ numerically exactly after projecting the bosonic creation and annihilation operators onto the subspace spanned by the first seven Fock states (eigenstates with a fixed number of photons). Furthermore, we computed the matrix $-itH$ explicitly assuming a set of dimensionless parameters; the resulting optimised values are reported in Table III. Recall that, given a Hamiltonian $H = aH_1 + bH_2$ in terms of the dimensionless parameters $a$ and $b$, we can map the dynamics under the dimensionless time $t$ to any physical set of parameters via the symmetry
$$e^{-it(aH_1 + bH_2)} = e^{-i(t/\lambda)(\lambda a H_1 + \lambda b H_2)} = e^{-it'(a' H_1 + b' H_2)}.$$
Here we have introduced a new set of parameters $a' = \lambda a$, $b' = \lambda b$ and $t' = t/\lambda$, with an arbitrary scaling factor $\lambda$. We choose the scaling factor $\lambda := 100$ MHz such that the resulting optimal physical parameters fall into a regime that can be realised with near-term technology (see [114] for the possibility of reaching large charge-cavity couplings). Note, however, that the optimal unitary dynamics can be mapped to any other regime: e.g., choosing $\lambda := 200$ MHz allows faster generation of Bell pairs ($t' = 7.5$ ns) but requires the fabrication of couplings twice as large (and all other dynamical parameters twice as large).
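The rescaling symmetry can be checked numerically in a few lines. This is a minimal sketch under stated assumptions: it does not reproduce Eq. (3); `H1` and `H2` are hypothetical stand-ins (a cavity number term and an exchange coupling between a bosonic mode, truncated to the first seven Fock states as in the text, and a single two-level system), and the values of `a`, `b`, `t` and `lam` are arbitrary. It confirms that scaling the parameters by $\lambda$ and the time by $1/\lambda$ leaves the propagator unchanged.

```python
import numpy as np
from scipy.linalg import expm

N_FOCK = 7  # truncate the bosonic mode to the first seven Fock states

# Truncated annihilation operator: a|n> = sqrt(n)|n-1>
a_op = np.diag(np.sqrt(np.arange(1, N_FOCK)), k=1)
# Ladder operator of a hypothetical two-level subsystem
sigma_minus = np.array([[0, 1], [0, 0]], dtype=complex)

# Hypothetical stand-ins for the two Hamiltonian terms (not Eq. (3) of the text)
H1 = np.kron(a_op.conj().T @ a_op, np.eye(2))  # cavity number term
H2 = (np.kron(a_op.conj().T, sigma_minus)
      + np.kron(a_op, sigma_minus.conj().T))   # Hermitian exchange coupling

def propagator(a, b, t):
    """exp(-i t (a H1 + b H2)) for parameters a, b and time t."""
    return expm(-1j * t * (a * H1 + b * H2))

a, b, t = 1.0, 0.3, 1.5   # arbitrary dimensionless values
lam = 100.0               # arbitrary scaling factor
U_dimless = propagator(a, b, t)
U_scaled = propagator(lam * a, lam * b, t / lam)  # a' = lam a, b' = lam b, t' = t / lam
print(np.allclose(U_dimless, U_scaled))  # True
```

The check is exact up to floating-point rounding, because the product of time and Hamiltonian is identical on both sides; only the interpretation of the units changes.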
While the choice of $\lambda$ is an exact symmetry of the unitary dynamics, there is also an approximate symmetry (a pseudosymmetry). As discussed above, we explicitly take into account and simulate the effect of the magnetic field $B$, which is orders of magnitude larger than all other dynamical parameters. Given that one could make the rotating-frame approximation and remove this interaction without significantly affecting the dynamics, we have confirmed in numerical simulations that increasing or decreasing the value of $B$ by a small factor, e.g., 2 or 3, affects the resulting dynamics only to a very small extent. This assures us that an experimental system is robust to the choice of $B$ as long as the resonant condition is satisfied.

Table caption (parameter definitions): $B$ is the parallel magnetic field used to lift the spin degeneracy, $b_{x,1}$ ($b_{x,2}$) is the transverse magnetic field above the first (second) DQD, which hybridises the spin and charge degrees of freedom, $g_{c,1}$ ($g_{c,2}$) is the dipole coupling constant between the cavity and the first (second) DQD, and $\epsilon_1$ ($\epsilon_2$) is the detuning.

State of the cavity after charge measurements
To be able to reuse the cavity straight after having generated a Bell pair, we need to end the protocol with the cavity in the vacuum state. One possibility would be to measure both the charges and the cavity, and deem the result successful only if we find the charges in either $|LR\rangle$ or $|RL\rangle$ and the cavity in the vacuum state. However, this would decrease the success probability compared with the case where we only perform charge measurements. Fortunately, after a successful charge measurement the probability of finding the resonator in the vacuum state is high, $\approx 0.96$ (see Fig. 12). Even though this probability is high, one may need to consider the cavity heating up during repeated, consecutive Bell-pair generations. Nevertheless, given that cavities in current experiments have typical decay rates of $\approx 1$ MHz [41], dissipation should occur on timescales shorter than a single round of entanglement purification. Furthermore, the cavity is only used for a very small fraction of the procedure, allowing cooling protocols to be performed periodically if necessary.

Lindblad master equation
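The evolution in time of a $d$-dimensional quantum system in the state $\rho$ is given by the Lindblad master equation
$$\frac{d\rho}{dt} = \mathcal{L}\rho = -i[H, \rho] + \sum_i \gamma_i \left( A_i \rho A_i^\dagger - \tfrac{1}{2}\{A_i^\dagger A_i, \rho\} \right), \tag{A1}$$
where $\gamma_i$ are positive rates and $A_i$ are the Lindblad (jump) operators. Since the Hamiltonian is time independent, we can exponentiate the superoperator $\mathcal{L}$, and the above equation can be rewritten as
$$\rho(t) = e^{\mathcal{L}t}\rho(0). \tag{A2}$$
We can use standard numerical techniques for the exponentiation [115] if we represent $\rho$ and $\mathcal{L}$ as a vector and a matrix, respectively. To do so, we flatten $\rho$ by stacking its rows one after the other, so that $\rho$ becomes a vector $|\rho\rangle\rangle$ of length $d^2$. The superoperator $\mathcal{L}$ becomes a $d^2 \times d^2$ matrix by applying the following rules, where the left-hand side is the operation in the operator formalism and the right-hand side the equivalent operation in the matrix-vector formalism:
$$A\rho \to (A \otimes \mathbb{1})|\rho\rangle\rangle, \qquad \rho B \to (\mathbb{1} \otimes B^{T})|\rho\rangle\rangle, \qquad A\rho B \to (A \otimes B^{T})|\rho\rangle\rangle.$$
Eq. (A1) then becomes
$$\frac{d}{dt}|\rho\rangle\rangle = \Big[-i\left(H \otimes \mathbb{1} - \mathbb{1} \otimes H^{T}\right) + \sum_i \gamma_i \left(A_i \otimes A_i^{*} - \tfrac{1}{2} A_i^\dagger A_i \otimes \mathbb{1} - \tfrac{1}{2}\, \mathbb{1} \otimes (A_i^\dagger A_i)^{T}\right)\Big]|\rho\rangle\rangle.$$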
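A minimal sketch of the vectorisation, assuming NumPy's default row-major (C-order) `reshape`, which stacks the rows of $\rho$ exactly as described above. The qubit Hamiltonian, the single amplitude-damping jump operator and the rate `gamma` are hypothetical illustration choices; the check compares the matrix form of the generator with a direct evaluation of the right-hand side of Eq. (A1).

```python
import numpy as np

def liouvillian(H, jumps, rates):
    """Matrix form of the Lindblad superoperator for row-stacked density matrices.

    Uses vec(A rho B) = (A kron B^T) vec(rho), which holds for row-major flattening.
    """
    d = H.shape[0]
    eye = np.eye(d)
    L = -1j * (np.kron(H, eye) - np.kron(eye, H.T))
    for A, g in zip(jumps, rates):
        AdA = A.conj().T @ A
        L += g * (np.kron(A, A.conj())
                  - 0.5 * np.kron(AdA, eye)
                  - 0.5 * np.kron(eye, AdA.T))
    return L

def lindblad_rhs(rho, H, jumps, rates):
    """Direct evaluation of d rho / dt in the operator formalism (Eq. (A1))."""
    out = -1j * (H @ rho - rho @ H)
    for A, g in zip(jumps, rates):
        AdA = A.conj().T @ A
        out += g * (A @ rho @ A.conj().T - 0.5 * (AdA @ rho + rho @ AdA))
    return out

# Hypothetical example: a driven qubit with amplitude damping
H = np.array([[0.5, 0.2], [0.2, -0.5]], dtype=complex)
sm = np.array([[0, 1], [0, 0]], dtype=complex)
gamma = 0.1
rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]], dtype=complex)

L = liouvillian(H, [sm], [gamma])
vec_rhs = L @ rho.reshape(-1)  # row-stacked vector of length d**2
direct_rhs = lindblad_rhs(rho, H, [sm], [gamma]).reshape(-1)
print(np.allclose(vec_rhs, direct_rhs))  # True
```

A useful sanity check on any such construction is trace preservation: the row-stacked identity must be a left null vector of the matrix.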

Time evolution
There are two ways to numerically simulate Eq. (A2). The first consists of recomputing the exponential $e^{\mathcal{L}t}$ for each time $t$ and applying it to $\rho(0)$. We discarded this method because exponentiating $\mathcal{L}$, which is represented by a $12544 \times 12544$ matrix (i.e., $d = 112$), is time-consuming. The second relies on time discretisation: we compute the exponential only once, for a small time step $dt$, and apply it sequentially to propagate the state, which makes the simulation much faster. For large times $t$ the first method is more accurate, since the second accumulates rounding errors over many steps. However, in our case both techniques give similar results, which supports our choice of the second method.
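The two propagation strategies can be compared on a small example. This is a sketch only: the $4 \times 4$ generator below is a hypothetical single-qubit Liouvillian with amplitude damping and pure dephasing (row-stacked as above), far smaller than the matrix of the text, and `dt` and `n_steps` are arbitrary. Because $e^{\mathcal{L}\,dt}$ is computed once and reused, each step costs a single matrix-vector product.

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical row-stacked Liouvillian of a qubit with amplitude damping (rate g1)
# and pure dephasing (rate g2); no Hamiltonian, for simplicity.
g1, g2 = 0.2, 0.05
sm = np.array([[0, 1], [0, 0]], dtype=complex)
sz = np.diag([1.0, -1.0]).astype(complex)
eye = np.eye(2)
L = np.zeros((4, 4), dtype=complex)
for A, g in ((sm, g1), (sz, g2)):
    AdA = A.conj().T @ A
    L += g * (np.kron(A, A.conj()) - 0.5 * np.kron(AdA, eye) - 0.5 * np.kron(eye, AdA.T))

rho0 = np.array([[0.0, 0.0], [0.0, 1.0]], dtype=complex)  # start in |1><1|
vec0 = rho0.reshape(-1)

dt, n_steps = 0.01, 1000
t_final = dt * n_steps

# Method 1: one exponential for the target time
vec_full = expm(L * t_final) @ vec0

# Method 2: exponentiate once for dt, then propagate step by step
step = expm(L * dt)
vec_steps = vec0.copy()
for _ in range(n_steps):
    vec_steps = step @ vec_steps

rho_full = vec_full.reshape(2, 2)
rho_steps = vec_steps.reshape(2, 2)
print(np.allclose(rho_full, rho_steps))  # True: agreement up to rounding error
```

Here the excited-state population decays as $e^{-g_1 t}$, which gives an independent check on both methods.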

State prior to measurement
We start the dynamics from the initial quantum state $|\psi_\text{init}\rangle = |{-}{-}\rangle_\text{ch}|{\uparrow}{\uparrow}\rangle_\text{sp}|0\rangle_\text{cav}$, such that the charge degrees of freedom are in the separable state $|{-}{-}\rangle_\text{ch}$, the spin degrees of freedom are in the separable state $|{\uparrow}{\uparrow}\rangle_\text{sp}$ and the cavity is in the vacuum state $|0\rangle_\text{cav}$. Since we assume a resonant condition with the parameters $B_1 = B_2 = \omega_r \equiv B$ as well as $2t_{c,1} = 2t_{c,2} = 2t_c \equiv B$, with no detuning ($\epsilon = 0$), the expected energy of this initial state is exactly $E = 0$. Furthermore, any other state obtained from this initial state by pairs of "flips", e.g., $|{-}{+}\rangle_\text{ch}|{\uparrow}{\downarrow}\rangle_\text{sp}|0\rangle_\text{cav}$, has the identical energy $E = 0$, since each flip represents an energy quantum of $\pm B$ and the two flips of such a pair carry opposite signs. Assuming no decoherence, at our optimal evolution time in Fig. 3 we obtain the quantum state of Eq. (A5), in which, with a high fidelity $F \approx 0.89$, the two spins are in the desired Bell state $|\Psi^-\rangle = (|{\uparrow}{\downarrow}\rangle - |{\downarrow}{\uparrow}\rangle)/\sqrt{2}$, up to some coherent error $|\psi_\text{err}\rangle$.
FIG. 14. Energy-level diagram of $H_{2e}$ from Eq. (B2) in the positive-detuning region, where electrons only occupy the ground orbital wavefunction. Here, $U = 1$ meV, $t = 5$ µeV, $E_{Z,L} = 51$ µeV, $E_{Z,R} = 49$ µeV, $E_{V,L} = 90$ µeV, and $E_{V,R} = 110$ µeV. In this experimentally realistic regime, $\Delta E_Z > 0 > \Delta E_V$ and $|\Delta E_V| > |\Delta E_Z|$, although the Zeeman difference has been exaggerated here to enhance the visibility of distinct traces. The state labels are ordered as $|vs\rangle_L|vs\rangle_R$. The highlighted labels indicate the 4 states where the right dot is initially populated with an ancilla ground-valley/ground-spin state and the left dot is populated with a shuttled spin-valley state. Of these 4 states, only those with the left-dot electron in the excited valley state (red) will adiabatically evolve to (0,2), while the states in the ground valley (blue) will remain in (1,1). Therefore, such a double-dot arrangement, along with a proximal charge sensor, can be used to perform a valley-to-charge projective measurement while preserving the spin state.

As we discussed in Section II C, this coherent error is present because we have fewer control parameters in the Hamiltonian than the number of constraints the state needs to satisfy in order to be usefully entangled. Since the desired state above has energy $E = 0$, as it can formally be obtained from the initial state by an even number of energy flips, the coherent error state $|\psi_\text{err}\rangle$ must also have energy $E = 0$ due to energy conservation under unitary evolution. While $|\psi_\text{err}\rangle$ is a complex quantum state that is a superposition of nearly all charge and spin configurations, we find that its dominant contribution is the state
$$|\Phi^-\rangle_\text{ch}\,|{\downarrow}{\downarrow}\rangle_\text{sp}\,|1\rangle_\text{cav}, \tag{A6}$$
which occurs with a probability of $\approx 0.05$ (its fidelity with respect to the full state $|\psi\rangle$). This state indeed has energy $E = 0$: given that the charge is entangled as $|\Phi^-\rangle_\text{ch} = (|LL\rangle - |RR\rangle)/\sqrt{2}$, energy conservation forces the spins into the separable state $|{\downarrow}{\downarrow}\rangle_\text{sp}$ while the cavity also acquires a photon.
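The energy argument can be made explicit with a short bookkeeping sketch. It assumes, consistently with the text, that $|{\uparrow}\rangle$ is the higher-energy spin state and $|{+}\rangle$ the higher-energy charge state, and it measures energies relative to $|\psi_\text{init}\rangle$ under the resonant condition; the counters $n_+$, $n_\downarrow$ and $n_\text{ph}$ (labels introduced here for illustration) count charge excitations, spin flips and photons:
$$E = B\,(n_+ + n_\text{ph} - n_\downarrow).$$
Using $(|LL\rangle - |RR\rangle)/\sqrt{2} = (|{-}{+}\rangle + |{+}{-}\rangle)/\sqrt{2}$, which holds for $|L\rangle, |R\rangle = (|{-}\rangle \pm |{+}\rangle)/\sqrt{2}$ (common alternative sign choices give the same result), the state $|\Phi^-\rangle_\text{ch}$ carries exactly one charge excitation, so
$$|{-}{+}\rangle_\text{ch}|{\uparrow}{\downarrow}\rangle_\text{sp}|0\rangle_\text{cav}: \quad E = B\,(1 + 0 - 1) = 0,$$
$$|\Phi^-\rangle_\text{ch}|{\downarrow}{\downarrow}\rangle_\text{sp}|1\rangle_\text{cav}: \quad E = B\,(1 + 1 - 2) = 0.$$
Both configurations are therefore degenerate with the initial state, which is why the unitary dynamics can populate the error contribution alongside the desired one.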
This motivates our measurement-based protocol: we measure the charge state and only accept the outcomes $|LR\rangle$ or $|RL\rangle$. This does not affect the ideal contribution to the state in Eq. (A5); however, it eliminates the dominant coherent-error contribution of Eq. (A6), since $|\Phi^-\rangle_\text{ch}$ has support only on $|LL\rangle$ and $|RR\rangle$. This necessarily increases the entanglement between the spins, given that the spins are separable in the error contribution of Eq. (A6) that we eliminate. Furthermore, the probability of an empty cavity also increases, given that the state we project out has a non-empty cavity.

Alternative purification protocol
By adding single S gates to the last three rounds of purification, one can obtain a better purification scheme than the one presented in the main text for the cavity-based approach. In Fig. 13, we note that the concurrence is slightly improved in the ideal case. However, in the noisy case, the additional gates introduce extra noise and thus lower the concurrence. With better hardware, it would become advantageous to switch from the scheme presented in the main text to this one.

Average generation rate
Assuming a raw Bell pair is generated in time $T_\text{raw} = 15$ ns and neglecting gate and measurement times in the purification process, the time to generate a purified Bell pair is $T_\text{raw} N_\text{avg}$, where $N_\text{avg}$ is the average number of required raw Bell pairs. More realistically, the average time for generating a single purified Bell pair is
$$T_\text{avg} = N_\text{avg} T_\text{raw} + \frac{N_\text{1qb,avg}}{4} T_\text{1qubit} + \frac{N_\text{2qb,avg}}{2} T_\text{2qubit} + \frac{N_\text{meas,avg}}{2} T_\text{meas},$$
which depends on the average number of single-qubit gates $N_\text{1qb,avg} = 13.6$ and on the average numbers of two-qubit gates and measurements, $N_\text{2qb,avg} = N_\text{meas,avg} = 6.8$. In these figures we assumed $T_{2,c} = 400$ ns, and we have divided $N_\text{1qb,avg}$ by 4 and $N_\text{2qb,avg}$, $N_\text{meas,avg}$ by 2, as the four single-qubit gates, the two two-qubit gates and the two measurements used in a purification round can each be performed in parallel. The generation rate is then given by $g_\text{Bell} = T_\text{avg}^{-1}$. Upon substituting typical state-of-the-art values for single-qubit gate, two-qubit gate and measurement times, $T_\text{1qubit} = 1$ µs, $T_\text{2qubit} = 100$ ns and $T_\text{meas} = 1$ µs [77,116-118], we obtain an average generation rate of $g_\text{Bell} \approx 0.14$ MHz. As we discussed in the main text, for smaller values of $T_{2,c}$ one needs to perform four rounds of purification. On average, this corresponds to $N_\text{avg} \approx 25.4$ and $33.0$ raw Bell pairs to successfully pass the four rounds, leading to average generation rates of $g_\text{Bell} \approx 2.6$ MHz and $g_\text{Bell} \approx 2.0$ MHz ($g_\text{Bell} \approx 24.9$ kHz and $g_\text{Bell} \approx 20.8$ kHz when gate and measurement times are included) for $T_{2,c} = 100$ ns and $T_{2,c} = 50$ ns, respectively.
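The timing model can be evaluated directly. This sketch assumes $T_\text{avg} = N_\text{avg} T_\text{raw} + (N_\text{1qb,avg}/4)\,T_\text{1qubit} + (N_\text{2qb,avg}/2)\,T_\text{2qubit} + (N_\text{meas,avg}/2)\,T_\text{meas}$, i.e., the parallelisation within a purification round described in the text; the function name is hypothetical. It reproduces the four-round rates quoted when gate and measurement times are neglected, and the gate-and-measurement contribution of about 7.1 µs behind the $\approx 0.14$ MHz figure. The value of $N_\text{avg}$ for the $T_{2,c} = 400$ ns case is not quoted, so that example shows only the gate and measurement term, which bounds the rate from above.

```python
def purified_pair_time(n_raw, n_1q, n_2q, n_meas,
                       t_raw=15e-9, t_1q=1e-6, t_2q=100e-9, t_meas=1e-6):
    """Average time (s) to obtain one purified Bell pair.

    Raw-pair generation is sequential; within a purification round the four
    single-qubit gates, two two-qubit gates and two measurements run in parallel,
    hence the divisions by 4, 2 and 2.
    """
    return n_raw * t_raw + (n_1q / 4) * t_1q + (n_2q / 2) * t_2q + (n_meas / 2) * t_meas

# Four purification rounds, gate and measurement times neglected
for n_raw, t2c in ((25.4, "100 ns"), (33.0, "50 ns")):
    rate = 1 / purified_pair_time(n_raw, 0, 0, 0)
    print(f"T2,c = {t2c}: g_Bell ~ {rate / 1e6:.1f} MHz")

# Gate and measurement contribution for T2,c = 400 ns (N_1qb = 13.6, N_2qb = N_meas = 6.8)
t_ops = purified_pair_time(0, 13.6, 6.8, 6.8)
print(f"gate + measurement time ~ {t_ops * 1e6:.2f} us -> rate <= {1 / t_ops / 1e6:.2f} MHz")
```

The measurement and single-qubit gate terms dominate: with the assumed times, each contributes 3.4 µs, whereas the two-qubit gates contribute only 0.34 µs.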
Appendix B: Shuttling

Model Parameters and Details
For all shuttling simulations, a Zeeman splitting of 40 µeV was used, corresponding to an external magnetic field strength of approximately 350 mT, which is comparable to many silicon spin experiments. The inhomogeneous magnetic field $b$ may be decomposed into a transverse gradient $b_x$ and a longitudinal gradient $b_z$ without loss of generality. In order to minimise coherent spin flips, shuttling should take place along an axis with minimal $b_x$. As field gradients on the micron scale should be nearly constant, we assume the total inhomogeneity is evenly distributed along the shuttling channel, with electrons moving only along a linear path, and we consider $b_x = 1$ µeV and $b_z = 3$ µeV.
While large variations in valley splitting may be expected in present devices, substantial effort is being placed on improving heterostructure quality such that future devices may have consistently large ground-state gaps. To emphasise the importance of the valley phase parameter, we assume consistently high valley splittings by randomly generating them from a normal distribution with mean $|\Delta_D| = 75$ µeV and standard deviation $\mathrm{SD}_{|\Delta|} = 10$ µeV. Such a case should apply to both Si/SiGe and Si-MOS devices. We focus on tunnel couplings $t_c$ larger than 20 µeV, as these have been experimentally reported for silicon charge shuttling [63]. A tunnel coupling of order 100 µeV was reported in [65], suggesting that even larger values are realistic.
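Two of the parameter choices above can be checked or reproduced in a few lines. Converting the 40 µeV Zeeman splitting to a field assumes an electron g-factor of exactly 2 (an assumption; the value in silicon is close to 2), and the valley-splitting sampler simply draws from the stated normal distribution; the function names, the seed and the number of dots are hypothetical.

```python
import numpy as np

MU_B_UEV_PER_T = 57.883818  # Bohr magneton in µeV/T
G_FACTOR = 2.0              # assumed electron g-factor

def zeeman_field_tesla(splitting_uev, g=G_FACTOR):
    """Magnetic field (T) giving the Zeeman splitting E_Z = g * mu_B * B."""
    return splitting_uev / (g * MU_B_UEV_PER_T)

def sample_valley_splittings(n_dots, mean_uev=75.0, sd_uev=10.0, seed=None):
    """Draw valley splittings |Delta_D| (µeV), one per dot, from a normal distribution."""
    rng = np.random.default_rng(seed)
    return rng.normal(mean_uev, sd_uev, size=n_dots)

print(f"B ~ {zeeman_field_tesla(40.0) * 1e3:.0f} mT")  # ~346 mT, i.e. roughly 350 mT
splittings = sample_valley_splittings(20, seed=1)
print(splittings.round(1))
```

With a mean of 75 µeV and a standard deviation of 10 µeV, negative draws are vanishingly rare, so no truncation of the distribution is applied in this sketch.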
In order to simulate the evolving entanglement between two spins, labelled 1 and 2, a second spin is added to the Hamiltonian of Eq. (6). The Zeeman splitting of the stationary spin, $B_2$, is inconsequential, as we may work in the rotating frame of the second spin such that its dynamics are trivial. We assume that there is negligible residual exchange interaction between the two spins, and the Coulomb interaction between electron charges may be accounted for separately. The shuttled electron always begins in the ground valley-orbit state $|L-\rangle$. After the detuning sweep has completed, a subsequent fast, deep detuning pulse is implicitly assumed, such that deterministic charge transfer is ensured. As described in the main text, any population in the excited left-dot states will evolve into the target dot. We assume that this secondary charge transfer is adiabatic and that subsequent relaxation from excited orbital states preserves both spin and valley, such that the operation is described by the partial trace over the orbital degree of freedom. We point out that different assumptions could be made at the expense of introducing additional microscopic parameters into the model. The spin-valley state $|sv\rangle$ is then re-initialised as $|L\,sv\rangle$ prior to the next detuning sweep.
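Tracing out the orbital degree of freedom and quantifying the remaining two-spin entanglement can be sketched as follows. The code assumes a subsystem ordering (orbital, spin 1, spin 2) chosen for illustration and uses the standard Wootters concurrence as the entanglement measure; the helper names and the example states are hypothetical.

```python
import numpy as np

def partial_trace(rho, dims, keep):
    """Trace out every subsystem not listed in `keep` (indices into `dims`)."""
    n = len(dims)
    r = rho.reshape(list(dims) + list(dims))
    # Trace the highest-index subsystems first so the remaining axis numbers stay valid
    for k in sorted(set(range(n)) - set(keep), reverse=True):
        m = r.ndim // 2
        r = np.trace(r, axis1=k, axis2=k + m)
    d = int(np.prod([dims[k] for k in sorted(keep)]))
    return r.reshape(d, d)

def concurrence(rho):
    """Wootters concurrence of a two-qubit density matrix."""
    yy = np.kron([[0, -1j], [1j, 0]], [[0, -1j], [1j, 0]])
    rho_tilde = yy @ rho.conj() @ yy
    ev = np.linalg.eigvals(rho @ rho_tilde)
    lam = np.sort(np.sqrt(np.clip(ev.real, 0, None)))[::-1]
    return max(0.0, lam[0] - lam[1] - lam[2] - lam[3])

# Example: orbital in an excited state |1>, spins in the singlet |Psi->
orbital = np.array([0, 1], dtype=complex)
singlet = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)
psi = np.kron(orbital, singlet)
rho_full = np.outer(psi, psi.conj())
rho_spins = partial_trace(rho_full, [2, 2, 2], keep=[1, 2])
print(round(concurrence(rho_spins), 6))  # 1.0
```

For a pure product of orbital and spin states the partial trace returns the spin state unchanged; for entangled orbital-spin states it produces a mixed two-spin state with reduced concurrence.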

Valley Projective Measurement
For our analysis, we assume the spin-valley state is well defined throughout the shuttling protocol. In order to fit the protocol within a larger spin-based quantum computation, we propose projecting the valley state onto the ground valley state. We make use of an ancilla dot populated with a single electron initialised in the lowest-energy state $|{-}{\downarrow}\rangle$, and describe the total two-electron, two-dot system with the Hamiltonian $H_{2e}$ of Eq. (B2). Once again, $\epsilon$ describes the detuning between dots, $U$ is the Coulomb repulsion energy associated with adding a second electron to a single dot, $t$ is a spin- and valley-preserving tunnel coupling between dots, and $E_{Z,D}$ and $E_{V,D}$ are the Zeeman and valley splittings of dot $D$. $\hat{n}_i = \hat{c}_i^\dagger \hat{c}_i$ is the number operator for state $i$, while $\hat{c}_i^\dagger$ and $\hat{c}_i$ are creation and annihilation operators, respectively. Indices that are not written explicitly are assumed to be summed over. Fig. 14 illustrates how the ancilla dot may be used to implement a valley-to-charge conversion, such that the spin state of the ground valley is unaffected after postselection. We neglect the valley phase here, as analysing its effect on readout is not the present focus, although it will certainly affect readout quality [120]. We make use of Eq. (B2) to illustrate the essential physics and plot the corresponding energies in Fig. 14. We note that this is only one possible choice: information may be better preserved by coherently manipulating the valley state directly, or it may be further damaged through relaxation of the valley state during the shuttling operation.

Other Sources of Decoherence
In this analysis of electron shuttling, we have focused on the intrinsic ability of the unitary tunnelling operation to populate excited valley-orbit states and thereby cause information loss. However, external sources may also damage information. These include the hyperfine coupling of the electron spins to residual $^{29}$Si nuclear spins, residual coupling to reservoirs, and ubiquitous $1/f$ charge noise. We expect the effect of hyperfine dephasing to be minimal, since the timescale of shuttling is orders of magnitude shorter than the $T_2^*$ times of 10-100 µs reported in purified silicon [121]. Similarly, coupling to reservoirs may require hundreds of tunnelling events to have a measurable effect [69].
FIG. 16. Performing quantum teleportation with a Bell pair other than the required $|\Phi^+\rangle$ is equivalent to applying a Pauli operator $\sigma_\alpha$ to the teleported qubit state, as $\pm\sigma_\alpha|\psi\rangle$, where the Pauli operator maps between the two Bell states as $|\Phi_\alpha\rangle := \sigma_\alpha|\Phi^+\rangle$. Due to linearity, using an imperfect, incoherent mixture of input Bell states for teleportation is equivalent to applying a Pauli noise channel to the teleported single-qubit state. Additionally applying twirling guarantees that this error channel is a single-qubit depolarising channel.

To quantify the effect of charge noise, we adapt the technique used in [59] by adding $1/f$ charge noise to the detuning sweep $\epsilon(t)$; the noise is generated by combining 1000 Ornstein-Uhlenbeck processes, and we average over 100 noise realisations. The power spectrum is normalised to the value $S(1\,\mathrm{MHz}) = 10^{-6}\,\mu\mathrm{eV}^2/\mathrm{Hz}$ reported in the charge-noise spectroscopy of a present-day Si/SiGe heterostructure [122]. From Fig. 15, we can see that this magnitude of charge noise adds a negligible correction to the anticipated unpurified and purified concurrence estimates. We note that charge noise may play a much larger role when using detuning sweep speeds substantially slower than 300 µeV/ns or tunnel couplings smaller than 10 µeV.
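A minimal sketch of the charge-noise construction, assuming one common way of building $1/f$ noise from Ornstein-Uhlenbeck processes: correlation rates spaced log-uniformly over many decades, equal variances, and a one-sided power spectral density normalised so that $S(1\,\mathrm{MHz}) = 10^{-6}\,\mu\mathrm{eV}^2/\mathrm{Hz}$. The rate range, time step, window length and function names are hypothetical choices, and the details of the technique in [59] may differ.

```python
import numpy as np

N_PROC = 1000               # number of Ornstein-Uhlenbeck processes
F_REF, S_REF = 1e6, 1e-6    # normalisation: S(1 MHz) = 1e-6 µeV^2/Hz
# Correlation rates (1/s), log-uniform between 2*pi*1 Hz and 2*pi*1 GHz (assumed range)
GAMMAS = 2 * np.pi * np.logspace(0, 9, N_PROC)

def one_sided_psd(f, sigma2):
    """One-sided PSD (µeV^2/Hz) of a sum of independent OU processes of equal variance."""
    w = 2 * np.pi * np.atleast_1d(f)[:, None]
    return (4 * sigma2 * GAMMAS / (GAMMAS**2 + w**2)).sum(axis=1)

# Per-process variance (µeV^2) chosen so that the summed spectrum hits the reference value
SIGMA2 = S_REF / one_sided_psd(F_REF, 1.0)[0]

def detuning_noise(n_steps, dt, n_real, seed=None):
    """Return an (n_real, n_steps) array of 1/f-like detuning noise traces (µeV).

    Each OU process is updated with its exact discrete-time rule:
    x <- x * exp(-gamma dt) + sqrt(sigma2 * (1 - exp(-2 gamma dt))) * xi.
    """
    rng = np.random.default_rng(seed)
    decay = np.exp(-GAMMAS * dt)
    kick = np.sqrt(SIGMA2 * (1 - decay**2))
    x = rng.normal(0.0, np.sqrt(SIGMA2), size=(n_real, N_PROC))  # stationary start
    traces = np.empty((n_real, n_steps))
    for n in range(n_steps):
        traces[:, n] = x.sum(axis=1)
        x = x * decay + kick * rng.normal(size=(n_real, N_PROC))
    return traces

# 100 noise realisations over a 1 ns window with 10 ps steps (illustrative values)
noise = detuning_noise(n_steps=100, dt=1e-11, n_real=100, seed=0)
```

Each row of `noise` would be added to the ideal sweep $\epsilon(t)$ before propagating the shuttling dynamics, and the resulting concurrences averaged over rows. Between the slowest and fastest rates the summed Lorentzians fall off as $1/f$, so the spectrum at 100 kHz is about ten times that at 1 MHz.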
Appendix C: Error Suppression

Quantum teleportation with imperfect Bell pairs
We first show the following useful identity. Quantum state teleportation can be realised by consuming a Bell pair $|\Phi^+\rangle := (|00\rangle + |11\rangle)/\sqrt{2}$. If we instead inject any one of the three other Bell pairs, the result is a Pauli operation on the teleported qubit. In particular, we can define the four Bell states in terms of Pauli transformations as $|\Phi_\alpha\rangle := \sigma_\alpha|\Phi^+\rangle$, where $\sigma_\alpha$ are Pauli matrices with $\alpha \in \{0, 1, 2, 3\}$ and $\sigma_0$ is the identity. It is now straightforward to show the identity in Fig. 16. It immediately follows from linearity that injecting an incoherent mixture of Bell pairs with inhomogeneous weights (probabilities) $p_\alpha$,
$$\rho_\text{Bell} = \sum_{\alpha=0}^{3} p_\alpha |\Phi_\alpha\rangle\langle\Phi_\alpha|, \tag{C1}$$
and performing the teleportation results in a qubit state that has effectively undergone an inhomogeneous Pauli error channel,
$$\mathrm{Pauli}(\rho) := p_0\,\rho + p_1\, X\rho X + p_2\, Y\rho Y + p_3\, Z\rho Z.$$
It also follows from linearity that if we inject a coherent superposition of Bell states, $|\Phi_\text{coherent}\rangle := \sum_{\alpha=0}^{3} c_\alpha |\Phi_\alpha\rangle$, then it shows up as a coherent transformation $U|\psi\rangle$ of the final state.
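The identity and its linear extension to mixtures can be verified numerically on three qubits. In this sketch (helper names are hypothetical) the Bell states are taken as $|\Phi_\alpha\rangle = (\mathbb{1} \otimes \sigma_\alpha)|\Phi^+\rangle$, with the Pauli acting on the receiver's half, the Bell measurement on the sender's two qubits is represented by projecting onto $|\Phi_k\rangle$, and the standard correction $\sigma_k$ is applied; the weights `p` are arbitrary.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]
PHI_PLUS = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)

def bell(alpha):
    """|Phi_alpha> = (1 kron sigma_alpha)|Phi+>."""
    return np.kron(I2, PAULIS[alpha]) @ PHI_PLUS

def teleport(psi, alpha, k):
    """Teleport psi through the resource |Phi_alpha>, given Bell-measurement outcome k.

    Qubit order: (input A, sender half B1, receiver half B2). Returns the
    receiver's normalised state after the Pauli correction sigma_k.
    """
    state = np.kron(psi, bell(alpha)).reshape(4, 2)  # rows: (A, B1), columns: B2
    out = bell(k).conj() @ state                      # project (A, B1) onto |Phi_k>
    out = PAULIS[k] @ out                             # Pauli correction on B2
    return out / np.linalg.norm(out)

rng = np.random.default_rng(7)
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)

# Pure case: resource |Phi_alpha> yields sigma_alpha|psi> up to a sign, for every outcome k
for alpha in range(4):
    for k in range(4):
        overlap = np.vdot(PAULIS[alpha] @ psi, teleport(psi, alpha, k))
        assert np.isclose(abs(overlap), 1.0)

# Mixed case: every outcome k occurs with probability 1/4 for any alpha, so
# conditioning on k = 0 leaves the weights p unchanged.
p = np.array([0.9, 0.05, 0.03, 0.02])
outs = [teleport(psi, alpha, 0) for alpha in range(4)]
rho_out = sum(pa * np.outer(o, o.conj()) for pa, o in zip(p, outs))
rho = np.outer(psi, psi.conj())
pauli_channel = sum(pa * P @ rho @ P.conj().T for pa, P in zip(p, PAULIS))
print(np.allclose(rho_out, pauli_channel))  # True
```

The global signs that appear for some $(\alpha, k)$ combinations drop out of the density matrix, which is why the mixture maps exactly onto the Pauli channel with coefficient $p_0$ on the unaffected term.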
One might additionally apply twirling techniques to the Bell pairs such that the output state is a Werner state [123], i.e., all three erroneous Bell pairs appear with identical probabilities, resulting in an effective single-qubit depolarising channel acting on the teleported qubit state. These twirling techniques randomly apply local operations of the form $U \otimes U^*$ to the two qubits of the input Bell pair, such that the ideal state $|\Phi^+\rangle$ is left invariant while the erroneous Bell states are mapped onto each other.
Since the ESD/VD error suppression techniques are not effective against coherent errors, it is important that the prepared/distilled Bell states are of the form of Eq. (C1). For this reason we require that twirling techniques [123] are implemented: these require only the minor overhead of randomly applying local single-qubit operations to the input Bell pairs, but in return they guarantee that imperfections are incoherent, as in Eq. (C1). Consequently, we can model the teleportation process implicitly as single-qubit Pauli error channels applied to the teleported single-qubit states.
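A minimal numerical sketch of the twirl, assuming the twirling set is the 24-element single-qubit Clifford group (modulo global phases) applied as $U \otimes U^*$; this is one standard choice and not necessarily the one used in [123]. Because this group reproduces the second moments of Haar-random unitaries, the average maps a Bell pair carrying a coherent error onto $F|\Phi^+\rangle\langle\Phi^+| + \frac{1-F}{3}(\mathbb{1} - |\Phi^+\rangle\langle\Phi^+|)$ with the fidelity $F$ unchanged. The error angle `theta` is arbitrary.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.diag([1, 1j])

def strip_phase(U):
    """Fix the global phase: make the first non-negligible entry real and positive."""
    flat = U.ravel()
    first = flat[np.flatnonzero(np.abs(flat) > 1e-9)[0]]
    return U * (abs(first) / first)

def clifford_group():
    """Generate the single-qubit Clifford group modulo phases from H and S."""
    group = [np.eye(2, dtype=complex)]
    frontier = list(group)
    while frontier:
        new = []
        for U in frontier:
            for G in (H, S):
                V = strip_phase(G @ U)
                if not any(np.allclose(V, W) for W in group):
                    group.append(V)
                    new.append(V)
        frontier = new
    return group

CLIFFORDS = clifford_group()

def twirl(rho):
    """Average (U kron U*) rho (U kron U*)^dagger over the Clifford group."""
    out = np.zeros_like(rho)
    for U in CLIFFORDS:
        W = np.kron(U, U.conj())
        out += W @ rho @ W.conj().T
    return out / len(CLIFFORDS)

phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
phi_x = np.kron(np.eye(2), [[0, 1], [1, 0]]) @ phi_plus  # |Phi_1> = (1 kron X)|Phi+>
theta = 0.3                                               # arbitrary coherent error angle
bad = np.cos(theta) * phi_plus + np.sin(theta) * phi_x
rho = np.outer(bad, bad.conj())

F = np.real(phi_plus.conj() @ rho @ phi_plus)
P = np.outer(phi_plus, phi_plus.conj())
werner = F * P + (1 - F) / 3 * (np.eye(4) - P)
print(len(CLIFFORDS), np.allclose(twirl(rho), werner))  # 24 True
```

In an experiment the average is realised by sampling one Clifford at random per Bell pair; the sketch evaluates the average exactly instead.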

Numerical simulations
In Fig. 11(c) we simulate the spin-ring Hamiltonian in Eq. (10) and aim to determine its ground state using the Variational Hamiltonian Ansatz. This ansatz consists of alternating layers of time evolution under two Hamiltonians $H_0$ and $H_1$, defined such that $H = H_0 + H_1$ (the interaction terms have the form $\sigma_k \cdot \sigma_{k+1}$). We start the optimisation from the ground state $|\psi_\text{init}\rangle$ of the diagonal Hamiltonian $H_0$, which we have determined analytically [11]. We then apply alternating layers of the parametrised evolutions $A(\gamma_k) := e^{-i\gamma_k H_1}$ and $B(\beta_k) := e^{-i\beta_k H_0}$ to this initial state as
$$|\psi(\boldsymbol{\beta}, \boldsymbol{\gamma})\rangle = B(\beta_l)A(\gamma_l) \cdots A(\gamma_2)B(\beta_1)A(\gamma_1)|\psi_\text{init}\rangle,$$
using $l$ layers overall. In typical applications the parameters $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$ are optimised by a classical co-processor such that the experimentally estimated energy $E := \mathrm{tr}[\rho H]$ is minimised.
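The layered ansatz can be sketched with dense matrices for a few qubits. The spin-ring Hamiltonian of Eq. (10) is not reproduced here: as a hypothetical stand-in we use a four-site Heisenberg ring split into a diagonal part $H_0 = \sum_k Z_k Z_{k+1}$ and an off-diagonal part $H_1 = \sum_k (X_k X_{k+1} + Y_k Y_{k+1})$, start from a Néel state (a ground state of this $H_0$), and let SciPy's `scipy.optimize.minimize` with BFGS play the role of the classical co-processor. The ring size and number of layers are arbitrary.

```python
import numpy as np
from functools import reduce
from scipy.linalg import expm
from scipy.optimize import minimize

N = 4  # ring of four spins (hypothetical size)
PAULI = {
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def two_site(p, k, l):
    """Pauli p on sites k and l of the ring, identity elsewhere."""
    ops = [PAULI[p] if j in (k, l) else np.eye(2) for j in range(N)]
    return reduce(np.kron, ops)

bonds = [(k, (k + 1) % N) for k in range(N)]
H0 = sum(two_site("Z", k, l) for k, l in bonds)                        # diagonal part
H1 = sum(two_site("X", k, l) + two_site("Y", k, l) for k, l in bonds)  # off-diagonal part
H = H0 + H1

psi_init = np.zeros(2**N, dtype=complex)
psi_init[int("0101", 2)] = 1.0  # Neel state, a ground state of H0

def ansatz_state(params):
    """|psi(beta, gamma)> = B(beta_l) A(gamma_l) ... B(beta_1) A(gamma_1) |psi_init>."""
    gammas, betas = np.split(params, 2)
    psi = psi_init
    for g, b in zip(gammas, betas):
        psi = expm(-1j * g * H1) @ psi  # A(gamma_k)
        psi = expm(-1j * b * H0) @ psi  # B(beta_k)
    return psi

def energy(params):
    """Energy expectation value <psi(beta, gamma)|H|psi(beta, gamma)>."""
    psi = ansatz_state(params)
    return np.real(np.vdot(psi, H @ psi))

n_layers = 3
x0 = np.full(2 * n_layers, 0.1)
res = minimize(energy, x0, method="BFGS")
exact = np.linalg.eigvalsh(H)[0]
print(f"VHA energy {res.fun:.4f}, exact ground-state energy {exact:.4f}")
```

By the variational principle the optimised energy can never fall below the exact ground-state energy; how closely it approaches it depends on the number of layers.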
We simulate a quantum device that can natively implement controlled-Z gates between any pair of qubits as well as single-qubit gates. As such, we do not take into account the additional complexity imposed by connectivity constraints, where one needs to apply SWAP operations to overcome limited connectivity. However, these operations are part of the main computation and their imperfections are also mitigated by the derangement circuits. Furthermore, we expect their overhead to be small compared with the complexity of the main computation, since almost all terms in our Hamiltonian are nearest-neighbour interactions.
We have determined the ground-state energy to a precision $\Delta E = 10^{-4}$ using $l = 20$ layers of the ansatz by optimising the parameters in a noise-free scenario. We use these parameters as input for our noisy simulations. In particular, we assume a noise model in which single-qubit gates are followed by single-qubit depolarising events, while two-qubit gates are followed by two-qubit depolarisation. The error rate of two-qubit gates is taken to be five times larger than that of single-qubit gates, as explained in the main text.
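The noise model can be sketched at the density-matrix level. Here a $k$-qubit depolarising event with probability $p$ is taken to replace the state of the affected qubits by the maximally mixed state, implemented as $\rho \to (1-p)\rho + p\,4^{-k}\sum_{P} P\rho P^\dagger$ over all $k$-qubit Pauli strings $P$; this parametrisation, the function names and the example rate `p1` are assumptions, since the text only fixes the ratio $p_2 = 5p_1$.

```python
import numpy as np
from functools import reduce
from itertools import product

PAULIS = [np.eye(2, dtype=complex),
          np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]], dtype=complex),
          np.array([[1, 0], [0, -1]], dtype=complex)]

def embed(ops_on_targets, targets, n):
    """Tensor the given single-qubit operators onto `targets`, identity elsewhere."""
    ops = [np.eye(2, dtype=complex)] * n
    for op, q in zip(ops_on_targets, targets):
        ops[q] = op
    return reduce(np.kron, ops)

def depolarise(rho, p, targets, n):
    """k-qubit depolarising channel acting on `targets` of an n-qubit state.

    rho -> (1 - p) rho + p * (average over all 4**k Pauli strings P of P rho P^dagger),
    i.e. with probability p the targeted qubits are replaced by the maximally mixed state.
    """
    k = len(targets)
    twirled = np.zeros_like(rho)
    for combo in product(PAULIS, repeat=k):
        P = embed(combo, targets, n)
        twirled += P @ rho @ P.conj().T
    return (1 - p) * rho + p * twirled / 4**k

p1 = 1e-3     # hypothetical single-qubit depolarising probability
p2 = 5 * p1   # two-qubit gates: five times larger, as in the text

# Example: noise after a CZ on qubits (0, 1) and a single-qubit gate on qubit 2 of |000>
n = 3
rho = np.zeros((2**n, 2**n), dtype=complex)
rho[0, 0] = 1.0
rho = depolarise(rho, p2, targets=[0, 1], n=n)
rho = depolarise(rho, p1, targets=[2], n=n)
print(np.real(np.trace(rho)), np.real(rho[0, 0]))
```

In a full simulation each ideal gate unitary would be applied first and followed by the corresponding `depolarise` call on the qubits it acts on.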