Spiderweb array: A sparse spin-qubit array

One of the main bottlenecks in the pursuit of a large-scale--chip-based quantum computer is the large number of control signals needed to operate qubit systems. As system sizes scale up, the number of terminals required to connect to off-chip control electronics quickly becomes unmanageable. Here, we discuss a quantum-dot spin-qubit architecture that integrates on-chip control electronics, allowing for a significant reduction in the number of signal connections at the chip boundary. By arranging the qubits in a two-dimensional (2D) array with $\sim$12 $\mu$m pitch, we create space to implement locally integrated sample-and-hold circuits. This allows to offset the inhomogeneities in the potential landscape across the array and to globally share the majority of the control signals for qubit operations. We make use of advanced circuit modeling software to go beyond conceptual drawings of the component layout, to assess the feasibility of the scheme through a concrete floor plan, including estimates of footprints for quantum and classical electronics, as well as routing of signal lines across the chip using different interconnect layers. We make use of local demultiplexing circuits to achieve an efficient signal-connection scaling leading to a Rent's exponent as low as $p = 0.43$. Furthermore, we use available data from state-of-the-art spin qubit and microelectronics technology development, as well as circuit models and simulations, to estimate the operation frequencies and power consumption of a million-qubit processor. This work presents a novel and complementary approach to previously proposed architectures, focusing on a feasible scheme to integrating quantum and classical hardware, and significantly closing the gap towards a fully CMOS-compatible quantum computer implementation.


I. INTRODUCTION
Semiconductor quantum dots [1], particularly in silicon [2], are attractive hosts for spin qubits in large-scale quantum computation applications, because of their assumed compatibility with conventional CMOS integration processes. In addition, the ∼ 100 nm spatial dimensions intrinsic to semiconductor spin qubits provide the potential to pack many millions of qubits inside a single quantum processor chip. The last several years have seen significant progress in spin-qubit research that resulted in the demonstration of long coherence times [3], high-fidelity single- [3][4][5] and two-qubit gates [6,7], quantum algorithms [8], quantum non-demolition measurements [9,10], and electron spin [11] and charge [12,13] transfer.
Scaling to millions of qubits has many technological hurdles that will need to be cleared. Among them, the wiring bottleneck is a very challenging one for any solidstate qubit controlled by electrical signals [14][15][16]: cur- * l.m.k.vandersypen@tudelft.nl rently at least one wire must run from the control electronics to each and every qubit, but chips have strict interconnect wiring density limitations, making a onewire-per-qubit scheme unfeasible in the large scale. This was already an issue when classical processors started to scale up, leading to the development of techniques such as cross-bar addressing, that enabled the number of output pins T to increase at a rate much slower than the number of transistor unit cells U on the chip. This was formally captured by Rent's rule: T = c U p , where c is the number of connections per unit cell, and p is Rent's exponent, a measure of optimization in the wiring scheme [15].
One approach to reducing Rent's exponent in quantum-dot-based spin qubits [17,18], involves addressing 2D quantum dot arrays of N 2 qubits using ∼ N word and bit lines. These cross-bar addressing schemes impose strong requirements on the homogeneity of the quantum dot potentials across the array, which in itself is a great technological hurdle. Furthermore, they are inefficient in the sense that many operations can only be executed sequentially.
A second approach is to decrease qubit density by using long-range quantum links that connect modules of qubits [14,19]. Semiconductor spin qubits provide mech- anisms to control the length-scales required for qubitqubit interactions over a wide range, allowing to configure the desired density while still maintaining very large qubit arrays on chip. With this scheme, the distribution of the space on the chip can be customized for the integration of local, classical control electronics aimed at solving the homogeneity and operation efficiency issues. Expanding on this potential route to solving the wiring bottleneck, we present and analyze the spiderweb array, a sparse 2D spin qubit array with single-qubit nodes separated by gate-based shuttling channels [20]. In this analogy, the qubit array resembles the open structure of a spiderweb, with vacant space between spiders (spin qubits) that move along the threads of their web (shuttling channels). We will focus on how to use the open area of the sparse array to integrate classical control electronics with quantum hardware, in order to minimize the need for off-chip interconnects, resulting in a scalable Rent's exponent. Different from many proposal papers, we make a rather extensive initial effort to assess the feasibility of this classical-quantum integration, by considering several relevant practical aspects, without aiming or claiming to be exhaustive in this assessment. Another aim of this work is to identify immediate technological development opportunities that can be prioritized in order to operate a large-scale spiderweb array as a quantum processor.
We first describe in Sec. II the basic physical architecture of the qubit array and the implementations of the operations required to sustain the surface code. In Sec. III we describe the integration of on-chip classical electronics at different layers within the architecture. This leads to a logarithmic reduction in the number of control and measurement lines from the qubit region to the chip boundaries, which is discussed in Sec. IV. We then consider in Sec. V the footprint of the local control circuits and the array sparsity required to accommodate them. Sec. VI discusses the power dissipation in different sections of the array, assuming ∼1K operation [21][22][23], and the effects this has on the operation. Finally, we put all these ingredients together to describe a feasible implementation of a million-qubit array in Sec. VII.

II. ARRAY DESIGN AND OPERATION
The overarching concept of the spiderweb array is presented in Fig. 1. A large 2D square lattice of spin-qubits constitutes the quantum plane. The spin-qubits consist of single electrons confined in electrostatically defined quantum dots in silicon. The large 2D array of qubits can be used to implement the surface code [24], by assigning qubits as data qubits or surface code (SC-) ancilla qubits in a checkerboard fashion, and allowing single-qubit operations as well as nearest-neighbor two-qubit operations. In contrast to most other spin-qubit architectures, this array is sparse, with a qubit pitch d on the order of 10 µm. This has two major implications: 1. It facilitates the local integration of classical control electronics, consisting here of sample-and-hold circuits that provide independent DC biasing of each quantum dot gate electrode, which effectively offsets inhomogeneities in the potential landscape across the array. Consequently, the array can be considered fully homogeneous allowing the majority of qubit control signals to be shared across the entire array, which significantly reduces the number of control lines at the quantum plane boundary.
2. It requires a means to transfer quantum information over ∼10-µm distances, which we implement by shuttling the qubits to operation regionswhere single-and two-qubit operations are performed along with readout.
We define a unit cell as the the smallest operational set of elements, which can be concatenated with identical unit cells to form the large 2D array. This unit cell ( Fig. 1(a)) contains 4 qubits, and has an area of (2d) 2 and a perimeter of 8d. We define modules ( Fig. 1(b)) consisting of N × N unit cells, with an area and a perimeter of Completing the array, the quantum plane ( Fig. 1(c)) consists of M × M modules (M 2 N 2 unit cells and 4M 2 N 2 qubits), covering an area and a perimeter of (2dN M ) 2 and 8dN M , respectively. As seen from Fig. 1(d), the quantum plane is designed to occupy a section of the die, with the remaining space to be used to reduce the off-chip wire count even further by adding additional onchip electronics. Module sizes for DC biasing and readout (N b and N r , respectively) may be different, as long as where M b and M r relate to the number of DC biasing modules and readout modules in the full array, respectively.
A detailed schematic of the components of a unit cell is presented in Fig. 2. The unit cell mainly consists of large linear arrays of electrostatic gates, connecting the vertices of the qubit array to the operation regions (see Fig. 2(a)). The bulk of the gates in these linear arrays make up the shuttling channels (blue gates in Fig. 2). Four phase-shifted sinusoidal signals are applied to four consecutive gates (shades of blue in Fig. 2), repeating the set of signals along the linear array to create a travelingwave potential to trap and shuttle an electron [13,25]. The sign of the phase difference between adjacent gates defines the shuttling direction.
Four gates at the vertices of the spiderweb array (red gates in Fig. 2(b)), are used to control both the confinement potential that keeps the electrons at the vertex while idle, as well as the tunneling in and out of the shuttling channels. The qubits are shuttled to and from the operation regions between the vertices (Fig. 2(c,d)) in order to perform single-and two-qubit operations, as well as readout and initialization. Fig. 2(c) shows a schematic of the gate structure used in the operation regions.
Single-qubit operations are performed via electric dipole spin resonance (EDSR) in a transverse magnetic field gradient provided by a pair of micromagnets [26]. Alternatively, the micromagnets can be omitted in qubit systems with large spin-orbit interaction [27]. After an electron is shuttled to the operation region and confined under the bottom red gates in Fig. 2(c), a microwave pulse is applied to the control gate labeled MW, to drive spin rotations. Qubit phase rotations can be achieved via geometric phase rotations [28] or by pulsing on the gate labeled J to temporarily detune the electron-spin energy via the Stark effect [3].
For two-qubit operations, two electrons from the vertices adjacent to the operation region are shuttled and confined under the red gates in Fig. 2(c,d), and a pulsed signal on the gate labeled J activates an exchange interaction between the two electron spins [1,29,30].
Most state-of-the-art spin-qubit systems use one of two spin-to-charge conversion mechanisms to read out a qubit state: 1 -spin-selective tunneling to a nearby electron reservoir (Elzerman readout) [31], or 2 -based on Pauli spin blockade (PSB readout) [2], using a second quantum dot with an ancilla electron [17,32]. The charge state of the single or double quantum dot is then identified using a charge sensing quantum dot connected to source/drain Ohmic contacts. Although both readout techniques are compatible with this architecture, we focus here on PSB readout, due to its shorter demonstrated readout times [33,34] and its compatibility with higher temperature operation [21][22][23].
PSB readout can be used to measure the parity of a spin pair [32]. If prior to the measurement, one of the qubits is prepared in a known eigenstate, the parity measurement will reveal the state of the unknown spin. Par- Note that the two-qubit gates consist of three substeps (as described in the main text), and only the implementation of the √ S w is shown. Also note that steps 3 and 13 only require shuttling of data qubit 1. allel spins are identified by their (1,1) charge state while anti-parallel states are identified by their quick relaxation to the (0,2) singlet. The singlet can be adiabatically detuned and separated into a known product state (|↑↓ or |↓↑ ). After this measurement protocol, the target qubit is left in its projected state and the RO-ancilla qubit remains in its original state. In the surface code protocol, the measured state of the SC-ancilla qubits can be tracked to perform the required decoding of errors. Alternatively, the target spin can be initialized after measurement by real-time feedback [14] or by letting the state thermalize to the (0,2) singlet after every measurement, then adiabatically detuning as described above. In the spiderweb array, the target electron-spin qubit to be read out is shuttled to the operation region and confined under the bottom red gates in Fig. 2(c), alongside an ancilla electron (RO-ancilla) prepared in a known state on the neighboring dot. A PSB readout is performed to obtain a measurement of the target qubit, then the sensing dot can be tuned to store the RO-ancilla, and the two spins can be adiabatically separated with deterministic knowledge of their spin state, provided by the field gradient between the readout and sensing dot regions.
Qubits are able to share an operation region to perform single-qubit gates and only SC-ancilla qubits have to be read out. This allows to reduce the number of gates in the unit cell by reducing some operation regions to only contain the required gates to perform two-qubit gates (see Fig. 2(d)). The arrangement of qubits and operation regions shown in Fig. 2(a) constitutes a unit cell that can be tiled to obtain a fully operational surface code lattice.
Before the start of a computation, the spiderweb array needs to be initialized by loading electrons with a known spin state onto each of the qubit idling regions. This is achieved by initializing electrons in the operation regions using the two-spin thermalization and adiabatic detuning method described above, and shuttling the electrons to the nearest idling regions. Each operation region will initialize a data qubit first (discarding the second electron), followed by the SC-and RO-ancilla electrons. After these two steps, the entire array is prepared to begin the surface code cycles, which we describe below.
We have designed the unit cell in the spiderweb array such that a cyclic sequence of pulsed signals can be used to perform the required operations to sustain an efficient surface code implementation [35], with the same sequence performed in parallel across all unit cells to sustain it in an arbitrarily large array. Fig. 3(a) shows the circuit diagram of a single surface code cycle in a unit cell (see Appendix D for details of the circuit). The circuit contains single-qubit gates denoted as R {y,z} (θ), where θ is the angle of rotation and the subscript identifies the rotation axis on the Bloch-sphere. The two-qubit operation in the schematic is a type of phase gate S p , decomposed as: where √ S w is the square-rootswap operation that is native to the exchange interaction, I is the identity operator and the superscript in the single-qubit operator indicate which qubit the operation is applied to (see Appendix D for details). Each step in the circuit is implemented by shuttling the participating qubits from their idle position to the operation region to undergo the required single-or two-qubit gate, before being shuttled back to their idle position. Note that each two-qubit gate, as shown in the schematic, consists of three rounds of shuttling to achieve the required steps in the S p gate. Fig. 3(b) shows the required qubit shuttling scheme appropriate to each step of the surface code cycle. R y rotations are achieved by tuning the amplitude, phase and duration of the EDSR microwave pulse, while R z rotations are phase rotations achieved by using one of the methods described in Sec. II. The √ S w gate is achieved by calibrating the amplitude and duration of the pulse applied to the J gate that controls the exchange interaction between the qubits (see Fig. 2(c,d)). At the end of the cycle, the SC-ancilla qubits are measured in the operation regions. The total operation time of a surface code cycle will be where t sh is the time required to shuttle an electron from the vertex to the operation region and back, t 1q is the single-qubit gate duration, t sw is the duration of a √ S w operation and t r is the time required to readout the SCancilla qubits.
State-of-the-art surface code protocols follow one of two approaches to implementing logic gates. The first, known as defect qubits [36], requires disabling a subset of SC-ancilla qubits moving these defects around the lattice while the rounds of surface code error correction continue all around them. The process of moving the defects not only involves disabling and enabling qubits in succession, but also requires measurements on individual data qubits. The other implementation, known as lattice surgery [37], divides the array into patches, separated by an idle row of interstitial data qubits. Patches are created by splitting a section of the lattice, which requires performing a measurement of the interstitial qubits and subsequently disabling them from the surface code cycles. Patches can also be merged, by initializing the interstitial qubits and reincorporating them into the surface code cycles. The spiderweb array architecture can be designed to accommodate either type of logical qubit implementation. Patches of qubits can be disabled by preventing shuttling of a subset of qubits while the rest of the surface code operations on all remaining qubits continue. Selective qubit measurement and initialization is more complex to implement, since it will require interruptions of the regular surface code cycle.

III. LOCAL CONTROL ELECTRONICS
The spiderweb array has been designed with the main intention of providing space within the qubit plane to integrate local control electronics, with the ultimate purpose of obtaining a feasible scaling factor between number of qubits and external control and measurement signals. In this section we describe in detail the implementation and function of these circuits, along with the corresponding routing of the signal lines between signalgenerating source and gates, as summarized in Table I.

A. Biasing and control signals
In principle, the array only requires the generation of control signals for a single unit cell, and that set of signals can be replicated in parallel across the entire array to sustain the surface code. In practice, setting a quantum dot array to a state where several qubits can be controlled simultaneously with a single set of control signals, requires careful calibration of the DC voltages of each gate in the array [14]. Inhomogeneities in the materials and fabrication properties will cause these calibration voltages to vary over a wide range across the lattice, requiring each qubit to have independent biasing.
To mitigate this effect, we design a sample-and-hold scheme to apply a local DC bias to the gate electrodes [38]. We define two bias voltage resolutions ∆V to accommodate different gate functionalities. For gates acting as barriers to shuttling channels (dark red gates in Fig. 2), only a resolution sufficient to maintain an electron in a quantum dot is required and therefore we can afford a coarse resolution ∆V c = 1 mV. A fine resolution ∆V f = 1 µV is required for all other plunger and barrier gates [14]. The need for DC biasing of the shuttling gates is eliminated by making the traveling wave potential large enough to overcome potential landscape inhomogeneities. As shown in Table II, there are a total of 64 gates requiring DC biasing, with an equal split between gates requiring 1 mV and 1 µV resolutions.
The minimum hold capacitance required to achieve coarse (C c ) and fine (C f ) resolutions is C c ≈ 0.16 fF (limited by the electron charge e/∆V c ) and C f ≈ 14 pF   (limited by thermal noise k B T /∆V 2 f , assuming power dissipation from the local electronics requires the operating temperature to be raised to 1 K).
The integrated electronics required to implement DC biasing consists of demultiplexers and capacitors that together form sample-and-hold circuits, as schematically depicted in Fig. 4. Local demultiplexers distribute DC voltages generated remotely (i.e., outside the quantum plane) by voltage sources which we have labeled dcDAC, to local capacitors connected to the gate electrodes. The local control electronics for a unit cell needs to be distributed within the d 2 area regions between the qubits, with a total of 4d 2 open footprint available per unit cell (see Fig. 2(a)). Therefore, in order to bias 64 gates per unit cell (as per Table II), we fit 1-to-16 demultiplexers in each open region between the qubits, which implies 4 demultiplexers per unit cell. The demultiplexers are implemented using 4-bit digital decoders, with each output activated using the 4-bit address line. Fig. 5 shows the DC biasing scheme on the unit cell, module and quantum plane levels. All demultiplexers within a module share the same input DC biasing signal ( Fig. 5(b)), and all demultiplexers in the quantum plane share the same address bus (Fig. 5(c)). The demultiplexers in a module are enabled sequentially by crossbar addressing and in turn sequentially (one by one) update each gate. This way, all modules are updated in parallel and therefore one module refresh cycle is required to refresh the entire qubit array.
AC signals (MW and pulsed) are generated remotely by sources which we have labeled acDAC. Each signal is distributed throughout the array to their respective gates and the complementary switching circuit (see ϕ AC and ϕ AC ) shown in Fig. 6 is used to combine the AC and DC components of the gate voltages.
To implement logical qubits within the surface code lattice, we have designed an additional scheme to control local patches of data and SC-ancilla qubits. Switches connected to the barrier gates surrounding the idle regions, control the tunneling of the qubits into the shuttling channel. When one of these switches is activated, the corresponding qubit will remain in the idle region while the rest of the operations in the array carry on. The signals used to control these switches are arranged in a crossbar fashion across the entire quantum plane, as schematically depicted in Fig. 7. This scheme enables the implementation of either defect qubits or lattice surgery. Defects or interstitial boundaries can be created by disabling patches or rows of qubits, respectively. Selective data qubit measurement or initialization can be performed by temporarily disabling the rest of the array and interrupting the surface code signal to perform the required operations. The crossbar addressing scheme limits the shape of the defects and patches to rectangles of arbitrary size, but meets basic surface code requirements and within those limits allows for universal control.

B. Readout signals
It is difficult to realise a favourable Rent's exponent on the measurement circuit, because each SC-ancilla qubit needs to be read out independently. In order to use a single line outside the quantum plane to measure more than one qubit, the readout protocol will require some form of multiplexing. With this in mind, we define a readout module consisting of N 2 r unit cells that share a single readout signal line for multiplexed measurements.
The simplest form of multiplexing is to read out qubits sequentially. This is done by connecting the drain contacts of all sensor dots in a readout module to a single line at the quantum plane boundary and consecutively pulsing their plungers to bring them to the low-impedance, electrostatically sensitive regime, while all other sensor dots in the readout module are in Coulomb blockade (i.e., in the high-impedance regime). A global readout demultiplexer is used for the sequential control of the sensor plungers in a readout module. This demultiplexer can be shared between all readout modules across the entire array and can be located outside the quantum plane. This method is technically simple to implement, but will be limited by the data qubit coherence time, since it will increase the total surface code cycle time (i.e., the last term in Eq. 1 becomes N r t r ).
Simultaneous readout of a number of qubits can also be achieved using other multiplexing techniques such as amplitude, frequency and/or phase modulation. The plungers of the qubits to be read out in parallel can be connected to a single pulsed signal line that activates all sensors simultaneously. The measured signal from the   common Ohmic line needs to then be demodulated to extract each individual qubit measurement.
Amplitude modulation is achieved by tuning the bias voltages of the plungers from the sensors that are read out simultaneously, such that each sensor response will result in distinct current amplitudes. The currents from all the sensors can then be added into a single output line and the total current amplitude can be used to decode the responses of all the sensors. The number of parallel readouts using this technique will be limited by the signal-to-noise ratio of the sensor response.
Frequency and phase modulation are achieved by applying RF reflectometry [39]. This technique requires the design of resonant circuits that connect to the readout SET Ohmics. The challenge with this multiplexing strategy is that in order to use multiple frequencies combined into a single output line at the qubit plane boundary, the resonant circuits will need to be implemented locally, significantly increasing the footprint requirements of the control electronics. The evolution in state-of-the-art spin-qubit measurement techniques will determine the most feasible readout multiplexing strategy that can be applied in this architecture.

IV. LINE SCALING
The local electronic circuits described above allow for significant sharing of control lines, which results in a very efficient scaling of the ratio of the number of interconnects at the quantum plane boundary to the number of lines at the unit cell level. The line scaling factors obtained for each type of control and measurement signal are summarized in Table III. We note that some additional interconnects will be required at the unit cell and module level in order to route connections to adjacent  cells.
The sample-and-hold circuits for independent DC biasing of the gates require O(M 2 b +N b ) lines at the quantum plane boundary. Concretely, at the unit cell level, four digital address lines (green lines in Fig. 5(a)) and four enabler lines (orange lines) select a specific output of one demultiplexer, which is set to the correct voltage via a single connection to a voltage source (i.e., dcDAC) outside the quantum plane. This makes a total of 9 lines required for DC biasing. At the module level only the enabler lines scale with the number of unit cells as 4N b . Meanwhile, the 4 digital address lines and the connection to the voltage source is shared between all unit cells in a module. Every module is served by one dcDAC, so at the quantum plane level M 2 b connections to dcDACs are needed, while both the digital address lines and the enabler lines are shared between all modules.
Shuttling of electrons across the array requires four signals that can be fully shared over the entire array.
Since all pulsed and microwave control signals can be shared across all unit cells in the array, a constant number of 58 of these lines (see Table II) are required at the quantum plane boundary to sustain the surface code, irrespective of the total number of qubits.
To control the switches that disable shuttling of qubits from the idle region, we propose to use x crossbars, in order to allow for x qubit patches to be simultaneously created and manipulated, therefore accommodating approximately x logical qubits. To obtain an estimate of x, we first consider that the number of physical qubits required to implement a logical qubit will depend on the maximum error correction distance d c , measured as the number of neighboring physical qubits that can have errors before the logical qubit produces an error. The maximum number of logical qubits that will fit in the array will then depend on the chosen logical qubit implemen-tation [40]: where U = M 2 b N 2 b is the number of unit cells in the array. In this line-counting exercise, we consider that each of the crossbars span the entire array, which require two horizontal and two vertical lines per unit cell. The number of crossbar lines then scales with N b and N b M b at the module and quantum plane level, respectively. This is a worst-case estimate, since we envision that it will be good enough to define crossbars over partial sections of the array, therefore reducing the total line count. In addition, the number of lines required per crossbar for lattice surgery will likely be significantly less than for defect qubits, since only the interstitial qubits need to be locally addressed.
For the readout signals, a unit cell contains two readout SETs that both require a connection to their plunger and share a single Ohmic line. A module requires a single Ohmic line, and the plunger line scaling will depend on the multiplexing technique used. Performing q sequential readouts will require log 2 q lines to address the readout decoders. Simultaneous readout of r unit cells will just require a single line that connects all the plungers together. Therefore, a readout module consisting of N 2 r = qr unit cells, will require 2 log 2 N r −log 2 r+1 lines in total for the readout scheme. To read out the full quantum plane, all M 2 r modules require their own Ohmic line, while the address lines for the readout demultiplexers are shared.

A. Module size considerations
From the total achievable signal connections at the quantum plane boundary (see Table III), it is clear that Rent's exponent will be minimized if the module size is made as large as possible (i.e., maximize N , thus minimize M ).
The main consideration for the DC bias module size N b is the required refresh rate of the sample-and-hold circuits. Leakage current from the hold capacitors will cause a drift in the DC bias voltage on the gates dV /dt [38,41]. The ratio between the voltage drift and the required voltage stability (which we assume equal to the voltage resolution ∆V ) will determine the minimum refresh rate The leakage current is very technology dependent and values are not readily available for state-of-the-art integrated capacitor technology. To date, there have only been two proof-of-principle demonstrations of sampleand-hold circuits integrated with qubit device gates, with reported dV /dt values ranging from 2 µV/s [41] to 0.1 V/s [38]. Using these values, we obtain minimum refresh rates ranging from f c = 2 Hz to 100 kHz, limited by the sample-and-hold circuits with fine resolution ∆V f = 1 µV. The module size will then set the minimum clock frequency required to run the DC biasing demultiplexers f b = 64N 2 b f c . Therefore, the maximum module size will be limited by the feasibility of distributing the dcDAC signals across the array, which becomes more difficult as f b increases. This issue will be discussed in more detail below.
The readout module size has two limiting factors. The number of parallel readouts r through a single line will depend on the feasibility of applying readout multiplexing techniques. As readout is generally the operation that takes the most amount of time, q will need to be kept to a minimum in order to restrain t sc to within the appropriate bounds of coherence that will achieve a good quantum memory.

V. FOOTPRINT
We now consider the footprint requirements of the control electronics that need to be locally integrated in the quantum plane. This is the minimum area that needs to be available adjacent to the qubits, and therefore sets the minimum qubit pitch d. The most significant contribution to the footprint comes from the capacitors that are required for the sample-and-hold scheme. As summarized in Table II, 32 gate electrodes per unit cell require a fine voltage resolution and another 32 gates require coarse resolution, which comprise a total capacitance per unit cell of ∼450 pF. Assuming ∼1 pF/µm 2 (using stateof-the-art deep-trench capacitor technology [42]), we estimate a total capacitor footprint A c ≈ 450 µm 2 . In addition, we modeled a decoder circuit using 40-nm technology (see Appendix A for details) and obtained an estimate of the total footprint of the required demultiplexers A d ≈ 180 µm 2 per unit cell. This adds to a total footprint per unit cell of A u ≈ 630 µm 2 , which allows to set the qubit pitch to d 13 µm. Assuming a 50-nm pitch between gate electrodes, this would require linear arrays of 260 gate electrodes per lattice arm and would set the unit cell area to 4 d 2 ≈ 676 µm 2 .
The required unit cell footprint sets the qubit pitch and consequently the perimeter through which the required interconnects have to enter and leave the unit cell. To evaluate the feasibility of implementing the spiderweb array gate structure integrated with the local control electronics, we have constructed a circuit model in cadence (see Appendix B for details) with realistic parameters that includes all the essential components and connection routing scheme described here, using a total of four metal layers. To obtain an estimate of the maximum number x max,fab of crossbars that can be added for the implementation of logical qubits, we need to first estimate the maximum number of interconnect lines that can be routed across the perimeter of a unit cell N lines = 8 d N layers /∆ i , where N layers is the number of metallic layers that can be used for routing and ∆ i is the pitch of the interconnect lines. Then x max,fab = N lines /8, since each of the 4 crossbar lines will cross the unit cell perimeter twice. Assuming current numbers from the latest device roadmap report [43] N layers = 12 and ∆ i = 80 nm, we estimate x max,fab ≈ 2000.

VI. HEAT DISSIPATION
As with all quantum processors that will operate at cryogenic temperatures, it is necessary to ensure that the heat dissipated during the operation is kept to within the stringent requirements set by the cooling power available [14]. In this section we will discuss some of the main sources of heat dissipation in the spiderweb array.
Routing of the signals lines across the chip requires a high-density of metallic lines at the different interconnect levels, which results in parasitic capacitances between the lines. Any oscillating or pulsed signal on these lines will dissipate energy due to the charging and discharging of these parasitic capacitances. Using the interconnect circuit model drawn in cadence as a guide, we consider the two layers with the highest line density as a regular grid of metal lines and use this to obtain an estimate of C p = 700 fF for the total parasitic capacitance of the signal lines in a unit cell (see Appendix C.1 for details). When a capacitor C p is charged by a voltage source v p , the energy stored in the capacitor (0.5C p v 2 p ) is half the energy supplied by the source (Qv p = C p v 2 p ). The other half is dissipated by the circuit as heat by the parasitic resistance between the voltage source and the capacitor, independent of the resistance value. This is known as the dynamic power dissipation from the parasitic capacitance and can be expressed as: where v p and f p are the amplitude and frequency of the pulses applied to the lines. We note that this estimate includes power dissipated both by the signal lines and the driving circuit-which will be outside the quantum plane-so it should be taken as a worst case estimate. Next, we estimate the dissipation for the sample-andhold circuits, which comes mainly from the dynamic power consumption of the network of transistor switches driving the hold capacitors. We use the decoder model introduced in Sec. V and described in Appendix A to estimate a maximum energy dissipation of 0.35 pJ for a 4-bit decoder cycling through all 16 outputs. Using the worstcase 100 kHz gate refresh rate previously estimated, we obtain the transient power dissipation P d < 140 nW for 4 demultiplexers in a unit cell, noting that capacitor leakage rates in state-of-the-art technologies will likely make this dissipation orders of magnitude lower. Additional capacitive-load power from charging and discharging the array of hold capacitors is negligible (∼fW per unit cell).
Another important consideration is the power dissipated due to the finite resistance of lengthy lines carrying AC signals. We model the signal lines as transmission lines with finite resistance and parasitic capacitance to ground (see Appendix C.2 for details), to obtain the following expression for the power dissipation of a line running above a unit cell of length 2d = 26 µm: where v t and f t are the amplitude and frequency of the signal, respectively. The total power that will be dissipated by the operation of the entire array will then be P T = U (P p + P d + P t ). (6)

VII. EXAMPLE: A MILLION-QUBIT ARRAY
To illustrate the advantages of implementing this architecture on a large scale, let us consider a spiderweb array of 2 20 (≈ 10 6 ) qubits, or U = 2 18 unit cells.
To make a concrete assessment of the line scaling, we assume a DC biasing module size N 2 b = 1024, with M 2 b = 256 modules to complete the array. This sets the maximum clock frequency of the biasing demultiplexers to f b ≈ 6 GHz. We assume a measurement multiplexing strategy with q = 4 sequential readouts of sets of 8 SC-ancilla qubits read out in parallel using amplitude, frequency and/or phase multiplexing (r = 4). This implies a readout module size N 2 r = 16 and M 2 r = 16384. Omitting for the moment the crossbars required to implement logical qubits, and using the result in Table III, the array contains a total of c = 74 connections per unit cell and T = 16, 836 connections at the quantum plane boundary. Using the formula for Rent's rule T = c U p , we extract a Rent's exponent p = 0.43. Adding crossbar circuits will increase Rent's exponent to a maximum p = 0.5 for x 200 (see Appendix E for further discussion).
Considering that state-of-the-art qubit fidelities, require a code distance d c ≈ 16 to perform complex quantum algorithms [40], we can estimate upper bounds of x from Eq. 2, x max,dq ≈ 700 and x max,ls ≈ 1000.
The total area covered by the quantum plane of this million qubit array with local control electronics, will be (2dN M ) 2 ≈ 177 mm 2 . The remaining area on, for example, a 22 mm × 33 mm (726 mm 2 ) die is ∼550 mm 2 , and can be used to implement classical control circuits, i.e., among others the pulsed voltage sources we have described. In addition, additional levels of multiplexing can be employed to bring the off-chip wire count, typically being the real bottleneck for Rent's rule, to well below the wire count at the quantum plane boundary.
We can evaluate the total duration of a surface code cycle by estimating the operation times in Eq. 1. From recent modeling of a shuttling protocol similar to the one used here [44], we estimate that with a 50 nm gate pitch we can achieve t sh ≈ 50 ns, maintaining >99.9% fidelity. We note that a recent experimental demonstration of this shuttling protocol [13] showed shuttling fidelities >90 % for distances ∼420 nm, an encouraging initial result towards the feasibility of this protocol at larger scales. State-of-the-art quantum dot qubit systems, with operation mechanisms similar to the ones proposed here, have demonstrated Rabi frequencies and exchange couplings ∼10 MHz [5,8,45], which correspond in our system to t 1q , t sw ≈ 25 ns. In addition, those same systems exhibit dephasing times as long as T * 2 ≈ 20 µs. High-fidelity readout using Pauli spin blockade has been achieved on timescales t r ≈ 1 µs [33,34]. Assuming these operation times, we estimate an entire surface code cycle will take t sc ≈ 6 µs, more than 3× shorter than the coherence time.
Finally, we calculate the power dissipation from the sources described in Sec. VI. To estimate the power dissipated from the parasitic capacitance of the interconnects, we first note from Table III that the bulk of lines that need to be routed above the unit cell correspond to pulsed signals for qubit operations. These signal lines will need to be activated on average ∼ 6 times per surface code cycle, from which we estimate f p ≈ 1 MHz. Assuming v p ≈ 1 V of pulse amplitude on these lines, we can use Eq. 4 to estimate U P p 90 mW for these lines. The biasing multiplexers will dissipate U P d 40 mW for the entire array. We use Eq. 5 to calculate the power dissipation of the lossy transmission lines carrying the higherfrequency signals. The MW signals used for single-qubit control are relatively low amplitude and pulsed with very short duty cycle, so their power dissipation will be negligible. A larger contribution will come from the lines that carry the signals for the shuttling channels, since they will be nearly continuously activated. Assuming a voltage amplitude v t = 1 V and frequency f t = 1 GHz, we estimate U P t 0.3 mW. These estimates suggest that the parasitic capacitance and the local demultiplexers will be the main contributors to the total power dissipation in the array, with a million-qubit spiderweb array expected to dissipate on the order of 100 mW of power. This is well within the cooling power capabilities for 1 K operation using state-of-the-art cryogenic technologies. We can also estimate how well this power can be transferred from the chip to the cryogenic environment with currently used heat sinking techniques. Recent work [46] studying a CMOS integrated chip in a cryogenic environment (3 K) found that a circuit dissipating 200 mW increased the chip temperature by 7 K over an area of ∼ 4 mm 2 . Considering that the qubit plane area is 40× larger than that, we can roughly estimate ∼ 0.2 K of self heating in the qubit plane of the spiderweb array.

VIII. DISCUSSION
The goal of this work is to provide insight into the feasibility of implementing a quantum hardware archi-tecture with integrated local control electronics. We have shown that, with reasonable assumptions regarding quantum and classical fabrication and measurement techniques, a million qubit array made from spins in quantum dots is achievable. We have used the state-of-the-art in quantum-dot qubit development to consider a range of technical implementation aspects, and will now discuss some of the open questions that will need to be addressed based on how the technology evolves. As with most other electrically controlled solid-state qubit architectures, the distribution of high-frequency signals is not trivial, with potential issues such as standing waves and signal synchronization across the array. Some additional local circuitry will likely be needed to solve these issues.
Furthermore, qubit operation performance could be significantly improved with significant restructuring of the array topology, to place micromagnets at the qubit idle regions and add a fast gate to perform single-qubit gates. This would reduce the number of shuttles per surface code cycle, allow for dynamical decoupling during idle times to extend coherence times and correct phase errors that may arise, e.g., from the shuttling process.
If the number of cross-bars becomes the limiting factor to Rent's exponent, it would be beneficial to move the DC-biasing and readout demultiplexers outside the quantum plane. This would reduce the footprint and in turn relax the shuttling performance requirements.
This work builds on previous large-scale qubit architectures based on quantum dots, focussing on the implementation of integrated classical and quantum electronics. We believe it will help identify the key areas of technological development required to shortcut the path towards building the future generation of quantum computers.

ACKNOWLEDGMENTS
We would like acknowledge Simon Devitt and Ramon W.J. Overwater for useful discussions.
This work was supported by Intel Corporation and the Early Research Programme of the Netherlands Organisation for Applied Scientific Research (TNO) with additional support from the Top Sector High Tech Systems and Materials.

Appendix A: Demultiplexer design
In order to estimate the footprint and the power consumption of the demultiplexer used to route the DC biasing to the gates in a unit cell, we model a 1-to-16 demultiplexer using a 4-bit digital decoder driving an array of 16 switches. As explained in the main text, each unit cell accommodates 4 4-bit demultiplexers to bias 64 gates, so that the demultiplexers can be easily placed in the 4 free areas within the quantum-dot grid.
The switches can be implemented as a simple transistor switch (NMOS or PMOS) or by a pass gate (NMOS and PMOS in parallel) depending on the voltage levels expected by the gates, the demultiplexer supply and the threshold voltage of the transistors in the employed technology. Due to the increased threshold voltage at cryogenic temperatures with respect to room temperature, a dead zone for voltage biasing around mid supply may appear, in which even the pass-gate impedance could be too high to allow proper biasing of the gates in the unit cell [47]. Possible alternatives could then be the well-known bootstrapped switches or the use for thickoxide transistors driven by level shifters (or eventually driven by a thick-oxide decoder). Noting that the abovementioned design choices strongly depend on the required DC biasing levels, we assume ∼ 1 V outputs and design a standard thin-oxide decoder in a nanometer CMOS technology driving single-transistor switches. Independent of the design choice, we expect the effective footprint and energy to be well within an order of magnitude of the reported estimates.
A 4-bit decoder enabled by the combination of a row and column address (see Fig. 4) has been designed in verilog and synthesized using a commercial TSMC 40nm CMOS process. After place-and-route, the decoder occupies an area of 36 µm 2 . By budgeting 25% extra area for the switch array, each demultiplexer would occupy 45 µm 2 in 40-nm CMOS, and the total footprint required to fit 4 demultiplexers in a unit cell will be A d = 180 µm 2 .
To estimate the power dissipation, the decoder binary input has been swept from 0000 to 1111, leading to an energy dissipation between 0.2 pJ and 0.35 pJ from a 1-V supply for the whole 16-phase cycle for a decoder load ranging from 1 fF to 10 fF to emulate the switch input capacitance. A unit cell containing 4 demultiplexers will then dissipate a maximum energy of 1.4 pJ. This estimate uses the standard room-temperature device models and includes the parasitics extracted from the layout but excludes the power required to drive the demultiplexer inputs.

Appendix B: Cadence drawings
In order assess more concretely the feasibility of implementing the local classical control electronics described in the spiderweb array, we have drawn a unit cell with all its elements using cadence circuit design software. Although we have used realistic gate pitch and width dimensions (80 nm) for the densest part of the array, the drawings do not strictly follow design rules and are not optimized, as they are intended as an illustrative feasibility exercise. Additionally, we have not implemented the connection that will be required for readout (these will depend on the exact implementation) or the lines connecting the dcDACs outside of the quantum plane to the local demultiplexers. Footprint is reserved for the local demultiplexers themselves based on the design 1 μm described above, but they are not implemented explicitly. We have not drawn the two-dimensional electron gas (2DEG) channels that run underneath the end of the gate electrodes, which are electrostatically depleted to create quantum dots. For all the drawings presented we use the following color convention: metal layers from lowest to highest are blue, red, green and pink, while vias connecting different layers are represented by squares colored cyan, orange and purple, connecting to the second, third and fourth layers, respectively. White areas represent doped regions. Fig. 8 shows the drawing of a ∼6.5 µm shuttling arm. Identical copies of these shuttling arms link every qubit idling and operation regions to form the spiderweb array. The 2DEG channel runs horizontally in the top of the image below the tips of the blue gate electrodes. Fig. 9 show the two types of operation regions, as shown schematically in Fig. 2 of the main text. In these images, the 2DEG channels run below the tips of the blue gate electrodes, along the horizontal white line in the center. To combine DC biasing signals with AC pulses, the quantum dot gates are connected to DC biasing capacitors in a sample-and-hold scheme, as explained in detail in the main text. Coarse-resolution capacitors are visible in Fig. 9 (yellowish structures). For simplicity, we have represented these capacitors here by parallel plate capacitors, although we envision to use more advanced capacitor types that have a higher capacitance per unit area. The capacitor footprint used in these drawings (and not the parallel plate capacitance that it represents) is in agreement with our estimates in the main text. The gate electrodes for which their capacitor is not seen in the image, connect to larger, fine-resolution capacitors that are not shown here, but are visible (as larger greenish hatched squares) in Fig. 12. The green wires from the third metal layer carry the AC signals and are routed to the corresponding gates in neighboring unit cells. In the top part of Fig. 9(a) show two doped square regions in white that serve as source and drain reservoirs for the sensing dot, as well as electron reservoirs for initialization of the spiderweb array. The other doped regions in Fig. 9 are part of the capacitor structures. Connections to the boundary of the quantum plane for any of these doped regions are not implemented. In the bottom part of Fig. 9(a) a short connection via the second metal layer (red) is required to by-pass another metal line before connecting to the third metal layer (green).
The qubit idle region is surrounded by four shuttling arms, as shown in Fig. 10. The four gates directly adjacent to the qubit idle region act as gate keepers that   control the transfer of electrons into the shuttling arms. They are activated through a transistor structure arranged in an AND configuration. Two of these transistors are visible in the top right and bottom left of the images, while the remaining two transistors are outside the field of view. The fourth metal layer (pink) connects the transistors to the gate electrodes, as is visible in Fig. 10(c). The second ring of gate electrodes around the qubit idling region (i.e., the first set of shuttling gates) connect to a line in the second metal layer (red), visible in Fig. 10(a). The subsequent three rings of gate electrodes that make up the rest of the shuttling set connect to three lines from the third metal layer (green), visible in Fig. 10(b). The horizontal green wires that are visible in Fig. 10(b) connect the shuttling signals to the next qubit idle regions, and the vertical pink wires in Fig. 10(c) connect the AC signals in adjacent unit cells together. The idle regions for ancilla and data qubits are very similar, with minor differences to allow for the crossbar addressing to prevent data qubits from being shuttled in the surface code operation. The images shown here correspond to the ancilla qubits. Fig. 11 progressively shows larger areas of the cadence drawing, leading to the drawing of the full unit cell in Fig. 12. All layers are included in both figures. Both panels of Fig. 11 are centered on the qubit idling region. The horizontal (red/pink) and vertical (green/pink) wires connect the AC signals in adjacent unit cells together. The open areas visible in the corners of Fig. 11(b) and as nine green open squares in Fig. 12, are reserved footprint for the local demultiplexers used for DC biasing. The greenish hatched squares on the perimeter of the full unit cell image are fine-resolution capacitors for the sample-and-hold scheme.
We have drawn the unit cell, with a qubit pitch of ∼13 µm, in such a way that it can be tiled horizontally and vertically to form modules, with lines ending at each edge of the unit cell aligned to connect to their respective counterparts at the opposite edge of the adjacent unit cells, and the ancillary structures (such as the fineresolution capacitors) neatly complement each other.
Appendix C: Heat dissipation estimates

Parasitic capacitance
To assess the heat dissipation of the signal lines, we consider the dominant contribution from the parasitic capacitance caused by having metallic lines next to each other, as well as overlapping across layers. Here we will estimate this capacitance based on simplified models, noting that a more accurate estimate can be obtained using capacitance simulation software. We consider a grid of metallic lines such as that shown schematically in Fig. 13, consisting of a set of parallel lines in one layer, overlapped by another set of orthogonally placed lines placed on a layer above. The two main contributors to the total parasitic capacitance will be the capacitance between parallel lines in the same layer C 1 and the capacitance caused by overlapping metal regions across two layers C 2 . The total parasitic capacitance of a metallic grid with N l lines per layer is then C p = 2N l C 1 + N 2 l C 2 . We consider both the parallel plate capacitances and the fringe capacitance between metal edges, to obtain a capacitance model [48,49]: where L/W /H is the length/width/height of a metal line, d 1 is the distance between neighboring metal lines in the same layer, d 2 is the thickness of the dielectric separating two layers, is the permittivity of the dielectric between metals and v 0 is the speed of light in vacuum.
To estimate the C p of a unit cell, we consider the bottom two layers as the main contributors, since they contain the highest density of signal lines. Metal layers higher up would have signal lines separated widely, thus bringing in less parasitic capacitance. Vias that make connections between layers introduce a parasitic capacitance as well, while it would be reduced through proper layout design that avoids placing vias close to-gether. Using the drawings in Appendix B as a guide, we assume N l = 150, L = 24 µm, W = 80 nm, H = 50 nm, d1 = 80 nm and d 2 = 500 nm. The dielectric is assumed to be SiO 2 making = 3.9 0 , where 0 is the vacuum permittivity. From this we obtain the estimate C p ≈ 700 fF, used to calculate this portion of the power dissipation in the main text.

AC transmission lines
To estimate the power dissipated by lossy transmission lines, we model a segment of transmission line running above a unit cell, as a resistor R t in series with a capacitor C t to ground. The amplitude of the current drawn by the capacitor while sustaining a signal of amplitude v t and frequency f t is i c = 2πv t f t C t . This current flowing through the finite resistance will cause a power dissipation P t = i 2 c R t /2 = 2R t (πv t f t C t ) 2 . Assuming transmission lines with capacitance to ground ∼ 0.2 fF/µm and sheet resistance ∼ 0.1 Ω/ [50] with width ∼ 1 µm, we obtain the expression in Eq. 5 used in the calculation in the main text. This is a conservative estimate for our application, since the resistance of the lines is expected to reduce at cryogenic temperatures [51].
Appendix D: Square-root-SWAP as two-qubit gate in the surface code for spin qubits A scalable scheme for implementing error correction cycles in a surface code for superconducting qubits has recently been presented [35], where an effective cnot gate is applied by combining a conditional phase gate with appropriate single-qubit Hadamard (H) operations. Here we implement a similar scheme, adapted to efficiently accommodate the square-root-swap ( √ S w ) operation, i.e., the native two-qubit gate of the spiderweb array, defined as We propose √ S w modifications of the X-and Zplaquettes which are the building blocks of larger errorcorrection schemes.

Entangling phase gate
An explicit calculation of an operator product proposed in [1,52] yields S w R [1] z (π) ⊗ I [2] S w = −iS p , where we have defined the phase gate as This relation is pictorially shown in Fig. 14a and in what follows the S p gate will be depicted as in Fig. 14b. Note that S p is an entangling gate, in the sense that it can map a product state into a an entangled state. This is most easily verified by using the concurrence of a twoqubit state. Its Hermitian conjugate S † p can be written as −iS † p = √ S w I [1] ⊗ R [2] z (π) √ S w . The controlled-phase gate C Z may be used as primitive data-ancilla interaction in a surface code [35]. With only two single-qubit rotations, this C Z gate is constructed from S p as C Z = R [1] z ( π 2 ) ⊗ I [2] Since the three matrices involved are diagonal, they commute and the order of applying the operations is immaterial. This can be advantageous in actually implementing error-correction cycles. To this end, the following relation with the Hermitian conjugate is convenient as well C Z = R [1] z (− π 2 ) ⊗ I [2] I [1] ⊗ R [2] z ( π 2 ) S † p = R [1] z (− π 2 ) ⊗ R [2] z ( π 2 ) S † p . (D3) If one alternatively desires the cnot gate as primitive data-ancilla interaction [35], two additional Hadamard gates are necessary [30]. The result can be written as C N =i R [1] z ( π 2 ) ⊗ I [2] I [1] ⊗ R [2] z ( π 2 ) × I [1] ⊗ R [2] x ( π 2 ) S p I [1] ⊗ H [2] .
Here, non-commuting operators have to be applied in the right order. A similar relation involving S † p can be readily be derived.

Surface code cycles
We adapt the pipeline-based error-correction scheme [35] to use it with the S p operator described above. To achieve this, we are only required to reanalyze the X-type and Z-type plaquettes using the C Z ancilla-data interaction.
For the X-type plaquette, we insert H = R y ( π 2 )Z = ZR y (− π 2 ) for all Hadamard operations. The C Z gates are to be implemented as Eq. D2; here the actual realization of S p as Eq. D1 is implicitly assumed but irrelevant for the remaining analysis. Because of the earlier mentioned commutativity of the diagonal operators, the circuit can readily be simplified. In particular, the Zoperators stemming form the initial and final Hadamard gates combine to the identity. The Z-rotations can be combined as well. All final Hadamard gates are replaced by R y ( π 2 ). The C Z ancilla-data operations are replaced by S p gates without changing the time-ordering. The initial Hadamard operation on the ancilla is replaced by [R z ( π 2 )] 4 R y (− π 2 ) = −R y (− π 2 ), where the minus sign can be omitted. The initial Hadamard data-qubit gates are replaced by the product R z (− π 2 )R y (− π 2 ). The resulting circuit diagram for the X-plaquette cycle is depicted in Fig. 15a.
The Z-plaquette is modified analogously. The final Hadamard gate on the ancilla is again replaced by R y ( π 2 ), whereas the initial Hadamard operation is replaced by −R y (− π 2 ). Once more, C Z ancilla-data operations are replaced by S p gates at the same instant of time. For the data-qubits, rotations R z (− π 2 ) have to be added at some instant of time. In the circuit shown in Fig. 15b, it is done at the onset.
These building blocks can be used to construct depthnine quantum circuits for parallel as well as pipelinebased, quantum-error-correction cycle of a surface code; the proposed scheme is scalable.