Resource costs for fault-tolerant linear optical quantum computing

Linear optical quantum computing (LOQC) seems attractively simple: information is borne entirely by light and processed by components such as beam splitters, phase shifters and detectors. However this very simplicity leads to limitations, such as the lack of deterministic entangling operations, which are compensated for by using substantial hardware overheads. Here we quantify the resource costs for full scale LOQC by proposing a specific protocol based on the surface code. With the caveat that our protocol can be further optimised, we report that the required number of physical components is at least five orders of magnitude greater than in comparable matter-based systems. Moreover the resource requirements grow higher if the per-component photon loss rate is worse than one in a thousand, or the per-component noise rate is worse than $10^{-5}$. We identify the performance of switches in the network as the single most influential factor in resource scaling.


I. INTRODUCTION
Numerous different physical systems have been explored as platforms for quantum information processing. Most approaches involve embodying information in matter systems such as ions or superconducting qubits, but a striking alternative is linear optical quantum computing (LOQC) where all information is encoded in electromagnetic field modes, and processing is carried out using only linear optical elements [1]. Using light as the information medium takes advantage of the low decoherence suffered by optical fields, and the relative ease with which quantum information can be encoded photonically. However there are drawbacks, in particular the impossibility of deterministic entanglement and the impact of photon loss (whether due to absorption, leakage or detector failure). Such difficulties can be solved by increasing the physical complexity of the circuitry. Thus while LOQC may benefit from simple building blocks, conversely it may require more complex circuits than other approaches, and the balance of these factors will determine whether the approach is a practical competitor to matter-based processors.
The most developed method for LOQC to date is based on a discrete dual-rail encoding, in which each qubit is encoded in the field modes occupied by a single photon [2] (these modes can be spatial, polarisation, time-frequency or any other degree of freedom supported by electromagnetic fields). Crucially, even though entangling operations between dual-rail encoded photonic qubits cannot succeed deterministically, it has been shown that it is nonetheless possible to build an essentially deterministic universal quantum computer using only linear optics. This can be achieved by attempting probabilistic entangling operations (PEO) between many resource states in order to ensure that, with high probability, a sufficient number of operations will succeed to allow for quantum computing [3,4].
Techniques for mitigating photon loss have also been developed. It has been shown that if quantum information is suitably encoded in a multi-photon state, then losses of up to 50% of the photons can be tolerated before the encoded information is lost [5]. However, in any realistic implementation of a quantum computer, one must account for how complex multi-photon states can be created given that every component, at every level, will be associated with finite rates of photon loss and other forms of noise. Furthermore, the circuitry associated with overcoming non-deterministic entanglement will require many linear optical elements, including delay lines and switching networks, in order to dynamically reroute the outputs of successful operations to the next stage of processing. These elements will induce further errors and losses, and in this sense the twin issues of non-deterministic entanglement and photon loss aggravate one another in LOQC. Fortunately the threshold theorem assures us that, if all physical error rates are sufficiently low then errors at the logical level can be made arbitrarily rare, and scalable fault-tolerant quantum computing can be achieved. The central challenge of quantum computing is therefore to demonstrate the operations necessary for quantum computing with error rates below these thresholds. Theoretical studies have established the required thresholds for architectures relevant to superconducting qubits [6], and to matter optical networks [7], and experimental systems have been demonstrated at, or beyond, the required performance levels [8][9][10]. However to our knowledge no prior paper has established requirements of LOQC at the per-component level while simultaneously tracking the overall resource costs.
In this paper, we propose a protocol for LOQC that includes every step from the initial generation of entanglement primitives to the deployment of a fully fault-tolerant scalable unit for quantum computing. We consider computational errors and losses at each stage and endeavour to employ the most efficient known protocols for optical quantum information processing. In contrast to previous studies of noise thresholds in optical quantum computing [11,12], we explicitly account for the substantial resource costs of LOQC protocols. We focus on a purely linear optical network, without employing matter qubits as memories or for entanglement generation. However we do assume the availability of on-demand sources of single photons, without concerning ourselves with the particular method with which these would be generated [13]. It is important to recognise that our results only represent an upper bound on the physical characteristics that are required of the components in an LOQC system: our protocol can admit various further optimisations, and these will make the physical requirements less stringent. Nevertheless we believe the results we offer are highly relevant to the field, having been derived from protocols that are presently "state of the art", and it is therefore fair to compare the results here with those that have been derived for matter and hybrid matter-optical systems.
Our analysis allows us to make an estimate of the overall scale of the resources necessary to construct a fully fault-tolerant optical quantum computer. We choose the number of detectors as a metric for the device size, recognising that the total numbers for the other kinds of component will scale roughly proportionately. We find that one would require upwards of $10^{5}$ to $10^{6}$ detectors per physically encoded qubit in the cluster state, therefore requiring a total of at least $10^{11}$ to $10^{12}$ detectors to build a 1000 logical qubit quantum computer. Further, such a quantum computer would require loss rates per component below $\sim 10^{-3}$ and error rates below $\sim 10^{-5}$ per component.

II. PROTOCOL
Our protocol is based on a three-dimensional (3D) cluster state [15][16][17] [ Fig. 1 (a)]. With the cluster state approach, all entanglement required by the quantum computation is generated ahead of the computation itself, which then proceeds purely through measurements. The 3D cluster state enables measurement-based implementations of topological quantum computing using the surface code [18][19][20], providing high thresholds for both qubit loss and computational errors [21]. Without qubit loss, 3D cluster states tolerate phase errors with a rate up to 3% on each qubit; conversely, without computational errors, they tolerate up to 24.9% qubit losses [21]; and with both computational errors and qubit loss, the threshold of errors decreases approximately linearly with the loss rate. Thus cluster states are particularly well suited to LOQC as they can be efficiently prepared with linear optics: there is no fundamental difficulty caused by a high rate of entanglement failure during the creation of the cluster state, provided that once it is created it surpasses these thresholds [3,4,22,23]. The 3D cluster state is a graph state on a 3D lattice, which can be understood by supposing that each vertex on the graph denotes a qubit initialised in the state $|+\rangle$ and each edge denotes a controlled-phase gate entangling the two linked qubits. In the particular lattice we require, each qubit is connected to four neighbouring qubits, see Fig. 1(a). In order to create such a cluster state, our protocol requires one complete building-block state to be prepared for each eventual qubit in the cluster. Importantly, these building-block states contain sufficient redundant encoding that entanglement links between building blocks can be generated with a probability above that necessary for fault-tolerant computing. If suitable building blocks can be constructed, a fault-tolerant cluster state of arbitrary size can then be generated deterministically.
We can therefore focus on the optimal approach to constructing these building-block states (Fig. 2), without concerning ourselves with the precise details of the 3D cluster state that will ultimately be generated. However, we note that it is only necessary to generate one 2D layer of the cluster state "at a time", entangling it with the layer generated previously (each layer represents one 'clock cycle' of the computation, and therefore requires one vertex for every physical qubit). Therefore the building-block factories can be reused to generate each layer of the 3D cluster.
As in other cluster state generation protocols [3,24,25,26,27,28,29], our building-block state is also a graph state. We employ the star graph as the basic structure of our building blocks, Fig. 1 (a). This state is composed of one core qubit and several bridge units. While the core qubit is a single photonic qubit, each bridge unit is physically encoded in a tree-structure graph state of several photonic qubits. In order to implement a PEO between two different building-block states, a Bell measurement is carried out between the root qubits at the base of each bridge unit. The tree-like structure within the bridge units enables two key properties. In the case of a successful PEO between bridge units on two different building-block states, the core qubits on each building block become entangled, and the remaining qubits within each unit can be trimmed away, see the left part of Fig. 1 (b). Moreover, on failure of the entangling operation, the measurements on the remaining qubits within the bridge units allow us to identify any necessary phase correction to the core qubit, preventing its corruption (with high probability). The right part of Fig. 1 (b) summarises this protocol, see Appendix A for further details. This method for recovering from PEO failure via measurements on ancillary qubits follows the approach introduced in Ref. [5].
With these two properties, it is possible to make multiple attempts to form links between core qubits while still ensuring that errors remain below the fault-tolerant threshold. Two building blocks can therefore be successfully connected with a high rate provided that there are enough bridge units.
Since each core qubit must be linked to four other core qubits, the number of bridge units on the building blocks is chosen to be a multiple of four, with a quarter of the bridge units allocated for each connection. To establish an entanglement link, PEOs are performed on corresponding bridge units in parallel. If there is one successful PEO and the removal measurements are also successful, then the connection is successfully established. If there is more than one successful PEO, we keep only one link. The connection fails if there are no successful PEOs or one of the removal operations fails. Connection failures are dealt with by treating core qubits with a failed connection as missing qubits [29], which can be tolerated by measurement-based quantum computing (MBQC) on the 3D cluster state.
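The connection rule above can be turned into a rough success estimate. The following is a minimal sketch, assuming that all PEOs on one connection succeed independently with the same probability and that the removal measurements on each bridge-unit pair succeed independently with a probability `q_removal`; both the independence assumption and the parameter `q_removal` are illustrative and are not taken from the protocol's numerical model.

```python
def connection_success_probability(k: int, p_peo: float, q_removal: float = 1.0) -> float:
    """Estimate the probability that one core-to-core connection is established.

    k         -- number of bridge-unit pairs allocated to this connection
    p_peo     -- success probability of a single PEO (Bell measurement)
    q_removal -- assumed success probability of the removal measurements on
                 one bridge-unit pair (hypothetical parameter, default lossless)
    """
    if k < 1:
        raise ValueError("at least one bridge-unit pair is required")
    # The connection needs at least one successful PEO ...
    at_least_one_link = 1.0 - (1.0 - p_peo) ** k
    # ... and every removal operation must succeed.
    all_removals_ok = q_removal ** k
    return at_least_one_link * all_removals_ok


if __name__ == "__main__":
    for p in (0.5, 0.75):
        print(p, connection_success_probability(4, p))
```

In this simplified picture, adding bridge units raises the chance of at least one link but also multiplies the chance that some removal operation fails, which is the trade-off that the size of the building block has to balance.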
The finally prepared state of core qubits is equivalent to the 3D cluster state up to some feedback single-qubit gates, which depend on the outcomes of the single-qubit measurements and Bell measurements used to prepare the state. In the MBQC algorithm on the 3D cluster state, (core) qubits are measured in four bases, which are $\sigma_x$, $\sigma_z$, and $(\sigma_x \pm \sigma_y)/\sqrt{2}$ (only for magic state injection). In our protocol, the feedback gate on a core qubit is always either the identity $\mathbb{1}$ or the phase gate $\sigma_z$. Therefore, remarkably, core qubits can be measured before the cluster state is prepared! It is beneficial to do so, in order to reduce the effect of photon loss: core qubits are measured as early as possible, and obviously if a measurement fails then that particular building block is abandoned at its initial stage. Once the full feedback is known, we may update (flip) the recorded outcomes of any cores measured in $\sigma_x$ or $(\sigma_x \pm \sigma_y)/\sqrt{2}$.
FIG. 3. (a) Circuit for realising linear optical quantum computing using a three-dimensional cluster state. The circuit includes stages for (i) generating initial three-qubit GHZ states, (ii) synthesising these states into building-block states, and finally (iii) constructing the 3D cluster state. Qubits on the cluster state are physically measured right after they are generated by a GHZ state factory. (b) GHZ state factories probabilistically generate three-qubit GHZ states from six single photons. Successful generation of the GHZ state is heralded by specific three-photon detection events at the detectors. To select at most $N$ successful copies of the GHZ state from $M$ attempts, we need six $M$-to-$N$ switchyards. Delay lines are necessary before photons enter switchyards to allow for feed-forward. (c) Bell measurement circuits synthesise building-block states. The two different circuits depicted succeed with probabilities 50% and 75%, respectively. For the 75%-success circuit, four ancillary single photons are used. At each synthesis stage, many copies of input states are prepared, and measurements are performed on these states in parallel. Successful output states are selected with switch networks. (d) An $M$-to-$N$ switchyard can be realised with 1-to-$N$ switchyards and $M$-to-1 switchyards. For each 1-to-$N$ switchyard, every output mode is connected to an input mode of a different $M$-to-1 switchyard. An $M$-to-1 switchyard is composed of $\sim 2M$ 2-to-2 switches. A 1-to-$N$ switchyard is similar.
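The classical bookkeeping for core qubits that are measured early can be sketched as follows. This is an illustrative sketch only: the basis labels and the function name are hypothetical, and the rule encoded is the one stated in the text, namely that a $\sigma_z$ feedback gate flips outcomes recorded in the $\sigma_x$ or $(\sigma_x \pm \sigma_y)/\sqrt{2}$ bases and leaves $\sigma_z$ outcomes unchanged.

```python
# Hypothetical labels for the four measurement bases used in the text.
BASES = {"X", "Z", "X+Y", "X-Y"}


def corrected_outcome(basis: str, recorded: int, feedback_is_z: bool) -> int:
    """Return the measurement outcome after applying the classical feedback.

    basis         -- one of "X", "Z", "X+Y", "X-Y"
    recorded      -- outcome bit (0 or 1) stored when the core qubit was measured
    feedback_is_z -- True if the accumulated feedback gate is sigma_z,
                     False if it is the identity
    """
    if basis not in BASES:
        raise ValueError(f"unknown basis: {basis}")
    if recorded not in (0, 1):
        raise ValueError("recorded outcome must be 0 or 1")
    # sigma_z commutes with a sigma_z measurement, so that outcome is kept.
    if basis == "Z" or not feedback_is_z:
        return recorded
    # For the equatorial bases a sigma_z byproduct flips the recorded bit.
    return recorded ^ 1


if __name__ == "__main__":
    print(corrected_outcome("X", 0, True))    # flipped
    print(corrected_outcome("Z", 0, True))    # unchanged
```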

III. GENERATION OF BUILDING BLOCKS
Each building-block state must be generated from an initial resource of unentangled single photons. In our scheme, these single photons are first entangled into three-qubit GHZ states. These entanglement primitives can then be sequentially combined into larger units using Bell measurements (Fig. 2). This process is known to be efficient for loss rates of less than 1/3 [30], since at each stage it is then possible to increase (up to double) the size of the resulting entangled states. Further details on the building-block construction process are given in Appendix B.
Regardless of the specific architecture of the building block to be generated, this process requires two primary circuit elements. The first element, a GHZ-state factory, produces GHZ states probabilistically from single-photon inputs. The second element probabilistically joins two independent graph states into a larger graph state. Along with these two processing elements, it is also necessary to construct switching networks and delay lines in order to route photons between processing stages. All of these operations must be realised using only linear optical elements, e.g. single-photon sources, beam splitters, switches, delay lines and photon detectors [ Fig. 3 (a)].
In our scheme, we use the same GHZ-state factory as proposed in Ref. [30]. This circuit requires six single photon inputs, and, in the lossless case, successfully generates GHZ states with probability 1/32 [ Fig. 3 (b)]. Our fusion elements use Bell measurements as PEOs for joining intermediate states. These Bell measurements consume one photon from each input state [4]. A tempting alternative is to employ the Type-I fusion gate, which consumes only one photon and can also connect two graph states [4]. However, a Type-I fusion gate may convert photon loss into computational errors (see Appendix B), which should be avoided as overcoming errors is usually harder than overcoming photon loss. Therefore, we only use Bell measurements in our protocol [30]. The circuits we use for the Bell measurement are also shown in Fig. 3 (c). Without any ancillary resources, a linear optical Bell measurement (often termed Type-II fusion) can succeed with 50% probability in the lossless case. However, with the help of four ancillary single photons, the success probability of a Bell measurement can be boosted to 75% [31]. The same success probability can also be achieved with a Bell-state as the ancillary resource [32]. With a resource state of more entangled photons, the success probability can be further boosted [31,32].
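The success probabilities quoted above translate directly into average input costs. The sketch below assumes independent attempts in the lossless case, so that the mean number of attempts per success is the reciprocal of the success probability; the function names are illustrative.

```python
from fractions import Fraction

PHOTONS_PER_GHZ_ATTEMPT = 6          # six single photons per factory attempt
P_GHZ = Fraction(1, 32)              # lossless GHZ-factory success probability


def expected_attempts(p_success: Fraction) -> Fraction:
    """Mean number of independent attempts needed for one success."""
    if not 0 < p_success <= 1:
        raise ValueError("success probability must lie in (0, 1]")
    return 1 / p_success


def expected_photons_per_ghz(p_success: Fraction = P_GHZ) -> Fraction:
    """Mean number of single photons consumed per heralded GHZ state."""
    return PHOTONS_PER_GHZ_ATTEMPT * expected_attempts(p_success)


if __name__ == "__main__":
    print(expected_photons_per_ghz())            # 192
    print(expected_attempts(Fraction(1, 2)))     # 2
    print(expected_attempts(Fraction(3, 4)))     # 4/3
```

Under these assumptions one heralded GHZ state costs on average 192 single photons, before any photon loss or switching inefficiency is taken into account.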
As neither GHZ-state generation nor Bell measurements can succeed deterministically, we select successful outcomes from these operations to feed into the next stage of construction. This requires a rapidly reconfigurable switchyard consisting of a network of switches. For example, to select $N$ successful copies of the three-qubit GHZ state from $M$ attempts in parallel, we need six $M$-input to $N$-output switchyards, one for each output mode of the GHZ-state generation circuit. Before photons enter switchyards, delay lines are necessary to allow time for the switchyard to be reconfigured.
We consider two different approaches to this switching requirement. In the ideal case, this switchyard would consist of a single reconfigurable switch with multiple inputs and outputs [33], in which there is no extra cost in terms of losses or errors as $N$ or $M$ increases. This may prove impossible to achieve, and so we also consider the opposite limit, in which a switchyard is built out of a network of 2-to-2 switches. Such an $M$-to-$N$ switchyard can be realised with $M$ 1-to-$N$ switchyards and $N$ $M$-to-1 switchyards [ Fig. 3 (d)]. Each 1-to-$N$ and $M$-to-1 switchyard is respectively composed of $\sim N$ and $\sim M$ 2-to-2 switches, as also shown in Fig. 3 (d). With such a network of simple switches, each photon must go through approximately $\log_2(MN)$ switches. To minimise photon loss, switchyards with multiple inputs $M$ but a single output ($N = 1$) are favourable. However resources are not used efficiently in this case, and many successful PEO outputs will be discarded. To increase the efficiency, it is preferable to use switchyards with more output modes. In our numerical simulations, we have considered different configurations of the switch network to obtain the optimal threshold of a computer built with 2-to-2 switches.
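The scaling of a switchyard built from 2-to-2 switches can be illustrated numerically. This is a rough sketch using only the approximate counts quoted in the text ($\sim N$ switches per 1-to-$N$ yard, $\sim M$ per $M$-to-1 yard, and a depth of about $\log_2(MN)$); the per-switch loss rate `p_s` is applied independently at each switch, which is an assumption of this illustration.

```python
import math


def switch_count(m: int, n: int) -> int:
    """Approximate number of 2-to-2 switches in an M-to-N switchyard.

    M one-to-N yards of ~N switches each, plus N M-to-one yards of ~M each.
    """
    return m * n + n * m


def switch_depth(m: int, n: int) -> float:
    """Approximate number of switches traversed by each photon."""
    return math.log2(m * n)


def switchyard_transmission(m: int, n: int, p_s: float) -> float:
    """Probability that a photon survives the switchyard (independent losses)."""
    return (1.0 - p_s) ** switch_depth(m, n)


if __name__ == "__main__":
    m, n, p_s = 64, 2, 1e-3
    print(switch_count(m, n), switch_depth(m, n), switchyard_transmission(m, n, p_s))
```

The depth grows only logarithmically while the switch count grows as the product $MN$, so enlarging a switchyard costs hardware much faster than it costs transmission.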

IV. PHOTON LOSS AND COMPUTATIONAL ERRORS
The main source of noise in LOQC is photon loss, which may be induced by any component on the optical path of the qubit. We assume that a loss occurs at single-photon sources, beam splitters, delay lines (for the time period required for one PEO stage), switches and detectors with the rates $p_e$, $p_b$, $p_d$, $p_s$ and $p_m$, respectively.
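To make the loss model concrete, the sketch below combines the five per-component rates into the probability that a single photon survives from emission to detection. It assumes independent loss events, exactly one source and one detector on the path, and user-supplied counts for the other components; the class and function names are illustrative and the counts are not taken from the paper's circuits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossRates:
    p_e: float  # single-photon source
    p_b: float  # beam splitter
    p_d: float  # delay line, per PEO stage
    p_s: float  # switch
    p_m: float  # detector


def survival_probability(rates: LossRates, n_beam_splitters: int,
                         n_delay_stages: int, n_switches: int) -> float:
    """Probability that one photon is emitted, routed and detected without loss."""
    return ((1.0 - rates.p_e)
            * (1.0 - rates.p_b) ** n_beam_splitters
            * (1.0 - rates.p_d) ** n_delay_stages
            * (1.0 - rates.p_s) ** n_switches
            * (1.0 - rates.p_m))


if __name__ == "__main__":
    uniform = LossRates(1e-3, 1e-3, 1e-3, 1e-3, 1e-3)
    # Example path: 2 beam splitters, 3 delay stages, 7 switches (illustrative).
    print(survival_probability(uniform, 2, 3, 7))
```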
In addition to photon losses, we also have to consider computational errors. Because measurements are eventually attempted on all photonic qubits in the protocol, computational errors are induced by any source of noise that can affect these measurement outcomes. For example, any asymmetry, e.g. phase difference or biased transmission, between two modes of a qubit may result in computational errors. All computational errors are equivalent to Pauli errors (see Appendix C). In this paper, we assume that depolarising errors may happen at beam splitters, delay lines and switches with the rates $\epsilon_b$, $\epsilon_d$ and $\epsilon_s$, respectively. Imperfect modal overlap between different photon sources will lead to imperfect quantum interference at beam splitters, and therefore also to Pauli errors. For simplicity, in our model we incorporate this form of error into $\epsilon_b$.
Other types of noise are also tolerable in our protocol. For example, a photon source may emit two photons rather than a single photon into the circuit. Similar errors can be induced by switching errors, in which a photon enters the wrong mode, and from dark counts of detectors. To first order, all of these errors will be caught during measurement, since if these extra photons survive in the optical path we will measure more than the expected number of photons. In this case, we can simply treat the qubit as missing, an error which can be overcome in the same way as true photon loss. However, if a two-photon error is followed by a photon loss event, only one photon will be detected, and the measurement on such a qubit may give a wrong outcome. These computational errors are also equivalent to Pauli errors and can be corrected with our protocol. Although these errors can be corrected, we consider regimes in which they will occur at a rate much lower than the first-order error terms, and so we do not explicitly include them in our threshold study.

V. THRESHOLDS
In this approach to LOQC, the fault-tolerance threshold depends on the complexity of each building-block state. With more resources, one can prepare bigger building blocks, and thus a higher level of photon loss is tolerable. In Fig. 4, fault-tolerant thresholds are obtained numerically (see Appendix D for details). In order to provide some physical intuition for the size of such linear optical quantum computers, we choose to specify the total number of detectors needed as our metric of the resources required. It can be seen that this approximately corresponds to twice the number of single photons needed, and therefore twice the number of single photon sources. It is likely that the resource burden of the other elements, e.g. beam splitters, delay lines and switches, will be of similar magnitudes.
Note that each curve in Fig. 4 (a) is actually an envelope representing the best of a very large number of protocols that were tested. Each small dot within the red curve in the upper left figure represents the outcome of one such simulation; these dots are omitted from other curves for clarity.
It is vital to appreciate that the number of detectors shown in the figure is for a single building-block state rather than the entire computer. A building-block state corresponds to only one qubit on the cluster state, i.e. one data qubit of the surface code, which could correspond to just one ion in an ion trap quantum computer or one superconducting qubit in a superconducting quantum computer. It is anticipated that a fault-tolerant quantum computer will need at least $\sim 10^{6}$ data qubits in order to be able to compete with state-of-the-art classical computers [14,34]. We therefore do not consider building-block states with a resource requirement of greater than two billion detectors, since at that point one finds the entire computer requires thousands of trillions of components! In Fig. 4 (a), we consider the case in which all components of the computer have the same photon loss rate. Depending on the choice of Bell measurement protocol, the threshold loss rate per component varies from $\sim 0.1\%$ to $\sim 0.2\%$. In this subfigure, we consider the worst case approach, in which each switchyard is built out of a cascade of 2-to-2 switches. For comparison, in Fig. 4 (b) we consider a more sophisticated computer, in which each switchyard is a multiple-input multiple-output switch with the same loss rate as the other components. In this case the threshold is $\sim 3$ to 5 times higher than that of a simple-switch computer. We note that these thresholds approach the 1% limiting loss rate that has been discussed in the context of a computing paradigm where gates are essentially deterministic but suffer a small probability of qubit loss [35] (we have achieved this at the cost of the additional resource overhead of course).
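The scale argument above is simple multiplication, and a short check makes the orders of magnitude explicit. The sketch assumes one building block per data qubit and uses the figures quoted in the text ($\sim 10^{6}$ data qubits, $10^{5}$ to $10^{6}$ detectors per building block, and a cut-off of two billion detectors per block); the function name is illustrative.

```python
DATA_QUBITS = 10**6  # data qubits anticipated for a useful computer


def total_detectors(detectors_per_block: int, data_qubits: int = DATA_QUBITS) -> int:
    """Total detector count, assuming one building block per data qubit."""
    return detectors_per_block * data_qubits


if __name__ == "__main__":
    print(total_detectors(10**5))       # 10**11 detectors
    print(total_detectors(10**6))       # 10**12 detectors
    print(total_detectors(2 * 10**9))   # 2 * 10**15, i.e. thousands of trillions
```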
In order to further explore which components have the most significant impact on the fault-tolerance threshold, in Figs. 4 (c), (d), (e) and (f) we modify the model in Fig. 4 (a), assuming in each that one of the circuit components is lossless. These simulations confirm that it is the switching networks which most strongly impact the loss tolerance of the quantum computer. This suggests that alternative approaches in which intermediate cluster states are extensively recycled (similar to the recycling discussed in Ref [28]) will suffer from the associated increase in complexity of the switching networks.
The threshold changes dramatically as the success probability of the Bell measurements increases. With a higher success probability, the size of building-block states (number of bridge units) can be smaller, hence both the level of noise and the resource cost can be lower. Bell measurements with 75% success probabilities perform significantly better than those with 50% success probability. Further, the single-photon ancilla assisted Bell measurement is slightly better than the Bell-state assisted Bell measurement. These boosted Bell measurements do however require photon detectors with additional photon-number resolution. In the case of the 50%-success Bell measurement, we need detectors that can distinguish photon numbers 0, 1, 2, while for 75%-success Bell measurements, we need detectors that can distinguish photon numbers 0, 1, 2, 3. With more complex ancillary states, the success probability can be further boosted. However in this case more resources are required for preparing these ancillary states, which can counteract the benefits of the higher success probabilities (see Appendix E).
The presence of computational errors reduces the threshold loss rate. A general study of the threshold for both loss rates and error rates is shown in Fig. 5, in which we only consider quantum computers built with 2-to-2 switches and single-photon ancilla assisted Bell measurements with 75% success probability. More data for other Bell-measurement circuits and for multiple-input multiple-output switches can be found in Fig. 10. The threshold error rate per component is on the order of $10^{-5}$.

VI. DISCUSSION
We have proposed a comprehensive protocol for LOQC with 3D cluster states, in which we consider the full network of linear optical devices necessary to realise a quantum computer. We find thresholds for loss and error rates of $\sim 10^{-3}$ and $\sim 10^{-5}$ per component, respectively. This per-component performance is beyond the current state of the art in photonics [36][37][38][39][40][41][42]. Furthermore we find that such a quantum computer would require on the order of $10^{11}$ detectors, and similar numbers of other components including deterministic and indistinguishable single photon sources. These component counts are several orders of magnitude greater than those required for systems with deterministic gates [7,14].
We wish to emphasise that these stringent thresholds should be taken as a challenge to the community, aiming to stimulate further discussion and innovation in LOQC. From an experimental perspective, we have tried to determine which components will prove most critical in the development of an optical quantum computer. We found that, for our scheme, it is the performance of the optical switches that has by far the greatest impact on the threshold loss and error rates, while other components contribute more equally. We hope that this will help guide the priorities of future experimental work aimed towards realising LOQC. From a theoretical perspective, we hope that our work will stimulate others to improve on our thresholds by exploring alternative schemes.

Appendix A: Loss and error tolerant building blocks
As discussed in the main text, our protocol is based on a three-dimensional (3D) cluster state [ Fig. 1 (a)]. To create the cluster state, one intermediate building-block state must be prepared for each qubit in the cluster.
We employ the star graph as the basic structure of our building blocks [ Fig. 1 (a)]. This state is composed of one core photonic qubit and several bridge units. To tolerate photon loss and failures of PEOs, each bridge unit is encoded as a tree-structure graph state of several photonic qubits with a root qubit connected to the core qubit [ Fig. 6]. The PEO for connecting two core qubits includes a Hadamard gate on one root qubit and a Bell measurement on two root qubits [ Fig. 1 (b)]. Because the Bell measurement can only succeed probabilistically in LOQC, the overall operation is probabilistic. If the PEO is successful (fails), qubits on first-generation branches, which are directly connected to the root, are measured in the $\sigma_z$ ($\sigma_x$) basis, qubits on second-generation branches are measured in the $\sigma_x$ ($\sigma_z$) basis, and so on. This measurement pattern removes redundant branches from two connected building blocks if the PEO is successful and removes entire bridge units from two independent building blocks if the PEO fails. This removal operation is not always successful due to photon loss. When the tree graph state is large enough, the removal operation can succeed with an arbitrarily high probability if the photon loss rate is lower than 50% [5].
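The alternating measurement pattern described above can be written as a small lookup. This sketch assumes that "and so on" means the two bases keep alternating with each further generation of branches; the function name and the string labels are illustrative.

```python
def removal_basis(generation: int, peo_succeeded: bool) -> str:
    """Measurement basis for branch qubits of a bridge unit after a PEO.

    generation    -- 1 for branches attached directly to the root, 2 for the
                     next level, and so on
    peo_succeeded -- outcome of the probabilistic entangling operation
    """
    if generation < 1:
        raise ValueError("generation must be a positive integer")
    odd_generation = generation % 2 == 1
    if peo_succeeded:
        # Success: sigma_z on generation 1, sigma_x on generation 2, ...
        return "Z" if odd_generation else "X"
    # Failure: sigma_x on generation 1, sigma_z on generation 2, ...
    return "X" if odd_generation else "Z"


if __name__ == "__main__":
    for success in (True, False):
        print(success, [removal_basis(g, success) for g in (1, 2, 3)])
```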

Appendix B: Constructing building blocks
Building-block states are generated by fusing three-qubit GHZ states (see Fig. 6) with Bell measurements. There are four types of graph states occurring in the generation process, which are three-qubit GHZ states, rake-structure graph states, tree-structure graph states, and rake-tree states. In the first step, rake states are prepared from GHZ states. These states, along with further GHZ states, are the basic ingredients of rake-tree states. Using these ingredients, rake-tree states are generated and enlarged with Bell measurements. When the tree of a rake-tree state is large enough, it can be converted into a building-block state by removing the rake. As an example, the construction process for a building-block state with branching numbers (8,2,2) is shown in Fig. 6.
A rake with $r$ branches can be prepared with $2(r - 1)$ GHZ states (assuming all Bell measurements are successful) in $\mathrm{ceil}[\log_2(r - 1)] + 1$ steps. In the first step, each pair of GHZ states is fused by a CP operation [see Fig. 7 (a)] to obtain a 4-qubit linear cluster state, which is also a rake with 2 branches. Two rakes can be combined into a bigger rake by a PP operation [see Fig. 7 (b)]: if two input rakes respectively have $r_1$ and $r_2$ branches, the output rake has $r_1 + r_2 - 2$ branches. Since in each step the number of branches can be nearly doubled, $r - 1$ 2-branch rakes can be combined into the $r$-branch rake in $\mathrm{ceil}[\log_2(r - 1)]$ steps.
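The two counting formulas for rake preparation can be evaluated directly. The sketch below simply encodes the expressions $2(r-1)$ and $\mathrm{ceil}[\log_2(r-1)] + 1$ quoted above, assuming every Bell measurement succeeds; the function names are illustrative, and the integer `bit_length` trick is used only to avoid floating-point rounding in the ceiling of the logarithm.

```python
def ghz_states_for_rake(r: int) -> int:
    """GHZ states consumed to build an r-branch rake (all fusions succeed)."""
    if r < 2:
        raise ValueError("a rake built this way has at least 2 branches")
    return 2 * (r - 1)


def steps_for_rake(r: int) -> int:
    """Construction steps: ceil(log2(r - 1)) + 1."""
    if r < 2:
        raise ValueError("a rake built this way has at least 2 branches")
    k = r - 1
    ceil_log2 = (k - 1).bit_length()  # exact ceil(log2(k)) for integer k >= 1
    return ceil_log2 + 1


if __name__ == "__main__":
    for r in (2, 4, 9):
        print(r, ghz_states_for_rake(r), steps_for_rake(r))
```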
A rake itself is a rake-tree graph state in which the tree is 1-level but the branching number is 0. A GHZ state itself is also a rake-tree graph state in which the rake has only 1 branch and the tree is 1-level with the branching number 1. From these two kinds of graph states, rake-tree states can be generated and enlarged with two basic processes: increasing the branching number of the tree and increasing the level of the tree (see Fig. 6). The branching number is increased by fusing two rake-tree graph states with a PP operation, where one of the rakes always has only one rake branch. The level of the tree is increased by removing, i.e. measuring in the $\sigma_z$ basis, one branch of the rake. We would like to remark that, when the level number is increased from 1 to 2, the branch of the rake (which is supposed to be measured) can be kept as a branch of the tree.
In our protocol for generating photonic tree-structure graph states, the rake structure allows us to increase the level of the tree with a single-qubit measurement (which is physically performed at the GHZ-state-generation stage, for the same reason that core qubits are measured as early as possible). This process is efficient; the largest rake state can be prepared in $\mathrm{ceil}[\log_2(R - 1)] + 1$ steps, where $R$ is the level of the final tree. As a comparison, the previous protocol reported in Ref. [30] requires probabilistic Bell measurements for increasing the level of the tree. Therefore, the number of construction stages is reduced in our protocol for high-level trees. Minimising the number of construction stages reduces both the noise induced by delay lines and switchyards and the resource cost. If the success probability of a Bell measurement is $p_S$, then for each successful output state, roughly speaking, $1/p_S$ input states need to be prepared. For a GHZ state going through $n$ Bell-measurement stages, $1/p_S^n$ copies will be required to ensure that each stage is successful. Reducing $n$ is therefore critical to reducing the resource costs.
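The exponential dependence on the number of stages is easy to see numerically. This is a minimal sketch of the rough estimate $1/p_S^n$ stated above, using exact fractions; the choice of five stages in the example is purely illustrative.

```python
from fractions import Fraction


def copies_required(p_success: Fraction, n_stages: int) -> Fraction:
    """Rough number of input copies per surviving state: 1 / p_S**n."""
    if not 0 < p_success <= 1:
        raise ValueError("success probability must lie in (0, 1]")
    if n_stages < 0:
        raise ValueError("number of stages must be non-negative")
    return 1 / p_success ** n_stages


if __name__ == "__main__":
    for p in (Fraction(1, 2), Fraction(3, 4)):
        print(p, float(copies_required(p, 5)))
```

For five stages the estimate gives 32 copies at a 50% success probability but only about 4.2 copies at 75%, which illustrates why both boosted Bell measurements and fewer stages pay off.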
As discussed in the main text, we choose the Bell measurement rather than the Type-I fusion gate as the operation for entangling photons, because a Type-I fusion gate may convert photon loss into computational errors. For a Type-I fusion gate without photon loss, two input photonic qubits are projected into the subspace of HH and VV (for the polarisation encoding) when only one photon is detected, or onto the state HV (VH) if zero photons (two photons) are detected. With photon loss, the input qubits may be in the state VH rather than in the subspace of HH and VV when only one photon is detected, because the other photon is missing. Therefore, photon loss may result in computational errors in a Type-I fusion gate.
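This failure mode can be made concrete with a toy bookkeeping model. The sketch below only encodes the statement above, namely how many photons reach the detectors for each two-photon polarisation input, and then asks which inputs are compatible with a single detection when at most one of those photons may be lost. It is not an optical simulation, and the names `PHOTONS_AT_DETECTORS` and `inputs_consistent_with` are illustrative.

```python
# Photons that arrive at the detectors of a Type-I fusion gate for each
# polarisation input, as described in the text: one for HH or VV,
# zero for HV and two for VH.
PHOTONS_AT_DETECTORS = {"HH": 1, "VV": 1, "HV": 0, "VH": 2}

def inputs_consistent_with(clicks: int, max_lost: int) -> set:
    """Inputs that can produce `clicks` detected photons if up to
    `max_lost` of the photons heading to the detectors are lost."""
    return {
        state
        for state, arriving in PHOTONS_AT_DETECTORS.items()
        if 0 <= arriving - clicks <= max_lost
    }

lossless = inputs_consistent_with(clicks=1, max_lost=0)
lossy = inputs_consistent_with(clicks=1, max_lost=1)

print(sorted(lossless))          # inputs heralded by one click, no loss
print(sorted(lossy))             # same herald when one photon may be lost
print(sorted(lossy - lossless))  # input that is misread as a success
```

In this toy model the single-click herald no longer singles out the HH/VV subspace once loss is allowed, because VH with one lost photon produces the same detection pattern.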

Appendix D: Numerical simulations
The threshold of fault-tolerant quantum computing is determined by evaluating $p_L$ and $p_P$, which are respectively the loss rate and the phase-error rate of a cluster-state qubit, for the given loss rate and error rate per component. In Ref. [21], the inferred critical threshold is almost a straight line in the (loss rate, computational error rate) parameter space. For the 3D cluster state, the phase-error rate threshold without loss is 2.93% [15], and the loss rate threshold without error is 24.9%. Therefore, the threshold of $(p_L, p_P)$ is estimated as the straight line through these two points,
$$\frac{p_L}{24.9\%} + \frac{p_P}{2.93\%} = 1.$$

To obtain thresholds of the loss rate per component without computational error, we have considered 2-level trees and 3-level trees [see Fig. 8] with branching numbers not larger than 20 as building-block states, which gives 8400 different tree structures in total. In the case that switchyards are composed of 2-input-2-output switches, we have considered configurations of the switch network for which the number of outputs is the same for all of the switchyards. We simulate output numbers $N = 2^n$ with $n = 0, 1, \ldots, 10$. For switchyards at the outputs of GHZ state factories, the input number is $M = m \times N$ with $m = 32, 36, \ldots, 256$ (the success rate of generating GHZ states is 1/32 [30]). For switchyards for selecting successful Bell measurements, the input number is determined by the input number of GHZ-state switchyards, which is $\lceil M/(32 p_S) \rceil$, where $p_S$ = 50%, 75%, 87.5% is the success rate of Bell measurements without photon loss. Similarly, for switchyards for selecting successfully generated Bell states, the input number is $\lceil M/4 \rceil$, where we have used the circuit for generating Bell states with the success rate 1/8, which can be boosted to 3/16 if a switch is introduced [4]. Therefore, we have in total considered 627 different configurations of the switch network composed of 2-input-2-output switches.
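The counts quoted above follow directly from the stated parameter ranges, which the short enumeration below reproduces. It is a sketch for checking the bookkeeping only, assuming that the branching numbers run from 1 to 20 independently at each level and that m increases in steps of 4; the variable and function names are illustrative and do not come from the simulation code.

```python
import math
from itertools import product

MAX_BRANCHING = 20

# Tree structures: one branching number per level, for 2-level and 3-level trees.
trees = [
    branching
    for levels in (2, 3)
    for branching in product(range(1, MAX_BRANCHING + 1), repeat=levels)
]

# Switch-network configurations: N = 2**n outputs and M = m * N inputs
# for the switchyards at the outputs of the GHZ-state factories.
switch_configs = [
    (2**n, m * 2**n)
    for n in range(0, 11)
    for m in range(32, 257, 4)
]

def bell_measurement_inputs(M: int, p_s: float) -> int:
    """Input number of a switchyard selecting successful Bell measurements,
    ceil(M / (32 * p_S)), as stated in the text."""
    return math.ceil(M / (32 * p_s))

def bell_state_inputs(M: int) -> int:
    """Input number of a switchyard selecting generated Bell states, ceil(M / 4)."""
    return math.ceil(M / 4)

print(len(trees))                        # 8400 tree structures
print(len(switch_configs))               # 627 switch-network configurations
print(len(trees) * len(switch_configs))  # every tree paired with every configuration

N, M = switch_configs[0]
print(N, M, bell_measurement_inputs(M, 0.75), bell_state_inputs(M))
```

The 8400 structures split into 400 two-level and 8000 three-level trees, and the 627 configurations are the 11 values of n combined with the 57 values of m.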
In the case that each switchyard is a fancy switch with arbitrarily large input and output numbers, we have assumed that the ratio 'output number/input number' equals the actual success rate (including the effect of photon loss) of the corresponding operation.
For each curve in Fig. 4 (a) and (c)-(f), thresholds of the loss rate per component are evaluated for 8400 × 627 different protocols. Each protocol includes the building-block structure and the configuration of the switch network. Each curve is obtained as the envelope of these thresholds. For each curve in Fig. 4 (b), thresholds of the loss rate per component are evaluated for 8400 different protocols, which are determined only by the building-block structures.
To obtain thresholds of the loss rate per component with computational errors, we have selected about 500 protocols in each case from those that require no more than $2 \times 10^9$ detectors. These selected protocols are all close to the envelope, i.e. they have the best performance in tolerating photon loss. Specifically, we have drawn a straight line connecting the highest point on the envelope (corresponding to the protocol that tolerates the highest loss rate per component) and the lowest point on the envelope (corresponding to the protocol with the smallest number of detectors). This line is then shifted downwards until there are about 500 protocols whose thresholds of the loss rate per component lie above it. The thresholds of the loss rate per component with computational errors in Fig. 5 are obtained from these selected protocols. Computational errors are evaluated using Monte Carlo methods. In each protocol, for each value of the loss rate and the error rate, the phase-error rate on a cluster-state qubit is obtained with $10^5$ samples.
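The selection step can be sketched as follows. This is an illustration of the line-shifting idea only: the protocol list is synthetic random data, the line is drawn in the plane of log10(number of detectors) versus loss threshold (an assumption about the axes), the detector cap is applied before the line is drawn (also an assumption), and the names `select_near_envelope`, `max_detectors` and `k` are hypothetical.

```python
import math
import random

def select_near_envelope(protocols, k=500, max_detectors=2e9):
    """Pick the k protocols lying highest relative to the straight line that
    joins the cheapest protocol and the most loss-tolerant protocol.

    `protocols` is a list of (num_detectors, loss_threshold) pairs.
    Shifting the line downwards until k points lie above it is the same as
    keeping the k points with the largest vertical offset from the line.
    """
    # Assumption: the detector cap is applied before the line is drawn.
    pool = [p for p in protocols if p[0] <= max_detectors]
    cheapest = min(pool, key=lambda p: p[0])
    best = max(pool, key=lambda p: p[1])

    x0, y0 = math.log10(cheapest[0]), cheapest[1]
    x1, y1 = math.log10(best[0]), best[1]
    slope = (y1 - y0) / (x1 - x0) if x1 != x0 else 0.0

    def offset(p):
        return p[1] - (y0 + slope * (math.log10(p[0]) - x0))

    return sorted(pool, key=offset, reverse=True)[:k]

# Synthetic stand-in data: thresholds that tend to grow with detector count.
rng = random.Random(0)
protocols = []
for _ in range(5000):
    log_n = rng.uniform(6.0, 10.0)
    threshold = 1e-3 * (log_n - 5.0) * rng.random()
    protocols.append((10**log_n, threshold))

selected = select_near_envelope(protocols)
print(len(selected), max(n for n, _ in selected) <= 2e9)
```

Ranking by vertical offset avoids an explicit loop over line positions, since the k-th largest offset is exactly the shift at which k protocols lie on or above the line.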

Appendix E: Bell measurements with entangled ancillary states
In addition to the Bell measurement assisted by a Bell state (see Figs. 4), which has a success probability of 75%, we have also considered the Bell measurement assisted by a 4-qubit GHZ state (see Fig. 9), which has a success probability of 87.5% [31]. The 4-qubit GHZ state is prepared from two 3-qubit GHZ states generated with the circuit in Fig. 3 (b). Two 3-qubit GHZ states can be fused into a 4-qubit GHZ state by a PP operation in which the Bell measurement is assisted by a Bell state, i.e. the success probability is 75%. We find that further boosting the success probability of Bell measurements with more entangled ancillary photons is not helpful.

Fig. 9. Thresholds of the loss rate per component without computational errors using Bell measurements with the success probability 75% (assisted by a Bell state) and the success probability 87.5% (assisted by a 4-qubit GHZ state). All components have the same loss rate.