What limits the simulation of quantum computers?

It is well established that simulating a perfect quantum computer with a classical computer requires computing resources that scale exponentially with the number of qubits $N$ or the depth $D$ of the circuit. Conversely, a perfect quantum computer could potentially provide an exponential speed up with respect to classical hardware. Real quantum computers however are not perfect: they are characterized by a small error rate $\epsilon$ per operation, so that the fidelity of the many-qubit quantum state decays exponentially as $ {\cal{F}} \sim (1-\epsilon)^{ND}$. Here, we discuss a set of classical algorithms based on matrix product states (MPS) which closely mimic the behavior of actual quantum computers. These algorithms require resources that scale linearly in $N$ and $D$ at the cost of making a small error $\epsilon$ per two-qubit gate. We illustrate our algorithms with simulations of random circuits for qubits connected in both one and two dimensional lattices. We find that $\epsilon$ can be decreased at a polynomial cost in computing power down to a minimum error $\epsilon_\infty$. Getting below $\epsilon_\infty$ requires computing resources that increase exponentially with $\epsilon_\infty/\epsilon$. For a two dimensional array of $N=54$ qubits and a circuit with Control-Z gates of depth $D=20$, a fidelity ${\cal F}\ge 0.002$ can be reached on a single core computer in a few hours. It is remarkable that such a high fidelity can be obtained using a variational ansatz that only spans a tiny fraction $(\sim 10^{-8})$ of the full Hilbert space. Our results show how the actual computing power harvested by noisy quantum computers is controlled by the error rate $\epsilon$.


I. INTRODUCTION
Operating a quantum computer is a race against the clock. The same phenomenon that enables the potential computing power of quantum computers, entanglement, is also responsible for decoherence when it occurs with unmonitored degrees of freedom. The main challenge of quantum computing is to quickly build entanglement between the qubits before imperfections or decoherence overly corrupt the quantum state.
As different experimental platforms for quantum manipulation make rapid and impressive advances, there has been a justifiable interest in the computational capability of near-term quantum computers [1]. One of the key questions is when and how to achieve the goal of "quantum supremacy" [2], which is the crossover point where a quantum system ceases to be within reach of simulation on a classical computer. Precise circuits and fidelity metrics have been designed to meet this goal [3]. Recently, an experiment using 53 qubits and a circuit of depth D = 20 has reached a multi-qubit fidelity F = 0.002 [4]. According to the authors, such an experiment would take thousands of years to be simulated on the largest existing supercomputer. This statement was then challenged by another estimate which claims that only two days would be needed [5]. Such a disparity between estimates raises the question of the difficulty of simulating a quantum computer and consequently of the true computing power realized in a quantum computer.
The implicit assumption behind quantum supremacy as well as the most appealing applications of quantum computing is that a quantum computer is exponentially hard to simulate. Indeed, in recent years many techniques have been developed to simulate quantum computers, and they all have an exponential cost in some parameter. A brute force approach where one holds the full quantum state in memory as a large vector of size $2^N$ ($N$: number of qubits) requires a computing time and memory that scale exponentially with $N$ but linearly with the depth $D$ of the circuit. Other approaches require a computing time that scales exponentially with the number of two-qubit gates [6][7][8][9], with the number of non-Clifford gates [10] and/or with the number of gates that are non-diagonal in a chosen basis [11,12]. In all cases, the required computing resources are exponential, so that getting beyond $N = 50$ and a depth $D = 20$ for an arbitrary circuit is extremely difficult.
In this article, we discuss a class of algorithms where the limiting factor is the fidelity with which the calculation is performed while the computing time is linear in both the number of qubits N and the depth D. These algorithms "mimic" actual quantum computers in the sense that the difficulty lies in increasing the fidelity of the calculation: a small finite error is made each time a twoqubit gate is applied to the state. Therefore, they offer a better reference point to assess the computing power harvested by actual quantum chips.
Our algorithms are based on tensor networks and more precisely on matrix product states (MPS) [13]. MPS have been recognized very early as an interesting parameterization of many-qubit quantum states for quantum simulations [6] and its generalizations are used in some of the most advanced simulation tools [14]. So far, the focus of classical simulations of quantum hardware has been to build essentially exact simulations techniques and little attention has been devoted to approximate techniques. Interestingly these exact techniques can require one to go well beyond double precision calculations [15]. However, the historical success of MPS has not been for exact calculations but for the development of controlled approximate techniques to address quantum many-body physics problems. This includes the celebrated density matrix renormalization group (DMRG) algorithm [16] which has provided precise solutions to a number of one dimensional and quasi-one-dimensional problems, as well as time-dependent extensions [17] and generalizations to higher dimensions through the projected entangled pair states (PEPS) [18] or the multi-scale entanglement renormalization ansatz (MERA) [19] tensor network formats. At the root of these successes is the fact that MPS naturally organizes states according to the amount of entanglement entropy between different parts of the system. Hence, slightly entangled systems can be easily represented with MPS. As entanglement entropy grows, one eventually truncates the basis. The associated error can be made arbitrarily small by keeping a larger basis. In this article, we construct such an approximate technique in the context of quantum computing. Our chief result is that fidelities comparable to those reached experimentally require parameterizations spanning only a tiny fraction of the total Hilbert space.

II. POSSIBLE STRATEGIES FOR APPROXIMATE SIMULATIONS OF QUANTUM CIRCUITS
Let us start by discussing possible strategies for simulating quantum circuits in an approximate manner. Suppose that we have split the qubits into two different sets A and B with respectively $N_A$ and $N_B$ qubits ($N_A + N_B = N$). Let us consider the two-qubit gates that connect A and B and ignore gates internal to A or B. Performing a singular value decomposition (SVD) of such a gate, it can be written as a sum of terms that act separately on A and B. This sum contains two terms for the usual gates (Control-NOT and Control-Z) and at most four terms for an arbitrary two-qubit gate. It follows that computing the state after $n$ of these gates amounts to keeping track of $2^n$ (up to $4^n$) different amplitudes. These amplitudes are the discrete analogue of Feynman paths and are referred to as such in the literature. For the random circuits that will be considered in this article, these $2^n$ amplitudes have essentially random phases. It follows that if one keeps track of just a single path, one reaches an overall multi-qubit fidelity ${\cal F} = (1/2)^n$ (or ${\cal F} = (1/4)^n$ in the worst situation). This very simple strategy could be used to simulate an arbitrarily large number of qubits with low fidelity per gate in a computing time $\sim n$. However, if one wants to keep a fixed fidelity per gate $f$, defined through ${\cal F} = f^n$ in analogy with real quantum computers, the number of paths $N_{\rm path}$ that must be tracked during the simulation is $N_{\rm path} = (2f)^n$, which increases exponentially with $n$. Such a strategy has been used in [4] to validate the experimental results.
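The counting of terms above can be checked numerically through the operator-Schmidt decomposition of a gate across the A|B cut. The short numpy sketch below (the helper name `schmidt_rank` is ours, not from the text) confirms that Control-Z splits into two terms while SWAP requires four:

```python
import numpy as np

# Operator-Schmidt decomposition of a two-qubit gate: reshape the 4x4
# matrix U_{(a b),(a' b')} into M_{(a a'),(b b')} and take its SVD.
# The number of non-zero singular values is the number of "Feynman path"
# branches created when the gate straddles the A|B cut.
def schmidt_rank(U, tol=1e-12):
    M = U.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(4, 4)
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > tol))

CZ = np.diag([1.0, 1.0, 1.0, -1.0])          # Control-Z: two terms
SWAP = np.eye(4)[[0, 2, 1, 3]]               # SWAP: four terms (worst case)
print(schmidt_rank(CZ), schmidt_rank(SWAP))  # -> 2 4
```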
We now seek algorithms where a constant fidelity $f$ can be obtained at a constant computing cost per gate, independently of the total number of gates $n$. One starts by writing a general state for the bipartite system as
$$|\Psi\rangle = \sum_{a,b} \Psi_{ab}\, |a\rangle |b\rangle,$$
where the states $|a\rangle$ ($|b\rangle$) form an orthonormal basis of A (B). Performing a singular value decomposition (SVD) of the matrix $\Psi_{ab}$, we can define an orthonormal basis $|\mu_A\rangle$ (with similar notation for the B subsystem) and arrive at the usual Schmidt decomposition of $|\Psi\rangle$,
$$|\Psi\rangle = \sum_{\mu} S_\mu\, |\mu_A\rangle |\mu_B\rangle,$$
in terms of a finite number of singular values $S_\mu$. States with only one non-zero singular value $S_0 = 1$ are simple product states. A measure of the number of singular values needed to describe the state is given by the entanglement entropy $S = -{\rm Tr}\,\rho_A \log \rho_A$, where $\rho_A$ ($\rho_B$) is the reduced density matrix for the subsystem A (B). The general strategy of DMRG-like algorithms is to keep only a finite number $\chi$ of singular values. After a two-qubit gate that connects A and B, one performs an SVD of $\Psi_{ab}$ and truncates the state by keeping only the $\chi$ largest singular values. When $\chi \gg e^S$ this procedure is essentially exact. As the entanglement increases, this procedure leads to a certain fidelity per gate $f < 1$ that can be controlled by increasing the parameter $\chi$. Of interest to the present article is the typical value of $f$ that can be reached in a reasonable computing time.
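The truncation just described can be illustrated with a minimal numpy sketch (sizes, seed, and variable names are ours and purely illustrative): it builds a random bipartite state, computes its Schmidt spectrum, and evaluates the fidelity $\sum_{\mu \leq \chi} S_\mu^2$ retained after keeping $\chi$ singular values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random state on N_A + N_B qubits written as a matrix Psi_ab
# (a toy stand-in for a strongly entangled state).
N_A, N_B = 6, 6
psi = rng.normal(size=(2**N_A, 2**N_B)) + 1j * rng.normal(size=(2**N_A, 2**N_B))
psi /= np.linalg.norm(psi)

# Schmidt decomposition = SVD of Psi_ab.
S = np.linalg.svd(psi, compute_uv=False)
S2 = S**2                                # Schmidt weights, sum to 1
entropy = -np.sum(S2 * np.log(S2))       # entanglement entropy

# Fidelity of the state truncated to the chi largest singular values:
# f = sum of the kept squared singular values.
fids = {chi: S2[:chi].sum() for chi in (4, 16, 64)}
print(entropy, fids)                     # fidelity grows with chi, reaching 1
```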

III. THE MPS ALGORITHM

A. MPS representation of the state
We first consider a one-dimensional network of $N$ qubits where two-qubit gates can only be applied directly between nearest neighbors. Within this connectivity, gates acting on other, non-neighboring qubits are still possible at the cost of using $\sim N$ SWAP operations to bring the qubits onto neighboring sites. We define our MPS state in terms of $N$ tensors $M^{(n)}$ as
$$|\Psi\rangle = \sum_{x} \sum_{\mu_1 \ldots \mu_{N-1}} M^{(1)}_{i_1 \mu_1} M^{(2)}_{\mu_1 i_2 \mu_2} \cdots M^{(N)}_{\mu_{N-1} i_N}\, |x\rangle, \qquad (6)$$
where the "physical" indices $i_n \in \{0,1\}$ span the $2^N$-dimensional Hilbert space while the bond (or virtual) indices $\mu_n \in \{1, \ldots, \chi_n\}$ control the maximum degree of entanglement allowed by the MPS. $|x\rangle$ is a shorthand for $|i_1 i_2 \ldots i_N\rangle$. A sketch of the MPS structure is shown in Fig. 1b. We enforce $\chi_n \leq \chi$ so that the parameter $\chi$ controls the error rate made by our algorithm as well as the computational cost of running it. Its memory footprint is $N\chi^2$ while applying a two-qubit gate takes $\sim \chi^3$ operations. To be acceptable, our algorithm must provide the same features that a real quantum computer would provide. Applying a one-qubit gate $U$ on qubit $n$ can be done exactly and without increasing any of the $\chi_n$: it simply amounts to updating the corresponding tensor,
$$M^{(n)}_{\mu_{n-1} i_n \mu_n} \rightarrow \sum_{i'_n} U_{i_n i'_n}\, M^{(n)}_{\mu_{n-1} i'_n \mu_n}, \qquad (7)$$
as shown in Fig. 2(a). Calculating the overlap between different MPS states or calculating individual wavefunction amplitudes $\langle i_1 i_2 \ldots i_{N-1} i_N |\Psi\rangle$ can be done with contraction algorithms which, for MPS, are exact and take $\sim N\chi^3$ operations (see e.g. [13] for a detailed description of standard MPS algorithms). It follows that one can also sample from the distribution $|\langle i_1 i_2 \ldots i_{N-1} i_N |\Psi\rangle|^2$ with the same complexity. Quantum measurements (sampling of a given qubit followed by its projection) can also be done efficiently in a straightforward manner [20].
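As an illustration of these contractions, the following numpy sketch (tensor shapes, seed, and helper names are ours) evaluates a single amplitude of a random MPS by a left-to-right contraction, and the norm by contracting transfer matrices:

```python
import numpy as np

rng = np.random.default_rng(1)

# Random MPS with N tensors M[n] of shape (chi_left, 2, chi_right).
N, chi = 10, 8
dims = [1] + [chi] * (N - 1) + [1]
mps = [rng.normal(size=(dims[n], 2, dims[n + 1])) for n in range(N)]

# Amplitude <i_1 ... i_N | Psi>: contract left to right.
def amplitude(mps, bits):
    v = np.ones((1,))
    for M, i in zip(mps, bits):
        v = v @ M[:, i, :]
    return v[0]

# Norm <Psi|Psi> via transfer matrices, O(N chi^3) operations.
def norm2(mps):
    E = np.ones((1, 1))
    for M in mps:
        # transfer step: E -> sum_i (M^i)^T E M^i
        E = sum(M[:, i, :].T @ E @ M[:, i, :] for i in range(2))
    return E[0, 0]

print(amplitude(mps, [0] * N), norm2(mps))
```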
To perform a two-qubit gate $U$ between qubit $n$ and qubit $n+1$, one first transforms the MPS into the so-called "canonical form" centered around the qubits of interest, through a series of QR factorizations [13]. This step is crucial for the accuracy of truncations of the MPS. The steps to apply the gate are then shown in Fig. 2(b). One first forms the two-qubit tensor
$$T_{\mu_{n-1} i_n i_{n+1} \mu_{n+1}} = \sum_{\mu_n} M^{(n)}_{\mu_{n-1} i_n \mu_n} M^{(n+1)}_{\mu_n i_{n+1} \mu_{n+1}}. \qquad (10)$$
Then one applies the two-qubit gate $U$ and obtains
$$T'_{\mu_{n-1} i_n i_{n+1} \mu_{n+1}} = \sum_{i'_n, i'_{n+1}} U_{i_n i_{n+1}, i'_n i'_{n+1}}\, T_{\mu_{n-1} i'_n i'_{n+1} \mu_{n+1}}. \qquad (11)$$
In a last stage, considering the tensor $T'$ as a matrix with indices spanned by $(i_n, \mu_{n-1})$ and $(i_{n+1}, \mu_{n+1})$, one performs a singular value decomposition and writes
$$T'_{\mu_{n-1} i_n i_{n+1} \mu_{n+1}} = \sum_{\mu_n} X_{\mu_{n-1} i_n \mu_n}\, S_{\mu_n}\, Y_{\mu_n i_{n+1} \mu_{n+1}}, \qquad (12)$$
where the tensors $X$ and $Y$ are formed of orthogonal vectors while the vector $S_\mu$ contains the singular values of $T'$. Here $S_\mu$ has up to $2\chi$ components (irrespective of the nature of the two-qubit gate) so that exact algorithms imply a doubling of $\chi$ after each application of a two-qubit gate. In the spirit of DMRG-like algorithms, we truncate $S_\mu$ and keep only its $\chi$ largest components to obtain $S'_\mu$. The new MPS tensors are then simply given by $M^{(n)} = X$ and $M^{(n+1)}_{\mu_n i_{n+1} \mu_{n+1}} = S'_{\mu_n} Y_{\mu_n i_{n+1} \mu_{n+1}}$ (restricted to $\mu_n \leq \chi$), which completes the algorithm. Overall, the cost of applying a two-qubit gate is dominated by the SVD step which scales as $\chi^3$. We emphasize that such an algorithm can do anything that a quantum computer does, but the reverse statement is not true: in the MPS approach, one holds the full wavefunction in memory, which provides much more information than can be obtained from samples of the wavefunction. For instance, one can compute the bipartite entanglement entropy of an MPS, and it is straightforward to calculate quantities such as observables or correlation functions without any statistical errors. The MPS format also satisfies the sample and query access criteria needed for quantum-inspired de-quantizing algorithms [21].

FIG. 2. (a) Applying a single-qubit gate to an MPS can be done without approximation by multiplying the gate by a single MPS tensor. (b) To apply a two-qubit gate to qubits $n$ and $n+1$, one contracts the corresponding tensors together, then applies the gate. To restore the MPS form, the resulting tensor is decomposed with an SVD truncated to keep the largest $\chi$ singular values, and the matrix of singular values is multiplied into one of the unitary factors $X$ or $Y$.
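The gate update just described (contraction, gate application, truncated SVD) can be sketched in a few lines of numpy. The function name and tensor shapes below are ours; the fidelity estimate assumes the MPS is in canonical form around the two sites:

```python
import numpy as np

def apply_two_qubit_gate(A, B, U, chi_max):
    """Apply a 4x4 gate U to neighboring MPS tensors A (chi_l,2,chi_m)
    and B (chi_m,2,chi_r), truncating the bond to chi_max.
    Returns new tensors and f = fraction of kept squared singular values."""
    chi_l, _, _ = A.shape
    _, _, chi_r = B.shape
    theta = np.tensordot(A, B, axes=(2, 0))                  # (chi_l,2,2,chi_r)
    theta = np.tensordot(U.reshape(2, 2, 2, 2), theta,
                         axes=([2, 3], [1, 2]))              # (2,2,chi_l,chi_r)
    theta = theta.transpose(2, 0, 1, 3)                      # (chi_l,2,2,chi_r)
    X, S, Y = np.linalg.svd(theta.reshape(chi_l * 2, 2 * chi_r),
                            full_matrices=False)
    chi = min(chi_max, len(S))
    f = np.sum(S[:chi]**2) / np.sum(S**2)   # effective gate fidelity (canonical form)
    S_kept = S[:chi] / np.linalg.norm(S[:chi])               # renormalize the state
    A_new = X[:, :chi].reshape(chi_l, 2, chi)
    B_new = (S_kept[:, None] * Y[:chi, :]).reshape(chi, 2, chi_r)
    return A_new, B_new, f

# demo: CZ on a small random pair of tensors; no truncation occurs here
rng = np.random.default_rng(2)
A = rng.normal(size=(1, 2, 2))
B = rng.normal(size=(2, 2, 1))
CZ = np.diag([1.0, 1.0, 1.0, -1.0])
A2, B2, f = apply_two_qubit_gate(A, B, CZ, chi_max=4)
print(f)  # -> 1.0 since nothing is truncated
```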

B. Random Quantum Circuit
Fig. 1a shows the quantum circuit used in our numerical experiments. It consists of alternating layers of one-qubit and two-qubit gates. This circuit has been designed following the proposal of [3] in order to create strongly entangled states in as few operations as possible. It is believed to be one of the most difficult circuits to simulate on a classical computer since its many-qubit quantum state is extremely sensitive to a modification of any of the gates. The one-qubit gates $U_n$, represented as colored squares in Fig. 1a, are chosen randomly so as to remove any structure or symmetry from the many-qubit state. A gate $U_n$ is a rotation $U_n = \exp(-i\theta_n\, \vec\sigma \cdot \vec m_n)$ of angle $\theta_n$ around a unit vector $\vec m_n = (\sin\alpha_n \cos\phi_n, \sin\alpha_n \sin\phi_n, \cos\alpha_n)$ ($\vec\sigma$ is the vector of Pauli matrices). We take the angles $\theta_n$, $\alpha_n$, and $\phi_n$ to be uniformly distributed (note that the resulting matrix $U_n$ is not distributed according to the Haar measure on $U(2)$). While the $U_n$ are random, the actual sequence used is carefully recorded for comparison with e.g. exact calculations. We call the number of two-qubit gate layers applied the depth $D$ of the circuit, focusing on the number of two-qubit gate layers because those are the only source of imperfection in our calculations. In real quantum computers, two-qubit gates also dominate the errors over one-qubit gates in terms of fidelity. However, real quantum computers also have other sources of error (decoherence, unknown couplings between qubits, leakage to non-computational states...) not present in the algorithm. After a depth $D \sim N$, the state obtained with the circuit of Fig. 1a is totally scrambled and well described by a Porter-Thomas distribution. This is illustrated in Fig. 3 where the cumulative distribution of $p_x = |\langle x|\Psi\rangle|^2$ is compared to the Porter-Thomas form for various maximum MPS bond dimensions (main panel) and for various depths using exact calculations (inset).
One indeed observes that the distribution quickly approaches the chaotic Porter-Thomas distribution as one increases the bond dimension $\chi$.
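For concreteness, a one-qubit gate of the form $U_n = \exp(-i\theta_n\, \vec\sigma\cdot\vec m_n)$ can be generated as follows (a sketch with illustrative angle ranges and helper name; it uses $(\vec\sigma\cdot\vec m)^2 = 1$ to avoid an explicit matrix exponential):

```python
import numpy as np

rng = np.random.default_rng(3)

# One-qubit gate U = exp(-i * theta * sigma . m) for a random axis m.
# Since (sigma.m)^2 = 1, exp(-i theta sigma.m) = cos(theta) I - i sin(theta) sigma.m.
def random_rotation(rng):
    theta, alpha, phi = rng.uniform(0, 2 * np.pi, size=3)  # ranges illustrative
    m = np.array([np.sin(alpha) * np.cos(phi),
                  np.sin(alpha) * np.sin(phi),
                  np.cos(alpha)])
    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]])
    sz = np.array([[1, 0], [0, -1]], dtype=complex)
    H = m[0] * sx + m[1] * sy + m[2] * sz                  # sigma . m, Hermitian
    return np.cos(theta) * np.eye(2) - 1j * np.sin(theta) * H

U = random_rotation(rng)
print(np.allclose(U @ U.conj().T, np.eye(2)))  # unitary -> True
```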

C. Effective two-qubit gate fidelity
Let us introduce the main quantity of interest for this study, the effective two-qubit fidelity $f_n$. It is the computational analogue of the fidelity reported experimentally for two-qubit gates: $f_n = 1$ for a perfect calculation, while the truncation of the MPS induces $0 < f_n < 1$.
Let us call $|\Psi_T(n)\rangle$ the MPS state after a sequence of $n$ individual two-qubit gates ($n \approx (N-1)D/2$ for the circuit of Fig. 1a). Up to irrelevant one-qubit gates, $|\Psi_T(n)\rangle$ is obtained by applying one Control-Z gate $C_Z$ onto $|\Psi_T(n-1)\rangle$ followed by the truncation operation, which introduces a finite error. We define the effective fidelity $f_n$ as
$$f_n = |\langle \Psi_T(n)|\, C_Z\, |\Psi_T(n-1)\rangle|^2, \qquad (13)$$
and the corresponding error rate $\epsilon_n$ as
$$\epsilon_n = 1 - f_n. \qquad (14)$$
$f_n$ can be calculated using the contraction algorithm in $N\chi^3$ operations. However, when the MPS is in canonical form, $f_n$ is obtained without any additional calculations as the sum of the kept squared singular values,
$$f_n = \sum_{\mu=1}^{\chi} S_\mu^2. \qquad (15)$$
We have explicitly checked the equivalence between the two calculations. A typical simulation is shown in Fig. 4 for the circuit with the Control-Z gate. At small depth $D < 2\log_2 \chi$, the simulation is exact and $f_n = 1$. Above this threshold, one starts to truncate the MPS after each two-qubit gate. We observe a transient regime where $f_n$ decreases, after which $f_n$ quickly saturates at a constant value, here around 0.988. The first thing to notice in Fig. 4 is that these simulations are many orders of magnitude easier than an equivalent perfect calculation: simulating the exact state for $N = 60$ and $D = 200$ would be out of reach even with thousands of years of computing time on the largest existing supercomputer. Yet here, these simulations of a noisy quantum computer have been performed on a laptop. The averaged fidelity for a modest $\chi = 64$ is better than 99%, which already corresponds to qubits of very good quality. This is rather remarkable since the MPS ansatz spans only a very tiny fraction, $\sim 10^{-13}$ percent, of the whole Hilbert space. After the transient regime, $f_n$ is, up to some fluctuations, independent of both $D$ and $N$. The second statement is true up to small $1/N$ corrections. These corrections arise from the fact that the fidelity associated with gates applied on the edge of the system (i.e. associated to matrices $M^{(i)}$ with $i < 2\log_2\chi$ or $N - i < 2\log_2\chi$) is always equal to unity, since the entanglement entropy associated to the subsystem of qubits $i < a$ is bounded by $S \leq a \log 2$.
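The equivalence between the overlap definition of $f_n$ and the sum of kept squared singular values can be verified on a minimal two-qubit example (our own construction; for two qubits the canonical-form condition is automatic since the boundary bond dimensions are 1):

```python
import numpy as np

rng = np.random.default_rng(4)

# Two-qubit check: in canonical form, the effective gate fidelity
# equals the sum of the kept squared singular values.
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)
CZ = np.diag([1.0, 1.0, 1.0, -1.0])

exact = CZ @ psi                                 # state after the gate
X, S, Y = np.linalg.svd(exact.reshape(2, 2))     # Schmidt decomposition
trunc = S[0] * np.outer(X[:, 0], Y[0, :]).reshape(4)   # keep chi = 1
trunc /= np.linalg.norm(trunc)

f_overlap = abs(np.vdot(trunc, exact))**2        # |<Psi_T(n)|C_Z|Psi_T(n-1)>|^2
f_schmidt = S[0]**2 / np.sum(S**2)               # kept squared singular values
print(np.isclose(f_overlap, f_schmidt))          # -> True
```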
Our main goal is to understand how the residual error $\epsilon_n = 1 - f_n$ decreases as one increases the bond dimension $\chi$. As $\chi$ approaches $\chi \sim 2^{N/2}$, one must have $\epsilon_n \to 0$. However, here we are interested in the regime $\chi \ll 2^{N/2}$ which remains accessible to simulations. Fig. 5 shows how the residual error decreases with increasing bond dimension. The main finding of Fig. 5 is that the residual error per gate at large depth $D$ and number of qubits $N$ eventually saturates at a finite value, in this case around $\epsilon_\infty \approx 6 \times 10^{-3}$. In other words, this algorithm can simulate any 1D quantum computer that has a two-qubit gate fidelity smaller than $f_\infty = 99.4\%$ at a linear cost in both $N$ and $D$. As the depth or number of qubits is reduced, the average fidelity increases. The black cross in Fig. 5 corresponds to a calculation where only the last part of the circuit has been taken into account in the calculation of the average fidelity, i.e. the average is performed for $D > 50$ where the system has already entered its stationary regime. As we shall see, decreasing the error rate below $\epsilon_\infty$ requires an exponential effort.

IV. LINKS BETWEEN TWO-QUBIT AND MULTI-QUBIT FIDELITY
Before investigating the origin of $\epsilon_\infty$, we make a short detour to discuss how the effective two-qubit fidelity $f_n$ is related to the actual $N$-qubit fidelity ${\cal F}$ of the state, and how it relates to practical estimates of the fidelity that can be measured experimentally.

A. Multi-qubit fidelity
Let us call $|\Psi_P(n)\rangle$ the exact, perfect state after $n$ two-qubit gates (meaning it is never truncated or otherwise approximated at any stage of its evolution by the circuit) while $|\Psi_T(n)\rangle$ is the truncated MPS state ($P$ stands for Perfect and $T$ for Truncated). The $N$-qubit fidelity ${\cal F}$ is defined as
$${\cal F}(n) = |\langle \Psi_P(n)|\Psi_T(n)\rangle|^2.$$
The fidelity ${\cal F}$ is a direct measure of how reliable our truncated state is. As the errors accumulate, it is natural to expect that the fidelities $f_n$ are multiplicative,
$${\cal F}(n) \approx \prod_{m=1}^{n} f_m. \qquad (17)$$
Eq. (17) is indeed a very accurate approximation. An analytical argument will be given below. The validity of Eq. (17) can also be shown by numerical simulations. Fig. 6 shows the fidelity versus $D$ for $N = 20$ qubits obtained in two independent ways. The symbols correspond to a direct calculation of ${\cal F}$ while the lines correspond to the right-hand side of Eq. (17). We find an almost perfect match in all the regimes that we have studied. Eq. (17) is a very useful result: it relates a property of the perfect state (left-hand side) to a property solely defined in terms of the MPS (right-hand side). It allows us to easily estimate the fidelity in regimes where we no longer have access to the exact state. When $f_n$ has reached its stationary value $f_\infty$, Eq. (17) simplifies into ${\cal F}(n) \approx f_\infty^n$. In an actual experiment, one cannot measure the $f_n$; rather, one has access to an estimate of ${\cal F}(n)$ (see the subsection below). To compare the accuracy of the simulations with the capabilities of actual quantum chips, we therefore define the average two-qubit fidelity $f_{\rm av}$ after $n$ two-qubit gates as $f_{\rm av} = {\cal F}(n)^{1/n} = {\cal F}(D)^{2/[(N-1)D]}$, where the second equality is specific to the quantum circuit studied here.

Derivation of Eq. (17). Let us define a full basis of orthogonal states $|\alpha\rangle$ such that state $|1\rangle \equiv |\Psi_T(n-1)\rangle$ is our truncated state; we complement state $|1\rangle$ with an arbitrary basis. Writing $|\Psi_P(n-1)\rangle$ in that basis as $|\Psi_P(n-1)\rangle = \sum_{\alpha=1}^{2^N} p_\alpha |\alpha\rangle$, we have $|p_1|^2 = {\cal F}(n-1)$. Similarly, we introduce a second basis $|\alpha'\rangle$ with $|1'\rangle \equiv |\Psi_T(n)\rangle$ and write $C_Z |\Psi_T(n-1)\rangle = \sum_\alpha t_\alpha |\alpha'\rangle$, with $|t_1|^2 = f_n$. From these definitions, the fact that $C_Z$ is unitary and that $|\Psi_P(n)\rangle = C_Z |\Psi_P(n-1)\rangle$, we have
$$\sqrt{{\cal F}(n)} = \Big| p_1 t_1 + \sum_{\alpha \geq 2} p_\alpha \langle 1'|\, C_Z\, |\alpha\rangle \Big|.$$
As the fidelity goes down, the $p_\alpha$ and the $t_\alpha$ become increasingly decorrelated, in particular in sign. Assuming random signs between the $p_\alpha$ and the $t_\alpha$ and using that $p_\alpha \sim 1/\sqrt{2^N}$, we find that the second term in the above equation is at most of order $1/\sqrt{2^N}$ and is therefore negligible. Eq. (17) follows directly.
We end this subsection by proving a weaker but exact statement, valid for any circuit. The Cauchy-Schwarz inequality applied to the cross term gives
$$\Big| \sum_{\alpha \geq 2} p_\alpha \langle 1'|\, C_Z\, |\alpha\rangle \Big|^2 \leq [1 - {\cal F}(n-1)]\,(1 - f_n),$$
from which we obtain
$$\sqrt{{\cal F}(n)} \geq \sqrt{f_n\, {\cal F}(n-1)} - \sqrt{(1-f_n)\,[1 - {\cal F}(n-1)]}. \qquad (22)$$
The Eq. (22) bound is exact, but saturating this bound in practice implies that all the terms $p_\alpha t_\alpha$ interfere constructively, which is not realized in actual circuits. Writing $\sqrt{{\cal F}(n)} = \cos\theta_n$ and $\sqrt{\epsilon_n} = \sin\varphi_n$, Eq. (22) implies that $\theta_n \leq \theta_{n-1} + \varphi_n$, from which one can prove that
$${\cal F}(n) \geq \cos^2\Big(\sum_{m=1}^{n} \arcsin\sqrt{\epsilon_m}\Big). \qquad (24)$$
The exact statement Eq. (24) can be useful for small-depth circuits where the actual decrease of the fidelity ${\cal F}(n)$ is indeed linear with $n$, before one enters the true exponential regime.

B. Other fidelity metrics
So far we have used the overlap ${\cal F}$ between the exact state $|\Psi_P\rangle$ and our approximate state $|\Psi_T\rangle$ as our metric for the fidelity of the calculation. It is a natural metric as it measures the probability for the approximate state to be found in the exact one. It is bounded, $0 \leq {\cal F} \leq 1$, and is nicely related to the fidelities per gate $f_n$ through the formulas of the preceding subsection.
However, ${\cal F}$ cannot be directly measured experimentally, so other fidelity metrics must be designed. Indeed, in an actual quantum computer, the only available output is samples of bitstrings $x = i_1 i_2 \ldots i_N$ distributed according to $|\langle x|\Psi_T\rangle|^2$. A natural metric is the cross entropy, defined as
$$S_{CE} = -\sum_x |\langle x|\Psi_T\rangle|^2\, \log |\langle x|\Psi_P\rangle|^2.$$
Cross entropy is a standard tool of machine learning and has several interesting properties. First, it is measurable through sampling as
$$S_{CE} \approx -\frac{1}{M} \sum_{m=1}^{M} \log |\langle x_m|\Psi_P\rangle|^2,$$
where the $x_m$ are the output of the quantum computer when the experiment is repeated $M$ times. Second, the cross entropy between the two distributions $|\langle x|\Psi_T\rangle|^2$ and $|\langle x|\Psi_P\rangle|^2$ is extremal when the two distributions are identical. Hence it is a genuine measure of the likelihood of the two distributions. Cross entropy was proposed in [3] as a fidelity metric. Note however that the cross entropy is not a symmetric function of the two distributions. In particular, it is strongly affected by particular configurations $x$ where $|\langle x|\Psi_P\rangle|^2$ is very low but $|\langle x|\Psi_T\rangle|^2$ is not. Cross entropy was eventually abandoned by the Google team and replaced [4] by the cross-entropy benchmarking (XEB) metric, defined as
$$B = 2^N \sum_x |\langle x|\Psi_T\rangle|^2\, |\langle x|\Psi_P\rangle|^2 - 1.$$
The XEB is also sampleable, $B \approx (2^N/M) \sum_m |\langle x_m|\Psi_P\rangle|^2 - 1$, and is symmetric with respect to the two distributions. When the approximate state gives the uniform distribution, the XEB metric vanishes, $B = 0$, indicating a total lack of fidelity. However, when the approximate state is actually exact, the value of the XEB metric can be arbitrary. When the approximate state is exact and distributed according to the Porter-Thomas distribution (which happens in our circuits after a few cycles), then the XEB metric takes a well-defined value, $B = 1$. The XEB metric is not in general a good measure of the likelihood between two distributions: for a given perfect state, it is maximum when the approximate state is sharply peaked around the values of $x$ where the perfect state is maximum.
In our circuit the initial value of the XEB is exponentially large, $B = 2^N - 1$, and quickly decreases as the distribution approaches the Porter-Thomas one. Calling $D^*$ the depth after which the XEB has reached unity (ideally $D^*$ would be the depth after which $|\langle x|\Psi_P\rangle|^2$ corresponds to Porter-Thomas), we find empirically that
$${\cal F}(D) \approx B(D + D^*). \qquad (28)$$
Equation (28) could be used to estimate the actual fidelity ${\cal F}$ from XEB measurements. Figure 7 shows an example of calculations contrasting the fidelity ${\cal F}$ with the XEB metric. Here we have used no truncation but added some noise to the two-qubit gates so as to induce a finite fidelity per gate $f$. We find that both ${\cal F}$ and the XEB decay exponentially with consistent decay rates. However, the large difference of the initial values at $D = 0$ leads to a shift: the fidelity is significantly lower than the XEB curve. This shift increases as the fidelity is lowered and corresponds typically to one order of magnitude for a typical experimental value $f = 99\%$.
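As a sanity check of the XEB definition, the following sketch (our own construction; the state, $N$, and $M$ are illustrative) samples bitstrings from a Porter-Thomas-like random state and estimates $B$ both for a perfect sampler (expected close to 1) and for a uniform sampler (expected close to 0):

```python
import numpy as np

rng = np.random.default_rng(5)

# Sampled estimate of the XEB metric B = 2^N <p_P(x)>_{x ~ p_T} - 1.
N, M = 12, 200_000
dim = 2**N
amps = rng.normal(size=dim) + 1j * rng.normal(size=dim)  # chaotic random state
p = np.abs(amps)**2
p /= p.sum()                                             # p_P(x), Porter-Thomas-like

# "experiment" sampling from the exact distribution itself (p_T = p_P)
samples = rng.choice(dim, size=M, p=p)
B = dim * p[samples].mean() - 1.0                        # expected close to 1

# zero-fidelity reference: uniform sampler
samples_u = rng.integers(dim, size=M)
B_uniform = dim * p[samples_u].mean() - 1.0              # expected close to 0

print(B, B_uniform)
```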

V. RANDOM TENSOR THEORY OF $\epsilon_\infty$
We now turn back to the discussion of the asymptotic value $f_\infty$ reached by the two-qubit gate fidelity in our calculations. The first remark of importance is that $f_\infty$ is a property associated with a single tensor of the full MPS state: if we apply a gate between qubit $i$ and qubit $i+1$, only the associated tensor $T$ defined in Eq. (10) comes into play. Since the whole goal of our quantum circuit is to scramble the wavefunction as efficiently as possible, a natural hypothesis is that the tensors $M^{(i)}$ and $M^{(i+1)}$ eventually become well described by totally random tensors. In this section we explore this possibility and calculate the properties of the associated tensor $T$ as well as the corresponding two-qubit gate fidelity $f_{\rm GTE}$. We find that the distribution of singular values of $T$ obtained from the random ensemble closely matches what we observe in the MPS state.
In the spirit of random matrix theory [22,23], we introduce the Gaussian tensor ensemble (GTE) in which a tensor $M^{i}_{\mu\nu}$ is totally random: the tensors $M$ are distributed according to
$$P(M) \propto \exp\Big(-\frac{1}{2}\sum_{i,\mu,\nu} |M^{i}_{\mu\nu}|^2\Big),$$
where the sum over $\nu$ spans $1 \ldots \chi$, the sum over $i$ spans $\{0,1\}$ and the sum over $\mu$ spans $1 \ldots \beta\chi$. The GTE can be thought of as a "worst-case scenario" where the quantum circuit is so chaotic that the tensors are left with no structure. In the remainder of this section, we restrict ourselves to $\beta = 1$, which corresponds to the tensors of Eq. (6). We shall meet an example of $\beta = 2$ for the grouped-qubit algorithm discussed in section VI. From two such tensors, we apply a two-qubit gate following Eqs. (8)-(12), constructing the associated tensors $T$ and $T'$ and the SVD of $T'$. From the $2\beta\chi$ singular values $S_\mu$ of $T'$, we can obtain the associated fidelity $f_{\rm GTE}$ through Eq. (15). Fig. 8 studies the distribution of the singular values $S_\mu$ of the tensor $T'$ obtained from the GTE. The singular values are sorted in order of decreasing magnitude and plotted as a function of the index $\mu = 1, \ldots, 2\chi$. Plotting $\chi S_\mu^2$ as a function of $\mu/\chi$, we observe that all the different values of $\chi$ collapse onto a single curve. In other words, we find that there is some function $g(x)$ such that
$$S_\mu^2 = \frac{1}{\chi}\, g(\mu/\chi).$$
This scaling is already valid for rather small values of $\chi$. This observation could probably be put on firm mathematical grounds (it is consistent with the usual scaling of the semi-circular law for the GUE ensemble) but for the moment it is merely an empirical statement made from numerical evidence. It follows from this scaling that $f_{\rm GTE}$ very quickly converges to a $\chi$-independent value, $f_{\rm GTE} = \int_0^1 g(x)\, dx$.
In other words, one finds a finite value of the fidelity that is independent of $\chi$.

FIG. 8. Singular values $S_\mu^2$ of the matrix $T'$ obtained from the GTE ensemble. We find a perfect scaling of the form $S_\mu^2 = g(\mu/\chi)/\chi$ where $\mu$ is the index of the $\mu$-th singular value. The two bundles of curves correspond respectively to the $C_X$, $C_Z$ gates (two non-zero singular values) and the iS$_{\pi/6}$/iS gates (four non-zero singular values). Within one bundle, the different curves are indistinguishable.
The value of $f_{\rm GTE}$ depends on the one hand on $\beta$ and on the other hand on the two-qubit gate used. Control-Z ($C_Z$) and Control-NOT ($C_X$) are equivalent (they are related to each other through a change of basis of the second qubit) and correspond to $f_{\rm GTE} = 96.2\%$. Gates like the iSWAP gate (iS) or iSWAP followed by a $\pi/6$ rotation around the $z$-axis (iS$_{\pi/6}$, close to what is used in [4]) have 4 different singular values, which roughly doubles the error with respect to $C_Z$ ($f_{\rm GTE} = 93.2\%$). Fig. 9 shows how the distribution of the singular values in the GTE compares to the one obtained in the MPS simulation. We find a close agreement between the GTE and the MPS simulations when looking at the $T'$ tensor for a gate in the center of the system and at large depth. The agreement is not perfect, however, and we observe that the asymptotic fidelity of the MPS simulations is always better than the one found in the GTE,
$$f_\infty \geq f_{\rm GTE}. \qquad (32)$$
To try to understand why the inequality in Eq. (32) is not saturated, we plot in Fig. 9 the singular value distributions of the MPS simulations together with the GTE prediction: the MPS tensors retain some residual structure, and therefore we never reach the "worst-case scenario" of the GTE.
To summarize, $f_{\rm GTE}$ can be thought of as a lower bound for the fidelity found in the simulations for large enough $\chi$ (typically $\chi \geq 300$ in practice) and large enough depth. Getting beyond the asymptotic value requires algorithms that have an exponential cost. In the following section we describe possible strategies.
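The GTE fidelity can be estimated by direct Monte-Carlo sampling of the ensemble. The sketch below (our own helper `f_gate`; the overall normalization of the Gaussian entries drops out of the fidelity ratio) should approximately reproduce the values quoted above, around 96.2% for $C_Z$ and 93.2% for iSWAP-like gates:

```python
import numpy as np

rng = np.random.default_rng(6)

# Monte-Carlo GTE estimate: contract two random Gaussian MPS tensors,
# apply a two-qubit gate, SVD, keep the chi largest of 2*chi singular values.
def f_gate(U, chi, n_samples, rng):
    f = []
    for _ in range(n_samples):
        M1 = rng.normal(size=(chi, 2, chi)) + 1j * rng.normal(size=(chi, 2, chi))
        M2 = rng.normal(size=(chi, 2, chi)) + 1j * rng.normal(size=(chi, 2, chi))
        T = np.tensordot(M1, M2, axes=(2, 0))                    # (chi,2,2,chi)
        T = np.tensordot(U.reshape(2, 2, 2, 2), T, axes=([2, 3], [1, 2]))
        T = T.transpose(2, 0, 1, 3).reshape(2 * chi, 2 * chi)
        S2 = np.linalg.svd(T, compute_uv=False)**2
        f.append(S2[:chi].sum() / S2.sum())
    return np.mean(f)

CZ = np.diag([1.0, 1.0, 1.0, -1.0])
iSWAP = np.array([[1, 0, 0, 0],
                  [0, 0, 1j, 0],
                  [0, 1j, 0, 0],
                  [0, 0, 0, 1]])
f_cz = f_gate(CZ, 64, 20, rng)
f_is = f_gate(iSWAP, 64, 20, rng)
print(f_cz, f_is)   # iSWAP roughly doubles the error with respect to CZ
```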

VI. ALGORITHMS FOR GETTING BEYOND $\epsilon_\infty$
The algorithm discussed above can also be used for 2D arrays, since any two-qubit gate between distant qubits can always be written as a combination of gates on neighboring qubits using SWAP gates. However, this is inefficient and leads to a decrease of the effective fidelity $f$ as the transverse dimension of the 2D array increases. Another limitation of the above algorithm is that one cannot efficiently simulate systems that have a fidelity above $f_\infty$.
There are multiple strategies that could be used to go beyond the above algorithm. In particular, recent progress in algorithms for contracting tensor networks, such as the one of Ref. [9], could provide interesting candidates in 2D. Below, we follow a very simple strategy where we keep using MPS, but group the qubits so that each tensor now represents several qubits.

A. Grouped MPS state
We now consider the MPS structure sketched in Fig. 1c, where each tensor addresses several qubits. We now have P ≤ N tensors M(n), each addressing N_n qubits with Σ_{n=1}^{P} N_n = N. The tensors M(1) and M(P) possess N_n + 1 indices while the others possess N_n + 2 indices. The number of elements of these tensors is χ^2 2^{N_n}, so that the computing time now increases exponentially with the number of qubits per tensor. On the other hand, the two-qubit gates performed inside a given tensor M(n) are now handled exactly, so that the average fidelity of a circuit increases.
To perform a two-qubit gate between neighboring tensors M(n) and M(n+1), one proceeds in three steps. The first two are shown diagrammatically in Fig. 10. In the first step, one performs a QR decomposition of the two tensors to "extract" smaller tensors corresponding to the involved qubits. Assuming (without loss of generality) that the two-qubit gate involves qubit N_n of tensor M(n) and qubit 1 of tensor M(n+1), one decomposes M(n) as M(n) = Q(n)R(n), where the "vectors" of Q(n) indexed by σ are orthonormal. The important point here is that the index σ takes only 2χ values. Similarly, we write M(n+1) = R(n+1)Q(n+1). The second step follows Eqs. (8)-(12) of the algorithm of Section III with the replacements M(n) → R(n) and M(n+1) → R(n+1), and is shown for the present case in Fig. 10(b). In the last step, the new tensors M(n) and M(n+1) are obtained by contracting Q(n) with R′(n) and R′(n+1) with Q(n+1).
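The three steps can be sketched in a few lines of dense linear algebra. The following is a minimal numpy illustration (the function name, the qubit-ordering conventions, and the returned truncation fidelity are our own choices, not taken from the paper); it assumes the gate acts on the last qubit of M(n) and the first qubit of M(n+1):

```python
import numpy as np

def apply_gate_grouped(Mn, Mn1, gate, chi):
    """Apply a two-qubit gate between the last qubit of Mn and the first
    qubit of Mn1, following the three steps of the grouped-MPS update.
    Mn:  (chi_l, 2**Nn, chi_m) tensor;  Mn1: (chi_m, 2**Nn1, chi_r) tensor.
    Returns the updated tensors and the weight kept by the truncation."""
    chi_l, dn, chi_m = Mn.shape
    _, dn1, chi_r = Mn1.shape

    # Step 1: QR-factor Mn so that R carries only (last qubit, right bond);
    # the new internal index sigma takes at most 2*chi_m values.
    Q, R = np.linalg.qr(Mn.reshape(chi_l * dn // 2, 2 * chi_m))
    R = R.reshape(-1, 2, chi_m)
    # Same for Mn1, via a QR of the transpose (i.e. an LQ decomposition).
    Qb, Rb = np.linalg.qr(Mn1.reshape(2 * chi_m, dn1 // 2 * chi_r).conj().T)
    Rb, Qb = Rb.conj().T.reshape(chi_m, 2, -1), Qb.conj().T

    # Step 2: contract the small tensors, apply the gate, SVD and truncate.
    theta = np.einsum('aib,bjc->aijc', R, Rb)
    theta = np.einsum('ijkl,aklc->aijc', gate.reshape(2, 2, 2, 2), theta)
    s, _, _, sp = theta.shape
    U, S, Vh = np.linalg.svd(theta.reshape(2 * s, 2 * sp), full_matrices=False)
    kept = min(chi, len(S))
    fidelity = (S[:kept] ** 2).sum() / (S ** 2).sum()
    U, S, Vh = U[:, :kept], S[:kept], Vh[:kept, :]

    # Step 3: absorb the truncated factors back into the big tensors.
    Mn_new = (Q @ (U * S).reshape(s, 2 * kept)).reshape(chi_l, dn, kept)
    Mn1_new = (Vh.reshape(kept * 2, sp) @ Qb).reshape(kept, dn1, chi_r)
    return Mn_new, Mn1_new, fidelity
```

Because only the small R tensors enter the SVD, the cost of the update is set by χ, not by the exponentially large 2^{N_n} dimension of the grouped tensors.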
The main difference between the algorithm of Section III and the grouped MPS algorithm is that the resulting tensor T of Eq. (10) now has 4χ singular values instead of 2χ. As a result, upon truncation to keep only χ singular values, we anticipate that the fidelity per gate will be smaller than in the 1D case. However, as we shall see, this decrease is more than compensated by the gain of having perfect gates within one tensor. In the terminology of random tensors, the grouped MPS algorithm corresponds to β = 2. For the C_Z gate, the GTE fidelity drops from f_GTE(β = 1) = 96.2% down to f_GTE(β = 2) = 87.4%.

FIG. 10. (a) The grouped MPS tensors M(n) and M(n+1) are exactly factorized using QR decompositions, such that the R(n) and R(n+1) tensors carry the qubit indices acted on by the gate, and the newly introduced indices σ and σ′ range over 2χ values. (b) The gate acts on the product of R(n) and R(n+1), and the resulting tensor is factorized using an SVD truncated to χ singular values. Finally, to update the MPS (not shown), one computes the new tensors M(n) = Q(n)R′(n) and M(n+1) = R′(n+1)Q(n+1), which diagrammatically looks like step (a) but in reverse.

B. Application to a two dimensional circuit
We now show the results of simulations performed on a 2D circuit. To put the results into the perspective of what can be achieved experimentally, we choose a circuit very close to the one used by the Google team in their "supremacy" experiment [4]. We consider a 2D grid of 54 qubits as shown in Fig. 11a. The circuit is shown in Fig. 11b and alternates one-qubit gates applied to each qubit (same distribution as in the 1D case) with two-qubit gates (Control-Z) applied on different pairs of qubits according to the colors shown. Except for the choices of one- and two-qubit gates, and the number of qubits (53 versus 54), the setup is identical to the "supremacy sequence" of the Google experiment [4]. In Ref. [4] an XEB fidelity of 0.002 was reached after a depth D = 20, corresponding to a total of 430 two-qubit gates. Ignoring the difference between the XEB and the fidelity F, this translates into ε_av = 1.4%, which we shall use as our reference value to evaluate the performance of the grouped MPS algorithm. Fig. 11c shows various strategies for grouping the qubits. The [1]^12 grouping corresponds to 12 tensors that contain one column of qubits each (i.e. alternately 5 and 4 qubits). The [6,6] grouping is the most expensive computationally, with two tensors of 27 qubits each. Note that the tensors on the edges are less computationally costly than the middle ones, since they only have one bond index. The results of the simulations are shown in Fig. 12 for a depth of D = 20. While the error rate is significantly larger than in the 1D case, we find that it can be brought below 1.4% (which corresponds to a global fidelity of F = 0.002) on a single core computer. The computing times of the data points of Fig. 12 range from a few seconds to less than 48 hours for the most expensive points with a non-parallel code (single core calculation).
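The reference value can be checked with a one-line calculation, assuming the global fidelity simply factorizes over the two-qubit gates, F ≈ f_av^430:

```python
# Global fidelity and two-qubit gate count quoted for the sequence of Ref. [4].
F_target = 0.002
n_gates = 430

# If F = f_av ** n_gates, the required average fidelity per two-qubit gate is:
f_av = F_target ** (1.0 / n_gates)
eps_av = 1.0 - f_av
print(f"f_av = {f_av:.4f}, eps_av = {eps_av:.2%}")  # eps_av ≈ 1.4%
```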
We find that the grouping strategy is effective, but not as efficient as the maximum gain that one could expect: even though some of the gates become perfect upon grouping, we observe a decrease of the fidelity of the noisy gates which reduces the overall gain. For χ = 320 and the [4,2,2,4] partition, where the final fidelity is slightly better than F = 0.002 (see Fig. 12), the memory footprint of the calculation is 4.5 GB, which represents only 1.5 × 10^-6 percent of the size of the full 2^54-dimensional Hilbert space of the 54 qubits.
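A rough accounting of this memory footprint (our own back-of-the-envelope estimate, assuming complex double-precision amplitudes and ignoring temporaries) reproduces both numbers:

```python
# Columns of the 54-qubit grid alternate 5 and 4 qubits (12 columns in total).
cols = [5, 4] * 6
groups = [4, 2, 2, 4]          # the [4,2,2,4] partition, in columns per tensor
chi = 320
bytes_per_amp = 16             # complex128

# Qubits per grouped tensor: 18, 9, 9, 18.
qubits, i = [], 0
for g in groups:
    qubits.append(sum(cols[i:i + g]))
    i += g

# Edge tensors carry one chi bond, middle tensors carry two.
elems = [(chi if k in (0, len(qubits) - 1) else chi**2) * 2**n
         for k, n in enumerate(qubits)]
memory_gb = sum(elems) * bytes_per_amp / 1e9
fraction = sum(elems) / 2**54
print(memory_gb, fraction)   # ≈ 4.4 GB and ≈ 1.5e-8 of the Hilbert space
```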
To conclude this section, we have shown that for the Control-Z gate a simple grouping strategy allows one to reach the same fidelity as the Google experiment [4] in a matter of hours on a single core computer (i.e. f_av ≥ 98.6%). Note that the Google experiment was not performed with Control-Z but with more complex gates that produce more entanglement. (In the experiment a different choice of gates was used for each pair of qubits, so as to optimize the fidelity.) Reaching a similar fidelity for such gates is beyond the reach of a single core computer with the present algorithm; it would require a parallel implementation that we leave for further investigations.

VII. DISCUSSION
In this work, we have discussed a practical algorithm that allows one to simulate a quantum computer in a time which grows linearly with the number of qubits N and the depth D, at the cost of having a finite fidelity f. The fidelity f can be increased at a polynomial cost up to a finite value f_∞; increasing it beyond f_∞ has an exponential cost. Our main observation is that fidelities of the order of 99%, which are typical fidelities found in state of the art experiments, can be reproduced at a moderate computational cost.
Is a fidelity of 99% large or small? From an experimental physics perspective, it is certainly quite an achievement to keep several dozen qubits at this level of fidelity. From a quantum information and classical algorithms point of view, the question is what level of entanglement (hence the actual fraction of the Hilbert space that can truly be accessed) is associated with this level of fidelity. Our MPS ansatz can provide an estimate (or at least an upper bound, for one may come up with better algorithms) for this fraction. Since the MPS ansatz only spans a very tiny fraction of the overall Hilbert space, it follows that the computational power associated with fidelities in the 99% range is much more limited than the full size 2^N of the Hilbert space would suggest. We conclude that increasing the computational power of a quantum computer will primarily require increasing the fidelity/precision with which the different operations are performed [24]. Secondarily, one could try to improve its connectivity with e.g. quantum buses [25], as we have seen that 1D simulations are far easier than 2D ones. However, increasing the number of qubits will remain ineffective until better fidelities have been reached.

FIG. 11. a) Sketch of the quantum circuit with 54 qubits in a 2D grid. The qubits are represented by the black dots and the two-qubit gates by the colored links. b) The circuit alternates one-qubit gates (black dots) with two-qubit gates (here the Control-Z gate). The depth D counts the number of two-qubit gates per qubit. c) Different grouping strategies for the grouped MPS algorithm. [1]^12 corresponds to a grouping in 12 blocks of 1 column each; [4,2,2,4] corresponds to a grouping in 4 blocks of respectively 4, 2, 2, and 4 columns.
As a side comment, our approach could also be used to get lower bounds on the thresholds of quantum error correction (QEC) schemes [26]. Suppose that, for a certain connectivity, one has an algorithm that can reach a fidelity f in a time polynomial in N and D. Then it is reasonable to expect that any QEC code has a threshold fidelity p > f: if it were not the case, one could build a logical quantum computer with a classical one at a polynomial cost, by simply simulating the QEC protocols on the classical computer. In this respect, extending our approach to a truly 2D algorithm (beyond the quasi-1D one discussed in this article) would be particularly interesting. Indeed, 2D surface codes have a particularly low threshold fidelity p ≈ 99%. How close to f = 99% one can get at a polynomial cost in 2D is currently an open question.
Finally, it would be interesting to perform a similar study of the practical approximability by MPS of circuits designed for useful tasks. Goals could include estimating the minimum fidelities needed to perform these tasks with a high probability of success, and finding the crossovers where useful quantum algorithms start to offer an advantage over classical approaches.

ACKNOWLEDGMENTS
XW and YZ thank the Flatiron CCQ where this work was initiated during summer 2019. XW acknowledges funding from the French ANR QCONTROL and the E.U. FET open UltraFastNano. Numerical results involving MPS were obtained using the ITensor library [27]. The Flatiron Institute is a division of the Simons Foundation.