Quadratic speedup for simulating Gaussian boson sampling

We introduce an algorithm for the classical simulation of Gaussian boson sampling that is quadratically faster than previously known methods. The complexity of the algorithm is exponential in the number of photon pairs detected, not the number of photons, and is directly proportional to the time required to calculate a probability amplitude for a pure Gaussian state. The main innovation is to use auxiliary conditioning variables to reduce the problem of sampling to that of computing pure-state probability amplitudes, for which the most computationally expensive step is calculating a loop hafnian. We implement and benchmark an improved loop hafnian algorithm and show that it can be used to compute pure-state probabilities, the dominant step in the sampling algorithm, of up to 50-photon events on a single workstation, i.e., without the need for a supercomputer.

In addition to experimental implementations, significant progress has also been made in developing classical simulation algorithms. For boson sampling, a first method was an approximate Markov chain Monte Carlo algorithm [27]. This was improved in Ref. [28], where an exact sampling algorithm was introduced such that the complexity for generating one sample with N photons is equivalent to that of calculating an output probability amplitude. This in turn is equivalent to computing the permanent of an N × N matrix, which requires O(N 2^N) time using the best known methods [29].
An effort to obtain simulation techniques for GBS has also been pursued. An exact algorithm has been reported and implemented for GBS with threshold detectors [30,31], but it suffers from exponential memory requirements. Two algorithms were also proposed in Ref. [32] for a restricted version of GBS. The first one has polynomial space complexity and O(poly(N) 2^{8N/3}) time complexity; the second has exponential space complexity and O(poly(N) 2^{5N/2}) time complexity. Recently, an exact sampling algorithm for GBS was presented that requires only polynomial memory [33]. This algorithm shows an improved complexity proportional to O(N^3 2^N) for generating one sample with N photons. These GBS algorithms mark a crucial difference with respect to boson sampling: in GBS with pure states, the probability of observing an outcome with N photons is proportional to the loop hafnian of an N × N matrix, for which the best algorithms require O(N^3 2^{N/2}) time. In other words, there is a quadratic gap between the complexity of generating a sample and the complexity of computing an output probability. This suggests the possible existence of a better sampling algorithm with complexity matching that of calculating an output probability amplitude, similar to what has been achieved for boson sampling.
We present such an algorithm. The main insight is to introduce conditioning auxiliary variables obtained by performing a virtual heterodyne measurement in all modes and to iteratively replace the heterodyne outcomes with photon-number measurements. The photon-number outcomes conditional on the heterodyne measurements are given by pure-state probabilities. We develop a chain of conditional probabilities, where at each step only probabilities of a pure Gaussian state need to be calculated. For outputs with N photons, these are proportional to loop hafnians of N × N matrices, which can be calculated in O(N^3 2^{N/2}) time [34]. In general, the number of output probabilities that must be calculated is proportional to the number of modes m, which leads to a time complexity upper bounded by O(mN^3 2^{N/2}). This corresponds to a quadratic speedup over the previous state of the art. It also suggests that, compared to boson sampling, roughly twice as many photons are needed in GBS to reach the regime where classical simulations become intractable. We implement an improved version of a loop hafnian algorithm and use it to compute pure-state probabilities of events with up to 50 photons using a workstation with 96 CPUs.
In what follows, we begin by giving a short overview of GBS in Sec. II. We then describe our simulation algorithm in Sec. III, which constitutes our main result. In Sec. IV, we benchmark an algorithm for computing loop hafnians, whose computation is the most expensive step of the simulation, and finally present a discussion of our findings in Sec. V.

II. GAUSSIAN BOSON SAMPLING
The quantum state of a collection of m optical modes can be uniquely described in terms of its Wigner function [35,36]. Gaussian states are states whose Wigner function is a Gaussian distribution. They can be described by a covariance matrix V and a vector of means R̄ = (q̄, p̄), where q̄, p̄ are the mean canonical position and momentum vectors. It is also possible to describe the state in terms of the complex amplitudes α = (q + ip)/√2 ∈ C^m, which follow a complex-normal distribution with mean ᾱ = (q̄ + ip̄)/√2 ∈ C^m and a covariance matrix Q associated with the Husimi Q-function of the state [37]. In what follows, 𝟙 denotes the identity matrix. Gaussian Boson Sampling (GBS) is a form of photonic quantum computing where a Gaussian state is measured in the photon-number basis. If ᾱ = 0, the probability of observing the output sample S = (s_m, . . . , s_1), where s_i is the number of photons in mode i, is given by [11]

p(S) = haf(A_S) / (s_1! · · · s_m! √det Q),

where

A = X (𝟙_{2m} − Q^{−1}),   X = [[0, 𝟙_m], [𝟙_m, 0]],

and A_S is the matrix obtained as follows: if s_i = 0, rows and columns i and i + m are deleted from A; if s_i > 0, the rows and columns are repeated s_i times. The hafnian of a square symmetric matrix A of even dimension n is defined as

haf(A) = Σ_{M ∈ PMP(n)} Π_{(i,j) ∈ M} A_{i,j},

where PMP(n) is the set of perfect matching permutations [34].
Defining N := Σ_i s_i as the total number of photons, the submatrix A_S has dimension 2N, meaning that the best algorithm for calculating its hafnian requires O(N^3 2^{2N/2}) = O(N^3 2^N) time. However, when the Gaussian state is pure, it is possible to write A = B ⊕ B* and the probability distribution reduces to [11]

p(S) = |haf(B_S)|^2 / (s_1! · · · s_m! √det Q),

where B_S is constructed analogously: if s_i = 0, rows and columns i are deleted; if s_i > 0, the rows and columns are repeated s_i times. Since the matrix B_S has dimension N, computing its hafnian requires only O(N^3 2^{N/2}) time.
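The hafnian definition above can be made concrete with a direct, brute-force evaluation over perfect matchings. The sketch below is our own illustrative helper, not the O(N^3 2^{N/2}) algorithm of Ref. [34]; it runs in (n − 1)!! time and is only practical for very small matrices:

```python
import numpy as np

def hafnian(A):
    """Brute-force hafnian: sum over all perfect matchings of {0, ..., n-1}.

    Illustrates the definition only; complexity is (n - 1)!!, far worse than
    the O(N^3 2^(N/2)) algorithms discussed in the text.
    """
    n = A.shape[0]
    if n == 0:
        return 1.0  # empty-product convention
    if n % 2 == 1:
        return 0.0  # no perfect matching exists for odd dimension
    total = 0.0
    rest = list(range(1, n))
    # Pair index 0 with every possible partner j, then recurse on the rest.
    for idx, j in enumerate(rest):
        sub = rest[:idx] + rest[idx + 1:]
        total += A[0, j] * hafnian(A[np.ix_(sub, sub)])
    return total
```

For the 4 × 4 all-ones matrix this returns 3, the number of perfect matchings of four vertices.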
An analogous formula can also be derived for GBS with displacements, i.e., when ᾱ ≠ 0. We define the quantities:

ᾱ_d := (ᾱ, ᾱ*),   γ := Q^{−1} ᾱ_d.

In this case, the output probabilities are given by [34,38]:

p(S) = exp(−½ ᾱ_d† Q^{−1} ᾱ_d) / (s_1! · · · s_m! √det Q) × lhaf(filldiag(A_S, γ_S)),

where lhaf(·) is the loop hafnian [34], γ_S is obtained from γ by repeating the i and i + m entries of γ a total of s_i times, and filldiag(A_S, γ_S) replaces the diagonal of A_S with the vector γ_S. The loop hafnian of an arbitrary matrix is defined analogously to the hafnian in Eq. (6), but replacing the set PMP(n) by the set of single-pair matchings SPM(n) [34]. When the Gaussian state is pure, it is possible to express the output probability as

p(S) = exp(−½ ᾱ_d† Q^{−1} ᾱ_d) / (s_1! · · · s_m! √det Q) × |lhaf(filldiag(B_S, γ̃_S))|^2,

where γ̃_S is obtained from γ̃ by repeating its i-th entry s_i times. Up to constant prefactors, the best known algorithms for calculating loop hafnians of generic matrices have the same complexity as for hafnians, namely O(N^3 2^{N/2}) for matrices of dimension N. This implies that the complexity of computing output probabilities with N photons for pure Gaussian states with displacements also scales as O(N^3 2^{N/2}). We make use of this result in the next section.
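The loop hafnian admits a similar brute-force evaluation, in which an index may additionally be matched with itself, picking up a diagonal entry. Again, this is a sketch of the definition (our own helper, exponentially slower than the algorithms benchmarked later):

```python
import numpy as np

def loop_hafnian(A):
    """Brute-force loop hafnian: sum over single-pair matchings, i.e.
    matchings in which every index is either paired or forms a loop."""
    n = A.shape[0]
    if n == 0:
        return 1.0  # empty-product convention
    rest = list(range(1, n))
    # Option 1: index 0 forms a loop, contributing the diagonal entry A[0, 0].
    total = A[0, 0] * loop_hafnian(A[np.ix_(rest, rest)])
    # Option 2: index 0 is paired with some other index j.
    for idx, j in enumerate(rest):
        sub = rest[:idx] + rest[idx + 1:]
        total += A[0, j] * loop_hafnian(A[np.ix_(sub, sub)])
    return total
```

For the n × n all-ones matrix this counts the involutions of n elements (2, 4, and 10 for n = 2, 3, 4), and it reduces to the hafnian when the diagonal is zero.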
So far we have focused only on describing the photon-number statistics of a multimode Gaussian state. For the sampling algorithm we describe in the next section it will also be necessary to recall some basic properties of continuous-outcome measurements on a Gaussian state. Of particular relevance here are so-called heterodyne measurements. The outcomes of an m-mode heterodyne measurement are specified by a vector of complex numbers α = (q + ip)/√2 ∈ C^m. Generating heterodyne samples from a Gaussian state is straightforward. For a Gaussian state with vector of means R̄ and covariance matrix V, we sample from the multivariate normal distribution µ ∼ N(R̄, V_Q), where µ = (q, p) and V_Q is the Husimi Q-function covariance matrix [39] in the quadrature basis, given by

V_Q = V + 𝟙/2.

Explicitly, we obtain outcome µ with probability

p(µ) = exp(−½ (µ − R̄)ᵀ V_Q^{−1} (µ − R̄)) / ((2π)^m √det V_Q).

A partial heterodyne measurement can be performed when measuring only a subset of the modes. This can be done either by sampling from the reduced matrix of V_Q, formed by selecting only the rows and columns of the included modes, or by sampling all modes and discarding the outcomes for the modes we do not wish to sample. It is also useful to write the conditional state of a subset of the modes, A, when the other modes, B, are measured using heterodyne detectors. For this it is convenient to write the covariance matrix V in block form with the modes in A and B in separate blocks, and similarly to group the modes in the vector of means. This can be done by permuting the rows and columns of the covariance matrix and the elements of the vector of means, keeping the ordering in both consistent. So we write both the covariance matrix and the vector of means in the ordering (q_A, p_A, q_B, p_B):

V = [[V_A, C], [Cᵀ, V_B]],   R̄ = (R̄_A, R̄_B).

If the outcome µ_B is obtained by measuring the modes in the set B, we can write the resulting covariance matrix and vector of means for A as [36]

V_A^{(B)} = V_A − C (V_B + 𝟙/2)^{−1} Cᵀ,
R_A^{(B)} = R̄_A + C (V_B + 𝟙/2)^{−1} (µ_B − R̄_B).

This update rule, together with the sampling in Eq. (14), will be useful in the next section, where we introduce our algorithm.
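The heterodyne update rule can be sketched in a few lines of numpy. The helper below is our own; it assumes the ħ = 1 convention used above, with global quadrature ordering (q_1, . . . , q_m, p_1, . . . , p_m) and vacuum noise 𝟙/2 added to the measured block:

```python
import numpy as np

def heterodyne_condition(V, R, modes_B, mu_B):
    """Condition a Gaussian state (V, R) on heterodyne outcomes mu_B.

    modes_B: the mode indices that are measured; mu_B is ordered (q_B, p_B).
    Returns the conditional covariance matrix and vector of means of the
    remaining modes, via a Schur-complement update with the Husimi
    covariance V_B + I/2 of the measured block (hbar = 1 assumed).
    """
    m = V.shape[0] // 2
    B = list(modes_B) + [i + m for i in modes_B]      # quadrature indices of B
    A = [i for i in range(2 * m) if i not in B]       # remaining quadratures
    VA, VB = V[np.ix_(A, A)], V[np.ix_(B, B)]
    C = V[np.ix_(A, B)]
    K = C @ np.linalg.inv(VB + 0.5 * np.eye(len(B)))  # gain matrix
    V_cond = VA - K @ C.T
    R_cond = R[A] + K @ (mu_B - R[B])
    return V_cond, R_cond
```

For an uncorrelated state the off-diagonal block C vanishes, so the unmeasured modes are untouched, as expected.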

III. ALGORITHM
We now describe an algorithm which samples the modes sequentially but that, unlike the one from Ref. [33], never requires the calculation of mixed-state probabilities. In this algorithm we introduce partial heterodyne measurements so that only pure-state probabilities need to be evaluated at each step. Given a partial heterodyne measurement on the modes in B, the photon-number probability of a pattern S in the modes A is given by Eq. (12), where B and γ are now found for the state with covariance matrix V_A^{(B)} and vector of means R_A^{(B)} as given in Eq. (17). Note that if the global covariance matrix V corresponds to a pure state, the conditional covariance matrix V_A^{(B)} also corresponds to a pure state.
If we wish to sample in the photon-number basis in the modes A without measuring the other modes, we can make use of the observation in the paragraph above: the marginal photon-number probabilities can be obtained by integrating the joint probabilities over the set of possible heterodyne outcomes α_B:

p(s_A) = ∫ dα_B p(s_A, α_B) = ∫ dα_B p(α_B) p(s_A | α_B).

Hence we can sample from the marginal probabilities by sampling from p(α_B), followed by the conditional probabilities p(s_A | α_B), and then simply ignoring the heterodyne outcome. This is justified because one can always sample from a marginal distribution by considering additional virtual variables and then sampling from the correspondingly enlarged probability distribution, as long as one then forgets the values of the added variables. The outcomes of the heterodyne measurements are precisely these added variables; they do not correspond to real measurements in an experimental setup but are rather virtual measurements that introduce convenient conditioning auxiliary variables.
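The auxiliary-variable trick is easy to verify on a toy discrete distribution, with a stand-in variable a playing the role of the heterodyne outcome. The joint probabilities below are arbitrary illustrative numbers:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy joint distribution p(s, a): rows index a "photon number" s = 0, 1, 2;
# columns index an auxiliary outcome a = 0, 1 standing in for heterodyne.
p_joint = np.array([[0.10, 0.20],
                    [0.30, 0.15],
                    [0.05, 0.20]])

p_a = p_joint.sum(axis=0)                        # marginal of the auxiliary variable
samples = []
for _ in range(50_000):
    a = rng.choice(2, p=p_a)                     # sample the auxiliary variable
    s = rng.choice(3, p=p_joint[:, a] / p_a[a])  # sample s conditional on a
    samples.append(s)                            # ...and simply forget a

freq = np.bincount(samples, minlength=3) / len(samples)
target = p_joint.sum(axis=1)                     # the desired marginal p(s)
```

The empirical frequencies converge to the marginal p(s) even though a was sampled and then discarded, mirroring how the virtual heterodyne outcomes are used and then ignored.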
We now want to apply this directly to sampling the modes sequentially. The objective is to sample mode k given that the previous modes 1, . . . , k − 1 have already been sampled, so we set A = {1, . . . , k} and B = {k + 1, . . . , m}. We assume that we have already sampled from p(s_1, . . . , s_{k−1}, α_{k+1}, . . . , α_m). We wish to sample s_k conditional on this outcome. To do this we can calculate the relative probabilities for all s_k and sample from that distribution. This results in a sample drawn from p(s_1, . . . , s_k, α_{k+1}, . . . , α_m). By discarding α_{k+1}, we are left with a sample from p(s_1, . . . , s_k, α_{k+2}, . . . , α_m) and are ready to sample the (k + 1)-th mode. We work progressively from k = 1 to m, essentially replacing the virtual heterodyne measurements with photon-number measurements, until we are left with a sample from p(s_1, . . . , s_m).
This algorithm can sample from a pure multimode state, yet in realistic experiments the full state may not be pure, for example if loss is included. To address this, we express the mixed Gaussian state as a convex combination of pure states. The Williamson decomposition [35,40] of a quantum covariance matrix V states that it can be split as

V = T + W,

where T = ½ S Sᵀ is the covariance matrix of a pure state, S is a symplectic matrix, and W is positive semidefinite. In Hilbert space, this implies that a mixed Gaussian state with covariance matrix V = T + W and a vector of means R̄ = (q̄, p̄) can be expressed as [36,41]

̺ = ∫ dR p(R) |ψ_{R,T}⟩⟨ψ_{R,T}|,

where

p(R) = exp(−½ (R − R̄)ᵀ W^{−1} (R − R̄)) / ((2π)^m √det W)

is the probability density function of a multivariate normal distribution, and |ψ_{R,T}⟩ is a pure Gaussian state with vector of means R and covariance matrix T. Using this, the probability of observing an outcome (s_1, s_2, . . . , s_m) when performing a measurement on a mixed state ̺ on m modes is

p(s_1, . . . , s_m) = ∫ dR p(R) |⟨s_1, . . . , s_m|ψ_{R,T}⟩|².

Hence, to sample from a mixed state we can sample the displacement vector from Eq. (21) and then sample from the resulting pure state.
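For a thermal state the Williamson split is particularly simple, which makes the mixture-of-pure-states picture easy to check numerically. The sketch below uses our own example values and the ħ = 1 convention, in which the vacuum covariance is 𝟙/2:

```python
import numpy as np

rng = np.random.default_rng(0)

m = 2                        # number of modes (example value)
nbar = 1.5                   # mean thermal photons per mode (example value)

# Thermal state: V = (nbar + 1/2) * I in the hbar = 1 convention.
V = (nbar + 0.5) * np.eye(2 * m)
T = 0.5 * np.eye(2 * m)      # pure (vacuum) part of the Williamson split
W = V - T                    # positive-semidefinite classical-noise part

# Draw random displacements R ~ N(0, W), as in Eq. (21); the sampling then
# proceeds on the pure state with covariance T and vector of means R.
Rs = rng.multivariate_normal(np.zeros(2 * m), W, size=100_000)

# Sanity check: the spread of the displacements plus the pure-state
# covariance rebuilds the covariance of the original mixed state.
emp = np.cov(Rs.T)
```

Averaging displaced pure states with covariance T over the Gaussian noise W reproduces V = T + W, which is precisely why step 1 of the algorithm below is valid.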
We can now outline the full algorithm for simulating Gaussian boson sampling: Algorithm: To sample from a state given by covariance matrix V and vector of means R̄ in m modes: 1. If the Gaussian state to be sampled is mixed, calculate the matrices T, W from the Williamson decomposition such that V = T + W. Sample a vector R from the multivariate normal distribution p(R) as in Eq. (21). This can be done in cubic time in the size of the matrix [42]. Continue the algorithm with the pure state given by covariance matrix T and vector of means R.
2. Generate a sample from the probability distribution p(α 2 , . . . , α m ) resulting from heterodyne measurements on modes 2 to m using Eq. (14). This can be done in cubic time in the number of modes.
3. For mode k, set A = {1, . . . , k} and B = {k + 1, . . . , m}, and compute the conditional state with covariance matrix V_A^{(B)} and vector of means R_A^{(B)} from Eq. (17), using the heterodyne outcomes α_{k+1}, . . . , α_m.
4. For each s_k = 0, 1, . . . , d, where d is a cutoff on the local photon number, calculate the pure-state probability p(s_k, s⋆_{k−1}, . . . , s⋆_1 | α_{k+1}, . . . , α_m) using Eq. (12), where s⋆_1, . . . , s⋆_{k−1} are the photon numbers sampled in previous iterations. This step requires the computation of loop hafnians.
5. Sample s_k from the distribution obtained by normalizing these relative probabilities.
6. Discard the heterodyne outcome α_{k+1} and proceed to the next mode.
After repeating steps 3-6 for each k = 1, 2, . . . , m, we are left with an outcome sampled from p(s_1, . . . , s_m). The correctness of the algorithm follows directly from the definition of conditional probabilities and integration over the auxiliary variables α_2, . . . , α_m, as shown in Appendix A.
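Step 2 of the algorithm, drawing the initial heterodyne sample, amounts to sampling a multivariate normal with the Husimi covariance, which numpy does in cubic time via a Cholesky factorization. A minimal sketch with a vacuum state standing in for the actual Gaussian state (ħ = 1 convention assumed):

```python
import numpy as np

rng = np.random.default_rng(3)

m = 3                                 # number of modes (example value)
V = 0.5 * np.eye(2 * m)               # vacuum covariance as a stand-in state
R = np.zeros(2 * m)                   # zero vector of means

VQ = V + 0.5 * np.eye(2 * m)          # Husimi covariance: heterodyne adds vacuum noise
mu = rng.multivariate_normal(R, VQ)   # one heterodyne sample mu = (q, p)
```

The sample has 2m entries, the q and p outcomes for each mode; discarding some entries of mu realizes a partial heterodyne measurement.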
Overall, the algorithm reduces sampling from the GBS distribution to computing pure-state probabilities p(s_k, s⋆_{k−1}, . . . , s⋆_1 | α_{k+1}, . . . , α_m). When a total of N photons are detected, calculating the largest such probability amplitude requires O(N^3 2^{N/2}) time, which results from computing the loop hafnians in Step 4. This is scaled up by at most the total number of modes, giving a total sampling complexity of O(mN^3 2^{N/2}). This is a quadratic improvement over the algorithm in Ref. [33], which has complexity O(mN^3 2^N).
It is worthwhile to compare with the algorithm of Ref. [28] for boson sampling, whose complexity is O(N 2^N). Up to polynomial factors, our algorithm suggests that the time required to simulate boson sampling with N photons is roughly the same as that of simulating GBS with 2N photons. We note that our algorithm can also be used to simulate GBS with threshold detectors: once a photon-number sample is generated, simply set s_i = 1 if s_i > 0, where s_i = 1 denotes that the detector measured one or more photons.
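The conversion to threshold detectors mentioned above is a one-liner; the sample values here are hypothetical:

```python
import numpy as np

sample = np.array([0, 3, 1, 0, 2])    # hypothetical photon-number sample
clicks = (sample > 0).astype(int)     # set s_i = 1 whenever s_i > 0
```

For the sample above this yields the click pattern [0, 1, 1, 0, 1].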
Finally, note that each random variable s_k has support over the non-negative integers, so in principle one needs to calculate all of the (infinitely many) conditional probabilities. However, we can choose some cutoff number of photons d in each mode such that the probability of obtaining a photon number above this value is negligible. We confirm the accuracy of the algorithm in Fig. 1, where we show that the total variation distance lies within the expected range for the sample size when compared to a brute-force sampler with the same number of samples. This shows that the chosen cutoff was sufficiently high not to introduce any notable error.
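To make the cutoff choice concrete, consider a single thermal mode, whose photon-number tail decays geometrically: Pr[n > d] = (n̄/(1 + n̄))^{d+1}. The values below are our own illustrative choices, not the paper's procedure:

```python
nbar = 2.0                 # mean photon number of the mode (example value)
eps = 1e-6                 # tolerated truncation probability
ratio = nbar / (1 + nbar)

# Smallest cutoff d such that Pr[n > d] = ratio**(d + 1) <= eps.
d = 0
while ratio ** (d + 1) > eps:
    d += 1
```

For n̄ = 2 this gives d = 34, so keeping photon numbers up to a few tens per mode already makes the truncation error negligible.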

IV. BENCHMARKING
In this section, we test the performance of a new implementation of the loop hafnian algorithm of Ref. [34], which is available in The Walrus [43]. The evaluation of loop hafnians is delegated to multi-threaded C++ code which uses the La Budde algorithm [44] for calculating the characteristic polynomial of a matrix. This gives a speedup of about three times with respect to previous algorithms based on diagonalization but, more importantly, significantly improves the accuracy of the calculation. In the original implementation of the loop hafnian algorithm, which uses double precision and eigenvalue methods for calculating power traces, it was found that a relative error of ∼ 10^{−1} is present when computing the loop hafnian of the 54 × 54 all-ones matrix [34]. To get around this issue, we use the aforementioned La Budde algorithm to significantly improve the accuracy of the calculation of the characteristic polynomial of a matrix [44]. Moreover, this method allows us to use long double complex data types. With these changes we lower the relative error in the calculation by three orders of magnitude compared to the previous implementation. We can then achieve a precision of one part in ten thousand for the computation of loop hafnians of matrices with dimension 56, as shown in Fig. 2.
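The quantities the loop hafnian algorithm actually consumes are the power traces tr(A^k); the La Budde route first computes the characteristic polynomial and then converts its coefficients to power traces via Newton's identities. The conversion step can be sketched as follows (here the coefficients come from numpy's eigenvalue-based np.poly rather than the La Budde recursion):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n))

# Monic characteristic polynomial coefficients c[0..n], with c[0] = 1.
c = np.poly(A)

# Newton's identities: p_k = -k*c_k - sum_{j=1}^{k-1} c_j * p_{k-j},
# where p_k = tr(A^k).
p = np.zeros(n + 1)
for k in range(1, n + 1):
    p[k] = -k * c[k] - sum(c[j] * p[k - j] for j in range(1, k))

# Direct power traces for comparison.
direct = np.array([np.trace(np.linalg.matrix_power(A, k))
                   for k in range(1, n + 1)])
```

Because the power traces follow from the characteristic polynomial through this recursion, any gain in the accuracy of the polynomial coefficients propagates directly to the power traces, and from there to the loop hafnian.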
As shown in the previous section, the runtime of the algorithm scales exponentially with the number of photons and linearly with the number of modes. Since the number of photons is the dominant parameter, we benchmark the time taken to calculate the probability of the largest event. If N photons are detected at the end of the algorithm, probabilities involving at most N + d photons need to be calculated, where d is the cutoff.
In Fig. 3 we benchmark the time it takes to calculate the loop hafnian of a random symmetric complex matrix. For a physical system with m modes, an upper bound on the runtime can be obtained by multiplying the runtime of the largest loop hafnian calculation by m. However, the N photons typically occur spread out across different modes, so we only calculate the largest loop hafnians for a fraction of the modes. The worst case occurs when photons are detected in the first N modes, resulting in expensive calculations in the remaining m − N modes. For example, for m = 100 and N = 50, we can estimate a runtime of roughly two weeks for generating a single sample on the workstation.
Finally, the loop hafnian implementation benchmarked here is written for portability and reproducibility. This implies that our implementation of the algorithm is not highly optimized. In the future, we expect to achieve a speed increase of one or two orders of magnitude by low-level optimization of the C++ backend coupled with the use of a modern task-based parallelism library [45] for more efficient load-balancing.

V. DISCUSSION
We have introduced an algorithm for the simulation of Gaussian boson sampling for which the complexity of generating a sample scales, up to polynomial prefactors, like the complexity of calculating a probability amplitude. This results in a quadratic speedup compared to the previous state-of-the-art. The algorithm is exact up to a small error induced by a cutoff dimension, runs in polynomial space, and simulates the most general forms of Gaussian boson sampling.
A remarkable consequence of our result is that Gaussian boson sampling requires roughly twice as many photons as standard boson sampling to reach the same regime of classical simulation. This has potential implications for experimental efforts at demonstrating an advantage over classical simulators. Indeed, for a number of photons N and a number of modes m, the complexity of the best algorithm for boson sampling scales as O(N 2^N), whereas our algorithm has runtime O(mN^3 2^{N/2}). A possible interpretation is that in Gaussian boson sampling, where photons are generated through squeezing, it is the number of photon pairs that determines the complexity, not the number of photons.