Decomposing large unitaries into multimode devices of arbitrary size

Decomposing complex unitary evolution into a series of constituent components is a cornerstone of practical quantum information processing. While the decompostion of an $n\times n$ unitary into a series of $2\times2$ subunitaries is well established (i.e. beamsplitters and phase shifters in linear optics), we show how this decomposition can be generalised into a series of $m\times m$ multimode devices, where $m>2$. If the cost associated with building each $m\times m$ multimode device is less than constructing with $\frac{m(m-1)}{2}$ individual $2\times 2$ devices, we show that the decomposition of large unitaries into $m\times m$ submatrices is is more resource efficient and exhibits a higher tolerance to errors, than its $2\times 2$ counterpart. This allows larger-scale unitaries to be constructed with lower errors, which is necessary for various tasks, not least Boson sampling, the quantum Fourier transform and quantum simulations.

Unitary transformations are the basis of quantum information processing and quantum simulation.While transformations on a small number of modes are relatively straight forward, many algorithms and applications require the implementation of joint unitary transformations on a large number of modes; pertinent examples include Boson sampling [1,2], the quantum Fourier transform [3,4], quantum photonic simulation [5] and, outside of quantum photonics, neuromorphic computing [6].Typically these large n × n unitaries are constructed from a decomposition into a collection of smaller 2 × 2 unitaries.While this approach is used independent of the physical platform, (see e.g.[7]) it is very common in linear optics where 2 × 2 transformations can be easily implemented by beam splitters and phase shifters, the established building blocks of linear optics.In their seminal paper, Reck et al. demonstrated how to mathematically decompose any n × n unitary in a product of n (n − 1) /2 2 × 2 subunitaries.Their decomposition results in a triangular array of beam splitters and phase shifters, which can be programmed to implement an arbitrary linear transform of optical modes [8].This scheme has been refined by Clements et al. enhancing the loss tolerance using a symmetric arrangement of 2×2 splitters [9].Furthermore, the length of the circuit could be shortened with symmetric 2 × 2 splitters [10].Indeed, with the advent of integrated optics, in which many beam splitters and phase shifters can be implemented on a small footprint, large and complex unitary operations have been demonstrated using this approach, not least linear optics quantum computing [11][12][13], Boson sampling [14][15][16], quantum simulation [17][18][19], and neuromophic computing [20,21].Nevertheless, while sources and detectors scale linearly with the network dimension n, the required number of beam splitters and phase shifters scales with O(n 2 ), in order to implement an arbitrary n × n unitary.It is therefore pertinent to investigate building large unitaries starting from larger building blocks.
Beyond decompositions of large unitaries into arrays of beam splitters, other approaches such as multiport integrated devices and 3×3 fiber tritters have also been investigated [22] in the context of quantum interference [23][24][25][26].Higher-order modal manipulation of quantum light has also been investigated beyond the spatial degree of freedom, which is highly promising for experimentally implementing unitaries of larger size.For example, methods for manipulating orbital angular momentum modes have shown up to seven modes [27], however generalised manipulation remains challenging [28].In the frequency degree of freedom, operations on 10 modes have been shown [29], while operations on hybrid time-frequency modes have also been demonstrated up to 64 modes [30][31][32][33].This begs the question: how can larger n × n unitaries be constructed from m × m constituent unitaries, where m > 2?
The answer to this question becomes practically relevant only when the m×m constituent device outperforms (by some reasonable metric) its own decomposition into 2×2 components.In other words, if the performance cost (e.g.loss, fidelity, production cost, etc) associated with producing a m×m unitary is greater than the m(m−1) 2 different 2 × 2-unitaries, the 2 × 2 decomposition is more efficient.Nevertheless, once it becomes cheaper to directly fabricate a device realizing an m × m unitary compared to building it out of 2×2 phase shifter/beam splitter cascades, the question how one efficiently can build an n × n arXiv:2309.12440v1[quant-ph] 21 Sep 2023 unitary from its m × m subunitaries is of the utmost relevance.Indeed, in all the aforementioned physical implementations, the number of possible modes is physically restricted far below the size the desired matrix size for quantum computing applications.
In this paper, we generalise unitary decomposition of an n × n unitary into m × m submatrices, where n > m ≥ 2. This provides a significant scaling advantage whenever it is cheaper to produce a m × m unitary directly, compared to building it out of 2 × 2 unitaries.We provide an algorithmic approach to find this decomposition, which uses the best-known minimum number of submatrices.We also show that quality thresholds exist when comparing larger devices to the established beam splitter decomposition.This provides a route to implementing large scale devices, which, by reducing the total number of components, are more tolerant to the errors caused by the components individually.
The algorithm to decompose a unitary n × n−matrix U into a product of smaller unitary matrices of dimension m × m runs as follows: The key task is to find m × m−matrices Q1 , . . ., QN (properly embedded as n × n−matrices) such that U Q1 . . .QN is an upper triangular matrix.Note that any unitary upper triangular matrix is automatically a diagonal unitary matrice D. Such diagonal matrices are experimentally easy to realize because it consists only of a phase shift in each individual mode.Summarizing, our algorithm will allow to write the large unitary U as 1 thus we have factorized U into m×m-unitaries and final phase shifts.
In order to achieve upper triangular matrices we use the RQ-decomposition which, for any m × m−matrix A, ensures the existence of a unitary matrix Q and an upper triangular matrix R (i.e.R ij = 0 for i > j) such that A = RQ −1 (see e.g.[34, Section 5.2]).In particular, AQ is upper triangular so that we can transform any matrix into an upper triangular one by right multiplication with a unitary.
We now describe how to use the RQ-decomposition to create zeros at predefined positions in a large unitary matrix U .We will use this multiple times to create zeros at all places below the diagonal.Let us fix m columns 1 ≤ j 1 < . . .< j m ≤ n and a base row i ∈ {m, . . ., n}.This choice gives rise to a m × m-matrix A consisting of the entries of U which are contained in the columns j 1 , . . ., j m and in the rows i − m + 1, . . ., i (see Fig. 1 for an illustration of this embedding for m = 3).Our goal is to transform this matrix into an upper triangular form.First, by the RQ-decomposition, there is a unitary matrix Q such that AQ = R is upper triangular.We now show how to properly embed the matrix Q into an (n × n)−matrix such that we can create zeros in our original matrix U .For this let q kℓ denote the entries ofthe m × m matrix Q and build an n × n-matrix Q with entries qkℓ as follows: Start with the identity matrix and set qj k ,j ℓ := q kℓ , i.e. we embed Q into the identity matrix at the m × m-submatrix given by the entries having their row and column coordinates both in {j 1 , . . ., j m }.
Thus, U Q has an upper triangular m × m-submatrix.More precisely, the entries in row l and column j k are zero for l = i − m + 1 + k, . . ., i and k = 1, . . ., m.Note that the multiplication with Q from the right only affects the columns j 1 , . . ., j m of U .Moreover, if there are only zeros in these columns below the ith row, all these zeros are maintained by this multiplication.To state this differently, if U has zeros at (l, j k ), k = 1, . . ., m, this is equivalent to saying that an input state that occupies only the modes j 1 , . . ., j m is transformed to an output state where the lth mode is not occupied.Since Q only affects the input modes j 1 , . . ., j m this property is preserved if we apply the matrix Q before applying U which means that we consider U Q.
The algorithm to find the matrices Q i we are presenting works similarly to a tetris game: In each step we create zeros in a specific row by inserting blocks of zeros using the technique previously described.If m = 2 then this algorithm is exactly the algorithm of [8].We start by creating zeros in the bottom row.By selecting the first m columns and using the RQ-decomposition as above, we can build an upper triangular m × m-block in the lower left corner of U (see Fig. 2).This gives us the matrix Q1 .We now proceed in the same way with the next m columns without zero entries to create more and more triangle-shaped blocks of zeros in that row until the number m ′ of remaining non-zero entries is less that m and use (potentially) one more matrix of size m ′ to fill in the remaining zeros -a triangle-shaped block of size m ′ .As we already created some zeros in the penultimate row, we only have to create new zeros at the places which are not already covered.Here we have to split up the triangle-shaped block and use selected columns j 1 , . . ., j m as described in the previous paragraph.In general, the algorithm is structured as follows: 1. Let i ∈ {2, . . ., n} be the smallest value such that in the rows i + 1, . . ., n all entries below the diagonal are zero.In the first step described above we generically have no zeros in the last row, i.e. i = n.
2. Consider the ith row and pick the first m non-zero entries in that row which are on the left hand side of the diagonal or on the diagonal.Denote the corresponding columns by j 1 , . . ., j m .If there are just 2 ≤ m ′ < m non-zero entries left in that row, proceed with m ′ instead of m.
3. Choose a unitary matrix corresponding to the row i and the columns j 1 , . . ., j m to create an upper triangular block of size m, as described above.
4. Repeat until U is transformed into an upper triangular matrix.Each m × m-submatrix Qi creates 1 2 (m − 1)m zeros in the matrix U .In total we have to create 1 2 (n − 1)n zeros to end up with a diagonal matrix.Hence, we need at most n(n−1) m(m−1) matrices Qi which come from a m × mmatrix.In addition, our algorithm requires at most one m ′ × m ′ -submatrix with m ′ < m for each row.Hence, we end up with at most n(n−1) m(m−1) + n − 1 matrices Qi , see Fig. 4.
If c m is the relevant cost of a m × m unitary then the total cost for constructing the n × n matrix out of m(m−1) + n − 1) (under the assumption that the total cost is linear in the number of utilized devices).Recall that the Reck or Clements scheme requires 1  2 n(n − 1) different 2 × 2-unitaries which leads to a relevant cost of C n,2 = c2 2 n(n − 1) for the realization of a 2 × 2 unitary.Comparing these two cost function one sees directly that whenever c m < 1 2 m(m−1) (i.e. the m×m device is cheaper then building the matrix out of 2 × 2 matrices) one sees that for large n, C n,m < C n,2 , i.e. it is advantageous to build the n × n matrix out of m × m instead of the traditional 2 × 2 beam splitter phase shifter cascades.
We numerically implemented the above described algorithm in order to test its performance as well as its robustness against perturbations: For this, we randomly chose a n × n-unitary matrix U and decomposed it into m × m-unitaries Q i as described above.We then perturbed the calculated m × m-submatrices Q i by adding random numbers from a normal distribution with specified width ("noise strength") to real and imaginary part of the entries, respectively.As a figure of merit for the strength of the perturbation we consider the fidelity of the perturbed submatrices Q pert and the original subma- This will of course vary for any realized perturbation so we take the expected reconstruction fidelity F Q := E(F (Q, Q pert )) as a measure for the precision of our individual components Q. Next, we reconstruct a (n×n) matrix U pert from the Qpert .The final metric to analyse the robustness of U is defined by F (U, U pert ) respectively by the expected fidelity F U := E(F (U, U pert )) and plotted in Figs. 5 and 6.In the first figure, we fixed the matrix size at n = 50 and plot the dependence of the reconstruction fidelity F U as a function of the component quality F Q for different submatrix size m = 2, 3, 5, 10.It can be clearly seen from the figure that the reconstruction fidelity of F U drops quickly with component quality F Q .The smaller the submatrices are (i.e. the smaller m) the steeper is the fidelity drop.Thus, already the increase in component size from m = 2 to m = 3 improves the fidelity of the final unitary significantly and proves a much higher robustness of the reconstructed matrix U pert .Second, we analysed the question, how big can the final system size (i.e.unitary size n) become, given components of size m achieving a specified quality F Q .The results for a fixed value F Q = 0.95 ± 0.0005 and m = 2, 3, 5, 10 are presented in Fig. 6.Again, we observe already for m = 3 a significant advancement in fidelity which enables the realisation of much bigger matrices, i.e. quantum networks, at reasonable fidelities.The advantage increases with the size of the submatrices, as expected.In conclusion, we have presented an algorithm to decompose large n × n-unitaries into smaller constituent m × m-subunitaries.We have shown that these become more tolerant to loss and errors as m increases, yielding the intuition that larger unitaries are more effectively building from larger building blocks.This has implications for building large-scale unitary dynamics on quantum systems, in particular linear optics, where the decomposition in terms of 2 × 2 beam splitters and phase shifters has become ubiquitous.Exploring other devices which intrinsically operate on a larger set of modes simultaneously is thus highly advantageous, and may simplify the path towards practical large scale devices.

Figure 2 .
Figure 2. Decomposition for m = 3 and n = 5 with experimental setup.Here, the φi stand for the phase shifts which together form the diagonal matrix D.

Figure 3 .
Figure 3. Illustration of the algorithm using triangle blocks for m = 4 and n = 13.

Figure 4 .
Figure 4. Scaling behaviour of our algorithm for the number of elements to construct a unitary of size n according to n(n−1) m(m−1) + n − 1

Figure 5 .
Figure 5. Numerical simulations of the fidelity for n = 50.For each data point we reconstructed 100 random unitary matrices U , each with 20 different perturbations and averaged over all 2000 reconstructions.The size of the calculated submatrices is indicated in legend.The error bars of the statistical fluctuations are smaller than the symbol size

Figure 6 .
Figure 6.Numerical simulations of the fidelity for increasing n ≥ m at a fixed component quality FQ = 0.95 ± 0.0005.For each data point we reconstructed 100 random unitary matrices U , each with 20 different perturbations.The size of the submatrices is indicated in legend.The error bars of the statistical fluctuations are smaller than the symbol size.