Near-optimal quantum circuit construction via Cartan decomposition

We show the applicability of the Cartan decomposition of Lie algebras to quantum circuits. This approach can be used to synthesize circuits that efficiently implement any desired unitary operation. Our method finds explicit quantum circuit representations of the algebraic generators of the relevant Lie algebras, allowing the direct implementation of a Cartan decomposition on a quantum computer. The construction is recursive and allows us to expand any circuit down to generators and rotation matrices on individual qubits, where through our recursive algorithm we find that the generators themselves can be expressed explicitly with controlled-NOT (CNOT) and SWAP gates. Our approach is independent of the standard CNOT implementation and can be easily adapted to other cross-qubit circuit elements. In addition to its versatility, we also achieve near-optimal counts when working with CNOT gates, with an asymptotic CNOT cost of $\frac{21}{16}4^n$ for $n$ qubits.


I. INTRODUCTION
Quantum computing relies on a quantum circuit to translate an algorithm into a form that runs on a quantum computer. The circuit expresses the physical actions that are necessary to create a particular quantum-mechanical state, whose measurement provides the output of the calculation. Every circuit is equivalent to a unitary transformation in $SU(2^n)$, where $n$ refers to the number of qubits. The mapping is not injective, in the sense that two different circuits can perform the same calculation and correspond to the same transformation $U$ [1]. Consequently, circuits of different lengths can perform the same computation. In most cases, the shorter circuit is preferable, since it reduces the execution time, the imprecision due to hardware limitations and, on noisy systems, the noise accumulated over the operations. Various methods can be employed to optimize circuits [2-4]; however, before optimization, the circuit must first be constructed.
There are several ways to construct quantum circuits. One can construct an algorithm along a schema: the well-known algorithms of Shor [5] and Grover [6] work in this way and can be scaled to the required system size by following the schema. Finding new schematic algorithms and showing their speed-up compared with classical methods is its own field of research. So far, the number of discovered algorithms is limited [7,8].
An alternative is to use a parametrized circuit and modify the parameters until the circuit fits the desired output. This approach is called quantum machine learning [9-11]. Similar to classical machine learning, a quantum circuit ansatz is chosen, often with distinct layers of repeated subcircuits, which is then trained with a classical feedback loop to approximate a desired solution.
In our work we provide a solution to a third approach: decomposing a known unitary matrix into its corresponding quantum circuit. We can build circuits for any arbitrary target, not just the ones for which we have schemas, and with known performance, as we know the number of required CNOT gates. Our approach provides a direct method for translating a unitary operation $U$ to an explicit quantum circuit.

* maximilian-balthasar.mansky@ifi.lmu.de
The developed algorithm generalizes the unstructured circuit decomposition of a three-qubit unitary, done in [12], to an $n$-qubit unitary by using a recursive method. The mathematics upon which this recursive algorithm is constructed is based on the work of Khaneja and Glaser [13]. Underlying our construction is the Cartan decomposition of a unitary $U \in SU(2^n)$ into the product $K_1 \exp(z_1) K_2 \exp(y) K_3 \exp(z_2) K_4$, where all four $K_i$ are part of the next-lower dimension group, $K_i \in SU(2^{n-1}) \otimes U(1)$, and $z_i$ and $y$ are algebra elements belonging to certain Cartan subalgebras. We show that this argument is recursive and allows us to decompose any unitary into components that can be easily represented in a quantum circuit. This is described in section III A.
The orthogonal elements $\exp(z_i)$ and $\exp(y)$ in the Cartan decomposition are created through the generators of the Lie subalgebras and ultimately contain the only cross-qubit elements in the circuit. To express them in terms of circuit elements, we make use of a block-diagonal decomposition of the elements of the algebra. This form is easily expressible in terms of CNOTs and elementary rotations, described in detail in section IV.
We also assess the performance of our algorithm as expressed by the number of CNOTs that an arbitrary circuit requires in the worst case. The number of gates can be determined analytically, see section VI. There we also compare our CNOT count to other methods of decomposing a unitary [3, 14-17]. We provide an outlook on future work in section VII.

II. RELATED WORK
We provide a general overview of some of the most relevant algorithms for the synthesis of general multi-qubit gates: Cosine-Sine decomposition [16], Optimized Quantum Shannon Decomposition (QSD) [17], and Khaneja-Glaser decomposition [13]. The theory behind the Khaneja-Glaser decomposition is discussed in more detail, as our work relies on the mathematical structure and extends their work to an arbitrary number of qubits.

FIG. 1. Graphical representation of our work. We establish a structure from an arbitrary unitary $U$ into circuit elements of lower dimension ($K_i = A_i \otimes B_i$, $A_i \in SU(2^{n-1})$, $B_i \in U(1)$) and $n$-qubit elements that are generated from the algebras $\mathfrak{f}$ and $\mathfrak{h}$. The algorithm is recursive for all $A_i$ and detailed in section III A. Between the recursive elements are the $(n-k)$-qubit elements, where $k$ refers to the number of recursions. The elements can be expressed explicitly as CNOT and SWAP elements through the block-diagonal decomposition explained in section IV.

A. Cosine-sine decomposition
One way to realize a general $SU(2^n)$ matrix on a quantum computer is via matrix factorization, where the initial matrix is separated into a product of matrices which can be more easily implemented as a quantum circuit. Such a factorization can be recursively achieved by using the Cosine-Sine Decomposition (CSD) [18]. In general, the CSD of an $SU(2^n)$ matrix $U$ factors it into a product of a block-diagonal unitary, a matrix of cosine and sine blocks, and a second block-diagonal unitary, equation (1). This decomposition can be applied recursively to the sub-matrices $U_{ij}$ of the block-diagonal factors until a $2 \times 2$ block-diagonal form is obtained.
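For orientation, the standard textbook form of the CSD is sketched below. The block labels $U_{ij}$ follow the text, while the angle symbols $\theta_k$, the placement of the minus sign, and the index convention are assumptions made for this illustration, since conventions differ between references.

```latex
U =
\begin{pmatrix} U_{11} & 0 \\ 0 & U_{12} \end{pmatrix}
\begin{pmatrix} C & -S \\ S & C \end{pmatrix}
\begin{pmatrix} U_{21} & 0 \\ 0 & U_{22} \end{pmatrix},
\qquad
C = \mathrm{diag}(\cos\theta_1, \dots, \cos\theta_{2^{n-1}}), \quad
S = \mathrm{diag}(\sin\theta_1, \dots, \sin\theta_{2^{n-1}})
```

Each block has size $2^{n-1} \times 2^{n-1}$, so the four $U_{ij}$ are unitaries on $n-1$ qubits and the middle factor carries the $2^{n-1}$ rotation angles.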
In [18], it is shown that the matrices resulting from the above decomposition can be realized as a product of uniformly controlled rotations. After canceling some of the occurring CNOT gates using reflection symmetries of the circuit, and using a method for implementing uniformly controlled gates described in the paper, the authors show that a general CSD of an $SU(2^n)$ matrix, as shown in equation (1), can be implemented using $4^n - 2^{n+1}$ CNOT gates and $4^n$ one-qubit gates.

B. Optimized quantum Shannon decomposition
Another way to decompose a generic unitary matrix is by generalizing the concepts of Boolean algebra and logic conditionals to quantum circuits. By interpreting the qubits as the predicates and requiring the action of clauses to be unitary, operations in a quantum circuit can then be interpreted as quantum conditionals. In [17], the authors introduce quantum multiplexors as quantum circuit blocks implementing quantum conditionals; for example, the CNOT gate is the simplest two-qubit multiplexor. To perform the decomposition of a unitary matrix, the authors provide a generalization to quantum circuits of the classical Shannon decomposition theorem, which allows any Boolean function $F$ to be factorized as
$$F = x \cdot F_{x=1} + \bar{x} \cdot F_{x=0},$$
where $x$ is a variable and $\bar{x}$ its complement. The proposed Quantum Shannon Decomposition (QSD) theorem states that an arbitrary $n$-qubit operator can be implemented by a circuit containing three multiplexed rotations and four generic $(n-1)$-qubit operators. This provides a method to recursively decompose a generic $SU(2^n)$ operator. By applying this theorem to the previously discussed CSD, see section II A, and by providing a method to implement multiplexed $R_y$ rotations using controlled-Z gates, the authors showed that the number of CNOT gates required to decompose an $SU(2^n)$ matrix can be reduced to $\frac{23}{48}4^n - \frac{3}{2}2^n + \frac{4}{3}$, a significant improvement over the CSD.
Both approaches use post-creation optimization of the circuit to improve their count of operations. We compare the achieved CNOT counts with our own in table I.
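As a small numerical illustration of the two counts quoted in this section, the sketch below evaluates both closed forms for a few qubit numbers. The function names `cnot_csd` and `cnot_qsd` are hypothetical helpers introduced here; only the formulas themselves come from the text.

```python
from fractions import Fraction


def cnot_csd(n: int) -> int:
    """CNOT count of the cosine-sine decomposition: 4^n - 2^(n+1)."""
    return 4**n - 2**(n + 1)


def cnot_qsd(n: int) -> int:
    """CNOT count of the optimized QSD for n >= 2: (23/48) 4^n - (3/2) 2^n + 4/3."""
    count = Fraction(23, 48) * 4**n - Fraction(3, 2) * 2**n + Fraction(4, 3)
    # Exact rational arithmetic: the result must be a whole number of gates.
    assert count.denominator == 1
    return int(count)


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, cnot_csd(n), cnot_qsd(n))
```

Exact fractions are used instead of floating point so that the integrality of the QSD count can be checked rather than rounded away.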

III. CARTAN DECOMPOSITION
The Cartan decomposition is a powerful tool for decomposing Lie groups. It allows us to break down a given Lie group into smaller, simpler subgroups, which can be much easier to work with. This method has found numerous applications in a variety of fields, including physics, engineering, and computer science. Building upon the work of Khaneja and Glaser, we extend their method, which uses the Cartan decomposition of a Lie group, to an arbitrary system size.

A. Khaneja-Glaser decomposition
The underlying mathematics here relies on the work of Élie Cartan [19,20] in French and is by now part of the standard knowledge of physics and mathematics. For an English-language introduction to Lie groups and algebras, see e.g. [21]. Throughout this paper, let $G$ be a compact semi-simple Lie group with identity $e$ and let $\mathfrak{g}$ denote its Lie algebra. Moreover, let $K$ denote a compact closed subgroup of $G$. Note that, given that $\mathfrak{g}$ is a semi-simple algebra, there exists, due to Cartan's criterion, a non-degenerate Killing form inducing a bi-invariant metric $\langle \cdot, \cdot \rangle_G$ on $G$, which allows the sum decomposition of $\mathfrak{g}$ into orthogonal subspaces.
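Notation. Throughout this paper, capital letters identify groups. Capital letters with subscripts identify elements of the group. The algebras are denoted by lowercase fraktur letters, elements thereof by lowercase letters. Pauli matrices are referenced by their standard $\sigma_i$. We also make use of the following notation for generalized Pauli matrices,
$$x_k = \mathbb{1} \otimes \cdots \otimes \sigma_x \otimes \cdots \otimes \mathbb{1},$$
where the Pauli matrix $\sigma_x$ acts on the $k$th qubit. Matrices for the rotations around $\sigma_y$ and $\sigma_z$ are constructed similarly and denoted $y_k$ and $z_k$ respectively.

Definition III.1 (Cartan decomposition of $\mathfrak{g}$). Let $\mathfrak{g}$ and $\mathfrak{l}$ be the two real semi-simple Lie algebras of $G$ and $K$ respectively. Then, $(\mathfrak{g}, \mathfrak{l})$ is called an orthogonal symmetric Lie algebra pair if the decomposition $\mathfrak{g} = \mathfrak{m} \oplus \mathfrak{l}$, where $\mathfrak{m} = \mathfrak{l}^\perp$, satisfies the following commutation relations:
$$[\mathfrak{l}, \mathfrak{l}] \subset \mathfrak{l}, \qquad [\mathfrak{m}, \mathfrak{l}] \subset \mathfrak{m}, \qquad [\mathfrak{m}, \mathfrak{m}] \subset \mathfrak{l}.$$
The direct sum decomposition $\mathfrak{g} = \mathfrak{m} \oplus \mathfrak{l}$ is then called a Cartan decomposition of the Lie algebra $\mathfrak{g}$.

Definition III.2 (Cartan subalgebra). Let $(\mathfrak{g}, \mathfrak{l})$ be an orthogonal symmetric Lie algebra pair of the groups $G$ and $K$. A maximal Abelian subalgebra $\mathfrak{h}$ contained in $\mathfrak{m}$ is called a Cartan subalgebra of $(\mathfrak{g}, \mathfrak{l})$.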
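The notation can be made concrete with a minimal sketch that builds the generalized Pauli matrices by Kronecker products and checks which of them commute. It assumes that $x_k$ carries no extra normalization factor and uses plain nested lists; the helper names `kron`, `matmul`, `pauli_on` and `commutes` are introduced here for illustration only.

```python
# Pauli matrices and the 2x2 identity as nested lists.
ID = [[1, 0], [0, 1]]
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]


def kron(a, b):
    """Kronecker (tensor) product of two matrices."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def matmul(a, b):
    """Plain matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def pauli_on(sigma, k, n):
    """sigma on qubit k (1-indexed) of an n-qubit register, identity elsewhere."""
    out = [[1]]
    for qubit in range(1, n + 1):
        out = kron(out, sigma if qubit == k else ID)
    return out


def commutes(a, b):
    """True if the commutator [a, b] vanishes entry by entry."""
    ab, ba = matmul(a, b), matmul(b, a)
    return all(p == q for ra, rb in zip(ab, ba) for p, q in zip(ra, rb))


n = 2
x1x2 = matmul(pauli_on(SX, 1, n), pauli_on(SX, 2, n))
y1y2 = matmul(pauli_on(SY, 1, n), pauli_on(SY, 2, n))
z1z2 = matmul(pauli_on(SZ, 1, n), pauli_on(SZ, 2, n))

if __name__ == "__main__":
    print(x1x2 == kron(SX, SX))
    print(commutes(x1x2, y1y2), commutes(y1y2, z1z2), commutes(x1x2, z1z2))
    print(commutes(pauli_on(SX, 1, n), pauli_on(SZ, 1, n)))
```

The three two-qubit products commute pairwise, which is the Abelian property required of a Cartan subalgebra, whereas $x_1$ and $z_1$ on the same qubit do not commute.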
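In [13] it is shown that the Lie algebra $\mathfrak{su}(2^n)$ admits a Cartan decomposition of this kind, with the subalgebra denoted $\mathfrak{su}_{l}(2^n)$ in the following.

Theorem III.1 (Cartan decomposition of $G$). Let $\mathfrak{g}$ be a semi-simple Lie algebra of the group $G$ and let $\mathfrak{g} = \mathfrak{m} \oplus \mathfrak{l}$ be its Cartan decomposition. Moreover, let $\mathfrak{h}$ be a Cartan subalgebra of $(\mathfrak{g}, \mathfrak{l})$ and let $K$ be a compact closed subgroup of $G$. Then,
$$G = K \exp(\mathfrak{h}) K,$$
where $\exp(\mathfrak{h}) \subset G$. This decomposition is then called the Cartan decomposition of the Lie group $G$.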
The decomposition of the Lie group $G$ into two groups $K$ linked by a determinable element of the algebra is the heart of our algorithm. In terms of the actual elements of the group, we obtain a structure as below.
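Corollary III.1. Let $U \in SU(2^n)$ be an $n$-qubit unitary operator. Then it has a decomposition
$$U = K_1 M K_2,$$
where $K_i \in \exp(\mathfrak{su}_{l}(2^n))$ and $M = \exp(y)$ for some $y \in \mathfrak{h}$, where $\mathfrak{h}$ is a Cartan subalgebra of $(\mathfrak{su}(2^n), \mathfrak{su}_{l}(2^n))$.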
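In [13] it is proven that $\mathfrak{su}_{l}(2^n)$ again admits a Cartan decomposition, so that the unitaries $K_i$ can themselves be decomposed. This provides a recursive algorithm for determining a unitary $U \in SU(2^n)$ by successive decompositions.
Theorem III.2. The direct sum decomposition of $\mathfrak{su}_{l}(2^n)$ into $\mathfrak{su}_{l_0}(2^n)$ and its orthogonal complement is a Cartan decomposition of $\mathfrak{su}_{l}(2^n)$. The proof of this theorem can also be found in [13].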
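Corollary III.2. Let $V \in \exp(\mathfrak{su}_{l}(2^n))$ be an $n$-qubit operator. Then it has a unique decomposition
$$V = K_1 N K_2,$$
where $K_i \in SU(2^{n-1}) \otimes U(1)$ and $N = \exp(z)$ for some $z \in \mathfrak{f}$, where $\mathfrak{f}$ is a Cartan subalgebra of $(\mathfrak{su}_{l}(2^n), \mathfrak{su}_{l_0}(2^n))$.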
Corollary III.3. Let $U \in SU(2^n)$ be an $n$-qubit unitary operator. Then it has a decomposition
$$U = K_1 \exp(z_1) K_2 \exp(y) K_3 \exp(z_2) K_4, \qquad (6)$$
where $K_i \in SU(2^{n-1}) \otimes U(1)$, $y \in \mathfrak{h}$ and $z_1, z_2 \in \mathfrak{f}$.

In order to define a Cartan subalgebra in the product operator basis for the pairs $(\mathfrak{su}(2^n), \mathfrak{su}_{l}(2^n))$ and $(\mathfrak{su}_{l}(2^n), \mathfrak{su}_{l_0}(2^n))$ we proceed analogously to [13]. The elements of the Cartan subalgebras are generated recursively from those of the next-lower dimension by tensoring with $\sigma_x$ and $\sigma_z$, equation (7). This decomposition structure allows us to express any $n$-qubit unitary in terms of $(n-1)$-qubit unitaries and elements of orthogonal algebras. The circuit structure is visualized in figure 1. Recursively it follows that each of the $A_i$ shown in the figure can itself be decomposed in the same way. This decomposition method of an $n$-qubit unitary works all the way down to $SU(4)$, the space of two-qubit operations, which can be further decomposed as $K_1 \exp(y) K_2$, where $K_i \in SU(2) \otimes SU(2)$ and $y$ is a linear combination of $x_1 x_2$, $y_1 y_2$ and $z_1 z_2$.

It is important to note that $x_1 x_2$ and so on are elements of the algebra, not elements of the group such as $X_1 \otimes X_2$. The corresponding group element $\exp(x_1 x_2)$ is not the direct product of two $X$ rotations but rather a two-qubit operation. Moreover, note that, for $\alpha \in \mathfrak{s}(n-1)$, $\alpha \otimes \sigma_x$ and $\alpha\sigma_{nx}$ represent the same element, where $\sigma_{nx} \equiv x_n$.
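The distinction between the algebra element $x_1 x_2$ and a product of single-qubit rotations can be checked numerically. The sketch below assumes the unitary convention $\exp(i\theta\, \sigma_x \otimes \sigma_x)$ for the group element and uses the closed form that follows from $(\sigma_x \otimes \sigma_x)^2 = \mathbb{1}$; the function names are hypothetical and introduced only for this illustration.

```python
import math


def exp_i_theta_xx(theta):
    """exp(i*theta*sx(x)sx) = cos(theta)*1 + i*sin(theta)*sx(x)sx, basis |00>,|01>,|10>,|11>."""
    c, s = math.cos(theta), 1j * math.sin(theta)
    # sx (x) sx is the anti-diagonal permutation |00> <-> |11>, |01> <-> |10>.
    return [[c, 0, 0, s],
            [0, c, s, 0],
            [0, s, c, 0],
            [s, 0, 0, c]]


def local_x_rotations(theta):
    """exp(i*theta*sx) (x) exp(i*theta*sx): a direct product of two one-qubit rotations."""
    c, s = math.cos(theta), 1j * math.sin(theta)
    r = [[c, s], [s, c]]
    return [[x * y for x in row_a for y in row_b] for row_a in r for row_b in r]


def apply(matrix, state):
    """Matrix-vector product."""
    return [sum(m * a for m, a in zip(row, state)) for row in matrix]


def is_product_state(state, tol=1e-12):
    """A two-qubit pure state is a product state iff a00*a11 - a01*a10 = 0."""
    a00, a01, a10, a11 = state
    return abs(a00 * a11 - a01 * a10) < tol


theta = math.pi / 4
ket00 = [1, 0, 0, 0]
entangled = apply(exp_i_theta_xx(theta), ket00)
separable = apply(local_x_rotations(theta), ket00)

if __name__ == "__main__":
    print(is_product_state(entangled), is_product_state(separable))
```

Acting on $|00\rangle$, the two-qubit exponential produces a superposition of $|00\rangle$ and $|11\rangle$ that cannot be written as a product state, while the pair of local rotations always yields a product state.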

IV. BLOCK DIAGONAL DECOMPOSITION
By employing a recursive method, the developed algorithm extends the unstructured circuit decomposition of a three-qubit unitary, as demonstrated in [12], to an $n$-qubit unitary. This algorithm determines a decomposition for the generators $y \in \mathfrak{h}(n)$ and $z \in \mathfrak{f}(n)$ of the Cartan subalgebras of the pairs $(\mathfrak{su}(2^n), \mathfrak{su}_{l}(2^n))$ and $(\mathfrak{su}_{l}(2^n), \mathfrak{su}_{l_0}(2^n))$ using a block-diagonal matrix.
It is important to note that within these Lie subalgebras, there are always two generators constructed through the recursive equations in (7) that are proportional to each other, since $x_1 x_2$ is proportional to $y_1 y_2$, and $z_1 z_2$ is proportional to the identity. Consequently, in order to decompose the exponentials $M = e^{y}$ and $N = e^{z}$, where $y \in \mathfrak{h}(n)$ and $z \in \mathfrak{f}(n)$ are elements of the Cartan subalgebras of $(\mathfrak{su}(2^n), \mathfrak{su}_{l}(2^n))$ and $(\mathfrak{su}_{l}(2^n), \mathfrak{su}_{l_0}(2^n))$ respectively, we group the proportional generators together and separate the exponential terms into $2^{n-2}$ different components $M_i(a_i, b_i)$ and $N_i(a_i, b_i)$. An efficient way of mapping the generators to circuit elements is by means of a particular block-diagonal form $P_\mp(a, b)$. The block-diagonal structure can be implemented on a quantum circuit in a straightforward way, visualized in figure 2. The two parameters are implemented as rotation gates on the $n$th wire and controlled via CNOTs from the first.
The method we employ to decompose the exponential terms $M_i(a_i, b_i)$ and $N_i(a_i, b_i)$ is through a block-diagonal matrix $P_\mp(a, b)$. The method, based on the recursive algorithm (7), starts by grouping the generators of $\mathfrak{a}(2)$ into their proportional terms, $\{x_1 x_2, y_1 y_2\}$ and $\{z_1 z_2, \mathbb{1}\}$. The circuits corresponding to these algebra elements are shown in figure 4, which decomposes the block-diagonal matrix $P_\mp$. Expansion to larger elements and therefore higher-dimensional structures can be done recursively by adding more terms in the algebra. The central $P_\mp$ element expands in dimension alongside.
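Why the pair $\{x_1 x_2, y_1 y_2\}$ leads to a two-parameter block structure can be seen in a short numerical sketch. It assumes the convention $\exp(i(a\,x_1x_2 + b\,y_1y_2))$ with unnormalized Pauli matrices and evaluates the exponential by a Taylor series; how the resulting blocks map entry by entry onto the paper's $P_\mp(a, b)$ is not reproduced here, and all function names are hypothetical.

```python
import math


def matmul(a, b):
    """Plain matrix product on nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def expm_i(h, terms=40):
    """exp(i*h) via its Taylor series; adequate for the small matrices used here."""
    size = len(h)
    result = [[complex(i == j) for j in range(size)] for i in range(size)]
    term = [row[:] for row in result]
    ih = [[1j * v for v in row] for row in h]
    for k in range(1, terms):
        # term_k = term_{k-1} * (i h) / k
        term = [[v / k for v in row] for row in matmul(term, ih)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result


def generator(a, b):
    """a*sx(x)sx + b*sy(x)sy in the basis |00>, |01>, |10>, |11>."""
    return [[0, 0, 0, a - b],
            [0, 0, a + b, 0],
            [0, a + b, 0, 0],
            [a - b, 0, 0, 0]]


def expected(a, b):
    """Rotation by (a - b) on span{|00>,|11>} and by (a + b) on span{|01>,|10>}."""
    cm, sm = math.cos(a - b), 1j * math.sin(a - b)
    cp, sp = math.cos(a + b), 1j * math.sin(a + b)
    return [[cm, 0, 0, sm],
            [0, cp, sp, 0],
            [0, sp, cp, 0],
            [sm, 0, 0, cm]]


def max_deviation(a, b):
    """Largest entrywise difference between the series and the closed form."""
    u, v = expm_i(generator(a, b)), expected(a, b)
    return max(abs(p - q) for ru, rv in zip(u, v) for p, q in zip(ru, rv))


if __name__ == "__main__":
    print(max_deviation(0.3, 0.5))
```

After reordering the basis to $|00\rangle, |11\rangle, |01\rangle, |10\rangle$ the exponential is block diagonal, with one $2 \times 2$ rotation depending on $a - b$ and the other on $a + b$.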
We can differentiate expansion into higher dimensions along $\sigma_x$ and $\sigma_z$. We find that there is a direct correspondence between enlarging the algebra and the circuit construction. Adding a $\sigma_x$ to the algebra corresponds to adding a CNOT gate from the $n$th to the $(n-1)$th qubit. For $\sigma_z$, the corresponding gate is a fermionic SWAP gate, see figure 5, between the same wires. This gives a circuit construction as shown in figure 6 for $SU(16)$. Higher dimensions work in a similar fashion and exhibit a branching structure depending on which algebra dimensions are added. This is shown in more detail in figures 3 and 7. The structure is also relevant for the CNOT count, where each fermionic SWAP will count in the end as one CNOT gate, see figure 5.

FIG. 3. Recursive algorithm showing the decomposition of the exponential operator $M = e^{y}$, $y \in \mathfrak{h}(4)$, where $\mathfrak{h}(4)$ is a Cartan subalgebra of $(\mathfrak{su}(16), \mathfrak{su}_{l}(16))$. Each of the exponential operators $M_i(a_i, b_i)$ is decomposed through a circuit including a block-diagonal matrix $P_\mp(a_i, b_i)$, displayed as a block in the center of each diagram, which can always be synthesized by two CNOT gates and two one-qubit gates, see figure 2. For each $\otimes\sigma_x$ two CNOT gates are added, where the control qubit is always the $n$th qubit and the target qubit is given by the $i$th dimension of the subalgebra $\mathfrak{h}(i)$ it gets generated from. The control qubits of the rest of the CNOT gates enlarge up to the respective $n$th dimension, with the exception of the outermost CNOT gates, which remain unaltered since they serve as a final permutation.
FIG. 5. Quantum circuit decomposing a fermionic SWAP gate. A fermionic SWAP can be decomposed at worst through four CNOT gates.

In addition to these structures, on every dimension $n \geq 3$, there is one generator $z_1 z_2 z_n$ of the Cartan subalgebra $\mathfrak{f}(n)$ which is more efficient to treat separately. We found that such an exponential term depending on one parameter can always be decomposed, regardless of the dimension, with a single rotation gate surrounded by four dimensionally adapted CNOT gates, see figure 8. These constructions are sufficient to implement all possible algebra generators since they cover the whole subalgebra given in definition III.2. Hence all possible unitaries $U \in SU(2^n)$ can be covered by the construction.
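One standard way to realize such a diagonal three-body term with four CNOTs and one rotation is a parity ladder, sketched below for $n = 3$. The gate ordering and the choice of qubit 3 as the common target are assumptions of this sketch and need not match figure 8; the rotation convention $R_z(\theta) = \mathrm{diag}(e^{-i\theta/2}, e^{i\theta/2})$ is also assumed. Because every gate involved is diagonal or a permutation, it suffices to track computational basis states.

```python
import cmath
from itertools import product


def zzz_phase(bits, a):
    """Diagonal entry of exp(i*a*z1*z2*z3) on the basis state |b1 b2 b3>."""
    parity = sum(bits) % 2
    return cmath.exp(1j * a * (1 - 2 * parity))


def circuit_phase(bits, a):
    """Phase picked up under CNOT(1->3), CNOT(2->3), Rz(-2a) on qubit 3, CNOT(2->3), CNOT(1->3)."""
    b = list(bits)
    b[2] ^= b[0]  # CNOT with control qubit 1, target qubit 3
    b[2] ^= b[1]  # CNOT with control qubit 2, target qubit 3
    # Rz(-2a) = diag(exp(i*a), exp(-i*a)) acts on the parity now stored in qubit 3.
    phase = cmath.exp(1j * a * (1 - 2 * b[2]))
    b[2] ^= b[1]  # undo the second CNOT
    b[2] ^= b[0]  # undo the first CNOT
    assert tuple(b) == tuple(bits)  # the basis state itself is restored
    return phase


def max_deviation(a):
    """Largest difference between circuit and target over all eight basis states."""
    return max(abs(circuit_phase(bits, a) - zzz_phase(bits, a))
               for bits in product((0, 1), repeat=3))


if __name__ == "__main__":
    print(max_deviation(0.37))
```

The CNOTs write the joint parity of the three qubits into the target, the rotation converts that parity into the phase $e^{\pm ia}$, and the same CNOTs uncompute the parity again.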

V. EXAMPLE
We now apply the decomposition method described above to an operator $U \in SU(8)$. Using the Cartan decomposition (6), the following decomposition is obtained:
$$U = K_1 \exp(z_1) K_2 \exp(y) K_3 \exp(z_2) K_4,$$
where $K_i \in SU(4) \otimes U(1)$, $y \in \mathfrak{h}(3)$ and $z_1, z_2 \in \mathfrak{f}(3)$, where $\mathfrak{h}(3)$ is a Cartan subalgebra of $(\mathfrak{su}(8), \mathfrak{su}_{l}(8))$ and $\mathfrak{f}(3)$ is a Cartan subalgebra of $(\mathfrak{su}_{l}(8), \mathfrak{su}_{l_0}(8))$. The elements of the Lie subalgebras are generated by (7) and thus given by $\{x_1x_2x_3, y_1y_2x_3\}$ and $\{z_1z_2x_3, x_3\}$ for $\mathfrak{h}(3)$, and by $\{x_1x_2z_3, y_1y_2z_3\}$ and $\{z_1z_2z_3\}$ for $\mathfrak{f}(3)$, where we have already grouped the proportional terms.

FIG. 6. Quantum circuits decomposing the exponential terms $M_1(a_1, b_1) = e^{i(a_1 x_1x_2x_3x_4 + b_1 y_1y_2x_3x_4)}$ (top) and $N_2(a_2, b_2) = e^{i(a_2 x_1x_2x_3z_4 + b_2 y_1y_2x_3z_4)}$ (bottom). The action of $\otimes\sigma_x$ introduces two additional CNOT gates and the action of $\otimes\sigma_z$ adds two additional SWAP gates, whose target qubits are given by the subalgebra $\mathfrak{h}(3)$. The rest of the control qubits in $\mathfrak{h}(4)$ enlarge up to the 4th dimension with the exception of the outermost CNOT gates, which serve only as a final diagonal permutation.
By means of the recursive algorithm introduced previously, we decompose the exponential terms of the generators of this subalgebra through a quantum circuit including a block-diagonal form $P_\mp(a, b)$. For instance, the unitary $M_1(a_1, b_1)$ defined by
$$M_1(a_1, b_1) = e^{i(a_1 x_1x_2x_3 + b_1 y_1y_2x_3)},$$
which denotes the exponential of the generators $\{x_1x_2x_3, y_1y_2x_3\}$, is decomposed through the quantum circuit shown in figure 9, where the white box denotes the block-diagonal operator $P_\mp(a_1, b_1)$.
The rest of the generators can be decomposed by following the same algorithm introduced in the previous section, see figures 9 and 10. The exponential term involving the generator $\{z_1z_2z_3\}$ can always be generated by a quantum circuit analogous to the one shown in figure 8, involving four CNOT gates and one one-qubit gate. The entire construction can be seen in figure 1.
By just counting we can see that there are six CNOT gates and two block-diagonal $P_\mp$ matrices decomposing the exponential terms $M_1(a_1, b_1) = e^{i(a_1 x_1x_2x_3 + b_1 y_1y_2x_3)}$ and $M_2(a_2, b_2) = e^{i(a_2 z_1z_2x_3 + b_2 x_3)}$. To decompose the exponential terms $N_1(c_1, d_1) = e^{i(c_1 x_1x_2z_3 + d_1 y_1y_2z_3)}$ and $N_2(c_2) = e^{i c_2 z_1z_2z_3}$ there are six CNOT gates, two fermionic SWAP gates, and one block-diagonal $P_\mp$ matrix. Therefore, to decompose a unitary $U$ in $SU(8)$ we need a total of 54 CNOT gates, where we have assumed that every two-qubit circuit can be decomposed by at most three CNOT gates and every fermionic SWAP gate by at most four CNOT gates, see figure 5. It is possible to get rid of the SWAP gates by interchanging the roles of the second and the third qubit and thus reduce the number of CNOT gates to 42.
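The two totals can be reproduced with a short tally. The sketch below assumes one particular reading of the counts: two CNOTs per $P_\mp$ block, three CNOTs for each of the four two-qubit unitaries $K_i$, the $N$-type exponential appearing twice in the decomposition, and a fermionic SWAP costing either four CNOTs (figure 5) or effectively one CNOT once the internal SWAPs are removed. The function and constant names are hypothetical.

```python
CNOT_PER_P_BLOCK = 2    # block-diagonal P matrix: two CNOTs (figure 2)
CNOT_PER_TWO_QUBIT = 3  # any two-qubit unitary: at most three CNOTs


def su8_cnot_count(cnots_per_fswap: int) -> int:
    """Total CNOT count for SU(8) under the assumptions stated in the lead-in."""
    m_terms = 6 + 2 * CNOT_PER_P_BLOCK                        # M_1 and M_2 together
    n_terms = 6 + 2 * cnots_per_fswap + 1 * CNOT_PER_P_BLOCK  # N_1 and N_2 together
    k_terms = 4 * CNOT_PER_TWO_QUBIT                          # the four K_i
    # exp(y) appears once and exp(z_i) twice in the seven-factor decomposition.
    return m_terms + 2 * n_terms + k_terms


if __name__ == "__main__":
    print(su8_cnot_count(4), su8_cnot_count(1))
```

With four CNOTs per fermionic SWAP the tally gives 54, and with one effective CNOT per fermionic SWAP it gives 42, matching the two figures quoted for $SU(8)$.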
Although this is slightly worse than the previous unstructured Cartan decomposition method [12] of a three-qubit unitary, which required a total of 40 CNOT gates, it is possible to further improve the cost to 38 CNOT gates by absorbing some of the CNOT gates into the neighboring $A_i \in SU(4)$ unitaries, which are represented in figure 1.

VI. NUMBER OF CNOT GATES
In order to determine the number of CNOT gates required, let $C_n$ denote the number of CNOT gates coming directly from the diagram decomposition for $SU(2^n)$, where $C_{\mathfrak{h}(n)}$ specifies the number of CNOT gates required to synthesize the exponential operator $M = e^{y}$, and $C_{\mathfrak{f}(n)}$ specifies the number of CNOT gates required to synthesize the exponential operator $N_i = e^{z_i}$. From corollary III.3 it then follows that $C_n = C_{\mathfrak{h}(n)} + 2C_{\mathfrak{f}(n)}$. Since the number of CNOT gates in the recursive algorithm of the Cartan subalgebra $\mathfrak{h}(n)$ follows the same structure as Pascal's triangle, $C_{\mathfrak{h}(n)}$ is given by a binomial sum, in which $P_\mp$ denotes the block-diagonal matrices, which always consist of two CNOT gates and two one-qubit gates, see figure 2. To count the number of CNOT gates for $C_{\mathfrak{f}(n)}$, note that the fSWAP gate consists of a SWAP gate followed by two Hadamard gates and one CNOT. Note that all occurring fSWAPs are adjacent to the block-diagonal matrix $P_\mp$ and therefore we can get rid of the internal SWAPs by manually adjusting the one-qubit gates and the target qubits of the block-diagonal matrix. Thus, in terms of CNOTs, adding a pair of fSWAPs effectively introduces two CNOTs. Moreover, the number of CNOT gates in the recursive algorithm of $\mathfrak{f}(n)$ also follows the structure of Pascal's triangle.

FIG. 7. Recursive algorithm displaying how a part of the Cartan subalgebra $\mathfrak{f}(5)$ gets generated. For each $\otimes\sigma_x$ two CNOT gates are added, while for each $\otimes\sigma_z$ two SWAP gates are added. The control qubit of the additional gates is always the $n$th dimension, while the target qubit is given by the $i$th dimension of the subalgebra $\mathfrak{h}(i)$ it is generated from, with the exception of the outermost, always unaltered CNOT gates, which serve only as a final diagonal permutation. Moreover, for each $\otimes\sigma_x$ the control qubit of the rest of the CNOT gates enlarges up to the respective $n$th dimension.

FIG. 8. Quantum circuit decomposing the exponential term $N(a) = e^{i a z_1 z_2 z_5} \equiv \mathrm{Diag}(z_1z_2z_5) \in SU(32)$. The one-qubit gate in the circuit is the rotation $R_z(-2a)$. This quantum circuit appears for all $n \geq 3$ dimensions, where the target qubits and the one-qubit gate have to be adjusted according to the dimension of the quantum circuit.
Here $\mathrm{Diag}(z_1 z_2 z_n)$ denotes the contribution of the generator $z_1 z_2 z_n$, which is not proportional to any other generator and which always consists, regardless of the dimension, of four CNOT gates and one one-qubit gate, see figure 8.
Hence, the number of CNOT gates $C_n$ required to synthesize the exponential operators is obtained by inserting both counts into $C_n = C_{\mathfrak{h}(n)} + 2C_{\mathfrak{f}(n)}$, where we used the fact that every block-diagonal matrix $P_\mp$ can be decomposed by two CNOT gates, see figure 2.
FIG. 9. Recursive algorithm showing how to decompose the unitaries $M_1(a_1, b_1) = e^{i(a_1 x_1x_2x_3 + b_1 y_1y_2x_3)}$ and $M_2(a_2, b_2) = e^{i(a_2 z_1z_2x_3 + b_2 x_3)}$ generated from the respective subalgebra elements $\{x_1x_2, y_1y_2\}$ and $\{z_1z_2, \mathbb{1}\}$ of $\mathfrak{a}(2)$ by tensoring with $\sigma_x$. For every tensor product with a $\sigma_x$ matrix, there is an additional CNOT gate on each side whose target qubit is given by $\mathfrak{h}(2)$, with the exception of the outer CNOT gates since they serve only as a final diagonal permutation.

FIG. 10. Recursive algorithm showing how the exponential elements coming from the terms in the subalgebra $\mathfrak{h}(2)$ by tensoring with $\sigma_z$ are generated by adding a fermionic SWAP gate on each side. However, the exponential term which comes from $\{z_1z_2z_3\}$ and depends only on one parameter has to be treated separately, see figure 8.
This binomial sum can be determined by means of standard binomial identities, which yields a closed form for $C_n$. To determine the entire number of CNOT gates for a unitary $U \in SU(2^n)$ we also need to take into consideration the CNOT gates that recursively come from lower dimensions, see equation (6) and its corresponding figure 1. To that end, let $T_n$ denote the total number of CNOT gates for a unitary $U \in SU(2^n)$. By corollary III.3, which describes the decomposition of a unitary, $T_n$ is obtained by adding to $C_n$ the cost of the four lower-dimensional unitaries $K_i$ at every level of the recursion. The resulting expression was determined by using equation (18), and we have explicitly separated $C_2$ since our $n = 2$ base case does not work recursively. In [22], [23], and [24] it was proven that any two-qubit quantum circuit can be synthesized with at most three CNOT gates. Therefore, the total number of CNOT gates required to decompose a unitary $U$ in $SU(2^n)$ by means of the Khaneja-Glaser decomposition algorithm is
$$T_n = \frac{21}{16}4^n - 3\left(n2^{n-2} + 2^n\right),$$
which is roughly a factor of five away from the best-known theoretical lower bound for synthesizing an $n$-qubit unitary [24]. As our method is recursive and creates unitaries of any $(n-k)$-qubit size, more optimal unitary decompositions for a particular number of qubits can be taken into account and inserted at that size.
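The recursion behind the total count can be checked against a closed form in a few lines. The sketch reads the closed form as $T_n = \frac{21}{16}4^n - 3(n2^{n-2} + 2^n)$, which reproduces 3 CNOTs for $n = 2$ and 42 for $n = 3$, and it assumes a per-level cost $C_n = 3 \cdot 2^{n-2}(n+2)$; this expression is not quoted from the text but is the value implied by that closed form together with $T_n = C_n + 4T_{n-1}$. The lower bound $\lceil \frac{1}{4}(4^n - 3n - 1) \rceil$ is the one listed in table I, and all function names are hypothetical.

```python
def direct_cost(n: int) -> int:
    """Assumed per-level CNOT cost C_n = 3 * 2^(n-2) * (n + 2), for n >= 3."""
    return 3 * 2**(n - 2) * (n + 2)


def total_cost(n: int) -> int:
    """T_n from the recursion T_n = C_n + 4*T_(n-1) with base case T_2 = 3."""
    if n == 2:
        return 3
    return direct_cost(n) + 4 * total_cost(n - 1)


def closed_form(n: int) -> int:
    """(21/16) 4^n - 3 (n 2^(n-2) + 2^n), written with integers for n >= 2."""
    return 21 * 4**(n - 2) - 3 * (n * 2**(n - 2) + 2**n)


def lower_bound(n: int) -> int:
    """ceil((4^n - 3n - 1) / 4) using integer ceiling division."""
    return -((4**n - 3 * n - 1) // -4)


if __name__ == "__main__":
    for n in range(2, 11):
        ratio = closed_form(n) / lower_bound(n)
        print(n, total_cost(n), closed_form(n), lower_bound(n), round(ratio, 3))
```

The ratio to the lower bound grows with $n$ towards $\frac{21}{16} / \frac{1}{4} = 5.25$, which is the sense in which the count is roughly a factor of five above the bound.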

VII. DISCUSSION
We have implemented the algorithm in rudimentary form and provide it in the supplementary material [25]. The algorithm can certainly be optimized further. We leave this as implementation work for colleagues more familiar with suitable programming environments.
The presented decomposition algorithm provides a solid basis for decomposing any arbitrary unitary in $SU(2^n)$. For the construction, we assume an ideal quantum computer with any-to-any connections and no noise. These assumptions are common to circuit construction algorithms and can be remedied by post-creation optimization of the circuit. The first assumption can be addressed by exchanging CNOTs on non-existing connections with CNOT ladder chains that implement an equivalent operation. In the worst case of a linear chain, a CNOT connecting qubits $k$ apart requires $4(k-1)$ nearest-neighbour CNOTs [17]. It may be possible to optimize this through our approach, since there is a direct correspondence between the subalgebra generators and the CNOT gates between qubits. Restricting the subalgebra to exclude certain connections may provide a more optimal solution. We suggest this approach for future work. The robustness of a circuit with respect to noise is much more difficult to measure and achieve. The current methods for achieving fault tolerance, such as stabilizer codes [26] and logical qubits [27], are not easily implementable in our approach. However, it is possible to create a near-optimal circuit with the presented method and adjust it to be noise-tolerant afterwards.
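The $4(k-1)$ figure for a linear chain can be illustrated with one possible ladder construction, sketched below for $k \geq 2$. The specific gate sequence is an assumption of this sketch and is not necessarily the construction used in [17]; the function names are hypothetical. Since CNOT circuits permute computational basis states without phases, checking all basis states is sufficient.

```python
from itertools import product


def cnot_ladder(k):
    """Nearest-neighbour CNOTs (control, target) realizing CNOT(0 -> k) on a line, k >= 2."""
    up_full = [(j, j + 1) for j in range(k)]                 # accumulate prefix parities
    down_full = [(j - 1, j) for j in range(k - 1, 0, -1)]    # restore qubits 1..k-1
    up_part = [(j, j + 1) for j in range(1, k)]              # cancel x_1..x_(k-1) from qubit k
    down_part = [(j - 1, j) for j in range(k - 1, 1, -1)]    # restore qubits 2..k-1
    return up_full + down_full + up_part + down_part


def apply_cnots(gates, bits):
    """Apply a list of CNOTs to a computational basis state given as a bit tuple."""
    bits = list(bits)
    for control, target in gates:
        bits[target] ^= bits[control]
    return bits


def implements_long_range_cnot(k):
    """Check on every basis state that the ladder equals CNOT(0 -> k)."""
    gates = cnot_ladder(k)
    for bits in product((0, 1), repeat=k + 1):
        expected = list(bits)
        expected[k] ^= expected[0]
        if apply_cnots(gates, bits) != expected:
            return False
    return True


if __name__ == "__main__":
    for k in range(2, 7):
        print(k, len(cnot_ladder(k)), implements_long_range_cnot(k))
```

The ladder first spreads the parity of all qubits down the chain, then removes the contributions of the intermediate qubits, using $k + (k-1) + (k-1) + (k-2) = 4(k-1)$ nearest-neighbour CNOTs in total.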
Our approach is not limited to CNOT gates. While we use them throughout our construction, the circuit can be readily transformed to another gate set, as long as it forms a universal family of quantum gates [14]. Thus, if the particular hardware can only (or more efficiently) implement a different set of control gates, it is possible to translate the circuit into a different family of universal gates. That is, the methods described here generalize to different families of universal quantum gates, which might be more easily implemented on the particular quantum hardware.
What sets apart the Khaneja-Glaser Cartan decomposition of a unitary described here from the other decomposition methods is that it gives an explicit construction of the quantum circuit decomposing a unitary and thus can be directly implemented on a quantum computer. Moreover, this decomposition method can also be used to optimize existing computational circuits to improve their scaling.
The method presented in this paper demonstrates how to efficiently build quantum circuits implementing an $n$-qubit unitary operation through the Cartan decomposition of Lie algebras. Our work generalizes the previous unstructured Cartan decomposition of a three-qubit unitary to a structured recursive algorithm capable of synthesizing any desired unitary operation. Our construction allows the expansion of any quantum circuit in terms of rotation matrices and generators. Moreover, we show how these generators can be recursively decomposed through CNOT and fermionic SWAP gates into circuits that can be directly implemented on a quantum computer. This Cartan decomposition method also scales well, with a near-optimal scaling of $\frac{21}{16}4^n - 3(n2^{n-2} + 2^n)$ CNOT gates required to synthesize an $n$-qubit unitary operation. The algorithmic structure of the method and constructions described in this paper allows for a simple yet flexible implementation, both in terms of applications of the algorithm and software and hardware architectures.

FIG. 2. Quantum circuit decomposing the unitary block-form $P_\mp(a, b)$ up to phase. The block-diagonal matrices $P_\mp(a, b)$ can always be decomposed up to a global phase by a dimensionally adapted quantum circuit, where the one-qubit gates and the target qubits have to be adjusted accordingly.

FIG. 4. Quantum circuits generating the exponentials of the generators of the Lie subalgebra $\mathfrak{h}(2)$ by using a block-diagonal form (panels $\{x_1x_2, y_1y_2\}$ and $\{z_1z_2, \mathbb{1}\}$). Each of the central blocks contains an instance of the dimensionally adapted circuit shown in figure 2.
TABLE I. Comparison of different methods for the number of CNOT gates necessary for synthesizing an $n$-qubit unitary. The Khaneja-Glaser decomposition of a unitary is roughly a factor of five from the theoretical lower bound.

Method                          Number of CNOT gates
Khaneja-Glaser decomposition    $\frac{21}{16}4^n - 3(n2^{n-2} + 2^n)$
Theoretical lower bound [24]    $\lceil \frac{1}{4}(4^n - 3n - 1) \rceil$