Primitive Quantum Gates for an SU(3) Discrete Subgroup: Σ(36 × 3)

We construct the primitive gate set for the digital quantum simulation of the 108-element Σ(36 × 3) group. This is the first time a nonabelian crystal-like subgroup of SU(3) has been constructed for quantum simulation. The gauge link registers and necessary primitives (the inversion gate, the group multiplication gate, the trace gate, and the Σ(36 × 3) Fourier transform) are presented for both an eight-qubit encoding and a heterogeneous three-qutrit plus two-qubit register. For the latter, a specialized compiler was developed for decomposing arbitrary unitaries onto this architecture.


I. INTRODUCTION
Classical computers face significant challenges in simulating lattice gauge theories because the Hilbert space dimension grows exponentially with the lattice volume. Monte Carlo simulations in Euclidean time are generally used to circumvent this problem. However, this approach fails when we are interested in the real-time dynamics of the system or in the properties of matter at finite density, due to the sign problem [1][2][3][4][5][6][7][8].
Quantum computers provide a natural way of simulating lattice gauge theories. Yet, they are currently limited to small numbers of qubits and shallow circuit depths. Gauge theories contain bosonic degrees of freedom and have a continuous symmetry, e.g., quantum chromodynamics (QCD) with its local SU(3) symmetry. Storing a faithful matrix representation of SU(3) to double precision would require O(10^3) qubits per link, far beyond the reach of near-term quantum computers. Moreover, because these qubits are noisy, the circuit depths that can be reliably executed on these devices are significantly limited. Therefore, studying lattice gauge theories with current and near-future quantum computers requires efficient digitization methods for gauge fields as well as optimized computational subroutines. Finally, it is important to note that the choice of digitization method affects the computational cost.
Another approach is to formulate a finite-dimensional Hilbert space theory with continuous local gauge symmetry that is in the same universality class as the original theory. For example, the author of Ref. [50] constructed an SU(2) gauge theory in which each link Hilbert space is five-dimensional. A generalization to SU(N), however, was not obtained due to a spurious U(1) symmetry. Later, different finite-dimensional formulations were found for SU(2) gauge theories [51], the smallest of which is four-dimensional. Recently, a method inspired by non-commutative geometry was used to construct an SU(2) gauge theory with a 16-dimensional Hilbert space on each link, as well as a generalization to U(N) gauge theory [52]. Another finite-dimensional digitization, known as quantum link models, uses an ancillary dimension to store a quantum state [53]. This method can be extended to arbitrary SU(N), and has been further investigated in Refs. [53][54][55][56][57][58][59][60][61][62][63]. Although this approach may greatly simplify the cost of digitization, establishing the universality class is non-trivial [51,52,64].
Another promising approach to digitization is the discrete group approximation.

TABLE I. Parameters of crystal-like subgroups of SU(3). ∆S is the gap between 1 and its nearest neighbors. N is the number of group elements that neighbor 1.
The discrete group approximation has several significant advantages over many of the other methods discussed above. It is a finite mapping of group elements to integers that preserves the group structure; therefore it avoids any need for expensive fixed- or floating-point quantum arithmetic. The inherent discrete gauge structure further allows for coupling the gauge redundancy to quantum error correction [77,95]. Additionally, while other methods in principle need to increase both circuit depth and qubit count to improve the accuracy of the Hilbert space truncation, the discrete group approximation only needs to include additional terms in the Hamiltonian [86,96].
In this work, we consider Σ(36 × 3), the smallest crystal-like subgroup of SU(3) with a Z_3 center, which has 108 elements. These elements can be naturally encoded into a register consisting of either 8 qubits or 3 qutrits and 2 qubits. A number of smaller nonabelian subgroups of SU(2) have been considered previously: the 2N-element dihedral groups D_N [65,97,98,99], the 8-element Q_8 [14], the crystal-like 24-element BT [100], and the crystal-like 48-element BO [101]. From Fig. 1, we observe that freezeout occurs far before the scaling regime. This implies that the Kogut-Susskind Hamiltonian (which can be derived from the Wilson action) is insufficient for Σ(36 × 3) to approximate SU(3), but classical calculations suggest that modified or improved Hamiltonians H_I may prove sufficient for some groups [67,69,85,86].
This paper is organized as follows. In Sec. II, the group theory needed for Σ(36 × 3) is summarized and the digitization scheme is presented. Sec. III introduces the basic gate sets. Sec. IV demonstrates the quantum circuits for the four primitive gates required for implementing the group operations: the inversion gate, the multiplication gate, the trace gate, and the Fourier transform gate. Using these gates, Sec. V presents resource estimates for simulating 3 + 1d SU(3). We conclude and discuss future work in Sec. VI.

II. PROPERTIES OF Σ(36 × 3)
Σ(36 × 3) is a discrete subgroup of SU(3) with 108 elements. The group elements g of Σ(36 × 3) can be written as an ordered product, otherwise known as a strong generating set. That is, all the group elements can be enumerated as a product of left or right transversals such that

$g = \omega^p C^q E^r V^{2s+t}$, (1)

where 0 ≤ p, q, r ≤ 2 and 0 ≤ s, t ≤ 1. This indicates that either 8 qubits (two each for p, q, and r, and one each for s and t) or 3 qutrits (p, q, and r) and 2 qubits (s and t) are required to store the group register. Because the indices p, q, and r take values between 0 and 2, there is an ambiguity in mapping the three-level states to a pair of two-level systems. We use the mapping

$|0\rangle_3 = |00\rangle_2$, $|1\rangle_3 = |01\rangle_2$, $|2\rangle_3 = |10\rangle_2$,

with the $|11\rangle_2$ state forbidden. The same mapping is applied to q and r.
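The register layout described above can be sketched in a few lines of Python. This is only an illustration: the function names and the bit ordering (p1, p0, q1, q0, r1, r0, s, t) are assumptions made here, chosen so that each three-level index x is stored as x = x0 + 2·x1 with the two-qubit state |11⟩ left unused.

```python
from itertools import product

def qutrit_to_bits(x):
    """Encode a qutrit level 0, 1, 2 as (x1, x0) with x = x0 + 2*x1; |11> is unused."""
    if x not in (0, 1, 2):
        raise ValueError("qutrit level must be 0, 1, or 2")
    return (x >> 1) & 1, x & 1

def encode(p, q, r, s, t):
    """Return the 8-bit tuple (p1, p0, q1, q0, r1, r0, s, t) for g = w^p C^q E^r V^(2s+t)."""
    if s not in (0, 1) or t not in (0, 1):
        raise ValueError("s and t must be 0 or 1")
    bits = []
    for x in (p, q, r):
        bits.extend(qutrit_to_bits(x))
    bits.extend((s, t))
    return tuple(bits)

def decode(bits):
    """Invert encode(); reject bit strings that use the forbidden |11> pair."""
    p1, p0, q1, q0, r1, r0, s, t = bits
    for hi, lo in ((p1, p0), (q1, q0), (r1, r0)):
        if hi and lo:
            raise ValueError("forbidden |11> qutrit encoding")
    return p0 + 2 * p1, q0 + 2 * q1, r0 + 2 * r1, s, t

# All exponent tuples (p, q, r, s, t) and their 8-bit codes.
labels = list(product(range(3), range(3), range(3), range(2), range(2)))
codes = {encode(*g) for g in labels}
print(len(labels), len(codes), 2 ** 8)
```

Only 108 of the 256 bit strings are used, which is why operators on the eight-qubit register must act trivially (or arbitrarily) on the remaining states.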
The strong generating set shown in Eq. (1) explicitly builds the presentation of the group from subgroups. In this way primitive gates for smaller discrete groups can be used as building blocks to construct efficient primitive gates of larger groups [100][101][102]. In the case of Σ(36 × 3), the subgroups of interest are as follows: ω^p generates the subgroup Z_3; ω^p C^q generates the subgroup Z_3 × Z_3; ω^p C^q E^r generates the subgroup ∆(27); ω^p C^q E^r V^{2s} generates ∆(54). Detailed information regarding these subgroups can be found in Ref. [103].
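The subgroup chain above can be checked numerically. The sketch below is pure Python and enumerates ω^p C^q E^r V^k for each stage of the chain, counting distinct matrices and testing closure under multiplication. The explicit 3 × 3 generator matrices in the code are an assumption (a convention commonly quoted in the literature for this group), not something taken from the text, and the helper names are hypothetical.

```python
import cmath
import math
from itertools import product

W = cmath.exp(2j * math.pi / 3)  # primitive cube root of unity
IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))

# Assumed generator matrices (common literature convention, not from the text).
OMEGA = tuple(tuple(W * x for x in row) for row in IDENTITY)
C = ((1, 0, 0), (0, W, 0), (0, 0, W * W))
E = ((0, 1, 0), (0, 0, 1), (1, 0, 0))
NORM = 1 / (math.sqrt(3) * 1j)
V = tuple(tuple(NORM * W ** (j * k) for k in range(3)) for j in range(3))

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )

def matpow(a, n):
    """n-th power of a 3x3 matrix for a non-negative integer n."""
    out = IDENTITY
    for _ in range(n):
        out = matmul(out, a)
    return out

def key(a):
    """Hashable fingerprint of a matrix, rounded to absorb floating-point error."""
    return tuple(
        (round(complex(z).real, 6) + 0.0, round(complex(z).imag, 6) + 0.0)
        for row in a for z in row
    )

def span(ps, qs, rs, ks):
    """Distinct matrices of the form OMEGA^p C^q E^r V^k over the given exponent ranges."""
    elems = {}
    for p, q, r, k in product(ps, qs, rs, ks):
        left = matmul(matpow(OMEGA, p), matpow(C, q))
        right = matmul(matpow(E, r), matpow(V, k))
        g = matmul(left, right)
        elems[key(g)] = g
    return elems

def is_closed(elems):
    """True if the product of any two elements is again in the set."""
    return all(key(matmul(a, b)) in elems
               for a in elems.values() for b in elems.values())

R3 = range(3)
chain = {
    "Z3": span(R3, [0], [0], [0]),
    "Z3 x Z3": span(R3, R3, [0], [0]),
    "Delta(27)": span(R3, R3, R3, [0]),
    "Delta(54)": span(R3, R3, R3, [0, 2]),
    "Sigma(36x3)": span(R3, R3, R3, range(4)),
}
orders = {name: len(elems) for name, elems in chain.items()}
closed = {name: is_closed(elems) for name, elems in chain.items()}
print(orders)
print(closed)
```

With these assumed matrices, the exponent k = 2s + t of V runs over 0 to 3, so the last entry of the chain corresponds to the full ordered product of Eq. (1).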
As we proceed with constructing primitive gates (see Sec. IV), the following "reordering" relations are useful:

One can extend the relations above to derive the generalized reordering relations:

It is useful to have the irreducible representations (irreps) of Σ(36 × 3) for deriving a quantum Fourier transformation (see Sec. IV). This group has 14 irreps: four one-dimensional (1d) irreps, eight three-dimensional (3d) irreps, and two four-dimensional (4d) irreps. The 1d irreps are:

The eight 3d irreps can be written as

$\rho^{(3)}_{a,b}(g) = (-1)^{abt}\, \omega^{(1+b)p}\, C^{(1+b)q}\, E^{r}\, (i^{a} V)^{2s+(-1)^{b} t}$, (5)

where 0 ≤ a ≤ 3 and 0 ≤ b ≤ 1, and the matrices ω, C, E, and V are given by

The irrep $\rho^{(3)}_{0,0}$ corresponds to the faithful irrep that resembles the fundamental irrep of SU(3). The 4d irreps are given by Eq. (1) with

In addition, for conciseness we provide the character table from Ref. [103] in Tab. II, which will be useful in constructing the trace gate.
The Hamiltonian we primarily target is the pure-gauge Kogut-Susskind Hamiltonian,

where □ indicates a sum over all of the plaquettes, with g_1, ..., g_4 the group elements on the links of each plaquette. The second term is the kinetic term, where the sum over l runs over all links. There is generally a freedom in the construction of the electric term. In Appendix D, we provide a straightforward construction based on the procedure outlined in Ref. [104]. The second Hamiltonian we consider is the improved Hamiltonian, H_I, which was highlighted in Ref. [96]. This includes terms with six-link rectangles and an extended electric field operator. The motivation for considering improved Hamiltonians is twofold: lattice spacing errors are reduced, i.e., the discretization artifacts are moved to O(a^4), and β_f takes a larger value.
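As a quick consistency check on the irrep content quoted above (a standard identity for finite groups, stated here as a worked equation rather than as part of the original derivation), the squared irrep dimensions must sum to the group order:

```latex
\sum_{\rho} d_\rho^{2}
  = 4\cdot 1^{2} + 8\cdot 3^{2} + 2\cdot 4^{2}
  = 4 + 72 + 32
  = 108
  = |\Sigma(36\times 3)| .
```

This is also why a Fourier transform over the group maps the 108 basis states |g⟩ onto exactly 108 labels (ρ, i, j) with 1 ≤ i, j ≤ d_ρ.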

III. BASIC GATES
In this work, we consider gate sets for both qubit and hybrid qubit-qutrit systems. Our qubit decompositions use the well-known fault-tolerant Clifford + T gate set [105]. This choice is informed by the expectation that quantum simulations for lattice gauge theories will ultimately require fault tolerance to achieve quantum advantage [23,100,101,106,107]. Throughout, we adopt the notation ⊕_m to mean addition mod m.
For conciseness, we use a larger-than-necessary gate set, which we later decompose into T gates to obtain resource costs. The single-qubit gates used are the Pauli rotations, R_α(θ) = e^{iθα/2}, where α = X, Y, Z. We also consider four entangling operations: SWAP, CNOT, the multicontrolled C^nNOT, and the controlled SWAP (CSWAP). The two-qubit operations can be written as

while the multicontrolled generalizations are

The hybrid encoding uses a less conventional set of single-, double-, and triple-qudit gates. The single-qudit two-level rotations we consider are denoted by R^α_{b,c}(θ), where α ∈ {X, Y, Z}, and indicate a Pauli-style rotation between levels b and c. The subscripts are omitted to indicate that the operation is performed on a qubit rather than a qutrit. Additionally, we use the primitive two-qudit gate C^a_b X^c_{d,e}, which corresponds to a CNOT operation controlled on state b of qudit a that targets qudit c with an X operation between levels d and e. For conciseness we also consider the CSum gate:

which is a controlled operation on qubit or qutrit a and targets qutrit b. It can be verified that the CSum gate (see e.g. Refs. [10,108]) is related to the C^a_b X^c_{d,e} gates by

Finally, we consider multi-controlled versions of both of these gates. The gate C^a_b C^c_d X^e_{f,g} corresponds to the multicontrolled generalization of Eq. (13). The second multi-qudit gate is the CCSum, which acts as follows:

A useful feature of this choice of quantum gates is that one can decompose multi-controlled gates using the traditional Toffoli staircase decomposition [105,109,110].

IV. PRIMITIVE GATES
We present the primitive gates for a pure gauge theory in the following subsections using the methods developed in previous papers on the binary tetrahedral (BT) and binary octahedral (BO) groups [97,98,100]. Using this formulation confers at least two benefits: first, it is possible to design algorithms in a theory- and hardware-agnostic way; second, the circuit optimization is split into smaller, more manageable pieces. This construction begins by defining, for a finite group G, a G-register in which each group element is identified with a computational basis state |g⟩. Ref. [97] then showed that Hamiltonian time evolution can be performed using a set of primitive gates. These primitive gates are: inversion U_{−1}, multiplication U_×, trace U_Tr, and Fourier transform U_F [97].
The inversion gate, U_{−1}, is a single-register gate that takes a group element to its inverse:

$U_{-1}|g\rangle = |g^{-1}\rangle$.

The group multiplication gate acts on two G-registers. It takes the target G-register and changes its state to the left-product with the control G-register:

$U_{\times}|g\rangle|h\rangle = |g\rangle|gh\rangle$.

Left multiplication is sufficient for a minimal set, as right multiplication can be implemented using two applications of U_{−1} and U_×, although optimal algorithms may take advantage of an explicit construction [96].
While it is possible to pass the matrix constructed in Fig. 2 into a transpiler, more efficient methods for constructing these operators exist [102,111,112,113,114,115]. While the methods vary in their realization, the underlying spirit is the same as for the discrete quantum Fourier transformation. The principal method involves building the quantum Fourier transformation up through a series of subgroups. In [115], it was shown that instead of the exponential O(4^n) scaling of traditional transpilation, the quantum Fourier transform scales like O(polylog(2^n)), where polylog indicates a polynomial of logarithms.
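The action of these two gates on basis states can be illustrated classically. The sketch below uses the six-element permutation group S3 as a small stand-in (an assumption made purely for brevity; nothing here is specific to Σ(36 × 3)) and represents each gate by its action on basis-state indices. It also shows one possible way, chosen here for illustration, to build right multiplication from inversions and a single left multiplication.

```python
from itertools import permutations

# S3 as a small stand-in group; each element is a tuple (g[0], g[1], g[2]).
G = list(permutations(range(3)))
index = {g: i for i, g in enumerate(G)}

def compose(g, h):
    """Return the permutation g∘h (apply h first, then g)."""
    return tuple(g[h[i]] for i in range(3))

def inverse(g):
    """Return the inverse permutation of g."""
    inv = [0] * 3
    for i, gi in enumerate(g):
        inv[gi] = i
    return tuple(inv)

def u_inv(i):
    """Inversion gate on basis labels: |g> -> |g^-1>."""
    return index[inverse(G[i])]

def u_mul(i, j):
    """Left-multiplication gate on basis labels: |g>|h> -> |g>|g h>."""
    return i, index[compose(G[i], G[j])]

def right_mul(i, j):
    """|g>|h> -> |g>|h g>, built from inversions and one left multiplication."""
    i, j = u_inv(i), u_inv(j)      # registers hold g^-1 and h^-1
    i, j = u_mul(i, j)             # target holds g^-1 h^-1 = (h g)^-1
    return u_inv(i), u_inv(j)      # registers hold g and h g

pairs = [(i, j) for i in range(len(G)) for j in range(len(G))]
images = {u_mul(i, j) for i, j in pairs}
print(len(pairs), len(images))  # equal sizes: u_mul permutes the basis states
```

Because both maps are bijections on basis labels, the corresponding operators are permutation matrices and therefore unitary.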
In the rest of this paper, we will construct each of these primitive gates, and evaluate the overall cost.For each gate, we will start with a pure qubit system.Then, we will consider a register with three qutrits and two qubits as suggested by the group presentation in Eq. 1.

A. Inversion Gate
For the construction of U_{−1}, we first write the inverse of the group element g as

where the permutation rules are found to be

A detailed derivation of the permutation rules and the associated U_{−1} is found in App. A, along with two other forms of U_{−1} which use fewer ancillae. The idealized qubit circuit is shown in Fig. 3 and requires 119 T-gates and 4 clean ancillae. The qubit-qutrit hybrid encoding of U_{−1} is found in Fig. 4.

B. Multiplication Gate
The multiplication gate U_× takes two G-registers storing two group elements g = ω^p C^q E^r V^{2s+t} and h = ω^{p′} C^{q′} E^{r′} V^{2s′+t′} and stores into the second register the group element gh = ω^{p′′} C^{q′′} E^{r′′} V^{2s′′+t′′}. Using the reordering relations of Eq. (3) one can derive that

These rules are rather clunky, and in order to write a systematic multiplication gate we decompose U_× into the following product,

where U_{×,O} indicates multiplying the state of the O generator register from the g_1 register onto the g_2 register.
We provide a detailed discussion of the derivation of these rules in App. B. Applying this method with the product rules of Eq. (24) yields the circuits shown in Figs. 5 and 6 for the two encodings.

C. Trace Gate
There are two principal methods by which one could derive U_Tr. One method is to define a Hamiltonian of the form:

Then, the trace operator can be written as U_Tr(θ) = exp(−iθ Ĥ_Tr). This operator corresponds to the phasing of the magnetic plaquette operator when g corresponds to a closed Wilson loop. To obtain the matrix form of Ĥ_Tr, one may fix a basis |g_1⟩, ..., |g_{|G|}⟩, where |G| = 108 is the order of the group. In this basis, Ĥ_Tr is diagonal, and each diagonal entry is given by H_{i,i} = Tr(g_i).
To obtain a quantum circuit realizing U_Tr, we use the tree-traversal algorithm developed in [116], which was shown to yield an exact circuit with an asymptotically optimal CNOT gate count. The circuit obtained has 130 CNOT gates and 111 R_z gates. Additional methods are also found in Ref. [117].
A second method for deriving this gate involves mapping group elements to their respective trace classes. Σ(36 × 3) has 14 conjugacy classes that map to 10 different trace classes. If we only require the real part of the trace, then this grouping reduces the 10 trace classes to 7 trace classes. The seven valid traces are ReTr(g) = {3, −3/2, 0, ±1, ±1/2}, which can be labelled using three bits (v_0, v_1, v_2) as shown in Tab. III.
This map can be represented as three Boolean functions, one for each of the variables v_0, v_1, and v_2. For quantum computation, it is convenient to write Boolean functions in the so-called exclusive-or sum of products (ESOP) form [118,119]. The function can then be mapped to a quantum circuit in a straightforward manner, since each term in the ESOP corresponds to a Toffoli gate. For each function, we start with its minterm form [120]. Then, we use the exorcism algorithm to find a simpler ESOP expression for each of the three functions. After factorization, we show the final expressions in the following equation:

U_Tr can be decomposed as U_Tr(θ) = V U′_Tr(θ) V†, where V is a unitary operator realizing the map (p, q, r, s, t) → (v_0, v_1, v_2). This yields U′_Tr(θ) ≡ e^{iθH′}, where

Figure 7 shows the quantum circuit of the operator V realizing the map (p, q, r, s, t) → (v_0, v_1, v_2). Finally, the circuit of U′_Tr(θ) is shown in Fig. 8.
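The ESOP-to-circuit mapping described above can be illustrated with a small classical simulation. The Boolean function below is hypothetical and is not one of the three trace-class functions; the sketch only shows that a minterm expansion and a shorter ESOP compute the same target bit when each product term is applied as one multi-controlled NOT. All names are illustrative.

```python
from itertools import product

def f(x0, x1, x2):
    """Hypothetical Boolean function, used only for illustration."""
    return (x0 & x1) ^ x2 ^ (x0 & x2)

def minterm_esop(func, n):
    """One product term per input with func = 1.

    Minterms are mutually exclusive, so their OR equals their XOR.
    Each term maps variable index -> required value (0 means a negated control).
    """
    return [dict(enumerate(x)) for x in product((0, 1), repeat=n) if func(*x)]

# A shorter ESOP for the same function: x0 x1 XOR x2 XOR x0 x2.
SIMPLIFIED = [{0: 1, 1: 1}, {2: 1}, {0: 1, 2: 1}]

def run_circuit(terms, x):
    """Apply one multi-controlled NOT per term onto a target bit initialised to 0."""
    target = 0
    for term in terms:
        if all(x[i] == v for i, v in term.items()):
            target ^= 1
    return target

inputs = list(product((0, 1), repeat=3))
minterms = minterm_esop(f, 3)
agree = all(
    run_circuit(minterms, x) == run_circuit(SIMPLIFIED, x) == f(*x) for x in inputs
)
print(len(minterms), len(SIMPLIFIED), agree)
```

In this toy case the simplified form uses fewer terms and fewer controls per term than the minterm form, which is the kind of saving an ESOP minimizer such as exorcism aims for.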

D. Fourier Transform Gate
The standard n-qubit quantum Fourier transform (QFT) [121] corresponds to the quantum version of the fast Fourier transform of Z_{2^n}. Quantum Fourier transforms, U_QFT, over some nonabelian groups are known [98,102,112,113,115]. However, for all the crystal-like subgroups of interest to high energy physics, U_QFT is currently unknown [122], and there is no clear algorithmic way to construct U_QFT in general. Therefore, we instead construct a suboptimal U_F from Eq. (20) using the irreps of Sec. II. The structure of U_F is ordered as follows. The columns index |g⟩ from |0⟩ to |255⟩ according to Eq. (1). We then index the irreducible representations ρ_i sequentially from i = 1 to i = 14. Since Σ(36 × 3) has 108 elements, on a qubit device U_F must be embedded into a larger 2^8 × 2^8 = 256 × 256 matrix (footnote 3). The matrix was then passed to the Qiskit v0.43.1 transpiler; the optimized version of U_F needed 30956 CNOTs, 2666 R_X, 32806 R_Y, and 55234 R_Z gates, making the Fourier gate the most expensive qubit primitive.
As will be discussed in Sec. V, U_F dominates the total simulation costs, and future work should be devoted to finding a Σ(36 × 3) U_QFT.

FIG. 5. Qubit implementation of U_× using the permutation gate χ and its inverse (both shaded orange). It uses 2 ancillae and has a cost of 308 T-gates.
For the hybrid qubit-qutrit implementation, U_F is of dimension 108 × 108. To obtain a quantum circuit, we built a qubit-qutrit compiler; see Ref. [123]. The outline of

Footnote 3: An explicit construction of the matrix U_F is found in the supplementary information.

FIG. 7. Quantum circuit of the map (p, q, r, s, t) → (v_0, v_1, v_2) from the group to the seven real trace classes ReTr(g) = {3, −1.5, 0, ±1, ±0.5}. This requires 15 Toffoli gates and thus 105 T gates.

V. RESOURCE COSTS
The relatively deep circuits presented above strongly suggest that simulating Σ(36 × 3) will require error correction and longer coherence times on quantum devices. The preclusion of universal transversal sets of gates stated in the Eastin-Knill theorem [124] requires that compromises be made. In most error correcting codes, the Clifford gates are designed to be transversal [105,125,126,127,128]. This leaves the nontransversal T gate as the dominant cost of fault-tolerant algorithms [105,129]. Beyond these standard codes, novel universal sets exist with transversal BT, BO, BI and He(3)4 gates [130][131][132][133][134], which warrant exploration for use in lattice gauge theory.
For this work, we consider the following decompositions of gates into T gates for our resource estimates. First, while the CNOT is transversal, the Toffoli gate decomposes into six CNOTs and seven T gates [105]. With this, one can construct any C^nNOT gate using 2⌈log_2 n⌉ − 1 Toffoli gates and n − 2 dirty ancilla qubits, which can be reused later [105,109,110]. For the R_Z gates, we use the repeat-until-success method of [135], which finds that these gates can be approximated to precision ϵ with on average 1.15 log_2(1/ϵ) T gates (and at worst −9 + 4 log_2(1/ϵ) [136]). For R_Y and R_X, one can construct them with at most three R_Z. Putting all this together, we can construct gate estimates for Σ(36 × 3) (see Tab. V). While the results in Tab. V are nearly optimal for U_Tr, U_×, and U_{−1}, the result for U_F is not. In Ref. [111] the authors give explicit demonstrations of an efficient decomposition of the nonabelian quantum Fourier transform U_QFT using the methods of [102] for certain SU(2) and SU(3) subgroups. Since the gate cost for Fourier transforms is expected to scale as a polynomial of logarithms of the group size [115], one can perform a fit to the results in Ref. [111] to obtain an order-of-magnitude estimate for U_QFT of 147 + 75 log_2(1/ϵ), a factor of ∼2000 smaller than our U_F.
Clearly, the cost of simulating Σ(36 × 3) depends on ϵ. To optimize the cost, the synthesis error from finite ϵ should be balanced with other sources of error in the quantum simulation, such as Trotter error, discretization error, and finite volume error. These other sources of error are highly problem-dependent, but here we follow prior works [42,137,138] and take a fiducial ϵ = 10^{−8}. Primitive gate costs for implementing H_KS [139] and H_I [96], per link per Trotter step δt, are shown in Tab. VI. Using this result, we can determine the total T gate count

With this, the total synthesis error ϵ_T can be estimated as the sum of ϵ from each R_Z. In the case of H_KS this is

If one looks to reduce lattice spacing errors for a fixed number of qubits, one can use H_I, which would require

where the total synthesis error is

Following [100,137,140], we make resource estimates based on our primitive gates for the calculation of the shear viscosity η on an L^3 = 10^3 lattice evolved for N_t = 50 with a total synthesis error of ϵ_T = 10^{−8}. Considering only the time evolution and neglecting state preparation (which can be substantial [75,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155]), Kan and Nam estimated that 6.5 × 10^48 T gates would be required for a pure-gauge SU(3) simulation of H_KS. This estimate used a truncated electric-field digitization and considerable fixed-point arithmetic, greatly inflating the T gate cost.
Here, using Σ(36 × 3) to approximate SU(3) requires 7.0 × 10^12 T gates for H_I and 3.5 × 10^12 T gates for H_KS. The T gate density is roughly 1 per Σ(36 × 3)-register per clock cycle. Thus Σ(36 × 3) reduces the gate costs of [137] by a factor of 10^36. Similar to the previous results for discrete subgroups of SU(2), U_F dominates the simulations, accounting for over 99% of the computation regardless of Hamiltonian. However, [111] shows that the cost of the Fourier transformation for BT and BO can be brought down. Using the corresponding estimate for Σ(36 × 3), the Fourier gate contribution is reduced to only 51% of the simulation, with a reduced total T gate count of 5.7 × 10^9 for H_I with L = 10.
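The per-gate arithmetic quoted in this section can be reproduced with a few lines of Python. This is only a sketch of the stated formulas (7 T gates per Toffoli, 1.15 log_2(1/ϵ) T gates per R_Z on average, −9 + 4 log_2(1/ϵ) at worst, and the fitted U_QFT estimate 147 + 75 log_2(1/ϵ)); the function names are hypothetical, and the full per-link totals are not recomputed here.

```python
import math

T_PER_TOFFOLI = 7  # Toffoli = six CNOTs + seven T gates

def toffoli_t_count(n_toffoli):
    """T gates needed for a given number of Toffoli gates."""
    return T_PER_TOFFOLI * n_toffoli

def rz_t_count(eps, worst_case=False):
    """T gates to synthesise one R_Z to precision eps (average or worst case)."""
    log_term = math.log2(1.0 / eps)
    return -9 + 4 * log_term if worst_case else 1.15 * log_term

def qft_estimate(eps):
    """Fitted order-of-magnitude estimate quoted for a U_QFT."""
    return 147 + 75 * math.log2(1.0 / eps)

eps = 1e-8
trace_map_t = toffoli_t_count(15)            # the 15-Toffoli map of Fig. 7
avg_rz = rz_t_count(eps)
worst_rz = rz_t_count(eps, worst_case=True)
qft_t = qft_estimate(eps)
ratio = 6.5e48 / 7.0e12                      # quoted SU(3) estimate vs. H_I estimate

print(trace_map_t, round(avg_rz, 2), round(worst_rz, 2), round(qft_t, 1))
print(round(math.log10(ratio)))              # orders of magnitude saved
```

At the fiducial ϵ = 10^{−8}, each synthesized rotation costs roughly 31 T gates on average, which is why circuits dominated by arbitrary rotations (such as the transpiled U_F) are so much more expensive than Toffoli-based ones.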

VI. OUTLOOK
This article provided a construction of primitive gates necessary to simulate a pure SU(3) gauge theory via a discrete subgroup Σ(36 × 3).In addition, we have also estimated the T-gate cost incurred to compute the shear viscosity using the Σ(36 × 3) group.Notably, we found that our construction improves the T-gate cost upon that of Ref. [137] by 36 orders of magnitude.This cost reduction comes at the expense of model accuracy.
For both qubit and hybrid qubit-qutrit implementations, U_F dominates the cost, suggesting that further reductions could be achieved by identifying a U_QFT for Σ(36 × 3). In fact, as demonstrated in Ref. [111], the cost of a U_QFT can be smaller than that of U_F by a factor of as much as ∼2000.
In addition, the much-improved overall cost due to the use of the Σ(36 × 3) group supports the need to also study other discrete subgroups of SU(2) and SU(3). To this end, recent studies (e.g. Refs. [100,101]) have already constructed primitive gates for some SU(2) discrete subgroups, namely the binary tetrahedral and binary octahedral groups. It remains to develop such gates for other subgroups, for example the larger subgroups of SU(3) such as Σ(72 × 3), Σ(216 × 3), and Σ(360 × 3), as well as the BI group. The larger groups will reduce discretization errors, but at the cost of a longer circuit depth.
Finally, beyond pure gauge, approximating QCD requires incorporating fermion fields [79,156,157].Many methods exist to incorporate staggered and Wilson fermions.It is worth comparing the resource costs for explicit spacetime simulations using staggered versus Wilson fermions not only in terms of T-gates but also spacetime costs using methods such as [158].
In order to turn this trinary arithmetic into binary arithmetic we need the following transformation axioms:

and

Using this set of transformation rules we find

Naively translating these rules as written yields the circuit provided in Fig. 10. However, the resource cost of 420 T-gates can be reduced significantly. By clever use of ancillae one can reduce the cost to 203 T-gates using the circuit provided in Fig. 11. Instead of writing a circuit for the whole inversion ruleset of Eq. (23), one can use commutation rules to reduce the T-gate cost even further. This construction allows the inversion operation to be decomposed into a product of smaller operations:

U^l_{−1} takes each local generator to its inverse:

The operation U^V_{−1} involves propagating through the operator V^t until it is the rightmost element. This yields the transformations:

The generators C and E are normal ordered at this point. The operation U^{V^2}_{−1} has the following transformation rule:

At this point the transformation rules for C, E, and ω are trivial. After all these suboperations are constructed, we end up with the inversion operation from the main text provided in Fig. 3 and Fig. 4.

Appendix B: Derivation of the Multiplication Gate
The construction of the multiplication gate rules follows in a similar spirit to the derivation of the inversion rules. We start with two registers corresponding to group elements g and h, with gh given by the product rules of Eq. (24). When we multiply the group elements g and h together, we iteratively move the elements of g over onto h. This commutation begins by moving the V^{t_1} component over to h:

Propagating through V^{t_1} gives the following transformations to the elements p_2, q_2, r_2, s_2, and t_2:

All together this gives the circuit operation U_{×,t} in Fig. 6. The next step involves moving the V^{2s_1} operation across such that

In this case, the operators now transform under the rules

It follows immediately that this is a controlled permutation on the |1⟩_3-|2⟩_3 subspace of the q and r qutrits and a simple CNOT on the s_2 register. Propagation through E^{r_1} then transforms the remaining states on the h register to:

which gives the expression for U_{×,E} in Fig. 6.

Since these gates act on qutrits, we will implement them using two qubits. We encode the qubit states as |q_1 q_0⟩, where q_0 is the least significant bit. That is, the states are ordered as |00⟩, |01⟩, |10⟩, and |11⟩.
The gate X_{0,1} interchanges the states |00⟩ and |01⟩. It can be implemented as shown in Fig. 12. The X_{1,2} gate, on the other hand, can be implemented as a qubit swap gate; see Fig. 13. In addition, the χ gate can be implemented using the circuit in Fig. 14.

formed straightforwardly from the Casimir operators, see e.g. [139]. A generalization to discrete groups can be obtained by using the Laplacian operator on a Cayley graph associated to the group, as is done in Ref. [104].
As a brief review of the procedure, we choose Γ, a subset of the group such that Γ is closed under inversion and conjugation. That is, Γ^{−1} = Γ and g Γ g^{−1} = Γ for all g ∈ G. In addition, we choose 1 ∉ Γ, as including this element would only result in a constant shift of the spectrum. Clearly, there may be several choices of Γ. However, it is shown in Ref. [93] that the choice Γ = {g ∈ G | ReTr(g) is maximal} follows from the Wilson action, and therefore results in a manifestly Lorentz-invariant term in the Hamiltonian. Moreover, such a choice of Γ also clearly fulfills the first two conditions. Then, to compute the electric term, we discard the identity element and choose only those elements with max[ReTr(g)] = 1 for the case of Σ(36 × 3). We find that Γ consists of 18 elements that generate the whole Σ(36 × 3) group.
Having defined Γ, the electric term can be computed as

where the eigenvalues

Direct computation of f(ρ) yields the values shown in Table VII.
Having constructed the electric term H_E above, we can construct a quantum circuit for its time evolution using the hybrid qubit-qutrit compiler that we describe in Appendix E. We obtain the gate cost shown in Table VIII.
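With the two-qubit encoding |q_1 q_0⟩ of a qutrit level x = q_0 + 2 q_1, level-swap gates reduce to simple qubit operations. The sketch below checks this classically on basis labels. The realization of the 0-1 swap as "flip q_0 when q_1 = 0" is an assumption made here for illustration; the 1-2 swap as a qubit SWAP follows directly from the encoding. All names are hypothetical.

```python
# Level -> (q1, q0) with level = q0 + 2*q1; the pair (1, 1) is unused.
ENC = {0: (0, 0), 1: (0, 1), 2: (1, 0)}
DEC = {bits: level for level, bits in ENC.items()}

def swap(q1, q0):
    """Qubit SWAP: exchanges |01> and |10>, i.e. qutrit levels 1 and 2."""
    return q0, q1

def x01(q1, q0):
    """Flip q0 when q1 = 0: exchanges |00> and |01>, i.e. qutrit levels 0 and 1."""
    return q1, q0 ^ (1 - q1)

x12_action = [DEC[swap(*ENC[level])] for level in range(3)]
x01_action = [DEC[x01(*ENC[level])] for level in range(3)]
print(x12_action, x01_action)
```

Both maps leave the unused state |11⟩ fixed, so they never move amplitude out of the encoded subspace.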
Appendix E: Qubit-Qutrit Compiler
This section details the compilation method; the full code can be found in Ref. [123]. The overarching approach of this compiler is to generalize the qubit Quantum Shannon Decomposition (QSD) [159,160] to a register with both qubits and qutrits. We consider a unitary operator U acting on n_1 qubits and n_2 qutrits; that is, U is of dimension N × N, where N = 2^{n_1} × 3^{n_2}.
First, we organize the qudits as q_1, q_2, ..., q_{n_1+n_2}. We say that the left-most qudit is the top qudit. The compiler iteratively performs the qubit QSD if the top qudit is a qubit, and otherwise performs our realization of the qutrit QSD. The process terminates when we reach the bottom qudit, in which case we use either a single-qubit gate decomposition or a single-qutrit gate decomposition, depending on whether the bottom qudit is a qubit or a qutrit. For the single-qubit gate, we use the Euler angle parametrization ZYZ, and for the single-qutrit gate, we use the decomposition given in Ref. [108].
It is convenient to start with the qubit QSD case. In this case, a Cosine-Sine decomposition (CSD) (see e.g. Refs. [161][162][163]) is first performed, resulting in

where V_{1,2} and W_{1,2} are unitaries of dimension N/2. C and S are diagonal matrices, e.g. C = diag(cos θ_1, ..., cos θ_{N/2}), and similarly for S. Following [159], the next step is to decompose the two block-diagonal unitary matrices:

where M and N are unitaries acting only on n_1 − 1 qubits and n_2 qutrits, and D is a diagonal unitary of dimension N/2. Thus, the compilation problem is reduced to decomposing the unitaries D ⊕ D† and the CS matrix into simple gates. The D ⊕ D† can be implemented as a uniformly controlled rotation on the top qudit (see e.g. Ref. [159]). The CS matrix, on the other hand, is related to D ⊕ D† by the rotation gate R_x(π/2) on the top qubit. This concludes the case in which the top qudit is a qubit.
In the case that the top qudit is a qutrit, we need to find a qutrit realization of the procedure above. The starting point is to perform two CSDs, as in e.g. Ref. [164]. This decomposition reads

where each block is a unitary of dimension N/3. The blocks C, S, D, and D′ are defined analogously to the qubit case.
What remains is to decompose the remaining block-diagonal unitaries. By performing the decomposition in Ref. [159] twice, we obtain the relation

The unitaries CS are uniformly controlled rotations. We can focus on the diagonal blocks because, as in the qubit case, the CS matrix can be diagonalized with an appropriate R^{i,j}_x(π/2) on the top qutrit. Ref. [165] outlines the decomposition of qutrit uniformly controlled rotations in terms of single- and two-qutrit gates. A generalization can be obtained simply by limiting a C^a_b X^c_{i,j} gate to C^a_1 X^c_{i,j} when the control qudit is a qubit.
For n = 2 qubits, there exist optimal compilation algorithms (see e.g. Ref. [166]). Therefore, when the bottom two qudits are both qubits, we stop the decomposition and use the Qiskit transpiler to obtain a quantum circuit.
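For orientation, the two qubit-case steps described above are usually written in the following standard textbook form. The block layout and sign convention shown here are an assumption (conventions differ between references) and are not necessarily the ones used by the compiler of Ref. [123].

```latex
% Cosine-Sine decomposition of an N x N unitary U (top qudit is a qubit)
U =
\begin{pmatrix} V_1 & 0 \\ 0 & V_2 \end{pmatrix}
\begin{pmatrix} C & -S \\ S & C \end{pmatrix}
\begin{pmatrix} W_1 & 0 \\ 0 & W_2 \end{pmatrix},
\qquad C^2 + S^2 = \mathbb{1}_{N/2}.

% Demultiplexing of a block-diagonal unitary
\begin{pmatrix} V_1 & 0 \\ 0 & V_2 \end{pmatrix}
=
\begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}
\begin{pmatrix} D & 0 \\ 0 & D^\dagger \end{pmatrix}
\begin{pmatrix} N & 0 \\ 0 & N \end{pmatrix},
\qquad V_1 V_2^\dagger = M D^2 M^\dagger,
\quad N = D M^\dagger V_2 .
```

In the second line, M and D are obtained by diagonalizing V_1 V_2†, and the factor N is then fixed by the stated relation, so each recursion step reduces the problem to unitaries on one fewer qudit plus uniformly controlled rotations on the top qudit.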

FIG. 3. T-gate optimized version of U_{−1} for Σ(36 × 3). The letter indicates the generator and the subscript indicates the qubit in the generator register. This implementation requires 119 T-gates and 4 ancillae.

TABLE I. Parameters of crystal-like subgroups of SU(3).

|0⟩_3 = |00⟩_2, |1⟩_3 = |01⟩_2, and |2⟩_3 = |10⟩_2, with the |11⟩_2 state being forbidden. Throughout this work, we will use |⟩_3 to denote a three-level state and |⟩_2 to denote a two-level state when there is the possibility of ambiguity. In this way, the index p is decomposed in binary as p = p_0 + 2p_1 and encoded as the state

TABLE II. Character table of Σ(36 × 3) with ω = e^{2πi/3} [103]. Size indicates the number of elements in the conjugacy class, while Ord. (order) indicates the number of times the operator can be multiplied before yielding the identity.

z rotations and 254 CNOT gates. The gate cost on a mixed qubit-qutrit device is shown in Table VIII.

TABLE IV. Gate cost of primitive gates for Σ(36 × 3) for a qutrit-qubit architecture. The costs for U_F were obtained with the hybrid compiler described in Appendix E. C_d X_{b,c} refers to an X rotation between states b, c of a qutrit controlled by a qudit with d levels.

TABLE V. Number of physical T gates and clean ancillae required to implement logical gates for (top) basic gates taken from [105] and (bottom) primitive gates for Σ(36 × 3).

TABLE VII. Eigenvalues of the electric term. We have defined

TABLE VIII. Gate cost for the time evolution due to the electric term shown in Table VII.