How many quantum gates do gauge theories require?

We discuss the implementation of lattice gauge theories on digital quantum computers, focusing primarily on the number of quantum gates required to simulate their time evolution. We find that, when quantum circuits are compiled using available state-of-the-art methods with our own augmentations, the cost of a single time step of an elementary plaquette is beyond what is reasonably practical in the current era of quantum hardware. However, we observe that such costs are highly sensitive to the truncation scheme used to derive different Hamiltonian formulations of non-Abelian gauge theories, emphasizing the need for low-dimensional truncations of such models in the same universality class as the desired theories.


I. INTRODUCTION
Among the many future applications of quantum computing to solving physics problems, the simulation of non-Abelian gauge theories is one of the most desired. Gauge theories describe all fundamental forces of the Universe, but many of their features are inherently nonperturbative and resistant to analytical analysis. Even stochastic numerical methods, despite recent advancements, leave almost untouched the many-body baryon problem and real-time observables, both of which are of great phenomenological importance.
Gauge theories, like any field theory, have an infinite-dimensional Hilbert space, while digital quantum computers have finite and, for the foreseeable future, small quantum registers. Therefore, some kind of truncation of the Hilbert space is necessary. One common truncation is the substitution of the continuum, infinite space by a finite lattice. The use of a spatial lattice has the added benefit of providing an ultraviolet cutoff necessary for the nonperturbative definition of the theory. Since this step has been extensively studied in the context of lattice gauge theories and their numerical simulation with (classical) computers, much is known about it. For instance, the concept of universality controls the continuum limit that is necessary as the last step of these calculations. It is found that a variety of lattice Hamiltonians lead to the same continuum limit and that some symmetries are essential in recovering the proper continuum limit (for instance, gauge symmetry), while others arise naturally in the continuum limit (for instance, rotational invariance). In purely fermionic theories, the use of a finite spatial lattice is enough to render the Hilbert space finite-dimensional. In contrast, bosonic theories, defined over even a single spatial point (equivalent to the quantum mechanics of a particle in one dimension), already have an infinite-dimensional Hilbert space. As such, it is necessary to truncate not only physical space but also field space.
Several methods to accomplish this goal have been suggested in different models [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15]. For some of these proposals, besides the spatial continuum limit, an independent field space continuum limit is required. In this case, the dimension of the Hilbert space grows as the last limit is taken, and the issue we consider in the present paper (the cost in terms of quantum gates) becomes more severe. In other proposals, an argument is made that, even with the field space truncated, the resulting theories are in the same universality class as the targeted continuum theory. In that case, the limit where the dimension of the Hilbert space grows does not need to be explored, offering a great advantage in terms of simplicity. The combined discretization of physical and field space is sometimes called qubitization. For gauge theories in four spacetime dimensions, no general qubitization method has been established, although qubitizations have been found for simpler models [16][17][18][19][20]. Nevertheless, there are some old discretized models that could play this role and have the proper symmetries and fairly small Hilbert spaces. We use some of these models to estimate the gate complexity of simulating gauge theories.
In Sec. II we describe the compilation method that we use to find a quantum circuit implementing the Hamiltonian evolution. This method does not exploit specific features of gauge theories and is applicable to any Hamiltonian. The method is composed of many parts; all but the final one are known in the literature. To our knowledge, they have never been put together as we do here.
arXiv:2208.11789v2 [hep-lat] 22 Nov 2022
In Sec. III we apply this method to a couple of models to test our compilation method. These models, an ensemble of random diagonal Hamiltonians and the $Z_2$ gauge theory coupled to staggered fermions, have been studied before, and quantum circuits for them are known. We find that our method produces shorter circuits than those previously published.
In Sec. IV we study two possible qubitizations of SU(2) gauge theory, with a four- and a five-dimensional Hilbert space per link, respectively. These models have the exact SU(2) gauge invariance; thus, by standard universality arguments, one can suspect that they have the same continuum limit as SU(2) gauge theory. One could reasonably conjecture that a finite-dimensional model in the universality class of SU(2) gauge theories cannot be substantially simpler than these models and have a time evolution that can be approximated by a significantly smaller number of gates. We find that, for the four-dimensional model, one time evolution step involves 180 CNOT gates per plaquette, and the five-dimensional model involves 17,168 CNOT gates. The large circuit depth shows the importance of the field space truncation and stresses the need for truncations that are in the same universality class as the continuum model.
Finally, in Sec. V we summarize the results and speculate about some directions to pursue significantly reduced gate costs of simulating gauge theories.

II. THE COMPILATION METHOD
We describe here a method to take a given finite-dimensional Hamiltonian and encode its time evolution into a series of quantum gates; in other words, we describe the quantum compiler method we used. To facilitate use with most quantum programming languages, we compose our circuits only with common gates; in particular, we make heavy use of CNOT gates and one-qubit rotation gates, defined by
$$\mathrm{CNOT} = |0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes X, \qquad R_x(\phi) = e^{-i\phi X/2}, \quad R_y(\phi) = e^{-i\phi Y/2}, \quad R_z(\phi) = e^{-i\phi Z/2}, \tag{1}$$
for $0 \le \phi \le 4\pi$, all expressed in the computational basis $\{|0\rangle, |1\rangle\}$ of each qubit. Here, we denote the Pauli gates by $\{X, Y, Z\} = \{\sigma_1, \sigma_2, \sigma_3\}$ and the one-qubit identity gate by $I$. Notably, these rotations include the typical phase gate $S = R_z(\pi/2)$ (and its inverse $S^\dagger$) and easily generate the Hadamard gate $H = X R_y(\pi/2)$ (up to an irrelevant global phase); more broadly, we can perform arbitrary one-qubit unitary operations, sometimes denoted in an Euler angle basis $U3(\theta, \phi, \lambda) \equiv R_z(\phi) R_y(\theta) R_z(\lambda)$ up to a global phase. We denote the control ($a$) and target ($b$) qubits for each CNOT gate using subscripts, $\mathrm{CNOT}_{a,b}$, corresponding to the former and latter tensor factors in Eq. (1), respectively. For simplicity of the analysis in this work, we assume hardware with all-to-all connectivity between the qubits, and we do not include an error mitigation strategy in our resulting circuits. The noisiest quantum gates on current hardware are the CNOT gates, so we aim to minimize their use. In fact, we use the number of CNOT gates as the measure of the quality of the algorithm.
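The gate relations quoted above can be checked numerically. The following is a minimal sketch, assuming the standard convention $R_k(\phi) = e^{-i\phi\sigma_k/2}$; the helper names `rotation` and `equal_up_to_phase` are illustrative and not taken from the paper.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1, -1]).astype(complex)

def rotation(pauli, phi):
    """R(phi) = exp(-i phi P / 2) for a Pauli matrix P (uses P @ P = I)."""
    return np.cos(phi / 2) * I2 - 1j * np.sin(phi / 2) * pauli

def equal_up_to_phase(a, b):
    """True if a = exp(i theta) * b for some real theta."""
    k = np.argmax(np.abs(b))            # flat index of a nonzero entry of b
    phase = a.flat[k] / b.flat[k]
    return bool(np.isclose(abs(phase), 1) and np.allclose(a, phase * b))

# CNOT with the control as the first tensor factor and the target as the second.
P0 = np.diag([1, 0]).astype(complex)
P1 = np.diag([0, 1]).astype(complex)
CNOT = np.kron(P0, I2) + np.kron(P1, X)

HADAMARD = (X + Z) / np.sqrt(2)
S_GATE = np.diag([1, 1j])

s_ok = equal_up_to_phase(rotation(Z, np.pi / 2), S_GATE)        # S = R_z(pi/2)
h_ok = equal_up_to_phase(X @ rotation(Y, np.pi / 2), HADAMARD)  # H = X R_y(pi/2)
print(s_ok, h_ok)
```

Comparing up to a global phase matters here because $R_z(\pi/2)$ differs from the textbook $S$ gate by a factor $e^{-i\pi/4}$.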
It is convenient to break down our compilation method into five overall steps.
1) Trotterization: The first step is to break up the time evolution operator over a time $T$, generated by a Hamiltonian with $N$ terms, $H = \sum_{i=1}^{N} h_i$ with matrix norms $\|h_i\| \sim 1$, into $N_t$ time steps of duration $\delta t = T/N_t$:
$$e^{-iHT} \approx \left( \prod_{i=1}^{N} e^{-i h_i \delta t} \right)^{N_t},$$
incurring an overall error of $O(\delta t)$, due to the nonzero commutators among the $h_i$. Also, we point out that, in the case of gauge theories, most actions are a sum over plaquettes (or slightly larger Wilson loops), which commute if they do not share a link. So, one can split the Hamiltonian into two sets of mutually commuting terms, E(ven) and O(dd), and evolve each set independently:
$$e^{-i(H_E + H_O)\delta t} \approx e^{-iH_E \delta t}\, e^{-iH_O \delta t}$$
for a time step $\delta t$, where $H_E$ and $H_O$ denote the sums of the terms in each set. Since the plaquettes within each set commute, their corresponding quantum gates can be applied in parallel. Consequently, in the SU(2) gauge theory examples we consider later, it suffices to determine circuits that simulate a single plaquette.
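As a small illustration of the $O(\delta t)$ scaling of the first-order Trotter error, the sketch below compares exact and Trotterized evolution for a toy one-qubit Hamiltonian $H = X + Z$. The Hamiltonian, the total time $T = 1$, and the function names are assumptions chosen for illustration only; they do not come from the paper.

```python
import numpy as np

def evolve(h, t):
    """exp(-i h t) for a Hermitian matrix h, via its eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(-1j * w * t)) @ v.conj().T

def trotter_error(terms, total_time, n_steps):
    """Spectral-norm distance between exact and first-order Trotter evolution."""
    dt = total_time / n_steps
    step = np.eye(terms[0].shape[0], dtype=complex)
    for h in terms:                      # one Trotter step: product of exp(-i h_i dt)
        step = evolve(h, dt) @ step
    approx = np.linalg.matrix_power(step, n_steps)
    exact = evolve(sum(terms), total_time)
    return np.linalg.norm(approx - exact, 2)

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1, -1]).astype(complex)

err_100 = trotter_error([X, Z], 1.0, 100)
err_200 = trotter_error([X, Z], 1.0, 200)
print(err_100, err_200, err_100 / err_200)  # halving dt should roughly halve the error
```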
2) Decomposition into Pauli strings: Suppose we encode the degrees of freedom of one plaquette into $n$ qubits. The Hilbert space on which $H$ acts is then $2^n$-dimensional. A Hermitian $H$ can be written as a real linear combination of Pauli strings on $n$ qubits:
$$H = \sum_j c_j P_j,$$
where $P_j \in \{I, X, Y, Z\}^{\otimes n}$. Further, the orthogonality of Pauli strings under the trace inner product,
$$\mathrm{Tr}(P_i P_j) = 2^n \delta_{ij},$$
allows one to easily calculate the coefficients above:
$$c_j = \frac{1}{2^n} \mathrm{Tr}(P_j H).$$

3) Separation of Pauli strings into commuting sets: Since commuting Pauli strings can be simultaneously diagonalized, it is useful to split all Pauli strings into commuting sets or "clusters," so that the quantum gates used in diagonalizing each set need to be applied only once. The full time evolution can then be Trotterized as a product of the evolutions generated by each cluster; within a cluster, splitting the evolution into individual Pauli strings incurs no error, because the strings commute.
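The decomposition of step 2 can be sketched directly from the trace formula. This is a brute-force illustration that loops over all $4^n$ Pauli strings and is therefore only practical for small $n$; the function names are hypothetical.

```python
from functools import reduce
from itertools import product

import numpy as np

PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]]),
    "Z": np.diag([1, -1]).astype(complex),
}

def pauli_string(label):
    """Matrix of a Pauli string such as 'XIZ' (leftmost letter = first qubit)."""
    return reduce(np.kron, (PAULIS[ch] for ch in label))

def pauli_decompose(hamiltonian, tol=1e-12):
    """Return {label: c} with c = Tr(P H) / 2^n, keeping only nonzero coefficients."""
    n = hamiltonian.shape[0].bit_length() - 1   # dimension is 2^n
    coeffs = {}
    for letters in product("IXYZ", repeat=n):
        label = "".join(letters)
        c = np.trace(pauli_string(label) @ hamiltonian).real / 2**n
        if abs(c) > tol:
            coeffs[label] = c
    return coeffs

# Illustrative two-qubit Hamiltonian (not one of the paper's models).
H_example = 0.5 * pauli_string("XX") + 2.0 * pauli_string("ZI") - 1.5 * pauli_string("YZ")
coeffs = pauli_decompose(H_example)
print(coeffs)
```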
It is possible to efficiently decide whether two Pauli strings commute by counting the number of positions at which the Pauli matrices of the two strings do not commute. The two Pauli strings commute if and only if this number is even. Moreover, in general, partitioning Pauli strings into clusters of commuting operators can be performed sequentially. Namely, one can process the strings in the order they are given, placing each string in the first available cluster. If no cluster is available, we create a new one and insert the string. As a note of caution for general use, this clustering is not guaranteed to result in the minimal number of clusters,¹ and its outcome can in principle depend crucially on the order in which the strings are processed. Therefore, we reformulate the clustering in terms of the graph coloring problem, as previously described in Refs. [21][22][23][24].
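The commutation test and the sequential clustering described above can be sketched as follows. Pauli strings are represented as plain Python strings over the letters I, X, Y, Z; the function names are illustrative.

```python
def commute(p, q):
    """Two Pauli strings commute iff they anticommute at an even number of positions."""
    anticommuting = sum(
        1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b
    )
    return anticommuting % 2 == 0

def sequential_clusters(strings):
    """Place each string in the first cluster whose members all commute with it."""
    clusters = []
    for s in strings:
        for cluster in clusters:
            if all(commute(s, t) for t in cluster):
                cluster.append(s)
                break
        else:                      # no existing cluster accepts s
            clusters.append([s])
    return clusters

clusters = sequential_clusters(["XI", "ZI", "IX", "ZZ"])
print(clusters)
```

Because the result depends on the input order, running the same routine on a permuted list is a quick way to see the order sensitivity mentioned in the text.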
In this formulation, we construct a graph $G$ whose vertices are the Pauli strings, where an edge exists between any two Pauli strings $P_i$ and $P_j$ ($j \neq i$) if they do not commute. We recall that a vertex coloring of $G$ assigns colors to the vertices such that any two adjacent vertices have different colors [25]. Therefore, all Pauli strings with the same color commute. In this formalism, finding the optimal clustering amounts to coloring $G$ with the least required number of colors, known as the chromatic number of $G$. Finding the chromatic number of a graph is known to be NP-hard [26], and so there is no known efficient algorithm to do so. Nevertheless, several approximation algorithms provide a coloring close to optimal [27,28]. In the examples we discuss in Sec. III, we were able to find convenient commuting clusters with relative ease.

4) Diagonalization of each cluster: This step describes how to construct the quantum circuit of an operator that simultaneously diagonalizes the commuting Pauli strings. The approach that we use entails representing the strings in the tableau formalism described in Refs. [23,29,30].
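In particular, we follow the presentation in Ref. [23], where the tableau is composed of the $\mathcal{X}$, $\mathcal{Z}$, and $\mathcal{S}$ blocks, which we define below. Suppose we have $N$ Pauli strings, each of length $n$. Then, the $\mathcal{X}$ and $\mathcal{Z}$ blocks are $N \times n$ matrices, while the $\mathcal{S}$ block is an $N$-dimensional column vector. For the $i$th Pauli string $P_i$ in a given ordering, with coefficient $c_i \neq 0$, the entries of the corresponding row of each block are given by
$$\mathcal{X}_{ij} = \begin{cases} 1 & \text{if the $j$th factor of $P_i$ is $X$ or $Y$} \\ 0 & \text{otherwise} \end{cases}, \quad \mathcal{Z}_{ij} = \begin{cases} 1 & \text{if the $j$th factor of $P_i$ is $Z$ or $Y$} \\ 0 & \text{otherwise} \end{cases}, \quad \mathcal{S}_i = \begin{cases} 0 & \text{if } c_i > 0 \\ 1 & \text{if } c_i < 0 \end{cases}.$$
In other words, $\mathcal{X}_{ij}$ and $\mathcal{Z}_{ij}$ encode whether the $j$th digit of the Pauli string $P_i$ contains a factor of $X$ and $Z$, respectively, while $\mathcal{S}_i$ simply encodes the sign of the Pauli string's coefficient. Note that a Pauli string $P_i$ is diagonal if and only if it contains only factors $I$ and $Z$, or, equivalently, all the entries in row $i$ of $\mathcal{X}$ are 0. Consequently, the task of simultaneously diagonalizing commuting Pauli strings may be viewed as applying conjugations with unitary gates that reduce all entries of $\mathcal{X}$ to 0. Since we seek to transform one Pauli string into another, these unitary transformations are Clifford gates, generated by Hadamard, phase, and CNOT gates. The conjugation of these gates acts on the tableau according to the following rules [29]:
• Hadamard gate conjugation on qubit $j$, or $H(j)$:
$$\mathcal{S}_i \leftarrow \mathcal{S}_i \oplus \mathcal{X}_{ij}\mathcal{Z}_{ij}, \qquad \mathcal{X}_{ij} \leftrightarrow \mathcal{Z}_{ij}, \tag{9}$$
for all $i = 1, \ldots, N$. That is, we flip the sign of a Pauli string if the Pauli matrix at position $j$ is $Y$. Then, we swap the $j$th columns of $\mathcal{X}$ and $\mathcal{Z}$; practically speaking, we replace any factors $X$ or $Z$ in the $j$th digit of each string with $Z$ or $X$, respectively.
• Phase gate conjugation on qubit $j$, or $S(j)$:
$$\mathcal{S}_i \leftarrow \mathcal{S}_i \oplus \mathcal{X}_{ij}\mathcal{Z}_{ij}, \qquad \mathcal{Z}_{ij} \leftarrow \mathcal{Z}_{ij} \oplus \mathcal{X}_{ij}, \tag{10}$$
for all $i = 1, \ldots, N$. That is, for the Pauli string $P_i$ encoded in row $i$, we flip the sign of its coefficient if its $j$th digit is $Y$, and we replace any factors $X$ or $Y$ in the $j$th digit of each string with $Y$ or $X$, respectively.
• CNOT conjugation with control qubit $a$ and target qubit $b$, or $\mathrm{CNOT}(a, b)$:
$$\mathcal{S}_i \leftarrow \mathcal{S}_i \oplus \mathcal{X}_{ia}\mathcal{Z}_{ib}\,(\mathcal{X}_{ib} \oplus \mathcal{Z}_{ia} \oplus 1), \qquad \mathcal{X}_{ib} \leftarrow \mathcal{X}_{ib} \oplus \mathcal{X}_{ia}, \qquad \mathcal{Z}_{ia} \leftarrow \mathcal{Z}_{ia} \oplus \mathcal{Z}_{ib}, \tag{11}$$
for all $i = 1, \ldots, N$, and the interpretation is similar to those in Eqs. (9) and (10).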
Note that the update on $\mathcal{S}$ should be applied first in each case, because the sign update depends on the current values of $\mathcal{X}$ and $\mathcal{Z}$. Here, the symbol $\oplus$ denotes addition modulo 2.
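The tableau updates can be sketched in a few lines of NumPy. This sketch assumes the standard stabilizer-tableau conventions of Ref. [29] (sign bit 0 for a positive coefficient, $Y$ stored as $\mathcal{X} = \mathcal{Z} = 1$); the function names are illustrative, and qubits are 0-indexed in the code. It is applied here to the three commuting strings $IXX$, $ZYZ$, $XXI$.

```python
import numpy as np

def make_tableau(strings):
    """Build the X, Z, S blocks for Pauli strings with positive coefficients."""
    x = np.array([[ch in "XY" for ch in s] for s in strings], dtype=np.uint8)
    z = np.array([[ch in "ZY" for ch in s] for s in strings], dtype=np.uint8)
    s = np.zeros(len(strings), dtype=np.uint8)
    return x, z, s

def hadamard(x, z, s, j):
    s ^= x[:, j] & z[:, j]                       # sign flips where digit j is Y
    x[:, j], z[:, j] = z[:, j].copy(), x[:, j].copy()

def phase(x, z, s, j):
    s ^= x[:, j] & z[:, j]                       # sign update first, then X <-> Y
    z[:, j] ^= x[:, j]

def cnot(x, z, s, a, b):
    s ^= x[:, a] & z[:, b] & (x[:, b] ^ z[:, a] ^ 1)
    x[:, b] ^= x[:, a]                           # X on control spreads to target
    z[:, a] ^= z[:, b]                           # Z on target spreads to control

x, z, s = make_tableau(["IXX", "ZYZ", "XXI"])
cnot(x, z, s, 0, 1)
cnot(x, z, s, 1, 2)
phase(x, z, s, 2)
for q in range(3):
    hadamard(x, z, s, q)
print(x, z, s, sep="\n")   # the X block should now be all zeros
```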
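As an example, consider a Hamiltonian with mutually commuting Pauli strings, $H = IXX + ZYZ + XXI$.² The $\mathcal{X}$, $\mathcal{Z}$, and $\mathcal{S}$ blocks are
$$\mathcal{X} = \begin{pmatrix} 0 & 1 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 0 \end{pmatrix}, \qquad \mathcal{Z} = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}, \qquad \mathcal{S} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},$$
labeling the qubits, left to right, from 1 to 3. We can use $\mathrm{CNOT}_{1,2}$ and $\mathrm{CNOT}_{2,3}$ to nullify the first two columns of the $\mathcal{Z}$ stack. Then, we observe that the third columns of the $\mathcal{X}$ and $\mathcal{Z}$ stacks are identical. Hence, a phase gate conjugation on the third qubit nullifies the $\mathcal{Z}$ stack. Finally, we apply Hadamard gates on all three qubits to swap the columns of $\mathcal{X}$ and $\mathcal{Z}$. Thus, all of the resulting entries of the $\mathcal{X}$ stack are 0. Meanwhile, the $\mathcal{Z}$ and $\mathcal{S}$ stacks become
$$\mathcal{Z} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 0 \end{pmatrix}, \qquad \mathcal{S} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix},$$
so the diagonalized strings are $IZI$, $-IZZ$, and $ZII$.

In general, a set of commuting Pauli strings on $n$ qubits can be simultaneously diagonalized with at most $O(n^2)$ two-qubit gates [23,31]. Moreover, there exist several algorithms to accomplish this diagonalization, although not necessarily with the minimal number of two-qubit gates [23,32].

5) Exponentiation of I/Z Pauli strings via a Binary Tree: Finally, we describe our procedure to compile a quantum circuit simulating a Hamiltonian composed of $N$ diagonal Pauli strings (i.e., in $\{I, Z\}^{\otimes n}$). In particular, we propose a bookkeeping process to search for shared CNOT gates between strings within the cluster, in contrast with existing methods, which compile a circuit by reducing each Pauli string individually with CNOT conjugations (described below) and then look for cancellations post hoc.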
To start, for a single such Pauli string, we may use Eq. (11), more specifically the identity
$$\mathrm{CNOT}_{a,b}\, (Z_a Z_b)\, \mathrm{CNOT}_{a,b} = Z_b.$$
By successive applications of this identity, we can map any diagonal Pauli string to a string containing a single factor of $Z$. Exponentiating such Pauli strings can then be accomplished with a single one-qubit rotation gate, $R_z(2c_j\delta t)$, on a given qubit:
$$e^{-i c_j Z \delta t} = R_z(2 c_j \delta t).$$
This procedure is equivalent to the gate decomposition of Refs. [31,33]. For example, a circuit implementing $\exp(-i c_j ZZZ\, \delta t)$ in such a fashion is shown in Fig. 2(a). This prescription deviates slightly from the usual technique (e.g., Ref. [34]), which instead utilizes a single ancillary qubit on which the rotation is performed. That alternative procedure requires more CNOT conjugations and offers fewer opportunities for the gates involved to cancel. We display the same example, $ZZZ$, using this procedure in Fig. 2(b).
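The reduction of a single diagonal string to a one-qubit rotation can be verified numerically. The sketch below uses one possible CNOT ladder for $ZZZ$ (it is not claimed to be the exact layout of Fig. 2(a)); the values of the coefficient and time step are arbitrary, and the helper names are hypothetical. Qubits are 0-indexed.

```python
from functools import reduce

import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
P0 = np.diag([1, 0]).astype(complex)
P1 = np.diag([0, 1]).astype(complex)

def embed(n, ops):
    """Tensor product over n qubits; ops maps qubit index -> 2x2 matrix."""
    return reduce(np.kron, [ops.get(q, I2) for q in range(n)])

def cnot(n, control, target):
    return embed(n, {control: P0}) + embed(n, {control: P1, target: X})

def rz(n, qubit, phi):
    return embed(n, {qubit: np.diag([np.exp(-0.5j * phi), np.exp(0.5j * phi)])})

n, c, dt = 3, 0.37, 0.1            # arbitrary illustrative values

# Conjugating by this ladder maps ZZZ -> IZZ -> IIZ.
ladder = cnot(n, 1, 2) @ cnot(n, 0, 1)
circuit = ladder.conj().T @ rz(n, 2, 2 * c * dt) @ ladder

zzz_diag = reduce(np.kron, [np.array([1.0, -1.0])] * n)
exact = np.diag(np.exp(-1j * c * dt * zzz_diag))
print(np.allclose(circuit, exact))
```

The circuit uses four CNOT gates and no ancilla, which is the counting the text contrasts with the ancilla-based construction.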
Equipped now with this generic strategy of reducing individual Pauli strings to one-qubit rotations, we describe in the remainder of this section our own compilation method to efficiently simulate the diagonal Pauli strings together. Our procedure seeks to induce sharing of CNOT gates within the cluster, in the same spirit of reducing circuit depth as in step 4, where we share CNOT costs over a cluster via simultaneous diagonalization. In particular, as we apply the first CNOT gate in each conjugation reducing a Pauli string to a rotation, we keep track of those conjugations in a list or "stack" and classically perform the CNOT conjugations on the remaining strings in the cluster, as per Eq. (11). Moreover, we refrain from applying the second CNOT gate in the conjugation until after the rotation gates of all other Pauli strings have been executed on the circuit. In essence, this approach can be understood as inserting the identity (i.e., a pair of identical, consecutive CNOT gates that cancel) between every pair of remaining Pauli strings in the cluster. We provide instructions to efficiently carry out this procedure below, in terms of a tree traversal algorithm. (See, e.g., Ref. [35] for a review of tree data structures and traversal.)

[Figure 2(b): One ancillary qubit, prepared as $|0\rangle$, is the target of each CNOT and the site of the rotation, as prescribed in this more common method (cf. Ref. [34]).]

We proceed by representing the linear combination of $N \geq 1$ Pauli strings as a binary tree with $n$ levels, where $n$ is the total number of qubits (equivalently, the length of the Pauli strings). One calls the zeroth level of the tree the "root" and the nodes of the $n$th and final level the "leaves"; notably, the order of these $n$ tree levels may be a permutation of the corresponding qubit digits of the Pauli strings, equivalent to a temporary reordering of the qubit labels on our circuit.³ Each tree branch (i.e., path from the root to a leaf) corresponds to a Pauli string, and the values of its $n$ nodes are 0 or 1, corresponding to the factors $I$ or $Z$ of the string, respectively. Figure 3 shows the tree representation for the Hamiltonian $H = IIZ + IZI + IZZ + ZZZ$. For example, the values of the nodes in the bottom branch are 0, 0, and 1, corresponding to the Pauli string $IIZ$. For an arbitrary linear combination of Pauli strings, one includes the coefficient of each Pauli string in the corresponding leaf node. (This task can be done, for example, by creating a leaf object that inherits all the properties of a node object but has an additional coefficient attribute.)

Having mapped a commuting cluster of Pauli strings to this tree structure, we now present how to derive a quantum circuit from a traversal of the tree. Our routine traverses the tree via a preorder Depth-First Search (DFS) (see, e.g., Ref. [35]).⁴ Reading along each branch from the root to its leaf, when a pair of nodes both with value 1 is encountered, say at levels $i$ and $j > i$, a $\mathrm{CNOT}_{j,i}$ gate is applied on the circuit. We then classically perform a $\mathrm{CNOT}_{j,i}$ conjugation on all Pauli strings on the tree, i.e., we update the values along their branches according to Eq. (11); the child node ($j$) value is flipped when the parent node ($i$) value is 1. Lastly, the latter CNOT gate in the conjugation is stored in a stack if there is no copy already in the stack; otherwise, we discard both the CNOT gate and its copy in the stack (an operation we justify later). By this procedure, after reaching the leaf there is precisely one node ($i$) with value 1 along the branch, corresponding to a rotation on qubit $i$, $R_z(2c\,\delta t)$, where $c$ is the coefficient of the string stored with the leaf/branch. The procedure continues until the last branch is processed. Then, the CNOT gates from the stack are applied at the end of the circuit, from the latest to the earliest gate added to the stack. Figure 4 depicts the quantum circuit obtained for the example above. There, $\delta t$ is the Trotterization time step, and the qubits are ordered from top to bottom (i.e., as increasing levels in the tree structure of Fig. 3).

³ [...] sharing a factor $Z$ on that digit. We found that this modified ordering of tree levels yielded modest savings, $O(n)$, in CNOT cost in some cases, when compared with the straightforward ordering where qubit $j$ is labeled by tree level $j$, and resulted in greater cost in other cases. At this time, the cause of this higher cost is not clear. Determining how the CNOT cost depends on the relabeling of tree levels is the subject of further research.
Now, we explain why we may discard a CNOT gate and its copy in the stack. First, we note that all CNOT gates with the same target qubit appear consecutively in the stack and commute with each other. Additionally, the control qubit $j$ for each $\mathrm{CNOT}_{j,i}$ always corresponds to the child node in a pair $i, j$. Therefore, the stack will have at most $\binom{n}{2}$ CNOT gates. There may be opportunities for further CNOT gate simplification, particularly within this stack, due to cancellations described in Refs. [33,36]; however, by an argument given below, we observe that such cancellations will not eliminate the bulk of the remaining CNOT cost in our final circuit.
To conclude, we comment on the cancellations among CNOT gates induced by this algorithm. Within a cluster of commuting Pauli strings acting on $n$ qubits, we may have in general $N \leq 2^n$ such strings, and in fact we typically have $N \gg n$. So, without any CNOT gate cancellation, a simulation of each Pauli string independently could in principle require $O(nN)$ CNOT gates on the circuit before all one-qubit rotations have been performed and equally as many after (i.e., in the stack). However, on the stack there can only be $O(n^2)$ such gates, as we have argued above, so cancellation among CNOT gates in this stack alone can yield a factor of $\sim 2$ savings in CNOT cost; the bulk of possible CNOT gates in the stack will cancel for large enough $N \gg n$. Moreover, we find that, at least for the models under consideration, the CNOT cost turns out to be only $O(N)$, due to other savings among the CNOT gates performed before the one-qubit rotations. In Appendix A we establish $\sim N$ to be a generic lower bound on the number of CNOT gates entailed for direct simulations of $N$ (diagonal) Pauli strings.
We also stress that the use of a tree is a data structure choice. For example, it is possible to store the Pauli strings in a list instead and perform the same CNOT conjugations according to our procedure. However, processing this list can require more classical operations than traversing the binary tree.
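To make the bookkeeping concrete, here is a deliberately simplified, list-based sketch in the spirit of the procedure above. It is not the authors' implementation: it stores the strings in a list rather than a tree, and it replays every recorded CNOT in reverse at the end instead of cancelling duplicates in the stack, so it generally uses more CNOT gates than the method described in the text. All names are hypothetical, qubits are 0-indexed, and the coefficients are arbitrary. The final check compares the compiled circuit with $\exp(-iH\delta t)$ for $H$ built from the strings $IIZ$, $IZI$, $IZZ$, $ZZZ$.

```python
from functools import reduce

import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
P0, P1 = np.diag([1, 0]).astype(complex), np.diag([0, 1]).astype(complex)

def embed(n, ops):
    return reduce(np.kron, [ops.get(q, I2) for q in range(n)])

def cnot(n, control, target):
    return embed(n, {control: P0}) + embed(n, {control: P1, target: X})

def rz(n, qubit, phi):
    return embed(n, {qubit: np.diag([np.exp(-0.5j * phi), np.exp(0.5j * phi)])})

def compile_diagonal(terms, dt):
    """terms: list of (coefficient, string over 'I'/'Z'), no all-identity string."""
    rows = [[int(ch == "Z") for ch in s] for _, s in terms]
    gates, applied = [], []
    for k, (coeff, _) in enumerate(terms):
        while sum(rows[k]) > 1:
            i, j = [q for q, bit in enumerate(rows[k]) if bit][:2]
            gates.append(("CNOT", j, i))          # control j, target i
            applied.append((j, i))
            for row in rows:                      # classical conjugation of all strings
                row[j] ^= row[i]
        gates.append(("RZ", rows[k].index(1), 2 * coeff * dt))
    gates += [("CNOT", c, t) for c, t in reversed(applied)]   # uncompute
    return gates

def circuit_unitary(n, gates):
    u = np.eye(2**n, dtype=complex)
    for name, a, b in gates:
        u = (cnot(n, a, b) if name == "CNOT" else rz(n, a, b)) @ u
    return u

terms = [(0.3, "IIZ"), (-0.7, "IZI"), (1.1, "IZZ"), (0.5, "ZZZ")]
dt = 0.1
gates = compile_diagonal(terms, dt)

z_diag = {"I": np.ones(2), "Z": np.array([1.0, -1.0])}
h_diag = sum(c * reduce(np.kron, [z_diag[ch] for ch in s]) for c, s in terms)
exact = np.diag(np.exp(-1j * dt * h_diag))
print(np.allclose(circuit_unitary(3, gates), exact))
```

Deferring the uncomputation to the end is what allows CNOT gates applied for one string to be reused by later strings; the stack cancellation described in the text would reduce the count further.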
The above discussion summarizes the method to decompose the time evolution into ordinary quantum gates.It is important to note that steps 4 and 5, the most involved in the whole method, can be bypassed at the expense of arriving at an algorithm with a larger number of CNOT gates.In fact, we can use H and S gates to diagonalize each of the Pauli strings separately; however, the increase in CNOT cost is substantial in the cases we considered.

III. EXAMPLES
We now apply the general method outlined in the previous section to a couple of specific models. These examples, random diagonal Hamiltonians and $Z_2$ gauge theory, are discussed here in order to establish the method as competitive with other methods one might consider. The $Z_2$ gauge theory is very simple and has actually been simulated on real quantum computers (with tiny lattices).

A. Random Diagonal Hamiltonians
To assess the relative performance of our method, particularly of step 5 above, we consider CNOT costs to simulate already diagonalized $n$-qubit Hamiltonians, which may be written as a linear combination of $N \leq 2^n$ operators in $\{I, Z\}^{\otimes n}$.
Consider first the $N = 2^n$ case. It is known that the time evolution operator $U = \exp(-iH\delta t)$ due to such a Hamiltonian can be implemented exactly with $O(n 2^n)$ CNOT gates, because each Pauli string can be implemented with at most $2(n-1)$ CNOT gates. A previous study [37] showed that this unitary operator can be realized with $2^{n+1} - 3$ CNOT gates, roughly twice the minimal cost. In Appendix B, we show that our method produces a circuit with $2^n - 2$ CNOT gates for arbitrary $n \geq 2$ qubits, achieving the optimal cost derived in Ref. [37]. This result lends credence to the claim that the method we outlined is competitive with any other method currently available.
In the $N < 2^n$ case, there are many choices of Hamiltonians, and the final circuit depends on the specific Pauli strings contained in the Hamiltonian. Thus, we applied our compilation method to a random sample of $M = 10$ Hamiltonians with $N$ diagonal Pauli strings and considered how the number of CNOT gates depends on $N$. There exist $\binom{2^n - 1}{N}$ distinct sets of $N$ Pauli strings, and so there are certain cases where the number of possible Hamiltonians is less than $M$; in these cases, we explored all of the fewer than $M$ possible Hamiltonians of $N$ Pauli strings. Otherwise, the $M$ Hamiltonians we explored comprise only a small subset of the Hamiltonians with $N$ Pauli strings. Of course, we expect that increasing the value of $M$ will produce yet more precise estimates of the average CNOT cost. For each value of $N$, we calculate a mean and a standard error on the mean.
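The sampling protocol can be sketched as below. The sketch assumes, as the count $\binom{2^n-1}{N}$ suggests, that the all-identity string is excluded; strings are encoded as integers whose bits mark the positions of $Z$ factors. The function names and the random seed are illustrative.

```python
import itertools
import math
import random
import statistics

def num_hamiltonians(n, num_strings):
    """Number of distinct sets of N non-identity diagonal Pauli strings on n qubits."""
    return math.comb(2**n - 1, num_strings)

def sample_string_sets(n, num_strings, m, rng):
    """Return all sets if there are at most m of them, else m distinct random sets."""
    labels = range(1, 2**n)                 # 0 would be the identity string
    if num_hamiltonians(n, num_strings) <= m:
        return [frozenset(c) for c in itertools.combinations(labels, num_strings)]
    sets = set()
    while len(sets) < m:
        sets.add(frozenset(rng.sample(labels, num_strings)))
    return list(sets)

def mean_and_sem(values):
    """Sample mean and standard error on the mean."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))

rng = random.Random(0)
small = sample_string_sets(2, 2, 10, rng)   # only 3 possible sets: all are returned
large = sample_string_sets(8, 5, 10, rng)   # 10 random sets out of very many
print(len(small), len(large), num_hamiltonians(8, 5))
```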
A recent study [38] formulated the problem of minimizing the CNOT gates as solving the Traveling Salesman Problem (TSP) (see, e.g., Ref. [26]). In this case, the vertices of the graph correspond to commuting Pauli strings, and the weight on each edge is given by a CNOT-cost function they define. Given that the TSP is NP-hard in general [26], they use the Christofides-Serdyukov (CS) algorithm, which returns a solution with a CNOT cost that is at most a factor of 1.5 times the minimal cost [39,40]. Moreover, they demonstrate that their ordering of Pauli strings on a quantum circuit based on the CS algorithm outperforms standard solutions such as those based on lexicographical ordering [41,42], "Deplete Group," and "Magnitude" ordering [43]. We therefore apply the method of Ref. [38] to the diagonal Pauli strings we generated. Figure 5 shows the CNOT cost versus the number of Pauli strings for $n = 8$ qubits. We also include a lower bound derived in Appendix A. The resulting CNOT gate counts are comparable for small $N$. However, as $N$ increases, our procedure yields a smaller number of CNOT gates. In addition, the count from our method approaches this lower bound, while the other does not.
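The lower bound quoted for Fig. 5 is the number of Pauli strings acting nontrivially on more than one qubit. A minimal sketch of that count follows; the function names are illustrative, and the closed form for the full set of non-identity diagonal strings is elementary counting rather than a result quoted from the paper.

```python
from itertools import product

def cnot_lower_bound(strings):
    """Count the strings that act nontrivially on more than one qubit."""
    return sum(1 for s in strings if sum(ch != "I" for ch in s) > 1)

def full_diagonal_lower_bound(n):
    """Same count for all 2^n - 1 non-identity I/Z strings: n of them have weight 1."""
    return 2**n - 1 - n

example_bound = cnot_lower_bound(["IIZ", "IZI", "IZZ", "ZZZ"])
all_strings_3 = ["".join(p) for p in product("IZ", repeat=3) if "Z" in p]
print(example_bound, cnot_lower_bound(all_strings_3), full_diagonal_lower_bound(8))
```

For $n = 8$ the bound for the full set is 247, close to the $2^n - 2 = 254$ CNOT gates quoted in the text for the complete diagonal Hamiltonian.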

B. Z2 Gauge Theory
As a first example of a gauge theory, we consider the smallest nontrivial group, $Z_2$, for our gauge symmetry. Despite involving only a small discrete, Abelian group, its simplicity as well as its familiarity in previous literature (e.g., Refs. [44][45][46]) on quantum simulations of lattice gauge theories make it a natural candidate for introducing our methods. Furthermore, the case of a $Z_2$ gauge theory coupled to staggered fermions yields a slightly less trivial example, where comparison of our relatively low circuit depth with that of circuits suggested in past studies [46] shows an early benefit of our methods of circuit compilation.

Figure 5. [Plot of CNOT cost versus number of Pauli strings; legend: "Current Method" (circles) and the method of Ref. [38] (triangles).] The blue data points are obtained using the compilation method described in step 5. The red data points are obtained using the method in Ref. [38]. A lower bound for each case is the number of Pauli strings acting nontrivially on more than one qubit, as derived in Appendix A. Uncertainty bars, derived from statistical samples of 10 Hamiltonians per choice of $N$, are smaller than the data points depicted.
In particular, we consider here staggered fermions in 1 + 1 dimensions, as given in Ref. [46] and similar to a related model in Ref. [44]. There, an ancillary qubit was used to simulate each term of the Hamiltonian, Eq. (16), where $L$ is the number of fermion sites, $m$ is the fermion mass, $\sigma_k(i)$ denotes the Pauli matrix $\sigma_k$ applied to the subspace for the $i$th fermion site, and $\sigma_k(i, i+1)$ denotes $\sigma_k$ applied to the subspace for the gauge link connecting sites $i$ and $i+1$.
Choosing $L = 4$ with open boundary conditions, an encoding of the global Hilbert space with only seven qubits is sufficient. However, with an extra eighth, ancillary qubit included in the methods of Ref. [46], there are fewer opportunities for rearrangement and cancellation of CNOT gates on the quantum circuit. Consequently, the total CNOT cost for simulation of one time step was found to be 36. Using the methods of Sec. II, one can perform the same simulation without this ancilla and therefore reduce the CNOT cost to only 18. The Hamiltonian in Eq. (16) is a linear combination of 13 Pauli strings. Using the sequential algorithm described in step 3 in Sec. II, we obtain three clusters of commuting Pauli strings, containing seven, four, and two Pauli strings. Each cluster was diagonalized as discussed in Sec. II before implementing our routine to construct the quantum circuit for each of the corresponding diagonalized clusters of Pauli strings. Setting the fermion mass $m = 1$, the whole quantum circuit is shown in Fig. 6.
This benefit for a relatively simple model shows promise that our methods for circuit compilation will help keep CNOT cost relatively low compared to standard methods as we proceed to consider larger gauge groups.

IV. QUBITIZATION OF NON-ABELIAN GAUGE THEORIES
We will now consider a broader class of models where the gauge dynamical degrees of freedom can be thought of as lying on the links of a spatial cubic lattice. The Hilbert space is the product of the local Hilbert spaces of each link (on $q$ qubits), $\mathcal{H} = \bigotimes_{\text{links}} \mathbb{C}^{2^q}$, with a Hamiltonian, Eq. (17), that contains a kinetic term $K$ and a plaquette term built from $U_{x,\mu}$, an operator acting on the Hilbert space of the link connecting sites $x$ and $x + \hat{\mu}$.⁵
The "kinetic" term $K$ will be specified later, as it depends on the model considered. In a standard formulation of quantum SU($N$) gauge theories, each entry of the $N \times N$ unitary matrix $U_{x,\mu}$ is itself an operator acting on an infinite-dimensional Hilbert space. Here, instead, we take each entry to be an operator acting on a finite-dimensional Hilbert space. Thus, for SU(2) gauge theory, on which we focus for the remainder of this section, we build $U_{x,\mu}$ from matrices $\Gamma_\alpha$ ($\alpha = 1, 2, 3, 4$) that depend on the specific model. Notice that for each $x$ and $\mu$, $U_{x,\mu}$ carries two pairs of matrix indices: one taking values 1, 2 and another taking values up to the dimension of $\Gamma_\alpha$. Furthermore, a local SU(2) gauge transformation corresponds to an assignment of a different element of SU(2) at each site. Under gauge transformations, the link operators $U_{ab}$, with $a, b = 1, 2$, are transformed by two elements of SU(2), $L = \exp(i\alpha_k\sigma_k/2)$ and $R = \exp(i\beta_k\sigma_k/2)$, associated with the two ends of the link. These transformations clearly leave the plaquette term in Eq. (17) unchanged. They are implemented within the Hilbert space as in Eq. (19), with generators $J^L_k$ and $J^R_k$ ($k = 1, 2, 3$) of the same dimension as $\Gamma_\alpha$. In Ref. [47] it was shown that the 10 matrices $\Gamma_\alpha$, $J^L_k$, $J^R_k$ satisfy Eq. (19) if they are chosen to be the generators of a representation of SO(5). As such, a choice of irreducible representation (irrep) of SO(5) gives a model with a finite-dimensional Hilbert space and exact SU(2) gauge symmetry and, therefore, a candidate for a qubitization of SU(2) gauge theory. Generically, the smaller irreps yield lower-dimensional Hilbert spaces for the theory, and fewer quantum gates are expected to be needed to represent its time evolution. The two smallest irreps are discussed here.

A. N = 4 (spinor) irrep
The smallest irrep of SO(5) is the four-dimensional spinor representation. However, the resulting list of operators so far only suggests the form of a plaquette term, while the corresponding kinetic term is proportional to the identity. A theory with only a single plaquette term cannot have a continuum limit, because the eigenvectors of the Hamiltonian are independent of the only parameter in the theory, $g^2$. Therefore, the correlation length of the ground state cannot be made arbitrarily large (in units of the lattice spacing). As such, it is necessary to include another term to make the eigenvectors and, consequently, the correlation lengths tunable. In the four-dimensional representation, there is one such term, $K$. Together, the plaquette term and $K$ form a plausible qubitization of the SU(2) gauge theory. This model was first considered in Ref. [47], where it was speculated to be in the same universality class as SU(2) gauge theory in 3+1 spacetime dimensions. To encode this formulation on a quantum computer, only two qubits per link are required.

Figure 6. Quantum circuit approximately simulating one time step for a $Z_2$ gauge theory with staggered fermions in 1 + 1 dimensions for $L = 4$ lattice sites without a periodic boundary condition. We use the first qubit to encode the first fermion site, the second qubit for the first gauge link, the third qubit for the second site, and so on. The boxes denote the clusters of commuting Pauli strings, which were partitioned as follows: the Hamiltonian $h_1$ consists of the one-body terms of Eq. (16), while $h_2$ consists of the multi-site terms acting on the gauge links (1, 2) and (3, 4), and $h_3$ likewise consists of the multi-site terms acting on link (2, 3). We define the phase $\phi \equiv \Delta t/2$.

Now, we discuss the results of applying our method to this model. The $K$ term consists of four Pauli strings, which are all diagonal one-qubit operators; they can all be implemented without any CNOT gate. Next, we turn to the plaquette term in Eq. (17). After tracing out the shared two-dimensional space, we can write it as a linear combination of products of $\Gamma$ matrices on the four links of the plaquette, with coefficients $c_{ijkl}$. By a counting argument or explicit computation, one finds that only 64 of the coefficients $c_{ijkl}$ are nonzero. In addition, each $\Gamma_k$ in this representation corresponds to a single Pauli string. Therefore, the expansion of the plaquette term results in 64 Pauli strings.
Using the sequential algorithm described in step 3, these latter 64 operators can be grouped into four clusters, each containing 16 Pauli strings.We used 2 × 28 = 56 CNOT gates altogether to diagonalize these clusters.With each cluster diagonalized, we can apply the method developed in step 5 to compile the time evolution circuit.The total CNOT gate count to compile the time evolution of all the diagonal clusters is 124.Thus, we use 56 + 124 = 180 CNOT gates per plaquette to simulate a time evolution step of this model.

B. N = 5 (fundamental) irrep
In a five-dimensional truncation, we consider the fundamental representation of SO(5), given by Here, a non-trivial "kinetic term" of the form (l indexes the links) is possible; note here This model is sometimes called the Horn model [48].^6 Three qubits are required to encode the five-dimensional Hilbert space in each link. Since 2^3 = 8 = 5 + 3, three directions are "wasted" in this encoding. The matrix elements of the Hamiltonian involving these extra three dimensions are unphysical and can be chosen arbitrarily, with no change in the physics. Unfortunately, we have not been able to exploit this freedom to simplify the quantum circuits. Notably, one may attempt to encode the global Hilbert space at once in a smaller basis of ⌈4 log_2(5)⌉ = 10 qubits, where a given qubit may actually include information from the local Hilbert spaces of multiple links. However, though one might expect to reduce circuit depth by reducing the number of qubits in the encoding, we instead find that such an encoding results in an even greater circuit depth, as it requires more terms in the Hamiltonian in order to derive an operator that is applied only on the desired gauge link.

^6 There has been work to generalize the Horn model by including the coupling to gradually larger representations of SU(N) for N ≥ 2; however, while this effort may help extrapolate away the error of this truncation of the links [2,5], it necessarily relinquishes the relatively low Hilbert space dimension of the Horn model.
Figure 7. Quantum circuit simulating a time step for the kinetic term K as encoded in Eq. (25). Here, we have set φ ≡ 3δt/8.
The operator K can be decomposed into Pauli strings as (a sum over links of) where we have appended each K(l) with zero eigenvalues to encode over three qubits. We note in passing that other paddings of these operators are possible, but we did not find significant changes to gate counts. The operator exp(iφ III) corresponds to a global phase, having no effect on measurements, and is ignored. Because these Pauli strings are already diagonal, we can immediately apply the routine discussed in step 5 to obtain the quantum circuit in Fig. 7. Note that this circuit has to be applied to every link individually. Therefore, the cost of applying K for an elementary plaquette is 16 CNOT gates.
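As a minimal illustration of why diagonal Pauli strings are cheap to exponentiate, the sketch below checks two facts used above on computational basis states: conjugating a single-qubit Z rotation by one CNOT pair reproduces exp(iφ ZZ), and the identity string contributes only a global phase. The function names and the convention that CNOT(control, target) flips the target bit are assumptions made for this example; it is not the compilation routine of step 5.

```python
import cmath
from itertools import product

def z_eigenvalue(bits, z_mask):
    """Eigenvalue (+1 or -1) of a diagonal Pauli string on a basis state.
    z_mask[q] == 1 means the string has Z on qubit q, 0 means I."""
    parity = sum(b for b, z in zip(bits, z_mask) if z) % 2
    return -1 if parity else 1

def exact_phases(phi, z_mask):
    """Diagonal entries of exp(i*phi*P), keyed by basis state."""
    n = len(z_mask)
    return {bits: cmath.exp(1j * phi * z_eigenvalue(bits, z_mask))
            for bits in product((0, 1), repeat=n)}

def cnot(bits, control, target):
    """Action of CNOT on a basis state: flip target if control is 1."""
    out = list(bits)
    out[target] ^= out[control]
    return tuple(out)

def circuit_phases(phi):
    """CNOT(0->1), then exp(i*phi*Z) on qubit 1, then CNOT(0->1)."""
    result = {}
    for bits in product((0, 1), repeat=2):
        mid = cnot(bits, 0, 1)
        phase = cmath.exp(1j * phi * (-1 if mid[1] else 1))
        final = cnot(mid, 0, 1)          # second CNOT undoes the first
        assert final == bits             # the circuit is diagonal
        result[bits] = phase
    return result

phi = 0.37  # arbitrary test angle
zz_exact = exact_phases(phi, (1, 1))
zz_circuit = circuit_phases(phi)
zz_matches = all(abs(zz_exact[b] - zz_circuit[b]) < 1e-12 for b in zz_exact)

# The identity string III multiplies every basis state by the same phase.
iii = exact_phases(phi, (0, 0, 0))
is_global_phase = len({round(v.real, 12) + 1j * round(v.imag, 12)
                       for v in iii.values()}) == 1

print("exp(i phi ZZ) reproduced with two CNOTs:", zz_matches)
print("exp(i phi III) is a global phase:", is_global_phase)
```

The two-CNOT pattern for a weight-two string is the elementary building block; the savings discussed in the text come from sharing and cancelling such CNOTs across consecutive strings.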
As discussed in Sec. IV A, there are 64 terms in the Σ_{i,j,k,l=1}^{4} c_{ijkl} Γ_{ijkl} expansion of the plaquette term. In this representation, however, each Γ_i can be decomposed into four Pauli strings. Consequently, each Γ_{ijkl} decomposes into 4 × 4 × 4 × 4 = 256 Pauli strings, and each plaquette therefore is a combination of 64 × 256 = 16,384 Pauli strings. By the sequential algorithm discussed in step 3 of the method, we find that these Pauli strings can be grouped into 64 clusters, each containing 256 commuting Pauli strings. We diagonalize each cluster as per step 4 and compile the circuit of each resulting diagonal cluster as per step 5.
As expected, most of the complexity of this procedure lies in compiling the circuit of the diagonal cluster. In fact, the total number of CNOT gates required for diagonalizing all the clusters is 2 × 320 = 640. On the other hand, the total number of CNOT gates needed to compile all 64 diagonal clusters is 16,512. Therefore, we use 17,152 CNOT gates to compile one plaquette term of the Hamiltonian. Since 16 CNOT gates were used to compile the kinetic part of the Hamiltonian, we obtain a circuit with 17,168 CNOT gates, or ≈ 1.05 CNOT gates per Pauli string.
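The bookkeeping for the two truncations can be collected in a few lines. The sketch below only re-adds the numbers quoted in Secs. IV A and IV B; the dictionary layout and the helper name `total_cnots` are hypothetical choices for this illustration.

```python
# Per-plaquette CNOT counts quoted in the text for one time step.
MODELS = {
    "spinor (N=4)": {
        "pauli_strings": 64,            # one Pauli string per nonzero c_ijkl
        "diagonalization": 2 * 28,      # diagonalizing the 4 clusters
        "diagonal_clusters": 124,       # evolving the diagonalized clusters
        "kinetic": 0,                   # K needs no CNOT in this irrep
    },
    "fundamental (N=5)": {
        "pauli_strings": 64 * 4**4,     # 64 terms, 4 strings per Gamma factor
        "diagonalization": 2 * 320,     # diagonalizing the 64 clusters
        "diagonal_clusters": 16512,     # evolving the diagonalized clusters
        "kinetic": 16,                  # K on the four links of a plaquette
    },
}

def total_cnots(model):
    """Sum of all CNOT contributions for one plaquette time step."""
    return (model["diagonalization"]
            + model["diagonal_clusters"]
            + model["kinetic"])

for name, model in MODELS.items():
    total = total_cnots(model)
    ratio = total / model["pauli_strings"]
    print(f"{name}: {model['pauli_strings']} Pauli strings, "
          f"{total} CNOTs, {ratio:.2f} CNOTs per Pauli string")
```

Going from the four-dimensional to the five-dimensional link space multiplies the number of Pauli strings by 256 and the CNOT count by roughly two orders of magnitude, which is the sensitivity to truncation emphasized in the text.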
It is worth mentioning a recent proposal to exploit the symmetries of wave functions satisfying the Gauss's law condition of SU(2) lattice gauge theory to more sparsely (i.e., in fewer dimensions) encode the nontrivial representations of SU(2) × SU(2) on qubits [10,49]. In particular, the space of wave functions for lattices whose vertices each see exactly three links meeting can be constrained at the classical level (i.e., before encoding on qubits) in a relatively simple fashion to solve the SU(2) Gauss's law condition of the theory automatically and thereby reduce the 2 ⊕ 2 representation subspace to one physical dimension. This design results in a model requiring exactly one qubit per gauge link, at the price of severely restricting the geometry of gauge links and vertices. In order to avoid this restriction and more broadly allow for arbitrary d spatial dimensions, we consider a square lattice geometry and focus our attention on the aforementioned encoding of each gauge link on qubits separately, without yet imposing Gauss's law. This choice keeps the procedure modular, so that it can serve as a building block for larger lattice model simulations. Nonetheless, the methods we present may also be applied to a Hamiltonian with a different geometry or encoding, and we will return to a discussion of non-cubic lattice geometries later in Sec. V.

V. DISCUSSION
In this paper, we have offered a contribution to the very thorny and important problem of understanding, in detail, the resource requirements for quantum simulations, including but not limited to SU(2) gauge theories. In particular, we assembled state-of-the-art quantum circuit compilation methods and showed specifically how SU(2) gauge theory simulations would be prohibitive on digital quantum computers in the current era. Moreover, we can see from these methods that larger-dimensional gauge links for these theories will also necessitate quantum error correction codes to address the severe noise introduced by such large resource estimates. We hope that other efforts will build upon this work as a foundation, either to develop more efficient resource usage or perhaps to show these circuits to be as efficient as we can obtain.
We believe that the compilation method we used is at least competitive with the other methods currently available, as it performs well in several cases where other methods have been tried or rigorous bounds are known.Nevertheless, the compilation method is very generic, as it applies to any finite-dimensional Hamiltonian and makes no use of specific features of gauge theories.
In particular, the fact that the physical Hilbert space is a small subspace of the theory (as many states are just gauge copies of each other) was not exploited here.Conversely, it is not obvious how to utilize gauge invariance in this formulation, since it is generally acknowledged that rewriting a gauge theory in terms of gauge-invariant quantities leads to non-localities, which could make the circuit depths even larger.Still, this reduction is a possible avenue for improvement, and perhaps a gauge-fixed version of the formalism can also be used for this purpose.
There is another sense in which the large Hilbert spaces we encountered could be shrunk. For instance, the five-dimensional local Hilbert space of the Horn model was encoded onto three qubits, leaving 2^3 − 5 = 3 dimensions unused. This redundancy is present in any qubitization with a local Hilbert space whose dimension is not a power of 2. It is possible that some alternative encoding of these links will permit a quantum circuit to properly evolve the five-dimensional subspace that actually describes each link of the model and yet has a different action on the remaining three dimensions from what we have prescribed. At this time, we do not have a systematic way of exploiting this freedom to shorten circuits.
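The counting behind this redundancy is easy to reproduce. The following sketch, with the hypothetical helper `min_qubits`, computes the smallest register for a given local dimension, the number of unused basis states, and the size of a joint encoding of the four links of a plaquette (the 10-qubit alternative mentioned in Sec. IV B). It assumes nothing about the Hamiltonian itself.

```python
def min_qubits(dim):
    """Smallest number of qubits whose 2**n basis states can hold `dim` states.
    Uses integer arithmetic to avoid floating-point log2 rounding."""
    return (dim - 1).bit_length()

LINKS_PER_PLAQUETTE = 4

for dim in (4, 5):
    per_link = min_qubits(dim)
    unused = 2**per_link - dim
    separate = LINKS_PER_PLAQUETTE * per_link
    joint = min_qubits(dim**LINKS_PER_PLAQUETTE)
    print(f"local dimension {dim}: {per_link} qubits per link, "
          f"{unused} unused states per link, "
          f"{separate} qubits link-by-link, {joint} qubits jointly")
```

For the five-dimensional truncation this gives three qubits per link with three unused states, 12 qubits for a link-by-link encoding of a plaquette, and 10 qubits for a joint encoding, matching the figures quoted in the text.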
Finally, a key point: the large circuit depths we found stress the importance of having a field space truncation with small dimensions. In particular, this point reinforces the need for a qubitization in the same universality class as the continuum model. This way, the final continuum limit will require an increase in the number of spatial lattice sites (linearly increasing the number of qubits required yet not the circuit depth) but not that the field truncation be lifted, a step that would enormously increase the circuit depth, as we saw in the examples shown here.
a one-qubit operator, as described at the beginning of step 5 in Sec. II. We argue that at least one CNOT is required between the rotation gates for simulating diagonal exp(iφ_i P_i) and exp(iφ_j P_j) sequentially, for j ≠ i and with P_i or P_j consisting of at least two nontrivial factors. Here, the only restriction on the phases φ_i and φ_j is that they cannot both be equal to odd integer multiples of π.
Since P_i and P_j are diagonal, they may contain only factors I (trivial) and Z (non-trivial). Let ω[P_i] be the number of non-trivial factors in P_i, or equivalently the Hamming weight of P_i (i.e., ω[I] = 0 and ω[Z] = 1). In addition, let C_i be a product of CNOT gates whose conjugation reduces P_i to a string of Hamming weight 1. We will prove that C_i ≠ C_j, implying the existence of at least one CNOT gate that may not be canceled when implementing P_i and P_j in sequence.
We can readily observe that C_i ≠ C_j if ω[P_i] ≠ ω[P_j], as this assumption implies that C_i and C_j are sequences of different lengths. Therefore, we turn our focus to the case ω[P_i] = ω[P_j] = k ≥ 2, for which we will prove C_i ≠ C_j by induction on k.
First, consider the base case k = 2: Let P_i ≠ P_j and ω[P_i] = ω[P_j] = 2. Perform a CNOT_{a,b} gate conjugation, with a and b chosen such that P_i → P'_i with ω[P'_i] = 1. Suppose the same CNOT conjugation also results in ω[P'_j] = 1. Then Eq. (11) implies necessarily P'_i = P'_j, contradicting the assumption P_i ≠ P_j. (To see this conclusion, repeat the same conjugation again on P'_i and P'_j, reproducing P_i and P_j respectively, which both have Hamming weight 2. Consequently, we see that P'_i and P'_j must each act nontrivially only on qubit b, meaning P'_i = P'_j.) Therefore, there is a distinct CNOT conjugation needed to implement P_j, and hence C_i ≠ C_j.
Having taken care of the base case, we state the induction hypothesis: there exists an integer k ≥ 2 such that if P_i ≠ P_j while ω[P_i] = ω[P_j] = k, then C_i ≠ C_j. Finally, we prove the inductive step; consider the case where P_i ≠ P_j while ω[P_i] = ω[P_j] = k + 1. Perform a CNOT_{a,b} conjugation resulting in P'_i and P'_j such that ω[P'_i] = k. If ω[P'_j] ≠ k, then we are done, by the observation made before the beginning of this induction proof. Now, if ω[P'_j] = ω[P'_i] = k, we consider the remaining cases: P'_i = P'_j (already yielding a contradiction to our assumption P_i ≠ P_j by a similar reasoning as in the base case) and P'_i ≠ P'_j. In this latter case, we may invoke the inductive hypothesis to see that C'_i and C'_j, corresponding to P'_i and P'_j, are not equal. Therefore, we also have C_i ≠ C_j, completing our inductive step.
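The base case of this argument can be checked exhaustively on a small register. The sketch below represents a diagonal Pauli string by its pattern of Z factors and assumes the standard conjugation rule for CNOT with control a and target b, namely Z_b → Z_a Z_b with Z_a unchanged (taken here to be the content of Eq. (11), which is an assumption of this example). The function names are hypothetical.

```python
from itertools import combinations, permutations

def conjugate_cnot(z_bits, control, target):
    """Z pattern of a diagonal Pauli string after conjugation by CNOT.
    A Z on the target toggles the Z on the control; the target is unchanged."""
    out = list(z_bits)
    out[control] ^= z_bits[target]
    return tuple(out)

def weight(z_bits):
    """Hamming weight: number of Z factors in the string."""
    return sum(z_bits)

n = 4  # small register, enough to enumerate every case
weight_two = [tuple(1 if q in pair else 0 for q in range(n))
              for pair in combinations(range(n), 2)]

# Look for a single CNOT that reduces two *different* weight-2 strings
# to weight 1 at the same time; the base case says none exists.
shared_reductions = [
    (p_i, p_j, a, b)
    for p_i, p_j in permutations(weight_two, 2)
    for a, b in permutations(range(n), 2)
    if weight(conjugate_cnot(p_i, a, b)) == 1
    and weight(conjugate_cnot(p_j, a, b)) == 1
]

print("shared weight-reducing CNOTs found:", len(shared_reductions))
```

Under the stated rule the weight can drop from 2 to 1 only when both the control and the target carry a Z, which pins down the string uniquely; the enumeration simply confirms this for n = 4.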
Iterating this argument over all consecutive pairs of the N diagonal Pauli strings in a given order for Trotterized time evolution, we find that there must be greater than N_2 CNOT gates used to implement the complete circuit, where N_2 is the number of Pauli strings with Hamming weight greater than one, which means N − (n + 1) ≤ N_2 ≤ N. In addition, when considering our method prescribed in Sec. II, there is a cost of uncomputing the CNOT conjugations instantiated with each C_i; however, we argue in step 5 of our procedure that this cost is O(n^2). In kind, the diagonalization of Pauli strings has a CNOT cost of O(n^2) as well. For N ≫ n, we thus expect a scaling in CNOT cost that is at least ∼ N.
It is worth noting that the relative CNOT cost of simulating consecutive Pauli strings with one ancilla was found to define a metric space over these strings [38], which would imply a generic lower bound cost of at least ∼ N as well. However, we do not find this formulation for assessing relative CNOT costs to be applicable to our method.

In this section, we prove for general n ≥ 2 that, given a diagonal Hamiltonian with all 2^n − 1 Pauli strings besides the identity, our algorithm realizes the exact time evolution with precisely 2^n − 2 CNOT gates. To simplify the wording of our proof, we also include the identity Pauli string, so that we have 2^n Pauli strings, contributing only an extra global phase and without changing the CNOT gate count.
First, recall our tree traversal procedure for compiling the circuit of a diagonal Hamiltonian in step 5 of Sec. II. We perform a CNOT gate conjugation once we encounter a node of value 1 that has a parent with value 1, flipping the value of the child node in the process. For the tree representing a Hamiltonian composed of 2^n Pauli strings, every node has an equal number of children with values 0 and 1 at each deeper level. Consequently, any CNOT gate conjugation preserves the number of children nodes of value 1 that have a parent with value 1, even if we perform the CNOT conjugation on multiple branches simultaneously. Since a CNOT conjugation is applied when we encounter such a child node, the number of CNOT conjugations is precisely the number of such children. With these observations in hand, we are ready to find the CNOT cost of a circuit prescribed by our procedure.
Neglecting the root of the tree, we label the levels by l = 1, . . ., n. The number of level l nodes that have a parent node with value 1 is 2^{l−1} − 1 in a complete binary tree (a binary tree in which every node besides the leaves has 2 children). Then, the total number of such nodes is Σ_{l=1}^{n} (2^{l−1} − 1) = 2^n − (n + 1), which corresponds to the total number of CNOT conjugations. As discussed in step 5, this quantity is precisely the number of CNOT gates that are implemented on the circuit before inserting any of the CNOT gates from the stack into the circuit. Thus, determining the overall CNOT cost has been reduced to calculating the remaining number of CNOT gates in the stack. Having every possible diagonal Pauli string in our Hamiltonian, we must perform every possible CNOT_{ji} with j > i as prescribed. For each i, j, the number of CNOT_{ji} gates in the stack is odd if and only if j − i = 1, because every node with value 1 at a level i has precisely 2^{(j−i)−1} children at level j > i with value 1. Having the same target qubit, these CNOT gates are all consecutive in the stack, and we may cancel all the CNOT gates with j − i ≠ 1, leaving only one of each CNOT gate with j − i = 1, of which there are n − 1. Hence, the total CNOT cost is 2^n − (n + 1) + (n − 1) = 2^n − 2.
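The arithmetic of this proof can be verified numerically for small n. The sketch below does not implement the tree traversal of step 5; it only evaluates the two counts used in the argument, with hypothetical function names, and checks that they add up to 2^n − 2.

```python
def cnots_before_stack(n):
    """Sum over levels l = 1..n of (2**(l-1) - 1), the conjugation count."""
    return sum(2**(level - 1) - 1 for level in range(1, n + 1))

def surviving_stack_cnots(n):
    """Number of pairs (i, j) with 1 <= i < j <= n for which the
    multiplicity 2**((j-i)-1) is odd, so one CNOT survives cancellation."""
    return sum(1
               for i in range(1, n + 1)
               for j in range(i + 1, n + 1)
               if (2**(j - i - 1)) % 2 == 1)

for n in range(2, 11):
    before = cnots_before_stack(n)
    stack = surviving_stack_cnots(n)
    assert before == 2**n - (n + 1)   # closed form of the level sum
    assert stack == n - 1             # only nearest-level pairs survive
    assert before + stack == 2**n - 2
    print(f"n = {n}: {before} + {stack} = {before + stack} CNOT gates")
```

For n = 2 the total is 2, which agrees with the familiar two-CNOT circuit for a single ZZ rotation alongside two CNOT-free single-qubit Z rotations.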
respectively. The resulting diagonal Pauli strings are together V H V^† = IZI − IZZ + ZII, where we denote the diagonalizing circuit by V. Figure 1 depicts the circuit diagonalizing H. In general, it is possible to simultaneously diagonalize an arbitrary cluster of commuting Pauli

^2 From now on, we omit the tensor product sign.

Figure 2. Quantum circuit implementing the factor exp(−i c_j ZZZ δt). (a) No ancillary qubit is used. Notably, there are other combinations of CNOT conjugations that one may perform to arrive at other equivalent circuits implementing this operator. (b) One ancillary qubit, prepared as |0⟩, is the target of each CNOT and the site of the rotation, as prescribed in this more common method (cf. Ref. [34]).

Figure 3. Tree representation of IIZ + IZI + IZZ + ZZZ. The value inside each node along a branch indicates the gate I or Z on each qubit for the corresponding Pauli string term. For a general linear combination, the coefficient of a Pauli string is stored with its corresponding leaf.

Figure 4. Circuit simulating one Trotterized time step with the Hamiltonian H = IIZ + IZI + IZZ + ZZZ. Here δt is the Trotterization time step. The qubits are ordered from top to bottom (i.e., as increasing levels in the tree structure of Fig. 3).

Figure 5. The average CNOT gate count required to simulate a diagonal Hamiltonian as a function of the number of Pauli strings for n = 8 qubits. The blue data points are obtained using the compilation method described in step 5. The red data points are obtained using the method in Ref. [38]. A lower bound for each case is the number of Pauli strings acting nontrivially on more than one qubit, as derived in Appendix A. Uncertainty bars derived from statistical sample sizes of 10 Pauli strings per choice of N are smaller than the data points depicted above.
Appendix B: CNOT gate count for a diagonal Hamiltonian with N = 2^n Pauli strings