Speeding up Learning Quantum States through Group Equivariant Convolutional Quantum Ans\"atze

We develop a theoretical framework for $S_n$-equivariant convolutional quantum circuits with SU$(d)$-symmetry, building on and significantly generalizing Jordan's Permutational Quantum Computing (PQC) formalism based on Schur-Weyl duality, which connects the SU$(d)$ and $S_n$ actions on qudits. In particular, we utilize the Okounkov-Vershik approach to prove Harrow's statement (Ph.D. Thesis 2005 p.160) on the equivalence between $\operatorname{SU}(d)$ and $S_n$ irrep bases and to establish the $S_n$-equivariant Convolutional Quantum Alternating Ans\"atze ($S_n$-CQA) using Young-Jucys-Murphy (YJM) elements. We prove that $S_n$-CQA is able to generate any unitary in any given $S_n$ irrep sector, so it may serve as a universal model for a wide array of quantum machine learning problems in the presence of SU($d$) symmetry. Our method provides another way to prove the universality of the Quantum Approximate Optimization Algorithm (QAOA) and verifies that 4-local SU($d$) symmetric unitaries are sufficient to build generic SU($d$) symmetric quantum circuits up to relative phase factors. We present numerical simulations to showcase the effectiveness of the ans\"atze in finding the ground state energy of the $J_1$--$J_2$ antiferromagnetic Heisenberg model on the rectangular and Kagome lattices. Our work provides the first application of the celebrated Okounkov-Vershik $S_n$ representation theory to quantum physics and machine learning, and uses it to propose quantum variational ans\"atze, tailored towards a specific optimization problem, that are strongly suggested to be classically intractable.

* hanz98@uchicago.edu † lizm@mail.sustech.edu.cn ‡ junyuliu@uchicago.edu § ss870@cam.ac.uk ¶ risi@cs.uchicago.edu

Convolutional Neural Networks (CNNs) are among the most important neural network architectures in classical machine learning [24][25][26][27][28]. In recent years, CNNs have also found applications in condensed matter physics and quantum computing. For instance, [29] proposes a quantum convolutional neural network with log N parameters to classify symmetry-protected topological phases in quantum many-body systems, where N is the system size. One of the key properties of classical CNNs is equivariance, which roughly states that if the input to the neural network is shifted, then its activations translate accordingly. There have been several attempts to introduce theoretically sound analogs of convolution and equivariance to quantum circuits, but they have generally been somewhat heuristic. The major difficulty is that the translation invariance of CNNs lacks a mathematically rigorous quantum counterpart due to the discrete spectrum of spin-based quantum circuits. For example, [29] uses quasi-local unitary operators that act vertically across all qubits.
In quantum systems there is a discrete set of translations corresponding to permuting the qudits as well as a continuous notion of translation corresponding to spatial rotations by elements of SU(d). Combining these two is the realm of so-called Permutational Quantum Computing (PQC) [30]. Therefore, a natural starting point for realizing convolutional neural networks in quantum circuits is to look for permutation equivariance. In one of our related works [31], we argued that the natural form of equivariance in quantum circuits is permutation equivariance and we introduced a theoretical framework to incorporate group-theoretical CNNs into the quantum circuits, building on and generalizing the PQC framework to what we call PQC+ [31].
In this paper, we further explore PQC+ and its significance for machine learning applications. Roughly speaking, a PQC+ machine consists of unitary time evolutions of $k$-local SU(d)-symmetric Hamiltonians. As a feature, Schur-Weyl duality between the aforementioned $S_n$ and SU(d) actions on qudit systems appears naturally and will be used throughout the paper. Most importantly, it indicates that any SU(d)-symmetric quantum circuit can be expressed in terms of $S_n$ irreducible representations (irreps). Exploiting the power of $S_n$ representation theory in quantum circuits towards NISQ applications is thus the central theme of the paper. The representation theory of $S_n$ has been found to be a powerful tool in various permutation equivariant learning tasks, e.g., learning set-valued functions [32] and learning on graphs [33,34]. Most applications of permutation-equivariant neural networks work with a subset of the representations of $S_n$. In contrast, in physical and chemical models where the Hamiltonian exhibits global SU(d) symmetry, such as the Heisenberg model, it is necessary to consider all the $S_n$ irreps (a detailed explanation of this significant insight can be found in Section V). However, even the best classical Fast Fourier Transforms (FFTs) over the symmetric group $S_n$ require at least $O(n!\,n^2)$ operations [35,36], which dashes any hope of calculating the Fourier coefficients even for relatively small $n$. Indeed, despite increasing realization of the importance of enforcing SU(2) symmetry, none of the neural-network quantum state (NQS) ansätze are able to respect SU(2) symmetry for all SU(2) irreps, due to the super-polynomial growth of the multiplicities of irreps and the super-exponential cost of computing Fourier coefficients over $S_n$. Finding variational ansätze respecting continuous rotation symmetry is desirable because it not only helps to gain important physical insights about the system but also leads to more efficient simulation algorithms [37,38].
Motivated by the class of problems with a global SU(d) symmetry, in Section III we construct what we call the variational $S_n$-equivariant Convolutional Quantum Alternating ansätze ($S_n$-CQA), which are products of alternating exponentials of certain Hamiltonians admitting SU(d) symmetry. This is a concrete example of the PQC+ framework and may also be thought of as a special case of QAOA with SU(d) symmetry. Using the Okounkov-Vershik approach [39] to $S_n$ representation theory as well as other classical results from the theory of Lie groups and Lie algebras [40,41], we prove that $S_n$-CQA generates any unitary matrix in each given $S_n$-irrep block decomposed from an $n$-qudit system; hence it acts as a restricted universal variational model for problems that possess global SU(d) symmetry (Theorem 1). Consequently, it can be applied to a wide array of machine learning and optimization tasks that exhibit global SU(d) symmetry or require explicit computation of high-dimensional $S_n$-irreps, presenting a quantum super-exponential speed-up. Our proof techniques are of independent interest and we provide two more applications. It is shown in [42,43] that QAOA ansätze generated by simple local Hamiltonians are universal in the common sense. Forgetting the imposed symmetry, we use our techniques to derive the universality for a different but related class of QAOA ansätze with a richer set of mixer Hamiltonians (Theorem 2). In addition, we find a 4-local $S_n$-CQA model which is universal for building any SU(d)-symmetric quantum circuit up to phase factors (Theorem 3), bearing in mind that 2-local SU(d)-symmetric unitaries cannot fulfill the task when $d \geq 3$ [44,45]. Consequently, when compared with other SU(d)-symmetric ansätze, products of exponentials of SWAPs (eSWAPs) proposed in [46] admit the restricted universality with SU(d) symmetry only when $d = 2$, whereas the CQA model is universal in the general case when restricted to any one of the $S_n$ irreps.
In Section IV we explore Schur-Weyl duality on qudit systems in more detail. To be specific, talking about $S_n$ or SU(d) irrep blocks in a qudit system requires using the Schur basis instead of the computational basis. The Schur basis can be constructed either by SU(d) Clebsch-Gordan decomposition [30,47,48] or by the $S_n$ branching rule [49][50][51], which yields two ways to label the basis elements: by SU(d) Casimir operators or by the so-called YJM-elements used in the Okounkov-Vershik approach. We rigorously demonstrate the equivalence of these labeling schemes (Theorem 4), first conjectured in Harrow's thesis [47] (see also the discussions by Bacon, Chuang, and Harrow [52] and Krovi [51]). As a result, we find a state initialization method, using constant-depth qudit circuits, to produce linear combinations of Schur basis vectors, which may be preferred on NISQ devices to implementing a Quantum Schur transform (QST). We show that the measurements taken for variationally updating parameters in $S_n$-CQA can be efficiently calculated in the Schur basis, while a similar conclusion is unlikely to be drawn classically.
In the numerical part (Section V), we illustrate the potential of this framework by applying it to the problem of finding the ground state energy of $J_1$--$J_2$ antiferromagnetic Heisenberg magnets, a gapless system with no known sign structure or analytical solution in quantum many-body theory. We compare our model with classical and quantum algorithms such as [38,46,53]. We emphasize the consequences of the failure of the Marshall-Lieb-Mattis theorem [54][55][56] in the frustrated region, where classical neural networks struggle to discover the sign structure to an admissible accuracy due to the violation of global SU(2) symmetry [57,58]. We include numerical simulations to show the effectiveness of the $S_n$-CQA ansätze in finding the ground state with frustration using only $O(pn^2)$ parameters for $p$ alternating layers. Noisy simulations are also provided to show the robustness of $S_n$-CQA.
Our theoretical results about SU(d) symmetry can be reversed to exhibit $S_n$ permutation symmetry. We define SU(d)-CQA on SU(d)-irrep blocks and leave it to future research to explore its theoretical and experimental potential. All statements and theorems discussed in the main text are proved in full detail in the Supplementary Materials (SM).

II. BACKGROUND ON REPRESENTATION THEORY OF THE SYMMETRIC GROUP
In this section we define some of the mathematical concepts and notations used in the rest of the paper. Further details can be found in [49,50,59].
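We treat $V^{\otimes n}$ as the Hilbert space of an $n$-qudit system, where $V$ is a $d$-dimensional complex Hilbert space with basis $\{e_i\}$. It carries two natural representations: SU(d) acts by $\pi_{\mathrm{SU}(d)}(g) = g^{\otimes n}$, and $S_n$ acts by permuting the tensor factors, $\pi_{S_n}(\sigma)(e_{i_1} \otimes \cdots \otimes e_{i_n}) = e_{i_{\sigma^{-1}(1)}} \otimes \cdots \otimes e_{i_{\sigma^{-1}(n)}}$. The so-called Schur-Weyl duality reveals how these two representations are related.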
Schur-Weyl duality is widely used in quantum computing [60], quantum information theory [47] and high energy physics [61]. In particular, in Quantum Chromodynamics it was used to decompose the $n$-fold tensor product of SU(3) representations. In that context, standard Young tableaux are referred to as Weyl-tableaux and labeled by the three iso-spin numbers $(u, d, s)$. The underlying Young diagrams with three rows $\lambda = (\lambda_1, \lambda_2, \lambda_3)$ are used to denote an SU(3) irreducible representation (irrep). Another convention uses Young diagrams $\lambda' = (\lambda_1 - \lambda_3, \lambda_2 - \lambda_3)$ labeled by Dynkin integers via highest weight vectors. In short, there are two conventions in the literature to denote SU(d) irreps. On the other hand, $S_n$ irreps can also be denoted by Young diagrams [59]. Schur-Weyl duality says that irreps of SU(d) and $S_n$ are dual in the following sense and are denoted by the same Young diagrams with $n$ boxes and at most $d$ rows.
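Theorem (Schur-Weyl Duality). The actions of SU(d) and $S_n$ on $V^{\otimes n}$ jointly decompose the space into irreducible representations of both groups in the form

$$V^{\otimes n} \cong \bigoplus_{\lambda} W_\lambda \otimes S^\lambda,$$

where $W_\lambda$ and $S^\lambda$ denote irreps of SU(d) resp. $S_n$, and $\lambda$ ranges over all Young diagrams of size $n$ with at most $d$ rows. Consequently,

$$\pi_{\mathrm{SU}(d)} \cong \bigoplus_{\mu} W_\mu \otimes I_{m_{\mathrm{SU}(d),\mu}}, \qquad \pi_{S_n} \cong \bigoplus_{\lambda} I_{m_{S_n,\lambda}} \otimes S^\lambda,$$

where $m_{\mathrm{SU}(d),\mu} = \dim S^\mu$ and $m_{S_n,\lambda} = \dim W_\lambda$ are the multiplicities.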
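As a sanity check on the dimension count implied by the theorem, the following sketch verifies $\sum_\lambda \dim W_\lambda \cdot \dim S^\lambda = d^n$ for small cases. It assumes the standard hook length formula for $\dim S^\lambda$ and the hook-content formula for $\dim W_\lambda$; the function names are illustrative and not taken from the paper.

```python
from fractions import Fraction
from math import factorial

def partitions(n, max_rows, max_part=None):
    """Yield Young diagrams (tuples of row lengths) with n boxes and at most max_rows rows."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    if max_rows == 0:
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, max_rows - 1, first):
            yield (first,) + rest

def hooks_and_contents(lam):
    """Yield (hook length, content) for every box; content = column - row."""
    col_lengths = [sum(1 for r in lam if r > c) for c in range(lam[0])]
    for r, row_len in enumerate(lam):
        for c in range(row_len):
            arm = row_len - c - 1
            leg = col_lengths[c] - r - 1
            yield arm + leg + 1, c - r

def dim_sn_irrep(lam):
    """Hook length formula: dim S^lambda = n! / prod(hooks)."""
    prod = 1
    for hook, _ in hooks_and_contents(lam):
        prod *= hook
    return factorial(sum(lam)) // prod

def dim_sud_irrep(lam, d):
    """Hook-content formula: dim W_lambda = prod((d + content) / hook)."""
    val = Fraction(1)
    for hook, content in hooks_and_contents(lam):
        val *= Fraction(d + content, hook)
    return int(val)

def total_dimension(n, d):
    return sum(dim_sud_irrep(lam, d) * dim_sn_irrep(lam) for lam in partitions(n, d))

print(total_dimension(6, 2), 2 ** 6)
print(total_dimension(4, 3), 3 ** 4)
```

Both printed pairs should agree, reflecting that the blocks $W_\lambda \otimes S^\lambda$ exhaust $V^{\otimes n}$.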
One can easily verify that $\pi_{\mathrm{SU}(d)}$ and $\pi_{S_n}$ commute (further properties are described in the SM). Consider the symmetric group algebra $\mathbb{C}[S_n]$ consisting of all formal finite sums $f = \sum_i c_i \sigma_i$. Its representation is then $\tilde{\pi}_{S_n}(f) = \sum_i c_i \pi_{S_n}(\sigma_i)$. When there is no ambiguity, we denote the representation $\pi_{S_n}(\sigma)$ by $U_\sigma$ or simply $\sigma$.
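The commutation of the two actions can be checked numerically on a small system. The sketch below builds the permutation matrices on $(\mathbb{C}^d)^{\otimes n}$, forms an arbitrary group-algebra element, and compares it with $g^{\otimes n}$ for a random unitary $g$. The helper names and the particular choice of $f$ are illustrative assumptions; a random element of U(d) is used, which differs from an SU(d) element only by a phase.

```python
import itertools
import numpy as np

def flat_index(digits, d):
    """Index of |digits[0], ..., digits[n-1]> with the first site most significant."""
    idx = 0
    for x in digits:
        idx = idx * d + x
    return idx

def permutation_operator(sigma, n, d):
    """Matrix of pi_{S_n}(sigma): the factor at site i is moved to site sigma[i] (0-based)."""
    dim = d ** n
    op = np.zeros((dim, dim))
    for digits in itertools.product(range(d), repeat=n):
        new_digits = [0] * n
        for i in range(n):
            new_digits[sigma[i]] = digits[i]
        op[flat_index(new_digits, d), flat_index(digits, d)] = 1.0
    return op

def tensor_power(g, n):
    out = np.array([[1.0 + 0.0j]])
    for _ in range(n):
        out = np.kron(out, g)
    return out

rng = np.random.default_rng(0)
n, d = 3, 3
z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
g, _ = np.linalg.qr(z)              # random unitary on a single qudit
G = tensor_power(g, n)              # pi_{SU(d)}(g) = g^{(x) n}

# f = 0.5 * (transposition of sites 0, 1) - 2.0 * (3-cycle), an arbitrary element of C[S_n]
f = [(0.5, (1, 0, 2)), (-2.0, (1, 2, 0))]
pi_f = sum(c * permutation_operator(s, n, d) for c, s in f)

commutator_norm = np.linalg.norm(G @ pi_f - pi_f @ G)
print(commutator_norm)
```

The printed norm should vanish up to floating-point error, independently of the coefficients chosen for $f$.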
Working from the perspective of Schur-Weyl duality requires, at least theoretically, using the Schur basis rather than the computational basis. A conventional way to build such a basis is to conduct sequential coupling and Clebsch-Gordan decompositions of SU(d) representations [30,47,48], which transform common matrix representations of SU(d) into irrep matrix blocks as in Fig. 2 (a). However, our focus is on ansätze, operators and quantum circuits with SU(d) symmetry, which commute with $\pi_{\mathrm{SU}(d)}$, and Schur-Weyl duality together with the double commutant theorem (see SM) says that they must be built from the group algebra $\mathbb{C}[S_n]$. We therefore need to explore the $S_n$ irrep blocks shown in Fig. 2 (b) instead. We are going to introduce a method to decompose permutation matrices in this picture and explain basic notions in $S_n$ representation theory and the Okounkov-Vershik approach, since they are essential for understanding the theoretical results in this paper.
We first consider the so-called permutation module $M^\mu$. In the case of qubits, permutation modules correspond to sets of Schur basis elements having different read-outs of the total spin component. Fortunately there is an accessible way to understand $M^\mu$ in the tensor product space $V^{\otimes n}$. To make things simpler, consider the $d = 2$ case of SU(2)-$S_n$ duality on $(\mathbb{C}^2)^{\otimes n}$. Only two-row Young diagrams $\lambda = (\lambda_1, \lambda_2)$ appear in this duality, and half of the difference between the lengths of the two rows, $\frac{1}{2}(\lambda_1 - \lambda_2)$, gives the total spin of the SU(2)-irrep $W_\lambda$. The permutation module $M^\mu$ is isomorphic to the linear span of all computational basis vectors with z-spin component equal to $\frac{1}{2}(\mu_1 - \mu_2)$. Note that $(\mathbb{C}^2)^{\otimes n} = \bigoplus_\mu M^\mu$ and each $M^\mu$ is invariant under $S_n$ permutations. Furthermore, $M^\mu$ can be decomposed into $S_n$-irreps. For two-row Young diagrams (SU(2)-$S_n$ duality), the decomposition is easy: $M^\mu = \bigoplus_{\lambda \trianglerighteq \mu} S^\lambda$, where $\lambda, \mu$ have the same size $n$ and we use the dominance order $\lambda \trianglerighteq \mu$ if $\lambda_1 \geq \mu_1$. In summary, we have

$$(\mathbb{C}^2)^{\otimes n} = \bigoplus_{\mu} M^\mu = \bigoplus_{\mu} \bigoplus_{\lambda \trianglerighteq \mu} S^\lambda.$$

Isomorphic copies of $S^\lambda$ come from different permutation modules. The largest permutation module contains all distinct $S_n$-irreps in $(\mathbb{C}^2)^{\otimes n}$ (see e.g., Fig. 2 (c)). For general Young diagrams, decompositions of $M^\mu$ would have nontrivial multiplicities [50,59].

Fig. 2 (c) is an arrangement of (b) which respects both permutation modules and $S_6$ irreps.
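The two-row decomposition can be checked by counting dimensions. The sketch below assumes the standard closed forms $\dim M^\mu = \binom{n}{\mu_2}$ and $\dim S^{(n-k,k)} = \binom{n}{k} - \binom{n}{k-1}$; the function names are illustrative. A diagram is identified by the length of its second row.

```python
from math import comb

def dim_permutation_module(n, mu2):
    """Number of computational basis states of n qubits with mu2 spins down."""
    return comb(n, mu2)

def dim_sn_irrep_two_row(n, lam2):
    """Dimension of the S_n irrep with Young diagram (n - lam2, lam2)."""
    return comb(n, lam2) - (comb(n, lam2 - 1) if lam2 > 0 else 0)

def dim_su2_irrep(n, lam2):
    """2j + 1 with total spin j = (lam1 - lam2) / 2."""
    return n - 2 * lam2 + 1

def decomposed_module_dim(n, mu2):
    """Sum of dim S^lambda over lambda dominating mu, i.e. lam2 <= mu2."""
    return sum(dim_sn_irrep_two_row(n, lam2) for lam2 in range(mu2 + 1))

n = 6
for mu2 in range(n // 2 + 1):
    print(mu2, dim_permutation_module(n, mu2), decomposed_module_dim(n, mu2))

# Schur-Weyl count: each S^lambda appears dim W_lambda times in (C^2)^{(x) n}
total = sum(dim_su2_irrep(n, k) * dim_sn_irrep_two_row(n, k) for k in range(n // 2 + 1))
print(total, 2 ** n)
```

For $n = 6$ the $S_6$ irreps have dimensions 1, 5, 9, 5 and appear with multiplicities 7, 5, 3, 1, which add up to $64 = 2^6$.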

FIG. 3:
Bratteli diagram for $S_6$. Upper Young diagrams are connected by arrows to the lower ones that arise from their decomposition. Orange arrows form a path. Note that diagrams with more than two rows cannot appear in SU(2)-$S_6$ duality.
Each $S^\lambda$ can be decomposed further with respect to $S_{n-1} \subset S_n$ as $S^\lambda = \bigoplus_\rho S^{\lambda\rho}$, where $S^{\lambda\rho}$ denotes an $S_{n-1}$ irrep of Young diagram $\rho$ (with $n - 1$ boxes) contained in the $S_n$ irrep $S^\lambda$. The so-called branching rule guarantees that the decomposition is multiplicity-free, i.e., each distinct $S_{n-1}$-irrep $S^{\lambda\rho}$ appears only once in the decomposition. The so-called Bratteli diagrams in Fig. 3 show how different irreps are decomposed. Continuing the decomposition process for $S_{n-2}, \ldots, S_1$, the original space $S^\lambda$ will be written as a direct sum of 1-dimensional subspaces ($S_1$-irreps are 1-dimensional):

$$S^\lambda = \bigoplus_{\rho, \sigma, \ldots, \tau} S^{\lambda\rho\sigma\cdots\tau}.$$

Each 1-dimensional subspace $S^{\lambda\rho\sigma\cdots\tau}$ can be represented by a nonzero vector in it. Normalizing them, we obtain an orthonormal basis $\{|v_T\rangle\}$ of $S^\lambda$ called the Gelfand-Tsetlin basis (GZ) or Young-Yamanouchi basis. The indices $\lambda, \rho, \sigma, \ldots, \tau$ form a path in the Bratteli diagram (see Fig. 3) and can be used to define a standard Young tableau $T$ (Fig. 4). Young basis vectors are in one-to-one correspondence with standard Young tableaux [59]. The branching rule is also discussed in SU(d) representation theory and some authors refer to the SU(d)-irrep basis as the GZ-basis if it is constructed in a similar manner. A more comprehensive discussion about building the Schur basis by SU(d) Clebsch-Gordan decomposition and by the $S_n$ branching rule is postponed to Section IV. Let us introduce the central concept used in our work:

Definition 1. For $1 < k \leq n$, the Young-Jucys-Murphy element, or YJM-element for short, is defined as the sum of transpositions $X_k = (1, k) + (2, k) + \cdots + (k-1, k) \in \mathbb{C}[S_n]$. We set $X_1 = 0$ as a convention.
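To make Definition 1 concrete, the following sketch represents the YJM-elements as matrices on $n$ qubits and checks two properties relied on later: they commute with each other and have integer spectra. The helper names are illustrative, and sites are 0-based in the code while $k$ is 1-based as in the definition.

```python
import numpy as np

def swap_operator(a, b, n, d=2):
    """Permutation matrix of the transposition of sites a and b on (C^d)^{(x) n}."""
    dim = d ** n
    op = np.zeros((dim, dim))
    for idx in range(dim):
        digits = [(idx // d ** (n - 1 - q)) % d for q in range(n)]
        digits[a], digits[b] = digits[b], digits[a]
        new_idx = sum(x * d ** (n - 1 - q) for q, x in enumerate(digits))
        op[new_idx, idx] = 1.0
    return op

def yjm_element(k, n, d=2):
    """X_k = (1,k) + (2,k) + ... + (k-1,k), with X_1 = 0."""
    x = np.zeros((d ** n, d ** n))
    for j in range(1, k):
        x += swap_operator(j - 1, k - 1, n, d)
    return x

n = 4
X = [yjm_element(k, n) for k in range(1, n + 1)]

# All YJM-elements commute pairwise
max_commutator = max(
    np.linalg.norm(X[a] @ X[b] - X[b] @ X[a]) for a in range(n) for b in range(n)
)

# Their eigenvalues are integers (contents of boxes in standard Young tableaux)
spectra = [sorted(set(np.round(np.linalg.eigvalsh(x)).astype(int).tolist())) for x in X]
print(max_commutator, spectra)
```

On qubits only two-row diagrams occur, so no eigenvalue below $-1$ should appear in the printed spectra.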
As the name indicates, this concept was developed by Young [62], Jucys [63] and Murphy [64]. Okounkov and Vershik showed that YJM-elements generate the Gelfand-Tsetlin subalgebra $\mathrm{GZ}_n \subset \mathbb{C}[S_n]$ [39], which is a maximal commutative subalgebra generated by the centers $Z[S_k]$ of $\mathbb{C}[S_k]$ for $k = 1, \ldots, n$. Another striking fact is that all YJM-elements are strictly diagonal in a Young basis $\{|v_T\rangle\}$ (indeed, the representation of $\mathrm{GZ}_n$ consists of all diagonal matrices), and their eigenvalues can be read off directly from the standard Young tableaux $T$. To be precise, let $\lambda$ be a Young diagram. Since this is a 2-dimensional diagram, we can naturally assign integer coordinates to its boxes. The content of each box is its x-coordinate minus its y-coordinate. Suppose $T$ is a standard Young tableau of $\lambda$. Arranging all contents with respect to $T$, we obtain the content vector $\alpha_T$. For instance, the boxes of the Young diagram $\lambda = (4, 2)$ have contents $0, 1, 2, 3$ in the first row and $-1, 0$ in the second row. The specific standard tableau $T$ in Fig. 4 has content vector $\alpha_T = (0, -1, 1, 2, 0, 3)$. Let $|v_T\rangle$ denote the Young basis vector corresponding to $T$. Measuring by YJM-elements, we have $X_1 |v_T\rangle = 0$, $X_2 |v_T\rangle = -|v_T\rangle$, $X_3 |v_T\rangle = |v_T\rangle$ and so forth. Each Young basis vector is determined uniquely by its content vector (see [39] and [50] for more details).
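The content vector is simple to compute from a tableau. The sketch below uses the tableau with rows $[1, 3, 4, 6]$ and $[2, 5]$; this is an assumption about Fig. 4, reconstructed so that it reproduces the content vector quoted above. Function names are illustrative.

```python
def is_standard(tableau):
    """Entries 1..n used once each, increasing along rows and down columns."""
    entries = sorted(e for row in tableau for e in row)
    if entries != list(range(1, len(entries) + 1)):
        return False
    rows_ok = all(row[c] < row[c + 1] for row in tableau for c in range(len(row) - 1))
    cols_ok = all(
        tableau[r][c] < tableau[r + 1][c]
        for r in range(len(tableau) - 1)
        for c in range(len(tableau[r + 1]))
    )
    return rows_ok and cols_ok

def content_vector(tableau):
    """alpha_T(k) = (column index) - (row index) of the box containing k."""
    n = sum(len(row) for row in tableau)
    alpha = [0] * n
    for r, row in enumerate(tableau):
        for c, entry in enumerate(row):
            alpha[entry - 1] = c - r
    return tuple(alpha)

# Assumed form of the tableau in Fig. 4 (shape (4, 2))
T = [[1, 3, 4, 6], [2, 5]]
print(is_standard(T), content_vector(T))
```

The $k$-th entry of the result is the eigenvalue of $X_k$ on $|v_T\rangle$, so the first three entries $0, -1, 1$ match the eigenvalue equations in the text.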

III. Sn CONVOLUTIONAL QUANTUM ALTERNATING ANSÄTZE
Consider the YJM-elements $\{X_1, \ldots, X_n\}$ introduced in Section II, which generate the maximal commutative subalgebra $\mathrm{GZ}_n$. The use of YJM-elements allows us to design the following mixer Hamiltonian:

$$H_M = \sum_{i_1, \ldots, i_N} \beta_{i_1 \ldots i_N} X_{i_1} \cdots X_{i_N}.$$

This YJM-Hamiltonian is still strictly diagonal in the Young basis. Since there is an efficient quantum Schur transform (QST) $U_{\mathrm{Sch}}$, which transforms the computational basis to the Schur basis even for qudits with gate complexity $\mathrm{Poly}(n, \log d, \log(1/\epsilon))$ [47,51,52], it is reasonable to assume that we can initialize any Young basis element $|\Psi_{\mathrm{init}}\rangle$ from $S_n$-irreps via QST. Moreover, given a problem Hamiltonian $H_P$ with SU(d) symmetry, it can be written as $H_P = \pi(f)$ for some $f \in \mathbb{C}[S_n]$ by the double commutant theorem and the Wedderburn theorem (see Section II and SM). Inspired by QAOA (where the Pauli $X$ mixer is diagonal with respect to $|+\rangle^{\otimes n}$) [14], we thus propose the following ansatz:

$$\cdots \exp(-i H_M') \exp(-i \gamma' H_P) \exp(-i H_M) \exp(-i \gamma H_P) |\Psi_{\mathrm{init}}\rangle.$$

To summarize, we have defined a family of mixer operators $H_M$, parameterized by $\beta_{i_1 \ldots i_N}$, which are diagonal in the Young basis and naturally preserve each $S_n$-irrep determined by the initialized states. Following [65,66] and one of our results [31], which interpret quantum circuits as natural Fourier spaces, we call this family of ansätze $S_n$-Convolutional Quantum Alternating Ansätze ($S_n$-CQA).
Deviating from the original purpose of using QAOA to solve constraint satisfaction problems, our $S_n$-CQA are applied to problems with global SU(d) symmetry or permutation equivariance, as long as the problem Hamiltonian $H_P$ can be efficiently simulated in the circuit model. Indeed, most practical examples involve only 2- or 3-local spin interactions, such as the Heisenberg model studied in Section V, and thus by Theorem 2 of [31] can be efficiently simulated. On the other hand, the $k$-dependent constant in Corollary 1 of [31] has exponential scaling. So in practice, we would like to have 3- or at most 4-local terms for the YJM-Hamiltonian simulation in the ansätze. We focus on the following mixer Hamiltonian consisting of only first- and second-order products of YJM-elements:

$$H_M = \sum_{k, l} \beta_{kl} X_k X_l, \tag{2}$$

whose evolution can be efficiently simulated in $O(n^4 \log(n^4/\epsilon) / \log\log(n^4/\epsilon))$. Below, we prove that with a mixer as in Equation (2), the $S_n$-CQA ansätze are able to approximate any unitary from every $S_n$-irrep block (Fig. 2 (b)). This can be seen as a restricted version of universal quantum computation on $S_n$-irreps. Since the 4-local $S_n$-CQA is an all-you-need approximation algorithm within PQP+, it is strongly suggestive that the PQP+ class proposed in [31] contains circuits that can approximate matrix elements of all the $S_n$ Fourier coefficients, for a polynomial number of alternating layers $p$. Moreover, the 4-local $S_n$-CQA is also a universal approximator for solutions of problems with global SU(d) symmetry, such as the Heisenberg models, due to its nature as a variational ansatz.
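A small dense-matrix sketch can illustrate why such a layer respects the symmetry. It assumes one alternating layer of the form $\exp(-iH_M)\exp(-i\gamma H_P)$ with $H_M$ built from first- and second-order YJM products (using the convention, stated with Theorem 1, that $X_1$ is replaced by the identity) and takes $H_P$ to be the sum of adjacent transpositions. All names and parameter values are illustrative; this is not the simulation code used for the numerical results.

```python
import numpy as np

def swap_operator(a, b, n, d=2):
    """Permutation matrix of the transposition of sites a and b (0-based)."""
    dim = d ** n
    op = np.zeros((dim, dim))
    for idx in range(dim):
        digits = [(idx // d ** (n - 1 - q)) % d for q in range(n)]
        digits[a], digits[b] = digits[b], digits[a]
        new_idx = sum(x * d ** (n - 1 - q) for q, x in enumerate(digits))
        op[new_idx, idx] = 1.0
    return op

def yjm_elements(n, d=2):
    """[X_1, ..., X_n], with X_1 replaced by the identity."""
    xs = [np.eye(d ** n)]
    for k in range(2, n + 1):
        xs.append(sum(swap_operator(j - 1, k - 1, n, d) for j in range(1, k)))
    return xs

def exp_minus_i(h):
    """exp(-i h) for a Hermitian matrix h via its eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(-1j * w)) @ v.conj().T

def cqa_layer(beta, gamma, n, d=2):
    xs = yjm_elements(n, d)
    h_m = sum(beta[k, l] * xs[k] @ xs[l] for k in range(n) for l in range(k, n))
    h_p = sum(swap_operator(i, i + 1, n, d) for i in range(n - 1))
    return exp_minus_i(h_m) @ exp_minus_i(gamma * h_p)

rng = np.random.default_rng(1)
n = 4
layer = cqa_layer(rng.normal(size=(n, n)), 0.7, n)

# Random single-qubit unitary g and its n-fold tensor power
z = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
g, _ = np.linalg.qr(z)
G = g
for _ in range(n - 1):
    G = np.kron(G, g)

symmetry_defect = np.linalg.norm(layer @ G - G @ layer)
print(symmetry_defect)
```

Because every generator lies in the image of $\mathbb{C}[S_n]$, the layer commutes with $g^{\otimes n}$ for any choice of the parameters, so the printed defect should be at the level of rounding error.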
A. Restricted Universality of Sn-CQA Ansätze in Sn Irreps

We now present the main theoretical result of this paper: the $S_n$-CQA ansätze approximate any unitary in any $S_n$-irrep decomposed from the system of qudits. This restricts universal quantum computation on $\mathrm{U}(d^n)$ to $S_n$-irrep blocks (Fig. 2 (b)) because our ansätze preserve SU(d) symmetry. This is of interest for four reasons: (a) The density result indicates that the $S_n$-CQA ansatz is a universal approximator in the class PQP+ proposed in [31], and it is the theoretical guarantee of our numerical simulations. (b) The result is valid for qudits under SU(d)-$S_n$ duality, and we show the advantage of working with $S_n$, as there is no need to deal with complicated SU(d) generators in the proof. (c) When changed from the Young basis to the computational basis, i.e., forgetting the SU(d) symmetry, our results form a new proof of the universality of a broad class of QAOA ansätze. (d) It is shown in a recent work [45] that SU(d)-invariant/symmetric quantum circuits with $d \geq 3$ cannot be generated by 2-local SU(d)-invariant unitaries. With the focus on locality, we verify that $S_n$-CQA ansätze can be built from 4-local SU(d)-invariant unitaries and that 4-locality is enough to generate any SU(d)-invariant quantum circuit up to phase factors.
Mathematically, we aim to show that the subgroup generated by $S_n$-CQA ansätze is equal to the unitary group $\mathrm{U}(S^\lambda)$ restricted to $S^\lambda$ decomposed from $V^{\otimes n}$. However, arguing directly on the level of the Lie group is complicated. Instead, we prove that the generated Lie algebra is isomorphic to the unitary algebra $\mathfrak{u}(S^\lambda)$ restricted to $S^\lambda$. Then, combining this with some classical results from the theory of Lie groups [40,41] and the Okounkov-Vershik approach to $S_n$-representation theory [39], we complete the proof. We outline our results here and put all the proof details into the SM.
Our first step is motivated by a classical result from the theory of Lie algebras: any semisimple Lie algebra can be generated by only two elements [40]. Finding these elements would be tricky and encoding them by a quantum circuit would even be infeasible, so we will adopt a different route and solve these problems gradually. We first work on the complex general linear algebra $\mathfrak{gl}(d, \mathbb{C})$, which is not semisimple but facilitates our proof. To begin with, it is easy to find its Cartan subalgebra, namely the collection $\mathfrak{d}(d)$ of all diagonal matrices. Let $M$ be a matrix with nonzero off-diagonal elements $c_{ij}$. It can be thought of as a perturbation from $\mathfrak{d}(d)$. We want to know how large the subalgebra generated by $M$ and $\mathfrak{d}(d)$ would be. More precisely:

Lemma 1. Let $E_{ij}$ be the matrix unit with entry 1 at $(i, j)$ and 0 elsewhere. Given any matrix $M$, let $I \subset \{1, \ldots, d\} \times \{1, \ldots, d\}$ be the index set corresponding to the nonzero off-diagonal entries $c_{ij}$ of $M$. Then the Lie subalgebra generated by $\mathfrak{d}(d)$ and $M$ contains $\bigoplus_{(i,j) \in I} R_{ij}$, where $R_{ij}$ is the 1-dimensional root space spanned by $E_{ij}$.
Intuitively speaking, $\mathrm{GZ}_n$ defined in Section II corresponds precisely to the Cartan subalgebra of $\mathfrak{gl}(\dim S^\lambda, \mathbb{C})$. In proving Lemma 1, we are required to use all basis elements of $\mathrm{GZ}_n$ rather than the $n$ YJM-generators [39]. Thus we need to employ all high-order products $X_{i_1} \cdots X_{i_N}$ of YJM-elements (as $\dim S^\lambda$ increases exponentially with the number of qudits, see Fig. 9), and this poses the first problem for a practical ansatz design, which requires $k$-local $\mathbb{C}[S_n]$ Hamiltonians in order to be efficiently simulated by quantum circuits [31]. This problem is solved in Lemma 2.6 in the SM with the help of the Okounkov-Vershik theorem [39,50]. We prove that the collection $\{X_i, X_k X_l\}$ of first- and second-order YJM-elements, while in general not forming a basis for $\mathrm{GZ}_n$, is enough to establish Lemma 1. As a reminder, merely taking the original YJM-elements $X_i$ is not sufficient, and we provide counterexamples in the SM. As the $X_i$ are sums of 2-local terms, this result also provides some insight into the fact that 2-local SU(d)-invariant unitaries cannot generate all quantum circuits with SU(d) symmetry in the general case [44,45,67].
The other ingredient of the $S_n$-CQA ansätze is the problem Hamiltonian $H_P$, which in general is complicated and hard to diagonalize classically. It plays the role of the matrix $M$ in generating the Lie algebra in Lemma 1. For the purpose of easy implementation, we show in Lemma 2 that $H_P$ only needs to be path-connected, or irreducible in the language of graph theory. A Hamiltonian is of this kind if its associated index graph $G_{H_P}$ (the graph on basis indices with an edge $\{i, j\}$ whenever the off-diagonal entry $c_{ij}$ is nonzero) is connected. For example, the Pauli $X$ and $Y$ are path-connected while $Z$ is not. We further prove in Lemma 3.1 in the SM that the 2-local Hamiltonian $H_S = \sum_{i=1}^{n-1} (i, i+1)$, defined by all adjacent transpositions $(i, i+1) \in S_n$, is path-connected. We will discuss path-connectedness further after Theorem 1 as well as in Section V. The same notion of irreducibility also appears in the famous Perron-Frobenius theorem and its applications to graph theory.
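Path-connectedness is easy to test numerically for a given matrix. The sketch below assumes the index graph has one vertex per basis vector and an edge wherever an off-diagonal entry is nonzero (the precise definition is given in the SM); the function name and tolerance are illustrative.

```python
import numpy as np

def is_path_connected(h, tol=1e-12):
    """True if the graph with an edge {i, j} for every nonzero off-diagonal h[i, j] is connected."""
    h = np.asarray(h)
    d = h.shape[0]
    adjacency = np.abs(h) > tol
    np.fill_diagonal(adjacency, False)
    seen = {0}
    stack = [0]
    while stack:                       # depth-first search from vertex 0
        i = stack.pop()
        for j in np.flatnonzero(adjacency[i] | adjacency[:, i]):
            if int(j) not in seen:
                seen.add(int(j))
                stack.append(int(j))
    return len(seen) == d

pauli_x = np.array([[0, 1], [1, 0]], dtype=complex)
pauli_y = np.array([[0, -1j], [1j, 0]], dtype=complex)
pauli_z = np.array([[1, 0], [0, -1]], dtype=complex)
identity = np.eye(2, dtype=complex)

# Uniform sum of Pauli X operators on two qubits
h_x = np.kron(pauli_x, identity) + np.kron(identity, pauli_x)

print(is_path_connected(pauli_x), is_path_connected(pauli_y), is_path_connected(pauli_z))
print(is_path_connected(h_x))
```

The output should be consistent with the examples in the text: $X$ and $Y$ are path-connected, $Z$ is not, and the two-qubit sum of Pauli $X$ operators is path-connected because single bit flips link all computational basis states.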

Lemma 2.
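Let $H_P$ be a path-connected Hamiltonian. Then the generated Lie algebra $\langle \mathfrak{d}(d), H_P \rangle = \mathfrak{gl}(d, \mathbb{C})$. Consider $\mathfrak{d}_{\mathbb{R}}(d)$ consisting of all real-valued diagonal matrices. Generated over $\mathbb{R}$, $\langle i\,\mathfrak{d}_{\mathbb{R}}(d), iH_P \rangle_{\mathbb{R}} = \mathfrak{u}(d)$.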
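The real form of Lemma 2 can be explored numerically for small $d$ by computing the dimension of the real Lie algebra generated by $i\,\mathfrak{d}_{\mathbb{R}}(d)$ and $iH_P$. The sketch below is a brute-force commutator closure with Gram-Schmidt rank tracking; it is an illustration under a numerical tolerance, not part of the proof, and the names are hypothetical.

```python
import numpy as np

def lie_closure_dimension(generators, tol=1e-9):
    """Dimension over R of the Lie algebra generated by the given complex matrices."""
    d = generators[0].shape[0]
    basis_vecs, basis_mats = [], []

    def try_add(m):
        # Treat the matrix as a real vector (real and imaginary parts) and orthogonalize
        v = np.concatenate([m.real.ravel(), m.imag.ravel()])
        for _ in range(2):                      # two Gram-Schmidt passes for stability
            for b in basis_vecs:
                v = v - np.dot(b, v) * b
        norm = np.linalg.norm(v)
        if norm <= tol:
            return
        v = v / norm
        basis_vecs.append(v)
        basis_mats.append((v[: d * d] + 1j * v[d * d:]).reshape(d, d))

    for g in generators:
        try_add(np.asarray(g, dtype=complex))
    i = 0
    while i < len(basis_mats):                  # commutators of all pairs, including new elements
        for j in range(i):
            a, b = basis_mats[i], basis_mats[j]
            try_add(a @ b - b @ a)
        i += 1
    return len(basis_mats)

d = 4
diagonal_units = [1j * np.diag(np.eye(d)[k]) for k in range(d)]   # basis of i * d_R(d)

x = np.array([[0, 1], [1, 0]], dtype=complex)
z = np.array([[1, 0], [0, -1]], dtype=complex)
eye = np.eye(2, dtype=complex)
h_connected = np.kron(x, eye) + np.kron(eye, x)     # path-connected
h_diagonal = np.kron(z, eye) + np.kron(eye, z)      # not path-connected

dim_connected = lie_closure_dimension(diagonal_units + [1j * h_connected])
dim_diagonal = lie_closure_dimension(diagonal_units + [1j * h_diagonal])
print(dim_connected, dim_diagonal)
```

For the path-connected example the closure reaches dimension $16 = \dim \mathfrak{u}(4)$, while for the diagonal Hamiltonian it stays at $4$, the dimension of the diagonal subalgebra.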
Since YJM-elements as well as their high-order products have real diagonal entries in the Young basis, we concretize $\mathfrak{d}(d)$ by $\{X_i, X_k X_l\}$. With all these preparations, we consider the subgroup $H$ defined in a purely algebraic sense by alternating exponentials of $iX_k X_l$ and $iH_P$, where $H_P$ is path-connected. To verify that $H$ is a Lie group (with smooth structure [68]), we apply another classical theorem due to Yamabe [41,69] and conclude with:

Theorem 1. Restricted to any $S^\lambda$ with isomorphic copies decomposed from $V^{\otimes n}$, the subgroup generated by $X_k X_l$ together with any path-connected Hamiltonian $H_P$ equals $\mathrm{U}(S^\lambda)$. Then an $S_n$-CQA ansatz is written as

$$\cdots \exp\Big(-i \sum_{k,l} \beta_{kl} X_k X_l\Big) \exp(-i \gamma H_P) \exp\Big(-i \sum_{k,l} \beta'_{kl} X_k X_l\Big) \exp(-i \gamma' H_P) \cdots,$$

where we redefine $X_1$ as $I$, so that any first-order YJM-element $X_i$ can be written as $X_i X_1$.
Consider the case when $H_P$ is not path-connected. That is, $H_P$ is block diagonal (after a possible reordering of basis elements) in $S^\lambda$. It is straightforward to check that Theorem 1 still holds within each sub-block of $H_P$. Suppose our task is to find the lowest eigenstate $|v_0\rangle$ of $H_P$ within $S^\lambda$. There is generally no prior knowledge about which sub-block $|v_0\rangle$ is in. The brute-force way to find the minimum is to take a collection of initial states from each of these sub-blocks and apply the theorem repeatedly. One way to do this is by implementing the efficient QST, which gives us access to all Young basis elements. The constant-depth state initialization proposed in Section IV B may take a hit (forcing the depth of the circuit to increase) if the problem Hamiltonian is not path-connected.

B. Universality of QAOA
The first proof of the universality of the QAOA ansätze was given in [42], where the authors considered problem Hamiltonians with first-order and second-order nearest-neighbor interactions. [43] subsequently generalized the result to broader families of ansätze defined by sets of graphs and hyper-graphs. We now describe a new proof, based on the techniques developed in this paper, that covers a novel, broader family of QAOA ansätze. More precisely, we change to the computational basis, in which the tensor products $Z_{r_1 \ldots r_s} := Z_{r_1} \otimes \cdots \otimes Z_{r_s}$ of Pauli $Z$ operators span all diagonal matrices. In the language of Lie algebras, the $Z_i$ generate the Cartan subalgebra $\mathfrak{d}(2^n)$ of $\mathfrak{gl}(2^n, \mathbb{C})$ (compare with the case of $\mathrm{GZ}_n = \mathfrak{d}(S^\lambda)$ in the Young basis). Let $H_X$ be the uniform sum of Pauli $X$ operators (we do not write its explicit form to avoid any confusion with the notation of YJM-elements). The Hamiltonian $H_X$ is path-connected in the computational basis $\{|e_i\rangle\}$. To restrict the use of high-order Pauli $Z$ operators, analogously to what we did for YJM-elements, we prove in Lemma 5.1 that a Hamiltonian $H_Z$ composed of $\{Z_i, Z_k Z_l\}$ is enough to establish Lemmas 1 and 2 with $H_X$ in the present setting. Unlike the $H_Z$ used in [42], which contains only nearest-neighbor terms $Z_j Z_{j+1}$, we take all second-order products $Z_k Z_l$ in our proof. The resulting Hamiltonian $H_Z$ is still simple, though, and the proof works for both odd and even numbers of qubits [43]. Moreover, replacing $H_X$ by any other path-connected Hamiltonian, e.g., an unfrustrated Heisenberg Hamiltonian with boundary condition [56], still guarantees the universality, and this fact enables one to experiment with a wide range of mixer Hamiltonians. In summary:

Theorem 2. Let $H_X$ be any path-connected Hamiltonian in the computational basis. The group generated by the QAOA-ansatz with $H_X$ and $H_Z = \sum_{k,l} \beta_{kl} Z_k Z_l$ equals $\mathrm{U}(2^n)$, i.e., it is universal.
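The statements about Pauli $Z$ products can be illustrated by a rank computation. The sketch below forms the diagonals of all products $Z_{r_1 \ldots r_s}$ (including the empty product, i.e., the identity) and checks that they span every diagonal matrix, whereas the products of order at most two alone do not span the full diagonal algebra for $n \geq 3$. The helper names are illustrative.

```python
import itertools
import numpy as np

def z_product_diagonal(sites, n):
    """Diagonal of the product of Pauli Z on the given sites (identity elsewhere)."""
    diag = np.ones(2 ** n)
    for idx in range(2 ** n):
        bits = [(idx >> (n - 1 - q)) & 1 for q in range(n)]
        for q in sites:
            diag[idx] *= 1 - 2 * bits[q]      # Z eigenvalue: +1 for bit 0, -1 for bit 1
    return diag

def span_rank(n, max_order):
    """Rank of the span of all Z-products of order at most max_order."""
    rows = [
        z_product_diagonal(sites, n)
        for order in range(max_order + 1)
        for sites in itertools.combinations(range(n), order)
    ]
    return np.linalg.matrix_rank(np.array(rows))

n = 3
rank_all = span_rank(n, n)      # all orders: full diagonal algebra
rank_low = span_rank(n, 2)      # identity, Z_i and Z_k Z_l only
print(rank_all, rank_low, 2 ** n)
```

For $n = 3$ the low-order products have rank 7 out of 8, which mirrors the remark that first- and second-order terms need not form a basis of the Cartan subalgebra even though they suffice for the Lie-algebraic argument.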

C. Four-Locality of SU(d)-symmetric Quantum Circuits
A well-known result in [70] states that any quantum circuit can be generated by 2-local unitaries, for qubits as well as for qudits. It has been shown in a recent work [45] that this statement fails to hold when we impose SU(d) symmetry on a qudit system with $d \geq 3$. Let $V_k$ denote the subgroup generated by $k$-local SU(d)-invariant unitaries; then $V_2 \neq V_n$, where $V_n$ stands for all the irrep blocks from Fig. 2 (b). On the other hand, Theorem 1 concerns $\mathrm{U}(S^\lambda)$, which singles out one of these blocks (with its equivalent copies); this is the setting used in searching for the ground state of the Heisenberg Hamiltonian in Section V.
Counting all inequivalent $S^\lambda$ is an interesting problem of its own, especially when studying the subgroups $V_k \subset \mathrm{U}(d^n)$ induced by symmetry, but it causes a phase factor problem: one may not be able to manipulate arbitrarily the relative phase factors of unitaries generated in inequivalent $S^\lambda$. We could simply ignore these phase factors, as they make no difference in measurements respecting the symmetry. We then consider $SV_k \subset V_k$ restricted to $\mathrm{SU}(S^\lambda)$ for all $S^\lambda$ decomposed from $V^{\otimes n}$. It is shown in [44,67,71,72] that $SV_2 = SV_n$ when $d = 2$. However, [44,67] prove, in a purely mathematical way using the Brauer algebra from representation theory, that the statement fails when $d \geq 3$, and [45] shows, by constructing a counterexample based on the qudit-fermion correspondence, that $V_2$ is not even a 2-design. That is, the distribution of unitaries generated by random 2-local unitaries cannot converge to the Haar measure of $V_n$. With the results about CQA developed above, we prove the following theorem:

Theorem 3. Ignoring phase factors, SU(d)-invariant quantum circuits can be generated by 4-local SU(d)-invariant unitaries for any $d \geq 2$. Using group-theoretical notation, $SV_4 = SV_n \subset \mathrm{CQA}$.
We sketch the proof strategy and leave the details to the SM. We define the subgroup generated by CQA, still denoted by CQA for simplicity, by a 2-local path-connected Hamiltonian $H_P$ together with the second-order YJM-elements. Since second-order YJM-elements are at most 4-local, one can intuitively conclude that $SV_4 = SV_n$. Moreover, in contrast to Theorem 1, which addresses universality restricted to one fixed $S^\lambda$, we now consider all inequivalent $S^\lambda$ from $V^{\otimes n}$, and our method only handles the problem when ignoring phase factors. Thus we claim that $SV_4 = SV_n$ and that they are both included in CQA, because CQA contains generators with nontrivial phases, e.g., $e^{i\theta I}$. In the SM we discuss the phase factor problem in more detail using $S_n$ representation theory and show that, in general, CQA is a compact subgroup of $V_4 \subsetneq V_n$.

IV. CORRESPONDENCE BETWEEN SPIN LABELS AND CONTENT VECTORS
As introduced in Section II, Young basis vectors are labeled by content vectors via YJM-elements. A similar phenomenon is also seen for the SU(d) irrep basis vectors constructed by Clebsch-Gordan decompositions [30,47,48], under which the space decomposes as in Fig. 2 (a): they are labeled by $d - 1$ Casimir operators [73]. We now turn to the question of whether these two labeling schemes are equivalent, in a sense explained in the following. This was conjectured to be true in [47] and surfaced again in [51] when the author introduced an efficient Quantum Schur Transform (QST). An affirmative answer to this conjecture is crucial in this work for three reasons: (a) The Young basis is algebraic. Thus, the gate action drawn from the group algebra $\mathbb{C}[S_n]$ is basis-independent. In particular, it can be implemented directly in the computational basis without computing the Fourier coefficients; this is a key observation that underpins the super-exponential quantum speed-up. (b) This identification allows us to apply both classical tools from SU(d) representation theory and the Okounkov-Vershik approach to the Schur basis, no matter how it is established. As an example, we show in Section IV B an efficient algorithm to generate Schur basis states required for optimization and learning tasks. (c) A detailed examination of the Schur basis enables us to convert all the previous results about $S_n$-CQA to what we call SU(d)-CQA with $S_n$ symmetry.
For two-row Young diagrams, this conjecture was shown to be correct in [74], where the author studied the question using spin-$\frac{1}{2}$ eigenfunctions instead of YJM-elements. The general case of SU($d$)$-S_n$ duality still holds and can be proven in a surprisingly easy way using YJM-elements and the Okounkov-Vershik approach. We present details in the SM. As a brief illustration of this result, let us consider the sequentially coupled total spin basis $|j_1, ..., j_n; m\rangle$ of SU(2). The spin component $m$ is determined by the spin operator $S_z^n$, the sum of the half Pauli matrices $\frac{1}{2}Z_i$ over all sites $i$, while the spin labels $j_k$ are determined by the sequentially coupled Casimir operators $J_k^2 = (S_x^k)^2 + (S_y^k)^2 + (S_z^k)^2$ (we abuse language for simplicity, as the true eigenvalues of $J_k^2$ are $j_k(j_k+1)$). Since these operators commute with the YJM-elements, $J_k^2 X_i |j_1, ..., j_n; m\rangle = X_i J_k^2 |j_1, ..., j_n; m\rangle$. It is well known from linear algebra that commuting diagonalizable operators can be simultaneously diagonalized, and we elaborate on this fact with the following theorem:
We now illustrate by examples the correspondence between spin labels and content vectors, first for the simplest SU(2)$-S_n$ duality and then for the general case. Let $|j_1, ..., j_n; m\rangle$ be any SU(2) irrep basis vector. Theorem 4 says that it is also a Young basis element, so we can talk about its eigenvalues (content vector) $(\alpha_T(1), ..., \alpha_T(n))$ with respect to the YJM-elements (recall that $T$ denotes a standard Young tableau, or equivalently the corresponding GZ-path in the Bratteli diagram, as in Fig.3 & 4). An equivalence between the two labeling schemes means that the spin label $J = \{j_1, ..., j_n\}$ uniquely determines the content vector $\alpha_T = (\alpha_T(1), ..., \alpha_T(n))$ and vice versa.
SU(2) case: For brevity, let us denote SU(2) irrep basis vectors by $|J; m\rangle$. It is possible to find two basis elements $|J; m\rangle, |J'; m\rangle$ with the same spin component $m$ but different spin labels.
This is due to the fact that $(\mathbb{C}^2)^{\otimes n}$ decomposes into copies of isomorphic SU(2) irreps (Fig.2 (a)). On the other hand, a Young basis element is denoted by $|\alpha_T; \mu\rangle$ where $\mu$, as explained in Section II, comes from choosing the permutation module $M^\mu$. It is also possible to find two basis elements $|\alpha_T; \mu\rangle, |\alpha_T; \mu'\rangle$ with the same content vector $\alpha_T$ but from different permutation modules (Fig.2 (c)). Let us set aside the problem of copies or multiplicities for a while and focus only on the correspondence between $J$ and $\alpha_T$. We come back to discuss them in Section IV A. Let $|\alpha_T; \mu\rangle$ be a Young basis element such that $\alpha_T$ equals $(0, -1, 1, 2, 0, 3)$ in Fig.4. It is also an SU(2) irrep basis vector. Acted on by $J_1^2$, the first spin label is necessarily $j_1 = \frac{1}{2}$. To measure the second spin label, let us apply Schur-Weyl duality to the subsystem consisting of only the first two qubits. Since $|\alpha_T; \mu\rangle$ is constructed by the branching rule, it can be seen as a Young basis element of the $S_2$ irrep with Young diagram $\lambda = (\lambda_1, \lambda_2) = (1, 1)$ (read off from the first two elements of $\alpha_T$). Schur-Weyl duality says that $|\alpha_T; \mu\rangle$ should lie in the SU(2) irrep denoted by the same Young diagram, hence $j_2 = \frac{1}{2}(\lambda_1 - \lambda_2) = 0$. Inductively, $j_3 = \frac{1}{2}$ and we obtain $J = (\frac{1}{2}, 0, \frac{1}{2}, 1, \frac{1}{2}, 1)$. As a brief comment on multiplicities, since the total spin (last spin label) is 1, there are three possible choices of the spin-$z$ component $m = 1, 0, -1$. Correspondingly, $\mu$ in $|\alpha_T; \mu\rangle$ can be three different permutation modules (Fig.2 (c)). The mechanism for reading off content vectors from spin labels is similar.
SU(3) case: Note that the pattern of constructing SU(2) spin labels is simply the familiar branching rule seen in SU(2)-irreps [48,75]. We now instantiate with $d = 3$ to show how this pattern generalizes. Let $\mathbf{0}, \mathbf{3}, \bar{\mathbf{3}}, \mathbf{8}$ denote the trivial, the fundamental, the conjugate and the adjoint representations of SU(3).
Then we consider the following coupling scheme: where we coupled 4 qudits, taking the GZ-path corresponding to $\alpha_T = (0, -1, -2, 1)$ and ending up with the Young diagram $\lambda = (2, 1, 1)$.
The group SU(3) has two Casimir operators where $T_i = \frac{1}{2}\lambda_i$ are half of the Gell-Mann matrices and $d_{ijk}$ are determined by the anti-commutation relations. Thus the $k$th sequential coupling of these operators, denoted by $(C_1, C_2)_k$, corresponds to the YJM-element $X_k$. They are used to record irreps like $\mathbf{3}, \bar{\mathbf{3}}, \mathbf{0}, \mathbf{3}$ appearing in the above example and yield "spin labels" $(1, 0), (0, 1), (0, 0), (1, 0)$, which are highest weights of SU(3) irreps. For a general SU($d$)$-S_n$ duality, each YJM-element $X_k$ corresponds to a tuple of $d-1$ Casimirs [73] for the $k$th sequential coupling. It is therefore more concise to use YJM-elements in the general case, as sequential coupling and branching rule decomposition are equivalent in describing the Schur basis.
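The two worked examples of this section can be checked mechanically. The following Python sketch (the function names are illustrative and not taken from the paper's code) rebuilds the sequence of Young diagrams from a content vector, using the fact that a box added to row $r$ (0-indexed) of a diagram $\lambda$ has content $\lambda_r - r$. It then reads off the SU(2) spin labels $j_k = \frac{1}{2}(\lambda_1 - \lambda_2)$ and, assuming the usual Dynkin-label convention $(\lambda_1 - \lambda_2, \lambda_2 - \lambda_3)$, the SU(3) highest weights.

```python
def shapes_from_content(content, d):
    """Rebuild the GZ-path (Young diagrams with at most d rows) from a
    content vector. Illustrative helper, not from the paper's code."""
    lam = [0] * d
    path = []
    for c in content:
        # A new box in row r (0-indexed) lands in column lam[r] (0-indexed),
        # so its content is column - row = lam[r] - r. The row must also be
        # addable, i.e. the result must still be a valid Young diagram.
        rows = [r for r in range(d)
                if lam[r] - r == c and (r == 0 or lam[r - 1] > lam[r])]
        if len(rows) != 1:
            raise ValueError(f"{content} is not a valid content vector for d={d}")
        lam[rows[0]] += 1
        path.append(tuple(lam))
    return path


def su2_spin_labels(content):
    """Spin labels j_k = (lambda_1 - lambda_2) / 2 along the path (d = 2)."""
    return [(l1 - l2) / 2 for l1, l2 in shapes_from_content(content, 2)]


def su3_highest_weights(content):
    """Highest weights (lambda_1 - lambda_2, lambda_2 - lambda_3) along the path (d = 3)."""
    return [(l1 - l2, l2 - l3) for l1, l2, l3 in shapes_from_content(content, 3)]


print(su2_spin_labels([0, -1, 1, 2, 0, 3]))   # SU(2) example from the text
print(su3_highest_weights([0, -1, -2, 1]))    # SU(3) example from the text
```

Because $\lambda_r - r$ is strictly decreasing in $r$, at most one row can match a given content, which is why the content vector determines the path (and hence the spin labels) uniquely.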
A. More facts about State Labeling and CQA with Sn symmetry
With the spin label-content vector correspondence, we denote a Schur basis vector by $|\alpha_T; \mu_S\rangle$, where $\alpha_T$ is its content vector. In the $S_n$ picture, $\alpha_T$ tells us exactly the path along which to restrict the $S_n$ irrep determined by the Young diagram of $T$ to an $S_{n-1}$ irrep and so forth. However, there are several copies of that $S_n$ irrep in the decomposition of the entire Hilbert space, and $\mu_S$ labels the multiplicity. One may wish to distinguish these isomorphic copies by permutation modules as in Fig.2 (c). However, as mentioned in Section II, when $d \geq 3$ isomorphic copies of $S_n$ irreps can be found even within the same permutation module $M^\mu$ [50,59], and hence the superscript $\mu$ is no longer enough to identify $\mu_S$.
Interestingly, this problem can be solved in the SU($d$) picture, in which $\alpha_T$ tells us exactly the path along which to couple SU($d$) irreps sequentially. Our final destination $W_\lambda$ is uniquely determined by the path, but we now need to label $|\alpha_T; \mu_S\rangle$ as a state in $W_\lambda$. When $d = 2$, $\mu_S$ is simply taken as the spin-$z$ component $m$. When $d \geq 3$, we use classical results by Gelfand and Tsetlin [76] and Biedenharn [77]. We illustrate the main idea for $d = 3$: consider weight diagrams of SU(3) (Fig.5). With $\alpha_T$ being determined by YJM-elements, we only need to identify which dot $|\alpha_T; \mu_S\rangle$ corresponds to in the weight diagram of $W_\lambda$. Diagrammatically, these dots have planar coordinates, rigorously called weights, measured by the isospin $I_3$ and hypercharge $Y$ operators of SU(3) [49,78], just like measuring spin components by $S_z$ for SU(2). However, unlike the SU(2) case, some weight vectors (dots) occupy the same position. It is known as the branching rule for SU(3) that dots on the same horizontal line form irreps of SU(2) $\subset$ SU(3). For instance, the two brown dots in Fig.5 form a spin-1/2 irrep while the other four red dots on the same horizontal line form a spin-3/2 irrep. Thus after measuring weights/positions by $I_3, Y$, we simply need to apply the SU(2) Casimir operator $J^2$ to discern dots occupying the same position.
In [77], the authors provide the recipe to find these operators for a general SU($d$) group. Roughly speaking, we first employ $d-1$ operators $\mu_i$, which span the Cartan subalgebra, to label the weights of a given basis vector. Then we take Casimir operators of SU(2) $\subset \cdots \subset$ SU($d-1$) to distinguish dots which occupy the same position. Their eigenvalues, which we record as $\mu_S$, form the so-called Gelfand-Tsetlin pattern [76], which is widely used to study SU($d$) irreps. To construct a CQA model with $S_n$ symmetry, we replace the YJM-elements labeling any $S^\lambda$ basis states by the following ones labeling any $W_\lambda$ basis states: The number of operators required to build an SU($d$)-CQA ansatz is $\frac{1}{2}d(d-1)$, a constant for fixed $d$, no matter how many qudits the system contains. However, Casimir operators of SU($d-1$) are supported on $d-1$ qudits (see Eq.(4) and [78]). Applying the same proof method as in Theorem 1, SU($d$)-CQA is thus made of $2(d-1)$-local $S_n$-symmetric unitaries. As a simple example, SU(2) irrep basis states are uniquely determined by the sum $S_z^n$ of the spin operators $\frac{1}{2}Z_i$ on each site $i$. Without having to employ Casimir operators, $S_z^n$, $(S_z^n)^2$ and a problem Hamiltonian $H_P$ already form an SU(2)-CQA model. More precisely,
Corollary 1. Let $H_P$ be any path-connected Hamiltonian on the SU(2) irrep basis, the CQA-ansatz generated by

B. State Preparation for Sn-CQA Ansätze
To investigate the evaluation of the matrix elements of $S_n$ Fourier coefficients, we were confined to the Young basis, which requires the implementation of the Quantum Schur Transform [47,51,52]. However, for a wide variety of quantum machine learning and optimization tasks, such as determining the ground state sign structure of frustrated magnets, it is often advantageous to relax the constraints and ask how easy it is to initialize states that live in any given $S_n$-irrep. An algorithm to initialize a state in the $S_n$-irrep with Young diagram $(\frac{n}{2}, \frac{n}{2})$ is given in [46]. We generalize this result to an arbitrary $S_n$-irrep in general SU($d$)$-S_n$ duality. The key is to utilize different permutation modules and multiplicities of $S_n$-irreps as in Fig.2 (c). Similar to the previous subsection, we construct the algorithm inductively: we first consider the SU(2)$-S_n$ duality in which a (λ 1 , where $|s\rangle = \frac{1}{\sqrt{2}}(|01\rangle - |10\rangle)$ is one of the Bell states and we assume $n - k$ is even. Then we have: The initialized state $|\Psi_{\mathrm{init}}\rangle$ is contained in $S^\lambda$ and belongs to the permutation module $M^\mu$.
Proof. Acting with the spin operator $S_z = \sum_i S_z^i$, it is easy to check that the spin component of $|\Psi_{\mathrm{init}}\rangle$ equals $j = k/2$, hence it belongs to $M^\mu$. By Theorem 4 and the discussion in the previous subsection, we have the expansion $|\Psi_{\mathrm{init}}\rangle = \sum_T c_T |\alpha_T; k/2\rangle$. Since by definition $J_+ |\Psi_{\mathrm{init}}\rangle = 0$, the Young diagram underlying each $\alpha_T$ in the summation must be the same and equals $\lambda$.
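The two facts used in the proof can be verified numerically on a small instance. The sketch below is a NumPy check under the assumption that the initial state of Eq.(5) has the form $|0\rangle^{\otimes k} \otimes |s\rangle^{\otimes (n-k)/2}$ (the helper names `kron_all`, `total_operator` and `init_state` are ours). For $n = 6$, $k = 2$ it confirms that the state has spin component $k/2$, is annihilated by $J_+$, and is therefore an eigenvector of $J^2$ with eigenvalue $j(j+1)$ for $j = k/2$.

```python
import numpy as np
from functools import reduce


def kron_all(factors):
    """Kronecker product of a list of vectors or matrices."""
    return reduce(np.kron, factors)


def total_operator(single, n):
    """Sum over sites of `single` acting on one qubit and the identity elsewhere."""
    eye = np.eye(2)
    return sum(kron_all([single if j == i else eye for j in range(n)])
               for i in range(n))


def init_state(n, k):
    """|0...0> on k qubits tensored with (n - k)/2 singlets (assumed form of Eq.(5))."""
    zero = np.array([1.0, 0.0])
    singlet = np.array([0.0, 1.0, -1.0, 0.0]) / np.sqrt(2)  # (|01> - |10>)/sqrt(2)
    return kron_all([zero] * k + [singlet] * ((n - k) // 2))


n, k = 6, 2
psi = init_state(n, k)
Sz = total_operator(np.diag([0.5, -0.5]), n)
Sp = total_operator(np.array([[0.0, 1.0], [0.0, 0.0]]), n)  # raising operator J_+
J2 = Sz @ Sz + Sz + Sp.T @ Sp                               # J^2 = Sz^2 + Sz + J_- J_+

j = k / 2
print(np.allclose(Sz @ psi, j * psi))            # spin component is k/2
print(np.allclose(Sp @ psi, 0))                  # highest-weight condition
print(np.allclose(J2 @ psi, j * (j + 1) * psi))  # total spin is k/2
```

With total spin $j = k/2 = 1$ on $n = 6$ qubits, the state lies in the $S_6$ irrep $\lambda = (4, 2)$, which is the case treated in example (b) of the text.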
We now illustrate by several examples how to expand $|\Psi_{\mathrm{init}}\rangle$ as a linear combination of Young basis elements: . This is the state used in [46]. One can check with YJM-elements that it is a single Young basis element. (b) For a more involved case, consider the $(4, 2)$-irrep of $S_6$ and write: where the $\alpha_{T_i}$ correspond to GZ-paths in Fig.6. The first two boxes in Fig.6 correspond to the trivial irrep of $S_2$ acting on the subsystem formed by the first two qubits. Indeed, $S_2$ acts trivially on the factor $|00\rangle$ of $|\Psi_{\mathrm{init}}\rangle$. As it is easier to see how to obtain the $\alpha_{T_i}$ in the SU(2) picture, we use the spin label-content vector duality and trace the path of spin coupling. As the total spins of $|00\rangle$ and $|00\rangle \otimes |s\rangle$ are identical (Lemma 4), there are two ways to add two more boxes: putting the third box to the right of the first two and then putting the fourth at the bottom, or conversely. Tensoring again with the singlet $|s\rangle$, we retrieve four branching paths in total. Moreover, for the same reason, it is easy to see that re-ordering the tensor products of $|0 \cdots 0\rangle$ and $|s\rangle$ in Eq.(5) yields a different expansion in Young basis elements for the same $S_n$ irrep. Fig.7 illustrates two more cases: $|s\rangle \otimes |0\rangle \otimes |s\rangle \otimes |0\rangle$ and $|s\rangle \otimes |00\rangle \otimes |s\rangle$. This method can be generalized to SU($d$)$-S_n$ duality. For instance, when $d = 3$, to initialize states for three-row Young diagrams, let us consider the up, down and strange states $u, d, s$ of $\mathbf{3}$. Let Its expansion can still be tracked by the branching rule as in Fig.6 & 7. An $S_6$-CQA quantum circuit with the state initialization described above can be seen in Fig.8.

C. Quantum Super-Polynomial Speedup
For variational algorithms, one typically makes many measurements with parameters $\{\theta_\mu\}$ updated by some classical gradient descent scheme:
$$\theta_\mu(t+1) = \theta_\mu(t) - \eta_\mu \sum_\nu A_{\mu\nu}(\theta(t)) \frac{\partial}{\partial \theta_\nu} \langle H \rangle_{\theta(t)},$$
where the learning rate tensor $A_{\mu\nu}(\theta(t))$ is often taken as the identity matrix while $\eta_\mu = \eta$ is the learning rate. The quantity $\langle H \rangle_{\theta(t)}$ is the expectation value to minimize and $\frac{\partial}{\partial \theta_\nu} \langle H \rangle_{\theta(t)}$ is its derivative with respect to $\theta_\nu$. With explicitly parameterized unitaries such as in our case, we can utilize the quantum circuits to measure the gradient of the expectation.
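As a toy illustration of this update loop (not the $S_n$-CQA ansatz itself), the following sketch runs plain gradient descent with $A_{\mu\nu}$ equal to the identity and a uniform learning rate $\eta$ on a single qubit. The Hamiltonian, the generators and the starting point are hypothetical choices of ours. The gradient is obtained with the parameter-shift rule, which is one standard way to measure gradients on a circuit and is valid here because each gate has the form $e^{-i\theta P}$ with $P^2 = I$; it is swapped in for illustration and is not claimed to be the scheme used in the paper.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
H = X + Z                  # toy Hamiltonian with ground energy -sqrt(2)
GENERATORS = [X, Z]        # toy gate generators, each satisfying P @ P = I


def energy(theta):
    """<psi|H|psi> for psi = exp(-i theta[1] Z) exp(-i theta[0] X) |0>."""
    psi = np.array([1, 0], dtype=complex)
    for th, P in zip(theta, GENERATORS):
        psi = (np.cos(th) * I2 - 1j * np.sin(th) * P) @ psi  # exp(-i th P)
    return float(np.real(np.vdot(psi, H @ psi)))


def gradient(theta):
    """Parameter-shift rule: dE/dtheta = E(theta + pi/4) - E(theta - pi/4)."""
    grad = np.zeros_like(theta)
    for nu in range(len(theta)):
        shift = np.zeros_like(theta)
        shift[nu] = np.pi / 4
        grad[nu] = energy(theta + shift) - energy(theta - shift)
    return grad


def descend(theta0, eta=0.1, steps=500):
    """theta(t+1) = theta(t) - eta * grad <H>, i.e. A = identity."""
    theta = np.array(theta0, dtype=float)
    energies = [energy(theta)]
    for _ in range(steps):
        theta = theta - eta * gradient(theta)
        energies.append(energy(theta))
    return theta, energies


theta_opt, energies = descend([0.3, 0.2])
print(energies[0], energies[-1], -np.sqrt(2))
```

Each gradient component costs two energy evaluations, so one update of a $P$-parameter ansatz needs $2P$ expectation values; this per-step cost is separate from the number of steps needed to converge.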
Here, we refer to taking one measurement at time $t$ as a query, and to the total time $T$ needed to converge as the query complexity. The query complexity can be analyzed using recent developments on the quantum neural tangent kernel [79]. Although bounding the query complexity needed to converge would be an interesting problem, in this work we focus only on showing that each query can be evaluated efficiently on quantum circuits, while no efficient method is known in the classical regime.
Let $U^{(p)}_{\mathrm{CQA}}$ denote the CQA ansätze with $p$ alternating layers and let $H \in \mathbb{C}[S_n]$ be an SU($d$)-symmetric $k$-local Hamiltonian with at most $N$ terms. Then for any $S_n$ irrep $S^\lambda$, the Fourier coefficients: where $T, T'$ are standard tableaux of $\lambda$ (Fig.3 & 4) and $\mu_S$ records the multiplicity of $S^\lambda$ (Section IV A), can be simulated in $O(pN(\theta n^4 + k^2))$, with $\theta$ being the largest absolute value of the parameters.
The proof is also given in the SM. Precisely, we assume that there exists an efficient Quantum Schur Transform (QST) [47,51,80] with a polynomial overhead to prepare $|\alpha_T, \mu_S\rangle$. Calculating the Fourier coefficients over $S_n$ is a classically difficult problem, and the best classical algorithm, the $S_n$-FFT, has factorial complexity [35,36], since $S_n$ has $n!$ group elements and so is the dimension of its regular representation (see the Wedderburn Theorem in the SM for more details). Therefore, compared with the complexity of the $S_n$-FFT, there is a super-exponential quantum speed-up per query. As a caveat, however, the entire Hilbert space of $n$ qudits scales only exponentially with $n$, and the $S_n$ irreps decomposed from the system by Schur-Weyl duality also scale exponentially. Therefore, it is more reasonable and cautious to refer to a super-polynomial quantum speed-up for $S_n$-CQA.
Besides the comparison with the $S_n$-FFT, recent work [10,81] proposes the notion of dequantization to compare the efficiencies of classical and quantum algorithms. Roughly speaking, with well-prepared quantum initial states, quantum algorithms can always be exponentially faster than the best classical counterparts. Assume that classical algorithms also have efficient access to the input. If the output can then be evaluated with at most polynomially larger query complexity than the quantum analogue, the quantum algorithm is said to be dequantized, with no genuine quantum speed-up. In our case, let us assume that our initial states, Schur basis elements $|\alpha_T, \mu_S\rangle$ or their linear combinations, can be efficiently accessed with classical methods. Even so, dequantization is still unlikely. Apart from the $S_n$-FFT, matrix representations of $\sigma \in S_n$ can also be efficiently sampled [48,75], but the method works exclusively for a single group element. To sample $U^{(p)}_{\mathrm{CQA}}(\theta) |\alpha_T, \mu_S\rangle$ processed by $S_n$-CQA from Eq.(3), the time evolution of the CQA Hamiltonians is expanded and approximated by at least super-polynomially many $S_n$ group elements (see the SM for more details) and hence is still thought to be classically intractable.

V. C[Sn] SYMMETRIES OF J1-J2 HEISENBERG HAMILTONIAN
The spin-1/2 $J_1$-$J_2$ Heisenberg model is defined by the Hamiltonian:
$$H = J_1 \sum_{\langle i j \rangle} \hat{S}_i \cdot \hat{S}_j + J_2 \sum_{\langle\langle i j \rangle\rangle} \hat{S}_i \cdot \hat{S}_j, \qquad (8)$$
where $\hat{S}_i = (\hat{S}_i^x, \hat{S}_i^y, \hat{S}_i^z)$ represents the spin operators at site $i$ of the lattice under consideration. The symbols $\langle \cdots \rangle$ and $\langle\langle \cdots \rangle\rangle$ indicate pairs of nearest and next-nearest neighbor sites, respectively. The $J_1$-$J_2$ model has been the subject of intense research because of its speculated novel spin-liquid phases in the frustrated region [82]. The unfrustrated regime ($J_2 = 0$ or $J_1 = 0$) of the antiferromagnetic Heisenberg model is characterized by bipartite lattices, for which the sign structures of the respective ground states are given analytically by the Marshall-Lieb-Mattis theorem [54][55][56]. As an important result, ground states of unfrustrated bipartite models are proven to live in the $S_n$ irrep corresponding to the Young diagram $\lambda = (n/2, n/2)$. By Schur-Weyl duality, this subspace is often referred to in physics as the direct sum of SU(2) invariant subspaces with total spin $J = 0$ (cf. Fig.2 (a), (b)). Based on this fact, algorithms like [38] have been designed to enforce SU(2) symmetry at $J = 0$ and solve Heisenberg models without frustration.
The system is known to be highly frustrated when $J_1$ and $J_2$ are comparable, $J_2/J_1 \approx 0.5$ [83], and near the region of the two phase transitions from Néel ordering to the quantum paramagnetic phase and from the quantum paramagnetic to the collinear phase, where no exact solution is known. Moreover, little is known about the intermediate quantum paramagnetic phase; recent evidence of deconfined quantum criticality [84,85] has sparked further interest in studying these regimes. Gaining physical insight into the intermediate quantum paramagnetic phase requires solving the problem of the ground state sign structure as the system approaches the phase transition. Recently, there have been a number of numerical attempts to address the existence of the U(1) gapless spin liquid phase, using tensor networks [86], restricted Boltzmann machines (RBM) [87], convolutional neural networks (CNN) [57,58,88], and graph neural networks (GNN) [89], all yielding partial progress. As a significant difference from the unfrustrated case, the Marshall-Lieb-Mattis theorem does not hold in general and there is no guarantee that the ground state still lives at $J = 0$, or equivalently $\lambda = (n/2, n/2)$. This urges us to preserve the global SU(2) symmetry, which in turn lets us search in all inequivalent $S_n$ irreps decomposed from the system by Schur-Weyl duality.

Global SU(2) Symmetry and Challenges in NQS Ansätze
Taking advantage of the global SU(2) symmetry, we address this problem in a different way: we recast the Hamiltonian in Equation (8) using the identity
$$\pi((i\,j)) = 2 \hat{S}_i \cdot \hat{S}_j + \frac{1}{2} I, \qquad (9)$$
with $\hat{S}_i$ being further expanded as half of the standard Pauli operators $\{X, Y, Z\}$. Eq.(9) was first discovered by Heisenberg himself [90,91] (an elementary proof can be found in the SM) and more recently noted by [46] in analyzing the ground state properties of the 1-D Heisenberg chain. Being built from products of exponentials of SWAPs (eSWAPs), the method proposed in [46] truly preserves the global SU(2) symmetry. As a brief comparison with $S_n$-CQA, eSWAP ansätze are universal in the relevant sectors given by the SU(2) symmetry. However, this property no longer holds for qudits with $d \geq 3$ (Section III C). As eSWAPs are non-commuting operators in general, there are various ways to place them in a quantum circuit. A more suitable perspective to describe the ansatz might be to sample them as 2-local SU(2) random circuits [45]. On the other hand, the $S_n$-CQA ansatz is built by alternating exponentials of the problem and mixer Hamiltonians $H_P, H_M$, just as in the framework of QAOA. Similar to QAOA, $S_n$-CQA at large $p$ corresponds to a form of adiabatic evolution with global SU($d$) symmetry, which could hint at a theoretically guaranteed performance when $p$ is large (see Section VII in the SM). An immediate consequence of using Eq.(9) is that the resulting Heisenberg Hamiltonian can be expressed in the Young basis, where every $S_n$-irrep is indexed by the total spin label $j$. Mapping to this basis can be done using the constant-depth state initialization circuit of Section IV B. Using our $\mathbb{C}[S_n]$ variational ansatz leads to a more efficient algorithm by polynomially reducing the space. In NISQ applications, especially between 10 and 50 qubits, we obtain much better scaling; see Fig.9.
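The identity in Eq.(9) is easy to confirm numerically. The sketch below is a NumPy check on a hypothetical 4-site ring (the bond list and the helper names are ours, chosen only for illustration): it builds the permutation matrix $\pi((i\,j))$ that swaps qubits $i$ and $j$, compares it with $2\hat{S}_i \cdot \hat{S}_j + \frac{1}{2} I$, and then rewrites the nearest-neighbor Heisenberg Hamiltonian purely in terms of transpositions.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2, dtype=complex)
PAULIS = [np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]], dtype=complex),
          np.array([[1, 0], [0, -1]], dtype=complex)]


def site_op(op, i, n):
    """`op` acting on qubit i of n qubits (qubit 0 is the leftmost tensor factor)."""
    return reduce(np.kron, [op if k == i else I2 for k in range(n)])


def spin_dot(i, j, n):
    """S_i . S_j with S = (X, Y, Z) / 2."""
    return sum(site_op(P, i, n) @ site_op(P, j, n) for P in PAULIS) / 4


def transposition(i, j, n):
    """Permutation matrix pi((i j)) that swaps qubits i and j."""
    dim = 2 ** n
    perm = np.zeros((dim, dim), dtype=complex)
    for b in range(dim):
        bits = [(b >> (n - 1 - k)) & 1 for k in range(n)]
        bits[i], bits[j] = bits[j], bits[i]
        perm[sum(bit << (n - 1 - k) for k, bit in enumerate(bits)), b] = 1
    return perm


def heisenberg_from_swaps(n, bonds, coupling=1.0):
    """sum_bonds J S_i.S_j rewritten via Eq.(9) as (J/2) (pi((i j)) - I/2)."""
    dim = 2 ** n
    return sum(coupling * 0.5 * (transposition(i, j, n) - 0.5 * np.eye(dim))
               for i, j in bonds)


n = 4
bonds = [(0, 1), (1, 2), (2, 3), (3, 0)]  # hypothetical 4-site ring
identity_holds = all(
    np.allclose(transposition(i, j, n),
                2 * spin_dot(i, j, n) + 0.5 * np.eye(2 ** n))
    for i, j in bonds)
H_spin = sum(spin_dot(i, j, n) for i, j in bonds)
H_swap = heisenberg_from_swaps(n, bonds)
print(identity_holds, np.allclose(H_spin, H_swap))
```

Since every term of `H_swap` is a group algebra element of $S_n$, the Hamiltonian commutes with the global SU(2) action by Schur-Weyl duality, which is the structural fact the ansatz exploits.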
Numerous efforts to apply NQS variational architectures to represent the complicated sign structure in the frustrated regime essentially use the energy as the only criterion for assessing accuracy. As a result, the optimized low-energy variational states in the frustrated regime may still obey the Marshall sign rule even though the true ground state is likely to deviate from it significantly [58], or may break the SU(2) symmetry [57]. The preservation of spatial symmetry has been a core topic of discussion in the literature, with proposed $C_4$-equivariant CNNs. However, for the 2D Heisenberg model, spatial symmetry considerations can only reduce the search space redundancy by a constant factor, thus scaling very poorly even at intermediate $n$. By enforcing SU(2) symmetry, we achieve a polynomial reduction of the Hilbert space and ensure that the result is physically reasonable, hence offering a second criterion to assess the variational ansätze.
The number of qubits scaling linearly with the number of qubits naturally circumvents the issue of the generalization property, a crucial property for the NQS ansätze to function [92]. In fact, in a related work of ours [31], we showed that, making use of the representation theory of the symmetric group, this leads to a super-exponential quantum speed-up. To this end, it is unlikely that any classically trained ansätze are capable of enforcing the global SU(2) symmetry of the system.

A. Numerical Simulation
We provide numerical simulations to showcase the effectiveness of the $S_n$-CQA ansätze, using the JAX automatic differentiation framework [93]. The implementation of the $S_n$-CQA ansätze utilizes the classical Fourier space activation by working in the $S_n$ irreducible representation subspace where the ground state lies. This impacts the stability of the numerical simulations, which implies that the best-suited models have 8-16 spins. This bottleneck in computational resources, as shown in Section III, presents no issue for a potential larger-scale implementation on quantum computers. The benchmark examples with RBM and Group-equivariant Convolutional Neural Network (GCNN) [53] are drawn from the NetKet [94] tutorials https://www.netket.org/tutorials.html, which form the baseline comparison. Note that we implemented no explicit global SU(2) or U(1) symmetry for these benchmark algorithms. For the numerical simulation of $S_n$-CQA, we initialize the parameters randomly. We found that random initialization already returns an energy within roughly $10^{-2}$ of the ED ground state energy, and non-oscillating descent toward the ED ground state energy in comparison with GCNN and RBM. This is likely due to the fact that we used the $S_n$-Fourier space activation with real-valued trial wavefunctions with explicit SU(2) symmetry. We record the optimized energy for the $S_n$-CQA ansätze every five iterations, and we set the number of alternating layers to $p = 4$ for the 3 × 4 lattice and $p = 6$ for the 12-spin Kagome lattice. In the implementation, we shift the Hamiltonian to $\tilde{H}(\lambda) = H(\lambda) + m \mathbb{1}_{d_\lambda}$ to ensure that $\tilde{H}(\lambda)$ is positive semidefinite in the $S_n$-irrep specified by the partition $\lambda = (\lambda_1, \lambda_2)$ with total spin label $j = (\lambda_1 - \lambda_2)/2$ (Section II), where $m$ is the total number of transpositions. We only take the real (normalized) part of the wavefunction, $\mathrm{Re}(\psi) = \psi + \psi^*$.
This can be seen as a postprocessing step for the realization of $S_n$-CQA on a quantum computer. We use Nesterov-accelerated Adam [95] for the $S_n$-CQA optimization with hyper-parameters betas = [0.99, 0.999]. We also use NetKet's ED (Exact Diagonalization) result for the comparison with the exact ground state energy. The ground state is additionally calculated in the Young (Schur) basis by diagonalizing the Heisenberg Hamiltonian $H(\lambda)$ in the irrep $\lambda$ where the ground state lives. It is worth mentioning that our optimized ground state is strictly real-valued and has explicit SU(2) symmetry, offering the missing yet essential physical interpretation. We provide Python code and a Jupyter notebook in open access on GitHub. The numerical simulations are run on a CPU platform with 9th Gen 1.4 GHz Intel Core i5 processors.

3 × 4 Rectangular Lattice
In the frustrated region, at $J_2/J_1 = 0.5$ and $J_2/J_1 = 0.8$, we found that the ground state lives entirely in the total spin 0 irrep, corresponding to the partition $\lambda = (6, 6)$. In the case of $J_2 = 0.5$, the $S_n$-CQA ansätze converge smoothly to the ground state, with an error of $9.1049 \times 10^{-5}$ with respect to the exact ground state energy. For $J_2 = 0.8$, $S_n$-CQA reaches a precision of $5.0587 \times 10^{-4}$ with respect to the ED ground state energy. We notice that $S_n$-CQA always seems to converge to the ground state with reasonably good accuracy without being trapped in local minima, regardless of initialization (random Gaussian initialization is used). The learning rate used here is 0.01. For the GCNN, at both $J_2$ values we set the feature dimensions of the hidden layers to (8, 8, 8, 8) and use 1024 samples with the learning rate set to 0.02. For the RBM model, we fix the learning rate at 0.02 with 1024 samples.
FIG. 11: For the 3 × 4 lattice, in either case, the $S_n$-CQA ansätze are able to converge to the ground state to at least $10^{-4}$ precision with explicitly enforced SU(2) symmetry. This can be seen from the fact that the energy expectation of $S_n$-CQA never falls below the exact ground state energy, while algorithms that do not respect the symmetry inevitably do. There is room for further improvement of the numerical results, for instance with better gradient descent algorithms that utilize the Hessian, since we have only 200-500 learnable parameters to optimize. Therefore, we expect the performance and convergence rate of the $S_n$-CQA ansätze to improve further with more refined tuning.

12-Spin Kagome Lattice
By comparing with the ED result, we found that the ground state of the 12-spin Kagome lattice lives in the total spin 2 irrep, corresponding to the partition [8, 4], at $J_2 = 0$, which suggests that it is 5-fold degenerate. For both frustration levels $J_2/J_1 = 0.5$ and $J_2/J_1 = 0.8$, the ground state lives in the total spin 0 irrep and appears to be nondegenerate. We aim to learn the ground state of the 12-spin Kagome lattice at $J_2/J_1 = 0.5$ and $J_2/J_1 = 0.8$. In the case of $J_2/J_1 = 0.5$, the ground state energy optimized by the $S_n$-CQA ansätze at the final iteration is within $1.5721 \times 10^{-4}$ of the ED result. In the case of $J_2/J_1 = 0.8$, the final optimized energy is within $6.2065 \times 10^{-5}$ of the ED ground state energy. The learning rate is set to 0.01 for both $J_2/J_1 = 0.5$ and 0.8. For the GCNN, at both frustration points we set the feature dimensions to (8, 8, 8, 8) with 1024 samples and a learning rate of 0.02. The RBM implementation uses 1024 samples with a learning rate of 0.02 in both cases.

DISCUSSION
In this paper, we introduce a framework to design non-Abelian group-equivariant quantum variational ansätze as an example of PQC+, extended from Permutational Quantum Computation (PQC). The restricted universality of the $S_n$-CQA ansätze makes them applicable to a wide array of practical problems which explicitly encode permutation-equivariant structure or exhibit global SU($d$) symmetry. Our proof techniques can be used to show the universality of QAOA and to verify the four-locality of generic SU($d$)-symmetric quantum circuits. Moreover, we illustrate the efficacy of our approach by finding the ground state of the antiferromagnetic $J_1$-$J_2$ Heisenberg model on a 3 × 4 rectangular lattice and a 12-spin Kagome lattice in highly frustrated regimes near the speculated phase transition boundaries. We provide strong numerical evidence that our $S_n$-CQA can approximate the ground state with a high degree of precision while strictly respecting SU(2) symmetry. This opens up new avenues for using representation theory and quantum computing in solving quantum many-body problems.

Open Problems
We conclude with several interesting open problems: (a) We would like to find out the computational power of PQC+. In particular, it is interesting to investigate whether quantum circuits can (in polynomial time) approximate matrix elements of any $S_n$ Fourier coefficients. A natural starting place is perhaps the restricted universality of the $S_n$-CQA ansätze in each $S_n$ irrep, asking whether a polynomially bounded number of alternating layers $p$ is able to approximate any matrix element of $S_n$ Fourier coefficients. We may further loosen the condition by asking whether a polynomially bounded number of alternating layers $p$ would form an approximate $k$-design for the subgroups $U(S^\lambda)$ restricted from $U(V^{\otimes n})$ when imposing the global SU($d$) symmetry. A detailed study of this question will shed some light on the nature and scope of the prospective quantum advantage. (b) In the SM we show that $S_n$-CQA ansätze at large $p$ can simulate certain quantum adiabatic evolutions with random path-dependent coupling strengths. It would be important to investigate whether the path-dependent coupling strength parameters $\beta_{kl}$ lead to potential amplitude amplification of the spectral gap along the adiabatic path. In particular, one might need to address the physical dynamics of the random path-dependent coupling strengths. (c) More generally, the quantum speed-up we demonstrated here is inherently connected to PQC+. Are there other quantum speed-ups within this framework? In particular, (b) suggests a possible route related to quantum annealing. Another possible route may have to do with measurement-based quantum advantage; see, for instance, [96]. Therefore, one might ask whether our $S_n$-CQA ansätze have other sources of exponential quantum speed-up. (d) Another open direction would be to benchmark the performance of the $S_n$-CQA ansätze on various Heisenberg models and to implement the $S_n$-CQA ansätze on a quantum device.

CODE AVAILABILITY
The code for the numerical simulations can be found at https://github.com/hanzheng98/Sn-CQA. The C++ implementation of $S_n$ operations can be found at https://github.com/risi-kondor/Snob2. Data are available upon request by emailing hanz98@uchicago.edu.
Proof. Let us consider $\hat{S}_1 \cdot \hat{S}_2$. Expanded by definition, where for now the subscripts on $J^2$ denote the sites on which $J^2$ acts. Under the total spin basis, it is easy to see that Therefore, $\hat{S}$ This argument holds for any $i, j$, hence the proof follows.
This theorem is proved by the so-called double commutant theorem; details can be found in [1].
Theorem I.3 (Wedderburn Theorem). Given any $S_n$-irrep $S^\lambda$, let $\mathrm{End}(S^\lambda)$ denote the collection of linear transformations of $S^\lambda$. With respect to any basis, e.g., the Young basis, $\mathrm{End}(S^\lambda)$ is simply the collection of all $\dim S^\lambda \times \dim S^\lambda$ matrices. As a vector space, the group algebra $\mathbb{C}[S_n]$ is isomorphic to the direct sum of the $\mathrm{End}(S^\lambda)$:
$$\mathbb{C}[S_n] \cong \bigoplus_{\lambda \vdash n} \mathrm{End}(S^\lambda).$$
We emphasize that, unlike the decomposition of $V^{\otimes n}$ by SU($d$)$-S_n$ duality, every Young diagram $\lambda$, standing for an inequivalent $S_n$-irrep, appears once and only once in the above direct sum decomposition of $\mathbb{C}[S_n]$. In any case, restricted to each $\lambda$, $\pi_{\mathbb{C}[S_n]}$ produces all $\dim S^\lambda \times \dim S^\lambda$ matrices, and hence any matrix commuting with $\pi_{\mathbb{C}[S_n]}$ must be a scalar.

II. PROOFS OF THE RESTRICTED UNIVERSALITY
Let $\mathfrak{gl}(d, \mathbb{C})$ denote the Lie algebra of the complex general linear group. It is simply the collection of all $d \times d$ complex matrices. Let $\mathfrak{d}(d)$ denote its Cartan subalgebra consisting of all diagonal matrices. We first prove the following lemma in this general setting and later set $d = \dim S^\lambda$, or harmlessly $d = \dim S^\lambda \times m_{S^\lambda}$, because our focus is on $S_n$-irreps $S^\lambda$.
where in the second step of the above computation, we omit all possible diagonal elements of $A$. This is legitimate because $h$ commutes with any diagonal matrix. Besides, $a_{kl} \cdot \lambda$ is understood as the inner product of $a_{kl} = (a_{kl}(E_{ii}))$ and $\lambda = (\lambda_i)$. Even though most components of $a_{kl}$ equal zero by the definition of roots, the notation $a_{kl} \cdot \lambda$ turns out to be neat for the following proof. With $M_1$ being defined, we continue to set Recall that $|I|$ is the number of nonzero off-diagonal elements of $A$. Then let us consider the following Vandermonde matrix: Viewing $\mathfrak{gl}(d, \mathbb{C})$ as a $d^2$-dimensional vector space with $E_{ij}$ as the standard basis, we note that $V(a_{kl} \cdot \lambda)$ transforms nonzero vectors If $\det V(a_{kl} \cdot \lambda) = (-1)^{|I|(|I|-1)/2} \prod_{s<t} (a_{k_s l_s} \cdot \lambda - a_{k_t l_t} \cdot \lambda) \neq 0$, then the transformation has an inverse and thus the linear span of $\{M_r\}$ equals that of $\{c_{kl} E_{kl}\}$, which turns out to be $\bigoplus_{(i,j) \in I} R_{ij}$ by definition.
To show that det V (a kl ·λ) ̸ = 0, we simply note that a ksls −a ktlt ̸ = 0 as different roots. A basic statement from linear algebra tells us that the union of hyperplanes of finitely-many nonzero vectors, here are a ksls − a ktlt , cannot cover the whole vector space. Thus we can always find some nonzero λ which does not belong to any of these hyperplanes: i.e., (a ksls ·λ−a ktlt ·λ) ̸ = 0 to fulfill the requirement. On the other hand, the Lie subalgebra by d(d) and M contains all possible linear combinations and Lie brackets of these elements and the proof follows.
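The Vandermonde argument can be illustrated numerically. The sketch below assumes that the matrices $M_r$ are obtained by repeatedly bracketing $M$ with a diagonal $h = \operatorname{diag}(\lambda)$, so that the entry at position $(k,l)$ is multiplied by $(\lambda_k - \lambda_l)^r$; this reading, and the helper name `ad_power_span`, are our assumptions rather than the authors' exact construction. It checks that a generic $\lambda$ separates all nonzero off-diagonal entries, while a $\lambda$ lying on one of the excluded hyperplanes does not.

```python
import numpy as np

def ad_power_span(M, lam, tol=1e-12):
    """Rank of span{ad_h^r(M) : r = 1..|I|} for h = diag(lam).

    Returns (rank, |I|), where |I| counts the nonzero off-diagonal entries of M.
    """
    h = np.diag(np.asarray(lam, dtype=float))
    off = M - np.diag(np.diag(M))          # the diagonal part commutes with h
    size = int((np.abs(off) > tol).sum())
    rows = []
    cur = off.astype(float).copy()
    for _ in range(size):
        # ad_h multiplies entry (k, l) by lam_k - lam_l, the root evaluated on h.
        cur = h @ cur - cur @ h
        rows.append(cur.flatten())
    rank = int(np.linalg.matrix_rank(np.array(rows))) if rows else 0
    return rank, size

if __name__ == "__main__":
    M = np.zeros((4, 4))
    M[0, 1], M[1, 0], M[1, 2], M[2, 3] = 1.0, 1.0, 2.0, -1.5
    # Generic lambda: all pairwise differences lam_k - lam_l are distinct.
    print(ad_power_span(M, [0.0, 1.0, 3.0, 7.0]))
    # Degenerate lambda: several roots take the same value on h.
    print(ad_power_span(M, [0.0, 1.0, 2.0, 3.0]))
```

When the rank equals $|I|$, every one-dimensional space spanned by a single $c_{kl}E_{kl}$ lies in the span of the iterated brackets, which is the content of the lemma.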
The crucial technique of using the Vandermonde matrix is adapted from the classical paper [2], in which Kuranishi proved that any semisimple Lie algebra is generated by merely two elements. Roughly speaking, one of these elements is $h$ in the above proof, obtained by solving for $\lambda$, and the other is just the sum of all matrix units $E_{ij}$ for $i \neq j$. However, we cannot take the accessibility of these elements for granted on a qudit system. To be precise, we take some irrep $S^\lambda$ (with its equivalent copies) decomposed from the $n$-qudit system and denote by $\mathfrak{d}(S^\lambda)$ the Cartan subalgebra of all diagonal matrices. By the Okounkov-Vershik theorem [3,4], it equals the representation of $GZ_n$ under the Young basis, generated by YJM-elements. To span the Cartan subalgebra and define $h$, however, the $n$ YJM-elements are generally not enough and we need to take their higher-order products $X_i X_j \cdots X_k$. This leads to a practical problem for ansatz design. For the other generator mentioned above, we need to design an operator whose matrix representation is the sum of all $E_{ij}$. For qubits, the Hadamard operator $H^{\otimes n}$, even though it contains $-1$ in its entries, is likely to be chosen at first glance. However, we are working in the Young basis, and by Schur-Weyl duality and the Wedderburn theorem, $H^{\otimes n}$ is a scalar matrix on any $S_n$-irrep.

With these difficulties clarified, we now solve the problem, which finally gives us the analytical form of the $S_n$-CQA ansatz. We show in the following sections that our methods can also be used to prove the universality of QAOA and to verify that any SU$(d)$-invariant quantum circuit can be built from 4-local SU$(d)$-invariant unitaries up to phase factors. In a qudit system, the matrix $M$ from Lemma II.1 can be chosen as a Hermitian matrix, which we interpret as the Hamiltonian of a certain physical system in the main text. To be specific, we consider the following kind of Hermitian matrices.
Definition II.2. Given an arbitrary Hermitian matrix $H_P$, let $G_{H_P}$ be the underlying index graph of $H_P$, i.e., the graph whose vertices are the matrix indices, with an edge between $i \neq j$ whenever $(H_P)_{ij} \neq 0$. The matrix $H_P$ is path-connected if the associated graph $G_{H_P}$ is connected. By contrast, a $5 \times 5$ matrix whose index graph splits into the partition $\{1, 4\}, \{2, 3\}, \{5\}$ is disconnected. Path-connectedness also appears in the famous Perron-Frobenius theorem, which has many applications in graph theory.
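Path-connectedness in the sense of Definition II.2 can be tested with a breadth-first search on the index graph. The sketch below is illustrative only: the function names are ours, and the $5 \times 5$ matrix in the usage example is a hypothetical matrix chosen to reproduce the partition $\{1,4\},\{2,3\},\{5\}$ (indices are 0-based in the code).

```python
import numpy as np
from collections import deque

def index_graph_components(H, tol=1e-12):
    """Connected components of the index graph of H (edge i-j iff H[i, j] != 0)."""
    H = np.asarray(H)
    d = H.shape[0]
    seen = [False] * d
    components = []
    for start in range(d):
        if seen[start]:
            continue
        seen[start] = True
        comp, queue = [start], deque([start])
        while queue:
            i = queue.popleft()
            for j in range(d):
                if j != i and not seen[j] and abs(H[i, j]) > tol:
                    seen[j] = True
                    comp.append(j)
                    queue.append(j)
        components.append(sorted(comp))
    return components

def is_path_connected(H, tol=1e-12):
    """True if the index graph of H is connected."""
    return len(index_graph_components(H, tol)) == 1

if __name__ == "__main__":
    H = np.zeros((5, 5))          # hypothetical disconnected example
    H[0, 3] = H[3, 0] = 1.0       # couples indices 1 and 4 (1-based)
    H[1, 2] = H[2, 1] = 1.0       # couples indices 2 and 3 (1-based)
    print(index_graph_components(H), is_path_connected(H))
```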
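Lemma II.3. Let $H_P$ be a path-connected Hamiltonian. Then the generated Lie algebra $\langle \mathfrak{d}(d), H_P \rangle = \mathfrak{gl}(d,\mathbb{C})$. In particular, consider $\mathfrak{d}_{\mathbb{R}}(d)$ consisting of all real-valued diagonal matrices. Generated over $\mathbb{R}$, $\langle i\mathfrak{d}_{\mathbb{R}}(d), iH_P \rangle_{\mathbb{R}} = \mathfrak{u}(d)$, the Lie algebra of U$(d)$ consisting of all skew-Hermitian matrices.
Proof. By Lemma II.1, the root space of every nonzero off-diagonal entry of $H_P$ belongs to the generated Lie algebra. Given any pair of indices $i \neq j$, since the underlying index graph of $H_P$ is path-connected, we can always find a path $i = i_0 \to i_1 \to \cdots \to i_k \to i_{k+1} = j$ with no repeating indices for which $H_{P,(i_l, i_{l+1})} \neq 0$. Thus $E_{ij}$, obtained by iterated Lie brackets of the matrix units $E_{i_l i_{l+1}}$ along the path, belongs to the generated Lie algebra. Since $\mathfrak{gl}(d,\mathbb{C})$ is spanned by the diagonal matrices together with all $E_{ij}$, the first statement follows. On the other hand, by definition, the real Lie algebra $\langle i\mathfrak{d}_{\mathbb{R}}(d), iH_P \rangle_{\mathbb{R}}$ is contained in $\mathfrak{u}(d)$. Since its complexification equals $\langle \mathfrak{d}(d), H_P \rangle = \mathfrak{gl}(d,\mathbb{C})$, and since any matrix from $\mathfrak{gl}(d,\mathbb{C})$ is uniquely decomposed as the sum of a Hermitian matrix and a skew-Hermitian matrix, we complete the proof.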
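Lemma II.3 can be checked numerically on small examples by computing the dimension of the Lie algebra generated by the diagonal matrix units and a given $H_P$. The following is a minimal sketch; `lie_closure_dim` is our own helper that closes a set of matrices under commutators with a Gram-Schmidt rank test, and is not taken from the paper.

```python
import numpy as np

def lie_closure_dim(generators, tol=1e-8):
    """Complex dimension of the Lie algebra generated by the given square matrices."""
    d = generators[0].shape[0]
    basis = []                                  # orthonormal flattened matrices

    def try_add(A):
        v = np.asarray(A, dtype=complex).flatten()
        norm = np.linalg.norm(v)
        if norm < tol:
            return False
        v = v / norm
        for b in basis:                         # modified Gram-Schmidt projection
            v = v - np.vdot(b, v) * b
        if np.linalg.norm(v) > tol:
            basis.append(v / np.linalg.norm(v))
            return True
        return False

    for G in generators:
        try_add(G)
    changed = True
    while changed:                              # repeat until no bracket is new
        changed = False
        mats = [b.reshape(d, d) for b in basis]
        for A in mats:
            for B in mats:
                if try_add(A @ B - B @ A):
                    changed = True
    return len(basis)

if __name__ == "__main__":
    d = 4
    diag_units = [np.diag(np.eye(d)[i]) for i in range(d)]
    H_path = np.zeros((d, d))
    for i in range(d - 1):                      # path graph 0-1-2-3: path-connected
        H_path[i, i + 1] = H_path[i + 1, i] = 1.0
    print(lie_closure_dim(diag_units + [H_path]), d * d)
```

For a path-connected $H_P$ the dimension reaches $d^2$, while a matrix with a disconnected index graph only generates a block-diagonal subalgebra.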
Theorem II.4. Let $H_P$ be any path-connected Hamiltonian and let $X_{GZ} \in GZ_n$ denote a real linear combination of elements from $\{X_i, X_iX_j, X_iX_jX_k, \ldots\}$. Given any $S^\lambda$, the generated group $H := \langle \exp(i\beta X_{GZ}), \exp(i\gamma H_P) \rangle$ equals the unitary group U$(S^\lambda)$ restricted to $S^\lambda$. Thus the ansätze constructed by alternating exponentials can be used to approximate any eigenstate of $H_P$.
Proof. Before applying Lemmas II.1 and II.3, we remind the reader that $H$ is generated in the purely algebraic sense: it may not even be a Lie subgroup (with smooth structure), in which case we could not talk about its Lie algebra. One strategy to solve this problem is to consider the closure $\overline{H}$ and employ Cartan's closed subgroup theorem. However, there is another powerful theorem on Lie groups suitable for the current situation: Yamabe's theorem [5][6][7], which states that any arcwise-connected (path-connected) subgroup of a Lie group is itself a Lie subgroup. In our setting, $H$ is path-connected, as any two points in $H$ are connected by a path of products of exponentials. Thus we can talk about its Lie algebra $\mathfrak{h}$. By definition, $\exp(i\beta X_{GZ}), \exp(i\gamma H_P) \in H$ and thus $iX_{GZ}, iH_P \in \mathfrak{h}$. Lemma II.3 then tells us that $\mathfrak{h}$ equals $\mathfrak{u}(S^\lambda)$. Since U$(S^\lambda)$ is a connected Lie group and $H$ is generated by $\mathfrak{u}(S^\lambda)$, $H$ must be identical with U$(S^\lambda)$.
We discuss more about applying Yamabe's theorem in Section IV. Before presenting the following material, especially Lemma II.6, we should explain how to convert the language of root systems used in Lemma II.1 into that of content vectors from $S_n$-representation theory. For any $X \in GZ_n$, its root $a_{kl}(X)$ defined under the matrix unit $E_{kl}$ is simply its $k$th diagonal element minus its $l$th diagonal element. For instance, for any YJM-element $X_i$, $a_{kl}(X_i) = \alpha_{T_k}(i) - \alpha_{T_l}(i)$, where $T_k, T_l$ denote the $k$th and $l$th standard Young tableaux/Young basis elements respectively for the irrep space $S^\lambda$. On the other hand, in proving Theorem II.4, $X_{GZ}$ is taken as an arbitrary linear combination of elements from $\{X_i, X_iX_j, X_iX_jX_k, \ldots\}$, which spans the corresponding $\mathfrak{d}(S^\lambda)$. In designing quantum circuits, we hope to bound this spanning set. A reckless restriction to first-order YJM-elements $\{X_i\}$ violates the distinctness condition on roots used in Lemma II.1. Fortunately, restricting to second-order YJM-elements $\{X_i, X_iX_j\}$ suffices to establish Lemma II.1.
Definition II.5. For any standard Young tableau $T$ of shape $\lambda = (\lambda_1, \ldots, \lambda_d)$, measuring via second-order products $X_kX_l$ of YJM-elements yields an $(n \times n)$-component vector denoted by $\beta^{(2)}_T(r, s)$. It can also be written as the real symmetric matrix $(\beta_T)^T \beta_T$, and hence we call it the tensor product content vector or second-order content vector.
Lemma II.6. Different standard Young tableaux/Young basis elements have different tensor product content vectors. Moreover, let $T_i, T_j, T_k, T_l$ be four standard Young tableaux of size $n$. We require that $T_i \neq T_j$ and $T_k \neq T_l$. If all underlying Young diagrams are the same, we further require that $T_i \neq T_k$ or $T_j \neq T_l$ (this requirement comes from the computation of roots of $E_{ij}, E_{kl}$ above). Then $\beta_i - \beta_j \neq \beta_k - \beta_l$, where $\beta_i$ is an abbreviation for $\beta^{(2)}_{T_i}$. That is, when extended to tensor product content vectors, the differences of different standard Young tableaux are still different.
Proof. The first statement follows from the one-to-one correspondence between content vectors and spectra of the Young basis [3,4,8]. Actually, we abuse the language above for simplicity: spectra of the Young basis, as the name indicates, are the eigenvalues under YJM-elements and are just what we denote by $\alpha_T$, while content vectors simply refer to collections of ordered numbers read off from a standard Young tableau. Okounkov and Vershik proved that these two concepts are equivalent in a more abstract setting. To prove the second statement, we first look at the differences between ordinary content vectors, $\alpha_{T_i} - \alpha_{T_j}$ and $\alpha_{T_k} - \alpha_{T_l}$. If they are different, then we finish the proof.
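The role of second-order content vectors can be seen on a small example. The sketch below enumerates the standard Young tableaux of shape $(3,2)$, computes content vectors (column index minus row index), and compares how many distinct differences arise at first and at second order. As an assumption of this sketch, the second-order vector is taken as the outer product of the content vector shifted by a nonzero constant (here `shift = n`), in the spirit of the shifted elements $Y_k$ used later in the text; the exact shift used by the authors is not recoverable from this section.

```python
import numpy as np

def standard_tableaux(shape):
    """All standard Young tableaux of the shape, as dicts {entry: (row, col)}."""
    n = sum(shape)
    out = []

    def extend(row_fill, pos, k):
        if k > n:
            out.append(dict(pos))
            return
        for r in range(len(shape)):
            c = row_fill[r]
            # Box (r, c) is addable if it lies in the shape and below a filled box.
            if c < shape[r] and (r == 0 or row_fill[r - 1] > c):
                row_fill[r] += 1
                pos[k] = (r, c)
                extend(row_fill, pos, k + 1)
                del pos[k]
                row_fill[r] -= 1

    extend([0] * len(shape), {}, 1)
    return out

def content_vector(tableau):
    """Content vector: (column - row) of the box holding 1, 2, ..., n."""
    n = len(tableau)
    return np.array([tableau[k][1] - tableau[k][0] for k in range(1, n + 1)])

def second_order_vector(tableau, shift):
    """Outer product of the shifted content vector (the shift is an assumption)."""
    beta = content_vector(tableau) + shift
    return np.outer(beta, beta)

def count_distinct_differences(vectors):
    """Return (#distinct differences u - v, #ordered pairs with u != v by index)."""
    diffs, pairs = set(), 0
    for i, u in enumerate(vectors):
        for j, v in enumerate(vectors):
            if i != j:
                pairs += 1
                diffs.add(tuple((u - v).flatten().tolist()))
    return len(diffs), pairs

if __name__ == "__main__":
    tabs = standard_tableaux((3, 2))
    first = [content_vector(T) for T in tabs]
    second = [second_order_vector(T, shift=5) for T in tabs]
    print(count_distinct_differences(first))
    print(count_distinct_differences(second))
```

For shape $(3,2)$ some first-order differences coincide, whereas all differences of the shifted second-order vectors are distinct, in line with Lemma II.6.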
Remark. We now examine the complex Lie algebra generated by $Y_kY_l$ and $H_P$. As a caveat, $\{Y_kY_l\}$ is not enough to generate the whole of $GZ_n$ as the Cartan subalgebra, but with Lemma II.6 we are allowed to apply the same proof method as in Lemma II.1, with $M_1$ rewritten in terms of tensor product content vectors. Accordingly, we can find a nonzero solution $\lambda$ such that iterated Lie brackets of $h := \sum \lambda_{ij} Y_iY_j$ and $H_P$ generate the 1-dimensional subspaces corresponding to nonzero entries of $H_P$.
Then by Lemma II.3, $Y_kY_l$ and $H_P$ are able to generate all the off-diagonal root spaces $R_{ij}$ if $H_P$ is path-connected. To supply the missing diagonal elements that cannot be generated by second-order YJM-elements, we note that $[E_{ij}, E_{ji}] = E_{ii} - E_{jj}$. Thus all traceless diagonal matrices can be found in $\langle Y_kY_l, H_P \rangle$. Moreover, since, for instance, $Y_2^2 = Y_2Y_2$ is diagonal with only positive entries, all diagonal matrices can be generated. Thus $\langle Y_kY_l, H_P \rangle = \mathfrak{gl}(S^\lambda)$ and, by the second part of Lemma II.3, $\langle iY_kY_l, iH_P \rangle_{\mathbb{R}} = \mathfrak{u}(S^\lambda)$. In conclusion, we obtain the following theorem.
Theorem II.7. The subgroup generated by $Y_kY_l$ with a path-connected Hamiltonian $H_P$ still equals U$(S^\lambda)$ when restricted to any $S^\lambda$. Since $Y_kY_l = (X_k + kI)(X_l + kI) = X_kX_l + k(X_k + X_l) + k^2I$ and since $\exp(i\theta I)$ is simply a phase term, an $S_n$-CQA ansatz is written as
$$\cdots \exp\Big(-i \sum_{k,l} \beta_{kl} X_kX_l\Big) \exp(-i\gamma H_P) \cdots$$
where we redefine $X_1$ as $I$, with which any first-order YJM-element $X_i$ can be written as $X_iX_1$.
Remark. In the above remark, we argue that $\langle Y_kY_l, H_P \rangle = \mathfrak{gl}(S^\lambda)$ and hence its real form equals $\mathfrak{u}(S^\lambda)$. This is true for any fixed $S^\lambda$ with its equivalent copies decomposed from $V^{\otimes n}$. As the main topic of Section III, we are going to generate $\bigoplus_\lambda \mathfrak{gl}(S^\lambda)$, which encompasses all inequivalent irreps simultaneously. It is straightforward to see that all the above proofs hold in this general case except for one point: one may not be able to manipulate the phase factors of unitaries generated in inequivalent $S^\lambda$ arbitrarily. We call this the phase factor problem. To be precise, exponentials of second-order YJM-elements give us access to phase changes, e.g., $e^{-itI}$. But as we are concerned with more than one inequivalent irrep block, the phases of unitaries from inequivalent irrep blocks may not vary independently. Even so, since the Lie brackets $[E_{ij}, E_{ji}]$ in Lemma II.3 yield all kinds of traceless diagonal matrices, we are still safe to claim: restricting to second-order YJM-elements causes no trouble if we ignore these phase factors. More discussion can be found in Sections III and IV.
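To make the structure of a single alternating layer concrete, the following sketch builds YJM-elements as sums of transposition matrices on $n$ qubits and exponentiates a mixer $\sum_{k,l} \beta_{kl} X_kX_l$ followed by a problem Hamiltonian. It is a dense-matrix illustration under our own conventions (0-based sites, `X[0]` set to the identity following the redefinition of $X_1$, the layer ordering, and the helper names), not the authors' implementation.

```python
import numpy as np

def transposition(n, i, j, d=2):
    """Matrix of the transposition (i j) permuting tensor factors of (C^d)^{(x) n}."""
    dims = (d,) * n
    dim = d ** n
    P = np.zeros((dim, dim))
    for idx in range(dim):
        digits = list(np.unravel_index(idx, dims))
        digits[i], digits[j] = digits[j], digits[i]
        P[np.ravel_multi_index(tuple(digits), dims), idx] = 1.0
    return P

def yjm_elements(n, d=2):
    """YJM-elements X_k = (1,k) + ... + (k-1,k); X_1 is replaced by the identity."""
    X = [np.eye(d ** n)]
    for k in range(1, n):
        X.append(sum(transposition(n, i, k, d) for i in range(k)))
    return X

def expm_hermitian(H, t):
    """exp(-i t H) for a Hermitian matrix H via eigendecomposition."""
    w, v = np.linalg.eigh(H)
    return (v * np.exp(-1j * t * w)) @ v.conj().T

def cqa_layer(beta, gamma, X, H_P):
    """One layer exp(-i sum_kl beta_kl X_k X_l) exp(-i gamma H_P)."""
    n = len(X)
    mixer = sum(beta[k, l] * (X[k] @ X[l]) for k in range(n) for l in range(n))
    return expm_hermitian(mixer, 1.0) @ expm_hermitian(H_P, gamma)

if __name__ == "__main__":
    n = 3
    X = yjm_elements(n)
    H_S = sum(transposition(n, i, i + 1) for i in range(n - 1))
    beta = np.random.default_rng(0).normal(size=(n, n))
    U = cqa_layer(beta, 0.7, X, H_S)
    print(np.allclose(U.conj().T @ U, np.eye(2 ** n)))
```

Because every factor is built from permutations of the tensor factors, the resulting unitary commutes with the global SU$(d)$ action, which is the symmetry the ansatz is designed to respect.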

III. FOUR-LOCALITY OF SU(d)-SYMMETRIC QUANTUM CIRCUITS
It has been shown in [9][10][11] that to build an SU$(d)$-invariant circuit for $d \geq 3$, merely taking 2-local SU$(d)$-invariant unitaries may not be sufficient. This contrasts with the well-known fact that 2-local unitaries are universal for quantum circuits without any symmetry [12]. We now explain our solution, which employs 4-local Hamiltonians. We follow the notation in [11,13,14] and denote by $V_k$ the subgroup generated by $k$-local SU$(d)$-invariant unitaries.
Remark. We first discuss a subtle point about the definition of $V_k$ that is important for the following proof. It is also clarified in [11,13]. By definition, elements of $V_k$ are of the form
$$\prod_q \exp(-it_qH_q), \qquad (1)$$
where, by Schur-Weyl duality, the Hamiltonians are representations of elements from the group algebra $\mathbb{C}[S_n]$. Moreover, for each $H_q$ there should be a fixed subcollection of $k$ qudits on which $H_q$ is supported. This guarantees that each $\exp(-it_qH_q)$ is still $k$-local.
On the other hand, one can loosen the condition by taking Hamiltonians $H'_q = \sum_i c_i \pi(\sigma_i)$ such that different permutations $\sigma_i$ can be supported on different sets of $k$ sites. Let us consider the integral group $V'_k$ consisting of elements of the form of Eq. (1) with $H_q$ replaced by $H'_q$. Obviously, by definition, $V_k \subset V'_k$. Moreover, by taking derivatives of Eq. (1), one can check by the same methods as in Theorem II.4 that $V_k$ and $V'_k$ admit the same Lie algebra. As connected Lie groups, $V_k = V'_k$ [7], and both are adopted as the definition of quantum circuits generated by $k$-local unitaries.
To address the phase factor problem mentioned in the previous section as well as in the main text, we also consider the subgroup $SV_k \subset V_k$ with trivial phase factor relative to each $S^\lambda$. To be precise, recall that unitaries from $V_k$ respecting SU$(d)$ symmetry can be decomposed as a sum of unitary blocks by Schur-Weyl duality. We require that each unitary block has determinant equal to 1, and we say these SU$(d)$-symmetric quantum circuits are taken with phase factors being ignored. We now prove Lemma III.1, which gives us a 2-local path-connected Hamiltonian. Combining it with the fact that second-order YJM-elements $X_kX_l$ are 4-local, we obtain Theorem III.2.

Proof. Before we present the proof, we remind the reader that a change of basis may break the connectedness of the index graph. For instance, the $2 \times 2$ Pauli $X$ matrix is path-connected. However, conjugating by a suitable unitary $Y$, e.g., the Hadamard matrix, gives $YXY^{-1} = YXY^\dagger = Z$, which is not path-connected. When $d = 2$, the unfrustrated antiferromagnetic Heisenberg Hamiltonian $H = \sum_{i=1}^{n} \hat{S}_i \cdot \hat{S}_{i+1}$ with periodic boundary condition is verified to be path-connected under the computational basis, and this fact is further used in proving the Marshall-Lieb-Mattis theorem [15,16]. We are going to show a general case for any $d$, but under the Young basis, so the change of basis is just the Schur transform. We will again use $S_n$ representation theory, and there is no need to write out the Schur transform explicitly, which would make the proof more intricate.
As a caveat, we consider the sum of adjacent transpositions without boundary condition. Specifically, let $S^\lambda$ denote a certain $S_n$ irrep; we want to show that the index graph $G$ of $H_S = \sum_{i=1}^{n-1} (i, i+1)$ is connected. In graph theory, this is equivalent to saying that $H_S$ is irreducible, that is, $H_S$ cannot be block diagonalized by any permutation of the basis. Since we are working in the Young basis, the matrix entries of any adjacent transposition $(i, i+1)$ can be written out explicitly by the so-called Young's orthogonal form [4,17]. Roughly speaking, each matrix representation consists of blocks like
$$\begin{pmatrix} 1/r & \sqrt{1 - 1/r^2} \\ \sqrt{1 - 1/r^2} & -1/r \end{pmatrix}$$
where the number $r$ is determined by the content vectors of the Young basis elements involved. Different $(i, i+1)$ contain different numbers of such blocks located in different positions. Even so, a notable feature is that all off-diagonal elements of $(i, i+1)$ are nonnegative, and so are those of the sum $H_S$. Therefore, if $H_S$ is block diagonal in $S^\lambda$, all these adjacent transpositions are block diagonal simultaneously at the same positions. As adjacent transpositions generate $S_n$, the irrep $S^\lambda$ would then decompose further into these blocks, which leads to a contradiction. As a reminder, we do not include the boundary term $(1, n)$, as it cannot be expressed by Young's orthogonal form. After being expanded as a product of adjacent transpositions, it may carry negative off-diagonal entries and invalidate the above proof method.
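The connectedness argument can be verified directly for small Young diagrams. The sketch below constructs Young's orthogonal form for the adjacent transpositions, using the standard convention that $r$ is the content of the box containing $i+1$ minus the content of the box containing $i$, sums the matrices to obtain $H_S$, and tests the index graph for connectedness. Function names are ours and the code is only meant for small shapes.

```python
import numpy as np

def standard_tableaux(shape):
    """All standard Young tableaux of the shape, as dicts {entry: (row, col)}."""
    n, out = sum(shape), []

    def extend(row_fill, pos, k):
        if k > n:
            out.append(dict(pos))
            return
        for r in range(len(shape)):
            c = row_fill[r]
            if c < shape[r] and (r == 0 or row_fill[r - 1] > c):
                row_fill[r] += 1
                pos[k] = (r, c)
                extend(row_fill, pos, k + 1)
                del pos[k]
                row_fill[r] -= 1

    extend([0] * len(shape), {}, 1)
    return out

def youngs_orthogonal_form(shape):
    """Matrices of (i, i+1), i = 1..n-1, in the Young basis of the given shape."""
    tabs = standard_tableaux(shape)
    index = {tuple(sorted(T.items())): a for a, T in enumerate(tabs)}
    n, dim = sum(shape), len(tabs)
    mats = []
    for i in range(1, n):
        M = np.zeros((dim, dim))
        for a, T in enumerate(tabs):
            (r1, c1), (r2, c2) = T[i], T[i + 1]
            r = (c2 - r2) - (c1 - r1)            # axial distance, never zero
            M[a, a] = 1.0 / r
            if abs(r) > 1:                        # swapping i, i+1 stays standard
                S = dict(T)
                S[i], S[i + 1] = T[i + 1], T[i]
                b = index[tuple(sorted(S.items()))]
                M[b, a] = np.sqrt(1.0 - 1.0 / r ** 2)
        mats.append(M)
    return mats

def is_connected(H, tol=1e-12):
    """Depth-first search on the index graph of H."""
    d = H.shape[0]
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in range(d):
            if j != i and j not in seen and abs(H[i, j]) > tol:
                seen.add(j)
                stack.append(j)
    return len(seen) == d

if __name__ == "__main__":
    mats = youngs_orthogonal_form((3, 2))
    H_S = sum(mats)
    print(is_connected(H_S))
```

The individual matrices have nonnegative off-diagonal entries, so no cancellation can occur in the sum, which is the feature the proof relies on.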
Proof. By definition, $X_iX_j$ consists of at most 4-local unitaries, and thus by Lemma III.1 and Theorem II.4, CQA defined by $H_S$ is contained in $V'_4$ introduced in the previous remark. Note that whether we restrict to one $S_n$-irrep or consider all of them decomposed from $V^{\otimes n}$ simultaneously, Lemma II.1 and Lemma II.3 always hold. Combining this with Lemma II.6 and Theorem II.7, $SV_n \subset \mathrm{CQA}$. Note that as CQA contains generators with nontrivial phase, e.g., $e^{i\theta I}$, the inclusion is proper. Then, by the previous remark, $\mathrm{CQA} \subset V'_4 = V_4$.

Remark. We make a brief remark on the problem of using 2-local SU$(d)$-symmetric unitaries. By Schur-Weyl duality, these unitaries are exactly exponentials of SWAPs, i.e., transpositions $(i, j)$ taken from $S_n$, and [9,10] provide a thorough treatment of the Lie algebra $\mathfrak{g}_\lambda$ (hence Lie group) generated by transpositions. For Young diagrams $\lambda$ with more than two rows, there are exceptions, like those in Fig. 2, for which $\mathfrak{g}_\lambda \cong \mathfrak{so}(S^\lambda)$ or $\mathfrak{sp}(S^\lambda) \subsetneq \mathfrak{sl}(S^\lambda)$. Thus, in contrast to Lemma II.3, its compact real form can never be $\mathfrak{su}(S^\lambda)$, and hence the generated group is properly contained in SU$(S^\lambda)$. As we explained in the main text, Young diagrams with more than two rows appear in general qudit systems with $d \geq 3$. Therefore, even ignoring phase factors, 2-local SU$(d)$-symmetric unitaries cannot generate all SU$(d)$-symmetric quantum circuits in this case. As Lemma II.6 works for any Young diagram, those exceptions do not influence our claims on CQA.
At the end of this section, we examine the inclusion relationship of CQA, $V_4$ and $V_n$. Due to the phase factor problem, they are not identical in general. To illustrate this point explicitly, we apply the following proposition from $S_n$-representation theory [4,17]. We first introduce a related definition: Definition III.3. Let $\lambda = (\lambda_1, \ldots, \lambda_r)$ be an arbitrary Young diagram of $n$ boxes. A permutation $\sigma \in S_n$ is of cycle type $\lambda$ if it is decomposed into cycles of lengths $\lambda_1, \ldots, \lambda_r$.
In the case of CQA, it is easy to check that second-order YJM-elements cannot even generate $c_\lambda$ with $\lambda = (4, 1, \ldots, 1)$. Hence we argue that $\mathrm{CQA} \neq V_4 \neq V_n$ in general.

IV. COMPACTNESS OF CQA
As a starting point to unravel deeper facts about the structure of CQA, we prove its compactness. Proofs in this section are independent of the others and can be skipped with no harm.
We first remark on the difference in proving that $V_k$ and CQA are Lie subgroups. In [13], the author argued that for any fixed $k$ sites, all $k$-local SU$(d)$-invariant unitaries on these sites generate a compact Lie subgroup. This can be seen by applying Schur-Weyl duality to the $k$-qudit subsystem concerned. Since $V_k$ is generated by $\binom{n}{k}$ many compact subgroups of this kind, it is still a compact Lie subgroup (see more details in [13]). In our case, the generators of CQA only form a proper subset of the collection of 4-local unitaries, and hence we cannot use Schur-Weyl duality directly but turn to Yamabe's theorem for help (see Theorem II.4). Even so, it is still unclear whether CQA is compact or not; we now provide an affirmative answer.
To this end, let $H_P$ be the problem Hamiltonian used to construct CQA. Let $\operatorname{Tr}_\lambda(H_P)$ denote the trace of $H_P$ restricted to $S^\lambda$. Then the operator $c_{H_P}$ consisting of the block matrices $(\operatorname{Tr}_\lambda(H_P)/\dim S^\lambda) I_\lambda$ is a central element from $Z(\mathbb{C}[S_n])$. Similarly, we define $c_{X_kX_l}$ for second-order YJM-elements. Recall that $\langle X_kX_l, H_P \rangle$ is able to generate all traceless block matrices with respect to the decomposition of the $n$-qudit system under SU$(d)$ symmetry. Matrices like $H_P - c_{H_P}$, $X_kX_l - c_{X_kX_l}$, and thus $c_{H_P}$, $c_{X_kX_l}$, can all be found in $\langle X_kX_l, H_P \rangle$. We then strengthen the discussion from the remark after Theorem II.7 as follows.

Lemma IV.1. The complex Lie algebra $\langle X_kX_l, H_P \rangle$ equals $\bigoplus_\lambda \mathfrak{sl}(S^\lambda)$ plus the subspace spanned by $c_{H_P}, c_{X_kX_l}$. Accordingly, its real form $\langle iX_kX_l, iH_P \rangle_{\mathbb{R}}$ equals $\bigoplus_\lambda \mathfrak{su}(S^\lambda)$ plus the real subspace spanned by $ic_{H_P}, ic_{X_kX_l}$.
Proof. By Lemma II.3, Lemma II.6 and the above definition of $c_{H_P}, c_{X_kX_l}$, these elements can be generated by $X_kX_l, H_P$. To prove the converse inclusion, let us take $A \in \langle X_kX_l, H_P \rangle$ arbitrarily. By definition, we write $A = B + C$, where $B$ is a linear combination of $X_kX_l, H_P$ and $C$ stands for a sum of the Lie brackets involved. Note that with respect to any $S^\lambda$ decomposed from $V^{\otimes n}$, Lie brackets yield only traceless matrices. Besides, we decompose the matrix $B$ by
$$H_P = (H_P - c_{H_P}) + c_{H_P}, \qquad X_kX_l = (X_kX_l - c_{X_kX_l}) + c_{X_kX_l}.$$
Its traceless part plus the matrix $C$ can be found in $\bigoplus_\lambda \mathfrak{sl}(S^\lambda)$, while the part with nonzero trace is automatically spanned by $c_{H_P}, c_{X_kX_l}$ by definition. The case of $\langle iX_kX_l, iH_P \rangle_{\mathbb{R}}$ follows immediately.
By definition, the Lie group corresponding to $\bigoplus_\lambda \mathfrak{su}(S^\lambda)$ is just $SV_n = SV_4$. On the other hand, let $H$ denote the integral Lie group (see Section III) generated by exponentials of $ic_{H_P}, ic_{X_kX_l}$. It is compact because, when restricted to any $S^\lambda$, $H \cong S^1 \cong \mathrm{U}(1)$. As a caveat, integral groups are generally non-compact: e.g., for an irrational number $a$, a one-parameter subgroup such as $t \mapsto (e^{it}, e^{iat})$ traces a non-compact irrational curve on the torus. The compactness of CQA can then be shown either by the theory of algebraic groups [18,19] or by the following simple observation: let $f : SV_4 \times H \to \mathrm{CQA}$ be defined by $f(g, h) = gh$ for any $(g, h) \in SV_4 \times H$. Note that $h$ commutes with $g$ by definition; thus the map $f$ is a Lie group homomorphism with $\operatorname{Im} f = \mathrm{CQA}$, because they share the same Lie algebra by definition. Since $SV_4 \times H$ is compact, the image is also compact.
Theorem IV.2. The group CQA is connected and compact.
The last ten cases can be confirmed immediately, as the differences of eigenvalues measured by $Z_n$ on the last qubit are different. We need to analyze the cases like $(0, 0, 1, 1)$. After deleting the last qubit on both sides, suppose that $u_1 = u_2$ and $v_1 = v_2$ as strings of 0 and 1. Since we require $u_1 \neq v_1$, $u_2 \neq v_2$, there exists some $i < n$ such that $\alpha_{u_1}(i) - \alpha_{v_1}(i) = \alpha_{u_2}(i) - \alpha_{v_2}(i) \neq 0$. Then their eigenvalues of $Z_iZ_n$ are different. If $u_1 \neq u_2$ or $v_1 \neq v_2$ after deleting the last qubit, then the claim follows by the above induction hypothesis.

Finally we have:
Theorem V.2. Let $H$ be any path-connected Hamiltonian under the computational basis. Then the QAOA-ansatz generated by $H$ and $H_Z = \sum \beta_{kl} Z_kZ_l$ is dense in U$(2^n)$, i.e., it is universal.
For concrete examples of path-connected Hamiltonians under the computational basis, we mentioned in Lemma III.1 that the Heisenberg Hamiltonian $H = \sum_{i=1}^{n} \hat{S}_i \cdot \hat{S}_{i+1}$ with periodic boundary condition and the uniform sum of Pauli $X$ operators $H_X$ are both possible candidates.

then we finish the proof. Otherwise, we still have $m_1^2 - m_2^2 \neq m_3^2 - m_4^2$. Indeed, if this were not the case, we could divide both sides of the resulting equation by $(m_1 - m_2) = (m_3 - m_4)$ (they cannot equal zero by the assumption of Lemma II.6), which yields $(m_1 + m_2) = (m_3 + m_4)$. This further indicates that $m_1 = m_3$, $m_2 = m_4$, which again violates the assumption.

VII. PROOF OF THE CIRCUIT COMPLEXITY WITH CQA
We now prove that the necessary measurements during gradient descent for the $S_n$-CQA ansätze under the Young basis can be performed in polynomial time, while classically this problem amounts to calculating the $S_n$ Fourier coefficients, whose best classical algorithms, the $S_n$-fast Fourier transforms [27,28], take factorial time. Here we assume a rather problem-independent construction of the $S_n$-CQA ansätze with the problem Hamiltonian $H_S = \sum_{i=1}^{n-1} (i, i+1)$. Recall that in Lemma III.1 we proved that $H_S$ is path-connected, and hence $S_n$-CQA with $H_S$ is universal in any given $S_n$ irrep by Theorem II.7. Then we state the following lemma, adapted from Theorem 1 in [29].
Lemma VII.1. The Fourier coefficient of the $S_n$-CQA ansätze per layer for any $S_n$ irrep $S^\lambda$ can be simulated by $O\!\left(\theta n^4 \frac{\log(\theta n^4/\epsilon)}{\log\log(\theta n^4/\epsilon)}\right)$ SWAP operators, with $\theta$ being the largest absolute value of the parameters.
Proof. It is shown in [29] that the time evolution of a $k$-local Hamiltonian $H = \sum_i c_i\sigma_i \in \mathbb{C}[S_n]$ can be simulated by $O\!\left(tCk^3n^k \frac{\log(tCkn^k/\epsilon)}{\log\log(tCkn^k/\epsilon)}\right)$ SWAPs, where $C = \max_i |c_i|$. Our purpose is to simulate a 4-local Hamiltonian $\sum_{k,l} \beta_{kl}X_kX_l$ together with a 2-local one, $\gamma H_S = \gamma\sum_{i=1}^{n-1} (i, i+1)$, simply at $t = 1$ with $C$ replaced by $\max\{|\beta_{kl}|, |\gamma|\}$. Since $k = 4$, we conclude the result.

The above lemma states that the Fourier coefficients of the $S_n$-CQA ansätze can be approximated with roughly $O(n^4)$ SWAP gates on qudits, while classically no polynomial-time algorithm is known. This suggests that the $S_n$-CQA ansätze are highly unlikely to be classically tractable. To better motivate Theorem 5 in the main text, recall that supervised quantum machine learning models can be viewed as kernel methods [30], where one defines the feature map $\phi(x) \in \operatorname{End}(\mathcal{H})$ for some data $x$, equipped with a metric on the feature/Hilbert space $\mathcal{H}$. In our case, the metric might be given by the problem Hamiltonian, $\langle \cdot | H_S | \cdot \rangle = \langle \cdot, \cdot \rangle_{H_S}$, where the feature map $|\phi(x)\rangle$ encodes data on the Schur basis elements. The kernel is then defined in terms of the coefficients $c_T, c_{T'}$ resulting from encoding the data $x$ onto the Young basis of the given irrep $S^\lambda$. The key insight of the kernel method is the so-called kernel trick: one never needs to access the feature map $\phi$ explicitly but only its kernel, which is assumed to be classically tractable. However, Theorem 5 implies that this may not be true, due to the factorial complexity of the classical $S_n$-FFT compared with the polynomial complexity on quantum circuits. This suggests a potentially fruitful research direction in performing supervised quantum machine learning with $S_n$-CQA on the Young basis with a rigorously proved quantum super-exponential speedup. This would include a wide array of problems concerned with SU$(d)$ symmetry or permutation equivariance. We now prove Theorem 5 from the main text.
Let CQA denote the CQA ansätze with $p$ alternating layers and let $H \in \mathbb{C}[S_n]$ be an SU$(d)$-symmetric $k$-local Hamiltonian with at most $N$ terms. Then for any $S_n$ irrep $S^\lambda$, the Fourier coefficients can be simulated in $O(pN(\theta n^4 + k^2))$, with $\theta$ being the largest absolute value of the parameters.
with $h_i$ being $k$-local. Thus each of them can be decomposed into at most $k^2$ geometrically local SWAPs/adjacent transpositions. Suppose we have $p$ alternating layers. Applying the above lemma to each layer then yields the claimed bound for the coefficients.

To sample matrix entries of the time evolution of an arbitrary $k$-local Hamiltonian $H = \sum_i c_i\sigma_i \in \mathbb{C}[S_n]$, we consider the following Taylor expansion to order $K$:
$$e^{-itH} = \sum_{m=0}^{K} \frac{(-it)^m}{m!} \sum_{i_1, \ldots, i_m = 1}^{N} c_{i_1} \cdots c_{i_m}\, \sigma_{i_1} \cdots \sigma_{i_m} + O(\epsilon),$$
with $N$ being the number of all $k$-local permutations and $\epsilon$ being the truncation error, with $K = O\!\left(\frac{\log 1/\epsilon}{\log\log 1/\epsilon}\right)$ [29]. Thus $\exp(-itH)$ can be approximated by the truncated series as a linear combination of matrix representations of $S_n$ group elements. In the case of $S_n$-CQA, to approximate the evolution of the YJM-element Hamiltonian $\sum \beta_{kl}X_kX_l$ at $K$th order, the terms $\sigma_{i_1} \cdots \sigma_{i_m}$ from above contain all possible $K$th-order products of transpositions $(i, j)$ by the definition of YJM-elements. Consequently, the linear combination contains at least all $(K+1)$-cycles from $S_n$, whose total number scales as $n^K$ [29]. Suppose we set the error $\epsilon$ to be $O(1/n)$. Assembling all the aforementioned facts, the linear combination used to approximate $S_n$-CQA contains super-polynomially many different group elements from $S_n$ and is therefore unlikely to be tractable by classical methods.
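The dependence of the truncation order $K$ on the target error $\epsilon$ can be explored numerically. The sketch below compares a truncated Taylor series of $\exp(-itH)$ with the exact evolution for a small Hermitian matrix and searches for the smallest sufficient order; the helper names and the dense-matrix setting are our own simplifications and do not reproduce the SWAP-based simulation of [29].

```python
import numpy as np

def taylor_evolution(H, t, K):
    """Truncated series sum_{m=0}^{K} (-i t H)^m / m!."""
    dim = H.shape[0]
    term = np.eye(dim, dtype=complex)
    total = term.copy()
    for m in range(1, K + 1):
        term = term @ (-1j * t * H) / m      # builds (-i t H)^m / m! iteratively
        total = total + term
    return total

def exact_evolution(H, t):
    """exp(-i t H) for Hermitian H via eigendecomposition."""
    w, v = np.linalg.eigh(H)
    return (v * np.exp(-1j * t * w)) @ v.conj().T

def smallest_order(H, t, eps, K_max=200):
    """Smallest K whose truncation error (spectral norm) is at most eps."""
    exact = exact_evolution(H, t)
    for K in range(K_max + 1):
        if np.linalg.norm(taylor_evolution(H, t, K) - exact, 2) <= eps:
            return K
    raise ValueError("K_max too small for the requested accuracy")

if __name__ == "__main__":
    swap = np.array([[1, 0, 0, 0],
                     [0, 0, 1, 0],
                     [0, 1, 0, 0],
                     [0, 0, 0, 1]], dtype=float)   # a single transposition on 2 qubits
    for eps in (1e-2, 1e-4, 1e-8):
        print(eps, smallest_order(swap, 1.0, eps))
```

The required order grows slowly as $\epsilon$ shrinks, while the number of distinct permutation products appearing in the expansion grows quickly with $K$, which is the trade-off discussed above.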

VIII. RELATION TO ADIABATIC QUANTUM COMPUTING WITH SU(d) SYMMETRY
Similar to the limiting $p \to \infty$ behavior of QAOA, our ansatz corresponds to an adiabatic quantum evolution for each $S_n$ irreducible representation subspace. The standard interpolating Hamiltonian is given by $H(s) = (1 - s)H_M + sH_p$, with mixer $H_M = \sum_{k \leq \ell} \beta_{k\ell}X_kX_\ell$ and problem Hamiltonian $H_p$, where $s$, ranging from 0 to 1, is the time-parameterized path with $s(0) = 0$ and $s(T) = 1$. In contrast with the standard interpolating Hamiltonian, we set the coupling strength parameters $\{\beta_{k\ell}\}$ as path-dependent i.i.d. random variables drawn from some possibly unknown distribution $D$. Using the Central Limit Theorem, it suffices to draw $\{\beta_{k\ell}\}$ from normal distributions (when $n$ is large), say $\mathcal{N}(0, \sigma)$. We work in SU(2)-$S_n$ duality, but the generalization to qudits is straightforward.
Choose an irrep subspace $S^\lambda$ with the instantaneous ground and first excited states $|v_0(s)\rangle, |v_1(s)\rangle$ and the spectral gap $\Delta = \min_{s \in [0,1]} E_1(s) - E_0(s)$. The adiabatic theorem implies that $|v_0(s = T)\rangle$ is $L^2$-$\epsilon$ close to the ground state of $H_p$ if $T$ satisfies the adiabatic condition in terms of the spectral gap $\Delta$. The quantum evolution in the Heisenberg picture is
$$U(T) = \mathcal{T} \exp\Big(-i \int_0^T dt\, H(s(t))\Big).$$
Taking $T = N\Delta t$, where $s(\Delta t_j)$ is constant over each small time increment, the standard Trotterization technique implies that
$$U(T) \approx \prod_{j=1}^{N} \exp(-i\Delta t_j\, s(\Delta t_j) H_p) \prod_{k \leq \ell}^{n} \exp(-i\Delta t_j (1 - s(\Delta t_j)) \beta_{k,\ell}(\Delta t_j) X_kX_\ell).$$
Setting $\gamma_j = \Delta t_j\, s(\Delta t_j)$ and $(\beta_{k,\ell})_j = \Delta t_j (1 - s(\Delta t_j)) \beta_{k,\ell}(\Delta t_j)$, we recover the $S_n$-CQA ansätze at $p = N$. The variational parameters $\{(\beta_{k,\ell})_j\}_{k,\ell}$ for the classical optimization correspond to the randomized coupling strength parameters of the initial Hamiltonian, which is strictly diagonal in the total spin basis elements. The adiabatic evolution time $T$ may be exponential if the spectral gap $\Delta$ is exponentially small as the number of spins grows. There are many works involving techniques to amplify the spectral gap, such as modifying the initial and final Hamiltonian [31,32], the quantum adiabatic brachistochrone [33], and adding non-stoquastic Hamiltonians [34,35], to name a few. We refer interested readers to [36] for further discussion. Therefore, one may ask whether the path-dependent randomized coupling coefficients would lead to an amplification of the spectral gap so that the final evolution time $T$ can be polynomially bounded. In particular, one might need to consider the dynamics of the random path-dependent coupling $\beta_{k,\ell}(s)$, such as Langevin or diffusion processes, supplemented by systems of stochastic differential equations.
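The Trotterization step can be illustrated on a toy model. The sketch below compares, for an interpolation $(1-s)H_M + sH_p$ with a linear schedule, the product of exact short-time evolutions with the product of split exponentials, and reports the operator-norm distance. The linear midpoint schedule, the $2 \times 2$ matrices in the usage example, and the function names are assumptions made for illustration; in the ansatz the mixer would be $\sum_{k \leq \ell} \beta_{k\ell}X_kX_\ell$.

```python
import numpy as np

def expm_hermitian(H, t):
    """exp(-i t H) for Hermitian H via eigendecomposition."""
    w, v = np.linalg.eigh(H)
    return (v * np.exp(-1j * t * w)) @ v.conj().T

def trotter_error(H_M, H_p, T, N):
    """Distance between stepwise-exact and split (alternating) evolutions.

    Uses a linear schedule s_j = (j + 1/2) / N over N steps of length T / N.
    """
    dt = T / N
    dim = H_p.shape[0]
    U_exact = np.eye(dim, dtype=complex)
    U_split = np.eye(dim, dtype=complex)
    for j in range(N):
        s = (j + 0.5) / N
        U_exact = expm_hermitian((1 - s) * H_M + s * H_p, dt) @ U_exact
        # gamma_j = dt * s and beta_j = dt * (1 - s) play the role of layer parameters.
        U_split = expm_hermitian(H_p, dt * s) @ expm_hermitian(H_M, dt * (1 - s)) @ U_split
    return np.linalg.norm(U_exact - U_split, 2)

if __name__ == "__main__":
    H_M = np.diag([1.0, -1.0])                  # toy diagonal "mixer"
    H_p = np.array([[0.0, 1.0], [1.0, 0.0]])    # toy off-diagonal "problem" term
    for N in (10, 100, 1000):
        print(N, trotter_error(H_M, H_p, 0.5, N))
```

The splitting error shrinks as the number of steps grows, which is the sense in which the alternating ansatz with $p = N$ layers approaches the adiabatic evolution.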
The answer to this question will shed light on the existence of a quantum advantage in finding the exact eigenstate of the problem Hamiltonian. Furthermore, by relating it to the adiabatic evolution, similarly to QAOA, this would imply that the $S_n$-CQA ansätze enjoy a guaranteed performance improvement as the number of alternating layers $p$ increases. We leave this as an open question. The success of the $S_n$-CQA ansätze in approximating the sign structure of the ground state in numerical simulations may serve as heuristic evidence for such a potential quantum advantage, in addition to the super-exponential quantum speed-up in performing Fourier space convolution in quantum circuits.

A. Benchmark Details
Table: List of hyperparameters used for $J_2 = 0.5$.

An important challenge in near-term quantum applications is to validate proposed variational ansätze in the presence of noise. We report that the $S_n$-CQA ansätze are also susceptible to noise, similarly to the noise-induced barren plateau (NIBP) argued in [38]. More precisely, we assume a noisy training scenario with a noise-free cost function, i.e., the expectation with respect to the $J_1$-$J_2$ Hamiltonians, with local noise $q_i$ such that $-1 \leq q_i \leq 1$ at the $i$th site. We assume local Pauli noise with $q_i \equiv q^X_i = q^Y_i = q^Z_i$, where $\{X, Y, Z\}$ act on the $i$th qubit. Under the action of this type of noise, the YJM mixer Hamiltonian decomposes into terms of the form
$$q_iq_jq_kq_l\, (S_i \cdot S_k)(S_j \cdot S_l) + \tfrac{1}{2} q_iq_k\, S_i \cdot S_k + \tfrac{1}{2} q_jq_l\, S_j \cdot S_l + \tfrac{1}{4} I.$$
We assume the noise on each site to be an i.i.d. random variable, so that the central limit theorem applies when the number of qubits is large and these noise sources can be assumed to be Gaussian. Therefore, the impact of noise on the parameters can be modelled as $\beta_{k,l} \to \beta_{k,l} + \xi^{(1)}_{k,l} + \xi^{(2)}_{k,l} + \xi^{(3)}_{k,l}$, where terms such as $\sum_{i=1}^{k-1} q_iq_k$ and $\sum_{j=1}^{l-1} q_jq_l$ are treated as Gaussian noises. We further consider the Gaussian noise $\xi_{k,l} \equiv \xi^{(1)}_{k,l} + \xi^{(2)}_{k,l} + \xi^{(3)}_{k,l}$. Similarly, we perturb the parameters of the $J_1$-$J_2$ Hamiltonian by some Gaussian noise $\zeta$. In the numerical simulation, we model the noise in the gradient descent through the update rule for the variational parameter $\theta$. This measurement-induced noise model is studied in [39], where the authors show that quantum variational ansätze in general are noise-resilient in the over-parameterized regime, where the number of variational parameters scales roughly as the square of the dimension of the physical Hilbert space.
In addition to studying how $S_n$-CQA can be resilient to noise induced in the gradient descent, it is of interest to investigate efficient ways to simulate the $S_n$-CQA dynamics (consisting of the Hamiltonian evolution of the problem and mixer Hamiltonians) that are robust to noise. In particular, one might study how the symmetry protection protocol proposed in [40] could lead to a faster Trotterization of $S_n$-CQA in the NISQ era. We leave this study as intriguing future work.

Figure (noisy training, cf. [38]): We fix $p$ and vary the magnitude of the noise $\xi$. With a moderate level of measurement noise, $S_n$-CQA can still converge to the ground state energy. As the noise scale increases, the performance gets worse. In any case, the noise-induced barren plateau (NIBP) seems to be present if the measurement noise is large, as the loss seems to stagnate (up to oscillations) in the later training compared with the noise-free model. We note that the frustration-free case is slightly more robust to measurement noise than the frustrated case.