Theory for Equivariant Quantum Neural Networks

Quantum neural network architectures that have little-to-no inductive biases are known to face trainability and generalization issues. Inspired by a similar problem, recent breakthroughs in machine learning address this challenge by creating models encoding the symmetries of the learning task. This is materialized through the usage of equivariant neural networks whose action commutes with that of the symmetry. In this work, we import these ideas to the quantum realm by presenting a comprehensive theoretical framework to design equivariant quantum neural networks (EQNN) for essentially any relevant symmetry group. We develop multiple methods to construct equivariant layers for EQNNs and analyze their advantages and drawbacks. Our methods can find unitary or general equivariant quantum channels efficiently even when the symmetry group is exponentially large or continuous. As a special implementation, we show how standard quantum convolutional neural networks (QCNN) can be generalized to group-equivariant QCNNs where both the convolution and pooling layers are equivariant to the symmetry group. We then numerically demonstrate the effectiveness of an SU(2)-equivariant QCNN over a symmetry-agnostic QCNN on a classification task of phases of matter in the bond-alternating Heisenberg model. Our framework can be readily applied to virtually all areas of quantum machine learning. Lastly, we discuss how symmetry-informed models such as EQNNs provide hope to alleviate central challenges such as barren plateaus, poor local minima, and sample complexity.


I. INTRODUCTION
Recognizing the underlying symmetries in a given dataset has played a fundamental role in classical machine learning. For instance, noting that the picture of a cat still depicts a cat when we translate the pixels of the image gives a hint as to why convolutional neural networks [1] have been so successful in image classification: they process images in a translationally-symmetric way [2].
In recent years, the importance of symmetries in machine learning has been studied in problems with more general symmetry groups than translations, leading to the burgeoning field of geometric deep learning [2]. The central thesis of this field is that prior symmetry knowledge should be incorporated into the model, thus effectively constraining the search space and easing the learning task. Indeed, symmetry-respecting models have been observed to perform and generalize better than problem-agnostic ones in a wide variety of tasks [2][3][4][5][6][7][8][9]. As such, a great deal of work has also gone into developing a mathematically rigorous framework for designing symmetry-informed models through the machinery of representation theory. This has provided the basis for so-called equivariant neural networks (ENNs) [10][11][12][13][14], whose key property is that their action commutes with that of the symmetry group. In other words, applying a symmetry transformation to the input and then sending it through the ENN produces the same result as sending the raw input through the ENN and then applying the transformation.
Geometric quantum machine learning (GQML) attempts to solve the aforementioned issues by leveraging ideas from geometric deep learning to construct quantum models with sharp inductive biases based on the symmetries of the problem at hand. For instance, when classifying between states presenting a large, or a low, amount of multipartite entanglement [24,42,43], it is natural to employ models whose outputs remain invariant under the action of any local unitary [44]. While recent proposals have started to reveal the power of GQML [44][45][46][47][48][49][50][51][52], the field is still in its infancy, and a more systematic approach to symmetry-encoded model design is needed.

Figure 1. Schematic representation of our main results. a) In GQML we start by identifying the symmetry group (or groups) that leaves the data labels invariant. For the example shown, the data can be visualized on a three-dimensional sphere, and the labels are invariant under the action of SO(3). b) Both in classical and quantum machine learning it has been shown that models with equivariant layers often have an improved performance over non-equivariant architectures. The key feature of equivariance is that applying a rotation to the input data and sending it through the layer is the same as first sending the data through the layer and then rotating the output. On the other hand, feeding either a raw or a rotated data instance into a non-equivariant layer usually leads to distorted outputs which are not related by a rotation. c) In this work we provide a toolbox of methods for creating equivariant quantum neural networks (EQNNs) that can be readily used to construct quantum architectures with strong geometric priors.
The goal of this work is to offer a theoretical framework for building GQML models based on extending the notion of classical ENNs to equivariant quantum neural networks (EQNNs) (see Fig. 1). Our main contributions can be summarized as follows:

• We provide an interpretation of EQNN layers as a form of generalized Fourier space action, meaning that they perform a group Fourier transform, act on the Fourier components, and transform back. This allows us to quantify the number of free parameters in an EQNN layer, and unravels the exciting possibility of using different group representations as hyperparameters to act on different generalized Fourier spaces. (Sec. IV)

• We introduce a general framework for EQNNs, extending previous results from unitary quantum neural networks to channels. We characterize the different types of EQNN layers, such as standard, embedding, pooling, lifting and projection layers. This permits a classification of EQNN layers depending on their input and output representations. In addition, we also explore how equivariant non-linearities can be introduced via multiple copies of the data. (Sec. IV)

• We describe three alternative methods for constructing and parametrizing EQNNs. These are based on finding the nullspace of a certain system of matrix equations, applying the twirling formula over the symmetry group, and using the Choi operator representation of channels. Our methods have better complexity than existing methods and can efficiently find unitary or non-unitary equivariant layers even when the symmetry group is exponentially large. We discuss the strengths and weaknesses of each approach, as well as general methods to optimize and train equivariant channels. (Sec. V)

We conclude with a discussion on how EQNNs provide hope to alleviate critical challenges in QML, such as ill-behaved training landscapes (barren plateaus and local minima), and to reduce the sample complexity (data requirements) of the model. Taken together, we hope that our results will serve as blueprints and guidelines for a more representation-theoretic approach to QML.

II. RELATED WORK

A. Equivariance in geometric deep learning
In this section, we provide an overview of the literature on classical equivariant neural networks (ENNs), leaving the formal treatment of equivariance to Section III. At a high level, equivariance is a mathematical property that preserves symmetries in features throughout a multilayer ENN. One imposes equivariance onto ENN layers via tools from group representation theory, the workhorse behind geometric deep learning [2]. The most well-known equivariant architecture is the convolutional neural network (CNN) [1], ubiquitous in image and signal processing. The relevant symmetry group in CNNs is the translation group in the plane R², and one can show that their convolution and pooling layers are equivariant to this group [11]. Ideas to generalize CNNs to other groups and data were first laid out in [10] and further made mathematically rigorous in [11,12]. These works are concerned with the so-called homogeneous ENNs, which include as special cases spherical CNNs (where the relevant group is SO(3)) for spherical images [6], and Euclidean neural networks (the Euclidean group E(n) = Rⁿ ⋊ O(n) and its subgroups) for molecular data [53][54][55][56]. Notable non-homogeneous architectures include graph neural networks (the permutation group S_n) [4,57,58]. In addition, more advanced representation-theoretic treatments of non-homogeneous data [14] have led to steerable CNNs [59] and gauge-equivariant CNNs [13] on general manifolds. Moreover, it has been shown that equivariant layers can be constructed from either the real space or Fourier space perspectives [5]. More recently, methods for designing equivariant layers have been studied in [8,51,60,61]. For a theoretical analysis of the improvements in training and generalization error arising from using ENNs we refer the reader to [9,[62][63][64], while the expressibility and universality of ENNs have been studied in [65][66][67][68][69].
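The translation equivariance of convolutional layers mentioned above can be checked directly. Below is a minimal numpy sketch; the signal length, kernel size, and function names are illustrative choices, not taken from any reference:

```python
import numpy as np

def circular_conv(x, kernel):
    # 1D circular convolution: the linear part of a CNN layer.
    n = len(x)
    return np.array([sum(kernel[k] * x[(i - k) % n] for k in range(len(kernel)))
                     for i in range(n)])

def shift(x, t):
    # Representation of the cyclic translation group on signals.
    return np.roll(x, t)

rng = np.random.default_rng(0)
x = rng.normal(size=8)
kernel = rng.normal(size=3)

# Equivariance: convolving a shifted signal equals shifting the convolved one.
for t in range(8):
    assert np.allclose(circular_conv(shift(x, t), kernel),
                       shift(circular_conv(x, kernel), t))
```

The same commutation property, with translations replaced by a general group representation, is the defining feature of the equivariant layers studied in this work.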

B. Equivariance in quantum information
Equivariance has a long history in quantum information theory. As such, we will not attempt here to review its full impact on the field, but we will rather focus on several relevant works where equivariance has been studied in the context of quantum channels.
To begin, the set of all irreducible SU(2)-equivariant channels has been characterized in [91], with extensions to a wide class of finite groups presented in [92]. The work in [93] presents conditions to construct group-equivariant generalized-extreme channels. On the other hand, the history of equivariance in QML is much more recent. In [44] and [45] the authors lay a theoretical groundwork for the integration of symmetries into QML. However, prior works are either non-constructive [44] or only work efficiently on restricted sets of problems and symmetries [45]. In particular, two main types of symmetries have been most extensively explored: the action of the local unitary group SU(d) on each qudit in a correlated manner, U^⊗n, and the action of the permutation group S_n by permuting the qudits. It is a well-known fact in representation theory, known as Schur-Weyl duality, that these two group representations commute.
On the side of SU(d) symmetry, Ref. [49] has proposed a specific task, approximating matrix elements of S_n irreps evaluated on arbitrary group algebra elements, together with a novel polynomial-time quantum algorithm, based on the combination of the quantum Schur transform and Hamiltonian simulation, that potentially achieves a super-exponential quantum speedup given that the best known classical algorithms require O(n! n²) time. In turn, [48] exploits the ideas in [49] to derive an ansatz, the S_n-Equivariant Convolutional Quantum Alternating Ansätze (S_n-CQA), that is universal for the subgroup of SU(d)-equivariant unitaries. As the name suggests, it is based on the qudit-permutation action of S_n on the quantum system which, via Schur-Weyl duality, linearly spans the subspace of SU(d)-equivariant operators. Interestingly, S_n-CQA can be shown to achieve universality in the subgroup of symmetric unitaries with only four-body interactions, something that is remarkable given the typical limitations on universality imposed by locality constraints [87,94]. S_n-CQA's performance and resource requirements are benchmarked in [50].
On the S_n symmetry side, [95] has shown that S_n-equivariant QNNs exhibit a wide range of favorable properties, such as being immune to barren plateaus, efficiently reaching overparametrization, and being able to generalize well from few training points. Moreover, methods for constructing equivariant quantum circuits for graph problems were given in [46,47,52,96]. We also note that, while not explicitly mentioned, some recent quantum algorithms can be analyzed from an equivariance point of view [21,23,[97][98][99]. We discuss these in Appendix A.

III. PRELIMINARIES
Here we give some of the necessary background in quantum machine learning and representation theory to tackle GQML. For a more comprehensive treatment of these topics, we refer the reader to standard textbooks in representation theory [100][101][102] and the geometric ML theory literature [2,12,14]. For a QML-oriented approach to group theory and representation theory, see [103].
A. From QML to GQML

For simplicity and concreteness, in this paper we focus on quantum supervised learning with scalar labels. However, we remark that GQML is relevant in other contexts such as unsupervised learning [104,105], generative modeling [106][107][108][109] or reinforcement learning [110,111]. For instance, the constructions presented here can be readily adapted to learning problems with non-scalar output (e.g., quantum generative models, where the output is a quantum state or a probability distribution).
Suppose we are given some dataset composed of M quantum states and scalar labels {ρ_i, y_i}_{i=1}^M, where ρ_i ∈ R are quantum states from a data domain R ⊂ B(H), with B(H) the set of bounded linear operators on H. The labels come from a label domain Y and are obtained from a (potentially probabilistic) function f : R → Y. For example, in binary classification one has Y = {0, 1}. This data may come from some physical quantum mechanical process (quantum data [24]) or may be classical information embedded into quantum states (classical data [112]). Given the dataset, one then optimizes a learning model h_θ : R → Y, where θ are trainable parameters, with the intent of closely approximating the underlying function f.
In variational QML [18], the states in the dataset are fed into a trainable quantum circuit, which is usually modelled by a sequence of parametrized unitary matrices. However, in this work we will consider more general operations, parametrized quantum channels, which we refer to as quantum neural networks (QNNs). Respectively denoting the spaces of bounded linear operators on H_in and H_out as B_in := B(H_in) and B_out := B(H_out), a QNN is a parametrized completely positive and trace-preserving (CPTP) linear map N_θ : B_in → B_out.
We can further decompose the QNN as a concatenation of channels, or layers. We say that N_θ is an L-layered QNN if it can be expressed as N_θ = N^L_{θ_L} ∘ ⋯ ∘ N^1_{θ_1}, where the N^l_{θ_l} (with l = 1, ..., L) are parametrized CPTP channels such that θ = (θ_1, ..., θ_L). From the previous, the l-th layer maps between operators acting on some Hilbert space H_{l−1} and operators acting on some (potentially different) Hilbert space H_l. That is, N^l_{θ_l} : B_{l−1} → B_l, where we have defined for simplicity of notation B_l := B(H_l).
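The layered structure N_θ = N^L ∘ ⋯ ∘ N^1 can be sketched as plain function composition. In the following minimal numpy sketch, `unitary_layer` and `depolarizing_layer` are hypothetical single-qubit examples chosen only to illustrate that layers need not be unitary:

```python
import numpy as np

def unitary_layer(U):
    # Wrap a unitary as a channel rho -> U rho U^dagger.
    return lambda rho: U @ rho @ U.conj().T

def depolarizing_layer(p):
    # A non-unitary CPTP layer on one qubit (hypothetical example).
    return lambda rho: (1 - p) * rho + p * np.trace(rho) * np.eye(2) / 2

def qnn(layers):
    # Compose layers into a single channel N_theta = N^L ∘ ... ∘ N^1.
    def channel(rho):
        for layer in layers:
            rho = layer(rho)
        return rho
    return channel

rho = np.array([[1, 0], [0, 0]], dtype=complex)              # |0><0|
Hd = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)  # Hadamard
N = qnn([unitary_layer(Hd), depolarizing_layer(0.1)])
out = N(rho)
assert np.isclose(np.trace(out).real, 1.0)                   # trace preserved
```

Since each layer is CPTP, the composite map is CPTP as well, which is what the assertion spot-checks.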
After applying the QNN to an input state ρ, one measures the resulting state with respect to a set of observables {O_j}_j to obtain the expectation values {Tr[N_θ(ρ)O_j]}_j. Finally, a classical post-processing step, C, maps these outcomes to a loss function ℓ_θ(ρ_i, y_i) = C({Tr[N_θ(ρ_i)O_j]}_j, y_i). We quantify the performance of the model over the dataset via the so-called empirical loss, L(θ) = (1/M) Σ_{i=1}^M F(ℓ_θ(ρ_i, y_i)), defined in terms of some problem-dependent function F. Finally, employing a classical computer, one optimizes over the parameters θ to minimize the empirical loss until certain convergence conditions are met. The optimal parameters, along with the loss function, are used to predict labels. One of the most important aspects that make or break a QML scheme is its inductive biases, i.e., the assumptions about the problem that one embeds in the structure of the model. In our case, this amounts to an adequate choice of the parametrized layers N^l_{θ_l} forming the QNN and of the measurement operators O_j. In a nutshell, the inductive biases are responsible for the model exploring only a subset of all possible functions from R to Y. If these inductive biases are too general or not accurate, the model is expected to train poorly, while models with appropriate inductive biases can often benefit from an improved performance [26,31,95]. GQML aims at providing a framework for incorporating prior geometrical information in the model with the hope of improving its trainability, data requirements, generalization, and overall performance. In particular, the main goal of GQML is to create models respecting the underlying symmetries of the domain over which they act. In the next sections we will briefly review how to use tools from representation theory to deal with symmetries, as well as recall basic concepts such as equivariance and invariance.

B. Symmetry groups and representation theory
The first step towards building a GQML model is identifying the set of relevant operations that the model needs to preserve.We say that a QML problem has symmetry with respect to a group G if the labels are unchanged under the action of a representation of G on the input states.
Definition 1 (Label symmetries and G-invariance). Given a compact group G and some unitary representation R acting on quantum states ρ, we say the underlying function f has a label symmetry if it is G-invariant, i.e., if f(R(g) ρ R(g)†) = f(ρ) for all g ∈ G and all ρ ∈ R.

As previously mentioned, the goal of GQML is to build models that respect the label symmetries of the data. That is, we want to build G-invariant models such that h_θ(ρ) = h_θ(R(g)ρR(g)†), for any g ∈ G, and for all values of θ.
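As a toy illustration of a G-invariant label function, the purity Tr[ρ²] is unchanged under conjugation by any unitary representation. The following minimal numpy check uses the standard QR-based construction of a random unitary (an illustrative choice, not taken from the text):

```python
import numpy as np

def purity(rho):
    # Tr[rho^2] is invariant under rho -> R rho R^dagger for any unitary R,
    # hence it is a G-invariant label for every unitary representation.
    return np.real(np.trace(rho @ rho))

def random_unitary(d, rng):
    # QR decomposition of a complex Gaussian matrix, with phases fixed.
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

rng = np.random.default_rng(1)
a = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = a @ a.conj().T
rho /= np.trace(rho)                       # a random density matrix

R = random_unitary(4, rng)
assert np.isclose(purity(rho), purity(R @ rho @ R.conj().T))
```

A G-invariant model h_θ is required to satisfy exactly this kind of identity, but for the specific representation R(g) of the symmetry group rather than for arbitrary unitaries.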
To further understand how symmetry groups act, and how one can manipulate them, we recall here some basic concepts from representation theory (see Ref. [103] for further background). Namely, given a group G, its representation describes its action on some vector space H, which we assume for simplicity to be a Hilbert space.
Definition 2 (Representation).A representation (R, H) of a group G on a vector space H is a homomorphism R : G → GL(H) from the group G to the space of invertible linear operators on H, that preserves the group structure of G.
This implies that, for all g ∈ G, the representation of its inverse is the inverse of its representation, R(g⁻¹) = R(g)⁻¹, and the representation of the identity element e is the identity operator on H, R(e) = 𝟙_{dim(H)}. Given a representation, it is relevant to define its commutant.
Definition 3 (Commutant). Given a representation R of G, we define the commutant of R as the set of bounded linear operators on H that commute with every element in R, i.e., comm(R) := {A ∈ B(H) | [A, R(g)] = 0, ∀ g ∈ G}.

Consider the following remarks about representations:

• A representation is faithful if it maps distinct group elements to distinct operators on H. As an example of unfaithfulness, the trivial representation maps all group elements to the identity on H.

• Two representations R_1 and R_2 are equivalent if there exists a change of basis W such that R_2(g) = W R_1(g) W† for all g ∈ G.

• A subrepresentation is a subspace K ⊂ H that is invariant under the action of the representation, i.e., R(g)|w⟩ ∈ K for all g ∈ G and |w⟩ ∈ K. The group can then be represented through R|_K, the restriction of R to the vector subspace K. A subrepresentation K is non-trivial if K ≠ {0} (the zero vector) and K ≠ H.

Definition 4 (Irreps). A representation is said to be an irreducible representation (irrep) if it contains no non-trivial subrepresentations.
Irreps are the fundamental building blocks of representation theory. For any finite or compact group, the representations can be chosen to be unitary [113]. Hence, in the rest of this paper we will consider unitary representations on complex Hilbert spaces. In the case that the representation is finite-dimensional, we can go a step further and say that the vector space can be decomposed into a direct sum over irreducible subrepresentations. This leads to the so-called isotypic decomposition, R(g) ≅ ⊕_λ 𝟙_{m_λ} ⊗ R_λ(g), where ≅ indicates that there exists a global change-of-basis matrix W that simultaneously block-diagonalizes the unitaries R(g) for all g ∈ G. Here, λ labels the irreps, m_λ is the multiplicity of the irrep R_λ, and d_λ denotes its dimension. When G is not a finite group, we assume it to be a compact Lie group with an associated Lie algebra g such that e^g = G. That is, g = {a | e^a ∈ G}. In particular, if G has a representation R, then g has a representation r given by the differential of R, i.e., given a ∈ g, R(e^a) = e^{r(a)}.
In this work we will mainly focus on the adjoint representation of G, as it describes how the group acts on density matrices (and other bounded operators). A unitary representation R on H induces an action on B(H) given by Ad_{R(g)}(ρ) = R(g) ρ R(g)†, where Ad_{R(g)} denotes the adjoint representation. Note that for the case of Lie groups, the adjoint representation also exists at the Lie algebra level and is given by ad_{r(a)}(·) = [r(a), ·].
To finish this section, we find it convenient to define a distinction between symmetry groups.

Definition 5 (Inner and outer symmetries). Given a composite Hilbert space, we call a representation of a group an inner symmetry if it acts locally on each subsystem, and an outer symmetry if it permutes the subsystems.
For instance, when working with n-qubit systems, the tensor representation of SU(2), R(g ∈ SU(2)) = g^⊗n, is an inner symmetry, as it acts locally on each subsystem. On the other hand, the qubit-permuting representation of S_n, given by R(g) ⊗_{j=1}^n |ψ_j⟩ = ⊗_{j=1}^n |ψ_{g⁻¹(j)}⟩, is an outer symmetry.
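Both kinds of symmetry, and the Schur-Weyl commutation mentioned in Sec. II, can be verified numerically on two qubits. A minimal sketch, building the SU(2) element from the closed form exp(-iθ n·σ) = cos(θ)𝟙 - i sin(θ) n·σ:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def su2_element(theta, n_vec):
    # exp(-i theta n.sigma) = cos(theta) 1 - i sin(theta) n.sigma
    nx, ny, nz = n_vec / np.linalg.norm(n_vec)
    return np.cos(theta) * np.eye(2) - 1j * np.sin(theta) * (nx*X + ny*Y + nz*Z)

g = su2_element(0.9, np.array([1.0, 2.0, -0.5]))
R_inner = np.kron(g, g)             # inner symmetry: tensor rep g^{⊗2} of SU(2)
SWAP = np.eye(4)[[0, 2, 1, 3]]      # outer symmetry: qubit-permuting rep of S_2

assert np.allclose(g @ g.conj().T, np.eye(2))       # g is unitary
assert np.allclose(R_inner @ SWAP, SWAP @ R_inner)  # the two reps commute
```

The final assertion is a two-qubit instance of Schur-Weyl duality: the tensor representation of SU(2) and the qubit-permuting representation of S_2 commute with each other.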

C. Equivariance and invariance in quantum neural networks
Here we present a recipe to obtain G-invariant QML models based on the key concept of equivariance, which we will first define for linear maps and then for operators.
Given a group G and a representation R, we typically say that a linear map ϕ : B(H) → B(H) is equivariant if and only if ϕ ∘ Ad_{R(g)} = Ad_{R(g)} ∘ ϕ for all g ∈ G. We can extend this definition by noting that neither the input and output representations, nor the Hilbert spaces, need to be the same. Thus, we consider the following general definition.
Definition 6 (Equivariant map). Given a group G and its representations (R_in, H_in) and (R_out, H_out), a linear map ϕ : B(H_in) → B(H_out) is called (G, R_in, R_out)-equivariant if ϕ ∘ Ad_{R_in(g)} = Ad_{R_out(g)} ∘ ϕ for all g ∈ G.

The action of an equivariant map ϕ thus commutes with the action of the group. That is, for an equivariant ϕ it is equivalent to (i) first act with ϕ and then with R_out, or (ii) first act with R_in and then with ϕ. Note that, for the special case of R_out being the trivial representation, the map is invariant, i.e., ϕ ∘ Ad_{R_in(g)} = ϕ for all g ∈ G.
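A concrete instance of an equivariant map is the partial trace over one qubit, which is (G, g⊗g, g)-equivariant for any single-qubit unitary g. A minimal numpy check (the names and the random-state construction are illustrative):

```python
import numpy as np

def ptrace2(rho):
    # Trace out the second qubit of a 2-qubit operator.
    return np.einsum('ikjk->ij', rho.reshape(2, 2, 2, 2))

rng = np.random.default_rng(2)
z = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
g, _ = np.linalg.qr(z)                 # any single-qubit unitary works here

a = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = a @ a.conj().T
rho /= np.trace(rho)                   # a random 2-qubit density matrix

R_in, R_out = np.kron(g, g), g
lhs = ptrace2(R_in @ rho @ R_in.conj().T)     # phi ∘ Ad_{R_in(g)}
rhs = R_out @ ptrace2(rho) @ R_out.conj().T   # Ad_{R_out(g)} ∘ phi
assert np.allclose(lhs, rhs)
```

This is the prototype of the equivariant "pooling" layers discussed later, where the Hilbert space shrinks while equivariance is preserved.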
Next, let us define what an equivariant operator is.
Definition 7 (Equivariant operator). Given a group G and its representation (R, H), an operator O ∈ B(H) is called (G, R)-equivariant if [O, R(g)] = 0 for all g ∈ G.

Evidently, Definition 7 implies that O ∈ comm(R), from which we can easily see that comm(R) is the space of all equivariant operators. Moreover, we can also see that the adjoint action of a (G, R)-equivariant unitary defines a (G, R, R)-equivariant map. The previous definitions present us with a recipe to build GQML models of the form in Eq. (3) whose outputs are invariant under the action of the group.
Proposition 1 (G-invariant models). Let N_θ be a (G, R_in, R_out)-equivariant QNN and O a (G, R_out)-equivariant measurement operator. Then the model h_θ(ρ) = Tr[N_θ(ρ)O] is G-invariant.

Proof. For every g ∈ G, ρ ∈ B_in and θ we have h_θ(R_in(g) ρ R_in(g)†) = Tr[N_θ(R_in(g) ρ R_in(g)†) O] = Tr[R_out(g) N_θ(ρ) R_out(g)† O] = Tr[N_θ(ρ) R_out(g)† O R_out(g)] = Tr[N_θ(ρ) O] = h_θ(ρ), where we used the equivariance of N_θ, the cyclicity of the trace, and the fact that O commutes with R_out(g).

Armed with the previous definitions we are now ready to present the basic framework for EQNNs.
First, however, we find it instructive to provide an example of a classification problem naturally amenable to these methods. Suppose that we are given the ground states of the bond-alternating Heisenberg model, with Hamiltonian H = J Σ_{i odd} S_i · S_{i+1} + J′ Σ_{i even} S_i · S_{i+1}, where S_i = (S_i^x, S_i^y, S_i^z) is the spin operator for the i-th qubit. There are two phases of matter for this Hamiltonian: trivial and topologically protected. As a learning problem, we consider the task of determining whether the states are in the trivial or the topologically protected phase. Consider the representation R(g) = g^⊗n of SU(2). For a ground state |ψ⟩, one can show that R(g)|ψ⟩ is also a ground state. Thus, the labels of the states are invariant under the action of SU(2). In Sec. VII we return to this problem and show that an EQNN significantly outperforms a symmetry-agnostic quantum convolutional neural network for this task.

Figure 2. a) We consider a QML problem composed of a dataset (that can either be quantum mechanical in nature, or correspond to classical data encoded in quantum states) as well as a label symmetry group G. The first step is to define the input and output representation of G at each layer, where these can be natural, faithful, non-faithful, etc. From here, we provide different techniques which allow us to construct the EQNN layers and control, for instance, the locality of their gates. b) Dashed lines indicate the representations of the symmetry group G at specific stages in the EQNN, which may change between layers. At first, the input state ρ_in is acted upon by the representation R_in. The l-th layer of the EQNN, N^l_{θ_l}, must be (G, R_l, R_{l+1})-equivariant. In sum, the full architecture is (G, R_in, R_out)-equivariant. The (G, R_out)-equivariant measurement operator O is in the commutant of the output representation R_out. Note that if we only want the EQNN to produce an output state equivariantly or invariantly (e.g., in generative models), we can omit the measurements.
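The SU(2) invariance claimed for this model can be verified term by term, since each Heisenberg interaction S_i · S_{i+1} commutes with g ⊗ g. A sketch for a 4-qubit chain, with the bond-alternating couplings J = 1, J′ = 0.5 chosen arbitrarily for illustration:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def heis_term(i, n):
    # S_i . S_{i+1} = (X_i X_{i+1} + Y_i Y_{i+1} + Z_i Z_{i+1}) / 4 for spin-1/2.
    term = np.zeros((2**n, 2**n), dtype=complex)
    for P in (X, Y, Z):
        ops = [np.eye(2, dtype=complex)] * n
        ops[i], ops[i + 1] = P, P
        m = ops[0]
        for o in ops[1:]:
            m = np.kron(m, o)
        term += m / 4
    return term

n, J, Jp = 4, 1.0, 0.5                     # illustrative couplings
H = sum(J * heis_term(i, n) for i in range(0, n - 1, 2)) \
  + sum(Jp * heis_term(i, n) for i in range(1, n - 1, 2))

theta = 0.7
g = np.cos(theta) * np.eye(2) - 1j * np.sin(theta) * Y   # an SU(2) element
G4 = g
for _ in range(n - 1):
    G4 = np.kron(G4, g)                    # the representation R(g) = g^{⊗n}

assert np.allclose(H @ G4, G4 @ H)         # SU(2) symmetry of the model
```

Because [H, R(g)] = 0, applying R(g) to an eigenstate yields another eigenstate of the same energy, which is why ground-state labels are SU(2)-invariant.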

IV. THEORY OF EQUIVARIANT LAYERS FOR EQNNS
In this section we will shed some light on the importance of the choice of representation by studying how EQNNs act on data, and how many degrees of freedom they have. Most notably, we will show that layers that are equivariant to different representations can process data in different ways, so that a given layer could potentially "see" information that is inaccessible to another one. The latter points to the crucial importance of the intermediate representations. Finally, we will present a classification of EQNN layers based on their input and output representations, allowing them to be non-linear, and even to change the symmetry group under which they are equivariant. Our results are summarized in Fig. 2.

A. Equivariant layers as Fourier space actions
Let us start by analyzing how EQNNs act on data. For simplicity, we first consider the case when R_in = R_out = R, the states in the dataset are pure, ρ = |ψ⟩⟨ψ|, and the EQNN is unitary, i.e., N_θ(ρ) = U(θ)ρU(θ)†. Note that if N_θ is a (G, R, R)-equivariant map, then U(θ) is a (G, R)-equivariant operator and hence belongs to comm(R). Thus, we can understand the action of U(θ) on |ψ⟩ by studying the structure of the commutant.
Theorem 1 (Structure of the commutant, Theorem IX.11.2 in [100]). Let R be a finite-dimensional unitary representation of a compact group G. Then, under the same change of basis W that block-diagonalizes R as in Eq. (8), any operator H ∈ comm(R) takes the block-diagonal form H = W†(⊕_λ 𝟙_{d_λ} ⊗ H_λ)W, where each H_λ is an m_λ-dimensional operator that is repeated d_λ times.
The previous theorem shows that any equivariant unitary can be expressed as U(θ) = W†(⊕_λ 𝟙_{d_λ} ⊗ U_λ(θ))W, indicating that in the irrep basis it can only act non-trivially on the multiplicity space. Drawing a parallel with the classical machine learning literature, where it has been shown that linear equivariant maps can only act on the group Fourier components of the data [5,6,53,57,58], we can also here interpret EQNNs as a form of generalized Fourier space action. Specifically, the action of U(θ) can be understood as: (i) first transforming the data to the generalized Fourier space, W|ψ⟩ = ⊕_λ ψ_λ ⊗ |ψ_λ⟩; (ii) acting on each Fourier component |ψ_λ⟩ with U_λ(θ); and (iii) transforming back with W†. That is, U(θ)|ψ⟩ = W†(⊕_λ ψ_λ ⊗ U_λ(θ)|ψ_λ⟩). Note that this interpretation can be readily generalized to channels.
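The Fourier-space picture is easiest to see for the cyclic group Z_n in its regular representation, where the change of basis W is the discrete Fourier transform and every equivariant map (a circulant matrix, i.e., a group convolution) becomes diagonal. A minimal numpy sketch:

```python
import numpy as np

n = 6
P = np.roll(np.eye(n), 1, axis=0)          # regular rep of Z_n: cyclic shift
F = np.fft.fft(np.eye(n)) / np.sqrt(n)     # unitary DFT = group Fourier transform

rng = np.random.default_rng(0)
c = rng.normal(size=n)
# A generic Z_n-equivariant linear map is a circulant, i.e., a group convolution.
C = sum(c[k] * np.linalg.matrix_power(P, k) for k in range(n))

assert np.allclose(C @ P, P @ C)           # equivariance: commutes with shifts

Chat = F @ C @ F.conj().T                  # the same map in the Fourier basis...
assert np.allclose(Chat, np.diag(np.diag(Chat)))   # ...where it is diagonal
```

Here every irrep of Z_n is one-dimensional with multiplicity one, so each Fourier block U_λ reduces to a single scalar, which is why the equivariant map is exactly diagonal.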
Here, we can see that once the representation of G is fixed, so is the information in the input state one has access to (equivariantly).Explicitly, the EQNN cannot manipulate information stored in the components ψ λ of the input state.As we will see in the next subsection, one can still try to access this information via changes of representation.
Notably, Eq. (14) generalizes group convolution in the Fourier basis: when R is the regular representation, the change of basis is the well-known group Fourier transform [90,114] (see Appendix B). This generalized Fourier space picture has proved crucial in designing various classical architectures [5,6,53,57,58]. It also provides a representation-theoretic justification for the recent quantum "convolutional layers" in [23]. Recently, this interpretation of equivariant unitaries has also been noted for the special case of SU(d)-equivariant quantum circuits in [48,49].

Equivariant unitaries
The Fourier space picture previously discussed enables the counting of free parameters in equivariant unitaries.
Theorem 2 (Free parameters in equivariant unitaries). Under the same setup as Theorem 1, the unitary operators in comm(R) can be fully parametrized by Σ_λ m_λ² real scalars.
Proof. Any unitary U in comm(R) takes the block-diagonal form U = ⊕_λ 𝟙_{d_λ} ⊗ U_λ in the Fourier basis. Observe that the operators U_λ must also be unitaries. Since an m_λ-dimensional unitary is parametrized by m_λ² real scalars, a total number of Σ_λ m_λ² parameters suffices to parametrize U.

Theorem 2 describes how "significant" the symmetry is to the problem, in the sense that the larger the representations of G, the smaller the commutant, and thus the fewer parameters needed to fully characterize equivariant unitaries. In Table I we present examples of different symmetries constraining the number of free parameters in a unitary EQNN to exponentially many, polynomially many, and constant.
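The dimension of the commutant, and hence the parameter count of Theorem 2, can be computed numerically by vectorizing the constraint [A, R(g)] = 0 over the group generators and counting the nullspace dimension (this anticipates the nullspace method discussed in Sec. V). A sketch for the qubit-permuting representation of S_2 on two qubits, whose irreps are both one-dimensional:

```python
import numpy as np

# Representation of S_2 on two qubits: its nontrivial element is the SWAP.
SWAP = np.eye(4)[[0, 2, 1, 3]]
d = 4

# [A, SWAP] = 0  <=>  (SWAP ⊗ 1 - 1 ⊗ SWAP^T) vec(A) = 0   (row-major vec).
M = np.kron(SWAP, np.eye(d)) - np.kron(np.eye(d), SWAP.T)
svals = np.linalg.svd(M, compute_uv=False)
dim_comm = int(np.sum(svals < 1e-10))       # nullspace dimension of M

# Isotypic data of SWAP: trivial irrep with multiplicity 3 (triplet sector)
# and sign irrep with multiplicity 1 (singlet), so sum_lambda m_lambda^2 = 10.
assert dim_comm == 3**2 + 1**2
```

The nullspace dimension matches Σ_λ m_λ² = 9 + 1 = 10, as predicted by Theorem 2.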

Equivariant channels
We have already seen how the inductive biases in unitary EQNNs affect their structure, and concomitantly their number of free parameters. We now turn our attention to (G, R_in, R_out)-equivariant channels.

Table I. Free parameters in unitary EQNNs. We show how different symmetries impact the number of free parameters in a (G, R, R)-equivariant unitary.

First, recall that any linear channel ϕ : B_in → B_out can be fully characterized through its Choi operator [115], J_ϕ = Σ_{ij} |i⟩⟨j| ⊗ ϕ(|i⟩⟨j|), which acts on B(H_in ⊗ H_out). The action of ϕ on an input state ρ ∈ B_in can be recovered from J_ϕ as follows [116]: ϕ(ρ) = Tr_in[(ρᵀ ⊗ 𝟙)J_ϕ]. The Choi operator is related to equivariance via the following lemma.
Lemma 1 (Equivariance of the Choi operator). A linear map ϕ : B_in → B_out is (G, R_in, R_out)-equivariant if and only if its Choi operator satisfies J_ϕ ∈ comm(R_in* ⊗ R_out), where the * symbol denotes complex conjugation.
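Both the Choi recovery formula and the commutation condition of Lemma 1 can be checked numerically for the partial-trace channel. A minimal sketch assuming the conventions J_ϕ = Σ_{ij}|i⟩⟨j| ⊗ ϕ(|i⟩⟨j|) and ϕ(ρ) = Tr_in[(ρᵀ ⊗ 𝟙)J_ϕ]:

```python
import numpy as np

def ptrace2(rho):
    # Trace out the second qubit of a 2-qubit operator.
    return np.einsum('ikjk->ij', rho.reshape(2, 2, 2, 2))

def E(i, j, d):
    m = np.zeros((d, d), dtype=complex); m[i, j] = 1
    return m

d_in, d_out = 4, 2
# Choi operator J = sum_{ij} |i><j| ⊗ phi(|i><j|) for phi = Tr_2.
J = sum(np.kron(E(i, j, d_in), ptrace2(E(i, j, d_in)))
        for i in range(d_in) for j in range(d_in))

def apply_via_choi(rho):
    # phi(rho) = Tr_in[(rho^T ⊗ 1) J]
    M = (np.kron(rho.T, np.eye(d_out)) @ J).reshape(d_in, d_out, d_in, d_out)
    return np.einsum('ikil->kl', M)

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = a @ a.conj().T; rho /= np.trace(rho)
assert np.allclose(apply_via_choi(rho), ptrace2(rho))   # recovery formula

# phi is (G, g⊗g, g)-equivariant, so J must commute with R_in* ⊗ R_out.
z = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
g, _ = np.linalg.qr(z)                                  # a random unitary g
S = np.kron(np.kron(g, g).conj(), g)
assert np.allclose(S @ J, J @ S)
```

Note that the conjugate on R_in is essential: J_ϕ commutes with R_in* ⊗ R_out, not with R_in ⊗ R_out.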
Noting that (R in * ⊗R out ) is a valid representation as per Definition 2, we can combine Theorem 1 and Lemma 1 to determine a parameter count for general equivariant channels.

Theorem 3 (Free parameters in equivariant channels). Let the irrep decomposition of R_in* ⊗ R_out contain the irreps R_λ with multiplicities m_λ. Then the (G, R_in, R_out)-equivariant CPTP maps can be parametrized by Σ_λ m_λ² − C(R_in, R_out) real scalars, where C(R_in, R_out) counts the constraints arising from trace preservation.

We defer the proof to Appendix C. Intuitively, the parameter count of equivariant CP maps follows similarly to the proof of Theorem 2, and the extra term C(R_in, R_out) arises from imposing that the channel ϕ be trace-preserving (TP).
Similar to the classical ML literature [59], the parameter-count benefit of using equivariant layers can be assessed via the parameter utilization metric μ := dim(Hom_CPTP(R_in, R_out)) / dim(Hom^G_CPTP(R_in, R_out)), where Hom_CPTP(R_in, R_out) denotes the set of CPTP maps between B_in and B_out, and Hom^G_CPTP(R_in, R_out) its subset of (G, R_in, R_out)-equivariant maps. That is, the larger μ, the larger the benefit of using an EQNN, in the sense that the available parameters are used more effectively. For instance, by imposing SU(2) equivariance on 2-to-2 qubit channels, one reduces the number of free parameters to at most 14 (see Section VI), yielding a reduction of μ ⩾ 240/14 ≈ 17.
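The quoted figure of 240 free parameters for unconstrained 2-to-2 qubit channels follows from a simple dimension count: a Hermitian Choi operator on H_in ⊗ H_out has (d_in d_out)² real parameters, and trace preservation (Tr_out J = 𝟙) removes d_in² of them. A quick arithmetic check:

```python
# Dimension count for CPTP maps between two 2-qubit systems.
d_in = d_out = 4                          # two qubits in, two qubits out
dim_cptp = (d_in * d_out) ** 2 - d_in ** 2  # Hermitian Choi params minus TP constraints
assert dim_cptp == 240

# With SU(2) equivariance the count drops to at most 14 (Sec. VI),
# giving a parameter utilization of at least 240/14.
mu = dim_cptp / 14
assert mu > 17
```

This matches the bound μ ⩾ 240/14 ≈ 17 stated in the text.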

C. Intermediate representations as hyperparameters
Let us here discuss an aspect of EQNNs which has been purposely overlooked up to this point. Namely, while the input representation R_in is fixed by the action of the symmetry group on the input data, the intermediate and output representations acting on the spaces B_l are not. This means there exists freedom in choosing a sequence of representations (R_in, R_1, ..., R_out) under which the layers are equivariant. That is, the l-th layer N^l_{θ_l} is taken to be (G, R_{l−1}, R_l)-equivariant, as encoded in the L-layered EQNN of Definition 8.
The previous allows us to see that the full EQNN N_θ = N^L_{θ_L} ∘ ⋯ ∘ N^1_{θ_1} is (G, R_in, R_out)-equivariant. Note that, as previously discussed, a representation defines a Fourier space, meaning that it determines the space over which a layer of an EQNN can act, or alternatively the information in the states that can be accessed. As such, one can use intermediate representations to change how the model accesses and processes information, which can fundamentally determine the success of the learning model.
The most general way of fully specifying a representation is via the multiplicities of its irreps. Thus, the irrep multiplicities m_λ^l of the intermediate representations can be understood as hyperparameters of the EQNN, similar to the number of channels in a conventional CNN. While in general there are no strict rules on what representations to use, here we discuss strategies for choosing intermediate representations that are physically meaningful and ease the calculation of equivariant layers.
First, one should choose representations that are natural on quantum systems. For example, the unitary group U(2) admits a natural representation on n qubits, where H = (C^2)^⊗n, as R(U) = U^⊗n, and the cyclic group Z_n admits the representation R(g_t) ⊗_{j=1}^n |ψ_j⟩ = ⊗_{j=1}^n |ψ_{j+t mod n}⟩, i.e., cyclically shifting the qubits. Second, the following proposition asserts that equivalent intermediate representations yield the same model expressibility [31, 117], and hence it suffices to consider inequivalent ones when designing EQNNs. We defer the proof to Appendix C.
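As an illustration, the qubit-shifting representation of Z_n can be sketched in a few lines of numpy (the function name is our choice); it acts on a 2^n-dimensional state vector by permuting tensor axes:

```python
import numpy as np

def cyclic_shift(state, t, n):
    """R(g_t)|psi_1 ... psi_n> = |psi_{1+t} ... psi_{n+t}> (indices mod n):
    cyclically shift the qubits of a 2**n-dimensional state vector."""
    psi = state.reshape((2,) * n)
    axes = (np.arange(n) + t) % n      # output qubit j reads input qubit j+t
    return np.transpose(psi, axes).reshape(-1)

# |100> (basis index 4) shifted by t = 1 becomes |001> (basis index 1).
e = np.zeros(8); e[4] = 1.0
print(int(np.argmax(cyclic_shift(e, 1, 3))))  # 1
```

Shifting by t = n returns the original state, as expected for a representation of Z_n.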

Proposition 2 (Insensitivity to equivalent representations). Consider an EQNN as defined in Definition 8.
Then changing an intermediate representation, R l , to another representation equivalent to it, V R l V † , where V is a unitary, does not change the expressibility of the EQNN.
Finally, we note that the case of finite groups and regular representations (i.e., when the intermediate representations are chosen to be R_reg : G → C[G], corresponding to the group action on its own group algebra) has been studied in the classical literature under the name of homogeneous ENNs [14]. In this case, any equivariant map is a group convolution [11], which can be realized as a unitary operator embedding the classical convolution kernel via the quantum algorithms in [118]. Combining this with quantum algorithms for polynomial transformations of quantum states [119, 120] allows one to quantize classical homogeneous ENNs. In other words, classical homogeneous ENNs can be implemented on a quantum computer as a special case of EQNNs.

D. Field-guide to equivariant layers
As previously discussed, intermediate representations can be considered as hyperparameters of the EQNN. In what follows we define and characterize different types of equivariant layers arising from different intermediate representations.

Standard, embedding and pooling
We start by presenting a definition that categorizes equivariant layers based on the sizes of input and output representations.
Definition 9 (Standard, embedding and pooling layers). A (G, R_{l−1}, R_l)-equivariant layer N_{θ_l}^l : B_{l−1} → B_l is called a pooling layer if dim(B_l) < dim(B_{l−1}), an embedding layer if dim(B_l) > dim(B_{l−1}), and a standard layer if dim(B_l) = dim(B_{l−1}).

Definition 9 does not require the layer to be a quantum channel and is thus applicable beyond the context of quantum-to-quantum layers, e.g., in quantum algorithms with classical post-processing, as we discuss in Appendix A. For the special case of EQNNs mapping from a Hilbert space of n qubits to a Hilbert space of m qubits, we say that N_{θ_l}^l is a pooling layer if m < n, an embedding layer if m > n, and a standard layer if m = n.
Equivariant quantum circuits have been proposed and used in previous works [45-48, 52], mostly in the context of graph problems. However, we note that these fall into the category of standard layers, and our framework provides more flexibility, as the operations need not be unitary. An idea of pooling layers was proposed in [23], although the pooling layer used there was not equivariant to the symmetry of the considered classification task (see Appendix A). We will provide examples of equivariant pooling layers in later sections. While embedding layers have been used to map classical data to quantum data, they are usually not equivariant [38, 112] (with a few notable recent exceptions [45, 121]), meaning that all the symmetry properties of the classical data are lost during the encoding into quantum states. In addition, to our knowledge, embedding layers mapping quantum data to quantum data have not been formalized in QML prior to this work. Intuitively, embedding layers equivariantly embed the quantum data into a larger Hilbert space, allowing access to higher-dimensional irreps and the implementation of non-linearities (discussed below). A prototypical general EQNN architecture using these equivariant layers, inspired by the classical literature [3, 14], is illustrated in Fig. 3.

Projection and lifting
Another common technique in the classical geometric deep learning literature is to relax the symmetry constraints in the later layers of the ENN, which typically correspond to larger-scale features. This is achieved by projection layers (also called reduction layers in some works [14]), which go from a representation R_in to a representation R_out with ker(R_in) < ker(R_out). Recall that the kernel of a representation R is defined as ker(R) := {g ∈ G | R(g) = 1}, so that R is faithful if and only if ker(R) = {e}. Similarly, one can also define lifting layers, where ker(R_in) > ker(R_out). These lifting layers are used as the first layer in many homogeneous ENN architectures [3, 8, 11, 12], but their usefulness is not known in general non-homogeneous ENNs [14]. Here we similarly define projection and lifting equivariant layers for EQNNs based on the kernels of the representations as follows.

[Figure caption: A standard layer maps data between spaces of the same dimension. An embedding (pooling) layer maps the data to a higher-dimensional (smaller-dimensional) space. In a lifting layer, ker(R_{l−1}) > ker(R_l), while in a projection layer ker(R_{l−1}) < ker(R_l).]

Definition 10 (Equivariant projection and lifting layers).
A (G, R l−1 , R l )-equivariant layer is defined as a projection layer if ker(R l−1 ) < ker(R l ) and a lifting layer if ker(R l−1 ) > ker(R l ).
Projection layers usually become necessary in pooling layers in the presence of outer symmetries which exchange subsystems (see Definition 5), such as Z_n, D_n, or S_n under qubit-permuting representations. In contrast to inner symmetries, which act locally or globally as general unitaries, such as U(2) with R(g) = g^⊗n, outer symmetries typically have no faithful representations when the number of qubits is reduced by a pooling layer. Hence, in this case it is convenient to use a non-faithful representation on the output Hilbert space of fewer qubits, i.e., a projection layer. We provide examples of projection layers in Section VI.
Lifting layers, instead, can potentially be beneficial when the symmetry of the problem is unsubstantial and does not greatly reduce the number of free parameters in the model, leading to overly expressive EQNNs with potential trainability issues. By lifting to a larger group, one can further reduce the expressibility and potentially improve trainability [31, 95]. However, the actual benefit of lifting layers is not known.
Lastly, we note another interpretation of lifting and projection layers. A non-faithful representation R of G with ker(R) = H can be thought of as a faithful representation of the quotient group G/H. Then, lifting layers map from a faithful representation of a quotient group to that of a larger quotient group, while projection layers have the opposite effect.

Non-linearities
Finally, it is a common practice in QML to assume repeated access to the dataset, which means that one can potentially access multiple copies of the input state ρ. A mapping of the form ρ → ρ^⊗k, which could be applied in the first or an intermediate layer of an EQNN, can thus serve as a non-linear equivariant embedding layer.

Definition 11 (Non-linear equivariant embedding layers). An order-k equivariant non-linearity in an EQNN is defined as the composition of an equivariant layer with the map ϕ_nonlinear : ρ → ρ^⊗k, which adds k − 1 copies of the input state.
From the Fourier-space perspective, this operation is analogous to the widely used irrep tensor-product non-linearity in classical ENNs. For instance, the Clebsch-Gordan decomposition, which computes the tensor product of SO(3) irreps, has been used in the classical literature to achieve universal non-linearity [5-7, 53]. In the quantum setting, on the other hand, the tensor product is performed naturally by composing systems, giving opportunities for equivariant data processing on high-dimensional irreps. Indeed, the first step in the quantum-enhanced experiment model [21] performs this non-linear equivariant layer. Doing so can drastically simplify non-linear learning tasks [21, 44] (see Appendix A).
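As a toy numerical check of this tensor-product non-linearity (the function name and sample unitary are our choices), the map ρ → ρ^⊗k is equivariant once the output representation is taken to be R_in(g)^⊗k, since (U ρ U†)^⊗k = U^⊗k ρ^⊗k (U^⊗k)†:

```python
import numpy as np

def nonlinear_embed(rho, k):
    """Order-k non-linearity phi_nonlinear: rho -> rho^{(x)k}."""
    out = rho
    for _ in range(k - 1):
        out = np.kron(out, rho)
    return out

# Equivariance check with the Hadamard gate as a sample unitary.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
rho = np.array([[0.7, 0.2], [0.2, 0.3]])
lhs = nonlinear_embed(H @ rho @ H.conj().T, 2)
rhs = np.kron(H, H) @ nonlinear_embed(rho, 2) @ np.kron(H, H).conj().T
print(np.allclose(lhs, rhs))  # True
```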

V. METHODS FOR CONSTRUCTING EQUIVARIANT LAYERS FOR EQNNS
In this section, we describe methods to construct and train the layers of EQNNs. Our first step will be to identify, given a group and its input and output representations, the space of equivariant maps. For this purpose we present three distinct approaches, based respectively on finding the nullspace of a system of matrix equations, on the twirling technique, and on the Choi operator. Once the space of equivariant maps is determined, we discuss how to parametrize and optimize over it. An overview of the results in this section can be found in Table II.

A. Simplifying the task of finding equivariant maps
As per Definition 6, a linear map ϕ is equivariant if it satisfies the superoperator equation ϕ(R_in(g) ρ R_in(g)†) = R_out(g) ϕ(ρ) R_out(g)† for all g ∈ G. The set of all such maps forms a vector space, and therefore to characterize them all it suffices to find a basis of this space. While naively it would seem that one needs to solve Eq. (18) for every g ∈ G, we will now see that it is usually enough to solve this equation only over a well-chosen subset of elements of the group (or of its Lie algebra, for Lie groups of symmetries).

Finite groups
We first consider the case when G is a finite group. Here, we recall the concept of a generating set. A subset S = {g_1, . . ., g_|S|} ⊂ G is a generating set of G if any element of the group can be written as a product of elements in the generating set. Denoting by ⟨S⟩ the closure of S, i.e., the set obtained by repeated composition of its elements, we say that S generates G if ⟨S⟩ = G. For example, the symmetric group S_n can be generated by the set of transpositions. It is a well-known fact in group theory that a finite group can be generated with a subset S of size at most log_2(|G|) [101]. Thus, even exponentially large groups can be handled efficiently through their generating sets. In particular, we can simplify the task of finding the equivariant maps via the following theorem.
Theorem 4 (Finite group equivariance). Given a finite group G with generating set S, a linear map ϕ is (G, R_in, R_out)-equivariant if and only if the equivariance condition of Eq. (18) holds for every element of S.
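To make the closure ⟨S⟩ concrete, here is a short Python sketch (the function name is ours) that computes ⟨S⟩ for permutation generators by repeated left-multiplication with the generators; the two adjacent transpositions indeed generate all 3! = 6 elements of S_3:

```python
def closure(S):
    """Compute <S> for a set S of permutations (tuples), by repeatedly
    left-multiplying the current set with the generators."""
    group = set(S) | {tuple(range(len(S[0])))}   # include the identity
    while True:
        new = {tuple(s[i] for i in g) for s in S for g in group} - group
        if not new:
            return group
        group |= new

# The adjacent transpositions (1 2) and (2 3) generate all of S_3.
S = [(1, 0, 2), (0, 2, 1)]
print(len(closure(S)))  # 6
```

The same routine confirms that the three adjacent transpositions of S_4 generate all 24 elements, in line with the transposition example above.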

Lie groups
While Theorem 4 is useful when the group G is finitely generated, many relevant groups, such as the Lie group U(d), are not. However, we can still consider generating sets, now at the Lie algebra level. Ref. [60] provides a method for imposing equivariance under Lie groups, where the equivariance constraint is imposed over a basis of the Lie algebra. Evidently, this becomes impractical in the case of large Lie groups, since the method scales linearly with their dimension. Instead, as we prove below, it suffices to impose the constraint only over a generating set. That is, we call s = {a_1, . . ., a_|s|} ⊂ g a generating set for g if its Lie closure ⟨s⟩_Lie, the span of repeated nested commutators of the elements of the set, is the whole Lie algebra. With these concepts at hand, we are ready to impose equivariance at the algebra level.

[Table II caption: Comparison of the methods to find equivariant maps. Nullspace uses a linear-algebraic approach to impose equivariance on the generating set of the group or its algebra. Twirling uses the twirl formula defined in Eq. (25) or Eq. (26). Choi operator block-parametrizes the Choi operator via an irrep decomposition. Time complexity denotes the computational complexity of the method. The time complexity of Gaussian elimination is O(d^3), where d is the size of the linear system. Assuming the generating set has size O(1), the linear system obtained in the nullspace method is of size 2^{2(m+n)}, where n and m are the numbers of qubits at the input and output of the map. For twirling and Choi operator, the time complexity is dominated by the Haar integral and the irrep decomposition, respectively. In the case of twirling, it can be computed analytically, approximately, or implemented in-circuit, depending on the problem at hand. Locality determines whether we can control the locality of the operations we need to implement. CPTP indicates whether the output channel is CPTP, and how hard it is to impose this condition on the output maps. In nullspace, it is trivial to impose TP on the solution, but imposing CP is more involved. Twirling guarantees CPTP as long as the channel we twirl is CPTP. In Choi operator, imposing CP is straightforward, but TP might be more involved due to the dimension mismatch introduced by the irrep decomposition. Kraus rank indicates whether we can control the Kraus rank of the channel. Notes denotes whether we can find a basis for the vector space of equivariant maps, or one map at a time. In the table, decomp. is short for decomposition, elim. for elimination, lin. for linear, and equiv. for equivariant.]
Theorem 5 (Lie group equivariance). Given a compact Lie group G, with Lie algebra g generated by s, and such that the exponential map is surjective, a linear map ϕ is (G, R_in, R_out)-equivariant if and only if ϕ([r_in(a), ρ]) = [r_out(a), ϕ(ρ)] for all a ∈ s, where r_in, r_out are the representations of g induced by R_in, R_out.
We note that the assumption of surjectivity of the exponential map can be relaxed with incorporation of additional constraints.This relaxation together with pertinent examples and proofs of the theorems are given in Appendix C.

B. Nullspace, twirling and Choi operator
With the previous in mind, in this section we present three different techniques that can be used to determine equivariant channels.

Nullspace method
In the nullspace method, we formulate the equivariance constraints as a linear system of matrix equations, one per element in the generating set, and then solve for their joint nullspace. This yields a basis for the vector space of equivariant linear maps (not necessarily quantum channels). For the rest of this section we assume we have a finite group and a set of generators at the group level; the case of Lie groups follows analogously by working at the level of the Lie algebra.
Our method generalizes those in [60, 122] and proceeds as follows. The first step is to represent the superoperators in Eq. (19) as matrices, sometimes referred to as transfer matrices. This can be achieved through the map ϕ → ϕ = Σ_{i,j} ϕ_{i,j} |P_i⟩⟩⟨⟨P_j|, where the P_j and P_i are Pauli operators on the input and output Hilbert spaces, respectively [123]. Here, ϕ is a dim(B_out) × dim(B_in) matrix.
The latter transforms Eq. (19) into a matrix equation of the form R_out(g) ϕ = ϕ R_in(g), where R_in(g) and R_out(g) here denote the transfer matrices of conjugation by the input and output representations (Eq. (21)). Next, we perform a vectorization [124], which maps a matrix into a column vector and, via the identity vec(AXB) = (B^T ⊗ A) vec(X), allows us to write Eq. (21) as M_g vec(ϕ) = 0, with M_g = 1 ⊗ R_out(g) − R_in(g)^T ⊗ 1. Here, vec(ϕ) is a dim(B_in) dim(B_out)-dimensional column vector. With the previous, we can obtain equivariant maps by computing the intersection of the nullspaces of the M_g, i.e., vec(ϕ) ∈ ∩_{g∈S} null(M_g) (Eq. (24)). In Fig. 4 we present an example of the nullspace method.
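As a minimal numerical sketch of these steps (illustrative only; we pick a Z_2 symmetry acting on one qubit by conjugation with Pauli X, which need not be the exact group of Fig. 4), the joint nullspace can be obtained from an SVD:

```python
import numpy as np

# Transfer matrix of rho -> X rho X in the Pauli basis {I, X, Y, Z}:
R_hat = np.diag([1.0, 1.0, -1.0, -1.0])   # same for input and output space

# Equivariance R_hat phi = phi R_hat, vectorized column-major via
# vec(A X) = (I kron A) vec(X) and vec(X B) = (B^T kron I) vec(X):
M = np.kron(np.eye(4), R_hat) - np.kron(R_hat.T, np.eye(4))

# Joint nullspace (here a single generator) via SVD:
_, s, Vh = np.linalg.svd(M)
basis = Vh[s < 1e-10]   # each row is vec(phi) of an equivariant linear map
print(len(basis))       # 8
```

The recovered 8-dimensional space contains, for instance, the transfer matrix diag(1, 0, −1, 0) of the channel (XρX + ZρZ)/2.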
Let us make here several important remarks about the nullspace method. First, it is clear that this technique can rapidly become computationally expensive. For example, finding equivariant channels mapping from n qubits to m qubits by solving for the nullspaces through Gaussian elimination [125] has a complexity of O(2^{6(m+n)}). Second, let us note that the solutions of Eq. (24) lead to a basis for all equivariant linear maps, and therefore additional steps are required to find the subset of physically realizable operations (see Sec. V C). For instance, we can obtain trace-preserving (TP) maps by noting that ϕ is TP if and only if ϕ contains the term (dim(H_in)/dim(H_out)) |1_{dim(H_out)}⟩⟩⟨⟨1_{dim(H_in)}| and no other terms mapping to |1_{dim(H_out)}⟩⟩.
In practice, we can significantly reduce the computational complexity of this method by restricting the set of Pauli operators we need to consider in the input and output spaces. This is particularly useful for inner symmetries, where the action of the group can be studied locally. For example, consider the following lemma.

Lemma 2 (Global equivariance via local equivariance). Let B_in(out) be composite input (output) spaces of the form B_in(out) = ⊗_j B_in(out)^j, and assume that the representations acting on these spaces take a tensor-product structure over the subsystems, R_in(out)(g) = ⊗_j R_in(out)^j(g). Then, for local equivariant channels ϕ_j mapping between each pair of input and output subsystems, the tensor product ⊗_j ϕ_j is (G, R_in, R_out)-equivariant.

[Fig. 4 caption: We demonstrate how to use the nullspace method to determine the space of 1-to-1-qubit (G, R_in, R_out)-equivariant quantum channels. a) The matrix representations of both the input and output adjoint representations of the symmetry group. b) A basis for the 8-dimensional solution space, as well as two possible equivariant channels: ϕ(ρ) = Tr[ρ] 1/2, obtained from the solution in red, and ϕ(ρ) = (XρX + ZρZ)/2, obtained by combining the two solutions in green.]
Thus, we can find equivariant maps locally and take tensor products of them to obtain a global equivariant layer. While such an approach can greatly reduce the computational cost (for instance, solving for 2-to-1-qubit maps only requires dealing with 64 × 64 matrices), this comes at the cost of expressibility, as the composition of local equivariant channels may have a restricted action when compared to a general global equivariant channel [87].
In the case of outer symmetries such as G = S_n with R_in(g) = R_out(g) = R_qub(g) (as defined in Table I), we can use a generating set S including only local transpositions (i.e., involving only two-body operators). Thus, if we want to obtain maps ϕ containing only one- and two-body terms, we only need to consider the sub-block of M_g corresponding to one- and two-body Pauli operators, reducing its size from exponential to only polynomial.

Twirling method
We now explain a second method for finding equivariant maps, based on twirling. This approach was first proposed in [45] to determine equivariant unitary channels. We here extend this framework to general non-unitary quantum channels with (possibly) different representations on the input and output spaces of ϕ.
Starting with a given channel ϕ : B_in → B_out, we define its twirl over a finite symmetry group G as T_G[ϕ](ρ) = (1/|G|) Σ_{g∈G} R_out(g)† ϕ(R_in(g) ρ R_in(g)†) R_out(g) (Eq. (25)). For the case of Lie groups, we replace the summation with an integral over the Haar measure dµ(g) (Eq. (26)). From the invariance of the Haar measure, it is clear that T_G[ϕ] is equivariant for all ϕ, and that T_G[ϕ] = ϕ for all equivariant ϕ. Combining these observations, one can see that T_G is the projection onto the space of equivariant maps. This realization allows us to write any channel ϕ as ϕ = T_G[ϕ] + ϕ_A, where ϕ_A is the "anti-symmetric" part of ϕ, i.e., the part satisfying T_G[ϕ_A] = 0. As such, any measure of the form ∥ϕ_A∥ quantifies how symmetric ϕ is. On the practical side, twirling is easy for small groups, as one can efficiently evaluate Eq. (25) (see Fig. 5 for an example). However, for large finite groups or for Lie groups, a direct computation of the twirl becomes cumbersome, requiring the use of more advanced techniques. In Appendix D we discuss different approaches to implement Eqs. (25) and (26). These range from analytical methods based on the Weingarten calculus [126, 127] (which requires knowledge of the commutant of the representations) to experimental schemes such as in-circuit twirling and approximate twirling approaches [128]. In particular, we present two approaches for in-circuit twirling, based either on the use of ancilla qubits or on classical randomness. Both of these are exemplified in Fig. 5.
For completeness, let us compare the twirling method to the nullspace approach. One of the main advantages of twirling is that, unlike in the nullspace method, we are guaranteed that the twirl of a CPTP channel is also CPTP. However, while the nullspace method allows us to find all equivariant maps, twirling produces one map at a time, meaning that finding a complete basis for the vector space of equivariant maps can be more intricate (although still possible, as we will show in Section VI). As such, if one only needs a single equivariant channel, twirling is strongly recommended.
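A minimal sketch of the twirl formula for the two-element group {1, X} with R_in = R_out given by conjugation (the amplitude-damping channel is our stand-in for a generic non-equivariant CPTP map):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# A generic (non-equivariant) CPTP channel: amplitude damping, gamma = 0.3.
gam = 0.3
K0 = np.array([[1, 0], [0, np.sqrt(1 - gam)]], dtype=complex)
K1 = np.array([[0, np.sqrt(gam)], [0, 0]], dtype=complex)
phi = lambda r: K0 @ r @ K0.conj().T + K1 @ r @ K1.conj().T

def twirl(channel):
    """T_G[phi](rho) = (1/|G|) sum_g R(g)† phi(R(g) rho R(g)†) R(g)."""
    return lambda r: 0.5 * sum(U.conj().T @ channel(U @ r @ U.conj().T) @ U
                               for U in (I2, X))

T_phi = twirl(phi)

# Equivariance check: T_phi(X rho X) == X T_phi(rho) X.
rho = np.array([[0.8, 0.1], [0.1, 0.2]], dtype=complex)
print(np.allclose(T_phi(X @ rho @ X), X @ T_phi(rho) @ X))  # True
```

Since twirling only averages over unitary conjugations of a CPTP map, the result is again CPTP, illustrating the guarantee mentioned above.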

Choi operator method
Here we present a third method for finding equivariant channels. We recall from Eq. (15) that the Choi operator of a channel ϕ is given by J_ϕ = Σ_{i,j} |i⟩⟨j| ⊗ ϕ(|i⟩⟨j|). Then, as indicated by Lemma 1, the channel is equivariant if J_ϕ ∈ comm(R_in* ⊗ R_out) (where * denotes complex conjugation and R* the so-called dual representation of R), or alternatively, if (R_in*(g) ⊗ R_out(g)) J_ϕ (R_in*(g) ⊗ R_out(g))† = J_ϕ for all g ∈ G (Eq. (28)). While we could vectorize Eq. (28) and obtain equivariant maps by solving for the nullspaces of the resulting matrices, this would not be significantly different from the nullspace method previously presented.
Instead, here we focus on a different technique, based on the fact that, since R_in* ⊗ R_out = R is a valid representation of G, it has an irrep decomposition R(g) ≅ ⊕_q R_q(g) ⊗ 1_{m_q} (see Theorem 3), and the Choi operator of any equivariant map takes the block-diagonal form J_ϕ ≅ ⊕_q 1_{d_q} ⊗ M_q, with M_q ∈ C^{m_q × m_q} (Eq. (29)). Equation (29) allows us to build equivariant maps by controlling precisely how the associated Choi operator acts on each irrep component of the quantum states. In Fig. 6 we exemplify this method. Just like the nullspace method, this approach produces general equivariant linear maps, and hence additional constraints need to be imposed to find the subset of physical channels. For instance, we can impose TP by requiring that Tr_in[J_ϕ] = 1_{dim(H_out)}, where Tr_in indicates the partial trace over H_in. Moreover, we know that ϕ will be completely positive (CP) if and only if J_ϕ ⩾ 0. The last condition implies that J_ϕ can be further expressed as [82] J_ϕ ≅ ⊕_q 1_{d_q} ⊗ w_q† w_q (Eq. (30)), where w_q ∈ C^{m_q × m_q}. Moreover, the TP condition leads to Σ_q Tr_in[1_{d_q} ⊗ w_q† w_q] = 1_{dim(H_out)}. Thus, given the irrep decomposition of R, one can construct a basis of CP maps in the block-diagonal form of Eq. (30) and impose the trace-preserving condition afterwards (note, however, that taking the partial trace is now more involved due to the subspace mismatch introduced by the isomorphism in Eq. (29)). Finally, we remark that one could even go a step further and consider conditions for ϕ to be an extremal equivariant CPTP channel (see footnote 2). Conditions for extremality, however, are much more involved [82].
[Footnote 2: The set of equivariant CPTP channels is convex; channels on its boundary are said to be extremal, and their convex hull fully characterizes the set. However, the set of extremal channels may not be finite.]

Let us here remark that the main limitation of the Choi operator approach is that identifying the isomorphism in the irrep decomposition of R_in* ⊗ R_out can be challenging in general. For common compact Lie groups, these decompositions are conveniently implemented in a variety of software packages [129, 130]. That being said, this method is best suited for local channels, since the size of the Choi operator scales as dim(H_out) dim(H_in). Thus, if dim(H_out) dim(H_in) is not prohibitively large, one can solve for the change of basis of the isotypic decomposition and identify maps with specific irrep actions.
Lastly, implementing the nullspace method requires representing channels as matrices (this could be done for twirling as well). To check that these maps are actually channels, i.e., CPTP, we may want to convert from this matrix representation to the Choi operator, for which the CPTP conditions are readily verified. We discuss how to perform this conversion in Appendix E 2.
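The CP and TP checks on the Choi operator can be sketched numerically as follows (amplitude damping is our stand-in channel; note that which tensor factor is traced out for the TP condition depends on the Choi convention in use — here we trace the output factor of J = Σ_{ij} |i⟩⟨j| ⊗ ϕ(|i⟩⟨j|)):

```python
import numpy as np

# Amplitude-damping channel and its Choi operator.
gam = 0.3
K0 = np.array([[1, 0], [0, np.sqrt(1 - gam)]])
K1 = np.array([[0, np.sqrt(gam)], [0, 0]])
phi = lambda r: K0 @ r @ K0.T + K1 @ r @ K1.T

d_in = d_out = 2
J = np.zeros((d_in * d_out, d_in * d_out))
for i in range(d_in):
    for j in range(d_in):
        E = np.zeros((d_in, d_in)); E[i, j] = 1.0
        J += np.kron(E, phi(E))

# CP: J is positive semidefinite.
print(np.linalg.eigvalsh(J).min() >= -1e-12)                      # True
# TP: tracing out the output factor returns the identity on H_in.
tr_out = J.reshape(d_in, d_out, d_in, d_out).trace(axis1=1, axis2=3)
print(np.allclose(tr_out, np.eye(d_in)))                          # True
```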

C. Parametrizing the layers of an EQNN
In GQML we are not only interested in finding equivariant maps; we also want to parametrize and optimize over them. In this section we show how one can parametrize the layers of an EQNN. For simplicity, we first consider the case of unitary channels, and then study the case of general maps. An overview of the methods proposed in this section can be found in Fig. 7.

Parametrizing equivariant unitaries
Let us here consider the case of a unitary EQNN layer with the same input and output representations, that is, N_{θ_l}^l(ρ) = U_l(θ_l) ρ U_l(θ_l)† with R_in = R_out = R. While this case has been considered in [44, 45, 96], we review it here for completeness.
The simplest way to parametrize a unitary is by expressing it as the exponential of some Hermitian operator, usually known as a generator, i.e., U_l(θ_l) = e^{−iθ_l H_l}, where θ_l is a trainable parameter. Evidently, we can obtain (G, R)-equivariant unitaries by taking equivariant generators, i.e., H_l ∈ comm(R). Note that we can find equivariant generators via the nullspace or twirling approaches previously detailed; while these methods were presented for superoperators, they can be straightforwardly adapted to the case of operators. Alternatively, one could use the Choi operator approach and require the solution to be rank-1 (recall that the rank of the Choi operator is the Kraus rank of the associated channel, with unitaries being Kraus-rank-1 channels).

[Fig. 7 caption: Leveraging a classical optimizer, we find updates for the parameters in the EQNN. In the case of channels, it might be necessary to project the updated map to the feasible CPTP region. Note that we can use classical compilers to transform a linear combination of channels into a sequence of gates that we can implement on a quantum device (Appendix E). The procedure is repeated until convergence is achieved.]

Parametrizing equivariant channels
Here we describe how to parametrize and optimize over equivariant channels. We assume that a basis of equivariant maps (or at least a subset of such a basis) has been found via the nullspace or Choi operator method. As mentioned before, while it is relatively easy to find equivariant maps, these need not be physical channels, as they may fail to be TP, CP, or both. However, one can still parametrize a set of not-necessarily-CPTP equivariant maps and optimize over them by appropriately constraining the parameters such that the final map is CPTP.
For instance, when using the Choi operator method, we know that the CP condition is satisfied if J_ϕ ⩾ 0. Hence, one could start with some basis of trace-preserving and trace-annihilating equivariant maps {J_{ϕ_j}}, linearly combine them as J(x) = Σ_j x_j J_{ϕ_j}, and optimize the set of real parameters x = {x_j} under the constraint that the eigenvalues of J(x) are non-negative. The latter yields a region of feasible equivariant quantum channels (see Section VI for an example). Note that during the optimization of x, the update rule might take us outside of the feasible region, in which case one needs to project back onto the feasible space. We further discuss how such a projection can be performed in Appendix E. Finally, we note that while it might not be directly obvious how to implement the ensuing channel, one can always transform it into an implementable sequence of gates, acting on a potentially larger space, via compilation techniques [131-134]. Here we also remark that in many cases, particularly when the maps act on large-dimensional spaces, finding the eigenvalues of J(x) may be quite difficult. For these scenarios, one can simply optimize over a subset of equivariant channels (i.e., maps that are already CPTP), which can be found via twirling. Here we are guaranteed that any convex combination of equivariant channels will lie in the feasible region, since CPTP channels form a convex set [82].
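When an update leaves the feasible region, one ingredient of the projection step can be sketched as the Euclidean projection of the Choi operator onto the PSD cone (a sketch only; the full projection must also restore the TP and equivariance constraints, cf. Appendix E):

```python
import numpy as np

def project_psd(J):
    """Frobenius-norm projection of a Hermitian matrix onto the PSD cone:
    diagonalize and clip negative eigenvalues to zero."""
    w, V = np.linalg.eigh(J)
    return (V * np.clip(w, 0.0, None)) @ V.conj().T

# An indefinite "Choi operator" gets mapped to its closest PSD matrix.
Jp = project_psd(np.diag([1.0, -0.5]))
print(np.linalg.eigvalsh(Jp).min() >= 0)  # True
```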
An alternative approach to constructing equivariant channels is via the Stinespring dilation picture [115]. In this case we use the fact that any channel can be written as a unitary operation on a larger space, i.e., ϕ(ρ) = Tr_E[U (ρ ⊗ |e⟩⟨e|) U†], where |e⟩ ∈ H_E is a fixed reference state on an environment Hilbert space H_E, and where Tr_E denotes the trace over the environment. Here we can use any of the tools previously discussed to find and parametrize U. This approach has the advantage that, by fixing the dimension of the environment, we can look for channels of small Kraus rank, which are easier to implement in practice.
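A small numerical sketch of the Stinespring picture (the dilation unitary below is one possible choice realizing amplitude damping with a single environment qubit, used purely for illustration):

```python
import numpy as np

gam = 0.3
c, s = np.sqrt(1 - gam), np.sqrt(gam)
# System-environment unitary (basis |s, e>) whose Kraus operators
# <e|U|., 0> realize amplitude damping; this dilation is not unique.
U = np.array([[1,  0, 0, 0],
              [0,  c, s, 0],
              [0, -s, c, 0],
              [0,  0, 0, 1]])

def phi(rho):
    """phi(rho) = Tr_E[ U (rho (x) |0><0|_E) U† ]."""
    env0 = np.array([[1.0, 0.0], [0.0, 0.0]])
    out = U @ np.kron(rho, env0) @ U.conj().T
    return out.reshape(2, 2, 2, 2).trace(axis1=1, axis2=3)  # trace env

# Cross-check against the Kraus form of amplitude damping.
rho = np.array([[0.25, 0.1], [0.1, 0.75]])
K0 = np.array([[1, 0], [0, c]]); K1 = np.array([[0, s], [0, 0]])
print(np.allclose(phi(rho), K0 @ rho @ K0.T + K1 @ rho @ K1.T))  # True
```

Fixing the environment to a single qubit caps the Kraus rank at two, illustrating the trade-off mentioned above.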

VI. APPLICATIONS
In this section we exemplify the applicability of the methods presented in the previous sections to design EQNNs.

[Fig. 8 caption: In the SU(2)-equivariant QCNN, we alternate between 2-to-2 channels acting on neighbouring qubits and 2-to-1 equivariant pooling channels which reduce the feature-space dimension.]

A. SU(2)-equivariant QCNN
As a first practical application of the present framework, we propose to generalize standard QCNNs [23] to group-equivariant QCNNs. We start by recalling that QCNNs have been successfully implemented for error correction, quantum phase detection [23, 135], image recognition [136], and entanglement detection [24]. QCNNs exhibit several key features that make them a promising architecture for the near term, such as having a shallow depth and not exhibiting barren plateaus [137]. Despite these advantages, standard QCNNs need not respect the symmetries of a given task. In what follows, we show how one can design equivariant layers for QCNNs, thus promoting them to group-equivariant QCNNs.
We consider problems where the symmetry group is SU(2). This symmetry appears in tasks where the data arise from certain spin-chain models [138-140] and in tasks related to entanglement measures [42, 141, 142]. Moreover, we consider the case where the data correspond to n-qubit quantum states, and where the input representation of G = SU(2) is R_in(g) = R_tens(g) = g^⊗n. For ease of implementation on quantum hardware, we restrict ourselves to channels with locality constraints (see Lemma 2). That is, as illustrated in Fig. 8, we want to build an equivariant QCNN composed of alternating layers of 2-to-2 standard equivariant channels acting on neighbouring qubits and 2-to-1 equivariant pooling channels (for completeness, the 1-to-2-qubit equivariant embedding maps are presented in Appendix F). Of course, this choice of architecture trades expressibility for locality, as there may be more general equivariant channels on n qubits. However, the success of models with locality constraints [23, 137] suggests this may be an interesting regime regardless.

2-to-2 layers via Choi operator
Let us commence by studying 2-to-2-qubit maps via the Choi operator method. Since the input and output representations are R(g) = g^⊗2, the Choi operator must commute with the representation (g*)^⊗2 ⊗ g^⊗2 (see Eq. (28)). As SU(2)* shares the same irrep structure as SU(2), we find that the Choi operator of any completely positive SU(2)-equivariant map takes the form J_ϕ ≅ A 1_5 ⊕ (1_3 ⊗ B) ⊕ C, where A is a non-negative scalar and B and C are complex positive semidefinite matrices of dimensions 3 and 2, respectively. Thus, the space of such CP equivariant maps is 1² + 3² + 2² = 14 dimensional.
In the special case of 2-to-2 equivariant unitary layers, where N_{θ_l}^l : (C²)^⊗2 → (C²)^⊗2 and N_{θ_l}^l(ρ) = U_l(θ_l) ρ U_l(θ_l)†, we know that if U_l(θ_l) = e^{−iθ_l H_l}, it suffices to use equivariant generators, i.e., H_l such that [H_l, g^⊗2] = 0 for all g ∈ SU(2). Here we can use the Schur-Weyl duality [44, 143], which states that the only such equivariant operators are spanned by 1 and SWAP, corresponding to the two elements of the qubit-permutation representation of S_2. Without loss of generality, we can choose H_l = SWAP, so that U_l(θ_l) = e^{−iθ_l SWAP}. Following Lemma 2, we know that if we compose these two-qubit equivariant unitaries as in Fig. 8, the result is an n-qubit equivariant unitary.

[Fig. 9 caption: Using the nullspace method we can find a basis for all 2-to-1 (SU(2), g^⊗2, g)-equivariant pooling maps. These can then be linearly combined to form a general parametrized equivariant map as in Eq. (35), and we find in Eq. (37) the region in parameter space leading to CPTP channels. Here we depict said region as the volume of the hyperboloid (red) below the plane (green).]
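The Schur-Weyl statement can be checked numerically in a few lines (we sample g from U(2) via a random Hermitian generator; commutation with SWAP in fact holds for any unitary g):

```python
import numpy as np

SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)

def U_layer(theta):
    """exp(-i theta SWAP) = cos(theta) I - i sin(theta) SWAP, as SWAP^2 = I."""
    return np.cos(theta) * np.eye(4) - 1j * np.sin(theta) * SWAP

# A random unitary g from a random Hermitian generator.
rng = np.random.default_rng(1)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
H = (A + A.conj().T) / 2
w, V = np.linalg.eigh(H)
g = V @ np.diag(np.exp(-1j * w)) @ V.conj().T

G2 = np.kron(g, g)
U = U_layer(0.7)
print(np.allclose(U @ G2, G2 @ U))  # True
```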
We now determine the 2-to-1 pooling layers. Following Theorem 5, it suffices to impose equivariance with respect to the su(2) generators {X, Y, Z}; thus, one needs to simultaneously solve for the joint nullspace of the corresponding matrices. Solving, we find five superoperators that form a basis for 2-to-1-qubit (SU(2), g^⊗2, g)-equivariant maps. Notably, in addition to expected SU(2)-equivariant maps such as the trace, the partial traces, and the SWAP-measurement, we identify a potentially interesting new equivariant map ϕ_5(ρ), which we dub the cross-product map (we further study its properties in Appendix F). Note that ϕ_1, ϕ_3, and ϕ_4 are trace-preserving, while ϕ_5 is trace-annihilating. One can also verify that ϕ_2 may non-trivially alter the trace; as the only map that may do so, we drop it from our basis for being non-physical.

To continue and find the set of equivariant quantum channels, we first modify our basis. In the Pauli basis we have ϕ_1 ↔ 2 |1⟩⟩⟨⟨1 ⊗ 1|, and it is easy to see that both ϕ_3 and ϕ_4 also contain this term. Thus, we can remove it from them, leaving trace-annihilating versions of the partial traces, which we denote ϕ'_3 and ϕ'_4. Any TP map must then take the form of Eq. (35): a combination of ϕ_1 with real coefficients x, y, z multiplying ϕ'_3, ϕ'_4, and ϕ_5. It remains to find the region of coefficients for which this map is also CP, which can be done via the Choi operators of these maps. That is, we would like to find {x, y, z ∈ R : J_{ϕ(x,y,z)} ⩾ 0}. Note that the coefficients here must be real numbers for the Choi operator of the sum to be positive (as the Choi operators in the sum are linearly independent). Requiring the eigenvalues of this linear combination to be non-negative yields the feasible region of Eq. (37), which includes the constraint y + z ⩽ 1. This region is illustrated in Fig. 9. Here we note that, as the set of equivariant channels is convex, this feasible parameter region is a convex subset of R³. A crucial aspect to note is that when training the SU(2)-equivariant QCNN, one can directly train over the coefficients x, y, and z of each pooling channel ϕ(x, y, z) of the form in Eq. (35). When training an equivariant QCNN, e.g.
using gradient descent, we obtain parameter updates (x^(t+1), y^(t+1), z^(t+1)) ← (x^(t), y^(t), z^(t)) − αD_t(x^(t), y^(t), z^(t)). To ensure that the operations remain physical, one would continually solve the projection at each iteration. This can be cast as the convex optimization problem min_{x,y,z} ∥(x^(t+1), y^(t+1), z^(t+1)) − (x, y, z)∥², subject to Eq. (37),   (38) over a convex domain (see Appendix E).
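As an illustration of the projection step, the following hedged sketch projects an updated parameter vector onto a single face of the feasible region. We only use the stated constraint y + z ⩽ 1; the full region of Eq. (37) involves further eigenvalue conditions not reproduced here, for which a general-purpose convex solver would be used instead:

```python
import numpy as np

def project_halfspace(p, a, b):
    """Euclidean projection of p onto {v : a·v ≤ b} (one face of Eq. (37))."""
    a = np.asarray(a, float)
    excess = np.dot(a, p) - b
    if excess <= 0:
        return np.asarray(p, float)   # already feasible
    return p - excess * a / np.dot(a, a)

# Hypothetical gradient-descent update that left the feasible region:
p_new = np.array([0.5, 0.8, 0.9])     # y + z = 1.7 > 1
p_proj = project_halfspace(p_new, [0.0, 1.0, 1.0], 1.0)
print(p_proj)  # [0.5, 0.45, 0.55]: y + z now equals 1
```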

B. Various examples and physical considerations
In this section we present additional applications of the methods detailed in Sec. V. In particular, these are now applied to discrete groups and are motivated by practical problems.

Z2 × Z2-equivariant layers
Here we consider problems with Z_2 × Z_2 symmetry, which is common in spin chain models such as the S = 1 Haldane chain [23,138,144], or in classical data on the two-dimensional plane [45].
We start with a task on n qubits, where the representation of Z_2 × Z_2 is given by the following. Here, the sets O and E contain the odd and even qubit labels, respectively. As before, we are interested in finding all equivariant standard unitary channels where R_in = R_out. Since the group is small, we can readily employ the twirling method to determine the set of all equivariant generators. The set of all such generators forms a subalgebra of the unitary algebra u(2^n), which we denote u_{Z_2×Z_2}(2^n) and which is given by the following. Next, let us find pooling channels that are equivariant with respect to Z_2 × Z_2. Similarly to the SU(2) case considered previously, the representations of the group act locally, meaning that we can again consider local 2-to-1 equivariant pooling maps and later combine them into a global equivariant channel (see Lemma 2). On any pair of neighbouring qubits, the input representations are {𝟙 ⊗ 𝟙, 𝟙 ⊗ X, X ⊗ 𝟙, X ⊗ X} and we set the output representation acting on a single qubit to be {𝟙, 𝟙, X, X}. Note that the output representation is not faithful. As |Z_2 × Z_2| = 4, we can again readily apply the twirling procedure. To do so, we begin with a basis for trace-preserving maps, that is, matrices in the Pauli-string basis of the form 2|𝟙⟩⟩⟨⟨𝟙 ⊗ 𝟙| + |P_out⟩⟩⟨⟨P_in| with P_out ≠ 𝟙. A simple counting argument reveals that there are 48 such matrices. By twirling all 48 maps, and extracting a linearly independent set, we find the following 13-element basis for 2-to-1 equivariant projective poolings:
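Since the displayed representation is not reproduced above, the following sketch assumes the natural two-qubit representation {𝟙⊗𝟙, X⊗𝟙, 𝟙⊗X, X⊗X} of Z_2 × Z_2 on a neighbouring pair and implements the twirling formula as a plain average over the four group elements:

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1, -1]).astype(complex)

# Assumed two-qubit representation of Z2 × Z2 (the displayed equation is not
# reproduced in the text): identity, X on the odd site, X on the even site, both.
group = [np.kron(I2, I2), np.kron(X, I2), np.kron(I2, X), np.kron(X, X)]

def twirl(H):
    """T[H] = (1/|G|) Σ_g R(g)† H R(g): projection onto equivariant operators."""
    return sum(R.conj().T @ H @ R for R in group) / len(group)

print(np.linalg.norm(twirl(np.kron(Z, I2))))               # ~0: Z1 is projected away
print(np.allclose(twirl(np.kron(X, I2)), np.kron(X, I2)))  # True: X1 is equivariant
```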

Zn-equivariant layers
We proceed now to analyze a problem with Z_n symmetry, whose representation cyclically shifts qubits as R(g_t) ⊗_{j=1}^n |ψ_j⟩ = ⊗_{j=1}^n |ψ_{j+t mod n}⟩. Such symmetry arises naturally in condensed matter problems with periodic boundary conditions [25,145,146].
Let us start by finding all equivariant standard unitary maps where R_in = R_out. Since the group is small, we opt for the twirling approach. For the sake of implementability, we will seek channels composed of one- and two-qubit gates. We can readily see that the twirl of a single-qubit generator such as X_1 leads to a sum of single-qubit operators, T[X_1] = (1/n) Σ_{j=1}^n X_j. Similarly, twirling a two-body generator leads to a sum of two-body generators. Notably, these generators lead to equivariant unitaries of the form U_l = e^{−iθ_l (1/n) Σ_{j=1}^n X_j} = Π_{j=1}^n e^{−iθ_l X_j/n}. The previous shows a crucial implication of outer symmetries (such as Z_n): in many cases, equivariance in unitary layers can be achieved by correlating the parameters of local gates within a layer [96,147].
Note that a necessary condition for the previous to hold is that all the terms in the twirled operator must be mutually commuting. One can see, however, that the twirl of Z_1 Y_2 leads to Σ_j Z_j Y_{j+1}, which is a global operator whose terms are non-commuting and which can be challenging to implement on near-term devices. An alternative here is to employ a randomized method. First, we construct the unitaries U_{l,O} = Π_{j∈O} e^{−iθ_l Z_j Y_{j+1}} and U_{l,E} = Π_{j∈E} e^{−iθ_l Z_j Y_{j+1}} (where we recall that O and E respectively contain the odd and even qubit labels), which are Z_{n/2}-equivariant. Then, to achieve Z_n-equivariance, we can apply either U_{l,O} or U_{l,E} at random and with equal probability, effectively performing the quantum channel ρ → (U_{l,O} ρ U_{l,O}† + U_{l,E} ρ U_{l,E}†)/2. This channel can be readily shown to be Z_n-equivariant.
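The claimed Z_n-equivariance of this randomized channel can be checked directly. The following minimal sketch (our own, for n = 4 with 0-indexed qubits) builds U_{l,O} and U_{l,E}, the mixed channel, and the cyclic-shift operator, and verifies that the channel commutes with the shift:

```python
import numpy as np

n = 4
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1, -1]).astype(complex)
I2 = np.eye(2, dtype=complex)

def op_on(P, Q, j):
    """P on qubit j, Q on qubit j+1 (mod n), identity elsewhere."""
    mats = [I2] * n
    mats[j % n], mats[(j + 1) % n] = P, Q
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

def expi(H, theta):
    """exp(-iθH) for H with H² = 1 (true for Pauli strings)."""
    return np.cos(theta) * np.eye(len(H)) - 1j * np.sin(theta) * H

theta = 0.37
U_O = expi(op_on(Z, Y, 0), theta) @ expi(op_on(Z, Y, 2), theta)  # "odd" sites 1,3
U_E = expi(op_on(Z, Y, 1), theta) @ expi(op_on(Z, Y, 3), theta)  # "even" sites 2,4

def channel(rho):
    return (U_O @ rho @ U_O.conj().T + U_E @ rho @ U_E.conj().T) / 2

# Cyclic shift R: qubit j -> j+1; conjugation by R swaps the roles of U_O and U_E
R = np.zeros((2**n, 2**n))
for b in range(2**n):
    bits = [(b >> (n - 1 - j)) & 1 for j in range(n)]
    shifted = bits[-1:] + bits[:-1]
    R[int("".join(map(str, shifted)), 2), b] = 1.0

rng = np.random.default_rng(1)
A = rng.normal(size=(2**n, 2**n)) + 1j * rng.normal(size=(2**n, 2**n))
rho = A @ A.conj().T
rho /= np.trace(rho)
# Z_n-equivariance: channel(R ρ R†) = R channel(ρ) R†
print(np.linalg.norm(channel(R @ rho @ R.T) - R @ channel(rho) @ R.T))  # ~0
```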
This trick of randomly applying local channels can also be applied to equivariant layers with different numbers of input and output qubits. For example, a Z_n-equivariant projection layer that reduces the number of qubits from n to n/2 is Φ_{Z_n} : ρ → (Tr_odd ρ + Tr_even ρ)/2. Observe that the output representation has a non-trivial kernel isomorphic to Z_2, as the number of qubits is halved. One can readily extend these ideas to projective pooling layers over other outer symmetry groups G, such as S_n-equivariance: Φ_{S_n} : ρ → (Σ_S Tr_S ρ)/(n choose n/2), where S runs over subsets of n/2 qubits and can be sampled uniformly at random. Hoeffding's bound implies that only O(log |G|) samples are needed for this method to converge to within a specified error bound. In a sense, this is similar to the dropout regularization technique in neural networks [148].

VII. NUMERICAL EXPERIMENTS
In this section, we numerically compare the performance of the SU(2)-equivariant QCNN constructed in Sec. VI A against a problem-agnostic QCNN on a quantum phase classification task. Similar to the classical problem of assigning the correct labels to images, the task of classifying quantum phases of matter can be carried out in a supervised setting and provides a natural playground to study the efficiency of equivariant quantum learning models.

A. Bond-Alternating Heisenberg Model
The 1-D XXX Heisenberg model describes the behavior of a one-dimensional chain of spin-1/2 particles coupled through the standard nearest-neighbor Heisenberg interaction Hamiltonian, where S_i = (S^x, S^y, S^z) = ½(X, Y, Z) is the vector of spin operators for the i-th spin and J_x = J_y = J_z = J. The bond-alternating XXX Heisenberg model is a generalization of the regular XXX Heisenberg model, in which the exchange coupling constant alternates between two different values J_1 and J_2. We consider the model described by Eq. (47) with open boundary conditions and with both couplings in the ferromagnetic regime, i.e., J_{1,2} > 0. In this case, a trivial phase and a topologically protected phase are defined by the expectation value of the partial-reflection many-body topological invariant [139,140,149].

FIG. 10. 1-D bond-alternating XXX Heisenberg model (Eq. (47)) and its phase diagram in terms of the exchange coupling constants J_{1,2} > 0.

The quantum phase transition between these two phases occurs at a critical value of the bond alternation parameter α = J_2/J_1. When α < 1, the system is in the trivial phase; otherwise, the system is in the topologically protected phase.
The Hamiltonian can be readily seen to possess an SU(2) symmetry. The symmetry extends to the whole system through the tensor-product representation R_in(g) = R_tens(g) = g^⊗n. This is also a symmetry of the phase labels, as quantum phases are global properties of the ground states of the model, and symmetries of the Hamiltonian are also symmetries of the ground space. Indeed, if [H, g^⊗n] = 0 and |ψ⟩ is a ground state, then g^⊗n |ψ⟩ is also a ground state. Thus, a quantum phase classifier can be endowed with this inductive bias using the SU(2)-equivariant quantum maps from Sec. VI A. Another inductive bias that we can utilize is translation symmetry: shifting the qubits by two sites leaves the model unchanged. One way to exploit this is parameter sharing within each layer of the SU(2)-equivariant QCNN, as discussed in Sec. VI B.
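A minimal numerical check of this symmetry (our own sketch for a small chain; the display of Eq. (47) is reconstructed from the surrounding definitions) builds the bond-alternating Heisenberg Hamiltonian and verifies that it commutes with the total-spin generators:

```python
import numpy as np

n = 4
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1, -1]).astype(complex)
paulis = (X, Y, Z)

def single(P, j):
    """Pauli P acting on qubit j of the n-qubit chain."""
    mats = [np.eye(2, dtype=complex)] * n
    mats[j] = P
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

# Bond-alternating Heisenberg chain with open boundaries:
# H = Σ_i J_i S_i · S_{i+1}, with J alternating between J1 and J2, S = σ/2
J1, J2 = 1.0, 0.5
H = np.zeros((2**n, 2**n), dtype=complex)
for i in range(n - 1):
    J = J1 if i % 2 == 0 else J2
    for P in paulis:
        H += J * 0.25 * single(P, i) @ single(P, i + 1)

# SU(2) symmetry: H commutes with the total-spin generators Σ_j P_j
for P in paulis:
    G = sum(single(P, j) for j in range(n))
    print(np.linalg.norm(H @ G - G @ H))  # ~0 for each of X, Y, Z
```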

B. Defining the learning models
We now describe the learning model in more detail. We use the SU(2)-EQCNN architecture in Fig. 8, where each standard (convolution) layer consists of two brickwork (sub)layers of two-qubit SU(2)-equivariant gates of the form U(θ) = e^{−iθ SWAP}. Due to translation symmetry, we further enable parameter sharing of the two-qubit gates within each such sublayer. This leads to two parameters per standard convolution layer. If needed, we can repeat the standard layers more than once before applying the pooling layer.
Finally, we choose an equivariant observable Ô to measure at the end of the EQCNN. As discussed in the previous sections, the final equivariant measurements belong to the commutant of the output representation R_out of the last layer of the EQCNN. Now, since we need two outputs to label the two phases y_trivial = 1 and y_topological = 0, it is tempting to end the EQCNN with a binary measurement on m = 1 qubit. However, from the discussion in Sec. VI A we know that the commutant of the defining representation of SU(2) over a single qubit is the trivial set comm(R_natural) = {𝟙}. Thus we choose an EQCNN that ends with m = 2 qubits, such that the commutant of g^⊗2 contains the nontrivial element SWAP. Conveniently, SWAP has the two eigenvalues ±1, so that we can bind, say, the +1 outcome to y_trivial and the −1 outcome to y_topological. We adopt this strategy, with a small modification so that the output of the EQCNN, which we denote f_θ(ρ), takes values in [0, 1]. Namely, we define f_θ as follows, where ϕ_θ denotes the EQCNN that outputs two qubits in the last layer. Then, we assign the predicted phase label to any input state ρ by comparing f_θ(ρ) to some trainable threshold value τ, initialized to τ = 0.5.
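The displayed definitions of f_θ(ρ) and the label assignment are not reproduced above; a natural reconstruction consistent with the surrounding discussion (hypothetical, not necessarily the exact formula used) maps ⟨SWAP⟩ ∈ [−1, 1] affinely onto [0, 1] and thresholds it at τ:

```python
import numpy as np

SWAP = np.eye(4)[[0, 2, 1, 3]]

def f_theta(rho_out):
    """Map the SWAP expectation on the two output qubits into [0, 1]
    (assumed affine rescaling, (⟨SWAP⟩ + 1)/2)."""
    return (np.trace(SWAP @ rho_out).real + 1) / 2

def predict(rho_out, tau=0.5):
    """1 = trivial (+1 SWAP outcome), 0 = topological (−1 outcome)."""
    return 1 if f_theta(rho_out) >= tau else 0

# The symmetric state |00⟩⟨00| has ⟨SWAP⟩ = +1, hence the trivial label
rho = np.zeros((4, 4))
rho[0, 0] = 1.0
print(f_theta(rho), predict(rho))  # 1.0 1
```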
We test this EQCNN architecture against a QCNN with no inductive biases. In particular, we use a QCNN whose standard layers are parametrized circuits inspired by the standard hardware-efficient ansatz (HEA) [150], and whose pooling layers consist of simple alternating partial traces, i.e., at each pooling operation we discard half of the qubits. The classification then proceeds as for the SU(2)-EQCNN, with a SWAP measurement and the phase assignment described in Eq. (49). We dub this non-equivariant QCNN the HEA-QCNN.

C. Training procedure
We use the standard ML pipeline of supervised learning.
1. We collect a training dataset D_train^{N_T}, where N_T is the size of the dataset, by choosing representative values of the parameter J_2 while always keeping J_1 = 1, and then analytically computing the ground states |ψ_g⟩ of the Hamiltonian in Eq. (47). Knowing the phase diagram of the alternating model shown in Fig. 10, and in particular that the transition occurs at the critical value α = J_2/J_1 = 1, we can then associate these states with their true labels y ∈ {0, 1}. In particular, we try training datasets made of N_T = (2, 4, 6, 8, 10, 12) ground states, always distributed homogeneously over the considered range of α.
2. We initialize the learning model at hand, equivariant or not, with random parameters θ.
3. We select an optimizer for the learning model. In our case we always use ADAM [151], the gold standard of gradient-based optimization in ML.
4. For a number of epochs E, we divide the training dataset D_train^{N_T} into batches of size n_batch = 2. For each batch, the training states |ψ_i⟩ are processed by the model to output the predicted labels y_θ(|ψ_i⟩), and the mean-squared-error loss function L_θ is computed by comparing the predictions to the true labels y_i. We then compute the gradient of L_θ and use the optimizer to update the model's parameters. The goal is to minimize L_θ.
5. The QCNN outputs for the training states are used to update the threshold τ. In particular, only the two training points that are closest to the critical value α = 1 are considered, and the threshold value is set to the average of the corresponding outputs.
6. As an additional figure of merit for the training, we keep track of the prediction accuracy of the model.

7. At the end of the last epoch, we let the model predict the labels of the whole test dataset and compute its final accuracy as a measure of the goodness of the training. We also plot the predicted phase diagram to obtain visual evidence of the model's performance.
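The loss and threshold-update steps of the training loop above can be sketched as follows (a toy illustration with made-up model outputs, not actual training data):

```python
import numpy as np

def mse_loss(predictions, labels):
    """Mean squared error between model outputs f_θ(ρ) and labels y ∈ {0, 1}."""
    predictions, labels = np.asarray(predictions), np.asarray(labels)
    return np.mean((predictions - labels) ** 2)

def update_threshold(alphas, outputs, alpha_c=1.0):
    """Step 5: set τ to the mean output of the two training points
    closest to the critical value α = 1."""
    idx = np.argsort(np.abs(np.asarray(alphas) - alpha_c))[:2]
    return np.mean(np.asarray(outputs)[idx])

# Hypothetical model outputs on six training states spanning both phases
alphas = [0.2, 0.6, 0.9, 1.1, 1.4, 1.8]
outputs = [0.95, 0.9, 0.7, 0.3, 0.1, 0.05]
print(mse_loss(outputs, [1, 1, 1, 0, 0, 0]))
print(update_threshold(alphas, outputs))  # mean of 0.7 and 0.3 → 0.5
```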

D. Training results
We are now ready to illustrate the results of our numerics. First of all, we must state that we have not been able to train the EQCNN when using the general 2-to-1 pooling layers described in Sec. VI A, as the projection step (Eq. (38)) onto the feasible CPTP region seems to cause instability in the optimization procedure. We leave a full numerical analysis of equivariant quantum learning models to future work, and here focus on the simpler tracing pooling operations. That is, the EQCNN architecture is still the one depicted in Fig. 8, but the pooling operations are just 2-to-1 partial-trace channels, corresponding to the solution ϕ_3 in Eq. (34). In other words, the SU(2)-EQCNN and the HEA-QCNN use the same pooling layers (but still different convolution layers: equivariant versus HEA). The training results are illustrated in Figs. 11 and 12.
We considered system sizes ranging from N = 6 to N = 13, and trained both the EQCNN and the HEA-QCNN according to the training loop described in Sec. VII C for a fixed number of training epochs E = 750. Since the two architectures are very different, and the EQCNN, as opposed to the HEA-QCNN, uses parameter sharing, we decided, in order to have a fair comparison, to stack multiple standard layers before each pooling layer in the EQCNN, so as to have a similar number of training parameters for both learning models. In Fig. 11 we show the predicted phase diagrams for N = 12 and N = 13. Panel (d) of that figure immediately stands out. While the other three plots showcase similar behavior, with the QCNN at hand able to efficiently separate the two phases of the alternating model, panel (d) shows a cloudy behavior of the HEA-QCNN predictions, which assigns different phases even to states with similar parameters α. This is in sharp contrast with the trained EQCNN (panel (c)), which successfully learned to classify the two phases with excellent accuracy, demonstrating the advantage of equivariant models.
Interestingly, for the N = 12 qubit case (panels (a) and (b) in Fig. 11), the EQCNN does not significantly outperform the HEA-QCNN. This is due to the fact that there is actually no need for equivariance in that case! Indeed, equivariance is meant to enhance the performance of learning models that deal with labels invariant under some symmetry group, but this invariance should not come from the invariance of the input states themselves. Think yet again of the classical problem of classifying images of cats and dogs: the labels, i.e., the semantic meaning of the images, are invariant if we translate the images, but the images themselves are not translation-invariant. On the other hand, if we translate images that are entirely black or entirely white, instead of showing cats and dogs, the labels (the colors) of the images are translation-invariant simply because the images do not change. This is what happens in our case. We have already discussed that if a Hamiltonian H is symmetric under a group G, any unitary representation U_G of the group leaves the ground space unchanged. For a non-degenerate ground space, this means the unique ground state is invariant under the group action, in analogy with the black/white image example above, and thus equivariant learning models are not needed. For degenerate ground states, i.e., when the Hamiltonian symmetry is broken, the symmetry group action does change the ground states non-trivially and rotates them within the ground space. Degenerate ground states are akin to images of cats and dogs, and as such equivariance can finally shine. Indeed, the alternating model is degenerate for odd system sizes, while for even system sizes the ground state is unique. This explains the different behaviors shown in Fig. 11 between N = 12 and N = 13.
The previous discussion also motivates the analysis shown in Fig. 12, where we show a statistical study of the performance of the EQCNN and the HEA-QCNN on even and odd system sizes. As is evident from the left panel, enforcing equivariance when it is not needed can be more detrimental than beneficial. Indeed, the reduced expressibility of the learning model is not compensated by any benefit, and training instabilities emerge, as evidenced by the large error bars in the left panel. However, when equivariance has a reason to be used, as for the odd-size states studied in the right panel, the advantage of using the EQCNN over an uninformed one is clear. Already with only two training points the equivariant QCNN performs remarkably well, while the HEA-QCNN needs more training data to generalize. Interestingly, even with the minimum number of trainable parameters, which for the system sizes studied ranges from 4 to 6, the EQCNN seems to perform better than the non-equivariant one.
As stated at the beginning, this is only a preliminary analysis of a simple learning task, and as such we postpone any general conclusion until further studies of the performance of equivariant quantum learning models on more complex systems, against different non-equivariant architectures, and for different symmetry groups. Nonetheless, we think that the preliminary numerical results shown in this section suggest that injecting inductive biases into quantum neural networks boosts their performance, paving the way to the design of new, more efficiently implementable and trainable variational quantum machine learning models.

VIII. DISCUSSIONS AND OUTLOOK
Geometric quantum machine learning is a new and exciting field which seeks to produce helpful inductive biases for quantum machine learning models based on the symmetries of the problem at hand. While there already exist several proposals in the GQML literature [44-48,52], these mainly deal with unitary models that maintain the same group representation throughout the computation. In this work, we generalize previous results and present a theoretical framework to understand, design, and optimize over general equivariant channels, which we refer to as EQNNs. While presented in the setting of supervised learning, our work is readily applicable to other contexts such as unsupervised learning [104,105], generative modeling [106-109], or reinforcement learning [110,111].
Our first main contribution is a characterization of the action of equivariant layers as generalized Fourier actions. We argue that the isotypic decomposition of the symmetry's representation determines a generalized Fourier space over which the EQNNs act. This realization not only allows us to characterize the number of free parameters in an EQNN, but also unravels the crucial importance of the choice of representation. That is, different representations have different block-diagonal structures, and hence can act on different generalized Fourier spaces and see different parts of the information encoded in the quantum states. Then, we provide a general classification of EQNN layers, introducing the so-called standard, embedding, pooling, projection, and lifting layers, and we note that non-linearities can be introduced via multiple copies of the data. As a by-product, we highlight the exciting possibility of accessing higher-dimensional irreps of the symmetry group via these non-linearities, which can be an avenue to access information that would otherwise be unavailable.
FIG. 12. Statistical study of the EQCNN and HEA-QCNN performance for system sizes (6, 8, 10, 12) in the even case and (7, 9, 11, 13) in the odd one, based on 10 different, randomly initialized training runs for each problem size. The results are plotted against the number of training datapoints N_T. The blue circles refer to the EQCNN with the same number of parameters as the HEA-QCNN (orange stars). The green circles describe the EQCNN with the minimum number of parameters possible, i.e., the architecture of Fig. 8 with only two standard layers before each pooling layer. These plots demonstrate that equivariance provides significant improvements when, and only when, there is degeneracy in the ground space.

Our next main contribution is the description of three methods to construct EQNN layers. In the first, which we call the nullspace method, we map the equivariance constraints to a linear system of matrix equations and then solve for their joint nullspace. The second method leverages the technique of twirling over a group, whereby a channel is projected onto the space of equivariant maps. In our third method, we use the Choi operator of the map to create equivariant layers with specific irrep actions. Our methods can find unitary or non-unitary equivariant layers efficiently even when the symmetry group is exponentially large, enabling applications for groups that are inaccessible using existing methods in the prior literature. We then compare the strengths and shortcomings of each method, presenting scenarios where one should be favored over the other. Our final key contribution is showing how to parametrize and optimize EQNNs. In particular, since our work seeks to find equivariant channels, we discuss how one can guarantee that the ensuing maps are physical and potentially easy to implement. To finish, we exemplify our methods by generalizing standard QCNNs to group-equivariant QCNNs, and show how to create, from the ground up, an SU(2)-equivariant QCNN. Finally, we apply this model to a quantum phase classification task on the 1D bond-alternating Heisenberg model and numerically demonstrate its superior performance over a symmetry-agnostic QCNN.

A. Equivariance versus barren plateaus, local minima and data requirements
Here we argue why EQNNs can alleviate some of the crucial issues in QML, such as barren plateaus, excessive local minima, and large data requirements.
First, we recall that the barren plateau phenomenon refers to the exponential concentration of gradients exhibited by certain variational quantum models, which results in an exponential flattening of the training landscape and, concomitantly, in an exponential demand of measurement shots to accurately resolve a parameter update [27-41]. The presence or absence of barren plateaus has been directly linked to the expressibility of the model [31,35,36,39], such that highly expressible architectures exhibit smaller gradients. In our context, imposing symmetry constraints on the quantum neural network is expected to shrink its expressibility in a problem-oriented way, alleviating such gradient-vanishing issues.
Another challenge in training QML models is spurious local minima in the loss landscape. It is known that agnostic models exhibit landscapes that are plagued by local minima [152-155]. However, it is also known that there exists a critical number of trainable parameters above which the model can become overparametrized, meaning that all spurious local traps disappear [156]. While reaching the overparametrization regime requires exponentially deep circuits for agnostic ansatzes, it has been proven that certain architectures (with reduced expressibilities) can be efficiently overparametrized with polynomial depth. Thus, the hope is that by restricting the expressibility of the model via geometric priors, one can lower the overparametrization threshold to a realistically reachable level, thus getting rid of spurious local minima.
Finally, we discuss sample complexity. The ultimate goal of supervised machine learning is to make predictions on unseen data. This is often characterized by generalization bounds, which measure the difference between the performance of the learned model on training and testing data. Recent work has studied the training sample complexity needed for QML models to generalize [25,157-159]. In particular, Ref. [25] showed that the training sample complexity typically scales polynomially with the number of trainable parameters. Given a trainable QNN, we have seen that imposing equivariance can drastically reduce the number of free parameters, implying that incorporating equivariance allows for stronger (more optimistic) bounds on generalization performance. Further, the bounds of Ref. [25] are statistical and worst-case (over all possible learning tasks), meaning that EQNNs, when applied to the corresponding symmetric learning tasks, could potentially achieve better generalization than indicated by these bounds.
While the previous arguments merely indicate why equivariance can improve the performance of a model (in terms of trainability and generalization), they do not constitute a proof that equivariance can indeed fulfill these promises. However, we refer the reader to the recent work of Ref. [95], which studies S_n-equivariant models and rigorously proves that the equivariance constraints lead to architectures that avoid barren plateaus, can be efficiently overparametrized, and generalize well with only polynomially many training points. Thus, the results in Ref. [95] showcase the extreme power of EQNNs and GQML.

B. Implications of our work and future directions
Many concepts and results in our work can be thought of as quantum analogues of existing classical techniques that have enjoyed tremendous success [14]. We envision that GQML will soon be a thriving field, as it provides blueprints to create arbitrary architectures and inductive biases suitable for a given problem. As such, the first direct application of our work is building appropriate schemes to embed classical data into quantum states. Currently, most proposals dealing with classical data use problem-agnostic embedding architectures which completely obviate and destroy the symmetries in the input data [38,112,160]. As such, it is crucial to create embedding schemes that preserve said symmetries and promote them from the classical to the quantum realm.
As near-term quantum hardware's main challenge is noise, a most important future research direction is to study the interaction of noise and equivariance. Here, there are two possible paths. On one side, one can accept that noise will break equivariance and study the effects of such approximate equivariance. Interestingly, it has been observed in the machine learning literature that mildly breaking equivariance can improve the performance over strictly equivariant models in certain tasks [161]. On the other side, one can attempt to equivariantize the noise. Being framed in the general superoperator formalism, the present work contains all the necessary tools to study and develop symmetrization strategies to project noise into the symmetric subspace of a given group representation. Finally, near-term computation will also be limited in resources (e.g., circuit depth, hardware connectivity, etc.), which could prohibit enforcing exact equivariance. Can we derive "cheaper" EQNNs at the cost of only approximately enforcing equivariance? How well do EQNNs perform as a function of the symmetry breaking?
Note added. After the publication of our work as an arXiv preprint, there have been a number of follow-up works exploring the methods and open questions discussed here to develop EQNNs and GQML applications in various contexts. For example, Refs. [162-165] use the twirling method to construct QNNs equivariant to finite groups, with applications ranging from calculating molecular force fields to quantum phase detection and image classification. These works all provide numerical results demonstrating improved performance using EQNNs. Ref. [166] proposes a method based on spin networks that is shown to be equivalent to the Choi method and applies it to several lattice Hamiltonian models. Ref. [167] studies the role of the choice of representation in EQNN performance. Ref. [168] studies the behaviour of EQNNs in the presence of noise and derives strategies to protect equivariance.
FIG. 13. Reproduction of the "exact" QCNN architecture used in [23] for the classification of the ground states of the Haldane model. C stands for convolution, P for pooling, and FC for fully connected. The blue three-qubit gates are Toffoli gates with control qubits taken in the X basis. The orange two-qubit gates apply a Pauli Z on the target qubit when an X measurement of the control qubit results in a −1 outcome, and leave the target qubit unchanged otherwise. The black two-qubit gates are controlled-Z gates. The green interleaved lines correspond to a SWAP of two qubits. The C-P structure is repeated L times until the system is left with only three qubits.
For this ground-state classification problem, two QCNNs were studied in Ref. [23]: one trainable (see Supplementary Figure 2 in Ref. [23]), and one "exact" (Figure 2b of Ref. [23], reproduced in Fig. 13) that is obtained from the MERA representation of the ground states of H_hal. Remarkably, close inspection of the exact QCNN reveals that it is composed of equivariant layers and an equivariant measurement. That is, the exact model for this task follows the framework of EQNNs laid down in this work. However, one can see that the trainable model adopted in Ref. [23] does not comply with the equivariance requirements, so we expect that it could be further improved by imposing equivariance. In the following, we briefly discuss how one can identify the equivariance of the different layers of the exact QCNN.
First, we consider the action of a convolution layer (denoted C in Fig. 13 and excluding the SWAP operations) on R(r). Using the following identity, and noticing that, due to the connectivity of the controlled-Z gates, the resulting Z unitaries acting on any of the qubits can only be created in pairs (yielding an identity), one can verify that R(r) commutes with the action of all the controlled-Z gates. Additionally, given that both the controls and targets of the Toffoli gates act in the eigenbasis of the X operators, one can see that R(r) commutes with the action of the whole convolution layer. Similar reasoning shows that these commutativity properties also hold for R(s), and thus for R(rs). Overall, we find the convolution layers to be (G, R, R)-equivariant. Second, we consider the action of the pooling layer (denoted P in Fig. 13 and including the SWAP operations). One can verify that ϕ_pool ∘ Ad_{R(g)} = Ad_{R′(g)} ∘ ϕ_pool, where we have denoted by ϕ_pool the map realized by the pooling from n to n/3 qubits, and by R′(g) the representation of g ∈ G on this reduced space. In particular, R′ is defined analogously to R but over the reduced number n/3 of qubits. Overall, we find the pooling layers to be (G, R, R′)-equivariant.
Finally, note that the measurement realized by the fully connected layer (FC in Fig. 13) corresponds to a measurement of the Pauli observable O = ZXZ on the 3 remaining qubits. Notably, O ∈ comm(R_out), where R_out is the representation of G on the remaining qubits and is defined as R_out(r) = XIX and R_out(s) = IXI. That is, we find the measurement to be equivariant as well.

Notice that the sum over E is equal to the Haar integral over U(2^n), as the Clifford group forms a 3-design [169], i.e., O = ∫_{U∈U(2^n)} dμ(U) U^{†⊗2} O U^{⊗2}, where O := Σ_z |zz⟩⟨zz|. Thus, O commutes with the representation R_2(U) := U^⊗2 of U(2^n). We can therefore interpret Eq. (A4) as a composition of three equivariant layers in the sense of Definition 9, with the corresponding representations transforming accordingly. Next, we consider E to be the Pauli ensemble. The expected classical shadow is given by the following, where in the last equality we used the fact that the 1-qubit Clifford group (which implements random Pauli-basis measurements) forms a 3-design of U(2). By explicitly evaluating the integral, we find that M_j = SWAP, and thus M = SWAP^⊗n, which commutes with the representation R_2. A similar composition of equivariant layers to Eq. (A5) thus follows.
where we have used the fact that P_j commutes with all R(g). Then, taking the trace on both sides leads to Eq. (D3). Repeating Eq. (D3) for all P_j's in basis(comm(R)) leads to dim(comm(R)) equations. Thus, one can find the vector of unknown coefficients c(X) = (c_1(X), . . . , c_{dim(comm(R))}(X)) by solving A · c(X) = b(X), where b(X) = (Tr[XP_1], . . . , Tr[XP_{dim(comm(R))}]). Here, A is the so-called Gram matrix, a dim(comm(R)) × dim(comm(R)) symmetric matrix with entries [A]_{ij} = Tr[P_i P_j]. One can then solve the linear system by inverting the Gram matrix, c(X) = A^{−1} · b(X). The matrix A^{−1} is known as the Weingarten matrix.
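This Gram-matrix recipe is easy to implement. The following sketch (our own) uses the two-element basis {𝟙, SWAP} of comm(g⊗g) for SU(2) on two qubits and recovers the twirl coefficients c(X) = A^{−1} · b(X):

```python
import numpy as np

SWAP = np.eye(4)[[0, 2, 1, 3]].astype(complex)
I4 = np.eye(4, dtype=complex)

# comm(g⊗g) for SU(2) on two qubits is spanned by {1, SWAP} (Schur-Weyl)
basis = [I4, SWAP]

def twirl_coeffs(Xop):
    """Solve A·c = b with Gram matrix A_ij = Tr[P_i P_j] and b_j = Tr[X P_j];
    A⁻¹ is the Weingarten matrix. Traces of products of Hermitian matrices
    are real, so we can work with real linear algebra."""
    A = np.array([[np.trace(P @ Q) for Q in basis] for P in basis]).real
    b = np.array([np.trace(Xop @ P) for P in basis]).real
    return np.linalg.solve(A, b)

print(twirl_coeffs(SWAP))  # [0, 1]: SWAP is already equivariant
print(twirl_coeffs(np.kron(np.diag([1, -1]), np.eye(2))))  # [0, 0]: Z1 twirls to 0
```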

In-circuit twirling with ancillas or classical randomness
While we usually find the set of equivariant channels analytically, in the case of small finite groups we note that one could perform the twirling directly on the quantum circuit, using group-controlled unitaries along with a = log_2 |G| ancilla qubits initialized in the uniform superposition state H^{⊗a}|0⟩. That is, the in-circuit twirling of an n-to-m-qubit channel ϕ can be realized via such a circuit, and it can be readily verified that this circuit performs the twirling formula in Eq. (25). With this construction, ϕ can be any parametrized channel native to the circuit platform. Alternatively, the ancilla qubits can be replaced by classical randomness. That is, we classically sample a group element g and then apply R_in†(g) and R_out(g) accordingly (cf. Fig. 5).
The latter method can be favorable on near-term devices. Furthermore, Hoeffding's inequality implies that only O(log |G|) classical samples are needed to achieve a good approximation of the twirled channel. One disadvantage of in-circuit twirling, however, is that, albeit ensuring equivariance, we lose the parameter-count reduction obtained when equivariant channels are first computed analytically and then parametrized.
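The classical-randomness scheme is easy to prototype numerically. The sketch below, under the assumptions of Fig. 5 (G = Z2, R_in = {1, X}, R_out = {1, Z}) and with an arbitrary unitary channel standing in for ϕ, verifies that the group-averaged channel is (G, R_in, R_out)-equivariant:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])
rng = np.random.default_rng(0)

# A generic single-qubit unitary channel phi (a stand-in for any
# parametrized channel native to the hardware).
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
U, _ = np.linalg.qr(A)
phi = lambda rho: U @ rho @ U.conj().T

# Twirl via classical randomness, Eq. (25): sample g uniformly, apply
# R_in(g) before and R_out(g)† after phi, then average.  Here G = Z2,
# R_in = {1, X}, R_out = {1, Z}, as in Fig. 5.
R_in, R_out = [np.eye(2), X], [np.eye(2), Z]
def phi_T(rho):
    return sum(Ro.conj().T @ phi(Ri @ rho @ Ri.conj().T) @ Ro
               for Ri, Ro in zip(R_in, R_out)) / len(R_in)

# The twirled channel is (G, R_in, R_out)-equivariant:
rho = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = rho @ rho.conj().T; rho /= np.trace(rho)
assert np.allclose(phi_T(X @ rho @ X), Z @ phi_T(rho) @ Z)
```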

Recursive approximate twirling
For Lie groups with more intricate representation theory, computing the twirling formula can quickly become complex and difficult. Instead, Ref. [128] provides an algorithm for approximating twirling operators that converges exponentially fast in the number of Haar-random samples of group elements. While the authors did not mention this, their proofs rest on no assumptions beyond the representations being unitary representations of compact Lie groups and the input of the twirling formula being self-adjoint. Their algorithm can be applied to our setting whenever one can efficiently sample from the Haar measure, and is summarized in Algorithm 1. This approximate twirling algorithm can also be implemented in-circuit using classical randomness, similarly to what we saw earlier in the case of finite groups.
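To illustrate the convergence of sample-based twirling, the sketch below Monte Carlo averages over Haar-random elements of the full unitary group U(d), where the exact twirl Tr[M]/d · 1 is known in closed form. This is a simplified stand-in for the recursive algorithm of [128], not that algorithm itself:

```python
import numpy as np

rng = np.random.default_rng(1)

def haar_unitary(d, rng):
    """Draw a Haar-random unitary via QR of a complex Ginibre matrix."""
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    ph = np.diag(r) / np.abs(np.diag(r))
    return q * ph          # fix column phases so the distribution is Haar

# Twirl a self-adjoint M by averaging U† M U over Haar samples.  Over
# the full unitary group the exact twirl is Tr[M]/d · 1, which lets us
# monitor the convergence of the sample average.
d, n_samples = 3, 4000
M = rng.normal(size=(d, d)); M = M + M.T
estimate = np.zeros((d, d), dtype=complex)
for _ in range(n_samples):
    U = haar_unitary(d, rng)
    estimate += U.conj().T @ M @ U
estimate /= n_samples

exact = np.trace(M) / d * np.eye(d)
assert np.linalg.norm(estimate - exact) < 0.5  # error shrinks as ~1/sqrt(n_samples)
```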
Appendix E: Implementing and optimizing equivariant channels

Channel compiling
In the process of creating equivariant QNNs we consider not just equivariant unitaries, but also more general quantum channels. Via the Stinespring dilation theorem, any channel ϕ : B(H_A) → B(H_B) can be represented as a unitary operation on a larger space, ϕ(ρ) = Tr_E[U(ρ ⊗ |e⟩⟨e|)U†], where |e⟩ is a fixed reference state on the environment E. The size of this environment system is directly related to the Kraus rank of ϕ. Recall that a quantum channel can be written as ϕ(ρ) = Σ_i K_i ρ K_i†, where we say {K_i} are the channel's Kraus operators. Note that the spectral decomposition of the Choi operator yields one possible set of Kraus operators. At most dim(H_A) dim(H_B) Kraus operators are necessary to represent any channel [115]. One can then define a unitary whose action on states of the form |ψ⟩ ⊗ |e⟩ is U(|ψ⟩ ⊗ |e⟩) = Σ_i (K_i|ψ⟩) ⊗ |i⟩, completed by some arbitrary operator U′ so that U is unitary on the rest of the space. Thus, to represent a channel with k Kraus operators, an environment of dimension k suffices. Working with qubit systems, if the input space is n qubits and the output m, then the maximum Kraus rank is 2^{m+n}, which may require an environment of up to m + n qubits. We will not go into great detail on how to perform this circuit compilation, but rather refer the interested reader to the considerable body of work on compilation. For example, a software package for this decomposition with nearly optimal CNOT count can be found in [172], with corresponding theory in [173, 174]. For more general works on circuit compiling we direct the reader to [25, 97, 132-134, 175, 176].
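A minimal construction of the dilation from a given set of Kraus operators, using the single-qubit amplitude-damping channel as an illustrative choice, might look as follows:

```python
import numpy as np

# Build the Stinespring isometry V = Σ_i K_i ⊗ |i⟩_E from Kraus
# operators, here for the single-qubit amplitude-damping channel.
g = 0.3
K0 = np.array([[1, 0], [0, np.sqrt(1 - g)]])
K1 = np.array([[0, np.sqrt(g)], [0, 0]])
kraus = [K0, K1]

dE = len(kraus)                        # environment dimension = Kraus rank
V = sum(np.kron(K, np.eye(dE)[:, [i]]) for i, K in enumerate(kraus))
assert np.allclose(V.conj().T @ V, np.eye(2))    # V is an isometry

# phi(rho) = Tr_E[ V rho V† ]: apply the dilation, then trace out E.
rng = np.random.default_rng(0)
rho = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = rho @ rho.conj().T; rho /= np.trace(rho)

big = (V @ rho @ V.conj().T).reshape(2, dE, 2, dE)
out = np.einsum('iaja->ij', big)                 # partial trace over E
assert np.allclose(out, sum(K @ rho @ K.conj().T for K in kraus))
```

Extending the isometry V to a full unitary U (the operator U′ in the text) is a standard column-completion step that compilers handle automatically.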

Converting vectorized channels to Choi operators
In solving the nullspace problem for equivariant maps, we work with vectorized channels. That is, ϕ → ϕ̂ = Σ_{i,j} ϕ_{i,j} |P_i⟩⟩⟨⟨P_j|, where P_j and P_i are Pauli strings. In some references this is referred to as a transfer matrix. The canonical construction of transfer matrices is as follows. Consider the Kraus-operator form of a channel, ϕ(ρ) = Σ_i K_i ρ K_i†. The vectorization map vec(X) takes a matrix X to a vector through the mapping vec(|i⟩⟨j|) = |i⟩ ⊗ |j⟩, (E4) with linear extension. Then, using the identity vec(AXB) = (A ⊗ B^T) vec(X), one can write a matrix representation of ϕ as ϕ̂ = Σ_i K_i ⊗ K̄_i. This is the transfer matrix. Further, the transfer matrix can be directly mapped to the Choi operator via J_ϕ = ϕ̂^Γ, where Γ is an involution map such that ⟨i, j| ϕ̂ |k, l⟩ = ⟨l, j| ϕ̂^Γ |k, i⟩. (E7) For us to apply this identity, we need to convert from the transfer matrix in terms of Pauli strings to that in the computational basis. As any |i⟩⟨j| can be written as a sum over Pauli strings, there is some change of basis U from Pauli strings to the computational operator basis. Explicitly, the columns of U will be vectors corresponding to the expansion of Pauli strings in the computational basis. If we obtain a matrix X in the Pauli basis, we can then write it in the computational basis via U X V^{−1}, where U is the change of basis on the output space and V is that on the input space. Further, if we take Pauli strings to be normalized such that Tr[P_i P_j] = δ_{i,j}, then U and V can be taken to be unitary matrices. In this case, J_ϕ = (U ϕ̂ V†)^Γ. (E8)
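The reshuffling between transfer matrix and Choi operator is compact in code. The sketch below uses numpy's row-major vectorization (vec(|i⟩⟨j|) = |i⟩ ⊗ |j⟩, matching Eq. (E4); column-major references permute the tensor factors) and a dephasing channel as an illustrative example; the reshuffled Choi operator is cross-checked against the direct definition J = Σ_{ij} |i⟩⟨j| ⊗ ϕ(|i⟩⟨j|):

```python
import numpy as np

# With row-major vectorization, vec(K ρ K†) = (K ⊗ K̄) vec(ρ), so the
# transfer matrix of a channel with Kraus operators {K_i} is
# Phi = Σ_i K_i ⊗ conj(K_i).
p = 0.25
Z = np.array([[1, 0], [0, -1]])
kraus = [np.sqrt(1 - p) * np.eye(2), np.sqrt(p) * Z]   # dephasing channel
Phi = sum(np.kron(K, K.conj()) for K in kraus)

# Reshuffle Phi[(k,l),(i,j)] -> J[(i,k),(j,l)] to obtain the Choi
# operator J = Σ_ij |i><j| ⊗ phi(|i><j|).
d = 2
J = Phi.reshape(d, d, d, d).transpose(2, 0, 3, 1).reshape(d * d, d * d)

# Cross-check against the direct construction, and verify positivity.
J_direct = np.zeros((d * d, d * d), dtype=complex)
for i in range(d):
    for j in range(d):
        E = np.zeros((d, d)); E[i, j] = 1
        J_direct += np.kron(E, sum(K @ E @ K.conj().T for K in kraus))
assert np.allclose(J, J_direct)
assert np.linalg.eigvalsh(J).min() > -1e-12   # CP <=> Choi operator is PSD
```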

Optimizing equivariant channels
We now provide a strategy for optimizing n-to-m-qubit equivariant channels. Assume we have found a basis {ϕ_i} for such channels via the methods outlined in this work; equivariant channels in the span of this basis can then be written as in Eq. (35). The coefficients must be constrained so that ϕ_x is CPTP. For convenience we have fixed ϕ(ρ) = Tr[ρ] 1/2^m as one of the basis elements. Without loss of generality we can then take all other ϕ_i to be trace annihilating, i.e., Tr[ϕ_i(ρ)] = 0. Note that, depending on the methods used to find these maps, we can bake in CP or TP, or even both. Given a valid set of parameters for this pooling layer, we can classically solve the circuit-compiling problem described before, in which U(θ) is a general (2n + m)-qubit unitary and the top 2n qubits are discarded. Note that m + n is an upper bound on the number of ancilla qubits needed; depending on the ranks of the basis channels we could potentially need fewer qubits. After solving this classical circuit-compilation problem, one can then implement the result on a real quantum circuit. Now that we have a way to implement and parametrize equivariant channels, we can train them.

For the 2-to-1 maps, the basis channels are rescaled so that each maps 1 ⊗ 1 to 1; for example, the Tr_A channel becomes Tr_A/2. With these modified channels, the set of equivariant channels can be characterized by coefficients (x, y, z) ∈ R^3. Requiring the eigenvalues of this linear combination to be non-negative yields the feasible region {x, y, z : y + z ⩽ 1 and y + z ⩾ 3x² + 4(y² − yz + z²) − 1}. (F4)
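The CPTP-region computation can be mimicked numerically. The sketch below scans a one-parameter family ϕ_t = ϕ_fix + t·ϕ_1 (the single-qubit depolarizing line, chosen for illustration because its feasible interval −1/3 ⩽ t ⩽ 1 is known) and recovers the feasible region from positivity of the Choi operator; TP holds by construction since ϕ_1 is trace annihilating:

```python
import numpy as np

d = 2

def choi(phi):
    """Choi operator J = Σ_ij |i><j| ⊗ phi(|i><j|) of a map on C^{d×d}."""
    J = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            E = np.zeros((d, d)); E[i, j] = 1
            J += np.kron(E, phi(E))
    return J

# Fixed basis element phi_fix(ρ) = Tr[ρ] 1/2 plus a trace-annihilating
# direction phi_1(ρ) = ρ − Tr[ρ] 1/2, as in the parametrization above.
phi_fix = lambda rho: np.trace(rho) * np.eye(d) / d
phi_1 = lambda rho: rho - np.trace(rho) * np.eye(d) / d
J0, J1 = choi(phi_fix), choi(phi_1)

# Scan the coefficient and keep the values where the Choi operator is PSD.
feasible = [t for t in np.linspace(-1, 1.5, 251)
            if np.linalg.eigvalsh(J0 + t * J1).min() > -1e-10]

# This family is the depolarizing line, CPTP exactly for −1/3 ⩽ t ⩽ 1.
assert abs(min(feasible) + 1 / 3) < 0.02 and abs(max(feasible) - 1) < 0.02
```

For several parameters (such as the (x, y, z) region of Eq. (F4)) the same PSD test applied on a grid, or a projection onto the PSD cone, carves out the feasible region.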

Action of SU(2)-equivariant maps
Here we further analyze the action of the 2-to-1-qubit maps. This analysis will show that different channels can "see" different parts of the input state, and hence that they are complementary. First, define the Bell basis states as |Φ±⟩ = (|00⟩ ± |11⟩)/√2 and |Ψ±⟩ = (|01⟩ ± |10⟩)/√2. The resulting expressions show how different channels combine different pieces of the information in ρ.

Cross-product channel
We now take a closer look at ϕ_5, which we call the cross-product channel CP : H^{⊗2} → H, where H is the single-qubit Hilbert space. The action of the CP channel (not to be confused with complete positivity) combines Pauli expectation values of ρ in the pattern of a cross product, with σ_µ ∈ {X, Y, Z} for µ = i, j, k.
First, note that in the form above the CP map is not truly a channel, as it is neither trace preserving nor completely positive; rather, Tr[CP(ρ)] = 0. To obtain a channel, we must consider superoperators of the form ϕ + αCP, where ϕ is some trace-preserving map. For simplicity we consider ϕ(ρ) = Tr[ρ] 1/2. Note that the action of the CP channel is to check whether ρ is in a superposition of states with different symmetries. As such, at the output of the map, the coefficients associated with the different Pauli operators correspond to the matrix entries that account for the coherences between the antisymmetric state and the three symmetric states. Hence, the CP channel outputs the zero matrix for any state that is block diagonal in the symmetric and antisymmetric subspaces (such as ρ = σ^{⊗2} for any single-qubit state σ).
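The block-diagonality claim is easy to verify numerically: product states σ^{⊗2} carry no coherences between the symmetric and antisymmetric subspaces, as the following check in the standard Bell basis shows:

```python
import numpy as np

# Bell basis: |Φ±> and |Ψ+> span the symmetric subspace, |Ψ−> the
# antisymmetric one.  Rows are the basis states in the computational basis.
s = 1 / np.sqrt(2)
bell = np.array([[s, 0, 0, s],        # |Φ+>
                 [s, 0, 0, -s],       # |Φ−>
                 [0, s, s, 0],        # |Ψ+>
                 [0, s, -s, 0]])      # |Ψ−>

# A random single-qubit pure state σ and the product state ρ = σ⊗σ.
rng = np.random.default_rng(2)
v = rng.normal(size=2) + 1j * rng.normal(size=2)
sigma = np.outer(v, v.conj()); sigma /= np.trace(sigma)
rho = np.kron(sigma, sigma)

# Express ρ in the Bell basis and check that all coherences between
# |Ψ−> (index 3) and the symmetric states (indices 0-2) vanish.
rho_bell = bell.conj() @ rho @ bell.T
assert np.allclose(rho_bell[3, :3], 0) and np.allclose(rho_bell[:3, 3], 0)
```

The same block-diagonality holds for mixed σ, since σ^{⊗2} commutes with the SWAP operator and hence with the projectors onto the two subspaces.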
Let us note here an interesting fact: namely, that the CP channel, in its vanilla version, has an asymmetry embedded into it, as it only accounts for either the real or the imaginary part of the coherences of ρ. This can be solved by defining a version of the channel that combines both parts.

Figure 2 .
Figure 2. Equivariant quantum neural network. a) We consider a QML problem composed of a dataset (that can either be quantum mechanical in nature, or correspond to classical data that have been encoded in quantum states) as well as a label symmetry group G. The first step is to define the input and output representations of G at each layer, where these can be natural, faithful, non-faithful, etc. From here, we provide different techniques which allow us to construct the EQNN layers and control, for instance, the locality of their gates. b) Dashed lines indicate the representations of the symmetry group G at specific stages in the EQNN, which may change between layers. At first, the input state ρ_in is acted upon by the representation R_in. The l-th layer of the EQNN, N^l_{θ_l}, must be (G, R_l, R_{l+1})-equivariant. In sum, the full architecture, ϕ = N^L_{θ_L} ∘ · · · ∘ N^1_{θ_1}, is (G, R_in, R_out)-equivariant. The (G, R_out)-equivariant measurement operator O is in the commutant of the output representation R_out. Note that if we only want the EQNN to produce an output state equivariantly or invariantly (e.g., in generative models), we can omit the measurements.

Figure 3 .
Figure 3. Different types of equivariant layers in a general architecture of EQNNs. A standard layer maps data between spaces of the same dimension. An embedding (pooling) layer maps the data to a higher-dimensional (smaller-dimensional) space. In a lifting layer, ker(R_{l−1}) > ker(R_l), while in a projection layer ker(R_{l−1}) < ker(R_l).

Figure 4 .
Figure 4. Example of the nullspace method. We demonstrate how to use the nullspace method to determine the space of 1-to-1-qubit (G, R_in, R_out)-equivariant quantum channels, with G = Z2 = {e, σ}, R_in = {1, X} and R_out = {1, Z}.

Figure 5 .
Figure 5. Example of the twirling method. We demonstrate how to use the twirling method to determine the space of 1-to-1-qubit (G, R_in, R_out)-equivariant quantum channels, with G = Z2 = {e, σ}, R_in = {1, X} and R_out = {1, Z}. a) Explicit calculation using the twirling formula of Eq. (25). b) Ancilla-based scheme for in-circuit twirling. c) Classical-randomness scheme for in-circuit twirling. Both schemes in b) and c), detailed in Appendix D, recover the twirling in a).

Figure 6 .
Figure 6. Example of the Choi operator method. We demonstrate how to use the Choi operator method to determine the space of 1-to-1-qubit (G, R_in, R_out)-equivariant quantum channels, with G = Z2 = {e, σ}, R_in = {1, X} and R_out = {1, Z}. a) Isotypic decomposition of the group representation. b) We show that a specific choice for the block-diagonal components of J_ϕ leads to the map ϕ(ρ) = (XρX + ZρZ)/2.

Figure 7 .
Figure 7. Procedure to parametrize and optimize equivariant quantum neural networks. a) We provide techniques to parametrize both equivariant unitaries and general equivariant channels. b) Once we have a parametrized EQNN, we can proceed to train it. Here, we feed the training data into the EQNN, whose outputs are used to compute the loss function. Leveraging a classical optimizer, we find updates for the parameters of the EQNN. In the case of channels, it might be necessary to project the updated map onto the feasible CPTP region. Note that we can use classical compilers to transform a linear combination of channels into a sequence of gates that we can implement on a quantum device (Appendix E). The procedure is repeated until convergence is achieved.

Figure 8 .
Figure 8. SU(2)-equivariant QCNN. a) We consider the problem of building 2-to-2 standard equivariant channels and 2-to-1 equivariant pooling channels. In the figure we present the respective input and output Hilbert spaces, as well as the input and output representations. b) In an SU(2)-equivariant QCNN, we alternate between 2-to-2 channels acting on neighbouring qubits and 2-to-1 equivariant pooling channels which reduce the feature-space dimension.

Figure 9 .
Figure 9. Region of parameter space leading to CPTP channels. Using the nullspace method we can find a basis for all 2-to-1 (SU(2), g^{⊗2}, g)-equivariant pooling maps. These can then be linearly combined to form a general parametrized equivariant map as in Eq. (35), and we find in Eq. (37) the region in parameter space leading to CPTP channels. Here we depict said region as the volume of the hyperboloid (red) below the plane (green).
Figure 10 .

Figure 10. Pauli strings with an even number of Y's and Z's on both E and O.

Figure 11 .
Figure 11. Predicted phase diagrams. The four panels show the phase diagram of the 1D bond-alternating XXX Heisenberg model for system sizes of N = 12 and N = 13 qubits, as reconstructed by a trained SU(2)-EQCNN or HEA-QCNN. Particularly, each panel shows the QCNN output when it is tested against a dataset of 500 homogeneously distributed ground states. States whose output is above (below) the optimal threshold τ (green dashed line) are colored in blue (red) and classified as belonging to the trivial (topological) phase. The training points are shown as black crosses. The vertical solid black line is the theoretical critical value J2/J1 = 1. The configurations leading to the panels are the following. (a): SU(2)-EQCNN, N = 12, 60 trainable parameters, 12 training points; (b): HEA-QCNN, N = 12, 63 trainable parameters, 12 training points; (c): SU(2)-EQCNN, N = 13, 66 trainable parameters, 12 training points; (d): HEA-QCNN, N = 13, 66 trainable parameters, 12 training points. Details about the training procedure are given in the main text.

Figure 12 .
Figure 12. The actual power of equivariance. The two panels show the mean and variance of the testing accuracy reached by trained QCNNs on the bond-alternating Heisenberg model of even (left) and odd (right) sizes. The average is conducted over the chosen sizes, (6, 8, 10, 12) for the even case and (7, 9, 11, 13) for the odd one, and over 10 different, randomly initialized training runs for each problem size. The results are plotted against the number of training datapoints N_T. The blue circles refer to the EQCNN with the same number of parameters as the HEA-QCNN (orange stars). The green circles describe the EQCNN with the minimum number of parameters possible, i.e., the architecture as it is in Fig. 8, with only two standard layers before each pooling layer. These plots demonstrate that equivariance provides significant improvements when, and only when, there is degeneracy in the ground space.

Table II .
Overview of the different methods for finding equivariant channels. Generating set size: denotes the size of the generating set for which the method is better suited. Main technique: indicates the tools used to create the equivariant channels.