Splitting and Parallelizing of Quantum Convolutional Neural Networks for Learning Translationally Symmetric Data

The quantum convolutional neural network (QCNN) is a promising quantum machine learning (QML) model that is expected to achieve quantum advantages in classically intractable problems. However, the QCNN requires a large number of measurements for data learning, limiting its practical applications in large-scale problems. To alleviate this requirement, we propose a novel architecture called split-parallelizing QCNN (sp-QCNN), which exploits the prior knowledge of quantum data to design an efficient model. This architecture draws inspiration from geometric quantum machine learning and targets translationally symmetric quantum data commonly encountered in physics and quantum computing science. By splitting the quantum circuit based on translational symmetry, the sp-QCNN can substantially parallelize the conventional QCNN without increasing the number of qubits and improve the measurement efficiency by an order of the number of qubits. To demonstrate its effectiveness, we apply the sp-QCNN to a quantum phase recognition task and show that it can achieve comparable classification accuracy to the conventional QCNN while considerably reducing the measurement resources required. Due to its high measurement efficiency, the sp-QCNN can mitigate statistical errors in estimating the gradient of the loss function, thereby accelerating the learning process. These results open up new possibilities for incorporating the prior data knowledge into the efficient design of QML models, leading to practical quantum advantages.

I. INTRODUCTION
Quantum computing is an innovative technology that is expected to solve classically intractable problems and open up new frontiers in scientific research and technological advancements [1]. Quantum machine learning (QML) is one of the central research fields in quantum computing, allowing us to solve various tasks such as classification, regression, and clustering by discovering relationships and patterns between data using quantum computers [2][3][4][5]. Recent studies have demonstrated quantum speedups in QML beyond classical machine learning for specific artificially engineered tasks, suggesting the potential of QML [6][7][8]. While many studies have been devoted to achieving quantum advantages of QML for practical problems, the path to achieving this goal remains unclear [9].
The quantum neural network (QNN) is a promising QML model that combines the principles of quantum information processing and artificial neural networks to enhance the capabilities of data-driven technologies [10][11][12][13][14]. The QNN is represented by a parametrized quantum circuit, which is optimized via training data to solve a given task [15]. Since the efficient simulation of quantum circuits is generally impossible with classical computers, the QNN can learn the complex features of data that are classically intractable [16,17]. Among various QNN architectures, the quantum convolutional neural network (QCNN) is a leading one that enables classification tasks [18,19] [Fig. 1(a)]. For instance, the QCNN can classify the phases of matter in quantum many-body systems, an important research object in the broad field of physics [20][21][22]. Due to its high trainability and feasibility [23], the QCNN is particularly suitable for noisy intermediate-scale quantum (NISQ) devices with a limited number of possible gate operations [24].
* chinzei.koki@fujitsu.com
It is conjectured that achieving quantum advantages in QML requires encoding the prior knowledge of a problem, or inductive bias, into the learning models [25]. In QNNs, an architecture design tailored to prior knowledge is considered crucial to take full advantage of their capabilities [26]. Geometric quantum machine learning (GQML) based on equivariant QNNs, where the symmetry of a problem is encoded in a variational unitary circuit, is one such learning model tailored to prior knowledge, reducing the parameter space to be searched and enhancing trainability and generalization [27][28][29][30][31][32][33][34][35]. For example, it was theoretically proved that permutation-equivariant QNNs [36] do not suffer from barren plateaus [37][38][39][40], the exponential vanishing of the gradient in a loss function, due to the exponential reduction of parameters to reach overparametrization [41]. This technique would be a powerful tool to achieve quantum advantages with QNNs.
The high resource requirement of the measurement process remains a practical barrier for QNNs to learn data on real quantum computers [15]. During the learning process of QNNs, a predefined loss function is minimized by adjusting the variational parameters of the circuit. This loss function is computed from a training dataset by measuring specific observables in the parameterized quantum circuit. Therefore, the measurement cost scales with the number of parameters to be optimized and the amount of data to be processed [42]. This situation presents a significant bottleneck when considering large-scale QML applications and the potential of practical quantum advantages [9,43]. To mitigate this measurement requirement, one possible solution is the multiprogramming of quantum computation, which allows multiple circuits to be executed in parallel on different regions of a quantum processor [44][45][46][47]. Although this parallelization reduces the total runtime, it increases the required qubit resources, which are limited in current devices.
We address this issue by proposing a novel QNN architecture called split-parallelizing QCNN (sp-QCNN). This architecture is inspired by GQML and targets translationally symmetric data, such as solid-state materials in condensed matter physics. This model exploits the data symmetry as prior knowledge to substantially parallelize the QCNN without increasing the number of qubits, improving the measurement efficiency [Fig. 1(b)]. The circuit of the sp-QCNN consists of two elements: translationally symmetric layers and circuit splitting. First, we impose translational symmetry on the convolutional and fully connected layers to preserve the symmetry of the input state. Second, we split the circuit (rather than discarding some qubits) at the pooling layers and then perform the same unitary operations on each branch in parallel. The combination of this circuit structure and data symmetry substantially parallelizes the conventional QCNN consisting of the same unitary layers, improving the measurement efficiency of local observables and their gradients by a factor of O(n) (n is the number of qubits throughout this paper).
For verification, we apply the sp-QCNN to a quantum phase recognition task. The results show that the sp-QCNN can improve the measurement efficiency by a factor of O(n) while achieving sufficient classification performance to recognize the symmetry-protected topological (SPT) phase [48][49][50][51]. In training with limited measurement resources, the sp-QCNN with high measurement efficiency can suppress statistical errors in estimating the gradient of the loss function to accelerate the learning process compared to the conventional QCNN. Our model opens up a new research direction in the QNN architecture design, contributing to practical quantum advantages in near-term quantum devices that lack sufficient computational resources.
The remainder of this paper is organized as follows. First, Sec. II briefly reviews the QCNN and discusses its computational cost. Section III introduces two key components of the sp-QCNN, translationally symmetric layers and circuit splitting, and clarifies the similarities and differences between the sp-QCNN and the GQML. Section IV shows the advantage of the sp-QCNN, i.e., the improvement of measurement efficiency for local observables and their gradients, based on symmetry. For verification, Sec. V presents the application of the sp-QCNN to a quantum phase recognition task, showing that it can solve the task with sufficient accuracy and improve the measurement efficiency by a factor of O(n). Finally, Sec. VI summarizes this paper and discusses potential future research directions.

II. REVIEW OF QCNN
The convolutional neural network (CNN) is a celebrated classical machine learning model that solves various tasks, such as image classification [52][53][54]. The CNN consists of three different types of layers: convolutional, pooling, and fully connected layers. The convolutional layer filters the input data to extract its local features, and the pooling layer coarse-grains the data to leave only relevant information. After the convolutional and pooling layers are alternately applied, the fully connected transformation is applied to the remaining data to produce a final output. For example, in classification problems, the output indicates which class the input data belongs to, and the CNN is trained to correctly classify training data.
The QCNN is a CNN-inspired QNN model that can process quantum data whose dimension is exponentially larger than classical ones and is expected to achieve practical quantum advantages [18,19]. Similar to the CNN, the QCNN consists of convolutional, pooling, and fully connected layers [Fig. 1(a)]. The convolutional layers apply local unitary gates to extract the local features of the input data, and the pooling layers discard some qubits to coarse-grain the quantum information. After alternately applying the two types of layers, we perform the fully connected unitary, measure the remaining qubits, and obtain an output indicating the data class. In the QCNN, the quantum circuit is characterized by variational parameters, which are optimized to correctly classify training data. Such a variational algorithm is central in the NISQ era, as it works even in a relatively shallow circuit [24].
The QCNN is promising for quantum advantages in NISQ devices because of its two significant features. One is its high feasibility. Since the number of qubits in the QCNN decreases exponentially with each pooling layer, the circuit depth is O(log n). This logarithmic depth is advantageous for NISQ devices where the number of possible gate operations is limited. The other feature of the QCNN is its high trainability. In many variational quantum algorithms, the exponential vanishing of the gradient in a loss function, known as the barren plateau phenomenon, prevents scalable optimization [37][38][39][40]. Meanwhile, Ref. [23] proved that the QCNN does not suffer from barren plateaus due to the logarithmic depth and the locality of unitary operations and observables. The absence of barren plateaus leads to the high trainability of the QCNN, which is crucial for achieving quantum advantages in QML tasks.
However, the high resource requirement of measurements for optimization presents practical difficulties in QNNs, including the QCNN [15]. Let us estimate the required measurement cost in the QCNN. First, we suppose that half of the qubits are discarded at each pooling layer and the number of variational parameters is O(n) [in common QCNNs, gates acting in parallel share the same parameters, and thus the number of independent parameters is O(log n), but measuring the gradient of the loss function requires O(n) cost (see Sec. IV B for details)]. We also let $N_{\rm train}$, $N_{\rm epoch}$, and $N_{\rm shot}$ denote the number of training data, the maximum epoch (one epoch refers to a complete iteration through a dataset), and the number of measurement shots used per observable, respectively. Then, the total required number of shots during training is $O(n N_{\rm train} N_{\rm epoch} N_{\rm shot})$. In terms of practicality, the QCNN is not easy to implement for large-scale problems requiring many qubits and a large dataset. Below, we present a new architecture of the QCNN that can ideally reduce the required number of shots by a factor of O(1/n), bringing the QCNN closer to realization.
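As a rough illustration of this scaling, the following sketch evaluates the shot count with all constant factors dropped. The function name and the numerical values (16 qubits, 100 epochs, 1000 shots) are hypothetical and chosen only for illustration; the 20 training states match the task considered later in Sec. V.

```python
def total_shots(n_qubits, n_train, n_epoch, n_shot, parallel=False):
    """Order-of-magnitude shot count for training (constant factors dropped).

    parallel=True models the ideal O(1/n) reduction discussed in the text.
    """
    per_step = 1 if parallel else n_qubits
    return per_step * n_train * n_epoch * n_shot


# Hypothetical numbers, for illustration only.
conventional = total_shots(16, 20, 100, 1000)
ideal_split = total_shots(16, 20, 100, 1000, parallel=True)
print(conventional, ideal_split, conventional // ideal_split)
```

Under these assumptions the ratio of the two counts equals the number of qubits, which is the ideal case rather than a guaranteed one.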

III. SPLIT-PARALLELIZING QCNN
In this section, we describe the two key components of the sp-QCNN, translationally symmetric layers and circuit splitting, and discuss the relationship between the sp-QCNN and the GQML through symmetry. For simplicity, this work focuses on the case where quantum data is defined on qubits aligned on a one-dimensional lattice. The generalization to arbitrary dimensional lattices is straightforward.
FIG. 2. Example of translationally symmetric unitary layer. Single-qubit rotations are applied in parallel, followed by ZZ rotations on the nearest neighboring qubits. These procedures are repeated d times. The rotation angles are translationally symmetric, and thus the number of independent parameters is 4d.

A. Translational symmetry
In the sp-QCNN, we exploit data symmetry as prior knowledge to design an efficient QML model. The target of the sp-QCNN is translationally symmetric data, which is represented by a density matrix $\rho_i$ with the following property: $T \rho_i T^\dagger = \rho_i$ (1), where T is the translation operator by one qubit (e.g., $T|a_1 a_2 \cdots a_n\rangle = |a_n a_1 \cdots a_{n-1}\rangle$). The most relevant field for the application of the sp-QCNN is condensed matter physics, in which translationally symmetric materials such as solids are the largest research topic [55][56][57]. In Sec. V, we will demonstrate that the sp-QCNN can detect the quantum phases of translationally symmetric many-body states.
To ensure the equivalence of outputs in the parallel computation of the sp-QCNN, we also impose translational symmetry on each of the convolutional and fully connected layers, whose unitary is denoted by $V_i$, as follows: $T V_i T^\dagger = V_i$ (2). As will be shown later, the translational symmetries of the data and the circuit contribute to the substantial parallel computation. We give an example of a hardware-efficient ansatz respecting the translational symmetry [Fig. 2 and Eq. (3)]: $V_i$ consists of d repetitions of single-qubit rotations applied in parallel to all the qubits, followed by ZZ rotations on the nearest neighboring qubits with the periodic boundary condition $Z_{j+n} = Z_j$. The rotation angles, $\alpha_k$, $\beta_k$, $\gamma_k$, and $\delta_k$ (k = 1, ..., d), are variational parameters to be optimized and do not depend on the qubit position. By construction, $V_i$ is symmetric under one-qubit translation: $[V_i, T] = 0$ (and therefore $[V_i, T^m] = 0$ holds for any integer m). In contrast, the convolutional layer of the conventional QCNN, $V_i^{\rm conv}$, is symmetric only under translations by two or more qubits.
We note the expressivity and classical simulability of our ansatz. As for the expressivity, the ansatz in Eq. (3) is efficient to implement in quantum hardware but is not general because it cannot express all translationally symmetric unitaries due to the extra inversion symmetry. In principle, the sp-QCNN allows arbitrary translationally symmetric unitaries since we do not impose any constraints on $V_i$ other than translational symmetry (2). For example, unitary operators that break inversion symmetry, such as $\exp[-i\theta \sum_j X_j Y_{j+1}]$, can be implemented using Trotter decomposition [1]. Then, however, the circuit tends to be deeper and therefore more difficult to implement in near-term devices. Also, extra symmetries of local gates can limit the circuit expressivity [58]. Finding a more expressive and compact ansatz is an important open issue. Meanwhile, it is known that strong symmetry constraints on quantum circuits can lead to classical simulability [59][60][61]. For example, permutation symmetric quantum circuits are classically simulable in many situations because the dimensions of permutation invariant subspaces are polynomial [62]. Nevertheless, since the translation symmetry is much weaker than the permutation symmetry, we suggest that the translation symmetry does not lead to classical simulability in general. In support of this suggestion, the dimensions of the translationally invariant subspaces are exponentially large. We leave further analysis of this point as a future research problem.
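The commutation property $[V_i, T] = 0$ can be checked numerically for a small system. The following NumPy sketch is a minimal illustration under stated assumptions: the helper names are hypothetical, the gate convention $\exp(-i\theta P)$ and the Z-X-Z ordering of the single-qubit rotations are assumptions made here for concreteness and are not taken from Eq. (3), and dense matrices are used, so it is only practical for a few qubits.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def translation(n):
    """Permutation matrix for T|a_1 a_2 ... a_n> = |a_n a_1 ... a_{n-1}>."""
    dim = 2 ** n
    T = np.zeros((dim, dim))
    for idx in range(dim):
        bits = [(idx >> (n - 1 - k)) & 1 for k in range(n)]  # a_1 is the leftmost bit
        shifted = bits[-1:] + bits[:-1]
        T[int("".join(map(str, shifted)), 2), idx] = 1.0
    return T


def op_on(n, ops):
    """Tensor product with ops[j] on qubit j (0-based) and identity elsewhere."""
    return reduce(np.kron, [ops.get(j, I2) for j in range(n)])


def exp_generator(H, theta):
    """exp(-i * theta * H) for a Hermitian H via eigendecomposition."""
    w, v = np.linalg.eigh(H)
    return (v * np.exp(-1j * theta * w)) @ v.conj().T


def symmetric_layer(n, alpha, beta, gamma, delta):
    """One repetition: position-independent Z, X, Z rotations, then ZZ rotations.

    The rotation sequence is an assumption of this sketch, not the paper's Eq. (3).
    """
    sum_x = sum(op_on(n, {j: X}) for j in range(n))
    sum_z = sum(op_on(n, {j: Z}) for j in range(n))
    sum_zz = sum(op_on(n, {j: Z, (j + 1) % n: Z}) for j in range(n))  # periodic boundary
    return (exp_generator(sum_zz, delta) @ exp_generator(sum_z, gamma)
            @ exp_generator(sum_x, beta) @ exp_generator(sum_z, alpha))


n = 4
T = translation(n)
V = symmetric_layer(n, 0.3, 0.5, 0.7, 0.11)
print(np.linalg.norm(V @ T - T @ V))  # commutator norm, zero up to rounding
```

A gate acting on a single fixed qubit, such as `op_on(n, {0: X})`, does not commute with `T`, which is why the rotation angles must not depend on the qubit position.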

B. Circuit splitting
Another key component of the sp-QCNN is circuit splitting. In the conventional QCNN, the pooling layer discards some qubits to coarse-grain the quantum data. In contrast, the sp-QCNN splits the circuit at the pooling layers instead of discarding the qubits, as shown in Fig. 1(b). After splitting, we perform the same operations on each branch and finally measure all the qubits in the computational basis. In some types of quantum computers, such as superconducting [63] and ion-trap devices [64], unitary operations can be performed in parallel, and thus this parallel computation does not significantly increase the runtime.
FIG. 3. (b) A specific circuit-splitting method. We first divide the qubits into q miniblocks consisting of p qubits and split the circuit such that the jth qubit of the ith miniblock is connected to the ith qubit of the jth branch.
In this model, we split the circuit such that it is invariant under the translation operation. Figure 3(a) shows an illustrative example of circuit splitting, where the translation operation only swaps the two branches (red and blue lines) but does not modify the overall circuit structure. Here, we give a specific circuit-splitting method for one-dimensional lattice cases. With n as the number of qubits, we choose a prime factor of n, denoted by p, and define q = n/p. Then we introduce splitting in which n = pq qubits are split into p branches [Fig. 3(b)]. First, we divide the qubits into q miniblocks each comprising p qubits in order from top to bottom. Next, we split the circuit such that the jth qubit of the ith miniblock is connected to the ith qubit of the jth branch. By repeating this procedure on each new branch until the number of qubits becomes one, we obtain the entire sp-QCNN circuit. Note that this splitting procedure requires SWAP gates to rearrange the qubits in quantum hardware without all-to-all connectivity (see Appendix A for details).
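The splitting rule can be sketched as an index mapping on qubit labels. The function names below are hypothetical, labels are 0-based, and the sketch tracks only which qubit goes to which branch, not the gates or the SWAP overhead.

```python
def split(qubits, p):
    """Split a list of qubit labels into p branches of len(qubits) // p qubits.

    Miniblock i holds qubits[i*p : (i+1)*p]; its jth qubit becomes the
    ith qubit of branch j.
    """
    if len(qubits) % p != 0:
        raise ValueError("p must divide the number of qubits")
    q = len(qubits) // p
    return [[qubits[i * p + j] for i in range(q)] for j in range(p)]


def split_recursively(qubits, factors):
    """Apply split() once per chosen prime factor, branch by branch."""
    branches = [list(qubits)]
    for p in factors:
        branches = [b for branch in branches for b in split(branch, p)]
    return branches


branches = split(list(range(6)), 2)
print(branches)                                      # [[0, 2, 4], [1, 3, 5]]
print(split_recursively(list(range(8)), [2, 2, 2]))  # eight single-qubit branches
```

Shifting every label by one (modulo n) maps the set of branches onto itself, which is the invariance under translation that the splitting is designed to have.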
Due to the translational symmetry of $V_i$ and circuit splitting, the sp-QCNN substantially parallelizes the nonsplitting QCNN that consists of the same $V_i$ [Fig. 1(b)]. For convenience, we define $\langle A\rangle_{\rm ns}$ and $\langle A\rangle$ as the expectation values of an operator A in the nonsplitting and sp-QCNNs. In the nonsplitting QCNN, we measure one of the remaining qubits in the computational basis and consider its expectation value (i.e., $\langle Z_1\rangle_{\rm ns}$) as the output of the QCNN. On the other hand, in the sp-QCNN, we measure all the qubits and regard the average of the n expectation values (i.e., $\langle Z_{\rm avg}\rangle = \sum_j \langle Z_j\rangle/n$) as the output. In the next section, we will discuss the mechanism and validity of this parallelization in more detail.

C. Relation with geometric quantum machine learning
We consider the sp-QCNN from the viewpoint of GQML or equivariant QNNs [27][28][29][30][31][32][33][34][35][36]. The concept of GQML has recently emerged as a potential solution to some critical QML issues associated with trainability and generalization. It leverages the symmetry of a problem as inductive bias and provides a problem-tailored circuit architecture. For example, let us consider the classical task of recognizing whether an image represents a cat. If an image represents a cat, then its rotated image should also represent a cat. In this sense, this task has rotation symmetry. In GQML, such symmetry is encoded in the network architecture. Formally, given a symmetry operation S and an output function $f(\rho)$, the S-invariance of GQML is defined as follows: $f(S\rho S^\dagger) = f(\rho)$ (6). In other words, the symmetry operation S on the input data never changes the output of GQML. In GQML, the neural network is usually designed based on the equivariant circuit to satisfy this invariance. In theory, GQML significantly enhances the capability of machine learning in several tasks [29,36].
The circuit of the sp-QCNN has the same invariant property as GQML. Let us consider the unitary transformation U of the entire sp-QCNN. Due to the translational symmetry of each $V_i$ and the splitting structure, U itself is translationally symmetric: $T U T^\dagger = U$ (7). This symmetry leads to the equivariant relation between input and output, $U (T\rho T^\dagger) U^\dagger = T (U\rho U^\dagger) T^\dagger$. That is, the translation operation applied to the input is identical to that applied to the output. We also define $f(\rho) = \mathrm{tr}(U\rho U^\dagger Z_{\rm avg})$ with an observable $Z_{\rm avg} = \sum_j Z_j/n$, which is itself invariant under translation. The equivariant relation then gives $f(T\rho T^\dagger) = f(\rho)$, which is the T-invariance of GQML (6). Therefore, the sp-QCNN can be seen as applying GQML to the QCNN. This insight suggests that the sp-QCNN can be used to enhance QML capability in tasks where the translation operation on the input data should not change the output. Here, let us clarify the difference between our model and the previously proposed equivariant QCNN [35]. In Ref. [35], the equivariance of the pooling layer in the QCNN is achieved by randomly selecting which qubits to discard based on a given symmetry. The advantage of this conventional model lies in its applicability to various symmetries, while the sp-QCNN is specifically tailored to translational symmetry, imposing equivariance by splitting the circuit. In terms of the number of shots required, our model should be more efficient than the conventional equivariant QCNN because it maximizes the utilization of qubit resources.
Our work also provides a new direction for exploiting data symmetry to improve the potential of QML. A critical difference between our problem setting and common ones in GQML is that the input data itself is symmetric in our problem [Eq. (1)], but not in GQML (e.g., the cat image is not rotation invariant). Therefore, each approach brings different benefits. Although the usual GQML improves trainability and generalization, our method reduces measurement costs through substantial parallelization. Thus, the sp-QCNN is particularly advantageous for near-term quantum devices where computational resources are limited.

IV. MEASUREMENT EFFICIENCY IN SP-QCNN
In this section, we describe the parallelization mechanism in the sp-QCNN and show that it can improve the measurement efficiency of local observables and their gradients. We also analytically prove that the improvement factor is O(n) for a random input state.

A. Measurement efficiency of local observable
First, we show that the translational symmetry of V i and circuit splitting allow for parallel computation and improve the measurement efficiency of local observables.
A key property of the sp-QCNN is the equivalence of expectation values for all the qubits. We recall that the unitary transformation U of the entire sp-QCNN is translationally symmetric [Eq. (7)]. This symmetry leads to $\langle Z_j\rangle = \mathrm{tr}[U\rho U^\dagger Z_j] = \mathrm{tr}[U\rho U^\dagger T^{j-1} Z_1 (T^\dagger)^{j-1}] = \mathrm{tr}[U (T^\dagger)^{j-1}\rho T^{j-1} U^\dagger Z_1] = \mathrm{tr}[U\rho U^\dagger Z_1] = \langle Z_1\rangle$, where $\rho$ is an input state satisfying Eq. (1), and we have used $\rho = (T^\dagger)^{j-1}\rho T^{j-1}$ and $T^{j-1} Z_1 (T^\dagger)^{j-1} = Z_j$. This equation indicates the equivalence of the expectation values for all the qubits, i.e., $\langle Z_i\rangle = \langle Z_j\rangle$ for any i and j. This argument can be applied to other single-qubit Pauli operators, leading to $\langle P_i\rangle = \langle P_j\rangle$ for $P \in \{X, Y, Z\}$ (9). Figure 4(a) graphically illustrates this equivalence, which can also be proved by translating the circuit. This equivalence tells us that the sp-QCNN substantially parallelizes the nonsplitting QCNN that consists of the same $V_i$, as shown in Fig. 1(b). As mentioned above, we regard the average of the expectation values for all the qubits, $\langle Z_{\rm avg}\rangle = \sum_j \langle Z_j\rangle/n$, as the output in the sp-QCNN. Meanwhile, we consider the expectation value for only one qubit, $\langle Z_1\rangle_{\rm ns}$, as the output in the nonsplitting QCNN. Given the equivalence in Eq. (9), the nonsplitting and sp-QCNNs produce the same results if statistical errors are absent: $\langle Z_{\rm avg}\rangle = \langle Z_1\rangle = \langle Z_1\rangle_{\rm ns}$. Here we have used $\langle Z_1\rangle_{\rm ns} = \langle Z_1\rangle$, which can be proved by noticing that the nonsplitting QCNN is a part of the sp-QCNN. In the sp-QCNN, we estimate the output from $N_{\rm shot}$ measurement shots as follows: $\langle Z_{\rm avg}\rangle \approx \frac{1}{N_{\rm shot}} \sum_{\ell=1}^{N_{\rm shot}} z_{\rm avg}^{(\ell)}$ (11). Here, $z_j^{(\ell)} = \pm 1$ is the $\ell$th measurement outcome at the jth qubit, and we have defined the average of the $\ell$th measurement outcomes as $z_{\rm avg}^{(\ell)} = \sum_j z_j^{(\ell)}/n$. The value of $z_{\rm avg}^{(\ell)}$ can be $a/n$ ($a \in \{-n, -n+2, \cdots, n\}$), corresponding to the measurement outcome of $Z_{\rm avg}$. We note that the number of outcomes in the sp-QCNN is n times greater than that in the nonsplitting QCNN, where the output is estimated as $\sum_\ell z_1^{(\ell)}/N_{\rm shot}$. Therefore, the sp-QCNN can reduce the number of shots required to achieve a certain estimation accuracy. Since this argument only relies on the symmetry property of data, the sp-QCNN is general and can be applied to broad tasks with translationally symmetric data.
FIG. 4. (b) In accordance with the chain rule, the gradient is the sum of several derivatives, $\partial\langle Z_1\rangle/\partial\theta = \sum_j \partial\langle Z_1\rangle/\partial\theta_j$. For example, we suppose that the parameter $\theta$ is in the first convolutional layer as shown in the figure (the red boxes denote $\partial/\partial\theta_2$ and $\partial/\partial\theta_1$). Then translating the circuit proves $\partial\langle Z_1\rangle/\partial\theta_j = \partial\langle Z_{2-j}\rangle/\partial\theta_1$ and thus $\partial\langle Z_1\rangle/\partial\theta = \sum_j \partial\langle Z_j\rangle/\partial\theta_1$, which can be computed with only two circuits by measuring all the qubits.
It is worth noting that the sp-QCNN does not necessarily improve the measurement efficiency by a factor of O(n). This is because, in each shot, n measurement outcomes are correlated to each other via quantum entanglement. For example, if the output state is the GHZ state $|\psi\rangle = (|00\cdots 0\rangle + |11\cdots 1\rangle)/\sqrt{2}$, then the sp-QCNN does not improve the measurement efficiency at all because the n outcomes are completely correlated and can only provide one bit of information. In contrast, if the output state is the W state $|\psi\rangle = (|10\cdots 0\rangle + |01\cdots 0\rangle + \cdots + |00\cdots 1\rangle)/\sqrt{n}$, then the exact expectation value can be obtained with only one shot by measuring all the qubits in the sp-QCNN, whereas many measurements are required in the nonsplitting QCNN. Therefore, how well the sp-QCNN improves the measurement efficiency depends on the details of the problem, such as input data and circuit parameters. Later, we will analytically prove that the sp-QCNN can improve the measurement efficiency by a factor of O(n) for a typical random input state.
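The two extreme cases can be illustrated by sampling computational-basis outcomes directly from the known outcome distributions of the GHZ and W states. This is a minimal sketch: the function names, the seed, and the choice of 8 qubits and 10,000 shots are hypothetical, and no circuit is simulated.

```python
import numpy as np


def sample_ghz(n, shots, rng):
    """Each shot is all zeros or all ones, with probability 1/2 each."""
    bit = rng.integers(0, 2, size=(shots, 1))
    return np.repeat(bit, n, axis=1)


def sample_w(n, shots, rng):
    """Each shot has exactly one excited qubit at a uniformly random position."""
    bits = np.zeros((shots, n), dtype=int)
    bits[np.arange(shots), rng.integers(0, n, size=shots)] = 1
    return bits


def per_shot_variances(bits):
    """Per-shot variance of z_1 (single qubit) and of z_avg (all qubits)."""
    z = 1 - 2 * bits  # bit 0 -> z = +1, bit 1 -> z = -1
    return z[:, 0].var(), z.mean(axis=1).var()


rng = np.random.default_rng(0)
results = {}
for name, sampler in [("GHZ", sample_ghz), ("W", sample_w)]:
    results[name] = per_shot_variances(sampler(8, 10_000, rng))
    print(name, results[name])
```

For the GHZ samples the two variances coincide, so averaging over qubits gains nothing, whereas for the W samples the variance of the averaged outcome vanishes because every shot yields the same value (n − 2)/n.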
The advantage of the sp-QCNN is illustrated in Fig. 5(a). In actual experiments, we cannot obtain the exact expectation value because of statistical errors. Therefore, it is usually estimated from the mean value of a finite number of measurement outcomes. In the nonsplitting QCNN, the estimated value is generally drawn from the Gaussian distribution with a variance of $O(1/N_{\rm shot})$ in accordance with the central limit theorem. In the sp-QCNN, we obtain n measurement outcomes at once and thus expect that the variance scales as $O(1/nN_{\rm shot})$, indicating the O(n) times improvement of measurement efficiency. We note that the sp-QCNN can improve the measurement efficiency of the conventional QCNN, not necessarily other QNNs. Our model further enhances the feasibility of QCNNs, bringing their practical quantum advantages closer to realization.
To quantify the effectiveness of the sp-QCNN, we introduce the relative measurement efficiency: $r = \sigma_0^2/\sigma_{\rm sp}^2$ (12). Here $\sigma_0$ and $\sigma_{\rm sp}$ are the standard deviations (i.e., square root of variance) of the Gaussian distributions followed by the estimated expectation values in the nonsplitting and sp-QCNNs with the same number of shots. This quantity means that the number of shots required to achieve a certain estimation accuracy using the sp-QCNN is 1/r times that using the nonsplitting QCNN. In the next section, we will demonstrate the efficiency of the sp-QCNN for a concrete task using this quantity.

B. Measurement efficiency of gradient
In general, the most costly part of machine learning is the optimization of neural networks using a training dataset, in which the loss function is minimized by tuning the network parameters. In classical machine learning, gradient-based methods are often used for optimization and work well for large-scale problems. Even in QML, gradient-based optimizers are important and powerful tools. However, many measurements are necessary to estimate the gradient in quantum computing [42]. Our architecture makes such gradient measurements efficient.
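In accordance with the chain rule, the gradient with respect to a parameter $\theta$ shared by $m_\theta$ gates is the sum of the derivatives with respect to the individual gate parameters $\theta_j$: $\partial\langle Z_1\rangle/\partial\theta = \sum_{j=1}^{m_\theta} \partial\langle Z_1\rangle/\partial\theta_j$ (13), where each term is evaluated with the parameter-shift rule using the two circuits $\tilde{U}_{j\pm}$ in which $\theta_j$ is shifted by $\pm\pi/4$. The circuit splitting and translational symmetry in the sp-QCNN allow us to compute $\partial\langle Z_1\rangle/\partial\theta_j$ in parallel, improving the gradient measurement efficiency. For simplicity, we suppose that each $V_i$ has the form in Eq. (3) and that $\theta$ is in the first convolutional layer [Fig. 4(b)]. By translating the entire circuit, we can rewrite each term in Eq. (13) as $\partial\langle Z_1\rangle/\partial\theta_j = \mathrm{tr}[\tilde{U}_{j+}\rho\tilde{U}_{j+}^\dagger Z_1] - \mathrm{tr}[\tilde{U}_{j-}\rho\tilde{U}_{j-}^\dagger Z_1] = \mathrm{tr}[\tilde{U}_{1+}\rho\tilde{U}_{1+}^\dagger Z_{2-j}] - \mathrm{tr}[\tilde{U}_{1-}\rho\tilde{U}_{1-}^\dagger Z_{2-j}] = \partial\langle Z_{2-j}\rangle/\partial\theta_1$. Here we have used $\rho = T^{j-1}\rho(T^\dagger)^{j-1}$, $(T^\dagger)^{j-1} Z_1 T^{j-1} = Z_{2-j}$, and $(T^\dagger)^{j-1}\tilde{U}_{j\pm} T^{j-1} = \tilde{U}_{1\pm}$. This relation tells us that the derivative of $Z_1$ by $\theta_j$ is identical to that of $Z_{2-j}$ by $\theta_1$, as illustrated in Fig. 4(b). Thereby, Eq. (13) is reduced to $\partial\langle Z_1\rangle/\partial\theta = \sum_j \partial\langle Z_{2-j}\rangle/\partial\theta_1 = \sum_j \partial\langle Z_j\rangle/\partial\theta_1$, where we have replaced $Z_{2-j}$ with $Z_j$ in the summation.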
According to this equation, we can obtain the gradient $\partial\langle Z_1\rangle/\partial\theta$ with just two circuits $\tilde{U}_{1\pm}$ by measuring all the qubits, instead of using $2m_\theta$ circuits that are conventionally necessary. By generalizing this argument and using the equivalence $\langle Z_{\rm avg}\rangle = \langle Z_1\rangle$, we estimate the gradient of the output as follows: $\partial\langle Z_{\rm avg}\rangle/\partial\theta \approx \frac{m_\theta}{n} \frac{1}{N_{\rm shot}} \sum_{\ell=1}^{N_{\rm shot}} \sum_{j=1}^{n} \left(z_{j+}^{(\ell)} - z_{j-}^{(\ell)}\right)$, where $z_{j\pm}^{(\ell)}$ is the jth qubit measurement outcome of the $\ell$th shot in the parameter-shifted circuit with $\theta_1 = \theta \pm \pi/4$. In our ansatz [Eq. (3)], the factor $m_\theta/n$ differs from unity when $\theta$ is in the second or later layer. We emphasize that the sp-QCNN enables us to execute n parallel computations even for the gradient estimation, thus accelerating the gradient-based training. Similar to the previous case, the relative measurement efficiency r for the gradient depends on the details of the problem due to the entangled property of the output state.
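The role of the $\pm\pi/4$ shift can be checked on a single qubit. The sketch below assumes the gate convention $\exp(-i\theta P)$ for a Pauli operator P, under which the difference of the two shifted expectation values equals the derivative exactly; this convention and the function names are assumptions of the sketch, chosen to be consistent with the shift quoted in the text.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def rot(P, theta):
    """exp(-i * theta * P) for a Pauli matrix P (P @ P = identity)."""
    return np.cos(theta) * np.eye(2) - 1j * np.sin(theta) * P


def expval_z(theta, psi0):
    """<Z> after applying exp(-i * theta * X) to psi0."""
    psi = rot(X, theta) @ psi0
    return float(np.real(psi.conj() @ Z @ psi))


def shift_gradient(theta, psi0):
    """Parameter-shift estimate of d<Z>/d(theta) from two shifted circuits."""
    return expval_z(theta + np.pi / 4, psi0) - expval_z(theta - np.pi / 4, psi0)


psi0 = np.array([1, 0], dtype=complex)   # |0>, for which <Z> = cos(2 * theta)
theta = 0.3
print(shift_gradient(theta, psi0), -2 * np.sin(2 * theta))
```

In an experiment each of the two expectation values is itself estimated from shots, so the gradient inherits the statistical error discussed above.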

C. Measurement efficiency for random state
How well the sp-QCNN improves the measurement efficiency depends on the details of the problem. Here, we analytically prove that the efficiency is improved by a factor of O(n) for a typical state randomly chosen from the T-invariant Hilbert subspace in the limit of n → ∞.
Let us begin by considering the nonsplitting QCNN, where we measure $Z_1$ and obtain an outcome s = ±1 for every measurement. In the limit of n → ∞, the probability of obtaining an outcome ±1 is almost 1/2 for a typical random state because the statistical fluctuations by randomness are negligible due to the exponentially large Hilbert space [this probability distribution is depicted in the left panel of Fig. 5(b)]. Given its Bernoulli distribution, the estimation accuracy of the expectation value is $\sigma_0 = 1/\sqrt{N_{\rm shot}}$, where $N_{\rm shot}$ is the number of shots.
In the sp-QCNN, we measure all the qubits in the computational basis and regard the mean of the n measurement outcomes as the output of the QCNN [Eq. (11)]. In other words, we measure $Z_{\rm avg} = \sum_j Z_j/n$ rather than $Z_1$ and obtain one of the eigenvalues s (= ±1, ±(n − 2)/n, ⋯) as an outcome. Also, given that the full unitary transformation U is translationally symmetric, the output state of the sp-QCNN has the same symmetry. The right panel of Fig. 5(b) shows the number of eigenstates of $Z_{\rm avg}$ with an eigenvalue s, $D_n(s)$, on the T-invariant Hilbert subspace. In the limit of n → ∞, $D_n(s)$ approaches the following asymptotic form (see Appendix B for derivation): $D_n(s) \simeq C_n \exp(-n s^2/2)$, where $C_n$ is a constant independent of s. The width of $D_n(s)$ in s is $O(1/\sqrt{n})$, which finally gives rise to a small estimation error.
Here, we assume that when measuring $Z_{\mathrm{avg}}$ for a typical state randomly chosen from the $T$-invariant subspace, the probability of obtaining an outcome $s$ is proportional to $D_n(s)$. This assumption would be justified in the limit of $n \to \infty$, where $D_n(s)$ is sufficiently large and the statistical fluctuations are insignificant. Considering that the width of $D_n(s)$ is $O(1/\sqrt{n})$, we can estimate the expectation value from $N_{\mathrm{shot}}$ experiments with an accuracy
$$\sigma_{\mathrm{sp}} = O\left(\frac{1}{\sqrt{n N_{\mathrm{shot}}}}\right). \tag{19}$$
From the quantification in Eq. (12), the relative measurement efficiency of the sp-QCNN is
$$r = \frac{\sigma_0^2}{\sigma_{\mathrm{sp}}^2} = O(n). \tag{20}$$
This result indicates that the number of experiments required to achieve a given accuracy is reduced by a factor of $O(n)$.
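The scaling of the relative efficiency can be illustrated with a small Monte Carlo sketch. It relies on an assumption that is only approximately the one made in the text: the number $k$ of $+1$ outcomes in one shot is drawn from a Binomial$(n, 1/2)$ distribution, that is, with probability proportional to $\binom{n}{k}$, which matches $D_n(s)$ up to small corrections. The function name and default sample sizes are hypothetical.

```python
import numpy as np

def relative_efficiency(n, n_shot=50, n_trials=10000, seed=0):
    """Estimate r = (sigma_0 / sigma_sp)^2 under the random-state assumption."""
    rng = np.random.default_rng(seed)
    # Nonsplitting QCNN: each shot yields Z_1 = +1 or -1 with probability 1/2
    z1 = rng.choice([-1.0, 1.0], size=(n_trials, n_shot))
    est0 = z1.mean(axis=1)
    # sp-QCNN: each shot yields s = (2k - n)/n with k ~ Binomial(n, 1/2) (assumed)
    k = rng.binomial(n, 0.5, size=(n_trials, n_shot))
    est_sp = ((2.0 * k - n) / n).mean(axis=1)
    # Ratio of the variances of the two estimators
    return est0.var() / est_sp.var()

for n in (8, 16):
    print(n, relative_efficiency(n))
```

Under this assumption the printed ratio should be close to $n$, in line with Eq. (20); it says nothing about trained circuits, where the output state is no longer random.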
The scaling argument in Eq. (20) is valid in situations where the output state is a random quantum state. Such a situation may arise in the early stage of the learning process, when the parameters of the QCNN are randomly initialized and the output state is approximately random. In Sec. V, we will show that the sp-QCNN exhibits $O(n)$ scaling for a concrete task in the early stage of learning and, remarkably, even in the final stage.

V. APPLICATION TO QUANTUM PHASE RECOGNITION
In this section, we apply the sp-QCNN to a quantum phase recognition task investigated in Ref. [19] and verify its effectiveness. For the remainder of this paper, we simulate the quantum circuit with Qulacs, an open-source quantum circuit simulator [66].

A. Formulation of problem
Let us consider a one-dimensional cluster Ising model with the periodic boundary condition, whose Hamiltonian is given by
$$H = -\sum_{j=1}^{n} Z_j X_{j+1} Z_{j+2} - h_1 \sum_{j=1}^{n} X_j - h_2 \sum_{j=1}^{n} X_j X_{j+1}, \tag{21}$$
where $n$ is the number of qubits, and $X_j$, $Y_j$, and $Z_j$ are the Pauli operators at the $j$th qubit. This Hamiltonian exhibits SPT [48-51], paramagnetic (PM), and antiferromagnetic (AFM) phases on the $h_1$-$h_2$ plane. The SPT phase is protected by $\mathbb{Z}_2 \times \mathbb{Z}_2$ symmetry characterized by $X_{\mathrm{even(odd)}} = \prod_{j \in \mathrm{even(odd)}} X_j$. The ground state of $H$, an input state in our task, is translationally symmetric because of $T H T^\dagger = H$.
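The symmetries stated above can be checked numerically for a small system. The sketch below assumes the cluster Ising Hamiltonian in the form $H = -\sum_j Z_j X_{j+1} Z_{j+2} - h_1 \sum_j X_j - h_2 \sum_j X_j X_{j+1}$ with periodic boundaries (the form used in Ref. [19] with unit coupling); the helper names are hypothetical, and dense matrices limit it to a handful of qubits.

```python
from functools import reduce
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])

def op(sites, n):
    """Tensor product placing the given single-qubit matrices at the given sites."""
    return reduce(np.kron, [sites.get(j, I2) for j in range(n)])

def cluster_ising(n, h1, h2):
    """Dense cluster Ising Hamiltonian with periodic boundaries (assumed form)."""
    H = np.zeros((2 ** n, 2 ** n))
    for j in range(n):
        a, b = (j + 1) % n, (j + 2) % n
        H -= op({j: Z, a: X, b: Z}, n)
        H -= h1 * op({j: X}, n)
        H -= h2 * op({j: X, a: X}, n)
    return H

def translation(n):
    """Permutation matrix T that cyclically shifts the qubits by one site."""
    dim = 2 ** n
    T = np.zeros((dim, dim))
    for b in range(dim):
        c = (b >> 1) | ((b & 1) << (n - 1))  # cyclic shift of the n-bit string
        T[c, b] = 1.0
    return T

n = 6
H = cluster_ising(n, h1=0.7, h2=-0.4)
T = translation(n)
X_even = op({j: X for j in range(0, n, 2)}, n)
print(np.allclose(T @ H @ T.T, H), np.allclose(X_even @ H, H @ X_even))
```

Both checks are expected to hold for any even $n \geq 4$ and any $(h_1, h_2)$, since every term of the assumed Hamiltonian is summed over all sites.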
Our task is to recognize the SPT phase using the sp-QCNN. Quantum phase recognition is one of the main applications of the QCNN, and many studies have been conducted with the aim of practical quantum advantages [20-22]. In this task, the sp-QCNN can be applied because the input data (i.e., the ground state of $H$) is translationally symmetric. For training data, we use 20 ground states of $H$ evenly located on the line of $h_2 = 0$ from $h_1 = 0.05$ to $1.95$. Using the Jordan-Wigner transformation [67], we can analytically obtain the exact ground state for $h_2 = 0$, which undergoes a transition from the SPT phase to the PM phase at $h_1 = 1$. To evaluate the generalization of our method, we test the trained model with 28 data samples, most of which are not included in the training dataset. These samples correspond to ground states at $(h_1, h_2)$ with $h_1 \in \{0.35, 0.65, 0.95, 1.25\}$ and $h_2 \in \{0, \pm 0.5, \pm 1, \pm 1.5\}$. The exact phase diagram on the $h_1$-$h_2$ plane is computed with the density matrix renormalization group (DMRG) [68-71]. In this work, we prepare the input data with exact diagonalization for simplicity. In actual experiments, however, other preparation methods must be applied, such as the variational quantum eigensolver on a quantum computer and analog-digital transduction from a quantum experiment.
To train our model and evaluate its generalization, we consider the following error as the loss function: where $|\phi_i\rangle$ and $y_i$ are the training/test data and its corresponding label, and $U$ is the total unitary of the circuit ($M$ is the number of training/test data). Here, we set $y_i$ as 1 if $|\phi_i\rangle$ belongs to the SPT phase and 0 if it does not. We optimize the loss function using the stochastic gradient descent (SGD) method [72]. In SGD, we update the parameters as $\vec{\theta}^{(t+1)} = \vec{\theta}^{(t)} - \eta^{(t)} \nabla L$, where $\vec{\theta}^{(t)}$ is the parameter vector at optimization step $t$, and $\nabla L$ is calculated from only one of the training data at each step. We also decrease the learning rate as $\eta^{(t)} = \eta_0/t$ to stabilize the training and set $\eta_0 = 200$. In addition, to investigate the statistical properties of the sp-QCNN, we simulate the same circuits with $N_p$ different random initial parameter sets. We set $N_p = 50$ in Sec. V B and $N_p = 200$ in Sec. V C.
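The update rule with the decaying learning rate $\eta^{(t)} = \eta_0/t$ can be sketched as follows. The example is a toy problem, not the QCNN loss: it assumes per-sample losses $(\theta - x_i)^2/2$ and uses $\eta_0 = 1$ rather than the value $\eta_0 = 200$ quoted above; `sgd`, `grad`, and `x` are hypothetical names.

```python
import numpy as np

def sgd(grad_fn, theta0, n_data, eta0, n_steps, seed=0):
    """SGD with learning rate eta0 / t, using one random sample per step."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    picks = []
    for t in range(1, n_steps + 1):
        i = int(rng.integers(n_data))                    # pick one training sample
        theta = theta - (eta0 / t) * grad_fn(theta, i)   # theta <- theta - eta^(t) * grad
        picks.append(i)
    return theta, picks

# Toy data and gradient of the per-sample loss (theta - x_i)^2 / 2
x = np.array([0.2, 1.0, -0.5, 0.7])
grad = lambda theta, i: theta - x[i]

theta, picks = sgd(grad, theta0=[3.0], n_data=len(x), eta0=1.0, n_steps=200)
print(theta[0], x[picks].mean())
```

For this particular toy loss and $\eta_0 = 1$, the iterate equals the running mean of the sampled data points, which makes the effect of the $1/t$ decay easy to verify.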

B. Performance with limited measurement resources
We investigate how the high measurement efficiency of the sp-QCNN enhances the machine learning performance in training with limited measurement resources. In such situations, statistical errors in estimating the gradient of the loss function would disturb the training process, reducing the accuracy attainable within a limited computational resource. Here, we show that the sp-QCNN can suppress the statistical errors, stabilizing and speeding up the learning process.
As the sp-QCNN circuit, we use the ansatz in Eq. (3) with $d = 5$, where the total number of parameters is 60 for $n = 8$ and 80 for $n = 16$. We also compare the sp-QCNN with the conventional QCNN depicted in Fig. 1(a), where the convolutional and fully-connected layers consist of two-qubit unitary gates parametrized as $\prod_{j=1}^{15} e^{-i\theta_j P_j}$ ($P_j = IX, IY, \ldots, ZZ$). Since the gates acting in parallel share the same parameters, the number of independent parameters in the conventional QCNN is 75 for $n = 8$ and 105 for $n = 16$. When measuring the gradient in the simulation, we match the shot number per parameter in the conventional and sp-QCNNs for each layer. Here, we use $2 m_\theta N_{\mathrm{shot}}$ shots for parameter $\theta$ in the sp-QCNN and set $N_{\mathrm{shot}} = 5$.
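The two-qubit gate parametrization of the conventional QCNN can be written out explicitly. The sketch below is an illustration with two assumptions not fixed by the text: the 15 Pauli strings are taken in lexicographic order (IX, IY, IZ, XI, ..., ZZ), and the factors are multiplied from left to right in that order. `two_qubit_unitary` is a hypothetical name.

```python
import itertools
import numpy as np

PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

# The 15 non-identity two-qubit Pauli strings (II is dropped)
LABELS = ["".join(p) for p in itertools.product("IXYZ", repeat=2)][1:]

def two_qubit_unitary(thetas):
    """Product of exp(-i * theta_j * P_j) over the 15 Pauli strings (assumed order)."""
    assert len(thetas) == len(LABELS)
    U = np.eye(4, dtype=complex)
    for theta, label in zip(thetas, LABELS):
        P = np.kron(PAULI[label[0]], PAULI[label[1]])
        # exp(-i*theta*P) = cos(theta) I - i sin(theta) P because P^2 = I
        U = U @ (np.cos(theta) * np.eye(4) - 1j * np.sin(theta) * P)
    return U

rng = np.random.default_rng(0)
U = two_qubit_unitary(rng.uniform(0, 2 * np.pi, size=15))
print(np.allclose(U.conj().T @ U, np.eye(4)))
```

Each such gate carries 15 parameters, which is consistent with the parameter counts quoted above being multiples of 15.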
We first show that the sp-QCNN effectively suppresses statistical errors and accelerates the learning process while maintaining classification performance comparable to that of the conventional QCNN. In Figs. 6(a) and (b), the loss functions converge rapidly for both QCNNs in the absence of statistical errors (dashed lines). However, in the presence of statistical errors (solid lines), the loss convergence becomes significantly slower in the conventional QCNN, whereas the slowdown remains relatively modest in the sp-QCNN. This fast convergence in the sp-QCNN stems from its high measurement efficiency. While significant statistical errors disturb the rapid and stable optimization in the conventional QCNN, the high measurement efficiency in the sp-QCNN suppresses the statistical errors, stabilizing and accelerating the optimization. As shown in Figs. 6(a) and (b), this improvement is more prominent for $n = 16$ than for $n = 8$, due to the $O(n)$ improvement in the measurement efficiency. The fast convergence of training is highly effective for near-term quantum devices, where a long optimization run is impractical due to limited computational resources.

C. Quantification of measurement efficiency
The previous subsection reveals that the sp-QCNN suppresses the statistical errors in estimating the gradient and accelerates the learning process due to its high measurement efficiency. Here, we numerically quantify the measurement efficiency of the sp-QCNN, showing that it improves the measurement efficiency by a factor of $O(n)$ in the quantum phase recognition task. The measurement efficiency is quantified by the ratio of the variances in the nonsplitting and sp-QCNNs, as shown in Fig. 5 and Eq. (12) (we assume that the nonsplitting QCNN consists of the same unitary $V_i$ as the sp-QCNN). We calculate the measurement efficiency for $n = 8$, 12, 16, and 18 to identify the scaling with respect to the number of qubits. In this verification, the changes in the measurement efficiency are computed during the training process for three typical input states: SPT, PM, and AFM states, which are the eigenstates of $H$ for $(h_1, h_2) = (0, 0)$, $(+\infty, 0)$, and $(0, -\infty)$. We also explore the efficiency of measuring the loss gradient for the first parameter. The unitary circuit consists of the translationally symmetric ansatz (3) with $d = 10$ and is split such that the number of qubits in a branch varies as 8 → 4 → 2 → 1, 12 → 6 → 3 → 1, 16 → 8 → 4 → 2 → 1, and 18 → 9 → 3 → 1 for $n = 8$, 12, 16, and 18, respectively. Also, to distinguish between the effects of statistical errors on the circuit optimization and the observable measurement, we train the variational circuit with the exact gradient (i.e., without statistical errors in estimating the gradient) and estimate the measurement efficiency with a finite-shot simulation at each epoch.
Figures 7(b)-(e) show the changes in the relative measurement efficiency $r$ during training for the three inputs and loss gradient. For the PM and AFM states [(c) and (d)], the efficiency $r$ is high at the beginning and does not significantly decrease during training. For the SPT state and loss gradient [(b) and (e)], $r$ is initially high but decreases as training proceeds, finally converging to a small value ($r = 2$-$5$). These results imply that the improvement rate of measurement efficiency strongly depends on the input data, what we measure, and the stage of learning. Even for the SPT state and loss gradient, the final efficiency is higher than one, indicating that the measurement in the sp-QCNN is more efficient than that in the nonsplitting QCNN.
We also investigate the measurement efficiency for predicting the phase diagram by the trained sp-QCNN with $n = 8$ and 16 qubits [Figs. 8(c) and (d)]. By comparing these figures, we notice that the efficiency $r$ for $n = 16$ is more than twice that for $n = 8$ in most areas. This result implies an $O(n)$ improvement for prediction. We also observe that the efficiency is low in the SPT phase but relatively high in the PM and AFM phases, a trend evident in Fig. 7 as well. We infer that this phenomenon is due to the following reason. For the SPT state, the expectation value of $\sum_j Z_j/n$ after training is almost one because we have assigned the label $y_i = 1$ to the SPT phase in the loss function, which means that $U|\phi_{\mathrm{SPT}}\rangle \sim |0 \cdots 0\rangle$. Given that the sp-QCNN has no advantage for measuring $|0 \cdots 0\rangle$, the measurement efficiency is not significantly improved. For a complete understanding, additional analyses must be conducted in future work.

VI. CONCLUSIONS
In this study, we have proposed a new QNN architecture, the sp-QCNN, which reduces measurement costs by exploiting the translational symmetry of data as prior knowledge. In the sp-QCNN, we symmetrize and split the QCNN circuit to parallelize the computation, thus improving the measurement efficiency. We have demonstrated the advantage of the sp-QCNN for the quantum phase recognition task: it has high classification performance for this task and can improve the measurement efficiency by a factor of $O(n)$. In a realistic setting where measurement resources are limited, the sp-QCNN can enhance the speed and stability of the learning process. These results present a new possibility for the symmetry-based architecture design of QNNs and bring us one step closer to achieving the quantum advantages of the QCNN in near-term quantum devices.
This work offers some research directions for the future. First, finding practical applications of the sp-QCNN is crucial for quantum advantages. A promising candidate is the study of solids, where the sp-QCNN could offer some hints on unsolved problems in condensed matter physics, such as the phase diagrams of the Hubbard model [56] and the kagome antiferromagnetic Heisenberg model [57]. The second direction is further study of symmetry-based architecture design to reduce measurement costs. Although this work has provided a new approach for QML, its coverage is limited to data with translational symmetry. Hence, generalization to other symmetries, such as space groups, is intriguing and fruitful and may be applied to chemical molecules as well as solid-state materials. The third direction is to find a better ansatz. Although this work establishes the basis of the sp-QCNN, the best $V_i$ for a given problem remains unclear. In general, low expressivity tends to result in poor QML accuracy, while excessively high expressivity can lead to barren plateaus [40,73]. Therefore, finding an ansatz with appropriate expressivity for a given problem is helpful for realizing the sp-QCNN experimentally.
Finally, we provide several open issues on the sp-QCNN. This work has shown that the sp-QCNN has sufficient expressivity, trainability, generalization, and measurement efficiency to solve the phase recognition task. However, whether it can solve other complicated tasks remains unclear. In particular, the translational symmetry of $V_i$ could suppress expressivity and limit solvable tasks in general. Uncovering the possibilities and limitations of the sp-QCNN is an important open issue. For trainability, elucidating whether barren plateaus exist in the sp-QCNN is crucial. In the conventional QCNN, barren plateaus do not appear due to its unique architecture: the logarithmic circuit depth and the locality of unitary operations and observables [23,37-39]. Considering that the sp-QCNN shares these properties with the conventional QCNN, we suggest that no barren plateaus will appear even in the sp-QCNN [74]. The results in this paper show that the training of the sp-QCNN works well up to $n = 18$ qubits, supporting our conjecture. The analysis of measurement efficiency is also an important research issue. While we observed an $O(n)$ improvement of measurement efficiency in the phase recognition task of Hamiltonian (21), it remains unclear whether the measurement efficiency is improved by a factor of $O(n)$ in other problems as well. More thorough analyses are necessary for complete verification.

Appendix A: Implementation of circuit splitting
In real hardware, the implementation of circuit splitting depends on the hardware connectivity. For example, in quantum computers with all-to-all connectivity such as ion trap devices, special processes for circuit splitting are not necessary because the unitary operations on each branch after splitting can be implemented with the qubits being separated. Conversely, in quantum computers where only local entangling operations on neighboring qubits are available, we have to swap the qubits in order to implement the unitary operations after splitting. For concreteness, let us consider a case where $n$ qubits are split into two branches (we assume that $n$ is even). Figure 9 illustrates the SWAP circuit for splitting, which uses $O(n^2)$ SWAP gates with a depth of $O(n)$. Although conventional QCNNs also require SWAP gates to rearrange the remaining qubits in the pooling layers, our sp-QCNN needs more SWAP gates in general.
FIG. 9. An implementation of circuit splitting using local SWAP gates. This figure shows the circuit splitting of $n$ qubits to two branches ($n$ even). The SWAP procedure consists of $(n-2)/2$ steps. For step $d$, we swap $j$th and $(j+1)$th qubits for $j$
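One SWAP schedule consistent with the counts quoted above, $(n-2)/2$ steps, $O(n^2)$ SWAP gates, and depth $O(n)$, can be sketched as follows. The choice of which neighboring pairs are swapped at step $d$ (positions $j = d+1, d+3, \ldots, n-d-1$) is an assumption made for this illustration, as is the convention that odd-numbered qubits go to the first branch; the function names are hypothetical.

```python
def split_swap_schedule(n):
    """Layers of nearest-neighbor SWAPs that separate odd- and even-numbered qubits."""
    assert n % 2 == 0 and n >= 2
    layers = []
    for d in range(1, n // 2):  # (n - 2) / 2 steps in total
        # 1-indexed positions j = d+1, d+3, ..., n-d-1 (assumed); swap positions j and j+1
        layers.append([(j, j + 1) for j in range(d + 1, n - d, 2)])
    return layers

def apply_schedule(n):
    """Return the qubit labels in position order after running the schedule."""
    order = list(range(1, n + 1))
    for layer in split_swap_schedule(n):
        for j, k in layer:
            order[j - 1], order[k - 1] = order[k - 1], order[j - 1]
    return order

print(apply_schedule(6))  # [1, 3, 5, 2, 4, 6]
print(sum(len(layer) for layer in split_swap_schedule(8)))  # 6 SWAP gates for n = 8
```

The SWAPs within one layer act on disjoint pairs, so each layer has depth one, and the total gate count is $(n/2)(n/2-1)/2$.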

Appendix B: The number of eigenstates of Zavg
Here, we derive Eq. (18), which gives the number of eigenstates of $Z_{\mathrm{avg}}$ with an eigenvalue $s$ for translationally symmetric states. For convenience, we consider $Z_{\mathrm{tot}} = \sum_j Z_j$ rather than $Z_{\mathrm{avg}} = Z_{\mathrm{tot}}/n$. In the sp-QCNN, we measure $Z_{\mathrm{tot}}$, whose eigenvalues are $\pm n, \pm(n-2), \ldots$, and obtain one of the eigenvalues every shot. In addition, the output state is translationally symmetric in the sp-QCNN. Hence, for simplicity, we now focus on the $T$-invariant eigenspace of $Z_{\mathrm{tot}}$ with an eigenvalue $z$, $V_z$ (i.e., $T|\phi\rangle = |\phi\rangle$ and $Z_{\mathrm{tot}}|\phi\rangle = z|\phi\rangle$ for any $|\phi\rangle \in V_z$). Below, we investigate the dimension of $V_z$.
To this end, we introduce a cyclic group generated by $T$,
$$G_n = \{I, T, T^2, \ldots, T^{n-1}\}. \tag{B1}$$
Let $M_z$ denote the set of computational basis states with the $Z_{\mathrm{tot}}$ eigenvalue $z$. The basis of $V_z$ is given by
$$|\Psi_i\rangle = \frac{1}{N} \sum_{|\phi\rangle \in [\Phi_i]} |\phi\rangle, \tag{B2}$$
where $[\Phi_i]$ is an element of $M_z/G_n$ and $N$ is the normalization factor ($T|\Psi_i\rangle = |\Psi_i\rangle$ can be easily checked). Therefore, $\dim V_z = |M_z/G_n|$ holds, where $|A|$ is the number of elements in $A$. Using Burnside's lemma [75], we have $\dim V_z$ as follows:
$$\dim V_z = \frac{1}{n} \sum_{g \in G_n} |M_z^g|, \tag{B3}$$
where $M_z^g = \{|\phi\rangle \in M_z \mid g|\phi\rangle = |\phi\rangle\}$. Then, the following theorem holds.
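Theorem 1. For $z \neq \pm n$, the following relation holds in the limit of $n \to \infty$:
$$F_z \equiv \frac{n \dim V_z}{\binom{n}{\ell_z}} \to 1, \tag{B4}$$
with $\ell_z = (n + z)/2$. Here, $\binom{\cdot}{\cdot}$ denotes the binomial coefficient. This theorem states that the asymptotic form of $\dim V_z$ is $\binom{n}{\ell_z}/n$.
We first rewrite Eq. (B3) as
$$\dim V_z = \frac{1}{n} \left[ \binom{n}{\ell_z} + \sum_{g \neq I} |M_z^g| \right], \tag{B5}$$
where we have used $|M_z^I| = |M_z| = \binom{n}{\ell_z}$. Therefore, $F_z$ is reduced to
$$F_z = 1 + \binom{n}{\ell_z}^{-1} \sum_{g \neq I} |M_z^g|. \tag{B6}$$
We will evaluate the second term in this equation.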
To calculate $|M_z^g|$, we define the order of $g \in G_n$, $\mathrm{ord}(g)$, as the number of elements in the subgroup generated by $g$ (i.e., $\{g^0, g^1, g^2, \ldots, g^{k-1}\}$ with $g^k = I$). Note that $\mathrm{ord}(g)$ is a divisor of $n$. Thereby, $|M_z^g|$ is written as follows:
$$|M_z^g| = \begin{cases} \dbinom{n/\mathrm{ord}(g)}{\ell_z/\mathrm{ord}(g)} & \ell_z/\mathrm{ord}(g) \in \mathbb{Z}, \\ 0 & \text{otherwise}. \end{cases} \tag{B7}$$
Figure 10 shows a graphical description of Eq. (B7) as an example for $n = 12$, $\ell_z = 4$, and $g = T^3$. Based on Eq. (B7), one can straightforwardly show that the second term in Eq. (B6) vanishes in the limit of $n \to \infty$ for $\ell_z = 1, 2$ by noticing that $\mathrm{ord}(g) = 1$ only for $g = I$ and $\mathrm{ord}(g) = 2$ only for $g = T^{n/2}$. Thus, we focus on $3 \le \ell_z \le \lfloor n/2 \rfloor$. Because of $\mathrm{ord}(g) \ge 2$ for $g \neq I$, we have
Finally, we remark that this discussion is approximately valid for large but finite $n$, while this appendix considers the limit of $n \to \infty$. In fact, in Sec. V C, we have observed the clear $O(n)$ scaling for $n = 18$ at the beginning of training, where the output state is almost random.
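The counting in this appendix can be checked numerically for small $n$. The sketch below evaluates the Burnside sum using Eq. (B7), with $\mathrm{ord}(T^k) = n/\gcd(n, k)$, and compares it against a brute-force enumeration of bit strings up to cyclic shifts. The function names are hypothetical, and `l` stands for $\ell_z$.

```python
from itertools import combinations
from math import comb, gcd

def dim_vz_burnside(n, l):
    """dim V_z from Burnside's lemma, with |M_z^g| taken from Eq. (B7)."""
    total = 0
    for k in range(n):                  # group element g = T^k
        order = n // gcd(n, k)          # ord(T^k); gcd(n, 0) = n gives ord(I) = 1
        if l % order == 0:
            total += comb(n // order, l // order)
    return total // n

def dim_vz_bruteforce(n, l):
    """Count orbits of n-bit strings with l ones under cyclic shifts."""
    orbits = set()
    for ones in combinations(range(n), l):
        bits = [0] * n
        for i in ones:
            bits[i] = 1
        # the smallest rotation serves as the canonical orbit representative
        orbits.add(min(tuple(bits[s:] + bits[:s]) for s in range(n)))
    return len(orbits)

print(dim_vz_burnside(12, 4), dim_vz_bruteforce(12, 4))  # 43 43
print(40 * dim_vz_burnside(40, 20) / comb(40, 20))        # ratio n * dim V_z / C(n, l)
```

For $n = 12$ and $\ell_z = 4$, the elements $g = T^3$ and $g = T^9$ each contribute $\binom{3}{1} = 3$ configurations, matching the example of Fig. 10, and the last ratio is already very close to one at $n = 40$.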

FIG. 1. Basic structures of (a) conventional and (b) sp-QCNNs. In the figure, C, P, and FC represent convolutional, pooling, and fully-connected layers, respectively. (a) In the conventional QCNN, some qubits are discarded at each pooling layer, and only one of the remaining qubits is measured in the end to classify the quantum data. (b) In the sp-QCNN, the translational symmetry of data is used as prior knowledge to design an efficient QML model. The circuit of the sp-QCNN (the left circuit) consists of translationally symmetric layers and splitting structures, allowing us to substantially parallelize the nonsplitting QCNN (the right circuit) to improve the measurement efficiency.
FIG. 3. (a) An illustration of translationally symmetric circuit splitting. In this circuit, the entire circuit structure is invariant under the translation operation. (b) A specific circuit-splitting method. We first divide the qubits into $q$ miniblocks consisting of $p$ qubits and split the circuit such that the $j$th qubit of the $i$th miniblock is connected to the $i$th qubit of the $j$th branch.

FIG. 4. Mechanism of parallelization in the sp-QCNN. (a) In the sp-QCNN, the expectation value of a local observable is equivalent for all the qubits. This can be proved by virtually translating the entire circuit. The translation does not change the input state and quantum circuit due to their translational symmetry but shifts the position of the measured qubit, showing the equivalence of expectation values at different qubits. (b) The gradient measurement can be parallelized in the sp-QCNN. In accordance with the chain rule, the gradient is the sum of several derivatives, $\partial\langle Z_1\rangle/\partial\theta = \sum_j \partial\langle Z_1\rangle/\partial\theta_j$. For example, we suppose that the parameter $\theta$ is in the first convolutional layer as shown in the figure (the red boxes denote $\partial/\partial\theta_2$ and $\partial/\partial\theta_1$). Then translating the circuit proves $\partial\langle Z_1\rangle/\partial\theta_j = \partial\langle Z_{j-2}\rangle/\partial\theta_1$ and thus $\partial\langle Z_1\rangle/\partial\theta = \sum_j \partial\langle Z_j\rangle/\partial\theta_1$, which can be computed with only two circuits by measuring all the qubits.
FIG. 5. (a) Quantification of measurement efficiency. In actual experiments, statistical errors arise in estimating the expectation value of an observable. This figure shows the probability distribution of the estimated expectation value. Here, we define the relative measurement efficiency $r$ as the ratio of the variances in the sp-QCNN and the nonsplitting QCNN. (b) Number of eigenstates of $Z_1$ and $Z_{\mathrm{avg}}$ with an eigenvalue $s$. While the possible measurement outcome is $\pm 1$ in the nonsplitting QCNN (left panel), it is widely distributed in the range of $-1$ to $1$ with a width of $O(1/\sqrt{n})$ in the sp-QCNN (right panel).

FIG. 6. (a), (b) Changes in training (top) and test (bottom) loss functions for (a) $n = 8$ and (b) $n = 16$. The orange (blue) solid and dashed lines denote the loss functions with and without statistical errors in the sp-QCNN (conventional QCNN), respectively. The shaded areas are the 10th-90th percentiles of the loss function for 50 sets of random initial parameters at each epoch. We match the number of shots per parameter to obtain the gradient in both QCNNs. (c) Phase diagram predicted by the trained sp-QCNN with statistical errors for $n = 16$ qubits. The color denotes the average magnitude of $\langle Z_{\mathrm{avg}}\rangle$ for 50 sets of initial parameters. The gray dots and dashed green lines denote our training data and phase boundaries computed by DMRG, respectively. In these simulations, we set the depth of each layer as $d = 5$.

FIG. 7. (a) Changes in training and test loss functions for several numbers of qubits $n$. The lines and shaded areas depict the median and the 10th-90th percentiles for 200 sets of random initial parameters, respectively. (b)-(e) Changes in relative measurement efficiency during training. (b)-(d) show the efficiency $r$ for different inputs, SPT, PM, and AFM states, whereas (e) shows the efficiency for measuring the loss gradient by the first parameter. At each epoch, we simulate experiments with 1000 shots 10000 times, estimate $\sigma_0$ and $\sigma_{\mathrm{sp}}$, and calculate the efficiency $r = (\sigma_0/\sigma_{\mathrm{sp}})^2$. The solid lines and shaded areas depict the mean values and standard deviations, respectively, for 200 sets of initial parameters. Except for evaluating the efficiency, we optimize the circuit using the exact expectation value (i.e., without statistical errors).
FIG. 8. (a), (b) Relative measurement efficiency $r$ for varying numbers of qubits $n$ for SPT, PM, AFM, and loss gradient at (a) 0 and (b) 200 epochs in Fig. 7. The four straight lines fit the corresponding types of data points. The error bars denote the standard deviations for 200 sets of initial parameters. (c), (d) Relative measurement efficiency $r$ on the $h_1$-$h_2$ plane after 200 epochs for (c) $n = 8$ and (d) $n = 16$ qubits. The color denotes the magnitude of the relative measurement efficiency. The gray dots and dashed green lines denote our training data and phase boundaries computed by DMRG, respectively.

Figures 7(b)-(e) show the changes in the relative measurement efficiency r during training for the three inputs and loss gradient.For the PM and AFM states [(c) and (d)], the efficiency r is high at the beginning and does not significantly decrease during training.For the SPT state and loss gradient [(b) and (e)], r is initially high but decreases as training, finally converging to a small value (r = 2-5).These results imply that the improve-

Figures 8(a) and (b) show the relative measurement efficiency for varying numbers of qubits $n$ at 0 and 200 epochs for the four cases (cf. Fig. 7). At 0 epochs (a), all the data points are nearly aligned on a straight line. This result supports that the measurement efficiency is improved by a factor of $O(n)$ in the early stage of learning and is consistent with the previous argument based on randomness. Even at 200 epochs (b), we can fit
FIG. 10. Illustration for calculating $|M_z^g|$ with $n = 12$, $\ell_z = 4$, and $g = T^3$. Each white (black) circle indicates a single qubit state of $|0\rangle$ ($|1\rangle$), and $n = 12$ and $\ell_z = 4$ mean that there are eight ($= n - \ell_z$) white and four ($= \ell_z$) black circles in total. Given that the order of $g$ is four ($\because g^4 = I$), we first divide the qubits into four sets, each consisting of three ($= n/\mathrm{ord}(g)$) qubits. In all the sets, the configuration of white and black circles must be the same because of the condition that $g$ does not change the state. Therefore, each set has two white and one ($= \ell_z/\mathrm{ord}(g)$) black circles, and there are three ($= \binom{n/\mathrm{ord}(g)}{\ell_z/\mathrm{ord}(g)}$) possible configurations shown in the figure.