Application of Quantum Machine Learning using the Quantum Kernel Algorithm on High Energy Physics Analysis at the LHC

Quantum machine learning could become a valuable alternative to classical machine learning for applications in High Energy Physics by offering computational speed-ups. In this study, we apply a support vector machine with a quantum kernel estimator (QSVM-Kernel method) to a recent LHC flagship physics analysis: $t\bar{t}H$ (Higgs boson production in association with a top quark pair). In our quantum simulation study using up to 20 qubits and up to 50000 events, the QSVM-Kernel method performs as well as its classical counterparts on three different platforms from Google TensorFlow Quantum, IBM Quantum and Amazon Braket. Additionally, using 15 qubits and 100 events, the application of the QSVM-Kernel method on the IBM superconducting quantum hardware approaches the performance of a noiseless quantum simulator. Our study confirms that the QSVM-Kernel method can use the large dimensionality of the quantum Hilbert space to replace the classical feature space in realistic physics datasets.


I. INTRODUCTION
In 2012, the ATLAS and CMS experiments discovered the Higgs boson [1,2] using proton-proton collision data at the Large Hadron Collider (LHC). This discovery completed the fundamental particle spectrum of the Standard Model and was a major achievement in High Energy Physics (HEP). As the LHC experiments enter the post-Higgs discovery era, physicists strive to refine the understanding of the Standard Model and pursue new physics beyond the Standard Model. Machine learning has become one of the most powerful tools for exploring the full physics potential of the huge amount of data collected by the LHC experiments. In HEP, the important usage of machine learning techniques [3][4][5][6][7] includes simulation, reconstruction and physics analysis. Quantum machine learning, where machine learning is performed using quantum algorithms, has the potential to improve on the computational complexity of classical machine learning algorithms and obtain computational speed-ups when executed on quantum computers [10]. Some of the quantum algorithms will benefit from exponential improvements in speed. For classification problems, it may also lead to better separation power than classical machine learning. A recent result [11] demonstrates a significant advantage in prediction accuracy for a quantum algorithm over some classical algorithms on engineered datasets. A key component of quantum machine learning algorithms is exploiting the high dimensional quantum state space through the actions of superposition and entanglement. With the progress of quantum technologies, quantum machine learning could become a valuable alternative to classical machine learning for processing big data (including simulation, reconstruction and analyses) in High Energy Physics [12]. Recent projections from IBM [13], Google [14], and IonQ [15] suggest that quantum computers with thousands of qubits capable of performing practical computational tasks may become available within the next ten years.
This coincides with the High Luminosity upgrades of the LHC (HL-LHC) [16], bringing a massive increase in not only the amount of collected physics data, but also the computational power needed to process that data. Present research on quantum machine learning algorithms could be implemented on those future devices, thereby ensuring the timely exploitation of quantum advantages for physics applications, and possibly even contributing to the discovery of new physics.
The challenges we face for the future include working with large numbers of events in the millions, applying large numbers of qubits in the thousands to obtain stellar performance, and achieving excellent computational speeds. It is the purpose of this publication to inch one step closer to conquering those challenges, even under the present limitations of existing quantum computer technology.
Previous studies [17][18][19] have investigated quantum annealing or quantum classifiers trained with variational circuits. A quantum machine learning algorithm, the support vector machine with a quantum kernel estimator (QSVM-Kernel), was proposed to solve classification problems [20,21]. This algorithm was experimentally implemented with 2 qubits on a superconducting quantum computer and found to be accurate for artificial datasets [20]. QSVM-Kernel leverages the quantum state space as a direct representation of the feature space, which can give rise to kernel functions that are hard to evaluate classically [20,22]. However, any potential quantum advantage does not lie solely in the high dimensionality of the quantum state space, as it is well known that classical kernels can map into feature spaces of arbitrarily high dimension. Rather, a path towards quantum advantage is found in the computational complexity of the quantum circuits used to compute the quantum kernel. Namely, these circuits must be hard to estimate classically. A recent result [23] establishes an exponential quantum speedup for QSVM-Kernel using a fault-tolerant quantum computer to estimate the kernel function for a classically hard classification problem. Although this particular problem was not practically motivated, the result rigorously formalizes the intuition that quantum feature maps can identify patterns classical machines are unable to capture. From this foundation, quantum feature maps can be designed and tested on practical datasets, potentially leading to better classification results than classical feature maps and kernels.
It is therefore interesting to study the QSVM-Kernel algorithm with a larger number of qubits and evaluate its performance on real-world datasets. In our study, we successfully employ the QSVM-Kernel algorithm in the ttH analysis, a recent LHC flagship physics analysis, using up to 20 qubits on quantum computer simulators and up to 15 qubits on quantum computer hardware. Furthermore, we compare the classification result of the QSVM-Kernel algorithm to a few popular classical algorithms that are commonly used by the LHC experiments.

II. ttH PHYSICS ANALYSIS AT THE LHC
The observation of ttH (Higgs boson production in association with a top quark pair) by the ATLAS and CMS experiments [8,9] was one of the LHC flagship physics results following the Higgs boson discovery. It established a direct observation of the Higgs boson's interaction with the top quark, the heaviest known fundamental particle. Study of the Higgs-top interaction may provide a crucial test of the Higgs mechanism of the Standard Model and essential clues for new physics beyond the Standard Model. Due to the small ttH production rate at the LHC, its observation was highly challenging. To achieve the desired sensitivity to ttH production, the ATLAS and CMS collaborations combined results from a number of decay channels. The physics analyses in many of these channels utilize machine learning techniques. For example, classifiers based on machine learning are constructed to analyze kinematic variables of the collision events and distinguish between signal and background.
In our study, we focus on an important ttH analysis channel where the Higgs boson decays into two photons (H → γγ) and the two top quarks decay into hadrons. The dominant background in this analysis channel is non-resonant two-photon production. See Figure 1 for representative Feynman diagrams for ttH production, H → γγ decay, and non-resonant two-photon production.

III. METHOD
The support vector machine (SVM) [24,25] is one of the most commonly used supervised machine learning algorithms for data classification. Here, for a data event, $\vec{x}$ denotes the vector of its input features and $y \in \{0, 1\}$ denotes its class label (0 for background and 1 for signal). The SVM algorithm maps $\vec{x}$ into a higher dimensional feature space, where it measures the similarity between any two data events (denoted as the "kernel entry", $k(\vec{x}_i, \vec{x}_j)$). The SVM algorithm then optimizes a hyperplane that separates signal events from background events and classifies a new data event $\vec{x}$ by

$$\mathrm{sgn}\left(\sum_{i=1}^{t} \alpha_i y_i k(\vec{x}_i, \vec{x}) + b\right),$$

where sgn is the sign function, $t$ is the size of the training dataset with known labels $y_i$, and $(\alpha_i, b)$ defines the separating hyperplane. Furthermore, a continuous SVM discriminant can be obtained by computing the probabilities of being in the signal class. A main limitation of the classical SVM algorithm is that evaluating kernel entries in a large feature space can be computationally expensive. Three popular classical kernels are considered to benchmark the performance of the classical SVM method in our study: the linear kernel $\vec{x}_i^{T} \vec{x}_j$, the polynomial kernel $(\gamma\, \vec{x}_i^{T} \vec{x}_j)^d$, and the RBF kernel $\exp(-\gamma \|\vec{x}_i - \vec{x}_j\|^2)$ ($\gamma$ and $d$ are hyperparameters).
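As a minimal sketch of the three classical kernels and the sign-based decision rule described above, the following plain-Python functions may help. The function names, the toy values of $\alpha_i$ and $b$, and the encoding of the labels as −1/+1 inside the sum (the usual SVM convention) are assumptions made for illustration; in practice $\alpha_i$ and $b$ come from the SVM optimization.

```python
import math

def linear_kernel(xi, xj):
    """Linear kernel: x_i^T x_j."""
    return sum(a * b for a, b in zip(xi, xj))

def polynomial_kernel(xi, xj, gamma=1.0, d=2):
    """Polynomial kernel: (gamma * x_i^T x_j)^d."""
    return (gamma * linear_kernel(xi, xj)) ** d

def rbf_kernel(xi, xj, gamma=1.0):
    """RBF kernel: exp(-gamma * ||x_i - x_j||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)

def svm_classify(x, train_x, train_y, alpha, b, kernel):
    """Evaluate sgn(sum_i alpha_i * y_i * k(x_i, x) + b).

    Assumption: train_y holds -1 (background) or +1 (signal).
    """
    score = b + sum(
        a * y * kernel(xi, x) for a, y, xi in zip(alpha, train_y, train_x)
    )
    return 1 if score >= 0 else -1

# Toy example with hand-picked (not optimized) alpha and b.
train_x = [[1.0, 0.0], [-1.0, 0.0]]
train_y = [+1, -1]
alpha, b = [0.5, 0.5], 0.0
label = svm_classify([0.8, 0.1], train_x, train_y, alpha, b, linear_kernel)
```

Any function with the same two-argument signature, including a quantum kernel estimate, could be passed as `kernel` without changing the decision rule.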
A. Quantum Kernel Estimation
A quantum version of the SVM with a quantum kernel estimator (QSVM-Kernel) was introduced in Refs. [20,21]. It leverages the quantum state space as a feature space to efficiently compute kernel entries. This algorithm maps the classical data event $\vec{x}$ non-linearly to a quantum state of $N$ qubits by applying a quantum feature map circuit $\mathcal{U}_{\Phi(\vec{x})}$ to the initial state $|0^{\otimes N}\rangle$:

$$|\Phi(\vec{x})\rangle = \mathcal{U}_{\Phi(\vec{x})} |0^{\otimes N}\rangle.$$

It then calculates the kernel entry for data events $\vec{x}_i$ and $\vec{x}_j$ based on the inner product of their quantum states:

$$k(\vec{x}_i, \vec{x}_j) = \left|\langle \Phi(\vec{x}_i) | \Phi(\vec{x}_j) \rangle\right|^2.$$

The kernel entry can be evaluated on a quantum computer by measuring the $\mathcal{U}^{\dagger}_{\Phi(\vec{x}_i)} \mathcal{U}_{\Phi(\vec{x}_j)} |0^{\otimes N}\rangle$ state in the computational basis with repeated measurement shots and recording the probability of collapsing the output into the $|0^{\otimes N}\rangle$ state. In our study, the general design of the quantum circuit for evaluating the kernel entries is inherited from Ref. [20] and shown in Figure 3 (a).
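To make the definition of the kernel entry as a squared state overlap concrete, the sketch below uses a deliberately simple, unentangled toy feature map: each qubit is prepared by a Hadamard gate followed by a z rotation by one input feature. This toy map is an assumption chosen for illustration only. It is not the circuit of Figure 3, and it is easy to evaluate classically, so it illustrates the definition rather than any quantum advantage.

```python
import cmath
import math

def qubit_state(angle):
    """RZ(angle) H |0> = (e^{-i angle/2}|0> + e^{+i angle/2}|1>) / sqrt(2)."""
    s = 1 / math.sqrt(2)
    return (s * cmath.exp(-0.5j * angle), s * cmath.exp(0.5j * angle))

def toy_quantum_kernel(xi, xj):
    """k(x_i, x_j) = |<Phi(x_i)|Phi(x_j)>|^2 for the unentangled toy map.

    For product states the overlap factorises into one factor per qubit.
    """
    overlap = 1 + 0j
    for a, b in zip(xi, xj):
        (a0, a1), (b0, b1) = qubit_state(a), qubit_state(b)
        overlap *= a0.conjugate() * b0 + a1.conjugate() * b1
    return abs(overlap) ** 2

k_same = toy_quantum_kernel([0.3, -0.7], [0.3, -0.7])  # identical events
k_diff = toy_quantum_kernel([0.3, -0.7], [1.1, 0.2])   # different events
```

For this toy map the kernel reduces to a product of $\cos^2((x_{i,k} - x_{j,k})/2)$ factors, so identical events give a kernel entry of 1 and the entry decreases as events move apart.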

B. Quantum Feature Map
As suggested in Ref. [20], the mathematical properties of the quantum feature map should be complex enough to be hard to simulate on a classical computer, and simple enough to be executable on noisy intermediate-scale quantum computers. Furthermore, the classification performance of the QSVM-Kernel method is highly dependent on the kernel matrix, which is calculated from the quantum states of the feature map. Therefore, choosing a suitable feature map is essential for optimizing the classification performance of the QSVM-Kernel method.
The quantum feature map $\mathcal{U}_{\Phi(\vec{x})}$ gives rise to a $2^N$ dimensional feature space ($N$ is the number of qubits) that is conjectured to be hard to estimate classically [22]. Following Ref. [20], the quantum feature map in our study has two repeated layers:

$$\mathcal{U}_{\Phi(\vec{x})} = U_{\Phi(\vec{x})} H^{\otimes N} U_{\Phi(\vec{x})} H^{\otimes N},$$

where $H$ is a Hadamard gate and $U_{\Phi(\vec{x})}$ is a unitary operator that encodes data from a classical event in its parameters. We analyzed various quantum circuit candidates for the unitary operator $U_{\Phi(\vec{x})}$ and found that the quantum circuit shown in Figure 3 (b) performs the best for our ttH physics analysis. The chosen quantum circuit consists of single-qubit rotation gates (A, B and A′), as well as two-qubit CNOT entangling gates. On the $k$th qubit, given an input feature vector $\vec{x}$, an A gate rotates the qubit around the z axis of the Bloch sphere by $x_k$ (the $k$th element of $\vec{x}$), a B gate rotates the qubit around the y axis by $x_k^d$, and an A′ gate rotates the qubit around the z axis by (…). The entangling operations are arranged in an alternating pattern to yield short-depth circuits for execution on noisy intermediate-scale quantum computers. The differences in the quantum feature map between here and Ref. [20] are mainly the use of B gates, the different parameterization of rotation angles, and the extension to more qubits.
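The two-layer structure (Hadamards, then a data-encoding unitary, repeated twice) can be sketched with a small noiseless state-vector simulation in plain Python. The encoding block used here, a z rotation and a y rotation by $x_k$ on each qubit followed by a linear chain of CNOT gates, is a hypothetical stand-in: the exact gate angles and the alternating entangling pattern of Figure 3 (b) are not reproduced, and all function names are illustrative.

```python
import cmath
import math

S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]  # Hadamard gate

def rz(theta):
    """Rotation around the z axis of the Bloch sphere."""
    return [[cmath.exp(-0.5j * theta), 0], [0, cmath.exp(0.5j * theta)]]

def ry(theta):
    """Rotation around the y axis of the Bloch sphere."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

def apply_1q(state, gate, q, n):
    """Apply a 2x2 gate to qubit q of an n-qubit state vector."""
    new, bit = list(state), 1 << q
    for i in range(1 << n):
        if not i & bit:
            a0, a1 = state[i], state[i | bit]
            new[i] = gate[0][0] * a0 + gate[0][1] * a1
            new[i | bit] = gate[1][0] * a0 + gate[1][1] * a1
    return new

def apply_cnot(state, control, target, n):
    """Swap the target-bit amplitudes wherever the control bit is 1."""
    new, c, t = list(state), 1 << control, 1 << target
    for i in range(1 << n):
        if i & c and not i & t:
            new[i], new[i | t] = state[i | t], state[i]
    return new

def feature_state(x):
    """|Phi(x)> for two repeated layers of (Hadamards, encoding block)."""
    n = len(x)
    state = [0j] * (1 << n)
    state[0] = 1 + 0j
    for _ in range(2):
        for q in range(n):
            state = apply_1q(state, H, q, n)
        for q in range(n):  # hypothetical single-qubit encoding
            state = apply_1q(state, rz(x[q]), q, n)
            state = apply_1q(state, ry(x[q]), q, n)
        for q in range(n - 1):  # hypothetical linear CNOT chain
            state = apply_cnot(state, q, q + 1, n)
    return state

def kernel_entry(xi, xj):
    """k(x_i, x_j) = |<Phi(x_i)|Phi(x_j)>|^2 from the two state vectors."""
    si, sj = feature_state(xi), feature_state(xj)
    return abs(sum(a.conjugate() * b for a, b in zip(si, sj))) ** 2
```

The state vector has $2^N$ amplitudes, so the memory and time of such a simulation grow exponentially with the number of qubits, which is why larger qubit counts become computationally demanding on classical simulators.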

C. Separating Hyperplane
In the training phase, the kernel entries are evaluated for all data event pairs of the training sample and then used to find a separating hyperplane. In the testing phase, the kernel entries are evaluated between a new data event $\vec{x}$ and each of the data events from the training sample, and are then used to classify the new data event $\vec{x}$ according to the separating hyperplane. For both phases, quantum computers are only used to evaluate the kernel entries. Using these kernel entries, the optimization of the separating hyperplane and the classification of the new data event are done on classical computers, as for a classical SVM.

D. Analysis dataset
In our study for the ttH analysis, the signal and dominant background processes are generated using Madgraph5 aMC@NLO [26] and Pythia6 [27], and simulated using Delphes [28]. To construct classifiers for the physics processes, we utilize a total of 23 object-based kinematic variables based on the ATLAS analysis [8]: the transverse momentum (divided by the photon pair invariant mass) and pseudo-rapidity of the two leading photons, the transverse momentum, pseudo-rapidity and b-tagging status of up to six leading jets, as well as the missing transverse momentum. The signal and background processes differ in the distributions of these variables, providing discriminating power for the machine learning algorithms. Examples of the most powerful variable distributions are shown in Figure 2. To match the $N$ qubits used by the QSVM-Kernel algorithm, the 23 kinematic variables are compressed into $N$ variables, using a Principal Component Analysis (PCA) method [29,30]. In the case of 15 qubits, for example, each of the 15 variables is formed by combining the 23 original variables. Afterwards, the $N$ variables are rescaled by

$$x_i \rightarrow -1 + 2\,\frac{x_i - x_{i,\min}}{x_{i,\max} - x_{i,\min}},$$

where $x_{i,\min}$ ($x_{i,\max}$) is the minimal (maximal) value of the variable $x_i$, so that the variable values range from −1 to +1. This ensures that the rotation angles of the A, B and A′ gates are within [−1, +1], which is found to perform slightly better for the ttH analysis than the [−π, +π] range used in Ref. [20].
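The rescaling step can be sketched in a few lines of plain Python. The function name is hypothetical, the input is assumed to be the already PCA-compressed events (one list of $N$ values per event), and each variable is assumed to take at least two distinct values so that the denominator is non-zero.

```python
def rescale_to_unit_range(events):
    """Linearly map each variable onto [-1, +1] using its min and max.

    events: list of events, each a list of N (PCA-compressed) variables.
    Assumes max > min for every variable.
    """
    n_vars = len(events[0])
    lo = [min(e[k] for e in events) for k in range(n_vars)]
    hi = [max(e[k] for e in events) for k in range(n_vars)]
    return [
        [-1.0 + 2.0 * (e[k] - lo[k]) / (hi[k] - lo[k]) for k in range(n_vars)]
        for e in events
    ]

scaled = rescale_to_unit_range([[0.0, 10.0], [5.0, 20.0], [10.0, 40.0]])
```

In this toy input the first variable maps to −1, 0 and +1, and the second to −1, −1/3 and +1, so every rotation angle derived directly from a rescaled variable stays within [−1, +1].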

IV. RESULTS
A. Results from Quantum Computer Simulators
To classify the signal and dominant background processes for the ttH analysis, we employ the QSVM-Kernel algorithm using up to 20 qubits on the qsim Simulator from the Google TensorFlow Quantum framework [31], the StatevectorSimulator from the IBM Quantum framework [32] and the Local Simulator from the Amazon Braket framework [33]. From now on, they are referred to as the Google framework, IBM framework and Amazon framework. All three simulators model the noiseless execution of quantum computer hardware and evaluate the resulting quantum state vector. They represent ideal quantum hardware that can perform infinite measurement shots and experiences no hardware device noise. With training variables processed by PCA, we perform our analysis for a number of dataset sizes. For a given dataset size, we prepare 60 statistically independent datasets to reduce the impact of statistical fluctuations. Each of the datasets consists of two samples of the same size: a training sample and a test sample. (In this study, a dataset of size n indicates a training sample of n events and a test sample of n events.) We have overcome the challenges of intensive computing resources needed for processing the datasets of up to 50000 events on the quantum computer simulators. Using the training sample, we adopt a cross-validation procedure [34,35] to tune the SVM regularization hyperparameter that controls the size of the margin between the separating hyperplane and the data points in the feature space. With the same datasets and the same training variables, we also construct a classical SVM [24,25] classifier using the scikit-learn package [36] and a classical BDT [37,38] classifier using the XGBoost package [39]. The classical SVM and BDT serve as benchmarks for classical machine learning algorithms. We again perform cross-validation on the training sample to tune the hyperparameters of the two classical algorithms.
For the classical SVM, we optimize the choice of the classical kernel, the unique hyperparameters for each kernel, and the SVM regularization hyperparameter. The optimized hyperparameters for the classical BDT include the maximum tree depth and the learning rate. The other BDT hyperparameters were found to be irrelevant to our study.
To study the discrimination power of each classifier, we produce Receiver Operating Characteristic (ROC) curves that plot background rejection versus signal efficiency, as well as areas under the ROC curves (AUCs). ROC curves and AUCs are standard metrics in machine learning applications. The use of ROC curves and AUCs in this study is inspired by Ref. [17]. We first show the ROC curves of various classifiers using the ttH analysis datasets of 20000 events and 15 input variables in Figure 4. Each curve represents results averaged over 60 statistically independent datasets. Figure 4 (a) overlays the results of the QSVM-Kernel algorithm (from the Google framework), the classical SVM algorithm and the classical BDT algorithm. Figure 4 (b) overlays the QSVM-Kernel results from the Google framework, IBM framework and Amazon framework. Here the QSVM-Kernel classifiers employ 15 qubits on the quantum simulators. We observe that, for these ttH analysis datasets, the QSVM-Kernel performances are similar to the performances given by the two commonly used classical machine learning algorithms. Furthermore, the three quantum computer simulators, from the Google framework, IBM framework and Amazon framework, provide identical classification performances using the QSVM-Kernel algorithm.
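For readers less familiar with these metrics, the following plain-Python sketch computes the points of a ROC curve in the convention used here (background rejection versus signal efficiency) and the AUC. The function names are illustrative, and the AUC is computed with the pairwise-ranking formula, which equals the area under the ROC curve. Its quadratic cost makes it suitable only for small samples.

```python
def roc_points(signal_scores, background_scores):
    """(signal efficiency, background rejection) for each score threshold."""
    thresholds = sorted(set(signal_scores) | set(background_scores))
    points = []
    for thr in thresholds:
        eff = sum(s >= thr for s in signal_scores) / len(signal_scores)
        rej = sum(b < thr for b in background_scores) / len(background_scores)
        points.append((eff, rej))
    return points

def auc(signal_scores, background_scores):
    """Probability that a random signal event outscores a random background
    event (ties count one half); equal to the area under the ROC curve."""
    wins = 0.0
    for s in signal_scores:
        for b in background_scores:
            wins += 1.0 if s > b else 0.5 if s == b else 0.0
    return wins / (len(signal_scores) * len(background_scores))

toy_auc = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])  # 8 of 9 pairs ordered correctly
```

An AUC of 0.5 corresponds to a classifier with no discrimination power and an AUC of 1 to perfect separation of signal and background.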
Based on the classifier discriminant, we can perform an event selection that maximizes $S/\sqrt{B}$, where $S$ is the number of selected signal events and $B$ is the number of selected background events. $S/\sqrt{B}$ is an approximation of the statistical significance of the signal process and is usually correlated with the AUC of the classifier. For the above-mentioned QSVM-Kernel classifier in the ttH analysis, a selection with a signal acceptance of 70% rejects approximately 92% of background events and hence improves $S/\sqrt{B}$ by around 150% with respect to no selection.
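The quoted improvement can be checked with a short calculation, taking the figure of merit to be the number of signal events divided by the square root of the number of background events. A selection that keeps a fraction $\epsilon_S$ of the signal and rejects a fraction $r_B$ of the background scales this ratio by $\epsilon_S / \sqrt{1 - r_B}$. The function name below is illustrative.

```python
import math

def significance_gain(signal_acceptance, background_rejection):
    """Factor by which S/sqrt(B) changes after the selection."""
    return signal_acceptance / math.sqrt(1.0 - background_rejection)

gain = significance_gain(0.70, 0.92)
print(f"S/sqrt(B) scales by {gain:.2f}, an improvement of about {100 * (gain - 1):.0f}%")
```

With a signal acceptance of 70% and a background rejection of 92%, the factor is $0.70/\sqrt{0.08} \approx 2.47$, consistent with an improvement of around 150%.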
Our observation becomes clearer in Figure 5, where we study the AUC for various classifiers as a function of the ttH analysis dataset size (10000 to 50000 events). Figure 5 (a) shows the results of the QSVM-Kernel (from the Google framework), the classical SVM and the classical BDT. Figure 5 (b) further shows the difference between the QSVM-Kernel algorithm and the classical algorithms. Figure 5 (c) shows the QSVM-Kernel results from the Google framework, IBM framework and Amazon framework. Here all the classifiers use the same 15 variables and the QSVM-Kernel classifiers employ 15 qubits on the quantum simulators. The quoted AUCs are averaged over 60 statistically independent datasets and the quoted errors are the standard deviations of the AUCs of the 60 datasets. We find that the performance of all methods improves with increasing dataset size. For 15 qubits and up to 50000 events, the QSVM-Kernel algorithm performs similarly to the classical SVM and classical BDT algorithms. Furthermore, the QSVM-Kernel performances from the three different quantum computer simulators (Google, IBM and Amazon) are comparable.
We also investigate the AUCs of the QSVM-Kernel algorithm as a function of the number of qubits (10 to 20 qubits), as shown in Figure 6. The number of qubits is equal to the number of input variables obtained using PCA as described in Section III. The 60 statistically independent ttH analysis datasets of 20000 events are used in this study. In Figure 6 (a), we compare the results of the QSVM-Kernel (from the Google framework) with the classical SVM and classical BDT using the same input variables. In Figure 6 (b), we further display the difference between the QSVM-Kernel results and the classical machine learning results. In Figure 6 (c), again, we compare the Google framework, IBM framework and Amazon framework for the QSVM-Kernel results. We find that the QSVM-Kernel result with 15 qubits is better than that with 10 qubits and similar to that with 20 qubits. For 10 to 20 qubits and 20000 events, the performance of the QSVM-Kernel algorithm is similar to that of the classical SVM algorithm. Again, the three quantum computer simulators (Google, IBM and Amazon) yield the same classification power.

B. Results from Quantum Computer Hardware
After the studies using simulations of ideal quantum computers, it is of great interest to assess the quantum machine learning performance on today's noisy quantum computer hardware. For the ttH physics analysis, we employ the QSVM-Kernel algorithm on the IBM "ibmq_paris" quantum computer hardware. "ibmq_paris" is a 27-qubit quantum processor based on superconducting electronic circuits. The qubit map of the "ibmq_paris" quantum system [40] is shown in Figure 7. Due to the limited access time available to us, we performed six runs using 15 qubits on "ibmq_paris". Each run processes a statistically independent dataset of 100 events. For these six runs, the average running time on the quantum hardware is approximately 680 minutes. With more advanced quantum hardware in the future, the running time is expected to be significantly reduced. The quantum circuit of the hardware runs is kept the same as for the simulator runs, while the SVM regularization hyperparameter is separately optimized for hardware and simulator runs. To reduce statistical uncertainties in evaluating kernel entries on quantum hardware, we use 8192 measurement shots for every kernel entry.
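The statistical uncertainty from a finite number of measurement shots can be estimated with a simple binomial model, sketched below. This models only the sampling statistics of counting all-zero outcomes; it does not model hardware device noise, and the function names and the example probability of 0.3 are illustrative assumptions.

```python
import math
import random

def shot_uncertainty(p, shots):
    """Binomial standard error of an estimated probability p."""
    return math.sqrt(p * (1.0 - p) / shots)

def estimate_kernel_entry(p_true, shots, rng):
    """Simulate `shots` measurements, each giving the all-zero outcome
    with probability p_true, and return the observed fraction."""
    zeros = sum(rng.random() < p_true for _ in range(shots))
    return zeros / shots

worst_case = shot_uncertainty(0.5, 8192)  # largest error, reached at p = 0.5
estimate = estimate_kernel_entry(0.3, 8192, random.Random(0))
```

With 8192 shots the standard error is at most $0.5/\sqrt{8192} \approx 0.0055$ per kernel entry, so the remaining hardware-simulator differences are more plausibly driven by device noise than by shot statistics.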
In Figure 8, we present the ROC curve of the QSVM-Kernel classifier with the "ibmq_paris" quantum computer hardware using the ttH analysis datasets of 100 events. For comparison, we overlay the ROC curve with the StatevectorSimulator from the IBM Quantum framework using the same datasets. The results are averaged over the six hardware runs. All the QSVM-Kernel classifiers use 15 qubits and the same 15 variables. In Figure 9, we compare the ROC curve with the "ibmq_paris" quantum computer hardware and the ROC curve with the StatevectorSimulator for each of the six hardware runs. With small training samples of 100 events, the performance achieved by the "ibmq_paris" quantum computer hardware is promising and approaches that of the noiseless quantum computer simulator. The difference between the hardware performance and the simulator performance is likely due to the effect of quantum hardware noise and fluctuates among our hardware runs.

Figure 7. The qubit map of the "ibmq_paris" quantum system [40]. The (darker) colors indicate (lower) readout error rates of the qubits and CNOT error rates of the connections. Our study uses qubits 3, 5, 8, 11, 14, 16, 19, 22, 25, 24, 23, 21, 18, 15 and 12.

Figure 9. ROC curve with the "ibmq_paris" quantum computer hardware and ROC curve with the StatevectorSimulator from the IBM Quantum framework for each of the six hardware runs. Each run processes a statistically independent dataset of 100 events. All the QSVM-Kernel classifiers use 15 qubits and the same 15 variables.

V. CONCLUSION
In this study, we have successfully employed the quantum support vector machine kernel (QSVM-Kernel) method in the ttH (Higgs boson production in association with a top quark pair) physics analysis, a recent LHC flagship physics analysis, on gate-model quantum computer simulators and hardware. The simulation study has been performed using the Google TensorFlow Quantum framework, IBM Quantum framework and Amazon Braket framework. We have overcome the challenges of intensive computing resources in the cases of up to 20 qubits and up to 50000 events on the quantum computer simulators, in order to perform quantum machine learning studies on physics datasets that closely resemble those used in the official ATLAS publication [8]. The QSVM-Kernel method achieves good classification performance that is similar to that of the classical machine learning methods currently used in LHC physics analyses, such as the classical SVM and classical BDT. On the "ibmq_paris" superconducting quantum computer hardware, we have also employed the QSVM-Kernel algorithm using 100 events and 15 qubits to assess the effect of quantum hardware noise. The performance achieved on the "ibmq_paris" quantum hardware is promising and approaches the performance of the noiseless quantum simulators.
Our quantum simulation result provides an example in which quantum machine learning performs as well as its classical counterpart using three different platforms (Google, IBM and Amazon) for realistic high energy physics analysis datasets. Furthermore, our result on noisy quantum hardware provides important validation for the result on noiseless quantum simulators. Our studies confirm that the QSVM-Kernel algorithm can use the large dimensionality of the quantum Hilbert space to replace the classical feature space. In the future, large improvements in computational speed and reductions in device noise on quantum computing hardware will likely be achieved and lead to quantum advantage in quantum machine learning applications. With the large investments in quantum computing and fierce competition in technology, this expectation is realistic. Therefore, we predict that quantum machine learning will become a powerful tool for data analysis in High Energy Physics.