Reconstructing charged particle track segments with a quantum-enhanced support vector machine

Reconstructing the trajectories of charged particles from the collection of hits they leave in the detectors of collider experiments like those at the Large Hadron Collider (LHC) is a challenging combinatorics problem and computationally intensive. The ten-fold increase in the delivered luminosity at the upgraded High Luminosity LHC will result in a very densely populated detector environment. The time taken by conventional techniques for reconstructing particle tracks scales worse than quadratically with track density. Accurately and efficiently assigning the collection of hits left in the tracking detector to the correct particle will be a computational bottleneck and has motivated studying possible alternative approaches. This paper presents a quantum-enhanced machine learning algorithm that uses a support vector machine (SVM) with a quantum-estimated kernel to classify a set of three hits (triplets) as either belonging to or not belonging to the same particle track. The performance of the algorithm is then compared to a fully classical SVM. The quantum algorithm shows an improvement in accuracy versus the classical algorithm for the innermost layers of the detector that are expected to be important for the initial seeding step of track reconstruction.


INTRODUCTION
The Large Hadron Collider (LHC) is currently the highest energy particle collider in the world. It accelerates beams of protons to almost the speed of light and then collides them at a centre-of-mass energy of 13.6 TeV at the centre of large, multi-purpose particle detectors that are designed to reconstruct the outcome of those collisions. Among the key physics objectives of the LHC are precise measurements of the properties of the Higgs boson, shedding light on the elusive particle(s) that may constitute dark matter, and searching for a wide breadth of new physics phenomena beyond the Standard Model (SM) via exotic decay signatures like long-lived particles.
To attain these physics goals, the LHC is preparing for an upgrade that will deliver an order of magnitude more data to the experiments by increasing the intensity of the proton beams, resulting in a higher instantaneous luminosity and thus many more collisions taking place every time the proton bunches cross [1]. At this upgraded High Luminosity LHC (HL-LHC) the number of concurrent, overlapping proton-proton interactions (pileup) is expected to reach up to 200, a significant increase from the current average pileup of 40. Such a step change in the running conditions of the collider will significantly increase our capabilities to fulfil the goals of the LHC programme. However, it also presents challenges. The significant increase in detector occupancy will impact the performance of the entire pipeline, including data acquisition, processing, and analysis, as well as simulating the collisions in the detector. This places a significant overhead on the computational resources, with some elements, such as reconstructing charged particle trajectories, becoming a major bottleneck.
To address these high demands on the computational resources, numerous approaches are being pursued, ranging from the development of more efficient algorithms and the application of state-of-the-art machine learning techniques to the use of graphics processing units (GPUs) [2,3] to execute code that is parallelizable. One of the intriguing new avenues being pursued to tackle these challenges is quantum computing. This new paradigm offers a fundamentally new form of computing by leveraging the phenomena of quantum mechanics and opens the prospect of significantly speeding up our current algorithms and performing calculations that could only be done to some approximation with classical computers.
Particle physics has seen a surge of interest in ascertaining how quantum computers may impact the future of the field and establishing the scenarios in which they may be most advantageous. The current Noisy Intermediate Scale Quantum (NISQ) devices [4], while a stepping stone on the way to universal, fault-tolerant quantum computers, have enabled many of these proof-of-principle studies to be performed. This exploratory phase of applying current NISQ era quantum computers to challenging problems in particle physics will pave the way for the emergence of new ideas and techniques needed to fully exploit quantum computation and identify the specific problems for which they are most suitable.
Quantum computing algorithms have been studied for a range of different scenarios in high energy physics. The calculation of simple scattering processes via the helicity spinor formalism and the simulation of a parton shower was demonstrated in [5]. A quantum walk framework was proposed in [6], demonstrating that the parton shower is more naturally and efficiently simulated using a quantum walk in two dimensions. Quantum computing has also been applied to jet clustering [7][8][9], classification of collisions of interest from those that are not [10][11][12], and anomaly detection in searches for new physics [13].
The challenging task of connecting the hits left by charged particles in the tracking detector and associating them with the same particle has been studied from several different perspectives including: quantum associative memory to store all the different track patterns and subsequently employ Grover's search algorithm to search through the database and recall the right track pattern [14]; quantum graph neural networks [15,16]; and quantum annealing devices to minimise an objective function [17][18][19].
This paper approaches the problem of track reconstruction by proposing a hybrid quantum-classical algorithm that uses a support vector machine with a quantum-estimated kernel. The problem is decomposed into that of classifying short segments of tracks. Often such segments can form the 'seeds' for extrapolating to the full trajectory of the track. This 'seeding' step is expected to be a large consumer of CPU time at the HL-LHC [2]. Simplifications are implemented to fit the limitations of the presently available quantum simulators.

DATA PREPROCESSING
This study utilises the TrackML dataset [20,21], which has been widely used for proof-of-principle studies of classical machine learning algorithms and quantum-based approaches. The dataset provides a simplified simulation of the detector geometry and conditions expected at the HL-LHC. It features a silicon tracking detector with 10 cylindrical layers in the central region and disk geometry in the forward regions, which is broadly representative of the ATLAS [22] and CMS [23] detectors. The detector is segmented into three sub-detectors differing in their spatial resolution, with the inner pixel detector comprising 4 layers, followed by a short strip detector of 4 layers and then a 2-layer long strip detector. These tracking detectors are immersed in a strong magnetic field aligned with the direction of the proton-proton beam, so charged particles moving through these detectors will typically follow an approximately helical trajectory and show curved trajectories in the transverse x-y plane, the plane perpendicular to the beam line. Figure 1 shows the layout of this virtual detector used to produce the TrackML dataset and the coverage of each sub-detector in the r-z plane, where r is the radial dimension and measures the distance from the beam line and z is the distance along the beam line. For the analysis in this paper, only the hits in the barrel region of the detector are used to reduce the total number of hits to a level that can be processed within the current computational constraints.
FIG. 1. A schematic of the virtual general-purpose detector simulated in the TrackML challenge and the coverage of each sub-detector in the r-z plane. Highlighted is the barrel region used in the analysis. The numbers indicate the various detector components and layers respectively. Original image is taken from [20].
The TrackML dataset contains 10,000 simulated events. The process of interest is top-antitop production and overlaid on this 'signal' are 200 additional proton-proton collisions to simulate the conditions expected at the HL-LHC. This results in an average of 100,000 hits per event in the tracking detector which must be associated with approximately 10,000 tracks.
The 3-dimensional spatial information for every hit in the detector is provided and this information is used to build the track candidates. The total number of possible combinations of those hits that can lead to a track is very large. Identifying the correct combination of hits that reconstruct the true trajectory of a particle is thus a challenging combinatorics problem. Figure 2 illustrates this by showing all hits for an event in the x-y plane of the detector and a fraction of true reconstructed tracks formed from a combination of those hits. To avoid unphysical hit sequences which would dominate the resulting dataset, selection criteria are applied to reduce the number of possible connections between hits in each event such that they can be processed without overburdening the computational resources. In addition, the problem is formulated as a classification task, with track segments consisting of a set of 3 hits in adjacent layers of the detector, called triplets, being classified as belonging to a single particle track or not.
The hits are described by three coordinates: r, φ, and z, where φ is defined as the angle around the z axis. A total of 300 events have been processed for classification. The first step in constructing the triplets is to make a dataset of doublets, which are defined as two consecutive hits in the detector. Selection criteria are applied to reduce the size of the doublet dataset and improve its quality. The following observables are used in the selection: the intercept from the extrapolation of the doublet to the z axis, z_0, and the ratio Δφ/Δr, as calculated from the difference in φ and r between the two hits forming the doublet. This selection is summarised in Table I, and was originally implemented in [24].
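The doublet observables described above can be sketched in a few lines. This is only an illustration, not the implementation of [24]: the function names are hypothetical, the hits are assumed to be (r, φ, z) tuples in mm and rad, the default thresholds are an assumed reading of Table I (6 × 10^-4 rad/mm and 100 mm), and taking the absolute value of Δφ/Δr is also an assumption.

```python
import math

def wrap_angle(dphi):
    """Map an angle difference into the interval [-pi, pi)."""
    return (dphi + math.pi) % (2.0 * math.pi) - math.pi

def doublet_observables(hit_in, hit_out):
    """Return (dphi/dr, z0) for two hits given as (r, phi, z) tuples."""
    r1, phi1, z1 = hit_in
    r2, phi2, z2 = hit_out
    dr = r2 - r1
    dphi = wrap_angle(phi2 - phi1)
    # Straight-line extrapolation in the r-z plane back to the beam line (r = 0).
    z0 = z1 - r1 * (z2 - z1) / dr
    return dphi / dr, z0

def passes_doublet_selection(hit_in, hit_out,
                             max_dphi_dr=6e-4,   # rad/mm, assumed reading of Table I
                             max_abs_z0=100.0):  # mm, assumed reading of Table I
    """Apply the two doublet cuts; the outer hit must sit at larger radius."""
    if hit_out[0] <= hit_in[0]:
        return False
    slope, z0 = doublet_observables(hit_in, hit_out)
    return abs(slope) <= max_dphi_dr and abs(z0) <= max_abs_z0
```

A doublet pointing back to the interaction region with a small azimuthal slope passes, while one extrapolating far from z = 0 is rejected.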
The selection of triplets is based on the estimation of the transverse momentum (p_T) as determined from the three hits, the θ-breaking angle and the φ-breaking angle. The angle θ is defined in the r-z plane, and a breaking angle is the angle between the straight lines of the two doublets that form a triplet, where each line connects the two hits of a doublet. The triplet selection is summarised in Table II.
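One possible way to compute the triplet observables is sketched below. Several choices here are assumptions rather than details given in the text: the φ-breaking angle is taken as the change in direction of the doublet lines in the x-y plane, p_T is estimated from the radius of the circle through the three hits using p_T [GeV] ≈ 0.3 · B [T] · R [m], and the field strength `b_field_tesla` defaults to a hypothetical 2 T. Hits are (r, φ, z) tuples in mm and rad.

```python
import math

def to_xy(hit):
    """Convert an (r, phi, z) hit to transverse Cartesian coordinates."""
    r, phi, _ = hit
    return r * math.cos(phi), r * math.sin(phi)

def wrap_angle(angle):
    """Map an angle difference into the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi

def breaking_angles(h1, h2, h3):
    """Return (theta_break, phi_break) in rad between doublets (h1,h2) and (h2,h3)."""
    # Direction of each doublet line in the r-z plane.
    theta12 = math.atan2(h2[0] - h1[0], h2[2] - h1[2])
    theta23 = math.atan2(h3[0] - h2[0], h3[2] - h2[2])
    # Direction of each doublet line in the x-y plane.
    (x1, y1), (x2, y2), (x3, y3) = to_xy(h1), to_xy(h2), to_xy(h3)
    dir12 = math.atan2(y2 - y1, x2 - x1)
    dir23 = math.atan2(y3 - y2, x3 - x2)
    return abs(wrap_angle(theta23 - theta12)), abs(wrap_angle(dir23 - dir12))

def estimate_pt(h1, h2, h3, b_field_tesla=2.0):
    """Estimate pT in GeV from the circumradius of the three hits in x-y."""
    (x1, y1), (x2, y2), (x3, y3) = to_xy(h1), to_xy(h2), to_xy(h3)
    a = math.hypot(x2 - x1, y2 - y1)
    b = math.hypot(x3 - x2, y3 - y2)
    c = math.hypot(x3 - x1, y3 - y1)
    twice_area = abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))
    if twice_area == 0.0:
        return math.inf  # collinear hits: no measurable curvature
    radius_mm = a * b * c / (2.0 * twice_area)  # R = abc / (4 * area)
    return 0.3 * b_field_tesla * radius_mm * 1e-3
```

The cuts of Table II would then be applied to the three returned quantities.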

SUPPORT VECTOR MACHINE
The proposed algorithm utilises a support vector machine (SVM) [25], where a kernel function is calculated either on a (simulated) quantum or a classical computer. A support vector machine is a supervised machine learning algorithm that classifies data by drawing linear decision boundaries (hyperplanes) between different groups of data. This paper focuses on discriminating between two classes of data. The SVM takes a training dataset of size N of the form $(x_1, y_1), \ldots, (x_N, y_N)$, where $x_i$ is an M-dimensional vector and $y_i = \pm 1$ labels which of the two classes the data point belongs to. The hyperplane is defined by $w \cdot x + b = 0$, where w is the normal vector to the hyperplane and b is an offset. These parameters are determined during the learning process. For the simple case of linearly-separable data, the training points $x_i$ of the two classes are placed on either side of the decision boundary, satisfying $f(x_i) = \mathrm{sign}(w \cdot x_i + b) = y_i$, where f(x) is called the decision function. The points closest to the hyperplane are called support vectors and the distance between them and the hyperplane is called the margin. The goal is to optimise the parameters of the hyperplane such that the margin is maximised. Figure 3 shows a visual representation of this. Once the hyperplane has been found, a previously unseen data point z can be classified using the decision function.
The decision boundary is usually defined not in the original data space but in a higher-dimensional feature space obtained with a feature map φ(x). This can introduce non-linearity whilst keeping the decision boundary linear. The goal of this operation is to achieve better separation of the two classes. Figure 4 shows a simple example of a feature map's functionality. SVMs are an example of a kernel method, where the kernel $k(x, y) = \phi(x) \cdot \phi(y)$ is a function with arguments in the original space of the data, defining a distance measure between two points in the feature space. The remarkable property of this function is that it returns the inner product in the feature space, sidestepping the explicit application of the feature map, which can become computationally expensive for sophisticated feature spaces. In support vector machines, this property can be utilised to find the separation hyperplane. This is possible because linear learning machines can be expressed in a dual representation, following the Karush-Kuhn-Tucker theory [26]. During the optimisation of the dual problem, one needs to find a kernel matrix $K_{x,y} = k(x, y)$ (an N × N symmetric matrix) from all pairs of training data points. Expressed in its dual form, the decision function becomes:
$$f(z) = \mathrm{sign}\left(\sum_{i=1}^{N} \alpha_i y_i k(x_i, z) + b\right),$$
where $\alpha_i$ are the coefficients which need to be optimised. Just like quantum computing, kernel methods perform implicit computations in a possibly intractably-large Hilbert space through the efficient manipulation of data inputs.
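The kernel property described above can be checked numerically on a toy example. The sketch below is not part of the paper's pipeline: it uses a hypothetical degree-2 polynomial feature map on two-dimensional inputs to show that the kernel evaluated in the original space equals the inner product in the feature space, and evaluates a dual-form decision function with hand-picked (not trained) coefficients.

```python
import math

def dot(a, b):
    """Euclidean inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def feature_map(x):
    """Explicit degree-2 polynomial feature map for a 2D input (toy choice)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def poly_kernel(x, z):
    """k(x, z) = (x . z)^2, evaluated without building the feature vectors."""
    return dot(x, z) ** 2

def kernel_matrix(points, kernel):
    """N x N matrix of kernel values for all pairs of training points."""
    return [[kernel(xi, xj) for xj in points] for xi in points]

def decision_function(z, train_x, train_y, alphas, b, kernel):
    """Dual-form SVM decision: sign(sum_i alpha_i y_i k(x_i, z) + b)."""
    score = sum(a * y * kernel(x, z)
                for a, y, x in zip(alphas, train_y, train_x)) + b
    return 1 if score >= 0 else -1
```

For example, with `train_x = [(2.0, 0.0), (0.0, 1.0)]`, `train_y = [1, -1]`, `alphas = [1.0, 1.0]` and `b = 0.0`, the point `(1.0, 0.0)` is assigned to class +1 and `(0.0, 3.0)` to class -1.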

QUANTUM KERNEL ESTIMATION
Quantum computers can be utilised in kernel methods if one considers a quantum circuit U(x) whose gates are parametrised by the original features of some classical data. The result of such a circuit before measurement is a quantum state which exists in a higher-dimensional Hilbert space. This is equivalent to a feature map. The quantum state is defined as [27]:
$$x \rightarrow |\phi(x)\rangle\langle\phi(x)| = \rho(x),$$
where $|\cdot\rangle$ denotes the usual Dirac vector and ρ(x) is obtained via
$$\rho(x) = U(x)\,\rho_0\,U^{\dagger}(x),$$
with an initial state $\rho_0$. An all-zero initial state is used with $|\psi_0\rangle = |0\rangle^{\otimes M}$. The kernel associated with such a feature map is obtained from [28]:
$$k(x, z) = |\langle\phi(z)|\phi(x)\rangle|^2.$$
This inner product can be calculated from the transition amplitude of two states:
$$|\langle\phi(z)|\phi(x)\rangle|^2 = |\langle 0^{\otimes M}|U^{\dagger}(z)U(x)|0^{\otimes M}\rangle|^2.$$
The circuit $U^{\dagger}(z)U(x)|0\rangle^{\otimes M}$ is run repeatedly over R identical runs (shots). The fraction of measurements yielding an all-zero output gives an estimation of the kernel function for the two points x and z, which forms an entry in the kernel matrix. Repeated evaluations for all combinations of the input dataset give the full kernel matrix. Similar states have large kernel matrix entries while orthogonal states give k(x, z) = 0. Feature maps of particular interest are those that are difficult to calculate using classical means whilst providing good classification of data. Ideally, a kernel matrix resulting from Eq. 4 would produce results better than any classical classifier and be calculated significantly faster on a quantum device. The kernel function proposed in [28] is based on the 3-fold forrelation ('Fourier correlation') problem [29]. The function is conjectured to have an exponential separation in complexity between its quantum and classical estimation. Further discussion of the potential for speedup is presented later.
FIG. 4. A visual representation of a simple feature map that takes an inseparable dataset in one dimension to a two-dimensional feature space. Separation with a linear hyperplane is possible in the new feature space.
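The shot-based estimate described above can be illustrated with a deliberately simplified feature map. The sketch assumes a product circuit U(x) that applies a Hadamard gate and then a Z rotation by x_k to each qubit, which is not the circuit used in this paper; for that toy choice the all-zero probability has the closed form ∏_k cos²((x_k − z_k)/2). Each shot is drawn from this exact probability as a stand-in for running a circuit, and all function names are hypothetical.

```python
import math
import random

def exact_kernel(x, z):
    """|<0|U^dagger(z) U(x)|0>|^2 for the toy product map (H then RZ(x_k) per qubit)."""
    value = 1.0
    for xk, zk in zip(x, z):
        value *= math.cos(0.5 * (xk - zk)) ** 2
    return value

def estimate_kernel(x, z, shots, rng):
    """Fraction of `shots` simulated runs that return the all-zero bitstring."""
    p_all_zero = exact_kernel(x, z)
    n_all_zero = sum(1 for _ in range(shots) if rng.random() < p_all_zero)
    return n_all_zero / shots

def estimate_kernel_matrix(points, shots, rng):
    """Estimate the symmetric kernel matrix, one shot-based entry per pair."""
    n = len(points)
    matrix = [[1.0] * n for _ in range(n)]  # k(x, x) = 1 for pure states
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = estimate_kernel(points[i], points[j], shots, rng)
    return matrix
```

The statistical error on each entry shrinks roughly as 1/√R, so halving the error requires about four times as many shots.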
The kernel-generating circuit is of the form $U(x) = U_{\phi(x)} H^{\otimes M} U_{\phi(x)} H^{\otimes M}$, with
$$U_{\phi(x)} = \exp\left(i \sum_{S \subseteq [M]} \phi_S(x) \prod_{i \in S} Z_i\right),$$
where H is the Hadamard gate and $Z_i$ is a gate rotating the i-th qubit around the Z axis on the Bloch sphere by an amount defined by $\phi_S(x)$. S denotes a subset of qubits. Only subsets with |S| ≤ 2 are considered. The circuit for kernel estimation with U(x) in the case of 3-dimensional data is shown in Fig. 5. Ideas for generalising the circuit have been proposed in [30] and [31]. Following the latter, we implement unitaries of the form:
$$U_{\phi(x)} = \exp\left(i \alpha \sum_{S \subseteq [M]} \phi_S(x) \prod_{i \in S} \sigma_{a_i}\right),$$
where $\sigma_a \in \{X, Y, Z\}$ and α is a constant factor to regulate the degree of rotation of the qubits. An example of $U_{\phi(x)}$ for 3-dimensional data can be found in Fig. 6. This quantum-estimated kernel is then used as input to a classical support vector machine which performs the training and classification. The full circuit thus takes events of dimension M and projects them into a $2^M$-dimensional quantum space where the hyperplane separating the two classes of data is calculated.
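For small M the kernel of such a circuit can be evaluated exactly with a state-vector calculation. The sketch below covers only the Z-rotation form of the circuit with |S| ≤ 2, for which the rotation layer is diagonal in the computational basis; it does not reproduce the generalised Pauli rotations or the factor α. The choices φ_k(x) = x_k and φ_{l,m}(x) = (π − x_l)(π − x_m) are assumed for illustration, and the function names are hypothetical.

```python
import cmath
import math

def hadamard_all(state):
    """Apply a Hadamard gate to every qubit (fast Walsh-Hadamard transform)."""
    out = list(state)
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    norm = 1.0 / math.sqrt(n)
    return [amp * norm for amp in out]

def phase_angle(x, basis_index):
    """Eigenvalue of sum_S phi_S(x) prod_{i in S} Z_i on one basis state."""
    n_qubits = len(x)
    # Z eigenvalue of qubit i: +1 if its bit is 0, -1 if its bit is 1.
    s = [1 - 2 * ((basis_index >> i) & 1) for i in range(n_qubits)]
    angle = sum(x[k] * s[k] for k in range(n_qubits))
    for l in range(n_qubits):
        for m in range(l + 1, n_qubits):
            angle += (math.pi - x[l]) * (math.pi - x[m]) * s[l] * s[m]
    return angle

def rotation_layer(state, x):
    """Apply the diagonal unitary exp(i * sum_S phi_S(x) prod Z_i)."""
    return [amp * cmath.exp(1j * phase_angle(x, idx)) for idx, amp in enumerate(state)]

def feature_state(x):
    """Return U(x)|0...0> with U(x) = U_phi H U_phi H (rightmost gate first)."""
    state = [0j] * (2 ** len(x))
    state[0] = 1.0 + 0j
    for _ in range(2):
        state = rotation_layer(hadamard_all(state), x)
    return state

def quantum_kernel(x, z):
    """Exact kernel |<phi(z)|phi(x)>|^2 from the two state vectors."""
    overlap = sum(a.conjugate() * b for a, b in zip(feature_state(z), feature_state(x)))
    return abs(overlap) ** 2
```

A data point with three features yields a state vector with 2^3 = 8 amplitudes, and `quantum_kernel(x, x)` evaluates to 1 up to rounding.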

RESULTS
Results from the quantum-enhanced algorithm presented in this section were obtained with α = 0.1, $\phi_k(x) = x_k$, $\sigma_{a_k} = Z$ for single-qubit rotations and $\phi_{l,m}(x) = (\pi - x_l)(\pi - x_m)$, $\sigma_{a_{l,m}} = Y_l Y_m$ for two-qubit rotations. These were compared to an RBF kernel [32], defined as $K_{\mathrm{RBF}}(x, z) = \exp(-\gamma \|x - z\|^2)$ with γ = 1. To reduce generalisation error and model complexity, a regularisation term C can be included in the optimisation loss function. The optimal coefficients and classical kernel type were found using a grid search [33] with cross validation and a parameter scan optimising for validation score and training time. Large values of C impose a larger penalty on misclassified training points. The optimal value of $C = 10^6$ determined from this optimisation procedure was used in both the classical and quantum kernels.
The metrics used to quantify the performance of the classifiers are defined below, through the confusion matrix shown in Table III.
A good model is expected to score high in all three metrics.Accuracy gives an overall percentage of correct guesses, efficiency is the fraction of actual true objects correctly recognised and purity measures how often the model mistakes a fake object for a true one.
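The three metrics can be computed directly from the confusion-matrix counts. The sketch below assumes the standard definitions implied by the description above, efficiency = TP/(TP+FN) and purity = TP/(TP+FP), with labels +1 for true triplets and -1 for fake ones; the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count (TP, FP, TN, FN) for labels in {+1, -1}."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == -1 and pred == 1:
            fp += 1
        elif truth == -1 and pred == -1:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def classifier_metrics(tp, fp, tn, fn):
    """Return accuracy, efficiency and purity from the confusion counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "efficiency": tp / (tp + fn) if (tp + fn) else 0.0,  # true triplets recovered
        "purity": tp / (tp + fp) if (tp + fp) else 0.0,      # selected triplets that are true
    }
```

A classifier that accepts every triplet has an efficiency of 1 but a purity equal to the fraction of true triplets in the sample, which is why the metrics are considered together.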

Full detector triplets
The dataset used in the classification consists of triplets that passed the preprocessing step described above, which ensures a sample purity of 52% and 80% for doublets and triplets, respectively, and a 99% sample efficiency in both datasets. An average of 4,600 triplets remain per event. The spatial coordinates of each hit in the triplet are used as the input data to the hybrid algorithm. This results in a 9-qubit circuit. To fit the dataset within our current computational constraints, it is further divided into 16 equal sections in the φ plane, with each section subtending 2π/16 radians in φ. A support vector machine is then defined for each of these regions and the relevant quantum kernel estimated. The data is divided into 50 events for training and 15 events for testing, equivalent to a total of around 230,000 and 70,000 triplets, respectively. The performance of both the classical algorithm and the quantum algorithm is evaluated using the three metrics defined above: accuracy, efficiency and purity. Furthermore, since the preprocessing step selects triplets that are more likely to form track candidates, a benchmark scenario is introduced to illustrate the performance of the classical and quantum algorithms on top of the preprocessed data. Triplets are randomly selected from the dataset and the classical and quantum classifiers are compared against this benchmark to demonstrate the improvements in classification accuracy.
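The division into φ sections can be sketched as follows. Two details are assumptions made for illustration: a triplet is assigned to a section using the φ of its first (innermost) hit, and the nine input features are the (r, φ, z) coordinates of the three hits in order. The function names are hypothetical.

```python
import math

def phi_sector(phi, n_sectors=16):
    """Index of the phi section (each 2*pi/n_sectors wide) containing phi."""
    width = 2.0 * math.pi / n_sectors
    # The final modulo guards against phi % (2*pi) rounding up to exactly 2*pi.
    return int((phi % (2.0 * math.pi)) // width) % n_sectors

def triplet_features(triplet):
    """Flatten three (r, phi, z) hits into a 9-component feature vector."""
    return [coord for hit in triplet for coord in hit]

def group_by_sector(triplets, n_sectors=16):
    """Collect the feature vectors of the triplets falling in each phi section."""
    sectors = {index: [] for index in range(n_sectors)}
    for triplet in triplets:
        first_hit_phi = triplet[0][1]
        sectors[phi_sector(first_hit_phi, n_sectors)].append(triplet_features(triplet))
    return sectors
```

One classifier would then be trained on the feature vectors of each section separately.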
Figure 7 shows the dependence of the efficiency and purity scores on relevant geometric and kinematic properties of the triplets: the angle φ of the first (innermost) hit of the triplet, the pseudorapidity |η| of the triplet, the true p_T of the triplet, the number of true particles in the event (particle multiplicity), and the number of hits corresponding to a true track (track length). For fake triplets, there is some ambiguity in determining the true p_T and track length, as the three hits can come from three separate particles. In such cases, the choice was made to define these variables using the particle associated with the first hit. The two classifiers show mostly comparable performance and a similar dependence on the observables defined above. Efficiencies close to 1.0 are achieved for most bins. Reduced values of the purity are observed in regions with a reduced number of triplets for training, such as high-η and high-p_T. The accuracy scores of the classical and quantum algorithms as a function of the size of the training data are shown in Fig. 8. The training size grows up to the computational limit imposed by the quantum simulator. Whilst the overall performance of the two algorithms follows a similar trend, the classical algorithm performs slightly better at low training size and the quantum algorithm shows a small advantage for training size above 6,000 triplets. Both algorithms significantly outperform the benchmark scenario of randomly selecting triplets from the preprocessed dataset.
It is also instructive to study the performance of these algorithms for different layers of the detector, as the detector occupancy progressively decreases from the inner to the outer layers. Figure 9 shows the comparison between the quantum and classical algorithms for the efficiency and purity in layers 1-9 of the detector. While the efficiency and purity are similar between the two algorithms for the full detector, the largest difference in purity occurs for triplets identified in the first layer of the detector (the first hit is in the first layer). Since triplet formation is part of the seeding step used in many track reconstruction algorithms, we study the performance of our algorithms for triplets identified in the inner detector, with the first hit being in the first layer of the detector.

Innermost triplets
This section presents results when restricting the classification of triplets to those in the innermost layers of the tracking detector, with the three hits of a triplet in the first three layers. The reduction in the total number of triplets per event allows more events to be processed before reaching the computational limit. The dataset is split into 240 events for training and 60 events for testing, equivalent to about 230,000 and 60,000 triplets, respectively. The efficiency and purity as a function of triplet parameters are shown in Fig. 10 and the accuracy is shown as a function of the size of training data in Fig. 11. The accuracy indicates a clearer separation between the quantum and classical performances. We see a continued trend of better purity for the quantum-enhanced classifier and a more comparable performance in terms of efficiency.
FIG. 10. Track reconstruction efficiency and purity for triplets in the inner detector barrel region as a function of φ, |η|, p_T, particle multiplicity, and the number of hits associated with the track (track length). These are compared for the quantum-estimated kernel and a classical kernel.

POTENTIALS OF QUANTUM SPEEDUP
In general, we can assume that the evaluation of the matrix elements dominates the complexity of the SVM [34]. Thus, the complexity of the algorithm is $C = O(\beta N^2)$, where the β factor depends on the kernel type and method used. For a single value of the kernel, the quantum complexity is $\beta_Q = O(\epsilon^{-2})$ for some additive error ε [28,35]. The corresponding classical complexity, $\beta_C$, is that of the current best classical algorithm proposed in [36]. Demanding that $\beta_Q < \beta_C$ for $\epsilon = 10^{-3}$ results in a possible advantage for M ≈ 20.
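The quantum side of this estimate can be made concrete with a short calculation. The sketch assumes that the number of shots per kernel entry is exactly 1/ε² (the constant factor hidden in the O-notation is set to 1) and that only the N(N−1)/2 off-diagonal entries of the symmetric kernel matrix need to be estimated; both are simplifying assumptions, and the function names are hypothetical.

```python
def shots_per_entry(epsilon):
    """Shots needed for additive error epsilon, assuming R = 1 / epsilon^2."""
    return int(round(1.0 / epsilon ** 2))

def total_circuit_runs(n_train, epsilon):
    """Total shots to fill a symmetric N x N kernel matrix (diagonal skipped)."""
    n_entries = n_train * (n_train - 1) // 2
    return n_entries * shots_per_entry(epsilon)
```

For ε = 10^-3 this gives 10^6 shots per entry, and a training set of 6,000 triplets would require about 1.8 × 10^13 circuit runs under these assumptions.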
In [35] it has been shown that in order for the full kernel matrix to approximate the ideal kernel, the propagation of the desired accuracy into the classification with an SVM causes $\beta_Q$ to pick up a non-negligible scaling with training size N. It is possible that a similar analysis could introduce an N-scaling to $\beta_C$. Regardless, it appears that at the current stage, quantum kernel estimation could provide a possible advantage for small datasets where the data has many features. For the triplet classification data presented in this paper, nine features were used. We envisage that a possible speedup may be obtained if we extend the length of the track segments considered in the classification task.
Another point to consider for quantum kernel estimation is the noise on current quantum devices. In [37], the authors show that the presence of noise can cause kernel entries produced with Eq. 6 evaluated over different input data to concentrate around some fixed value. The difference between any kernel entry and that value becomes exponentially small with the number of qubits. This results in an exponential number of shots necessary to resolve kernel entries for successful training. This dependence would have to be added into $\beta_Q$ in order for the required precision to be obtained. Some proposals for different quantum kernel estimation methods can be found in [35], where a probabilistic algorithm calculates only a subset of the kernel entries, and in [38], where the estimation of the entire kernel matrix scales linearly with training size N. Further studies could include empirical tests of classical and quantum complexities, study of noise effects in simulations and real quantum devices, as well as implementation of other proposed quantum kernel estimation techniques in the context of high energy physics.

SUMMARY
Reconstructing the trajectories of charged particles at particle colliders like the Large Hadron Collider is a challenging, computationally intensive problem. This is expected to become increasingly complex with the upgraded High Luminosity LHC (HL-LHC), where O(10^5) hits in the tracking detector must be quickly and accurately connected to form around 10,000 tracks. This paper presents a hybrid quantum-classical algorithm with a support vector machine (SVM) using a quantum-estimated kernel to classify track segments or seeds for this challenging track reconstruction problem. Using a publicly available dataset that simulates a generic particle detector for the HL-LHC, we apply selection criteria to select doublets (sets of two consecutive hits) and subsequently triplets (sets of three consecutive hits). The proposed algorithm classifies these triplets as either belonging to a particle track or not. A comparison is made between the performance of a quantum-estimated kernel, a classical kernel, and randomly selected triplets from the dataset. A similar level of performance is achieved for the quantum and classical algorithms. However, when only the triplets from the inner part of the tracking detector are considered, the quantum algorithm shows an improvement in accuracy scores against the classical algorithm. This is promising as the innermost layers are expected to be the most important for the seeding procedure at the HL-LHC. This is the first implementation of a quantum-kernel SVM approach to the track reconstruction problem.

FIG. 2. The 6518 hits in an example event in the x-y detector plane (top) and a fraction of the true tracks reconstructed from those hits (bottom). The hits come from 879 particles which produced triplets in the barrel region. The 10 layers of the detector for the barrel region are also shown. The blue, red, and green layers correspond to the pixel, short strip and long strip detectors, respectively.

TABLE I. The selection criteria applied to select doublets, using the z_0 intercept from the extrapolation of the doublet to the z axis and the ratio of the difference in φ and r between each hit forming the doublet.
Variable            Selection
Δφ/Δr               ≤ 6 × 10^-4 [rad/mm]
|z_0|               ≤ 100 [mm]

TABLE II. The selection criteria applied to select triplets based on the estimated p_T and the θ and φ angles between two doublets that form a triplet. A range of values is given when the selection depends upon detector components traversed.
Variable            Selection
θ-breaking angle    ≤ 0.05–0.07 [rad]
φ-breaking angle    ≤ 0.05–0.12 [rad]
p_T                 ≥ 0.75 [GeV]

FIG. 3. A visual representation of two classes of data in a 2-dimensional space, separated by a hyperplane w·x + b (solid line). The highlighted points lying closest to the separation plane are called support vectors and the dotted lines passing through them define the margins.

FIG. 5. Quantum circuit diagram used to estimate the kernel and determine the inner product between two quantum states shown for data with three features.

FIG. 6. Circuit diagram used to calculate $U_{\phi(x)}$ in the full circuit, shown for a datapoint with three features, which correspond to the spatial coordinates of a single hit in the tracking detector. The single-qubit gates are shown in pink and two-qubit gates in blue.
TABLE III. Confusion matrix used in defining the performance metrics for the classifiers used in triplet recognition.
                    Predicted positive      Predicted negative
Actual positive     True positive (TP)      False negative (FN)
Actual negative     False positive (FP)     True negative (TN)

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}, \quad \mathrm{Efficiency} = \frac{TP}{TP + FN}, \quad \mathrm{Purity} = \frac{TP}{TP + FP}.$$

FIG. 7. Track reconstruction efficiency and purity for triplets in the barrel detector as a function of φ, |η|, p_T, particle multiplicity, and the number of hits associated with the track (track length). These are compared for the quantum-estimated kernel and the classical kernel.

FIG. 8. Accuracy to identify triplets in the barrel detector as a function of the size of the training dataset for the quantum-estimated kernel, classical kernel and selecting random triplets from the preprocessed dataset.

FIG. 11. Accuracy to identify triplets in the inner detector barrel region as a function of the size of the training data for the quantum-estimated kernel, classical kernel and selecting random triplets from the preprocessed data.