Quantum-enhanced cluster detection in physical images

Identifying clusters in data is an important task in many fields. In this paper, we consider situations in which data live in a physical world, so we have to first collect the images using sensors before clustering them. Using sensors enhanced by quantum entanglement, we can image surfaces more accurately than using purely classical strategies. However, it is not immediately obvious whether the advantage we gain is robust enough to survive data processing steps such as clustering. It has previously been found that using quantum-enhanced sensors for imaging and pattern recognition can give an advantage for supervised learning tasks, and here we demonstrate that this advantage also holds for an unsupervised learning task, namely clustering.


I. INTRODUCTION
Pattern recognition is an important task in the field of data processing [1]. The goal is to recognize the presence of a pattern, or to identify which of a number of possible patterns is present, in data. Automated pattern recognition can be accomplished via machine learning, which can be either supervised or unsupervised. In supervised machine learning, we have a set of data points that fit the patterns that we are trying to recognize (labeled by which of the patterns they fit), and the algorithm must learn from this set how to classify previously unseen data points (i.e. how to find the pattern most closely matched by the unseen data). In unsupervised learning, the goal is to identify patterns in data without the use of sample data.
Clustering is one example of an unsupervised learning problem. The goal of clustering is to identify clusters of similar data points in a data set. Part of the problem is defining what constitutes a cluster in the first place [2]. Intuitively, a cluster should be a region of the parameter space in which there is a high density of data points with similar values, and any two clusters should be distinct from each other. Often, there will need to be a trade-off between these two intuitive ideas. How this intuition is formalized depends on the type of algorithm used, and so different types of algorithm can find different clusterings.
In some settings, the data will first need to be collected before it can be processed [3]. For instance, suppose we have a surface that we wish to image and identify clusters on. In a microscopy setting, we may wish to find clusters of similar particles on a surface [4]. Kerr microscopy is often used to identify domains in thin magnetic films [5], which can be regarded as a clustering problem. In a biological setting, we may want to identify structures in cells [6,7]. In quantum reading [8], we have a data set encoded in a set of pixels with different values, and we may want to identify clustering or structure in the underlying data.
The improved accuracy offered by using quantum measurements has previously been shown to provide an advantage for supervised learning tasks [9,10]. We might therefore expect a similar advantage for unsupervised learning tasks, such as clustering. In this paper, we compare protocols that use classical measurements for data collection with protocols that use quantum measurements, in order to establish whether such an advantage exists.
We note that this is distinct from using quantum computers to implement classical machine learning algorithms [11]; the improvement that we are looking to prove here is in accuracy, as in quantum metrology [12], rather than a speedup in computation time [13]. If we start with classical information, we cannot gain an accuracy advantage, because any operation that can be done by a quantum computer can be performed by a classical computer (albeit more slowly). The aim of gaining accuracy rather than speed is similar to Refs. [14] and [15], which consider clustering of quantum states, rather than using quantum computers to speed up the clustering of classical data.
The accuracy advantage that we can gain is due to the improved imaging made possible by using quantum probes. Although one might intuitively expect that an improvement in the data used for clustering will lead to improved cluster detection, it is not obvious that the quantum advantage in imaging is significant enough to affect the results of a clustering algorithm: it could be that a data processing step reduces the advantage conferred by the better data to the point of it becoming negligible. In this paper, we show that a quantum advantage in imaging is robust enough to survive a clustering algorithm.
In Section II, we outline the imaging and clustering task that we are considering. In Section III, we give a brief overview of classical clustering algorithms and of how clusterings can be validated and compared. In Section IV, we carry out a numerical comparison of quantum and classical protocols for specific clustering tasks by simulating them using MATLAB. Section V gives our conclusions. Some details of the numerical comparison in Section IV are found in the appendices. In Appendix A, we calculate the receiver operating characteristic used in our numerical studies for both the classical and the quantum cases. In Appendix B, we explain how we calculate the mutual information values in the numerical studies. The code used to carry out the calculations and generate the plots is available as Supplemental Material [16].

II. OVERVIEW OF THE TASK
Let us formalize our setting by considering a surface that has been divided into a grid of pixels. We believe that the surface contains some form of underlying structure (i.e. a global property of the surface), and we are interested in identifying it. In Fig. 1, this ground truth is represented by (a). The ground truth could be the positions, numbers, or shapes of clusters of particles (in the setting of microscopy), or could be how the surface is divided into domains (in the setting of thin magnetic films), etc. It consists of everything (and only the things) that we want to find out. In the example shown in Fig. 1, this is the number of 2 × 3 blocks of dark pixels.
We mathematically represent the surface as a multichannel, with each pixel being represented by a quantum channel, C_i, where the label denotes the position of the pixel ((b) in Fig. 1). These channels represent the interactions of the pixels with any probes sent at them. Each channel is drawn from a set of possible channels (such as a set of lossy channels with different reflectivities), and which channel we have is determined by some underlying classical parameter, ϕ_i. The identity of the channel enacted by a given pixel is determined by ϕ_i, so that identifying the channel C_i is equivalent to measuring ϕ_i. The data set {ϕ_i} is what we call the channel pattern and is what we are trying to find out by imaging the surface. For instance, the presence or absence of a particle could each correspond to one of two possible quantum channels. In this binary case of each pixel being one of only two possible channels, each ϕ_i will be drawn from the set {0, 1}. If we have more possible channels, these could each correspond to different types of particles being present (and we would have more possible values of ϕ_i).

Fig. 1 In the example shown, a cluster is a block of 2 × 3 dark pixels (which could represent collections of particles, structures, etc.). The number of clusters (a) is the only piece of information that we are interested in, rather than the positions of the clusters on the surface (if we were interested in the positions, that information would also be part of the ground truth). (b) is how this ground truth corresponds to the physical multichannel that we probe. Here, each pixel is one of two types of channel: gray, representing the presence of a particle, or white, representing the absence of particles. The two clusters are outlined in blue, but note that there are two gray pixels that are not part of either of the clusters. These could represent, for instance, particles that do not form part of one of the structures that we are looking for. If we had perfect knowledge of the state of (b), we would then carry out classical clustering algorithms in order to find the two outlined clusters and recover the value of (a). Instead, we carry out measurements on the multichannel in order to estimate (b), with the result being (c), where red squares represent pixels we believe to be gray in (b) and yellow squares represent pixels we believe to be white. Our measurement process cannot perfectly reproduce (b), and so some squares that are white in (b) are red in (c) and some squares that are gray in (b) are yellow in (c). These are the two types of misdetection that can occur in this scenario. Finally, we carry out some classical clustering algorithm on (c) in order to estimate the number of clusters, and hence (a). This estimate is (d). Note that in this figure, the algorithm has found only two clusters, and hence has estimated (a) correctly, despite the extra red pixels. A different, potentially less appropriate, clustering algorithm might mistakenly decide that the two red pixels in the top-right corner of (c) are part of a cluster, and so overestimate (a).
Alternatively, the channel could be continuously parameterized, corresponding to continuous data (for instance, the magnetization vector of each pixel in a thin magnetic film). This data set, {ϕ_i}, will stochastically depend on the ground truth, but may not be entirely determined by it. There could, for instance, be unclustered particles randomly distributed over the surface. In Fig. 1, (a) gives the number of clusters, but not their positions or orientations. In other words, the mapping from (a) to (b) may not be deterministic. Any information present in (b) that is not present in (a) is information that a classical clustering algorithm would aim to discard.
Suppose our goal is to carry out pattern recognition on the data set, {ϕ_i}, which corresponds to the channel pattern. A protocol to do so would consist of two stages: carrying out measurements to estimate the parameters ϕ_i, and then clustering the resulting data set.
We can image the surface by sending probes to interact with each of the pixels. These probes can be regarded as quantum states, and could consist of photons or of some other type of particle, such as electrons (e.g. in electron microscopy). Some measurement is then carried out on the return states, in order to determine the quantum channels corresponding to each pixel. In other words, we carry out quantum channel tomography (or metrology, if the parameterization of the possible channels is continuous) in order to reconstruct the true channel pattern ((b) in Fig. 1) from the measurement result ((c) in Fig. 1). Note that if we are able to use unlimited energy to probe the pixels, we will be able to perfectly reconstruct the channel pattern, so there will be no difference between (b) and (c). However, if we limit the energy of the probes sent through each pixel, we will (in general) have an imperfect reconstruction of (b). This constraint could be due to our sample being sensitive, so that an energetic measurement would be destructive. We can then carry out classical clustering algorithms on the measurement result to get (d), an estimate of the ground truth (a).
We can categorize possible protocols based on the type of probe used: specifically, whether or not the probes have a positive P-representation. We call states with a positive P-representation (such as coherent states) classical, and all other states, such as two-mode squeezed vacuums (TMSVs) and number states, quantum. Other quantum states, with multipartite entanglement, were considered for imaging purposes in Ref. [17].
We will therefore call protocols that probe the pixels with classical states and then carry out clustering on the measurement results "classical protocols", and protocols that probe the pixels with quantum states and then carry out clustering on the results "quantum-classical protocols". This latter choice of name allows for a potential third type of fully quantum protocol (not considered in this paper), which would probe the surface with some collective quantum state, potentially with entanglement between pixels, and then carry out a collective measurement on the return state in order to extract a global property (the ground truth), rather than probing the surface pixel-wise and carrying out classical clustering on the result. However, the quantum-classical class of protocols is more relevant from a near-term perspective, since TMSVs, which could be used as signal-idler pairs to individually probe the pixels, can be generated using current technology. This may not be the case for the more general states used in fully quantum protocols.
In this paper, we will consider classical and quantum-classical protocols that send a single probe through each pixel, with the same energy constraint for each pixel, and then carry out a classical clustering algorithm on the results. Although the optimal quantum-classical protocol cannot be worse than the optimal classical protocol at reconstructing the channel pattern (recovering (b) from (c)), since classical protocols are a special case of quantum-classical protocols, there is no guarantee that any advantage gained at this stage will be retained through the data processing. It is therefore worth investigating whether the improved imaging capabilities afforded by quantum states can lead to a non-negligible advantage in clustering accuracy.

III. CLASSICAL CLUSTERING

III.A. Types of clustering algorithms
Two of the main basic types of classical clustering algorithms are centroid-based algorithms (such as k-means [18]) and density-based algorithms (such as DBSCAN [19]). With centroid-based algorithms, the goal is to minimize the sum of the distances (via some metric) of the points to the nearest centroid, for a fixed number of centroids. With density-based algorithms, we grow clusters as areas with a high density of points (subject to some density cut-off). Other possible types of algorithm include distribution-based methods, in which we fit our points to some distribution (e.g. a sum of Gaussians) whose parameters are unknown. Each method may correspond to a different scenario we may be interested in, and the "best" type depends on the specific setting.
In k-means, we must choose the number of clusters in advance (as k), and we then find the k centroids (points) such that the total distance from each point to the nearest centroid is minimized. The distance may be Euclidean, squared-Euclidean, Bures, etc. By design, the clusters will always be roughly spherical (in terms of the distance metric). k-means assigns every point to a cluster. One common variant of k-means is k-medoids [20], which is similar to k-means, but limits the possible centroids to the data points themselves (rather than allowing them to be located anywhere in the coordinate space).
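To make the centroid-based idea concrete, the following is a minimal k-medoids-style sketch. It is illustrative only (the paper's simulations use MATLAB, and the function and data here are our own inventions): it alternates an assignment step with a medoid-update step, with the medoids restricted to the data points themselves.

```python
import numpy as np

def k_medoids(points, k, n_iter=100, seed=0):
    """Toy k-medoids: medoids are restricted to the data points themselves."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Pairwise squared-Euclidean distances between all points.
    diffs = points[:, None, :] - points[None, :, :]
    dists = (diffs ** 2).sum(axis=-1)
    medoids = rng.choice(len(points), size=k, replace=False)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest medoid's cluster.
        labels = np.argmin(dists[:, medoids], axis=1)
        # Update step: within each cluster, pick the member minimizing
        # the total distance to the other cluster members.
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members) == 0:
                continue
            within = dists[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, labels

# Two well-separated blobs: one medoid should land in each.
pts = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
medoids, labels = k_medoids(pts, k=2)
print(medoids, labels)
```

With two tight, well-separated blobs, the returned labels separate the first three points from the last three, regardless of the random initialization.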
In DBSCAN, we calculate the density of points in a region around each point (the size of the region, ϵ, is a parameter of the algorithm). If this density is above a cut-off, the point is a core point of a cluster. If not, it is either a non-core point of a cluster or not in a cluster at all. Such algorithms find all clusters, rather than requiring the number to be specified in advance. DBSCAN may decide that some points are not part of any cluster.
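A similarly minimal, illustrative sketch of the DBSCAN idea (the helper name and parameter values are ours, not the paper's): points with at least min_pts neighbors within ϵ become core points, clusters are grown outward from them, and unattached points are labeled as noise.

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Toy DBSCAN: grow clusters from core points; label -1 marks noise."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbors = [np.where(dists[i] <= eps)[0] for i in range(n)]
    core = np.array([len(nb) >= min_pts for nb in neighbors])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        # Depth-first expansion from a new, unlabeled core point.
        stack = [i]
        labels[i] = cluster
        while stack:
            j = stack.pop()
            for nb in neighbors[j]:
                if labels[nb] == -1:
                    labels[nb] = cluster  # non-core neighbors join but don't expand
                    if core[nb]:
                        stack.append(nb)
        cluster += 1
    return labels

# Two dense groups plus one isolated point, which DBSCAN leaves as noise.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
                [10.0, 10.0]])
labels = dbscan(pts, eps=0.5, min_pts=2)
print(labels)  # -> [ 0  0  0  1  1  1 -1]
```

Unlike the k-medoids sketch above, no cluster count is supplied: the two groups are found automatically, and the isolated point is assigned to no cluster.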
Whilst both k-means and DBSCAN assign each data point to at most a single cluster, variants exist that carry out "fuzzy" clustering, in which data points can be assigned to multiple clusters, with different degrees of affiliation.
Which type of algorithm is appropriate depends on our setting and on what we want to discover. This relates to the previously mentioned question of what constitutes a cluster. For supervised learning, we have a correct answer and can calculate the error. For unsupervised learning, we only have intuitive ideas of what a good clustering "should" look like. Consider Fig. 2, which compares k-means clustering and DBSCAN for two different data sets. The top data set was generated by picking three pairs of coordinates and then choosing coordinates randomly from normal distributions centered on those three points, whilst the bottom data set was generated by picking a pair of coordinates and drawing points both from a normal distribution centered on it and from a ring around it. k-means clustering with three clusters is able to identify three distinct, roughly circular sets of points in the top data set, whilst DBSCAN groups two of these sets together, due to them having some overlap. In the bottom data set, DBSCAN is able to separate out the ring and the circle in the middle and identify both as clusters, whilst k-means clustering is inherently unable to identify one cluster inside another, since both clusters have the same center, and so simply splits the ring in two. In each case, one of the clustering techniques correctly identifies the ground truth (the underlying distribution used to generate the data set), whilst the other does not. The appropriate clustering algorithm depends on the data itself and on what information we want to extract from it.

III.B. How can we assess how well clustering has been done?
There are several metrics that we can use, each of which weights the properties that we want for our clusters differently.Methods for assessing a clustering can be broadly divided into two categories: internal and external.
An internal metric compares a clustering to our intuitive idea of what a cluster should be. Clusters should be compact (small intra-cluster distances) and well separated (large inter-cluster distances). One possible metric is the sum of the distances between points and the center of their cluster. By definition, this is minimized by k-means (so long as it finds a global, rather than a local, minimum), but there is no guarantee that clusters will be well separated. It also performs poorly (compared to our intuitive idea of what clusters look like) when the clusters in the data are non-spherical. Other options include the Dunn index, which takes into account both the intra- and the inter-cluster distances, and the silhouette coefficient.
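The two internal scores just mentioned can be sketched as follows; this is an illustrative implementation under common textbook definitions (the function names and test data are ours).

```python
import numpy as np

def within_cluster_sum(points, labels):
    """Sum of distances from each point to its cluster centroid
    (the kind of quantity k-means minimizes)."""
    total = 0.0
    for c in np.unique(labels):
        members = points[labels == c]
        centroid = members.mean(axis=0)
        total += np.linalg.norm(members - centroid, axis=1).sum()
    return total

def dunn_index(points, labels):
    """Smallest inter-cluster distance divided by largest cluster diameter:
    higher values indicate compact, well-separated clusters."""
    clusters = [points[labels == c] for c in np.unique(labels)]
    diameters = [max(np.linalg.norm(p - q) for p in c for q in c)
                 for c in clusters]
    min_sep = min(np.linalg.norm(p - q)
                  for i, a in enumerate(clusters) for b in clusters[i + 1:]
                  for p in a for q in b)
    return min_sep / max(diameters)

pts = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good = np.array([0, 0, 1, 1])  # splits the two tight pairs
bad = np.array([0, 1, 0, 1])   # mixes them
print(dunn_index(pts, good), dunn_index(pts, bad))  # -> 10.0 0.1
```

As expected, the clustering that matches the two tight pairs scores a much higher Dunn index than the one that mixes them.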
An external assessment assumes the existence of a ground truth. In other words, there exists some objectively correct clustering that we need to find. In a physical imaging context, this will often be the case: we carry out imaging in order to determine some real property of interest about the surface we are imaging. To assess our closeness to the true clustering, however, we would need to know the ground truth beforehand. If our goal is to find the ground truth, this presents a problem, since we would not be able to use it to assess the validity of our clustering. However, external validation can be of use in comparing different imaging and clustering protocols, to see which is better at extracting the information we want about the underlying structure.

III.C. Comparing clusterings
If we want to assess how different measurement techniques affect the clustering found, how can we go about doing this? If we were to use an internal validation method, we might find that a worse measurement method (one that less faithfully reconstructs {ϕ_i}) results in better clustering performance than a better measurement method. However, this would simply be telling us that the measurement data produced by the worse measurement are more clustered than the data produced by the better measurement, and would not give any information about which method is closer to the "true" clustering.
Instead, we can use an external validation method. Specifically, we can compare the clustering resulting from a given protocol to the ground truth clustering. We can do this in terms of the mutual information between the ground truth and the estimate of the ground truth that we get from our measurement result. The mutual information is a classical quantity that tells us how much information the measurement result ((c) in Fig. 1) holds about the ground truth that we are trying to discover ((a) in Fig. 1). The mutual information is routinely used for machine learning purposes, forms the basis of the information bottleneck method [21], and has been employed to study the generalization and classification errors in supervised quantum classifiers [22]. For a given protocol, we can calculate the reduction in uncertainty about the position of the clusters due to the use of the protocol, quantified by the mutual information, and compare this to the reduction in uncertainty achieved by a different protocol.
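As an illustration of how such a comparison could be scored, here is a simple plug-in estimate of the mutual information from paired samples of the ground truth and its estimate. This is a sketch only; the calculation actually used in the paper is described in its Appendix B.

```python
import math
from collections import Counter

def mutual_information(samples):
    """Plug-in estimate of I(A; D) in bits from a list of (a, d) pairs."""
    n = len(samples)
    p_joint = Counter(samples)                # empirical joint distribution
    p_a = Counter(a for a, _ in samples)      # marginal of the ground truth
    p_d = Counter(d for _, d in samples)      # marginal of the estimate
    return sum((c / n) * math.log2((c / n) / ((p_a[a] / n) * (p_d[d] / n)))
               for (a, d), c in p_joint.items())

# Perfectly correlated binary variables carry 1 bit; independent ones, 0 bits.
mi_perfect = mutual_information([(0, 0), (1, 1)] * 50)
mi_none = mutual_information([(0, 0), (0, 1), (1, 0), (1, 1)] * 25)
print(mi_perfect, mi_none)  # -> 1.0 0.0
```

A protocol whose estimate D tracks the ground truth A more closely yields a larger value, which is the sense in which one protocol can be said to extract more information than another.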
Note that the best possible measurement will not necessarily be the one that maximizes the mutual information between the measurement result and the channel pattern ((b) in Fig. 1): it could be that a protocol that discards some of the information that is not relevant to the ground truth (e.g. one with a low probability of detecting unclustered points or outliers) is better at identifying the ground truth than one that more faithfully reproduces the channel pattern.

IV. CLASSICAL VERSUS QUANTUM-CLASSICAL PATTERN DETECTION
In both discrete-variable and continuous-variable settings, we can gain a quantum advantage in channel discrimination and metrology for a large variety of sets of possible channels. Since quantum probes can give a quantum advantage in discriminating between possible channels, we expect them to be able to better determine the ground truth.
The extent of the quantum advantage depends on the set of possible channels that we are considering. For instance, if we are discriminating between a Pauli-Z channel and the identity channel, an entangled state can perfectly discriminate between the two options, whereas a classical state cannot distinguish between them at all. This means that it is not possible to make general statements about the advantage offered by using quantum states, beyond the fact that they offer an improvement.
Instead, we will consider two specific example problems: one involving centroid-based clustering and the other involving density-based clustering. We will compare classical protocols and quantum-classical protocols, restricting ourselves to one-shot, non-adaptive protocols.
The types of problem we will consider involve imaging surfaces divided into grids of pixels and clustering the measurement results. In the scenarios we consider, imaging the pixels is a quantum reading type task: each pixel has a binary value (0 or 1), and each possible value corresponds to a different lossy channel (C_0 or C_1) [8]. In each case, we will consider the same pair of possible channels, but with different types of underlying channel patterns and consequently different clustering strategies after carrying out the measurements.
Since we have only two possible channels, there are two types of error: the type 1 error, the probability of detecting a particle when no particle is present, and the type 2 error, the probability of not detecting a particle that is present. The plot of the achievable type 1 error for a given type 2 error, for a given detector, is called the receiver operating characteristic (ROC).

Fig. 3 Receiver operating characteristic for classical and quantum protocols, each with an average photon number of 8, discriminating between two pure loss channels, one with a transmissivity of 0.95 and one with a transmissivity of 0.4. The blue line is the lower bound for classical protocols (protocols using probes with positive P-representations), whilst the red line is an achievable curve for quantum protocols (and therefore constitutes an upper bound on the optimal quantum protocol). For low type 1 errors, we can prove a quantum advantage in terms of measurement errors.
For certain pairs of channels and energy constraints, we can prove a quantum advantage in discriminating between them, meaning that we can achieve a lower type 2 error for a fixed type 1 error. Our aim is to show, via numerical simulations (in MATLAB), that the lower type 2 errors obtained by using quantum imaging protocols can lead to an increase in clustering accuracy, and hence that using quantum, rather than classical, imaging protocols can allow us to gain more information about the ground truth.
In order to numerically compare quantum and classical protocols, we must choose some concrete parameter values. We let both channels be pure loss channels, setting the transmissivity of C_0 to 0.95 and the transmissivity of C_1 to 0.4. We constrain the per-pixel probe energy to a mean photon number of 8. This gives us the ROC in Fig. 3. See Appendix A for details of how the ROC was found in each case.
For each type of protocol, we choose pairs of type 1 and type 2 errors from the ROC curve in Fig. 3, for type 1 error values between 0 and 0.05, and carry out simulations using these pairs of errors. The reason for using this range is that for large values of the type 1 error, the results become almost entirely random, since the number of false positives becomes comparable to the number of true positives. This means that, for a classical protocol, the type 2 error varies between approximately 0.1899 (for a type 1 error of 0.05) and 0.3918 (for a type 1 error of 0), whilst for a quantum protocol, the type 2 error varies between approximately 0.1107 (for a type 1 error of 0.05) and 0.1424 (for a type 1 error of 0). Note that the classical type 2 errors are lower bounds (the best possible type 2 errors for fixed type 1 errors), whilst the quantum type 2 errors are upper bounds on the best possible type 2 errors for fixed type 1 errors. Consequently, any calculated mutual information values (between the ground truth and our estimate of it) that use the classical error values will be upper bounds for classical protocols, whilst mutual information values that use the quantum error values will be lower bounds on the achievable values for optimal quantum protocols.

Fig. 4 The process for generating each sample: generate a random value for A; generate a random matrix of 0s and 1s (variable B) that probabilistically depends on A; generate a random matrix of 0s and 1s (variable C) that probabilistically depends on B; carry out clustering on C and extract variable D from the result.
We will refer to the variable encoding our ground truth as A, the variable encoding the channel pattern as B, the measurement result as C, and the resulting estimate of the ground truth as D (see also Fig. 1). A is some global property of the surface (i.e. not a property of any particular single pixel) that we want to find out. B is the actual pattern of pixels on the surface (a matrix of 0s and 1s), which is correlated with A. Each protocol images each pixel, using either classical or quantum states, to get a different matrix of 0s and 1s, and then carries out clustering on the result. This matrix of 0s and 1s constitutes the variable C (our estimate of B), whilst the result of the clustering algorithm gives us the variable D (our estimate of A). We can quantify how good a protocol is at obtaining the ground truth by finding the mutual information between variables A and D. This can be expressed as

I(A; D) = Σ_{a,d} p(a, d) log_2 [ p(a, d) / (p(a) p(d)) ],   (1)

where p(a, d) is the joint probability distribution of A and D, and p(a) and p(d) are the marginal distributions.

The full simulation process for a given pair of type 1 and type 2 errors is as follows. First, randomly choose a value of the ground truth variable, A. Next, generate a true channel pattern (matrix of 0s and 1s) based on the value of A. This channel pattern is the variable B, and the probabilistic mapping between A and B depends on the specific scenario. Then, probabilistically generate a measurement result, C, based on B. This is again a matrix of 0s and 1s. If an entry in matrix B is 0, the corresponding entry in C is 0 with probability 1 − ξ_1 and is 1 with probability ξ_1, where ξ_1 is the type 1 error. If an entry in matrix B is 1, the corresponding entry in C is 0 with probability ξ_2 and is 1 with probability 1 − ξ_2, where ξ_2 is the type 2 error. A clustering algorithm is then carried out on variable C (the choice of algorithm and the settings depend on the scenario) to generate variable D. Finally, variables A and D are recorded.
These steps are repeated a large number of times, so that the mutual information between A and D can be estimated. The process for generating each sample is outlined in Fig. 4. Note that the process is a Markov chain: the estimate of the ground truth, D, depends only on the measurement result, C, which depends only on the channel pattern, B, which depends only on the ground truth, A.
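The B → C step of this chain (corrupting the channel pattern with the two error probabilities) can be sketched as follows. This is a Python illustration, not the paper's MATLAB code, and the grid size and pattern are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def measure(channel_pattern, xi1, xi2):
    """Simulate imaging: flip 0 -> 1 with probability xi1 (type 1 error)
    and 1 -> 0 with probability xi2 (type 2 error), pixel by pixel."""
    b = np.asarray(channel_pattern)
    flips = np.where(b == 0,
                     rng.random(b.shape) < xi1,
                     rng.random(b.shape) < xi2)
    return np.where(flips, 1 - b, b)

# Hypothetical 20x20 channel pattern B with a 2x3 block of occupied pixels.
B = np.zeros((20, 20), dtype=int)
B[5:7, 8:11] = 1
C = measure(B, xi1=0.05, xi2=0.2)  # noisy estimate of B
print(C.shape)
```

A clustering algorithm applied to C would then yield D, completing one sample of the chain A → B → C → D.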

IV.A. Classical versus quantum-classical k-medoids clustering
In our first scenario, we have a surface on which there are a number of non-interacting particles. We divide this surface into a d by d grid of pixels and image each pixel to determine the presence or absence of a particle in that pixel. We assume that a pixel can be occupied by at most one particle, so that we have only two possible channels, which we can label as C_0 (no particle) and C_1 (particle present).
Suppose we know that the surface contains m attractors, to which the particles are attracted, so that the probability function for finding a particle near an attractor is Gaussian. Specifically, suppose the m attractors are located at coordinates {(x_j, y_j)}, and the probability of a pixel with its center at position (x, y) containing a particle is given by

p(x, y) = ϕ Σ_{j=1}^{m} exp( −[(x − x_j)² + (y − y_j)²] / (2σ²) ),   (2)

where ϕ is some positive constant and σ² is some variance. For our simulations, we will set m = 2, so that there are always exactly two attractors. The probability of finding a particle on each pixel of the surface, for two attractors and a specific choice of attractor locations, is shown in Fig. 5. The probability is high close to the attractors and decays with distance from them. More than three standard deviations from either attractor, the probability of a pixel containing a particle is close to 0. If our task is to locate the attractors, k-medoids clustering (with m clusters) is a natural choice. We emphasize that the task is to find the locations of the attractors, via their effect on the distribution of particles, rather than the locations of the particles themselves. The grid may contain any number of particles, but will always contain exactly two attractors. Furthermore, we are not able to directly detect the attractors; we can only detect the presence/absence of particles, the probability of which (for each pixel) is affected by the positions of the attractors. The purpose of this is to relate the task to a physical scenario in which clustering of the results is necessary. For instance, the attractors could be electrical charges on the surface to which the particles are attracted.

Fig. 5 Proportion of samples in which each pixel contains a particle, for two attractors, for the scenario in Section IV.A. The probability for each pixel to contain a particle in any given sample is given by Eq. (2) (assessed at the center of each pixel). As can be seen, the pixels on which the attractors are centered always contain particles, whilst those further out contain particles less frequently, and those more than 4 pixels from an attractor have a probability of containing a particle that is close to 0. The aim of imaging the surface is to locate the two attractors.
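One way the A → B step of this scenario could be simulated is sketched below, under our reading of Eq. (2) as a sum of Gaussians centered on the attractors; the grid size, σ, ϕ, and attractor positions are illustrative assumptions, not the paper's values.

```python
import numpy as np

rng = np.random.default_rng(2)
d, sigma, phi = 20, 1.5, 1.0     # grid size and Eq. (2) parameters (assumed)
attractors = [(5, 5), (14, 14)]  # variable A: two well-separated attractors

# Occupation probability at each pixel center, summed over attractors
# and clipped to 1 so it remains a valid probability.
x, y = np.meshgrid(np.arange(d), np.arange(d), indexing="ij")
p = np.zeros((d, d))
for ax, ay in attractors:
    p += phi * np.exp(-((x - ax) ** 2 + (y - ay) ** 2) / (2 * sigma ** 2))
p = np.clip(p, 0.0, 1.0)

# One sample of the channel pattern (variable B): Bernoulli draw per pixel.
B = (rng.random((d, d)) < p).astype(int)
print(B[5, 5], B[0, 19])  # near an attractor vs far away
```

Pixels at the attractor centers are occupied in every sample, whilst pixels many standard deviations away are almost never occupied, matching the behavior shown in Fig. 5.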
The simulation is done as per Fig. 4. The coordinates of the attractors on the grid of pixels for each sample constitute the variable A for that sample, and the positions of all of the particles constitute the variable B. The coordinates of the centroids found by the k-medoids algorithm constitute variable D, our estimate of A.
We choose random pairs of well-separated coordinates for each sample of A and generate the corresponding probabilities for each pixel to have a value of 1 in the channel pattern (variable B). Each value of B is a matrix of 0s and 1s, with a probability of each value that is different for every pixel and depends on A (as given by Eq. (2)). If we were to draw a large number of sample values of B for a fixed value of A, we would get a distribution similar to Fig. 5.
As can be seen, some pixels have a value of 0 in almost all of the samples (corresponding to the absence of a particle in that pixel), whilst some almost always have a value of 1 (corresponding to the presence of a particle). Note that if we were to draw different values of A for each sample, we would have a roughly flat distribution (although not entirely uniform, due to our constraints on the possible values of A, which introduce some structure).
Each value of C is also a matrix of 0s and 1s. The probability of a pixel having each value depends on its value in the channel pattern and on the two error values. Carrying out k-medoids clustering, with 2 clusters, on C results in a pair of coordinates, similar to A.

Fig. 6 Proportion of (10000) samples in which each pixel is predicted to contain an attractor (i.e. to be the center of a cluster), for the scenario in Section IV.A. The predicted locations follow a distribution that has a high probability close to the true locations and a low probability elsewhere. The less spread out this distribution, the less uncertainty we have about the locations of the attractors, and hence the better the imaging protocol. This measurement outcome corresponds to a type 1 error of 0.05 and a type 2 error of 0.2. Note that this plot does not show the same thing as Fig. 5; Fig. 5 shows the proportion of samples in which each pixel contains a particle (and multiple pixels will contain a particle in each sample), whilst this plot shows the proportion of samples in which each pixel is found to be a cluster center (and each sample contains exactly 2 cluster centers).
In Fig. 6, we fixed A (to be the same as for Fig. 5) and then randomly drew B 10000 times, then drew C once for each value of B (using a type 1 error of 0.05 and a type 2 error of 0.2), and then calculated D for each value of C. This gave us 10000 samples of the calculated cluster centers for a specific, fixed value of A. We then plotted the proportions of the samples in which each pixel contained one of the calculated cluster centers.
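The B → C → D steps of this pipeline can be sketched as follows. The error-flip step follows the text's definitions of the two error types; the k-medoids routine is a minimal alternating implementation with farthest-point seeding, a stand-in for whichever k-medoids implementation was actually used, and the particle/attractor positions in the toy example are hypothetical.

```python
import numpy as np

def apply_measurement_errors(b, type1, type2, rng):
    """Simulate the measurement C from channel pattern B: a 0-pixel is
    misread as 1 with probability `type1` (false positive), and a
    1-pixel is misread as 0 with probability `type2` (misdetection)."""
    r = rng.random(b.shape)
    c = b.copy()
    c[(b == 0) & (r < type1)] = 1
    c[(b == 1) & (r < type2)] = 0
    return c

def k_medoids(points, k):
    """Minimal alternating k-medoids with farthest-point initialisation."""
    medoids = [points[0]]
    for _ in range(k - 1):
        # farthest-point seeding keeps the initial medoids well separated
        d = np.min([np.linalg.norm(points - m, axis=1) for m in medoids], axis=0)
        medoids.append(points[np.argmax(d)])
    medoids = np.array(medoids)
    for _ in range(100):
        dists = np.linalg.norm(points[:, None, :] - medoids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new = medoids.copy()
        for j in range(k):
            cluster = points[labels == j]
            if len(cluster):
                within = np.linalg.norm(
                    cluster[:, None, :] - cluster[None, :, :], axis=2).sum(axis=1)
                new[j] = cluster[within.argmin()]  # point minimising total in-cluster distance
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids

# Toy pipeline for one fixed A: particles gathered near two hypothetical
# attractors at (5, 5) and (13, 13) (row, column indices).
rng = np.random.default_rng(0)
b = np.zeros((20, 20), dtype=int)
b[4:7, 4:7] = 1
b[12:15, 12:15] = 1
c = apply_measurement_errors(b, type1=0.0, type2=0.0, rng=rng)
d_est = k_medoids(np.argwhere(c == 1).astype(float), 2)  # our estimate of A
```

Repeating the last three lines many times with nonzero errors, and histogramming `d_est`, reproduces the kind of spread shown in Fig. 6.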
If the protocol were ideal, these calculated cluster centers would always be at the coordinates given by A, and so the positions of the cluster centers would be a perfect predictor of the positions of the attractors (the value of A). Fig. 6 shows that the actual cluster centers found are spread over a small region around the true positions of the attractors, meaning that D has some entropy, even for fixed A. However, most of the cluster centers lie close to the true positions, so the protocol does gain some information about them. For further details about the calculation of the mutual information between variables A and D, see Appendix B.
To simplify our comparison of the protocols, we make the following assumptions: the attractors are well-separated from each other (i.e. the distance between each attractor and its nearest neighbor is much greater than σ), and none of the attractors are close to the edge of the surface (i.e. the distance between each attractor and the nearest edge is much greater than σ). We set d = 20, ϕ = 1, and σ² = 2. We then simulate measurement results for pairs of type 1 and type 2 errors and hence calculate the mutual information for each pair of errors.
For any given type 1 error, a higher type 2 error means fewer of the particles will be detected. Conversely, a higher type 1 error means more pixels will be found to contain particles when they actually do not. Both types of error are expected to reduce the accuracy with which we can estimate the cluster centers, since in one case we have less information and in the other case we have misleading information. We would therefore expect that a lower value of either error would result in Fig. 6 having a sharper (less spread out) distribution. Fig. 7 shows, as expected, that by measuring the pixels with quantum, rather than classical, states, we can achieve a higher mutual information with the ground truth, and therefore can gain more information about the positions of the cluster centers. Recall that, for each type 1 error, the type 2 error for each type of protocol is given by the curves in Fig. 3, and that the classical type 2 error varies between approximately 0.1899 and 0.3918, whilst the quantum type 2 error varies between approximately 0.1107 and 0.1424. Note that the mutual information decreases as the type 1 error increases, for both types of protocol, despite the fact that the type 2 error decreases as the type 1 error increases.

IV.B. Classical versus quantum-classical DBSCAN clustering
In the second scenario, we are again imaging a surface that can be represented as a grid of pixels. Suppose this surface has long, thin (non-circular) particles on it that cover multiple pixels. For simplicity, we will assume that the particles are rectangular, with integer dimensions (in terms of pixels covered) d1 and d2, and that they are oriented in one of two ways (vertically or horizontally). The particles are distributed randomly over the surface, with the only constraints being that their corners lie at the corners of pixels (so that every pixel is either completely covered or not covered) and that they do not overlap.
Our task is to determine the number of such particles, which is randomly chosen from a uniform distribution between 0 and a maximum number, m. DBSCAN may be more suitable than k-means clustering for identifying clusters corresponding to the particles, due to the non-circular shape of the particles and the fact that their number is not fixed or known beforehand.
We again carry out the numerical simulations as per Fig. 4. The number of particles present is variable A, the ground truth that we want to discover. For each sample, this is drawn randomly from the uniform distribution between 0 and m. For a given value of A, we then randomly place the particles on the grid of pixels to generate variable B, the channel pattern (with the constraint that no two particles overlap). After generating a measurement result (variable C) based on the channel pattern and the type 1 and 2 error values, we carry out DBSCAN on the measurement result to try to identify the particles. We use the number of clusters identified by DBSCAN as variable D, our estimate of variable A. For further details about the calculation of the mutual information between variables A and D, see Appendix B.
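Generating variable B for this scenario can be sketched as below. The text specifies the constraints (pixel-aligned corners, two orientations, no overlaps) but not the placement procedure; rejection sampling, and the helper name `place_particles`, are our assumptions.

```python
import numpy as np

def place_particles(d, n, dims=(2, 5), rng=None, max_tries=1000):
    """Generate a channel pattern B with `n` non-overlapping rectangular
    particles on a d-by-d grid, by rejection sampling. Each particle is
    oriented horizontally or vertically with equal probability."""
    rng = rng if rng is not None else np.random.default_rng()
    grid = np.zeros((d, d), dtype=int)
    placed = 0
    for _ in range(max_tries):
        if placed == n:
            break
        h, w = dims if rng.random() < 0.5 else dims[::-1]
        r = rng.integers(0, d - h + 1)    # top-left corner, kept on the grid
        col = rng.integers(0, d - w + 1)
        if grid[r:r + h, col:col + w].sum() == 0:  # reject overlapping placements
            grid[r:r + h, col:col + w] = 1
            placed += 1
    return grid, placed

rng = np.random.default_rng(1)
b, placed = place_particles(50, 5, rng=rng)
```

For the sparse settings used here (at most 10 particles of 10 pixels each on a 2500-pixel grid), rejection sampling essentially always succeeds within a few tries per particle.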
We set the grid of pixels to 50 by 50, m to 10, and the dimensions of the particles to 2 by 5 pixels. For the DBSCAN algorithm, we set the minimum number of points required in the region surrounding a point for that point to be identified as a core point to 4, and we set the radius of the region to √2, so that it includes all neighboring pixels (including diagonal neighbors).
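These DBSCAN settings can be sketched with a minimal grid implementation. This is not the authors' code; note in particular that whether the neighborhood count includes the point itself is a convention that differs between implementations, and we assume here that it does not (matching the text's "points in a region surrounding a point").

```python
import numpy as np
from collections import deque

def dbscan_grid(c, eps=np.sqrt(2), min_pts=4):
    """Minimal DBSCAN over the coordinates of 1-valued pixels in `c`.
    With eps = sqrt(2), a pixel's neighbourhood is its 8 surrounding
    pixels. Returns per-point labels (-1 = unclustered) and the
    number of clusters found."""
    pts = np.argwhere(c == 1)
    n = len(pts)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    # neighbour lists, excluding the point itself
    nbrs = [np.flatnonzero((d[i] <= eps + 1e-9) & (np.arange(n) != i)) for i in range(n)]
    core = [len(nb) >= min_pts for nb in nbrs]
    labels = np.full(n, -1)
    k = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = k
        queue = deque([i])
        while queue:  # grow the cluster; only core points expand it
            j = queue.popleft()
            for nb in nbrs[j]:
                if labels[nb] == -1:
                    labels[nb] = k
                    if core[nb]:
                        queue.append(nb)
        k += 1
    return labels, k

c = np.zeros((20, 20), dtype=int)
c[3:5, 3:8] = 1   # one 2-by-5 particle
c[15, 15] = 1     # a stray false-positive pixel
labels, n_clusters = dbscan_grid(c)
```

On this toy input, the rectangle's interior pixels are core points, its corners join as border points, and the isolated pixel is left unclustered, exactly the behavior described for Fig. 8.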
Fig. 8 shows an example measurement result for a type 1 error of 0.05 and a type 2 error of 0.2. The colored pixels are those identified as being part of a cluster, and hence as containing a particle, whilst the black pixels are those that have a value of 1 in the measurement result but are not part of a cluster (and hence are assumed to be false positives).
In this example, we identify 6 clusters, and hence this is our estimate of the number of particles. However, there are actually only 5 particles; all of the pixels in the magenta cluster at the bottom right of the image are false positives. This shows how a high type 1 error can lead to the protocol identifying clusters that do not exist, and hence overestimating the number of particles present. Similarly, a high type 2 error could lead to a cluster not being identified at all, and hence to the protocol underestimating the number of clusters. Type 2 errors can also lead to overestimation, if the pixels falsely detected to have a value of 0 lie in the middle of a particle. In this case, the two ends of the particle may be incorrectly found to be separate clusters. Both types of error can therefore lead to a misestimation of the number of particles, and hence will reduce the mutual information between A and D.
Fig. 9 again shows that measurements with entangled states achieve a higher mutual information, and are therefore better at determining the number of particles, than classical measurements. For both types of protocol, the mutual information initially rises as the type 1 error increases before falling again. This is not unexpected, because the type 2 error decreases as the type 1 error increases (see Fig. 3), so the initial increase in mutual information is not because of the higher type 1 error but because of the lower type 2 error.
In particular, for low values of the type 1 error, we have a clear quantum advantage (for a type 1 error of 0, the advantage is more than one bit). The large gap between the protocols, relative to their absolute values, shows that the advantage in the misdetection probability due to using quantum probes can lead to improvements in clustering accuracy for certain unsupervised learning tasks. Note that low values of the type 1 error correspond to larger differences in the type 2 error between the two types of protocol (see Fig. 3), although there was no guarantee that this would translate to a higher quantum advantage for lower values of the type 1 error.

Fig. 9 Mutual information between the ground truth and the estimated result for a pattern detection task involving DBSCAN, for both classical and quantum measurements of pixels. For each value of the type 1 error, measurement results are obtained for the corresponding type 2 error, for each type of measurement, and DBSCAN is carried out on the results. The error bars show the variance of our estimator of the mutual information.

V. CONCLUSION
Quantum states can often give an advantage over classical states when it comes to quantum imaging tasks. In this paper, we have shown that this advantage can translate to an improvement in clustering accuracy when classical clustering algorithms are performed on the results of a measurement. We demonstrated this advantage numerically for both a scenario involving the k-medoids algorithm (a centroid-based algorithm) and one involving DBSCAN (a density-based method). In both cases, using a quantum measurement, and therefore achieving a lower type 2 error for the same type 1 error, resulted in a higher mutual information between the ground truth that we wanted to know and our estimate of the ground truth based on the measurement protocol.
Whilst this result is intuitive, a small advantage in misdetection probability could easily have been lost or made negligible during the data-processing stage. It is therefore encouraging that the quantum advantage for measurements is robust enough to survive the clustering algorithms. This complements existing results showing that imaging using quantum states can give improved results for supervised learning.
Moreover, we found that different clustering algorithms and different scenarios result in different levels of quantum advantage, even for the same pairs of possible channels. Comparing Figs. 7 and 9 shows that the same quantum advantage in measurements can lead to different quantum advantages in clustering accuracy when applied to different problems involving different clustering techniques. Indeed, the two graphs are qualitatively, as well as quantitatively, different: in Fig. 7, the mutual information decreases as the type 1 error increases for both the classical and the quantum-classical cases, whilst in Fig. 9, both mutual informations initially peak before decreasing again. This demonstrates that the question raised in this paper is non-trivial.
Since ground truths such as the number of clusters on a surface are global properties of the entire surface, future research could consider the possibility that fully quantum protocols that collectively probe the pixel pattern as a whole, rather than individually probing each pixel, could give a further advantage over quantum-classical protocols.
Another possible consideration is protocols that involve multiple rounds of sending probes through the pixels to image them before the clustering stage. Such protocols could potentially also be adaptive, meaning that the probes used in subsequent rounds could depend on the results of measuring the return states from previous rounds. Adaptivity is known to provide an advantage for certain channel discrimination problems.
Further research could also consider situations in which there is a global constraint on the energy used to probe the entire surface, but no firm constraint on the per-pixel energy used (i.e. the sum of the energies of all of the probes is fixed, but not the distribution over the pixels).

values of n (since it is a continuous function of n with at most one turning point). To set dL/dp(n) = 0, we must have p(n) = 0 for all but two (or fewer) values of n (except in the trivial case of τ_1 = τ_2). Applying the continuity condition and the energy constraint, the candidate solutions take the form p_{n0,p0}(n) = p_0 δ_{n,n0} + (1 − p_0) δ_{n,n0+∆(n0,p0)}, where δ is the Kronecker delta function.
We now minimize f(p) over our candidate solutions. Calculating f for the candidate solutions and partially differentiating with respect to n_0, we find that the derivative is always negative if p_0 > 0 and n_0 < m, so to minimize we can set n_0 = m. The minimum fidelity can therefore be achieved by using a pure coherent state with an average photon number of m; this gives the minimum fidelity between output states for a classical probe. Finally, for discriminating between pure states with a fidelity of F, the minimum type 1 error, α, for a given type 2 error, β, is given by the known pure-state trade-off relation^24. For a TMSV probe, the initial covariance matrix of the probe (before one mode is sent through the channel) is the standard two-mode squeezed vacuum covariance matrix, where m is the average photon number of each of the modes. The covariance matrix of the return state depends on the identity, i, of the channel. This covariance matrix can be diagonalized by a two-mode squeezing (unitary) operation, with the squeezing parameter depending on τ_i. The resulting state is the tensor product of a vacuum state and a thermal state, with average photon number n̄_i (which again depends on τ_i).
Let us call the channel output ρ′_i, and let us call the diagonalizing unitary for that output state U_i. Consequently, we can apply U_1 to our output state, defining the composed operation U (A16). Note that, as a composition of two-mode squeezing operations, U is also a two-mode squeezing operation, and so can be written in the standard two-mode squeezing form^25. Now, suppose we carried out a photon counting measurement on the first mode of state ρ_i. If i = 1, we always get a result of 0 (a type 1 error of 0). On the other hand, if i = 2, we get a result of 0 with some non-zero probability (a non-zero type 2 error). This is one possible measurement scheme.
Suppose, on the other hand, we carried out a photon counting measurement on the first mode of state U ρ_i U† (i.e. we apply the two-mode squeezing unitary, U, prior to the measurement). Now, if i = 1, we get a result of 0 some of the time, but if i = 2, we always get a result of 0. In this case, therefore, we get a non-zero type 1 error and a type 2 error of 0. This defines another possible measurement scheme. Note that in both schemes, we only care about whether the result is 0 or not 0 (not the actual number), so the detector is more akin to a click detector.
On a plot of type 1 error against type 2 error (an ROC curve), we can draw a straight line between these two points, and achieve any pair of errors along this line. This could be achieved by carrying out the first measurement with probability a and the second measurement with probability 1 − a. Suppose we want a better-than-linear interpolation between the two points. How might we go about designing a measurement to achieve this? One thing we might consider is, instead of choosing one measurement or the other with some classical probability, controlling which measurement is carried out using a quantum state. Suppose we apply a controlled unitary to the state ρ_i, so that if the control qubit is |0⟩, we apply the identity to ρ_i, and if the control qubit is |1⟩, we apply U. Let us denote the resulting channel as U. If we now measured the control qubit and then the first mode, this would reduce to the classical combination of measurements. But by retaining the off-diagonal components, we may be able to do better. To reduce the complexity of the problem, let us apply the following channel to the return state (and the identity to the control qubit), retaining the superposition of measurements: a channel in which the photon-number indices i and j both run from 1 to ∞. In other words, this channel, when applied to a two-mode Gaussian state, maps it to a two-qubit state by mapping all of the non-vacuum components to the single outcome |1⟩. Note that, by doing this, we lose some information, but not all of it. Here, {X}− denotes the projector onto the negative eigenspace of X. Even for our reduced system, it is still difficult to analytically find the ROC curve by optimizing over parameter a for each value of b. Instead, we numerically sample a large number of a and b values to get a large number of pairs {α(a, b), β(a, b)}, and then join up the bottom of this set of values to approximate the ROC curve.
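The final step above, joining up the bottom of a cloud of sampled (α, β) pairs, can be sketched as extracting the Pareto-optimal points. The sampler below is a toy stand-in (the true pairs would come from the quantum measurement model); the curve β = (1 − α)² is purely illustrative.

```python
import numpy as np

def lower_envelope(alphas, betas):
    """Approximate an ROC curve from sampled (type 1, type 2) error pairs:
    sweep in order of increasing alpha and keep a point whenever it
    strictly lowers the best beta seen so far (the Pareto front)."""
    order = np.lexsort((betas, alphas))  # primary key alpha, ties broken by beta
    front_a, front_b, best = [], [], np.inf
    for i in order:
        if betas[i] < best:
            front_a.append(alphas[i])
            front_b.append(betas[i])
            best = betas[i]
    return np.array(front_a), np.array(front_b)

# Toy stand-in for the sampled pairs {alpha(a, b), beta(a, b)}: scattered
# points lying on or above a hypothetical trade-off curve beta = (1 - alpha)^2.
rng = np.random.default_rng(2)
a = rng.random(1000)
b = (1 - a) ** 2 + 0.5 * rng.random(1000)
roc_a, roc_b = lower_envelope(a, b)
```

By construction, the returned β values are strictly decreasing in α, so the envelope can be plotted directly as an approximate ROC curve.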

Appendix B: Calculation of the mutual information
In both scenarios, our aim is to calculate the mutual information between the ground truth (variable A) and the estimate of the ground truth (variable D). For both scenarios, variables A and D are the same size: in Section IV.A, they are pairs of coordinates (A is composed of the true locations of the two attractors and D is composed of the locations of the two cluster centers as found by the clustering algorithm), and in Section IV.B, they are integers between 0 and 10 (A is the number of particles on the surface and D is the number of clusters found by DBSCAN). Note that DBSCAN may find more than 10 clusters; in this case, we replace the actual number of clusters found with 10, since we know this is the maximum number of particles.
Let us first discuss how we can determine the entropy of a probability distribution over a finite number of outcomes by sampling the distribution a large number of times. If we (independently) sample the distribution a large number of times, we can approximate the true probability distribution using the number of occurrences of each outcome (i.e. the probability of an outcome is approximately the number of occurrences divided by the total number of samples). We call this approximated probability distribution the sample distribution. We can then calculate the entropy of the sample distribution (the sample entropy) in order to approximate the entropy of the true probability distribution. This method is called the plugin estimator.
In the asymptotic limit of N ≫ P samples, where P is the number of possible outcomes, the value we calculate will be normally distributed around an expected value. This expected value is not the same as the true value of the entropy, since our estimator is biased: for any finite N, the expected value of the plugin estimator is less than the true entropy. We have the following conditions for the variance and bias of the estimator^28,29:

var(Ĥ_N) ≤ log₂²(N)/N,    (B1)

E[Ĥ_N] = H − [(P − 1)/(2N)] log₂ e + O[N⁻²],    (B2)

where H is the true entropy and Ĥ_N is the sample entropy for N samples. We will first consider the scenario in Section IV.B. Here, we generate N samples according to the flowchart in Fig. 4. Recalling that m = 10, and setting the number of samples, N, to 20000, we find that the variance of our estimator is less than or equal to approximately 0.0126 bits and the bias is approximately −0.0004 bits.
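The plugin estimator and the first-order bias term can be sketched as follows; the uniform test distribution and the helper name `plugin_entropy` are ours, chosen only to make the numbers checkable.

```python
import math
import numpy as np

def plugin_entropy(samples, outcomes):
    """Plugin (maximum-likelihood) entropy estimate, in bits, from
    samples over a known finite outcome set."""
    counts = np.array([np.sum(samples == o) for o in outcomes], dtype=float)
    p = counts / counts.sum()
    p = p[p > 0]  # 0 * log(0) is taken as 0
    return -np.sum(p * np.log2(p))

# First-order bias magnitude for Section IV.B: P = m + 1 = 11 outcomes,
# N = 20000 samples, giving (P - 1) / (2N) * log2(e) ~ 0.0004 bits.
P, N = 11, 20000
bias = (P - 1) / (2 * N) * math.log2(math.e)

# Sanity check on a uniform distribution over 11 outcomes (entropy log2(11)).
rng = np.random.default_rng(3)
samples = rng.integers(0, 11, size=N)
h_hat = plugin_entropy(samples, range(11))
```

For N ≫ P the estimate lands very close to the true entropy, with the small negative bias quantified above.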
Next, let us consider how to calculate the entropy for the scenario in Section IV.A. The difficulty in this scenario is that both A and D are sets of m coordinates in a d by d grid. This means that the number of possible outcomes of each is the number of unique choices of m positions out of d², which is the binomial coefficient C(d², m). For a 20 by 20 grid and 2 clusters, this means that A and D each have 79800 possible outcomes. This is far more than the 11 possible outcomes for A and D in the scenario in Section IV.B, and so we need far more samples for the plugin estimator, Ĥ_N(D), to be in the asymptotic regime (N ≫ P). In order to estimate the conditional entropy in the same way, we would need roughly 80000 times more samples than even this, because we would be sampling the entropy of D for each possible outcome of A.
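The outcome count quoted above follows directly from the binomial coefficient:

```python
import math

# Number of possible outcomes of A (or D) in Section IV.A: unordered
# choices of m = 2 attractor positions out of d^2 = 400 pixels.
d, m = 20, 2
outcomes_a = math.comb(d * d, m)  # = 79800, roughly 8e4 times the 11 outcomes of IV.B
```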
Instead, we make an assumption about the conditional entropy in order to simplify the calculations.We assume that the entropy of D conditioned on a single value of A, H(D|A = a), is approximately the same for all allowed values of A. This assumption is justified by the

Fig. 1 (a) The ground truth, namely the number of clusters (two, in the shown example), (b) the true channel pattern, (c) the measurement result, and (d) the result of the classical clustering (two). In the shown example, a cluster is a block of 2 by 3 dark pixels (which could represent collections of particles, structures, etc.). The number of clusters (a) is the only piece of information that we are interested in, rather than the positions of the clusters on the surface (if we were interested in the positions, that information would also be part of the ground truth). (b) shows how this ground truth corresponds to the physical multi-channel that we probe. Here, each pixel is one of two types of channel: gray, representing the presence of a particle, or white, representing the absence of particles. The two clusters are outlined in blue, but note that there are two gray pixels that are not part of either of the clusters. These could represent, for instance, particles that do not form part of one of the structures that we are looking for. If we had perfect knowledge of the state of (b), we would then carry out classical clustering algorithms in order to find the two outlined clusters and recover the value of (a). Instead, we carry out measurements on the multi-channel in order to estimate (b), with the result being (c), where red squares represent pixels we believe to be gray in (b) and yellow squares represent pixels we believe to be white. Our measurement process cannot perfectly reproduce (b), and so some squares that are white in (b) are red in (c) and some squares that are gray in (b) are yellow in (c). These are the two types of misdetection that can occur in this scenario. Finally, we carry out some classical clustering algorithm on (c) in order to estimate the number of clusters, and hence (a). This estimate is (d). Note that in this figure, the algorithm has found only two clusters, and hence has estimated (a) correctly, despite the extra red pixels. A different, potentially less appropriate, clustering algorithm might mistakenly decide that the two red pixels in the top-right corner of (c) are part of a cluster, and so overestimate (a).

Fig. 2 k-means clustering (left) and DBSCAN (right) on two different data sets. The top data set was generated as the sum of three normal distributions with different mean values, whilst the bottom data set was generated as a normal distribution with a surrounding ring. k-means clustering (with three clusters) is better than DBSCAN at identifying the three (roughly circular) clusters corresponding to each of the normal distributions in the top image, whilst DBSCAN is better (than k-means clustering with two clusters) at identifying the structure of a ring with a circle inside in the bottom image. Note that only the DBSCAN images have unclustered points (in black).

Fig. 4 Flowchart outlining the numerical simulation process used to compare classical and quantum-classical protocols.

Fig. 7 Mutual information between the ground truth and the estimated result for a pattern detection task involving k-medoids clustering, for both classical and quantum measurements of pixels. For each value of the type 1 error, measurement results are simulated for the corresponding type 2 error, for each type of measurement, and k-medoids clustering is carried out on the results. The error bars show the variance of our estimator of the mutual information.

Fig. 8 Example of a measurement result for the scenario in Section IV.B. Every pixel has been individually imaged and found to have a value of either 0 or 1 (each corresponding to a different possible channel). The particles we are looking for are 5 pixels long and 2 pixels wide. All of the pixels containing a particle should have the value 1 and all other pixels should have a value of 0. However, sometimes, we will determine a pixel to have value 0 when its actual value is 1 (type 2 error). Similarly, sometimes we will get a value of 1 for a pixel that does not contain a particle (type 1 error). In order to find which pixels that we determine to have value 1 actually contain particles and which do not, we carry out DBSCAN on the measurement results to find clusters of 1s, and only consider 1s that are part of clusters to be real particles rather than measurement errors. All pixels found to have a value of 0 are colored white. Pixels (found to have a value of 1) that are determined to belong to the same cluster are given the same (non-black) color, whilst the remaining, unclustered pixels are colored black.

We generate N samples as per the flowchart in Fig. 4, recording variables A and D for each. We then calculate the sample entropy, Ĥ_N(D), and the sample conditional entropy, Ĥ_N(D|A). The conditional entropy is given by

H(D|A) = Σ_a p(A = a) H(D|A = a),    (B3)

and so, recalling that A has a uniform probability over all outcomes, we can write

H(D|A) = [1/(m + 1)] Σ_{a=0}^{m} H(D|A = a),    (B4)

where m is the maximum number of clusters. Note that H(D|A = a) is the entropy conditioned on a particular outcome of A, and therefore is calculated using only 1/(m + 1) of the total number of samples. The plugin estimator of the conditional entropy is a sum of independent normal distributions and therefore has the following conditions on its variance and bias:

var(Ĥ_N(D|A)) ≤ (m + 1) log₂²[N/(m + 1)]/N,    (B5)

E[Ĥ_N(D|A)] = H(D|A) − [m(m + 1)/(2N)] log₂ e + O[N⁻²].    (B6)

The plugin estimator of the mutual information is given by

Î_N(A : D) = Ĥ_N(D) − Ĥ_N(D|A),    (B7)

and so has the following conditions on its variance and bias:

var(Î_N(A : D)) ≤ var(Ĥ_N(D)) + var(Ĥ_N(D|A)) + 2[var(Ĥ_N(D)) var(Ĥ_N(D|A))]^{1/2},    (B8)

E[Î_N(A : D)] = I(A : D) + [m²/(2N)] log₂ e + O[N⁻²].    (B9)