A neural-network approach for identifying nonclassicality from click-counting data

Machine-learning and neural-network approaches have attracted considerable attention in quantum science and technology in recent years. One of the most essential tasks for the future development of quantum technologies is the verification of nonclassical resources. Here, we present an artificial neural network approach for the identification of nonclassical states of light based on recorded measurement statistics. In particular, we implement and train a network which is capable of recognizing nonclassical states based on the click statistics recorded with multiplexed detectors. We use simulated data for training and testing the network, and we show that it is capable of identifying some nonclassical states even if they have not been used in the training phase. In particular, for small sample sizes, our approach can be more sensitive in identifying nonclassicality than established criteria, which suggests possible applications in the presorting of experimental data and in online applications.


I. INTRODUCTION
We are currently in the so-called second quantum revolution [1], meaning that we are systematically employing quantum systems and properties in order to develop new technologies. For the development of quantum technologies, it is crucial to be able to characterize quantum states in order to identify quantum resources for applications. In this context, it is important to certify that a quantum state is nonclassical, i.e., that it cannot be described by a (semi-)classical theory. In the field of quantum optics, nonclassicality is defined through the negativities of the Glauber-Sudarshan P representation [2,3]. In recent works [4,5], it has been shown that nonclassicality is indeed a resource for quantum technologies and can, for instance, be transformed into entanglement [6,7]. In many practical applications it is therefore important to verify nonclassicality from measurement data.
In this regard, an important task is the characterization of light in the few-photon regime. Due to the lack of photon-number-resolving detectors, so-called multiplexing strategies [8-13] have been developed, which allow one to gain some insight into the measured quantum state even when a photon-number-resolving measurement is not accessible. It is important to stress that such strategies do not provide direct access to the photon-number distribution, and improper interpretation of the measured statistics might lead to a false certification of nonclassicality [14]. Therefore, it is sensible to formulate nonclassicality criteria which can be directly applied to the recorded click-counting statistics [15-20]. Such approaches have been successfully implemented in various experimental settings and for different quantum states of light [21-30].
In this paper, we propose and implement a machine-learning approach for the identification of nonclassical states based on their experimentally accessible click-counting statistics. In particular, we implement a dense artificial neural network and train the network via supervised learning with simulated click-counting data for different classical and nonclassical states. After the training phase, the performance of the neural network is analyzed and its results are compared to a moment-based approach for certifying nonclassicality. We pay special attention to the case of small sample sizes, simulating both ideal detection with unit quantum efficiency and a realistic quantum efficiency of η = 0.6. We find that the network can identify nonclassical states even for very small sample sizes. In particular, for realistic detection efficiencies, the neural-network approach performs better than the moment-based test. Furthermore, we show that the network can identify nonclassical states even if these states have not been used in the training of the network. We conclude that the neural network provides a new avenue for a fast indication of nonclassicality for small sample sizes.
The paper is structured as follows. In Sec. II, we recall some background information about click-counting detection which we will use throughout the paper. In Sec. III, we describe the design and the training of the neural network for identifying nonclassical states from their click-counting statistics. We apply the trained networks to different simulated data sets and analyze their performance in Sec. IV. In Sec. V, we summarize and conclude the results of our paper.

II. PRELIMINARIES: CLICK-COUNTING DETECTION
In this section, we briefly recall the theory of click-counting detection and show how nonclassicality can be certified from the measured statistics. Different realizations of such click-counting detectors are shown in Fig. 1. All implementations have in common that they split the incoming light into N detection modes of equal intensity, and each mode is recorded with an on-off detector. The theoretical description of the click-counting statistics recorded by such devices, i.e., the probability $p_k$ of recording k coincident clicks ($0 \le k \le N$), is given by the quantum counterpart of the binomial distribution [14],
$$p_k = \left\langle : \binom{N}{k}\, \hat{m}^{N-k} \left(\hat{1} - \hat{m}\right)^{k} : \right\rangle, \tag{1}$$
where $: \cdot :$ indicates the normal-ordering prescription [58] and $\hat{m}$ is the no-click operator. Note that for typical on-off detectors, the no-click operator has the form $\hat{m} = \exp(-\eta \hat{n}/N)$, which is a function of the photon-number operator $\hat{n}$ and is characterized by the detector's quantum efficiency η; cf., e.g., Ref. [25].
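As a minimal illustration of these click-counting statistics, the following Python sketch evaluates $p_k$ in closed form for two simple inputs. It assumes the no-click operator $:\exp(-\eta \hat{n}/N):$ quoted above and no dark counts. For a coherent state the normal-ordered expectation value factorizes, so $p_k$ is an ordinary binomial distribution with no-click probability $\exp(-\eta|\alpha|^2/N)$; for a Fock state $|n\rangle$ the sketch uses $\langle :\hat{m}^j: \rangle = (1 - j\eta/N)^n$. The function names are illustrative and not taken from the paper.

```python
from math import comb, exp

def click_probs_coherent(mean_photons, n_det=16, eta=1.0):
    """Click statistics p_0..p_N for a coherent state with |alpha|^2 = mean_photons."""
    m = exp(-eta * mean_photons / n_det)  # no-click probability of one detector
    return [comb(n_det, k) * m ** (n_det - k) * (1.0 - m) ** k
            for k in range(n_det + 1)]

def click_probs_fock(n_photons, n_det=16, eta=1.0):
    """Click statistics p_0..p_N for a Fock state |n>."""
    def m_moment(j):
        # <: m^j :> = (1 - j*eta/N)^n for a Fock state
        return (1.0 - j * eta / n_det) ** n_photons

    probs = []
    for k in range(n_det + 1):
        # expand (1 - m)^k binomially inside the normal ordering
        s = sum(comb(k, j) * (-1) ** j * m_moment(n_det - k + j)
                for j in range(k + 1))
        probs.append(comb(n_det, k) * s)
    return probs

# A single photon on ideal detectors always produces exactly one click.
single_photon = click_probs_fock(1)
```

The alternating sum for Fock states can pick up rounding noise of order 1e-10 for N = 16, so tiny negative values should be clamped before the probabilities are used for sampling.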
In what follows, we will design, implement, and test neural networks for identifying nonclassicality which take recorded click-counting statistics as inputs. To evaluate the performance of the neural network, we compare its predictions with a well-established method of certifying nonclassicality from click-counting data. In particular, we consider the approach based on the matrix of click moments [16], which has already been used in various experimental realizations [21-23, 26, 27]. The matrix of moments $M^{(K)}$ is defined as
$$M^{(K)} = \left( \langle : \hat{m}^{s+t} : \rangle \right)_{s,t}, \tag{2}$$
with $s, t = 0, \dots, K/2 \le N/2$ for even K and N. The superscript (K) defines the highest moment within the matrix M, which is bounded from above by the number of detection bins (K ≤ N). Importantly, $M^{(K)}$ is non-negative for any classical light field. Note that the required moments and their statistical errors can be directly sampled from the click-counting statistics [16,22]. The simplest form of the matrix of click moments is given by moments up to the second order,
$$M^{(2)} = \begin{pmatrix} 1 & \langle : \hat{m} : \rangle \\ \langle : \hat{m} : \rangle & \langle : \hat{m}^{2} : \rangle \end{pmatrix}.$$
If the matrix M (2) is not positive semi-definite, the detected light field is nonclassical. Thus, the studied state is found to be nonclassical if the minimal eigenvalue of the matrix has a significant negative value. We will use this condition to compare the moments method to the predictions given by the neural network.
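The condition on the second-order matrix can be sketched numerically as follows. The sketch assumes that the moments are obtained from the click statistics through $\langle :\hat{m}^j: \rangle = \sum_k p_k \binom{N-k}{j} / \binom{N}{j}$, which follows from the binomial form of $p_k$ (the number of silent detectors has binomial factorial moments); this relation is derived here for illustration and is not quoted from the paper. The function names are hypothetical.

```python
from math import comb, exp, sqrt

def click_moment(probs, j):
    """Normal-ordered moment <: m^j :> from click statistics p_0..p_N."""
    n_det = len(probs) - 1
    # comb(n_det - k, j) is zero whenever fewer than j detectors stay silent
    total = sum(p * comb(n_det - k, j) for k, p in enumerate(probs))
    return total / comb(n_det, j)

def min_eigenvalue_m2(probs):
    """Smallest eigenvalue of the 2x2 matrix [[1, <:m:>], [<:m:>, <:m^2:>]]."""
    a = click_moment(probs, 1)
    b = click_moment(probs, 2)
    return 0.5 * ((1.0 + b) - sqrt((1.0 - b) ** 2 + 4.0 * a ** 2))

# Example inputs for N = 16 detectors with unit efficiency:
n_det = 16
m = exp(-3.0 / n_det)  # coherent state with |alpha|^2 = 3
coherent = [comb(n_det, k) * m ** (n_det - k) * (1.0 - m) ** k
            for k in range(n_det + 1)]
single_photon = [0.0, 1.0] + [0.0] * (n_det - 1)  # exactly one click

x_coherent = min_eigenvalue_m2(coherent)      # zero up to rounding
x_photon = min_eigenvalue_m2(single_photon)   # negative: nonclassical
```

For exact probabilities the coherent state sits on the boundary of the classical set, while the single-photon statistics give a negative eigenvalue. With sampled data the eigenvalue additionally carries a statistical error, which is why a significance criterion is needed.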

III. IMPLEMENTATION AND TRAINING OF THE ARTIFICIAL NEURAL NETWORK
Due to the probabilistic nature of quantum physics, many repetitions of prepare-and-measure protocols are usually required in order to gain significant information about quantum states. In this paper, we aim at identifying nonclassical quantum states from click-counting data with small sample sizes, which allows for a fast and efficient classification of quantum states. It is known that machine learning can perform well in categorizing data from small sample sizes [59]. Different kinds of machine-learning methods can be considered for the problem of detecting nonclassicality of states. Here, we discuss the case of supervised learning of a dense artificial neural network, i.e., we train a network with data simulated from known classical and nonclassical states and test whether the network is able to recognize nonclassicality of new states which can be similar to or different from the states used in the training.

A. Implementation of the network
The elements of the click-counting statistics constitute the input of the neural network. We will focus on the case of N = 16 detectors. In this case, we have 17 input neurons, where the k-th neuron (k = 0, ..., 16) is given the relative frequency of recording k simultaneous detector clicks. The Python packages Keras [60] and TensorFlow [61] are used to implement the network and the training.

B. Creation of training data
To simulate the relative click-counting frequencies, we sample from the click-counting probabilities $p_k$; cf. Eq. (1). As training data, we use the sampled statistics from coherent and thermal states (classical) and from Fock and squeezed states (nonclassical). The exact expressions of $p_k$ for the different states are given in the Appendix. Averaging over a sample size m of measurement realizations generates one input data set for the neural network. We generate 1000 data points for each family of training states. The points are generated from states with average photon numbers $\bar{n}$ uniformly chosen from $\bar{n} \in [1, 16]$. The simulated data is divided into 80% training and 20% validation data.
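The data-generation step described above can be sketched as follows for one classical and one nonclassical family. The sketch is a simplified stand-in, not the authors' code: it covers only coherent and Fock states (thermal and squeezed states are omitted), draws integer photon numbers for the Fock states, and uses hypothetical function names and a fixed random seed.

```python
import random
from math import comb, exp

N_DET = 16  # number of on-off detectors, as in the text

def click_probs_coherent(mean_photons, eta=1.0):
    m = exp(-eta * mean_photons / N_DET)  # no-click probability per detector
    return [comb(N_DET, k) * m ** (N_DET - k) * (1.0 - m) ** k
            for k in range(N_DET + 1)]

def click_probs_fock(n_photons, eta=1.0):
    probs = []
    for k in range(N_DET + 1):
        s = sum(comb(k, j) * (-1) ** j
                * (1.0 - (N_DET - k + j) * eta / N_DET) ** n_photons
                for j in range(k + 1))
        probs.append(max(0.0, comb(N_DET, k) * s))  # clamp rounding noise
    return probs

def sample_frequencies(probs, sample_size, rng):
    """Draw sample_size click numbers from p_k and return relative frequencies."""
    draws = rng.choices(range(len(probs)), weights=probs, k=sample_size)
    counts = [0] * len(probs)
    for k in draws:
        counts[k] += 1
    return [c / sample_size for c in counts]

def make_dataset(points_per_family, sample_size, eta=1.0, seed=0):
    """Return (training, validation) lists of (frequencies, label) pairs."""
    rng = random.Random(seed)
    data = []
    for _ in range(points_per_family):
        nbar = rng.uniform(1.0, 16.0)   # classical example, label 0
        data.append((sample_frequencies(click_probs_coherent(nbar, eta),
                                        sample_size, rng), 0))
        n = rng.randint(1, 16)          # nonclassical example, label 1
        data.append((sample_frequencies(click_probs_fock(n, eta),
                                        sample_size, rng), 1))
    rng.shuffle(data)
    split = int(0.8 * len(data))        # 80% training, 20% validation
    return data[:split], data[split:]

train, val = make_dataset(points_per_family=50, sample_size=100)
```

Each input vector has 17 entries, one relative frequency per possible click number, so it can be fed directly to a network with 17 input neurons.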

C. Training the network
We train the network for the sample sizes m = 1000 and m = 100 and for quantum efficiencies η = 1 and η = 0.6. Different network architectures were considered by varying the number of hidden layers, neurons per layer, activation functions, and optimization algorithms. Optimal (minimal) architectures depend slightly on the specific quality of the training data (sample size m and quantum efficiency η). In the following, we choose a network with three hidden layers, each consisting of 50 neurons, as sketched in Fig. 2. We note that these parameters correspond to a rather compact network, which allows for efficient and fast training and testing. We use rectified linear units as neural activation functions in the network, while the output neuron is activated via the sigmoid function; cf. [59]. We train the network to minimize the mean squared error using the optimization algorithm Adam [62]. The network is trained until the validation error stops decreasing.
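To make the size of this architecture concrete, the following sketch builds the layer structure 17-50-50-50-1 in plain Python, counts its trainable parameters, and evaluates one forward pass with rectified linear units in the hidden layers and a sigmoid output. It is not the Keras implementation used in the paper: the weights are random and untrained, the initialization is an arbitrary choice, and the training with Adam on the mean squared error is deliberately left out.

```python
import random
from math import exp

LAYER_SIZES = [17, 50, 50, 50, 1]  # input, three hidden layers, output

def init_network(rng, sizes=LAYER_SIZES):
    """Random (untrained) weights and zero biases for a dense network."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def count_parameters(layers):
    """Number of weights plus biases over all layers."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

def forward(layers, x):
    """Dense forward pass: ReLU in hidden layers, sigmoid at the output."""
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if index < len(layers) - 1:
            x = [max(0.0, v) for v in z]            # rectified linear unit
        else:
            x = [1.0 / (1.0 + exp(-v)) for v in z]  # sigmoid output in (0, 1)
    return x[0]

rng = random.Random(1)
network = init_network(rng)
n_params = count_parameters(network)   # 900 + 2550 + 2550 + 51 = 6051
uniform_input = [1.0 / 17.0] * 17      # placeholder click frequencies
output = forward(network, uniform_input)
```

With about six thousand parameters the network is small enough that training on a few thousand simulated data points is fast, which is consistent with the compactness emphasized in the text.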

IV. CERTIFYING NONCLASSICALITY WITH THE NETWORK
The output of our neural network is a number between 0 and 1. In our training data, we assign 0 to all classical states and 1 to all nonclassical states. Therefore, a high output value of the network should indicate nonclassicality. To quantify the performance of the network, we choose a threshold t above which we say that the network identified nonclassicality. The value of t should be chosen such that the classical states never result in an output above t. In the following, we fix t = 0.9. While we might not identify all nonclassical states as nonclassical, this choice makes it very unlikely that a classical state is falsely identified as nonclassical, as we will see below. In principle, t can be varied for different networks, experimental scenarios, considered classes of quantum states, or specific tasks. In this context, we emphasize the indicator role of the network: the output of the network will never be able to prove nonclassicality of the input. However, the idea is that, even when using only small sample sizes, we obtain a reliable, fast, and resource-efficient indication that an input state is nonclassical. Such a fast identification can be of practical importance in many scenarios, as we shall discuss in Sec. V.
We compare the output of the network with the nonclassicality test based on the moments of the click-counting distribution; cf. Sec. II. This provides a comparison with established nonclassicality tests for such measurement scenarios. The moments method yields the minimal eigenvalue $\bar{x}_{\mathrm{mom}}$ of the matrix of moments in Eq. (2) together with its sampling error $\Delta x_{\mathrm{mom}}$. If we find that the estimated minimal eigenvalue $x_{\mathrm{mom}} = \bar{x}_{\mathrm{mom}} \pm \Delta x_{\mathrm{mom}}$ is significantly negative, nonclassicality is verified through the matrix of moments approach. A commonly accepted minimal significance level is three standard deviations. In our case, this means that the moments method detects nonclassicality if its value is three standard deviations below 0, i.e., $r_{\mathrm{mom}} := -\bar{x}_{\mathrm{mom}} / \Delta x_{\mathrm{mom}} > 3$. Such a significance level is especially important as we do not want to falsely identify classical states as nonclassical ones. We will use the significance value $r_{\mathrm{mom}}$ as a comparison with our results obtained from the neural-network approach.
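The significance criterion can be illustrated with the sketch below. Note the swapped-in technique: the paper samples the statistical error directly from the click statistics following Refs. [16,22], whereas this sketch estimates the error by bootstrap resampling, purely as an assumption for illustration. The moment formula $\langle :\hat{m}^j: \rangle = \sum_k f_k \binom{N-k}{j} / \binom{N}{j}$ applied to the relative frequencies $f_k$, the function names, and the number of resamples are likewise illustrative choices.

```python
import random
from math import comb, sqrt

def min_eigenvalue_m2(freqs):
    """Smallest eigenvalue of [[1, <:m:>], [<:m:>, <:m^2:>]] from frequencies."""
    n_det = len(freqs) - 1
    a = sum(f * comb(n_det - k, 1) for k, f in enumerate(freqs)) / comb(n_det, 1)
    b = sum(f * comb(n_det - k, 2) for k, f in enumerate(freqs)) / comb(n_det, 2)
    return 0.5 * ((1.0 + b) - sqrt((1.0 - b) ** 2 + 4.0 * a ** 2))

def eigenvalue_from_clicks(click_numbers, n_det):
    freqs = [0.0] * (n_det + 1)
    for k in click_numbers:
        freqs[k] += 1.0 / len(click_numbers)
    return min_eigenvalue_m2(freqs)

def moments_significance(click_numbers, n_det=16, n_boot=200, seed=0):
    """Return (x_bar, dx, r) with r = -x_bar / dx; dx from bootstrap resampling."""
    rng = random.Random(seed)
    x_bar = eigenvalue_from_clicks(click_numbers, n_det)
    boots = [eigenvalue_from_clicks(
                 rng.choices(click_numbers, k=len(click_numbers)), n_det)
             for _ in range(n_boot)]
    mean = sum(boots) / n_boot
    dx = sqrt(sum((b - mean) ** 2 for b in boots) / (n_boot - 1))
    if dx == 0.0:
        raise ValueError("no statistical spread in the data; r is undefined")
    return x_bar, dx, -x_bar / dx

# Two photons on 16 ideal detectors: one click with probability 1/16
# (both photons in the same detector), two clicks otherwise.
data_rng = random.Random(42)
clicks = data_rng.choices([1, 2], weights=[1, 15], k=1000)
x_bar, dx, r_mom = moments_significance(clicks)
nonclassical = r_mom > 3.0
```

The same function applied to sampled classical data should return values of r below the threshold of 3 in the vast majority of runs, which is the property the three-standard-deviation criterion is meant to protect.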

A. Perfect detectors
At first, we consider data obtained from detectors with an ideal unit quantum efficiency, η = 1. Fig. 3 shows the output of the network for the different families of states. For each family, we generate 1000 different realizations of sample size m with mean photon numbers ranging from 1 to 16. The data is sampled from the click-counting distributions of the different states as explained in Sec. III. In Fig. 3 (a), we show the performance of the network for a sample size m = 1000, while Fig. 3 (b) displays the network's output for m = 100. In each case, the nonclassicality threshold t = 0.9 is indicated with a straight line. The corresponding results of the moments method are shown in Fig. 3 (c) and (d). For m = 1000, the network clearly recognizes Fock and squeezed states as nonclassical, while coherent and thermal states are never seen as nonclassical. Hence, the neural network is capable of correctly identifying the cases which were used for the training. For m = 100, we observe that the results of the neural network are less clear due to the very limited sample size. It is nevertheless surprising that the network still performs well for such small sample sizes. Lowering the sample size even further eventually results in a finite probability of falsely categorizing classical states as nonclassical, ending the network's validity as a nonclassicality indicator.
Let us compare the neural network's results with the ones of the moments method. We see that for the two classes of classical states (coherent and thermal states), both techniques yield compatible results. On the other hand, there is a significant difference between the results for the cases of the nonclassical Fock and squeezed states. For m = 1000, Fock states are only recognized by the moments method with an acceptable significance up to a photon number of about eight; cf. Fig. 3 (c). Furthermore, for m = 100, nonclassicality can only be seen for Fock states up to a maximum of two photons [see Fig. 3 (d)]. We also observe that the moments-based approach is never capable of identifying squeezed states as nonclassical independent of the sample size m. This can be explained by the fact that the moments method is insensitive to the nonclassical feature of squeezed states. Therefore, we can conclude that the neural network can be more sensitive towards nonclassical states than the established moments method and that it may also identify nonclassical states where the latter fails completely.
To examine how the network recognizes nonclassicality of states that have not been used in the training, different nonclassical states or mixtures of (nonclassical) states (cf. also Sec. IV C) can be tested. Here, we test the network with n-photon-added thermal states (NPATS) and with even coherent states; cf. Fig. 3. For the NPATS, only states which are close to one- and two-photon-added vacuum states are recognized. This can be understood by noting that NPATS become very similar to thermal states for increasing mean photon number. Hence, the network has difficulties distinguishing NPATS from thermal states for larger mean photon numbers. Similarly, even coherent states are only recognized as nonclassical for mean photon numbers $\bar{n} \lesssim 3$. For larger mean photon numbers, the two coherent states in the superposition $(|\alpha\rangle + |-\alpha\rangle)$ become nearly orthogonal, with the consequence that the state is very similar to a mixture of two coherent states, for which the network cannot recognize nonclassicality.
We also trained neural networks with these additional states included in the training data. Even then, their similarity to thermal states or mixtures of coherent states, respectively, prevents the network from fully recognizing them. Additionally, we note that the network trained with a larger sample size is able to classify test data from smaller sample sizes and vice versa.

B. Realistic detection efficiencies
To simulate a more realistic experiment, we now consider detectors with detection efficiencies η = 0.6. For this purpose, we implement a neural network by training with simulated data with the same quantum efficiency. In Fig. 4, in analogy to Fig. 3, we show the outputs of the network [Fig. 4 a) and b)] and the moments method [Fig. 4 c) and d)] for the different families of states. Again, for every state we simulate 1000 different realizations with mean photon number ranging from 1 to 16.
One observes that for m = 1000, Fig. 4 a), the trained states are still recognized very well, with slightly increased fluctuations compared to the ideal detector network [cf. Fig. 3 a)]. Also, the NPATS are recognized in the same regime as before. Even coherent states, however, are only recognized as nonclassical for mean photon numbers around $\bar{n} = 1$. In this case, the loss introduced by the finite detection efficiency reduces the parameter range for which the network can identify quantum correlations. Overall, we can conclude that the neural network approach still yields positive results for realistic quantum efficiencies. In comparison, the moments method [Fig. 4 c)] yields in this case no significant certification of the nonclassical quantum states, which is due to the combination of the relatively low sample sizes and the finite quantum efficiency. Hence, we can conclude that the neural network approach performs better under these conditions.
The results of the neural network for very small sample sizes m = 100 and quantum efficiency η = 0.6 are shown in Fig. 4 b). We see that the results are noisier than in the previous example; however, the overall discrimination between classical and nonclassical states can still be observed. This is an interesting result as it shows that the network can still be of use even for realistic quantum efficiencies and very low sample sizes. In comparison, the moments method yields no significant results for any of the tested nonclassical states [cf. Fig. 4 d)]. We can conclude that the neural network is advantageous under the considered conditions and that this approach can indeed be used for fast (pre-)identification tasks with a limited amount of data.
We note, however, that the fluctuations in the results of the neural network mean that in some rare cases classical states are falsely identified as nonclassical ones; see Fig. 4 b) (coherent and thermal states). Increasing the identification threshold for nonclassical states would not resolve this problem. Therefore, the considered case is at the limit of the operational range of the neural network approach. Decreasing the sample size or the detection efficiency further eventually terminates the validity of the nonclassicality predictions of the network.
Note that the data stemming from detectors with a limited quantum efficiency cannot be classified by networks trained with data for perfect detector statistics. This is, however, not a fundamental problem of the approach, as in practical experimental realizations the expected quantum efficiencies are usually known and it is possible to include this knowledge in the training of the network. A possible extension of the presented approach could be to train the network using data generated for different values of detection efficiencies, and see whether the network is able to learn to classify these different cases in parallel.

C. Testing non-trained mixed states
Having trained the network to detect nonclassicality, different untrained states can be tested. In this context, it is particularly interesting to investigate parametrized mixtures of previously trained classical and nonclassical states. The purpose of such an analysis is twofold. First, it provides the possibility of analyzing the performance of the network for an additional family of mixed states (besides the already considered cases of NPATS and even coherent states). Second, it allows us to study the behavior of the network when it has to classify a state which is a mixture of a classical and a nonclassical one, where both were used in the training. In particular, we can investigate whether the network performs smoothly at the transition from a purely nonclassical to a purely classical state. Here, as an example, we test a mixture of a coherent state $|\alpha\rangle$ and a Fock state $|n\rangle$,
$$\rho = p\, |\alpha\rangle\langle\alpha| + (1 - p)\, |n\rangle\langle n|,$$
with $0 \le p \le 1$ and equal mean photon numbers $n = |\alpha|^2$. In this case, we use data and network generated from perfect detectors, η = 1, for a sample size of m = 1000.
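Because the click statistics are linear in the state, the statistics of such a mixture are the weighted sum of the statistics of its components. The sketch below assumes that p is the weight of the coherent component and 1 - p the weight of the Fock component, as the limits p = 0 (nonclassical) and p = 1 (classical) discussed in the text suggest, and uses the closed-form expressions for coherent and Fock states with the no-click operator $:\exp(-\eta \hat{n}/N):$. The function names are illustrative.

```python
from math import comb, exp

def click_probs_coherent(mean_photons, n_det=16, eta=1.0):
    m = exp(-eta * mean_photons / n_det)  # no-click probability per detector
    return [comb(n_det, k) * m ** (n_det - k) * (1.0 - m) ** k
            for k in range(n_det + 1)]

def click_probs_fock(n_photons, n_det=16, eta=1.0):
    probs = []
    for k in range(n_det + 1):
        s = sum(comb(k, j) * (-1) ** j
                * (1.0 - (n_det - k + j) * eta / n_det) ** n_photons
                for j in range(k + 1))
        probs.append(comb(n_det, k) * s)
    return probs

def click_probs_mixture(p, n_photons, n_det=16, eta=1.0):
    """Click statistics of p * coherent + (1 - p) * Fock with |alpha|^2 = n."""
    coherent = click_probs_coherent(float(n_photons), n_det, eta)
    fock = click_probs_fock(n_photons, n_det, eta)
    return [p * c + (1.0 - p) * f for c, f in zip(coherent, fock)]

# Scan the mixing parameter for n = 3, one of the values used in the text.
scan = {p / 10: click_probs_mixture(p / 10, 3) for p in range(11)}
```

Sampling from these distributions with a sample size of m = 1000 would produce test inputs of the kind described here, without any retraining of the network.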
In Fig. 5 (a), we show the output of the network for different realizations corresponding to ρ, with n = 3, 8, 15 and 0 ≤ p ≤ 1. For small p, the state resembles a Fock state and is classified as nonclassical, while for large p, the contribution of the coherent state dominates and the state is not seen as nonclassical anymore. Importantly, we observe that the transition between the nonclassical (p = 0) and classical (p = 1) regime is smooth, which shows that the network can resolve this transition without showing any bias which might arise from the training. The network can identify nonclassicality of the mixed state up to values p ≈ 0.4. We also see that in the intermediate regime, states with higher mean photon numbers n can be identified as nonclassical in a wider parameter regime, which is due to the considered number of detection bins N = 16.
Additionally, we can compare the network's performance in this transition with the moments method; see Fig. 5 b). We observe that the moments method only yields a significant certification of nonclassicality for the nearly pure nonclassical case (p ≈ 0) and that the value $r_{\mathrm{mom}}$ quickly decreases with increasing p. Hence, by comparing Fig. 5 a) and b), we see that the neural network can identify nonclassicality of the studied mixed state in a larger parameter region, which further underlines the sensitivity of the approach.

V. CONCLUSIONS
We implemented and trained a dense artificial neural network for identifying nonclassical states of light based on the recorded click-counting statistics obtained from multiplexed detection systems. For the training and testing of the network, we simulated corresponding click-counting data for different families of classical and nonclassical quantum states of light. The results of the neural network were compared with nonclassicality conditions based on the moments of the recorded statistics. Our results show that the trained network is capable of distinguishing nonclassical states from classical ones for various types of quantum states and for different parameter regions, such as different mean photon numbers. In the case of small sample sizes, our machine-learning approach may be more sensitive towards nonclassicality than the moment-based method. Importantly, the network is even capable of revealing nonclassicality of some states which were not used in the training phase.
It is important to stress that the results of the network are always just an indicator for nonclassicality. For a significant and reliable certification of nonclassicality, one additionally needs to implement a nonclassicality test or witness which provides a robust verification taking into account error bars. However, there are various practical scenarios for which the neural network can be advantageous. It can be used for fast and efficient presorting of experimental data without the need of implementing and performing an elaborate data analysis. Here, we emphasize that the neural network approach shows a good performance even for very small sample sizes. This can, for instance, facilitate the search for promising experimental parameter regions for which nonclassical states are generated. Additionally, it is possible to use the neural network as an online tool during the data acquisition. In this way, one can check for nonclassical features directly in real time. This also opens the possibility to identify possible issues in the experimental implementation while performing the measurement. Such an online usage is especially useful whenever the data acquisition rate is rather low, such as in the case of some heralded or conditional state generation schemes.
Furthermore, it is possible to optimize the training of the neural network according to each particular experimental platform. Usually, it is possible to predict the classes of quantum states which are generated in a certain experiment and, therefore, to adapt and optimize the training of the neural network to these conditions. In the same way, the detection parameters, such as the number of detection bins or the quantum efficiency, can be adapted for each specific case.
Finally, we would like to point out that the presented approach could be extended in different ways. First, it is possible to extend the neural network method to two-mode and multimode scenarios where each mode is recorded with a multiplexed detector system. This would allow one to test for cross-correlations between the modes. Second, it is, in principle, possible to adapt the presented approach to other measurement strategies and other physical systems. For example, one could implement a network which is capable of identifying nonclassicality based on quadrature measurement data from balanced homodyne detectors, or one could think of adapting the presented technique to measurements in cold-atom experiments.

APPENDIX
For the NPATS, $n_{\mathrm{th}}$ is the mean photon number of the initial thermal states to which n photons are added. The probabilities $p_k$ for the even coherent states $|\alpha_+\rangle = \mathcal{N}\,(|\alpha\rangle + |-\alpha\rangle)$ follow directly from Eq. (1), where g is given by [66]
$$g(\lambda, \alpha) := \langle \alpha_+ | : e^{-\lambda \hat{n}/N} : | \alpha_+ \rangle \tag{A.2}$$
$$= \frac{e^{-\lambda |\alpha|^2 / N} + e^{(\lambda/N - 2)|\alpha|^2}}{1 + e^{-2|\alpha|^2}}. \tag{A.3}$$
The probabilities for thermal, squeezed, and photon-added thermal states include infinite sums. For the simulation of the data, these sums were truncated at a sufficiently large cutoff.

State | Parameters | $p_k$ | $\bar{n}$
--- | --- | --- | ---
squeezed | ... | ... [14] | $\sinh^2 \lvert\xi\rvert$
NPATS | $n_{\mathrm{th}} \in \mathbb{R}^{+}$, $n \in \mathbb{N}$ | $\frac{1}{(n_{\mathrm{th}}+1)\, n_{\mathrm{th}}^{n}} \sum_{j=n}^{\infty} \binom{j}{n} \left( \frac{n_{\mathrm{th}}}{n_{\mathrm{th}}+1} \right)^{j} D^{\eta}_{k,j}$ [63] | $n_{\mathrm{th}}(n+1) + n$ [65]
even coherent | $\alpha \in \mathbb{C}$ | $\binom{N}{k} \sum_{j=0}^{k} \binom{k}{j} (-1)^{j}\, g(N-k+j, \alpha)$ | $\lvert\alpha\rvert^{2}\, \frac{1 - e^{-2\lvert\alpha\rvert^{2}}}{1 + e^{-2\lvert\alpha\rvert^{2}}}$

TABLE I. Probability $p_k$ of k detector clicks and mean photon numbers $\bar{n}$ for the different states, dependent on the state-specific parameters. N is the number of on-off detectors in the multiplexing device, η is the quantum efficiency of each detector.
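The even-coherent-state entry of Table I can be evaluated directly from Eqs. (A.2) and (A.3). The sketch below does this in Python, taking the expressions exactly as printed (which contain no explicit quantum efficiency), and parametrizing the state by $|\alpha|^2$; the function names are illustrative.

```python
from math import comb, exp

def g(lam, alpha_sq, n_det=16):
    """g(lambda, alpha) of Eq. (A.3), written in terms of |alpha|^2."""
    numerator = (exp(-lam * alpha_sq / n_det)
                 + exp((lam / n_det - 2.0) * alpha_sq))
    return numerator / (1.0 + exp(-2.0 * alpha_sq))

def click_probs_even_coherent(alpha_sq, n_det=16):
    """p_k = C(N,k) * sum_j C(k,j) (-1)^j g(N - k + j, alpha), k = 0..N."""
    probs = []
    for k in range(n_det + 1):
        s = sum(comb(k, j) * (-1) ** j * g(n_det - k + j, alpha_sq, n_det)
                for j in range(k + 1))
        probs.append(comb(n_det, k) * s)
    return probs

def mean_photons_even_coherent(alpha_sq):
    """Mean photon number from the last column of Table I."""
    return (alpha_sq * (1.0 - exp(-2.0 * alpha_sq))
            / (1.0 + exp(-2.0 * alpha_sq)))

probs = click_probs_even_coherent(2.0)
nbar = mean_photons_even_coherent(2.0)
```

Since g(0, α) = 1, the probabilities sum to one by construction; the alternating sum can leave rounding noise of order 1e-9, so small negative entries should be clamped before sampling.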