Quantum-enhanced data classification with a variational entangled sensor network

Variational quantum circuits (VQCs) built upon noisy intermediate-scale quantum (NISQ) hardware, in conjunction with classical processing, constitute a promising architecture for quantum simulations, classical optimization, and machine learning. However, the required VQC depth to demonstrate a quantum advantage over classical schemes is beyond the reach of available NISQ devices. Supervised learning assisted by an entangled sensor network (SLAEN) is a distinct paradigm that harnesses VQCs trained by classical machine-learning algorithms to tailor multipartite entanglement shared by sensors for solving practically useful data-processing problems. Here, we report the first experimental demonstration of SLAEN and show an entanglement-enabled reduction in the error probability for classification of multidimensional radio-frequency signals. Our work paves a new route for quantum-enhanced data processing and its applications in the NISQ era.


I. INTRODUCTION
The convergence of quantum information science and machine learning (ML) has enabled radically new capabilities for solving complex physical and data-processing problems [1][2][3][4][5][6][7][8][9]. Many existing quantum ML schemes hinge on large-scale fault-tolerant quantum circuits composed of, e.g., quantum random access memories. At present, however, the available noisy intermediate-scale quantum (NISQ) devices [10,11] prevent these quantum ML schemes from achieving an advantage over classical ML schemes. Recent developments in hybrid systems [9,12] comprising classical processing and variational quantum circuits (VQCs) open an alternative avenue for quantum ML. In this regard, a variety of hybrid schemes have been proposed, including quantum approximate optimization [13], variational quantum eigensolvers [14], quantum multi-parameter estimation [15], and quantum kernel estimators and variational quantum models [4,5]. On the experimental front, hybrid schemes have been implemented to seek the ground state of quantum systems [14,16], to perform data classification [4], to unsample a quantum circuit [17], and to solve the MAXCUT problem [18,19]. The finite quantum coherence time and circuit depths of state-of-the-art NISQ platforms, however, hold back a near-term quantum advantage over classical ML schemes. An imperative objective for quantum ML is to harness NISQ hardware to benefit practically useful applications [2].

II. SUPERVISED LEARNING ASSISTED BY AN ENTANGLED SENSOR NETWORK
A multitude of data-processing scenarios, such as classification of images captured by cameras [20], target detection through a phased array [21], and identification of molecules [22], encompass sensors for data acquisition. Recent theoretical [23][24][25][26][27][28][29] and experimental [30][31][32] advances in distributed quantum sensing have unleashed the potential for a network of entangled sensors to outperform classical separable sensors in capturing global features of an interrogated object. Such a capability endowed by distributed quantum sensing creates an opportunity to further utilize VQCs to configure the entangled probe state shared by the sensors to enable a quantum advantage in data-processing problems.
Supervised learning assisted by an entangled sensor network (SLAEN) [33] is such a hybrid quantum-classical framework empowered by entangled sensors configured by a classical support-vector machine (SVM) for quantum-enhanced high-dimensional data classification, as sketched in Fig. 1 (a). SLAEN employs a VQC parameterized by v to create an entangled probe state ρ̂_E shared by M quantum sensors. The sensing attempt at the mth sensor is modeled by running the probe state through a quantum channel, Φ(α_m), where the information about the object is embedded in the parameter α_m. A measurement modeled by M_m on the output quantum state from the channel then yields the measurement data α̃_m. To label the interrogated object, a classical SVM chooses a hyperplane parameterized by w to separate the measurement data into two classes in an M-dimensional space. To learn the optimum hyperplane and the configuration of the VQC that produces the optimum entangled probe state for a given classification task, the sensors first probe training objects with known labels, and the measurement data and the true labels are used to optimize the hyperplane w of the SVM. Then, the VQC parameter optimizer maps w → v, which in turn configures the VQC to generate an entangled probe state ρ̂_E = Û(v) ρ̂_0 Û†(v) that minimizes the measurement noise subject to the chosen hyperplane. As a comparison, Fig. 1 (b) sketches a conventional classical classifier that solely relies on a classical SVM, trained by the measurement data obtained by separable sensors, to seek the optimum hyperplane for classification. By virtue of the entanglement-enabled noise reduction, SLAEN yields a substantially lower error probability than the classical classifier, as illustrated and compared in Fig. 1 (c-f) for two classification problems in, respectively, a two-dimensional (2D) data space and a three-dimensional (3D) data space.
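The decision rule at the heart of this pipeline is simple to state numerically. The following minimal sketch (with hypothetical measurement values and weights, not experimental data) labels a measurement vector with a hyperplane {w, b}; the entanglement enters only through the noise statistics of the measurement data, not through the rule itself:

```python
import numpy as np

def classify(alpha, w, b):
    """SVM decision rule: label = sign(w . alpha + b)."""
    return 1 if np.dot(w, alpha) + b >= 0 else -1

# Hypothetical 3-sensor measurement record and hyperplane weights
w = np.array([0.58, 0.58, 0.58])
alpha = np.array([1.2, -0.4, 0.9])
label = classify(alpha, w, b=0.0)  # -> 1
```

In SLAEN the same rule is applied, but the fluctuations of alpha around its mean are correlated across sensors, shrinking the effective noise in the weighted sum.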

III. EXPERIMENT
We demonstrate SLAEN in a quantum optics platform based on continuous-variable (CV) entanglement. The experiment endeavors to classify a feature embedded in a set of radio-frequency (RF) signals, E_m(t) = E_m cos(ω_c t + ϕ_m), where E = {E_m} and ϕ = {ϕ_m} are, respectively, the RF amplitudes and phases at the M = 3 sensors, and ω_c is the RF carrier frequency. The class label y is determined by a joint function of the amplitudes and phases: y = F(E, ϕ).
The experimental setup is sketched in Fig. 2. An optical parametric amplifier source emits a single-mode squeezed state represented by the annihilation operator b̂. To acquire data, a VQC prepares an entangled probe state, described by {b̂_m}_{m=1}^{M}, by applying a unitary operation Û(v) on b̂. The VQC setting is encoded in v = {v_m, φ_m}_{m=1}^{M}, where v_m is the power ratio of the squeezed state sent to the mth sensor, satisfying Σ_{m=1}^{M} v_m = 1, and φ_m is a phase shift imparted on the quantum state at the mth sensor. The VQC is composed of two variable beam splitters (VBSs) and three phase shifters. A VBS comprises two half-wave plates (H), a quarter-wave plate (Q), a phase modulator (PM), and a polarizing beam splitter (PBS). The PM controls the splitting ratio of the VBS and thus determines v_m, while the phase shift φ_m is controlled by an RF signal delay (see Appendix B for details). At the mth sensor, an electro-optic modulator (EOM) converts the RF signal into a displacement α_m ∝ E_m sin ϕ_m on the phase quadrature p̂_m ≡ (b̂_m − b̂†_m)/2i. Three homodyne detectors then measure the quadrature displacements, and the measurement data are diverted to a classical processing unit for training, classification, and VQC-setting optimization.

SLAEN consists of a training stage and a utilization stage. The training stage aims to use N training data points {E^(n), ϕ^(n), y^(n)}_{n=1}^{N} supplied to the sensors to optimize the hyperplane used by the SVM and the entangled probe state. Here, y^(n) ∈ {−1, +1} is the true label for the nth training data point, which leads to the homodyne measurement data α̃^(n) from the sensors; α̃^(n) and y^(n) are the only information available to the classical processing unit. For a hyperplane {w, b}, the SVM minimizes the cost function

C = (1/N) Σ_{n=1}^{N} |1 − y^(n) (w · α̃^(n) + b)|_+ + λ‖w‖²,    (1)

where |x|_+ equals x for x ≥ 0 and zero otherwise, ‖·‖ is the usual two-norm, and the λ‖w‖² term is used to avoid overfitting. The w · α̃^(n) term represents a weighted average over the measurement data acquired by different sensors.
It is this weighted average that benefits from using multipartite entanglement to reduce the measurement noise [31,33]. Only the support vectors, i.e., points close to the hyperplane with y^(n) (w · α̃^(n) + b) ≤ 1, contribute non-trivially to the cost function. The rationale behind constructing such a cost function is that errors in a classification task primarily occur on support vectors, so accounting for the deviations of all data points from the hyperplane would be non-ideal.
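The hinge-loss cost can be sketched in a few lines. This is an illustrative reimplementation, not the code used in the experiment, and the regularization weight λ = 0.01 is an arbitrary choice:

```python
import numpy as np

def hinge_cost(w, b, alphas, ys, lam=0.01):
    """Mean hinge loss plus L2 regularization, in the form of Eq. (1).
    Only support vectors (margin y (w.alpha + b) <= 1) contribute to the hinge term."""
    margins = ys * (alphas @ w + b)
    hinge = np.maximum(0.0, 1.0 - margins)  # |1 - y (w.alpha + b)|_+
    return hinge.mean() + lam * np.dot(w, w)
```

Points far on the correct side of the hyperplane yield zero hinge loss, so only support vectors move the optimization, exactly as argued above.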
To enable efficient minimization of the cost function, we adopt a stochastic optimization approach in which the hyperplane and the VQC setting are updated in each training step, consuming a single data point. Suppose the optimized hyperplane is {w^(n−1), b^(n−1)} after (n − 1) training steps. Prior to updating the hyperplane in the nth training step, the inferred label is derived as ỹ^(n) = sign(w^(n−1) · α̃^(n) + b^(n−1)). Using a simultaneous perturbation stochastic approximation (SPSA) algorithm, the hyperplane is updated to {w^(n), b^(n)} (see Appendix A for algorithm details). Once an updated hyperplane is found, the VQC optimizer performs the mapping w^(n) → v^(n) to configure the VQC so that its generated entangled probe state minimizes the measurement noise subject to the current hyperplane. Specifically, one desires that the virtual mode b̂_v ≡ Σ_{m=1}^{M} w^(n)_m b̂_m, whose phase-quadrature measurement outcome constitutes the w^(n) · α̃^(n+1) term in ỹ^(n+1), be identical to the original squeezed-light mode b̂ so that the overall uncertainty in labeling is minimized.
This is achieved by choosing √(v^(n)_m) exp(iφ^(n)_m) ∝ w^(n)_m in the VQC parameter optimizer. Physically, this is the noise-reduction mechanism, stemming from the quantum correlations between the measurement noise at different sensors, that gives rise to SLAEN's quantum advantage over the classical classifier, in which the measurement noise at different sensors is independently subject to the standard quantum limit. After N training steps, the cost function is near its minimum with the hyperplane {w*, b*} ≡ {w^(N), b^(N)} and the VQC setting v* ≡ v^(N). Then, in the utilization stage, SLAEN configures the VQC using v* and classifies the measurement data α̃ with an unknown label using the optimized hyperplane: ỹ = sign(w* · α̃ + b*).

SLAEN is a versatile framework capable of tailoring the entangled probe state and the classical SVM to enhance the performance of multidimensional data-classification tasks. In our experiment, SLAEN first copes with 2D data acquired by two entangled sensors, as illustrated in Fig. 1 (c). As an example and useful application of 2D data classification, we demonstrate the classification of the incident direction of an emulated RF field. To this end, the training commences with an initial hyperplane, which is mapped to an initial VQC setting v^(0) = {0.50, 0.50, 0, 0, 0, 0}. The training stage comprises 200 steps, each using a training data point with randomly generated RF-field phases and an associated label, {ϕ^(n), y^(n)}_{n=1}^{200}, while the RF-field amplitudes are fixed equal at all sensors. Applying the training data ϕ^(n) on the EOMs at the two sensors leads to quadrature displacements α^(n) = {α^(n)_1, α^(n)_2}, each component of which is chosen to follow a uniform distribution in [−4, 4] (in the shot-noise unit). The signal-to-noise ratio of the data set is tuned by excluding the data points within a margin of ε from the hyperplane while the total number of training data points is fixed at 200. In doing so, the signal-to-noise ratio is raised as ε increases.
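The mapping from SVM weights to VQC settings admits a compact numerical sketch. Assuming the normalization v_m = w_m²/‖w‖² and a π phase for negative weights, consistent with √(v_m) exp(iφ_m) ∝ w_m, a hypothetical helper reads:

```python
import numpy as np

def hyperplane_to_vqc(w):
    """Map SVM weights w to VQC power ratios v (summing to 1) and phases phi,
    such that sqrt(v_m) * exp(i * phi_m) is proportional to w_m."""
    w = np.asarray(w, dtype=float)
    v = w**2 / np.sum(w**2)                 # power ratios, sum to 1
    phi = np.where(w < 0.0, np.pi, 0.0)     # pi phase encodes negative weights
    return v, phi
```

The normalization absorbs the overall scale of w, which does not affect the sign of the decision function.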
The true labels for the RF-field directions are derived from the RF-phase gradient, y^(n) = sign(ϕ^(n)_2 − ϕ^(n)_1) = sign(w_t · α^(n) + b_t), where {w_t, b_t} parameterize the true hyperplane. The true labels are disclosed during training, while {w_t, b_t} and α^(n) are kept unknown to SLAEN. The optimization of the SVM hyperplane and the VQC setting then follows.
As a performance benchmark, we train the classical classifier on the identical training data used for SLAEN to undertake the 2D data-classification task. Unlike SLAEN, the classical classifier uses a separable probe state ρ̂_S to acquire the measurement data, which are then used to train the classical SVM to seek a hyperplane that minimizes the classification error probability.
In the experiment, the squeezed-light source is turned off while the same training data as those used for SLAEN are applied, thereby ensuring an equitable performance comparison. The initial hyperplane prior to training is randomly picked as w^(0) = (0.67, 0.74), b^(0) = 0.39. In the absence of entanglement-enabled noise reduction, a higher error probability is anticipated for the classical classifier, as illustrated in Fig. 1 (d).
The effectiveness of the training for SLAEN and the classical classifier is demonstrated by the converging error probabilities measured at different training steps, as plotted in Fig. 3 (a). The inset describes the VQC parameters being optimized. The convergence of the error probabilities beyond 100 training steps indicates that near-optimum settings for the hyperplanes and the VQC have been found. With these optimized parameters, SLAEN generates an entangled probe state that minimizes the measurement noise, as illustrated by the sample data points in Fig. 1 (c) and compared with the classical-classifier case in Fig. 1 (d); the radii of the circles correspond to the standard deviation of the estimation uncertainty.
SLAEN and the classical classifier are next trained to tackle 3D data-classification problems. As an example, we demonstrate the classification of the sign of the RF-field mean amplitude across three sensors. The training in either scenario uses 390 data points {E^(n), y^(n)}_{n=1}^{390} with randomly generated RF-field amplitudes, while the RF phases are fixed at ϕ^(n) = 0. The true labels are then given by y^(n) = sign(Σ_{m=1}^{3} E^(n)_m) = sign(w_t · α^(n)), where w_t = (√(1/3), √(1/3), √(1/3)), b_t = 0 specify the true hyperplane, which is unknown to SLAEN and the classical classifier. The error probabilities during training for both scenarios are plotted in Fig. 3 (d), with its inset describing the VQC parameters being optimized. The error probabilities converge after 250 training steps, indicating that near-optimum settings for the hyperplanes and the VQC have been found. Once trained, SLAEN shows a clear error-probability advantage over the classical classifier, as observed in Fig. 3 (d) and intuitively illustrated in Fig. 1 (e) and (f).
The trajectories of the evolving hyperplane {w^(n), b^(n)} during training are plotted in Fig. 3 (b, c) for 2D data classification and Fig. 3 (e, f) for 3D data classification. The hexagrams mark the optimum hyperplane parameters. The hyperplane parameters approach the optimum as the error probability decreases during training, as anticipated. Notably, the optimized hyperplanes obtained by SLAEN are considerably closer to the true hyperplanes, i.e., the optimum solutions, than those attained by the classical classifier, thanks to SLAEN's reduced measurement noise. To further investigate SLAEN's improved accuracy, we randomly generate 50 sets of initial hyperplanes for SLAEN and the classical classifier and plot in Fig. 3 (g-i) the simulated distributions of the hyperplanes at different steps of training for 3D data classification. The simulation shows that SLAEN's optimized hyperplanes (blue circles) have a distance of d_S = 0.135 ± 0.056 to the true hyperplane, i.e., the optimum solutions (hexagrams), as compared with a distance of d_C = 0.167 ± 0.073 for the optimized hyperplanes attained by the classical classifier (red circles) (see Appendix C for simulation results and a comparison with experiment).

[Fig. 3 caption: (a, d) Error probability versus training steps. (b, c, e, f) Trajectories of the hyperplane parameters {w^(n), b^(n)} during training for 2D (b, c) and 3D (e, f) data classification; (b, e) SLAEN, (c, f) classical classifier. Red squares: initial hyperplane parameters prior to training; blue triangles: hyperplane parameters after training; hexagrams: optimum hyperplane parameters; color gradients: evolution of error probabilities during training; green circles: samples of hyperplane parameters at every 20 (30) training steps for 2D (3D) data classification; curves obtained from cubic-spline data fitting. (g-i) Simulated distributions of hyperplane parameters prior to training (g), at Step 100 (h), and at Step 390 (i). Blue filled circles: SLAEN hyperplanes; red filled circles: classical-classifier hyperplanes; hexagrams: optimum hyperplanes; open circles: projections of the hyperplane parameters onto the (w_1, w_2) plane (grey). SLAEN's optimized hyperplanes distribute statistically closer to the optimum solutions.]
To investigate the performance of SLAEN and the classical classifier with respect to the signal-to-noise ratio of the data, the error probabilities, under the optimum settings for the VQC and the classical SVMs in the 2D data-classification problem, are measured as the margin ε varies over {0.2, 0.4, 0.6, 0.8, 1}. The results, plotted in Fig. 4, show that SLAEN enjoys an error-probability scaling advantage over the classical classifier, as manifested in the disparity between the slopes of the two error-probability curves. At ε = 1, SLAEN's error probability is more than a factor of three below that of the classical classifier.

IV. DISCUSSIONS
The SLAEN theory paper [33] reported an error-probability advantage achieved by an entangled sensor network over a sensor network based on separable squeezed states with the same total number of photons, which is verified by the current SLAEN experiment (see Appendix D for details). However, SLAEN's performance has been primarily benchmarked against classical classifiers that do not use any quantum resources (see Ref. [34] for an in-depth discussion of the different types of resources used in distributed quantum sensing). Such a choice is motivated by two main considerations. First, the compared classical classifier represents a common configuration for sensing and data processing; introducing quantum resources yields a performance enhancement over these existing classical schemes. In the SLAEN experiment, the power of the coherent-state portion is orders of magnitude stronger than that of either the squeezed or the entangled light, similar to the case of the squeezed-light-enhanced Laser Interferometer Gravitational-Wave Observatory (LIGO) [35]. In both cases, the squeezed-light power is limited by experimental capabilities, so it barely affects the total optical power employed in sensing. As such, like LIGO, we quantify the quantum advantage as the performance gain over a classical system using the same amount of laser power but taking no advantage of any quantum resources. Second, a complete experimental demonstration of supervised learning based on separable squeezed states requires three independent squeezed-light sources, which entails significantly more resource overhead than SLAEN's single squeezed-light source. Hence, SLAEN also enjoys a practical advantage over classical classifiers based on separable squeezed states. It is worth noting that this practical advantage would be more pronounced when the sensors are nearby, so that the entanglement-distribution loss is low.
Our experiment has implemented an entanglement source trained by supervised learning. The original SLAEN proposal [33], however, also entails reconfigurable measurements. Since homodyne measurements commute with a linear quantum circuit, SLAEN's performance under three homodyne detectors equals that obtained by the variational measurement apparatus considered by Ref. [33]. The current SLAEN protocol only leverages Gaussian sources and measurements, but non-Gaussian resources would potentially improve its performance. Indeed, non-Gaussian measurements have been shown to benefit quantum metrology [36], quantum illumination [37], and entanglement-assisted communication [38]. A variational circuit approach for non-Gaussian entanglement generation and measurements would open a promising route to further enhance the performance.

V. CONCLUSIONS
In conclusion, we have experimentally demonstrated the SLAEN framework for quantum-enhanced data classification. Our work opens a new route for exploiting NISQ hardware to enhance the performance of real-world data-processing tasks. Our current experiment verified SLAEN's quantum advantage in classifying features embedded in RF signals, but SLAEN is a general framework applicable to data-processing problems in other physical domains by appropriately engineering entangled probe states and quantum transducers. The present experiment only demonstrated data classification with linear hyperplanes. To accommodate nonlinear hyperplanes, non-Gaussian entangled probe states [39] and joint quantum measurements [40] would be needed, and the VQC parameter optimizer would also need to be trained to conduct an effective mapping from the SVM hyperplane to the VQC parameters. With these developments, we envisage that SLAEN would create new near-term opportunities in a variety of realms, including distributed big-data processing, navigation, chemical sensing, and biological imaging.

Appendix A: The SPSA algorithm

The simultaneous perturbation stochastic approximation (SPSA) algorithm is used by the classical support-vector machine (SVM) to update the hyperplane in each training step. The SPSA algorithm calculates an approximation of the gradient from only two measurements of the loss function, at w_+ and w_−. This simplicity leads to a significant complexity reduction in the cost optimization. See Algorithm 1 for details.
In the algorithm, d is the dimension of the data set: d = 2 for classification problems in a 2D data space and d = 3 for those in a 3D data space. The choices of a, c, A, α, and γ determine the gain sequences a_k and c_k, which in turn set the learning rates and have a significant impact on the performance of the SPSA algorithm. The parameters used by the classical SVM in our experiment are a = 1, c = 1, A = 200, α = 0.602, and γ = 0.1.
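A minimal SPSA sketch using the standard gain sequences a_k = a/(k + 1 + A)^α and c_k = c/(k + 1)^γ (the canonical Spall form; the paper's exact Algorithm 1 may differ in detail) is:

```python
import numpy as np

rng = np.random.default_rng(0)

def spsa_step(theta, loss, k, a=1.0, c=1.0, A=200, alpha=0.602, gamma=0.1):
    """One SPSA update: perturb all coordinates simultaneously with a random
    +/-1 vector and estimate the gradient from just two loss evaluations."""
    a_k = a / (k + 1 + A)**alpha   # learning-rate gain sequence
    c_k = c / (k + 1)**gamma       # perturbation gain sequence
    delta = rng.choice([-1.0, 1.0], size=theta.shape)
    g_hat = (loss(theta + c_k * delta) - loss(theta - c_k * delta)) / (2 * c_k * delta)
    return theta - a_k * g_hat

# Illustration on a toy quadratic loss, not the experimental cost function
theta = np.array([2.0, -3.0])
for k in range(2000):
    theta = spsa_step(theta, lambda t: np.sum(t**2), k)
```

The update consumes only two loss evaluations per step regardless of dimension, which is the complexity saving the appendix refers to.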
The SPSA algorithm calls a loss function that is in line with the form of the cost function (Eq. (1) of the main text) but allows for an iterative optimization over single data points, L(w, b) = |1 − y^(n) (w · α̃^(n) + b)|_+ + λ‖w‖².

[Algorithm 1: The simultaneous perturbation stochastic approximation (SPSA) [41]. Initialization: a; c; A; α; γ; d; N.]

Appendix B: Experimental setup

The detailed experimental setup is sketched in Fig. 5. Squeezed light at 1550 nm is generated from an optical parametric amplifier (OPA) cavity, in which a type-0 periodically poled KTiOPO_4 (PPKTP) crystal is pumped by light at 775 nm produced by a second-harmonic-generation (SHG) cavity. The cavities are locked by the Pound-Drever-Hall technique using 24-MHz sidebands created by phase modulating the 1550-nm pump light prior to the SHG. A small portion of light at 1550 nm, modulated at 20 MHz, is injected into the OPA cavity and phase locked to the pump light so that the OPA operates in a parametric-amplification regime. In doing so, the squeezed light emitted from the OPA cavity is composed of an effective single-mode squeezed vacuum state residing in the 11-MHz sidebands and a displaced phase-squeezed state at the central spectral mode. Due to the large quadrature displacement at the central spectral mode, the latter is well approximated by a classical coherent state. More details about the characterization of our squeezed-light source are given in the Supplemental Material of Ref. [31].
The squeezed light is directed to a variational quantum circuit (VQC) composed of two variable beam splitters (VBSs) and three phase shifters, parameterized by v = {v_1, v_2, v_3, φ_1, φ_2, φ_3}. Here, v_m is the portion of the power diverted to the mth sensor, satisfying Σ_{m=1}^{3} v_m = 1, and φ_m is the phase shift on the quantum state at the mth sensor. Each VBS comprises a first half-wave plate, a quarter-wave plate, a phase modulator (PM), a second half-wave plate, and a polarizing beam splitter. The power splitting ratio is controlled by applying a DC voltage generated from a computer-controlled data-acquisition board (NI PCI 6115); the DC voltage is further amplified by a high-voltage amplifier (Thorlabs HVA 200) with a gain of 20 prior to being applied on the PM. The power portions are determined by the sinusoidal responses of the two VBSs (Eq. (B1)) to the DC voltages E_s1 and E_s2 applied on PM 1 and PM 2.
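The voltage-to-splitting-ratio conversion can be sketched as follows. This assumes an idealized sinusoidal VBS response T = sin²(πE_s/(2V_π)) with no bias offset, which is a simplification of the calibrated Eq. (B1):

```python
import numpy as np

V_PI = 606.0  # calibrated half-wave voltage of the phase modulator, in volts

def vbs_transmissivity(E_s):
    """VBS transmissivity versus PM voltage, assuming T = sin^2(pi E_s / (2 V_pi))."""
    return np.sin(np.pi * E_s / (2.0 * V_PI))**2

def pm_voltage(T):
    """Inverse map: DC voltage realizing a target transmissivity 0 <= T <= 1."""
    return 2.0 * V_PI / np.pi * np.arcsin(np.sqrt(T))
```

In practice the inverse map would be restricted to the calibrated voltage range of the high-voltage amplifier, as described in the calibration section below.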
After the VBSs, the three-mode entangled probe state, represented by the annihilation operators {b̂_1, b̂_2, b̂_3}, is diverted to three RF-photonic sensors, each equipped with an EOM driven by an RF signal at an 11-MHz carrier frequency. Due to the phase modulation, a small portion of the coherent state at the central spectral mode is transferred to the 11-MHz sidebands, inducing a phase-quadrature displacement. The quadrature displacement at each RF-photonic sensor takes the form of Eq. (B2), α_m ∝ g_m γ a_cm E_m / V_π, where g_m = ±1 is set by an RF signal delay that controls the sign of the displacement. Choosing g_m = −1 is equivalent to introducing a π-phase shift on the quantum state at the mth sensor [31], i.e., setting φ_m = π in the VQC parameters. In Eq. (B2), a_cm is the amplitude of the baseband coherent state at the mth sensor; specifically, a_cm = √(v_m) β, where β is the amplitude of the baseband coherent state at the squeezed-light source. V_π is the half-wave voltage of the EOM, and γ describes the conversion from an external electric field E_m to the internal voltage.
A more detailed theoretical model for the setup was presented in Ref. [31].
Subsequently, the phase-quadrature displacements carried on the quantum light at the three sensors are measured by three balanced homodyne detectors. At each homodyne detector, the quantum light and the local oscillator (LO) first interfere on a 50/50 beam splitter with a characterized interference visibility of 97% and are then detected by two photodiodes, each with a ∼88% quantum efficiency. The difference photocurrent is amplified by a transimpedance amplifier with a gain of 20×. The measurement-noise variance of the weighted homodyne sum (Eq. (B3)) contains the summation Σ_m w_m √(v_m) e^{iφ_m} and, at its minimum, reduces to η(√(N_s + 1) − √(N_s))²/4 + (1 − η)/4, where η is the quantum efficiency at each sensor and N_s is the total photon number of the single-mode squeezed light at the source. In our experiment, η ∼ 53% and N_s ≈ 3.3. Comparing the summation in Eq. (B3) with that in Eq. (1) makes clear that choosing the mapping from the hyperplane parameter w to the VQC setting v to be w_m = √(v_m) exp(iφ_m) minimizes the measurement noise. In the current setup, we measured a 2.9 (3.2) dB noise reduction with the three- (two-)mode entangled state. The characterization of our sensor networks has been reported in Ref. [31].
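As a numerical cross-check of the quoted noise reduction, a simple loss model, assuming a pure squeezed vacuum of mean photon number N_s detected with overall efficiency η, gives:

```python
import numpy as np

def measurement_variance(N_s, eta):
    """Squeezed-quadrature noise variance (shot-noise unit: 1/4) under a
    simple loss model: the squeezed-vacuum variance
    (sqrt(N_s + 1) - sqrt(N_s))^2 / 4, mixed with vacuum noise by efficiency eta."""
    return eta * (np.sqrt(N_s + 1.0) - np.sqrt(N_s))**2 / 4.0 + (1.0 - eta) / 4.0

# Experimental values: eta ~ 53%, N_s ~ 3.3
reduction_db = 10.0 * np.log10(0.25 / measurement_variance(3.3, 0.53))
```

This evaluates to roughly 3 dB, consistent with the 2.9 dB measured with the three-mode entangled state.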

a. Calibration of variational quantum circuit
To ensure accurate configuration of the VQC, we first calibrate the power splitting ratios of both VBSs. In calibrating VBS 1, we scan the voltage E_s1 applied on PM 1 and measure the transmitted optical power, as plotted in Fig. 6 (a). The data are fitted to the sinusoidal function in Eq. (B1), yielding V_π = 606 V for PM 1. An identical calibration procedure applied on VBS 2 yields V_π = 606 V for PM 2.
We then measure the quadrature displacements under different VBS splitting ratios, as a means to test the locking stability between the quantum signal and the LO. To do so, while the quantum signal and LO are phase locked, the VBS transmissivity is randomly set to one of 17 values at 30 Hz, subject to the limited bandwidth of the control system. 100 homodyne measurements of quadrature displacement are taken at each transmissivity at a 500 kHz sampling rate. The fitted data are plotted in Fig. 6 (b), showing excellent signal stability and agreement with theory in Eq. (B1). The value of the extrapolated V π is around 612 V, in good agreement with the specification of the EOM. The tunable range for the VBS transmissivity is between 0.07 and 0.93, limited by the maximum output voltage of the high-voltage amplifier (±200 V). VBS 2 is calibrated in an identical way, deriving a V π consistent with that of VBS 1. During training, the transmissivity of the VBS is restricted within 0.125 to 0.875 to ensure sufficient light power for phase locking between the quantum signal and the LO.

b. Calibration of RF-photonic transduction
The training data for the RF-field direction (mean-amplitude) classification are prepared by applying phase (amplitude) modulation on the RF signals. Modulations on the RF signals are converted to different quadrature displacements by the three EOMs. To ensure linearity in the transduction from the amplitude and phase of the RF signals to the quadrature displacements, we calibrate the quadrature displacements at each sensor with respect to the modulation voltages that determine the amplitude and phase of the RF signals applied on the EOMs. In the calibration of phase modulation at Sensor 1, shown in Fig. 7 (a), as we sweep the modulation voltage on the function generator for the RF signal from -0.5 V to 0.5 V with an increment of 0.1 V, 100 homodyne measurements of the quadrature displacement are recorded for each modulation voltage at a 500-kHz sampling rate. The spread of the experimental data along the vertical axis at a given modulation voltage arises from the quantum measurement noise. The fit shows an excellent linear dependence of the quadrature displacement on the modulation voltage. To calibrate the amplitude modulation on the RF signal, we first set the modulation depth to 120% to allow for a sign flip of the RF signal, enabling both positive and negative quadrature displacements. We then take 100 homodyne measurements at each modulation voltage at a 500-kHz sampling rate. The experimental data and fit are plotted in Fig. 7 (b), showing an excellent linear dependence of the measured quadrature displacement on the amplitude of the RF signal. The other two EOMs are calibrated in the same way.

a. Training stage for SLAEN

In the nth training step, the training data {E^(n), ϕ^(n), y^(n)} entail, respectively, the probed RF-field amplitudes and phases at the M = 2 or M = 3 sensors, and y^(n) ∈ {−1, +1} is the true label, which can be derived using the true hyperplane {w_t, b_t} for the data-classification problem at hand.
Each sensor then converts the probed RF field into an internal voltage signal, which in turn drives the EOM to induce a quadrature displacement on the quantum signal, similar to Eq. (B2). A technicality associated with the quadrature displacement at the mth sensor is that it depends on both E^(n)_m and the amplitude of the baseband coherent light, a^(n)_cm = √(v^(n)_m) β, as shown by Eq. (B4) (see also Ref. [31] for a more detailed description). Our experiment focuses on demonstrating the principle of SLAEN, so, without loss of generality, the displacement's dependence on the baseband light is eliminated by scaling γ by a factor of 1/√(v^(n)_m), such that the amount of induced displacement is solely determined by the training data. In our experiment, this is accomplished by applying an extra amplitude modulation that introduces a gain of 1/√(v^(n)_m) on the RF signal before it goes to the EOM.
We choose γ such that the training data point {E^(n), ϕ^(n), y^(n)} leads to random quadrature displacements α^(n) at the involved sensors, with each displacement value initially following a uniform distribution within [−4, 4] in the shot-noise unit. The signal-to-noise ratio of the training data set is tuned by excluding points within a margin of ε from the hyperplane; in doing so, the signal-to-noise ratio is raised with an increased ε. In the training experiments, ε = 0.6 is chosen, comparable to one shot-noise unit.
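The margin-based data generation can be sketched as rejection sampling; the true hyperplane {w_t, b_t} below is a hypothetical example, not the experimental one:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_training_point(w_t, b_t, eps, dim=2, bound=4.0):
    """Draw a displacement vector uniform in [-bound, bound]^dim, rejecting
    points within margin eps of the true hyperplane; larger eps means a
    higher signal-to-noise ratio for the data set."""
    while True:
        alpha = rng.uniform(-bound, bound, size=dim)
        score = np.dot(w_t, alpha) + b_t
        if abs(score) > eps:
            return alpha, np.sign(score)

w_t = np.array([1.0, 1.0]) / np.sqrt(2.0)  # hypothetical true hyperplane
alpha, y = sample_training_point(w_t, 0.0, eps=0.6)
```

Rejecting points inside the margin band removes the hardest-to-classify examples, which is exactly how ε controls the effective signal-to-noise ratio of the data set.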
In the 3D data-classification experiment, amplitude modulations on the RF signals from three function generators prepare the training data. Two DC voltages produced by a multifunction I/O device (NI PCI-6115) are used to configure the two VBSs in the VQC. In the 2D data-classification experiment, phase modulations on the RF signals from two function generators prepare the training data, and one DC voltage generated by the same multifunction I/O device is used to configure VBS 1.
The flowchart of the training process is sketched in Fig. 8. The training starts with an initial hyperplane {w^(0), b^(0)} and its corresponding VQC setting v^(0). Here, b^(0) is a number stored in the classical SVM algorithm and is updated during training.
The measurement data at each sensor are collected by a multifunction I/O device (NI USB-6363) operating in an on-demand mode and are then transmitted to a desktop computer on which the classical SVM algorithm runs. In the nth training step, the measurement data α̃^(n) from all sensors, the true label y^(n), and the current hyperplane {w^(n−1), b^(n−1)} are fed to the SPSA algorithm, which then updates the hyperplane to {w^(n), b^(n)}, as elaborated in Appendix A. The VQC setting is subsequently updated to v^(n). The next training step starts with adjusting the power splitting ratios of the VBSs by applying two voltages on the PMs based on Eq. (B1) and the calibrated V_π. The new training data are then applied through the EOMs.
During training, a phase shift ϕ_m^(n) = π needs to be applied to the quantum state b̂_m when sign(w_m^(n)) = −1. Experimentally, this is done by flipping the sign of the emulated RF-signal amplitude. If the sign of the initial hyperplane, sign(w_m^(0)), differs from that of the true hyperplane, w_m^(n) will move across zero, causing zero optical power to be delivered to the mth sensor so that the phase locking between the quantum signal and the LO breaks down. To avoid this, we restrict the minimum power-splitting ratio to min v_m^(n) = 0.125, and a sign flip on w_m^(n) is applied whenever v_m^(n) hits this boundary. The training iterates 200 steps for the 2D data-classification experiment and 390 steps for the 3D data-classification experiment. The loss function converges to its minimum with the hyperplane {w*, b*} when training completes.
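A single SPSA training step of the kind described above can be sketched as follows. The gain constants (a, c), the hinge-type loss, and the mapping `vqc_setting` from hyperplane weights to VBS power-splitting ratios are all illustrative assumptions; the exact schedule and mapping are given in Appendices A and B.

```python
import numpy as np

rng = np.random.default_rng(0)

def hinge_loss(theta, alpha, y):
    # theta packs the hyperplane as (w_1, ..., w_M, b)
    w, b = theta[:-1], theta[-1]
    return max(0.0, 1.0 - y * (w @ alpha + b))

def spsa_step(theta, alpha, y, a=0.1, c=0.1):
    """One SPSA update of the hyperplane from a single labelled
    measurement: perturb all parameters simultaneously by +/- c
    and form a stochastic gradient estimate from two loss values."""
    delta = rng.choice([-1.0, 1.0], size=theta.shape)
    g = (hinge_loss(theta + c * delta, alpha, y)
         - hinge_loss(theta - c * delta, alpha, y)) / (2.0 * c * delta)
    return theta - a * g

def vqc_setting(w, v_min=0.125):
    """Hypothetical mapping from weights to VBS splitting ratios:
    proportional to w_m^2, floored at v_min to keep the phase
    lock between the quantum signal and the LO alive."""
    v = w**2 / np.sum(w**2)
    return np.clip(v, v_min, None)
```

SPSA needs only two loss evaluations per step regardless of the number of parameters, which is why it suits hardware-in-the-loop training where each evaluation costs a physical measurement.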

b. Utilization stage for SLAEN
In the utilization stage, SLAEN performs data classification on new measurement data α̃^(n), each with an unknown label. The new data follow the same statistical distribution as the training data. To verify the convergence of the training process, we first measure the error probabilities at different training steps with the hyperplane parameters {w^(k), b^(k)} for k ∈ {0, 20, 40, ..., 180, 200} in the 2D data-classification experiment and k ∈ {0, 30, 60, ..., 360, 390} in the 3D data-classification experiment. The classical SVM is set to use the hyperplane {w^(k), b^(k)}, and the VQC is configured by the corresponding setting v^(k). For each error-probability measurement, 1000 testing data points are applied on the EOMs at a 500-kHz rate by a multifunction I/O device (NI USB-6363), and the measurement data α̃^(n) are synchronously recorded by the same device. A decision ỹ^(n) is made based on the measurement data. We then estimate the error probability via P_E = Σ_{n=1}^{N} |ỹ^(n) − y^(n)|/(2N) with N = 1000. To verify the scaling of the error probability with respect to the signal-to-noise ratio of the data set, the hyperplane parameters and the VQC setting are configured to {w*, b*} and v*. The error probabilities for the 2D example are measured using data sets with margins ε ∈ {0.2, 0.4, 0.6, 0.8, 1}.
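The decision rule and error-probability estimate can be written compactly. A minimal sketch, assuming labels in {−1, +1} (so each misclassification contributes |ỹ − y| = 2) and a sign decision on the measured displacement projected onto the hyperplane normal:

```python
import numpy as np

def classify(alpha, w, b):
    # decision from the sign of w . alpha + b; labels are +/- 1
    return 1 if (w @ alpha + b) > 0 else -1

def error_probability(alphas, labels, w, b):
    """Estimate P_E over a batch of measurements: |y_hat - y| is
    0 for a correct decision and 2 for an error, hence the 2N."""
    y_hat = np.array([classify(a, w, b) for a in alphas])
    return np.sum(np.abs(y_hat - labels)) / (2 * len(labels))
```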

c. Training and utilization stages for classical classifier
The measurement noise at different sensors in the classical classifier is independent. As such, the classical classifier can be trained solely in post-processing carried out by the classical SVM. To enable a direct performance comparison, the training data sets for SLAEN are used to train the classical classifier. The hyperplane {w^(n), b^(n)} is updated in each training step, and the error probabilities at different training steps are measured to validate convergence. As a comparison, the scaling of the error probabilities for the classical classifier in the 2D example is also measured using the same testing data sets as SLAEN employs.

Experiment for general 3D data classification
To show that SLAEN can be trained to tackle general data-classification problems, we randomly choose a true hyperplane and experimentally train SLAEN and the classical classifier to undertake the classification task. In the experiment, the initial hyperplane is randomly set to {w_0 = (0.60, 0.566, 0.566), b_0 = 0.45}, and the chosen true hyperplane is {w_t = (0.8165, 0.4082, 0.4082), b_t = 0}.
A training data point is supplied to SLAEN at each of the 390 steps, during which the evolving hyperplane parameters are recorded. As anticipated, the experimental result depicted in Fig. 9 (a) shows that the hyperplane parameters move toward the optimum during training, indicating SLAEN's capability of solving general data-classification problems as long as training data are provided. As a comparison, we train the classical classifier over 390 steps with the same training data set used for SLAEN. The evolving hyperplane parameters during training are plotted in Fig. 9 (b), showing that the classical classifier can also shift the hyperplane toward the optimum.
With the experimentally measured hyperplane parameters during training, the error probabilities for SLAEN and the classical classifier are derived and plotted in Fig. 9 (c). SLAEN possesses a clear error-probability advantage over the classical classifier. Specifically, the error probability of SLAEN is a factor of two below that of the classical classifier once both are fully trained.

Appendix C: SIMULATIONS
We have performed Monte Carlo simulations for the training processes of SLAEN and the classical classifier on a classical computer, as a means to verify the qualitative behaviors of the evolving hyperplane parameters and error probabilities during the training experiments. Note that such a training simulation is merely a testing tool and cannot replace the physical training of SLAEN or the classical classifier in their practical applications because the original data {E (n) , ϕ (n) } probed by the sensors are in general unavailable.

Simulation for two-dimensional data classification
The simulation of the training for 2D data classification undergoes 200 steps, each of which consumes a randomly generated data point. The measurement noise for SLAEN and the classical classifier is also randomly generated, with the correlation between the measurement noise at SLAEN's different sensors accounted for. To facilitate the comparison between the experimental data and the simulation results, Fig. 3 (a-c) in the main text is replicated as Fig. 10 (a-c) here. Fig. 10 (d) depicts the simulated convergence of error probabilities for SLAEN (blue) and the classical classifier (red). In addition, we simulate the evolving hyperplane {w_1^(n), w_2^(n), b^(n)} during training and plot the results for SLAEN in Fig. 10 (e) and for the classical classifier in Fig. 10 (f). Comparing the top and middle panels of Fig. 10 shows excellent qualitative agreement between the experimental data and the simulation results. The distributions of the optimized hyperplanes are plotted in Fig. 10 (i), which shows, qualitatively, that SLAEN's optimized hyperplanes (blue circles) are almost enclosed by the classical classifier's optimized hyperplanes (red circles). This is evidence for SLAEN's enhanced accuracy in seeking the optimum solution.
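The essential ingredient the simulation must capture is the difference between the two noise models: correlated quantum noise across SLAEN's sensors versus independent shot noise at each classical sensor. A minimal sketch, assuming unit shot-noise variance per sensor and the ∼2.57-dB noise reduction quoted later for the entangled case; the actual simulation uses the full experimental parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

def measured_average(alpha, w, entangled, squeeze_db=2.57):
    """Emulate one noisy measurement of the weighted average
    w . alpha. Classical: independent unit-variance shot noise
    at each sensor. Entangled: the noise of the weighted sum is
    correlated across sensors and sits squeeze_db below the SQL."""
    sql_var = float(np.sum(w**2))   # SQL variance of the weighted sum
    if entangled:
        noise = rng.normal(0.0, np.sqrt(sql_var * 10**(-squeeze_db / 10)))
    else:
        noise = float(w @ rng.normal(0.0, 1.0, size=len(w)))
    return float(w @ alpha) + noise
```

Feeding these noisy averages to the same SVM update reproduces the qualitative gap between the blue and red curves of Fig. 10 (d): the entangled network sees less noise on exactly the quantity the hyperplane decision depends on.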
To conduct a more quantitative assessment of the convergence behavior of the hyperplane parameters, we define the distance between SLAEN's hyperplanes and the optimum hyperplane after n training steps as

d_S^(n) = ⟨ ||(w^(n), b^(n)) − (w*, b*)|| ⟩,   (C1)

where ⟨·⟩ denotes an average over Monte Carlo runs. The standard deviation of the distance is then defined as

σ_S^(n) = √( ⟨ ||(w^(n), b^(n)) − (w*, b*)||² ⟩ − (d_S^(n))² ).   (C2)

Likewise, the distance d_C^(n) between the classical classifier's hyperplanes and the optimum hyperplane, and its standard deviation σ_C^(n), are defined in the same manner with the classical classifier's trained parameters in place of SLAEN's. The distances at different training steps are plotted in Fig. 11 for SLAEN's hyperplanes (red) and the classical classifier's hyperplanes (blue). After 200 training steps, the distance for SLAEN's hyperplanes is smaller than that for the classical classifier's.
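One plausible reading of this distance metric, computed over a batch of Monte Carlo runs (the exact averaging convention is an assumption):

```python
import numpy as np

def hyperplane_distance(trained, w_opt, b_opt):
    """Mean distance between a batch of trained hyperplanes and
    the optimum, plus its standard deviation across runs.

    `trained` has shape (runs, M+1): each row is (w_1..w_M, b).
    """
    opt = np.append(w_opt, b_opt)
    d = np.linalg.norm(trained - opt, axis=1)   # one distance per run
    return d.mean(), d.std()
```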

Simulation for three-dimensional data classification
We next simulate the training processes of SLAEN and the classical classifier for 3D data classification. The training for each case takes 390 steps, identical to the number of training steps in the experiment. To facilitate the comparison between experimental data and the simulation results, the plots in Fig. 3 (d-f) of the main text are replicated as Fig. 12 (a-c) here. The simulated convergence of error probabilities is plotted in Fig. 12 (d). Fig. 12 (e) and (f) show, respectively, the simulated histories of the hyperplane parameters for SLAEN and the classical classifier during training. The qualitative behaviors of the experimental data agree well with those of the simulation results, thereby supporting the validity of the experimental approach.
In addition, we conducted a statistical study of the distances between the hyperplanes and the optimum hyperplane during training for 3D data classification. The distributions of the hyperplane parameters for SLAEN and the classical classifier are plotted in Fig. 3 (g-i) of the main text for, respectively, the initial hyperplanes, the hyperplanes after 100 training steps, and the hyperplanes when training completes. It can be visually observed that SLAEN's optimized hyperplanes lie closer to the optimum hyperplane than the classical classifier's. As a quantitative analysis, the distance-vs-training-step curves for SLAEN and the classical classifier are plotted in Fig. 13. Akin to Fig. 11, SLAEN enables a reduced distance between its optimized hyperplanes and the optimum hyperplane (see also the inset of Fig. 13). This is a consequence of the entanglement-enabled measurement-noise reduction mechanism that SLAEN harnesses.

Appendix D: COMPARISON WITH SEPARABLE SQUEEZED STATES

The original SLAEN theory paper [33] showed that the performance of data-processing tasks undertaken by an entangled sensor network is superior to that of a sensor network based on separable squeezed states with the same total photon number as the entangled state. In this section, we show, in simulation and with experimental data, that our SLAEN experiment achieves an advantage over a sensor network based on separable squeezed states in data-classification tasks, subject to a photon-number constraint. Specifically, our simulation shows that data classification based on separable squeezed states has a larger error probability than that of our SLAEN experiment. We also experimentally show that the quantum noise of a sensor network with separable squeezed states is higher than that of SLAEN, thereby supporting SLAEN's claimed advantage over supervised learning based on separable squeezed states.
Finally, we present the motivation behind the main text's focus on a performance comparison between SLAEN and classical classifiers based on coherent states.

Simulation for 3D data classification using separable squeezed states
We simulate the training process of average RF-field amplitude classification undertaken by a three-node sensor network based on separable squeezed states. The total photon number of the separable squeezed states is set equal to that of the entangled state in the SLAEN experiment. The evolution and convergence of the error probabilities during the training process are plotted in Fig. 14.
In the simulation, the initial hyperplane is set to {w_0 = (0.9044, 0.3152, 0.2876), b_0 = 0.53}, identical to that used in the SLAEN training experiment. As a comparison, we experimentally measure the error-probability evolution in SLAEN by taking ten more measurements with the same experimental setting used to produce Fig. 12 (a). Our SLAEN experiment shows an error-probability advantage of ∼ 13% over a simulated sensor network based on separable squeezed states.

Experimental noise calibrations
A complete demonstration of a three-node sensor network with separable squeezed states would require three independent squeezed-light sources, which imposes a significant resource overhead. Instead, we calibrate the quantum noise of a sensor network with separable squeezed states using a time-domain multiplexing approach introduced in Ref. [30]. We first set the mean photon number of our squeezed-light source to that of a separable squeezed state at a single sensor. We then take three samples in the time domain to emulate the independent quantum noise at three sensor nodes. The histogram of the averaged homodyne data is plotted in Fig. 15 and fitted with a normalized Gaussian probability density function. Since the measured noise variance of the separable sensor network is ∼ 11.7% higher than that of SLAEN, it is anticipated that the error probability of SLAEN beats that of a sensor network based on separable squeezed states.
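The time-domain multiplexing idea can be emulated numerically: samples of one squeezed source taken at different times stand in for independent sensors, and their average is histogrammed. A sketch with illustrative numbers (Gaussian noise of variance `var_single` replaces the homodyne record):

```python
import numpy as np

rng = np.random.default_rng(2)

def emulated_average_variance(var_single, n_sensors=3, n_shots=200000):
    """Take n_sensors independent temporal samples per shot,
    average them, and return the variance of the averaged record,
    as in the time-domain multiplexing calibration."""
    samples = rng.normal(0.0, np.sqrt(var_single),
                         size=(n_shots, n_sensors))
    return samples.mean(axis=1).var()
```

Averaging n independent samples of variance v yields v/n, so the separable network's averaged noise can be compared directly against the entangled network's measured variance on the same footing.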

Performance comparison
Quantum metrology studies how nonclassical resources such as squeezed light and entanglement can be utilized in a measurement system to enable a performance advantage over systems based on classical resources. Such a performance gain in sensing underpins SLAEN's error-probability advantage over separable sensor networks. In many practical optical sensing systems, such as the Laser Interferometer Gravitational-Wave Observatory (LIGO), the usable power of the classical laser light is limited due to, e.g., thermal effects, photon radiation-pressure-induced torques, and parametric instabilities that degrade the system performance [35]. Nonclassical squeezed light is then injected into the system to further improve the measurement sensitivity. In such a scenario, the measurement sensitivity achieved by a classical laser at a given power level defines the standard quantum limit (SQL), and surpassing the SQL using nonclassical resources demonstrates a quantum advantage enabled by quantum metrology. In our experiment, SLAEN's performance is compared with that of a classical classifier based on laser light, i.e., coherent states. The error probabilities for the classical classifier are measured at a given laser power level; SLAEN's error probabilities are then measured at the same laser power level while entanglement is distributed and shared by the sensors. In the SLAEN experiment, the calibration gives a total photon number of the entangled state of N_S = 3.3 and a quantum efficiency of η = 0.53 at each sensor. In a conceived three-node sensor network based on separable squeezed states, the mean photon number of a separable squeezed state at each sensor would be 1.1, so that the total photon number matches that of the entangled state. We can then estimate a noise reduction of 2.57 dB below the SQL at the same quantum efficiency at each sensor as in the SLAEN experiment.

Figure 15. Histograms of homodyne data. Red bins: separable squeezed states. Blue bins: entangled state. Histograms are normalized to probability mass functions and fitted with Gaussian probability density functions. Red curve: theory fit for separable squeezed states. Blue curve: theory fit for entangled state. Black curve: standard quantum limit (SQL).

The squeezed state residing at the 11-MHz sidebands is at a power level of tens of picowatts, while in the experiment most photons at each sensor originate from the strong (∼ 50 µW) coherent state at the central wavelength of 1550 nm. Given the ∼ 6 orders of magnitude power disparity between the strong coherent state and the quantum states at the sidebands, a separable sensor network based on either coherent states or separable squeezed states employs nearly the same optical power as an entangled sensor network. This situation is analogous to LIGO, in which the overall optical power remains nearly unchanged despite the injection of squeezed light into the interferometer.
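The ∼2.57-dB estimate quoted above follows from a textbook squeezed-vacuum model: a pure single-mode squeezed state with mean photon number n satisfies sinh²r = n, its squeezed-quadrature variance is e^(−2r) in shot-noise units, and detection loss mixes in vacuum noise as ηe^(−2r) + (1 − η). A sketch of that estimate (the pure-state assumption is ours; the full calibration model may differ):

```python
import numpy as np

def noise_reduction_db(n_photons, eta):
    """Noise reduction below the SQL for a single-mode squeezed
    vacuum with mean photon number n_photons (sinh^2 r = n),
    detected at quantum efficiency eta."""
    r = np.arcsinh(np.sqrt(n_photons))
    var = eta * np.exp(-2.0 * r) + (1.0 - eta)   # squeezed quadrature
    return -10.0 * np.log10(var)
```

With n_photons = 1.1 and η = 0.53 this gives ≈ 2.56 dB, consistent with the 2.57 dB quoted in the text.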