Channel-noise tracking for sub-shot-noise-limited receivers with neural networks

Non-Gaussian receivers for optical communication with coherent states can achieve measurement sensitivities beyond the limits of conventional detection, given by the quantum-noise limit (QNL). However, the amount of information that can be reliably transmitted substantially degrades if there is noise in the communication channel, unless the receiver is able to efficiently compensate for such noise. Here, we investigate the use of a deep neural network as a computationally efficient estimator of phase and amplitude channel noise to enable a reliable method for noise tracking for non-Gaussian receivers. The neural network uses the data collected by the non-Gaussian receiver to estimate and correct for dynamic channel noise in real-time. Using numerical simulations, we find that this noise tracking method allows the non-Gaussian receiver to maintain its benefit over the QNL across a broad range of strengths and bandwidths of phase and intensity noise. The noise tracking method based on neural networks can further include other types of noise to ensure sub-QNL performance in channels with many sources of noise.


I. INTRODUCTION
The intrinsic properties of coherent states can enable efficient and practical classical [1][2][3][4] and quantum [5][6][7][8][9] communications. When utilizing the phase of coherent states combined with their intensity to encode and transmit information, higher rates of information transfer may be achieved compared to communication schemes using intensity-only encodings [4,10]. However, channel noise can severely limit the advantage of communications with coherent encodings. In conventional coherent communications, the optical receiver performs a heterodyne measurement with shot-noise-limited sensitivity, corresponding to the quantum-noise limit (QNL). This measurement allows for the use of post processing methods of the collected data to estimate channel noise and correct the data to recover the transmitted information [10][11][12][13][14][15][16][17][18][19]. While current coherent optical communications rely on these conventional approaches, a heterodyne measurement cannot reach the ultimate limit of sensitivity [20] and information transfer [1,21,22].
In contrast to conventional strategies, non-Gaussian receivers can surpass the QNL, providing higher measurement sensitivities for decoding information [23][24][25][26][27][28][29][30]. However, in the presence of channel noise, the benefit of non-Gaussian receivers over conventional strategies critically depends on the ability to perform efficient channel-noise tracking. Recent work demonstrated an efficient method for phase tracking for non-Gaussian receivers [31]. This phase tracking method estimates and corrects for the phase noise in real time, which is required by the strategies used in non-Gaussian receivers, as opposed to post processing of the collected data with heterodyne detection. This method enabled sub-QNL sensitivity in the presence of phase noise, which is particularly damaging for coherent encodings [31,32].
In more realistic situations there may be multiple sources of noise present in the communication channel, such as thermal noise [33,34], phase diffusion [35][36][37], phase noise, and amplitude noise. In such situations the non-Gaussian receiver must perform efficient high-dimensional parameter estimation and tracking in order to maintain the expected sub-QNL performance. However, current methods for single-parameter noise tracking cannot be efficiently scaled to higher dimensions for tracking and correction of multiple sources of noise. Thus, enabling noise tracking for non-Gaussian receivers in channels with complex and dynamic noise requires novel and efficient methods for multi-parameter estimation that scale favorably to higher dimensions. Practical parameter tracking also requires estimation on a time scale that is very small (roughly 1% or less) compared to the time scale of the channel noise. For example, realistic kilohertz-scale phase noise [4,17,18] would require estimation at megahertz rates or faster, and a Bayesian estimator may not be compatible with this requirement.
In this work, we numerically investigate a method for multi-parameter channel noise tracking based on a neural network (NN) estimator for a non-Gaussian receiver with sub-QNL sensitivity for state discrimination of quaternary phase-shift-keyed (QPSK) coherent states. We construct a NN as a precise and computationally efficient multi-parameter estimator for tracking the time-varying phase and intensity of the input coherent states, and benchmark its performance against a Bayesian estimator, which is expected to be accurate but is computationally expensive to calculate. We find that, across a broad range of channel noise strengths and input powers, the NN-based method for noise tracking shows similar performance to a Bayesian-based noise tracking approach, and allows the non-Gaussian receiver to maintain sub-QNL sensitivity. This shows that a NN estimator is a viable method for real-time, multi-parameter channel noise tracking in non-Gaussian receivers due to its efficiency and potential scalability to higher dimensions. In Sec. II we describe the non-Gaussian receiver strategy and the NN estimator used for the noise tracking method. In Sec. III we investigate the performance of the channel noise tracking. We discuss the results of the work in Sec. IV.
FIG. 1. (a) Schematic of two parties, Alice and Bob, communicating over a noisy channel with Bob at the receiver performing phase and amplitude noise tracking. The receiver uses an adaptive non-Gaussian measurement strategy to realize state discrimination below the QNL using interference between the input state and a local oscillator (LO) followed by photon counting with a single photon detector (SPD). A neural network (NN) takes the data collected from the non-Gaussian receiver and outputs estimates for the current phase offset and input intensity. The estimates are fed forward to the LO to match the input intensity and counteract phase noise in real time. (b) Effect of channel noise with (blue) and without (orange) perfect parameter tracking (see main text) for the non-Gaussian receiver. Black points show a perfectly corrected heterodyne receiver, and black and gray dashed lines show the error for a non-Gaussian and a heterodyne receiver in the absence of noise, respectively.

II. RECEIVER AND ESTIMATION STRATEGY
We numerically study the use of a NN-based method for noise parameter tracking for non-Gaussian receivers based on adaptive measurements and photon counting. As a proof-of-concept, we investigate a NN-based method for tracking phase and amplitude channel noise, that uses only the data collected during the state discrimination measurement. This NN-based method can be easily extended to perform higher dimensional parameter estimation for tracking additional sources of noise in the channel such as thermal noise [33,34] or phase diffusion [35][36][37]. In this section, we describe (A) the measurement strategy of the non-Gaussian receiver for coherent state discrimination; (B) how the data from the measurement is used by the NN; and (C) the NN estimator, which can be used for estimation of channel noise from multiple sources.
A. State Discrimination Measurement
Figure 1(a) shows a scenario where a receiver attempts to perform coherent state discrimination with an adaptive photon counting measurement with sensitivity below the QNL [25,26]. Dynamic phase and amplitude noise induced by the communication channel degrades the attainable sensitivity of the receiver. Tracking the phase and amplitude noise of the input states induced in the channel using the data collected during the discrimination measurement can in principle allow the receiver to correct its strategy and maintain sub-QNL sensitivity.
Here, we study a method for channel noise tracking for a receiver based on an adaptive non-Gaussian strategy [26] for phase coherent states $|\alpha_k\rangle \in \{|\alpha e^{i2\pi k/M}\rangle\}$, where $k = 0, 1, ..., M-1$. For $M = 4$, this corresponds to quaternary phase-shift-keyed (QPSK) coherent states. The state discrimination strategy consists of $L$ adaptive measurement steps. Each step performs a hypothesis test of the input state using a local oscillator (LO) to implement a displacement operation $\hat{D}(\beta)$ through interference and single photon counting. In each adaptive step $j = 1, 2, ..., L$, the receiver attempts to displace the most likely state to the vacuum state by adjusting the LO phase $\arg(\beta) = \theta_j \in \{0, \pi/2, \pi, 3\pi/2\}$ with $|\beta| = |\alpha_k|$, followed by single photon detection. The detector has a finite photon number resolution (PNR) where up to $m$ photons can be resolved, denoted as PNR($m$), before becoming a threshold detector [26]. At the end of the $L$ adaptive steps, the best guess of the receiver $\theta_{\rm disc}$ for the true input phase is the state with maximum a posteriori probability given the entire detection history. As described in Section II(B) and Section II(C), the photon counting data from the adaptive measurement steps together with $\theta_{\rm disc}$ allows the receiver to perform phase and amplitude tracking, where estimates of the channel noise are fed forward to the LO in order to maintain the sub-QNL performance of the receiver. Figure 1(b) shows an example of the error probability for the adaptive non-Gaussian receiver for QPSK states for an average input mean photon number of $n_0 = |\alpha|^2 = 5.0$, which is proportional to the intensity, averaged over 5000 noise realizations and obtained through Monte-Carlo simulations. For all Monte-Carlo simulations in this study, we assume ideal detection efficiency, zero detector dark counts, a photon number resolution of PNR(10), and $L = 10$ adaptive steps.
To represent a realistic experiment, we use an interference visibility of the displacement operation of 99.7% [26]. The blue (orange) points show the error probability for the non-Gaussian receiver with (without) perfect noise tracking. Perfect tracking refers to a situation where the receiver has complete knowledge of the time-dependent input intensity and phase noise induced by the channel. The black points show the error of an ideal heterodyne measurement, performing at the QNL, with perfect tracking [62]. The dashed lines show the expected error in the absence of noise for a heterodyne (gray) and non-Gaussian (black) receiver. The error for the non-Gaussian measurement remaining below the heterodyne limit (QNL) shows that if the receiver can implement accurate parameter tracking, then its benefit over the QNL can be maintained. Furthermore, any tracking method for the non-Gaussian receiver requires correcting for dynamical noise in real-time to ensure sub-QNL performance [31], in contrast to methods for heterodyne receivers, where estimation and correction can be done in post-processing of the data.
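The adaptive strategy described in this section can be illustrated with a minimal Python sketch. It is not the simulation code used in this work: it assumes ideal visibility, unit detection efficiency, no dark counts, an equal split of the pulse energy over the $L$ steps, and no channel noise. The function names `count_likelihood` and `discriminate` are hypothetical.

```python
import math
import numpy as np

M = 4  # QPSK alphabet size

def count_likelihood(d, mean, m):
    """P(d | mean) for Poissonian light seen by a PNR(m) detector."""
    def pmf(i):
        return math.exp(-mean) * mean**i / math.factorial(i)
    if d < m:
        return pmf(d)
    # d == m lumps together all outcomes with m or more photons
    return max(0.0, 1.0 - sum(pmf(i) for i in range(m)))

def discriminate(k_true, n0, L=10, m=10, rng=None):
    """Simulate one adaptive measurement (assumed ideal receiver).

    Returns the final guess, the LO phase index used in each step,
    and the photon counts registered in each step.
    """
    rng = np.random.default_rng() if rng is None else rng
    phases = np.exp(2j * np.pi * np.arange(M) / M)
    post = np.full(M, 1.0 / M)            # uniform prior over hypotheses
    lo_idx, counts = [], []
    for _ in range(L):
        j = int(np.argmax(post))          # null the most likely state
        # mean photon number per hypothesis after displacement (energy n0/L per step)
        means = (n0 / L) * np.abs(phases - phases[j]) ** 2
        d = min(int(rng.poisson(means[k_true])), m)
        post = post * np.array([count_likelihood(d, mu, m) for mu in means])
        post /= post.sum()                # Bayesian update of the posterior
        lo_idx.append(j)
        counts.append(d)
    return int(np.argmax(post)), lo_idx, counts
```

The returned LO indices and counts are exactly the per-step data that the tracking method later reuses, so no extra measurement is needed for estimation.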

B. Detection Matrix
The measurement data collected by the non-Gaussian receiver from the discrimination of $N$ input states is used for parameter estimation. For the discrimination of one input state, this data consists of the $L$ photon detections $\{d_j\}_L$ and relative phases $\{\Delta_j\}_L$ between the LO and input state for each adaptive step $j$. Due to the low error rate achieved by the non-Gaussian measurement, the guess $\theta_{\rm disc}$ of the phase of the input state corresponds to the true input phase with high probability. Thus, $\theta_{\rm disc}$ can be used to infer the relative phase $\Delta_j$ between the LO and actual input state at every adaptive measurement step $j$ such that $\Delta_j = \theta_j - \theta_{\rm disc}$, as in [31]. This state discrimination data $\{\Delta_j, d_j\}$ is binned into what we refer to as the detection matrix $D$, which is an $M \times (m+1)$ matrix, where $m$ is the PNR of the receiver. After each measurement, the matrix elements $D_{l,k}$ are incremented by the total number of times that the number of detected photons in an adaptive step $j$ was $d_j = l$ and the relative phase was $\Delta_j = 2\pi k/M$ for $k \in \{0, 1, ..., M-1\}$. Thus, the rows of the matrix $D$ represent the photon number distributions for different relative phases $k\pi/2$ between the LO ($\theta_j$) and final hypothesis ($\theta_{\rm disc}$) for QPSK states [31]. After completing $N$ experiments, the matrix $D$ contains $N \times L$ pairs $\{d_j, \Delta_j\}$ and it is used for parameter estimation. Once estimation has been performed, the matrix is reset such that $D_{l,k} = 0$ for all $l$ and $k$.
FIG. 2. Diagram of the NN architecture (see Table I in Appendix A). The NN inputs are a flattened version of the detection matrix $D$ normalized across each row, and the LO intensity $B$ for the measurements whose data is contained in the detection matrix. The outputs of the NN are estimates for the phase offset $\hat{\phi}_{NN}$ and input intensity $\hat{A}_{NN}$.
In order to extract information from $D$ to correct for channel noise affecting the measurement, the receiver must utilize a particular estimator. A Bayesian estimator, which uses the full likelihood functions, will yield estimates for the channel noise with small uncertainty [63]. However, this estimator is computationally demanding to calculate. Since the estimation and correction of the channel noise for non-Gaussian receivers must be performed in real time, a Bayesian method may be incompatible with applications requiring high-bandwidth sub-QNL receivers. Therefore, enabling practical implementations of non-Gaussian receivers requires an estimator that is both precise and computationally efficient while being easily scalable to higher dimensions to track multiple sources of channel noise. For example, the simple case of single-parameter estimation for phase tracking for non-Gaussian measurements has been experimentally demonstrated [31] using a simple estimator, which is calculated in real time with minimal computational resources.
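The binning of the discrimination data into the detection matrix can be sketched as follows. This is an illustrative example only: the function name `update_detection_matrix` and the toy data are hypothetical, and the array is stored with one row per relative phase and one column per photon count, which is an assumed layout chosen to match the stated $M \times (m+1)$ shape.

```python
import numpy as np

def update_detection_matrix(D, lo_idx, counts, guess_idx):
    """Bin one measurement's (relative phase, photon count) pairs into D."""
    M = D.shape[0]
    for j_idx, d in zip(lo_idx, counts):
        k = (j_idx - guess_idx) % M   # relative phase Delta_j = 2*pi*k/M
        D[k, d] += 1                  # row: relative phase, column: photon count
    return D

M, m = 4, 10
D = np.zeros((M, m + 1), dtype=int)
# toy data for one measurement with L = 5 adaptive steps (hypothetical values)
D = update_detection_matrix(D, lo_idx=[0, 1, 1, 1, 1],
                            counts=[2, 0, 0, 1, 0], guess_idx=1)
```

After $N$ measurements of $L$ steps each, the entries of `D` sum to $N \times L$, and the matrix is reset to zero once an estimate has been produced.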

C. Neural Network Estimator
We construct a NN as a multi-parameter estimator which maps the data collected from the state discrimination measurement to estimates for the input intensity and phase offset. We compare the performance of the NN estimator to a Bayesian estimator. The Bayesian-based method for noise tracking serves as a benchmark and is calculated from the same state discrimination measurement data, i.e., the detection matrix $D$. Although we study phase and amplitude tracking, a properly trained NN can in principle be used as an efficient high-dimensional estimator for tracking many sources of communication channel noise. Figure 2 shows a diagram of the NN architecture for the proposed noise tracking method, which has 10 layers (8 hidden), each with a Leaky ReLU activation function [64]. To obtain the input for the NN, the detection matrix $D$ is first normalized across each row, and then arranged into a one-dimensional vector ($D_{l,k} \to D_{l(m+1)+k}$). This vector, along with the LO intensity for the previous $N$ measurements, are the inputs to the NN. For ease of notation, we denote the time-dependent input intensity of the QPSK coherent states as $A(\tau) = |\alpha|^2(\tau)$, where $\tau$ represents time discretized into steps of $\Delta T$, and $1/\Delta T$ is the experimental repetition rate. For a single state discrimination measurement at time $\tau$, the intensity of the LO is denoted as $B(\tau) = |\beta|^2(\tau)$. The NN outputs, denoted as $\hat{A}_{NN}$ and $\hat{\phi}_{NN}$, are raw estimates of the input intensity $A(\tau)$ and relative phase offset $\phi(\tau)$ during the previous $N$ state discrimination measurements. The NN is trained on $5 \times 10^5$ samples of the state discrimination measurement generated from Monte-Carlo simulations of the experiment in Python. For training the NN, we use the TensorFlow library [65] with a weighted mean squared error cost function (see Appendix A for details) [66][67][68][69].
The trained NN is then included in the Monte-Carlo simulations to perform parameter tracking on the state discrimination data such that the estimates from the NN are fed-forward to the LO to correct the measurement.
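As a rough illustration of how the NN input is assembled and evaluated, the sketch below row-normalizes the detection matrix, flattens it, appends the LO intensity, and runs a plain fully connected forward pass with a Leaky ReLU activation. The layer widths, the random weights, the negative slope of 0.01, and the linear output layer are placeholders chosen only for this example; they are not the trained network or the parameters of Table I.

```python
import numpy as np

def nn_input(D, lo_intensity):
    """Row-normalize D, flatten it, and append the LO intensity B."""
    D = np.asarray(D, dtype=float)
    row_sums = D.sum(axis=1, keepdims=True)
    # rows with no recorded events are left as zeros
    norm = np.divide(D, row_sums, out=np.zeros_like(D), where=row_sums > 0)
    return np.concatenate([norm.ravel(), [lo_intensity]])

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def forward(x, weights, biases):
    """Fully connected forward pass; the last layer is kept linear (assumption)."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = leaky_relu(W @ x + b)
    return weights[-1] @ x + biases[-1]

# Hypothetical, untrained example: 4 x 11 matrix plus B gives 45 inputs, 2 outputs.
rng = np.random.default_rng(1)
widths = [45, 32, 16, 2]
weights = [rng.normal(size=(o, i)) for i, o in zip(widths[:-1], widths[1:])]
biases = [np.zeros(o) for o in widths[1:]]
```

Because the input length depends only on $M$ and $m$, the cost of one evaluation is independent of the number of experiments $N$ and of the number of adaptive steps $L$.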

III. RESULTS
We simulate the performance of the noise tracking method based on the NN estimator for a variety of scenarios with amplitude and phase noise for average input intensities $n_0 = A(0) = \langle A(\tau)\rangle$ equal to 2, 5, and 10. Here $\langle\cdot\rangle$ denotes the average across all noise realizations at the time step $\tau$. We benchmark the NN against a Bayesian estimator where the prior probability distribution for both parameters is uniform [31]. For all simulations, we use a single NN to perform multi-parameter estimation and noise tracking across a range of input powers and noise parameter regimes.
As a model for phase noise $\phi(\tau)$, we simulate a discrete Gaussian random walk in phase [31]. A single step of this walk has a variance of $\sigma_1^2 = 2\pi\Delta\nu\Delta T$, where $\Delta\nu$ is the phase noise bandwidth due to finite laser linewidth [4,17,18] or other phase noise sources [4,70]. The experimental repetition rate is set to $1/\Delta T = 100$ MHz such that $\Delta T = 10$ ns to represent a feasible, near-term communication bandwidth for non-Gaussian receivers [71]. To model amplitude noise of the input states, we simulate noise in the input intensity $A(\tau)$. As a noise model, we use an Ornstein-Uhlenbeck (OU) process [72,73], whose stochastic differential equation, Eq. (1), is parameterized by the amplitude noise bandwidth $\gamma$, the parameter $\Sigma$, which controls the deviation of the walks, and a Wiener process $dW$. The maximum long-time variance we implement is $\Sigma_\infty^2 = \{0.25, 1.5, 6.0\}$ for $n_0 = \{2, 5, 10\}$, respectively, corresponding to a relative noise level of $\Sigma_\infty/n_0 \approx 0.25$.
FIG. 3. Noise parameters: $\gamma = 25$ kHz, $\Sigma_\infty^2 = 1.5$, and $\Delta\nu = 2$ kHz. Blue (orange) points show the error for the NN (Bayes) based estimator. Green and black points show the error with no correction and perfect correction, respectively. Gray points show the effective QNL of a perfectly corrected heterodyne measurement. Black and gray dashed lines show the error for a non-Gaussian and heterodyne receiver, respectively, with no noise.
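The two noise models can be sketched numerically as below. The phase walk follows the stated step variance $2\pi\Delta\nu\Delta T$. For the intensity, the sketch assumes the standard mean-reverting OU form $dA = -\gamma (A - n_0)\,d\tau + \Sigma\,dW$ with long-time variance $\Sigma_\infty^2 = \Sigma^2/(2\gamma)$ and uses its exact one-step update; this parameterization, the treatment of $\gamma$ as a rate in inverse seconds, and the function names are assumptions made for illustration, and no clipping of negative intensities is applied.

```python
import numpy as np

def phase_walk(n_steps, dnu, dT, rng):
    """Discrete Gaussian random walk with step variance 2*pi*dnu*dT."""
    sigma1 = np.sqrt(2 * np.pi * dnu * dT)
    return np.cumsum(rng.normal(0.0, sigma1, size=n_steps))

def ou_intensity(n_steps, n0, gamma, var_inf, dT, rng):
    """OU walk around n0 with long-time variance var_inf (assumed form).

    Uses the exact discretization of dA = -gamma*(A - n0) dt + Sigma dW,
    with var_inf = Sigma**2 / (2*gamma).
    """
    decay = np.exp(-gamma * dT)
    step_std = np.sqrt(var_inf * (1.0 - decay**2))
    A = np.empty(n_steps)
    A[0] = n0
    for t in range(1, n_steps):
        A[t] = n0 + (A[t - 1] - n0) * decay + step_std * rng.normal()
    return A

rng = np.random.default_rng(0)
phi = phase_walk(20000, dnu=2e3, dT=10e-9, rng=rng)                    # 2 kHz phase noise
A = ou_intensity(20000, n0=5.0, gamma=25e3, var_inf=1.5, dT=10e-9, rng=rng)
```

With $\Delta T = 10$ ns and $\gamma = 25$ kHz, the intensity decorrelates over roughly $1/(\gamma \Delta T) = 4000$ time steps under this assumed parameterization, which is long compared with an estimation period of $N = 10$ experiments.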
After $N$ state discrimination measurements, estimates are calculated from the detection matrix $D$. To implement correction of the receiver, we set the LO intensity $B(\tau)$ to the current estimated value $\hat{A}(\tau)$ of the input intensity $A(\tau)$. For phase tracking we add a correction $\delta(\tau)$ to the LO phase such that $\arg\{\beta\} = \theta_j + \delta(\tau)$. The correction $\delta(\tau)$ is equal to the cumulative sum of individual estimates $\hat{\phi}$ up to the current time step $\tau$. This is because the receiver is always estimating only the phase shift accumulated in the previous $N$ experiments. The phase and intensity corrections remain fixed at these values for $N$ experiments until new estimates are made and applied to the LO.
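The reason the phase correction is a cumulative sum can be seen in a short toy example. It assumes an idealized estimator that returns the residual phase exactly, and the phase values and the function name `apply_corrections` are hypothetical.

```python
def apply_corrections(delta, phi_hat, A_hat):
    """Return the updated LO phase correction and LO intensity."""
    delta_new = delta + phi_hat   # accumulate the residual phase estimate
    B_new = A_hat                 # LO intensity follows the intensity estimate
    return delta_new, B_new

# Toy run with a perfect estimator (hypothetical phase values, in radians).
true_phase = [0.10, 0.15, 0.30]
delta, B = 0.0, 5.0
for phi in true_phase:
    residual = phi - delta        # offset the receiver still sees after correction
    delta, B = apply_corrections(delta, residual, A_hat=5.2)
```

Since the LO already carries the correction $\delta$, each new estimate only measures the phase accumulated since the last update, so summing the estimates recovers the total offset.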
To reduce uncertainty in the phase and intensity estimates for noise tracking, we implement a Kalman filter [74] for both estimates (see Appendix B for details). The inputs to the filter are the current raw estimates for the input intensity and phase offset $(\hat{A}_{NN}, \hat{\phi}_{NN})$, and the filter outputs are updated, filtered estimates for the intensity $\hat{A}(\tau)$ and phase $\hat{\phi}$. The same procedure is done to obtain filtered Bayesian estimates from the raw estimates $(\hat{A}_B, \hat{\phi}_B)$. To implement the Kalman filter, we assume that the raw NN estimates are Gaussian distributed, and use Monte-Carlo simulations with fixed phase offset and input intensity to empirically obtain the variance of the NN estimator. We note that although we study two particular models for phase and amplitude noise, we believe this NN-based tracking method can be applied to a variety of noise forms such as power-law amplitude noise or damping noise. To study different noise models, the NN would need to be retrained using data generated from the new model and the noise dynamics would need to be incorporated into the Kalman filter accordingly.
Figure 3 shows (a) the error probability of the non-Gaussian receiver with noise tracking for 1000 different realizations with both phase ($\Delta\nu = 2$ kHz) and intensity ($\gamma = 25$ kHz, $\Sigma_\infty^2 = 1.5$) noise shown in (b) and (c), respectively, for an input intensity $n_0 = 5.0$ and $N = 10$ experiments per estimation period. The blue (orange) points show the results of the noise tracking method based on NN (Bayesian) estimators. The black points show the error probability with perfect correction, which corresponds to the case where the receiver has complete knowledge of the phase and intensity noise, so that $B(\tau) = A(\tau)$ and $\delta(\tau) = \phi(\tau)$. The green points show the error probability of an uncorrected non-Gaussian measurement, and the gray points show that of an ideal heterodyne measurement with perfect phase tracking (equivalent to $\phi(\tau) = 0$).
We note that even though the receiver may have perfect knowledge of the noise, the overall effect of the amplitude noise increases the error probability. This is because input powers smaller than the average power ($A(\tau) < n_0$) increase the errors more than the reduction of error for larger powers ($A(\tau) > n_0$). The dashed black and gray lines show the error for an adaptive non-Gaussian measurement, and an ideal heterodyne measurement with no noise, respectively. By comparing the error of the non-Gaussian measurement with perfect correction (black points) to the black dashed reference line, we observe that non-Gaussian measurements are more sensitive to amplitude noise than a heterodyne measurement (gray points vs gray dashed line), even when they are perfectly corrected. We observe that the NN-based tracking method performs equivalently to the Bayesian method, and both can allow the receiver to maintain an error probability significantly below the QNL. This result demonstrates the capabilities of a NN for efficient and reliable noise tracking for non-Gaussian receivers for state discrimination.
We study the robustness of the NN-based method in scenarios with different noise strengths and bandwidths in the phase and amplitude. For these studies, we use the heterodyne measurement with perfect phase tracking as the limit for conventional strategies, serving as the effective QNL when the same noise is applied to both receivers. In this section, we study (A) the error probability as a function of phase noise with fixed amplitude noise, and (B) the error probability when the amplitude noise levels are varied with phase noise with a fixed bandwidth.

A. Phase noise with different bandwidths
We study the performance of the noise tracking method based on the NN estimator as a function of the phase noise bandwidth $\Delta\nu$ for fixed values of amplitude noise $\Sigma_\infty^2$ and $\gamma$. We compare these results to the tracking method based on a Bayesian estimator, as well as a perfectly corrected non-Gaussian measurement. We use different amplitude noise parameters for different values of $n_0$ such that the relative amplitude noise strength $\Sigma_\infty/n_0$ is constant. For an average intensity of $n_0 = 2$, 5, and 10 we simulate 250, 250, and 500 different realizations of the noise, respectively. The simulations are run for $2 \times 10^3$ time bins of $N = 10$ experiments each, giving a total of $2 \times 10^4$ individual experiments per noise realization. We calculate the average error across all realizations for all time bins. Figure 4 shows the average error probability as a function of the bandwidth $\Delta\nu$ for intensities $n_0 = 2$, 5, and 10. Figures 4(a)-4(c) have no amplitude noise ($\gamma = 0$, $\Sigma_\infty^2 = 0$), and 4(d)-4(f) have $\gamma = 2$ kHz and $\Sigma_\infty^2 = \{0.25, 1.5, 6.0\}$, respectively, corresponding to a relative strength of $\Sigma_\infty/n_0 \approx 0.25$. The performance of the noise tracking method based on the NN estimator (blue) is equivalent to the Bayesian-based method (orange) while being computationally efficient to implement. The purple and gray dashed lines show the average error of the non-Gaussian and ideal heterodyne receivers with perfect parameter tracking, respectively.
We observe that for all the investigated average input intensities $n_0$, the NN-based method performs as well as the Bayesian method both with and without amplitude noise. The NN-based method can enable the non-Gaussian receiver to surpass the QNL up to a phase noise bandwidth of $\Delta\nu \approx 15$ kHz, even in the presence of significant amplitude noise. We note that the situation in Figs. 4(a)-4(c) with no amplitude noise is equivalent to the single-parameter problem of phase tracking for non-Gaussian receivers as demonstrated in [31]. For large phase noise bandwidths for $n_0 = 5$ and 10, we observe that a NN estimator can perform slightly better than a Bayesian estimator. We believe this is due to the relatively small number of samples ($N = 10$) from which estimates are made. In this regime with few samples for estimation, there may be estimators which perform better than the Bayesian estimator, which is asymptotically optimal in the limit of many samples. Another potential cause of this effect is that in the training process of the NN, the relative weight between error in phase estimates and error in mean photon number estimates can be adjusted. This freedom may allow for fine tuning of the overall training error to allow for a slightly better overall performance, in terms of error probability, for specific channel models.

B. Amplitude noise with different bandwidths
To investigate the effect of the amplitude noise bandwidth $\gamma$, we fix the long-time variance $\Sigma_\infty^2$ and the phase noise bandwidth $\Delta\nu$. This allows for studying the performance of the NN-based method when the amplitude noise bandwidth $\gamma$ ranges from much smaller to much larger than the bandwidth for parameter estimation $1/(N\Delta T)$. Figure 5 shows the average probability of error for different amplitude noise bandwidths $\gamma$ without and with phase noise of bandwidth $\Delta\nu = 5$ kHz, for $n_0 = \{2, 5, 10\}$ with $\Sigma_\infty^2 = \{0.25, 1.5, 6.0\}$, respectively. Blue (orange) lines show the error rates for the NN (Bayesian) based tracking method. Purple and gray dashed lines show the error probability for a non-Gaussian and heterodyne measurement with perfect correction, respectively. We find that the NN-based method performs closely to the Bayesian-based method, and enables the receiver to achieve sub-QNL error rates across a broad range of amplitude noise bandwidths even in the presence of phase noise.
We note that for intensity $n_0 = 2$ in Fig. 5(a), the errors for both noise tracking methods are below that of the perfectly corrected non-Gaussian measurement when $\Delta\nu = 0$. At low input powers, strategies that optimize the LO intensity ($|\beta|^2 > |\alpha|^2$) yield lower error probabilities than when $|\beta|^2 = |\alpha|^2$ [28]. Due to the small number of samples ($N \times L$) used for estimation, the NN and Bayesian estimators have a bias in $\hat{A}_{B,NN}$, such that $B(\tau) > A(\tau)$. The effect of this bias in the intensity estimates $\hat{A}_{B,NN}$ is that the corrected measurement unintentionally approximates an optimized strategy [28]. As a result, the error probabilities of the corrected receiver with both the NN and Bayesian based methods fall below the error of a perfectly corrected nulling receiver where $B(\tau) = A(\tau)$. Further investigation is needed to determine the capabilities of NN-based noise tracking for optimized non-Gaussian receivers [28].
The performance of the non-Gaussian receiver also depends on the long-time variance $\Sigma_\infty^2$ of the amplitude noise. In our main results, $\Sigma_\infty^2$ was set to represent a "worst-case" scenario of $\approx 25\%$ relative amplitude noise (see Fig. 3). Appendix C describes our study of noise tracking of amplitude noise with different long-time variance $\Sigma_\infty^2$. In our findings, we observe that in the absence of phase noise, both the NN and Bayesian-based tracking methods enable the receiver to perform below the QNL, and close to the performance of perfect noise correction. In the presence of phase noise with bandwidth $\Delta\nu = 5$ kHz, the sub-QNL performance of the receiver is maintained, and the effect of increasing $\Sigma_\infty^2$ is small compared to the effects of increasing phase or amplitude bandwidths.

IV. DISCUSSION
The numerical studies in this work show that methods for channel noise tracking based on NN estimators are able to accurately track dynamic phase and amplitude noise to allow an adaptive non-Gaussian measurement to maintain performance below the QNL. We note that in the asymptotic limit of many samples available for parameter estimation for noise tracking, a Bayesian estimator can achieve minimal mean-square error [63]. However, when noise tracking and correction need to be realized in real time to reduce errors in the state discrimination measurement and generate reliable data for parameter estimation, there is always a limited number of samples from which estimates are made. In these situations, there is a trade-off between estimation precision and noise tracking bandwidth. Other estimators, such as a NN, may balance these two parameters better than a Bayesian estimator for increased precision with finite samples. This property can enable efficient methods for high dimensional parameter tracking of complex dynamic channel noise.
The computational efficiency of the NN estimator is rooted in the small number of multiplications required to calculate a single estimate, the limited memory requirements, and in the fact that the NN method does not explicitly depend on the value of $N$ or the number of adaptive steps $L$, as opposed to the Bayesian approach. For example, the NN-based estimator in this work requires $\approx 5500$ multiplications. On the other hand, a Bayesian estimator using likelihood functions which are discretized into a $100 \times 100$ grid would require $100^2 \times N \times L = 10^6$ multiplications, which may not be compatible with devices such as FPGAs [26,31]. While there may be methods to reduce this computational cost, the Bayesian estimator also would require storage in memory of the full photon counting likelihood functions, putting stringent requirements on the device memory. For example, the $100 \times 100$ grid for the Bayesian estimator with 16-bit precision would require 800 kB of memory simply to store the likelihood functions. Moreover, to extend the noise tracking method for estimation of three noise parameters, a NN would simply require a single added output and proper retraining, while a Bayesian estimator would require possibly $10^8$ multiplications and 80 MB of memory.
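The Bayesian cost figures quoted above can be reproduced with a few lines of arithmetic. The sketch assumes $N = 10$ and $L = 10$ as in the simulations, and it assumes that 40 likelihood values are stored per grid point (for instance, 10 photon-number outcomes for each of the 4 relative phases); this last number is not stated in the text and is chosen here only because it reproduces the quoted memory figures. The function name `bayes_cost` is hypothetical.

```python
def bayes_cost(n_params, grid=100, N=10, L=10, entries=40, bytes_per_value=2):
    """Rough multiplication count and likelihood storage for a gridded estimator.

    entries: assumed number of stored likelihood values per grid point.
    bytes_per_value: 2 bytes corresponds to 16-bit precision.
    """
    points = grid ** n_params
    multiplications = points * N * L
    memory_bytes = points * entries * bytes_per_value
    return multiplications, memory_bytes

for n_params in (2, 3):
    mults, mem = bayes_cost(n_params)
    print(f"{n_params} parameters: {mults:.0e} multiplications, {mem / 1e3:.0f} kB")
```

Both quantities grow by a factor of 100 for every added parameter on a 100-point grid, whereas the NN only needs one extra output.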
The robustness and versatility of the NN-based noise tracking method described here show that NN-based methods can be practical and very useful tools for non-Gaussian receivers. In addition, other machine learning techniques, such as reinforcement learning [75,76], could provide further benefits to these non-conventional measurements when the best detection strategy may be unknown or infeasible to calculate. We anticipate that neural networks and machine learning will have a great benefit for non-Gaussian measurements, just as these techniques have proven worthwhile for conventional measurement strategies [54][55][56][57][58][59][60][61].

V. CONCLUSION
We investigate the use of a neural network (NN) as a computationally efficient multi-parameter estimator of dynamic channel noise, enabling robust noise tracking for adaptive non-Gaussian measurements for coherent state discrimination. We study the NN-based tracking method for simultaneous amplitude and phase noise and find that the NN estimator can perform as well as a more complex Bayesian estimator. This performance is observed across a broad range of noise strengths and bandwidths for different average powers of the input coherent states. The non-Gaussian receiver used in this study can have broad applications in classical [26,27] and quantum communication [77,78] due to its ability to attain sensitivities beyond the quantum noise limit (QNL). Moreover, the proposed method for noise tracking uses only the data collected during the state discrimination measurement without requiring extra resources such as strong reference pulses. This makes the receiver and the proposed method for noise tracking well suited for energy efficient low power communications. Thus, NN based methods are ideal candidates for real-time tracking of multiple sources of channel noise for non-Gaussian receivers, allowing them to maintain their sub-QNL sensitivity in the presence of complex dynamic channel noise.
The source code is available at: github.com/UNM-QOlab/phase amp tracking nn

Appendix A: Neural network estimator training
To generate training data, we use Monte Carlo simulations of the adaptive non-Gaussian measurement described in Section II A [26]. For a single training data element, the strategy is simulated using a randomly chosen intensity for the input state $A$ and LO intensity $B$, both sampled from a uniform distribution $U(0.05, 25.0)$. A constant value for the phase noise $\phi$ is then sampled from a Gaussian distribution $\mathcal{N}(0, \sigma = 0.25)$. In addition, we randomly sample the number of experiments $N$ that comprise each sample from a uniform distribution $U(2, 200)$. This random sampling of the three parameters enables the NN to be used for different values of $N$ depending on the noise characteristics. During the training of the NN, we use the true values of the input parameters $(\phi, A)$ as the target values. This procedure of randomly sampling multiple input noise parameters ensures sufficient sampling of the input parameter space, enabling training of a robust NN estimator.
To train the NN, we use the TensorFlow framework [65] in Python. Table I summarizes the relevant NN and training parameters. We use the RMSprop optimizer [79] with a weighted mean-squared-error cost function. The weight $w_i$ of each training sample is given by: where $A_i$ and $B_i$ are the intensities of the input state and LO of the $i$-th sample, respectively. This allows the NN to accurately estimate the parameters when the LO and input intensities are close to each other, as one would expect in practice, while also being somewhat robust to large amplitude fluctuations.
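The random sampling of simulation settings described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the training code itself. The function and dictionary key names are hypothetical, the integer bounds for $N$ are assumed to be inclusive, and the Monte Carlo simulation of the measurement and the sample weights $w_i$ are not shown.

```python
import random

def sample_training_config(rng):
    """Draw the simulation settings for one training data element."""
    return {
        "A": rng.uniform(0.05, 25.0),  # input-state intensity, U(0.05, 25.0)
        "B": rng.uniform(0.05, 25.0),  # LO intensity, U(0.05, 25.0)
        "phi": rng.gauss(0.0, 0.25),   # constant phase offset, sigma = 0.25
        "N": rng.randint(2, 200),      # experiments per sample (inclusive bounds assumed)
    }

rng = random.Random(0)  # fixed seed for reproducibility (arbitrary choice)
configs = [sample_training_config(rng) for _ in range(1000)]

# The regression targets are the true values (phi, A) of each element.
targets = [(c["phi"], c["A"]) for c in configs]
```

Each configuration would then be passed to a simulation of the adaptive measurement to produce the NN input for that element.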

Appendix B: Kalman filter
We implement Kalman filtering [74,80] of both the phase estimate and the intensity estimate in order to reduce the uncertainty in the corrections applied to the adaptive non-Gaussian measurement. For the phase estimates, the predicted mean value $\hat{y}_\phi$ and variance $\hat{\sigma}^2_\phi$ are: where $\hat{y}_\phi$ represents the predicted average value for the phase, $\sigma^2_\phi$ the variance of the current prior probability distribution for the phase, $\sigma_1^2 = 2\pi\Delta\nu\Delta T$, and $\hat{\sigma}^2_\phi$ the predicted phase variance. The filtered estimate $\hat{\phi}$ is then obtained from the raw estimate $\hat{\phi}_{NN}$ and (B1) by: where $K_\phi$ is the Kalman gain for the phase estimate, $\sigma^2_{\phi,NN}$ is the variance of the NN phase estimate, and $\sigma^2_\phi$ is the updated variance of the filtered phase estimate. Similarly, the equations for the predicted input intensity mean value $\hat{y}_A$ and variance $\hat{\sigma}^2_A$ are given by: where $\gamma$ is the amplitude noise bandwidth, $n_0$ is the average input intensity equal to $A(0)$, and $B_N$ is the LO intensity for the previous $N$ experiments. These equations result from propagating the mean and variance of Eq. (1) for $N$ time steps of duration $\Delta T$. The filtered intensity estimate $\hat{A}$ and variance $\sigma^2_A$ are then obtained from the raw estimate $\hat{A}_{NN}$ and (B3) by: where $K_A$ is the Kalman gain for the intensity estimate and $\sigma^2_{A,NN}$ is the variance of the NN intensity estimate. The initial ($\tau = 0$) variances for both phase and intensity are set to zero.
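As an illustration of the phase filtering step, the following is a minimal sketch of a textbook scalar Kalman predict/update cycle. It assumes random-walk phase noise whose variance grows by $N\sigma_1^2$ between estimates, and it may differ in detail from Eqs. (B1) and (B2). The function name, the value of $\Delta T$, and the raw estimates are hypothetical. The intensity filter, whose prediction step depends on $\gamma$ and $n_0$, is not covered.

```python
import math

def kalman_step(prev_mean, prev_var, process_var, raw_estimate, raw_var):
    """One predict/update cycle of a scalar Kalman filter for a random walk."""
    # Predict: the mean carries over; the variance grows by the process noise.
    pred_mean = prev_mean
    pred_var = prev_var + process_var
    # Update: blend the prediction with the raw estimate via the Kalman gain.
    gain = pred_var / (pred_var + raw_var)
    mean = pred_mean + gain * (raw_estimate - pred_mean)
    var = (1.0 - gain) * pred_var
    return mean, var, gain

delta_nu = 5e3   # phase-noise bandwidth in Hz (one of the values studied)
delta_t = 1e-6   # duration of one measurement in s (hypothetical value)
N = 10           # measurements per estimate
sigma1_sq = 2 * math.pi * delta_nu * delta_t
process_var = N * sigma1_sq  # assumed linear accumulation over N steps
raw_var = 3.61e-3            # NN phase variance quoted for n_0 = 5, N = 10

mean, var = 0.0, 0.0         # initial variance set to zero, as in the text
for raw in [0.05, 0.02, -0.01]:  # made-up raw NN phase estimates
    mean, var, gain = kalman_step(mean, var, process_var, raw, raw_var)
    print(f"gain={gain:.3f} filtered={mean:.4f} variance={var:.2e}")
```

The gain approaches one when the accumulated noise variance dominates the NN estimate variance, so the filter then trusts the raw estimate; it approaches zero in the opposite limit.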
In order to empirically obtain the variances $\sigma^2_{\phi,NN}$ and $\sigma^2_{A,NN}$, we use Monte Carlo simulations of the experiment and the NN estimator without the Kalman filter. For all average intensities, we fix the input intensity to $A(t) = \{2, 5, 10\}$ and the phase noise to zero, and calculate the variance of $10^6$ estimates. The variance is calculated for a range of the number of experiments per estimation $N$. We fit the variance as a function of $N$ to a power-law function so that the filter may be used for different values of $N$. For $N = 10$, as used in the simulations, $\sigma^2_{A,NN} = \{3.37, 5.63, 5.35\} \times 10^{-1}$ and $\sigma^2_{\phi,NN} = \{4.77, 3.61, 2.66\} \times 10^{-3}$ for $n_0 = \{2, 5, 10\}$, respectively. As discussed in Section IV B, the NN estimator is not necessarily unbiased across the range of parameters it is trained on, due to the small sample size of only $N \times L$ as well as imperfections in the NN training process.
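A power-law fit of the estimator variance versus $N$ can be sketched as a linear least-squares fit in log-log space. The example below is illustrative only: the functional form $a N^{b}$, the function names, and the synthetic noise-free variances are assumptions, not data or code from this work.

```python
import math

def fit_power_law(ns, variances):
    """Least-squares fit of log(var) = log(a) + b * log(N); returns (a, b)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(v) for v in variances]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx
    a = math.exp(y_mean - b * x_mean)
    return a, b

# Synthetic variances following 0.04 / N, for illustration only.
ns = [2, 5, 10, 20, 50, 100, 200]
variances = [0.04 * n ** -1.0 for n in ns]
a, b = fit_power_law(ns, variances)

def variance_at(n):
    """Interpolated estimator variance for an arbitrary N."""
    return a * n ** b
```

With such a fit, the variance entering the Kalman gain can be evaluated for any $N$ without rerunning the Monte Carlo simulations.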
Employing a Kalman filter allows for construction of the full NN-based parameter tracking method. Algorithm 1 shows the pseudo-code for running the NN-based method including the filtering steps. For every time step $\tau$, a single state discrimination measurement is completed, which yields detections $\{d_j\}_L$ and relative phases $\{\Delta_j\}_L$ that populate the detection matrix $D$, as described in Section II B. Every $N$ time steps, i.e., every $N$ measurements, the NN is evaluated to provide raw estimates $\hat{A}_{NN}$, $\hat{\phi}_{NN}$ of the intensity and phase offset within the previous $N$ measurements. These raw estimates are then passed through the Kalman filter, which returns the current, filtered estimates for the intensity $\hat{A}(\tau)$ and phase offset $\hat{\phi}$. These current estimates are then used to update the LO parameters $B(\tau)$, $\delta(\tau)$ for the next $N$ state discrimination measurements.
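The control flow described above can be sketched as a short loop. This is a skeleton under stated assumptions, not a reproduction of Algorithm 1. The function `run_tracking` and the three callables are hypothetical placeholders, and setting the LO parameters directly equal to the filtered estimates is an assumed update rule.

```python
def run_tracking(n_steps, N, measure, nn_estimate, kalman_filter, lo_init):
    """Measure every step; estimate, filter, and update the LO every N steps."""
    lo = lo_init   # (B, delta): LO intensity and phase
    records = []   # detection records collected since the last estimate
    history = []   # (tau, filtered estimates) at each update
    for tau in range(1, n_steps + 1):
        records.append(measure(tau, lo))   # one state discrimination measurement
        if tau % N == 0:
            raw = nn_estimate(records)     # raw (A_NN, phi_NN)
            filtered = kalman_filter(raw)  # filtered (A_hat, phi_hat)
            lo = filtered                  # assumed LO update rule
            history.append((tau, filtered))
            records = []                   # start a new block of N measurements
    return history

# Stand-in callables so the skeleton runs; all three are hypothetical.
measure = lambda tau, lo: (tau, lo)
nn_estimate = lambda records: (5.0, 0.01 * len(records))
kalman_filter = lambda raw: raw
history = run_tracking(30, 10, measure, nn_estimate, kalman_filter, (5.0, 0.0))
print([tau for tau, _ in history])
```

In a real implementation, `measure` would return the detections and relative phases of one adaptive measurement, and `nn_estimate` would evaluate the trained network on the resulting detection matrix.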

Appendix C: Different magnitudes of amplitude noise
We further investigate the performance of the NN estimator when varying the long-time strength of the amplitude noise $\Sigma^2_\infty$. Figure 6 shows the error probability of the receiver for $n_0 = 5$ across a range of $\Sigma^2_\infty$ for fixed $\gamma = 25$ kHz, with and without phase noise of bandwidth $\Delta\nu = 5$ kHz. We find that the NN-based tracking method performs similarly to the one based on the Bayesian estimator, and enables an error probability below that of the ideal heterodyne measurement. We have observed in other studies that the behavior for average input intensities $n_0 = 2$ and $n_0 = 10$ is similar to that for $n_0 = 5$ in Fig. 6.

Appendix D: Estimation Time-Bandwidth Trade-off
The overall performance of the phase tracking also depends on the number of experiments $N$ used to obtain a single estimate. For this study, we fixed $N = 10$ for all simulations to demonstrate the versatility of the NN estimator. However, realistic implementations may use a particular channel with specific noise characteristics. In this scenario, the value of $N$ can be fine-tuned to optimize the performance of the phase tracking method by balancing the estimation bandwidth (smaller $N$) against the estimation accuracy (larger $N$). The optimization aims to find a value of $N$ such that the overall error probability is minimized when implementing the noise tracking method. The Kalman filter attempts to balance the effects of the estimation variance and the noise variance over $N$ measurements in an optimal way through the Kalman gain $K$. However, there is still a trade-off between these two variances, and a value of $N$ that achieves minimal $P_E$ for specific channel noise conditions. Figure 7 shows the overall error probability as a function of $N$ when implementing the NN estimator with filtering for three different sets of channel noise parameters for $n_0 = 5.0$. The blue line corresponds to $\Delta\nu = 0.5$ kHz, $\gamma = 2.5$ kHz, and $\Sigma^2_\infty = 0.1$. The orange line corresponds to $\Delta\nu = 5$ kHz, $\gamma = 25$ kHz, and $\Sigma^2_\infty = 0.5$. The yellow line corresponds to $\Delta\nu = 50$ kHz, $\gamma = 250$ kHz, and $\Sigma^2_\infty = 1.0$. For small noise bandwidth and strength (blue), the optimal value of $N$ (black circles) is approximately $N = 40$, but it decreases to $N = 10$ and $N = 3$ as the noise bandwidth and strength increase (orange and yellow). For clarity, the inset shows the error probability normalized by the minimum for each noise strength. These optimal values of $N$ for specific channel noise parameters represent the optimal balance between estimation uncertainty and accumulated uncertainty from the channel noise. Thus, for a channel with known noise characteristics, an optimal value of $N$ can be found.
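The qualitative trade-off can be illustrated with a toy model that is not the error-probability optimization of Fig. 7. It assumes that the estimator variance follows a power law $a N^{b}$ and that the channel noise adds a variance $N\sigma_1^2$ over the $N$ measurements, and it picks the $N$ that minimizes their sum. The model, the function names, and all numerical values are made up for illustration.

```python
def toy_total_variance(n, a, b, sigma1_sq):
    """Toy model: estimator variance a * n**b plus accumulated noise n * sigma1_sq."""
    return a * n ** b + n * sigma1_sq

def best_n(a, b, sigma1_sq, n_max=200):
    """Integer N in [2, n_max] minimizing the toy total variance."""
    return min(range(2, n_max + 1),
               key=lambda n: toy_total_variance(n, a, b, sigma1_sq))

a, b = 0.04, -1.0                     # made-up power-law parameters
for sigma1_sq in (1e-5, 1e-4, 1e-3):  # made-up per-step noise variances
    print(sigma1_sq, best_n(a, b, sigma1_sq))
```

Even in this simplified picture, the minimizing $N$ shrinks as the per-step noise variance grows, which mirrors the trend of the optimal $N$ with increasing noise bandwidth and strength.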
In the studies presented in the main manuscript, we fixed $N = 10$, since we found that this value allows the receiver to be versatile and operate well across a wide range of noise bandwidths. As the inset of Fig. 7 shows, $N = 10$ is optimal for moderate noise levels. For small and large noise levels, the error at $N = 10$ is only slightly higher than the respective minima. Thus, for a specific well-known channel an optimal $N$ can be implemented, but for a robust and versatile implementation a different value of $N$ may be beneficial.