Reservoir Computing Approach to Quantum State Measurement

Efficient quantum state measurement is important for maximizing the extracted information from a quantum system. For multi-qubit quantum processors in particular, the development of a scalable architecture for rapid and high-fidelity readout remains a critical unresolved problem. Here we propose reservoir computing as a resource-efficient solution to quantum measurement of superconducting multi-qubit systems. We consider a small network of Josephson parametric oscillators, which can be implemented with minimal device overhead and in the same platform as the measured quantum system. We theoretically analyze the operation of this Kerr network as a reservoir computer to classify stochastic time-dependent signals subject to quantum statistical features. We apply this reservoir computer to the task of multinomial classification of measurement trajectories from joint multi-qubit readout. For a two-qubit dispersive measurement under realistic conditions we demonstrate a classification fidelity reliably exceeding that of an optimal linear filter using only two to five reservoir nodes, while simultaneously requiring far less calibration data \textendash{} as little as a single measurement per state. We understand this remarkable performance through an analysis of the network dynamics and develop an intuitive picture of reservoir processing generally. Finally, we demonstrate how to operate this device to perform two-qubit state tomography and continuous parity monitoring with equal effectiveness and ease of calibration. This reservoir processor avoids computationally intensive training common to other deep learning frameworks and can be implemented as an integrated cryogenic superconducting device for low-latency processing of quantum signals on the computational edge.

Rapid and high-fidelity single-shot readout is an important primitive for manipulation and processing of quantum information. In superconducting circuit quantum processors [1,2], this requires a careful calibration of the entire measurement chain, including cryogenic and room-temperature amplifiers, circulators, attenuators and room-temperature electronics. This calibration becomes particularly resource-intensive for readout systems attached to multi-qubit quantum processors. The optimization of quantum state readout has therefore been the focus of considerable ongoing research [3][4][5][6][7][8][9][10][11][12], involving a delicate balance of competing requirements: fidelity and speed.
For single-qubit readout, optimal filtering approaches [3] and hardware architectures have been developed and implemented to achieve fast and high-fidelity measurements without affecting qubit coherence [13]. More recently, recognizing that the quantum measurement problem in its very essence is the classification of time-dependent voltage signals acquired at the end of a measurement chain, machine learning solutions have been investigated [14][15][16], and have shown an increase in single-qubit state discrimination by a few percent with respect to these conventional approaches [14]. For measurement in multi-qubit systems however, the optimization and calibration of a readout system presents a difficult hardware design as well as a computationally intensive signal processing problem [5,7,8]; measurement crosstalk in particular imposes significant limitations on device scaling [8]. Here we propose the integration of reservoir computing as a novel hardware-efficient approach to high-fidelity multi-qubit state readout and its trainingbased calibration.
Reservoir computing is a machine learning framework for the processing of time-dependant data [17][18][19][20]. It is founded on the idea that any sufficiently complex and high-dimensional dynamical system, where only the linear output layer is optimized, can have the same computational capacity as a recurrent neural network and approximate arbitrary functions [21,22]. Vastly different physical systems have been employed as Reservoir Computers (RCs) for applications such as forecasting and classification [23][24][25][26][27][28][29]. The field of reservoir computing has recently expanded to include quantum systems [30][31][32][33][34][35]. However, the application of reservoir processing to the problem of quantum measurement has not yet been explored.
Our goal in this paper is three-fold: (1) Describe a reservoir computing approach to quantum measurement that utilizes a physical system with recurrent connections, (2) analyze its efficacy for fast and high-fidelity readout and monitoring of multiple qubits simultaneously, (3) propose a superconducting pre-processor based on a network of Josephson Parametric Oscillators to enable hardware efficient and low latency multi-qubit measurement. While we discuss this approach for a multiqubit superconducting platform, and a corresponding Josephson junction-based superconducting reservoir, we anticipate that the techniques are general enough to also be applicable for a broader class of quantum systems, measurement tasks, and reservoirs.
Conventional RC wisdom suggests that very highdimensional dynamical systems are necessary for strong computational performance, with 10 2 -10 3 nodes typically being used [23,36] in software or time-delay architectures (where there is less overhead associated with increasing the size of the network). Here, we show that a small physical RC (2-5 nodes) is able to classify two-qubit arXiv:2011.09652v3 [quant-ph] 3 May 2021 measurement trajectories with a fidelity that is higher than achievable under the same conditions with conventional optimal filtering approaches. Equally strong performance is seen across a variety of quantum systems and measurement tasks, without requiring any modification of the RC. This non-von-Neumann-architecture computer can be implemented in the same hardware platform as the target quantum system with minimal overhead, providing a uniquely low latency approach to quantum measurement. An important conclusion of our study is that the Kerr network RC we consider learns significantly faster than a readout system that is calibrated using an optimal matched filter. Our results indicate that a cryogenic readout device should provide a rapid, robust and autonomous pre-processor for quantum state measurement. Such analog processors are capable of operating on timescales orders of magnitude faster than digital processors in head-to-head comparison on the same computational task [37], and enable signal processing on the 'computational edge,' [20,28] significantly reducing computational costs.
In addition, we demonstrate that the RC provides a model-independent approach to quantum state measurement, ideal for multi-qubit systems whose readout chains are projected to become increasingly more complex. The readout problem we consider here is one of retrodicting certain features of the initial state of a measured Quantum System (QS), based on information obtained after a specific quantum process, such as the scattering of a probe pulse off of the QS. Within superconducting circuit implementations, the most widely-employed readout setup is that of quantum non-demolition (QND) dispersive measurement [3][4][5][7][8][9]; however, actual hardware implementations exhibit non-QND effects and experimental imperfections such as drift and cross-talk. Such non-idealities are difficult to optimize in hardware and require several calibration experiments to characterize, making precise knowledge of the implemented physical model difficult to acquire. This lack of an accurate physical model generally rules out a description of the measurement chain via a stochastic master equation (SME), integration of which would predict precisely the measurement signal obtained given any initial state of the QS.
The difficulty of extracting the implemented physical model increases the appeal of model-independent approaches, such as linear filtering of the experimental data (discussed in Sec I); however this typically requires a large amount of training data and is susceptible to errors from quantum jumps and qubit decay. An alternative model-independent machine learning approach has been taken in Ref. 15, although a practical application to multi-qubit measurement has yet to be demonstrated, and will likely be limited by computing capacity. In this paper we apply an RC to a scenario in which constraints on QNDness and cross-talk in multi-qubit readout are relaxed, thus simulating quantum measurement with unoptimized hardware where the complexity of readout is relegated to the processing of acquired signals. We find that the RC is able to perform said processing with high fidelity and minimal computational cost, enabling a powerful model-free approach to readout. Specifically, we consider a situation where two qubits are measured simultaneously through a common resonator, without dedicated readout cavities and Purcell filters; our objective is not to propose this particular measurement scheme, but rather emphasize the reduced hardware and optimization overhead, and thus increased potential scalability, enabled through our reservoir processing approach generally.
We begin in Sec. I with a description of the joint dispersive readout task we consider in this work. We then give a high-level overview of our proposed RC quantum measurement system in Sec. II A, followed by a detailed description of the Kerr network RC model in Sec. II B Typical dynamics and performance of a specific Kerr RC classifying quantum measurement records are presented in Sec. III A. We then demonstrate the ability of a Kerr RC to rapidly learn a quantum measurement task in Sec. III B, enabling fast readout calibration. In Sec. III C, we use an analysis of the Kerr RC phase space dynamics to explain the strong performance of RCs with as few as two Kerr nodes and develop an intuitive picture of RC processing. In Sec. III D we explore how behaviour varies with system hyperparameters and present basic principles for the optimization of a hardware RC. Finally, in Sec. IV we demonstrate how one can operate this reservoir processor to perform two additional important quantum information tasks: two-qubit state tomography and continuous parity monitoring.

I. QUANTUM STATE READOUT OF MULTIPLE QUBITS
Multi-qubit readout presents a sufficiently difficult problem to quantitatively assess the advantage provided by more sophisticated signal processing techniques. We therefore discuss this problem in some detail below, leaving some of the mathematical details for Appendix II. Extensions of single-qubit quantum readout approaches to larger multi-qubit systems through various multiplexing techniques have been extensively investigated [5][6][7][8]. A majority of these schemes rely on the premise of quantum non-demolition (QND) measurement through the dispersive readout technique. In the single-qubit variant, the binary state of the qubit (|0 , |1 ) is encoded in the amplitude and phase of a microwave pulse scattered from a readout resonator that is dispersively coupled to the qubit [4].
A significant problem that scales very unfavourably with system size in multi-qubit devices is cross-talk, which we take here to be any effect of the measurement process on parts of the system one is not trying to measure. Reducing readout errors generally requires precise calibration of readout pulses, carefully designed Purcell filters and a chip layout that minimizes cross-talk [7,8]. Such optimized calibration is difficult for a practical multi-qubit quantum processor due to drifts in system parameters, and reducing cross-talk imposes severe limitations on readout spectral bandwidth. Here we consider the joint dispersive readout scenario [5,6] where all qubits are coupled to the same mode of a common readout resonator, with the goal of measuring the combined state of all qubits in a single shot.
Our starting point is the multi-qubit Jaynes-Cummings (JC) Hamiltonian: Here,d andσ j are cavity field and qubit Pauli operators respectively, and (t) describes the amplitude of a coherent drive applied at carrier frequency ω d . We are in a frame rotating with ω d : ∆ c = ω c − ω d and ∆ q,j = ω q,j − ω d are the cavity and qubit detunings respectively. The cavity qubit coupling, with strength g j , is treated in the rotating wave approximation. The JC Hamiltonian is accurate for weak readout pulses (t), where the role of other qubit energy levels can be ignored [38][39][40]. If the qubit-resonator detuning δ j = ω q,j − ω c is large, the cavity population remains below a critical photon number d †d min(|δ j /2g j | 2 ) for a sufficiently weak drive. Eq. (1) can then be perturbatively transformed [41], yielding what we refer to here as the dispersive model: valid to second order in g j /δ j . Here χ j = g 2 j /δ j describes the dispersive shift of the cavity frequency due to the state of qubit j and∆ q,j = ∆ q,j + χ j is the renormalized qubit detuning. The effective coupling between qubits via their shared cavity is J jk = g j g k (δ j + δ k )/2δ j δ k , a manifestation of cross-talk.
We denote the multi-qubit state |ψ(t) = z c z (t)|z , where |z = |z j ⊗Nq represents the z-basis state of each qubit as a binary digit (σ z |0/1 = ∓|0/1 )). We consider the standard measurement process here, where the cavity is initially in the vacuum state and a coherent drive is applied at t = 0: (t) = 0 Θ(t). The X-quadrature of the cavity follows a unique trajectory for each multi-qubit state, and by measuring d +d † (t) one seeks to determine |ψ(0) . In this work, we specialize to the case of two-qubit readout (z = {00, 01, 10, 11}), which is later seen to be a non-trivial classification task. We consider both the dispersive and JC models with following parameters in units of the cavity decay rate κ: ∆ c = 0, 0 = 2, χ 1 = 1.8, χ 2 = 1.3, ∆ q,1 = 180, ∆ q,2 = 130, g j /δ j = 10 −1 , and include additional qubit decay with rate γ h = 10 −2 .
These are all physically plausible parameters for current superconducting circuit implementations of this system.
A thorough discussion of the joint dispersive measurement process is contained in Appendix II; in the remainder of this section we summarize the salient results. Figure 1(a) depicts the expected readout cavity evolution d +d † (t) for each initial qubit state |z in the measurement basis under the dispersive (Ĥ D ) and JC models (Ĥ JC ). In both cases, the cavity evolves to distinct steady-states over a timescale set by κ. The difference between these models is manifest in the qubit evolution shown in Fig. 1(b). Here we plot the expected probability that the system will be measured to be in the multiqubit state it was prepared in: |c z (t)| 2 = | z|ψ(t) | 2 , for |ψ(0) = |z . The decay of initially excited states can be seen to be significantly faster for the JC model. In the dispersive model the qubit state evolution is due to J 12 and γ h , and the corresponding timescales are taken to be slow relative to the system dynamics. As J 12 /κ, γ h /κ → 0 the measurement process becomes QND since the qubit state is conserved, where in a perfect QND measurement |c z (t)| 2 = 1. In contrast, the JC interaction ∝ g j does not commute withσ z,j , and this fast Hamiltonian evolution causes information about the initial qubit state to be lost more quickly during measurement. We consider the situation where the output cavity field X-quadrature is continuously monitored via homodyne detection; the QS (Ĥ S = {Ĥ D ,Ĥ JC }) then evolves under the stochastic master equation (SME) [4,42]: In the above, the dissipative and measurement super- The SME describes the evolution of the QSρ conditioned on the observed measurement outcome J(t). The outcome of a measurement is the continuous classical current where ξ(t) is white noise, arising from fundamental quantum uncertainty in the cavity state: ξ(t) = 0, ξ(t)ξ(t )) = δ(t − t ). In the above, the subscript c denotes expectation values taken with respect to the conditional stateρ. These quantities evolve stochastically during individual measurements: samples of J(t) and d +d † c (t) are depicted in Fig 1(c). The measurement signals J(t) are dominated by noise ξ(t), and the measurement process produces backaction on the quantum state, resulting in, for example, the sudden jump seen for the JC sample. By taking the ensemble average of many measurement records however, one recovers the unconditional system dynamics of Fig. 1(a), described in Appendix II. Our quantum measurement data is constructed by numerically integrating the SME of Eq. (3) from initial states ρ(0) = |0, z/ \0, z| using QuTip [43]. Each trajectory q has a unique noise record ξ (q) (t) and thus conditional expectation values Ô (q) c and measurement signal J (q) (t). The individual measurement currents J (q) (t) have a small signal-to-noise ratio (SNR) due largely to the additive white noise term in Eq. (4), obscuring the relevant conditional evolution, particularly when the cavity photon number is kept low to keep the measurement in the QND regime. In particular, for the chosen parameters the steady-state measurement current SNR are −6.8 dB and −7.1 dB for the dispersive and JC systems respectively. Further signal processing is thus needed to extract the underlying initial qubit state; conventionally, this is done by constructing a matched filter (MF) from a large set of measurement currents for which the initial qubit state is known: U (t) = {J (q) (t)} [3,8]. Specifically, a MF is a linear filter with kernel h(τ ) = U * (t − τ ) , the conjugated and time-reversed mean of the training set. For an input consisting of a signal plus white noise, the output approaches the auto-correlation function of the signal: This maximizes the SNR; a MF is the optimal linear filter for distinguishing signals, such as we study here, with additive noise. For the two-qubit readout system, the matched filter is constructed by averaging the absolute value of all four sets of mean outputs, which maximizes the overall fidelity with which initial states can be distinguished. The more sample trajectories used to 'train' the MF, the better h(τ ) = U * (t − τ ) is expected to represent the underlying signal, improving the filter performance.
The filter is used to define an expected bin for each filtered signal as a function of time, and quantum states are classified according to which bin they fall into. The MF classification process is depicted in Fig. 1(d); the filter is seen to remove the white noise from the homodyne signals, and the sample from the dispersive system falls into the correct bin reasonably quickly. The readout process has a more significant influence on the qubit state in the JC system, which in this case causes the first qubit to decay at t ∼ 6/κ. This results in the cavity state suddenly jumping as well, and the filtered output falls into the wrong bin at later times as a result. This loss of initial state information, at rates depicted in Fig. 1(b), places a limit on maximum fidelity with which signals can be classified, since the task is to learn |ψ(0) , not |ψ(t) .

A. Proposal and Overview
Having discussed the conventional approach to quantum state measurement, we will now describe our proposed RC approach, depicted schematically in Fig. 2. Generally, the RCs task is to classify the state |ψ(0) of the target QS, based on a quantum measurement that results in a noisy readout signal containing information about this initial state. We specialize to the QS described in the previous section and thus homodyne measurement current of Eq. (4) in this article, but note that our approach can be applied to entirely different systems and measurement modalities.
Instead of conventional processing via roomtemperature electronics and a software backend, the readout signal is fed into a hardware reservoir processor via a fixed linear input matrix W I . In our specific realization, this RC consists of a network of Kerr nonlinear oscillators. These nodes then evolve according to this input and internal structure defined by a connectivity matrix W R and nonlinearity vector Λ. This results in a complex dynamical mapping: the physical state of the RC x(t) is a high-dimensional nonlinear function of the input history u(τ < t). The output of the computation is a linear combination of the RC nodes y(t) = W o x(t), where the matrix W o can be trained such that y(t) approximates a desired function of the input F (u(τ < t)).
Only this output layer is trained; the training step is thus a simple convex optimization problem with a FIG. 2. A schematic of our proposal to use a hardware RC to process quantum measurement signals, here shown for K = 5 Kerr nodes. The QS is interrogated, and the resulting measurement record u(t) is input to the reservoir. The input layer WI randomly couples this signal to each node of the Kerr network, whose subsequent dynamics are a nonlinear function of u(t) and the network itself via Eq. (8). The output layer performs classification by measuring learned linear combinations of nodes to compute the probability that the input history corresponds to each underlying quantum state.
limited number of parameters, which is guaranteed to converge [18,25]. This is in contrast to the 'vanishing gradient problem' that plagues training of other neural networks [44]. Although it may seem like training only output weights W o would lead to inferior results for time-series processing, head-to-head comparisons between state-of-the-art recurrent neural networks and RCs show surprisingly similar performance [45][46][47], despite RC training protocols being 10 3 to 10 6 times quicker.
Since the internal structure is not optimized, RCs can be quickly retrained for different tasks, resulting in them being a powerful and generalizeable computational tool [20,23,29,37]. For our task, the RC is trained by preparing the QS in known initial states and then performing readout. This allows one to optimize linear combinations of Kerr RC node quadratures, which are measured such that the output of the RC correctly assigns the quantum state. After training, the classifier continuously outputs the probability P (t) that the measurement record up to the the present time corresponds to each underlying initial quantum state. The RC thus replaces the filtering and binning step of the quantum measurement process; its efficacy in doing so will be discussed in Sec. III.
The Kerr network reservoir we propose may be implemented in a superconducting circuit platform via a network of coupled Josephson Parametric Oscillators (JPOs). These JPOs can be either single Josephson junctions or composite elements such as Superconducting Nonlinear Asymmetric Inductive eLements (SNAILs) [48]. The recurrent connections can be flexibly gener-ated by coupling the JPOs to a common electromagnetic resonator mode. Variants of such networks have been considered as hardware for superconducting quantum annealers [49][50][51] and for stabilization of multi-qubit entanglement [52,53]. Such an RC then could be integrated with the QS to be measured, sharing a cryogenic environment. For the present work we require that the JPOs are in the weakly nonlinear regime, that their scale of nonlinearity is much smaller than their dissipation. This is the regime of Josephson parametric amplifiers [54]. Another particularly interesting platform to realize this hardware RC is an optical Kerr network, which would then be wellsuited to the readout of optical QSs. An important advantage of either of these implementations of Fig. 2 is that the RCs will operate at the timescales of the measured quantum system, faster than conventional FPGAbased electronics and potentially allowing for real-time analog processing.

B. Kerr Network Reservoir Computer Model
The reservoir, consisting of a network of coupled Kerrnonlinear oscillators, is described by the master equation: where the governing HamiltonianĤ RC takes the form: The input to the RC is a collection of signals u(t) with a common carrier frequency ω d . Each nonlinear oscillator is described by a field operatorb k , with detuning ∆ k from the carrier frequency and Kerr nonlinearity λ k . g kl and ε km are the linear couplings between oscillators and to the input respectively, and γ k is the energy decay rate. Inter-oscillator couplings can be generated by cavity-mediated interactions [52,53] or through parametric means [51]. The evolution of the field amplitude from each node is given by the Heisenberg equation of motion: We will consider Kerr networks where the field amplitudes are sufficiently large that they are in the classical regime. For c ∈ R + , defining scaled drive strengths ε km = √ cε km , nonlinearityλ k = λ k /c, and introducing Heuristically, this indicates that for c → ∞, where the nonlinearity grows weaker and the 'classical' occupation |β k | 2 = c| b k | 2 becomes simultaneously larger, quantum correlations captured in higher-order moments can be neglected. In this case, Eq. (7) becomes: Without loss of generality, we have chosen the nodal decay rates to be identical γ k = γ to define the dimensionless parameters, familiar to the RC framework: Eqs. (8) defines the set of ODEs governing the RC response to a given input signal u(t) for a K-node RC.
As mentioned in Sec. II A, the philosophical underpinning of reservoir computing is that if one has a sufficiently complex and high-dimensional system, there is no need to optimize its many internal parameters for a specific computational task. As such, here we consider random Kerr networks, whose internal structure and dynamics, specified by W I , W R , Λ and γ via Eq. (8), are set randomly and not individually optimized. Instead, the RC properties are controlled by the scale-independent hyperparameters {γ, α,Λ, µ}, where: and λ max (W R ) refers to the maximum singular value of the connectivity matrix W R . Here [a, b] defines a uniform distribution with probability density within the limits (a, b), so that the various Kerr RC internal parameters are obtained by randomly sampling appropriate uniform distributions. We constrain the ranges of these hyperparameters to be compatible with the proposed hardware realizations, while also importantly enabling desired RC evolution properties of fading memory, separability and nonlinearity; their selection is discussed in more detail in Sec. III D. We will see throughout this work that the generic behaviour of an RC is well-quantified by its hyperparameters, and that performance is robust to both network structure and variations in these values. We have also introduced significant variation into node decay rates γ k and observed that the RC performance is again unchanged. The output of the Kerr RC is a linear combination of the measured quadrature variables x φ k k (t) of its nodes. These quadrature variables are defined in terms of the 2K-element vector of complex RC node amplitudes: such that x φ = C(φ)x defines the RC node quadratures measured. The orthogonal quadrature then constitutes the 'hidden variables' which do not contribute directly to the output. The RC output y(t) can then be expressed where W o is a C × K dimensional matrix of output weights, which together with the K element vector φ = (φ 1 . . . φ K ) of measurement angles defines a total of (C + 1) · K parameters to optimize in the RC training. Finally, specific to the task of classifying C states of the QS, it is appealing to map the RC output to a probability that the input to the RC corresponds to the QS initialized in |ψ(0) = |z . This is achieved by applying the 'softmax' function (Boltzmann distribution) to the trained RC output: which maps the RC outputs to mutually exclusive probabilities that sum to unity. Details of and background on RC training are discussed in Appendix III; we briefly summarize some of the salient aspects here. We first construct a training data set consisting of Q measurement currents U (t) = {J (q) (t)} for a measurement period τ m for associated known initial states y (q) = z (q) (q = 1, · · · , Q). M measurements are conducted for each initial state by integrating the SME of Eqns. (3) and (4). The RC node trajectories are simulated by solving Eq. (8), and the resultant states are used to minimize the multinomial cross entropy cost function L x (Appendix III), optimizing W o and φ for the training set. We consider small RCs (K = 2 − 5) and small training sets (Q < 100). Therefore, the optimization of the cross entropy loss function on a digital computer is a quick convex optimization problem with a small set of output parameters. We stress that this network size is orders of magnitude smaller than typical RCs, a choice made for hardware-realizability. After training, the continuous RC output from Eqs. (12) and (13) provides the probability P z (t) that the observed record J (q) (t) up to the current time corresponds to the initial state z (q) .

III. RESERVOIR PROCESSING OF QUANTUM MEASUREMENT
We now analyze the performance of the proposed Kerr network RC on the two-qubit readout task described above. We focus on the ability of an RC trained with a small labelled training set to classify a much larger set of unknown test signals (quantum states) from either the dispersive or JC system (Ĥ S = {Ĥ D ,Ĥ JC } in Eq. (3) respectively). An obvious metric to evaluate the performance is the 'classification accuracy' C F (t), which refers to the fraction of test signals the RC correctly classifies at a given readout time τ m . However, classification accuracy does not increase monotonically with readout time; as indicated in Fig. 1, the initial qubit state stochastically evolves and can be lost during the measurement process, an outcome that is particularly likely for the JC system. After this point J(t) no longer faithfully provides information about |ψ(0) , imposing a steadily decreasing ceiling on C F (t) (as can be seen in Fig. 4).
In practice, one would simply stop the measurement process and RC computation at the time C F (t) peaks; this time is consistent for a given QS being measured. Thus, we define the 'Classification Fidelity' F of an RC or filter to be this peak accuracy and will seek to optimize this metric when considering RC design in Sec. III D. Since the maximum possible C F (t) decays with time, F combines both speed and accuracy of classification. We compare the RC performance against that of conventional filtering. In particular, we consider a boxcar filter (BF), which amounts to integrating J(t) to remove noise, a matched filter (MF) constructed using a training set of size Q of labelled homodyne currents, and a MF constructed using the analytic solution d +d † z tô H D . For the dispersive system, this corresponds to the Q → ∞ limit of the MF constructed from homodyne currents. In many realistic scenarios, the system model and parameters are not known exactly or are subject to experimental drift, and attempting to construct an analytic MF is impractical. Instead, either the BF or a finite Q MF is used because they (like the RC) do not require a model of the underlying system; for the MF this requires regularly producing large training sets every time the device is re-calibrated.

A. Reservoir Dynamics and Classification Fidelity
The response of a representative K = 5 node RC to a quantum readout signal from the dispersive system is depicted in Fig. 3. Here (and in Fig. 4 to follow) the RC has hyperparameters γ = 0.7κ, α = 1.9,Λ = 5 × 10 −2 , and µ = 5. The RC is driven with measurement signals u(t) = J (q) (t) with a total measurement time τ m = 10/κ and has previously been trained with Q = 40 readout signals (recall that this is M = 10 samples for the four states) to optimize {W o , φ}. In Fig. 3(a) we show one such noisy signal for an unknown (to the RC) quantum state |ψ(0) = |11 . Once the readout drive is turned on, the noisy input u(t) acquires a non-zero mean. This drives the network away from its rest state x = 0 to a non-trivial state in its phase space, as can be seen via the x φ trajectories in 3(b). Real time classification is performed by reading out each node continuously: the output P (q) z (t) are shown in 3(c). After only t = 3/κ, a timescale shorter than that over which the cavity reaches steady state (see Fig. 1(a)), P (q) 11 > 0.5, and the RC thus has quickly and correctly classified this sample quantum state. The low measurement SNR while the cavity is being populated is responsible for the RC initially not  z (t)| 2 = tr{ρ(t)|z/ \z|} is also depicted, and we see that the RC classification P (q) 11 (t) is robust to the significant probability amplitude fluctuations: the system almost jumps to the state |01 for this specific trajectory, but P (q) 11 remains saturated. The ability of this RC to classify unknown quantum states is quantified in Fig. 4, where we evaluate its performance on a test set of 1200 unknown quantum signals generated from the dispersive and JC SMEs, and compare with that of various linear filters. The RC is trained as above using a Q = 40 measurement set for the corresponding QS, while the MF is constructed from a much larger Q = 1200 set. We plot the classification accuracy (fraction of test signals classified correctly) as a function of readout time, for both the RC and the filters; it is apparent that the RC is able to rapidly and reliably extract the initial quantum state and thus perform the readout for both QSs. For the dispersive system both the RC and the MFs have a classification fidelity F > 0.96, achieved for a readout time τ m ∼ 6.7/κ. This fidelity is limited by non-QND dynamics that result in the initial state information being lost. As previously noted, measuring beyond a model-dependent optimal time thus increases the odds of an incorrect classification, causing the decrease in accuracy seen at longer times for all methods in Fig. 4. Conversely, at shorter readout times the SNR is much lower: the measurement cavity is still being populated, so the output signal is dominated by shot noise. A dip in the MF performance is seen around 2.5/κ; the expected filtered signals for |1(0)0 and |1(0)1 cross for this specific measurement time, meaning the MF has no information about the second qubit. This occurs in the BF as well at a slightly later time. The RC avoids such a problem by mapping the qubit state into its higher-dimensional phase space. As described in Sec. I, initial state information is lost more quickly for the JC system due to non-QND Hamiltonian evolution. Despite this, the RC is able to accurately process these quantum measurements, with a classification fidelity of 0.92, exceeding that of any linear filter. The optimal measurement time is slightly later, at ∼ 7/κ. This is due to the decreased SNR for the JC readout task: √ κ d +d † is reduced relative to the dispersive case and there is increased measurement backaction noise (see Fig. 1(a) and (c)). As noted earlier, these measurement currents have SNR of −6.8 dB and −7.1 dB for the dispersive and JC systems respectively when the cavity is in steady-state. We can decrease the SNR further by introducing a measurement efficiency η ≤ 1 ( √ κ → √ κη in Eqns. (3) and (4)). For both systems the fidelity of the RC classification decreases steadily with increasing relative noise strength, but remains comparable with that of the MF. This performance is in no way unique to the specific random realization of the RC network; the average classification accuracy of 10 random K = 5 RCs is also shown for both QSs in Fig. 4. These RCs have different structure (W I , W R , Λ), but the same hyperparameters and are trained on the same Q = 40 training set. The peak classification accuracy occurs at different times for different RCs, resulting in the average curve being artificially flattened. The average classification fidelity of the 10 different RCs is 0.951 for the dispersive system and 0.905 for the JC system (for Q = 100, this increases to 0.915), indicating the robustness of the RC approach. A few of the random networks experience a sharper drop in classification accuracy after the optimal readout time: this is because the RC states corresponding to different input signals are less separated in phase space. In the next section we explore the role of the RC evolution in its phase space in more detail. The results of Fig. 4 suggest that an RC is capable of matching the performance of an MF constructed with a much larger training set. We find that RCs hold this training-cost advantage over a MF generally, and demonstrate its dramatic nature in Fig. 5. Here, we plot classification fidelity as a function of the number of training signals Q used to train 10 random K = 5 RCs from Sec. III A. Each specific network is indicated in a different color, with the network emphasized in Figs. 3 and 4 highlighted with a plus. The average classification fidelity across the 10 networks is also shown. The RCs perform remarkably well at low Q, with Q = 4 readout signals (just one for each qubit state) being sufficient to achieve an average fidelity of 0.90. This average fidelity increases and the variance in performance across networks decreases as the training set grows, up to ∼ 40 for the dispersive system (Q ∼ 80 for the JC). Beyond this point, the small fluctuations in fidelity are due to the finite test set (1200 signals) used to calculate F. The RC thus achieves optimal performance with a small Q = 40 − 80 training set for both the dispersive and JC quantum systems, demonstrating its efficacy for rapid measurement calibration. In contrast, the MF performs very poorly in this regime, needing Q of O(10 3 ) (note the horizontal axis scale) to converge to the average RC network classification fidelity. This significant advantage is particularly relevant for readout of online quantum processors for which calibrations need to be done regularly: an RC-based measurement scheme would thus require far fewer initialization and measurement runs to train than are needed to construct an MF of comparable performance. This can enable a resource-efficient readout calibration system that can also be robustly automatized. The RC network can easily and quickly be re-trained as conditions or even the target QS change, facilitated by the computationally inexpensive software component of training, particularly for small training sets.
We note that while high-level metrics such as classification fidelity, used widely in the RC literature, are ideal for quantifying the performance of the RC, they do not elucidate the fundamental source of its classification power and fast learning ability. To address this, we carry out a detailed analysis of the phase space dynamics of the Kerr RC nodes in the following section, directly connecting these results to the observed performance.

C. Phase Space Dynamics
Generally, the effectiveness of the RC approach is attributed to the expressive power of its high-dimensional state space [36,47]. In the present discussion, the RC under scrutiny transforms a scalar input signal into the 2K-dimensional state space of the RC {β k , β * k }. It is also well-understood that the nonlinearity of the RC plays a crucial role [21,23,29], but the underlying mechanism behind how these and other RC properties impact the measurement task are not immediately clear. To perform a fundamental analysis of Kerr RC dynamics and gain unique insight into its previously-described performance, we introduce the Measured Section (MS) of the reservoir. Generally, it is difficult to visualize the dynamics in the high-dimensional RC state space. Recalling however that we only measure a certain quadrature of the RC oscillators x φ (Eq. (12)), these non-hidden RC nodes evolve in the MS, a K-dimensional subspace of the full 2Kdimensional phase space. As the K angles φ k have been optimized (Sec. II), this phase-space projection contains the relevant information for the computational task.
With the aid of RC dynamics in the MS, we will show that two non-hidden degrees of freedom are sufficient to perform a four-state classification task, provided the RC is sufficiently nonlinear. For this low-dimensional (K = 2) reservoir, the classification process can be conveniently visualized in the MS, as presented in Fig. 6. We consider the response of two RCs, both trained with  6. (a) Classification dynamics in the measured subspace for a K = 2 RC with parameters γ = 0.2, α = 1.9,Λ = 5 × 10 −2 , and µ = 5. In (b) we show the same network but with Λ = 0. Regions in RC phase space are colored according to their learned classification output, e.g. the region in (x φ 1 1 , x φ 2 2 ) space for which P11 is largest is filled in blue. The separating lines constructed from the trained Wo are also shown. Dots indicate the final RC state generated by each signal in the test set, which are colored according to their ground truth |ψ(0) = |z . Therefore, when the colors of the dots and the underlying region are the same, the classification result is correct. Q = 40 measurements, plotting the final reservoir state (at τ m = 6.7/κ) in the MS for each signal in a test set of 1200 signals, color-coded with its true |ψ(0) . The RC on the left is nonlinear withΛ = 0.05 and exhibits a F = 0.96 on the shown test set, while the RC on the right is linear (Λ = 0) and only attains F = 0.75.
The training task of learning {W o , φ} by minimizing a cost function (detailed in Appendix III) is equivalent to finding hyperplanes in the full RC phase space which separate the regions in Fig. 6 in a manner that maximally distinguishes the RC states corresponding to different inputs. After optimizing the MS by choosing φ, we see that equating rows j and k of the RC output defines the hyperplane in MS separating classes j and k: W o x φ j = W o x φ k . In the K = 2 case, these hyperplanes are simply lines, which separate the regions of the MS into four classes, shown in Fig. 6 as color-coded sections. Note that due to the symmetry of the dispersive readout classification problem, whereby d +d †

11/10
− d +d † 00/01 , two pairs of the four separating lines (thin lines in Fig. 6) fall almost on top of each other.
We note that the dynamics of the Kerr RC demonstrate the four properties required of a reservoir, such that it forms an effective RC [18,21,22,28]: separation, approximation, fading memory, and nonlinearity. Separation is the requirement that different input classes map to distinct regions of phase space, while approximation ensures that input series which are close generate RC states which are similarly close. Both these properties are manifest in Fig. 6, where the final reservoir states for each class fall into separated regions in the MS, and the final output is robust to noise in individual trajectories.
These regions are statistical steady-states: the ensemble average of the RC states for each input are in the vicinity of their fixed points, with limited diffusion resulting from individual stochastic trajectories. The fading memory property requires that the current RC state depends on the history of the input signal, with an increasing importance placed on more recent inputs. Here, the final RC state depends more strongly on the steady state readout signal, where the SNR is higher and QS states can be more easily distinguished, aiding classification. At the same time, the RC state does not depend entirely on its most recent input, making the classification result resilient to sudden changes in the input signal, for instance those due to qubit decay.
The final requirement of a RC is of some degree of nonlinearity in its dynamics or output layer for nontrivial computation. For the linear RC, note that the classes approximately lie on a line in the MS. The linearity of thē Λ = 0 RC enables its dynamics to be solved analytically: where δ k are the real parts of the eigenvalues of W R , and c jk is the product of the projections of eigenvector k onto node j and the input W I . In the steady state, the different nodes x j are now effectively scaled and rotated copies of each other. The information capacity of classifying the final RC outputs is then no different from that of classifying the input signals themselves with a linear classifier. By comparison, the role played by the Kerr nonlinearity to provide high fidelities for this task is evident from Fig. 6. The RC's nonlinearity 'shears' the high-amplitude readout signals (associated with states 01 and 10) out of the line connecting the low-amplitude signals (states 00 and 11). The measured RC quadratures are then linearly independent and not trivially related to each other or the input signal, in contrast to the linear RC. The nonlinearity of the Kerr reservoir allows it to utilize its dimensionality, forming up to K linearly independent outputs, rather than being bound by the dimension of the input signal. Nonlinearity, together with the other dynamical RC properties satisfied by the Kerr RC, thus allow the four different classes of input signals and their resultant RC state distributions to be linearly separated via the output layer. That the Kerr RC satisfies these properties is by no means guaranteed in general, but a result of the RC hyperparameters we have chosen. In particular, it is strongly dependent on the relative strength of the nonlinearityΛ and the input signal scaling factor µ, as discussed later in Sec. III D.
Finally, we discuss why the RC appears to require fewer training signals than an experimentally constructed MF to attain high-fidelity classification. In Fig. 6 we show the response to a single input trajectory for each class of input signal in black. Even though these are unlabeled, it is clear what their ground truth initial quantum state was; an effective RC is able to integrate out the noise in the readout signals, enabling efficient separation of the input signals into the corresponding color-coded phase space distributions. These trajectories are infor-mation dense; each time point functions as a new point of training data. Hyperplanes drawn to separate these trajectories are found to be differ only slightly from those shown on the plot, constructed using Q = 40 trajectories, thus indicating that training from a small set of measurement signals can yield comparable performance to training using a larger training set. This is in stark contrast to the MF approach, which uses the noisy training data to construct a time dependent kernel, as opposed to static hyperplanes. The MF needs to be able to separate signals at each point in time, and so a large number of training sets are needed to produce an estimate of the mean input at each time which is not dominated by noise. It is then reasonable to ask why these linear filters, which perform only a linear operation on the input signal, are able to perform the classification task with high fidelity, but a linear RC is not. The answer lies in the classification step: the continuous filtered signal y (q) (t) is mapped to a discrete class label z by comparing which expected signal y (q) (t) is closest to, via a 'distance' calculation that introduces the necessary nonlinearity. The RC approach is very different: the nonlinearity occurs in the dynamical signal processing, and the output classification step y = W o x φ is linear. The use of a nonlinear classification step would then supply the required nonlinearity and enable the linear RC to perform comparably to a nonlinear RC.

D. Optimization of the Kerr Reservoir
In the previous sections, we have investigated the ability of specific Kerr RC networks to perform a quantum measurement task. We also demonstrated that this performance is not particularly dependent on that network structure by completing the same task with a set of random networks that share the same hyperparameters. We now explore the role of these hyperparameters in determining RC performance, presenting the dispersive system readout fidelity of random RC networks as a function of γ,Λ, and µ in Fig. 7.
In Fig. 7(a), we plot F for 10 random RCs with K = 2, 5, 10 (blue, green, and brown respectively), but all sharing the same hyperparameters. We vary only the hyperparameter γ, which sets the rate at which the RC nodes evolve and thus the timescale over which it samples the input signal. This system response time is typically only considered for hardware RCs; in software approaches an RC conventionally evolves under an update that is equivalent to γ = 1/∆t [18]. From Fig. 7, we see that it is important for γ to be approximately matched to the timescale of the input signal's evolution [28]; for the quantum readout task, the signal √ κ d +d † c evolves at rate κ. To understand this relationship, consider first a slow RC, with γ ≤ 0.1κ: the RC then responds to the input signal averaged over a large window, β j ∝ t dτ e γ/2(τ −t) u(τ ) and it is consequently more difficult to distinguish between different signals on the timescale over which the measurement is done. This slow evolution furthermore results in very little displacement from the initial RC state β = 0 over this 10/κ measurement window. Since the nonlinearity is ∝ β 3 j , the low node amplitude also results in the RC being effectively linear.
There is a second timescale in the input signal: that of the noise ξ(t), at the sampling rate of the quantum measurement ∆t. This defines an upper limit on γ for good performance; it is advantageous for the RC to respond to the much slower underlying quantum signal rather than the rapid white noise, and so one should have γ 1/∆t. This allows the RC to average over some of the white noise, improving its performance. As seen in Fig. 7, γ ∼ 0.2 − 0.8κ is the roughly optimal range for the dispersive readout task, allowing the RC to still respond to signal dynamics while integrating out much of the readout noise.
Quite generally, with increasing K the performance of the Kerr RCs becomes more robust to variation in hyperparameter values and to randomness in the structure (W I , W R , Λ) of the RCs. For the optimal range of γ ∼ 0.2 − 0.8κ, most of the RC networks have F ∼ 0.95 (effectively the theoretical limit). Note however that out of the 30 simulated RCs for each γ, all the poorly performing networks in this range have K = 2 (blue). This can be understood as follows: since only two Kerr nodes are sufficient to perform the dispersive readout task, as K increases so does the probability that some subset of the RC phase space forms a good network. If the other hyperparameters are within some reasonably optimal window, there is a K for which the probability of finding a good sub-network is high enough that performance saturates and any Kerr RC achieves high fidelity. In fact, note that the K = 5 and K = 10 sets of networks show almost equivalent performance, indicating that as few as 5 nodes are sufficient for a random network to reliably complete the present readout task. Finally, we also find that the performance of larger reservoirs is more robust to γ and other hyperparameter values outside the optimal range.
In Fig. 7(b) we consider the role of nonlinearity and input scaling, plotting F obtained using 10 random RCs, while varying µ for various fixed values ofΛ. The nonlinear term in Eq. (8) scales as the node amplitude cubed, and as such the effective nonlinearity is related to both Λ and µ. To zeroth-order, β ∝ µ, and so the effective nonlinearity in β and x φ can be roughly quantified bȳ Λµ 2 . For a larger input strength, the field amplitude will be higher and thus a lowerΛ is needed to produce a nonlinear term with the same relative weight.
As we have seen in Sec. III C, some nonlinearity is necessary for computation, so F initially increases as either Λ or µ is increased. However, the fidelity ultimately reaches a maximum before decreasing again, as we encounter an upper limit to the nonlinearity: if it is too strong, the fixed points of the RC network for a given steady state drive become less stable. The dynamics of the RC network will generically exhibit large oscillations and not settle near their steady-states over the measurement timescale. This is exacerbated by the noise in the input signal; as these fixed points in phase space are less attractive, the strong white noise is able to generate larger excursions in phase space. Thus, as seen in Fig 7(b), F falls off sharply for largerΛ for µ above some upper limit. Indeed, we find that √Λ µ should be in the range of 0.5 − 1 for optimal performance, a trend we verified for additionalΛ which are not shown. This is precisely the regime for a single Kerr oscillator where the nonlinearity begins to significantly influence its dynamics and steady-state. RC intuition suggests that dynamics should be affected but not dominated by the nonlinearity; the observation of an optimal nonlinearity strength for a Kerr network appears consistent with this understanding. It should be noted that for this plot, we have chosen Λ j =Λ to make theΛ − µ relationship more clear, but the results are qualitatively unchanged when randomness in the nonlinearity is reintroduced as well.
The final hyperparameter α is the largest singular value of the Kerr network connectivity matrix W R and sets the strength of the node-node coupling. In reservoir computing literature, it is commonly stated that for an echo-state network (with a hyperbolic tangent nonlinearity), if the spectral radius of W R is much larger than 1, the RC steady state is not guaranteed to be stable and limit-cycle dynamics can emerge [18,25,28]. A spectral radius close to this limit (the so-called 'edge of stability') is often found to be optimal for tasks requiring significant memory (i.e. where previous states of the input are important) [21,28]. For the Kerr network RC of Eq. (8), the coupling matrix is symmetric so α is also the spectral radius of W R . Since the network has an explicit decay term, the linear network is stable for α < 2. When the nonlinearity is included, numerically we find stable steady states and thus the fading memory property are always present for α < 2 for the nonlinearity parameters we consider. In agreement with other works, we have found α ∼ 1.5 − 2 results in optimal performance, and thus have chosen to present results with α = 1.9.
Overall, there is generally a broad range of Kerr RC hyperparameters resulting in high fidelity classification, which we can summarize as γ κ, √Λ µ ∼ 0.5 − 1, and α ≤ 2. This performance is independent of specific network structure, and is robust to moderate disorder in these structural parameters.

IV. QUANTUM INFORMATION APPLICATIONS
In this final section, we demonstrate a pair of relevant quantum information applications that can be implemented in this same reservoir computing system: two-qubit state tomography and continuous qubit parity monitoring. In both cases the description of Fig. 2 applies: a Kerr network reservoir continually processes the measurement current from a joint dispersive readout system. This is not intended to be an exhaustive survey of potential applications, but to emphasize the ease of generalizing our reservoir processing approach.

A. Multi-qubit tomography
To this point, we have only evaluated the ability of the RC to classify measurement currents from quantum systems prepared in computational basis eigenstates. However, a reservoir processor trained using only these states |ψ(0) = |z can measure arbitrary joint qubit states with high fidelity. To be specific, when the target quantum system is interrogated by driving the readout cavity, backaction rapidly causes the joint qubit state to collapse to one of the measurement basis eigenstates, with probability |c z (0)| 2 . The RC will then faithfully return the current state of the target quantum system |ψ(τ m ) . If this quantum state is repeatedly prepared and measured, the distribution of RC outputs will thus agree with that of the underlying state. Furthermore, since the measured cavity quadrature is a nonlinear function of the multiqubit operatorχ = j χ jσz,j (Appendix II), one can preform full tomography on the two-qubit density matrix by simply preceding the measurement with a set of single-qubit rotations [5]. To demonstrate this capability, in Fig. 8 we compare the RC output with the true quantum state at the measurement time, for the dispersive system initialized in the indicated product or Bell states. The reservoir is that of Fig. 3-4, and in particular has the same W o : training was again done by simply preparing the qubits in each computational basis state 10 times, measuring for 10/κ, and using that initial state as the target. Even though the qubit state can jump during this readout, training is still effective, and this approach should be robust to preparation errors as well. The quantum system was then initialized in each of the test superposition states 800 times, and the time-dependent RC output P z (t) was compared against the current quantum state |c z (t)| 2 . Despite this simple training with only easily-accessible initial computational basis state labels, the RC is highly successful at producing this dynamical quantum variable, returning the current state |ψ(t) (max z {P z (t)} = max z {|c z (t)| 2 }) with 98% fidelity across all test states. Fig. 8 compares the distribution of RC outputs with the average quantum state at τ m ; it is clear the RC accurately determines the underlying quantum distribution for the three states tested. For simplicity we have chosen a set of states where all the unique density matrix elements are diagonal in the measurement basis, so this single set of measurements is sufficient to distinguish states. As described in Ref 5, full tomography on arbitrary states can be done by repeating this process after applying single qubit rotations; this reservoir processor can thus be an effective tool for general tomography in additionto computational basis measurement.

B. Joint parity monitoring
This reservoir processing approach is not limited to determining qubit states from measurement currents -generally, one can train an RC to return arbitrary dynamical observables given an appropriate measurement record. In this final example we describe the operation of the K = 5 Kerr reservoir of Fig. 3-4 to measure multi-qubit parity: σ z,1σz,2 . Recall that for the dispersive readout quantum system model described previously, the RC was able to learn the quantum state and output |c z (t)| 2 with very high fidelity: thus, one can trivially modify the output layer to instead return the expected parity via We instead consider a more interesting and relevant task for quantum information applications by modifying the dispersive readout system such that the observed quadrature does not distinguish between states in a given parity subspace, setting χ 1 = −χ 2 = κ, ε 0 = 2iκ, and leaving all other parameters unchanged. In this situation, the measured quadrature will differ only for states of different parity, and be the same for states in the even {|11 , |00 } or odd parity sub-spaces {|11 , |00 } (Eq. (3)). As a result, when the readout cavity is measured there is no backaction on qubit states in a given parity subspace: this allows one to generate and maintain superposition states, or manipulate the joint qubit state within a parity subspace, a requirement for many quantum error correction protocols. This specific parity readout system has been explored both for Bell state generation and error syndrome monitoring [6,55,56]. In particular, this is a common error syndrome in quantum error correction, where the detection of a change in parity between two qubits is an unambiguous indicator that an error has occurred [56,57]. We operate this system as previously, interrogating the measurement cavity and inputting the resultant measurement current to the RC; this task is thus also described via Fig. 2, with the output now σ z,1σz,2 instead of P j . For training, the quantum system is again prepared in each of the computational basis states Q = 40 times and a measurement is performed for 10/κ, with the target being the initial parity, ±1 for |ψ(0) = |11/00 , |10/01 respectively. For the parameters chosen, the probability of a parity jump occurring during this window is small due to the lack of measurement backaction; thus this simple training procedure allows the RC to learn to return the current multiqubit parity. This is demonstrated in Fig. 9(a), where we test the ability of the RC to continuously return the parity as the quantum system is measured over a much longer 50/κ window. For sample readout trajectories, we have plotted both the evolution of the parity (ground truth obtained from the SME) as well as the RC output. It is seen that the RC output follows the true parity closely, and in particular, quickly switches after the parity jumps due to qubit decay processes. Over 800 test measurements (each of duration 50/κ), the RC parity is correct at 93% of times. This is limited by the finite response time of the reservoir: on average the RC parity will flip 2.3/κ ∼ 1/γ after that of the quantum system.
For both error syndrome monitoring and Bell state generation, it is only necessary to detect these parity jumps, rather than exactly reproduce the evolution of the parity . In Fig. 9(b) we indicate the times of parity jumps in the quantum system and RC output with circles and stars respectively. It is clear that for the examples shown, the RC accurately detects when a parity jump occurs, and does not predict one where there is no jump. In Fig. 9(b) we test the ability of the RC to detect parity jumps: the quantum system is initialized in the 8 indicated states (of definite parity), and measured 100 times each over a duration of 50/κ. Plotted is the fraction of measurements in which the RC successfully predicted a parity jump after one occurred in the quantum system, or did not predict a parity jump if none occurred. Overall, parity change detection accuracy was 95.5%, indicating that this RC could be a powerful tool for monitoring error syndromes or tracking the evolution of general observables.

V. CONCLUSIONS
In this work, we propose a hardware reservoir computer to facilitate quantum readout. We describe how a small network of Kerr nonlinear oscillators can be implemented and integrated with superconducting circuit or conventional optical quantum systems. These RCs can reliably perform the nontrivial task of joint dispersive quantum measurement with comparable high fidelity and orders of magnitude less training overhead than conventional filtering approaches. This processor can be readily applied to important quantum information tasks such as multi-qubit tomography and the continuous monitoring of observables such as parity, with similar fidelity and simplicity of calibration. This approach offers significant efficiency improvements across several dimensions: (i) hardware resources via compatibility with existing quantum platforms, (ii) computational resources via ease of training, and (iii) latency and data overhead via edgecomputing in a physical reservoir.
Through a firstprinciples consideration of the system dynamics, we explore the features and properties of the Kerr oscillator network that enable this performance and make it an attractive platform for reservoir computing more generally. We additionally develop an intuitive phase-space picture which provides insight into how RCs process information. Here we have considered only a classical RC which is not entangled with the quantum system it measures, but extend our approach to the quantum domain reservoir computer in a separate publication. We hope that this work helps support the development and integration of hardware-based reservoir computing approaches to quantum information processing platforms, where we believe they can be of significant benefit. Kerr oscillator k field operator β k Kerr oscillator k field amplitude u input signals to Kerr network at ω d ∆ k Kerr oscillator k -input detuning: ω k − ω d λ k Kerr oscillator k nonlinearity g kl linear coupling between Kerr oscillators k and l ε km input m -Kerr oscillator k coupling strength γ Kerr oscillator decay rate, network evolution rate Reservoir computer x φ measured reservoir quadratures: (β k e −φ k +β * k e iφ k )/ √ 2 WR dimensionless connectivity matrix: 2(∆ k δ k,l + g kl )/γ WI dimensionless input layer: 2ε kl /γ Λ dimensionless node nonlinearity 2λ k /γ K number of Kerr oscillators µ input coupling strength: W I,kl ∈ [−µ, µ] Λ average node nonlinearity: Λ k ∈ [0, 2Λ] α spectral radius of connectivity matrix: λmax(WR) C(φ) matrix of node measurement angles Wo trained output layer Q number of training samples y C-dimensional target output y RC output: Wox φ Pz RC output converted to probability e yz (t) / k e y k (t) CF fraction of test signals classified correctly F classification fidelity: maxt{CF (t)}

II. JOINT DISPERSIVE MEASUREMENT
The unconditional evolution of the multi-qubit measurement system under either model is described by the master equation (ME) [5,6,41]: where κ and γ h describe cavity loss and qubit decay through coupling to the external environment.Ĥ S = {Ĥ JC ,Ĥ D }: forĤ S =Ĥ D the ME acquires an additional term to account for correlated qubit decay via the cavity: L → L + κD[ j gj δjσ −,j ], although it is O(( gj δj ) 2 ) and thus weaker than other rates considered. We could also include pure dephasing as well but consider a regime where this environmental dephasing much weaker than dephasing induced by measurement.
The utility behind this dispersive measurement system is that it allows one to perform a near-QND measurement of all the qubits simultaneously. We recall first the basic description of the measurement process from Ref. 4, based on the pointer-state formalism. With the cavity initially in the vacuum and a constant drive applied at t = 0, underĤ S =Ĥ D there is a cavity coherent state associated with each joint qubit state (i.e.ρ ∝ z,z p(t) z,z |α z (t), z/ \α z (t), z |. The coherent state amplitudes evolve as where we have defined the qubit operatorχ = j χ jσz,j . If χ j are distinct then for each of the 2 Nq {|z } qubit states,χ has a unique value and a different coherent state is generated in the measurement cavity. The corresponding steady-states are And the measurement of this cavity field results in an effective measurement of the joint qubit state, i.e. all qubits in the z-basis simultaneously. This measurement becomes QND ifχ commutes with all the operators in L, such that joint qubit-eigenstates are preserved by the measurement process. This can be achieved in practice in the dispersive regime if κ is larger than J jk and the qubit decay rates, such that these terms can be ignored on the timescale over which measurement proceeds. Even away from the dispersive regime, where the system is better described byĤ JC , the state of the cavity field still contains information about the joint qubit state, and a sufficiently rapid cavity measurement can be used to learn the initial joint qubit state |ψ(0) . By continuously recording the X-quadrature of the field radiating from the measurement cavity, one obtains the homodyne measurement current of Eq. (4). This continuous measurement provides information about the current state of the QS,ρ(t), and the evolution of an observers knowledge of the QS, conditioned on J(t) is found by including the measurement superoperator to Eq. (1), resulting in the SME of Eq. (3). Recall that ξ(t) = 0, and so by taking the ensemble average of Eq. (3) we recover Eq. (1) and J(t) = √ κ d + d † : density matrix evolution and operator expecation values converge to their unconditional results. Equation (4) describes a measurement current and corresponding noise which is continuous; both due to finite sampling times in real measurements and for our numerical simulations, we the homodyne current is actually sampled at discrete times t n = n∆t. As a consequence, ξ(t) → ξ(t n ) in both Eq. (3) and (4): ξ(t n ) = N n (0, 1)/ √ ∆t (4) where N n (0, 1) are samples drawn from the normal distribution with zero mean and unit variance. This converges to white noise in the continuum limit ∆t → 0, with ξ(t)ξ(t ) → δ(t − t ). The integrated noise power is importantly independent of ∆t: t dτ dτ ξ(τ )ξ(τ )) = t, so the SNR of the measurement current remains fixed for any sampling rate. The noisy continuous signal described in Eq. (4) is what is seen by any classical system interacting with the readout cavity output field [42]. When we take J(t) as the input to an RC, there is no entanglement between QS being measured and the classical RC processing this measurement record. In this work we assume each Kerr oscillator has sufficiently large field amplitudes to be in a classical regime. Even without a standard homodyne measurement set-up, the signal is measured through its interaction with room temperature electronics or the RC itself. If the RC and QS are built on different hardware platforms, or separated from the QS by amplification and isolation stages, this description is necessary.

III. RESERVOIR COMPUTER TRAINING
Training amounts to choosing an optimal set of phase angles to measure each oscillator and linear weights to apply to to each resultant node. For a K node network and C-dimensional target output, these are encoded in the C × K output matrix W o and the K element vector φ = (φ 1 . . . φ K ) of measurement angles, for a total of (C+1)K parameters to optimize. To train an RC, a set of labelled training data {u (q) (t), y (q) (t)}, consisting of Q (generally multi-dimensional) input signals U = u (q) (t) and their respective target outputs Y = y (q) (t) is constructed. The training data is fed into the RC, producing the dynamical response β (q) (t), which we decompose into independent quadratures x (q) (t). Importantly, both quadratures should be measured during training, in order to optimize the measurement angle. A loss function L(W o , φ) is then constructed from the training RC trajectories and target outputs, which depends on the difference between the RC and target output, and thus the output matrix and set of measurement angles. By minimizing the loss function with respect to W o and φ, one hopes to have the RC output y (q) (t) = W o C(φ)x (q) (t) → y (q) (t) and thus reproduce the desired target function.
The specific task we consider in this work is the retrodiction of the initial state that produced an observed measurement record. We demand that the computation returns the probability that the input is a homodyne record from a QS with |ψ(0) = |z . The training data labels are similarly probabilities for each input signal, P (q) , which in practice is just 1 for the ground truth class and zero for the others. The cross entropy loss function is then L x (W o , φ) = − 1 QN q,tn P (q) (t n ) · log(P (q) (t n )) (1) where P (q) (t n ) is found from y (q) (t) via Eq. (13). Throughout this work, training is done by minimizing L x of Eq. (1) with an added regularization term through gradient descent. We consider small K = 2 − 10 RCs and small training sets so this loss function minimization is a computationally easy task. In our physical RC framework, we envision this training is done using external software to the RC. The RC node trajectories are measured during training, and {x (q) (t), y (q) (t)} are used to minimize L x to compute W o and φ for the task at hand. φ then sets the measurement angles for subsequent RC processing and W o maps these measurements to the RC output. The RC output is the linear combination of measured node quadratures y (q) (t), which reflects the probability of the input measurement record being associated with each state. The softmax function is applied in software and is only necessary to construct L x during training; linear classification using y (q) (t) or P (q) (t) is equivalent.