Single-shot quantum memory advantage in the simulation of stochastic processes

Stochastic processes underlie a vast range of natural and social phenomena [1,2]. Some processes such as atomic decay feature intrinsic randomness, whereas other complex processes, e.g. traffic congestion, are effectively probabilistic because we cannot track all relevant variables. To simulate a stochastic system's future behaviour, information about its past must be stored [3,4], and thus memory is a key resource. Quantum information processing promises a memory advantage for stochastic simulation [5-15] that has been validated in recent proof-of-concept experiments [16,17]. Yet, in all past works, the memory saving would only become accessible in the limit of a large number of parallel simulations [6,18], because the memory registers of individual quantum simulators had the same dimensionality as their classical counterparts. Here, we report the first experimental demonstration that a quantum stochastic simulator can encode the relevant information in fewer dimensions than any classical simulator, thereby achieving a quantum memory advantage even for an individual simulator. Our photonic experiment thus establishes the potential of a new, practical resource saving in the simulation of complex systems.
Fig. 1. The stochastic process and its simulation. a. The perturbed coin process involves a coin in a box. At each step, the box is perturbed, which may or may not flip the coin. The probability of flipping from zero to one, p, can differ from the probability of flipping from one to zero, q, and similarly for the complementary probabilities of remaining in zero, $1-p$, and remaining in one, $1-q$. The process we study here is the post-processed data of the perturbed coin, which has three possible outputs at each time step, represented by the squares. The transition probabilities $T_{ij}$ ($i, j \in \{0, 1, 2\}$) between outputs i and j are the functions of p and q provided next to the arrows. These probabilities form the transition matrix. b. The optimal classical simulator of the process uses causal states, as shown in the circles. The arrows represent transitions between different causal states, with the associated expressions $j\,|\,T_{ij}$ providing the classical output of the transition, j, and its probability $T_{ij}$. In this case a simple mapping exists from the past of the process to the appropriate causal state: the last output from the string of past outputs determines the causal state. The transition probability $T_{ij}$ is the probability of transiting from causal state i to j while emitting j. The normalised left eigenvector of the transition matrix with eigenvalue 1 gives the probability distribution over the causal states, called the stationary distribution $\{p_i\}_{i=0,1,2}$. In the quantum case, the causal states become quantum states, $\{|S_i\rangle\}$.

Here we realise the first experimental demonstration of a single-shot memory advantage for simulating stochastic processes. By "single-shot" we mean that any individual simulator obtains an advantage, rather than requiring an asymptotically large array of simulators. We investigate a specific stochastic process, while noting that it is theoretically known that the advantage holds for a range of other simulation tasks [14]. The process we simulate here can be understood as the output of a biased perturbed coin after post-processing [15] (see Fig. 1a): at each discrete time step, the state of the coin provides a probabilistic binary outcome, which depends on the parameters p and q that are defined by the process. Over multiple time steps, this produces a string of zeros and ones. Then, in post-processing, every '0' that immediately precedes a '1' is replaced by a '2'. For classical simulation, this post-processing markedly increases the amount of past information that needs to be stored in order to generate future predictions. This is not so for quantum processors.
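The post-processing rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: all function names are hypothetical, and the matrix returned by `transition_matrix` is derived here from the stated rule (reading "precedes" as "immediately precedes"), not copied from Fig. 1, so it should be treated as an assumption.

```python
import random

def perturbed_coin(p, q, steps, rng, start=0):
    """Raw coin states: 0 -> 1 with probability p, 1 -> 0 with probability q."""
    state, states = start, []
    for _ in range(steps):
        flip = rng.random() < (p if state == 0 else q)
        state = 1 - state if flip else state
        states.append(state)
    return states

def post_process(states):
    """Replace every 0 that is immediately followed by a 1 with a 2.

    The final symbol has no successor yet, so it is dropped.
    """
    return [2 if (a == 0 and b == 1) else a for a, b in zip(states, states[1:])]

def transition_matrix(p, q):
    """Assumed T[i][j] = P(next output j | last output i), derived from the rule."""
    return [
        [1 - p, 0.0, p],              # after 0: the coin is still 0 at the next step
        [q * (1 - p), 1 - q, q * p],  # after 1: stay at 1, or flip to 0 and look ahead
        [0.0, 1.0, 0.0],              # after 2: a 1 always follows
    ]

def empirical_transitions(outputs):
    """Estimate the transition matrix from consecutive output pairs."""
    counts = [[0] * 3 for _ in range(3)]
    for i, j in zip(outputs, outputs[1:]):
        counts[i][j] += 1
    return [[c / max(1, sum(row)) for c in row] for row in counts]
```

Comparing `empirical_transitions(post_process(...))` with `transition_matrix(p, q)` for a long run gives a quick consistency check on the derived probabilities.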
It is known that for the provably optimal simulators [15] in each class (classical or quantum) of this stochastic process (Fig. 1b), it suffices to classify any possible past into three different states called causal states [4,5]. To this end, the classical processor must have three distinguishable states, $\{S_i\}_{i=0,1,2}$, as its memory. By contrast, as we experimentally demonstrate, the quantum processor works with the three required quantum states, $\{|S_i\rangle\}_{i=0,1,2}$, compressed into a two-dimensional quantum system.
Generally, as illustrated in Fig. 2a, a quantum simulator of a stochastic process, henceforth simply referred to as a quantum simulator, accepts a memory system and an ancilla system as inputs to a unitary transformation [5,6,14] for each simulation step. Of the two, only the memory system contains information about the past, while the ancilla system carries no information. The unitary transformation produces an entangled state of the output memory system and a second system. Measurement of the latter provides the output of the stochastic process, and collapses the memory system to the appropriate quantum state for the next simulation step. Importantly, the memory register enters and exits the quantum processor (W) as a two-dimensional system, unlike its classical counterpart in Fig. 2b, where the memory register is a three-dimensional system.
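A single step of such a simulator can be sketched numerically. The sketch below is an idealised model under stated assumptions, not a description of the optical circuit: it assumes qubit causal states $|S_0\rangle = |0\rangle$, $|S_1\rangle = \sqrt{q}|0\rangle + \sqrt{1-q}|1\rangle$, $|S_2\rangle = |1\rangle$ and an isometry $|S_i\rangle \mapsto \sum_j \sqrt{T_{ij}}\,|j\rangle|S_j\rangle$, with $T_{ij}$ derived from the post-processing rule. The function names are hypothetical.

```python
import math
import random

def causal_states(q):
    """Assumed qubit encoding of the three causal states (real amplitudes)."""
    return [(1.0, 0.0), (math.sqrt(q), math.sqrt(1 - q)), (0.0, 1.0)]

def step(memory, p, q, rng):
    """One simulation step: apply the assumed isometry, then measure the output qutrit.

    Returns the classical output j, the collapsed memory qubit, and the
    outcome probabilities.
    """
    S = causal_states(q)
    a0, a1 = memory
    # branch[j] is the unnormalised memory state attached to output j
    branch = [
        tuple(a0 * math.sqrt(1 - p) * c for c in S[0]),
        tuple(a1 * c for c in S[1]),
        tuple(a0 * math.sqrt(p) * c for c in S[2]),
    ]
    probs = [sum(c * c for c in b) for b in branch]
    j = rng.choices(range(3), weights=probs)[0]
    norm = math.sqrt(probs[j])
    return j, tuple(c / norm for c in branch[j]), probs
```

In this model the memory entering and leaving `step` is always a pair of amplitudes, i.e. a qubit, while the output takes three values.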
For the stochastic process of Fig. 1, the quantum memory required is a single qubit, in which the three causal states are encoded as three, non-mutually-orthogonal, pure quantum states, as described in Methods. We implement our simulator in a photonic quantum information processor. The memory qubit is encoded in the polarisation degree of freedom of a single photon. The nontrivial unitary transformations in our experiment include a mapping from the memory qubit to a qutrit space of three spatial modes (paths), followed by a controlled-NOT (C-NOT) [19,20] and a controlled-rotation (C-rotation) gate, as detailed in Fig. 2c. The path measurement of this photon corresponds to measuring the qutrit in the logical basis, which provides the classical output (0, 1 or 2) of that step of the stochastic process. This collapses the output memory qubit, encoded in the polarisation state of another photon, to the correct conditional state, which can be characterised by quantum state tomography.
We overcome constraints in the nondeterministic photonic implementation of consecutive quantum gates by introducing a non-destructive measurement realised by an additional C-NOT gate [21,22] and a corresponding ancilla photon. The photons are generated via spontaneous parametric downconversion (SPDC), and four-fold coincidences (three photons for the experiment and one "spare" photon to herald the presence of its pair) are detected using superconducting nanowire single-photon detectors (SNSPDs [23]) and coincidence logic modules. The detailed experimental setup is shown in Fig. 3, and additional details are in Methods.
The first goal of the experiment is to verify that the quantum simulator is performing the intended simulation. For this, two criteria must be fulfilled: (i) after initialisation in each of the three possible causal states, the conditional output statistics, obtained through the qutrit measurement, should match the transition probabilities that determine the stochastic process (see Fig. 1); (ii) conditioned on the qutrit measurement outcome, the correct memory state should be produced, to allow the possibility of further simulation steps.
To check the first criterion, we prepare each of the three causal states, whose definitions in terms of p and q are provided in the Methods section. For each input causal state there is a probability distribution over the three possible outputs of the stochastic process. Comparing the measured distributions with the theoretical ones, we consistently obtain (classical) fidelities [24] above 0.993. For the second criterion, the collapsed output memory state is reconstructed by quantum state tomography, given each of the input causal states. The (quantum) fidelities of our experimental stationary states (see Methods) with the ideal stationary states are all above 0.991.
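The two fidelity checks can be illustrated with a short sketch. The definitions below are assumptions: the classical fidelity is taken as the squared Bhattacharyya overlap (some conventions omit the square, and the text does not say which one Ref. [24] uses), and the quantum fidelity is written for the special case of a pure target state. The numbers are hypothetical, not experimental data.

```python
import math

def classical_fidelity(p, q):
    """Squared Bhattacharyya overlap of two probability distributions (assumed convention)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q)) ** 2

def pure_state_fidelity(psi, rho):
    """<psi|rho|psi> for a pure target psi and a 2x2 density matrix rho."""
    total = sum(psi[r].conjugate() * rho[r][c] * psi[c]
                for r in range(2) for c in range(2))
    return total.real

# Hypothetical example: ideal vs. "measured" output distribution for one input state
ideal = [0.25, 0.50, 0.25]
measured = [0.27, 0.49, 0.24]
f_classical = classical_fidelity(ideal, measured)

# Hypothetical example: target |0> vs. a slightly mixed reconstructed state
f_quantum = pure_state_fidelity((1.0, 0.0), [[0.99, 0.0], [0.0, 0.01]])
```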
The second goal of the experiment is to demonstrate the quantum advantage in memory requirements. A stochastic simulator can be used in different ways, with correspondingly different ways of analysing the memory use. The most straightforward use is as a single simulator. In this scenario, the memory size, in bits, is measured by the max-entropy, which is simply $\log_2 D$, where D is the dimensionality of the memory system [6,18]. Since the information about the past is encoded in the polarisation of a single photon, both at the beginning and at the end of the simulated step, the memory system that connects steps is confined to a qubit space. In contrast to this two-level quantum system, the optimal classical simulator requires a three-level system [15]. Thus, there is a clear single-shot quantum advantage in memory.
Fig. 2. a. (We use wavy lines to denote quantum objects, with the number of lines in parallel indicating the dimensionality.) The ancilla contains no information and its preparation, P, is fixed. The memory qubit undergoes a fan-out operation, F, after which the information is contained in a qutrit space. Then a unitary operation, U, acts on the qutrit and ancilla, outputting an entangled state of the memory qubit and a qutrit. A projective measurement of the qutrit provides the output of the simulation step and collapses the memory qubit to the appropriate state for the next step. b. The classical simulator requires a three-dimensional memory system. The irreversible operation W acts on the memory system to generate the classical output and the next memory state. c. The experimental realisation of the circuit in subfigure a using linear optics gates requires an ancilla qubit (photon 2) and its herald (photon 1). Following the fan-out operation F(p, q) on the memory qubit, we implement a gate, C-NOT 1, which performs a non-destructive measurement (NDM). Then the unitary operation U is performed by an additional two gates, C-NOT 2 and C-rotation. The preparation of the memory system (photon 3) P(q), the fan-out operation F(p, q), and the single qubit rotation R(q) depend on the stochastic process parameters p and q as indicated.

If multiple simulations are run in parallel, the required memory is no longer determined by the dimensionality of the memory system alone. In the limit of a very large number (N) of parallel simulations (the independent and identically distributed (i.i.d.) case [6,18]), the minimum required memory to replicate the process faithfully is given by $NC$, where C is called the statistical complexity [3]. The classical statistical complexity [3], $C_\mu$, is the Shannon entropy of the stationary distribution over causal states, while the quantum statistical complexity [5], $C_Q$, is the von Neumann entropy of the quantum stationary state (see Methods for mathematical definitions). Fig. 4a illustrates the theoretically expected statistical complexities $C_\mu$ and $C_Q$ for all possible values of p and q, showing the potential for a significant quantum advantage over a large region of the parameter space. We perform the simulation for sets of (p, q) values along several cross-sections. The experimental values of $C_Q$, shown in Fig. 4b-e, are determined from the density matrices of the output memory system and the transition probabilities (see Methods). The slight deviations of the experimental data from the theoretical curves arise from experimental imperfections such as reduced qubit purity from imperfect nonclassical interference, small imperfections and setting errors in polarisation-dependent elements, and a minor imbalance in detector efficiencies. These results nevertheless demonstrate a substantial quantum advantage in the required memory for simulation in the i.i.d. case. Thus, our quantum simulator has an advantage over its classical counterpart both for the single-shot and i.i.d. cases. Remarkably, we even simulate processes, marked by the shaded regions in Fig. 4b-e, where the classical statistical complexity $C_\mu$ exceeds one bit. In these cases, we have a gap between both quantum measures and both classical measures: $C_Q < \log_2 2 < C_\mu < \log_2 3$. (Note that $\log_2 D$ always forms an upper bound on the Shannon or von Neumann entropy.)

The present experiment allows us to study both the statistical complexity and the dimensionality of the memory system. However, for more complex processes that entail high-dimensional memory systems, the quantum state tomography required for the estimation of the statistical complexity would require increased resources (such as photons, modes, detectors), and could become prohibitively time-consuming. In contrast, verifying a dimensionality advantage remains straightforward, because it is based on counting dimensions of a Hilbert space rather than characterising quantum states. We perform a single step of the simulation in our experiment, which is already sufficient for demonstrating a quantum advantage. In the future, it would be interesting to perform multiple simulation steps with a single-shot quantum advantage.
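The ordering $C_Q < \log_2 2 < C_\mu < \log_2 3$ quoted above can be checked by hand at one illustrative point, $p = q = 1/2$. This worked example rests on assumptions that are not spelled out in the text: the transition probabilities are those derived from the post-processing rule, and the qubit causal states are $|S_0\rangle = |0\rangle$, $|S_1\rangle = \sqrt{q}|0\rangle + \sqrt{1-q}|1\rangle$, $|S_2\rangle = |1\rangle$. The point $p = q = 1/2$ is chosen for convenience and is not claimed to be one of the measured settings.

```latex
% Illustrative check at p = q = 1/2 under the assumptions stated above
\begin{align*}
(p_0, p_1, p_2) &= \left(\tfrac{q(1-p)}{p+q},\ \tfrac{p}{p+q},\ \tfrac{pq}{p+q}\right)
                 = \left(\tfrac14,\ \tfrac12,\ \tfrac14\right), \\
C_\mu &= \tfrac14\log_2 4 + \tfrac12\log_2 2 + \tfrac14\log_2 4 = 1.5, \\
\rho &= \sum_i p_i\,|S_i\rangle\langle S_i|
      = \begin{pmatrix} \tfrac12 & \tfrac14 \\ \tfrac14 & \tfrac12 \end{pmatrix},
      \qquad \text{eigenvalues } \tfrac34,\ \tfrac14, \\
C_Q &= -\tfrac34\log_2\tfrac34 - \tfrac14\log_2\tfrac14
     = 2 - \tfrac34\log_2 3 \approx 0.811, \\
C_Q \approx 0.811 \;&<\; \log_2 2 = 1 \;<\; C_\mu = 1.5 \;<\; \log_2 3 \approx 1.585 .
\end{align*}
```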
Fig. 3. Single photons are generated from SPDC events. The herald photon from Source 1 is sent straight to a heralding detector. The polarisation of the memory system is used to encode the relevant causal state in a qubit, using a half-wave plate (HWP). Ancillas are prepared in a fixed polarisation using HWPs. To implement the fanning out from the memory qubit to a qutrit, a HWP and polarising beam splitters (PBSs) are used. Each of the C-NOT 1 and C-NOT 2 gates is implemented using a HWP and a PBS. The C-rotation gate is realised via HWPs and partially polarising beam splitters (PPBSs). In order to vary the relative delay between the single photon wave packets, an automated translation stage is used to move one of the couplers. Classical readout is performed via projective measurements on the path modes of the qutrit, which collapses the memory state to the appropriate causal state. To verify the memory qubit, its state is reconstructed via quantum state tomography. A telecom bandpass filter is used in the tomography arm in order to spectrally filter the SPDC photons and maximise the visibility of the quantum interference. P stands for state preparation, SMF for single mode fibre, QWP for quarter-wave plate, GT for Glan-Taylor prism, and FPC for fibre polarisation controller. For more details, see Methods.

A natural question is: how prevalent is such a dimensionality advantage? While this remains an open question, its existence is certainly not isolated to the stochastic process in this experiment. Indeed, such an advantage arises naturally in the context of processes that exhibit causal asymmetry, a memory overhead (in both dimensional and entropic memory costs) between predicting the future and retrodicting the past [25]. All such processes lead to a dimensionality advantage, and there exist families of processes where this advantage can grow without bound [15].
In conclusion, we have shown that quantum information processing enables the simulation of a stochastic process with a memory that is smaller both in terms of its dimensionality (the number of orthogonal states it can support) and its von Neumann entropy, compared to the optimal classical simulator, measured by the number of states it uses and the Shannon entropy, respectively. The demonstrated decrease in the dimensionality of the memory system establishes a new type of memory saving, namely a single-shot memory advantage. This advantage becomes possible when the system being simulated has at least three causal states, in contrast to previous works with only two causal states [16,17]. Finally, we note that although our current realisation uses nondeterministic gates, this does not affect the definition of single-shot advantage: the point is that multiple parallel simulators are not required, and the advantage can in principle be achieved at the scale of a single simulator.

Methods

Stochastic processes.
A stochastic process evolving in discrete time is a collection of random variables $\{\ldots, X_{t-1}, X_t, X_{t+1}, X_{t+2}, \ldots\}$, where the previously observed variables $\{\ldots, X_{t-1}, X_t\}$ are considered the past of the process, i.e. the list of past outputs. A faithful simulator is one that correctly generates the process's future statistical behaviour based on a given configuration of its past. The memory system of the simulator must store sufficient information about the past configuration to enable this faithful simulation [4]. Then, a processor acts on the memory, generating a new classical output $X_{t+1}$ and updating the memory to be ready for the next step.
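The memory-then-processor loop described above can be sketched for the classical case. This is a minimal illustration under assumptions: the memory holds one of three labels, the next memory state is simply the latest output (as the text states for this particular process), and the matrix `T` is derived from the post-processing rule rather than taken from the paper. The parameter values and names are hypothetical.

```python
import random

def classical_simulator(T, start, steps, rng):
    """Classical simulator whose memory is one of len(T) causal-state labels."""
    memory, outputs = start, []
    for _ in range(steps):
        # The processor samples the next output from the row selected by the memory
        j = rng.choices(range(len(T)), weights=T[memory])[0]
        outputs.append(j)
        memory = j  # for this process the last output identifies the causal state
    return outputs

# Hypothetical parameters; T[i][j] = P(next output j | last output i) (assumed form)
p, q = 0.3, 0.6
T = [
    [1 - p, 0.0, p],
    [q * (1 - p), 1 - q, q * p],
    [0.0, 1.0, 0.0],
]
sample = classical_simulator(T, start=0, steps=20, rng=random.Random(0))
```

The memory variable needs three distinguishable values here, which is the three-level requirement that the quantum encoding avoids.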
For optimal simulation of the process that we study here [15], the most recent output, $X_t$, is sufficient for determining the memory state for step $t+1$ [3]. The possible memory states are called causal states [3,4], and there are three of them for this process. The classical causal states are perfectly distinguishable states, $\{S_i\}_{i=0,1,2}$. The quantum causal states, $\{|S_i\rangle\}_{i=0,1,2}$, can be similarly defined, as in equation (1). However, by choosing a different basis, these states can be mapped to a single-qubit space [15], as in equation (2), where $|0\rangle$ and $|1\rangle$ form an orthogonal basis.

Fig. 4. The quantum simulator is used to investigate several sets of processes with different values of p and q. The entropy of the reconstructed stationary states (see Methods) determines the quantum statistical complexity (red dots). The black and blue curves represent the theoretical $C_\mu$ and $C_Q$, respectively. The plots demonstrate a considerable memory advantage for the i.i.d. case. Furthermore, the grey shaded areas mark processes where the complexity $C_\mu$ of the classical simulator exceeds one bit, while the quantum simulation runs with only one memory qubit. Uncertainties are estimated from the Poissonian distribution of photon counts.

Experimental details.
Four photons are generated via SPDC, as shown in Fig. 3. For this, two SPDC sources are realised using a 775 nm Ti-sapphire picosecond-pulse-length pump laser and ppKTP (46.20 µm poling period) crystals cut for type-II collinear degenerate phase matching [26,27]. The photons are not entangled in polarisation. The crystal temperature is controlled at 25 °C by a temperature controller. The bandpass filter is centred at 1550 nm and has a FWHM of 8.8 nm.
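The pump wavelength and the filter centre quoted above are related by energy conservation in degenerate SPDC, as the following one-line check shows. The symbols $\lambda_p$, $\lambda_s$ and $\lambda_i$ (pump, signal and idler wavelengths) are introduced here for illustration.

```latex
% Energy conservation for SPDC, specialised to the degenerate case
\begin{align*}
\frac{1}{\lambda_p} &= \frac{1}{\lambda_s} + \frac{1}{\lambda_i},
\qquad \lambda_s = \lambda_i \;\Rightarrow\;
\lambda_s = 2\lambda_p = 2 \times 775\ \text{nm} = 1550\ \text{nm}.
\end{align*}
```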
To run the simulator, the causal states in equation (2) are encoded in the polarisation degree of freedom of a single photon acting as the memory system. We use polarisation modes such that $|0\rangle = |H\rangle$ and $|1\rangle = |V\rangle$, where H and V are horizontal and vertical polarisations, respectively.
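For orientation, one plausible explicit form of the qutrit and qubit causal states referred to as equations (1) and (2) can be derived from the process rule alone. This is a reconstruction under assumptions, not a quotation of the authors' equations: it assumes the transition probabilities follow from replacing each 0 that immediately precedes a 1 by a 2, and that the quantum causal states are built as $|S_i\rangle = \sum_j \sqrt{T_{ij}}\,|j\rangle$, a standard construction when the latest output fixes the next causal state.

```latex
% Assumed transition matrix, rows and columns ordered as outputs 0, 1, 2
\begin{align*}
T &= \begin{pmatrix} 1-p & 0 & p \\ q(1-p) & 1-q & pq \\ 0 & 1 & 0 \end{pmatrix}, \\[4pt]
% Assumed qutrit form (cf. equation (1))
|S_0\rangle &= \sqrt{1-p}\,|0\rangle + \sqrt{p}\,|2\rangle, \\
|S_1\rangle &= \sqrt{q(1-p)}\,|0\rangle + \sqrt{1-q}\,|1\rangle + \sqrt{pq}\,|2\rangle, \\
|S_2\rangle &= |1\rangle, \\[4pt]
% The three states span only two dimensions
|S_1\rangle &= \sqrt{q}\,|S_0\rangle + \sqrt{1-q}\,|S_2\rangle,
\qquad \langle S_0|S_2\rangle = 0, \\[4pt]
% Assumed qubit form (cf. equation (2)), relabelling |S_0> -> |0>, |S_2> -> |1>
|S_0\rangle &= |0\rangle, \qquad
|S_1\rangle = \sqrt{q}\,|0\rangle + \sqrt{1-q}\,|1\rangle, \qquad
|S_2\rangle = |1\rangle .
\end{align*}
```

In this form the memory preparation depends only on q, which is consistent with the label P(q) used in the description of Fig. 2c.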
The fan-out transformation implements the basis change from equation (2) to (1), so that the three paths correspond to orthogonal states $|0\rangle$, $|1\rangle$, and $|2\rangle$. The experimental setup contains non-deterministic two-qubit gates. The C-NOT gates 1 and 2 are realised with a HWP, a PBS, and post-selective detection. This simplified version (compared to a universal photonic C-NOT gate [19]) is adequate, since the photons in the two input spatial modes always have a fixed polarisation. The controlled-rotation gate comprises two single-qubit rotation gates, R(q), and a two-qubit controlled-Z gate. This controlled-Z gate is based on the scheme in Ref. [20], which uses three partially polarising beam splitters (PPBSs). However, we only require two because of the fixed polarisation in one of the input spatial modes. Four-fold coincidences are detected in a 5 ns coincidence window, using SNSPDs and fast counting electronics.
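The way a controlled-Z gate sandwiched between two single-qubit rotations yields a controlled rotation can be checked with small matrices. The sketch below is generic linear algebra under loud assumptions: the specific form of `rotation(q)` is hypothetical (a real reflection, similar in form to a half-wave-plate operation, chosen because it is its own inverse), and the text does not specify the actual R(q) or on which qubit it acts. All names are illustrative.

```python
import math

def matmul(A, B):
    """Plain matrix product for nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def kron(A, B):
    """Kronecker product; rows and columns are ordered as (index in A, index in B)."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def rotation(q):
    """Hypothetical R(q): a reflection sending |0> to sqrt(q)|0> + sqrt(1-q)|1>."""
    c, s = math.sqrt(q), math.sqrt(1 - q)
    return [[c, s], [s, -c]]

I2 = [[1.0, 0.0], [0.0, 1.0]]
# Controlled-Z in the basis |control, target>
CZ = [[1.0, 0.0, 0.0, 0.0],
      [0.0, 1.0, 0.0, 0.0],
      [0.0, 0.0, 1.0, 0.0],
      [0.0, 0.0, 0.0, -1.0]]

def controlled_rotation(q):
    """(I x R) CZ (I x R): identity on the target for control 0, R Z R for control 1."""
    local = kron(I2, rotation(q))
    return matmul(local, matmul(CZ, local))
```

Because the assumed R(q) squares to the identity, the control-0 block reduces to the identity and only the control-1 block carries a q-dependent operation.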
The detection channels have slightly different efficiencies, which may affect the probabilities determined from the various coincidence detection combinations and thus the inferred transition probabilities. The possible fourfold detection combinations are formed by coincidence detections between detectors from each of the following four sets (see Fig. 2c): {1}, {2, 3}, {4, 5, 6} and {7, 8}. This implies that the detectors within each set should ideally have the same efficiencies. In the experiment, the detectors are installed in such a way as to match this criterion as closely as possible.
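The effect of unequal channel efficiencies on inferred probabilities can be illustrated with a toy calculation. This is only a sketch of the issue described above: the text says the detectors were arranged to match efficiencies within each set, and does not say that any numerical correction was applied. The counts, efficiencies and function name below are hypothetical.

```python
def corrected_probabilities(counts, efficiencies):
    """Divide each count by its channel's relative efficiency, then normalise."""
    scaled = [c / e for c, e in zip(counts, efficiencies)]
    total = sum(scaled)
    return [s / total for s in scaled]

# Hypothetical coincidence counts for outputs 0, 1, 2 and relative efficiencies
counts = [250.0, 450.0, 237.5]
efficiencies = [1.0, 0.9, 0.95]

naive = [c / sum(counts) for c in counts]            # biased by the imbalance
corrected = corrected_probabilities(counts, efficiencies)
```

With these made-up numbers the naive estimate deviates from the underlying distribution, while the rescaled estimate recovers it.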

Statistical complexity.
The statistical complexity [3,4,25] is the minimal memory a model needs to generate future statistics correctly using only information from past observations. The classical statistical complexity is

$$C_\mu = -\sum_i p_i \log_2 p_i,$$

where $p_i$ is the probability of each causal state in the stationary stochastic process, i.e. in the limit of a long evolution. The quantum statistical complexity is defined as [5]:

$$C_Q = -\mathrm{Tr}\left(\rho \log_2 \rho\right),$$

where $\rho = \sum_i p_i |S_i\rangle\langle S_i|$ is the quantum stationary state.
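Both entropies can be evaluated numerically for given p and q. The sketch below is illustrative and relies on assumptions not stated in the text: a closed-form stationary distribution derived from the post-processing rule, and qubit causal states $|0\rangle$, $\sqrt{q}|0\rangle + \sqrt{1-q}|1\rangle$, $|1\rangle$. Function names are hypothetical.

```python
import math

def stationary_distribution(p, q):
    """Assumed closed form (p_0, p_1, p_2) for the three-output process; needs p + q > 0."""
    return [q * (1 - p) / (p + q), p / (p + q), p * q / (p + q)]

def shannon_entropy(probs):
    """Shannon entropy in bits, skipping zero-probability entries."""
    return -sum(x * math.log2(x) for x in probs if x > 0)

def quantum_stationary_state(p, q):
    """rho = sum_i p_i |S_i><S_i| for the assumed qubit causal states."""
    pi = stationary_distribution(p, q)
    off = pi[1] * math.sqrt(q * (1 - q))
    return [[pi[0] + pi[1] * q, off], [off, pi[2] + pi[1] * (1 - q)]]

def von_neumann_entropy_2x2(rho):
    """Entropy in bits from the two eigenvalues of a real symmetric 2x2 density matrix."""
    mean = (rho[0][0] + rho[1][1]) / 2
    gap = math.sqrt(((rho[0][0] - rho[1][1]) / 2) ** 2 + rho[0][1] ** 2)
    return shannon_entropy([mean + gap, mean - gap])

def complexities(p, q):
    """Return (C_mu, C_Q) in bits."""
    c_mu = shannon_entropy(stationary_distribution(p, q))
    c_q = von_neumann_entropy_2x2(quantum_stationary_state(p, q))
    return c_mu, c_q
```

Scanning `complexities(p, q)` over a grid is one way to locate regions where the classical value exceeds one bit while the quantum value stays below it.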
Our simulator implements the provably optimal model, the so-called quantum epsilon machine [3,5,15]. Therefore, we can measure $C_Q$ by inputting the causal states described in equation (2) for a given set of p and q values. The stationary state, $\rho$, is calculated from the weights $\{d_i\}_{i=0,1,2}$ and the states $S_{\mathrm{pol}|S_i}$, where $\{d_i\}_{i=0,1,2}$ are the components of the normalised left eigenvector, with eigenvalue 1, of the experimentally measured transition matrix, whose element $T_{ij}$ is the probability of classical output j when the input causal state is $|S_i\rangle$. Moreover, $S_{\mathrm{pol}|S_i}$ is the reconstructed polarisation state of the output memory system when the input causal state is $|S_i\rangle$.
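The data-processing chain described above can be sketched end to end. Several pieces are assumptions: the combination rule $\rho = \sum_i d_i\, S_{\mathrm{pol}|S_i}$ is a guess at the omitted formula, the stationary weights are obtained by power iteration (which presumes the chain is irreducible and aperiodic), and the inputs below are ideal stand-ins for $p = q = 0.5$ rather than measured data. All names are hypothetical.

```python
import math

def left_stationary_vector(T, iterations=1000):
    """Power iteration for weights d with d T = d and sum(d) = 1."""
    n = len(T)
    d = [1.0 / n] * n
    for _ in range(iterations):
        d = [sum(d[i] * T[i][j] for i in range(n)) for j in range(n)]
        total = sum(d)
        d = [x / total for x in d]
    return d

def mix(weights, states):
    """Weighted sum of 2x2 density matrices."""
    return [[sum(w * s[r][c] for w, s in zip(weights, states)) for c in range(2)]
            for r in range(2)]

def entropy_2x2(rho):
    """Von Neumann entropy in bits of a 2x2 Hermitian density matrix."""
    mean = (rho[0][0] + rho[1][1]).real / 2
    gap = math.sqrt(((rho[0][0] - rho[1][1]).real / 2) ** 2 + abs(rho[0][1]) ** 2)
    return -sum(x * math.log2(x) for x in (mean + gap, mean - gap) if x > 0)

# Ideal stand-ins for p = q = 0.5 (hypothetical; real inputs would be measured)
T_ideal = [[0.5, 0.0, 0.5], [0.25, 0.5, 0.25], [0.0, 1.0, 0.0]]
projectors = [
    [[1.0, 0.0], [0.0, 0.0]],   # |S_0><S_0|
    [[0.5, 0.5], [0.5, 0.5]],   # |S_1><S_1| at q = 0.5
    [[0.0, 0.0], [0.0, 1.0]],   # |S_2><S_2|
]
# Output memory state for each input causal state, averaged over outputs
S_pol = [mix(T_ideal[i], projectors) for i in range(3)]

d = left_stationary_vector(T_ideal)
rho = mix(d, S_pol)             # assumed rule: rho = sum_i d_i S_pol|S_i
C_Q = entropy_2x2(rho)
```

For ideal inputs this reproduces the same stationary state as $\sum_i p_i |S_i\rangle\langle S_i|$, which is at least consistent with the assumed combination rule.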

Data availability
The datasets generated during and analysed in the current study are available from the corresponding author on reasonable request.