Quantum coarse-graining for extreme dimension reduction in modelling stochastic temporal dynamics

Stochastic modelling of complex systems plays an essential, yet often computationally intensive role across the quantitative sciences. Recent advances in quantum information processing have elucidated the potential for quantum simulators to exhibit memory advantages for such tasks. Heretofore, the focus has been on lossless memory compression, wherein the advantage is typically in terms of lessening the amount of information tracked by the model, while -- arguably more practical -- reductions in memory dimension are not always possible. Here we address the case of lossy compression for quantum stochastic modelling of continuous-time processes, introducing a method for coarse-graining in quantum state space that drastically reduces the requisite memory dimension for modelling temporal dynamics whilst retaining near-exact statistics. In contrast to classical coarse-graining, this compression is not based on sacrificing temporal resolution, and brings memory-efficient, high-fidelity stochastic modelling within reach of present quantum technologies.


I. INTRODUCTION
Everywhere we look, we are surrounded by complex systems. They manifest across all scales, from the microscopic level of chemical and physical interactions, through biological processes, to geophysical and meteorological phenomena and beyond [1][2][3][4][5][6][7][8]. As the descriptor complex suggests, with such systems manifesting a rich tapestry of emergent behaviours it quickly becomes an insurmountable task to track their many interacting components in full. Computational tractability demands that when modelling complex systems we keep only a partial knowledge, sufficient for predicting relevant properties of interest. Meanwhile, the remaining information that is discarded (or was not possible to observe in the first place) manifests as stochastic effects on top of this. Accordingly, stochastic modelling [9][10][11][12][13][14][15][16][17][18][19][20][21][22][23][24][25][26] is a critical part of modern science, and identifying ways and means of maximising its efficacy is a transdisciplinary endeavour.
A key bottleneck is the amount of memory available, restricting the amount of information that can be stored. Each configuration the system can take is assigned to a state in the memory; the number of states the memory can support -- its dimension -- limits the number of distinct configurations that can be tracked. A form of compression to mitigate this is coarse-graining -- grouping together configurations that are sufficiently close into a single combined configuration, reducing the effective dimension at the cost of precision. This is particularly prominent for temporal information: time is a continuous parameter requiring an unbounded amount of information to specify to arbitrary precision [27]; in practice it is coarse-grained into bins of finite width [28].
For a quantum memory, the dimension is no longer synonymous with the number of different possible states it can support. In the context of stochastic modelling, by encoding configurations with partially overlapping features into linearly-dependent quantum states, a dimensional compression can be achieved [29][30][31][32][33]. This quantum compression advantage can be of significant magnitude [32], though present techniques are constrained to exact (lossless) compression, hampering widespread applicability. Nevertheless, quantum encodings have been shown to almost universally reduce the information cost of stochastic modelling [30,[34][35][36][37][38][39][40], suggesting that many of the dimensions in the memory are barely utilised. This substantiates a strong motivation to develop lossy quantum encodings that trim down these excess dimensions whilst retaining high fidelity with the exact model.
Here we introduce such a lossy compression protocol that can be applied to greatly reduce the memory dimensions devoted to tracking temporal information. Our compression is based on reconstructing approximate -- yet near-exact -- models of a process where the quantum memory states are constrained to a low-dimensional Hilbert space, emancipating the dimension from the number and width of time bins. After reviewing the necessary background, we describe our protocol in detail for pure temporal dynamics, with examples to illustrate the high fidelities and extreme quantum advantages that can be achieved with only a few memory qubits. We then describe how the protocol can be used for compressed modelling of general continuous-time stochastic processes.

A. Stochastic processes and models
Herein we are concerned with continuous-time, discrete-event stochastic processes [39,41]. These consist of a series of events described by a sequence of couples $\mathbf{x}_n := (x_n, t_n)$, where $x_n \in \mathcal{X}$ is the $n$th event in the series and $t_n \in \mathbb{R}^+$ is the time between the $(n-1)$th and $n$th events [42]. The sequence is probabilistic, drawn from a distribution $P(\ldots, X_{n-1}, X_n, X_{n+1}, \ldots)$; throughout we use upper case to represent random variables, and lower case the corresponding variates. We assume the set of possible events $\mathcal{X}$ to be finite. A contiguous block of the sequence is denoted $x_{j:k} := x_j x_{j+1} \ldots x_{k-1}$. We consider bi-infinite length sequences such that $n \in \mathbb{Z}$, and assume the process to be stationary, such that $P(X_{0:L}) = P(X_{m:m+L})\ \forall\, m, L \in \mathbb{Z}$. We will also consider discrete-time approximations to such processes, where times are coarse-grained into finite intervals of size $\Delta t$, recovering the continuous case in the limit $\Delta t \to 0$.
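As a minimal sketch of this temporal coarse-graining (the function and values below are ours, purely illustrative), inter-event times can be mapped to bins of width $\Delta t$:

```python
import math

def coarse_grain_times(event_times, dt):
    """Map each inter-event time t to its bin index n, i.e. t in
    [n*dt, (n+1)*dt); shrinking dt approaches the continuous limit."""
    return [math.floor(t / dt) for t in event_times]

# With a coarse bin width, nearby times become indistinguishable:
bins_coarse = coarse_grain_times([0.30, 1.72, 0.05], dt=0.5)
bins_fine = coarse_grain_times([0.30, 1.72, 0.05], dt=0.125)
```

At $\Delta t = 0.5$ the intervals 0.30 and 0.05 fall into the same bin; at $\Delta t = 0.125$ they are resolved, at the cost of a larger memory dimension for the counter tracking the bins.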
We can partition the process into a past and future, delineating what has happened and what is yet to happen respectively, relative to some point in the sequence. Without loss of generality we can set $n = 0$ to represent the present, with $x_0$ the next event to occur, such that the past consists of $\overleftarrow{x} := x_{-\infty:0}\,(\emptyset, \overleftarrow{t}_0)$ and the future of $\overrightarrow{x} := (x_0, \overrightarrow{t}_0)\,x_{1:\infty}$. Here, $\overleftarrow{t}_0$ represents the time since the last event and $\overrightarrow{t}_0$ the time until the next event ($t_0 = \overleftarrow{t}_0 + \overrightarrow{t}_0$), and $\emptyset$ denotes that the 0th event is yet to occur [39,41]. We desire models that are able to track relevant information from the past of a process in order to faithfully replicate the corresponding future statistics [8,43]. We require the models to be causal, entailing that they can be initialised for any given past, and store no information about the future that could not be obtained from the past observations [29]. Such models function by means of an encoding function $f : \overleftarrow{\mathcal{X}} \to \mathcal{M}$ that maps pasts into memory states $\rho_m \in \mathcal{M}$, and a transition structure $\Lambda : \mathcal{M} \to \mathcal{M} \times (\{\emptyset\} \cup \mathcal{X})$ that produces the future statistics and updates the memory state accordingly [44]. In the continuous-time setting this transition structure is a continuous evolution, while in the discrete-time setting it acts once at each timestep [27,28,37,41]. A model with a lossless encoding is able to replicate the future statistics perfectly, while a lossy one produces an approximation thereof.
Continuous-time stochastic processes can be represented by edge-emitting hidden semi-Markov models (HSMMs) [23,41]. An HSMM comprises (hidden) modes $\mathcal{G}$, an event alphabet $\mathcal{X}$, and a transition dynamic $\Lambda$. Conditional on the current mode and the time it has been occupied, the transition dynamic describes the probability of the model emitting a symbol $x \in \mathcal{X}$ and transitioning to a new mode, with the probabilities depending on the particular process [Fig. 1(a)]. That is, the system resides in a given mode $g \in \mathcal{G}$ until an emission $x \in \mathcal{X}$ occurs, at which point it transitions to a new mode $g' \in \mathcal{G}$; the probability density that a system resides in mode $g$ for a time $t$ before emitting symbol $x$ and transitioning to mode $g'$ is given by $P(x, g'|g)\,\phi^x_{g'g}(t)$, such that the modal wait-time distribution is $\sum_{x,g'} P(x, g'|g)\,\phi^x_{g'g}(t)$. Here the probabilities $P(X, G'|G)$ describe the symbolic transition structure between modes, and the dwell functions $\phi^x_{g'g}(t)$ the distribution for the time spent in a given mode before such a given transition occurs. See Refs. [39,41] and Section VI for further details.

FIG. 1. (a) HSMM representation of a continuous-time, discrete-event stochastic process showing the transition structure between modes. Each node corresponds to a mode of the model, and the arrows labelled $x : p(t)$ denote transitions between modes accompanied by event $x$ occurring at time $t$ since the previous event, with the transition occurring with probability $p(t)$. (b) Unpacking into an HMM tracking mode occupation times; the nodes continue to represent modes and thin lines the transitions, while the thick black line indicates a continuum of states of the model, tracking both the current mode and time since last event.
An HSMM can be unpacked [39] into an edge-emitting hidden Markov model (HMM) [45] with a continuous state space tracking the occupation time of the modes [Fig. 1(b)]. States in the HMM represent a mode and time since last event $(g, \overleftarrow{t}_0)$, with a transition structure taking the system to $(g, \overleftarrow{t}_0 + dt)$ on non-events in the next infinitesimal time interval $dt$, and to $(g', 0)$ upon events. The corresponding emitted symbols are $\emptyset$ for non-events and $x \in \mathcal{X}$ for each event; transition probabilities follow from the conditional form of the modal wait-time distributions. Discrete-time stochastic processes can similarly be modelled by discrete-state HMMs, in which the occupation time is tracked by the corresponding coarse-grained states [28].

B. Memory and quantum advantage
A key metric of efficiency for a model is how much memory it requires to operate [46]. One way this can be parameterised is the information cost -- in the sense of Shannon entropy -- of storing the compressed past information [34,44,46]. Another, to which we direct our focus here, is the size of the substrate into which this information is encoded -- in other words, the dimension of the memory state space [30,32,44,47,48]. The choice of encoding function will impact upon the memory cost, and is ideally chosen to make it as small as possible.
For stationary stochastic processes the optimal classical lossless memory encoding function is provided by the causal equivalence relation ($\sim_\varepsilon$) of computational mechanics [8,44,49], which partitions the entire set of semi-infinite pasts $\overleftarrow{\mathcal{X}}$ into equivalence classes called causal states $s \in \mathcal{S}$, such that two pasts belong to the same causal state iff they give rise to the same conditional future statistics: $\overleftarrow{x} \sim_\varepsilon \overleftarrow{x}' \iff P(\overrightarrow{X}|\overleftarrow{x}) = P(\overrightarrow{X}|\overleftarrow{x}')$. The memory-optimal lossless classical model (known as the $\varepsilon$-machine) is then constructed by designating a memory state $|s\rangle$ for each causal state $s$, and having the causal state encoding function $f_\varepsilon$ assign pasts accordingly. A typical process evolving in continuous time will require an infinite-dimensional memory to record the progress through infinitesimal divisions in time [27,32,37], engendering the need for lossy approximations that evolve with discretised timesteps [28,32].
With the advent of quantum information processing tools, the optimality of ε-machines has been supplanted [34]. Quantum encoding functions f q map pasts into a set of quantum memory states; by leveraging the possibility of encoding information into an ensemble of non-orthogonal states, further compression beyond the causal state encoding function may be attained. Prior work has centred on lowering the information cost of storing the past [30,[34][35][36][37][38][39][40], showing that a quantum compression advantage can almost always be procured. Recent focus has been devoted to obtaining corresponding advantages in compressing the dimension of the memory, by engineering quantum memory states with linear dependencies [29][30][31][32][33]. Examples have highlighted that such dimensional compression can sometimes be made arbitrarily strong with respect to the optimal classical encoding [32], though instances where it may be achieved in the lossless regime appear to be much less ubiquitous than in the case of reducing the information cost [30]. The lossy encoding protocol we introduce seeks to remedy this present shortcoming of the quantum models in the context of tracking the temporal aspect of their dynamics, to escape the associated memory dimension divergence in the continuous limit.

C. Renewal processes
With our attention directed towards compressing the temporal information, for much of this manuscript we will work with a special class of continuous-time stochastic processes that are purely temporal in nature: renewal processes [50]. These consist of a single mode and a single symbol, such that the resulting process is a series of identical events stochastically separated in time, with the spacing of each consecutive pair of events drawn from the same distribution. The distribution governing the time between events is called the wait-time distribution $\phi(t)$, and the survival probability $\Phi(t) := \int_t^\infty \phi(t')\,dt'$ is the probability that a given interval is of length $t$ or greater [27,28,32,37].
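As a quick numerical illustration of the relation between $\phi(t)$ and $\Phi(t)$, consider the simplest renewal process -- a Poisson process with rate $\gamma$, for which $\Phi(t) = e^{-\gamma t}$ (a minimal sketch; the grid and rate are illustrative choices of ours):

```python
import numpy as np

# Numerical check that Phi(t) = integral_t^infinity phi(t') dt' for a
# Poisson process, phi(t) = gamma * exp(-gamma t), Phi(t) = exp(-gamma t).
gamma = 1.5
h = 1e-4
t_grid = np.arange(0.0, 20.0 + h, h)
phi = gamma * np.exp(-gamma * t_grid)

def survival_from_index(i):
    """Trapezoidal estimate of the integral of phi from t_grid[i] onwards
    (the truncated tail beyond t = 20 is negligible at this rate)."""
    y = phi[i:]
    return float(np.sum(y[:-1] + y[1:]) * h / 2.0)
```

Evaluating `survival_from_index` at any grid point agrees with $e^{-\gamma t}$ to within the quadrature error.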
With few exceptions, for generic renewal processes the causal states group pasts together according to the time since the last event occurred [27,28,37]. That is, all relevant information for predicting the future of a renewal process is contained within the time since the last event -- such that the causal states are in one-to-one correspondence with $\overleftarrow{t}_0$ -- and moreover, can provide predictive power only with respect to the time $\overrightarrow{t}_0$ until the next event will happen.
The transition structure between the memory states of the $\varepsilon$-machine for a renewal process has been likened to a 'conveyor belt' [27], progressing continuously along a line with time until an event occurs, whereupon the memory jumps to a 'reset' state corresponding to $\overleftarrow{t}_0 = 0$. The probability of occupying the memory state corresponding to $\overleftarrow{t}_0$ is given by $\pi(\overleftarrow{t}_0) = \mu\,\Phi(\overleftarrow{t}_0)$, where $\mu := (\int_0^\infty t\,\phi(t)\,dt)^{-1}$ is the so-called mean firing rate [27,37]. The discrete-time analogue consists of a linear sequence of memory states through which the system progresses, akin to the incrementation of a counter, until also resetting upon an event [28,32]. Both are illustrated in Fig. 2. The exact continuous-time version requires an infinite continuum of memory states, and thus a memory of unbounded dimension; when there is no maximum value for $\overleftarrow{t}_0$ the discrete-time case will similarly need an infinite-dimensional memory, and thus finite-dimensional approximations must also adopt a terminal state that the counter cannot exceed [32].
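The discrete-time counter model can be sketched directly: the memory increments on non-events and resets on events, with the event probability at each counter value fixed by the conditional wait-time statistics (the toy distribution below is our own illustrative choice):

```python
import random

def simulate_counter_model(phi_disc, n_events, rng):
    """Discrete-time classical model of a renewal process: the memory is a
    counter k (time-steps since the last event) that increments on
    non-events and resets to 0 on events. phi_disc[k] is the probability
    that the wait time equals exactly k steps."""
    # Survival probabilities Phi[k] = sum_{j >= k} phi_disc[j]
    Phi = [0.0] * (len(phi_disc) + 1)
    for k in range(len(phi_disc) - 1, -1, -1):
        Phi[k] = Phi[k + 1] + phi_disc[k]
    waits, k = [], 0
    while len(waits) < n_events:
        # conditional probability of an event at counter value k
        p_event = phi_disc[k] / Phi[k] if Phi[k] > 0 else 1.0
        if k == len(phi_disc) - 1 or rng.random() < p_event:
            waits.append(k)   # event: record the wait, reset the counter
            k = 0
        else:
            k += 1            # non-event: advance along the 'conveyor belt'
    return waits

rng = random.Random(42)
phi_disc = [0.1, 0.2, 0.4, 0.2, 0.1]  # toy wait-time distribution (ours)
waits = simulate_counter_model(phi_disc, 20000, rng)
mean_wait = sum(waits) / len(waits)   # true mean of phi_disc is 2.0
```

The final `k == len(phi_disc) - 1` branch implements the terminal state that the counter cannot exceed.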

A. Quantum models of renewal processes
In previous work [37] we have established that a general renewal process with wait-time distribution $\phi(t)$ can be exactly simulated by a quantum model with memory encoding function $|\varsigma_{\overleftarrow{t}_0}\rangle := \int_0^\infty \frac{\psi(\overleftarrow{t}_0 + t)}{\sqrt{\Phi(\overleftarrow{t}_0)}}\,|t\rangle\,dt$, with $\{|t\rangle\}$ an infinite-dimensional orthogonal basis and $\psi(t) := \sqrt{\phi(t)}$.¹ The future statistics are extracted from these memory states by means of a continuous measurement sweep that at each infinitesimal interval $\delta t$ produces a binary outcome as to whether or not the system is found in a state $|t\rangle$ in the interval $[0, \delta t)$: if yes, then the event is deemed to have occurred and the memory is re-initialised in state $|\varsigma_0\rangle$; if not, then the event does not occur, and a relabelling $t \to t - \delta t$ takes place. A fine-grained discrete analogue of this evolution with time-step interval $\delta t$ can be implemented through the following unitary interaction $U_{\delta t}$ coupling the memory state to an ancillary system used to provide the measurement readout, where 0 and 1 represent non-events and events respectively [32]:

$U_{\delta t}|\varsigma_t\rangle|0\rangle = \sqrt{\tfrac{\Phi(t+\delta t)}{\Phi(t)}}\,|\varsigma_{t+\delta t}\rangle|0\rangle + \sqrt{1 - \tfrac{\Phi(t+\delta t)}{\Phi(t)}}\,|\varsigma_0\rangle|1\rangle. \quad (3)$

After measurement, the ancilla is set to $|0\rangle$ ready for the next timestep. The amplitudes on the right-hand side of this equation are set such that they yield the correct probabilities for the future statistics, as $1 - \Phi(t+\delta t)/\Phi(t)$ is the conditional probability of an event occurring in the next interval $\delta t$, given that a time $t$ has already elapsed since the last event. Arbitrary complex phases can be added to these amplitudes without affecting the statistics [30,32]; on the first term this is equivalent to appending an irrelevant phase to the quantum memory states, while on the latter it mirrors the effect of a complex phase on $\psi(t)$.
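A quick check that the measurement sweep yields the correct statistics: assuming branch probabilities $\Phi(t+\delta t)/\Phi(t)$ for a non-event and $1 - \Phi(t+\delta t)/\Phi(t)$ for an event, as in [32], chaining the outcomes over successive steps should reproduce $P(\text{first event at step } n) = \Phi(n\delta t) - \Phi((n+1)\delta t)$ (a sketch using a Poisson process for concreteness):

```python
import numpy as np

# Branch probabilities of the sweep at memory state |s_t>:
#   Phi(t+dt)/Phi(t)      -> outcome 0 (no event, advance)
#   1 - Phi(t+dt)/Phi(t)  -> outcome 1 (event, reset)
gamma, dt = 1.0, 0.01
Phi = lambda t: np.exp(-gamma * t)  # survival probability, Poisson process

def p_first_event(n):
    """Probability that the sweep first reads outcome 1 at step n."""
    p = 1.0
    for k in range(n):  # survive steps 0 .. n-1
        p *= Phi((k + 1) * dt) / Phi(k * dt)
    return p * (1.0 - Phi((n + 1) * dt) / Phi(n * dt))
```

The product of the survival ratios telescopes to $\Phi(n\delta t)$, so the model reproduces the wait-time distribution of the coarse-grained process exactly.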

B. Quantum model memory as an integral kernel
The steady state $\rho$ of the quantum model memory is given by a mixture of the quantum memory states, weighted by their probability of occurrence [37]:

$\rho = \int_0^\infty \pi(\overleftarrow{t}_0)\,|\varsigma_{\overleftarrow{t}_0}\rangle\langle\varsigma_{\overleftarrow{t}_0}|\,d\overleftarrow{t}_0. \quad (4)$

The rank of $\rho$ corresponds to the dimension required by the memory substrate to support the range of quantum memory states. This is given by the number of non-zero elements in the spectrum of $\rho$, which can be found from the characteristic equation

$\int_0^\infty \rho(t, t')\,f_\lambda(t')\,dt' = \lambda\,f_\lambda(t). \quad (5)$

¹ Note that in principle an arbitrary, time-dependent complex phase can be added to $\psi(t)$; provided that $|\psi(t)|^2 = \phi(t)$ the model will still yield the correct statistics, albeit with a potentially different memory cost. We return to this point later.
This has the form of a homogeneous Fredholm integral equation of the second kind [51], with kernel $\rho(t, t') := \langle t|\rho|t'\rangle = \mu \int_0^\infty \psi(s+t)\,\psi(s+t')\,ds$. We are thus in a position to leverage results from Fredholm theory to reveal properties of the spectrum $\{\lambda\}$ of $\rho$. Most pertinently, if $\rho$ represents a degenerate kernel, wherein it can be expressed as $\rho(t, t') = \sum_{j=1}^N \alpha_j(t)\beta_j(t')$ for some finite integer $N$ and set of functions $\{\alpha_j, \beta_j\}$, then the spectrum has at most $N$ non-zero elements [51]. Consequently, the memory states can be stored within an $N$-dimensional space. However, the general form of $\rho$ as per Eq. (4) does not readily present as a degenerate kernel, and indeed, exact quantum models of renewal processes often require an infinite-dimensional memory space. Nevertheless, the amount of information retained in the memory about the past of the process typically appears to be finite [37], suggesting many of these dimensions are barely utilised and motivating the pursuit of a lossy -- yet still near-exact -- compression method. A suggestive path to such compression is to truncate $\rho$ by removing the dimensions corresponding to elements of its spectrum that are sufficiently small (as the $\{\lambda\}$ represent the occupation probabilities of the eigenstates of $\rho$). However, this impacts upon the transition structure of the model, rendering it non-physical. An approach with greater finesse is needed, which we now provide.
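The degenerate-kernel statement can be checked numerically: discretising a kernel of the form $\sum_j \alpha_j(t)\beta_j(t')$ on a fine grid always yields a matrix of numerical rank at most $N$, regardless of the grid size (a sketch with illustrative functions of our own choosing):

```python
import numpy as np

# A degenerate kernel rho(t,t') = sum_{j=1}^{N} alpha_j(t) beta_j(t') has at
# most N non-zero spectral elements, so its discretisation on any grid has
# numerical rank at most N.
N = 3
t = np.linspace(0.0, 10.0, 400)
decay = np.array([0.5, 1.0, 2.0])          # illustrative rates
alpha = np.exp(-np.outer(decay, t))        # alpha_j(t) = exp(-decay_j t)
beta = t * np.exp(-np.outer(decay, t))     # beta_j(t') = t' exp(-decay_j t')
rho = alpha.T @ beta                       # 400 x 400 kernel matrix

rank = np.linalg.matrix_rank(rho, tol=1e-8)
```

Despite the 400-point grid, the numerical rank equals $N = 3$, so the memory states supported by this kernel fit in a 3-dimensional space.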

C. Exponential sums and lossy compression
Rather than taking an existing exact model and introducing lossy distortion to effect compression, we will instead construct a distortion of the underlying process that is amenable to simulation by a model with a memory of low dimension. The intent is that the exact model of the distorted process forms a near-exact, compressed model of the original process.
This requires us to identify what features the wait-time distribution must possess to permit a finite-dimensional exact model -- in other words, to identify the constraints on $\phi(t)$ such that it leads to $\rho(t, t')$ taking the form of a degenerate kernel. Let us begin by introducing the kernel $\kappa(t, t') := \psi(t + t')$, in terms of which $\rho(t, t') = \mu \int_0^\infty \kappa(t, s)\,\kappa(s, t')\,ds$. It then follows that the spectrum of $\kappa(t, t')$ is $\{\sqrt{\lambda/\mu}\}$, and it is thus of the same rank as $\rho(t, t')$ [51]. This reduces the problem to identifying the conditions under which $\kappa(t, t')$ is a degenerate kernel. These are then the processes for which we can express $\psi(t)$ as a finite sum of functions $F_j(t)$ that satisfy $F_j(t + t') = \alpha_j(t)\beta_j(t')$. We can readily identify the appropriate functions as being (complex) exponentials, i.e., $F_j(t) = c_j \exp(z_j t)$ for some $(c_j, z_j) \in \mathbb{C}^2$. Thus, for $\psi(t) = \sum_{j=1}^N c_j \exp(z_j t)$ we correspondingly have at most $N$ non-zero eigenvalues of the kernel $\kappa(t, t')$.
Though we began by assuming $\psi(t)$ is real, if we allow it to be complex we instead have $\phi(t) = |\psi(t)|^2$, and $\rho(t, t') = \mu \int_0^\infty \psi(s+t)\,\psi^*(s+t')\,ds$. Notice that even when $\psi(t) = \sum_{j=1}^N c_j \exp(z_j t)$ is complex, it can be verified through direct substitution that $\rho(t, t')$ remains a degenerate kernel of at most rank $N$. Thus, with an $N$-dimensional memory it is possible to model renewal processes for which

$\phi(t) = \Big|\sum_{j=1}^N c_j \exp(z_j t)\Big|^2. \quad (6)$

Let us decompose $z_j := -\gamma_j + i\omega_j$ for $(\gamma_j, \omega_j) \in \mathbb{R}^2$. For $\phi(t)$ to be a valid distribution it must be normalisable to unity, and thus we can constrain $\gamma_j \in \mathbb{R}^+$. The complex exponentials $\exp(-zt)$ form an overcomplete basis into which any piecewise-continuous function of finite exponential order can be decomposed, where the overlaps of the function with the basis elements are described by its Laplace transform. Thus, for any $\psi(t)$ that is piecewise continuous and of finite exponential order we can express the corresponding wait-time distribution in the form of Eq. (6), albeit with $N$ not necessarily finite.
Nevertheless, this provides a constructive approach to finding lossy compressions for quantum models of renewal processes. The goal is to find exponential sums with a finite number of terms that provide a high-fidelity approximation to $\psi(t)$. In practice, it has been found that such decompositions can achieve accurate reconstructions of a function with a relatively small number of terms. Moreover, there are systematic approaches to obtaining such decompositions.² From the decomposition we are then able to build an exact model of the approximate process, to effect a near-exact model of the original process.
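As an illustrative sketch of one such systematic approach, Prony's method recovers the exponents and weights of a finite exponential sum from uniform samples (the examples later in this paper instead use the Beylkin-Monzón method [52]; the function below and its test data are ours):

```python
import numpy as np

def prony(samples, N, dt):
    """Recover z_j, c_j with f(t) ~ sum_j c_j exp(z_j t) from uniform
    samples f(n*dt), n = 0 .. len(samples)-1, assuming N terms."""
    M = len(samples)
    # Linear prediction: f(n+N) + sum_k a_k f(n+k) = 0 defines a polynomial
    # x^N + a_{N-1} x^{N-1} + ... + a_0 whose roots are exp(z_j * dt).
    A = np.column_stack([samples[k:M - N + k] for k in range(N)])
    a = np.linalg.lstsq(A, -samples[N:M], rcond=None)[0]
    roots = np.roots(np.concatenate(([1.0], a[::-1])))
    z = np.log(roots.astype(complex)) / dt
    # Recover the weights c_j by linear least squares on the samples.
    V = np.exp(np.outer(np.arange(M) * dt, z))
    c = np.linalg.lstsq(V, samples.astype(complex), rcond=None)[0]
    return z, c

# Exact two-term test function: f(t) = 2 e^{-t} + 0.5 e^{-3t}
dt = 0.1
t = np.arange(12) * dt
f = 2.0 * np.exp(-t) + 0.5 * np.exp(-3.0 * t)
z, c = prony(f, 2, dt)
f_rec = (np.exp(np.outer(t, z)) @ c).real
```

For exact data the method recovers the exponents $z_j$ essentially to machine precision; for noisy or approximate data, more robust variants (or the Beylkin-Monzón approach) are preferable.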
The last step remaining is to find an explicit encoding of the memory states of the approximate model into a finite-dimensional memory space. Beginning from a (normalised) approximate decomposition $\tilde{\psi}(t) = \sum_{j=1}^N \tilde{c}_j \exp((-\gamma_j + i\omega_j)t)$, we assign $N$ 'generator' states $\{|\varphi_j\rangle\}$ and a unitary operator $\tilde{U}_{\delta t}$ with the evolution³

$\tilde{U}_{\delta t}|\varphi_j\rangle|0\rangle = e^{(-\gamma_j + i\omega_j)\delta t}|\varphi_j\rangle|0\rangle + \sqrt{1 - e^{-2\gamma_j \delta t}}\,|\tilde{\varsigma}_0\rangle|1\rangle, \quad (7)$

in analogy with Eq. (3). Here, we have defined $|\tilde{\varsigma}_0\rangle \propto \sum_{j=1}^N \tilde{c}_j |\varphi_j\rangle$, which forms the reset state corresponding to $\overleftarrow{t}_0 = 0$, with the rest of the quantum memory states $\{|\tilde{\varsigma}_{\overleftarrow{t}_0}\rangle\}$ implicitly defined by acting $\tilde{U}_{\delta t}$ with the ancilla a sufficient number of times, postselected on all measurement outcomes being 0, i.e., $|\tilde{\varsigma}_{n\delta t}\rangle \propto \langle 0|\big((I \otimes |0\rangle\langle 0|)\tilde{U}_{\delta t}\big)^n|\tilde{\varsigma}_0\rangle|0\rangle$. Non-normalised, these states can also be expressed as $|\tilde{\varsigma}_t\rangle \propto \sum_{j=1}^N \tilde{c}_j e^{(-\gamma_j + i\omega_j)t}|\varphi_j\rangle$. The overlaps of the generator states can be obtained [30,38] from the recursive relations $\langle\varphi_j|\varphi_k\rangle = \langle\varphi_j|\langle 0|\tilde{U}^\dagger_{\delta t}\tilde{U}_{\delta t}|\varphi_k\rangle|0\rangle$, from which we can move from their implicit definition to express them explicitly in terms of an $N$-dimensional set of orthonormal basis states using a reverse Gram-Schmidt procedure [53]. The relevant columns of $\tilde{U}_{\delta t}$ are defined implicitly by Eq. (7) and can now readily be expressed explicitly in this basis; the remaining columns can be assigned arbitrarily, provided they preserve orthonormality of the basis states (by using, e.g., a Gram-Schmidt procedure) [38].
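To make the reverse Gram-Schmidt step concrete: solving the recursive overlap relation for the evolution above (assuming a normalised reset state) gives the Gram matrix of the generator states in closed form, $G_{jk} = \sqrt{(1-e^{-2\gamma_j\delta t})(1-e^{-2\gamma_k\delta t})}\,/\,(1 - e^{(z_j^* + z_k)\delta t})$, and a Cholesky factorisation $G = LL^\dagger$ then yields explicit coordinates in an orthonormal basis (a sketch; the $z_j$ values below are hypothetical illustrative parameters):

```python
import numpy as np

dt = 0.01
z = np.array([-1.0 + 2.0j, -0.5 - 1.0j, -2.0 + 0.0j])  # z_j = -gamma_j + i*omega_j
gamma = -z.real

# Closed-form Gram matrix of the generator states
d = np.sqrt(1.0 - np.exp(-2.0 * gamma * dt))
G = np.outer(d, d) / (1.0 - np.exp((z.conj()[:, None] + z[None, :]) * dt))

# Reverse Gram-Schmidt via Cholesky: rows of conj(L) give the coordinates
# of |phi_j> in an orthonormal basis, reproducing all pairwise overlaps.
L = np.linalg.cholesky(G)
phi_vectors = L.conj()
```

Note that the diagonal of $G$ is identically 1, confirming that the generator states are normalised; the matrix is a genuine Gram matrix (it expands as a sum of rank-one positive terms), so the Cholesky factorisation exists for distinct $z_j$.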
This constructs a lossy compression of the quantum memory states, yielding a near-exact model of the process. The steps are summarised in Algorithm 1.

IV. EXAMPLES
As a demonstration of the efficacy of our quantum compression protocol, we apply it to the modelling of two example renewal processes. For each process we show how the quantum models quickly converge on high-fidelity approximations of the original processes with only a comparatively small memory dimension. Our approximate exponential sums are found using the method of Beylkin and Monzón [52], summarised in Appendix A.
We quantify the goodness-of-fit using a Kolmogorov-Smirnov (KS) statistic [54], which is defined as the maximum difference between the cumulative distribution functions of two probability distributions. This allows us to compare how well discrete distributions approximate continuous distributions, as the cumulative distribution function can be extended over a continuum. That is, let $C_p(t) = \int_0^t p(t')\,dt'$ be the cumulative distribution function of a continuous distribution $p(t)$, and $C_q(t) = \sum_{n=0}^{\max\{N : N\delta t < t\}} q(n\delta t)$ the continuum form of the cumulative distribution of a discrete distribution $q(n\delta t)$. The KS statistic is then given by $\mathrm{KS}(p, q) = \max_t |C_p(t) - C_q(t)|$. For a renewal process the survival probability $\Phi(t) = 1 - C_\phi(t)$, and so the KS statistic here also corresponds to the maximum difference between the survival probabilities of the exact and approximate processes at any time: $\mathrm{KS}(\phi, \tilde{\phi}) = \max_t |\Phi(t) - \tilde{\Phi}(t)|$. Thus, the KS statistic as employed here measures the largest cumulative divergence between the statistics of the approximate model and the exact process.
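A minimal sketch of this KS computation: for a nondecreasing continuous CDF, the step-function $C_q$ attains its maximum deviation at the jump points, so only grid endpoints need checking (the example process and discretisation below are illustrative choices of ours):

```python
import numpy as np

def ks_statistic(C_p, q, dt):
    """Max over t of |C_p(t) - C_q(t)|, where C_q is the step-function CDF
    of a discrete distribution q[n] supported on t = n*dt."""
    S = np.cumsum(q)                        # C_q value on (n*dt, (n+1)*dt]
    n = np.arange(len(q))
    left = np.abs(C_p(n * dt) - S)          # just after the jump at n*dt
    right = np.abs(C_p((n + 1) * dt) - S)   # at the end of the interval
    return float(np.maximum(left, right).max())

# Example: a rate-gamma Poisson process, discretised exactly into bins of
# width dt; the KS statistic then reduces to the largest single-bin mass.
gamma, dt, n_max = 1.0, 0.05, 2000
C_exp = lambda t: 1.0 - np.exp(-gamma * np.asarray(t, dtype=float))
n = np.arange(n_max)
q = C_exp((n + 1) * dt) - C_exp(n * dt)
ks = ks_statistic(C_exp, q, dt)
```

For this exact discretisation the KS statistic equals $1 - e^{-\gamma\delta t}$, vanishing in the continuous limit $\delta t \to 0$.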
We compare our quantum models to approximate classical models constrained to a classical memory of the same dimension. These classical models are constructed by discretising the process into finite-sized time-steps and using gradient descent [55] to fit the parameters, taking the KS statistic as a cost function (see Appendix B). While we do not claim this to be the optimal lossy classical compression, we believe it to provide a fair indicator of the potential performance of classical compression methods for this task.

A. Alternating Poisson process
As a first example, we consider an alternating Poisson process. The output can be described by a sequential series of Poisson processes, with an event on these underlying processes alternately coinciding with events or non-events of the alternating Poisson process (non-events of the Poisson processes also correspond to non-events of the alternating Poisson process). The corresponding wait-time distribution is given by $\phi(t) = \gamma^2 t e^{-\gamma t}$, where the rate $\gamma$ sets an arbitrary scale for units of time. This is the continuous-time analogue of the so-called simple non-unifilar source process [43]. Like its discrete-time counterpart, while appearing simple to generate it has no finite-dimensional exact causal classical representation [28]; it is thought that an exact causal quantum model is similarly structurally complex. Using our compression protocol, we observe excellent performance in replicating the statistics of the alternating Poisson process with low-dimensional quantum models. As can be seen in Fig. 3(a), even a single-qubit memory provides a close approximation to the exact wait-time distribution, and a two-qubit memory is seemingly indistinguishable at the resolution shown. In Fig. 3(b) we compare the performance of our coarse-grained quantum models with the classical approximations, as well as a memoryless model. We see that the quantum models bear a KS statistic orders of magnitude smaller than that of the corresponding classical models, and moreover, appear to exhibit a more favourable scaling with increasing memory.
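Assuming both underlying Poisson stages share the same rate $\gamma$, the visible wait time is the sum of two independent exponential waits; a quick numerical check that the self-convolution of the exponential density produces the Erlang-2 form $\gamma^2 t e^{-\gamma t}$:

```python
import numpy as np

# The wait time of the alternating process is a sum of two independent
# exponential waits (assuming equal rates gamma for both stages), so its
# density is the self-convolution of the exponential density.
gamma, dt = 1.0, 2e-3
t = np.arange(0.0, 20.0, dt)
expo = gamma * np.exp(-gamma * t)
conv = np.convolve(expo, expo)[: len(t)] * dt  # Riemann-sum convolution
erlang2 = gamma**2 * t * np.exp(-gamma * t)    # candidate wait-time density
```

The discretisation error of the Riemann-sum convolution is of order $\gamma^2\,\delta t$, so the two curves agree to within the grid resolution.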

B. Bimodal Gaussian process
For the second example we find compressed models of a bimodal Gaussian process. The wait-time distribution consists of the sum of two displaced Gaussian peaks: $\phi(t) \propto \sum_{j=1}^2 \frac{p_j}{\sqrt{2\pi\sigma_j^2}} \exp\big(-\tfrac{(t-\mu_j)^2}{2\sigma_j^2}\big)$. As with the previous example, the units of time are arbitrary, and can be set through the $\sigma_j$. We consider the case where the two peaks have equal weight ($p_1 = p_2$) and equal spread ($\sigma_1 = \sigma_2 =: \sigma$). In units where $\sigma = 1$, we then take $\mu_1 = \sqrt{5}$ and $\mu_2 = \sqrt{33.8}$. This leads to little overlap between the two peaks, requiring a model that can capture features at both short and long timescales in order to account for the two regions of high event probability and the low-probability trough between them.
As can be seen in Fig. 4(a), our coarse-grained models struggle to fully capture the features with one and two qubit memories, with the former overweighting the first peak, and the latter the second. With a three qubit memory however, the model closely follows the exact process. This is reflected in the KS statistic [ Fig. 4(b)], where there is a drastic decrease when going from two qubits to three. This is possibly due to the method used here to construct the approximate exponential sum: rather than fixing the maximum allowed number of terms in advance, the method instead constructs a sum with a large number of terms and then afterwards truncates to those with the largest weight. In this case, we find that the terms lost to truncation are not always negligible. This motivates future consideration of alternative methods for constructing approximate exponential sums that begin with the constraint of a maximum allowed number of terms, in order to make best use of the available memory resources. Nevertheless, we still see that our coarse-grained quantum models significantly outperform classical models with only a small number of qubits.

V. COSTLY FEATURES?
We have seen that the quantum compression protocol performs well on the two examples above. This raises the question of how well it performs in general, and for which processes it will show the weakest performance. Ultimately, the accuracy of the model comes down to how good an approximation the finite exponential sum is of the wait-time distribution -- or conversely, the dimension required by the model depends on how few terms are needed in the sum to reach a desired precision -- as the compressed model will (experimental imperfections aside) provide an exact model of this approximation of the wait-time distribution. In this sense, the performance of our compression protocol comes down to how well the method used to construct an approximate exponential sum performs. For the particular algorithm used in our examples we refer the reader to the discussion in the associated literature [52,56], also noting that the authors find even better performance in practice than indicated by their bounds.
Nevertheless, we can find a useful heuristic in the information cost of the exact quantum model of the process -- once the logarithm of the dimension drops below the information cost (i.e., once the capacity of the memory is lower than the information required for exact modelling), the compressed model must throw away useful information, limiting the accuracy it can achieve. Correspondingly, we can expect the performance of the quantum compression to be inversely correlated with the information cost of exact quantum modelling.
We can also deduce the features that would be most stubborn to compress. Consider our discussion above comparing the exponential sum with expressing the function in the Laplace basis. Given that we want our sum to have as few terms as possible, problematic functions are those that are highly localised, as they have large spread in the Laplace basis. Indeed, the ultimate limit of this -- $\delta$-functions -- represents deterministic renewal processes; such processes do not allow a quantum advantage even in information cost in exact compression settings [34,37]. In Appendix C we provide a case study of the performance of our quantum compression protocol applied to a series of top-hat wait-time distributions of decreasing width. These processes represent increasingly accurate models of ideal clocks [48,57], and are also similarly difficult for classical compression methods. More generally, processes dominated by such sharp peaks are resistant to quantum compression in the information cost [37], and so can be expected to also present difficulties for methods of compressing the memory dimension such as ours.

VI. DEPLOYMENT WITH GENERAL CONTINUOUS-TIME STOCHASTIC PROCESSES
A. Generalising the protocol

Algorithm 1 -- our protocol for compressing quantum models of renewal processes -- can be adapted to compress the temporal aspect of quantum models of general continuous-time processes with multiple modes and events [39,41].
Consider such a process with modes $g \in \mathcal{G}$, events $x \in \mathcal{X}$, and a transition dynamic $\Lambda$. The dynamic $\Lambda$ effects an evolution according to $P(X, G'|G, \overleftarrow{T}_0)$, describing the probability density of an event $x$ occurring, accompanied by a transition into mode $g'$, in the next infinitesimal interval $dt$, given that the system is currently in mode $g$ with time $\overleftarrow{t}_0$ since the last event. Following the corresponding literature on memory-minimal classical models [41] we assume an HSMM representation of the process where the subsequent mode is uniquely determined by $(g, x)$ -- independent of $t_0$. This is a slightly stronger condition than strictly necessary for the model to be causal, and we discuss its relaxation later.
Along with the modal wait-time distributions Σ_{x,g'} P(x, g'|g) φ^x_{gg'}(t), we can define a corresponding modal survival probability Φ_g(t) = Σ_{x,g'} ∫_t^∞ P(x, g'|g) φ^x_{gg'}(t') dt' [39]. From these one can then define a set of quantum memory states {|ς_{g,t₀}⟩} and a corresponding evolution U_δt [39]. We are now in a position to generalise Algorithm 1 for such processes. It transpires that this is for the most part simply a case of repeating the steps for renewal processes for each of the dwell functions.
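The modal survival probability defined above lends itself to a direct numerical sketch. In the following, the dwell densities and transition probabilities are hypothetical illustrations (two exponential laws out of a single mode g), not the processes studied in the paper:

```python
import numpy as np

# Sketch: modal survival probability Phi_g(t) = sum_{x,g'} P(x,g'|g) int_t^inf phi(t')dt'.
# Dwell densities phi^x_{gg'}(t) and probabilities P(x,g'|g) here are hypothetical.
t = np.linspace(0.0, 20.0, 4001)

phi = {("x", "gA"): np.exp(-t),               # dwell density for event x into mode gA
       ("y", "gB"): 2.0 * np.exp(-2.0 * t)}   # dwell density for event y into mode gB
P = {("x", "gA"): 0.3, ("y", "gB"): 0.7}      # symbolic probabilities, summing to one

def survival(t, phi, P):
    Phi = np.zeros_like(t)
    for key, density in phi.items():
        # Cumulative integral int_0^t phi(t')dt' by the trapezoid rule.
        cum = np.concatenate(
            ([0.0], np.cumsum(0.5 * np.diff(t) * (density[1:] + density[:-1]))))
        Phi += P[key] * (cum[-1] - cum)  # tail integral int_t^inf (truncated at t_max)
    return Phi

Phi_g = survival(t, phi, P)  # Phi_g(0) ~ 1, monotonically decreasing
```

As expected of a survival probability, Φ_g(0) ≈ 1 and the function decreases monotonically as probability mass flows into events.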
To generalise Steps 1 and 2, we define a function ψ^x_{gg'}(t) := √(φ^x_{gg'}(t)) for each of the dwell functions and, analogous to the case of renewal processes, approximate each of them by a finite exponential sum ψ̃^x_{gg'}(t). Generalising Steps 3 to 5, we then similarly use these to construct a set of generator states {|ϕ^x_{gg'j}⟩}, again defined implicitly in terms of an evolution operator, with memory states analogously defined as linear combinations of these generator states (Eq. (14)). This implicit definition can be used to determine the overlaps of the generator states, from which a reverse Gram-Schmidt procedure can be used to express them explicitly in terms of (at most) N|X||G| orthonormal basis states⁵. In turn, the evolution operators and memory states may be expressed in this basis, completing the protocol.
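The overlaps-to-orthonormal-basis step can be sketched as follows. Purely for illustration, we assume generator states proportional to decaying complex exponentials exp((−γ_j + iω_j)t) on [0, ∞), for which the pairwise L² overlaps have a closed form; the parameter values are hypothetical, and a Cholesky factorisation of the Gram matrix plays the role of the reverse Gram-Schmidt procedure:

```python
import numpy as np

# Sketch: from pairwise overlaps of non-orthogonal generator states to explicit
# coordinates in an orthonormal basis. States assumed (hypothetically) to be
# proportional to exp((-gamma_j + i*omega_j) t); parameters are illustrative.
gamma = np.array([1.0, 2.0, 3.0])
omega = np.array([0.0, 0.5, -0.5])

# Gram matrix G[j,k] = int_0^inf conj(f_j(t)) f_k(t) dt
#                    = 1 / ((gamma_j + gamma_k) + i(omega_j - omega_k))
G = 1.0 / (gamma[:, None] + gamma[None, :]
           + 1j * (omega[:, None] - omega[None, :]))
d = np.sqrt(np.real(np.diag(G)))
G = G / np.outer(d, d)  # normalise each state to unit norm

# Cholesky G = L L^dagger: state j has coordinates conj(L[j, :]) in an
# orthonormal basis of dimension at most N (here N = 3).
L = np.linalg.cholesky(G)
coords = L.conj()

recon = coords.conj() @ coords.T  # reconstructed overlaps <v_j|v_k>
```

The coordinate vectors reproduce the original Gram matrix exactly, so any operator expressed implicitly through the generator states can now be written in this explicit finite basis.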
We remark on a useful feature of this compression: the modal wait-time distributions maintain their structure as a product of symbolic dynamics and a temporal component, with only the latter factor modified. That is, the compressed quantum models have the statistics of a process with the same transition topology, but now with modal wait-time distributions Σ_{x,g'} P(x, g'|g) φ̃^x_{gg'}(t), where φ̃^x_{gg'}(t) = |ψ̃^x_{gg'}(t)|². This distortion introduces errors only in the times at which events occur, and not in the probabilities with which they occur. Moreover, the product structure entails that the distortion in the statistics of the compressed quantum model is no greater than the worst of the distortions of the φ^x_{gg'}(t), and that the errors in each inter-event interval are independent. Thus, the performance of the protocol seen in the renewal process examples will still hold in this generalised setting. The memory of the resultant quantum model will be compressed to at most N|X||G| dimensions.
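The relation φ̃(t) = |ψ̃(t)|² can be sketched directly: an L²-normalised exponential sum always yields a valid (non-negative, unit-mass) wait-time density. The coefficients below are hypothetical:

```python
import numpy as np

# Sketch: compressed wait-time density phi_tilde(t) = |psi_tilde(t)|^2 from a
# finite exponential sum. Weights, decay rates and frequencies are hypothetical.
t = np.linspace(0.0, 1.0, 2001)
dt = t[1] - t[0]
c = np.array([1.2, -0.4])       # weights c_j
gamma = np.array([2.0, 6.0])    # decay rates gamma_j (positive)
omega = np.array([0.0, 3.0])    # frequencies omega_j

psi = (c[:, None] * np.exp((-gamma + 1j * omega)[:, None] * t[None, :])).sum(axis=0)
psi /= np.sqrt(np.sum(np.abs(psi) ** 2) * dt)  # enforce unit L2 norm
phi_tilde = np.abs(psi) ** 2                   # non-negative density of unit mass
```

Whatever the truncation error in ψ̃, the resulting φ̃ remains a proper probability density, which is why the distortion is confined to event timing rather than event probabilities.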

B. Example
As an illustration of how the general case is little more than a straightforward application of Algorithm 1 multiple times, we apply it to an example process whose dwell functions are based on the examples above. Specifically, the process has two modes g_A, g_B and two possible events x, y, with the dwell function of both modes corresponding to an alternating Poisson process for event x and a bimodal Gaussian process for event y, and a transition structure such that the mode changes on event x and remains constant on event y; the probabilities of each event differ between the two modes. This is depicted as a HSMM in Fig. 5(a).
We measure the error in the accuracy as the average KS statistic, where the average is taken over events (for simplicity we scale all dwell functions to have the same mean firing rate, such that this also essentially corresponds to the average over time). That is, the average KS statistic is KS̄ := Σ_{x,g,g'} KS(φ^x_{gg'}(t), φ̃^x_{gg'}(t)) P(x, g). It is possible to apply the KS statistic in this way as the errors are constrained to a single inter-event interval, and there is no crossover of errors between the dwell functions of different events. Moreover, we need not calculate the approximations of the dwell functions anew: the approximations (and corresponding errors) found in Section IV are the very same approximations needed. In Fig. 5(b) we plot this for the full (p, q) parameter range for N = 8 (requiring 32 memory dimensions in total). Of note are the limits p = q = 0 (corresponding to only the alternating Poisson process) and p = q = 1 (corresponding to only the bimodal Gaussian process), where the errors take on their minimum and maximum respectively, matching those found for the renewal processes; the error for the remainder of the parameter space interpolates between these two limits. Note that we neglect the extra dimensions made available by linear dependencies of generator states at the exceptional parameter regimes p = 0, 1, q = 0, 1, and p = q.

⁵ It is possible that some generator states may be linear combinations of those belonging to other modes. This does not break the protocol, though it will result in some memory dimensions being left unused. It may then be possible to use these dimensions to incorporate additional terms into the approximate exponential sums to increase their accuracy.
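The average KS statistic is a probability-weighted average of per-event KS statistics. The following sketch uses hypothetical exponential densities as stand-ins for the exact and compressed dwell functions, rather than the actual processes of Fig. 5:

```python
import numpy as np

# Sketch: average KS statistic KS_bar = sum_events P(event) * KS(phi, phi_tilde).
# The density pairs and event probabilities below are hypothetical stand-ins.
t = np.linspace(0.0, 10.0, 2001)
dt = t[1] - t[0]

def ks(p, q):
    # KS statistic: maximum absolute difference between the two CDFs.
    return np.max(np.abs(np.cumsum(p - q) * dt))

events = {  # event -> (exact density, compressed density, event probability)
    "x": (np.exp(-t), 1.05 * np.exp(-1.05 * t), 0.4),
    "y": (2.0 * np.exp(-2.0 * t), 1.9 * np.exp(-1.9 * t), 0.6),
}
ks_bar = sum(w * ks(p, q) for p, q, w in events.values())
```

Because errors do not propagate across inter-event intervals, this per-event weighting fully characterises the distortion of the compressed model.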
C. Scope for improvement?
Above, we have followed the classical condition that the HSMM representation is such that the symbol and current mode alone determine the next mode. Yet, the quantum models described in Eq. (11) still function correctly, and remain causal, with only the weaker condition on the HSMM representation that the triple (g, x, t₀) suffices to determine the subsequent mode. That is, emission of a given symbol from a given mode can result in a transition to two (or more) different possible modes, provided that also knowing the time spent in the current mode provides sufficient information to determine the next mode. In other words, the classical convention requires H(G'|G, X) = 0, while the exact quantum models assume only that H(G'|G, X, T₀) = 0 (here, H(·) is the Shannon entropy [58]). An example of such a transition satisfying only the weaker condition is illustrated in Fig. 6(a). However, in the case where only this weaker condition holds, there can be interference between the generator states corresponding to transitions with the same symbol and initial mode, but different end modes. This manifests from errant overlaps of the approximate dwell functions φ̃^x_{gg'}(t): while ∫₀^∞ φ^x_{gg'}(t) φ^x_{gg''}(t) dt = 0 for all x, g, g', g'' with g' ≠ g'', this may not hold true for the φ̃^x_{gg'}(t). That is, there may be times t for which φ̃^x_{gg'}(t) and φ̃^x_{gg''}(t) (g' ≠ g'') are simultaneously non-zero, violating the condition H(G'|G, X, T₀) = 0. Such a violation cannot occur under the stronger classical condition, as there we are already guaranteed that there is at most one g' for each pair (g, x) for which φ^x_{gg'}(t) (and thus φ̃^x_{gg'}(t)) is not zero everywhere. As an example, consider a process where the dwell time associated with mode g and event x is uniformly distributed over the interval [0, τ], with the system transitioning into mode g' if the dwell time is less than τ/2, and into g'' if it is greater than (or equal to) τ/2.
Then, φ^x_{gg'}(t) is a uniform distribution over [0, τ/2), and φ^x_{gg''}(t) a uniform distribution over [τ/2, τ]. When we pass this through the compression protocol, the approximated distributions have a non-zero overlap, and so have interfering probability amplitudes. This is illustrated in Fig. 6(b) for N = 16.
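The mechanism can be sketched numerically. Here we use logistic smoothing as a hypothetical stand-in for the exponential-sum approximation: the exact densities on [0, τ/2) and [τ/2, τ] are disjoint, but any smooth approximation leaks across t = τ/2, so the amplitudes acquire a non-zero overlap:

```python
import numpy as np

# Sketch: overlap of smoothed approximations to two disjoint uniform dwell
# densities. Logistic smoothing is a hypothetical stand-in for the actual
# exponential-sum approximation used by the protocol.
tau = 1.0
t = np.linspace(0.0, tau, 4001)
dt = t[1] - t[0]

def smooth_step(x, k=50.0):
    return 1.0 / (1.0 + np.exp(-k * x))

phi1 = smooth_step(t) * smooth_step(tau / 2 - t)        # ~ uniform on [0, tau/2)
phi2 = smooth_step(t - tau / 2) * smooth_step(tau - t)  # ~ uniform on [tau/2, tau]
phi1 /= np.sum(phi1) * dt
phi2 /= np.sum(phi2) * dt

# Exact densities have zero overlap; the smoothed amplitudes sqrt(phi) do not.
overlap = np.sum(np.sqrt(phi1 * phi2)) * dt
```

The sharper the approximation (larger k here, or more exponential terms in the protocol), the smaller this residual overlap, but it never vanishes exactly at finite resolution.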
This interference requires us to modify the symbolic transition probabilities P(X, G'|G) to an approximate form P̃(X, G'|G) in order to appropriately normalise the memory states, which will correspondingly distort the transition structure. In particular, it can result in the model transitioning to superpositions of memory states, manifesting new (potentially infinitely many) effective modes. While these effective modes do not require additional memory dimensions to track (as they are linear combinations of existing memory states), they do allow for a gradual accumulation of errors over time, as the errors are now able to propagate across multiple inter-event intervals. A further complication is presented by the freedom of choice in how to actually assign P̃(X, G'|G) to enforce proper normalisation: while a simple rescaling of P(X, G'|G) would work, it is also possible to achieve this with an uneven rescaling, which may result in greater accuracy by offsetting the effect of the interference.
Note that the magnitude of the interference scales with the overlaps of the memory states for each mode, and hence the overlaps of their statistics: thus, the more distinguishable the statistics of the modes, the smaller the distortion. Further, as noted above, under the stronger condition imposed on classical models these overlaps cannot occur, and thus when compressing a given such classical model we can sidestep such interference. Nevertheless, embracing the weaker condition may unlock even greater compression potential; we leave the optimisation of the P̃(X, G'|G) in such settings as an open question for future work.

VII. DISCUSSION
We have introduced a lossy compression protocol for the quantum modelling of stochastic temporal dynamics. By harnessing non-classical features of quantum state spaces, namely that sets of quantum states can be at once linearly dependent and non-degenerate, an effective coarse-graining of the state space inhabited by a quantum memory can be realised. This achieves a much greater compression than is possible with analogous classical methods and exact quantum compression alike. The relaxation from exact to near-exact replication naturally fits applications where the dynamics of the system to be modelled have been inferred through observation [59,60], and are thus already an approximation of the true dynamics. This also brings the additional benefit of placing less demand on the precision of the quantum processor implementing the simulation, which in current realistic settings should not be assumed noiseless.
Going forwards, our work encourages the development of similar lossy compression beyond tracking the temporal component of stochastic processes. For example, the framework can be applied to compress quantum clocks [48,57], and motivates the extension to other models with continuous state spaces, such as belief spaces [61,62] used in reinforcement learning [63]. Further avenues include development of analogous methods for compressed modelling of purely symbolic dynamics and input-output processes [64].
Furthermore, in spite of the significant compression advantage offered by our protocol, it is by no means optimal. Two aspects we foresee as presenting opportunities for enhancing the compression are the choice of algorithm for constructing an approximate exponential sum, and allowing for more general complex ψ(t) to be considered. Pursuing the former may allow for more faithful approximations of the wait-time distribution without increasing the number of allowed states.
With the latter, we have a family of functions we can attempt to approximate, and need only take the one we can most faithfully represent. In the case of general continuous-time processes, the question remains open of how best to handle cases where the classical condition that the dynamics factor into a product of temporal and symbolic dynamics does not hold. Further improvements in this regime may also be found by taking a more holistic approach that coarse-grains the Hilbert space in terms of symbolic and temporal dynamics simultaneously. Nevertheless, even in this initial foray, we see the potential for drastic improvement over classical techniques. Moreover, the high fidelities reached with comparatively few dimensions place our protocol well within reach of current and near-term small-scale quantum processors with only a handful of qubits [33,65], offering exciting prospects for imminent experimental realisations.
Appendix A: Approximate exponential sums

In Algorithm 1, Step 2 requires that we construct an exponential sum approximating the square root of the wait-time distribution. Here, we use the method of Beylkin and Monzón [52], summarised in Algorithm 2.

Algorithm 2
Approximate exponential sum [52]

Inputs: Exact function ψ(t), with the domain scaled such that the region to be approximated is the interval [0, 1]; target precision ε.
Outputs: Set of triples {(c_j, γ_j, ω_j)} yielding the approximate decomposition ψ(t) ≈ Σ_{j=1}^N c_j exp((−γ_j + iω_j)t).

For our purposes, to obtain an N-term approximate sum we keep only the triples with the N largest weight magnitudes {|c_j|}. Prior to this we also discard any terms with non-positive γ_j (to ensure a valid quantum state of the form of Eq. (2) can be constructed), and rescale the weights by a constant factor to ensure the sum has unit L² norm. By varying the precision ε we obtain different decompositions, with truncation to fewer terms favouring larger ε, and conversely, larger numbers of terms performing better with smaller ε. In our examples, we took M = 1000 and varied ε to find the most accurate decomposition for each N (according to the KS statistic), ultimately using values in the range 10⁻¹² to 10⁻¹. We used GNU Octave's roots function [66] to numerically solve the polynomials, and c = (VᵀV)⁻¹Vᵀh to solve the overconstrained Vandermonde system.
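The final weight-recovery step can be sketched as follows. The exponents s_j and the sampled function h are hypothetical stand-ins for the outputs of the root-finding stage; the point is the least-squares solve c = (VᵀV)⁻¹Vᵀh of the overconstrained Vandermonde system:

```python
import numpy as np

# Sketch: recovering the weights c_j of an exponential sum, given candidate
# exponents s_j (assumed already found), via the normal equations for the
# overconstrained Vandermonde system V c = h. Exponents/samples are hypothetical.
M = 200
tgrid = np.linspace(0.0, 1.0, M)
s = np.array([-1.0, -3.0, -7.0])  # candidate exponents
h = (0.5 * np.exp(-tgrid)         # sampled target function (known mixture,
     + 0.3 * np.exp(-3.0 * tgrid) #  so the recovered weights can be checked)
     + 0.2 * np.exp(-7.0 * tgrid))

V = np.exp(np.outer(tgrid, s))           # V[m, j] = exp(s_j * t_m)
c = np.linalg.solve(V.T @ V, V.T @ h)    # c = (V^T V)^{-1} V^T h
residual = np.max(np.abs(V @ c - h))
```

In practice a QR- or SVD-based least-squares solver is better conditioned than the explicit normal equations for large M and closely spaced exponents, but the normal-equation form above matches the expression used in the text.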

Appendix B: Lossy classical compression method
As shown in Fig. 2, the transition structure between the memory states of the ε-machine of a renewal process takes the form of an incrementing counter that resets upon events. A finite-dimensional approximation must adopt the structure of Fig. 7, where the counter progresses up to a terminal state, upon which it loops back to an earlier state [32]. The variable parameters are the transition probabilities {p j }, the timestep size ∆t, and the target state of the loop R.
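The counter-with-loop structure can be sketched as a column-stochastic transition matrix. The dimension, loop target, and event probabilities below are hypothetical; we also assume, for illustration, that an event resets the counter to the first state:

```python
import numpy as np

# Sketch of the finite-dimensional classical counter of Fig. 7: from state j
# the model emits an event with probability p_j (here assumed to reset the
# counter to state 0), otherwise increments; the terminal state loops back to
# state R on no event. All parameter values are hypothetical.
D = 5                                     # memory dimension
R = 2                                     # loop target of the terminal state
p = np.array([0.1, 0.2, 0.3, 0.4, 0.5])  # event probability per state

T = np.zeros((D, D))  # T[i, j] = probability of moving from state j to state i
for j in range(D):
    T[0, j] += p[j]                  # event: reset the counter
    nxt = j + 1 if j < D - 1 else R  # no event: increment, or loop back
    T[nxt, j] += 1.0 - p[j]
```

The free parameters of the classical compression are then exactly the entries p_j, the timestep ∆t, and the loop state R.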
The optimal lossy classical compression at fixed dimension is found by minimising the associated cost function over all possible choices of these parameters. We use a standard gradient descent-based approach to seek the minimum of the KS statistic. For each possible choice of loop state R, we generate W seeds of random parameters for ({p_j}, ∆t) and run S update steps according to p_j → p_j − η_p ∇_j D({p_j}, ∆t) and ∆t → ∆t − η_t ∇_t D({p_j}, ∆t), where D is the KS statistic (with hard constraints to ensure the parameters remain physical). We then keep the final parameter set that reached the minimum value of D across all choices of loop state and seeds. As with the quantum method, we rescaled the wait-time distribution to the domain [0, 1], and for purposes of numerical evaluation discretised it into 1000 steps. We again remark that we do not claim this method necessarily yields the optimal lossy classical compression at fixed dimension, but simply that it should offer a ballpark figure as to its performance. That is, we believe it is reasonable to expect that the optimal classical compression will not perform significantly better than the explicit examples we find here.
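The update rule can be sketched with finite-difference gradients. Here a simple quadratic stand-in replaces the KS-statistic cost D({p_j}, ∆t), and the learning rates, finite-difference intervals, and starting point are all hypothetical:

```python
import numpy as np

# Sketch: the gradient-descent updates p_j -> p_j - eta_p * grad_j D and
# dt -> dt - eta_t * grad_t D, with gradients estimated by central finite
# differences. D below is a hypothetical quadratic stand-in for the KS cost.
def D(p, dt):
    return np.sum((p - 0.3) ** 2) + (dt - 0.05) ** 2

eta_p, eta_t = 1e-1, 1e-1   # learning rates (illustrative)
dp, ddt = 1e-3, 1e-4        # finite-difference intervals
p, dt = np.array([0.8, 0.1]), 0.5

for _ in range(2000):
    grad_p = np.array([(D(p + dp * e, dt) - D(p - dp * e, dt)) / (2 * dp)
                       for e in np.eye(len(p))])
    grad_t = (D(p, dt + ddt) - D(p, dt - ddt)) / (2 * ddt)
    p = np.clip(p - eta_p * grad_p, 0.0, 1.0)  # keep probabilities physical
    dt = max(dt - eta_t * grad_t, 1e-9)        # keep the timestep positive
```

The clipping implements the hard physicality constraints mentioned in the text; in the actual optimisation this inner loop is repeated over W random seeds and all choices of loop state R.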
We generated the initial seeds for {p_j} uniformly in the interval [0, 1], and ∆t exponentially decaying. For the alternating Poisson process we found best performance taking learning rates η_p = 10⁻⁴ and η_t = 10⁻⁸, with gradients estimated over discrete intervals δp = 10⁻³ and δt = 10⁻⁴. Empirically, the descents appeared to converge on a minimum within S = 12500N steps, and running more than W = 1000 seeds for each loop state did not seem to yield any improved minima. For the bimodal Gaussian process we found best performance with much the same parameters; a slight improvement was found by increasing the learning rates by a factor of 10 for the first 1250N steps of descent, whereupon convergence was reached within S = 6250N steps.

Appendix C: Top-hat wait-time distributions

We consider a series of top-hat wait-time distributions of decreasing width, each positioned to coincide with the time where Φ(t) = 0.5. In Fig. 9 we compare wait-time distributions and survival probabilities of the compressed four-qubit quantum models to their exact counterparts for each of the widths. As the width narrows, the periodicity of the approximate distributions can be seen, due to competition between suppressing these spurious peaks with the exponential decay and the need not to suppress the modelled peak.