Causal Asymmetry in a Quantum World

Causal asymmetry is one of the great surprises in predictive modelling: the memory required to predict the future differs from the memory required to retrodict the past. There is a privileged temporal direction for modelling a stochastic process where memory costs are minimal. Models operating in the other direction incur an unavoidable memory overhead. Here we show that this overhead can vanish when quantum models are allowed. Quantum models forced to run in the less natural temporal direction not only surpass their optimal classical counterparts, but also any classical model running in reverse time. This holds even when the memory overhead is unbounded, resulting in quantum models with unbounded memory advantage.

How can we observe an asymmetry in the temporal order of events when physics at the quantum level is time-symmetric? The source of time's barbed arrow is a longstanding puzzle in foundational science [1][2][3][4]. Causal asymmetry offers a provocative perspective [5]. It asks how Occam's razor (the principle of assuming no more causes of natural things than are both true and sufficient to explain their appearances) can privilege one particular temporal direction over another. That is, if we want to model a process causally, such that the model makes statistically correct future predictions based only on information from the past, what is the minimum past information we must store? Are we forced to store more data if we model events in one particular temporal order over the other (see Fig. 1)?
Consider a cannonball in free fall. To model its future trajectory, we need only its current position and velocity. This remains true even when we view the process in reverse time. This exemplifies causal symmetry: there is no difference in the amount of information we must track for prediction versus retrodiction. However, this is not as obvious for more complex processes. Take a glass shattering upon impacting the floor. In one temporal direction, the future distribution of shards depends only on the glass's current position, velocity and orientation. In the opposite direction, we may need to track relevant information regarding each glass shard to infer the glass's prior trajectory. Does this require more or less information? This potential divergence is quantified in the theory of computational mechanics [6]. It is not only generally non-zero, but can also be unbounded. This phenomenon implies that a simulator operating in the 'less natural' temporal direction is penalized with a potentially unbounded memory overhead, and it has been cited as a candidate source of time's barbed arrow [5].

FIG. 1. A stochastic process can be modeled in either temporal order. (a) A causal model takes information available in the past, $\overleftarrow{x}$, and uses it to make statistically accurate predictions about the process' conditional future behaviour $P(\overrightarrow{X}|\overleftarrow{X}=\overleftarrow{x})$. (b) A retrocausal model replicates the system's behaviour as seen by an observer who scans the outputs from right to left, encountering $X_{t+1}$ before $X_t$. It thus stores relevant future information $\overrightarrow{x}$ in order to generate a statistically accurate retrodiction of the past, $P(\overleftarrow{X}|\overrightarrow{X}=\overrightarrow{x})$. Causal asymmetry implies a non-zero gap between the minimum memory required by any causal model, $C^+$, and that of its retrocausal counterpart, $C^-$.
These studies assumed that all models are implemented using classical physics. Could the observed causal asymmetry have been a consequence of this classicality constraint? Here, we first consider a particular stochastic process that is causally asymmetric. We determine the minimal information needed to model the same process in forward versus reverse time using quantum physics, and prove these quantities exactly coincide. More generally, we present systematic methods to model any causally asymmetric stochastic process quantum mechanically. Critically, the resulting quantum models use less information not only than any classical counterpart, but also than any classical model of the time-reversed process. Thus, quantum models can field a memory advantage that always exceeds the memory overhead incurred by causal asymmetry. Our work indicates that this overhead can emerge as a consequence of imposing classical causal explanations. These results remain true even in cases where causal asymmetry becomes unbounded.

arXiv:1712.02368v2 [quant-ph] 21 Jul 2018

I. BACKGROUND
Framework. Consider a system that emits an output $x_t$, governed by some random variable $X_t$, at each discrete point in time $t$. This behaviour can be described by a stochastic process $\mathcal{P}$: a joint probability distribution $P(\overleftarrow{X}, \overrightarrow{X})$ that correlates past behaviour, $\overleftarrow{X} = \dots X_{-2}X_{-1}$, with future expectations, $\overrightarrow{X} = X_0X_1\dots$. Each instance of the past, $\overleftarrow{x} = \dots x_{-2}x_{-1}$, exhibits a conditional future $\overrightarrow{x} = x_0x_1\dots$ with probability $P(\overrightarrow{X}=\overrightarrow{x}|\overleftarrow{X}=\overleftarrow{x})$.
Suppose that a model for this system can replicate this future statistical behaviour using only $H$ bits of past information. Then this model can be executed by encoding the past $\overleftarrow{x}$ into a state $s(\overleftarrow{x}) \in S$ of a physical system $\Xi$ of entropy $H$, such that repeated application of a systematic action $\mathcal{M}$ on $\Xi$ sequentially generates $x_0, x_1, \dots$ governed by the conditional future $P(\overrightarrow{X}|\overleftarrow{X}=\overleftarrow{x})$. The model is causal if, at each instance of time, all the information $\Xi$ contains about the future can be obtained from the past [7]. Implementing it on a computer then gives us a statistically faithful simulation of the process' realizations. The simplest causal model for a process $P(\overleftarrow{X}, \overrightarrow{X})$ is the model that minimizes $H$.
The statistical complexity $C^+$ is defined as the entropy $H$ of this simplest model; it is the minimal amount of past information needed to make statistically correct future predictions [8,9]. This measure is used to quantify structure in diverse settings [10][11][12], including hidden variable models emulating quantum contextuality [13]. $C^+$ also has thermodynamic significance, having been linked to the minimal heat dissipation in stochastic simulation and to the minimal structure a device needs to fully extract free energy from non-equilibrium environments [14][15][16][17].
Causal asymmetry captures the discrepancy in statistical complexity when a process is viewed in forward versus reverse time [18]. Consider an observer that encounters $X_{t+1}$ before $X_t$. Their observations are characterized by the time-reversed stochastic process, in which past and future are interchanged, such that $\overleftarrow{Y} = \dots X_1X_0$, $\overrightarrow{Y} = X_{-1}X_{-2}\dots$, and $Y_t = X_{-(t+1)}$. A causal model for the time-reversed process then corresponds to a retrocausal model for the forward process $P(\overleftarrow{X}, \overrightarrow{X})$: it generates a statistically accurate retrodiction of the conditional past $P(\overleftarrow{X}|\overrightarrow{X}=\overrightarrow{x})$ using only information contained in the future $\overrightarrow{x}$. The statistical complexity of this time-reversed process, $C^-$ (referred to as the retrodictive statistical complexity of $\mathcal{P}$), quantifies the minimal amount of causal information we must assign to model $P(\overleftarrow{X}, \overrightarrow{X})$ in order of decreasing $t$. Causal asymmetry captures the divergence $\Delta C = |C^- - C^+|$. When $\Delta C > 0$, a particular temporal direction is privileged, such that modelling the process in the other temporal direction incurs a memory overhead of $\Delta C$.
Note that the definitions above are entropic measures, and thus take operational meaning in the i.i.d. limit, i.e., modelling $N$ instances of a stochastic process with statistical complexity $C^+$ requires $NC^+$ bits of past information in the limit of large $N$. While this is the most commonly adopted measure in computational mechanics, single-shot variants do exist. The topological state complexity $D^+$ is particularly noteworthy [8]. It captures the max entropy of $\Xi$, i.e., the logarithm of the minimum number of dimensions $\Xi$ must have to generate future statistics. A single-shot variant of causal asymmetry can thus be defined by the difference $\Delta D = |D^- - D^+|$ between the topological state complexities of $P^+$ and $P^-$. Here, we focus on statistical complexity for clarity. However, many of our results also hold in this single-shot regime; we return to this where relevant.
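As an illustration of the two measures (not taken from the paper), the following Python sketch computes, for a hypothetical list of causal-state probabilities, the Shannon entropy underlying statistical complexity and the max entropy (logarithm of the number of occupied states) underlying topological state complexity, both in bits. The helper names are illustrative assumptions.

```python
import math

def statistical_complexity(probs):
    """Shannon entropy (bits) of a causal-state distribution: the i.i.d.-limit measure."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def topological_complexity(probs):
    """log2 of the number of causal states with non-zero probability: the single-shot measure."""
    return math.log2(sum(1 for p in probs if p > 0))

# Hypothetical three-state distribution, for illustration only.
pi = [0.5, 0.3, 0.2]
C = statistical_complexity(pi)
D = topological_complexity(pi)
print(f"C = {C:.4f} bits, D = {D:.4f} bits")  # Shannon entropy never exceeds max entropy
```

Since Shannon entropy never exceeds max entropy, $C \le D$ always holds, with equality exactly when the occupied causal states are equiprobable.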
Classical models. Prior studies of causal asymmetry assumed all models were classical. In this context, causal asymmetry can be explicitly demonstrated using ε-machines, the provably optimal classical causal models [8,9]. This involves dividing the set of pasts into equivalence classes, such that two pasts $\overleftarrow{x}$ and $\overleftarrow{x}'$ lie in the same class if-and-only-if they have coinciding future behaviour, i.e., $P(\overrightarrow{X}|\overleftarrow{X}=\overleftarrow{x}) = P(\overrightarrow{X}|\overleftarrow{X}=\overleftarrow{x}')$. Instead of recording the entire past, an ε-machine records only which equivalence class $\overleftarrow{x}$ lies within, inducing an encoding function $\varepsilon: \overleftarrow{\mathcal{X}} \to \mathcal{S}$ from the space of pasts $\overleftarrow{\mathcal{X}}$ onto the space of equivalence classes $\mathcal{S} = \{s_i\}$, known as causal states. At each time-step, the machine operates according to a collection of transition probabilities $T^x_{ij}$: the probability that an ε-machine initially in $s_i$ will transition to $s_j$ while emitting output $x$. The classical statistical complexity thus coincides with the amount of information needed to store the current causal state,

$$C_\mu = -\sum_i \pi_i \log \pi_i,$$

where $\pi_i$ is the probability that the past lies within $s_i$. ε-machines are also optimal with respect to the max entropy [19], such that the topological state complexity $D_\mu$ of a process is the logarithm of the number of causal states [8]. Despite their provable optimality, ε-machines still appear to waste memory. The amount of past information they demand typically exceeds the amount the past contains about the future, namely the mutual information $E = I(\overleftarrow{X}; \overrightarrow{X})$. Observing an ε-machine's entire future is insufficient for deducing its initial state. Some of the information it stores in the present is never reflected in future statistics, and is thus effectively erased during operation. In general, this waste differs between prediction and retrodiction, inducing non-zero causal asymmetry.

FIG. 2. (a) The ε-machine for the process $P^+_h(\overleftarrow{X}, \overrightarrow{X})$, created by flipping a biased coin and emitting outcome 2 when H → T, 0 when T → T, and 1 when T/H → H. This process has two causal states, $s^+_1$ and $s^+_0$, where the latter includes all pasts ending in either 0 or 2. (b) The time-reversed process $P^-_h(\overleftarrow{Y}, \overrightarrow{Y})$. Here, pasts ending in 0, 1 and 2 all lead to qualitatively different future behaviour and must be stored in distinct causal states $s^-_0$, $s^-_1$ and $s^-_2$, which occur with respective probabilities $\pi^-_0 = (q - pq)/(p+q)$, $\pi^-_1 = \pi^+_1 = p/(p+q)$ and $\pi^-_2 = pq/(p+q)$.
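The transition probabilities $T^x_{ij}$ can be made concrete for the heralding coin of Fig. 2. The sketch below is a minimal illustration, assuming that state 0 stands for $s^+_0$ (last output 0 or 2) and state 1 for $s^+_1$ (last output 1), that the coin leaves $s^+_0$ with probability $p$ and leaves $s^+_1$ with probability $q$, and using hypothetical helper names. It chains the matrices $T^x$ to obtain the probability of a finite output word given the current causal state.

```python
import itertools

def heralding_coin_machine(p, q):
    """Labelled transition matrices T[x][i][j] of the forward heralding coin.

    State 0 stands for s_0^+ (last output 0 or 2), state 1 for s_1^+ (last output 1).
    Assumes the coin leaves s_0^+ with probability p and leaves s_1^+ with probability q.
    """
    return {
        0: [[1 - p, 0.0], [0.0, 0.0]],  # stay in s_0^+, emitting 0
        1: [[0.0, p], [0.0, 1 - q]],    # enter or stay in s_1^+, emitting 1
        2: [[0.0, 0.0], [q, 0.0]],      # leave s_1^+ for s_0^+, heralding with 2
    }

def word_probability(T, start, word):
    """P(X_0 ... X_{L-1} = word | current causal state = start), chaining the T^x."""
    n = len(T[0])
    vec = [1.0 if i == start else 0.0 for i in range(n)]
    for x in word:
        vec = [sum(vec[i] * T[x][i][j] for i in range(n)) for j in range(n)]
    return sum(vec)

p, q = 0.3, 0.6
T = heralding_coin_machine(p, q)
for start in (0, 1):
    total = sum(word_probability(T, start, w) for w in itertools.product(range(3), repeat=4))
    print(start, total)  # each conditional distribution over length-4 words sums to 1
print(word_probability(T, 0, (2,)))  # 0.0: from s_0^+ the next output cannot be 2
```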
Examples. We illustrate this with examples, starting with the perturbed coin. Consider a box containing a single biased coin. At each time-step, the box is perturbed, causing the coin to flip with probability $p$ if it is in tails (0), and $q$ if it is in heads (1). The coin's state is then emitted as output. This describes a stochastic process $P^+_0$. As only the last output is necessary for generating correct future statistics, $P^+_0$ has two causal states, corresponding to the states of the coin. The statistical complexity $C^+_\mu = h(\pi^+_1)$ thus represents the entropy of the biased coin, where $\pi^+_1 = p/(p+q)$ is the probability that the coin is in heads and $h(x) = -x\log x - (1-x)\log(1-x)$ is the binary entropy. Furthermore, $P^+_0$ is symmetric under time reversal (i.e., $P^+_0 = P^-_0$), as is any stationary two-state Markov chain, since it satisfies detailed balance, $\pi^+_0 p = \pi^+_1 q$. It is thus trivially causally symmetric.
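A short numerical check of the perturbed coin, written as a sketch with illustrative names: it verifies that $(\pi^+_0, \pi^+_1) = (q, p)/(p+q)$ is stationary and that detailed balance holds. Detailed balance is what makes each output sequence exactly as likely when read backwards as when read forwards.

```python
import math

def binary_entropy(x):
    """h(x) in bits, with h(0) = h(1) = 0."""
    if x <= 0 or x >= 1:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def perturbed_coin(p, q):
    """Transition matrix of the perturbed coin: state 0 flips w.p. p, state 1 flips w.p. q."""
    return [[1 - p, p], [q, 1 - q]]

p, q = 0.2, 0.5
T = perturbed_coin(p, q)
pi = [q / (p + q), p / (p + q)]  # stationary distribution (pi_0, pi_1)

# Stationarity: sum_i pi_i T_ij = pi_j for every j.
stationary = all(abs(sum(pi[i] * T[i][j] for i in range(2)) - pi[j]) < 1e-12 for j in range(2))
# Detailed balance: probability flow 0 -> 1 equals flow 1 -> 0.
balanced = abs(pi[0] * T[0][1] - pi[1] * T[1][0]) < 1e-12
print(stationary, balanced, binary_entropy(pi[1]))  # last value is C_mu^+ in bits
```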
Suppose we post-process the output of the perturbed coin, replacing the first 0 of each consecutive substring of 0s with a 2 (for example, ...1000110100... becomes ...1200112120...). This results in a new stochastic process, $P^+_h(\overleftarrow{X}, \overrightarrow{X})$, called the heralding coin, which also has two causal states, $s^+_0 = \{\overleftarrow{x} | x_{-1} \in \{0, 2\}\}$ and $s^+_1 = \{\overleftarrow{x} | x_{-1} = 1\}$. In fact, one can model $P^+_h(\overleftarrow{X}, \overrightarrow{X})$ by perturbing the same biased coin in a box, and modifying it to output 2, instead of 0, when it transitions from heads to tails (see Fig. 2). Thus the heralding coin also has classical statistical complexity $C^+_\mu = h(\pi^+_1)$. Its retrodictive statistical complexity, however, is higher. The time-reversed process $P^-_h(\overleftarrow{Y}, \overrightarrow{Y})$ represents an alternative post-processing of the perturbed coin: replacing the last 0 in each consecutive substring of 0s with a 2. Now, 0 can be followed by 0 or 2, 1 can be followed by anything, and 2 can only be followed by 1, inducing three causal states $s^-_j = \{\overleftarrow{y} | y_{-1} = j\}$ (see Fig. 2).
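The two post-processings can be written as string transformations. In this sketch (function names are hypothetical), the reversed rule is obtained by reading the string backwards, applying the forward rule, and reading the result backwards again, which mirrors how an observer scanning right to left sees the heralding coin. A 0 at either end of a finite string is ambiguous, since its run may extend beyond the window; the sketch simply treats the string boundary as a run boundary.

```python
def herald_forward(bits):
    """Heralding coin: replace the first 0 of every run of 0s with a 2."""
    out, prev = [], None
    for b in bits:
        out.append("2" if b == "0" and prev != "0" else b)
        prev = b  # track the original (unprocessed) symbol
    return "".join(out)

def herald_reverse(bits):
    """Time-reversed heralding coin: replace the last 0 of every run of 0s with a 2,
    by applying the forward rule to the string read backwards."""
    return herald_forward(bits[::-1])[::-1]

print(herald_forward("1000110100"))  # 1200112120, the example in the text
print(herald_reverse("10001101001"))
```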
This immediately establishes a difference in the number of distinct configurations needed for causal versus retrocausal modelling. Indeed, $P^+_h$ exhibits causal asymmetry, with $\Delta C_\mu = C^-_\mu - C^+_\mu > 0$ for generic $p$ and $q$, where $C^-_\mu = -\sum_{j=0}^{2} \pi^-_j \log \pi^-_j$. To understand this asymmetry, note that when modelling $P^+_h$, we need only know whether the previous output was 1 (i.e., the current state of the coin) to decide whether a 0 should be replaced by a 2. To model $P^-_h$, however, one cannot simply look into the 'future' to see whether the system will output 1 next. Causal asymmetry thus captures the overhead required to accommodate this restriction.
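The asymmetry can be quantified in closed form; this derivation is not spelled out in the text, but follows from the probabilities in Fig. 2. Since $\pi^-_0 + \pi^-_2 = q/(p+q) = \pi^+_0$ and $\pi^-_2/\pi^+_0 = p$, the grouping property of Shannon entropy gives $C^-_\mu = h(\pi^+_1) + \pi^+_0 h(p)$, so that $\Delta C_\mu = \pi^+_0 h(p)$. The sketch below checks this numerically by building the next-output probabilities of $P^+_h$, reversing them with Bayes' rule, and grouping outputs with identical rows into causal states. Grouping by rows is only valid because both processes are first-order Markov in their outputs; that, the parameter values and the helper names are assumptions of the sketch.

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in probs if x > 0)

def forward_chain(p, q):
    """T[a][b] = P(next output b | last output a) for the heralding coin.
    Outputs 0 and 2 leave the coin in state 0; output 1 leaves it in state 1."""
    return [[1 - p, p, 0.0],
            [0.0, 1 - q, q],
            [1 - p, p, 0.0]]

def reverse_chain(T, pi):
    """Bayes' rule for the time-reversed process: R[a][b] = pi_b T[b][a] / pi_a."""
    n = len(pi)
    return [[pi[b] * T[b][a] / pi[a] for b in range(n)] for a in range(n)]

def causal_state_probs(T, pi, tol=1e-12):
    """Merge outputs with identical next-output rows; return the merged probabilities.
    Valid here only because both processes are first-order Markov in their outputs."""
    groups = []
    for a in range(len(T)):
        for g in groups:
            if all(abs(T[a][b] - T[g[0]][b]) < tol for b in range(len(T[a]))):
                g.append(a)
                break
        else:
            groups.append([a])
    return [sum(pi[a] for a in g) for g in groups]

p, q = 0.3, 0.6
pi = [(q - p * q) / (p + q), p / (p + q), p * q / (p + q)]  # output frequencies, as in Fig. 2
T = forward_chain(p, q)
C_plus = entropy(causal_state_probs(T, pi))
C_minus = entropy(causal_state_probs(reverse_chain(T, pi), pi))
print(C_plus, C_minus, C_minus - C_plus)
```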
In general, causal asymmetry can be unbounded. In Appendix D, we describe the class of n-m flower processes, where $C^+_\mu$ scales as $O(\log n)$ while $C^-_\mu$ scales as $O(\log m)$. Since $n$ and $m$ can be adjusted independently, this allows the construction of processes where $\Delta C_\mu > K$ for any given constant $K$. Setting $m = 2$, for example, can yield a process where $C^+_\mu$ can be made arbitrarily high, while $C^-_\mu \le \log 3$. When this occurs, the memory overhead incurred when modelling the process in the 'less natural' direction grows without bound.
Quantum Models. A quantum causal model is described formally by an ordered tuple $Q = (f, \Omega, \mathcal{M})$, where $\Omega$ is a set of quantum states; $f: \overleftarrow{\mathcal{X}} \to \Omega$ defines how each past $\overleftarrow{x}$ is encoded into a state $f(\overleftarrow{x}) = |s_{\overleftarrow{x}}\rangle$ of a physical system $\Xi$; and $\mathcal{M}$ is a quantum measurement process. To model $P(\overleftarrow{X}, \overrightarrow{X})$, repeated applications of $\mathcal{M}$ on $\Xi$ must generate correct conditional future behaviour. That is, application of $\mathcal{M}$ on a system $\Xi$ in state $|s_{\overleftarrow{x}}\rangle$ must (i) generate an output $x$ with probability $P(X_0 = x|\overleftarrow{X}=\overleftarrow{x})$, and (ii) transition $\Xi$ into a new state $f(\overleftarrow{x}') = |s_{\overleftarrow{x}'}\rangle$, where $\overleftarrow{x}' = \overleftarrow{x}x$, such that $L$ repeated applications of $\mathcal{M}$ will generate $x_0, \dots, x_{L-1}$ with the correct probability $P(X_{0:L}|\overleftarrow{X}=\overleftarrow{x})$ for any desired $L \in \mathbb{Z}^+$ [20]. The entropy of a model $Q$ is given by the von Neumann entropy $S(\rho) = -\mathrm{Tr}(\rho \log \rho)$, where $\rho = \sum_{\overleftarrow{x}} P(\overleftarrow{X}=\overleftarrow{x}) |s_{\overleftarrow{x}}\rangle\langle s_{\overleftarrow{x}}|$. The quantum statistical complexity $C^+_q$ of a process can thus be computed by minimizing $S(\rho)$ over all valid models [21].

FIG. 3. Quantum circuits implementing the optimal causal and retrocausal models of the heralding coin. (…) $U_q$ satisfies $U_q|0\rangle = \sqrt{q}|0\rangle + \sqrt{1-q}|1\rangle$, and finally $CX$, where $X$ is the Pauli $X$ operator, generates a suitable entangled state, such that measuring the first two qubits yields $y_t$ (provided we identify measurement outcome 00 → $y_t = 0$, 10 → $y_t = 1$ and 01 → $y_t = 2$), and collapses the remaining qubit into the quantum causal state for the next time step. In either circuit, retaining only the state of $\Xi$ (green circle) at each time-step is sufficient for generating statistically correct predictions or retrodictions.
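For two codewords, $S(\rho)$ has a simple closed form, which makes the memory saving from non-orthogonal encodings easy to see. The nonzero eigenvalues of $\rho = \pi_0|a\rangle\langle a| + \pi_1|b\rangle\langle b|$ coincide with those of the $2\times2$ Gram matrix $G_{ij} = \sqrt{\pi_i\pi_j}\langle i|j\rangle$, whose trace is 1 and whose determinant is $\pi_0\pi_1(1 - |\langle a|b\rangle|^2)$. The helper below is a hypothetical pure-Python illustration, not a general entropy routine.

```python
import math

def two_codeword_entropy(pi0, overlap):
    """Von Neumann entropy (bits) of rho = pi0 |a><a| + (1 - pi0) |b><b| with |<a|b>| = overlap.
    Eigenvalues come from the Gram matrix: trace 1, determinant pi0 pi1 (1 - overlap^2)."""
    pi1 = 1 - pi0
    disc = math.sqrt(max(0.0, 1 - 4 * pi0 * pi1 * (1 - overlap ** 2)))
    eigenvalues = [(1 + disc) / 2, (1 - disc) / 2]
    return -sum(lam * math.log2(lam) for lam in eigenvalues if lam > 0)

pi0 = 0.4
print(two_codeword_entropy(pi0, 0.0))  # orthogonal codewords: equals the Shannon entropy h(0.4)
print(two_codeword_entropy(pi0, 0.6))  # overlapping codewords: strictly smaller
```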
This optimization is highly non-trivial. There exist no systematic techniques for constructing optimal quantum models, or for proving the optimality of a given candidate model. To date, $C^+_q$ has only been evaluated for the Ising chain [20]. This process, however, is symmetric under time reversal, implying that $\Delta C_\mu$ is trivially zero. Nevertheless, recent advances show multiple settings where quantum models outperform optimal classical counterparts [22][23][24][25][26]. In fact, for every stochastic process where the optimal classical models are wasteful (i.e., $C^+_\mu > E$), it is always possible to design a simpler quantum model [22]. Indeed, the quantum memory advantage $C^+_\mu - C^+_q$ can sometimes be unbounded [27]. Could quantum models mitigate the memory overhead induced by causal asymmetry?

II. RESULTS
We study this question via two complementary approaches. The first is a case study of the heralding coin, the aforementioned process that exhibits causal asymmetry. We pioneer methods to establish its provably optimal quantum causal and retrocausal models, and thus produce a precise picture of how quantum mechanics removes all of the causal asymmetry present. The second studies quantum modelling of arbitrary processes with causal asymmetry. Here, $C^+_q$ and $C^-_q$ cannot be directly evaluated, but can nevertheless be bounded. In doing so, we show that when forced to model such processes in the less natural direction, the quantum advantage always exceeds the memory overhead $\Delta C_\mu$.
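The Heralding Coin. Let $P^+_h$ denote the heralding coin process. Here we first state the optimal quantum models of $P^+_h$ and $P^-_h$. We then outline how their optimality is established, leaving details of the formal proof to Appendix B. The optimal causal model $Q^+$ has two internal states, $|s^+_0\rangle$ and $|s^+_1\rangle$, with associated encoding function $\varepsilon^+_q(\overleftarrow{x}) = |s^+_i\rangle$ if-and-only-if $\overleftarrow{x} \in s^+_i$. Given a qubit in state $\varepsilon^+_q(\overleftarrow{x})$, Fig. 3 establishes the sequential procedure that replicates expected future behaviour, i.e., samples $P^+_h(\overrightarrow{X}|\overleftarrow{X}=\overleftarrow{x})$. Meanwhile, the optimal quantum retrocausal model $Q^-$ has encoding function $\varepsilon^-_q(\overleftarrow{y}) = |s^-_i\rangle$ if-and-only-if $\overleftarrow{y} \in s^-_i$. The associated procedure for sequential generation of $\overrightarrow{y}$, as governed by $P^-_h(\overrightarrow{Y}|\overleftarrow{Y}=\overleftarrow{y})$, is also given in Fig. 3.

To establish optimality, we first invoke the causal state correspondence: for any stochastic process with causal states $\{s_i\}$ that occur with probabilities $\pi_i$, there exists an optimal model $Q = (\varepsilon_q, \Omega, \mathcal{M})$ where the elements of $\Omega$ are in one-to-one correspondence with $\{s_i\}$ (see Lemma 1 of Appendix A). Since the heralding coin process has two forward causal states, we can restrict our computation of $C^+_q$ to quantum models where $\Omega = \{|\psi^+_0\rangle, |\psi^+_1\rangle\}$. Moreover, we can show that the data processing inequality implies the fidelity constraint $|\langle\psi^+_0|\psi^+_1\rangle| \le F^+_{01}$, where $F^+_{01}$ is the fidelity between the future morphs of $s^+_0$ and $s^+_1$ (Lemma 2 of Appendix A), a bound that $Q^+$ saturates.

Proving the optimality of $Q^-$ is more involved. First note that the causal state correspondence allows us to consider only candidate models $Q = (f, \Omega, \mathcal{M})$ where $\Omega = \{|\psi^-_k\rangle\}_{k=0,1,2}$ has three elements. The data processing inequality can then be used to establish the fidelity constraint $|\langle\psi^-_j|\psi^-_k\rangle| \le F^-_{jk}$, where $F^-_{jk}$ is the fidelity between the future morphs of $s^-_j$ and $s^-_k$. In Lemma 4 of Appendix B, we prove that for all choices of $|\psi^-_k\rangle$ satisfying the fidelity constraint, the spectrum $\{\lambda^-_k\}$ of $\rho^-$, the stationary state of $Q^-$, majorizes the spectrum $\{\lambda_k\}$ of the candidate model's stationary state. Thus $\rho^-$ has minimal entropy among all valid retrocausal quantum models.

FIG. 4. Complexity of the heralding coin plotted against $p$ and $q$. The figure illustrates the complexity measures across all values of the parameter space ($0 \le p, q \le 1$). (d) depicts the classical causal asymmetry $\Delta C_\mu$, and (f) demonstrates $C^+_q = C^-_q$, and thus $\Delta C_q = 0$.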
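For the two-state model $Q^+$, the step from the fidelity constraint to optimality can be made explicit with a short calculation (our own, not reproduced from Appendix B). Writing $o = |\langle\psi^+_0|\psi^+_1\rangle|$ and using the Gram-matrix eigenvalues of a two-codeword ensemble:

```latex
\rho^+ = \pi^+_0\,|\psi^+_0\rangle\langle\psi^+_0| + \pi^+_1\,|\psi^+_1\rangle\langle\psi^+_1|, \qquad
\lambda_\pm = \frac{1 \pm \sqrt{1 - 4\pi^+_0\pi^+_1\,(1 - o^2)}}{2}, \qquad
S(\rho^+) = h(\lambda_+).
```

Because $\lambda_+ \ge 1/2$ grows with $o$ and $h$ is decreasing on $[1/2, 1]$, $S(\rho^+)$ is smallest when $o$ is as large as the fidelity constraint allows. For the heralding coin, assuming the coin leaves state 0 with probability $p$ and state 1 with probability $q$, futures starting from $s^+_0$ and $s^+_1$ can only coincide if both first emit 1 (with probabilities $p$ and $1-q$ respectively), after which both are in $s^+_1$; hence the fidelity between their future morphs is $\sqrt{p(1-q)}$.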
$Q^+$ and $Q^-$ exhibit different encoding functions (one maps onto two codewords, the other onto three), and invoke seemingly unrelated quantum circuits for generating future statistics (see Fig. 3). Nevertheless, direct computation yields

$$C^+_q = C^-_q = h\left(\frac{1 + \sqrt{c}}{2}\right),$$

where $c = \left(p^2(1 + 4(1-q)q) - 2pq + q^2\right)/(p+q)^2$ and $h(\cdot)$ is the binary entropy. Thus $\Delta C_q = 0$ for all values of $p$ and $q$. This establishes our first result:

Result 1. There exist stochastic processes that are causally asymmetric ($C^+_\mu \neq C^-_\mu$), but exhibit no such asymmetry when modelled quantum mechanically.

This vanishing of causal asymmetry at the quantum level is not simply the result of saturating the bound given by $E$. Fig. 4 shows that $E < C^\pm_q < C^\pm_\mu$ for almost all values of $p$ and $q$. While both quantum causal and retrocausal models reduce memory resources beyond classical limits (i.e., $C^+_q < C^+_\mu$ and $C^-_q < C^-_\mu$; see Fig. 4 f and g), they each still store some unnecessary information ($C^+_q, C^-_q > E$; see Fig. 4 i). Our results persist when considering minimal dimensions, rather than minimal entropy, required for causal modelling. $P^+_h$ requires only two causal states, and thus can be modelled using a 2-level system ($D^+_\mu = \log 2$). $P^-_h$, however, has three causal states; modelling it classically thus requires a 3-level system ($D^-_\mu = \log 3$). In contrast, the three quantum causal states of $P^-_h$ can be embedded within a single qubit, and thus the dynamics of the heralding coin can be modelled using a single qubit in either temporal direction. Therefore, this vanishing of causal asymmetry also applies in single-shot settings.
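As a numerical cross-check (a sketch, not the authors' code), one can compare $C^\pm_q = h\big((1+\sqrt{c})/2\big)$, with $c$ as defined above, against the eigenvalues of $\rho^+$ built from two codewords of overlap $\sqrt{p(1-q)}$ and weights $\pi^+_0 = q/(p+q)$, $\pi^+_1 = p/(p+q)$. Such codewords arise, for example, from the assumed q-machine-style encoding $|s^+_0\rangle = \sqrt{1-p}|0\rangle + \sqrt{p}|1\rangle$ and $|s^+_1\rangle = \sqrt{1-q}|1\rangle + \sqrt{q}|2\rangle$, with basis states labelled by the next output; these explicit states are an assumption, and only their overlap enters the entropy.

```python
import math

def h(x):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if x <= 0 or x >= 1:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def closed_form_cq(p, q):
    """C_q^+ = C_q^- = h((1 + sqrt(c)) / 2), with c as defined in the text."""
    c = (p**2 * (1 + 4 * (1 - q) * q) - 2 * p * q + q**2) / (p + q) ** 2
    return h((1 + math.sqrt(max(0.0, c))) / 2)

def two_codeword_cq(p, q):
    """Entropy of rho^+ from codeword overlap sqrt(p(1-q)), via the 2x2 Gram matrix."""
    pi0, pi1 = q / (p + q), p / (p + q)
    overlap_sq = p * (1 - q)
    disc = math.sqrt(max(0.0, 1 - 4 * pi0 * pi1 * (1 - overlap_sq)))
    return h((1 + disc) / 2)

for p, q in [(0.3, 0.6), (0.5, 0.5), (0.9, 0.2)]:
    print(p, q, closed_form_cq(p, q), two_codeword_cq(p, q))  # the two columns agree
```

The agreement reflects the identity $c = 1 - 4\pi^+_0\pi^+_1\left(1 - p(1-q)\right)$, which can be verified by expanding both sides over the common denominator $(p+q)^2$.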
General Processes. We now study quantum mitigation of causal asymmetry for general stochastic processes by bounding $C^+_q$ and $C^-_q$ from above. Let

$$C^{\min}_\mu = \min(C^+_\mu, C^-_\mu)$$

represent the minimum amount of information we need to classically model $P(\overleftarrow{X}, \overrightarrow{X})$ when allowed to optimize over temporal direction. Meanwhile, let $C^{\max}_q = \max(C^+_q, C^-_q)$ be the minimal memory a quantum system needs when forced to model the process in the least favourable temporal direction. In Appendix C, we establish the following:

Result 2. For any stochastic process $\mathcal{P}$,

$$C^{\max}_q \le C^{\min}_\mu.$$

Equality occurs only if $C^+_\mu = C^-_\mu = E$, such that $\mathcal{P}$ is causally symmetric.
Consider any causally asymmetric process $\mathcal{P}$, such that modelling it in the less favourable temporal direction incurs memory overhead $\Delta C_\mu$. Result 2 implies that this overhead can be entirely mitigated by quantum models: there exists a quantum model that is not only provably simpler than its optimal classical counterpart, but is also simpler than any classical model of the time-reversed process $P^-$. In Lemma 7 (see Appendix C), we show that such models can be systematically constructed, and that they align with the simplest currently known quantum models, q-machines [28,29]. As a corollary, causal asymmetry guarantees both $C^+_q < C^+_\mu$ and $C^-_q < C^-_\mu$, i.e., a nonzero quantum advantage exists when modelling in either causal direction.
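For the heralding coin, Result 2 can be checked numerically on a grid of parameters. The sketch below (hypothetical helper names) uses $C^+_\mu = h(p/(p+q))$, the closed form $C^-_\mu = C^+_\mu + \frac{q}{p+q}h(p)$ obtained from the probabilities in Fig. 2 via the grouping property of entropy, and $C^\pm_q = h\big((1+\sqrt{c})/2\big)$ with $c$ as defined for the heralding coin. It confirms that the quantum saving in the less natural direction, $C^-_\mu - C^-_q$, exceeds the classical overhead $\Delta C_\mu$ at every grid point.

```python
import math

def h(x):
    """Binary entropy in bits."""
    return 0.0 if x <= 0 or x >= 1 else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def heralding_complexities(p, q):
    """(C_mu^+, C_mu^-, C_q) for the heralding coin, in bits."""
    pi0, pi1 = q / (p + q), p / (p + q)
    c_plus = h(pi1)
    c_minus = c_plus + pi0 * h(p)
    c = (p**2 * (1 + 4 * (1 - q) * q) - 2 * p * q + q**2) / (p + q) ** 2
    c_q = h((1 + math.sqrt(max(0.0, c))) / 2)  # C_q^+ = C_q^-
    return c_plus, c_minus, c_q

grid = [k / 20 for k in range(1, 20)]  # interior points only: 0 < p, q < 1
margins = []
for p in grid:
    for q in grid:
        c_plus, c_minus, c_q = heralding_complexities(p, q)
        overhead = c_minus - c_plus        # Delta C_mu
        saving = c_minus - c_q             # quantum saving in the less natural direction
        margins.append(saving - overhead)  # equals C_mu^min - C_q^max here
print(min(margins) > 0)
```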
A variant of these results also applies to topological state complexity. Suppose the numbers of causal states for $\mathcal{P}$ and its time-reversal $P^-$ differ, such that $D^+_\mu \neq D^-_\mu$. Let $D^+_q$ and $D^-_q$ respectively be the logarithm of the minimal dimensions needed to model $\mathcal{P}$ and $P^-$ quantum mechanically. Appendix C also establishes that:

Result 3. For any stochastic process $\mathcal{P}$,

$$\max(D^+_q, D^-_q) \le \min(D^+_\mu, D^-_\mu).$$

Since there exist stochastic processes whose predictive and retrodictive topological complexities differ (e.g., the heralding coin), this immediately implies the following corollary:

Result 4. The quantum topological complexity $D_q$ can be strictly less than the classical topological complexity $D_\mu$.
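The single-qubit embedding of the three retrocausal states can be checked directly under an assumed encoding. In the sketch below, each causal state of $P^-_h$ is written as a vector of square-rooted next-output probabilities, a q-machine-style construction that is assumed here rather than taken from Appendix B. With next-output probabilities $(1-p, 0, p)$ for $s^-_0$, $(q(1-p), 1-q, pq)$ for $s^-_1$ and $(0, 1, 0)$ for $s^-_2$ (obtained from the forward statistics by Bayes' rule), the middle codeword equals $\sqrt{q}\,|s^-_0\rangle + \sqrt{1-q}\,|s^-_2\rangle$, so the three codewords span only two dimensions, consistent with the claim that one qubit suffices.

```python
import math

def retro_codewords(p, q):
    """Assumed q-machine-style codewords for the causal states of the reversed heralding coin.
    Basis states label the next reversed output (0, 1, 2); amplitudes are square roots of
    the next-output probabilities (1-p, 0, p), (q(1-p), 1-q, pq) and (0, 1, 0)."""
    s0 = [math.sqrt(1 - p), 0.0, math.sqrt(p)]
    s1 = [math.sqrt(q * (1 - p)), math.sqrt(1 - q), math.sqrt(p * q)]
    s2 = [0.0, 1.0, 0.0]
    return s0, s1, s2

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

p, q = 0.3, 0.6
vecs = retro_codewords(p, q)
gram = [[sum(a * b for a, b in zip(u, v)) for v in vecs] for u in vecs]
print(det3(gram))  # zero up to rounding: the codewords span a two-dimensional subspace
```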
This resolves an open question in quantum modelling: whether quantum mechanics allows for models that simulate stochastic processes using not only reduced memory, but also reduced dimensions.
These results have particular impact when $\Delta C_\mu$ is exceedingly large. Recall that in the case of the n-2 flower process, $C^{\min}_\mu \le \log 3$ while $C^+_\mu$ scales as $O(\log n)$. Our theorem then implies that $C^\pm_q \le C^{\min}_\mu \le \log 3$. Thus we immediately identify a class of processes whose optimal classical models require a memory that scales as $O(\log n)$, and yet can be modelled quantum mechanically using a single qutrit.

III. FUTURE DIRECTIONS
There are a number of potential connections between causal asymmetry, recent innovations concerning the arrow of time, and retrodictive quantum theory. In this section, we survey some of these connections, and highlight promising future research directions.
Retrodictive Quantum Mechanics. Consider the evolution of an open quantum system that is monitored continuously in time. Standard quantum trajectory theory describes how the system's internal state ρ(t) evolves, encapsulating how our expectations of future measurement outcomes update based on past observations. Retrodictive quantum mechanics introduces the effect matrix E(t), a time-reversed analogue of the density matrix ρ(t) [30][31][32]. E(t) propagates backwards through time, representing how our expectations of the past change as we scan future measurement outcomes in time-reversed order. The original motivation was that ρ(t) and E(t) combined yield a more accurate estimate of the measurement statistics at time t than ρ(t) alone, allowing improved smoothing procedures [33][34][35][36].
While this framework and causal asymmetry differ in motivation and details (e.g., monitoring is done in continuous time, whereas we have so far only considered discrete time), there are also notable coinciding concepts. The standard propagation equation for ρ(t) parallels a causal model for observed measurement statistics, while its time-reversed counterpart governing E(t) parallels a corresponding retrocausal model. It would certainly be interesting to see if such systems exhibit either classical or quantum causal asymmetry. For example, does the resource cost of tracking E(t) differ from that of ρ(t) under some appropriate measure [37]?
Answering these questions will likely involve significant extensions of current results. Our framework presently assumes the process evolves autonomously, and that time is divided into discrete steps. These restrictions will need to be lifted by combining present results with recent generalizations of classical and quantum computational mechanics to continuous time [38,39] and input-dependent regimes [16,40,41]. More generally, such developments will enable a formal study of causal asymmetry in the quantum trajectories formulation of open quantum systems.
Arrow of Time in Quantum Measurement. Related to such open systems are recent proposals for inferring an arrow of time from continuous measurement [42]. These proposals consider continuously monitoring a quantum system initialized in state $\rho_i$, resulting in a measurement record $r(t)$ with some probability $P[r(t)|\rho_i]$. Concurrently, the state of the system evolves through a quantum trajectory $\rho(t)$ into some final configuration $\rho(T) = \rho_f$. The goal is to identify an alternative sequence of measurements such that, for at least one possible outcome record $r'(t)$ occurring with non-zero probability $P[r'(t)|\rho_f]$, the trajectory rewinds. That is, a system initially in state $\rho_f$ will evolve into $\rho_i$, passing through all intermediary states in time-reversed order. An arrow of time emerges because $P[r(t)|\rho_i]$ and $P[r'(t)|\rho_f]$ generally differ, such that one of the two directions occurs with greater probability. An argument via Bayes' theorem then assigns different probabilistic likelihoods to whether $\rho(t)$ occurred in forward or reverse time.
This framework provides a complementary perspective to our results. It aims to reverse the trajectory of the system's internal state $\rho(t)$, placing no constraints on the relation between the measurement statistics governing $r(t)$ and $r'(t)$. In contrast, causal asymmetry deals with reversing the observed measurement statistics (as described by some stochastic process $\mathcal{P}$), while placing no restrictions on the internal dynamics of the causal and retrocausal models (the two models may even have different Hilbert space dimensions, as in the heralding coin example).
We also observe some striking parallels. Both works start out with some sequential data, but no knowledge about whether the sequence occurred in forward or reverse time. Both ask the following question: is there some sort of asymmetry singling out one temporal direction over the other? In the emerging arrow of time from quantum measurement, we are given a trajectory ρ(t), and asymmetry arises from the difficulty (in terms of success probability) of realizing this trajectory in forward versus reverse time. Meanwhile, in causal asymmetry, we are given the observed measurement statistics, and an arrow of time arises from the difference in resource costs needed to realize these statistics causally in forward versus reverse time. It would then be interesting to see if a similar argument via Bayes' theorem can be adapted to causal asymmetry. Supposing more complex machines are less likely to exist in nature (e.g., due to dimensional or entropic constraints), could we then argue whether a given stochastic process is more likely to occur in one causal direction versus the other?

IV. DISCUSSION
Causal asymmetry captures the memory overhead incurred when modelling a stochastic process in one temporal order versus the other. This induces a privileged temporal direction when one seeks the simplest causal explanation. Here we demonstrate a process where this overhead is non-zero when using classical models, and yet vanishes when quantum models are allowed. For arbitrary processes exhibiting causal asymmetry, we prove that quantum models forced to operate in a given temporal order always require less memory than classical counterparts, even when the latter are permitted to operate in either temporal direction. The former result represents a concrete case where causal asymmetry vanishes in the quantum regime. The latter implies that the more causally asymmetric a process, the greater the resource advantage of modelling it quantum mechanically.
Our results also hold when memory is quantified by max entropy. They thus establish that quantum mechanics can reduce the dimensionality needed to simulate a process beyond classical limits. Indeed, our results isolate families of processes whose statistical complexity grows without bound, but which can nevertheless be modelled exactly by a quantum system of bounded dimension. These features make such processes ideal for demonstrating the practical benefits of quantum models, allowing us to verify arbitrarily large quantum advantage in single-shot regimes [19,43], and avoiding the need to measure von Neumann entropy as in current state-of-the-art experiments [24].
One compelling open question concerns the potential thermodynamic consequences of causal asymmetry. In computational mechanics, $C^+_\mu$ has thermodynamic relevance in the contexts of prediction and pattern manipulation [14][15][16][17][44]. For instance, the minimum heat one must dissipate to generate future predictions based only on past observations is given by $W_{\mathrm{diss}} = k_B T (C^+_\mu - E)$, where $k_B$ is Boltzmann's constant, $T$ is the environmental temperature, and the excess entropy $E$ is symmetric with respect to time reversal. Therefore, non-zero causal asymmetry implies that flipping the temporal order in which we ascribe predictions incurs an energetic overhead of $\Delta W_{\mathrm{diss}} = k_B T \Delta C_\mu$. In processes where $\Delta C_\mu$ scales without bound, this cost may become prohibitive. Could our observation that $\Delta C_q \le C^{\min}_\mu$ imply that such energetic penalties become strongly mitigated when quantum simulators are taken into account?
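To give a sense of scale, the overhead $\Delta W_{\mathrm{diss}} = k_B T \Delta C_\mu$ can be evaluated numerically. The sketch assumes that $\Delta C_\mu$ is supplied in bits and converts it to nats (a factor of $\ln 2$), so that the formula applies with natural logarithms; the function name, the parameter values and the room-temperature value are illustrative. For the heralding coin, $\Delta C_\mu = \frac{q}{p+q}h(p)$ follows from the probabilities in Fig. 2 by the grouping property of entropy.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact in the 2019 SI)

def h(x):
    """Binary entropy in bits."""
    return 0.0 if x <= 0 or x >= 1 else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def dissipation_overhead(delta_c_bits, temperature_k):
    """k_B T Delta C in joules, with Delta C converted from bits to nats."""
    return K_B * temperature_k * delta_c_bits * math.log(2)

p, q = 0.3, 0.6
delta_c = q / (p + q) * h(p)  # Delta C_mu of the heralding coin, in bits
print(f"Delta W_diss at 300 K: {dissipation_overhead(delta_c, 300.0):.3e} J")
```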
A second direction is to isolate what properties of quantum processing enable it to mitigate causal asymmetry. In Appendix C, we establish that all deterministic processes are causally symmetric, such that $C^\pm_\mu = C^\pm_q = E$ (see Lemma 6 of Appendix C). Randomness is therefore essential for causal asymmetry. Observe also that the provably optimal quantum causal and retrocausal models for the heralding coin both operate unitarily, such that their dynamics are entirely deterministic (modulo measurement of outputs). Indeed, such unitary quantum models can always be constructed [29], and we conjecture that this unitarity implies causal symmetry. However, it remains an open question whether the optimal quantum model is always unitary.
Insights here will ultimately help answer the big outstanding question of whether the quantum statistical complexity ever displays asymmetry under time reversal. Identifying any process for which such asymmetry persists implies that Occam's preference for minimal cause can privilege a temporal direction in a fully quantum world. Proof that no such process exists would be equally exciting, indicating that causal asymmetry is a consequence of enforcing all causal explanations to be classical in a fundamentally quantum world.
Acknowledgements. The authors appreciated the feedback and input received from: Yang Chengran, Suen Whei Yeap, Liu Qing, Alec Boyd, Varun Narasimhachar, Felix Binder, Thomas Elliott, Howard Wiseman, Geoff Pryde, Nora Tischler, Farzad Ghafari and Chiara Marletto. This work was supported by the National Research Foundation of Singapore and in particular NRF Awards NRF-NRFF2016-02, NRF-CRP14-2014-02 and RF2017-NRF-ANR004 VanQuTe, the John Templeton Foundation grants 52095 and 54914, Foundational Questions Institute grant FQXi-RFP-1609 and Physics of the Observer grant No. FQXi-RFP-1614, the Oxford Martin School, the Singapore Ministry of Education Tier 1 RG190/17, and the U. S. Army Research Laboratory and the U. S. Army Research Office under contracts No. W911NF-13-1-0390, No. W911NF-13-1-0340, and No. W911NF-18-1-0028. Much of the collaboration was also made possible by the 'Interdisciplinary Frontiers of Quantum and Complexity Science' workshop held in January 2017 in Singapore, funded by the John Templeton Foundation, the Centre for Quantum Technologies and the Lee Foundation of Singapore.

Appendix A: Technical Definitions
We first introduce further technical notation and background that will be used for subsequent proofs.
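Definition 1 (Quantum Causal Model). Consider an ordered tuple $Q = (f, \Omega, \mathcal{M})$, where $\Omega$ is a set of quantum states; $f: \overleftarrow{\mathcal{X}} \to \Omega$ is an encoding function that maps each past $\overleftarrow{x}$ onto a state $f(\overleftarrow{x}) = |s_{\overleftarrow{x}}\rangle$ of a physical system $\Xi$; and $\mathcal{M}$ is a quantum process. $Q$ is a quantum model for $P(\overleftarrow{X}, \overrightarrow{X})$ if and only if, for any $\overleftarrow{x} \in \overleftarrow{\mathcal{X}}$, whenever $\Xi$ is prepared in $f(\overleftarrow{x})$, subsequent application of $\mathcal{M}$ (i) generates an output $x$ with probability $P(X_0 = x | \overleftarrow{X} = \overleftarrow{x})$ and (ii) transitions $\Xi$ into a new state $f(\overleftarrow{x}') = |s_{\overleftarrow{x}'}\rangle$, where $\overleftarrow{x}' = \overleftarrow{x}x$ [20].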
Condition (i) guarantees that if a quantum model is initialized in state $f(\overleftarrow{x})$, then the model's future output $X_0 = x$ will be statistically indistinguishable from the output of the process itself. Condition (ii) ensures that the internal memory of the quantum model updates to record the event $X_0 = x$, allowing the model to stay synchronized with the sequence of outputs it has generated thus far. Thus a series of $L$ repeated applications of $\mathcal{M}$ acting on $\Xi$ generates output $x_{0:L} = x_0 \ldots x_{L-1}$ with probability $P(X_{0:L} = x_{0:L} | \overleftarrow{X} = \overleftarrow{x})$, and simultaneously transitions $\Xi$ into the state $f(\overleftarrow{x} x_{0:L})$. In the limit $L \to \infty$, the model produces a sequence of outputs $\overrightarrow{x} = x_0 x_1 \ldots$ with probability $P(\overrightarrow{X} | \overleftarrow{X} = \overleftarrow{x})$.
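The entropy of a quantum model $Q$ is given by $C_q(Q) = S(\rho)$, where $S(\cdot)$ is the von Neumann entropy, $\rho = \sum_{\overleftarrow{x}} \pi_{\overleftarrow{x}} \rho_{\overleftarrow{x}}$ for $\rho_{\overleftarrow{x}} = |s_{\overleftarrow{x}}\rangle\langle s_{\overleftarrow{x}}|$, and $\pi_{\overleftarrow{x}} = P(\overleftarrow{X} = \overleftarrow{x})$.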
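As a minimal numerical sketch of this definition (assuming NumPy, pure memory states supplied as state vectors, and entropies measured in bits; the function names are illustrative, not from this paper), the snippet below forms $\rho$ from the memory states and their probabilities and evaluates $S(\rho)$ from its eigenvalues. Orthogonal memory states reproduce the Shannon entropy of the state distribution, while overlapping states give a smaller value.

```python
import numpy as np


def von_neumann_entropy(rho: np.ndarray) -> float:
    """S(rho) = -Tr(rho log2 rho) in bits, computed from the eigenvalues of rho."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]  # drop numerical zeros, since 0 log 0 = 0
    return float(-np.sum(evals * np.log2(evals)))


def quantum_model_entropy(states, probs) -> float:
    """C_q = S(rho) with rho = sum_x pi_x |s_x><s_x| for pure memory states |s_x>."""
    dim = len(states[0])
    rho = np.zeros((dim, dim), dtype=complex)
    for psi, p in zip(states, probs):
        psi = np.asarray(psi, dtype=complex)
        rho += p * np.outer(psi, psi.conj())
    return von_neumann_entropy(rho)


# Orthogonal memory states: C_q equals the Shannon entropy of (1/2, 1/2), i.e. 1 bit.
ortho = [np.array([1, 0]), np.array([0, 1])]
print(quantum_model_entropy(ortho, [0.5, 0.5]))

# Non-orthogonal memory states with the same probabilities need less than 1 bit.
theta = np.pi / 8
overlap = [np.array([1, 0]), np.array([np.cos(theta), np.sin(theta)])]
print(quantum_model_entropy(overlap, [0.5, 0.5]))
```

Working from the eigenvalues via `eigvalsh` avoids computing a matrix logarithm; eigenvalues below a small cutoff are discarded because $0 \log 0 = 0$.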
Definition 2. $Q$ is an optimal quantum model for a process $P(\overleftarrow{X}, \overrightarrow{X})$ if, given any other model $Q'$, we have $C_q(Q) \le C_q(Q')$.

Consider a stationary stochastic process $P(\overleftarrow{X}, \overrightarrow{X})$, such that $P(X_{0:L}) = P(X_{t:t+L})$ for any $L \in \mathbb{Z}^+$, $t \in \mathbb{Z}$. Let $P(\overleftarrow{X}, \overrightarrow{X})$ have causal states $S = \{s_i\}$, each occurring with stationary probability $\pi_i$. Define the conditional distribution $P_i(\overrightarrow{X}) = P(\overrightarrow{X} | \overleftarrow{X} = \overleftarrow{x} \in s_i)$ as the future morph of causal state $s_i$. We will make use of the following two results derived in [20].
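Lemma 1 (Causal state correspondence). Let $P(\overleftarrow{X}, \overrightarrow{X})$ be a stochastic process with causal states $\{s_i\}$. There exists an optimal model $Q = (q, \Omega, \mathcal{M})$, where $\Omega = \{|s_i\rangle\}$ and $q(\overleftarrow{x}) = |s_i\rangle$ if and only if $\overleftarrow{x} \in s_i$. This implies that we can limit our search for optimal models $Q = (f, \Omega, \mathcal{M})$ to those whose internal states $\Omega = \{|\psi_i\rangle\}$ are in one-to-one correspondence with the classical causal states. In addition, it can be shown that $\Omega$ must satisfy the following constraint:

Lemma 2 (Maximum fidelity constraint). Let $P(\overleftarrow{X}, \overrightarrow{X})$ be a stochastic process with causal states $\{s_i\}$, and let $Q = (f, \Omega, \mathcal{M})$ be a valid quantum model satisfying the causal state correspondence, with internal states $\Omega = \{|\psi_i\rangle\}$. Then $|\langle\psi_i|\psi_j\rangle| \le F_{ij}$, where $F_{ij} = \sum_{\overrightarrow{x}} \sqrt{P_i(\overrightarrow{x}) P_j(\overrightarrow{x})}$ is the fidelity between the future morphs of $s_i$ and $s_j$.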
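The fidelities between future morphs can be estimated numerically. The sketch below assumes the standard classical fidelity $F_{ij} = \sum_{\overrightarrow{x}} \sqrt{P_i(\overrightarrow{x}) P_j(\overrightarrow{x})}$, an ε-machine specified by labelled transition matrices $T^x$ whose entry $T^x_{ij}$ is the probability of emitting $x$ and moving from $s_i$ to $s_j$, and futures truncated to length $L$. Because coarse-graining a distribution cannot decrease classical fidelity, the truncated value is an upper bound on the full $F_{ij}$. The perturbed-coin process used here is a hypothetical illustration, not a process analysed in this paper.

```python
from itertools import product

import numpy as np


def word_probability(T, start, word):
    """P_start(word): probability that the machine, started in causal state `start`,
    emits `word`. T[x][i, j] = Pr(emit x and move from state i to state j)."""
    n = next(iter(T.values())).shape[0]
    v = np.zeros(n)
    v[start] = 1.0
    for x in word:
        v = v @ T[x]  # propagate the unnormalised state distribution
    return float(v.sum())


def morph_fidelity(T, i, j, L):
    """Length-L estimate of F_ij = sum_w sqrt(P_i(w) P_j(w)); an upper bound on F_ij."""
    return sum(
        np.sqrt(word_probability(T, i, w) * word_probability(T, j, w))
        for w in product(sorted(T), repeat=L)
    )


# Hypothetical perturbed coin: the causal state is the last emitted bit,
# which is repeated with probability 1 - p and flipped with probability p.
p = 0.1
T = {
    0: np.array([[1 - p, 0.0], [p, 0.0]]),  # emitting 0 always leads to state 0
    1: np.array([[0.0, p], [0.0, 1 - p]]),  # emitting 1 always leads to state 1
}
for L in (1, 2, 3):
    print(L, morph_fidelity(T, 0, 1, L))  # 2 * sqrt(p * (1 - p)) = 0.6 for each L
```

For this process, the first symbol already reveals the next causal state, so the truncated fidelity stops changing after $L = 1$; in general, one would increase $L$ until the estimate converges.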
These definitions assume that all elements of $\Omega$ are pure. This is because computational mechanics considers only causal models: models whose internal states do not store more information about the future than what is available from the past. Specifically, let $R$ be a random variable governing the state of a model at $t = 0$. $I(R; \overrightarrow{X} | \overleftarrow{X})$ is then known as the oracular information, and represents the amount of extra information $R$ contains about the future $\overrightarrow{X}$ that is not contained in the past $\overleftarrow{X}$. For causal models, $I(R; \overrightarrow{X} | \overleftarrow{X}) = 0$ [45]. In Appendix E, we show that this allows us to assume, without loss of generality, that all elements of $\Omega$ are pure.
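To make the oracular information concrete, the following sketch computes $I(R; \overrightarrow{X} | \overleftarrow{X})$ for a small discrete joint distribution. The toy process (a one-bit past and a one-bit future that copies the past with probability 3/4) and the two memory assignments are hypothetical. A memory that is a function of the past carries zero oracular information, whereas a memory that copies the future carries $H(\overrightarrow{X} | \overleftarrow{X})$ bits of it.

```python
from collections import defaultdict
from math import log2


def conditional_mutual_information(joint):
    """I(R; F | P) in bits for a joint distribution {(past, r, future): prob}."""
    p_past, p_past_r, p_past_fut = defaultdict(float), defaultdict(float), defaultdict(float)
    for (past, r, fut), prob in joint.items():
        p_past[past] += prob
        p_past_r[(past, r)] += prob
        p_past_fut[(past, fut)] += prob
    return sum(
        prob * log2(prob * p_past[past] / (p_past_r[(past, r)] * p_past_fut[(past, fut)]))
        for (past, r, fut), prob in joint.items()
        if prob > 0
    )


def build_joint(memory):
    """Hypothetical toy process: uniform one-bit past; the future bit copies the
    past with probability 3/4. `memory(past, fut)` assigns the model's state R."""
    joint = defaultdict(float)
    for past in (0, 1):
        for fut in (0, 1):
            joint[(past, memory(past, fut), fut)] += 0.5 * (0.75 if fut == past else 0.25)
    return dict(joint)


causal = build_joint(lambda past, fut: past)  # R depends only on the past
oracle = build_joint(lambda past, fut: fut)   # R peeks at the future
print(conditional_mutual_information(causal))  # 0 bits: no oracular information
print(conditional_mutual_information(oracle))  # H(3/4), approximately 0.811 bits
```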

Appendix B: Proofs of Optimality
Here, we formally prove that the quantum models for the heralding coin given in Eq. (3) and Eq. (4) are optimal.
1. Optimality of the Causal Model.

Let $P^+_h$ denote the heralding coin process, with corresponding ε-machine depicted in Fig. 2(a).
Let $P^-_h$ denote the time reversal of the heralding coin process, with corresponding ε-machine in Fig. 2(b).
$\Omega^- = \{|s^-_i\rangle\}$, and the measurement process $\mathcal{M}^-$ given in Fig. 3(b). $Q^-$ is an optimal quantum model for $P^-_h$.
Below we break the proof of this theorem down into a series of small steps. Each step is phrased as a lemma.
Then, up to a unitary rotation, …

Our models described in Eq. (4) can be obtained by setting $r \sin\theta\, e^{i\omega} = \sqrt{q}$ and $\sqrt{1 - r^2}\, e^{i\alpha} = \sqrt{1 - q}$ in Eq. (B4) (i.e., this corresponds to choosing $r = \sqrt{q}$, $\theta = \pi/2$ and $\omega = \alpha = 0$). The subsequent lemma then establishes that this is the optimal choice.

Lemma 4. For any quantum model $Q$ of $P^-_h$ that satisfies the causal state correspondence, $C_q(Q) \ge C_q(Q^-)$. That is, $Q^-$, as described by Eq. (4), is the lowest-entropy (optimal) model that satisfies the causal state correspondence.

By Lemma 1, $P^-_h$ has an optimal quantum model which satisfies the causal state correspondence. Meanwhile, by Lemma 4, any $Q$ satisfying the causal state correspondence must have $C_q(Q) \ge C_q(Q^-)$. It follows that $Q^-$ is an optimal quantum model for $P^-_h$.
Appendix C: Proof of Result 2
Here we prove Result 2. To do this, we require some preliminary lemmas. The first connects the capacity of quantum models to improve upon their optimal classical counterparts with causal asymmetry.

Lemma 5. If the classical and quantum statistical complexities of a process $P$ coincide, such that $C^+_q = C^+_\mu$, then $P$ is causally symmetric and $C^+_\mu = C^-_\mu = E$.

Proof. We first make use of prior results showing that whenever classical models waste information, more efficient quantum models exist [22]. Specifically, …, where $S_{-1}$ is the random variable governing the causal state at $t = -1$ [9]. Thus, given a future $\overrightarrow{x}$, we can find a unique $s_i$ such that $P(\overrightarrow{X} = \overrightarrow{x} | \overleftarrow{X} = \overleftarrow{x})$ is non-zero only when $\overleftarrow{x} \in s_i$. It follows that the sets $\tau_i = \{\overrightarrow{x} \,|\, P_i(\overrightarrow{X} = \overrightarrow{x}) \neq 0\}$ form a partition of the space of all futures (i.e., $\tau_i \cap \tau_j = \emptyset$ for $i \neq j$).
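Furthermore, any two pasts $\overleftarrow{x}, \overleftarrow{x}' \in s_i$ satisfy $P(\overrightarrow{X} | \overleftarrow{X} = \overleftarrow{x}) = P(\overrightarrow{X} | \overleftarrow{X} = \overleftarrow{x}')$, by definition of $s_i$. Thus Bayes' theorem implies that the $\tau_i$ partition the futures into equivalence classes, with $\overrightarrow{x} \sim \overrightarrow{x}'$ if and only if $P(\overleftarrow{X} | \overrightarrow{X} = \overrightarrow{x}) = P(\overleftarrow{X} | \overrightarrow{X} = \overrightarrow{x}')$ [49]. Hence $\{\tau_i\}$ constitute the retrocausal states. Bayes' theorem also yields $P(\overleftarrow{X} = \overleftarrow{x} | \overrightarrow{X} = \overrightarrow{x} \in \tau_i) \neq 0$ only when $\overleftarrow{x} \in s_i$. This implies $H(S^-_{-1} | \overleftarrow{X} = \overleftarrow{x}) = 0$, where $S^-_{-1}$ governs the retrocausal state at time $t = -1$. Hence $C^-_\mu = E$, which is a contradiction.

It follows as a direct corollary of this result that causal asymmetry vanishes for deterministic processes (i.e., processes where $H(\overrightarrow{X} | \overleftarrow{X}) = 0$).

Lemma 6. Any deterministic process $P(\overleftarrow{X}, \overrightarrow{X})$ has $\Delta C_\mu = 0$.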

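Proof. Any deterministic process has $C^+_\mu = C^+_q$: each past fixes a unique future, so the future morphs of distinct causal states have disjoint support, their fidelities vanish, and Lemma 2 forces the corresponding quantum states to be orthogonal. Thus, according to Lemma 5, $\Delta C_\mu = 0$.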
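A small sketch of this argument for a hypothetical example: the deterministic period-3 process …001001…, whose three causal states (the three phases of the cycle) are equiprobable. The length-$L$ future morphs are point masses on distinct words, so their pairwise fidelities vanish; orthogonal memory states then give $C^+_q = C^+_\mu = \log_2 3$ bits. The code assumes NumPy and the classical fidelity $\sum_w \sqrt{P(w) Q(w)}$; the cycle labelling is an illustrative choice.

```python
import numpy as np

CYCLE = "001"  # hypothetical deterministic period-3 process ...001001001...


def morph(i, L):
    """Length-L future morph of causal state i (phase i in the cycle): a point mass."""
    return {"".join(CYCLE[(i + t) % 3] for t in range(L)): 1.0}


def fidelity(P, Q):
    """Classical fidelity sum_w sqrt(P(w) Q(w)) between two word distributions."""
    return sum(np.sqrt(P[w] * Q.get(w, 0.0)) for w in P)


L = 3
F = np.array([[fidelity(morph(i, L), morph(j, L)) for j in range(3)] for i in range(3)])
print(F)  # identity matrix: distinct causal states have disjoint futures

# Zero fidelity forces orthogonal memory states (Lemma 2), so with a uniform
# stationary distribution rho = I/3 and C_q equals C_mu = log2(3).
rho = sum(np.outer(e, e) for e in np.eye(3)) / 3
evals = np.linalg.eigvalsh(rho)
C_q = float(-np.sum(evals * np.log2(evals)))
C_mu = float(np.log2(3))
print(C_q, C_mu)
```

The word length matters: with $L = 1$, states 0 and 1 both emit 0 next and their truncated fidelity is 1, consistent with truncation only ever overestimating the fidelity.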
Our next lemma makes use of q-machines [28], the simplest currently known quantum models. Consider a process $P = P(\overleftarrow{X}, \overrightarrow{X})$ whose classical ε-machine has a collection of causal states $S = \{s_i\}$ and transition probabilities $T^x_{ij}$. Let $k$ denote the cryptic order of $P(\overleftarrow{X}, \overrightarrow{X})$, defined as the smallest $l$ such that $H(S_l | X_{0:\infty}) = 0$ [23,28,50]. The q-machine of $P$ has internal states $|S_i\rangle$ defined by a recursive relation …

Conversely, suppose $\max(C^+_q, C^-_q) = \min(C^+_\mu, C^-_\mu)$. Without loss of generality, we can assume $C^+_\mu \le C^-_\mu$. This implies either (i) $C^+_q = C^+_\mu$ or (ii) $C^-_q = C^+_\mu$. In case (i), direct application of Lemma 5 implies $C^+_\mu = C^-_\mu = E$. In case (ii), we have … That is, q-machines are no more efficient than ε-machines in modelling $P$. This is true if and only if $C^+_q = C^+_\mu$ [22,23]. Thus Lemma 5 again implies $C^+_\mu = C^-_\mu = E$. This completes the proof.
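The proof of Lemma 5 relies on grouping histories by their conditional distributions: causal states are classes of pasts with equal $P(\overrightarrow{X} | \overleftarrow{X} = \overleftarrow{x})$, and retrocausal states are classes of futures with equal $P(\overleftarrow{X} | \overrightarrow{X} = \overrightarrow{x})$. The sketch below applies both groupings to a finite joint table over truncated pasts and futures, using exact fractions so that equality tests are reliable. The toy table is hypothetical and is built so that the causal states have disjoint future supports, mirroring the situation in the proof; the retrocausal states then coincide with those supports $\tau_i$.

```python
from collections import defaultdict
from fractions import Fraction


def equivalence_classes(joint, group_by="past"):
    """Partition the pasts (or futures) of a finite joint table {(past, future): prob}
    by their conditional distribution over the other variable.

    group_by="past":   pasts with equal P(future | past)   -> causal states
    group_by="future": futures with equal P(past | future) -> retrocausal states
    """
    def key_of(past, fut):
        return past if group_by == "past" else fut

    def other_of(past, fut):
        return fut if group_by == "past" else past

    marginal = defaultdict(Fraction)
    for (past, fut), prob in joint.items():
        marginal[key_of(past, fut)] += prob

    classes = defaultdict(list)
    for key in sorted(marginal):
        conditional = tuple(sorted(
            (other_of(past, fut), prob / marginal[key])
            for (past, fut), prob in joint.items()
            if key_of(past, fut) == key and prob > 0
        ))
        classes[conditional].append(key)
    return sorted(classes.values())


# Hypothetical table over truncated pasts {a, b, c} and two-symbol futures.
joint = {
    ("a", "00"): Fraction(1, 12), ("a", "01"): Fraction(1, 6),
    ("b", "00"): Fraction(1, 12), ("b", "01"): Fraction(1, 6),
    ("c", "10"): Fraction(1, 4), ("c", "11"): Fraction(1, 4),
}
print(equivalence_classes(joint, "past"))    # [['a', 'b'], ['c']]
print(equivalence_classes(joint, "future"))  # [['00', '01'], ['10', '11']]
```

Exact `Fraction` arithmetic is a deliberate choice here: with floating-point probabilities, two conditional distributions that are mathematically equal can differ in the last bit and be split into separate classes.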

Appendix D: n-m flower process
The family of n-m flower processes demonstrates how causal asymmetry can be potentially unbounded (see Fig. 5). The process has statistical complexity $C^+_\mu = 1 + \ldots$ Thus $\Delta C_\mu$ also diverges to infinity. A similar divergence is witnessed for the topological state complexity.
Applying Result 2, we see that $C^+_q$ and $C^-_q$ are both bounded above by $\log 3$. The same is also true for $D^+_q$ and $D^-_q$. Thus quantum models of this process can fit within a single qutrit, whether modelling in forward or reverse time. In the forward direction, by contrast, $C^+_\mu$ and $D^+_\mu$ diverge to infinity. Thus we obtain a family of processes whose quantum models yield an unbounded memory advantage, in both the entropic and the single-shot sense.

Appendix E: Excluding Mixed State Models
In this section, we consider more general causal models $Q = (f, \Omega, \mathcal{M})$, which have the freedom to encode pasts into mixed quantum states. We show that this freedom does not yield models with lower entropy than those that encode pasts only into pure quantum states.
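For intuition about mixed memory states, the sketch below (assuming NumPy; the states and weights are hypothetical) builds a density matrix $\omega$ from an unravelling $\{q_k, |\psi_k\rangle\}$ with $|\psi_k\rangle \in \{|0\rangle, |+\rangle\}$, where $|+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$, confirms that $S(\omega) > 0$, and shows that the eigendecomposition of $\omega$ provides a second, different unravelling of the same state. Unravellings of a mixed state are therefore not unique.

```python
import numpy as np


def mix(weights, kets):
    """omega = sum_k q_k |psi_k><psi_k| for (normalised) pure states |psi_k>."""
    dim = len(kets[0])
    omega = np.zeros((dim, dim), dtype=complex)
    for q, psi in zip(weights, kets):
        psi = np.asarray(psi, dtype=complex)
        psi = psi / np.linalg.norm(psi)
        omega += q * np.outer(psi, psi.conj())
    return omega


def entropy(rho):
    """Von Neumann entropy in bits."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))


# Hypothetical unravelling {|0>, |+>} with weights (1/2, 1/2).
omega = mix([0.5, 0.5], [[1, 0], [1, 1]])
print(np.trace(omega).real, entropy(omega))  # trace 1 and S(omega) > 0: a mixed state

# The eigendecomposition of omega is a different unravelling of the same state.
evals, evecs = np.linalg.eigh(omega)
omega_again = mix(evals, [evecs[:, k] for k in range(len(evals))])
print(np.allclose(omega, omega_again))  # True
```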
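Theorem 3. Consider a stochastic process $P(\overleftarrow{X}, \overrightarrow{X})$ with a causal model $Q = (f, \Omega, \mathcal{M})$. If the internal states of $Q$ are mixed, such that $f(\overleftarrow{x}) = \sum_i q_i(\overleftarrow{x}) |\psi^{\overleftarrow{x}}_i\rangle\langle\psi^{\overleftarrow{x}}_i|$, then we can always find a causal model $Q' = (f', \Omega', \mathcal{M}')$ such that $f'(\overleftarrow{x}) = |s_{\overleftarrow{x}}\rangle\langle s_{\overleftarrow{x}}|$ and $C_q(Q') \le C_q(Q)$.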
Proof. Let $P(\overleftarrow{X}, \overrightarrow{X})$ have causal states $S = \{s_i\}$. Suppose $Q = (f, \Omega, \mathcal{M})$ is an optimal causal model for $P(\overleftarrow{X}, \overrightarrow{X})$ with mixed internal states.
It is straightforward to generalize the causal state correspondence to mixed-state models. Thus we can assume that $Q$ has an encoding function with $f(\overleftarrow{x}) = \omega_i$ if and only if $\overleftarrow{x} \in s_i$, so that the internal states $\Omega = \{\omega_i\}$ are in one-to-one correspondence with the classical causal states.
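Our proof makes use of the requirement that causal models store no oracular information, i.e., $I(R; \overrightarrow{X} | \overleftarrow{X}) = \sum_{\overleftarrow{x}} P(\overleftarrow{x})\, I(R; \overrightarrow{X} | \overleftarrow{X} = \overleftarrow{x}) = 0$, where $R$ is the random variable governing the memory.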
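Regrouping the pasts into causal state equivalence classes yields $\sum_{s_i \in S} \pi_i\, I(R; \overrightarrow{X} | \overleftarrow{x} \in s_i) = 0$, where $\pi_i$ is the probability that the past belongs to $s_i$. Since every term in this sum is non-negative and each $\pi_i > 0$, it follows that $I(R; \overrightarrow{X} | \overleftarrow{x} \in s_i) = 0$ for every $s_i \in S$.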
We have assumed that some elements of $\Omega$ are mixed. In particular, suppose we have a specific $\omega_i = \omega \in \Omega$ with $S(\omega) > 0$ that occurs with probability $\pi_i = \pi$. Let $\overleftarrow{x}$ be a particular past such that $f(\overleftarrow{x}) = \omega$, and let $\Psi = \{|\psi_k\rangle\}$ be a set of pure states that form an unravelling of $\omega$. That is, there must exist some $q_k \in [0, 1]$ such that $\omega = \sum_k q_k |\psi_k\rangle\langle\psi_k|$. Now let $O_{\mathcal{M}}$ be a quantum process that maps $\omega$ to a classical random variable $\overrightarrow{X}$ governed by the probability distribution $P(\overrightarrow{X} | \overleftarrow{X} = \overleftarrow{x})$. By definition of a quantum model, this process can always be constructed by concatenations of $\mathcal{M}$ acting on a physical system $\Xi$.
Let $A$ represent the state of $\Xi$ and $B$ be the random variable that governs the resulting output of $O_{\mathcal{M}}$ acting on $\Xi$. Zero oracular information implies that $A$ and $B$ must be uncorrelated when conditioned on observing the past $\overleftarrow{x}$.
If any of the states in $\Omega$ are still mixed, then by repeating the above procedure we can replace them with pure states, thereby constructing a model $Q' = (f', \Omega', \mathcal{M})$ with pure internal states such that $C_q(Q') \le C_q(Q)$.

FIG. 1. A stochastic process can be modeled in either temporal order. (a) A causal model takes information available in the past $\overleftarrow{x}$ and uses it to make statistically accurate predictions about the process's conditional future behaviour $P(\overrightarrow{X} | \overleftarrow{X} = \overleftarrow{x})$. (b) A retrocausal model replicates the system's behaviour as seen by an observer who scans the outputs from right to left, encountering $X_{t+1}$ before $X_t$. Thus it stores relevant future information $\overrightarrow{x}$ in order to generate a statistically accurate retrodiction of the past $P(\overleftarrow{X} | \overrightarrow{X} = \overrightarrow{x})$. Causal asymmetry implies a non-zero gap between the minimum memory required by any causal model, $C^+$, and that required by its retrocausal counterpart, $C^-$.

FIG. 3. Quantum circuits for generating (a) $P^+_h(\overleftarrow{X}, \overrightarrow{X})$ and (b) $P^-_h(\overleftarrow{Y}, \overrightarrow{Y})$. Here $CU$ (black circle and line) is the standard controlled gate $CU: |w\rangle|\psi\rangle \to |w\rangle U^{(w \bmod 2)}|\psi\rangle$. Meanwhile $\overline{C}U$ (white circle, black line) is defined as $\overline{C}U|w\rangle|\psi\rangle = |0\rangle U^{(w+1 \bmod 2)}|\psi\rangle$. (a) To simulate $P^+_h(\overleftarrow{X}, \overrightarrow{X})$, we initialize a qubit in state $|s^+_i\rangle$ and an ancilla in state $|0\rangle$. Executing the local unitary $V_p: |0\rangle \to |s^+_0\rangle$, followed by the two-qubit gate $CV_q$, where $V_q V_p|0\rangle = |s^+_1\rangle$, creates a suitable entangled state, such that a computational basis measurement of the top qubit yields $x_t$ and simultaneously collapses the bottom qubit into the causal state for the next time step. (b) To simulate $P^-_h(\overleftarrow{Y}, \overrightarrow{Y})$, we prepare the state $|s^-_i\rangle|0\rangle|0\rangle$ as input. Execution of $CU_p$, where $U_p|0\rangle = \sqrt{1 - p}\,|0\rangle + \sqrt{p}\,|1\rangle$, followed by $CU_q$, where