Quantum stochastic processes and quantum non-Markovian phenomena

The field of classical stochastic processes forms a major branch of mathematics. Such processes are, of course, also very well studied in biology, chemistry, ecology, geology, finance, physics, and many other fields of the natural and social sciences. When it comes to quantum stochastic processes, however, the topic is plagued with pathological issues that have led to fierce debates amongst researchers. Recent developments have begun to untangle these issues and have paved the way for generalizing the theory of classical stochastic processes to the quantum domain without ambiguities. This tutorial details the structure of quantum stochastic processes, in terms of the modern language of quantum combs, and is aimed at students in quantum physics and quantum information theory. We begin with the basics of classical stochastic processes and generalize the same ideas to the quantum domain. Along the way, we discuss the subtle structure of quantum physics that has led to troubles in forming an overarching theory for quantum stochastic processes. We close the tutorial by laying out many exciting problems that lie ahead in this branch of science.

Many systems of interest, in both the natural and social sciences, are not isolated from their environment. However, the environment itself is often far too large and far too complex to model efficiently and must instead be treated statistically. This is the core philosophy of open systems; it is a way to render the description of systems immersed in complex environments manageable, even though the respective environments are inaccessible and their full description out of reach. Quantum systems are no exception to this philosophy. If anything, they are more prone to be affected by their complex environments, be they stray electromagnetic fields, impurities, or a many-body system. It is for this reason that the study of quantum stochastic processes goes back a full century. The field of classical stochastic processes is somewhat older, though not by much. Still, there are stark contrasts in the development of these two fields; while the latter rests on solid mathematical and conceptual grounds, the quantum branch is fraught with mathematical and foundational difficulties.
The 1960s and 1970s saw great advancements in laser technology, which enabled isolating and manipulating single quantum systems. This did not, of course, mean that unwanted environmental degrees of freedom were eliminated, highlighting the need for a better and more formal understanding of quantum stochastic processes. It is in this era that great advancements were made in this field. Moving half a century forward from these early developments, there is yet another quantum revolution on the horizon: the one aimed at processing quantum information. As quantum engineering has advanced, many of the early results in the field of quantum stochastic processes have regained importance, and new problems have arisen that require a fresh look at how we characterize and model open quantum systems.
Central among these problems is the need to understand the nature of memory that quantum environments carry. At its core, memory is nothing more than information about the past of the system we aim to model and understand. However, the presence of this seemingly harmless feature leads to highly complex dynamics for the system that require different tools for their description from those used in the absence of memory. This is of particular importance for engineering fault-tolerant quantum devices, which are complex by design; the impact of memory effects will grow with increased miniaturization and read-out frequencies. Consequently, here, one aims to characterize the underlying processes with the hope of mitigating complex noise and making the operation of engineered devices robust to external noise.
On the other hand, there are natural systems immersed in complex environments, e.g., biological systems, where such processes have functional or even fundamental importance. These systems too undergo open quantum processes with memory as they interact with their complex environments. Here, in order to exploit these processes for physics and technological development, one aims to better understand the mechanisms at the heart of the complex quantum processes observed in nature.
For these reasons, over the years, many books have been dedicated to this field of research, e.g., [1][2][3][4][5]. In addition, progress in both experimental and theoretical physics has been fast, leading to many review papers focusing on different facets of open quantum systems [6][7][8][9][10][11][12] and on the complex multilayered structure of memory effects in quantum processes [10]. This tutorial adds to this growing literature and has its own distinct focus. Namely, we aim to answer two questions: how can we overcome the conceptual problems encountered in the description of quantum stochastic processes, and how can we comprehensively characterize multitime correlations and memory effects in the quantum regime when the system of interest is immersed in a complex environment?
A key aim of this tutorial is to render the connection between quantum and classical stochastic processes transparent. That is, while there is a well-established formal theory of classical stochastic processes, does the same hold true for open quantum processes? And if so, how are the two theories connected? Thus we begin with a pedagogical treatment of classical stochastic processes centered around several examples in Sec. II. Next, in Sec. III we formalize the elements of the classical theory, as well as present several elements of the theory that are important in practice. In Sec. IV we discuss the well-known early results on the quantum side. Here, we also focus on the fundamental problems in generalizing the theory of quantum stochastic processes such that it is on an equal footing with its classical counterpart. Sec. V begins by identifying the features of quantum theory that impose a fundamentally different structure for quantum stochastic processes. We then go on to detail the framework that allows one to generalize the classical theory of stochastic processes to the quantum domain. Finally, in Sec. VI we present various features of quantum stochastic processes, e.g., the distinction between Markovian and non-Markovian processes. Throughout the whole manuscript, we give examples that build intuition for how we ought to address multi-time correlations in an open quantum system. We close with several applications.
Naturally, we cannot possibly hope to do the vast field of open quantum system dynamics full justice here. The theory of classical stochastic processes is incredibly large, and its quantum counterpart is at least as large and complex. Here, we focus on several aspects of the field and introduce them by concrete example rather than aiming for absolute rigor. It goes without saying that there are countless facets of the field to be explored that go beyond what we can present in a tutorial. While we aim to provide as many references as possible for further reading, we do so without a claim to comprehensiveness; many of the results that have been found in the field will be left unsaid, and far too much will not even be addressed.

II. CLASSICAL STOCHASTIC PROCESSES: SOME EXAMPLES
A typical textbook on stochastic processes would begin with a formal mathematical treatment by introducing the triple (Ω, S, ω) of a sample space, a σ-algebra, and a probability measure. Here, we are not going to proceed in this formal way. Instead, we will begin with intuitive features of classical stochastic processes and then motivate the formal mathematical language retrospectively. We will then introduce and justify the axioms underpinning the theory of stochastic processes and present several key results in the theory of classical stochastic processes in the next section. The principal reason for introducing the details of the classical theory is that, later in the tutorial, we will see that many of these key results cannot be imported straightforwardly into the theory of quantum stochastic processes. We will then show how the features and key ingredients of classical stochastic processes can be generalized to the quantum realm.

A. Statistical state
Intuitively, a stochastic process consists of sequences of measurement outcomes, and a rule that allocates probabilities to each of these possible sequences. Let us start with a motivating example of a simple process, that of tossing a die, to clarify these concepts. After a single toss, a die will roll to yield one of the following outcomes:

R = {⚀, ⚁, ⚂, ⚃, ⚄, ⚅}. (1)

Here, R (for roll of the die) is called the event space, capturing all possible outcomes. If we toss the die twice in a row, then the event space is

R_2 = R × R = {(⚀, ⚀), (⚀, ⚁), . . . , (⚅, ⚅)}. (2)

While this looks the same as a single toss of two dice, the two experiments, tossing two dice in parallel and tossing a single die twice in a row, can, depending on how the die is tossed, indeed be different. However, in both cases the event spaces are the same and grow exponentially with the number of tosses. For example, for three tosses the event space R_3 has 6^3 entries.
While the event spaces for different experiments can coincide, the probabilities for the occurrence of different events generally differ. Any possible event r_K ∈ R_K has a probability P(r_K) = P(R_K = r_K), where the boldface subscript K denotes the number of times the die is tossed (or, in general, the number of dice that are tossed), and R_K is the random variable corresponding to K tosses. Throughout, we will denote the random variable at toss k by R_k, and the specific outcome by r_k. Importantly, two experiments with the same potential outcomes and the same corresponding probabilities cannot be statistically distinguished. For example, tossing two dice in parallel and hard tossing (see below) one die twice in a row yield the same probabilities and could not be distinguished, even though the underlying mechanisms are different. Consequently, we call the allocation of probabilities to possible events the statistical state of the die, as it contains all inferable information about the experiment at hand. In anticipation of our later treatment of quantum stochastic processes, we emphasize that this definition of state chimes well with the definition of quantum states, which, too, contain all statistical information that is inferable from a quantum system. Importantly, the respective probabilities not only depend on how the die is made, i.e., its bias, but also on how it is tossed. Since we are interested in stochastic processes and, as such, sequential measurements in time, we will focus on the latter aspect below.
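The exponential growth of the event space, and the notion of a statistical state as an allocation of probabilities to events, can be made concrete in a few lines of code. This is only an illustrative sketch, with the die faces written as the integers 1 to 6 rather than pip symbols:

```python
import itertools

# Event space for a single toss of a six-sided die (faces labelled 1..6).
R = [1, 2, 3, 4, 5, 6]

# The event space for K successive tosses is the K-fold Cartesian product,
# so its size grows exponentially: |R_K| = 6**K.
R3 = list(itertools.product(R, repeat=3))
print(len(R3))  # 216 = 6**3

# A statistical state allocates a probability to every possible event.
# For a fair die tossed independently, every length-3 sequence is equally likely.
P3 = {event: (1 / 6) ** 3 for event in R3}
print(abs(sum(P3.values()) - 1.0) < 1e-12)  # normalization holds
```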

B. Memoryless process
Let us now, to see how the probabilities P_K emerge, look at a concrete 'experiment': the case where the die is tossed hard. For a single toss of a fair die, we expect the outcomes to be equally distributed,

P(R_1 = ⚀) = . . . = P(R_1 = ⚅) = 1/6. (5)
Now, imagine this fair die is tossed 'hard' successively. By hard, we mean that it is shaken in between tosses, in contrast to merely being perturbed (see below). Then, importantly, the probability of future events does not depend on the past events; observing, say, ⚅ at some toss has no bearing on the probabilities of later tosses. In other words, a hard toss of a fair die is a fully random process that has no memory of the past. Consequently, this successive tossing of a single die k times is not statistically distinguishable from the parallel tossing of k unbiased dice. The memorylessness of the process is not affected if a biased die is tossed, e.g., a die with the distribution P(R = r) = 4/25 for r ∈ {⚀, ⚁, ⚂, ⚃, ⚄} and P(R = ⚅) = 1/5.
Here, while the bias of the die influences the respective probabilities, the dependence of these probabilities on prior outcomes stems solely from the way the die is tossed. Alternatively, suppose we toss two identical dice with the event space given in Eq. (3). Now, if we consider the aggregate outcomes (sum of the outcomes of the two dice) {2, 3, . . . , 12}, they do not occur with uniform probability. Nevertheless, the process itself remains random, as the future outcomes do not depend on the past outcomes. Processes without any dependence on past outcomes are often referred to as Markov order 0 processes. We now slightly alter the tossing of a die to encounter processes with higher Markov order.
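The non-uniformity of the aggregate outcomes, despite the process itself being memoryless, is easy to verify by exhaustive enumeration. A brief sketch (faces written as integers 1 to 6):

```python
from itertools import product
from collections import Counter

# Aggregate outcome (sum) of two fair dice: the event space is {2,...,12},
# but the probabilities are not uniform, even though each toss is memoryless.
faces = range(1, 7)
counts = Counter(a + b for a, b in product(faces, faces))
P_sum = {s: c / 36 for s, c in counts.items()}

print(P_sum[7])  # 6/36: the most likely aggregate outcome
print(P_sum[2])  # 1/36: only one way to roll a total of 2
```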

C. Markov process
To introduce a dependence on prior outcomes, let us now ease the tossing and imagine placing the die on a book and then gently shaking the book horizontally for three seconds, see the depiction in Figure 1(b). We refer to this process as the perturbed die. The term 'perturbed' here highlights that the toss is only a small perturbation of the current configuration. In this process, the probability to tip to any one side is q, rolling to the opposite side is highly unlikely [13] (with probability s), while it is highly likely (with probability p) that the die stays on the same side. Concretely, suppose we start the die with ⚁; then the probability distribution for the outcomes of the next roll will be

P(R_k | R_{k−1} = ⚁) = [q p q q s q]^T, (7)

where T denotes transpose, i.e., the probability distribution is a column vector. The perturbative nature of the toss means that p > q ≫ s, and normalization gives us p + 4q + s = 1.
Above, R_k and R_{k−1} are the random variables describing the die at the k-th and (k−1)-th toss, respectively. The conditional probabilities in Eq. (7) denote the probabilities for the outcomes {⚀, ⚁, ⚂, ⚃, ⚄, ⚅} at the k-th toss, given that the (k−1)-th toss yielded ⚁. For example, the probability for the die to yield the outcome R_k = ⚂ (i.e., to roll on its side) at the k-th toss, given that it yielded r_{k−1} = ⚁ in the previous toss, is P(R_k = ⚂ | R_{k−1} = ⚁) = q.
A word of caution is needed. In the literature, conditional probabilities often carry an additional subscript to denote how many previous outcomes the probability of the current outcome is conditioned on. For example, P_{1|k} would denote the probability of one (the current) outcome conditioned on the k previous outcomes, while P_k would represent a joint probability of k outcomes. Here, in slight abuse of notation, we use the same symbol for conditional probabilities as we used for one-time probabilities, e.g., in Eq. (5), and we omit additional subscripts. However, since the number of arguments always clarifies what type of probability is referred to, there is no risk of confusion, and we will maintain this naming convention also for the case of conditional probabilities that depend on multiple past outcomes.
In this example, even though the die may be unbiased, the toss itself is not, and the distribution for the future outcomes of the die depends on its current configuration. As such, the process remembers the current state. However, for the probabilities at the k-th toss, it is only the outcome at the (k−1)-th toss that is of relevance, but none of the earlier ones. In other words, only the current configuration matters for future statistics, but the earlier history does not. Such processes are referred to as Markov processes, or, as they 'remember' only the most recent outcome, processes of Markov order 1. Importantly, as soon as any kind of memory effects are present, the successive tossing of a die can be distinguished from the independent, parallel tossing of several identical dice, as in the latter case the statistics of the k-th die cannot depend on the (k−1)-th die (or any other die).
Again, we emphasize that this process will remain Markovian even if the die is replaced by two dice or by a biased die. Similarly, the above considerations would not change if the perturbation depended on the number of the toss k, i.e., if the parameters of Eq. (7) were functions q(k), p(k), s(k). We will now discuss the case where this assumption is not satisfied, i.e., where the perturbation at the k-th toss can depend on past outcomes, and memory over longer periods of time starts to play a non-negligible role.

D. Non-Markovian processes
Let us now modify the process in the last example a bit by changing the perturbation intensity as we go. Above, we considered the process where the die was placed on a book, and the book was shaken for three seconds. Suppose that after the first shake the die rolls on its side, say ⚀ ↦ ⚁. The process is such that, after the number of pips changes, the next perturbation has unit intensity. If this intensity is low enough, then we are likely to see ⚁ ↦ ⚁, and if that happens, i.e., the number of pips is unchanged, then the intensity is doubled for the next shake; and we keep doubling the intensity until either the die rolls to a new value or the intensity reaches the value of eight units (after four shakes), which we assume to be equal to shaking the die so strongly that its initial value does not influence future outcomes. After this, the shaking intensity is reset to the unit level. We have depicted this process in Figure 1(c).
In this example, to predict the future probabilities we not only need to know the current number of pips the die shows, but also its past values. That is, the probability of observing an event, say ⚁, after observing two consecutive outcomes ⚁, ⚁ is different than if one had previously observed ⚀ and ⚁, i.e.,

P(⚁ | ⚁, ⚁) ≠ P(⚁ | ⚀, ⚁). (8)

The necessity of remembering the past beyond the most recent outcome makes this process non-Markovian. On the other hand, here, we only have to remember the past four outcomes of the die due to the resetting protocol of the perturbation strength. Concretely, the future probabilities are independent of the past beyond four steps. For example, we have

P(⚁ | ⚁, ⚁, ⚁, ⚁, ⚁) = P(⚁ | ⚁, ⚁, ⚁, ⚁, ⚀). (9)

To be more precise, predicting the next outcome with the correct probabilities requires knowing the die's configuration for the past four steps. That is, the future distribution is fully determined by the conditional probabilities

P(R_k | R_{k−1}, . . . , R_0) = P(R_k | R_{k−1}, . . . , R_{k−4}), (10)

where we only need to know a part (here, the last four outcomes) of the history.
As mentioned, the size of the memory is often referred to as the Markov order or memory length of the process. A fully random process, like the hard tossing of a die, has Markov order 0, and a Markov process has order 1. A non-Markovian process has an order of 2 or larger. This, in turn, implies that the study of non-Markovian processes contains Markovian processes as well as fully random processes as special cases. Indeed, most processes in nature will carry memory, and Markovian processes are the (well studied) exception rather than the norm [14]. In general, the complexity of a non-Markovian process is higher than that of the Markov process in the last subsection; this is because there is more to remember. Put less prosaically, the process has to keep a ledger of the past outcomes to carry out the correct type of perturbation at each point. And, in general, the size of this ledger, i.e., the complexity, grows exponentially with the Markov order m: for a process with d different outcomes at each time (6 for a die), it is given by d^m. However, sometimes it is possible to compress the memory. For instance, in the above example, we only need to know the current configuration and the number of time steps it has remained unchanged; thus the size of the memory is linear in the Markov order for this example. Moreover, looking at histories longer than the Markov order will not reveal anything new and thus does not add to the complexity of the process.
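The contrast between the naive, exponential memory cost and the compressed memory of the doubling-intensity die can be spelled out numerically. A small sketch under the assumptions above (d = 6 faces, Markov order m = 4, and a compressed memory consisting of the current face plus the number of steps it has remained unchanged):

```python
# Naive memory for Markov order m with d outcomes: one distinct history of
# length m per possible sequence, i.e. d**m of them.
d, m = 6, 4
naive = d ** m

# For the doubling-intensity die, the memory compresses to the pair
# (current face, number of steps unchanged in {0, 1, 2, 3}): 6 * 4 states,
# linear rather than exponential in the Markov order.
compressed = d * m
print(naive, compressed)  # 1296 vs 24
```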

E. Stochastic matrix
Having discussed stochastic processes and memory at a general level, it is now time to look in more detail at the mathematical machinery used to describe them. A convenient way to model stochastic processes is the stochastic matrix, which transforms the current state of the system into the future state. It also lends itself to a clear graphical depiction of the process in terms of a circuit, see, e.g., Figure 2 for circuits corresponding to the three examples above. In what follows, we will write down the stochastic matrices corresponding to the three processes above. The future states can then be computed by following the circuit and performing appropriate matrix multiplication.

Transforming the statistical state
Before describing the process, let us write down the state of the system at time k − 1. At any given time, the die has a probability of being in one of six states, not necessarily uniformly distributed. We can think of this distribution as the statistical state of the system:

P(R_{k−1}) = [P(⚀) P(⚁) P(⚂) P(⚃) P(⚄) P(⚅)]^T. (11)

Figure 2. Random, Markovian, and non-Markovian processes.
The top panel shows the circuits for the random and Markovian die. In these cases, there are no extra lines of communication between the tosses (represented by boxes); only the system carries the information forward for a Markov process. The bottom panel shows the non-Markovian die. Here, information is sent between tosses (represented by boxes) in addition to what the system carries, namely the memory of the past states of the system (die). This memory is denoted by the thick line. The memory has to carry the information about the state of the die in the past four tosses to determine the intensity of the next perturbation.
Here again, T denotes transpose, i.e., the statistical state is a column vector. Suppose the die in the (k−1)-th toss rolls to r_{k−1}. Along with this, if we know the conditional (or transition) probabilities P(r_k | r_{k−1}), the probability that the die rolls to r_k in the k-th toss can be straightforwardly computed via

P(r_k) = Σ_{r_{k−1}} P(r_k | r_{k−1}) P(r_{k−1}). (12)
This can be phrased more succinctly as

P(R_k) = Γ_{(k:k−1)} P(R_{k−1}), (13)

where the stochastic matrix Γ_{(k:k−1)} is the mechanism by which the statistical state changes in time from time step k − 1 to k. For brevity, we will generally omit the subscript on Γ (the time at which it acts will be clear from the respective arguments it acts on) unless it is required for clarity. The elements of the stochastic matrix are called transition probabilities, as they indicate how two events at k and k − 1 are correlated. Before examining the explicit stochastic matrices for the above example processes, let us first discuss their general properties. First, all entries of Γ are non-negative, as they correspond to transition probabilities. Second, to ensure that the l.h.s. of Eq. (13) is a probability distribution, the columns of the stochastic matrix sum to one, which is a direct consequence of the identity Σ_{r_k} P(r_k | r_{k−1}) = 1, which holds for all r_{k−1}. On the other hand, the rows of Γ do not have to add to unity, as generally we have Σ_{r_{k−1}} P(r_k | r_{k−1}) ≠ 1 (this is also clear in Eq. (14) for a biased die below). In the case where the rows do add to 1, the matrix is called bistochastic, and it has some nice properties and applications [15], which we will not cover in detail in this tutorial; for example, any bistochastic matrix can be represented as a convex combination of permutation matrices, a fact known as Birkhoff's theorem.
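The defining properties of a column-stochastic matrix, and the fact that it maps probability vectors to probability vectors, can be checked numerically. A minimal sketch, using a hypothetical 3-state matrix (the numbers are assumptions, chosen only to have the right structure):

```python
import numpy as np

# A column-stochastic matrix: non-negative entries, each column sums to one.
G = np.array([[0.8, 0.1, 0.3],
              [0.1, 0.7, 0.3],
              [0.1, 0.2, 0.4]])
assert np.all(G >= 0) and np.allclose(G.sum(axis=0), 1.0)

# Updating a statistical state (a probability column vector) is a
# matrix-vector multiplication, P_k = G P_{k-1}; the output is again a
# probability distribution.
p = np.array([1.0, 0.0, 0.0])
p_next = G @ p
print(p_next.sum())  # 1.0: normalization is preserved
```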

Random process
Now, making the concept of stochastic matrices more concrete, we begin by constructing the stochastic matrix for the fully random process of tossing a die without memory. In this case, it does not matter what the current state of the die is; the future state will be the one given in Eq. (11). This is achieved by the matrix

Γ^{(0)} = [P(R_k) P(R_k) P(R_k) P(R_k) P(R_k) P(R_k)], (14)

i.e., the matrix each of whose six columns equals the statistical state of Eq. (11). As stated above, a fully random process has a Markov order of 0, which we denote by the extra superscript (0). Additionally, all the columns of the above Γ^{(0)} add up to one, independent of whether or not the die is biased, while in general, i.e., when the die is biased, the rows do not add up to unity. It is easy to check that the above stochastic matrix indeed leads to the correct transitions; suppose the current state of the die is ⚅, i.e., P(R_{k−1}) = [0 0 0 0 0 1]^T. The statistical state after the roll will be the one given in Eq. (11), i.e.,

P(R_k) = Γ^{(0)} P(R_{k−1}). (15)

Evidently, this process does not care about the current state, as the 'new' probabilities at the k-th toss do not depend on the previous ones; it merely independently samples from the underlying distribution corresponding to the bias of the die. As already mentioned, we could readily incorporate a temporal change of said bias by making it dependent on the number of tosses. However, as long as this dependence is only on the number of tosses, and not on the previous outcomes, we would still consider the process memoryless. To avoid unnecessary notational clutter, we will always assume that the bias and/or the transition probabilities are independent of the absolute toss number (but they can depend on previous outcomes, see below).
For an unbiased die the above stochastic matrix will simply be

Γ^{(0)} = (1/6) J, (16)

where J is the 6 × 6 matrix of ones; this is not only a stochastic, but a bistochastic map. Again, it is easy to check that the output is the uniform distribution for any P(R_{k−1}).
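The "forgetting" action of the Markov-order-0 matrix is easy to verify: whatever state goes in, the bias distribution comes out. A short sketch for the unbiased die:

```python
import numpy as np

# Markov-order-0 ("fully random") process: every column of the stochastic
# matrix is the same bias vector, so the output ignores the input state.
bias = np.full(6, 1 / 6)              # unbiased die; any bias vector works
G0 = np.tile(bias[:, None], (1, 6))   # 6x6 matrix, all columns equal to bias

p_current = np.array([0, 0, 0, 0, 0, 1.0])  # die currently shows the 6th face
p_next = G0 @ p_current
print(np.allclose(p_next, bias))  # True: the past is forgotten
```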

Markov process
Let us now move to the perturbed die process, which we argued is a Markovian process. In this case the stochastic matrix has the form

Γ^{(1)} = [P(r_k | r_{k−1})], (17)

i.e., the matrix whose entry in row r_k and column r_{k−1} is the transition probability P(r_k | r_{k−1}), where, again, we have used the superscript (1) to signify that the underlying process is of Markov order 1.
The hallmark of this matrix is that it gives us different future probabilities depending on the current configuration; the probability P(⚂ | ⚀) to find the die showing ⚂ at the k-th toss, given that it showed ⚀ at the (k−1)-th toss, generally differs from the probability P(⚂ | ⚁) to show ⚂ given that it previously showed ⚁. In contrast, for the fully random process above, both of these transition probabilities would be given by P(⚂).
Concretely, for the perturbed die process given in Eq. (7), the stochastic matrix will have the form

Γ^{(1)} =
[ p q q q q s ]
[ q p q q s q ]
[ q q p s q q ]
[ q q s p q q ]
[ q s q q p q ]
[ s q q q q p ] , (18)

where the diagonal entries correspond to staying on the same face and the anti-diagonal entries to rolling to the opposite face (⚀–⚅, ⚁–⚄, ⚂–⚃). Again, here the conditions p > q ≫ s and p + 4q + s = 1 are assumed, and we have, e.g., P(⚃ | ⚂) = s ≠ q = P(⚃ | ⚁). Again, it is easy to see that the normalization of the conditional probabilities implies that the columns of the stochastic matrix add to one. Additionally, here, the rows of Γ^{(1)} add up to one, too, making it a bistochastic matrix. For a Markov process, the state P(R_k) is related to an earlier state P(R_j), with j < k, by repeated applications of the stochastic matrix,

P(R_k) = Γ_{(k:k−1)} Γ_{(k−1:k−2)} · · · Γ_{(j+1:j)} P(R_j). (19)

Alternatively, we may describe the process from j to k with the stochastic matrix

Γ_{(k:j)} = Γ_{(k:k−1)} Γ_{(k−1:k−2)} · · · Γ_{(j+1:j)}. (20)

This is clearly desirable, as the above stochastic matrix is simply obtained by matrix multiplications, which are easy to do on a computer. Another way to compute the probability for two sequential events, say r_k given that we saw event r_j at the respective times, is by employing Eq. (12):

P(r_k | r_j) = Σ_{r_i} P(r_k | r_i) P(r_i | r_j), with j < i < k. (21)

This is known as the Chapman-Kolmogorov equation. Here, we have summed over all trajectories between event r_j and event r_k.

Figure 3. Memory in non-Markovian processes. For processes with memory, besides the state of the system at a time/toss k, we need additional information, depicted by the additional memory lines, about the past to correctly predict future statistics. If only the probability of the next outcome is of interest, then a map Γ^{(m)} of the form of Eq. (24) is sufficient; if all future probabilities are to be computed via the concatenation of a single map, then Ξ, given in Eq. (26), is required. Together, the system and memory undergo Markovian dynamics.
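The perturbed-die matrix and the Chapman-Kolmogorov composition rule can both be checked directly. A sketch with assumed values of p, q, s satisfying p + 4q + s = 1 (the specific numbers are illustrative, not from the text):

```python
import numpy as np

# Perturbed-die stochastic matrix: p = stay on the same face, s = roll to
# the opposite face (faces i and 7-i are opposite; 0-indexed: i and 5-i),
# q = tip to any of the four adjacent faces.
p, q, s = 0.5, 0.12, 0.02           # assumed values with p + 4q + s = 1
G1 = np.full((6, 6), q)
for i in range(6):
    G1[i, i] = p                    # diagonal: same face
    G1[5 - i, i] = s                # anti-diagonal: opposite face
assert np.allclose(G1.sum(axis=0), 1.0)   # column-stochastic

# Chapman-Kolmogorov: the two-step transition matrix is the product of the
# single-step matrices, i.e. a sum over all intermediate trajectories.
G_two = G1 @ G1
by_hand = sum(G1[4, m] * G1[m, 1] for m in range(6))  # explicit trajectory sum
print(np.isclose(G_two[4, 1], by_hand))  # True
```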

Non-Markovian process
Above, we have required that the stochastic matrix Γ maps the statistical state P(R_j) at a single time to another single-time statistical state P(R_k). This was the correct way of computing future statistics, as they only depended on the current state of the system, but not on any additional memory. Now, turning our attention to non-Markovian dynamics, we will expand our view to consider processes that map multitime statistical states, e.g., P(R_{j−1}, R_{j−2}, . . . , R_{j−m}), to either a single-time state, e.g., P(R_k), or a multi-time state, e.g., P(R_{k−1}, R_{k−2}, . . . , R_{k−m}), depending on what we aim to describe. This can be done in several ways, either by considering collections of stochastic maps, or a single stochastic map that acts on a larger space. We briefly discuss both of these options.
First, let us consider the stochastic matrix for the non-Markovian process described above. As mentioned before, we need to know the current state and the number of times µ ∈ {0, 1, 2, 3} it has not changed to correctly predict future statistics. Given these two pieces of information, for a given µ, we can write the stochastic matrix as

Γ^{(4,µ)} = [P^{(µ)}(r_k | r_{k−1})], (23)

where the superscript on the transition probabilities and the stochastic matrices denotes that they depend on the number of times the outcome has not changed. For µ = 3, the process becomes the random process given in Eq. (14), and µ = 4 is the same as µ = 0. Evidently, Eq. (23) defines four distinct stochastic matrices, one for each µ, that lead to distinct future statistics. For any given µ, Γ^{(4,µ)} allows us to correctly predict the probability of the next toss of the die. It is always possible to write down such a family of stochastic matrices for any non-Markovian process. Given the current state and history, we make use of the appropriate stochastic matrix to get the correct future state of the system. In general, for Markov order m, there are at most d^{m−1} distinct histories prior to the current outcome, i.e., µ ∈ {0, . . . , d^{m−1} − 1}; each such history then requires a distinct stochastic matrix to correctly predict future probabilities. This exponentially growing storage requirement of distinct pasts highlights the complexity of a non-Markovian process.
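The idea of a history-indexed family of stochastic matrices can be sketched in code. The following toy example uses a coin (d = 2) rather than a die, and the particular matrix entries are assumptions; it only illustrates the mechanism of selecting Γ^{(µ)} according to how long the outcome has remained unchanged:

```python
import numpy as np

# A family of stochastic matrices indexed by mu, the number of consecutive
# steps the outcome has stayed unchanged (toy numbers, for illustration only).
def gamma(mu):
    stay = max(0.9 - 0.2 * mu, 0.5)   # perturbation grows with mu
    return np.array([[stay, 1 - stay],
                     [1 - stay, stay]])

# To predict the next outcome, we pick the matrix matching the history:
history = [1, 1, 1]                   # outcome unchanged for mu = 2 steps
mu = len(history) - 1
p_current = np.array([0.0, 1.0])      # system currently in state 1
p_next = gamma(mu) @ p_current
print(p_next.sum())  # 1.0: each gamma(mu) is a valid stochastic matrix
```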
On the other hand, such a collection of stochastic matrices for a process of Markov order m could equivalently be combined into one d × d^m matrix of the form

Γ^{(m)} = [P(r_k | r_{k−1}, . . . , r_{k−m})], (24)

with one column for each of the d^m possible sequences of the last m outcomes, that acts on d^m-dimensional probability vectors to yield the correct future statistics, i.e., P(R_k) = Γ^{(m)} P(R_K). Here, K denotes the last m tosses, and thus by R_K we denote the random variable corresponding to sequences of the last m outcomes starting at the (k−1)-th toss. As before, Γ^{(m)} is a stochastic matrix, as all of its entries are non-negative and its columns sum to one. However, in contrast to the Markovian and the fully random case, it ceases to be a square matrix. We thus have to widen our understanding of a statistical 'state' from probability vectors of outcomes at one time/toss to probability vectors of outcomes at sequences of times/tosses. In quantum mechanics, this shift of perspective allows one to resolve many of the apparent paradoxes that appear to plague the description of quantum stochastic processes. In the following section, we will see a concrete example of this way of describing non-Markovian processes.
We have graphically depicted non-Markovian processes, with Markov orders 2, 3, and 4, in Figure 3. Here, the lines above the boxes denote the memory that is passed to the future and required to correctly predict future statistics. Each box simply has to pass the information about the current state, which generally is a multi-time object, to future boxes, which, again, can make use of this information. Considering Figure 3, we can already see that the description of stochastic processes with memory provided above is somewhat incomplete. While Γ^{(m)} allows us to compute the probabilities of the next outcome, given the last m outcomes, it only yields a one-time state, not an m-time state. While this is sufficient if we are only interested in the statistics of the next outcome, it is not enough to compute statistics further in the future. Concretely, we cannot let Γ^{(m)} act successively to obtain all future statistics. Expressed more graphically, a map that allows us to fully compute statistics for a process of Markov order m needs m input and m output lines (see Figure 3). Naturally, such a map, which we will denote as Ξ, can always be constructed from Γ^{(m)}, as we discuss in more detail in the next section. Importantly, its action looks just like that of a square stochastic matrix:

P(R_k, . . . , R_{k−m+1}) = Ξ P(R_{k−1}, . . . , R_{k−m}). (26)

As this then allows us to simply compute statistics via the concatenation of Ξ, just like in the Markovian case, we can think of any non-Markovian process as a Markovian process on a larger system, as depicted in the bottom panel of Figure 3. Graphically, this can easily be seen in Figure 3, where the system of interest (the die) plus the required memory lines form a Markovian process. Returning to our discussion of the complexity of non-Markovian processes: usually, not all distinct pasts, even within the Markov order, lead to distinct futures, and memory can be compressed.
This effect can already be seen for the perturbed die above, where, instead of $6^3 = 216$ stochastic matrices, we can compute the correct future statistics by means of merely 4 stochastic matrices. We will not discuss the issue of memory compression in this tutorial, but details can be found in the vast literature on the so-called ε-machines; see, for example, Refs. [16-18]. Finally, we emphasize that, while here we have been focusing on the underlying mechanisms through which the respective probabilities emerge, a stochastic process is also fully described once all joint probabilities for events are known. For example, considering a three-fold toss of a die, once the probabilities $P(R_2, R_1, R_0)$ are known, all probabilities for smaller sequences of tosses (say, for example, $P(R_2, R_0)$), as well as all conditional probabilities for those three tosses, can be computed. Knowing the full joint distribution is thus equivalent to knowing the underlying mechanism.
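As a concrete numerical illustration of these ideas, the following sketch (not taken from the tutorial; the transition probabilities are made up) builds a Markov-order-2 matrix $\Gamma^{(2)}$ for a coin and checks its defining properties:

```python
import numpy as np

# Hypothetical Markov-order-2 coin: the next outcome depends on the
# last two outcomes.  Columns are indexed by the past pair
# (hh, ht, th, tt), rows by the next outcome (h, t).
Gamma2 = np.array([
    [0.9, 0.6, 0.4, 0.1],   # P(h|hh), P(h|ht), P(h|th), P(h|tt)
    [0.1, 0.4, 0.6, 0.9],   # P(t|hh), P(t|ht), P(t|th), P(t|tt)
])

# Gamma2 is stochastic: non-negative entries, columns summing to one ...
assert np.all(Gamma2 >= 0) and np.allclose(Gamma2.sum(axis=0), 1.0)

# ... but, unlike in the Markovian case, it is not square: it has shape
# (d, d**m) = (2, 4), mapping probability vectors over length-2 pasts to
# one-time distributions.
past = np.array([0.25, 0.25, 0.25, 0.25])  # uniform over hh, ht, th, tt
next_toss = Gamma2 @ past                  # distribution of the next toss
```

Note that `Gamma2` can be applied once, but not concatenated with itself, which is precisely the incompleteness discussed above.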

F. Hidden Markov model
An important concept in many disciplines is that of stationarity. For stochastic processes, stationarity means time-translation symmetry. That is, it does not matter when we flip a coin; we only need to consider its states up to the Markov order. This is useful because often we are interested in characterizing a process whose inner workings are hidden from us. In such a case, we can try to infer the inner workings by recording the statistics of the state of the system in a time series. For example, given a long sequence of coin-flip outcomes $F_k \in \{h, t\}$, we can determine the statistics for seeing 'heads' $h$ and 'tails' $t$, or any other sequence, say $hhttht$. Of course, this requires that the total data size is much larger than any sequence whose probability we wish to estimate. From this data we can construct a hidden Markov model for the system that will reproduce the statistics up to any desired Markov order [16-18].
Figure 4. Markov chains. Given a time series of coin flips we can deduce any of the above hidden Markov models. At memory length 2 we have a deterministic process, and therefore longer memory will not yield any more information. In other words, the Markov order of the process in panel (c) is 2.

An illuminating graphical representation of a stationary stochastic process is the so-called Markov chain, which is associated with the stochastic matrix. For simplicity of the diagram, let us consider a process with dichotomic outcomes, e.g., a coin flip with random variables $F_k$ (for flips). Again, this can be a fully random process, a Markov process, or a non-Markovian process, depending on how the coin is flipped. Suppose now that the coin is flipped a million times in succession and we are given the sequence of results. For simplicity, we assume stationarity, i.e., probabilities and conditional probabilities do not depend on the cardinal number of the coin toss. Under this assumption,
from the observed results, we can compute how frequently one sees $h$ or $t$, which is quantified by $P(F_k)$. We might compute how often $h$ flips to $h$ and so on; this is quantified by $P(F_k|F_{k-1})$. Analogously, we may also compute the probability of seeing longer sequences, like $hhh$, $hht$, etc. With all of this, we can obtain conditional probabilities of the form $P(F_k|F_{k-1}, F_{k-2})$ and $P(F_k|F_{k-1}, F_{k-2}, F_{k-3})$. Let us assume that both of these conditional probabilities coincide, which leads us to conclude that the Markov order of the process is 2 (technically, we should check that $P(F_k|F_{k-1}, F_{k-2}) = P(F_k|F_{k-1}, \dots, F_{k-n})$ for all $n \geq 2$, but as it is unlikely that only longer memory exists, we consider this test for Markov order 2 sufficient). In this case, following the ideas laid out below Eq. (24), the probabilities of future outcomes can be described by a single stochastic matrix of the form
$$\Gamma^{(2)} = \begin{pmatrix} P(h|hh) & P(h|ht) & P(h|th) & P(h|tt) \\ P(t|hh) & P(t|ht) & P(t|th) & P(t|tt) \end{pmatrix}.$$
This map will act on a statistical state of the form
$$P(F_{k-1}, F_{k-2}) = \big(P(hh),\ P(ht),\ P(th),\ P(tt)\big)^{\mathrm{T}}.$$
The action of the stochastic matrix on this statistical state gives us the probability distribution for the next flip.
Combining the probabilities for two successive outcomes into a single probability vector thus allows us to compute the probabilities for the next outcome in a Markovian fashion, i.e., by applying a single stochastic matrix to said probability vector. However, there is a slight mismatch in Eq. (29); while the random variables we look at on the r.h.s. are sequences of two successive outcomes, the random variable on the l.h.s. is a single outcome at the $k$-th toss. To obtain a fully Markovian model, one would rather desire a stochastic matrix that provides the transition probabilities from one sequence of two outcomes to another, i.e., a stochastic matrix $\Xi$ that yields
$$P(F'_k, F'_{k-1}) = \Xi^{(1)} P(F_{k-1}, F_{k-2}),$$
where, for better bookkeeping, we formally distinguish between the random variables on the l.h.s. and the r.h.s. Additionally, we give $\Xi$ an extra superscript to underline that it describes a process of Markov order one. To do so, it has to act on a larger space of random variables, namely, the combined previous two outcomes. Now, in our case, it is easy to see that the action of $\Xi^{(1)}$ can be simply computed from $\Gamma^{(2)}$ as
$$\Xi^{(1)}_{(f'_k f'_{k-1}),\, (f_{k-1} f_{k-2})} = \delta_{f'_{k-1} f_{k-1}}\, \Gamma^{(2)}_{f'_k,\, (f_{k-1} f_{k-2})},$$
where $\delta$ is the Kronecker delta. The above equation then describes a Markovian model for the random variable $F^2$, which takes values $\{hh, ht, th, tt\}$. As knowledge of all relevant transition probabilities, i.e., those within the Markov order, allows the computation of all joint probabilities, such an embedding into a higher-dimensional Markovian process via a redefinition of the considered random variables is always possible. The corresponding Markovian model is often called a hidden Markov model. As a brief aside, we note that the amount of memory that needs to be considered in an experiment depends both on the intrinsic Markov order of the process at hand and on the amount of information an experimenter can or wants to store.
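The embedding of $\Gamma^{(2)}$ into a square stochastic matrix $\Xi^{(1)}$ can be sketched numerically (the transition probabilities below are hypothetical, chosen only for illustration):

```python
import numpy as np

d, m = 2, 2  # coin outcomes, Markov order 2
# Hypothetical order-2 transition matrix; columns index past pairs hh, ht, th, tt.
Gamma2 = np.array([
    [0.9, 0.6, 0.4, 0.1],
    [0.1, 0.4, 0.6, 0.9],
])

# Embed Gamma2 into a square matrix acting on pairs of outcomes:
# Xi[(f_k, f_km1), (g_km1, g_km2)] = delta(f_km1, g_km1) * Gamma2[f_k, (g_km1, g_km2)]
Xi = np.zeros((d**m, d**m))
for f_k in range(d):
    for f_km1 in range(d):
        for g_km1 in range(d):
            for g_km2 in range(d):
                if f_km1 == g_km1:  # the most recent outcome is carried along
                    Xi[f_k * d + f_km1, g_km1 * d + g_km2] = Gamma2[f_k, g_km1 * d + g_km2]

# Xi is a square stochastic matrix, so the embedded process is Markovian
# and can be concatenated to propagate pair statistics arbitrarily far.
state = np.array([0.25, 0.25, 0.25, 0.25])   # P(F_{k-1}, F_{k-2})
for _ in range(50):
    state = Xi @ state                        # -> P(F_k, F_{k-1}), etc.
```

The Kronecker delta in the construction simply ensures that the shared outcome in consecutive pairs is consistent.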
If, for example, one is only interested in correctly recreating transition probabilities $P(R_k|R_{k-1})$ between adjacent times, but not necessarily higher-order transition probabilities, like, e.g., $P(R_k|R_{k-1}, R_{k-2})$, then a Markovian model without any memory is fully sufficient (but will not properly reproduce higher-order transition probabilities). Returning to our process, we have depicted the corresponding Markov chains for each case in Figure 4. For a fully random process, the Markov chain only has one state; after each flip, the state returns to itself, and the future probabilities do not change based on the past. For a Markov process, the chain has two states, and four transitions are possible. Finally, the non-Markovian process is chosen to be deterministic: $hh$ always goes to $th$, and so on. Note that, as mentioned above, if we only care about transition probabilities $P(F_k|F_{k-1})$, i.e., we only consider the last outcome and not the last two outcomes (that is, we identify $hh$ with $ht$, and $th$ with $tt$), then the process of panel (c) in Figure 4 reduces to the simpler one in panel (b), but information is lost.
All of the panels of Figure 4 describe Markovian processes, however for different random variables. This is a general feature: any non-Markovian process can be represented by a hidden Markov model or a Markov chain by properly combining the past into a 'large enough' random variable [14] (for example, the random variable with values $\{hh, th, ht, tt\}$ in panel (c)). This intuition will come in handy when we move to the case of quantum stochastic processes. But first, we need to formalize the theory of classical stochastic processes and show where the pitfalls lie when generalizing this theory to the quantum domain.
G. (Some) mathematical rigour

As mentioned, in our presentation of stochastic processes, we opt for intuitive examples rather than full mathematical rigor. However, laying out the fundamental concepts of probability theory in detail provides a more comprehensive picture of stochastic processes and renders the generalizations needed to treat quantum processes mathematically straightforward. As we will not rely on this mathematical machinery throughout this tutorial, the following section can be skipped.
The basic ingredient for the discussion of stochastic processes is the triple (Ω, S, ω) of a sample space Ω, a σ-algebra S and a probability measure ω. Intuitively, Ω is the set of all events that can occur in a given experiment (for example, Ω could represent the myriad of microstates a die can assume or the possible numbers of pips it can show), S corresponds to all the outcomes that can be resolved by the measurement device (for the case of the die, S could, for example, correspond to the number of pips the die can show, or to the less fine-grained information 'odd' or 'even') and ω allocates a probability to each of these observable outcomes.
More rigorously, we have the following definition [19]:

Definition (σ-algebra). Let Ω be a set. A σ-algebra on Ω is a collection S of subsets of Ω such that

• Ω ∈ S and ∅ ∈ S,
• S is closed under complements, and
• S is closed under (countable) unions and intersections.

For example, if the sample space is given by Ω = {⚀, . . . , ⚅} and we only resolve whether the outcome of the toss of a die is odd or even, the corresponding σ-algebra is given by {{⚀, ⚂, ⚄}, {⚁, ⚃, ⚅}, ∅, Ω}, while in the case where we resolve the individual numbers of pips, S is simply the power set of Ω.
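The closure properties in the definition can be checked directly for the die example; the following Python sketch does so for the coarse-grained odd/even σ-algebra:

```python
# Check that the collection {odd, even, emptyset, Omega} for a die is
# closed under complement, union, and intersection -- the defining
# properties of a sigma-algebra (here with integers standing in for pips).
Omega = frozenset({1, 2, 3, 4, 5, 6})
odd, even = frozenset({1, 3, 5}), frozenset({2, 4, 6})
S = {Omega, frozenset(), odd, even}

assert Omega in S and frozenset() in S
for a in S:
    assert (Omega - a) in S                      # closed under complement
    for b in S:
        assert (a | b) in S and (a & b) in S     # unions and intersections
```

By contrast, a collection like {odd, Ω, ∅} would fail this check, since the complement of 'odd' is missing.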
A pair (Ω, S) is called a measurable space; on it, we can introduce a probability measure for observable outcomes in a well-defined way:

Definition (Probability measure). Let (Ω, S) be a measurable space. A probability measure ω ∶ S → R is a real-valued function that satisfies

• ω(Ω) = 1,
• ω(s) ≥ 0 for all s ∈ S, and
• ω is additive for (countable) unions of disjoint events, i.e., $\omega\big(\bigcup_{j=1}^{\infty} s_j\big) = \sum_{j=1}^{\infty} \omega(s_j)$ for $s_j \in S$ with $s_i \cap s_j = \emptyset$ for $i \neq j$.

The corresponding triple (Ω, S, ω) is then called a probability space [19]. As the name suggests, ω maps each event $s_j$ to its corresponding probability, and, using the convention of the previous sections, we could have denoted it by P; we will do so in what follows. Evidently, in our previous discussions, we already made use of sample spaces, σ-algebras, and probability measures, without caring too much about their mathematical underpinnings. The mathematical machinery of probability spaces provides a versatile framework for the description of stochastic processes, both on finitely and infinitely many times (see Sec. III D for an extension of the above concepts to the multitime case).
So far, we have talked about processes that are discrete both in time and space. It does not make much sense to talk about the state of a die while it is in mid-air; nor does it make sense to attribute a state of 4.4 to a die. On the other hand, of course, there are processes that are continuous in both time and space. A classic example is Brownian motion [20], which requires that time be treated continuously; if it is not, the results lead to pathological situations where the kinetic energy of the Brownian particle blows up. Moreover, in such instances, the event space is the position of the Brownian particle, which can take uncountably many different real values. Nevertheless, the central object in the theory of stochastic processes does not change; it remains the joint probability distribution for all events, which in the case of infinitely many times is a probability distribution on a rather complicated, and not easy to handle, σ-algebra. Below, we will discuss how, due to a fundamental result by Kolmogorov, it is sufficient to deal with finite distributions instead of distributions on σ-algebras on infinite Cartesian products of sample spaces. Finally, this machinery straightforwardly generalizes to positive operator valued measures (POVMs) as well as instruments, fundamental ingredients for the discussion of quantum stochastic processes.

III. CLASSICAL STOCHASTIC PROCESSES: FORMAL APPROACH
Up to this point, both in the examples we provided, as well as the more rigorous formulation, we have somewhat left open what exactly we mean by a stochastic process, and what quantity encapsulates it. We will do so now, and provide a fundamental theorem for the theory of stochastic processes, the Kolmogorov extension theorem (KET), which allows one to properly define stochastic processes on infinitely many times, based on finite time information.
A. What then is a stochastic process?
Intuitively, a stochastic process on a set of times $T_k := \{t_0, t_1, \dots, t_k\}$ with $t_i \leq t_j$ for $i \leq j$ is the joint probability distribution of observable events. Namely, the central quantity that captures everything that can be learned about an underlying process is the collection of all joint probabilities
$$\{P(R_k = r_k, R_{k-1} = r_{k-1}, \dots, R_0 = r_0)\}_{r_k, \dots, r_0} \quad (33)$$
to observe all possible realizations $R_k = r_k$ at time $t_k$, $R_{k-1} = r_{k-1}$ at time $t_{k-1}$, and so on. Evidently, the time label, which we omit above and for most of this tutorial, could also correspond to a label of the number of tosses, etc. We also adopt the compact notation $P_{T_{k+1}}$, as defined above, to denote a probability distribution on a set of $k+1$ times. More concretely, suppose the process we have in mind is tossing a die five times in a row. This stochastic process is fully characterised by the probabilities $P(R_4, R_3, R_2, R_1, R_0)$ of observing all possible sequences of events, where, as before, we omit the respective time/tossing number labels. From the joint distribution for five tosses, one can obtain any desired marginal distribution for fewer tosses, e.g., $P(R_3)$, or any conditional distribution (for five tosses), such as, for example, the conditional probability $P(R_2 = ⚅ \,|\, R_1 = ⚅, R_0 = ⚅)$ to obtain the outcome ⚅ at the third toss, having observed two ⚅s in a row previously. The conditional distributions in turn allow computing the stochastic matrices, which in turn allow casting processes as a Markov chain. Having the total distribution is enough to determine whether a process is fully random, Markovian, or non-Markovian. This statement, however, is contingent on the respective set of times.
Naturally, without any further assumptions of memory length and/or stationarity, knowing the joint probabilities of outcomes (and thus everything that can be learned) on a set of times $T_k$ does not provide knowledge about the corresponding process on a different set of times $T_{k'}$. Consequently, we identify a stochastic process with the joint probabilities it displays with respect to a fixed set of times.
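For a concrete, randomly generated (and hence hypothetical) three-toss joint distribution, the marginals and conditionals described above can be extracted as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical joint distribution P(R2, R1, R0) for three tosses of a
# six-sided die (axis order: R2, R1, R0), normalized to sum to one.
P = rng.random((6, 6, 6))
P /= P.sum()

# Marginals for fewer tosses follow by summing out the remaining tosses.
P_R2_R0 = P.sum(axis=1)          # P(R2, R0)
P_R0 = P.sum(axis=(0, 1))        # P(R0)

# Conditional probabilities follow from Bayes' rule, e.g. P(R2 | R1, R0):
P_R1_R0 = P.sum(axis=0)          # P(R1, R0)
P_cond = P / P_R1_R0             # broadcasts over the R2 axis

# For every fixed past (r1, r0), the conditional is itself a distribution.
assert np.allclose(P_cond.sum(axis=0), 1.0)
```

All quantities derived here (marginals, conditionals, and hence stochastic matrices) come from the single joint distribution, illustrating why it is the central object of the theory.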
While joint probabilities contain all inferable information about a stochastic process, working with them is not always desirable, because the number of entries grows exponentially with the number of times considered. Nevertheless, the joint distribution is the central quantity in the theory of classical stochastic processes. Our first aim when extending the notion of stochastic processes to the quantum domain will thus be to construct the analog of the joint distribution for time-ordered events. Doing so has been troubling for the same foundational reasons that make quantum mechanics so interesting. Most notably, quantum processes generally do not straightforwardly admit a Kolmogorov extension theorem, which we discuss below. However, upon closer inspection, such obstacles can be overcome by properly generalizing the concept of joint probabilities to the quantum domain. Before doing so, we will first return to our more rigorous mathematical treatment and define stochastic processes in terms of probability spaces.

Figure 5. Continuous process. A stochastic process is a joint probability distribution over all times. From a physical perspective, we can think of it as the probability of observing a trajectory $s_k$. This is highly desirable when talking about the motion of a Brownian particle. However, this interpretation requires some caution, as there are cases where trajectories may not be smooth or even continuous.

B. Kolmogorov extension theorem
While, for the example of the tossing of a die, a description of the process at hand in terms of joint probabilities on finitely or countably many times/tosses is satisfactory, this is not always the case. For example, even though it can in practice only be probed at finitely many points in time, when considering Brownian motion, one implicitly posits the existence of an 'underlying' stochastic process, from which the observed joint probabilities stem. Intuitively, for the case of Brownian motion, this underlying process should be fully described by a probability distribution that ascribes a probability to all possible trajectories the particle can take. Connecting the operationally well-defined finite joint probabilities a physicist can observe and/or model with the concept of an underlying process is the aim of the Kolmogorov extension theorem (KET).
Besides their experimental inaccessibility, probability distributions on infinitely many times have the additional drawback that the respective mathematical objects are rather cumbersome to use, and would make the modeling of stochastic processes a fairly tedious business. Luckily, the KET allows one to deduce the existence of an underlying process on infinitely many times from properties of only finite objects. With this, modeling a proper stochastic process on infinitely many times amounts to constructing finite-time joint probabilities that 'fit together' properly.
To see what we mean by this last statement, let $P_T$ be the joint distribution obtained in an experiment for some fixed set of times $T$. For now, we will stick with the case of Brownian motion, and $P_T$ could correspond to the probability to find a particle at positions $x_0, \dots, x_{\ell-1}$ when measuring it at times $T = \{t_0, \dots, t_{\ell-1}\}$. As mentioned before, $P_T$ contains all statistical information for fewer times as marginals, i.e., for any subset $T_k \subseteq T$ we have
$$P^{(T)}_{T_k} = \sum_{T \setminus T_k} P_T,$$
where we denote the sum over the times in the complement of $T_k$ in $T$ by $T \setminus T_k$ and use an additional superscript to signify that the respective joint probability distribution is restricted to a subset of times via marginalization. For simplicity of notation, here and in what follows, we always denote the marginalization by a summation, even though, in the uncountably infinite case, it would correspond to an integration. For classical stochastic processes, all probabilities on a set of times can be obtained from those on a superset of times by marginalization. We will call this consistency condition between joint probability distributions of a process on different sets of times the Kolmogorov consistency condition. Naturally, this condition holds in particular if the finite joint probability distributions stem from an underlying process on infinitely many times $\mathbb{T} \supseteq T \supseteq T_k$, where we leave the nature of the corresponding probability distribution $P_{\mathbb{T}}$ somewhat vague for now (see Sec. III D for a more thorough definition).
Importantly, the KET shows that satisfaction of the consistency condition on all finite sets $T_k \subseteq T_j \subseteq \mathbb{T}$ is already sufficient to guarantee the existence of an underlying process on $\mathbb{T}$. Specifically, the Kolmogorov extension theorem [3,19,21,22] defines the minimal properties finite probability distributions have to satisfy in order for an underlying process to exist:

Theorem (KET). Let $\mathbb{T}$ be a set of times. For each finite $T_k \subseteq \mathbb{T}$, let $P_{T_k}$ be a (sufficiently regular) $k$-step joint probability distribution. If the family $\{P_{T_k}\}$ satisfies the consistency condition $P^{(T_j)}_{T_k} = \sum_{T_j \setminus T_k} P_{T_j}$ for all finite $T_k \subseteq T_j \subseteq \mathbb{T}$, then there exists an underlying stochastic process $P_{\mathbb{T}}$ that satisfies $P_{T_k} = \sum_{\mathbb{T} \setminus T_k} P_{\mathbb{T}}$ for all finite $T_k$.

Put more intuitively, the KET shows that, for a given family of finite joint probability distributions that satisfy the consistency conditions [23], the existence of an underlying process that contains all of the finite ones as marginals is ensured. Importantly, this underlying process does not need to be known explicitly in order to properly model a stochastic process.
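For classical joint distributions, the consistency condition is automatic, as a quick numerical check illustrates (the three-time distribution below is randomly generated, hence hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
# A hypothetical three-time distribution P_{T3} on times {t0, t1, t2}
# for a binary outcome (axis order: t2, t1, t0).
P_t2t1t0 = rng.random((2, 2, 2))
P_t2t1t0 /= P_t2t1t0.sum()

# Marginalize down to smaller sets of times in two different ways:
P_t1t0 = P_t2t1t0.sum(axis=0)    # sum out t2  ->  P(t1, t0)
P_t2t0 = P_t2t1t0.sum(axis=1)    # sum out t1  ->  P(t2, t0)

# Kolmogorov consistency: the one-time marginal P(t0) is the same no
# matter which superset of times it is obtained from.
P_t0_via_t1 = P_t1t0.sum(axis=0)
P_t0_via_t2 = P_t2t0.sum(axis=0)
assert np.allclose(P_t0_via_t1, P_t0_via_t2)
```

That this check can never fail for classical arrays of probabilities is precisely the point: classically, 'not looking at an outcome' is the same as summing over it, which is the assumption that breaks down in the quantum case.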
We emphasize that, in the (physically relevant) case where $\mathbb{T}$ is an infinite set, the probability distribution $P_{\mathbb{T}}$ is generally not experimentally accessible. For example, in the case of Brownian motion, the set $\mathbb{T}$ could contain all times in the interval $[0, t]$, and each realization would represent a possible continuous trajectory of a particle over this time interval, see Figure 5. While we assume the existence of these underlying trajectories (and hence the existence of $P_{\mathbb{T}}$) in experiments concerning Brownian motion, we often only access their finite-time manifestations, i.e., $P_{T_k}$ for some $T_k$. The KET thus bridges the gap between the finite experimental reality and the underlying infinite stochastic process, in turn defining, in terms of accessible quantities, what one means by a stochastic process on infinitely many times. For this reason, many books on stochastic processes begin with the statement of the KET.

Figure 6. Hierarchy of multi-time processes. A stochastic process is the joint probability distribution over all times. Of course, in practice one looks only at finite-time statistics. However, the set of all $k$-time probability distributions $\{P_{T_k}\}$ contains, as marginals, all $j$-time probability distributions $\{P_{T_j}\}$ for $j < k$. Moreover, the sets of two- and three-time distributions play a significant role in the theory of stochastic processes; in general, $\Gamma^{(t:r)} \neq \Gamma^{(t:s)} \Gamma^{(s:r)}$.
As mentioned, the KET also enables the modeling of stochastic processes: Any mechanism that leads to finite joint probability distributions that satisfy a consistency condition is ensured to have an underlying process. For example, the proof of the existence of Brownian motion relies on the KET as a fundamental ingredient [24][25][26][27]. Loosely speaking, the KET holds for classical stochastic processes, because there is no difference between 'doing nothing' and conducting a measurement but 'not looking at the outcomes' (i.e., summing over the outcomes at a time). Put differently, the validity of the KET is based on the fundamental assumption that the interrogation of a system does not, on average, influence its state. This assumption generally fails to hold in quantum mechanics, which makes the definition of quantum stochastic processes somewhat more involved, and their structure much richer than that of their classical counterparts.

C. Practical features of stochastic processes
Now that we have a formal definition of a stochastic process, let us ask what it is useful for. As noted, working with a probability distribution over a large number of random variables is undesirable, as the complexity grows exponentially. However, for a given problem, what we care about is the structure of the stochastic process and what we may anticipate when we sample from this distribution. We depict the hierarchy of stochastic processes in Figure 6, and in this section we focus on the short end of the hierarchy.
Naturally, the examples in Sec. II and the formal theory in the last subsection only scratch the surface of the massive literature on stochastic processes. We, of course, cannot cover all facets of this field here. However, in practice, there are a few important topics that must be mentioned. Below we will discuss several common tools that one encounters in the field of stochastic processes. First among them are master equations, which are employed ubiquitously in the sciences, finance, and beyond. Next, we will briefly cover methods to differentiate between Markovian and non-Markovian processes, as well as to quantify the memory using tools of information theory. While many of these examples only deal with two-time correlations, we do emphasize that there are problems that naturally require multi-time correlations.

Master equations
A master equation is a differential equation that relates the rate of change in probabilities to the current and the past states of the system. There are of course many famous master equations in physics: Pauli, Fokker-Planck, Langevin, to name a few on the classical side. We will not delve into the details of this very rich topic here, and once again only scratch the surface. We refer the reader to other texts for more in-depth coverage of master equations [20,28,29].
It will suffice for our purposes that a master equation, in general, has the following form [30]:
$$\frac{d P(X_t)}{dt} = \int_s^t d\tau\, G(t, \tau)\, P(X_\tau), \quad (36)$$
where $G(t, \tau)$ is a matrix operator. The time derivative of the state at $t$ depends on the previous states back to a time $s$, which sets the memory length. If the memory is infinite, then $s \to 0$.
Since the master equation expresses the probabilities continuously in time, it may be tempting to think that a master equation is equivalent to a stochastic process as defined above. However, this is not the case, because a master equation needs at most joint probabilities on three times or fewer. Namely, the set of joint probability distributions
$$\{P(X_c, X_b, X_a)\}_{c \geq b \geq a} \quad (37)$$
is sufficient to derive Eq. (36). The l.h.s. can be computed by setting $b = t$ and $a = t - dt$, while the r.h.s. can be expressed as a linear combination of products of stochastic matrices $\Gamma^{(c:b)} \Gamma^{(b:a)}$, with $c = t$, $b = \tau \geq s$, and $a = r < \tau$. In fact, the r.h.s. is concerned with functions such as $\Gamma^{(c:a)} - \Gamma^{(c:b)} \Gamma^{(b:a)}$, which measure the temporal correlations between $a$ and $c$, given an observation at $b$. In any case, these stochastic matrices only depend on joint distributions on two times, as seen in Eqs. (12) and (13), and are not concerned with multi-time statistics. Thus, the family of distributions in Eq. (37) suffices for the r.h.s. Formally, showing that the r.h.s. can be expressed as a product of two stochastic matrices can be done by means of the Laplace transform [31,32] or the ansatz known as the transfer tensor [33-35].

Divisible processes
Let us now consider a family of stochastic matrices that satisfies
$$\Gamma^{(t:r)} = \Gamma^{(t:s)} \Gamma^{(s:r)} \quad \text{for all } t \geq s \geq r. \quad (38)$$
Processes described by such a family are called divisible. This family is equivalent to the family of distributions contained in Eq. (37). It is easy to see that the family of stochastic matrices in Eq. (38) describes a superset of Markovian processes. That is, any Markov process will satisfy the above equation; however, there are non-Markovian processes that also satisfy the divisibility property [36]. Nevertheless, checking for divisibility is often far simpler than checking for the satisfaction of the Markov conditions. Moreover, as we will see shortly, the divisibility of a process implies several highly desirable properties. A nice feature of divisible processes is the corresponding master equation. Applying Eq. (38) to the l.h.s. of Eq. (36), we get
$$\frac{d P(X_t)}{dt} = \lim_{dt \to 0} \frac{\Gamma^{(t:t-dt)} - \mathbb{1}}{dt}\, P(X_t), \quad (39)$$
where $\mathbb{1}$ is the identity matrix. Taking the limit $dt \to 0$ yields the generator $G_t := \lim_{dt \to 0} [\Gamma^{(t:t-dt)} - \mathbb{1}]/dt$. This is a time-local master equation. In turn, the generator is related to the stochastic matrix as $\Gamma^{(t:t-dt)} = \exp(G_t\, dt)$, which is obtained by integration. When the process is stationary, i.e., symmetric under time translation, both $\Gamma$ and $G$ are time independent.
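The divisibility property of Eq. (38) and the exponential form $\Gamma = \exp(G\, dt)$ can be verified numerically for a simple relaxation generator of the kind discussed next; the rate and fixed point below are hypothetical values:

```python
import numpy as np

gamma = 0.7                        # hypothetical relaxation rate
G_fix = np.array([0.3, 0.7])       # hypothetical fixed-point distribution

def Gamma(t):
    """Stochastic matrix exp(G t) for the stationary relaxation generator
    G = gamma * (G_fix 1^T - identity), written in closed form."""
    return (np.exp(-gamma * t) * np.eye(2)
            + (1 - np.exp(-gamma * t)) * np.outer(G_fix, np.ones(2)))

# Each Gamma(t) is stochastic ...
assert np.allclose(Gamma(1.3).sum(axis=0), 1.0)

# ... and the family is divisible: Gamma(t + s) = Gamma(t) Gamma(s).
assert np.allclose(Gamma(0.8), Gamma(0.5) @ Gamma(0.3))

# Any initial state relaxes exponentially towards the fixed point.
P0 = np.array([1.0, 0.0])
P_late = Gamma(50.0) @ P0
```

The composition check is exactly Eq. (38) for a stationary process, where the matrices depend only on time differences.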
A divisible Markovian process. Let us consider a two-level system that undergoes the infinitesimal process
$$\Gamma^{(t:t-dt)} = (1 - \gamma\, dt)\, \mathbb{1} + \gamma\, dt \begin{pmatrix} g_0 & g_0 \\ g_1 & g_1 \end{pmatrix}.$$
The first part of the process is just the identity process, and the second part is a random process. However, together they form a Markov process. Using Eq. (39), we can derive the generator for the master equation. This process is very similar to the perturbed die in the last section; it takes any state $P(X_{t-dt})$ at $t - dt$ to
$$P(X_t) = (1 - \gamma\, dt)\, P(X_{t-dt}) + \gamma\, dt\, G,$$
where $G = [g_0\ g_1]^{\mathrm{T}}$. After some time $\tau = n\, dt$, i.e., after $n$ applications of the stochastic matrix, we have
$$P(X_t) = e^{-\gamma \tau} P(X_{t-\tau}) + (1 - e^{-\gamma \tau})\, G.$$
That is, the process relaxes any state of the system to the fixed point $G$ exponentially fast, with rate $\gamma$. Many processes, such as thermalization, have such a form. In fact, one often associates Markov processes with exponential decay.

A stroboscopic divisible non-Markovian process. This example comes from Ref. [37] and provides a stroboscopic non-Markovian process that is divisible. Let us consider a single-bit process with $x_j = 0, 1$, each with probability 1/2, for $j = 1, 2, 3$. That is, the process yields random bits at the first three times. At the next time, we let $x_4 = x_1 + x_2 + x_3$, where the addition is modulo 2 (see Figure 7). It is easy to see that the stochastic matrix between any two times corresponds to a fully random process, making the process divisible. However, $P(X_4, X_3, X_2, X_1)$ is not uniform; the probability is $\frac{1}{8}$ when $x_1 + x_2 + x_3 = x_4$ and 0 otherwise. Consequently, there are genuine four-time correlations, but no two- or three-time correlations.
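The stroboscopic parity process can be enumerated exhaustively; the following sketch confirms that all two-time marginals are uniform even though the four-time distribution is not:

```python
import numpy as np
from itertools import product

# Enumerate the stroboscopic parity process of Ref. [37]:
# x1, x2, x3 are uniform random bits and x4 = x1 + x2 + x3 (mod 2).
joint = {}
for x1, x2, x3 in product((0, 1), repeat=3):
    joint[(x1, x2, x3, (x1 + x2 + x3) % 2)] = 1 / 8

# Every two-time marginal, e.g. P(X4, X1), is completely uniform ...
P_41 = np.zeros((2, 2))
for (x1, x2, x3, x4), p in joint.items():
    P_41[x4, x1] += p
assert np.allclose(P_41, 0.25)

# ... so every two-time stochastic matrix is that of a fully random
# process and the process is divisible.  Yet the four-time distribution
# is far from uniform: only 8 of the 16 sequences ever occur.
assert len(joint) == 8
assert all(np.isclose(p, 1 / 8) for p in joint.values())
```

Replacing `P_41` by any other pair of times gives the same uniform marginal, which is the sense in which the memory here is a genuinely multi-time effect.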
Figure 7. Stroboscopic divisible non-Markovian process. At each time $t_j$, each of the possible outcomes 0 and 1 occurs with probability 1/2 (for example, they could be drawn from urns with uniform distributions). At the final time $t_4$, the observed outcome is equal to the sum (modulo 2) of the previous three outcomes. While the stochastic map between any two points in time is completely random, and thus the process is divisible, the overall joint probability distribution shows multi-time memory effects (as laid out in the text).

A process with long memory. Let us now consider a process where the probability of observing an outcome $x_t$ is correlated with what was observed some time ago, $x_{t-s}$, with some probability:
$$P(x_t | x_{t-s}) = \frac{1 - \epsilon}{d} + \epsilon\, \delta_{x_t, x_{t-s}}.$$
Here, $d$ is the size of the system. This process only has two-time correlations, but it is non-Markovian, as the memory is long-ranged. A master equation of the type of Eq. (36) for this process can be derived by differentiating.
For the sake of brevity, we forego this exercise. At this stage, it is worth pointing out why Markov processes are of interest in many cases. Suppose we are following the trajectory of a particle at position $x$ at time $t$, which then moves to $x'$ at $t'$. If the difference in time is arbitrarily small, say $\delta t$, then for a physical process $x'$ cannot be too different from $x$, due to continuity. Thus, it is natural to write down a master equation to describe such a process. Since the future state will always depend on the current position, the process will be at least Markovian. Still, the process may have higher-order correlations, but these are often neglected for simplicity.

Data processing inequality
One of the most useful properties of Markov processes is the satisfaction of the data processing inequality (DPI). Suppose we subject two initial states, $P(X_0)$ and $R(X_0)$, to a process $\Gamma^{(t:0)}$, which yields $P(X_t)$ and $R(X_t)$, respectively. The main idea here is that, since the process is random, it cannot increase the distinguishability between the two initial states.
For instance, a natural measure for distinguishing probability distributions is the so-called Kolmogorov distance or trace distance,
$$D(P, R) := \frac{1}{2} \sum_x |P(x) - R(x)|.$$
When two states are fully distinguishable, the trace distance is 1, the maximal value it can assume. On the other hand, if the two distributions are the same, then the trace distance is 0. The DPI guarantees that, for all Markov processes,
$$D\big(P(X_t), R(X_t)\big) \leq D\big(P(X_0), R(X_0)\big)$$
for all times. The equality holds if and only if the process is reversible.
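The contraction of the trace distance under a stochastic map can be checked directly; the matrix and states below are randomly generated, hence hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)

def trace_distance(p, r):
    """Kolmogorov (trace) distance between two probability vectors."""
    return 0.5 * np.abs(p - r).sum()

# A hypothetical stochastic matrix (non-negative, columns sum to one).
Gamma = rng.random((4, 4))
Gamma /= Gamma.sum(axis=0)

P0 = rng.random(4); P0 /= P0.sum()
R0 = rng.random(4); R0 /= R0.sum()

d_initial = trace_distance(P0, R0)
d_final = trace_distance(Gamma @ P0, Gamma @ R0)

# DPI: a stochastic map never increases the trace distance.
assert d_final <= d_initial + 1e-12
```

Rerunning with any other seed gives the same qualitative result, since the contraction holds for every stochastic matrix, not just this sampled one.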
There are many metrics and pseudo-metrics that satisfy the DPI, but not all do. For instance, the squared Euclidean distance, based on the norm $\|P(X)\|_2^2 := \sum_x P(x)^2$, does not satisfy the DPI. As an example, consider a two-bit process with initial states $P(X_0) P_u$ and $R(X_0) P_u$, where the second bit's state is the uniform distribution $P_u$. If the process simply discards the second bit, then the final squared Euclidean distance is simply $\|P(X_0) - R(X_0)\|_2^2$. However, the initial squared Euclidean distance is exactly $\frac{1}{2} \|P(X_0) - R(X_0)\|_2^2$: the distance has grown under the process. The DPI plays an important role in information theory because it holds for two important quantities, the mutual information and the Kullback-Leibler divergence; the latter is also known as the relative entropy. For a random variable $X$ with distribution $P(X)$, the Shannon entropy is defined as
$$H(X) := -\sum_x P(x) \log P(x).$$
The mutual information between two random variables $X$ and $Y$, which possess a joint distribution $P(X, Y)$, is defined as
$$I(X : Y) := H(X) + H(Y) - H(X, Y).$$
Here, $H(X)$ is computed from the marginal distribution $P(X) = \sum_y P(X, y)$, and similarly for $H(Y)$. The corresponding DPI states that $I(X_t : Y) \leq I(X_0 : Y)$, as long as the transformation from $X_0$ to $X_t$ is a Markov process (some DPIs hold for divisible dynamics). The relative entropy between two distributions $P(X)$ and $P'(X)$ is defined as
$$D\big(P(X)\,\|\,P'(X)\big) := \sum_x P(x) \log \frac{P(x)}{P'(x)}.$$
Note that this is not a symmetric quantity; it is nonetheless endowed with an operational meaning. The DPI here has the form
$$D\big(P(X_t)\,\|\,P'(X_t)\big) \leq D\big(P(X_0)\,\|\,P'(X_0)\big),$$
again for Markov processes. The behavior of the relative entropy and the related pseudometric in quantum and classical dynamics is an ongoing research effort [38,39]. Still other inequalities are being discovered; see, e.g., Ref. [37] for the so-called monogamy inequality. For detailed coverage of the DPI, see [40,41]. Moreover, recently, researchers have employed the so-called entropy cone [42,43] to infer causality in processes, which is closely related to many of our interests in this tutorial. However, for brevity, we do not go into these details here.
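The failure of the DPI for the squared Euclidean distance in the discard-a-bit example can be verified numerically; the one-bit distributions below are hypothetical:

```python
import numpy as np

def sq_euclid(p, r):
    """Squared Euclidean distance between two probability vectors."""
    return ((p - r) ** 2).sum()

# Hypothetical one-bit distributions and the uniform second bit.
P = np.array([0.9, 0.1])
R = np.array([0.2, 0.8])
Pu = np.array([0.5, 0.5])

# Two-bit initial states P(X0) x Pu and R(X0) x Pu.
P2, R2 = np.kron(P, Pu), np.kron(R, Pu)

# Discarding the second bit maps both back to the one-bit distributions.
initial = sq_euclid(P2, R2)
final = sq_euclid(P, R)

# The squared Euclidean distance doubles under the (stochastic) discard
# map: it is exactly half as large before as after, violating the DPI.
assert np.isclose(initial, 0.5 * final)
assert final > initial
```

The factor of one half comes from $\|P_u\|_2^2 = 1/2$ for the uniform bit, matching the claim in the text.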

Conditional mutual information
While Markov processes, i.e., processes with Markov order 1, satisfy the DPI, a general process with finite Markov order ℓ (introduced in Sec. II D) has vanishing conditional mutual information (CMI), mirroring the fact that such a process is conditionally independent of past outcomes that lie further back than a certain memory length (Markov order) ℓ. For ease of notation, we will group the times {t_k, . . . , t_0} on which the process at hand is defined into three segments: the history H, the memory M (the times directly preceding the future), and the future F. With this, the CMI of a joint probability distribution on history, memory, and future is defined as H(F∶H|M) ∶= H(F|M) + H(H|M) − H(F, H|M), (51) where the conditional entropy is defined as H(X|Y) ∶= H(X, Y) − H(Y). This latter quantity is the entropy of the conditional distribution P(X|Y) and has a clear interpretation in information theory as the number of bits X must send to Y so that the latter party can construct the full distribution.
Consequently, H(F∶H|M) is a measure of the correlations that persist between F and H once the outcomes on M are known. Intuitively then, for a process of Markov order ℓ, H(F∶H|M) should vanish as soon as M contains at least ℓ times. This can be shown by direct insertion. Recall that, by means of (the general form of) Eq. (10), we can write P(F|M, H) = P(F|M) for a process of Markov order ℓ ≤ |M|, implying P(F, H|M) = P(F|M)P(H|M). This means that H(F, H|M) = H(F|M) + H(H|M) and, consequently, the CMI in Eq. (51) vanishes. Importantly, the CMI only vanishes for processes with finite Markov order (and |M| ≥ ℓ), but not in general. If the CMI vanishes, then the future is decoupled from the entire history given knowledge of the memory. Vanishing CMI can thus be used as an alternative, equivalent definition of Markov order. Following this interpretation, the Markov order then encodes the complexity of the process at hand, as it is directly related to the number of past outcomes that need to be remembered to correctly predict future statistics; if there are d different possible outcomes at each time, then no more than d^ℓ different sequences need to be remembered. While, in principle, ℓ may be large for many processes, they can often be approximated by processes with short Markov order. This is, in fact, the assumption that is made when real-life processes are modeled by means of Markovian master equations.
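To make the vanishing of the CMI concrete, one can build the joint distribution of a short Markov chain and evaluate Eq. (51) directly, using the identity H(F∶H|M) = H(F, M) + H(M, H) − H(M) − H(F, M, H). The transition matrix and initial distribution below are hypothetical; with the middle time as the memory, the CMI of an order-1 (Markov) process comes out numerically zero:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a probability array."""
    p = p[p > 1e-15]
    return -np.sum(p * np.log2(p))

# Hypothetical Markov chain on a bit: transition matrix and initial state
gamma = np.array([[0.9, 0.3],
                  [0.1, 0.7]])
p0 = np.array([0.6, 0.4])

# joint distribution P(x2, x1, x0) of a Markov (order-1) process
joint = np.zeros((2, 2, 2))
for x0 in range(2):
    for x1 in range(2):
        for x2 in range(2):
            joint[x2, x1, x0] = gamma[x2, x1] * gamma[x1, x0] * p0[x0]

# CMI via H(F,M) + H(M,H) - H(M) - H(F,M,H); F = x2, M = x1, H = x0
h_fmh = entropy(joint.ravel())
h_fm = entropy(joint.sum(axis=2).ravel())   # marginalize the history x0
h_mh = entropy(joint.sum(axis=0).ravel())   # marginalize the future x2
h_m = entropy(joint.sum(axis=(0, 2)))       # memory x1 alone
cmi = h_fm + h_mh - h_m - h_fmh
print(cmi)  # vanishes (up to float error) for a Markov process
```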
Additionally, complementing the conditional independence between history and future, processes with vanishing CMI admit a so-called 'recovery map' R_{M→FM} that allows one to deduce P(F, M, H) from P(M, H) by means of a map that only acts on M (but not on H). Indeed, we have P_{FMH}(F, M, H) = R_{M→FM}[P_{MH}(M, H)], where we have added additional subscripts to clarify what variables the respective joint probability distributions act on. While seemingly trivial, the above equation states that the future statistics of a process with Markov order ℓ can be recovered by only looking at the memory block. Whenever the memory block one looks at is shorter than the Markov order, any recovery map only approximately yields the correct future statistics. Importantly, though, the approximation error is bounded by the CMI between F and H [44,45], providing an operational interpretation of the CMI, as well as quantifiable reasoning for the memory truncation of non-Markovian processes. As we will see in Sec. VI C, many of these properties also apply in some form to quantum processes of finite Markov order, with the caveat that the question of memory length possesses a much more layered answer in the quantum case than it does in the classical one.
D. (Some more) mathematical rigor

As promised above, we shall now define what we mean by a stochastic process in more rigorous terms, and thus give a concrete meaning to the probability distribution P_T when |T| is infinite.
Before advancing, a brief remark is necessary to avoid potential confusion. In the literature, stochastic processes are generally defined in terms of random variables [3,22], and above, we have already phrased some of our examples in terms of them. However, both in the previous examples, as well as those that follow, explicit reference to random variables is not a necessity, and all of the results we present can be phrased in terms of joint probabilities alone. Thus, foregoing the need for a rigorous introduction of random variables and trajectories thereof, we shall phrase our formal definition of stochastic processes in terms of probability distributions only. For all intents and purposes, though, there is no difference between our approach and the one generally found in the literature.
To obtain a definition of stochastic processes on infinite sets of times, we will define stochastic processes -first for finitely many times, then for infinitely many -in terms of probability spaces, which we introduced in Sec. II G. This can be done by merely extending their definition to sequences of measurement outcomes at (finitely many) multiple times, like, for example, the sequential tossing of a die (with or without memory) we discussed above.

Definition (Classical stochastic process) A stochastic process on times α ∈ T_k is a triple (P_{T_k}, Ω_{T_k}, S_{T_k}) of a sample space Ω_{T_k} = ⨉_{α∈T_k} Ω_α, a σ-algebra S_{T_k} on Ω_{T_k}, and a probability measure P_{T_k} on S_{T_k} with P_{T_k}(Ω_{T_k}) = 1.

The symbol ⨉ denotes the Cartesian product of sets. Naturally, as already mentioned, the set T_k on which the stochastic process is defined does not have to contain times, but could, as in the case of the die tossing, contain general labels of the observed outcomes. Each Ω_α corresponds to a sample space at t_α, and the probability measure P_{T_k} ∶ S_{T_k} → [0, 1] maps any sequence of outcomes at times {t_α}_{α∈T_k} to its corresponding probability of being measured. A priori, this definition of stochastic processes is not concerned with the respective mechanism that leads to the probability measure P_{T_k}; above, we have already seen several examples of how it emerges from the stochastic matrices we considered. However, as mentioned, once the full statistics P_{T_k} are known, all relevant stochastic matrices can be computed. Put differently, once P_{T_k} is known, there is no more information that can be learned about a classical process on T_k. We now formally define a stochastic process on sets of times T, where |T| can be infinite. Using the mathematical machinery we introduced, this is surprisingly simple:

Definition A stochastic process on times α ∈ T is a triple (P_T, Ω_T, S_T) of a sample space Ω_T = ⨉_{α∈T} Ω_α, a σ-algebra S_T on Ω_T, and a probability measure P_T on S_T with P_T(Ω_T) = 1.
While almost identical to the analogous definition for finitely many times, conceptually, there is a crucial difference between the two. Notably, P_T is not an experimentally reconstructable quantity unless |T| is finite. Additionally, here, we simply posit the σ-algebra S_T. However, generally, the explicit construction of this σ-algebra from scratch is not straightforward, and starting the description of a given stochastic process on times T from the construction of S_T is a daunting task, which is why, for example, the modeling of Brownian motion does not follow this route. Nonetheless, we often implicitly assume the existence of an 'underlying' process, given by (P_T, Ω_T, S_T), when discussing, for example, Brownian motion on finite sets of times. Connecting finite joint probability distributions to the concept of an underlying process is the main achievement of the Kolmogorov extension theorem, as we will lay out in detail below.

IV. EARLY PROGRESS ON QUANTUM STOCHASTIC PROCESSES
Our goal in the present section, as well as the next, will be to follow the narrative presented in the last two chapters to obtain a consistent description of quantum stochastic processes. However, the subtle structure of quantum mechanics will get in the way and generate technical and foundational problems that will challenge our attempts to generalize the theory of classical stochastic processes to the quantum domain. Nevertheless, it is instructive to understand the kernel of these problems before we present the natural generalization in the next section. Thus, we begin with the elements of quantum stochastic processes that are widely accepted. It should be noted that we assume a certain level of mastery of quantum mechanics from the reader, namely statistical quantum states, generalized quantum measurements, composite systems, and unitary dynamics. We refer readers unfamiliar with these standard elements of quantum theory to textbooks on quantum information theory, e.g., [46][47][48]. However, for completeness, we briefly introduce some of these elements in this section.
The intersection of quantum mechanics and stochastic processes dates back to the inception of quantum theory. After all, a quantum measurement itself is a stochastic process. However, the term quantum stochastic process means a lot more than a quantum measurement that has to be interpreted probabilistically. Perhaps the von Neumann equation (also due to Landau) is the first instance where elements of stochastic processes come together with those of quantum mechanics. Here, the evolution of a (mixed) quantum state is written as a master equation, though this equation is fully deterministic. Nevertheless, a few years after the von Neumann equation, genuine phenomenological master equations appeared to explain atomic relaxations and particle decays [49]. Later, further developments were made as per need, e.g., Jaynes introduced what is now known as a random unitary channel [50].
Serious and formal studies of quantum stochastic processes began in the late 1950s and early 1960s. Two early discoveries were the exact non-Markovian master equation due to Nakajima and Zwanzig [51,52], as well as the phenomenological study of the maser and laser [53][54][55][56]. It took another decade for the derivation of the general form of the Markovian master equation [57,58]. In the early 1960s, Sudarshan et al. [59,60] generalized the notion of the stochastic matrix to the quantum domain, which was rediscovered in the early 1970s by Kraus [61]. We begin by introducing the basic elements of quantum theory, move on to quantum stochastic matrices (also called quantum channels, quantum maps, or dynamical maps), and discuss their properties and representations. This then lays the groundwork for a consistent description of quantum stochastic processes that allows one to incorporate genuine multi-time probabilities.

A. Quantum statistical state
As with the classical case, we begin by defining the notion of a quantum statistical state. A (pure) quantum state |ψ⟩ is a ray in a d-dimensional Hilbert space H_S (where we employ the subscript S for system). Just like in the classical case, d corresponds to the number of perfectly distinguishable outcomes. Any such pure state can be written in terms of a basis: |ψ⟩ = ∑_{s=1}^{d} c_s |s⟩, (57) where {|s⟩} is an orthonormal basis, the c_s are complex numbers, and we assume d < ∞ throughout this article. Thus the quantum state is a complex vector, which is required to satisfy the property ⟨ψ|ψ⟩ = 1, implying ∑_s |c_s|^2 = 1. It may be tempting to think of |ψ⟩ as the quantum generalization of the classical statistical state P. However, as mentioned, a state that is represented in the above form is pure, i.e., there is no uncertainty about what state the system is in. To account for potential ignorance, one introduces density matrices, which are better suited to fill the role of quantum statistical states.
Density matrices are written in the form ρ = ∑_{j=1}^{n} p_j |ψ_j⟩⟨ψ_j|, (58) which can be interpreted as an ensemble of pure quantum states {|ψ_j⟩}_{j=1}^{n} which are prepared with probabilities p_j such that ∑_{j=1}^{n} p_j = 1. Such a decomposition is also called a convex mixture. Naturally, pure states are special cases of density matrices, where p_j = 1 for some j. In other words, this state represents our ignorance about which element of the ensemble, or which exact pure quantum state, we possess. It is important, though, to add a qualifier to this statement: seemingly, Eq. (58) provides the 'rule' by which the statistical quantum state at hand was prepared. However, this decomposition in terms of pure states is neither unique, nor do the states {|ψ_j⟩} that appear in it have to be orthogonal. For any non-pure density matrix, there are infinitely many ways of decomposing it as a convex mixture of pure states [62][63][64]. This is in stark contrast to the classical case, where any probability vector can be uniquely decomposed as a convex mixture of 'pure' states, i.e., events that happen with unit probability.
For a d-dimensional system, the density matrix is a d × d square matrix (i.e., an element of the space B(H) of bounded operators on the Hilbert space H). Due to physical considerations, like the necessity for probabilities to be real, positive, and normalized, the density matrix must be
• Hermitian: ρ_rs = ρ*_sr,
• positive semidefinite: ρ ≥ 0 (implying 1 ≥ ρ_rr ≥ 0), and
• unit-trace: ∑_r ρ_rr = 1.
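These three defining properties are easy to verify numerically. A minimal sketch (the example matrices below are, of course, hypothetical):

```python
import numpy as np

def is_density_matrix(rho, tol=1e-10):
    """Check Hermiticity, positive semidefiniteness, and unit trace."""
    herm = np.allclose(rho, rho.conj().T, atol=tol)
    pos = np.all(np.linalg.eigvalsh(rho) > -tol)
    norm = abs(np.trace(rho) - 1) < tol
    return bool(herm and pos and norm)

# a valid mixed qubit state, and a unit-trace matrix with a negative eigenvalue
rho = np.array([[0.75, 0.25], [0.25, 0.25]])
bad = np.array([[1.2, 0.0], [0.0, -0.2]])
print(is_density_matrix(rho), is_density_matrix(bad))
```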
As noted above, the density matrix is really the generalization of the classical probability distribution P. In fact, a density matrix that is diagonal in the computational basis is just a classical probability distribution. Conversely, the off-diagonal elements of a density matrix, known as coherences, make quantum mechanics non-commutative and are responsible for the interference effects that give quantum mechanics its wave-like nature. However, it is important to realize that a density matrix is like the single-time probability distribution P, in the sense that it provides the probabilities for any conceivable measurement outcome at a single given time. It will turn out that the key to the theory of quantum stochastic processes lies in clearly defining a multi-time density matrix.
There are many interesting properties and distinct origins of the density matrix. We list a few important ones here without proof or justification. While we have simply heuristically introduced it as an object that accounts for the uncertainty about which state the system at hand is currently in, there are more rigorous ways to motivate it. One way to do so is Gleason's theorem, which is grounded in the language of measure theory [65,66] and, basically, derives density matrices as the most general statistical object that provides 'answers' to all questions an experimenter can ask.
Concerning its properties, a density matrix is pure if and only if ρ^2 = ρ. Any (non-pure) mixed quantum state ρ_S of the system S can be thought of as the marginal of a bipartite pure quantum state |ψ⟩_{SS′}, which must be entangled. This fact is known as quantum purification, and it is an exceedingly important property that we will discuss in Sec. IV B 4. Of course, the same state ρ_S can also be thought of as a proper mixture of an ensemble of quantum states on the space S alone. However, quantum mechanics does not differentiate between proper mixtures and improper mixtures, i.e., mixedness due to entanglement (see [67] for a discussion of these different concepts of mixtures). As mentioned, mixtures are non-unique. The same holds true for purifications; for a given density matrix ρ_S, there are infinitely many pure states that have it as a marginal. Finally, let us say a few words about the mathematical structure of density matrices. Density matrices are elements of the vector space of d × d Hermitian matrices, which is d^2-dimensional. Consequently, akin to the decomposition of a pure state in Eq. (57) in terms of an orthonormal basis, a density matrix can also be cast in terms of a fixed set of d^2 orthonormal basis operators: ρ = ∑_{k=1}^{d^2} r_k σ_k, where we can choose different sets {σ_k} of basis matrices. [68] They can, for example, be Hermitian observables (e.g., Pauli matrices plus the identity matrix), in which case the {r_k} are real numbers. Alternatively, {σ_k} can be non-Hermitian elementary matrices, in which case the {r_k} are complex numbers. In both cases, we may have the matrix orthonormality condition tr[σ_j σ_k†] = N δ_jk, with N a normalization constant. However, in neither case do the matrices {σ_k} correspond to physical states, as there is no set of d^2 orthogonal d × d quantum states, which is in contrast to Eq. (57).
We can, however, drop the demand for orthonormality and write any density matrix as a linear sum of a fixed set of d^2 quantum states {ρ̂_k}: ρ = ∑_{k=1}^{d^2} q_k ρ̂_k. (61) Here, however, the {q_k} will be real but generally not positive, see Figure 8. This appears to be in contrast to Eq. (58), where the density matrix is written as a convex mixture of physical states. The reason for this distinction is that in the last equation we have fixed the basis operators {ρ̂_k}, which span the whole space of Hermitian matrices, and demand that any quantum state can be written as a linear combination of them, while in Eq. (58) the states {|ψ_j⟩} can be any quantum states, i.e., they would have to vary to represent different density matrices as convex mixtures. Understanding these distinctions will be crucial in order to grasp the pitfalls that lie before us, as well as to overcome them.

Figure 8. Non-convex decomposition. All states in the x−y plane of the Bloch sphere, including the pure states, can be described by the basis states ρ̂_1, ρ̂_2, and ρ̂_4 in Eq. (65). However, only the states in the shaded region will be convex mixtures of these basis states. Of course, no pure state can be expressed as a convex mixture.

Decomposing quantum states
Let us illustrate the concept of quantum states with a concrete example for d = 2, i.e., the qubit case. A generic state α of one qubit can, for example, be written as α = (1/2)(σ_0 + ∑_{k=1}^{3} a_k σ_k) in terms of the Pauli operators {σ_1, σ_2, σ_3} and the identity matrix σ_0. We can write the same state in terms of elementary matrices, α = ∑_{ij} e_ij ε_ij, with complex coefficients {e_00, e_01, e_10, e_11}. The elementary matrices ε_ij ∶= |i⟩⟨j| are non-Hermitian but self-dual, i.e., tr[ε_ij ε_kl†] = δ_ik δ_jl. These are, of course, two standard ways to represent a qubit state in terms of well-known orthonormal bases. On the other hand, we can expand the same state in terms of a set of basis states {ρ̂_k} (Eq. (65)) constructed from |±x⟩, |±y⟩, and |±z⟩, the eigenvectors of σ_1, σ_2, and σ_3, respectively. With this, we have α = ∑_k q_k ρ̂_k. It is easy to see that the density matrices ρ̂_k are Hermitian and linearly independent, but not orthonormal. The real coefficients {q_1, q_2, q_3, q_4} are obtained by means of the inner product q_k = tr(α D̂_k†), (66) where the set {D̂_k} is dual to the set of matrices in Eq. (65), satisfying the condition tr(ρ̂_i D̂_j†) = δ_ij. See the Appendix in Refs. [69,70] for a method for constructing the dual basis.
For example, for the set of density matrices in Eq. (65), the dual set {D̂_k} can be constructed explicitly. Note that, even though the states ρ̂_k are positive, this dual set does not consist of positive matrices (all duals of a set of Hermitian matrices are Hermitian, though [70]). Nonetheless, it gives us the coefficients in Eq. (66). Interestingly, the dual set itself also forms a linear basis, and we can write any state α as α = ∑_k p_k D̂_k, (69) where p_k = tr(α ρ̂_k). This decomposition lends itself nicely to experimental reconstruction of the state α. Specifically, given many copies of α, the value tr(α ρ̂_k) is obtained by projecting α along the directions x, y, z, i.e., measuring the observables σ_1, σ_2, and σ_3. The inner product tr(α ρ̂_k) is then nothing more than a projective measurement along direction k, and p_k is the probability of observing the respective outcome. Importantly, as the duals {D̂_k} can be computed from the basis {ρ̂_k}, these probabilities then allow us to estimate the state via Eq. (69). The procedure of estimating a quantum state by measuring it is called quantum state tomography [71][72][73]. There are many sophisticated methods for this nowadays, which we will only briefly touch on in this tutorial.
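The reconstruction of a qubit state from Pauli measurements can be sketched in a few lines; the state `alpha` below is a hypothetical example, and the computed expectation values stand in for ideal, noiseless measurement data:

```python
import numpy as np

s0 = np.eye(2)
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]], dtype=complex)

# hypothetical qubit state to be reconstructed
alpha = np.array([[0.6, 0.2 - 0.1j], [0.2 + 0.1j, 0.4]])

# expectation values of the three Pauli observables (ideal, noiseless data)
r = [np.trace(alpha @ s).real for s in (s1, s2, s3)]

# rebuild the state from the measured expectation values
alpha_rec = 0.5 * (s0 + r[0] * s1 + r[1] * s2 + r[2] * s3)
print(np.allclose(alpha_rec, alpha))
```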

Measuring quantum states: POVMs and dual sets
As we have seen above, a quantum state can be reconstructed experimentally by measuring enough observables (above, the observables σ_1, σ_2, and σ_3 were used) and collecting the corresponding outcome probabilities. Performing projective measurements is, however, not the only way in quantum mechanics to gather information about a state. More generally, a measurement is described by a positive operator valued measure (POVM), a collection J = {E_k}_{k=1}^{n} of positive operators (here, matrices) that add up to the identity, i.e., ∑_k E_k = 𝟙. Each E_k corresponds to a possible measurement outcome, and the probability to observe said outcome is given by the Born rule: p_k = tr(ρ E_k). For example, when measuring the observable σ_3, the corresponding POVM is given by J = {|+z⟩⟨+z|, |−z⟩⟨−z|}.
A less trivial example on a qubit is the symmetric informationally complete (SIC) POVM [71], whose four elements point along the directions of a regular tetrahedron in the Bloch sphere. While still pure (up to normalization), these POVM elements are not orthogonal. However, as they are linearly independent, they span the d^2 = 4-dimensional space of Hermitian qubit matrices, and every density matrix is fully characterized once the probabilities p_k = tr(ρ E_k) are known. As this holds true in any dimension for POVMs consisting of d^2 linearly independent elements, such POVMs are called 'informationally complete' (IC). [74] Importantly, using the ideas outlined above, an informationally complete POVM allows one to reconstruct density matrices.
In short, to do so, one measures the system with an IC-POVM, whose operators {E_k} linearly span the matrix space of the system at hand. The POVM yields probabilities {p_k}, and the measurement operators possess a dual set {D̂_k}. The density matrix is then of the form ρ = ∑_k p_k D̂_k (see also Eq. (69)), which can be seen by direct insertion; the above state yields the correct probability with respect to each of the POVM elements E_k. As the POVM is informationally complete, this implies that it yields the correct probabilities with respect to every POVM. It remains to comment on the existence of IC-POVMs, and the physical realizability of POVMs in general, which, at first sight, appear to be a mere mathematical construction. Concerning the former, it is easy to see that there always exists a basis of the d^2-dimensional space of d × d Hermitian matrices that consists only of positive elements, thus ensuring the existence of IC-POVMs in any dimension. With respect to the latter, due to Neumark's theorem [75][76][77], any POVM can be realized as a projective measurement in a higher-dimensional space, thus putting them on the same foundational footing as 'normal' measurements in quantum mechanics.
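As a sketch of IC-POVM state reconstruction, a qubit SIC-POVM can be built from tetrahedral Bloch vectors (a standard choice, assumed here for illustration). The inversion used below, recovering the Bloch vector as 3 ∑_k p_k a_k, plays the role of applying the dual set:

```python
import numpy as np

s = [np.array([[0, 1], [1, 0]], dtype=complex),
     np.array([[0, -1j], [1j, 0]]),
     np.array([[1, 0], [0, -1]], dtype=complex)]
eye = np.eye(2)

# tetrahedral Bloch vectors give a qubit SIC-POVM E_k = (1 + a_k . sigma)/4
a = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)
povm = [(eye + sum(ak[i] * s[i] for i in range(3))) / 4 for ak in a]

rho = np.array([[0.8, 0.3], [0.3, 0.2]], dtype=complex)  # hypothetical state
p = np.array([np.trace(rho @ E).real for E in povm])     # Born rule

# invert the (informationally complete) frame: Bloch vector = 3 sum_k p_k a_k
bloch = 3 * p @ a
rho_rec = (eye + sum(bloch[i] * s[i] for i in range(3))) / 2
print(np.allclose(rho_rec, rho))
```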

B. Quantum stochastic matrix
Our overarching aim is to generalize the notion of stochastic processes to quantum theory. Here, we start with the generalization of classical stochastic matrices. In the classical case, a stochastic matrix, in Eq. (12), is a mapping from time t_j to time t_k, i.e., Γ_(k∶j) ∶ P(X_j) ↦ P(X_k). As such, in clear analogy, we are looking for a mapping E_(k∶j) ∶ ρ(t_j) ↦ ρ(t_k) between density matrices. While there are different representations of E_(k∶j) (see, for example, Ref. [70]), we start with the one that most closely resembles the classical case, where a probability vector gets mapped to another probability vector by means of a matrix Γ_(k∶j). We have already argued that the density matrix is the quantum generalization of the classical probability distribution. Then, consider the following transformation that turns a density matrix into a vector: ρ = ∑_{rs} ρ_rs |r⟩⟨s| ↦ |ρ⟩⟩ = ∑_{rs} ρ_rs |r⟩ ⊗ |s⟩, where we use the |ρ⟩⟩ notation to emphasize that the vector stems from a vectorization. This procedure is often called vectorization of matrices; for details see Refs. [47,78,79].
Next, in clear analogy to Eq. (14), we can define a matrix Ȇ that maps a density matrix ρ to another density matrix ρ′ (we have added the symbol ˘ to distinguish the map E from its matrix representation Ȇ). Using the above notation, the action of Ȇ can simply be written as |ρ′⟩⟩ = Ȇ |ρ⟩⟩. Here, Ȇ is simply a matrix representing the map E, [80] very much like the stochastic matrix, that maps the initial state to the final state. For better bookkeeping, we explicitly distinguish between the input space H_i and the output space H_o. While for the remainder of this tutorial the dimensions of these two spaces generally agree, in general the two are allowed to differ, and even in the case where they do not, it proves advantageous to keep track of the different spaces.
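The vectorized picture is easy to play with numerically. The sketch below uses the row-stacking convention and a hypothetical unitary channel; the identity vec(AXB) = (A ⊗ B^T)vec(X) then yields the matrix representation of the map:

```python
import numpy as np

def vec(rho):
    """Row-stacking vectorization |rho>> of a matrix."""
    return rho.reshape(-1)

# channel E[rho] = U rho U† for a hypothetical unitary (here: Hadamard)
U = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

# matrix representation acting on vectorized states:
# vec(A X B) = (A ⊗ B^T) vec(X), hence E_mat = U ⊗ conj(U)
E_mat = np.kron(U, U.conj())

rho = np.array([[0.7, 0.2], [0.2, 0.3]])
out = (E_mat @ vec(rho)).reshape(2, 2)
print(np.allclose(out, U @ rho @ U.conj().T))
```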
It was with the above intuition that Sudarshan et al. called Ȇ the quantum stochastic matrix [59]. In today's literature, it is often referred to as a quantum channel, quantum dynamical map, etc. Along with its many names, it also has many representations. We will not go into these details here (see Ref. [70] for further information). We will, however, briefly discuss some of its important properties. Note that we stick to discrete-level systems and do not touch the topic of Gaussian quantum information [81,82].
Amplitude damping channel. Before that, we quickly provide an explicit example of a quantum stochastic matrix. Consider a relaxation process that takes any input quantum state to the ground state. Such a process is, for example, described by the so-called amplitude damping channel, which in the vectorized basis takes the form
Ȇ_(t∶0) = [[1, 0, 0, 1 − p(t)], [0, √p(t), 0, 0], [0, 0, √p(t), 0], [0, 0, 0, p(t)]].
This matrix acts on a vectorized density matrix of a qubit, i.e., |ρ(0)⟩⟩ = [ρ_00, ρ_01, ρ_10, ρ_11]^T. When p(t) = exp{−γt}, we get exponential relaxation in time, and for t → ∞, any input state will be mapped to [1, 0, 0, 0]^T. This example is very close in spirit to the classical example in Eq. (40).
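This relaxation behavior can be checked directly; the sketch below assumes the standard textbook form of the amplitude-damping matrix in the vectorized basis [ρ_00, ρ_01, ρ_10, ρ_11]^T, with a hypothetical input state:

```python
import numpy as np

def amp_damp(p):
    """Amplitude damping channel, standard textbook form, acting on
    vectorized qubit states [rho00, rho01, rho10, rho11]^T."""
    return np.array([[1, 0, 0, 1 - p],
                     [0, np.sqrt(p), 0, 0],
                     [0, 0, np.sqrt(p), 0],
                     [0, 0, 0, p]])

gamma, t = 1.0, 50.0
p = np.exp(-gamma * t)                 # p(t) = exp(-gamma t)

rho = np.array([0.3, 0.4, 0.4, 0.7])   # a hypothetical vectorized qubit state
out = amp_damp(p) @ rho
print(out)  # close to [1, 0, 0, 0]: relaxation to the ground state
```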
Here, already, it is easy to see that the matrices Ȇ, unlike their classical counterparts, do not possess nice properties like Hermiticity or stochasticity. However, these shortcomings can be remedied in different representations of Ȇ. Also note that, here, we actually have a family of quantum stochastic matrices parameterized by time. When we speak of a family of maps, we will label them with the subscript (t ∶ 0). However, often the stochastic matrix only represents a mapping from an initial time to a final time. In such cases, we will omit the subscript and refer to the initial and final states as ρ and ρ′, respectively.

Linearity and tomography
Having formally introduced quantum maps, it is now time to discuss the properties they should display. We begin with one of the most important features of quantum dynamics. The quantum stochastic map, like its classical counterpart, is a linear map: E[aρ + bσ] = a E[ρ] + b E[σ]. This is straightforwardly clear for the specific case of the quantum stochastic matrix Ȇ, because the vectorization of a density matrix is itself a linear map, i.e., |A + B⟩⟩ = |A⟩⟩ + |B⟩⟩. Once this is done, the rest is just matrix multiplication, which is linear. The importance of linearity cannot be overstated; we will exploit this property over and over and, in particular, the linearity of quantum dynamics plays a crucial role in defining an unambiguous set of Markov conditions in quantum mechanics. Due to linearity, a quantum channel is fully defined once its action on a set of linearly independent states is known. From a practical point of view, this is important for experimentally characterizing quantum dynamics by means of a procedure known as quantum process tomography [71,83] (see, e.g., Refs. [84,85] for a more in-depth discussion).
For concreteness, let us prepare a set of linearly independent input states, say {ρ̂_j}, and determine their corresponding outputs {E[ρ̂_j]} by means of quantum state tomography. The corresponding input-output relation then fully determines the action of the stochastic map on any density matrix: E[ρ] = ∑_k q_k E[ρ̂_k], where we have used Eq. (61), i.e., ρ = ∑_k q_k ρ̂_k, underlining its importance. The above equation highlights that, once the output states for a basis of input states are known, the action of the entire map is determined.
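Linear-inversion process tomography can be sketched directly: prepare a linearly independent set of inputs, record the outputs, and assemble the map using the dual (inverse) basis. The 'unknown' channel below is a hypothetical Hadamard unitary, chosen only for the demonstration:

```python
import numpy as np

# a hypothetical "unknown" channel: conjugation by the Hadamard unitary
U = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
channel = lambda r: U @ r @ U.conj().T

# four linearly independent input states: |0>, |1>, |+>, |+y>
kets = [np.array([1, 0]), np.array([0, 1]),
        np.array([1, 1]) / np.sqrt(2), np.array([1, 1j]) / np.sqrt(2)]
inputs = [np.outer(k, k.conj()) for k in kets]

# duals <<D_j| as rows of the inverse of the basis matrix
B = np.array([r.reshape(-1) for r in inputs]).T      # columns |rho_j>>
D = np.linalg.inv(B)                                 # rows <<D_j|

# E = sum_j |rho'_j>> <<D_j|, assembled from the measured output states
outputs = [channel(r).reshape(-1) for r in inputs]
E_mat = sum(np.outer(out, D[j]) for j, out in enumerate(outputs))

# verify on an arbitrary state
rho = np.array([[0.6, 0.2 - 0.1j], [0.2 + 0.1j, 0.4]])
rec = (E_mat @ rho.reshape(-1)).reshape(2, 2)
print(np.allclose(rec, channel(rho)))
```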
Using ideas akin to the aforementioned tomography of quantum states, we can also directly use linearity and dual sets to reconstruct the matrix Ȇ. Above, we saw that a quantum state is fully determined once the probabilities for an informationally complete POVM are known. In the same vein, a quantum map is fully determined once the output states ρ′_j = E[ρ̂_j] for a linear basis of input states {ρ̂_j} are known; with the corresponding dual set {D̂_j}, the matrix can be written as Ȇ = ∑_j |ρ′_j⟩⟩⟨⟨D̂_j|. Indeed, it is easy to see that ⟨⟨A|B⟩⟩ = tr(A†B), implying that, with the above definition, we have Ȇ |ρ̂_j⟩⟩ = |ρ′_j⟩⟩ for all basis elements. Due to linearity, this implies that Ȇ yields the correct output state for any input state. Measuring the output states for a basis of input states is thus sufficient to perform process tomography. Specifically, the output states ρ′_j can themselves be measured by means of an informationally complete POVM. As the experimenter controls the states they prepare at each run (and, as such, the dual D̂_j corresponding to each run), as well as the instrument they use, determining the corresponding probabilities thus enables the reconstruction of Ȇ (see Figure 9 for a graphical representation).

Complete positivity and trace preservation
A classical stochastic matrix maps probability vectors to probability vectors. As such, it is positive, in the sense that it maps any vector with non-negative entries to another vector with non-negative entries. In the same vein, quantum channels need to be positive, as they have to map all density matrices to proper density matrices, i.e., positive semidefinite matrices to positive semidefinite matrices.
One crucial difference between classical stochastic maps and their quantum generalization is the requirement of complete positivity. A positive stochastic matrix is guaranteed to map probabilities to probabilities even if it acts non-trivially only on a subpart, i.e., (Γ_A ⊗ 𝟙_B) P_AB ≥ 0 for all P_AB ≥ 0, where A and B are two different spaces and 𝟙_B is the identity process on B. Here, P ≥ 0 means that all entries of the vector are non-negative, and we have given all objects additional subscripts to denote the spaces they act/live on.
The same is not true in quantum mechanics. Namely, there are maps E_A that take all density matrices on a subspace to density matrices, but whose action on a larger space fails to map density matrices to density matrices: (E_A ⊗ I_B)[ρ_AB] is not positive semidefinite for some ρ_AB ≥ 0, where I_B is the identity map on the system B, i.e., I_B[ρ_B] = ρ_B for all ρ_B, and ρ ≥ 0 means that all eigenvalues of ρ are non-negative. These maps are called positive maps, and they play an important role in the theory of entanglement witnesses [86,87]. It is easy to show that positivity can only break down when the map E acts on part of an entangled bipartite state. Of course, giving up positivity of probabilities is not physical, and as such a positive map that is not also positive when acting on a part of a state is not physical. One thus demands that physical maps must take all density matrices to density matrices, even when only acting non-trivially on a part of them, i.e., (E_A ⊗ I_B)[ρ_AB] ≥ 0 for all ρ_AB ≥ 0. Maps for which this is true for arbitrary size of the system B are called completely positive (CP) maps [1], and they are the only maps we will consider throughout this tutorial (for a discussion of non-completely positive maps and their potential physical relevance, or lack thereof, see, for example, Refs. [70,[88][89][90][91][92]). In addition to preserving positivity, i.e., preserving the positivity of probabilities, quantum maps must also preserve the trace of the state ρ, which amounts to preserving the normalization of probabilities. This is the natural generalization of the requirement on stochastic matrices that their columns sum to 1. Consequently, for a quantum channel, we demand that it satisfies tr(E[ρ]) = tr(ρ) for all ρ. Luckily, there is no such thing as 'completely trace-preserving': if a map E_A is trace-preserving, then so is E_A ⊗ I_B. We will refer to completely positive maps that are also trace-preserving as CPTP maps, or quantum channels.
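The standard example of a map that is positive but not completely positive is matrix transposition; applying it to one half of an entangled state exposes the failure. A minimal numerical sketch:

```python
import numpy as np

# Transposition preserves eigenvalues, so it is a positive map. Applied to
# half of an entangled state, however, it produces negative eigenvalues.
phi = np.array([1, 0, 0, 1]) / np.sqrt(2)      # Bell state (|00> + |11>)/sqrt(2)
rho_ab = np.outer(phi, phi.conj())

# partial transpose on subsystem A of the two-qubit state
pt = rho_ab.reshape(2, 2, 2, 2).transpose(2, 1, 0, 3).reshape(4, 4)

print(np.linalg.eigvalsh(rho_ab).min())  # >= 0: a valid state
print(np.linalg.eigvalsh(pt).min())      # < 0: transposition is not CP
```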
Importantly, while the physicality of non-completely positive maps is questionable, we will frequently encounter maps that are CP, but only trace non-increasing instead of trace-preserving. Such maps are the natural generalizations of POVM elements and will play a crucial role when modeling quantum stochastic processes.

Representations
We will not discuss the details of different representations here; we refer the reader to Refs. [47,70,93] for further reading. Above, we have already seen the matrix representation Ȇ of the quantum stochastic map in Eq. (74). This representation is rather useful for numerical purposes, as it allows one to write the action of the map E as a matrix multiplication. However, it is not straightforward to see how complete positivity and trace preservation enter into the properties of Ȇ. When we add these two properties to the mix, there are two other important and useful representations that prove more insightful. The first is the so-called Kraus representation of completely positive maps: E[ρ] = ∑_j K_j ρ K_j† (85) for some d × d matrices {K_j}. In addition, the CP map E is trace-preserving iff the Kraus operators satisfy ∑_j K_j† K_j = 𝟙 (see, for example, Ref. [70] for a more in-depth discussion of representations of quantum maps).
Depolarizing channel. A common quantum map that one encounters in this representation is the depolarizing channel on qubits: E_DP[ρ] = ∑_{j=0}^{3} p_j σ_j ρ σ_j, (86) where {σ_j} are the Pauli operators (with σ_0 = 𝟙) and {p_j} is a probability distribution. This map is an example of a random unitary channel [95], i.e., a probabilistic mixture of unitary maps. When the p_j are uniform, the image of this map is the maximally mixed state for all input states. It is straightforward to see that the above map is indeed CPTP, as it can be written in terms of the Kraus operators {K_j = √p_j σ_j}, and we have ∑_j K_j† K_j = ∑_j p_j σ_j σ_j = 𝟙.
For the second important representation of E, consider the action of the map on one part of the (unnormalized) maximally entangled state Φ⁺ = ∑_{k,l} |kk⟩⟨ll|, where {|k⟩} is an orthonormal basis of H_i. The resultant matrix Υ_E ∶= (I ⊗ E)[Φ⁺] = ∑_{k,l} |k⟩⟨l| ⊗ E[|k⟩⟨l|] (87) is depicted graphically in Figure 10. Since in the last equation E acts on a complete linear basis of matrices, i.e., the elementary matrices {ε_kl ∶= |k⟩⟨l|}, Υ_E contains all information about the action of E. In principle, instead of Φ⁺, any bipartite vector with full Schmidt rank could be used for this isomorphism [96]. In the form of (87) it is known as the Choi-Jamiołkowski isomorphism (CJI) [97-99]; it allows one to map linear maps to matrices. Usually, Υ_E is called the Choi matrix or Choi state of the map E. We will mostly refer to it by the latter. Given Υ_E, the action of E can be written as E[ρ] = tr_i[Υ_E (ρ^T ⊗ 𝟙_o)], (88) where tr_i is the trace over the input space H_i and 𝟙_o denotes the identity matrix on H_o. The validity of (88) can be seen by direct insertion of (87).
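The construction of Eq. (87) can be sketched numerically (a self-contained illustration; the function `choi` simply implements Υ_E = ∑_kl |k⟩⟨l| ⊗ E[|k⟩⟨l|]), together with the CP and TP conditions discussed below:

```python
import numpy as np

def choi(channel, d):
    """Choi matrix Υ_E = Σ_kl |k⟩⟨l| ⊗ E[|k⟩⟨l|] (unnormalized CJI)."""
    Y = np.zeros((d * d, d * d), dtype=complex)
    for k in range(d):
        for l in range(d):
            E_kl = np.zeros((d, d), dtype=complex)
            E_kl[k, l] = 1.0
            Y += np.kron(E_kl, channel(E_kl))
    return Y

# Depolarizing channel with uniform Pauli probabilities p_j = 1/4
paulis = [np.eye(2), np.array([[0, 1], [1, 0]]),
          np.array([[0, -1j], [1j, 0]]), np.diag([1, -1])]
depolarize = lambda rho: sum(0.25 * s @ rho @ s.conj().T for s in paulis)

Y = choi(depolarize, 2)
# CP: the Choi matrix is positive semidefinite
assert np.all(np.linalg.eigvalsh(Y) >= -1e-12)
# TP: tr_o(Υ_E) = 1_i (trace over the output/second tensor factor)
tr_o = np.trace(Y.reshape(2, 2, 2, 2), axis1=1, axis2=3)
assert np.allclose(tr_o, np.eye(2))
```

For the uniform depolarizing channel every input is mapped to the maximally mixed state, so the Choi matrix here is simply 𝟙₄/2.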
For quantum maps, Υ_E has particularly nice properties. Complete positivity of E is equivalent to Υ_E ≥ 0 (i.e., ⟨x|Υ_E|x⟩ ≥ 0 for all |x⟩ ∈ H_i ⊗ H_o), and it is straightforward to deduce from Eq. (88) that E is trace-preserving iff tr_o(Υ_E) = 𝟙_i. These properties are much more transparent, and easier to work with than, for example, the properties that make Ȇ the matrix corresponding to a CPTP map. Additionally, Eq. (88) allows one to directly relate the representation of E in terms of Kraus operators to the Choi state Υ_E, and, in particular, the minimal number of required Kraus operators to the rank of Υ_E. Specifically, in terms of its eigenbasis, Υ_E can be written as Υ_E = ∑_{j=1}^{r} λ_j |Φ_j⟩⟨Φ_j|, where r = rank(Υ_E) and λ_j ≥ 0. Inserting this into Eq. (88), we obtain E[ρ] = ∑_{j=1}^{r} K_j ρ K_j† with ⟨α|K_j|β⟩ = √λ_j ⟨β, α|Φ_j⟩, where {|β⟩} and {|α⟩} are bases of H_i and H_o, respectively. The above equation provides a Kraus representation of E with the minimal number of Kraus operators (for more details on this connection between Choi matrices and Kraus operators, see, for example, Ref. [100]). Naturally, all representations of quantum maps can be transformed into one another; details on how this is done can be found in Refs. [70,101].
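The Choi-to-Kraus connection can be sketched in a few lines (an illustrative construction; `choi_to_kraus` reshapes each weighted Choi eigenvector into a Kraus operator, using the input-factor-first convention of Eq. (87)):

```python
import numpy as np

def choi_to_kraus(Y, d, tol=1e-12):
    """Minimal Kraus operators from Υ_E = Σ_j λ_j |Φ_j⟩⟨Φ_j|.

    Each K_j is the reshaped vector √λ_j |Φ_j⟩, with ⟨α|K_j|β⟩ = √λ_j ⟨β,α|Φ_j⟩.
    Convention: Υ_E = Σ_kl |k⟩⟨l| ⊗ E[|k⟩⟨l|] (input factor first).
    """
    vals, vecs = np.linalg.eigh(Y)
    kraus = []
    for lam, v in zip(vals, vecs.T):
        if lam > tol:
            # composite index of v is (β, α); K_j[α, β] = √λ_j v[β*d + α]
            kraus.append(np.sqrt(lam) * v.reshape(d, d).T)
    return kraus

# Example: dephasing channel E[ρ] = (1-q) ρ + q σ_z ρ σ_z
q = 0.25
Z = np.diag([1.0, -1.0])
chan = lambda r: (1 - q) * r + q * Z @ r @ Z

Y = np.zeros((4, 4), dtype=complex)
for k in range(2):
    for l in range(2):
        E_kl = np.zeros((2, 2)); E_kl[k, l] = 1.0
        Y += np.kron(E_kl, chan(E_kl))

ks = choi_to_kraus(Y, 2)
rho = np.array([[0.5, 0.5], [0.5, 0.5]])
assert np.allclose(sum(K @ rho @ K.conj().T for K in ks), chan(rho))
print(len(ks))  # rank of Υ_E = minimal number of Kraus operators
```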
Depolarizing channel. For concreteness, let us consider the above case of the depolarizing channel E_DP and provide its Choi state. Inserting Eq. (86) into Eq. (87), we obtain Υ_{E_DP} = ∑_j p_j (𝟙 ⊗ σ_j) Φ⁺ (𝟙 ⊗ σ_j). The resulting matrix Υ_{E_DP} is positive semidefinite, with eigenvalues {2p_j}. Besides its appealing mathematical properties, the CJI is also of experimental importance. Given that a (normalized) maximally entangled state can be created in practice, the CJI enables another way of reconstructing a representation of the map E; letting it act on one half of a maximally entangled state and reconstructing the resulting state via state tomography directly yields Υ_E. While this so-called ancilla-assisted process tomography [102,103] requires the same number of measurements as the input-output procedure, it can, depending on the experimental situation, be easier to implement in the laboratory.

Purification and Dilation
In quantum mechanics, any mixed state ρ_S can be thought of as the marginal of a pure state |Ψ⟩_SS′ in a higher-dimensional space. That is, for any ρ_S, there exists a pure state |Ψ⟩_SS′ such that tr_S′(|Ψ⟩_SS′⟨Ψ|) = ρ_S. The state |Ψ⟩_SS′ is then called a purification of ρ_S. This is in contrast to classical physics, which is not endowed with a purification principle.
To show that such a purification always exists, recall that any mixed state ρ_S is diagonal in its eigenbasis {|r⟩_S}, i.e., ρ_S = ∑_r λ_r |r⟩_S⟨r|, with λ_r ≥ 0 and ∑_r λ_r = 1. (91) This state can, for example, be purified by |Ψ⟩_SS′ = ∑_r √λ_r |r⟩_S |r⟩_S′. (92) More generally, as a consequence of the Schmidt decomposition, any state |Ψ⟩_SS′ that purifies ρ_S is of the form |Ψ⟩_SS′ = (𝟙_S ⊗ W) ∑_r √λ_r |r⟩_S |r⟩_S″, (93) where W is an isometry from the space S″ to S′. Importantly, |Ψ⟩_SS′ is entangled between S and S′ as soon as ρ_S is mixed, i.e., as soon as λ_r < 1 for all r.
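The eigendecomposition construction of Eq. (92) can be sketched numerically (the state ρ_S below is an arbitrary illustrative choice); tracing out the auxiliary space S′ recovers the original mixed state:

```python
import numpy as np

def purify(rho):
    """Purification |Ψ⟩ = Σ_r √λ_r |r⟩_S |r⟩_S' of a mixed state ρ_S."""
    lam, vecs = np.linalg.eigh(rho)
    d = rho.shape[0]
    psi = np.zeros(d * d, dtype=complex)
    for l, v in zip(lam, vecs.T):
        if l > 0:
            # use the conjugate copy on S' so that cross terms cancel
            psi += np.sqrt(l) * np.kron(v, v.conj())
    return psi

rho = np.array([[0.75, 0.25], [0.25, 0.25]])  # an arbitrary mixed qubit state
psi = purify(rho)
# Partial trace over S' (second factor) recovers ρ_S
marg = np.trace(np.outer(psi, psi.conj()).reshape(2, 2, 2, 2), axis1=1, axis2=3)
assert np.allclose(marg, rho)
```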
As entangled states lie outside of what can be captured by classical theories, classical mixed states do not admit a purification in the above sense -at least not one that lies within the framework of classical physics. Randomness in classical physics can thus not be considered as stemming from ignorance of parts of a pure state in a higher-dimensional space, but it has to be inserted manually into the theory. On the other hand, any quantum state can be purified within quantum mechanics, and thus randomness can always be understood as ignorance about extra degrees of freedom.
Purification example. To provide an explicit example, consider the purification of a maximally mixed state on a d-dimensional system, ρ_S = (1/d) ∑_{r=1}^{d} |r⟩_S⟨r|. Following the above reasoning, this state is, for example, purified by |Ψ⟩_SS′ = (1/√d) ∑_{r=1}^{d} |r⟩_S |r⟩_S′, which is the maximally entangled state.
Remarkably, the purification principle also holds for dynamics: any quantum channel can be understood as stemming from a unitary interaction with an ancillary system, while the same does not hold true for classical dynamics. The former statement is most easily seen by direct construction. As we have seen, quantum channels can be represented in terms of their Kraus operators as E[ρ_S] = ∑_j K_j^S ρ_S K_j^{S†}, where we have added extra superscripts denoting the spaces the Kraus operators act on. The above can easily be rewritten in terms of an isometry built from the operators K_j ∈ B(H_S) and vectors |j⟩_E ∈ H_E: V = ∑_j K_j^S ⊗ |j⟩_E, (94) satisfying V†V = 𝟙_S. Consequently, the number of Kraus operators determines the dimension d_E of the environment that is used for the construction. [104] With this, we have E[ρ_S] = tr_E[V ρ_S V†] = tr_E[U(ρ_S ⊗ |0⟩_E⟨0|)U†]. (95) The second equality comes from the fact that any isometry V can be completed to a unitary U_{SE→SE} =∶ U (see, for example, Ref. [105] for different possible constructions). For completeness, here we provide a simple way to obtain U from V. Let {|s⟩_S} ({|α⟩_E}) be an orthonormal basis of the system (environment) Hilbert space. By construction, we have U |s⟩_S |0⟩_E = V |s⟩_S. Consequently, U can be written as U = ∑_s V |s⟩_S ⟨s|_S ⟨0|_E + ∑_{s, α≥1} |ϑ_{s,α}⟩_SE ⟨s|_S ⟨α|_E, (96) where the vectors {|ϑ_{s,α}⟩} satisfy _SE⟨ϑ_{s′,α}|V|s⟩_S = 0 for all {s, s′} and α ≥ 1, as well as ⟨ϑ_{s′,α′}|ϑ_{s,α}⟩ = δ_{ss′} δ_{αα′}. Such a set {|ϑ_{s,α}⟩} of orthonormal vectors can readily be found via a Gram-Schmidt process. It is easy to verify that the above matrix U is indeed unitary.
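The dilation of Eqs. (94)-(96) can be verified numerically. The sketch below (an illustrative construction; the orthonormal completion uses eigenvectors of 𝟙 − VV† rather than an explicit Gram-Schmidt, and the amplitude damping parameters are arbitrary) builds V from the Kraus operators, completes it to a unitary, and checks Eq. (95):

```python
import numpy as np

def dilate(kraus):
    """Stinespring isometry V = Σ_j K_j ⊗ |j⟩_E (environment factor ordered
    first), completed to a unitary U on environment ⊗ system."""
    d = kraus[0].shape[0]
    dE = len(kraus)
    V = np.zeros((dE * d, d), dtype=complex)
    for j, K in enumerate(kraus):
        e_j = np.zeros((dE, 1)); e_j[j, 0] = 1.0
        V += np.kron(e_j, K)            # V|s⟩ = Σ_j |j⟩_E ⊗ K_j|s⟩
    assert np.allclose(V.conj().T @ V, np.eye(d))   # isometry: V†V = 1_S
    # Complete the columns of V with an orthonormal basis of their complement
    w, vecs = np.linalg.eigh(np.eye(dE * d) - V @ V.conj().T)
    return np.hstack([V, vecs[:, w > 0.5]])

# Amplitude damping channel (illustrative decay probability p)
p = 0.4
K0 = np.array([[1, 0], [0, np.sqrt(1 - p)]])
K1 = np.array([[0, np.sqrt(p)], [0, 0]])
U = dilate([K0, K1])

rho = np.array([[0.2, 0.3], [0.3, 0.8]])
env0 = np.diag([1.0, 0.0])              # environment in |0⟩⟨0|
out = U @ np.kron(env0, rho) @ U.conj().T
red = np.trace(out.reshape(2, 2, 2, 2), axis1=0, axis2=2)   # tr_E
assert np.allclose(red, K0 @ rho @ K0.conj().T + K1 @ rho @ K1.conj().T)
```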
The fact that any quantum channel can be understood as part of a unitary process is often referred to as Stinespring dilation [106]. Together with the possibility to purify every quantum state, it implies that all dynamics in quantum mechanics can be seen as reversible processes, and randomness only arises due to lack of knowledge. In particular, we have E[ρ_S] = tr_{S′E}[U (|Ψ⟩_SS′⟨Ψ| ⊗ |0⟩_E⟨0|) U†], where |Ψ⟩_SS′ purifies ρ_S and we have omitted the respective identity matrices.
On the other hand, in the classical case, the initial state of the environment is definite, but unknown, in each run of the experiment. Supposing that the system, too, is initialized in a definite state, any pure interaction, i.e., a permutation, will always yield the system in a definite state. In other words, if any randomness exists in the final state, it must have been present a priori, either in the initial state of the environment or in the interaction. Put differently, a classical dynamics that transforms pure states into mixed states, i.e., one that is described by a stochastic map, cannot have come from a permutation and pure states alone.
As both of these statements, i.e., the purifiability of quantum states and of quantum channels, ensure the fundamental reversibility of quantum mechanics, purification postulates have been employed as axioms in reconstructing quantum mechanics from first principles [107,108].
Purification of dephasing dynamics. Before advancing, it is insightful to provide the dilation of an explicit quantum channel. Here, we choose the so-called dephasing map on a single qubit: E_deph[ρ] = ½(1 + e^{−γt}) ρ + ½(1 − e^{−γt}) σ_z ρ σ_z. (98) This channel can be represented with two Kraus matrices, K_0(t) = √(½(1 + e^{−γt})) 𝟙 and K_1(t) = √(½(1 − e^{−γt})) σ_z. Following Eq. (94), the corresponding isometry is given by V(t) = K_0(t) ⊗ |0⟩_E + K_1(t) ⊗ |1⟩_E. From this, we can construct the two remaining vectors |ϑ_{0,1}⟩_SE(t) and |ϑ_{1,1}⟩_SE(t) to complete V(t) to a unitary U(t) by means of Eq. (96). For example, we can make the choice |ϑ_{s,1}⟩_SE(t) = K_1(t)|s⟩_S ⊗ |0⟩_E − K_0(t)|s⟩_S ⊗ |1⟩_E for s ∈ {0, 1}. It is easy to check that these vectors indeed satisfy _SE⟨ϑ_{s′,α}|V|s⟩_S = 0 for all {s, s′} and α ≥ 1, as well as ⟨ϑ_{s′,α′}|ϑ_{s,α}⟩ = δ_{ss′} δ_{αα′}. This then provides a unitary matrix U(t) that leads to the above dephasing dynamics; insertion into Eq. (95) shows that the thusly defined unitary evolution indeed leads to dephasing dynamics on the system.

C. Quantum Master Equations
While we have yet to formalize the theory of quantum stochastic processes, the quantum stochastic map formalism is enough to keep us occupied for a long time. In fact, much of the active research in this field is concerned with the properties of families of stochastic matrices. It should already be clear that the quantum stochastic matrix, like its classical counterpart, only deals with two-time correlations; see Figures 6 and 24. The analogy goes further: as in the classical case, an important family of stochastic matrices corresponds to quantum master equations. [109] Quantum master equations have a long and rich history dating back to the 1920s. Right at the inception of modern quantum theory, Landau derived a master equation for light interacting with charged matter [110]. This should not be surprising, because master equations play a key role in understanding the real phenomena observed in the lab. For the same reason, they are widely used tools in theoretical physics and beyond, including quantum chemistry, condensed matter physics, high-energy physics, material science, and so on. However, the formal derivation of overarching master equations took another thirty years. Nakajima and Zwanzig independently derived exact memory-kernel master equations using the so-called projection operator method. Since then, there have been an enormous number of studies of non-Markovian master and stochastic equations, spanning from exploring their mathematical structure, to studying the transition between the Markovian and non-Markovian regimes [135,136], to applying them to chemical or condensed matter systems. Here, we will not concern ourselves with these details and limit our discussion to the overarching structure of the master equation, and in particular how to tell Markovian ones apart from non-Markovian ones. We refer the reader to standard textbooks [2,3,5] for more details on these aspects.
The most general quantum master equation has a form already familiar to us. We simply replace the probability distribution in Eq. (36) with a density matrix to obtain the Nakajima-Zwanzig master equation [137] (d/dt) ρ(t) = ∫_0^t ds K(t, s)[ρ(s)]. (103) Above, K(t, s) is a superoperator [138] called the memory kernel. Often, this equation is written in two parts, (d/dt) ρ(t) = −i[H, ρ(t)] + D[ρ(t)], (104) where D is called the dissipator, with the form D[ρ(t)] = ∑_j γ_j (L_j ρ(t) L_j† − ½{L_j† L_j, ρ(t)}) + ∫_0^t ds K(t, s)[ρ(s)]. (105) Above, the first term on the RHS of Eq. (104) corresponds to unitary dynamics, the second term is the dissipative part of the process, and the third term (the memory kernel in the dissipator) carries the memory (which can also be dissipative). While the Nakajima-Zwanzig equation is the most general quantum master equation, the rage in the 1960s and 1970s was to derive the most general Markovian master equation. It took well over a decade and many attempts to get there; see Ref. [139] for more on this history. Those who failed in this endeavor were missing a key ingredient: complete positivity. In 1976, this feat was finally achieved by Gorini, Kossakowski, and Sudarshan [57] and by Lindblad [58] independently. [140] A quantum Markov process can be described by this master equation, now known as the GKSL master equation: (d/dt) ρ(t) = −i[H, ρ(t)] + ∑_j γ_j (L_j ρ(t) L_j† − ½{L_j† L_j, ρ(t)}) =∶ L[ρ(t)]. (106) Eq. (104) already contains the GKSL master equation, as the final (memory) term vanishes for a Markov process; here, L stands for the Liouvillian, often also called the Lindbladian. If L is time-independent, then the above master equation has the formal solution ρ_t = e^{L(t−r)} ρ_r, (107) where ρ_r is the system state at time r. From this, we see that the respective dynamics between two arbitrary times r and t only depends on the difference t − r, but not on the absolute times r and t.
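The GKSL generator of Eq. (106) is easy to exercise numerically. The sketch below (an illustrative construction; the column-stacking vectorization convention and the choice of a pure-dephasing model are ours) builds the Liouvillian as a matrix and checks that coherences decay as e^{−γt}:

```python
import numpy as np
from scipy.linalg import expm

def liouvillian(H, jumps, rates):
    """Vectorized GKSL generator L[ρ] = -i[H,ρ] + Σ_j γ_j (L_j ρ L_j† - ½{L_j†L_j, ρ}).
    Column-stacking convention: vec(AρB) = (B^T ⊗ A) vec(ρ)."""
    d = H.shape[0]
    I = np.eye(d)
    L = -1j * (np.kron(I, H) - np.kron(H.T, I))
    for g, J in zip(rates, jumps):
        JdJ = J.conj().T @ J
        L += g * (np.kron(J.conj(), J)
                  - 0.5 * np.kron(I, JdJ) - 0.5 * np.kron(JdJ.T, I))
    return L

# Pure dephasing: H = 0, single jump operator σ_z with rate γ/2
gamma = 1.0
Z = np.diag([1.0, -1.0])
L = liouvillian(np.zeros((2, 2)), [Z], [gamma / 2])

rho0 = np.array([[0.5, 0.5], [0.5, 0.5]])  # |+⟩⟨+|
t = 0.7
rho_t = (expm(L * t) @ rho0.reshape(-1, order="F")).reshape(2, 2, order="F")
# Coherences decay as e^{-γt}, populations stay fixed
assert np.allclose(rho_t[0, 1], 0.5 * np.exp(-gamma * t))
```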
Using e^{L(t−r)} = e^{L(t−s)} e^{L(s−r)}, this implies the often-used semigroup property of Markovian dynamics, E_{(t∶r)} = E_{(t∶s)} E_{(s∶r)} for t ≥ s ≥ r. (108) Some remarks are in order. The decomposition in Eq. (104) is not always unique. Often, a term dubbed the inhomogeneous term is present; it is due to initial system-environment correlations. As we will see below, such a treatment is operationally meaningless. The superoperators in Eq. (106) should be time-independent. In fact, it is possible to derive master equations for non-Markovian processes that look just like Eq. (106), but then the superoperator will be time-dependent and the rates {γ_j} may be negative [31,125,141-143]. For a Markovian master equation, the operators {L_j} are related to the Kraus operators [144]. Since the Nakajima-Zwanzig equation is the most general form of the quantum master equation, it contains the equations due to Redfield, Landau, Pauli, and others. To reach these equations, one usually expands and approximates the memory kernel. This is a field of its own, and we cannot do justice to these methods and the reasoning behind the approximations here (for a comparison of the validity of the different employed approximations, see, for example, Refs. [145,146]). As in the classical case, the above master equation expresses the statistical quantum state continuously in time. [147] And as before, it may be tempting to think that the master equation is equivalent to a stochastic process as defined above. However, just as in the classical case, the quantum master equation only accounts for two-point correlations. This is seen by employing the transfer tensor method [33-35], which shows that the RHS of Eq. (104) can be expressed as a linear combination of products of quantum maps E_{(c∶b)} ∘ E_{(b∶a)}, with c being either t or t − dt, b = s, and a being the initial time. A quantum map E_{(b∶a)} is a mapping of a preparation at time a to a density matrix at time b; thus, it only contains correlations between the two times a and b.
The LHS can be computed by setting b = t and a = t − dt. Another formal way of showing that the RHS can be expressed as a product of two stochastic matrices is by means of the Laplace transform [31,32].
Indeed, it is then possible to have physical non-Markovian processes that can be described by a Markovian-looking master equation. That is, the implication only goes one way: a Markov process always leads to a master equation of the form of Eq. (104) with the final term vanishing; the converse does not hold. We detail an example below, but to fully appreciate it we must first have a better understanding of multi-time quantum correlations.

D. Witnessing non-Markovianity
Having access to the stochastic matrix and the master equation is already sufficient to witness departures from Markovianity. That is, there are certain features and properties that must belong to any Markovian quantum process, which then allows for discriminating between Markov and non-Markov processes.

Initial correlations
Consider the dynamics of a system from an initial time to some final time. When the system interacts with an environment, the process on the system can be described by a map E_{(t∶0)}. As we showed in Eq. (95), such a map can be thought of as coming from unitary system-environment dynamics, with the caveat that the initial system-environment state has no correlations. Already in the 1980s and 1990s, researchers began to wonder what happens if the initial system-environment state is correlated [88,89,148]. Though this may seem unrelated to the issue of non-Markovianity, the detectable presence of initial correlations is already a non-Markovian effect. This is because initial correlations indicate past interactions, and if the initial correlations affect the future dynamics, then the future dynamics are a function of the state of the system at t = 0, as well as further back in the past. As this is the definition of non-Markovianity, the observable presence of initial correlations constitutes an experimentally accessible witness for memory [149-154].
We emphasize that the presence of initial correlations does not make the resulting process non-Markovian per se; if there are initial correlations whose presence cannot be detected on the level of the system, then these initial system-environment correlations do not lead to non-Markovianity. If, however, it is possible to detect an influence of such correlations on the behavior of the system (for example, by observing a breakdown of complete positivity [88,91,92] or by means of a local dephasing map [150,152]), then the corresponding process is non-Markovian. With this in mind, in what follows, by 'presence' of correlations, we will always mean 'detectable presence'.
A pioneering result on initial correlations and open system dynamics was due to Pechukas in 1995 [88]. He argued that either there are no initial correlations, or we must give up the complete positivity or the linearity of the dynamics. Around the same time, several experiments began to reconstruct quantum maps [155-157]. Surprisingly, many of them revealed not completely positive maps. This sparked a flurry of theoretical research either arguing for not-completely-positive (NCP) dynamics or reinterpreting the experimental results [91,158-165]. However, this does not add to the physical legitimacy of NCP processes [166]. Nevertheless, NCP dynamics remains a witness for non-Markovianity. We will show in the next section that all dynamics, including non-Markovian ones, must be completely positive. We do this by getting around the Pechukas theorem by paying attention to what it means to have a state in quantum mechanics.

Completely positive and divisible processes
The notion of divisibility, first discussed in Sec. III C 2, naturally extends from the classical to the quantum domain. A quantum process is called divisible if E_{(t∶r)} = E_{(t∶s)} ∘ E_{(s∶r)} for all t ≥ s ≥ r. Here, ∘ stands for the composition of two quantum maps.
Since quantum maps are not necessarily matrices, the composition may not be a simple matrix product. Moreover, in the quantum case we now further require that each map be completely positive, and thus this class of processes is referred to as CP divisible processes. Understanding the divisibility of quantum maps and giving it an operational interpretation is a highly active area of research [167-176], and we will only scratch the surface. Importantly, as we have seen above, processes that satisfy a GKSL equation are divisible (see Eq. (108)). Conversely, given a family of maps {E_{(t∶0)}}, one can always define the intermediate map ζ_{(t∶s)} ∶= E_{(t∶0)} ∘ E_{(s∶0)}^{−1}, provided the maps {E_{(t∶0)}} are invertible. We deliberately label this map with a different letter, ζ, as it may not actually represent a physical process [177]. Now, if the process is Markovian, then ζ_{(t∶s)} = E_{(t∶s)}, i.e., it indeed corresponds to the physical evolution between s and t, and it will be completely positive. Conversely, if we find that ζ_{(t∶s)} is not CP, then we know that the process is non-Markovian. Working with divisible processes has several advantages. Two that we have already discussed in the classical case are the straightforward connection to the master equation and the data processing inequality. We can use these to construct further witnesses for non-Markovianity, such as those based on the trace distance measure [178]. The amplitude damping channel in Eq. (76) and the dephasing channel in Eq. (98) are both divisible as long as they relax exponentially. Otherwise, they are indivisible processes, which is easily checked numerically.
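Such a numerical check can be sketched as follows (an illustrative construction, not from the text: a dephasing family with a non-exponential coherence factor c(t) = cos t is used as a toy non-Markovian model, and the intermediate map ζ_{(t∶s)} is tested for CP via its Choi matrix):

```python
import numpy as np

def transfer_to_choi(T, d):
    """Choi matrix of the map with transfer matrix T (column-stacking convention)."""
    Y = np.zeros((d * d, d * d), dtype=complex)
    for k in range(d):
        for l in range(d):
            E_kl = np.zeros((d, d)); E_kl[k, l] = 1.0
            out = (T @ E_kl.reshape(-1, order="F")).reshape(d, d, order="F")
            Y += np.kron(E_kl, out)
    return Y

def dephasing_T(c):
    """Transfer matrix of a qubit dephasing map with coherence factor c."""
    return np.diag([1.0, c, c, 1.0])

# Toy non-Markovian model: non-monotonic coherence factor c(t) = cos(t)
c = np.cos
s, t = 1.5, 3.0
zeta = dephasing_T(c(t)) @ np.linalg.inv(dephasing_T(c(s)))
eigs = np.linalg.eigvalsh(transfer_to_choi(zeta, 2))
print(eigs.min())  # negative → ζ_(t:s) is not CP → the process is non-Markovian
```

For an exponentially relaxing coherence factor c(t) = e^{−γt}, the same construction always yields a CP intermediate map, consistent with divisibility.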

Snapshot
As in the classical case, when a process is divisible, it will be governed by a Markovian master equation of the GKSL type of Eq. (106). Following the classical case in Eq. (39), the Liouvillian for the quantum process can be obtained via L = (1/t) log[E_{(t∶0)}]. This, in turn, means that E_{(s∶0)} = e^{Ls}. We can now reverse the implication to check whether a process is Markovian by considering the map E_{(t∶0)} for some t. We can take the log of this map, which has to be done carefully, to obtain L. If the process is Markovian, then exp(Ls) will be CP for all values of s. If this test fails, then the process must be non-Markovian, provided it is also symmetric under time translation; that is, a Markovian process that slowly varies in time may fail this test. This witness was one of the first proposed for quantum processes [179,180]. Once again, note that here only two-time correlations are accounted for, and this witness will miss processes that are non-Markovian at higher orders.
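For a diagonal transfer matrix the logarithm can be taken elementwise, which makes the snapshot test easy to sketch (an illustrative construction with arbitrary γ and t; for a general map a careful matrix logarithm with branch choices is needed):

```python
import numpy as np

# Snapshot test for a dephasing map with transfer matrix diag(1, e^{-γt}, e^{-γt}, 1)
# in the column-stacking basis. Since the map is diagonal, log and exp act elementwise.
gamma, t = 0.8, 2.0
E_t = np.diag([1.0, np.exp(-gamma * t), np.exp(-gamma * t), 1.0])
L = np.diag(np.log(np.diag(E_t))) / t           # Liouvillian L = (1/t) log E_(t:0)

for s in np.linspace(0.1, 5.0, 25):
    E_s = np.diag(np.exp(np.diag(L) * s))       # exp(L s)
    c = E_s[1, 1]                               # coherence factor of the snapshot
    assert abs(c) <= 1.0                        # Choi eigenvalues 1 ± c stay ≥ 0 → CP
print("exp(L s) is CP for all sampled s: consistent with Markovianity")
```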
Dephasing dynamics. The dephasing process introduced in the previous subsection is divisible and can thus be described by a Markovian master equation. To obtain it, we simply differentiate the state at time t to get (d/dt) ρ(t) = (γ/2)(σ_z ρ(t) σ_z − ρ(t)). The quantum stochastic matrix for this process is, in the basis {|0⟩⟨0|, |0⟩⟨1|, |1⟩⟨0|, |1⟩⟨1|}, Ȇ_{(t∶0)} = diag(1, e^{−γt}, e^{−γt}, 1). Since the matrix is diagonal, it can trivially be seen to be divisible from the fact that e^{−γt} = e^{−γ(t−s)} e^{−γs}. In Sec. VI A 2 we will revisit this example and show that there are non-Markovian processes whose two-point correlations have this exact form.

Quantum data processing inequalities
We now move to quantum data processing inequalities. As in the classical case, there are several distance measures that are proven to be contractive under CP dynamics [181,182]. Three prominent examples are the quantum trace distance D(ρ, σ) = ½ tr|ρ − σ|, the quantum mutual information I(A∶B) = S(ρ_A) + S(ρ_B) − S(ρ_AB), and the quantum relative entropy S(ρ∥σ) = tr[ρ(log ρ − log σ)]. All of these are defined as in the classical case, with the sole difference that for the latter two we replace the Shannon entropy with the von Neumann entropy, S(ρ) ∶= −tr[ρ log(ρ)]. Two of the most popular witnesses of non-Markovianity [178,183,184], derived using the first two data processing inequalities, were introduced about a decade ago. In particular, Ref. [184] proposed to prepare a maximally entangled state of a system and an ancilla. The system is then subjected to a quantum process. Under this process, if the quantum mutual information (or any other correlation measure) between the system and ancilla behaves non-monotonically, then the process must be non-Markovian. A similar argument was proposed by Ref. [178] using the trace distance measure. It can be shown that the former is a stronger witness than the latter [185]. Nonetheless, even the former witness of non-Markovianity is generally not equivalent to the breakdown of CP divisibility, as there are processes that behave monotonically under the above distance measures but are not CP divisible [172,186-188]. We will not delve into the details of these measures here, since there are excellent reviews on these topics readily available [7,8] (for an in-depth quantitative study of the sensitivity to memory effects of correlation-based measures, see, for example, Ref. [189]).
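The trace-distance witness of Ref. [178] can be sketched with the same toy dephasing model used above (an illustrative construction, not from the text: two states evolve under dephasing with a non-monotonic coherence factor c(t) = cos t, and the trace distance between them revives):

```python
import numpy as np

def dephase(rho, c):
    """Qubit dephasing with coherence factor c: off-diagonals scaled by c."""
    out = rho.astype(complex).copy()
    out[0, 1] *= c
    out[1, 0] *= c
    return out

xp = np.array([[0.5, 0.5], [0.5, 0.5]])    # |+⟩⟨+|
xm = np.array([[0.5, -0.5], [-0.5, 0.5]])  # |−⟩⟨−|

ts = np.linspace(0, np.pi, 50)
D = [0.5 * np.abs(np.linalg.eigvalsh(dephase(xp, np.cos(t))
                                     - dephase(xm, np.cos(t)))).sum() for t in ts]
# D(t) = |cos t| decays to 0 at t = π/2 and revives to 1 at t = π:
# non-monotonic behavior → witness of non-Markovianity
print(min(D), D[-1])
```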

E. Troubles with quantum stochastic processes
Do we need more (sophisticated) machinery than families of quantum stochastic maps and quantum master equations [190] to describe stochastic quantum phenomena? For a long time, the machinery introduced above was perhaps sufficient. However, as quantum technologies gain sophistication and as we uncover unexpected natural phenomena with quantum underpinnings, the above tools do not suffice [191-193]. Take, for example, the pioneering experiments that argued for the persistence of quantum effects on time scales relevant for photosynthetic processes [194-199], and, in particular, that these processes might exploit complex quantum memory effects arising from the interplay of the electronic degrees of freedom (constituting the system of interest) and the vibrational degrees of freedom (playing the role of the environment). In these experiments, three ultra-short laser pulses are fired at the sample and then a signal from the sample is measured. The times between the pulses, as well as the time of the final measurement, are varied. The system itself is mesoscopic and therefore certainly an open system. The conclusion from these experiments is based on wave-like features in the signal. Such an experiment fundamentally makes use of four-time correlations and thus requires more sophistication than the above machinery affords us. Another important example is the mitigation of non-Markovian noise in quantum computers and other quantum technologies [200-204]. However, as we unveil next, there are fundamental problems that we must overcome before we can describe multi-time quantum correlations as a stochastic process.
Figure 12. Simple quantum process that violates the assumptions of the KET. Successive measurements of the spin of a spin-1/2 particle do not allow one to predict the statistics if the intermediate measurement is not conducted.
Here, measuring in the x-basis is invasive, and thus summing over the respective outcomes is not the same as not having performed the measurement at all.

Break down of KET in quantum mechanics
As we have mentioned in Sec. III B, one of the fundamental theorems for the theory of classical stochastic processes, and the starting point of most books on them, is the Kolmogorov extension theorem (KET). It hinges on the fact that the joint probability distributions of a random variable S pertaining to a classical stochastic process satisfy consistency conditions amongst each other, like, for example, ∑_{s_2} P(S_3, S_2 = s_2, S_1) = P(S_3, S_1); a joint distribution on a set of times can always be obtained by marginalization from one on a larger set of times. Fundamentally, this is a requirement of noninvasiveness, as it implies that not performing a measurement at a time is the same as performing a measurement but forgetting the outcomes.
While seemingly innocuous, this requirement is not fulfilled in quantum mechanics, leading to a breakdown of the KET [205]. To see this, consider the following concatenated Stern-Gerlach experiment [206] (depicted in Figure 12): Let a qubit initially be in the state |x+⟩ = (1/√2)(|z+⟩ + |z−⟩), where {|z+⟩, |z−⟩} are the pure states corresponding to projective measurements in the z-basis yielding outcomes z+ and z−. Now, the state is measured sequentially (with no intermediate dynamics happening) in the z-, x- and z-direction at times t_1, t_2 and t_3 (see Figure 12). These measurements have the possible outcomes {z+, z−} and {x+, x−} for the measurements in the z- and x-direction, respectively. It is easy to see that the probability for any possible sequence of outcomes is equal to 1/8. For example, we have P(z+, x+, z+) = 1/8.
Now, summing over the outcomes at time t_2, we obtain the marginal probability ∑_{s_2=x±} P(z+, s_2, z+) = 1/4. However, considering the case where the measurement is not made at t_2, it is easy to see that P(S_3 = z+, S_1 = z+) = 1/2. The intermediate measurement changes the state of the system, and the corresponding probability distributions for different sets of times are no longer compatible [8,207]. Does this then mean that there is no singular object that can describe the joint probability for a sequence of quantum events? Alternatively, what object would describe a quantum stochastic process if it cannot be a joint probability distribution?
Figure 13. Perturbed coin with interventions. Between measurements, the coin, which initially shows heads, is perturbed: it stays on its side with probability p and flips with probability 1 − p, leading to a stochastic matrix Γ between measurements. Using their instrument, upon measuring an outcome, the experimenter flips the coin. Here, this is shown for the outcome of hh. For most values of the probability p, this process, despite being fully classical, does not satisfy the requirement of the KET.
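The concatenated Stern-Gerlach statistics above can be reproduced with a short numerical sketch (the helper `seq_prob` is our own illustrative construction implementing projective measurements with collapse):

```python
import numpy as np

zp, zm = np.array([1.0, 0.0]), np.array([0.0, 1.0])
xp, xm = (zp + zm) / np.sqrt(2), (zp - zm) / np.sqrt(2)
psi0 = xp  # initial state |x+⟩

def seq_prob(state, outcomes):
    """Probability of a sequence of projective outcomes (with state collapse)."""
    p = 1.0
    for v in outcomes:
        p *= abs(np.vdot(v, state)) ** 2
        state = v  # collapse onto the measured outcome
    return p

# With the intermediate x-measurement: marginal over its outcomes
p_with = sum(seq_prob(psi0, [zp, m, zp]) for m in (xp, xm))
# Without the intermediate measurement
p_without = seq_prob(psi0, [zp, zp])
print(p_with, p_without)  # 0.25 vs 0.5 — marginalization fails
```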
Seemingly, the breakdown of consistency conditions prevents one from properly reconciling the idea of an underlying process with its manifestation on finite sets of times, as we did in classical theory by means of the KET. However, somewhat unsurprisingly, this obstacle is one of formalism, and not a fundamental one, in the sense that marginalization is more subtle for quantum processes than it is for classical ones; introducing a proper framework for the description of quantum stochastic processes, as we shall do below in Sec. V, brings with it a natural way of marginalization in quantum mechanics that contains the classical version as a special case and alleviates the aforementioned problems.

Input / output processes
Our understanding of classical stochastic processes, and with it the consistency between different observed joint probability distributions, is built upon the idea that classical measurements are non-invasive. However, depending on the 'instrument' J an experimenter uses to probe a system, this assumption of non-invasiveness might not be fulfilled, even in classical physics.
To see this, consider the example of a perturbed coin that flips with probability p and stays on the same side with probability 1 − p (see Figure 13). Instead of merely observing outcomes, an experimenter could actively interfere with the process. As there are many different ways in which the experimenter could interfere at each point in time, we have to specify the way in which they probe, or, in anticipation of later matters, what instrument they use, which we will denote by J.
For example, upon observing heads or tails, they could always flip the coin to tails and continue perturbing it. Or, upon observing an outcome, they could flip the coin, i.e., h ↦ t and t ↦ h. Finally, they could just leave it on the side they found it in and let the perturbation process continue. Let us refer to the latter two instruments as J_F and J_I, respectively. Now, let us assume that, before the first perturbation, the coin shows heads. Then, if at t_1 we choose the instrument J_1 = J_F that, upon observing an outcome, flips the coin, we obtain P(F_2 = h) = (1 − p) p + p (1 − p) = 2p(1 − p): the coin showed h (t) at t_1 with probability 1 − p (p), was flipped, and must then flip (stay) during the second perturbation to show h at t_2. On the other hand, if the experimenter does not perform a measurement at the first time, i.e., J_1 = J_I, then, upon perturbation, the coin will show h with probability 1 − p and t with probability p at time t_1, and the probability to observe h at time t_2 is given by P(F_2 = h) = (1 − p)² + p², which differs from 2p(1 − p) except if p = 1/2. Thus the two cases do not coincide, and the requirements of the KET are not fulfilled.
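These two probabilities are immediate to compute with stochastic matrices (a minimal sketch; the vector ordering (h, t) and the matrix names are illustrative choices):

```python
import numpy as np

p = 0.3                       # flip probability per perturbation
G = np.array([[1 - p, p],     # stochastic matrix Γ over (h, t)
              [p, 1 - p]])
F = np.array([[0.0, 1.0],     # instrument J_F: flip the coin upon observation
              [1.0, 0.0]])
I = np.eye(2)                 # instrument J_I: leave the coin untouched

heads = np.array([1.0, 0.0])
# P(F_2 = h) with an intervening flip at t_1 vs. no measurement at t_1
p_flip = (G @ F @ G @ heads)[0]
p_idle = (G @ I @ G @ heads)[0]
print(p_flip, p_idle)  # 2p(1-p) = 0.42 vs (1-p)^2 + p^2 = 0.58
```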
As, here, the experimenter can observe the output of the process and freely choose what they input into the process, such processes are often called input-output processes and are the subject of investigation in the field of computational mechanics [208]. A priori, it might seem arbitrary to allow for active interventions in classical physics. However, such operations naturally occur in the field of causal modeling [209], where they can be used to deduce causal relations between events. On the other hand, while in classical physics it is a choice (up to experimental accuracy, that is) not to actively interfere with the process at hand, in quantum mechanics such an active intervention due to measurements (even projective ones) can generally not be avoided. Considering classical processes with interventions thus points us in the right direction as to how quantum stochastic processes should be modeled. In particular, the dependence of observed probability distributions on the employed instruments is a phenomenon that we will encounter when describing quantum stochastic processes.
Interestingly, this breakdown of the requirements of the KET is closely related to the violation of Leggett-Garg inequalities in quantum mechanics [210,211]. These inequalities are derived from the assumptions of realism per se and noninvasive measurability. While realism per se implies that joint probability distributions for a set of times can be expressed as marginals of a respective joint probability distribution for more times, non-invasiveness means that all finite distributions are marginals of the same distribution. Naturally then, as soon as one of these conditions does not hold, the KET can fail and Leggett-Garg inequalities can be violated.

KET and spatial quantum states
Before finally advancing to quantum stochastic processes, it is instructive, as a preparation, to reconsider the concept of states in quantum mechanics in the context of measurements. To this end, consider the situation depicted in Figure 14.

Figure 14. Spatial Measurements. Alice, Bob, Charlie, and David perform measurements on a seven-partite quantum state ρ. Both Bob and Charlie have access to two parts of said state, respectively, but while Bob can perform correlated measurements on said systems, Charlie can only access them independently. The probabilities corresponding to the respective outcomes are computed via the Born rule (see Eq. (122)).
Each outcome corresponds to a positive matrix X_j, and we have Σ_j X_j = 𝟙. Then, according to the Born rule, the probabilities for the measurements depicted in Figure 14 are computed via Eq. (122), where ρ ≔ ρ_1234567 is the probed multipartite state and X_{a_m} is the POVM operator for party X with outcome a when measuring system m; we use the double-subscript notation to label the operator index and the system at once. The above probability depends crucially on the respective POVMs the parties use to probe their part of the state ρ. This dependence is denoted by making the probability contingent on the instruments J_X. As soon as ρ is known, all joint probabilities for all possible choices of instruments can be computed via the above Born rule. In this sense, a quantum state represents the maximal statistical information that can be inferred about spatially separated measurements. While, pictographically, Figure 14 appears to be a direct quantum version of the classical stochastic processes we encountered previously, there is a fundamental difference between spatially and temporally separated measurements: in the spatial setting, none of the parties can signal to the others; for example, the marginal probabilities of a subset of parties are the same for all instruments employed by the remaining parties. Put differently, the quantum state a subset of parties sees is independent of the choice of instruments of the remaining parties. This is also mirrored by the fact that we model the respective measurement outcomes by POVM elements, which make no assertion about how the state at hand transforms upon measurement. On the other hand, the possible breakdown of the KET in quantum mechanics and classical processes with interventions show that, in temporal processes, an instrument choice at an earlier time can influence the statistics at later times.
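As a minimal illustration of the spatial Born rule and the no-signaling property, the following sketch (a two-qubit stand-in for the seven-partite state of Figure 14; the noisy Bell state and the POVMs are our illustrative choices) checks that Alice's marginal statistics do not depend on Bob's choice of POVM:

```python
import numpy as np

# Spatial Born rule sketch (cf. Eq. (122)): joint outcome probabilities
# for product POVMs on a correlated two-qubit state, plus a no-signaling
# check. State and POVMs are illustrative.

psi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho = 0.8 * np.outer(psi, psi.conj()) + 0.2 * np.eye(4) / 4

P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])          # Z basis
plus = np.array([[0.5, 0.5], [0.5, 0.5]])                  # X basis
minus = np.array([[0.5, -0.5], [-0.5, 0.5]])
Z_povm, X_povm = [P0, P1], [plus, minus]

def joint_prob(rho, E_alice, E_bob):
    """Born rule: P(a, b) = tr[rho (E_a ⊗ E_b)]."""
    return np.trace(rho @ np.kron(E_alice, E_bob)).real

# Alice's marginal is independent of Bob's instrument (no signaling):
for E_a in Z_povm:
    pZ = sum(joint_prob(rho, E_a, E_b) for E_b in Z_povm)
    pX = sum(joint_prob(rho, E_a, E_b) for E_b in X_povm)
    print(abs(pZ - pX) < 1e-12)   # True
```

Because every POVM sums to the identity, summing over Bob's outcomes always yields tr[ρ(E_a ⊗ 𝟙)], irrespective of which POVM Bob chose.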
To accommodate this kind of signaling between different measurements, we will have to employ a more general description of measurements that accounts for the transformations a quantum state undergoes upon measurement. However, the general idea of how to describe temporal processes can be directly lifted from the spatial case: as soon as we know how to compute the statistics for all sequences of measurements and all choices of (generalized) instruments, there is nothing more that can be learned about the process at hand. Unsurprisingly, then, we will recover a temporal version of the Born rule [212,213], where the POVM elements are replaced by more general completely positive (CP) maps, and the spatial quantum state is replaced by a more general quantum comb that contains all detectable spatial and temporal correlations.

V. QUANTUM STOCHASTIC PROCESSES
In the last section, we saw various methods for examining two-time quantum correlations. We now introduce tools that will allow us to consistently describe multi-time quantum correlations, independently of the choice of measurement. Before doing this, it is worth elaborating on the source of the troubles in the way of a theory of quantum stochastic processes.

A. Subtleties of the quantum state and quantum measurement
Let us use the initial correlation problem in quantum mechanics as an example. This problem has been fraught with controversies for decades [158]: some researchers have argued that, in the presence of initial correlations, a dynamical map is not well defined [90], while others have argued for giving up complete positivity or linearity [89,158]. What is the underlying reason for these disagreements? And does the same problem exist in classical mechanics?
The answer to this latter question is no. The crucial difference is that it is possible to observe classical states without disturbing the system, while the same cannot be said for quantum states. Consider a classical experiment that starts with a correlated initial system-environment state. The overall process is a map Λ_(t:0): P(S_0 E_0) ↦ P(S_t E_t). Of course, we can simply observe the system (without disturbing it) and measure the frequencies for S_0 = s_j ↦ S_t = s_k. This is already enough to construct the joint distribution P(S_t, S_0), and from it, we can construct a stochastic matrix Γ_(t:0) that takes the initial system state to the final state. In other words, the initial correlations pose no obstacle at all here. This should not be surprising; after all, a multi-time classical process will have system-environment correlations at some point, and we have already argued that it is always possible to construct a stochastic matrix between any two points.
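The classical reconstruction described above can be sketched in a few lines; the joint distribution used here is made up for illustration:

```python
import numpy as np

# Classical case: observing without disturbing. From the observed joint
# distribution P(S_t, S_0) we directly build the stochastic matrix
# Gamma[k, j] = P(S_t = s_k | S_0 = s_j); initial SE correlations play
# no role at this level.

P_joint = np.array([[0.30, 0.10],   # rows: S_t, columns: S_0
                    [0.20, 0.40]])
P0 = P_joint.sum(axis=0)            # marginal P(S_0)
Gamma = P_joint / P0                # conditional P(S_t | S_0)

print(Gamma.sum(axis=0))            # each column sums to 1: stochastic
print(Gamma @ P0)                   # reproduces the marginal P(S_t)
```

The columns of Γ sum to one, and applying Γ to the initial marginal recovers the final marginal, exactly as the text asserts.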
If we try to repeat the same reconstruction in the quantum case, we quickly run into trouble. Again, without any controversy, we can imagine that an initial system-environment quantum state is transformed into a final one, ρ_SE(0) ↦ ρ_SE(t). It may then be tempting to say that we also have a transformation on the reduced state of the system, ρ_S(0) ↦ ρ_S(t). However, we run into trouble as soon as we try to determine the process ρ_S(0) ↦ ρ_S(t). In order to do this, we need to relate different initial states to the corresponding final states. Do we then mean that there is a set of initial states {ρ_SE(0)}, each element of which yields a different initial state for the system, i.e., {ρ_S(0) = tr_E[ρ_SE(0)]}? This is possible, of course, but requires knowledge about the environment, which goes against the spirit of the theory of open systems.
Our problem is more profound still when we focus solely on S. Suppose that the above setup holds and the elements of {ρ_S(0)} are linearly independent, constituting an input basis. But then, in a given run of the experiment, how do we know which element of this set we have at our disposal? Quantum mechanics fundamentally forbids us from unambiguously discriminating a set of non-orthogonal states, while the input set must be linearly independent; and if the set {ρ_S(0)} contains all d^2 linearly independent basis elements, then at most d of them can be orthogonal. Therefore, quantum mechanics fundamentally forbids us from experimentally deducing the dynamical map when there are initial correlations! This contextual nature of quantum states is the key subtlety that forces a fundamentally different structure for quantum stochastic processes than for classical ones. A theorist may perhaps be tempted to say: never mind the experiments, let us construct the map by a theoretical calculation, i.e., first properly define the SE dynamics and then infer the process on S alone. This is in fact what was done by many theorists in the past two decades. They asked what happens if we fix the initial state ρ_SE(0) and consider the family of states ρ_S(0) compatible with it. Can we construct a map? These types of constructions are precisely what led to not-completely-positive maps. However, do such calculations have a correspondence with reality [164]? The real source of the problem (in the technical sense) is that we need a set of initial states and corresponding final states to have a well-defined map. For an experimentalist, there is an easy solution: simply go ahead and prepare the initial state as desired, which can even be noisy [214]. Then let this initial state evolve and measure the corresponding final state.
In fact, this is the only way, in quantum mechanics, to ensure that we have a linearly independent set of input states whose output states are also accessible. Without a preparation at the initial time, we only have a single point in the state space, and a map is then only defined on a dense domain. Now, the question is: will a finite set of such experiments contain enough information to construct a dynamical map?
Yes! It is easy to show that there is only a finite number of linearly independent preparations (for finite-dimensional systems). Therefore, any other preparation can be expressed as a linear combination of a fixed set of preparations. Next, we lay out the mathematical foundations for the notion of a preparation; historically, it is known as an instrument, which generalizes the POVM. With these tools, we will show that the solution to the initial correlation problem is well-defined, completely positive, and linear all at once [215]. Moreover, this provides a pathway to laying down the foundations for quantum stochastic processes.

B. Quantum measurement and instrument
As mentioned in Sec. IV E, unlike in the case of spatially separate measurements, in the temporal case, it is important how the state of the system of interest changes upon being measured, as this change will influence the statistics of subsequent measurements. In order to take this into account, we work with the concept of generalized instruments introduced by Davies and Lewis [216].
To this end, first recall the definition of a POVM provided in Sec. IV A 2. A POVM is a collection of positive matrices J = {E_j}_{j=1}^n with the property Σ_j E_j = 𝟙. Each element of J corresponds to one of the possible outcomes of the measurement. Intuitively, a POVM allocates an operator to each outcome of the measurement device, which allows one to compute outcome statistics for arbitrary quantum states that are being probed. However, it does not enable one to deduce how the state changes upon observation of one of the outcomes.
To account for state changes, we have to generalize the concept of a POVM to that of a (generalized) instrument [217,218]. As POVMs turn out to be a special case of (generalized) instruments, we will denote them by J, too. An instrument corresponding to a measurement with outcomes j ∈ {1, . . . , n} is a collection of CP maps J = {A_j} that add up to a CPTP map, i.e., A = Σ_{j=1}^n A_j. Each of the CP maps corresponds to one of the possible outcomes. For example, returning to the case of a measurement of a qubit in the computational basis, the corresponding instrument is given by Eq. (124), assuming that, after projecting the state onto the computational basis, it is sent forward unchanged, i.e., a projective measurement.
Importantly, an instrument allows one to compute both the probability to obtain the different outcomes and the state change upon measurement. The latter is obtained via the action of the corresponding element of the instrument when the system in state ρ is interrogated by the instrument J, yielding outcome j. Importantly, the resulting state is in general not normalized; its trace provides the probability to observe the given outcome. Concretely, we have p(j∣J) = tr[A_j(ρ)] = Σ_{α_j} tr[K_{α_j} ρ K†_{α_j}], where the sum runs over all Kraus operators that pertain to the CP map A_j, and we have Σ_{α_j} K†_{α_j} K_{α_j} < 𝟙 if A_j is not trace preserving. The requirement that all CP maps of an instrument add up to a CPTP map ensures, just like in the analogous case for POVMs, the normalization of probabilities: Σ_{j=1}^n tr[A_j(ρ)] = tr[A(ρ)], which is 1 for all ρ. Naturally, the concept of generalized instruments contains POVMs as a special case, namely as those generalized instruments where the output space of the respective CP maps is trivial. Put differently, if one simply wants to compute the probabilities of measurements on a quantum state, generalized instruments are not necessary; this is because, for a single measurement, the state transformation is not of interest. However, as we will see in the next section, this situation changes drastically as soon as sequential measurements are considered; there, POVMs are no longer sufficient to correctly compute statistics. Before advancing, it is insightful to make explicit the connection between the CP maps of an instrument and the elements of its corresponding POVM. This is most easily done via the CJI we introduced in Sec. IV B 3. There, we discussed that the action of a map A_j on a state ρ ∈ B(H_i) can be expressed as A_j(ρ) = tr_i[A_j (𝟙 ⊗ ρ^T)], where A_j is the Choi state of the map A_j.
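A minimal sketch of such an instrument, for the projective computational-basis measurement of Eq. (124):

```python
import numpy as np

# Instrument sketch for Eq. (124): projective computational-basis
# measurement of a qubit, each outcome a CP map with one Kraus operator.

Pi = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

def instrument_element(j, rho):
    """CP map A_j: unnormalized post-measurement state for outcome j."""
    return Pi[j] @ rho @ Pi[j]

plus = np.array([[0.5, 0.5], [0.5, 0.5]])         # |+><+|
for j in range(2):
    out = instrument_element(j, plus)
    print(np.trace(out).real)                     # outcome probability: 0.5
# The elements sum to a CPTP map: the total output has unit trace.
total = instrument_element(0, plus) + instrument_element(1, plus)
print(np.trace(total).real)                       # 1.0
```

Each element returns the unnormalized post-measurement state, whose trace is the outcome probability; summing over outcomes gives a unit-trace state, reflecting the normalization of probabilities.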
Using this expression to compute probabilities, we obtain p(j) = tr[A_j(ρ)] = tr[tr_o(A_j)^T ρ]. Comparing this last expression with the Born rule, we see that the POVM element E_j corresponding to A_j is given by E_j = tr_o(A_j)^T, where the additional transpose stems from our definition of the CJI. This definition indeed yields a POVM, as the partial trace of a positive matrix is also positive, and we have Σ_{j=1}^n E_j = Σ_{j=1}^n tr_o(A_j)^T = 𝟙, where we have used that the Choi state A of A satisfies tr_o(A) = 𝟙. Discarding the outputs of an instrument thus yields a POVM. This implies that different instruments can have the same corresponding POVM. For example, the instrument that measures in the computational basis and feeds forward the resulting state has the same corresponding POVM as the instrument that measures in the computational basis but feeds forward a maximally mixed state, irrespective of the outcome. While both of these instruments lead to the same POVM, their influence on future statistics is very different.
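The following sketch (all operators are illustrative) contrasts the two instruments just mentioned: they yield the same POVM statistics at the first time, but different statistics for a subsequent measurement:

```python
import numpy as np

# Two instruments with the same POVM but different post-measurement states:
# measure-and-forward vs measure-and-replace-by-maximally-mixed. They agree
# on single-time statistics but differ for a later measurement.

P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])

def forward(rho, j):                 # A_j: Pi_j rho Pi_j
    P = (P0, P1)[j]
    return P @ rho @ P

def discard(rho, j):                 # A'_j: tr(Pi_j rho) * (1/2) 1
    P = (P0, P1)[j]
    return np.trace(P @ rho).real * np.eye(2) / 2

rho = np.array([[0.5, 0.5], [0.5, 0.5]])          # |+><+|
# Same POVM: identical outcome probabilities at the first time.
print([np.trace(forward(rho, j)).real for j in range(2)])
print([np.trace(discard(rho, j)).real for j in range(2)])
# Different future statistics: P(0 at t2 | 0 at t1) under each instrument.
p_fwd = np.trace(P0 @ forward(rho, 0)).real / np.trace(forward(rho, 0)).real
p_dsc = np.trace(P0 @ discard(rho, 0)).real / np.trace(discard(rho, 0)).real
print(p_fwd, p_dsc)                               # 1.0 0.5
```

A second computational-basis measurement distinguishes the two instruments with certainty versus a coin flip, even though a single measurement cannot tell them apart.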

POVMs, Instruments, and probability spaces
Before advancing to the description of multi-time quantum processes, let us quickly connect POVMs and instruments to the more formal discussion of stochastic processes we conducted earlier. The benefit of making this connection transparent is twofold: on the one hand, it recalls the original ideas, stemming from the theory of probability measures, that led to the introduction of these concepts in quantum mechanics; on the other hand, it renders the following discussion of quantum stochastic processes a natural extension of both the concept of instruments and the theory of classical stochastic processes.
In the classical case, we described a probability space via a σ-algebra (where each observable outcome corresponds to an element of the σ-algebra) and a probability measure ω that allocates a probability to each element of said σ-algebra. Without dwelling on the technical details (see, e.g., Refs. [218,219] for more thorough discussions), this definition can be straightforwardly extended to POVMs and instruments. However, instead of directly mapping observable outcomes to probabilities, in quantum mechanics we have to specify how we probe the system at hand. Mathematically, this means that instead of mapping the elements of our σ-algebra to probabilities, we map them to positive operators via a function ξ that satisfies the properties of a probability measure (hence the name positive operator-valued measure). For example, the POVM element corresponding to the union of two disjoint elements of the σ-algebra is the sum of the two individual POVM elements, and so on. Together with the Born rule, each POVM then leads to a distinct probability measure on the respective σ-algebra. Concretely, denoting the Born rule corresponding to a state ρ by χ_ρ[E] = tr(ρE), the composition ω_ρ = χ_ρ ∘ ξ is a probability measure on the considered σ-algebra.
For instruments, the above construction is analogous, but with POVM elements replaced by CP maps. It is then natural to assume that, in order to obtain probabilities, there is a generalized Born rule [212,213] that maps CP maps to the corresponding probabilities. More generally yet, sequences of measurement outcomes correspond to sequences of CP maps, and a full description of the process at hand would be given by a mapping of such sequences to probabilities. In the next section, we will see that this reasoning indeed leads to a consistent description of quantum stochastic processes that additionally resolves the aforementioned problems, like, e.g., the breakdown of the Kolmogorov extension theorem.

C. Initial correlations and complete positivity
With the introduction of the instrument, we are now in a position to operationally resolve the initial correlation problem. We begin with a correlated initial system-environment quantum state. Now, in a meaningful experiment that aims to characterize the dynamics of the system from the initial time to the final time, one will apply an instrument J = {A_j} on the system alone at the initial time to prepare it in a known (desired) state. Next, the total SE state is propagated in time via a map. Note that, due to the dilation theorem in Sec. V D 6, we can always take the system-environment propagator to be unitary; taking the propagator to be a CPTP map will make no difference at the system level. The full process can be written down as Eq. (132).

Figure 15. Complete Positivity and Trace Preservation for Superchannels. A superchannel is said to be CP if it maps CP maps to CP maps (even when acting on only a part of them), and CPTP maps to CPTP maps. Here, T_(t:0) is CP (CPTP) if, for all CP (CPTP) maps A and all possible ancilla sizes, the resulting map A′ is also CP (CPTP). Note that, for the TP part, it is already sufficient that T_(t:0) maps all CPTP maps on the system to a unit-trace object.
Above, I is the identity map on E, as the instrument acts only on S. Now, let us recall that a map (in quantum physics, classical physics, and beyond) is nothing more than a relationship between inputs and outputs. Here, the input is the choice of the instrument J = {A_j} and the corresponding output is ρ_S(t). Then, right away, by combining everything that is unknown to the experimenter in Eq. (132) into one object, we obtain the map T_(t:0). This map was introduced in Ref. [215] and was referred to as the superchannel in Ref. [220], where it was first realized experimentally. Ref. [215] proved that this map is linear, completely positive, and trace preserving; and clearly, it is well-defined for any initial preparation. The trace-preservation property means that if A is CPTP, then the output will have unit trace. See Ref. [221] for further discussion and theoretical development.
The meaning of complete positivity for this map is operationally clear; suppose the instrument J acts not only on the system S, but also on an ancilla. Then the superchannel's complete positivity guarantees that the full system-ancilla output state is positive (see Figure 15 for a graphical depiction).
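A minimal numerical sketch of this construction, assuming an illustrative Bell initial state and a CNOT as the SE dynamics (both our choices, not from the text):

```python
import numpy as np

# Superchannel sketch: a correlated SE initial state (Bell) and a joint
# unitary (CNOT) are hidden from the experimenter; the map T takes the
# preparation instrument on S to the final system state.

psi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho_SE = np.outer(psi, psi.conj())                     # correlated SE state
U = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
              [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)   # CNOT on SE

def superchannel(kraus_list):
    """T: preparation (Kraus operators on S) -> final system state."""
    K_SE = [np.kron(K, np.eye(2)) for K in kraus_list]
    rho = sum(K @ rho_SE @ K.conj().T for K in K_SE)   # prepare S
    rho = U @ rho @ U.conj().T                         # SE evolution
    return np.trace(rho.reshape(2, 2, 2, 2), axis1=1, axis2=3)  # trace E

# A CPTP preparation (reset S to |0>) yields a unit-trace output state:
prep0 = [np.array([[1, 0], [0, 0]], dtype=complex),
         np.array([[0, 1], [0, 0]], dtype=complex)]
out = superchannel(prep0)
print(np.trace(out).real)
```

Despite the initial SE correlations, T is linear in the preparation and returns a legitimate unit-trace state for every CPTP input, with no recourse to not-completely-positive maps.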
Importantly, the superchannel is a higher-order map: its domain is the set of CP maps and its image is the set of density operators. Clearly, this is different from the quantum stochastic matrix. In fact, the superchannel is the first step beyond two-point quantum correlations. This is most easily seen from its Choi state, which is a bounded operator on three Hilbert spaces (details on constructing the Choi state of higher-order maps can be found in Sec. V D 3). Moreover, the superchannel contains 'normal' quantum channels as a limiting case: when there are no initial correlations, i.e., ρ_SE(0) = ρ_S(0) ⊗ ρ_E(0), the superchannel reduces to the usual CPTP map. The superchannel is a primitive for constructing the descriptor of quantum stochastic processes. As such, it should be operationally accessible via a set of experiments. The input of the superchannel, an instrument, is linearly spanned by a basis of CP maps. This means that the superchannel is fully determined by its action on the CP maps that form a linear basis.

Figure 16. General quantum stochastic process. The system of interest is coupled to an unknown environment and probed at a sequence of times. In between measurements, the system and the environment together undergo closed, i.e., unitary, dynamics. The corresponding multi-time joint probabilities can be computed by means of the process tensor corresponding to the process at hand (depicted by the grey dotted outline).

Importantly, any CP map can be cast as a linear combination of the basis maps.
In fact, the superchannel has been observed in the laboratory [220] and proven to be effective at dealing with initial correlations without giving up either linearity or complete positivity. One might then wonder: how does this get around Pechukas' theorem? To retain both linearity and complete positivity, we have given up the notion of the initial state. In fact, as we argued in Sec. V A, in the presence of correlations, quantum mechanics does not allow for a well-defined local state beyond a single point in the state space. A map on this single point alone is not very meaningful, and hence there is no great loss in giving up the notion of the initial state [222]. Finally, it should be said that this line of reasoning is very close to that of Pearl [209] in classical causal modeling, which goes beyond the framework of classical stochastic processes and allows for interventions.

D. Multi-time statistics in quantum processes
Following the above resolution of the initial correlation problem in quantum mechanics, we are now in a position to provide a fully-fledged framework for the description of multi-time quantum processes. Here, we predominantly focus on the case of finitely many times at which the process of interest is interrogated (for an in-depth discussion of continuous measurements, see, for example, Refs. [223,224]). Note that, here, we can only scratch the surface of the different approaches that exist to the theory of multi-time quantum processes. For a much more in-depth investigation of the relation between different concepts of memory in quantum physics, see Ref. [10].
In principle, there are two ways to motivate this framework. On the one hand, by generalizing joint probabilities, the descriptor of classical stochastic processes, to the quantum realm, and taking into consideration that, in quantum mechanics, we have to specify the instruments that were used to interrogate the system. This approach would then yield a temporal Born rule [212,213], and provide a natural descriptor of quantum stochastic processes in terms of a 'quantum state over time'. We will circle back to this approach below. Here, we shall take the second possible route to the description of multi-time open quantum processes, which -just like in the case of initial correlations -is motivated by considering the underlying dynamics of a quantum stochastic process. As we shall see, though, both approaches are equivalent and lead to the same descriptor of quantum stochastic processes.
To obtain a consistent description of a multi-time process, consider a system of interest S coupled to an environment E. Initially, the joint system-environment (SE) state is ρ_SE(0). Together, we consider SE to be closed, such that the system-environment evolves unitarily, described by a unitary map U; for brevity, we have contracted the subscripts on U. Next, in order to minimize notational clutter, we define several sets. The first, T_k, is the set of times on which the process is defined. At these times, the system S is interrogated with a set of instruments J_{T_k}, yielding a set of outcomes x_{T_k}. These outcomes correspond to a set of CP maps A_{x_{T_k}}. Note that, while we have let the instruments at each time be independent, we can also allow for correlated instruments, also known as testers; see Sec. V D 5.
The overall system-environment dynamics is thus a sequence of unitary maps on the system and the environment, interspersed with CP maps that act on the system alone; this is shown in Figure 16. This continues until a final intervention at t_k, after which the environmental degrees of freedom are discarded. We emphasize that, as we do not limit or specify the size of the environment E, this setup is fully general; as we outlined above, due to the Stinespring dilation, any quantum evolution between two points in time can be understood as a unitary evolution on a larger space. As such, our envisioned setup is the most general description of the evolution of an open quantum system that is probed at times T_k. We will see below that this statement holds in even more generality: there is no conceivable quantum stochastic process that cannot be represented in the above way, as a sequence of unitaries on a system-environment space, interspersed with CP maps that act on the system alone.
The probability to observe a sequence of quantum events, i.e., the outcomes x_{T_k} corresponding to the CP maps A_{x_{T_k}}, can then be straightforwardly computed. In the corresponding expression, ◯ denotes the composition of maps; the maps A act on S alone, while the maps U act on SE, where we have omitted the identity I on E for brevity. The last equation is just quantum mechanics, as well as simply a multi-time version of Eq. (132), which defines the superchannel. Of course, the challenge is to turn this equation into a clear descriptor of a multi-time quantum process. This can be done by noting that the above expression is a multi-linear map with respect to the maps A_{x_j} [225]. It is then possible to write the last equation as a multi-linear functional T_{T_k}, which we call the process tensor. While seemingly a mere mathematical rearrangement, the above description of open system dynamics in terms of the process tensor [225-227] T_{T_k} is of conceptual relevance; it allows one to separate the parts of the dynamics that are controlled by the experimenter, i.e., the maps A_{x_{T_k}}, from the unknown and inaccessible parts of the dynamics, i.e., the initial system-environment state and the system-environment interactions. This clean separation means that, when we speak of a quantum stochastic process, we only need to refer to T_{T_k}; for any choice of instruments, we can then compute the probability for the sequence of outcomes by means of Eq. (142). Moreover, this separation will later help us resolve the aforementioned issues with the KET in quantum mechanics, where, apparently, the possible invasiveness of measurements prevented a consistent description of quantum stochastic processes. This will be possible because T_{T_k} does not depend on the maps A_{x_{T_k}}, and as such provides a description of open quantum system dynamics that is independent of the way in which the process at hand is probed.
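A small simulation of this prescription, with a qubit system, a qubit environment, and randomly chosen SE unitaries (all choices ours), checks that the probabilities over all outcome sequences are normalized:

```python
import numpy as np

# Multi-time probability sketch: alternate SE unitaries with CP maps
# (here projective Z-outcomes) on S, then trace out everything.

rng = np.random.default_rng(0)

def haar_unitary(d, rng):
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

P = (np.diag([1.0 + 0j, 0]), np.diag([0, 1.0 + 0j]))   # Z projectors on S

def prob_sequence(outcomes, unitaries, rho_SE):
    """Probability of an outcome sequence for Z-instruments at each time."""
    rho = rho_SE
    for x, U in zip(outcomes, unitaries):
        rho = U @ rho @ U.conj().T                     # SE evolution
        K = np.kron(P[x], np.eye(2))                   # CP map on S alone
        rho = K @ rho @ K.conj().T
    return np.trace(rho).real                          # discard E, trace

Us = [haar_unitary(4, rng) for _ in range(2)]
rho_SE = np.diag([1.0 + 0j, 0, 0, 0])                  # |00><00|
total = sum(prob_sequence((x1, x2), Us, rho_SE)
            for x1 in range(2) for x2 in range(2))
print(total)
```

Summed over all outcome sequences, the probabilities add to one, as guaranteed by the instruments summing to CPTP maps; the multi-linearity of this expression in the CP maps is exactly what the process tensor exploits.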
In addition, T_{T_k} is the clear generalization of the superchannel T_(t:0), which, in turn, is a generalization of CPTP maps, as discussed in Sec. V C.
We now discuss several key properties of the process tensor. To remain close to the classical case in spirit, we will focus on probabilities. However, we could also understand the process tensor as a mapping from operations to a final state at time t_k; as it can act on all sequences of CP maps, one can choose not to apply an instrument at t_k. Consequently, T_{T_k} allows for the construction of a related map whose output is a quantum state at t_k, conditioned on the sequence of CP maps A_{x_{T_k}} at times T_k.

Linearity and tomography
As mentioned above, T_{T_k} is a multi-linear functional on sequences of CP maps A_{x_{T_k}}. Consequently, once all the probabilities for the occurrence of a basis of such sequences are known, the full process tensor is determined. As the space of such sequences is finite-dimensional (for d < ∞), this, in turn, implies that T_{T_k} can be reconstructed in a finite number of experiments, in a similar vein to the reconstruction of quantum channels discussed above. The instrument Ĵ_{t_j} at any time t_j is a set of CP maps {Â_{x_j}}. Since we can choose an instrument at each time independently of the other times, the tensor product of basis elements at each time forms a linear basis on all times T_k, and we can define the action of the process tensor on this multi-time basis. As with quantum process tomography in Sec. IV B 1, we use the fact that there is a dual set {D_{x_j}} corresponding to the basis set {Â_{x_j}} such that tr(D†_{x_j} Â_{x′_j}) = δ_{x_j x′_j}. We can thus write the process tensor as Eq. (147). By construction, the process tensor above yields the correct probabilities for any basis sequence Ĵ_{T_k}, and thus the correct outcome for any conceivable sequence of measurements. Thus, in order to reconstruct a process tensor on times T_k, an experimenter would have to probe the process using informationally complete instruments {Ĵ_j} (in the sense that their elements span the whole space of CP maps) at each time, and record the probabilities for each possible sequence of outcomes. This reconstruction also applies in the case that the experimenter does not have access to informationally complete instruments, yielding a 'restricted' process tensor [203,228] that only meaningfully applies to operations that lie in the span of those that can be implemented.
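The role of the dual set can be illustrated for the simplest case: reconstructing a single channel from its action on a linear basis of inputs (the basis, its duals, and the example channel are our illustrative choices):

```python
import numpy as np

# Dual-set sketch: a basis {B_j} of qubit operators and duals {D_j} with
# tr(D_j^† B_k) = δ_jk allow reconstructing a linear map from its action
# on the basis alone.

basis = [np.diag([1.0 + 0j, 0]),                          # |0><0|
         np.diag([0, 1.0 + 0j]),                          # |1><1|
         np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex),   # |+><+|
         np.array([[0.5, -0.5j], [0.5j, 0.5]])]           # |+y><+y|

B = np.column_stack([b.reshape(-1) for b in basis])
duals = [row.conj().reshape(2, 2) for row in np.linalg.inv(B)]

def channel(rho):                                 # example: dephasing
    Z = np.diag([1.0, -1.0])
    return 0.7 * rho + 0.3 * Z @ rho @ Z

outputs = [channel(b) for b in basis]             # the "tomography data"

def reconstructed(rho):
    coeffs = [np.trace(D.conj().T @ rho) for D in duals]
    return sum(c * out for c, out in zip(coeffs, outputs))

rho = np.array([[0.6, 0.2 - 0.1j], [0.2 + 0.1j, 0.4]])
print(np.allclose(reconstructed(rho), channel(rho)))   # True
```

The same logic, applied independently at each time, underlies process-tensor tomography; the exponential cost discussed next comes from taking tensor products of such bases over all times.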
While the number of sequences necessary for the reconstruction of a process tensor scales exponentially with the number of times (if there are N times, then there are d^{4N} different sequences for which the probabilities would have to be determined), the number is still finite and the reconstruction thus, in principle, feasible. We note that classical processes are plagued by a similar exponential scaling problem: if there are d different outcomes at each time, then the joint probabilities for d^N different sequences of N outcomes have to be determined.

Spatiotemporal Born rule
While conceptually insightful, in its current form T_{T_k} is still rather abstract. This can, just like in the case of quantum maps, be remedied by choosing a concrete representation. While, again, several equivalent representations are possible (see Ref. [70] for details), here we opt for the representation in terms of Choi states. To this end, let {A_{x_j}} be the Choi states of the maps {A_{x_j}}, i.e., the set of Choi states corresponding to the CP maps in the sequence A_{x_{T_k}}. Then, Eq. (147) can be written as Eq. (149), where Υ_{T_k} is the Choi state of T_{T_k} (see below for its explicit definition). The above constitutes a multi-time generalization of the Born rule [212,213], where Υ_{T_k} plays the role of a quantum state over time, and the Choi states A_{x_{T_k}} play a role analogous to that of POVM elements in the spatial setting. In principle, Υ_{T_k} can be computed by means of the link product ⋆ defined in Ref. [229]; in the corresponding expression, U_j is the Choi state of the map U_j, and the link product acts like a matrix product on the space E and like a tensor product on the space S. Importantly, Υ_{T_k} can also be reconstructed experimentally in a similar vein to the reconstruction of quantum channels (see below and Refs. [225,228] for details). Moreover, Υ_{T_k} is a many-body density matrix (up to normalization), and therefore the correct generalization of a classical stochastic process, which is a joint probability distribution over many random variables. Since it allows for the compact phrasing of many of the subsequent results, we will often opt for a representation of T_{T_k} in terms of its Choi matrix in what follows. Nonetheless, for better accessibility, we will also express our results in terms of maps whenever appropriate. Before advancing, let us recapitulate what has been achieved by introducing the process tensor for the description of general quantum processes.
First, the effects on the system due to the interaction with the environment have been isolated in the process tensor Υ_{T_k}. All of the details of the instruments and their outcomes are encapsulated in A_{x_{T_k}}, while all inaccessible effects and influences are contained in the process tensor. In this way, Υ_{T_k} is a complete representation of the quantum stochastic process, containing all accessible multi-time correlations [230-234]. The process tensor can be formally shown to be the quantum generalization of a classical stochastic process [206], and it reduces to a classical stochastic process in the correct limit [206,235,236].
We emphasize that open quantum system dynamics is not the only field of physics where an object like the process tensor (or variants thereof) crops up naturally. See, for example, Refs. [227,229,237-249] for an incomplete collection of works where similar mathematical objects have been used for the study of higher-order quantum maps, causal automata/non-anticipatory channels, quantum networks with modular elements, quantum information in general relativistic space-time, quantum causal modeling, and quantum games. In open quantum system dynamics, they have been used in the guise of so-called correlation kernels already in early works on multi-time quantum processes [218,219,250].

Figure 17. Choi state of a process tensor. At each time, half of an unnormalized maximally entangled state is fed into the process. For better book-keeping, all spaces are labeled by their respective time. The resulting many-body state Υ_{t_k:t_0} contains all spatiotemporal correlations of the corresponding process as spatial correlations.

Many-body Choi state
While we now know how to experimentally reconstruct it, it remains to provide a physical interpretation for Υ_{T_k} and to discuss its properties. We start with the former. For the case of quantum channels, the interpretation of the Choi state Υ_E is clear: it is the state that results from letting E act on half of an unnormalized maximally entangled state. Υ_E then contains exactly the same information as the original map E. Somewhat unsurprisingly, in the multi-time case the CJI is similar to the two-time scenario of quantum channels. Here, however, instead of feeding one half of a maximally entangled state into the process once, we have to do so at each time in T_k (see Figure 17 for a graphical representation). From Eq. (149), we see that Υ_{T_k} must be an element of the space of matrices acting on all input and output Hilbert spaces associated with the times in T_k. Labeling the maximally entangled states in Figure 17 diligently, and distinguishing between input and output spaces, we see that the resulting state Υ_{T_k} lives in exactly the right space. That the matrix Υ_{T_k} constructed in this way indeed yields the correct process tensor can be seen by direct insertion: using the Choi state of Figure 17 and the definition of the Choi states {A_{x_j}}, one sees that Eq. (149) holds. While straightforward, this derivation is somewhat arduous and left as an exercise to the reader.
We thus see that Υ_{T_k} is proportional to a many-body quantum state, and the spatio-temporal correlations of the underlying process are mapped onto spatial correlations of Υ_{T_k} via the CJI. These properties lend themselves to convenient methods for treating a multi-time process as a many-body state, with applications to efficient simulation and to learning the process [251-255]. Additionally, the CJI for quantum channels is simply a special case of the more general CJI presented here.

Figure 18. Trace conditions on process tensors. Displayed is the pertinent part of Figure 17. As tr ∘ U = tr for all CPTP maps U, tracing out the final degree of freedom of Υ_{T_k}, denoted by k_i, amounts to a partial trace of Φ⁺_{k−1_o}. This, in turn, yields a tensor product between 𝟙_{k−1_o} and a process tensor on one step less.

Complete positivity and trace preservation
Just as for quantum channels, the properties of a multi-time process can most easily be read off its Choi state. First, as we have seen above, Υ_{T_k} is positive. Just as in the case of channels and superchannels, this property implies complete positivity of the process at hand. As was the case for superchannels, complete positivity here has a particular meaning: let the process act on any sequence of CP maps {B_{x_j}}, each acting both on the system S of interest and on some external ancillas, which we collectively denote by B, and which do not interact with the environment E that is part of the process tensor. Denoting the Choi state of such a sequence by B_{x_{T_k}}, the complete positivity of the process tensor can be seen directly in terms of the positivity of the process Choi state,

tr[(Υ_{T_k} ⊗ 𝟙_B) B_{x_{T_k}}^T] ≥ 0,

where Υ_{T_k} acts on S at the times in T_k and 𝟙_B is the identity matrix on the ancillary degrees of freedom B. As the positivity of the Choi state implies complete positivity of the underlying map, any sequence of CP maps is mapped to a CP map by T_{T_k}. Analogously, we could have expressed the above in terms of maps; however, as mentioned, the properties of process tensors are much more easily represented in terms of their Choi matrices.
In clear analogy to the case of quantum channels, process tensors should also satisfy a property akin to trace preservation. At its core, trace preservation is a statement about the normalization of probabilities. As CPTP maps can be implemented with unit probability, at first glance the natural generalization of trace preservation appears to be

tr[Υ_{T_k} (A_k ⊗ ⋯ ⊗ A_0)^T] = 1   for all CPTP maps A_0, …, A_k.

However, this requirement on its own is too weak, as it does not encapsulate the temporal ordering of the process at hand [245]. If only the above requirement were fulfilled, then actions at a time t_j could in principle influence the statistics at an earlier time t_{j'} < t_j. This, though, should be forbidden by causality. Fortunately, Υ_{T_k} already encapsulates the causal ordering of the underlying process by construction. Specifically, tracing over the degrees of freedom of Υ_{T_k} that correspond to the last time (i.e., the degrees of freedom labeled by k_i in Figure 17) yields

tr_{k_i}[Υ_{T_k}] = 𝟙_{k−1_o} ⊗ Υ_{T_{k−1}},        (155)

where Υ_{T_{k−1}} is the process tensor on the times T_{k−1}, with a final output degree of freedom denoted by k−1_i. The above property trickles down, in the sense that

tr_{k−1_i}[Υ_{T_{k−1}}] = 𝟙_{k−2_o} ⊗ Υ_{T_{k−2}},  and so on.        (156)

Before elucidating why these properties indeed ensure causal ordering, let us quickly lay out why they hold. To this end, it is sufficient to prove only the first condition (155), as the others follow in the same vein. A rigorous version of this proof can, for example, be found in Refs. [225,229]. Here, we will prove it by means of Figure 17. Consider tracing out the degrees of freedom denoted by k_i in said figure. This amounts to tracing out all output degrees of freedom of the map U_k. As U_k is CPTP, tracing out all outputs after applying U_k is the same as simply tracing out the outputs without having applied U_k, i.e., tr ∘ U_k = tr.
This, then, implies a partial trace of the unnormalized maximally entangled state Φ⁺_{k−1_o}, yielding 𝟙_{k−1_o}, as well as a trace over the environmental output degrees of freedom of U_{k−1} (see Figure 18 for a detailed graphical representation). The remaining part, i.e., the part besides 𝟙_{k−1_o}, is then a process tensor on the times T_{k−1} = {t_0, …, t_{k−1}}. Iterating these arguments then leads to the hierarchy of trace conditions in Eq. (156).
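As a minimal numerical illustration of the first trace condition, the sketch below builds a one-step comb for an (assumed uncorrelated) initial state, Υ = Choi(E) ⊗ ρ_0, and verifies that tracing the final leg leaves an identity tensored with the remaining comb. The depolarizing Kraus operators and helper names are illustrative choices, not taken from the text.

```python
import numpy as np

d = 2

def choi_from_kraus(kraus):
    """Unnormalized Choi matrix (output (x) input) of a channel."""
    C = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            Eij = np.zeros((d, d), dtype=complex)
            Eij[i, j] = 1.0
            out = sum(K @ Eij @ K.conj().T for K in kraus)
            C += np.kron(out, Eij)
    return C

def trace_first(M, dA, dB):
    """Partial trace over the first tensor factor of an operator on A (x) B."""
    return np.einsum('aiaj->ij', M.reshape(dA, dB, dA, dB))

p = 0.25
X = np.array([[0, 1], [1, 0]])
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]])
kraus = [np.sqrt(1 - 3 * p / 4) * np.eye(2),
         np.sqrt(p / 4) * X, np.sqrt(p / 4) * Y, np.sqrt(p / 4) * Z]

C = choi_from_kraus(kraus)
# trace preservation: tracing the output leg of a channel's Choi matrix
# leaves the identity on the input leg
assert np.allclose(trace_first(C, d, d), np.eye(d))

# one-step comb for an uncorrelated initial state rho0, on spaces (1_i, 0_o, 0_i)
rho0 = np.array([[0.7, 0.2], [0.2, 0.3]])
Ups = np.kron(C, rho0)
# first trace condition: tr_{1_i} Upsilon = identity_{0_o} (x) rho0
assert np.allclose(trace_first(Ups, d, d * d), np.kron(np.eye(d), rho0))
```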
Showing that the above trace conditions indeed imply the correct causal ordering of the process tensor now amounts to showing that a CPTP map at a later time does not have an influence on the remaining process tensor at earlier times. We start with a CPTP map at t_k. This map does not have an output space, and the only CPTP map with trivial output space is the trace operation, which has the Choi state 𝟙_{k_i}. Thus, letting Υ_{T_k} act on it amounts to a partial trace tr_{k_i}[Υ_{T_k}], which, by Eq. (155), is equal to 𝟙_{k−1_o} ⊗ Υ_{T_{k−1}}. Letting this remaining process tensor act on a CPTP map A_{k−1} at time t_{k−1} yields

tr_{k−1}[(𝟙_{k−1_o} ⊗ Υ_{T_{k−1}}) A_{k−1}^T] = 𝟙_{k−2_o} ⊗ Υ_{T_{k−2}},

where tr_{k−1} denotes the trace over k−1_i and k−1_o, and we have used the property of CPTP maps that tr_{k−1_o}[A_{k−1}] = 𝟙_{k−1_i}. As the LHS of the above equation does not depend on the specific choice of A_{k−1}, no statistics before t_{k−1} will depend on the choice of A_{k−1} either. Iterating this argument then shows that the above hierarchy of trace conditions implies proper causal ordering. As before, this result can, equivalently, be stated in terms of maps; however, the corresponding equations would not be very enlightening.

Figure 19. Tester element. In the most general case, an experimenter can correlate the system of interest with an ancilla (here, initially in state |Ψ⟩), use said ancilla again at the next time, etc., and make a final measurement with outcome x in the end. As the unitaries V_j can also act trivially on parts of the ancilla, this scenario includes all conceivable measurements an experimenter can perform. Summing over the outcomes x_{T_k} amounts to tracing out the ancillas, thus yielding a proper comb (compare with Figure 16). Note that the inputs (outputs) of the resulting tester elements correspond to the outputs (inputs) of the process tensor, and that the system of interest corresponds to the top line, not the bottom line.
At this point, it is insightful to return to the two different ways of motivating the discussion of quantum stochastic processes we alluded to at the beginning of Sec. V D. Naturally, we could have introduced the process tensor as a positive linear functional that maps sequences of CP maps to probabilities and respects the causal order of the process. This, then, might in principle have led to a larger set of process tensors than the ones obtained from underlying circuits. However, this is not the case; as we shall see in the next section, any object that is positive and satisfies the trace hierarchy above actually corresponds to a quantum circuit with only pure states and unitary intermediate maps.
Finally, one might wonder why we never discussed the question of causality in the case of classical stochastic processes. There, however, causality does not play a role per se, as long as only non-invasive measurements are considered. It is only through the invasiveness of measurements/interrogations that influences between different events, and, as such, causal relations, can be discerned. A joint probability distribution obtained from non-invasive measurements thus does not contain information about causal relations. This, naturally, changes drastically as soon as active interventions are taken into consideration, as is done in the field of classical causal modeling [209], and as cannot be avoided in quantum mechanics [206].

Testers: Temporally correlated 'instruments'
So far, we have only considered the application of independent instruments, which have the form given in Eq. (148). However, these are not the only operations a process tensor can meaningfully act on. In principle, an experimenter could, for example, condition their choice of instrument at time t_{j'} on all outcomes they recorded at times t_j < t_{j'}. This would lead to a (classically) temporally correlated 'instrument', something that is common practice in quantum optics experiments [256]. More generally, at the times in T_k, the experimenter could correlate the system of interest with external ancillas, which are re-used and finally measured at time t_k (see Figure 19). This, then, results in a generalized instrument that can carry temporal quantum correlations.
We can always express such a correlated operation using a local linear basis as

A_{x_{T_k}} = Σ α_{x_{T_k}} Â_{x_k} ⊗ ⋯ ⊗ Â_{x_0},

where {Â_{x_j}} form a linear basis of operations at time t_j and the α_{x_{T_k}} are generic coefficients that can be non-positive. The LHS of this equation is labeled in the same way as Eq. (148) because the above expression contains Eq. (148) as a special case. In other words, the above correlated instrument can carry 'entanglement in time'. Such generalizations of instruments have been called 'testers' in the literature [212,229,257].
In the case of 'normal' instruments, the respective elements have to add up to a CPTP map. Here, in clear analogy, the elements of a tester have to add up to a proper process tensor. In terms of Choi states, this means that the elements {A_{x_{T_k}}} of a tester have to be positive and add up to a matrix A = Σ_{x_{T_k}} A_{x_{T_k}} that satisfies the hierarchy of trace conditions of Eqs. (155) and (156). We emphasize that the possible outcomes x_{T_k} that label the tester elements do not have to correspond to sequences x_0, …, x_k of individual outcomes at times t_0, …, t_k. As outlined above, for correlated tester elements, all measurements could happen at the last time only, or at any subset of times. Consequently, in what follows, unless explicitly stated otherwise, x_{T_k} will label 'collective' measurement outcomes and not necessarily sequences of individual outcomes. Note, however, that for a tester the roles of input and output are reversed with respect to the process tensors that act on them; an output of the process tensor is an input for the tester and vice versa. Consequently, keeping the labeling of spaces consistent with the above, and assuming that the tester ends on the last output space k_i, the trace hierarchy starts with a trace over the corresponding final spaces, implying that, with respect to Eqs. (155) and (156), the roles of i and o in the trace hierarchy are simply exchanged. Naturally, testers generalize both POVMs and instruments to the multi-time case.
Importantly, for any element A_{x_{T_k}} of a tester that is ordered in the same way as the underlying process tensor, we have

0 ≤ tr[Υ_{T_k} A_{x_{T_k}}^T] ≤ 1,

which can be seen by employing the hierarchies of trace conditions that hold for process tensors and testers. Similarly to the case of POVMs and instruments, letting a process tensor act on a tester element yields the probability to observe the outcome x_{T_k} that corresponds to A_{x_{T_k}} (see Figure 20 for a graphical representation). Below, we will encounter temporally correlated tester elements when discussing quantum Markov conditions.

Causality and dilation
In Sec. V E, we will see that, besides being a handy mathematical tool, process tensors allow for the derivation of a generalized extension theorem, thus appearing to be the natural extension of stochastic processes to the quantum realm on a fundamental level. Here, we will, as a first step, connect process tensors to underlying dynamics. In classical physics, it is clear that every conceivable joint probability distribution can be realized by some - potentially highly exotic - classical dynamics. For process tensors, on the other hand, it is so far unclear whether the same holds. By this we mean that we have not yet shown the claim made above, namely that every process tensor, i.e., every positive matrix that satisfies the trace hierarchy of Eqs. (155) and (156), can actually be realized in quantum mechanics. We will provide a short 'proof' by example here; more rigorous treatments can, for instance, be found in Refs. [225,229,239,258].

Figure 20. Action of a process tensor on a tester element. 'Contracting' a process tensor (depicted in blue) with a temporally correlated measurement, i.e., a tester element (depicted in green), yields the probability for the occurrence of said tester element.

Figure 21. Dilation of the Choi state of a process tensor. Up to normalization, Eqs. (161) and (162) together yield a quantum circuit for the implementation of the Choi state of a two-step process tensor that consists only of pure states and isometries (which could be further dilated to unitaries).
Concretely, showing that any process tensor can be realized in quantum mechanics amounts to showing that it admits a quantum circuit that is composed only of pure states and unitary dynamics. This is akin to the Stinespring dilation we discussed in Sec. IV B 4, which allowed us to represent any quantum channel in terms of pure objects only. In this sense, the following dilation theorem will be even more general than the analogous statement in the classical case, where randomness has to be inserted 'by hand'.
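In the channel case, the dilation idea can be made concrete: from any Kraus decomposition one can construct a Stinespring isometry into a system-environment space whose reduced dynamics reproduce the channel. The amplitude-damping Kraus operators below are an illustrative choice, not taken from the text.

```python
import numpy as np

# illustrative Kraus operators: amplitude damping with strength g
g = 0.4
K0 = np.array([[1, 0], [0, np.sqrt(1 - g)]])
K1 = np.array([[0, np.sqrt(g)], [0, 0]])

# Stinespring isometry V : S -> S (x) E, defined by V = sum_k K_k (x) |k>_E
V = np.zeros((4, 2))
for k, K in enumerate([K0, K1]):
    e = np.zeros((2, 1))
    e[k, 0] = 1.0
    V += np.kron(K, e)

# V is an isometry precisely because the Kraus operators sum to identity
assert np.allclose(V.conj().T @ V, np.eye(2))

rho = np.array([[0.6, 0.3], [0.3, 0.4]])
big = V @ rho @ V.conj().T                 # dilated dynamics on S (x) E
# tracing out the environment recovers the channel's action
sys = np.einsum('aebe->ab', big.reshape(2, 2, 2, 2))
direct = K0 @ rho @ K0.conj().T + K1 @ rho @ K1.conj().T
assert np.allclose(sys, direct)
```

Completing V to a unitary on the full S ⊗ E space (by extending its columns to an orthonormal basis) gives the usual circuit picture with a fixed environment input state.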
We use the property that all quantum states are purifiable to obtain a representation for general process tensors. For concreteness, let us consider a three-step process tensor Υ_{0_i 0_o 1_i 1_o 2_i}, defined on the three times {t_0, t_1, t_2}. Due to the causality constraints of Eqs. (155) and (156), we have

tr_{2_i}[Υ_{0_i 0_o 1_i 1_o 2_i}] = 𝟙_{1_o} ⊗ Υ'_{0_i 0_o 1_i},

where the prime is added for clearer notation in what follows. The term Υ_{0_i 0_o 1_i 1_o 2_i} can be dilated (purified) in two different ways, with corresponding ancillary purification spaces A and {1_{o'}, B}, respectively, where an additional pre-factor d_{1_o} = tr(𝟙_{1_o}) is required for proper normalization. These two different dilations of the same object are related by an isometry V_{1_{o'} B → 2_i A} =: V that acts only on the dilation spaces. In the same vein, due to the causality constraints of Υ'_{0_i 0_o 1_i}, we can show that there exists an isometry W_{0_{o'} 0_{i'} → 1_i B} =: W such that Υ'_{0_i 0_o 1_i} is obtained by applying W to a pure quantum state |Υ''⟩_{0_i 0_{i'}}. Inserting this into Eq. (161) yields a representation of Υ_{0_i 0_o 1_i 1_o 2_i} that consists only of pure states and isometries (see Figure 21). As any isometry can be completed to a unitary, this implies that Υ_{0_i 0_o 1_i 1_o 2_i} can indeed be understood as stemming from a quantum circuit consisting only of pure states and unitaries. This circuit simply provides the CJI of the corresponding process tensor, as can easily be seen by 'removing' the maximally entangled states and rearranging the wires in a more insightful way (see Figure 22). Naturally, these arguments can be extended to any number of times. Here, we have sacrificed some mathematical rigor for brevity and clarity of exposition; as mentioned, for a more rigorous derivation, see Refs. [225,229,239,258].

Figure 22. Process tensor corresponding to Figure 21. Rearranging the wires of the circuit of Figure 21 and the maximally entangled states (i.e., undoing the CJI) yields the representation of a process tensor we have already encountered in the previous section.

Above, we provided a consistent way to describe quantum stochastic processes. Importantly, this description, given by process tensors, can deal with the inherent invasiveness of quantum measurements, as it separates the measurements made by the experimenter from the underlying process they probe. Unsurprisingly then, employing this approach to quantum stochastic processes, the previously mentioned breakdown of the KET in quantum mechanics can be resolved in a satisfactory manner [206,250].
Recall that one of the ingredients of the Kolmogorov extension theorem - which does not hold in quantum mechanics - is the fact that a multi-time joint probability distribution contains all joint probability distributions on fewer times. In quantum mechanics, on the other hand, a joint probability distribution at, say, times {t_1, t_2, t_3} for instruments {J_1, J_2, J_3} does not contain the information about what statistics one would have recorded had one not measured at t_2, but only at times {t_1, t_3}. More generally, P(x_3, x_2, x_1|J_3, J_2, J_1) does not allow one to predict probabilities for different instruments {J'_3, J'_2, J'_1}. The process tensor, on the other hand, allows one to compute - on the set of times it is defined on - all joint probabilities for all employed instruments, in particular for the case where one or more of the instruments is the 'do-nothing' instrument. Consequently, it is easy to see that, for a given process on, say, times {t_1, t_2, t_3}, the corresponding process tensor T_{{t_1,t_2,t_3}} - where, for concreteness, we use T_{{t_1,t_2,t_3}} instead of T_{T_3} - contains the correct process tensors on any subset of {t_1, t_2, t_3}. For example, we have

T_{{t_1,t_3}}[A_1, A_3] = T_{{t_1,t_2,t_3}}[A_1, I_2, A_3],

where I_2 is the identity map I[ρ] = ρ at time t_2 (see Figure 23 for a graphical representation). This, in turn, implies that process tensors satisfy a generalized consistency condition. Importantly, as I is a unitary operation, letting T act on an identity map does not generally coincide with a summation over measurement outcomes. Concretely, for any instrument with more than one outcome, we have Σ_x A_x ≠ I, and thus summation over outcomes is not the correct way to 'marginalize' process tensors. We will discuss below why it works nonetheless for classical processes.
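The failure of outcome-summation as marginalization can be seen in a minimal simulation: inserting the identity map at the middle time reproduces the undisturbed statistics, while summing over the outcomes of a projective measurement (i.e., applying the completely dephasing map) does not. The rotation angles and states below are arbitrary illustrative choices.

```python
import numpy as np

def Rx(theta):
    """Qubit rotation about the x-axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

U = Rx(np.pi / 2)                                   # unitary evolution per step
rho0 = np.array([[1, 0], [0, 0]], dtype=complex)    # system starts in |0>

identity = lambda r: r                              # 'do-nothing' at t1
dephase = lambda r: np.diag(np.diag(r))             # sum over z-outcomes at t1

def prob_one_at_t2(intervention):
    rho1 = U @ rho0 @ U.conj().T
    rho2 = U @ intervention(rho1) @ U.conj().T
    return rho2[1, 1].real                          # P(outcome |1> at t2)

p_id = prob_one_at_t2(identity)    # correct two-time marginal
p_sum = prob_one_at_t2(dephase)    # 'marginal' from summing outcomes

assert abs(p_id - 1.0) < 1e-12
assert abs(p_sum - 0.5) < 1e-12
# summing over outcomes at t1 does not reproduce the undisturbed statistics
assert abs(p_id - p_sum) > 0.1
```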
With this generalized consistency condition at hand, a generalized extension theorem (GET) in the spirit of the KET can be proven for quantum processes [206,250]: any underlying quantum process on a set of times T leads to a family of process tensors {T_{T_k}}_{T_k ⊂ T} that are compatible with each other, while any family of compatible process tensors implies the existence of a process tensor that has all of them as marginals in the above sense. More precisely, setting

T_T^{|T_k} := T_T[⨂_{α ∈ T∖T_k} I_α],

where we employ the shorthand notation ⨂_{α ∈ T∖T_k} I_α to denote that the identity map is 'implemented' at each time t_α ∈ T∖T_k, we have the following theorem [206,250]:

Theorem (GET). Let T be a set of times. For each finite T_k ⊂ T, let T_{T_k} be a process tensor. There exists a process tensor T_T that has all finite ones as 'marginals', i.e., T_{T_k} = T_T^{|T_k}, iff all finite process tensors satisfy the consistency condition, i.e., T_{T_j} = T_{T_k}^{|T_j} for all finite T_j ⊂ T_k.

Importantly, this theorem contains the KET as a special case, namely the one where all involved process tensors and operations are classical. Consequently, introducing process tensors for the description of quantum stochastic processes closes the apparent conceptual problems we discussed earlier and provides a direct connection to their classical counterpart; while quantum stochastic processes can still be considered as mappings from sequences of outcomes to joint probabilities, in quantum mechanics a full description requires that these probabilities be known for all instruments an experimenter could employ (see Figure 25). Additionally, the GET provides satisfactory mathematical underpinnings for physical situations where active interventions are purposefully employed, for example, to discern different causal relations and mechanisms. This is, for instance, the case in classical and quantum causal modeling [209,227,247,259] (see Figure 26 for a graphical representation).

Figure 24. Hierarchy of multi-time quantum processes. A quantum stochastic process is the process tensor over all times. Of course, in practice one only looks at finite-time statistics. However, the generalized extension theorem tells us that the set of all k-time process tensors {Υ_{T_k}} contains, as marginals, all j-time process tensors {Υ_{T_j}} for j < k. Moreover, the sets of two- and three-time processes play significant roles in the theory of quantum stochastic processes. Here, we only display a small part of the multi-faceted structure of non-Markovian quantum processes; for a much more comprehensive stratification, see Refs. [7,10].
In light of the fact that, mathematically, summing over outcomes of measurements does not amount to an identity map - even in the classical case - it is worth reiterating, from a mathematical point of view, why the KET holds in classical physics. For a classical stochastic process, we always implicitly assume that measurements are made in a fixed basis (the computational basis), and that no active interventions are implemented. Mathematically, this implies that the considered CP maps are of the form A_{x_j}[ρ] = ⟨x_j|ρ|x_j⟩ |x_j⟩⟨x_j|. Summing over these CP maps yields the completely dephasing CPTP map Δ_j[ρ] := Σ_{x_j} ⟨x_j|ρ|x_j⟩ |x_j⟩⟨x_j|, which does not coincide with the identity map. However, on the set of states that are diagonal in the computational basis, the action of both maps coincides, i.e., Δ_j[ρ] = I_j[ρ] for all ρ = Σ_{x_j} λ_{x_j} |x_j⟩⟨x_j|. More generally, their action coincides on the set of all combs that describe classical processes [236]. In a sense then, mathematically speaking, the KET works because, in classical physics, only particular operations, as well as particular process tensors, are considered. Going beyond either of these sets requires - already in classical physics - a more general way of 'marginalization', leading to an extension theorem that naturally contains the classical one as a special case.

Figure 25. 'Trajectories' of a quantum stochastic process. An open quantum process is fully described once all joint probabilities for sequences of outcomes are known for all possible instruments an experimenter can employ to probe the process. As in the classical case, each sequence of outcomes can be considered a trajectory; unlike in the classical case, there is no ontology attached to such trajectories. Additionally, each sequence of outcomes in the quantum case corresponds to a sequence of measurement operators, not just labels. If both the process and the allowed (non-invasive) measurements are diagonal in the same fixed basis, then the above figure coincides with Figure 5, where trajectories of classical stochastic processes were considered. Importantly, while in classical physics only probabilistic mixtures of different trajectories are possible, quantum mechanics allows for the coherent superposition of 'trajectories' [260].
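As a quick sanity check of this point, the completely dephasing map Δ differs from the identity map on coherent states but agrees with it on states diagonal in the fixed basis (a minimal sketch):

```python
import numpy as np

def dephase(rho):
    """Completely dephasing map: Delta[rho] = sum_x <x|rho|x> |x><x|."""
    return np.diag(np.diag(rho))

rho_plus = np.array([[0.5, 0.5], [0.5, 0.5]])   # |+><+|: coherences present
rho_diag = np.array([[0.6, 0.0], [0.0, 0.4]])   # diagonal ('classical') state

assert not np.allclose(dephase(rho_plus), rho_plus)  # Delta != identity map
assert np.allclose(dephase(rho_diag), rho_diag)      # but they agree here
```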

A. Quantum Markov conditions and causal breaks
Now armed with a clear description of quantum stochastic processes, i.e., the process tensor, we are in a position to ask when a quantum process is Markovian. We will formulate a quantum Markov condition [226,227] by employing the notion of causal breaks. Intuitively speaking, information about the past can be transmitted to the future in two different ways: via the system itself and via the inaccessible environment. In a Markovian process, the environment does not transmit any past system information to the future process on the system; this is what is encapsulated in the classical Markov condition of Eq. (165). A causal break allows one to extend this classical intuition to the quantum case. It is designed to block the information transmitted by the system itself, while probing the dependence of the future dynamics of the system on past control operations on the system. If the future depends on the past controls, then we must conclude that past information is transmitted to the future by the environment, which is exactly non-Markovian memory.

Let us begin by explicitly denoting the process Υ_T on a set of times T = {t_0, …, t_k, …, t_l}. We break this set into two subsets at an intermediate time step k < l, as T_− = {t_0, …, t_{k^−}} and T_+ = {t_{k^+}, …, t_l}. In the first segment, we implement a tester A_{x_−} belonging to an instrument J_− with outcomes x_−. In the next segment, as the system evolves to time step l, we implement a tester A_{x_+} belonging to an instrument J_+ with outcomes x_+ (see Figure 27). Together, we have applied two independent testers A_{x_+} ⊗ A_{x_−}, where the simple tensor product between the two testers implies their independence. In detail, the two testers split the time step k: the first tester ends with a measurement on the output of the process at time t_k (labeled t_{k^−}), and the second begins by preparing a fresh state at the same time (labeled t_{k^+}). Importantly, this implements a causal break that prevents any information transmission between the past T_− and the future T_+ via the system.

Figure 26. (Quantum) Causal network. Performing different interventions allows for the causal relations between different events (denoted by X_j) to be probed. For example, in the figure the event B_1 directly influences the events C_3 and A_2, while A_3 influences only B_4. As not all pertinent degrees of freedom are necessarily in the control of the experimenter, such scenarios can equivalently be understood as an open system dynamics. Any such scenario can be described by a process tensor [229], and the GET applies, even though active interventions must be performed to discern causal relations. For example, the events D_3, D_4, B_5 could be successive (e.g., at times t_3, t_4 and t_5) spin measurements in z-, x- and z-direction, respectively. Summing over the results of the spin measurement in x-direction at t_4 would not yield the correct probability distribution for two measurements in z-direction at t_3 and t_5 only, but consistency still holds on the level of process tensors (see also Sec. IV E 1).
Let us now focus solely on the outcome statistics of the future process, which are given by Eq. (149) in the form of a conditional probability P(x_+|J_+; x_−, J_−). Note that we have added a second condition, x_−, on the LHS because the future process may, in general, depend on the outcomes of the past instrument J_−. This operationally well-defined conditional probability is fully consistent with the conditional classical probability distributions in Eq. (7). The causal break at time step k guarantees that the system itself cannot carry any information about the past into the future beyond step k. The only way the future process Υ_{T_+} could depend on the past is if information about the past is carried across the causal break via the environment. We have depicted this in Figure 27, where the only possible way past information can reach the future is through the process tensor itself. This immediately results in the following operational criterion for a Markov process:

Quantum Markov Condition. A quantum process is Markovian when the future process statistics, after a causal break at time step k (with l > k), are independent of the past instrument outcomes:

P(x_+|J_+; x_−, J_−) = P(x_+|J_+)   ∀ J_+, J_− and ∀ k ∈ T.
Alternatively, the above Markov condition says that a quantum process is non-Markovian iff there exist two past tester outcomes, x_− and x'_−, such that, after a causal break at time step k, the conditional future process statistics differ for some future instrument J_+:

P(x_+|J_+; x_−, J_−) ≠ P(x_+|J_+; x'_−, J_−).

Conversely, if the statistics remain unchanged for all possible past controls, then the process is Markovian. The above quantum Markov condition is fully operational and thus testable with a finite number of experiments [261]. Suppose the conditional independence in Eq. (167) holds for a linearly independent set of past and future testers {A_{x_−} ⊗ A_{x_+}} for all k; then, by linearity, it holds for any instruments, and thus the future is always conditionally independent of the past. It is worth noting that this definition is the quantum generalization of the causal Markov condition for classical stochastic evolutions in which interventions are allowed [262].
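The operational content of the causal-break test can be illustrated with a toy model: a SWAP interaction lets the environment carry past information across the causal break, while a non-interacting environment does not. The model (SWAP couplings, an environment qubit starting in |0⟩, a maximally mixed fresh state after the break) is purely illustrative.

```python
import numpy as np

SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)

def trace_env(rho_se):
    """Partial trace over the environment (second) qubit."""
    return np.einsum('aebe->ab', rho_se.reshape(2, 2, 2, 2))

def trace_sys(rho_se):
    """Partial trace over the system (first) qubit."""
    return np.einsum('aeaf->ef', rho_se.reshape(2, 2, 2, 2))

def future_state(rho_past, interacting):
    """System state at t2 after a causal break at t1 (system discarded,
    fresh maximally mixed state fed in)."""
    U = SWAP if interacting else np.eye(4, dtype=complex)
    rho_e = np.diag([1.0, 0.0]).astype(complex)       # env starts in |0>
    se = U @ np.kron(rho_past, rho_e) @ U.conj().T    # evolve to t1
    env = trace_sys(se)                               # env survives the break
    fresh = 0.5 * np.eye(2, dtype=complex)            # causal break at t1
    se2 = U @ np.kron(fresh, env) @ U.conj().T        # evolve to t2
    return trace_env(se2)

zero = np.diag([1.0, 0.0]).astype(complex)
one = np.diag([0.0, 1.0]).astype(complex)

# with interaction, the future depends on the past preparation: memory
assert not np.allclose(future_state(zero, True), future_state(one, True))
# without interaction, the future is independent of the past: Markovian
assert np.allclose(future_state(zero, False), future_state(one, False))
```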
Additionally, in spirit, the above definition is similar to the satisfaction of the quantum regression formula (QRF) [2,3,263]. Indeed, its equivalence to a generalized QRF has been shown in Ref. [10], while satisfaction of the generalized QRF has been used in Refs. [218,250] as a basis for the definition of quantum Markovian processes. The relation between the QRF and the witnesses of non-Markovianity we discussed in Sec. IV D has also been investigated [191,264]. Here, we opt for the understanding of Markovianity in terms of conditional future-past independence, an approach fully equivalent to the one taken in the aforementioned works [10,218,250].

Figure 27. Determining whether a quantum process is Markovian. Generalized testers (multi-time instruments) A_{x_−} and A_{x_+} are applied to the system during a quantum process, where the subscripts represent the outcomes. The testers are chosen to implement a causal break at a time step t_k, which ensures that the only way the future outcomes can depend on the past is if the process is non-Markovian. Thus, by checking whether the future depends on the past for a basis of instruments, we can certify whether or not the process is Markovian.

Quantum Markov processes
The quantum Markov condition implies that any segment of the process is uncorrelated with the remainder of the process. This right away means that a Markov process must have a remarkably simple structure: the Choi state of the process tensor of a Markov process can be shown to be simply a product state,

Υ_{T_k} = Υ_{E(k^−:k−1^+)} ⊗ ⋯ ⊗ Υ_{E(1^−:0^+)} ⊗ ρ_0,        (169)

where each Υ_{E(j+1^−:j^+)} is the Choi matrix of a CPTP map from t_{j^+} to t_{j+1^−}. The above equation simply says that there are no temporal correlations in the process other than between neighboring time steps. An obvious example of such a process is a closed, i.e., unitary, process; there, each Υ_{E(j+1^−:j^+)} is maximally entangled and corresponds to a unitary evolution. In terms of quantum maps, the action of the corresponding process tensor on an initial system state ρ amounts to a concatenation of mutually independent CPTP maps E_{(j+1^−:j^+)} (corresponding to Υ_{E(j+1^−:j^+)}) that act on the system alone, interspersed with the operations of the experimenter. While this property of independent CPTP maps - at first sight - seems equivalent to CP divisibility, we emphasize that it is strictly stronger, as the mutual independence of the respective maps has to hold for arbitrary interventions at all times in T_k [177,225,226].

The above form for Markov processes also means that, in general, we do not need to perform experiments with causal breaks. We simply need to determine whether the process tensor has any correlations in time, which can also be done using noisy instruments that do not correspond to causal breaks. While causal breaks form a linear basis, we can infer the correlations in a process whenever we have access to the process tensor itself; the latter can be obtained by tomography, which only requires applying a linear basis of instruments. Additionally, deviations from Markovianity can already be witnessed - and assertions about the size of the memory can be made - even if a full basis of operations is not available to the experimenter [203,228,252,265].
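The claim that a product-form Choi state carries no temporal correlations can be phrased as vanishing mutual information between the Choi blocks of different steps. A minimal sketch, with two maximally entangled blocks standing in for two unitary steps, and a maximally correlated state (used only for contrast, not necessarily a valid comb):

```python
import numpy as np

def vn_entropy(rho):
    """von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-(ev * np.log2(ev)).sum())

def trace_first(M, dA, dB):
    return np.einsum('aiaj->ij', M.reshape(dA, dB, dA, dB))

def trace_second(M, dA, dB):
    return np.einsum('aibi->ab', M.reshape(dA, dB, dA, dB))

# normalized Choi state of a unitary step: a maximally entangled state
phi = np.zeros((4, 4))
for i in (0, 3):
    for j in (0, 3):
        phi[i, j] = 0.5

# Markovian two-step Choi state: a product of the two blocks
Ups = np.kron(phi, phi)
mutual = (vn_entropy(trace_second(Ups, 4, 4))
          + vn_entropy(trace_first(Ups, 4, 4)) - vn_entropy(Ups))
assert abs(mutual) < 1e-9      # no correlations between the two steps

# contrast: a maximally correlated two-step state (illustration only)
psi = np.zeros(16)
for i in range(4):
    psi[i * 4 + i] = 0.5
Ups_corr = np.outer(psi, psi)
m_corr = (vn_entropy(trace_second(Ups_corr, 4, 4))
          + vn_entropy(trace_first(Ups_corr, 4, 4)) - vn_entropy(Ups_corr))
assert m_corr > 3.9            # strong correlations between the steps
```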
Besides following the same logic as the classical definition of Markovianity, i.e., conditional independence of the future and the past, the above notion of Markovianity also boils down to the classical one in the correct limit: choosing fixed instruments at each time in T_k yields a probability distribution P(x_k, …, x_1) for the possible combinations of outcomes. Now, if each of the instruments consists only of causal breaks - which is the case in the study of classical processes - then a (quantum) Markovian process yields a joint probability distribution for those instruments that satisfies the classical Markov condition of Eq. (165). The quantum notion of Markovianity thus contains the classical one as a special case. One might go further in the restriction to the classical case by demanding that the resulting statistics also satisfy the Kolmogorov consistency conditions we discussed earlier. However, on the one hand, there are processes that do not satisfy Kolmogorov consistency conditions, independent of the choice of probing instruments [236]. On the other hand, Markovianity is also a meaningful concept for classical processes with interventions [209], where Kolmogorov conditions are generally not satisfied. Independent of how one restricts to the classical case, the notion of Markovianity introduced here for the quantum case simplifies to the respective classical notion.
Below we will use the structure of Markovian processes to construct operationally meaningful measures for non-Markovianity. We then discuss the quantum Markov order and quantum master equations to close this section.
Before doing this, we shortly discuss how the quantum Markov condition relates to witnesses of non-Markovianity. In contrast to the non-Markovianity witnesses discussed in the previous section, the above condition is necessary and sufficient for memorylessness. That is, if it holds, there is no physical experiment that will see conditional dependence between the past and future processes; if it does not hold, then there exists some experiment that will be able to measure some conditional dependence between the past and the future processes. In fact, a large number of non-Markovianity witnesses, defined in Refs. [149,150,154,162,178,180,184,187,215,266-278], herald the breaking of the quantum Markov condition. However, there are always non-Markovian processes that will not be identified by most of these witnesses. This happens because these witnesses usually only account for three-time correlations. Many of them are based on the divisibility of the process. A Markov process, which has the form of Eq. (169), will always be CP-divisible, while the converse does not hold true [177,250]. It thus suffices to exhibit an example of a CP-divisible process that is non-Markovian.

Examples of divisible non-Markovian processes
A completely positive and divisible process on a single-qubit system can be acquired by following the prescription in Refs. [250,279,280], where the so-called shallow pocket model was discussed. We begin with the system in an arbitrary state ρ(0) that interacts with an environment whose initial state is a Lorentzian wavefunction; the initial system-environment state is obviously uncorrelated. The two evolve together according to the Hamiltonian H_SE = (g/2) σ₃ ⊗ x̂, where x̂ is the environmental position degree of freedom, and the total SE dynamics are generated by the unitary operator U_t = exp(−i t H_SE). It is easy to show, by taking the partial trace over E, that the reduced dynamics of S is pure dephasing in the z-basis (see Eq. (113) in Sec. IV D 3) and can be written exactly in GKSL form; i.e., if the system is not interfered with, the evolution between any two points in time is a CPTP dephasing channel, whose off-diagonal elements (in the z-basis) decay by a factor e^(−γt), with γ = gG and G the width of the Lorentzian. As argued above, the process is therefore completely positive, fully divisible [7,8,278], and also has a Markovian generator, as required by the snapshot method [180]. Suppose we start the system in the initial states ρ±(0) := |x±⟩⟨x±|. After some time t, these states will have the form ρ±(t) = (1/2)(1 ± e^(−γt) σ₁). It is then easy to see that the trace distance between the two states decreases monotonically: D(ρ₊(t), ρ₋(t)) = e^(−γt). This means that the non-Markovianity witness based on non-monotonicity of the trace distance, given in Ref. [178], will call this a Markovian process. This is not surprising, as the process is divisible, which is a stronger witness for non-Markovianity than the trace distance [7,185].

Figure 28. A CP-divisible but non-Markovian process. A qubit system is prepared in the states |x±⟩ and evolves along with an environment. The uninterrupted dynamics of the system is pure dephasing, which will be certified as Markovian by two-point witnesses. However, when an instrument X is applied at time t, the system dynamics reverse and the system returns to its original state, which is only possible in the presence of non-Markovian memory.
This process will also be labeled as Markovian by the snapshot approach, as the generator of the dynamics of the system alone will always lead to CP maps. In fact, we have already shown in Sec. IV D that divisibility-based witnesses will not see any non-Markovianity in a pure dephasing process.
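The monotonic decay of the trace distance can be checked numerically. In the shallow pocket model, each value x of the environmental position imprints a relative phase e^(−igtx) on the system coherence, and averaging over the Lorentzian (Cauchy) distribution of width G gives the dephasing factor e^(−γt) with γ = gG, which equals the trace distance between ρ±(t). A minimal Monte Carlo sketch (the parameter values are our own choices):

```python
import numpy as np

g, G, t = 1.0, 0.5, 2.0          # coupling, Lorentzian width, time (arbitrary units)
rng = np.random.default_rng(1)

# Sample environmental positions from the Lorentzian (Cauchy) probability
# density; each trajectory dephases the qubit coherence by exp(-i g t x).
x = G * rng.standard_cauchy(400_000)
coherence = np.mean(np.exp(-1j * g * t * x))

# The trace distance between rho_+(t) and rho_-(t) equals |coherence|
D_numeric = abs(coherence)
D_exact = np.exp(-g * G * t)      # Cauchy characteristic function: gamma = g*G
print(D_numeric, D_exact)         # both ~ exp(-1.0) ~ 0.368
```

The Monte Carlo average reproduces the exact exponential decay within sampling error, confirming the monotonic loss of distinguishability under the uninterrupted dynamics.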
To take this argument further, let us split the process from 0 → 2t into two segments: 0 → t and t → 2t. If the process is indeed Markovian, then we can treat the process in each segment identically, i.e., the dynamical map for both segments will be the same. Then, if we sneak in an instrument, acting on the system alone, between these two dynamical maps, this by itself cannot break the Markovianity of the process. Now, we show that this process, while divisible, is indeed non-Markovian. The usual witnesses fail to detect the temporal correlations because the process only reveals its non-Markovianity when an instrument is applied at an intermediate time; see Figure 28.
Suppose we apply the single-element (unitary) instrument J₁ = X[·] := σ₁ (·) σ₁ at time t. Doing so will not break the Markovianity of the process; moreover, the process should not change at all, because the states in Eq. (174) commute with σ₁. Thus, continuing the process to time 2t should continue to decrease the trace distance monotonically, D(ρ₊(2t), ρ₋(2t)) = e^(−2γt), and indeed this is what happens if the instrument X is not applied. However, when the instrument X is applied, the dynamics in the second segment reverse the dephasing. This is most easily seen from the fact that the total system-environment unitary maps to its adjoint under conjugation, U† = σ₁ U σ₁. Concretely, we have ρ(2t) = tr_E[U_t X[U_t (ρ(0) ⊗ ρ_E) U_t†] U_t†] = σ₁ ρ(0) σ₁, where ρ_E = |ψ_E⟩⟨ψ_E|. This calculation shows that the state at time 2t is unitarily equivalent to the initial state of the system, in contrast to Eq. (176).
There are a few take-away messages. First, the initial states ρ±(0), which were monotonically moving closer to each other during the first segment, begin to move apart, monotonically, during the second segment. In other words, during the second segment they become more and more distinguishable, and the trace distance grows monotonically for times greater than t. Therefore, with the addition of an intermediate instrument, the process is no longer seen to be Markovian. Indeed, if the process were Markovian, then the addition of an intermediate instrument would not break the monotonicity of the trace distance. In other words, the process breaks a data-processing inequality, and therefore it was non-Markovian from the beginning.
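The reversal can be seen trajectory by trajectory: conjugating by σ₁ at time t complex-conjugates the accumulated phase, so the second segment undoes the first, exactly as in a spin echo. A sketch under the same assumed parameters as before:

```python
import numpy as np

g, G, t = 1.0, 0.5, 2.0           # coupling, Lorentzian width, time (our choices)
rng = np.random.default_rng(2)
x = G * rng.standard_cauchy(400_000)   # Lorentzian environmental positions
phase = np.exp(-1j * g * t * x)        # coherence factor from one segment

# Without the instrument: phases accumulate, coherence decays as exp(-2*gamma*t)
D_free = abs(np.mean(phase * phase))
# With X at time t: sigma_1 conjugation complex-conjugates the accumulated
# phase, so the second segment cancels it exactly in every trajectory
D_echo = abs(np.mean(np.conj(phase) * phase))

print(D_free)   # ~ exp(-2*g*G*t) ~ 0.135
print(D_echo)   # = 1.0: full revival, revealing the hidden memory
```

The full revival (trace distance returning to 1) is precisely the breaking of the data-processing inequality described above.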
Second, the dynamics in the second segment restore the initial state of the system, which means that the dynamical map in the second segment depends on the initial condition. If the process were divisible, the total dynamics would have to be the composition E_(2t∶0)|X = E_(2t∶t) ◦ X ◦ E_(t∶0), i.e., a dephasing map, followed by the unitary instrument, followed by another dephasing map. However, this is not the same as the total dynamics being simply a unitary transformation. Therefore, the process is not divisible when an intermediate instrument is applied. Again, if the process were Markovian, adding an intermediate instrument would not break the divisibility of the process, and therefore the process was non-Markovian from the beginning. Third, the snapshot witness [180] would not be able to attribute CP dynamics to the second segment, and thus it too would conclude that the process is non-Markovian. In fact, it is possible to construct dynamics that look Markovian for arbitrarily long times and then reveal themselves to be non-Markovian [281].
To be clear, unlike in the snapshot method, the process tensor for the whole process will always be completely positive. Let us then write down the process tensor Υ_{2t,t,0} of this process for the three times {2t, t, 0}. To do so, we first notice that the action of the system-environment unitary has the simple form U_t = |0⟩⟨0| ⊗ u + |1⟩⟨1| ⊗ u†, where u := exp(−i (gt/2) x̂) is a unitary operator on E alone. Next, to construct the Choi state for this process, we prepare two maximally entangled states for the system, |Φ⁺⟩_{S₀S₀′} and |Φ⁺⟩_{S₁S₁′}. We let part S₀′ interact with the environment in segment one and then S₁′ in segment two; the corresponding interaction unitary operators for the two segments act on S₀′E and S₁′E, respectively. We first write down the process tensor for the whole SE, i.e., without the final trace on the environment (see Figure 29), where we have defined |0⟩ := |00⟩ and |1⟩ := |11⟩ for brevity. Combining Eq. (180) with Eq. (179) and tracing over the environment, we obtain the Choi state of the process in the compressed basis |0⟩ := |00⟩ and |1⟩ := |11⟩. Here, we have used the fact that ⟨ψ_E| u² |ψ_E⟩ = ⟨ψ_E| (u†)² |ψ_E⟩ = e^(−γt), where again γ = gG; this is obtained by using Eq. (171). Note that the process tensor is really a 16 × 16 matrix, but we have expressed it in the compressed basis. In other words, all elements of the process tensor that are not of the form |jjll⟩⟨mmnn| vanish.
Looking at the Choi state, it is clear that there are correlations between time steps 0 and 2t. This is most easily seen by computing the mutual information. We can think of the process tensor as a two-qubit state, where the first qubit represents the spaces S₀S₀′ and the second qubit S₁S₁′; moreover, S₀ and S₁ are the outputs of the process at times t and 2t, respectively. The mutual information is about 0.35 for large values of γt. Therefore the process tensor does not have the form of Eq. (169), and the process is non-Markovian. This non-Markovianity will also be detectable if causal breaks are applied at t.

Figure 29. The Choi state of the shallow pocket process for the times {2t, t, 0}. The intermediate system-environment unitaries U_t are given by Eq. (179). Note that, in contrast to previously depicted Choi states, here the environmental degree of freedom E is not traced out yet.
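The quoted value of the mutual information can be reproduced numerically. The 4×4 matrix below is our reconstruction of the compressed Choi state as the Gram matrix of the vectors {u², 1, 1, (u†)²}|ψ_E⟩, using the overlaps ⟨ψ_E|u²|ψ_E⟩ = ⟨ψ_E|(u†)²|ψ_E⟩ = e^(−γt); it should be checked against Eq. (181):

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy (in nats) of a density matrix."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log(ev)))

e = np.exp(-5.0)   # e^{-gamma t} for large gamma*t
# Compressed Choi state in the basis {|00>,|01>,|10>,|11>} of the two "time
# qubits"; entries are the overlaps <psi_E| a^dag b |psi_E> for
# a, b in {u^2, 1, 1, u^{dag 2}}.
U = 0.25 * np.array([[1,    e, e, e**2],
                     [e,    1, 1, e   ],
                     [e,    1, 1, e   ],
                     [e**2, e, e, 1   ]])

R = U.reshape(2, 2, 2, 2)
r1 = np.einsum('ikjk->ij', R)   # marginal of the first time step
r2 = np.einsum('kikj->ij', R)   # marginal of the second time step
mi = entropy(r1) + entropy(r2) - entropy(U)
print(mi)   # approaches 0.5*ln(2) ~ 0.347 for large gamma*t
```

The value 0.5 ln 2 ≈ 0.347 nats matches the "about 0.35" quoted in the text.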
The quantum stochastic matrix from 0 → 2t can be obtained by contracting the process tensor with the instrument at time t. Note that this cannot be done in the compressed basis, as the instruments live on the S₀A₁ spaces. Applying the identity instrument gives exactly the dephasing channel over the interval 2t, while applying the X_t instrument gives an identity channel: E_(2t∶0)|X_t = I_(2t∶0). This example shows that there are non-Markovian effects that can only be detected by interventions. This is not a purely quantum phenomenon; the same can happen in the classical setting, and it is the key distinction between stochastic processes and causal modeling. We have not discussed the witness based on trace distance in terms of the process tensor; we point the interested reader to Ref. [282] for a detailed analysis.

B. Measures of non-Markovianity for multi-time processes
The Choi state Υ of a quantum process translates the correlations between time steps into spatial correlations. A multi-time process is then described by a many-body density operator. This is exactly the expected generalization of the multi-time probability distributions that represent classical stochastic processes. This general description then affords the freedom to use any method for quantifying many-body correlations to quantify non-Markovianity. There are, however, some natural candidates, which we discuss below. We do warn the reader that there are infinitely many ways of quantifying non-Markovianity, just as there are infinitely many ways of quantifying entanglement and other correlations; still, some metrics are natural for certain operational tasks. We emphasize that, here, we will only provide general memory measures, and will not make a distinction between classical and quantum memory, the latter corresponding to entanglement in the respective splittings of the corresponding Choi matrix [233,283]. In this section, we omit the subscripts on the process tensor, as all processes are understood to be multi-time processes.

Memory bond
We begin by discussing the natural structure of quantum processes. One important feature of the process tensor is that it naturally has the structure of a matrix product operator. While any many-body operator can be written as a matrix product operator, the connection here is natural in two ways. First, we can always divide the environment into two parts, a Markovian and a non-Markovian one; in the matrix product operator form, the bond represents the non-Markovian part of the environment, and for a Markov process the bond dimension is naturally one. By employing methods from the field of tensor networks, we can compress the bond and give the process an efficient description. Second, for processes that are time-translationally invariant, one can determine the singular values of the bond, yielding a compact description for the process.
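A minimal illustration of the bond picture (a toy two-step example of our own construction): reshaping the Choi matrix across the temporal cut and performing an SVD yields the operator-Schmidt rank, which plays the role of the bond dimension.

```python
import numpy as np

def bond_dimension(choi, d=4, tol=1e-10):
    """Operator-Schmidt rank of a two-step Choi matrix across the time cut."""
    # Group the (row, column) indices of each time step together, then SVD
    T = choi.reshape(d, d, d, d).transpose(0, 2, 1, 3).reshape(d * d, d * d)
    s = np.linalg.svd(T, compute_uv=False)
    return int(np.sum(s > tol))

phi = np.zeros(4); phi[0] = phi[3] = 1 / np.sqrt(2)
A = np.outer(phi, phi)                        # Choi of the identity channel
psi = np.zeros(4); psi[1] = psi[2] = 1 / np.sqrt(2)
B = np.outer(psi, psi)                        # Choi of sigma_x conjugation

markov = np.kron(A, B)                        # independent maps at each step
corr = 0.5 * (np.kron(A, A) + np.kron(B, B))  # shared randomness across time

print(bond_dimension(markov))  # -> 1 (Markov: product across the cut)
print(bond_dimension(corr))    # -> 2 (memory carried by a two-dimensional bond)
```

The Markov process has bond dimension one, while the classically correlated process needs a two-dimensional bond to carry the shared coin flip forward in time.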

Schatten measures
Next, we make use of the form of Markov processes given in Eq. (169). We remind the reader that this quantum Markov condition contains the classical Markov condition, and any deviation from it represents non-Markovianity. Importantly, it allows for operationally meaningful measures of non-Markovianity. For instance, if we want to distinguish a given non-Markovian process from a Markov process, we can measure the distance to the closest Markov process for a choice of metric, e.g., the Schatten p-norm, where ∥X∥_p^p = tr(|X|^p). Here, we minimize the distance for a given quantum process Υ over all Markovian processes Υ^(M), which have the form of Eq. (169). Naturally, this goes to zero if and only if the given process is Markovian. Moreover, suppose we begin with an n-step Markovian process and then only consider m < n time steps; that is, we coarse grain in time. It is straightforward to see that the m-step process will still have the form of Eq. (169). On the other hand, to maximally differentiate between a given process and its closest Markovian process, the natural choice of distance is the diamond norm, N_⧫ = min_{Υ^(M)} ∥Υ − Υ^(M)∥_⧫, where ∥·∥_⧫ is the generalized diamond norm for processes [229,232]. Eq. (186) then gives the optimal probability to discriminate a process from the closest Markovian one in a single shot, given any set of measurements together with an ancilla. Schatten norms play a central role in quantum information theory; therefore, the family of non-Markovianity measures given above will naturally arise in many applications. For instance, the diamond norm is very convenient to work with when studying the statistical properties of quantum stochastic processes [284,285].
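A sketch of such a distance measure follows. The true minimization over all Markov processes is a hard optimization; as an illustrative stand-in (our own simplification) we use the product of the marginals, which upper-bounds the Schatten-1 distance to the closest Markov process and vanishes exactly for a Markovian toy process:

```python
import numpy as np

def trace_norm(X):
    """Schatten 1-norm: sum of singular values."""
    return float(np.sum(np.linalg.svd(X, compute_uv=False)))

def marginals(rho, d=4):
    R = rho.reshape(d, d, d, d)
    return np.einsum('ikjk->ij', R), np.einsum('kikj->ij', R)

phi = np.zeros(4); phi[0] = phi[3] = 1 / np.sqrt(2)
A = np.outer(phi, phi)                  # Choi of the identity channel
psi = np.zeros(4); psi[1] = psi[2] = 1 / np.sqrt(2)
B = np.outer(psi, psi)                  # Choi of sigma_x conjugation

results = {}
for name, ups in [('Markov', np.kron(A, B)),
                  ('correlated', 0.5 * (np.kron(A, A) + np.kron(B, B)))]:
    r1, r2 = marginals(ups)
    # Distance (p = 1) to the product of marginals, a candidate Markov process
    results[name] = trace_norm(ups - np.kron(r1, r2))
    print(name, round(results[name], 4))   # Markov -> 0.0, correlated -> 1.0
```

For p = 1 this distance also bounds how well the two processes can be distinguished by a single measurement on the Choi states.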

Relative entropy
We can also use any metric or pseudo-metric D that is contractive under CP operations. Here, CP contractive means that D[Φ(X)∥Φ(Y)] ≤ D[X∥Y] for any CP map Φ on the space of generalized Choi states. A metric or pseudo-metric that is not CP contractive may not lead to consistent measures of non-Markovianity [286]. The requirement of a pseudo-metric means that it satisfies all the properties of a distance except that it may not be symmetric in its arguments. Different quasi-distance measures will have different operational interpretations for the memory.
A very convenient choice is the quantum relative entropy [287], a pseudo-metric. It is convenient because the closest Markov process, for any given process, is straightforwardly found by discarding the correlations; that is, the process made of the marginals of the given process is the closest Markov process. Moreover, this measure has a clear operational interpretation in terms of the probability of confusing the given process for a Markovian one, which scales as e^(−n N_R), where N_R is the relative entropy between the given process and the product of its marginals and n is the number of samplings. This measure quantifies the following: suppose you have an experiment that is non-Markovian, while your model for the experiment is Markovian. The above measure is related to the probability of confusing the model with the experiment after n samplings. If N_R is large, then an experimenter will very quickly realize that the hypothesis is false and the model needs updating.
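A sketch of N_R for the same toy process used above (our own example): when the reference is the product of marginals, the relative entropy reduces to the total correlation between the two time steps.

```python
import numpy as np

def relative_entropy(rho, sigma, tol=1e-12):
    """S(rho || sigma) = tr[rho (log rho - log sigma)], in nats."""
    def logm(m):
        w, v = np.linalg.eigh(m)
        lw = np.where(w > tol, np.log(np.maximum(w, tol)), 0.0)
        return (v * lw) @ v.conj().T
    return float(np.real(np.trace(rho @ (logm(rho) - logm(sigma)))))

def marginals(rho, d=4):
    R = rho.reshape(d, d, d, d)
    return np.einsum('ikjk->ij', R), np.einsum('kikj->ij', R)

phi = np.zeros(4); phi[0] = phi[3] = 1 / np.sqrt(2)
A = np.outer(phi, phi)
psi = np.zeros(4); psi[1] = psi[2] = 1 / np.sqrt(2)
B = np.outer(psi, psi)
ups = 0.5 * (np.kron(A, A) + np.kron(B, B))   # correlated two-step process

r1, r2 = marginals(ups)
N_R = relative_entropy(ups, np.kron(r1, r2))
print(N_R)  # ln 2 ~ 0.693: the probability of mistaking this process for
            # its Markovian marginal model decays roughly as exp(-n * N_R)
```

Note that the support of the process must lie inside the support of the reference for the relative entropy to be finite, which is guaranteed here by construction.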

C. Quantum Markov order
The process tensor allows one to properly define Markovianity for quantum processes. As we have seen in our discussion of the classical case, though, Markovian processes are not the only possibility. Rather, they constitute the set of processes of Markov order 1 (and 0). It is then natural to ask whether Markov order is a concept that transfers neatly to the quantum case as well. As we shall see, Markov order is indeed a meaningful concept for quantum processes, but it turns out to be a more layered phenomenon than in the classical realm. Here, we will only focus on a couple of basic aspects of quantum Markov order. For a more in-depth discussion, see, for example, Refs. [231,288]. Additionally, while it is possible to phrase results on quantum Markov order in terms of maps, doing so proves prohibitively cumbersome, which is why the following results will be presented exclusively in terms of Choi states.
Before turning to the quantum case, let us quickly recall (see Sec. III C 4) that for classical processes of Markov order |M| = ℓ we had P(x_F |x_M , x_H) = P(x_F |x_M), where the memory block M comprises the ℓ time steps preceding the future F. The analogous quantum condition, Eq. (189), is ill-defined on its own, as the respective probabilities depend on the instruments {J_F , J_M , J_H} that were used at the respective times to probe the process. With this in mind, we obtain an instrument-dependent definition of finite Markov order in the quantum case [230,231]: Quantum Markov order. A process is said to be of quantum Markov order |M| = ℓ with respect to an instrument J_M if, for all possible instruments {J_F , J_H}, the relation P(x_F |x_M , x_H) = P(x_F |x_M) of Eq. (190) is satisfied.
Intuitively, this definition of Markov order is the same as the classical one; once the outcomes on the memory block M are known, the future F and the history H are independent of each other. However, here we have to specify what instrument J_M is used to interrogate the process on M. Importantly, demanding that a process have finite Markov order with respect to all instruments J_M is much too strong a requirement, as it can only be satisfied by processes of quantum Markov order 0, i.e., processes where future statistics do not even depend on the previous outcome [37,230,231].
While seemingly a quantum trait, this instrument dependence of memory length is already implicitly present in the classical case; there, we generally only consider joint probability distributions that stem from sharp, non-invasive measurements. However, as mentioned above, even in classical physics, active interventions, and, as such, different probing instruments, are possible. This, in turn, makes the standard definition of Markov order for classical processes inherently instrument-dependent, albeit without being mentioned explicitly. Indeed, there are classical processes that change their Markov order when the employed instruments are changed (see, e.g., Sec. VI of Ref. [231] for a more detailed discussion).
In the quantum case, there is no 'standard' instrument, and the corresponding instrument-dependence of memory effects is dragged into the limelight. Even the definition of Markovianity, i.e., Markov order 1, that we provided in Sec. VI A is an inherently instrument-dependent one; quantum processes are Markovian if and only if they do not display memory effects with respect to causal breaks. However, this does not exclude memory effects from appearing as soon as other instruments are employed (as these memory effects would be introduced by the instruments and not by the process itself, the instrument-dependent definition of Markovianity still captures all memory that is contained in the process at hand). Just like for the definition of Markovianity, once all process tensors are classical and all instruments consist of classical measurements only, the above definition of Markov order coincides with the classical one [230]. For generality, in what follows, the instruments on M can be temporally correlated, i.e., they can be testers (however, for conciseness, we will call J_F , J_M , and J_H instruments in what follows). While in our above definition of quantum Markov order we fix the instrument J_M on the memory block, we do not fix the instruments on the future and the history, but require Eq. (190) to hold for all J_F and J_H. This then ensures that, if there are any conditional memory effects between future and history for the given instrument on the memory, they will be picked up.
As all possible temporal correlations are contained in the process tensor Υ_FMH that describes the process at hand, vanishing instrument-dependent quantum Markov order has structural consequences for Υ_FMH. In particular, let J_M = {A_{x_M}} be the instrument for which Eq. (190) is satisfied, and let J_F = {A_{x_F}} and J_H = {A_{x_H}} be two arbitrary instruments on the future and history. With this, Eq. (190) implies Eq. (191), where the process tensor on MH is Υ_MH = (1/d_{F^o}) tr_F(Υ_FMH) (which, due to the causality constraints, is independent of J_F) and d_{F^o} is the dimension of all spaces labeled by o on the future F. As the relation (191) has to hold for all conceivable instruments J_H and J_F, and for all elements of the fixed instrument J_M, it implies that each element A_{x_M} ∈ J_M 'splits' the process tensor into two independent parts, i.e., tr_M[(A_{x_M})^T Υ_FMH] = Υ_{F|x_M} ⊗ Υ̃_{H|x_M} (192); see Figure 30 for a graphical representation. While straightforward, proving the above relation is somewhat tedious, and the reader is referred to Refs. [230,288], where a detailed derivation can be found. Here, we rather focus on its intuitive content and structure. Most importantly, Eq. (192) implies that, for any element of the fixed instrument J_M, the remaining 'process tensor' on future and history does not contain any correlations; put differently, if one knows the outcome on M, the future statistics are fully independent of the past. Conversely, by insertion, it can be seen that any process tensor Υ_FMH that satisfies Eq. (192) for some instrument J_M also satisfies Eq. (190). On the structural side, it can be directly seen that the terms {Υ_{F|x_M}} in Eq. (192) are proper process tensors, i.e., they are positive and satisfy the causality constraints of Eqs. (155) and (156). Specifically, contracting Υ_FMH with a positive element on the memory block M yields positive elements, and does not alter satisfaction of the hierarchy of trace conditions on the block F.
This fails to generally hold true on the block H. While still positive, the terms Υ̃_{H|x_M} do not necessarily have to satisfy causality constraints. However, the set {Υ̃_{H|x_M}} forms a tester, i.e., ∑_{x_M} Υ̃_{H|x_M} = Υ_H is a process tensor.
Employing Eq. (192), we can derive the most general form of a process tensor Υ_FMH that has finite Markov order with respect to the instrument J_M = {A_{x_M}}_{x_M=1}^n. To this end, without loss of generality, let us assume that all n elements of J_M are linearly independent [289]. Then, this set can be completed to a full basis of the space of matrices on the memory block M by means of other tester elements {Ā_{α_M}}_{α_M=n+1}^{d_M}, where d_M is the dimension of the space spanned by tester elements on the memory block. As these two sets together form a linear basis, there exists a corresponding dual basis, which we denote by {∆_{x_M}} ∪ {∆̄_{α_M}}. From this, we obtain the general form of a process tensor Υ_FMH with finite Markov order with respect to the instrument J_M, Eq. (194): a sum of uncorrelated terms Υ_{F|x_M} ⊗ ∆_{x_M} ⊗ Υ̃_{H|x_M} over the elements of J_M, plus terms Υ̃_{FH|α_M} ⊗ ∆̄_{α_M} over the remaining basis elements. It can be seen directly that the above Υ_FMH indeed yields the correct term Υ_{F|x_M} ⊗ Υ̃_{H|x_M} for every A_{x_M} ∈ J_M. Using other tester elements, like, for example, Ā_{α_M}, will however not yield uncorrelated elements on FH (as the terms Υ̃_{FH|α_M} do not necessarily have to be uncorrelated). This, basically, is just a different way of saying that an informationally incomplete instrument is not sufficient to fully determine the process at hand [228]. Additionally, most elements of the span of J_M will not yield uncorrelated elements either, but rather a linear combination of uncorrelated elements, which is generally correlated.
While remaining a meaningful concept in the quantum domain, quantum Markov order is highly dependent on the choice of instrument J_M, and there exists a whole zoo of processes that show peculiar memory properties for different kinds of instruments, like, for example, processes that only have finite Markov order for unitary instruments, or processes which have finite Markov order with respect to an informationally complete instrument, but whose conditional mutual information does not vanish [230,288].
Before providing a detailed example of a process with finite quantum Markov order, let us discuss the aforementioned connection between quantum Markov order and the quantum version of the conditional mutual information. In analogy to the classical case, one can define a quantum CMI (QCMI) for quantum states ρ_FMH shared between parties F, M, and H as S(F∶H|M) = S(F|M) + S(H|M) − S(FH|M), (195) where S(A|B) := S(AB) − S(B) and S(A) := −tr[A log(A)] is the von Neumann entropy (see Sec. IV D 4). Quantum states with vanishing QCMI have many appealing properties, like, for example, the fact that they admit a block decomposition [290], as well as a CPTP recovery map W_{M→FM}[ρ_MH] = ρ_FMH that only acts on the block M [291,292]. Unlike in the classical case, the proof of this latter property is far from trivial and a highly celebrated result. States with vanishing QCMI or, equivalently, states that can be recovered by means of a CPTP map W_{M→FM} are called quantum Markov chains [44,45,290,292-296]. Importantly, for states with approximately vanishing QCMI, the recovery error one makes when employing a map W_{M→FM} can be bounded by a function of the QCMI [44,45,295,296].
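The QCMI of Eq. (195) is straightforward to evaluate numerically. The sketch below (toy states of our own choosing) computes S(F∶H|M) for a classically correlated chain, for which it vanishes, and for a GHZ state, for which it does not:

```python
import numpy as np

def entropy(rho):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log(ev)))

def ptrace(rho, keep):
    """Partial trace of a 3-qubit state, keeping the listed qubits (0=F,1=M,2=H)."""
    R = rho.reshape([2] * 6)
    for q in sorted(set(range(3)) - set(keep), reverse=True):
        R = np.trace(R, axis1=q, axis2=q + R.ndim // 2)
    d = 2 ** len(keep)
    return R.reshape(d, d)

def qcmi(rho):
    """S(F:H|M) = S(FM) + S(MH) - S(FMH) - S(M), in nats."""
    return (entropy(ptrace(rho, [0, 1])) + entropy(ptrace(rho, [1, 2]))
            - entropy(rho) - entropy(ptrace(rho, [1])))

ghz = np.zeros(8); ghz[0] = ghz[7] = 1 / np.sqrt(2)
pure = np.outer(ghz, ghz)                                  # |000> + |111>
mixed = np.zeros((8, 8)); mixed[0, 0] = mixed[7, 7] = 0.5  # classical chain

print(qcmi(mixed))  # -> 0.0   (a quantum Markov chain, recoverable from M)
print(qcmi(pure))   # -> ln 2  (conditional correlations survive)
```

The classically correlated state is a quantum Markov chain and admits a recovery map acting on M alone, while the GHZ state does not.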
As process tensors Υ_FMH are, up to normalization, quantum states, all of the aforementioned results can be used for the study of quantum processes with finite Markov order. However, the relation of quantum processes with finite Markov order to the QCMI of the corresponding process tensor is, unsurprisingly, more layered than in the classical case. We will present some of the peculiar features here without proof (see, for example, Refs. [230,231,288] for in-depth discussions).
Let us begin with a positive result. Using the representation of quantum states with vanishing QCMI provided in Ref. [290], for any process tensor Υ_FMH that satisfies S(F∶H|M)_{Υ_FMH} = 0, one can construct an instrument on the memory block M that blocks the memory between H and F. Put differently, vanishing QCMI implies (instrument-dependent) finite quantum Markov order.
However, the converse does not hold. This can already be seen from Eq. (194), where the general form of a process tensor with finite Markov order is provided. The occurrence of the second set of terms Υ̃_{FH|α_M} ⊗ ∆̄_{α_M} implies the existence of a wide range of correlations between H and F that can still persist, making it unlikely that the QCMI of such a process tensor actually vanishes. On the other hand, if the instrument J_M is informationally complete, then there is a representation of Υ_FMH that only contains terms of the form Υ_{F|x_M} ⊗ ∆_{x_M}, which looks more promising in terms of vanishing QCMI (in principle, such a decomposition can also exist when the respective tester elements are not informationally complete, which is the case for classical stochastic processes). However, when the tester elements A_{x_M} corresponding to the duals ∆_{x_M} do not commute (which, in general, they do not), then, again, the QCMI of Υ_FMH does not vanish [230,231,288]. Nonetheless, for any process tensor of this form, knowing the outcomes on the memory block (for the instrument J_M = {A_{x_M}}) allows one to reconstruct the full process tensor. Concretely, one can construct a map W̃_{M→FM} from the terms Υ_{F|x_M} and the duals ∆_{x_M}, with c_{x_M} = tr(Υ_{F|x_M}) and d_{F^o} the dimension of all output spaces on the future block, such that W̃_{M→FM}[Υ_MH] = Υ_FMH. Here, the map W̃_{M→FM} appears to play the role of a recovery map. With this procedure one can then construct an ansatz for a quantum process with approximate Markov order; the crucial point is that the difference between the ansatz process and the actual process can be quantified by the relative entropy between the two [232]. Such a construction has applications in taming quantum non-Markovian memory; as we stated earlier, the complexity of a process increases exponentially with the size of the memory, and thus contracting the memory, without loss of precision, is highly desirable. With this, we conclude our discussion of the properties of quantum processes with finite Markov order.
We now provide an explicit example of such a process.

Non-trivial example of quantum Markov order
Let us now consider the process depicted in Figure 31. Each of the systems involved carries a Hilbert space of the form H_X = H_{X_a} ⊗ H_{X_b} ⊗ H_{X_c}, where X takes values for the times and a, b, c are labels for the three qubits; whenever we refer to an individual qubit, we will label the system appropriately, e.g., L^i_a refers to the a qubit of the system L^i; whenever no such label is specified, we are referring to all three qubits.

Figure 31. (Quantum) Markov order network. A process with finite quantum Markov order with parts of M kept by H and F. The top panel shows the first process, in which part of the common cause state |e+⟩ is sent to L^i and part of |e−⟩ is sent to R^i. The process in the bottom panel has the recipients flipped. The process tensor is depicted in gray, and entanglement between parties is color-coded in green and maroon. The overall process is a probabilistic mixture of both scenarios. Still, the process has finite Markov order, because it is possible to differentiate between the scenarios by making a parity measurement on M.
The environment first prepares the five-qubit common cause states |e+⟩ = (1/√2)(α |ψ₀, 00⟩ + β |ψ₁, 11⟩) and |e−⟩, which is defined analogously, with three-qubit states φ₀, φ₁ in its first register and odd-parity states |01⟩, |10⟩ in its second. Here, we have separated the first register, which is a three-qubit state, from the second, which consists of two qubits, with a comma. The first parts of the states |e+⟩ and |e−⟩ are respectively sent to H^i and F^i. The second parts are sent either to L^i or R^i, according to some probability distribution. Let the state input at H^o be the first halves of three maximally entangled states ⨂_{x∈{a,b,c}} |Φ⁺⟩_{x′x}, with |Φ⁺⟩ := (1/√2)(|00⟩ + |11⟩); here, the prime denotes systems that are fed into the process, whereas the spaces without a prime refer to systems kept outside of it. The inputs at L^o and R^o are labeled similarly. In between times H^o and L^i, the process makes use of the second part of the common cause state |e+⟩ to apply a controlled quantum channel X, which acts on all three qubits a, b, c. Following this, qubits a and b are discarded. The ab qubits input at L^o, as well as all three qubits input at R^o, are sent forward into the process, which applies a joint channel Y on all of these systems, as well as on the first part of the common cause state |e−⟩. Three of the output qubits are sent out to F^i, and the rest are discarded. The c qubit input at L^o is sent to R^i, after being subjected to a channel Z, which interacts with the first part of the common cause state |e−⟩, i.e., the φ₀, φ₁ register.
Consider the process where |e+⟩ is sent to H^i and L^i and |e−⟩ to R^i and F^i. Next, consider the process where |e+⟩ is sent to H^i and M′^i and |e−⟩ to M^i and F^i. In the first case, there is entanglement between H^{io} and L^i, as well as between L^o R^{io} and F^i. In the second case, there is entanglement between H^{io} and L^i_c R^i_{ab}, as well as between L^i_{ab} L^o R^i_c R^o and F^i. The overall process is the average of these two, which will still have entanglement across the same cuts for generic probability distributions with which the common cause states are sent out. This process nevertheless has finite Markov order, because we can make a parity measurement on the ab parts of M^i and R^i. The parity measurement applies two controlled phases to an ancilla initially prepared in the state |+⟩, with the control registers being qubits a and b. If the two control qubits are in the states |00⟩ or |11⟩, then |+⟩ ↦ |+⟩; however, if the control qubits are in the states |01⟩ or |10⟩, then |+⟩ ↦ |−⟩. By measuring the final ancilla, which can be perfectly distinguished since it is in one of two orthogonal states, we can know which process we have in a given run; in either case, there are no FH correlations. Lastly, note that this process also has vanishing QCMI; this agrees with the analysis in Ref. [231], as the instrument that erases the history comprises only orthogonal projectors.
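The parity measurement described above is easy to verify on a small statevector simulation (a minimal sketch of our own; qubit ordering is (a, b, ancilla)):

```python
import numpy as np

def parity_outcome(a, b):
    """Probability of finding the ancilla in |+> after two controlled phases."""
    # State |a, b> (x) |+>, with basis index 4*a + 2*b + anc
    psi = np.zeros(8, dtype=complex)
    psi[4 * a + 2 * b + 0] = 1 / np.sqrt(2)
    psi[4 * a + 2 * b + 1] = 1 / np.sqrt(2)
    # CZ(a, ancilla) then CZ(b, ancilla): phase -1 when control and target are 1
    for idx in range(8):
        ctrl_a, ctrl_b, anc = idx >> 2 & 1, idx >> 1 & 1, idx & 1
        psi[idx] *= (-1) ** (ctrl_a * anc) * (-1) ** (ctrl_b * anc)
    # Project the ancilla onto |+> = (|0> + |1>)/sqrt(2)
    plus_amp = (psi[0::2] + psi[1::2]) / np.sqrt(2)
    return float(np.sum(np.abs(plus_amp) ** 2))

for a, b in [(0, 0), (1, 1), (0, 1), (1, 0)]:
    print((a, b), parity_outcome(a, b))
# Even parity (00, 11) -> P(+) = 1; odd parity (01, 10) -> P(+) = 0
```

Since the ancilla ends in one of two orthogonal states, a single measurement distinguishes the two scenarios perfectly, exactly as claimed above.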

VII. CONCLUSIONS
We began this tutorial with the basics of classical stochastic processes by means of concrete examples. We then built up to the formal definition of classical stochastic processes. Subsequently, we moved to quantum stochastic processes, covering the early works from half a century ago to modern methods used to differentiate between Markovian and non-Markovian processes in the quantum domain. Our main message throughout has been to show how a formal theory of quantum stochastic processes can be constructed based on ideas akin to those used in the classical domain. The resulting theory is general enough that it contains the theory of classical stochastic processes as a limiting case. On the structural side, we have shown that a quantum stochastic process is described by a many-body density operator (up to a normalization factor). This is a natural generalization of classical processes, which are described by joint probability distributions over random variables in time. Along the way, we have attempted to build intuition for the reader by giving several examples.
In particular, the examples in the last section show that, in general, quantum stochastic processes are as complex as many-body quantum states. However, there is beauty in the simplicity of the framework, which encapsulates complex quantum phenomena in an overarching structure. We restricted our discussion in the final section to Markov processes and Markov order, but needless to say, there is much more to explore. Complex processes, in the quantum or classical realm, have many attributes that are of interest for foundational or technological reasons, and we cannot do justice to most of these facets of the theory in this short manuscript. On the plus side, many interesting problems are left open for current and future researchers. Our tutorial has barely touched the topic of quantum probability and associated techniques such as quantum trajectories, quantum stochastic calculus, and the SLH framework [297]. This is an extremely active area of research [6,205,256,298-300] with many overlaps with the ideas presented here; however, a detailed cross-comparison would form a whole tutorial of its own.
We bring this article to a close by discussing some important applications of the theory of open quantum systems. The foremost application of the theory of quantum stochastic processes is quantum control, e.g., dynamical decoupling [301-304] (and understanding processes that cannot be decoupled [305]), decoherence-free subspaces [306,307], quantum error correction [308], the quantum Zeno effect [309-311], control of biological systems [312], and so on. In addition, these tools can be used not just to battle noise but to harness it for transport and quantum information tasks [313-317]. Already, and even more so in the future, they (will) enable quantum technologies in the presence of non-Markovian noise [318-320].
As we attempt to engineer ever more sophisticated quantum devices, we will need correspondingly sophisticated accounts of the noise due to the environment. These applications will be within reach once we can characterize the noise [321-328] and understand how quantum processes and memory effects can serve as resources [329-334].
There are also foundational applications of the frameworks discussed above. For instance, understanding how the theory of thermodynamics fits with the theory of quantum mechanics requires better handling of interventions and memory, and there is already progress on this front [335-338]. This framework also provides a method for building a classical-quantum correspondence, i.e., determining which quantum stochastic processes look classical [235,236]. Furthermore, it enables one to understand the statistical nature of quantum processes: when is the memory too complex [284,285,339,340], and when does a system look as if it has equilibrated [341,342]? These latter questions are closely related to those aiming to derive statistical mechanics from quantum mechanics [343-346]. In general, non-Markovian effects in many-body systems [347-349] and in complex single-body systems [350,351] will be of keen interest, as they contain rich physics.
Finally, the tools introduced in this article are closely related to those used to examine the role of causal order, or the absence thereof, in quantum mechanics. As they are tailored to account for active interventions, they are used in the field of quantum causal modeling [227,247,259,352,353] to discern causal relations in quantum processes. Beyond such causally ordered situations, the quantum comb and process matrix frameworks have been employed to explore quantum mechanics in the absence of a global causal order [245,354], and it has been shown that such processes would provide advantages in information-processing tasks over causally ordered ones [354-359]. The existence of such exotic processes is still under debate, and the search for additional principles to limit the set of 'allowed' causally disordered processes is an active field of research [360]. Nonetheless, the tools used to describe them are akin, both mathematically and in spirit, to the process tensors we introduced for the description of open quantum processes, demonstrating the versatility and wide applicability of the ideas and concepts employed in this tutorial.