Probing non-Markovian quantum dynamics with data-driven analysis: Beyond"black-box"machine learning models

A precise understanding of the influence of a quantum system's environment on its dynamics, which is at the heart of the theory of open quantum systems, is crucial for further progress in the development of controllable large-scale quantum systems. However, existing approaches to account for complex system-environment interaction in the presence of memory effects are either based on heuristic and oversimplified principles or give rise to computational difficulties. In practice, one can leverage on available experimental data and replace first-principles simulations with a data-driven analysis that is often much simpler. Inspired by recent advances in data analysis and machine learning, we propose a data-driven approach to the analysis of the non-Markovian dynamics of open quantum systems. Our method allows, on the one hand, capturing the most important characteristics of open quantum systems such as the effective dimension of the environment and the spectrum of the joint system-environment quantum dynamics, and, on the other hand, reconstructing a predictive model of non-Markovian quantum dynamics, and denoising the measured quantum trajectories. We demonstrate the performance of the proposed approach with various models of open quantum systems, including a qubit coupled with a finite environment, a spin-boson model, and the damped Jaynes-Cummings model.

A precise understanding of the influence of a quantum system's environment on its dynamics, which is at the heart of the theory of open quantum systems, is crucial for further progress in the development of controllable large-scale quantum systems. However, existing approaches to account for complex system-environment interaction in the presence of memory effects are either based on heuristic and oversimplified principles or give rise to computational difficulties. In practice, one can leverage on available experimental data and replace first-principles simulations with a data-driven analysis that is often much simpler. Inspired by recent advances in data analysis and machine learning, we propose a data-driven approach to the analysis of the non-Markovian dynamics of open quantum systems. Our method allows, on the one hand, capturing the most important characteristics of open quantum systems such as the effective dimension of the environment and the spectrum of the joint system-environment quantum dynamics, and, on the other hand, reconstructing a predictive model of non-Markovian quantum dynamics, and denoising the measured quantum trajectories. We demonstrate the performance of the proposed approach with various models of open quantum systems, including a qubit coupled with a finite environment, a spin-boson model, and the damped Jaynes-Cummings model.

I. INTRODUCTION
Understanding and predicting dynamics of quantum many-body systems is a formidable challenge, and an essential step towards the development of quantum computing devices [1]. Recent experiments [2][3][4][5][6][7][8][9] have demonstrated the realization of quantum computing and simulation protocols with quantum many-body systems of an intermediate scale (dozens of particles). Nevertheless, even with the achieved outstanding level of control in these experiments it remains challenging to isolate a quantum system from the environment [9], which is the main source of decoherence causing errors. This fact significantly limits applications of such systems for quantum information processing and prevents the full experimental verification of various effects related to the nonequilibrium dynamics [10][11][12].
The theory of open quantum systems [13][14][15][16] aims to provide a complete description of such nonequilibrium dynamical effects at the quantum level. Within such a framework, one can arbitrarily partition a quantum manybody system and its environment and describe each part as an individual open system. This powerful idea allows studying a nonequilibrium many-body system by considering smaller parts of it; for example, an individual qubit in a large ensemble of qubits interacting with each other and their environment. At first sight, dealing with one qubit instead of an ensemble of entangled qubits seems to be a much simpler problem, because it does not suffer from the computational difficulties raised by the exponentially large Hilbert space in the number of subsystems. Indeed, the dynamics of the subsystem's density matrix (e.g. one qubit density matrix) can be described exactly by the Nakajima-Zwanzig equation [17,18], which contains a convolution (memory) kernel. Time-convolution is necessary to take into account the non-Markovianity of the general open quantum dynamics. Non-Markovianity, i.e. effects of memory, are among the most challenging problems to solve for the proper description of the general open quantum system [14]. The problem is that the exact derivation of the Nakajima-Zwanzig equation is of the same complexity as the calculation of the coupled system and environment quantum dynamics [13].
This problem can be approached by considering different approximations such as the Born-Markov approximation [16,19] or simplified models of open quantum dynamics that are exactly solvable [20][21][22][23]. However, the class of problems that can be studied analytically is limited to a handful of exceptional situations. An alternative approach is to use numerical techniques such as the non-Markovian quantum state diffusion [24][25][26][27], the hierarchical equations of motion [28,29], the time-evolving matrix product operators [30], the method based on optimized auxiliary oscillators [31], the dressed quantum trajectories method [32], the Dirac-Frenkel time-dependent variational approach with the Davydov ansatz [33], or the time-evolving density with orthogonal polynomials algorithm [34,35], just to name a few. At the same time, open quantum systems studied under real experimental conditions are usually too complex and cannot always be described with existing numerical methods developed for illustrative models. For instance, most of the numerical methods, including those listed above, are devoted to simulating systems with an environment made of non-interacting quantum oscillators [14] or non-interacting fermions [36,37], so they are unable to simulate systems with more complex environments, such as the spin environment [38][39][40][41][42]. Moreover, since in a real experimental setting the joint Hamiltonian of a system and its environment is often unknown, exhaustive spectroscopy experiments are required to recover the Hamiltonian parameters and simulate the actual system's dynamics at the microscopic level [43,44].
Since building a precise analytical or numerical description of open quantum dynamics from a microscopic model is a notoriously difficult problem in the general case, one can try to build a precise data-driven model of open quantum dynamics using experimentally measured data. In addition, data-driven approaches do not require reconstruction of the joint system and environment microscopic model. A number of data-driven techniques for the analysis of the dynamics of open systems have been recently proposed. For example, Ref. [45] introduces a method for a data-driven reconstruction of the discrete-time Nakajima-Zwanzig equation, namely the Transfer-Tensor method (TTM) [46][47][48][49][50][51][52][53][54], that can be used for dynamics prediction. In Ref. [55] a recurrent neural network based method [56] for data-driven identification of open quantum dynamics has been developed. Although these methods provide efficient predictive models, they serve as "black boxes", i.e. they do not allow one to unravel the physical picture of the underlying processes [57][58][59].
One of the most natural ways to describe open quantum dynamics is embedding the non-Markovian system dynamics into a Markovian dynamics of the system and the effective environment of a finite dimension [60]. The physical intuition behind such embedding is summarized in Fig. 1. One can think of the environment that induces the system's non-Markovian dynamics as a two-component system consisting of effective and far environments. The effective environment is responsible for memory effects: some information about the system is recorded into the effective environment and then affects the system at later time. The dimension of the effective environment determines the complexity of the non-Markovian system's dynamics. In contrast to the effective environment, the interaction with the far environment does not entail a backflow of information, and leads to a purely dissipative dynamics. In this way, the system and the effective environment instantiate a Markovian embedding, and their collective dynamics is both dissipative and Markovian. We emphasize that Markovian embedding is not just a "black box" model of open quantum dynamics since it provides insights into the properties of both the system and its environment. Moreover, Physical intuition underpinning the Markovian embedding for the non-Markovian dynamics of the finite-dimensional system interacting with a high-dimensional (possibly infinite-dimensional) environment. One can split the system's environment into two parts: The finite-dimensional effective environment, which is responsible for memory effects in the system's dynamics, and the remaining far environment responsible for dissipation. It allows one to consider a Markovian dissipative dynamics of the finite-dimensional extended system consisting of the original system and its effective environment.
Markovian embedding can be reconstructed with only information about the non-Markovian dynamics of a system, which is accessible in real experimental conditions [61,62].
In this work, we harness data-driven Markovian embedding reconstruction, and present a complete framework for general open quantum dynamics analysis. Our framework can be seen as a combination of the alternative "datafriendly" view on non-Markovian quantum dynamics, that we develop in first sections, and linear machine-learning methods [63][64][65][66]. Input data include trajectories of a system at discrete time steps, a guess of the memory depth of the non-Markovian process, and the level of noise generated during experimental reconstruction of trajectories. As output, our framework returns the predictive model of non-Markovian quantum dynamics, the dimension of the effective environment [62,67,68], the eigenfrequencies of the joint system and environment quantum dynamics, and the denoised trajectories of the system as a valuable byproduct. Note that our framework has the distinct advantage to rely on the linear methods of machine learning, which are known to be scalable, data efficient and yield the exact solution. We illustrate the performance of our scheme with several paradigmatic examples of realistic models such as a qubit coupled with a finite-dimensional environment, a spin-boson model, and the damped Jaynes-Cummings model.

A. Quantum trajectories' Markovian dynamics
In this section we discuss an alternative view on non-Markovian open quantum dynamics that is the basis of the proposed data processing scheme. The alternative lies in turning from the density matrix based description of non-Markovian quantum dynamics to quantum trajectories based description, where a quantum trajectory is seen as a concatenation of several subsequent in time density matrices. This description is less physics-motivated and more suitable for the data-driven analysis we propose. Consider a d-dimensional quantum system undergoing non-Markovian dynamics [69] due to the interaction with a d E -dimensional environment. Throughout the paper, we assume that the Hamiltonian driving dynamics of the system and environment as a whole remains time-independent. In the development that follows, d and d E are finite, though the environment may consist of a large number of subsystems (particles), so one can have d E exponentially large in the number of particles. Assume one has access to the system states (density matrices) (t) ∈ C d×d at consecutive and equidistant time instants t = 0, 1, 2, . . ., separated by the constant experimental time resolution τ , and one is able to reconstruct (t) using quantum state tomography. Throughout the paper we use τ as time units, i.e. the physical time is expressed as τ t. Performing the tomographic experiment multiple times, one reconstructs K-element sequences of the system's states: which we refer to as system trajectories. Assuming that the observed non-Markovian dynamics has a finite memory depth, for a sufficiently large value of K, there exists a linear map M that connects two subsequent trajectories through a Markovian master equation: The rigorous criteria for K to be sufficient (existence of M ) is given in Appendix A. One can think of K as the number of previous in-time density matrices of the system, which is enough to determine the next in-time state of the system. Note also that while one can express the physical memory depth as τ K, which is a more fundamental quantity, K itself is more suitable for data-driven analysis. In the Appendix A we also show that for a finite d E one always has a finite sufficient K that does not exceed d 2 d 2 E . Note, however, that in the worst case, the minimal sufficient K scales with the number of environment's subsystems (particles) exponentially and Eq. (2) quickly becomes intractable for a numerical analysis. Our conjecture is that this worst case corresponds to a dynamical chaos regime, as we discussed in more detail in Appendix A, for which long-time dynamics prediction is impossible. Although, one can use the data-driven analysis (discussed in the next section) to identify chaos in the system's dynamics, our scheme is suitable only for finite-memory (non-chaotic) non-Markovian dynamics. In what follows, K is always assumed to be sufficient. We also refer to a sufficient K as to a memory length.
The general idea of considering the Markovian dynamics of T K (t) instead of the non-Markovian dynamics of (t) is also known as a time-delay embedding and it is routinely employed across wide variety of contexts, including, Koopman operators [70][71][72][73], closure modeling [74], time series modeling [75,76] and reinforcement learning [77]. One can also think of the time-delay embedding as a particular way to construct a Markovian embedding [60,61,67,[78][79][80][81] of non-Markovian quantum dynamics. The framework being developed here also seems very close in spirit to the TTM. However, there is a substantial difference. The TTM operates with a map from the space of trajectories to the space of density matrices, while here we operate with a map from the space of trajectories to the same space. This framework turns out to be more fruitful since it allows us to study the environment properties in addition to the system's dynamics prediction. In particular we show that the observed trajectories dynamics is the same, up to linear rescaling, as joint dynamics of the system and so called effective part of the environment. This finding is illustrated in Fig. 2. The given figure and the concepts involved are discussed in detail below. Since the main distinguishing feature r-dimensional relevant subpsace Space of joint system + environment states (hidden from the experimentalist)

Ambient space of trajectories (experimentally accessible)
r-dimensional subspace of system's trajectories Results of tomographic measurements FIG. 2. Connection between the joint dynamics of the system and its environment with the observed dynamics of trajectories. The curve (i) sketches the joint dynamics of the system and its environment SE(t) driven by the quantum channel Φ. The curve (ii) sketches the joint dynamics of the system and the effective environment SE (t) driven by the map Φ (rr) . The curve (iii) sketches the observed trajectories dynamics TK (t) driven by the map M . There is the one-to-one correspondence between curves (ii) and (iii) that is described by the linear map φK . Both dynamics (ii) and (iii) are Markovian, their Master equations are connected via φK and they can be seen as 'shadows' of the dynamics (i).
of the presented method is its ability to explore the physics of the environment, the central question we address in this section is: what knowledge about the system and its environment causing non-Markovianity one can extract from the analysis of the Markovian dynamics of system's trajectories? B. Unraveling the joint system and environment dynamics via the trajectories' dynamics To start addressing this question, let us first note that states (t) can be expressed via the corresponding joint system and environment states SE (t) ∈ C dSE×dSE as follows where d SE := dd E and Tr E denotes a partial trace over the environment. Let us also introduce a quantum channel Φ describing the discrete-in-time evolution of the joint states: Since Tr E is a linear map and the dynamics of SE (t) is linear as well, one can introduce a linear map φ K returning T K (t) from SE (t): The precise definition of φ K is given in the Appendix A; for the purposes of this section we only need the linearity property of this map. The analysis of φ K provides insights into the connection between the system trajectories' dynamics and the actual joint system and environment dynamics. It turns out that the linear map φ K defines which 'part' of the joint system and environment dynamics affects the observed trajectories. Let us call the irrelevant (relevant) subspace the kernel (the support) of φ K . The corresponding projectors take the form: where Id denotes the identity map, φ + K is the Moore-Penrose inverse of φ K , and superscripts (i) and (r) stand for irrelevant and relevant respectively. These projectors allow us to introduce the 'irrelevant' and 'relevant' parts of the joint state that read Since        SE (t) no longer have a unit trace. Moreover, there is no guarantee that they are positive semidefinite or even Hermitian. One can think of the projection on the relevant subspace as an analog of the partial trace over those degrees of freedom of the environment that do not contribute to the memory effects. We note that the similar projections on the relevant and irrelevant subspaces, yet with different meanings, are employed in the Nakajima-Zwanzig projection method [13].
An important characteristic of the linear map φ K is the rank r of its matrix. It provides the dimension of the relevant subspace. In the next subsection, we show that the dynamics in the relevant subspace is independent of the dynamics in the irrelevant subspace. So, to fully describe the non-Markovian dynamics of the system, it is enough to consider only the relevant subspace that can be seen now as the space of the joint system and effective environment density matrices. By the construction, d eff E provides the minimal dimensionality of the environment that can describe the non-Markovian behavior of the system. This observation allows us to introduce the dimension of the effective environment through the following relation r = d 2 [d eff E ] 2 , which simply states that the joint system and effective environment density matrix has size dd eff E × dd eff E . Note, however, that there is no guarantee that d eff E = r/d 2 results in an integer number, since r can be an arbitrary integer. So, to make d eff E an integer, one can either round it up or down. Rounding down contradicts the fact that r/d 2 is an underestimation of d eff E since some of the degrees of freedom of the environment may not affect the system dynamics. Therefore, one needs to round r/d 2 up, or in other words one needs to complement the relevant subspace with additional degrees of freedom that, however, do not affect the dynamics of the system. Clearly, this can always be done and not in a unique way. We use this trick only here and for the sole purpose to justify the following final estimation of the effective environment dimension that reads: where · stands for the rounding up operation. Note that the value of r and d eff E are independent of K because K is sufficient (see Appendix A). The concept of the effective environment's dimension has been actively studied in recent literature [61,67,68,82,83] and has been shown to be an important characteristic of the environment. One can think of d eff E as a quantitative indicator of non-Markovianity, the dimensionality of the memory, or the complexity of non-Markovian dynamics. We remark that an exponentially large d eff E corresponds to utterly complex dynamics / dynamical chaos.
C. Equivalence between the trajectories' dynamics and the dynamics of the system and its effective environment Next, consider the r-dimensional subspace C K ≡ Im(φ K ) ⊆ C K×d×d , where Im stands for the image of φ K . All experimentally accessible trajectories T K (t) lie in this subspace by definition; therefore, we refer to it as trajectories subspace. Note that φ K is the linear bijection between the relevant subspace and the trajectories subspace. The Moore-Penrose inverse φ + inverts this bijection. The bijection between this subspaces is complemented by the following identity between Φ, M , and φ K : One can think about Eq. (9) as about the following commutative diagram: It says that one can first make a time step forward in the space of the joint system and environment states and then go to the corresponding trajectory via φ K or first go to a trajectory via φ K from a given join state and then make a time step forward by applying M in the trajectories subspace. Note that the identity Eq. (9) follows from the fact that M describes the temporal evolution of any trajectory. The existence of the bijection between the relevant subspace and C K and Eq. (9) mean that the trajectories' dynamics and the joint system and effective environment dynamics are identical in a sense that is explained below. Let us decompose Φ into four terms where Φ (xy) := π K , x, y ∈ {i, r}. One can see that each term has a clear operational meaning: Φ (rr) and Φ (ii) govern the dynamics in the relevant and irrelevant subspace respectively, while Φ (ir) and Φ (ri) correspond to 'crossflows' between the subspaces. Note, however, that Φ (xy) are neither trace-preserving nor completely positive maps in general. Let us substitute the decomposition Eq. (10) in Eq. (9) One can compare supports of all terms of the identity above and identifies two linearly independent identities that read By noticing that φ K Φ (ii) = 0 and φ K Φ (ir) = 0 due to the fact that φ K maps any vector from the irrelevant subspace to zero one finally gets The identity 0 = φ K Φ (ri) implies the absence of 'flow' from the irrelevant subspace to the relevant one. In other words Φ (rr) fully describes the dynamics in the relevant subspace: (r) since there is no effect of the irrelevant subspace on the dynamics in the relevant subspace. Moreover, one can see that Eq. (14) is a Markovian master equation. Let us multiply the identity M φ K = φ K Φ (rr) by φ + K on the right side Bearing in mind that the action of M on the orthogonal complement of C K does not have a physical meaning and can be set arbitrarily, we set it equal to zero. Hence, M = M φ K φ + K , and we get Now we can claim that the dynamics of trajectories and the joint dynamics of the system and its effective environment are identical up to a linear rescaling. Indeed, both dynamics are Markovian, there is the linear bijection between states, i.e. T (t) = φ K (r) SE (t) and the corresponding linear identity between master equations M = φ K Φ (rr) φ + K . This also implies that M and Φ (rr) have completely the same eigenvalues and their left and right eigenvectors are connected via φ K . Moreover, since Φ has block triangular form w.r.t relevant and irrelevant subspaces (Φ (ri) = 0), the set of eigenvalues of Φ includes all the eigenvalues of Φ (rr) , and so the eigenvalues of M . The overall concept described above is illustrated in Fig. 2.
The fact that the dynamics of the system and environment factorizes into two parts: the "relevant" part, which is Markovian, and the independent "irrelevant" part, may seem artificial at first glance. Indeed, this property should imply some symmetry that is not inherent to an arbitrary system. In fact, the exact factorization in general, except for some trivial cases of two non-interacting subsystems or highly symmetric cases, results trivially in the "relevant" part that covers the entire state space and a one-dimensional irrelevant part. However, an approximate factorization (with some acceptable error) may be far more productive and lead to a significant dimensionality reduction of the "relevant" part compared to the entire state space. The data-driven approach we introduce later builds an approximate factorization by setting a threshold defined by the noise amplitude.
The main results of this section may be summarized as follows. If a non-Markovian dynamics has a finite memory length K then: (I) The dynamics of trajectories is driven by the Markovian master equation (2); (II) All the experimentally accessible trajectories lie in a subspace of dimension r = rank(M ) (trajectories subspace) and form an r-dimensional Markovian embedding; (III) There exists an r-dimensional relevant subspace of the joint system and environment density matrices' space. The dynamics in the relevant subspace is Markovian and identical to the dynamics of trajectories up to the bijective linear transformation φ K . The dynamics in the orthogonal complement of the relevant subspace does not affect dynamics of trajectories at all; (IV) One can think of the dynamics in the relevant subspace as the dynamics of the joint system and effective environment. One can estimate the dimension of the effective environment d eff E using Eq. (8). Both r and d eff E are quantitative indicators of the non-Markovian dynamics complexity; (V) As a consequence of (III), all the eigenvalues of the dynamical maps M and Φ (rr) are the same, and are part of the set of eigenvalues of Φ.
These claims allow one to gain valuable knowledge about the system and its environment from the master equation describing the system's trajectories and the dimension of the subspace where these trajectories are embedded. The only missing ingredient at this stage, is a scheme to extract these objects from noisy measured trajectories.

III. ALGORITHMS FOR PROCESSING OF QUANTUM TRAJECTORIES
A. Effective environment dimension identification and data denoising As we discussed in the previous section, the only data assumed to be experimentally accessible are the system's discrete-in-time quantum trajectories. Those quantum trajectories can be reconstructed via quantum state tomography [84,85], i.e., for each trajectory, one runs multiple experiments with completely identical initial conditions and collects measurement outcomes statistics in order to reconstruct underlying density matrices in different moments of time. Since in real experiments each measurement takes some finite time τ m , one has a natural restriction τ ≥ τ m . Note also that the time resolution τ should be chosen small enough to capture all necessary short-time dynamical effects but not too small not to make an experiment lasts too long collecting too many data-points. Note, that subsequent experiment runs must be sufficiently separated in time, in order to eliminate memory effects from the previous run. Let us assume that for L different initial states of the system the measured data set consists of quantum trajectories of length N > K, where N is the total number of discrete time steps per each trajectory and K is assumed to be sufficient for the Markovian dynamics of trajectories. This set reads where i enumerates initial states. To prepare data for the analysis we construct their time-delay embedding. We slice each trajectory T (i) as it is shown in Fig. 3a. The resulting set of chunks {T First, one can estimate the subspace C K as the linear span of all the trajectories chunks, i.e.
The question arises: how many trajectories L have to be measured in order to provide a reliable estimate of C K ? For noiseless trajectories it turns out that in most cases one needs only one long enough trajectory. This can be explained as follows. The estimate Eq. (18) for only one trajectory reads Note that unless the initial trajectory chunk T (1) covering the entire trajectories subspace. Therefore, only one trajectory is enough. This is also supported by the numerical experiments that are demonstrated further. For example, in Fig. 10 one can see that the average error of prediction for quite a few numerical experiments with L = 1 and different other parameters, including noise amplitude, is about 0.06, which is not that much. However for highly symmetric systems it might be the case that span M t T (1) covers only part of the trajectories subspace. In this case, one needs to sample trajectories with different initial states. It can be done in different ways. For example, before measuring a trajectory one can perturb the system in various ways, e.g. measure it on an arbitrary basis or apply some control signal to it. In our numerical experiments, we prepare a unique initial state as follows. We set the initial joint system and environment state in the stationary thermal state and then replace the system with a new one in some state of our choice. This helps us to explore the trajectories subspace better. To determine the value of L in practice, one can sample a new trajectory with a unique initial state until the estimate Eq. (18) saturates. All the trajectories bear some noise induced by finiteness of the measurements statistics, imperfections of measurement devices, etc. This results in an additive noise where δT is a noise term. Note that this is not a noise induced by the interaction of the system with the environment that has an entirely different nature. The linear span of noisy trajectories always tends to the ambient space C d×d×K with the growing number of trajectories in a data set and does not correspond to the genuine subspace C K ⊆ C K×d×d of trajectories. Thus, one requires a method that identifies the genuine trajectories subspace in the presence of background noise. In other words, one needs to find a low-dimensional hyperplane of an appropriate dimension such that the data points (trajectories) lie as close to this hyperplane as possible. The dimension of the hyperplane is chosen to separate signal from noise. This is a typical machine learning task usually performed by using a principal component analysis (PCA) with automatic rank determination. We found the automatic rank determination procedure described in [63] the most suitable for our case. The overall algorithm is summarized here: Step 1. One concatenates all the trajectories chunks into a matrix of size Kd 2 × L(N − K + 1), which has the form of a Hankel matrix. Here, trajectories chunks can be thought of as vectors of length Kd 2 . In the noiseless case, it is enough to take the linear span of the Hankel matrix columns to determine the subspace C K , but we need to process them further in order to distinguish the signal from background noise.
Step 2. One performs a singular value decomposition (SVD) of the Hankel matrix H = U SV † , where U and V are isometric matrices and S is the diagonal matrix with singular values on the diagonal arranged in non-increasing order.
Step 3. Following the results of [63] one separates "noisy" singular values from "signal" singular values by comparing them with a threshold where f is defined as follows f (β) = 2(β + 1) + , σ is the standard deviation of noise. If a singular value is greater (less) than the threshold, one considers it as a signal (noise) singular value. Since the quantum state tomography requires an enormous number of measurement outcomes to build a precise estimations of system's states, the size of measurements statistic is the main bottleneck of a tomographic experiment. Therefore, the observed noise is mainly the result of a statistical error caused by the finiteness of the quantum measurements statistics N samples per time step. This allows one to estimate the standard deviation of noise σ as N −1/2 samples . More precisely, the standard deviation σ of the resulting noise reads σ ≈ N −1/2 samples for the standard tomographic protocols or σ ≈ N −1 samples for adaptive tomographic protocols with a protocol-specific prefactor [86]. If trajectories result from a Monte-Carlo simulation [13] one can estimate σ from the number of samples as σ ≈ N −1/2 samples . The threshold s differs from the threshold in [63] by a factor √ 2. This is motivated by the fact that the amplitude of the absolute value of a complex valued noise is √ 2 times larger. The intuition behind such a separation of singular values is simple: one uses the random matrix theory to determine the maximal singular value s max of a random matrix with the standard deviation of a matrix elements σ. Then, s ≈ s max can be seen as a threshold that separates "noisy" singular values from "signal" singular values; Step 4. One truncates the SVD of the Hankel matrix as followŝ where we use MATLAB/NumPy notations to represent slices of matrices (see Appendix C), η is the number of singular values that are greater than s. The corresponding "truncated" Hankel matrix readsĤ =ÛŜV † . This algorithm provides a data-driven estimation of the genuine trajectories subspace C K , its dimension, and the dimension of the effective environment: where span(·) denotes the linear span of matrix columns and η serves as a data-driven estimation of the genuine trajectories subspace dimension r. But these are not the only useful outcomes of the algorithm: it also reduces noise in the measured trajectories. The idea is that since noise does not preserve the subspace C K , it takes trajectories out of this subspace (see Fig. 4). Meanwhile, the orthogonal projection of trajectories back onto the subspace C K reduces the effect of noise because the component of noise orthogonal to the subspace C K is thus eliminated. Note, that one can reconstruct the whole trajectory of length N from its chunks as shown in Fig. 3b. In order to establish a connection between the proposed denoising method with previously developed techniques for denoising and quantum tomographic reconstruction from limited data, we note that usually their main idea is to employ an assumption about underlying states model imposing certain restrictions on admissible states. In other words, these methods use prior information about an unknown state in addition to the observed measurements data, which mitigates requirements to the amount of data. For example, one can use the low-rank assumption to reconstruct a fairly pure state from limited data [87], or assume limited entanglement to make the reconstruction of a many-body state from measurement outcomes possible [88]. However, our technique utilizes another type of prior information. It uses prior information about the temporal structure of a system's dynamics, not about its state. It fixes some level of complexity of the dynamics that reduces the required amount of data and leads to noise reduction. Our denoising technique is compatible with conventional methods of noise reduction, and in combination, they can lead to further improvement of data efficiency. We verify the performance of denoising in the numerical experiments discussed further below.

B. Identification of the Markovian master equation governing trajectories dynamics
Denoised trajectoriesT where · 2 stands for the 2-norm. The first line corresponds to the minimization of the difference between left and right hand sides of the equationT K (t + 1) = M [T K (t)] driving the dynamics of trajectories. The second line restricts consideration only to matrices of rank η. As the dynamics of trajectories is embedded in the subspace of dimension η, there is no need to consider matrices of other ranks. Note that η is not only an estimate of r, but also the optimal choice of one of the hyperparameters of the predictive model. Another hyperparameter of the predictive model is the memory depth K. Search of η corresponds to the selection of the predictive model. The larger the value of the hyperparameter is, the more expressible the corresponding model is (i.e., more complex data can be described), but the more data is needed to reconstruct M (lower data efficiency). This is the bias-variance trade-off typical for the model selection tasks in machine leaning [89]. Usually one selects the best model by minimization of the error on a validation set, which is not used for model training. This model selection technique is called cross-validation [90], and requires splitting data on training and validation sets and exhaustive validation set's error minimization via brute force adjustment of hyperparameters. However, for some models, e.g. relevance vector machine [66,91], one selects the best model automatically without a validation set by using the Bayesian interpretation of the model selection task. The idea of such an automatic model selection follows the Occam's razor principle: one should select the simplest model among all possible that still yet manages to describe the observed data [66]. The correct model selection guarantees the best data efficiency and robustness to noise and corresponds to the best machine learning practices. One sees that a PCA algorithm with automatic determination of the subspace dimension solves such a model selection task. It chooses the model with the smallest value of the hyperparameter (the simplest model) that still describes the measured data. Therefore, our approach is equipped with the automatic model selection, which improves the data efficiency compared to previously proposed methods such as TTM [45] as discussed in Sec. V. The effect of noise reduction also can be seen as a consequence of the correct model selection. It also makes the model less sensitive to selection of another hyperparameter K. One can safely overestimate K by setting it knowingly large, the automatic model selection prevents the effect of overfitting. This makes the selection procedure for K very simple: one can set K to be comparable with N , e.g. K = N/2. The only important restriction is not to make the number of trajectories (N − K + 1)L in a data set too small. The proper selection of η prevents overfitting. We show this numerically in Section IV A. The optimization problem Eq. (25) is exactly solvable. Indeed, this is an example of linear regression among the most robust methods of machine learning. To write down the solution of Eq. (25), let us introduce two matriceŝ The matrixX is the matrixĤ without L last columns and the matrixŶ is the matrixĤ without L first columns. In other words, columns ofŶ are trajectories subsequent in time to the corresponding columns ofX. Using this fact, one can rewrite the optimization problem Eq. (25) in the following form where · F stands for the Frobenius norm, and M is seen as a matrix rather than as an abstract linear map. The solution of this optimization problem thus reads: whereX + is the Moore-Penrose inverse of X. Indeed, M has the rank η by construction and it is the global minimum of the objective function. Note that it is more efficient both for numerical complexity and memory to find the solution Eq. (28) by using the dynamic mode decomposition (DMD) method [64,65], rather than straightforwardly calculate Y X + . In our numerical experiments we use the DMD to find M . Finally, we summarize the data processing scheme in Fig. 5, that basically includes two steps, (i) environment dimension identification described in the previous subsection and (ii) predictive model reconstruction described in the given subsection. The proposed data processing scheme allows one to obtain (i) information on the quantum environment such as the dimension of the effective environment, eigenvalues of the quantum channel driving the discrete-in-time joint dynamics of the system and its environment, (ii) the master equation that predicts the system's dynamics, and (iii) denoised data set of trajectories. As inherent features of the scheme we note that all the used algorithms (i.e. PCA with automatic rank selection and the linear-regression-based reconstruction of M ) converge with mathematical guarantees to the optimal solution and the scheme is equipped with the automatic model selection that improves data efficiency and robustness to noise.

IV. NUMERICAL RESULTS
A. Probing non-Markovian quantum dynamics with a predefined effective environment As the first example of non-Markovian quantum dynamics that we analyze with the proposed method, we consider a model with a predefined effective environment, i.e. with a predefined Φ (rr) driving the dynamics of the system and the effective environment as a whole. We choose Φ (rr) in such a way, that its dimension can not be reduced further. This model is chosen to validate the claimed ability of the proposed method to reconstruct properties of the effective environment i.e. the effective dimension of the environment and eigenvalues of the joint system and environment dynamics. It can be easily done since all the properties of Φ (rr) are predefined and known. In this subsection we also (i ) validate the ability of our method to predict a non-Markovian quantum dynamics via the reconstructed M ; (ii ) validate the ability of our method to reduce noise in data by checking how the projection of the observed trajectories on the principle subspace reduces the amplitude of noise; (iii ) study the sensitivity of the proposed method to a choice of the hyperparameter K; and finally (iv ) we study how the performance of the method depends on the size of data set, i.e. number of trajectories L in a data set.
where (r) SE ∈ C dd eff E ×dd eff E is the joint system and effective environment density matrix that also plays the role of the state in the relevant subspace, d eff E is the dimension of the effective environment, is the state of the system, L is the Gorini-Kossakowski-Sudarshan-Lindblad (GKSL) generator [13,15,16,92,93] of the Markovian quantum dynamics of the system and the effective environment, τ is the constant discrete time step. One generates the GKSL generator L randomly as described in the Appendix B. The randomness of L prevents further dimensionality reduction of the effective environment. We use the model above to generate a number of data sets. We fix d = 2 (the system is a qubit) and for d eff E ranging from 2 to 6 generate a set of random GKSL generators. For each GKSL generator and for total simulation times N = 150, and 200 we simulate L = 4 quantum trajectories T It means that the system and the environment are in a joint thermalized state before the observation process, and at the moment the process starts, we replace the system by a new random pure state. To address dynamics prediction accuracy, for each value of d eff E , we generate an additional 'test' trajectory T test N (0), which is not employed in the training procedure, i.e. its initial state is also randomly generated but different with initial states of {T (i) In order to simulate noise appearing during data acquisition of trajectories, we add i.i.d. Gaussian noise with zero mean and variance σ 2 to the real and imaginary parts of each entry of all trajectories (both employed in the training and the test trajectory). In what follows, we consider σ to be known in advance, because in a typical tomographic experiment or in the Monte-Carlo simulation one can estimate σ from the statistics size. We also note that the considered values of σ are consistent with typical numbers of samples processed in tomographic experiments N samples ∼ 10 3 . . . 10 6 (see e.g. [86]). To demonstrate the denoising ability of our approach we keep record of both noisy and noiseless versions of each trajectory, i.e. each trajectory of all data sets {T (i) and all test trajectories T test N (0). We start discussion of the results of data processing from the reconstruction of the trajectories subspace dimension r and the effective dimension of the environment d eff E . The comparison of the exact r and the reconstructed from data one for the fixed memory depth hyperparameter K = 75, several noise amplitudes σ, and two values of a data set trajectories length N , is presented in Fig. 6a. One can see that the reconstructed r underestimates the exact value of r that is equal to d 2 d 2 E though approaches it with decreasing the noise level and increasing the length of trajectories forming a data set. In the case of σ = 0, reconstructed r precisely fits the exact one. The tiny difference appearing for large r is due to negligible memory effects that do not fit into given memory length K. For finite σ it is impossible to recognize some weak memory effects in front of noise, this leads to underestimated r value. The situation becomes critical when σ = 0.1, as the algorithm is almost unable to recognize the signal against the background noise. We also compare the exact effective dimensions of the environment d eff E with its data driven reconstruction r/d 2 , where r is reconstructed from the data. The comparison is given in Table I. We see that for small noise amplitudes and sufficiently long trajectories in a data set, the value of d eff E is reconstructed exactly.  Next, we validate the ability of the proposed method to predict the dynamics of the system. The prediction is built as follows. One reconstructs M that describes Markovian dynamics of trajectories from a noisy data set. Then one uses M to predict dynamics of a test trajectory. One takes a noisy chunk T K (0) of the test trajectory, also seen as an initial state of Eq. (2) or as a "prehistory" of non-Markovian dynamics, and propagates it forward in time by using the reconstructed from data M and Eq. (2). In Fig. 7a we demonstrate the agreement between the predicted behavior of the test trajectory and the test trajectory itself for the following parameters of the data set used for the reconstruction of M : N = 200, d E = 3 and several different values of σ. The value of K is set to 75. One sees that the proposed method predicts non-Markovian dynamics of the system correctly. Even if a data set and a test trajectory are effected by noise with reasonably high standard deviation σ = 0.1, the approach is capable to predict the system's dynamics at a small time scale. Then we compare eigenvalues of M with eigenvalues of the quantum channel Φ (rr) = exp (τ L) driving the joint system and effective environment dynamics. The results for M reconstructed from a data set with d E = 4, K = 75, N = 200 in the case of noise absence (σ = 0) and in the case with noise (σ = 0.01) are presented in Fig. 6c. One can see that there is a perfect coincidence between exact and reconstructed eigenvalues in the case without noise, which agrees well with the theory (see Section II), while in the case with noise the obtained eigenvalues are slightly shifted, and a tiny part of them only is lost. The latter fact can be explained by the indistinguishability of some eigenmodes dynamics from the noise. However, even in the presence of noise, one sees that the method provides valuable information such as the eigenvalues of the joint dynamics.
We also study how the selection of the memory depth K that is a hyperparameter of the model and number of trajectories L (data set size) impact the accuracy of the prediction. For this purpose we introduce the distance between two quantum trajectories T where states (a) and (b) are taken from the trajectories T R (t) correspondingly, and · 1 denotes a trace norm [84]. Let D noisy be the distance Eq. 30 between the prediction and the noisy version of a test trajectory T test N −K (K) and D noiseless be the distance Eq. 30 between the prediction and the noiseless version of a test trajectory T test N −K (K). These distances show how close the predicted trajectory to the noisy test trajectory and to the noiseless test trajectory and thus can be used to validate the accuracy of the prediction. The behavior of the distance as a function of K is presented in Fig. 6b. As one can see, there is a 'saturation' value of K, starting from which the accuracy of prediction is mainly determined by the noise level. This means that the accuracy of the prediction is not sensitive to the choice of K, i.e. there is no effect of overfitting when the accuracy becomes worse with an increase of the model complexity. One can set K knowingly large and this does not lead to the effect of overfitting. This is the consequence of the proper automatic model selection. To validate how the accuracy of prediction depends on choice of L we generated additional data sets for L ranging from 1 to 20, d eff E = 4, N = 200 and for several different σ. For predictions based on these data sets, we plot the dependence of D noiseles on L and σ in Fig. 8. One can observe that the accuracy rapidly improves with growing L at the beginning and then saturates. The saturation effect is caused by the fact that the prediction is based on an initial noisy prehistory of constant size (independent on L) and this noise can not be eliminated via an increase of data set size. Finally, we validate the ability of the method to reduce noise in data. For this purpose we introduce the distance between data sets as the averaged over all trajectories distance introduced in Eq. 30 and denote this distance as D . One needs to consider the distance between denoised and noiseless data sets D denoised and the distance between noisy and noiseless data sets D noisy . If D denoised is systematically smaller than D noisy then the denoising procedure works correctly. We plot D denoised and D noisy for T = 200, K = 75 and several different values of σ and d E in Fig. 6d. One can see that D denoised is systematically smaller than D noisy . This fact supports our claim on the correctness of the denoising procedure. This completes the validation of the proposed method.
B. Probing dynamics of dissipative Jaynes-Cummings model As another example of non-Markovian quantum dynamics, we consider the dynamics of a two-level atom (the system) interacting with a decaying bosonic mode (the environment) via the Jaynes-Cummings (JC) interaction [94,95]. Note that in contrast to the previous example, the effective environment of a spin is a bosonic mode that is infinitedimensional. However, in the previous example we considered a random GKSL generator driving the joint dynamics. Its randomness prevents further dimensionality reduction of the effective environment. In the given example, the bosonic mode can be truncated since we consider a case when the mode is not highly excited. This means that one can build an effective environment of finite dimension. The interesting question is whether the proposed algorithm can detect a low-dimensional effective environment from data. To address this question, we probe our method on this model and try to identify a low-dimensional effective environment. The details of the model are given below. The joint dynamics of the atom and the bosonic mode within the JC model is driven by the Lindblad equation that reads where H JC is the JC model Hamiltonian, a(a † ) is the bosonic mode annihilation (creation) operator, σ x , σ y , σ z are the Pauli matrices, g is the interaction strength, and γ is the bosonic mode dissipation rate. Taking the partial trace w.r.t. the bosonic mode, one obtains the density matrix of the atom, (t) = Tr E ( SE (t)), that experiences non-Markovian dynamics. As in the previous example, for γ = 0.05, g = 2.5 and various amplitudes of noise σ we prepare a number of noisy data sets T (i) and noisy and noiseless versions of test trajectories T test N (0) for each data set using the model above. We simulate the joint atom and mode dynamics driven by Eq. (31) at successive time steps of length τ = 0.03, and compute the corresponding trajectories of the system by taking a partial trace over the environment. The initial joint atom and mode state is taken in the factorized form where |α is the coherent state of the bosonic mode, and |ψ is sampled uniformly from the Bloch sphere. For all data sets we fix α = 1.1. To proceed with numerical simulation of Eq. (31) we truncate the infinite-dimensional Hilbert space of the bosonic mode keeping such a number of eigenstates with the lowest energy that guarantees conservation of > 95% of the initial environment state E (0) probability mass. The truncation is needed only for the tractability of numerical calculations necessary for data set generation. The limit of applicability of our data processing method is not sensitive to the genuine dimensionality of the environment; it can be either finite or infinite-dimensional. As mentioned before, only the finiteness of the memory matters. We fix L = 2, K = 100 and N = 1000 for all data sets. We apply our method to generated data sets and reconstruct r and M . As in the previous example, we use M to predict behavior of the test trajectory. The comparison of the prediction and the test trajectory for each amplitude of noise is shown in Fig. 7b. One can see that one predicts dynamics of the atom even in presence of relatively high noise.
We also note, that in all considered cases r ≤ 39, while the dimension d 2 d eff E 2 of the joint system and environment density matrix after truncation of the bosonic mode is 100, which is well above 39. This justifies the truncation of the bosonic mode and shows the finiteness of d eff E even though the genuine environment is infinite. The main outcome of this numerical experiment is that the proposed method is able to identify a finite-dimensional effective environment even in the case of infinite-dimensional genuine environment.

C. Probing the spin-boson model dynamics
The third example of non-Markovian quantum dynamics, which we analyze with our method, is the dynamics of a two-level atom coupled with a set of non-interacting bosonic modes. We refer to this model as the spin-boson model [14]. We consider the dynamics of a two-level atom in the limit of a continuum of bosonic modes. In this case there is no a straightforward way to extract a finite dimensional effective environment using first-principle numerical modeling. We test the ability of the proposed method to do this from a data set as well as the ability to predict the non-Markovian dynamics of the spin. We also study how the dimension r reconstructed from data depends on some parameters of the spin-boson model and show that r serves as a non-Markovianity or complexity measure in this case. The Hamiltonian of the spin-boson model reads where ∆ is the tunneling matrix element, ω k are frequencies of bosonic modes, a k (a † k ) are annihilation (creation) operators of bosonic modes, and g k is the strength of interaction between the atom and the k-th mode. As before, we consider the atom as a system and the set of bosonic modes as an environment. The atom affected by the bosonic modes experiences non-Markovian quantum dynamics that can be analyzed by our method. We consider the case when the initial state of bosonic modes is the ground state uncorrelated with the initial state of the atom. In this case, the influence of the environment of non-interacting bosonic modes on the system is fully described by the two-time correlation function of bosonic modes where |vac is the ground state of bosonic modes, J(ω) is the spectral density. We consider the limit of continuum bosonic modes when the gap between frequencies of neighboring modes ω k+1 − ω k vanishes. In this limit we choose the following spectral density: where ω 0 is the resonance frequency, γ is the width of the spectral function, g is the aggregated interaction strength.
We simulate the dynamics of the atom with an arbitrary pure initial state |ψ using the numerical approach and code developed in [96]. As before, we prepared a number of data sets T (i) and corresponding test trajectories T test N (0) specified by various amplitudes of noise and parameters of the spin-boson model. The initial state of the two-level atom |ψ is sampled uniformly from the Bloch sphere for each trajectory of a data set. We fix L = 2, τ = 0.15, K = 100, and N = 1000 for all data sets. As before, we reconstruct M from a data set and use it to predict the behavior of the test trajectory. The comparison of the prediction and the test trajectory is given in Fig. 7c. One can see again, that even for the relatively high amplitude of noise, the approach is capable of predicting the atom dynamics.
In order to demonstrate on a concrete example that the trajectories space dimension r can be also considered as a measure of non-Markovianity and system's dynamics complexity, we study a relation between r and the value of the parameter γ of the considered spin-boson model. Note that γ determines the width of the two-time correlation function C(t). The smaller γ is the bigger the width of the correlation function is. Therefore the smaller γ is the stronger memory effects are, and the more challenging the atom dynamics simulation is. We also consider different values of K in order to understand how the memory depth of the dynamics depends on γ. We generate several noiseless data sets for L = 4, T = 1000, τ = 0.15 and different values of γ. For each value of γ we process the corresponding data set by our method. We consider different values of the hyperparameter K and the fixed value of σ = 10 −6 in order to cut off the numerical simulation error. We start from the analysis of the prediction accuracy. In Fig. 9a we show how the accuracy of prediction D noiseless depends on K and γ. One can observe that a steep improvement of the accuracy then changes to a smooth saturation regime with increasing K. Note that for smaller γ one has worse accuracy of prediction. This observation agrees well with the fact that for smaller γ, dynamics becomes more complicated. Then we turn to the analysis of r. In Fig. 9b we show how the minimal Markovian embedding dimension depends on different values of K and γ. One can see that value of r saturates starting from some sufficiently large value of K. Note also that for smaller γ the value of K, for which r saturates, is bigger. This supports the fact that for smaller γ one has deeper memory. But the most important observation is that for smaller γ one has r bigger. In Fig. 9c, for K = 500 (which is well above the memory depth for all considered γ) we show the dependence of r on γ explicitly. One sees that r decreases with increasing γ, which is in accord with interpretation of r as the complexity of a non-Markovian quantum dynamics or a measure of non-Markovianity.

V. COMPARISON WITH EXISTING DATA-DRIVEN METHODS OF NON-MARKOVIAN QUANTUM DYNAMICS IDENTIFICATION
In this section, we compare the given method of non-Markovian quantum dynamics identification with previously proposed ones, especially focusing attention on the related "black-box" methods namely the TTM [45][46][47][48][49][50][51]54], and a method based on a Recurrent Neural Network (RNN) parametrization of a predictive model [55]. We evaluate all the methods by means of the following metrics: ability of a method to extract information about the environment of a non-Markovian quantum system, data efficiency, and robustness to noise.
The first and foremost distinguishing feature of the proposed method is its ability to extract information about the quantum environment of a system by observing only quantum trajectories of a system. It estimates the effective dimension of the environment and reconstructs eigenvalues of the joint system and environment dynamics. The effective dimension of the environment shows the size of the environment fraction that mutually exchanges information with a system. The remaining fraction of the environment is seen as a decoupled part, since it only absorbs information and does not bring any information back. The eigenvalues of the joint system and environment dynamics are essentially spectroscopic data that defines the physics of the system and environment as a whole. All previously proposed methods do not reconstruct any information about the system's quantum environment, except maybe attempts to reconstruct the effective dimension of the environment by looking for the "elbow" in the validation curve plot [61,62] while fitting the observed data by process-tensor-based methods. However, this method requires a separate validation data set and multiple retraining of a process-tensor-based model, while the proposed method is free from these drawbacks. The next indicators of performance are data efficiency and robustness to noise. Our method is equipped with the automatic model selection that selects the simplest possible model among describing observed data. This is achieved by identification of the low-dimensional linear subspace where all the quantum trajectories are placed. The dynamics of the system in this subspace can be described by a model with a fewer number of parameters, which implies less data for learning and better robustness against noise. To demonstrate the data efficiency and robustness to noise of our method, we compare the prediction of a non-Markovian quantum dynamics built using our method with that which is built using the TT method that serves as an example of a "black-box" method, and which does not account for the physics under the hood of the observed data. To make our comparison systematic we performed multiple numerical experiments (300000 in total) and compared the performance of both methods for each experiment. For the learning of the TTM model from the data, we utilized a ridge regression [66] to solve the following optimization problem: where W is the kernel of the discrete Nakajima-Zwanzig equation that one reconstructs from the data and θ is the regularization coefficient taken equal to σ in order to mitigate the effect of noise. Each experiment consisted of a data preparation stage, when test and training sets of trajectories are generated within a particular model, a fitting stage, when both the proposed method and the TTM are applied to a training trajectories set in order to build data-driven model of the dynamics, and a test stage, when both data-driven models are used to predict dynamics of trajectories from a test set. After the test stage, prediction errors of both methods are evaluated. As the error measure we used D noiseless that is distance Eq. (30) between the prediction built on top of noisy "prehistory" of a test trajectory and the noiseless remaining part of a trajectory averaged over all trajectories from a test set. As a model of non-Markovian dynamics we used the model with a predefined effective environment considered in the subsection IV A. Each experiment is defined by a particular tuple of parameters (τ, σ, K, N, L, d E , a diss , κ) where τ is the time step size, σ is the amplitude of noise, K is the guess of memory depth, N is the number of discrete time steps in test and training sets, L is the number of trajectories in a training set, d E is the predefined dimension of the effective environment, a diss is the dissipation rate (see Appendix B) and κ is the random seed that is used to generate the Lindbladian driving the dynamics of the system and the environment. We performed numerical experiments for all tuples of parameters from the set diss , . . . , a diss × κ (1) , . . . , κ (10) , where × stands for the Cartesian product and superscripts mark a particular value of a parameter used in numerical experiments. In order not to list all values of parameters here we refer a reader to Fig. 10 where x-axis of subplots shows the ranges of parameters we used. The obtained high-dimensional data needs to be visualized in a concise way. To do that we introduce a logarithmic averaging that we define as follows that is more preferable than the standard averaging since the value of D noiseless may vary several orders of magnitude for different parameters. For the sake of demonstration of what is done next, let us fix σ as a target parameter. For σ we calculated D noiseless log τ,K,N,L,dE,a diss ,κ that is the logarithmic averaged value of D noiseless over all parameters but σ. We did the same averaging for all parameters except κ and plot the results in Fig. 10. One can think of this plots as "projections" of the obtained high-dimensional data on a particular plane. They represent how the prediction accuracy for both methods depends on a particular parameter. One can note, that the proposed method outperforms the TTM almost always except cases with small noise, small memory depth, small time step size, where both methods perform equally. One can also note, that the proposed method compared to the TTM works especially good in case of high amplitudes of noise, that is especially practical one. These observations approve the proposed method's improved data-efficiency and robustness to noise in comparison with previously propose "black-box" methods such as TTM of RNN based method.
To conclude the presented comparison, one can note that the proposed method is a substantial step forward towards interpretable data-driven methods of non-Markovian quantum dynamics analysis. This is the first method allowing automatic extraction of information about the environment from the measured system dynamics and it outperforms previously proposed "black-box" methods, such as TTM, in terms of robustness to noise and data efficiency.

DATA AVAILABILITY
Realization of the method in Python as well as numerical experiments for the model with predefined effective environment are available publicly in https://github.com/LuchnikovI/Automatic-non-markovian-dynamics-learner.

VI. DISCUSSION AND OUTLOOK
We developed a data-driven method for non-Markovian quantum dynamics analysis. The input data processed by the method is a set of quantum trajectories, i.e. sequences of density matrices of the system at consecutive time steps. We note that quantum trajectories can be reconstructed with the use of quantum state tomography that is de facto the gold standard for the characterization of quantum information processing devices [88], and is a standard tool for studying the performance of various physical quantum computing platforms including photons [97], trapped ions [98], and superconducting circuits [99]. So, the method we developed is well-suited to the study of existing noisy  29)). The value of D noiseless is the distance Eq. (30) between the prediction built on top of noisy "prehistory" of a test trajectory and the noiseless remaining part of a trajectory. The first averaging sign in D noiseless log means averaging the distance over all trajectories in the test set, the second averaging sign with superscript log means that for each particular plot one performs logarithmic averaging, i.e. D noiseless = exp ( log D noiseless ), over all parameters τ, σ, K, N, L, dE, a diss , κ but the parameter that is the X-axis.
intermediate-scale quantum (NISQ) devices including listed above. We also note that the developed approach takes into account inevitable discrepancies between true density matrices (which one would obtain with infinite number of perfect measurements) and practically reconstructed ones obtained with finite amount of experimental statistics. The estimated level of this reconstruction noise (the standard deviation σ), as well as the guess about the memory depth K, are the method's hyperparameters.
The analysis of the non-Markovian dynamics performed within our method consists of two major steps. At the first stage, the method determines the simplest model describing observed data, in conformity with Occam's razor principle, and outputs the hyperparameter r defining the complexity of the model. This simplest model selection corresponds to the best machine learning practices and improves both data efficiency and noise robustness of the proposed method [66]. Moreover, r defines the dimension of the effective environment that is an important characteristic of the genuine environment. At the second stage, the method reconstructs the predictive model based on the results at the previous step. The obtained model allows one to predict the continuation of the quantum system's trajectory consisting of at least K elements.
Both stages rely on matrix decompositions guaranteeing convergence to the optimal solution while most of previously proposed methods rely on an approximate non-convex optimization [55,61,62,68]. Due to the established relation between a quantum system's trajectories and the joint system and environment dynamics, the reconstructed predictive model is not just a "black box", but an interpretable physical model of non-Markovian quantum dynamics. It allows not only the system's dynamics prediction, but also provides additional information on the underlying open quantum system properties, in particular, the effective dimension of the environment d eff E and eigenfrequencies of the joint system and environment dynamics. As an additional valuable output, the proposed method performs quantum trajectories denoising by projecting them onto the r-dimensional principle subspace. We note that the introduced denoising procedure can be used in combination with other approaches for improving accuracy of quantum tomography, including compressive sensing [87] and employing tensor-product structures [88].
We have validated the performance of our approach for dynamics of two-level system interacting either with a known finite-dimensional effective environment or with an infinite dimensional environment (JC model and spinboson model). For the first case, we have justified the ability of the method to recover the correct information on effective environment, and for the latter, we have validated the performance of the method with respect to the models related to actual experiments (see e.g. [100,101]). We note that in both cases, the considered values of reconstruction noise σ, are taken to mimic realistic tomographic experiments.
Our technique is highly relevant for ongoing experiments with NISQ processors, where quantum state tomography of several-qubit systems is an available tool. Our method makes it possible to extract relevant information about the environment affecting a quantum processor, which is extremely hard to measure directly, and build its dynamics prediction. It potentially allows building new types of data-driven controllers for the next generation of quantum processors taking into account non-Markovian dynamics and using non-Markovianity as a resource.
We also note that the method is directly applicable to the analysis of incomplete data (trajectories of some elements of a density matrix or trajectories of single observables). Indeed, one requires only the linearity of φ K ; it can output either a trajectory of density matrices, a trajectory of certain elements of density matrices, or a trajectory of certain observables. However, in this case, the physical meaning of d eff E is different and requires further analysis. This is in the scope of the next works.
Further research is also required to determine whether our scheme is capable of predicting the response of a non-Markovian system to an external perturbation. Such a prediction capability would open a promising line of research with the development of new data-driven quantum control methods. We also need to study how the proposed approach could be generalized on the case of time dependent interaction between a system and its environment. Specifically, is it possible to capture the dynamics with varying K? This generalization would be useful to detect transitions between different dynamics regimes and dynamical phase transitions. Criterion 1. The value of K is sufficient for Markovianity of quantum trajectories iff ker ⊥ (φ K Φ) ⊆ ker ⊥ (φ K ) . (A7) Let us derive an important upper bound of the minimal sufficient K. Note, that if K is insufficient, then the following holds r(K + 1) ≥ r(K) + 1, or in other words r(K) > K.
Note also that the dimension of the relevant subspace does not exceed d 2 d 2 E . Therefore one has which states that any insufficient K is less then d 2 d 2 E . This means that the minimal sufficient K is not greater than d 2 d 2 E and for any finite dimensional environment the minimal sufficient K is also finite. This also means that the minimal sufficient K is less or equal to r.
The Eq. A10 implies that in the worst case the minimal sufficient K and r grow exponentially with the number of subsystems of the environment. In the thermodynamic limit the minimal sufficient K and r may even tend to infinity. Our conjecture is that this case corresponds to the dynamical chaos. Indeed, the same unbounded growth of r takes place when one builds time-delay embedding of classical chaotic dynamics [70]. This also is well compatible with understanding of r as the complexity measure. When r and the minimal K tend to infinity this means infinite complexity or chaotic behavior.