Learning conservation laws in unknown quantum dynamics

We present a learning algorithm for discovering conservation laws given as sums of geometrically local observables in quantum dynamics. This includes conserved quantities that arise from local and global symmetries in closed and open quantum many-body systems. The algorithm combines the classical shadow formalism for estimating expectation values of observable and data analysis techniques based on singular value decompositions and robust polynomial interpolation to discover all such conservation laws in unknown quantum dynamics with rigorous performance guarantees. Our method can be directly realized in quantum experiments, which we illustrate with numerical simulations, using closed and open quantum system dynamics in a $\mathbb{Z}_2$-gauge theory and in many-body localized spin-chains.


I. INTRODUCTION
Machine learning (ML) is playing an increasingly important role in physical sciences [1].The ability of ML to recognize patterns in data greatly facilitates the datadriven approach to scientific research, where scientific discoveries are achieved by analyzing experimental data.Recently, many works have been done to discover physical laws with ML models [2][3][4][5][6][7][8][9][10][11][12][13][14].Following this line, several works attempt to learn conservation laws in classical mechanical systems [15,16].The ML models in these works can successfully discover conserved quantities in simple classical systems, such as energy conservation, angular momentum conservation, and momentum conservation in two-body gravitational systems.
The conservation law, or the integral of motion, is also an important concept in quantum mechanics.There are usually many global conservation laws in quantum dynamics, such as the eigenstate projection operators, when the dynamics are governed by a Hamiltonian.However, these do not imply any special dynamical properties.In the quantum setting, the physically relevant conservation laws are the ones with locality structure [17].Such local integrals of motions [18][19][20][21] underlie, for instance, the absence of thermalization and transport in certain quantum systems -in contrast to ergodic systems, which typically conserve only a few globally supported quantities such as the total energy or number of particles.Thus, local integrals of motions are central for our understanding of phenomena such as many-body localization (MBL) [22][23][24][25][26][27][28] and Hilbert space fragmentation [29].
In this work, we consider a broad class of conservation laws with conserved quantities given by sums of geometrically local observables.These conservation laws can be geometrically localized or have support across the entire system.We propose an algorithm for discovering all such conservation laws in arbitrary quantum dynamical systems.We consider different types of quantum dynamics in which a quantum state ρ(t) evolves with time, and its evolution is described by the von Neumann equation under a Hamiltonian H or a Lindblad master equation under a Lindbladian L. Our algorithm is general enough to cover Hamiltonians that change over time, for instance, periodically, as in Floquet systems, and find observables that are conserved between periods.The conservation laws we consider could be state-dependent, i.e., the observables are conserved for a certain subset of initial states.Such observables can be more difficult to find than those that are conserved for all states because they will be overlooked if only information from the Hamiltonian or the Lindbladian is used.
Our algorithm combines classical shadow formalism [30] and several data analysis techniques.The algorithm uses classical shadows to estimate the expectation values of many Pauli observables at multiple times during the quantum dynamics based on a limited number of randomized measurements [30,31].After estimating the expectation values, the algorithm performs singular value decomposition (SVD) on a data matrix and gets the low dimensional manifold corresponding to the conservation laws.We rigorously prove that the algorithm can efficiently learn all conserved quantities which are sums of geometrically local observables in quantum dynamics.Although some previous papers are using numerical techniques to find such conservation laws based on known Hamiltonian [32][33][34][35][36], our method can be directly applied to experiments to find conservation laws in arbitrary unknown quantum dynamics.Our method also comes with theoretical guarantees that the sample and computational complexities are both polynomial in the system size and precision.Note that, for Hamiltonian dynamics, one could also obtain conservation laws by first learning the Hamiltonian [37][38][39][40][41][42][43][44][45][46][47][48][49][50][51][52][53].However, Hamiltonian learning protocols only work efficiently when there are some known sparsity or locality constraints on the Hamil-tonian.For quantum dynamics without such sparsity structure, prior works involve exponential sample complexity or classical post-processing cost.In comparison, our methods are not subject to these constraints.
We perform numerical experiments illustrating the learning of conserved quantities in closed and open system dynamics in a Z 2 gauge theory, arising from local and global symmetries.In addition, we demonstrate the learning of local, approximate conservation laws in a one-dimensional XXZ-chain with local disorder.Here, a sharp increase in the number of local conserved quantities takes place at a certain disorder strength, which we successfully observe from the result of our algorithm.
Discovering conservation laws in quantum systems is a fundamental problem and has also been studied in prior works.To our knowledge, this is the first work to provide general rigorous guarantees for learning and testing conservation laws in quantum experiments with unknown dynamics, by employing the randomized measurement toolbox [31] and the classical shadow formalism [30].While Ref. [36] proposes a similar algorithm to construct local conserved quantities in quantum dynamics, they use it in classical simulations of known timeindependent Hamiltonian dynamics to study integrability.Ref. [54] conducted physical experiments to obtain conserved quantities that are supported in local regions, which excludes conserved quantities supported on the entire system, such as total magnetization.In contrast, our protocol can uncover conservation laws supported locally and globally.Furthermore, [54] considers quantities that are conserved at discrete points in time, whereas we can also deal with the continuous-time scenario through robust polynomial interpolation.

II. ALGORITHM DESCRIPTION
The goal of our algorithm is to find all conserved quantities that are linear combinations of Pauli operators supported on k = O(1) adjacent qubits, in an unknown quantum dynamical system with experimental feasible measurements.The Hamiltonian or Lindbladian that governs the quantum dynamics is completely unknown and need not to be local, but we can control it to evolve for a time of our choice and perform randomized singlequbit measurements.Throughout this work, we will use ρ(t) to denote the time-evolved state and O(t) to denote the time-evolved observable in the Heisenberg picture.
Classical Shadows.The classical shadow formalism [30] was proposed to efficiently predict local observables with experimentally feasible randomized measurements [31].To be specific, it was shown that one can predict M arbitrary linear target function Tr(O 1 ρ), • • • , Tr(O M ρ) up to additive error ϵ with only O(B log(M )/ϵ 2 ) measurements, where B is the upper bound of the shadow norm defined in [30].This result implies that a limited number of measurements are enough to predict the expectation values of a large number of observables.Making use of this property, we can predict all k-local Pauli observables with a limited number of random measurements.
The classical shadow formalism is summarized as follows: We approximate an N -qubit quantum state by performing randomized single-qubit Pauli measurements on N s copies of ρ.That is, we project each qubit to one of three Pauli basis X, Y, Z and get a product state composed by the six basis states {|0⟩, |1⟩, |+⟩, |−⟩, |i+⟩, |i−⟩}.Performing one randomized measurement gives us such a product state, which can be stored in classical memory with an N -element array.After performing such measurements on N s copies of states, we get N N s singlequbit measurement results, which we can make use of to construct an approximation of the unknown state ρ: where ρ(ns is the outcome of qubit i in the n s -th randomized measurement.Eq. ( 1) allows in principle to fully recover the density matrix ρ, in the sense E[ρ] = ρ where we take the expectation value of many random unitaries and projective measurements [31].However, this requires an exponentially large number of copies of states N s = O(exp(N )).In contrast, N s = O(3 r log(N )/ϵ 2 ) is enough to provide an ϵ-accurate approximation of all reduced r-body reduced density matrix, allowing to estimate expectation values of r-local observables.We emphasize that this estimation can be made robust against errors in the application of the random unitaries and read-out errors [55][56][57].
Learning conservation laws.Our algorithm makes use of classical shadow formalism to evaluate the expectation values of all geometrically k-local Pauli observables with randomized measurements (the measurement results can be used to estimate all k-local Pauli observables, but we only focus on the geometrically local ones), and then post-process the measurement data to identify the conserved quantities.This is a non-trivial task because there are uncountably many linear combinations of these Pauli observables that can possibly be conserved.
Here we propose a method to efficiently narrow down the range of conserved quantities to look for.
We consider a quantum system on a D-dimensional lattice, with each site containing a qubit.We denote all geometrically k-local Pauli operators by P i , i = 1, 2, • • • , N P with N P ≤ O(N ).We then look for conserved quantities of the form In other words, {P i } form a basis of the subspace in which we search for conserved quantities.The expectation values of P i at time t j , j = 1, 2, • • • , N T , form a data matrix of size N P × N T (N T = O(N P )), which we denote by X.Its elements are Our algorithm is built upon the following observation (see also Refs.[36,47]): every conserved quantity of the form (2) lies in the null space of a matrix W ⊤ , where its transpose matrix W is defined through This is because, if operator O defined in (2) is conserved, then i c i ⟨P i (t j )⟩ are equal for all j, and they are thus all equal to the average.Consequently By (4) we then have is the vector formed by the coefficients in O.
From the above analysis, we can see that all conservation laws which are linear combinations of geometrically k-local terms must correspond to a vector in the null space of W ⊤ , and consequently, we can find all of them by examining this null space.At the same time, the dimension of this null space yields the number of independent conservation laws.Each singular value σ of W describes how much its corresponding operator expectation value changes over time [36].More precisely, let u = (u 1 , u 2 , • • • , u N P ) be the left singular vector corresponding to σ, and O = i u i P i , then where ⟨O(t j ′ )⟩ denotes the average of ⟨O(t j ′ )⟩ over the time index j ′ .Because of the inevitable statistical noise, the matrix W ⊤ we get from data will most likely not have a nontrivial null space.Therefore instead of looking at the null space, we will look at the subspace spanned by the left singular vectors of W corresponding to singular values below a truncation threshold ϵ, which serves as a precision parameter.These singular vectors are readily obtainable by performing SVD on our approximation of W based on finitely many samples (see also Ref. [47] for a similar procedure in the context of Hamiltonian learning).The number of such singular values provides an upper bound of the number of independent conserved quantities, which we will prove later.The computational cost of performing SVD on the data matrix is O(N 3 ).
So far we have been mainly concerned with conserved quantities that are specific to a single initial state.We may also learn conserved quantities for a distribution D of initial states in a similar way.In this scenario, we not only sample times t j , but also the initial states ρ k from D independently, for k = 1, 2, • • • , N I .The data matrix X is constructed to have N P rows and N T N I columns, consisting of entries X i,jk = tr[P i (t j )ρ k ], where j and k together index the columns.The matrix W is similarly modified to be W Testing conservation laws.The above procedure gives us candidates for conservation laws.However, it is not guaranteed that the quantities we get are indeed conserved, and therefore we need to test the candidates.Testing a finite group symmetry has been considered in Ref. [58], but their algorithm requires implementing the group action on a quantum computer, whereas we want to keep our procedure to only single-qubit operations.There are two problems that we need to overcome: the first is that in the above we only look at a discrete set of times t j , and cannot rule out the possibility that some quantity be conserved at these discrete times but not conserved at other times.The second is that we cannot hope to tell if a quantity is exactly conserved because of the presence of statistical noise.Consequently, we formalize the problem into a hypothesis testing problem.We first define how far a quantity deviates from its average up to time T by With d(O, ρ) we introduce the two hypotheses that we want to distinguish for every candidate O that comes from the learning procedure.The classical shadow technique enables us to process all O's in parallel.
For a fixed ρ ∼ D, we compute the maximal deviation of observable O from its time average using robust polynomial interpolation [59].We first randomly sample the discrete times t j , and then perform robust polynomial interpolation to obtain values for ⟨O(t)⟩ at all times t ∈ [0, T ].Then we can directly compute the maximal deviation from the time average.This enables us to compute d(O, ρ) with high confidence level.Note that the above discussion is for continuous t.If we want to test conservation laws for discrete t, such as for a Floquet system, then the problem comes strictly easier, as interpolation will not be needed.The ensemble average E ρ∼D d(O, ρ) can be computed from finitely many samples of ρ, thus enabling us to solve the hypothesis testing problem in Eq. ( 8).

III. RIGOROUS GUARANTEES
As noted previously, the estimates for tr[P i (t j )ρ k ] necessarily involves statistical noise.We will then analyze how the noise impacts the result we get.First we will analyze that in the learning algorithm based on finding the null space, the algorithm still yields an upper bound of the number of conserved quantities even when statistical noise is present.
We denote the number of independent conserved quantities by N c , and the dimension of the null space of W ⊤ by D null .It is guaranteed that N c ≤ D null because a quantity that is conserved in all times must also be conserved at discrete times t j and for the sampled states ρ k .W is computed from the data matrix X, but in practice, we do not directly have access to X, but can only obtain its noisy estimate X, which leads to a noisy estimate for W that we denote by Ŵ .Ŵ is almost surely full-rank due to the effect of the noise.We denote E = X − X, and this is the matrix containing all the entry-wise errors.The matrix W is perturbed similarly, and the errors can be collected into a matrix whose spectral norm is at most ∥E∥.Consequently, we cannot directly estimate D null .As discussed before, instead we look at the number of singular values of Ŵ that are below a threshold ϵ, which we denote by Dnull .For Dnull we have the following theorem (where for simplicity, we let N T × N I and N P be of order O(N ), more general N T , N I and N P are considered in Theorems 3 and 4 of Sec.I of the supplemental material (SM) [60]: where N c is the number of conserved quantities, with probability at least 1 − δ.Here Dmedian null is the median taken over O(log(δ −1 )) independent samples of Dnull and Dnull denotes the number of singular values of Ŵ below ϵ.In particular, when the quantum system has constant correlation length, the sample complexity can be reduced to O(N 2 ϵ −2 log(δ −1 )).
For the proof of this theorem, we refer to Sec.I of the SM [60], in which we bound singular value perturbation that comes from statistical noise.From its definition, we can see Dnull that is a decreasing function of ϵ, and consequently, for smaller ϵ, we will have a tighter upper bound for D null and the number of conserved quantities N c .If we keep the singular value perturbation below ϵ, then the D null 0-singular values of W will still be below ϵ after perturbation, thus ensuring D null ≤ Dmedian null .This will require more samples as ϵ decreases, as can be seen from Theorem 1.
We can also guarantee that by collecting all the leftsingular vectors of Ŵ corresponding to singular values below the threshold ϵ, we will have all the conservation laws approximately contained in the span.More precisely, each conservation law, when expressed as a norm-1 vector, will have an overlap with the subspace spanned by these singular vectors, and this overlap is lower bounded by 1 − ∥E∥ 2 /ϵ 2 .Therefore, when ∥E∥ ≪ ϵ, we will have an accurate description of all conservation laws.For detailed proof of this bound, see Sec.II of the SM [60].
Next, we will provide guarantees that the candidates for conserved quantities from the learning algorithm can be efficiently verified using the procedure described pre-viously.First, we consider the scenario where the conservation law is specific to a single initial state, i.e., the distribution D is completely concentrated on ρ.
Theorem 2. We assume that D is concentrated on a single ρ.
Then for T > 0 we can distinguish between the two hypotheses in (8) for each i with probability at least 1 − δ using O ΓT ϵ −2 log(δ −1 ) max i ∥O i ∥ 2 shadow samples.We refer to Sec.III of the SM [60] for detailed proof.
We note that d ℓ fi(t) ) is a very reasonable assumption to make.We will show in Sec.VI of the SM [60] that this assumption holds with Γ = O(1) when the dynamics is described by the von Neumann equation, and the Hamiltonian satisfies certain conditions.These Hamiltonians include geometrically local Hamiltonians and certain power-law interaction Hamiltonians.Without such an assumption, we can also choose Γ = ∥H∥ and then this inequality holds for all Hamiltonians.An extension to the Lindbladian case is straightforward.
Next, we consider a generic initial state distribution D. In this scenario, we can sample conserved for the sampled initial states.This naturally leads to the question of whether we can generalize the testing results for the sampled initial states to the entire distribution.The above involves generalization errors of the form In Sec.IV in the SM [60], we show that we can use ) to ensure that the generalization errors for all observables are below ϵ/4 with probability at least 1 − δ/2.For each sampled ρ k , we need O ΓT ϵ −2 log(δ −1 ) max i ∥O i ∥ 2 shadow samples to compute d(O i , ρ k ), i = 1, 2, • • • , χ according to Theorem 2, which multiplied by N I yields the total sample complexity for estimating E ρ∼D d(O i , ρ) up to precision ϵ/2.Therefore Theorem 3.Under the same assumptions as in Theorem 2, except that we do not restrict the form of the initial state distribution D, the hypothesis testing problem in (8) can be solved using So far, we have considered quantities that are on average conserved for an ensemble of states.It is natural to ask whether we can determine if an observable O is conserved for all states, namely [H, O] = 0. Through a quantum query complexity lower bound, we can show that this task cannot be accomplished efficiently in the worst case.The high-level idea of this argument goes as follows: suppose we have a black-box oracle U encoding a bit-string x = (x 1 , x 2 , . . . ) through U |n⟩ = (−1) xn |n⟩, then letting H = U we can implement e −iHt using two queries to U .If an algorithm can distinguish between ∥[H, O]∥ = 0 or ≥ 1 with high probability with Q queries to e −iHt , we can then show that it can evaluate OR(x) with 2Q queries to U .The query complexity lower bound of the OR function [62] then tells us that Q = Ω(2 N/2 ).For a detailed statement of the result and its proof, see Sec.V of the SM [60].

IV. NUMERICAL EXPERIMENTS
In this section, we illustrate our algorithm with numerical examples.We consider a Z 2 gauge theory and a disordered Heisenberg model in one dimension.
A. Identifying conservation laws in a lattice gauge theory As a first numerical example, we consider a Z 2 lattice gauge theory with staggered matter fields in one spatial dimension.The Hamiltonian is specified by where we choose periodic boundary conditions.By direct inspection of the Hamiltonian, we find that we can expect N/2 + 2 conservations laws, given by the Hamiltonian itself, the magnetization and N/2 Gauss laws We note that magnetization and Hamiltonian are linear combinations of geometrically 1-local and 3-local terms, which have support on the entire system.In contrast, the N/2 Gauss laws are strictly geometrically 3-local.In the following, we consider N = 8 qubits, set the mass parameter to unity m = 1, and choose electric field e = 3/2m and lattice spacing a = m/3.First, we investigate our protocol in the absence of statistical noise due to a finite number of measurements [Fig.1a)].We aim to learn all conservation laws with weight k ≤ 3. We find that for sufficiently long times T = 200, data collected from dynamics starting from a single initial random product state N I = 1 is sufficient to identify all expected N + 2 conservation laws: The singular values λ i for 1 ≤ i ≤ N + 2 are close to zero (a non-zero value originates from finite machine precision) with a large gap to λ N +3 .At shorter times T = 20 and N I = 1, the spectrum of singular values appears to be continuous, and conservation laws are not apparent.In contrast, data collected from several random initial product states is substantially more expressible (see Ref. [38] for a similar observation in the context of Hamiltonian learning).Even at short times T = 20, the expected conservation laws can be identified.We emphasize that hereby, we keep the total number of points N T N I ≈ 2N P where data is taken to be constant to enable a fair comparison (N T = 2N P = 624 for N I = 1 and N T = 41 for N I = 15).In all cases, the time points are equally spaced.
Secondly, we simulate our complete protocol, including a finite number of measurements M per time point and initial state.We choose N I = 15 randomly chosen initial product states, a total time evolution time of T = 20/m with N T = 41 steps.We find that for a moderate number of M = 10 5 randomized measurements, N + 2 singular values are gapped out [Fig.1b)].Analyzing a Pauli basis expansion of the corresponding singular vectors of our data matrix, we find that these correspond to the expected conservation laws, magnetization, energy (Hamiltonian) and N Gauss laws [Fig.1c)].
Finally, to test that the learned quantities are indeed conserved over times, we employ an independent data set of same size.We estimate expectation values of the learned quantities at different times and compute their maximum deviation from their mean values, averaged over initial states.We note that this represents a simplified testing procedure than employed in Secs.II and III, and devote the full numerical implementation of the robust polynomial interpolation for testing to future work.Indeed, we find that quantities corresponding to small singular values have small variations over time up to statistical noise originating from a finite number of randomized measurements M .The testing procedure also helps us better distinguish conservation laws from observables that are close to being conserved, by opening up the gap between singular values, as can be seen in Figure 1b.By using a different set of data to perform testing, we can exclude quantities with only small variation for a single noise realization or a single set of initial states.This is similar to detecting overfitting in supervised learning.
While we have so far concentrated on unitary dynamics, we emphasize that our protocol can serve to learn conservation laws of arbitrary quantum dynamics.To illustrate this point, we add local dephasing with strength γ, corresponding to jump operators L i = √ γσ z i , and solve the corresponding Lindblad equation (all other Hamiltonian parameters remain the same).Since magnetization and Gauss laws are diagonal in the computational Z-basis, they remain conserved also for γ > 0. In contrast, energy conservation is lost as shown in Fig. 2a where λ 2 is missing (the indices of the subsequent singular values λ 3 , λ 4 , . . .have been shifted by 1 for clarity).
Our numerical experiments demonstrate that we can learn conservation laws, which are linear combinations of k-local Pauli strings with a moderate number of randomized measurements M .We can decrease the required number of measurements further if we restrict ourselves to learning conservation laws with support on subsystems only, i.e. disregard quantities such as magnetization or Hamiltonian, which are linear combinations of few body terms but have support on the entire system.To achieve this, we construct reduced data matrices W A from measurement data obtained from the subsystem A only.This is illustrated in Fig. 2b, where we plot the singular values of data matrices W Aj with A j = [j − 1, j, j + 1] containing the three sites j − 1, j, j + 1 as function of j (periodic boundary conditions are implied).With only M = 10 4 randomized measurements [c.f.M = 10 5 in Fig. 1b)] per time point and initial state, we can identify the expected N Gauss laws contained in subsystems A j with j mod 2 = 0. restricted to three adjacent subsystem sites as function of j.This allows to learn the Gauss laws, corresponding to the gapped singular values at even j, with considerably fewer measurements M = 10 4 (c.f.Fig. 1b).

B. Identifying conservation laws in a many-body-localized system
Next, we consider a disordered one-dimensional XXZspin chain with nearest neighbor interactions which serves as a standard model for investigating many-body localization and thermalization [26][27][28].It is described by the following Hamiltonian where the local disorder potentials h i (i = 1, . . .N ) are randomly distributed in the interval [−w, w], and w is the disorder strength.We assume in the following Numerical studies [18,[63][64][65] indicate that this model exhibits a transition from an ergodic phase at weak disorder to an ergodicity breaking many-body localized phase at sufficiently strong disorder [66].While in the ergodic phase, nearly all eigenstates obey the Eigenstate Thermalization Hypothesis [67,68] in the MBL phase ETH is not valid, and the system is characterized by an extensive number of quasi-local conservation laws τ i (i = 1, . . .N ), called l-bits [18][19][20][21].In terms of these conservation laws, the Hamiltonian ( 13) can be written as where, in the MBL phase, the τ i are quasi-local, in the sense that they can be approximated by geometrically local operators to exponential precision, and the coupling coefficients J ij , J ikj decay exponentially with distance |i − j|.We emphasize that, by switching to an energy eigenbasis, we can always rewrite the Hamiltonian (13) in the form of Eq. ( 14), also in the thermalizing phase.In general, however, the conservation laws τ i will be completely non-local, high-weight operators which vanishing overlap to the microscopic degrees of freedom σ z i .In this case, Eq. ( 14) is of little use.Finally, we note that independent of the disorder strength, there are always two conserved quantities given as sums of local observables, which are the Hamiltonian itself and the total magnetization.
With our learning algorithm, we target conserved quantities given as sums of local observables.While we expect to be able to learn the local XXZ-Hamiltonian and total magnetization for all disorder strengths, the conserved quantities τ i are expected to be inaccessible in the thermal phase due to their non-local nature.In contrast, for strong disorder, we expect an extensive amount of approximately conserved local quantities, approximating the quasi-local l-bits τ i to high precision.To test these expectations in small systems, we simulate our protocol, by sampling the random initial product state, picking a random disorder pattern and time-evolve under the corresponding XXZ-Hamiltonian Eq. ( 13) using exact diagonalization.We evaluate Pauli expectations of all N P geometrically up to 3-local Pauli operators at N T = 41 equidistant time points up to a final time T = 40.We repeat this for N I ≈ 2N P /N T initial states per fixed disorder pattern.To simulate realistic shot noise arising from a finite number of randomized measurements, we use a Gaussian noise approximation adding independent Gaussian noise to each expectation value with a variance corresponding to M = 5×10 5 randomized measurements per time point and initial state [69].Finally, we construct the data matrix Ŵ and perform SVD.
In Fig. 3a, we display the 10 lowest singular values of Ŵ as a function of disorder strength, each point corresponds to an average over 11 random Hamiltonian configurations.Consistent, with our expectation, we find independent of the disorder strength two small singular values, corresponding to magnetization and Hamiltonian.As remarked before, they still attain a non-zero value due to the finite number of measurements M .In addition, we find that the singular values λ i (i ≥ 3) decrease strongly with increasing disorder strength, indicating an increasing number of approximately conserved quantities.To show this more quantitatively, we plot in Fig. 3b), the number of singular values below a threshold ϵ = 0.02 as a function of the disorder strength for various system sizes.We observe a sharp increase at a disorder strength, which is consistent with previous findings on the onset of many-body localization effects in finite-size systems [18,[63][64][65].In addition, the number of singular values below the threshold increases with system size, indicating indeed an extensive number of approximately conserved quantities.

V. CONCLUSION AND OUTLOOK
In this paper, we propose a method for learning conservation laws in arbitrary quantum dynamics.Our method can find a set of observables that include all conservation quantities with high probability, and we also propose a method to test the conservation law candidates obtained in this way.The sample complexity and classical processing time are both at most polynomial in the system size.The conservation laws hold for either a single input state or an ensemble of input states, and for the latter case, we derive a generalization bound ensuring that the result from finitely many samples can be reliably generalized to the entire ensemble.We provided a proof of principle of our method using numerical experiments in a one-dimensional Z 2 lattice gauge theory and one-dimensional MBL systems.Beyond these examples, we envision a wide range of applications for our protocol, ranging from Hilbert space fragmentation [29] to random circuits with symmetries [70] and the study of general quantum channels [71].In addition, knowledge of conserved quantities in dynamics enables powerful error mitigation techniques for NISQ devices [72] and more efficient (randomized measurement) protocols for probing many-body entanglement [73,74].
where N s is how many samples are used for each Xij .We are assuming that the error is Gaussian, which is reasonable for large N s due to the central limit theorem. Correspondingly, By (A12) we have In the context of our algorithm, we let A = W , Â = Ŵ , and w = ⃗ c (∥⃗ c∥ = 1) in the above lemma.⃗ c here corresponds to an exact conserved quantity O = i c i P i , Ŵ is the shifted data matrix, in which we subtract the time average from all entries so that each row sums up to zero, and W is its noiseless limit.In practice, W is perturbed to be Ŵ , and the corresponding perturbation is δA = E(I − 11 ⊤ /N T ), where E contains the noise on each entry of the data matrix X.The above result tells us that the subspace we obtain through performing SVD on Ŵ approximately contains the exact conservation law O if Here, ϵ is our chosen truncation threshold for singular values.More precisely, the overlap between the vector ⃗ c and the subspace spanned by ûk , This inequality also holds when we use multiple initial states through the relation (A7).
From the above analysis, we can see a tension in our choice of the threshold ϵ: decreasing ϵ helps us better distinguish exactly conserved quantities from the approximate ones, but on the other hand, it increases the precision requirement on our data matrix Ŵ .
Appendix C: Testing conservation laws for a single initial state In this section, we discuss how to test the conserved quantities that we have learned.The testing procedure is briefly outlined in Section II in the main text.Here, we provide a more detailed description, prove its correctness, and also analyze the cost. Let where each O i is a sum of low-weight Pauli operators, for i = 1, 2, • • • , χ.We further assume that, using the notation f (k) (t) to denote the kth derivative of f (t), with constants C and Γ.We note that this is a very reasonable assumption to make.For time evolution under the von Neumann equation, this assumption holds with Γ = O(1) for geometrically local Hamiltonians and Hamiltonians with certain fast-decaying long-range interaction.For details see Appendix F. In general, we can always choose Γ = ∥H∥.
For the Lindblad master equation, a similar result can also be obtained.

Expectation value interpolation for multiple observables
In this section, we find functions pi (t), which are piecewise polynomials, such that with probability at least 1 − δ for each t ∈ [0, T ].

a. Short-time interpolation
We first propose a method for the case where ΓT ≤ 1.In this case, f (t) can be well-approximated by a polynomial through Taylor expansion.We have where in deriving the second line, we have used the fact that We then denote This is a degree K polynomial satisfying Following [48], which in turn relies on [59], we generate independent and identically distributed samples t 1 , t 2 , • • • , t m from the Chebyshev distribution on [0, T ], specified by the probability density function (1/π)(t(T − t)) −1/2 .We then generate N s classical shadows for each t j , j = 1, 2, • • • , m. Therefore we are able to generate estimates y ij such that Therefore, with an appropriately chosen constant factor, we have with probability at least 2/3.This indicates that with probability at least 2/3.By the Chernoff-Hoeffding theorem, the above inequality holds for a majority of j = 1, 2, • • • , m with probability at least 1 − e −Ω(m) .Using the robust polynomial interpolation method proposed in [59], and as stated in [48, Theorem E.1], we can construct polynomials pi (t) from {(t j , y ij )} such that for all t ∈ [0, T ] with probability at least 1 − δ ′ , by choosing for all t ∈ [0, T ].Therefore pi (t) is a good uniform approximation of f i (t) for each i.
The above method can fail in two scenarios: either the error bound (C9) fails to hold for a majority of times, or the sampled times fail to correctly capture the profile of the function.The former failure scenario has its probability bounded by e −Ω(m) by the Chernoff-Hoeffding theorem, and the latter by δ ′ from the robust polynomial interpolation procedure.As a result, if we want to keep the total failure probability for a single f i (t) to be at most δ, then we only need e −Ω(m) + δ ′ ≤ δ. (C13) To this end, and taking into account (C11), it suffices to choose We want the uniform approximation error in (C12) to be upper bounded by ϵ.Therefore we can choose Combining the above analysis, in particular (C14) and (C15), the total number of classical shadows we need is We summarize the above analysis into the following lemma classical shadows of the time evolved state ρ(t).

b. Long-time interpolation
We then consider the case where T is not necessarily upper bounded by 1/Γ.In this case we can simply partition the interval [0, T ] into segments each of length at most 1/Γ, and there are therefore ΓT such segments.We then use the algorithm described in Appendix C 1 a to generate a polynomial to approximate each f i (t) on each of the ΓT segments.Piecing these polynomials together we have a piecewise polynomial approximation ĝi (t) that approximates f i (t) for all t ∈ [0, T ].The success probability of this procedure can be obtained via a union bound.We therefore arrive a the following theorem from Lemma 9:  classical shadows of the time evolved state ρ(t).
Unlike the usual statistical hypothesis testing situation, the two hypotheses we consider above are treated on an equal footing, and therefore we do not need to distinguish between the null hypothesis and the alternative hypothesis.This is possible because we are considering a promise decision problem.
With the piecewise polynomial approximations ĝi (t) we have, we can easily distinguish the two cases: an ϵ/ Therefore the two hypotheses can be distinguished by the statistic max t∈[0,T ] 1 T T 0 ĝi (t)dt − ĝi (t) ≤ ϵ/4 or ≥ 3ϵ/4.We can successfully distinguish between the two cases if we get an ϵ-uniform approximation for f i (t).Therefore we can use Theorem 10 to determine the cost of keeping the error probability below δ (which means both the Type-I and Type-II error probabilities are below δ).We therefore have the following theorem  Next, we show that geometrically local Hamiltonians and local Hamiltonians with power-law interaction that decays fast enough satisfy the assumptions in Lemma 13.The key quantity of interest is P :Pj ̸ =I |λ P |, which is the sum of the absolute value of all coefficients of Pauli terms that act non-trivially on a qubit j.For geometrically local Hamiltonians, there are only O(1) terms acting on any given qubit, and consequently P :Pj ̸ =I |λ P | = O(1) if |λ P | ≤ 1 as assumed at the beginning of this section.
For power-law interaction Hamiltonians, we adopt a restricted definition to make the discussion easier, without neglecting any essential feature of these Hamiltonians.For these Hamiltonians, λ P ̸ = 0 only when P involves at most two qubits.Moreover, |λ P | = O(d −α ), where d is the distance, on a D-dimensional lattice, between the two qubits, and α is the exponent deciding how rapid the decay is.The sum of all coefficients involving a qubit j can be roughly bounded by

Figure 1 .
Figure 1.Learning conservations laws in a Z2-Gauge theory.In panel a), we display the 30 smallest eigenvalues of the exact data matrix for various choices of evolution times T and number of initial states NI .At sufficiently long times T , N + 2 singular values are gapped out (light and dark blue).The use of multiple initial states (green triangles) allows to shorten the evolution time considerably.Inset illustrates this effect, showing the gap as a function of the number of initial states for fixed time T = 20.In panel b), we display the 20 smallest eigenvalues of the noisy data matrix, construct from M = 10 4 and M = 10 5 per initial state (NI = 15) and final time T = 20, NT = 41.For M = 10 4 (M = 10 5 ), increasing M , two (six) singular are gapped out.Testing these against independently obtained data yields small variations over time (inset).In panel c), we display the Pauli basis expansion of the corresponding lowest 6 singular vectors.We identify the magnetization, Hamiltonian, and four Gauss laws.In all panels, N = 8 and NT = 2NP /NI = 624.Points with error bars are the average and standard deviation over 25 experiments with identical parameters.

Figure 2 .
Figure 2. Learning conservation laws in open quantum systems.In panel a), we display the 20 smallest singular values of W for both Hamiltonian (blue) and Lindbladian dynamics (orange) with local dephasing with rate γ = 0.1.While magnetization and Gauss laws are conserved in both cases, the Hamiltonian (energy) is only conserved for Hamiltonian dynamics (λ2 is absent for γ = 0.1).We choose M = 10 6 measurements and use a Gaussian noise approximation to simulate the resulting shot noise.In all panels, N = 8, NT = 41, NI = 15 and points with errorbars are the average and standard deviation over 25 experiments with identical parameters.Panel b) displays the singular values obtained from the data matrix W [j−1,j,j−1]restricted to three adjacent subsystem sites as function of j.This allows to learn the Gauss laws, corresponding to the gapped singular values at even j, with considerably fewer measurements M = 10 4 (c.f.Fig.1b).

Figure 3 .
Figure 3. Learning conservation laws in disordered spinchains.(a)We display the 10 lowest squared singular values of the data matrix Ŵ constructed using NI = 17 initial states evolved according to HXXZ with different disorder strengths w = 2, 4, 6 (blue, orange, green) to a final time T = 40 and evaluated at NT = 41 equally spaced timepoints.We add Gaussian noise, simulating shot noise arising from M = 5 × 10 5 measurements.Each point is averaged over 11 random disorder patterns, the error bars indicate the standard error of the mean.The two smallest singular values λ1, λ2 correspond to magnetization and Hamiltonian, respectively.As the disorder strength w becomes larger, the singular values λ i≥3 decrease significantly indicating an increasing number of approximate conservation laws.(b) We show the number of singular values below a threshold ϵ = 0.02 as function of w for different system sizes.As in panel(a), we choose M = 5 × 10 5 , T = 40, NT = 41.The number of initial states for system size N = 6, 10, 14 is NI = 9, 17, 24, respectively, to ensure NI NT ≈ 2NP .Points correspond to the median of 11 random Hamiltonian configurations.We observe a sharp increase with increasing disorder strength.