Extreme Quantum Memory Advantage for Rare-Event Sampling

We introduce a quantum algorithm for memory-efficient biased sampling of rare events generated by classical memoryful stochastic processes. Two efficiency metrics are used to compare quantum and classical resources for rare-event sampling. For a fixed stochastic process, the first is the classical-to-quantum ratio of required memory. We show for two example processes that there exists an infinite number of rare-event classes for which the memory ratio for sampling is larger than r, for any real number r. Then, for a sequence of processes each labeled by an integer size N, we compare how the classical and quantum required memories scale with N. In this setting, since both memories can diverge as N → ∞, the efficiency metric tracks how fast they diverge. An extreme quantum memory advantage exists when the classical memory diverges in the limit N → ∞, but the quantum memory has a finite bound. We then show that finite-state Markov processes and spin chains exhibit a quantum memory advantage for sampling almost all of their rare-event classes.


I. INTRODUCTION
From earthquakes to financial market crashes, rare events are associated with catastrophe - from decimated social infrastructure and substantial loss of life to global economic collapse. Though rare, their impact cannot be ignored. Predicting and modeling such rare events is essential to mitigating their effects. It is also particularly challenging, often requiring huge datasets and massive computational resources, precisely because the events of interest are rare.

Ameliorating much of the challenge, biased or extended sampling [1,2] is an effective and now widely used method for the efficient generation and analysis of rare events. The underlying idea is simple to state: transform a given distribution into a new one in which previously rare events are typical. This concept was originally proposed in 1961 by Miller to probe the rare events generated by discrete-time, discrete-value Markov stochastic processes [3]. It has since been extended to address non-Markovian processes [4]. The approach was also eventually adapted to continuous-time first-order Markov processes [5][6][7]. Today, the statistical analysis of rare events is a highly developed toolkit with broad applications in the sciences and engineering [8]. Given this, it is perhaps not surprising that the idea and its related methods appear under different appellations, depending on the research arena. In large deviation theory, for example, the approach is referred to as the s-ensemble method [9,10], exponential tilting [11,12], or generating twisted distributions. Building on biased sampling, Torrie and Valleau introduced umbrella sampling into Monte Carlo simulation of systems whose energy landscapes have high energy barriers and so suffer particularly from poor sampling [13]. Since then, stimulated by computational problems arising in statistical mechanics, the approach was generalized to Ferrenberg-Swendsen reweighting, later still to weighted histogram analysis [14], and more recently to Wang-Landau sampling [15].

When generating samples for a given stochastic process one can employ alternative types of algorithm. There are two main types - Monte Carlo or finite-state machine algorithms. Here, we consider finite-state machine algorithms based on Markov chains (MCs) [16,17] and hidden Markov models (HMMs) [18][19][20]. If the process is Markovian one uses MC generators; in more general cases, one uses HMM generators. When evaluating alternative approaches, the key questions concern algorithm speed and memory efficiency. It turns out there are HMMs that are always equally or more memory efficient than MCs, and there are many finite-state HMMs for which the analogous MC is infinite-state [21]. And so, when comparing all HMMs that generate the same process, one is often interested in those that are most memory efficient. For a generic stochastic process, the most memory-efficient classical HMM currently known is the ε-machine of computational mechanics [22]. The memory it requires is called the process' statistical complexity C_µ [23].

Today, we have come to appreciate that several important mathematical problems can be solved more efficiently using a quantum computer. Examples include quantum algorithms for integer factorization [24], search [25], eigendecomposition [26], and solving linear systems [27]. Not long ago and for the first time, Ref.
[28] provided a quantum algorithm that can perform stochastic process sample-generation using less memory than the best-known classical algorithms. Recently, using a stochastic process' higher-order correlations, a new quantum algorithm - the q-machine - substantially improved this efficiency and extended its applicability [29]. More detailed analysis and a derivation of the closed-form quantum advantage of the q-machine is given in a sequel [30]. Notably, the quantum advantage has been verified experimentally for a simple case [31].

FIG. 1. Hidden Markov model generator of a stochastic process with infinite-range statistical dependencies that requires an HMM with only six states. To generate the same process via a Markov chain requires one with an infinite number of states and so infinite memory.
The following brings together techniques from large deviation theory, classical algorithms for stochastic process generation, computational complexity theory, and the newly introduced quantum algorithm for stochastic process generation to propose a new, memory-efficient quantum algorithm for the biased sampling problem. We show that there can be an extreme advantage in the quantum algorithm's required memory compared to the best known classical algorithm. Two examples are analyzed here. The first is the simple, but now well-studied, Perturbed Coins Process. The second is more physical - a stochastic process that arises from the next-nearest-neighbor Ising spin system in contact with a thermal reservoir.

II. CLASSICAL ALGORITHM
The object for which we wish to generate samples is a discrete-time, discrete-value stochastic process [18,32]: a probability space P = (A^∞, Σ, P(·)), where A^∞ is the set of bi-infinite sequences X = ···X_{−1}X_0X_1···, each random variable X_i takes values in a finite, discrete alphabet A, and Σ is the σ-algebra generated by the cylinder sets in A^∞. For simplicity we consider only ergodic stationary processes: that is, P(·) is invariant under time translation - P(X_0X_1···X_n) = P(X_mX_{m+1}···X_{m+n}) for all n and m - and over successive realizations.
Sampling or generating a given stochastic process refers to producing a finite realization that comes from the process' probability distribution. Generally, generating a process directly from its probability measure P(·) is impossible due to the vast number of allowed realizations and, as a result, this prosaic approach requires an unbounded amount of memory. Fortunately, there are more compact ways than specifying in full the probability measure on the sequence σ-algebra. This recalls the earlier remark that HMMs can be arbitrarily more compact than alternative algorithms for the task of generation.
An HMM is specified by a tuple (S, A, {T^(x), x ∈ A}). In this, S is a finite set of states, A is a finite alphabet, and {T^(x), x ∈ A} is a set of |S| × |S| substochastic symbol-labeled transition matrices whose sum T = Σ_{x∈A} T^(x) is a stochastic matrix.
As an example, consider the HMM state-transition diagram shown in Fig. 1, where S = {A, B, C, D, E, F}, A = {0, 1, 2}, and there are three 6 × 6 substochastic matrices T^(0), T^(1), and T^(2). Each edge is labeled p|x, denoting the transition probability p and the symbol x ∈ A emitted during the transition. In this HMM, of the two edges exiting state C, one enters state B and the other enters state A. The edges from C to A and from C to B are labeled 1/2|1 and 1/2|0, respectively. This simply means that if the HMM is in state C, then with probability 1/2 it goes to state A and emits the symbol 1, and with probability 1/2 it goes to state B and emits the symbol 0. Following these transition rules in succession generates realizations of the HMM's process.
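To make the generation procedure concrete, the following is a minimal sketch (ours, not the paper's code); the function and variable names are our own, and the symbol-labeled matrices are assumed to be stored as numpy arrays in a dict keyed by symbol.

```python
import numpy as np

def generate(T, n, state=0, rng=None):
    """Generate a length-n realization from an HMM whose symbol-labeled
    substochastic matrices are stored in the dict T = {symbol: matrix}."""
    rng = rng or np.random.default_rng()
    symbols, out = sorted(T), []
    for _ in range(n):
        # Probability of emitting x from the current state is the row sum.
        probs = np.array([T[x][state].sum() for x in symbols])
        x = symbols[rng.choice(len(symbols), p=probs / probs.sum())]
        # The next state is drawn from the renormalized row of T[x].
        row = T[x][state]
        state = rng.choice(len(row), p=row / row.sum())
        out.append(x)
    return out
```

For Fig. 1's HMM one would fill in the three 6 × 6 matrices T^(0), T^(1), and T^(2) and call generate with them.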
How does this generation method compare to generating realizations of the same process via a finite Markov chain? It turns out that a finite MC cannot generate this process, since generating a symbol can depend on the infinite history. That is, the process has infinite Markov order. As a result, generating a realization using a Markov chain requires an infinite number of Markovian states. In other words, implementing the Markov chain algorithm to generate process samples on a conventional computer requires an infinite amount of memory.
To appreciate the reason behind the process' infinite Markov order, refer to Fig. 1's HMM. There are two length-3 state loops, consisting of the edges colored red (right side of the state-transition diagram) and those colored maroon (left side). Note that if the HMM generates n 1s in a row, we will not know the HMM's current state, only that it is A, D, or E. This state uncertainty (entropy) is bounded away from 0. The same observation holds for the other loop and its sequences of the symbol 0, with the consequent ambiguity among states B, C, and F. Thus, there exist process realizations from which we cannot determine the future statistics, independent of the number of symbols seen. This means that the process' statistics depend on infinite past sequences - the process has infinite Markov order. To emphasize, implementing an MC algorithm for it requires infinite memory. The contrast with the finite HMM method is an important lesson: HMMs are strictly more powerful generators, as a class of algorithms, than Markov chain generators.
For any given process P, there are an infinite number of HMMs that generate it. Therefore, one is compelled to ask, which algorithm requires the least memory for implementation? The best known implementation, and provably the optimal predictor, is the ε-machine [22,33]. The states of the ε-machine are called causal states; we denote this set S.
The average memory required by M(P) to generate process P is given by the process' statistical complexity C_µ(P) [23]. To calculate it:

1. Compute the stationary distribution π over the causal states. π is the left eigenvector of the state-transition matrix T with eigenvalue 1: πT = π.

2. Calculate the causal states' Shannon entropy: H[S] = −Σ_{σ∈S} π_σ log_2 π_σ.

Thus, C_µ = H[S] measures the (ensemble average) memory required to generate the process.
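As a minimal numerical illustration (ours, not the paper's code), assuming the symbol-labeled matrices are given as a list of numpy arrays:

```python
import numpy as np

def statistical_complexity(T_list):
    """C_mu = H[pi]: Shannon entropy of the stationary distribution pi,
    the left eigenvector (eigenvalue 1) of T = sum_x T^(x)."""
    w, v = np.linalg.eig(sum(T_list).T)   # transpose: left eigenvectors of T
    pi = np.real(v[:, np.argmin(np.abs(w - 1))])
    pi /= pi.sum()
    return -sum(p * np.log2(p) for p in pi if p > 0)
```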
Another important, companion measure is h_µ, the process' metric entropy (or Shannon entropy rate) [34]:

h_µ = −lim_{n→∞} (1/n) Σ_{w∈A^n} P(w) log_2 P(w) .
Although the two are sometimes confused, it is important to emphasize that h_µ describes randomness in the realizations, while C_µ describes the required memory for process generation.
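For a unifilar generator such as the ε-machine, h_µ has the standard closed form h_µ = −Σ_i π_i Σ_{x,j} T^(x)_{ij} log_2 T^(x)_{ij}, which a short sketch (again ours) can evaluate directly:

```python
import numpy as np

def entropy_rate(T_list):
    """h_mu for a unifilar HMM (e.g., an epsilon-machine):
    h_mu = -sum_i pi_i sum_{x,j} T^(x)_ij log2 T^(x)_ij."""
    w, v = np.linalg.eig(sum(T_list).T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1))])
    pi /= pi.sum()
    return -sum(pi[i] * p * np.log2(p)
                for Tx in T_list
                for i, row in enumerate(Tx)
                for p in row if p > 0)
```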

III. QUANTUM MEMORY ADVANTAGE
Recently, it was shown that a quantum algorithm for process generation can use less memory than the best known classical algorithm, the ε-machine [28]. We refer to the ratio of required classical memory C_µ to quantum memory as the quantum advantage. Taking into account a process' higher-order correlations, a new quantum algorithm - the q-machine - was introduced that substantially improves on the original quantum algorithm and is, to date, the most memory-efficient quantum algorithm known for process generation [29]. Closed-form expressions for the quantum advantage are given in Ref. [30].
Importantly, the quantum advantage was recently verified experimentally for the simple Perturbed Coins Process [31]. It has also been found that the q-machine sometimes confers an extreme quantum memory advantage. For example, for the generation of ground-state configurations in a Dyson-type spin model with N-nearest-neighbor interactions at temperature T, the quantum advantage scales as N T^2 / log_2 T [35,36].
One consequence of this quantum advantage arises in model selection [37]. Statistical inference of models for stochastic systems often involves controlling for model size or memory. The following applies this quantum advantage to find gains in the setting of biased sampling of a process' rare events. In particular, we develop tools to determine how the memory requirements of classical and quantum algorithms vary over rare-event classes.

IV. QUANTUM ALGORITHM
We define the quantum machine of a stochastic process P by QM(P) = (H, A, {K_x, x ∈ A}), where H denotes the Hilbert space in which the quantum states reside, A is the same alphabet as the given process', and {K_x, x ∈ A} is a set of Kraus operators that specify the measurement protocol for states [38]. Assume we have a state (or density matrix) ρ_0 ∈ B(H) in hand. We perform a measurement whose outcome is the random variable X.
The probability of yielding the symbol x_0 ∈ A is:

P(x_0) = tr(K_{x_0} ρ_0 K†_{x_0}) .

After a measurement with outcome X = x_0, the new quantum state is:

ρ_1 = K_{x_0} ρ_0 K†_{x_0} / tr(K_{x_0} ρ_0 K†_{x_0}) .
Repeating these measurements generates a stochastic process. The process could potentially be nonergodic, depending on the initial state ρ_0. Starting the machine in the stationary state ρ_s defined by ρ_s = Σ_{x∈A} K_x ρ_s K†_x and performing measurements over and over again generates a stationary stochastic process over x ∈ A. For any given process, ρ_s can be calculated by the method introduced in Ref. [30].
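A small numerical sketch of this measurement loop (our illustration; the function names are ours), which also approximates ρ_s by simply iterating the channel from the maximally mixed state:

```python
import numpy as np

def measure(rho, kraus, rng):
    """One measurement step: returns the emitted symbol index and the
    post-measurement state K_x rho K_x^dag / tr(K_x rho K_x^dag)."""
    probs = np.array([np.real(np.trace(K @ rho @ K.conj().T)) for K in kraus])
    x = rng.choice(len(kraus), p=probs / probs.sum())
    rho_new = kraus[x] @ rho @ kraus[x].conj().T
    return x, rho_new / np.trace(rho_new)

def stationary_state(kraus, iters=2000):
    """Approximate the fixed point rho_s = sum_x K_x rho_s K_x^dag by
    repeatedly applying the channel to the maximally mixed state."""
    d = kraus[0].shape[0]
    rho = np.eye(d) / d
    for _ in range(iters):
        rho = sum(K @ rho @ K.conj().T for K in kraus)
        rho /= np.trace(rho)
    return rho
```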
Our immediate goal is to design a quantum generator of a given classical process. (Section VI will then take the given process to represent a rare-event class of some other process.) For now, we start with the process' ε-machine. The construction consists of three steps, as follows.
First: Map every causal state σ_i ∈ S to a pure quantum state |η_i⟩. Each signal state |η_i⟩ encodes the set of length-R sequences that may follow σ_i, as well as each corresponding conditional probability:

|η_i⟩ = Σ_{w∈A^R} √P(w|σ_i) |w⟩ ,

where w denotes a length-R sequence, P(w|σ_i) is the probability of generating w from the causal state σ_i, and R is the process' Markov order. The resulting Hilbert space is H_w, with dimension |A|^R - the number of length-R sequences - and basis {|w⟩ : w ∈ A^R}.

Second: Form a matrix Ξ by assembling the signal states as its columns. From here on out, we assume all the |η_i⟩s are linearly independent. (This holds for general processes except for some special cases, which we discuss elsewhere.) Define dual bra states ⟨η̃_i| from the rows of Ξ's inverse. That is, we design the new bra states such that we obtain the identity ⟨η̃_i|η_j⟩ = δ_{ij}.

Third: Define |A| Kraus operators K_x via:

K_x = Σ_i √P(x|σ_i) |η_{j(i,x)}⟩⟨η̃_i| ,

where σ_{j(i,x)} is the causal state reached from σ_i upon emitting x.

Using the quantum generator QM(P), the required average memory for generating the process P is C_q(P) = S(ρ_s), where S(ρ) = −tr(ρ log ρ) denotes the von Neumann entropy [38]. References [29,35] explain why C_q is the quantum machine's required memory.
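The following sketch (ours; it assumes a unifilar ε-machine given as numpy matrices and is not the authors' code) assembles the signal states, the dual states, and the Kraus operators, then evaluates C_q by iterating the channel to its fixed point:

```python
import numpy as np
from itertools import product

def q_machine(T_list, R):
    """Signal states and Kraus operators for a unifilar HMM (epsilon-machine)
    with symbol-labeled matrices T_list and Markov order R."""
    nS, nA = T_list[0].shape[0], len(T_list)
    words = list(product(range(nA), repeat=R))
    Xi = np.zeros((len(words), nS))            # column i is |eta_i>
    for k, w in enumerate(words):
        M = T_list[w[0]]
        for x in w[1:]:
            M = M @ T_list[x]
        Xi[k] = np.sqrt(M.sum(axis=1))         # sqrt P(w | sigma_i)
    duals = np.linalg.pinv(Xi)                 # row i is <eta~_i|
    kraus = []
    for x in range(nA):
        K = np.zeros((len(words), len(words)))
        for i in range(nS):
            p = T_list[x][i].sum()             # P(x | sigma_i)
            if p > 0:
                j = int(np.argmax(T_list[x][i]))  # unifilar successor state
                K += np.sqrt(p) * np.outer(Xi[:, j], duals[i])
        kraus.append(K)
    return kraus

def C_q(kraus, iters=3000):
    """Von Neumann entropy of the channel's fixed point rho_s."""
    d = kraus[0].shape[0]
    rho = np.eye(d) / d
    for _ in range(iters):
        rho = sum(K @ rho @ K.conj().T for K in kraus)
        rho /= np.trace(rho)
    ev = np.linalg.eigvalsh(rho)
    return -sum(e * np.log2(e) for e in ev if e > 1e-12)
```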

V. TYPICAL REALIZATIONS
At this point, we have established classical and quantum representations of processes and characterized their respective memory requirements. We now turn to using this setup to monitor the classical and quantum resources required to generate probability classes of a process' realizations.
The concept of a stochastic process is quite general. Any physical system that exhibits stochastic dynamics in time or space may be thought of as generating a stochastic process. In the spatial setting one considers not the time evolution, but rather the spatial "dynamic". For example, consider a one-dimensional noninteracting Ising spin-½ chain with classical Hamiltonian H = −Σ_{i=1}^n h σ_i in contact with a thermal reservoir at temperature T. After thermalizing, a spin configuration at one instant of time may be thought of as having been generated left-to-right (or, equivalently, right-to-left). The probability distribution over these spatial-translation-invariant configurations defines a stationary stochastic process - a simple Markov random field.
For n ≫ 1, one can ask for the probability of seeing k up spins. The Strong Law of Large Numbers [39] guarantees that, for large n, the ratio k/n almost surely converges to the up-spin probability p_↑. That is, lim_{n→∞} k/n = p_↑ almost surely. Informally, a typical configuration is one that has close to p_↑ n up spins. However, this does not preclude seeing other kinds of rare long runs, e.g., all up spins or all down spins.
It simply means that the latter are rare events. Now, let us formally define the concept of typical realizations and, consequently, rare ones. Consider a given process P and let A^n denote its set of length-n realizations. Then, for an arbitrary 0 < ε ≪ 1, the process' typical set [40][41][42] is defined as:

A^n_ε ≡ { w ∈ A^n : 2^{−n(h_µ+ε)} ≤ P(w) ≤ 2^{−n(h_µ−ε)} } ,   (1)

where h_µ is the process' Shannon entropy rate, introduced above.
According to the Shannon-McMillan-Breiman theorem [43][44][45], for a given ε ≪ 1 and sufficiently large n*:

P(A^n_ε) > 1 − ε, for all n > n* .   (2)

There are two important lessons here. First, from Eq. (1) we see that all sequences in the typical set have approximately the same probability. More precisely, the probability of typical sequences decays at the same exponential rate. The following adapts this, using decay rates to identify distinct sets of rare events. Second, from Eq. (2), for large n the probability of sequences falling outside the typical set is close to zero - these are the sets of rare sequences.

FIG. 2. For a given process, the space A^∞ of all sequences is partitioned into three sets: sequences that are forbidden by the process, sequences in the typical set, and sequences neither forbidden nor typical - the atypical or rare sequences.
Another important consequence of the theorem is that the sequences generated by a stationary ergodic process fall into one of three partitions; see Fig. 2. The first contains those that are never generated; they fall in the forbidden set. For example, the HMM in Fig. 1 never generates sequences containing consecutive 2s. The second partition consists of those in the typical set - the set with probability close to one, as in Eq. (2). And the last contains sequences in a family of atypical sets - realizations that are rare to different degrees. We now refine this classification by dividing the atypical set into identifiable subsets, each with its own characteristic rarity.
Mirroring the familiar Boltzmann weight in statistical physics [46], in the n → ∞ limit we define the subsets Λ^P_U ⊂ A^∞ for a process P as:

Λ^P_U ≡ { w ∈ A^∞ : lim_{n→∞} −(1/n) log_2 P(w_{0:n}) = U } .   (3)

This partitions A^∞ into disjoint subsets Λ^P_U in which all w ∈ Λ^P_U have the same probability decay rate U. Physics vernacular would speak of the sequences as having the same energy density U. Figure 3 depicts these subsets as "bubbles" of equal energy. Equation (1) says the typical set is the bubble with energy equal to the process' Shannon entropy rate: U = h_µ. All the other bubbles contain rare events, some rarer than others; they exhibit faster or slower probability decay rates.
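As a numerical illustration (ours), the decay rate of a long word under an HMM can be estimated directly from the symbol-labeled matrices, classifying the bubble Λ^P_U it belongs to; the helper name is hypothetical:

```python
import numpy as np

def decay_rate(word, T_list, pi):
    """Estimate U = -(1/n) log2 P(w_{0:n}) from an HMM, using
    P(w) = pi T^(w_1) ... T^(w_n) 1."""
    v = pi.copy()
    for x in word:
        v = v @ T_list[x]
    return -np.log2(v.sum()) / len(word)
```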
Employing a process' HMM to generate realizations produces sequences in the typical set with probability close to one and, rarely, atypical sequences. Imagine that one is interested in a particular class of rare sequences, say, those with energy U(Λ^P_U). (One might be concerned with the class of large-magnitude earthquakes or the emergence of major instabilities in the financial markets, for example.) How can one efficiently generate these rare sequences? We now show that there is a new process P_U whose typical set is Λ^P_U, and this returns us directly to the challenge of biased sampling.

VI. BIASED SAMPLING
Consider a finite set of configurations {c_i} with probabilities specified by a distribution P(·), and an associated set {ω_i} of weighting factors. Reweighting introduces a new distribution P̄(·) over configurations, where:

P̄(c_i) = ω_i P(c_i) / Σ_j ω_j P(c_j) .
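A tiny numerical example of this reweighting (ours; the numbers are arbitrary):

```python
import numpy as np

P = np.array([0.7, 0.2, 0.1])    # original distribution over configurations
w = np.array([1.0, 2.0, 30.0])   # weights that favor the rarest configuration
P_bar = w * P / (w * P).sum()
print(P_bar)                     # -> [0.171, 0.098, 0.732] (approximately)
```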
Given a process P and its ε-machine M(P), how do we construct an ε-machine M(P_U) that generates P's atypical sequences at some energy density U ≠ h_µ or, as we denoted it, the set Λ^P_U? Here, we answer this question by constructing a map B_β : P → P_β from the original process P to a new one P_β. The map is parametrized by β ∈ ℝ\{0}, which indexes the rare set of interest. (We use β for convenience here; it is related to U by a function introduced shortly.) Both processes P = (A^∞, Σ, P(·)) and P_β = (A^∞, Σ, P_β(·)) are defined on the same measurable sequence space. The measures differ, but their supports (allowed sequences) are the same. For simplicity we refer to B_β as the β-map.
Assume we are given M(P) = (S, A, {T^(x), x ∈ A}). We showed that for every probability decay rate or energy density U there exists a particular β such that M(P_β) typically generates the words in Λ^P_{U,n} for large n [48]. The β-map which establishes this is calculated by a construction that relates M(P) to M(P_β) = (S, A, {S^(x)_β, x ∈ A}), the HMM that generates P_β:

1. For each x ∈ A, construct a new matrix T^(x)_β with elements (T^(x)_β)_{ij} = ((T^(x))_{ij})^β.

2. Form the matrix T_β = Σ_{x∈A} T^(x)_β.

3. Calculate T_β's maximum eigenvalue λ_β and corresponding right eigenvector r_β.

4. For each x ∈ A, construct the new matrices S^(x)_β with elements (S^(x)_β)_{ij} = (T^(x)_β)_{ij} (r_β)_j / (λ_β (r_β)_i).

Having constructed the new process P_β by introducing its generator, we use the latter to produce some rare set of interest Λ^P_{U,n}.

Theorem 1. In the limit n → ∞, the probability that the new process P_β generates realizations from the set Λ^P_{U,n} converges to one:

lim_{n→∞} P_β(Λ^P_{U,n}) = 1 ,

where:

U(β) = −∂ log_2 λ_β / ∂β .

In addition, in the same limit the process P_β assigns equal energy densities to all members of the set Λ^P_{U,n}.
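A compact implementation of these four steps (our sketch, assuming numpy arrays; the names are ours):

```python
import numpy as np

def beta_map(T_list, beta):
    """Steps 1-4: tilt each T^(x) elementwise by beta, then renormalize
    with the Perron eigenvalue/eigenvector of T_beta."""
    Tb_list = []
    for T in T_list:
        Tb = np.zeros_like(T)
        mask = T > 0
        Tb[mask] = T[mask] ** beta          # step 1
        Tb_list.append(Tb)
    Tb = sum(Tb_list)                       # step 2
    w, v = np.linalg.eig(Tb)                # step 3
    k = np.argmax(np.real(w))
    lam, r = np.real(w[k]), np.real(v[:, k])
    if r.sum() < 0:
        r = -r                              # Perron vector is positive
    # step 4: (S_beta^(x))_ij = (T_beta^(x))_ij r_j / (lam r_i)
    return [Tx * np.outer(1.0 / r, r) / lam for Tx in Tb_list]
```

Each row of Σ_x S^(x)_β sums to one by construction, since T_β r_β = λ_β r_β, so the output matrices define a valid HMM.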
As a result, for large n the process P_β typically generates the set Λ^P_{U,n} with the specified energy U. The process P_β is sometimes called the auxiliary, driven, or effective process [49][50][51]. Examining the form of the energy, one sees that there is a one-to-one relationship between β and U. And so, we can equivalently denote the process P_β by P_U. More formally, with probability measure one, every word in Λ^P_U is in the typical set of the process P_β.
The β-map construction guarantees that the HMMs M(P) and M(P_β) have the same states and transition topology; the only difference is in their transition probabilities. M(P_β) is not necessarily an ε-machine - the most memory-efficient classical algorithm that generates the process. Typically, though, M(P_β) is an ε-machine, and there are only finitely many βs for which it is not. (More detailed development along these lines will appear in a sequel.)

VII. QUANTUM AND CLASSICAL COSTS OF BIASED SAMPLING
Having introduced the necessary background to compare classical versus quantum models and to appreciate typical versus rare realizations, we are ready to investigate the quantum advantage when generating a given process' rare events.
The last section concluded that the memory required by the classical algorithm to generate rare sequences with energy density U is C_µ(P_β), where U and β are related via U = −∂ log_2 λ_β / ∂β. Similarly, the memory required by the quantum algorithm to generate the rare class with energy density U is C_q(P_β). For simplicity, we denote these two quantities by C_µ(β) ≡ C_µ(P_β) and C_q(β) ≡ C_q(P_β).

A. Advantage for a Simple Markov Process
Consider the case where we have two biased coins, call them A and B, each with a different bias for Heads (symbol 0): p for coin A and 1 − q for coin B. When we flip a coin, if the result is Heads, then on the next flip we choose coin A; if the result is Tails, we choose coin B. Flipping the coins over and over again yields a process P_pc called the Perturbed Coins Process [28]. Figure 4 shows the process' ε-machine generator M(P_pc), where S = {A, B} and A = {0, 1}.
One can also produce this process with a quantum generator QM(P_pc). Using the construction introduced in Sec. IV, it has Kraus operators:

K_0 = √p |η_A⟩⟨η̃_A| + √(1−q) |η_A⟩⟨η̃_B| and K_1 = √(1−p) |η_B⟩⟨η̃_A| + √q |η_B⟩⟨η̃_B| .

For its stationary state we have ρ_s = π_A |η_A⟩⟨η_A| + π_B |η_B⟩⟨η_B|, where π = (π_A, π_B) is the stationary distribution over causal states. Figure 5 shows the classical and quantum memory costs of generating rare realizations: C_µ(β) and C_q(β) versus β for different β-classes. Surprisingly, the two costs exhibit completely different behaviors. For example, lim_{β→0} C_q = 0, while lim_{β→0} C_µ = 1. More interestingly, as the inset demonstrates, even though both C_µ(β) and C_q(β) vanish exponentially fast in the limit β → ∞, C_q(β) goes to zero noticeably faster. We define the quantum advantage of biased sampling as the ratio of classical to quantum memory:

η(β) ≡ C_µ(β) / C_q(β) .

Figure 6 graphs the quantum advantage and shows how it divides into three distinct scaling regimes. First, for small |β| (high temperature) the quantum algorithm exhibits a polynomial advantage O(β^{−2}). Second, for large positive β (low temperature) the quantum algorithm samples the rare classes with an exponential advantage: the advantage grows as O(exp(cβ)) as one increases β, where c is a function of p and q. Third, for large negative β (the negative low-temperature regime) there is no quantum advantage. Since we are analyzing finite-state processes, this regime appears and is the analog of population inversion. And so, formally, there are β-class events with negative temperature.
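Putting the pieces together for this example, here is a self-contained sketch (ours, not the authors' code) that tilts the Perturbed Coins ε-machine and evaluates both memories; it uses the stationary-state form ρ_s = Σ_i π_i |η_i⟩⟨η_i| noted above for the Markov-order-1 case:

```python
import numpy as np

p, q = 0.6, 0.8
T = [np.array([[p, 0.0], [1 - q, 0.0]]),   # T^(0): emit Heads, go to state A
     np.array([[0.0, 1 - p], [0.0, q]])]   # T^(1): emit Tails, go to state B

def memories(T, beta):
    # Tilt elementwise, then renormalize via the Perron root (beta-map).
    Tb = [np.where(Tx > 0, Tx, 1.0) ** beta * (Tx > 0) for Tx in T]
    w, v = np.linalg.eig(sum(Tb))
    k = np.argmax(np.real(w))
    lam, r = np.real(w[k]), np.real(v[:, k])
    if r.sum() < 0:
        r = -r
    S = [Tx * np.outer(1 / r, r) / lam for Tx in Tb]
    # Classical memory: Shannon entropy of the stationary distribution pi.
    w2, v2 = np.linalg.eig(sum(S).T)
    pi = np.real(v2[:, np.argmin(np.abs(w2 - 1))])
    pi /= pi.sum()
    Cmu = -sum(x * np.log2(x) for x in pi if x > 1e-15)
    # Quantum memory: entropy of rho_s built from the two signal states.
    eta = np.sqrt(np.array([[Sx[i].sum() for Sx in S] for i in range(2)]))
    rho = sum(pi[i] * np.outer(eta[i], eta[i]) for i in range(2))
    ev = np.linalg.eigvalsh(rho)
    Cq = -sum(e * np.log2(e) for e in ev if e > 1e-15)
    return Cmu, Cq

for beta in (0.5, 1.0, 2.0, 4.0):
    Cmu, Cq = memories(T, beta)
    print(f"beta={beta}: C_mu={Cmu:.4f}, C_q={Cq:.4f}, ratio={Cmu/Cq:.2f}")
```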
Such is the quantum advantage for the Perturbed Coins Process at p = 0.6 and q = 0.8. The features exhibited - the different scaling regimes - are generic for any p > 1 − q, though. Moreover, for Perturbed Coins Processes with p < 1 − q, the positive and negative low-temperature behaviors switch.

FIG. 6. Quantum memory advantage for generating the rare realizations of the Perturbed Coins Process with p = 0.6 and q = 0.8 when employing its q-machine instead of its (classical) ε-machine. Three different advantages occur: (i) near β = 0 a polynomial advantage scaling as O(β^{−2}), (ii) at large positive β an exponential advantage O(exp(f(q, p)β)), and (iii) no advantage at large negative β.

B. Spin System Quantum Advantage
Let us analyze the quantum advantage in a more familiar physics setting. Consider a general one-dimensional ferromagnetic next-nearest-neighbor Ising spin-½ chain [52,53] defined by the Hamiltonian:

H = −Σ_i (J_1 s_i s_{i+1} + J_2 s_i s_{i+2}) ,   (5)

in contact with a thermal bath at temperature k_B T = 1. The spin s_i at site i takes on the values {+1, −1}.
After thermalizing, a spin configuration at one instant of time may be thought of as having been generated left-to-right (or, equivalently, right-to-left). The probability distribution over these spatial-translation-invariant configurations defines a stationary stochastic process. Reference [54, Eqs. (84)-(91)] showed that for any finite and nonzero temperature T this process has Markov order 2. More to the point, the ε-machine that generates it has 4 causal states, and those states are in one-to-one correspondence with the set of length-2 spin configurations.
For example, we denote the probability of generating a ↓ spin, given that the previous two spins were ↑↑, by p_{↓|↑↑}. If the generator is in the ↑↑ state and generates a ↓ spin, then the generator state changes to ↑↓.
To determine the ε-machine transition probabilities {T^(x)}_{x∈A}, we first compute the transfer matrix V for the Hamiltonian of Eq. (5) at temperature T and then extract the conditional probabilities, following Ref. [54] and Ref. [35]'s appendix.
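A sketch of that computation (ours; the coupling values J_1, J_2 and all names are our assumptions, and the stochasticization uses the standard Perron normalization of a transfer matrix):

```python
import numpy as np
from itertools import product

J1, J2, kT = 1.0, 0.5, 1.0                 # hypothetical couplings; k_B T = 1
pairs = list(product((+1, -1), repeat=2))  # causal states (s_{i-1}, s_i)

# Transfer matrix over pair states; only (a, b) -> (b, c) moves are allowed.
V = np.zeros((4, 4))
for i, (a, b) in enumerate(pairs):
    for j, (b2, c) in enumerate(pairs):
        if b == b2:
            V[i, j] = np.exp((J1 * b * c + J2 * a * c) / kT)

# Conditional probabilities P(c | a, b) = V_ij phi_j / (lam phi_i),
# with lam, phi the Perron eigenvalue and right eigenvector of V.
w, v = np.linalg.eig(V)
k = np.argmax(np.real(w))
lam, phi = np.real(w[k]), np.real(v[:, k])
if phi.sum() < 0:
    phi = -phi
P = V * phi[None, :] / (lam * phi[:, None])

# Symbol-labeled matrices: the emitted symbol is the newly generated spin c.
up_cols = np.array([1.0 if c == +1 else 0.0 for (_, c) in pairs])
T_up, T_down = P * up_cols[None, :], P * (1.0 - up_cols)[None, :]
```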
What are the classical and quantum memory costs for biased sampling of the rare spin-configuration class with decay rate U, as defined in Eq. (3)? First, note that U is not a configuration's actual energy density E. If we assume the system is in thermal equilibrium, and thus exhibits a Boltzmann distribution over configurations, then U and E are related via:

U = (E − F) / (k_B T ln 2) ,

where F = −(k_B T / n) ln Z is the free energy density and Z is the partition function. This simply tells us that if a stochastic process describes the thermalized configurations of a physical system with a given Hamiltonian, then every rare-event bubble in Fig. 3 can be labeled with β, U, or E. Moreover, there is a one-to-one mapping between every such variable pair.
Figure 9 plots η(U) versus U - the quantum advantage of generating rare configurations with decay rate U. To calculate η(U) for a given process P, first we determine the process' classical generator M(P) using the method introduced in Ref. [33]. Second, for every β ∈ ℝ\{0}, using the map introduced in Sec. VI, we find the new classical generator M(P_β). Third, using the construction introduced in Sec. IV, we find QM(P_β). Fourth, using Thm. 1, we find the corresponding U for the chosen β. Combining these results gives η(U) = C_µ(β)/C_q(β). By varying β over ℝ\{0} we cover all the energy densities U. Practically, to calculate η(U) in Fig. 9 we chose 2000 βs in [−10, 7.5].

As pointed out earlier, β = 1 always corresponds to the process itself, and one obtains its typical sequences. As one sees in Fig. 9, the corresponding quantum advantage is η < 2. This simply means that, though there is a quantum advantage in generating typical sequences, it is not that notable. However, the figure highlights four other interesting regimes.

First, there is the positive zero-temperature limit (β → ∞), corresponding to the rare class with minimum energy density U_min = −log_2(p_{↓|↓↓}) = −log_2(p_{↑|↑↑}). From Eq. (5) it is easy to see that this rare bubble has only two configurations as members: all up spins or all down spins. Now consider finite but large β ≫ 1, which corresponds to a rare class with a low energy density close to U_min. Figure 8 (top left) shows a general ε-machine for this process. Low color intensity for both edges and states means that the process rarely visits them during generation. This means, in turn, that a typical realization consists of large blocks of all up spins and all down spins, joined by short segments.

Second, there is the negative zero-temperature limit (β → −∞), corresponding to the rare class with maximum energy density U_max = −(1/2) log_2(p_{↓|↓↑} p_{↑|↑↓}). From Eq. (5) it is easy to see that this rare bubble has only one configuration as a member: a periodic repetition of spin down and spin up. Consider finite β ≪ −1, corresponding to a rare class with a high energy density close to U_max. Figure 8 (top right) shows the general ε-machine for the associated process. The typical configuration consists of large blocks tiled with spin-up/spin-down pairs, connected by other short segments.

Third, there is the positive infinite-temperature limit (β → 0+). In this limit we expect to see completely random spin-up/spin-down configurations. Figure 8 (bottom right) shows the ε-machine for this class, labeled with nonzero small β. The transition probability for the edges labeled +ε is 1/2 + ε and for the edges labeled −ε is 1/2 − ε, where ε is a small positive number. As one can see, even though each transition probability is close to one half, the self-loops are slightly favored.

Fourth and finally, there is the negative infinite-temperature limit (β → 0−). The generator here, Fig. 8 (bottom left), is similar to that at positive infinite temperature, except that the edge-sign labels are reversed. This means that the self-loops are slightly less favored. Generating a rare bubble with β < 0 is sometimes called unphysical sampling, since there exists no physical temperature at which the system generates this rare class. As a result, the left part of Fig. 9 corresponds to physical sampling and the right part to unphysical sampling. That said, there is no impediment to "unphysical" sampling from a numerical standpoint. In addition, as we noted, negative temperatures correspond physically to population inversion, a well-known phenomenon.

Remarkably, the advantage η(U) diverges at U = u_0 ≈ 1.878, where u_0 = lim_{β→0} U - both the positive and the negative high-temperature limit. Moreover, η(U) diverges as (U − u_0)^{−2} in both limits and, as a result, there is a polynomial-type advantage. For this specific example one does not find a region with exponential advantage.

VIII. CONCLUSIONS
We introduced a new quantum algorithm for sampling the rare events of classical stochastic processes. The algorithm often confers a significant memory advantage when compared to the best known classical algorithm. We explored two example systems. In the first, a simple Markov process, we found either an exponential or a polynomial advantage, depending on the rare-event class. In the second, an Ising chain, we found a polynomial memory advantage for rare classes in both the positive and negative high-temperature regimes.
Let us address an important point about the optimality of the classical and quantum algorithms. Consider the integer factorization problem. In this case Shor's algorithm scales polynomially [24], while the best classical algorithm currently known scales exponentially [55] with problem size. While neither algorithm has been proven optimal, many believe that the separation in scaling is real [56]. Similarly, proving optimality for a rare-event sampling algorithm is challenging in both the classical and quantum settings. However, with minor restrictions, one can show that the current quantum algorithm is almost always more efficient than the classical one [29].

FIG. 3. The space of all sequences A^∞ partitioned into Λ_U s - isoenergy or equal-probability-decay-rate bubbles - in which all sequences in the same Λ_U have the same energy U. The typical set is one such bubble, with energy equal to the Shannon entropy rate: U = h_µ. Another important class is the forbidden set, whose sequences never occur; it can also be interpreted as the subset of sequences with infinite positive energy. By applying the map B_β to the process and changing β continuously from −∞ to +∞ (excluding β = 0), one can generate any rare class of interest Λ^P_U. β → −∞ corresponds to the least probable sequences with the largest energy density U_max, β = 1 corresponds to the typical set, and β → +∞ corresponds to the most probable sequences with the smallest energy density U_min.


FIG. 4. ε-Machine generator of the Perturbed Coins Process. Edges are labeled with conditional transition probabilities and emitted symbols. For example, for the self-loop on state A, p|0 indicates the transition is taken with probability Pr(0|A) = p and the symbol 0 is emitted.

FIG. 5. Classical memory C_µ(β) and quantum memory C_q(β) versus β for biased sampling of the Perturbed Coins Process' rare sequence classes: see Fig. 4, with p = 0.6 and q = 0.8. As the inset shows, for large β both classical and quantum memories decay exponentially with β, but the quantum memory decays faster.

FIG. 9. Quantum advantage for biased sampling of Ising spin configurations: η(U) versus decay rate U for biased sampling of equal-energy spin configurations. Vertical lines locate the βs corresponding to particular Us. Note the extreme advantage indicated by the divergence in η(U) at U = u_0 ≈ 1.878, corresponding to β = 0.