Optimisation of time-ordered processes in the finite and asymptotic regime

Many problems in quantum information theory can be formulated as optimizations over the sequential outcomes of dynamical systems subject to unpredictable external influences. Such problems include many-body entanglement detection through adaptive measurements, computing the maximum average score of a preparation game over a continuous set of target states and limiting the behavior of a (quantum) finite-state automaton. In this work, we introduce tractable relaxations of this class of optimization problems. To illustrate their performance, we use them to: (a) compute the probability that a finite-state automaton outputs a given sequence of bits; (b) develop a new many-body entanglement detection protocol; (c) let the computer invent an adaptive protocol for magic state detection. As we further show, the maximum score of a sequential problem in the limit of infinitely many time steps is in general incomputable. Nonetheless, we provide general heuristics to bound this quantity and show that they provide useful estimates in relevant scenarios.


I. INTRODUCTION
In quantum physics, we are usually concerned with manipulating quantum systems, whether we interact with them to cause them to reach (or remain in) a specific quantum state, or measure them in order to observe or certify some of their properties. Finding new, better ways to perform tasks with time-ordered operations lies at the heart of current research in quantum information science. Quantum computing relies on a sequential application of gates to a quantum system, and communication protocols aided by quantum systems depend on performing operations on these systems and exchanging information in a specific order. Sequences of operations are also crucial in less evident situations; e.g., specific sequences of pulses are applied in order to read out a superconducting qubit [1].
Traditionally, the problem of finding new ways (or sequences of operations) to complete a specific task has relied on the ingenuity of scientists to come up with new protocols. More recently, this type of problem is also tackled via machine-learning techniques, which require training a neural network to generate the right sequence of operations. However, a general method to optimize over sequential strategies, or even a criterion to decide whether one such sequential protocol is optimal for a certain task, is lacking.
Part of the difficulty in solving these problems lies in the fact that, when operations are performed sequentially, protocols can be made adaptive, i.e., future operations may depend on the results of previous ones. Adaptiveness can lead to great advantages; e.g., it minimizes statistical errors in entanglement certification [2] and quantum state detection [3]. However, it also increases the variety of protocols that must be considered for a specific task, and thus the complexity of optimizing over them. Indeed, while optimization techniques are well established in the context of single-shot or independently repeated procedures (e.g., in applications of non-locality [4,5] or for witnessing entanglement [6]), developing them for sequential protocols remains a challenge.
In this work, we make progress on this problem. Specifically, we introduce a general technique to analyze and optimize a class of time-ordered processes that we call sequential models. These are protocols in N rounds, where in each round an interaction occurs and changes the state of a system of interest, possibly depending on unknown or uncontrolled variables. The progress toward a certain goal is quantified by a reward that is generated at each round of the protocol and is the object of our optimization procedure.
Our first result is a technique that allows us to upper bound the maximum total reward that a sequential model can generate in N rounds, through a complete hierarchy of increasingly difficult convex-optimization problems. To illustrate the performance of the hierarchy, we apply it to upper bound the type-I error of an adaptive protocol for many-body entanglement detection [2,7]. The technique generates seemingly tight bounds for large system sizes (of order N ≈ 100).
The same technique also allows us to deepen the study of temporal correlations generated by finite-state automata [8,9]. These models appear in several problems in quantum foundations and quantum information theory, such as classical simulations of quantum contextuality [10-13], quantum simulations of classical stochastic processes [14-16], purity certification [17], dimension witnesses [18-20], quantum advantages in the design of time-keeping devices [21-24], and classical simulation of quantum computation [25]; for more details, see a recent review on temporal correlations [26]. Specifically, we focus on the problem of optimal clocks, namely: how good a clock can we construct, given access to a classical automaton with d internal states? Using our tools, we solve this problem for low values of d, by providing upper bounds on the maximum probability that the automaton outputs the 'one-tick sequence' 000...01 investigated in [23,24]. These bounds match the best known lower bounds up to numerical precision.
In addition to finite sequences of operations, we are also often interested in letting a protocol or process run indefinitely, i.e., we are interested in characterizing its asymptotic behavior as N → ∞. This is important, e.g., for systems that are left to evolve for a long time, or in cases in which we aim to probe large systems, where this limit is a good approximation.
To our knowledge, not much is known about time-ordered processes in the limit of infinitely many rounds (except for results on asymptotic rates in hypothesis testing and when taking specific limits [27,28]). In this paper, we show that there is a fundamental reason for this: there are sequential models for which no algorithm can approximate their asymptotic behavior. This follows from undecidability results related to finite-state automata, specifically a construction from Refs. [29,30]. Nevertheless, we develop a heuristic method for computing rigorous bounds on the asymptotic behavior. We further find that, in the applications we consider, these bounds are close to the expected asymptotic behavior and to the lower bounds we can compute, thus substantiating the usefulness of our method. More specifically, we use the asymptotic method to bound the type-I error of the many-body entanglement-detection protocol mentioned above in the limit of infinitely many particles. The result is a bound that is, at most, at a distance of 4 × 10⁻⁴ from the actual figure. Our asymptotic method also bounds the probability that a two-state automaton outputs the one-tick sequence in the limit of many time steps to O(10⁻²).
Finally, in some circumstances it becomes necessary not only to analyze sequential models but also to find ones with good performance. That is, given a number of parameters that we can control (the policy), which might affect the evolution of the system as well as the reward, we pose the problem of deciding which policy maximizes the total reward after N rounds. We propose to tackle this problem via projected-gradient methods, and show how to cast the computation of the gradient of our upper bounds as a tractable convex-optimization problem. Using this approach, we let the computer discover a two-state, six-round preparation game to detect one-qubit magic states.
Our paper is structured as follows. In Sec. II, we present three problems in quantum information theory that we use to illustrate our methods throughout. In Sec. III, we introduce the abstract notion of sequential models and show how to optimize over them, before applying the optimization method in Sec. IV. In Sec. V, we prove that the asymptotic behavior of sequential models cannot be approximated in general. We also introduce a heuristic to tackle asymptotic problems, which we apply in Sec. VI. In Sec. VII, we pose the problem of policy optimization and show how to compute the gradient of the upper bounds derived in Sec. III over the parameters of the policy. We use gradient methods to solve a policy optimization problem in Sec. VIII. Finally, we present our conclusions in Sec. IX.

II. PROBLEMS WITH TIME-ORDERED OPERATIONS
A variety of different problems involving quantum systems are of a sequential nature. In the following, we introduce three settings that are made up of time-ordered operations and that all pose challenging open problems to the quantum information community.

A. Temporal correlations
A finite-state automaton (FSA) is a mathematical structure that models an autonomous computational device with bounded memory. More formally [8,9], a d-state automaton is a device described, at every time, by an internal state σ ∈ Σ, with |Σ| = d. At regular intervals (rounds or time steps), the automaton updates its internal state and generates an outcome b ∈ B, depending on both its prior internal state σ and the state y ∈ Y of its input port. Such a double state-and-output transition is governed by the transition matrix of the automaton, P(σ′, b|σ, y), which indicates the probability that the automaton transitions to the internal state σ′ and outputs b, given that its input and prior states were, respectively, y and σ. Overall, the probability of outputs b = (b_1, ..., b_n) given inputs y = (y_1, ..., y_n) can be computed as

p(b|y) = Σ_{σ_0,σ_1,...,σ_n} p(σ_0) P(σ_1, b_1|σ_0, y_1) P(σ_2, b_2|σ_1, y_2) ··· P(σ_n, b_n|σ_{n−1}, y_n).  (1)

Expressions such as Eq. (1), i.e., generated by a classical FSA, appear in several problems related to classical simulations of temporal quantum correlations, such as, e.g., classical simulations of quantum contextuality [10-12], quantum simulations of classical stochastic processes [14-16], classical simulation of quantum computation [25], and quantum advantages in the design of time-keeping devices [21-24]. We are interested in scenarios in which the performance of a d-state automaton after n time steps is evaluated by yet another automaton or, more generally, by a time-dependent process with bounded memory. We wish to bound said performance over all automata with d states.
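Eq. (1) can be evaluated without summing over all exponentially many state paths, by propagating a forward message over the internal states. The sketch below does this in plain Python; the two-state example automaton and the trivial single input are our own illustrative choices, not taken from the paper.

```python
# Toy evaluation of Eq. (1): the probability that a finite-state automaton
# produces the output sequence b given inputs y, via a forward pass over
# an unnormalized distribution of internal states.

def sequence_prob(p0, P, b_seq, y_seq):
    """p(b|y) for an automaton with transition matrix P.

    p0 : initial distribution over internal states, p0[sigma]
    P  : dict mapping (sigma_next, b, sigma, y) -> probability
    """
    d = len(p0)
    m = list(p0)  # forward message m_k(sigma_k); sum(m) = p(b_1...b_k | y)
    for b, y in zip(b_seq, y_seq):
        m = [sum(P.get((sn, b, s, y), 0.0) * m[s] for s in range(d))
             for sn in range(d)]
    return sum(m)

# Example (our own): state 0 always emits 0 and stays; state 1 emits 1
# w.p. 0.25 (jumping to state 0) and 0 w.p. 0.75 (staying).  Single
# trivial input y = 0 throughout.
P = {
    (0, 0, 0, 0): 1.0,   # stay in 0, emit 0
    (1, 0, 1, 0): 0.75,  # stay in 1, emit 0
    (0, 1, 1, 0): 0.25,  # jump to 0, emit 1
}
p_010 = sequence_prob([0.0, 1.0], P, [0, 1, 0], [0, 0, 0])  # 0.75*0.25*1
```

The forward pass costs O(n d²) instead of O(d^n), which is what makes the coupled automaton-process picture below computationally convenient.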
This class of problems includes, e.g., computing the maximum probability that a d-state automaton generates the so-called 'one-tick sequence' 00...01 (the name comes from its interpretation as a clock signal: "1" for a tick of the clock). This optimization problem is central in the discussion of the optimal classical and quantum models able to generate a time signal, with the consequent quantum advantages in the design of physical clocks. Unfortunately, there have been no general optimization methods to upper bound the performance of such a classical model.

FIG. 1. An automaton (blue) coupled to a process Q (green). The process may in general differ from round to round.
Consider thus an FSA A, with states denoted by σ ∈ Σ, with |Σ| = d, inputs y ∈ Y, and outputs b ∈ B. We assume that the automaton is described by the (unknown) transition matrix P(σ_{k+1}, b_k|σ_k, y_k). Let Q be a known time-dependent process, with an internal state denoted by t, and inputs (outputs) denoted by z (c). The (known) transition matrix of this process at time step k is Q_{k+1}(t_{k+1}, c_{k+1}|t_k, z_k). Each state t_k of the process is associated with a reward α_k(t_k). At time step 1, the state and output t_1, c_1 are generated by the distribution Q_1(t_1, c_1). Now, let us couple Q with A in the following way (see also Fig. 1 for an illustration): after Q generates t_1, c_1, we input c_1 into the automaton A, i.e., we let y_1 = c_1. The automaton then transitions through P to a state σ_1, outputting b_1. This output is further input into the process Q (by letting z_1 = b_1), which in turn generates t_2, c_2 through Q_2, and so on. At time step N, we consider the sum of all rewards, Σ_{k=1}^{N} α_k(t_k). Our goal is to maximize the expectation value of the total reward over all d-state automata A.
Note that, by convexity, the best strategy is for the automaton to be initialized in a specific state σ_0 rather than in a distribution thereof [38]. Hence, fixing the initial state, the optimal automaton is fully characterized in terms of the transition matrix P that maximizes the expected reward. This problem can be mathematically rephrased in a way that will prove useful later. Namely, the coupled systems A and Q can be regarded as a single dynamical system, the internal state s_k of which at time step k corresponds to the probability distribution of the triple (t_k, c_k, σ_k), i.e., s_k := p_k(t_k, c_k, σ_k). The evolution of this system from one time step to the next follows the equation of motion

s_{k+1} = f_k(s_k, λ),

where f_k is obtained by composing the transition matrices Q_{k+1} and P under the identifications y_k = c_k and z_k = b_k. Note that the evolution is given here by a polynomial f_k of the state s_k and an unknown evolution parameter λ := (P(σ′, c|σ, z) : z ∈ Z, c ∈ C, σ ∈ Σ). In this formulation of the problem, the reward r_k at time step k is given by

r_k(s_k) = Σ_{t_k, c_k, σ_k} α_k(t_k) p_k(t_k, c_k, σ_k).

Note that the reward does not depend on λ here (except via the p_k): such a dependency could, however, be introduced by using different reward functions r_k(s_k, λ).
In this picture, our task is to maximize the total reward over all possible evolution parameters λ.

B. Entanglement certification of many-body systems
Given an N-partite quantum system, we are often faced with the problem of determining whether its state is entangled. In principle, this problem can be solved by finding an appropriate entanglement witness and estimating its value on the state at hand. However, in addition to the problem that finding such a witness theoretically is NP-hard [39], its estimation may involve conducting joint measurements on many subsystems of the N-partite state, a feat that may not be experimentally possible. Replacing such joint measurements by estimates based on local measurement statistics requires, in general, exp(N) state preparations, thus rendering the corresponding protocols infeasible for moderately sized N (as is the case when the protocol demands full quantum state tomography).
One way to avoid these problems, i.e., to perform single-system measurements and reduce the number of state preparations, is to carry out an adaptive protocol [2]. Such a scheme proceeds as follows. We sequentially measure the particles that constitute the N-particle ensemble, making future measurements depend on previous ones as well as on previous outcomes. Once we conduct the last measurement, we make a guess on whether the underlying state was entangled or not.
To choose how to measure each system and to make our final guess, we use a dynamical system Q. This process has internal states t_k ∈ T, and its transition matrix at time k = 1, ..., N−1 is of the form Q_k(t_{k+1}, y_{k+1}|t_k, b_k), where b_k ∈ B denotes the measurement outcome at time k and y_{k+1} labels the measurement to be conducted on the (k+1)th particle. For each y ∈ Y_k, there exists a positive operator-valued measure (POVM) {M^k_{b|y}}_b. The final guess is generated through the function Q_N(t_{N+1}|t_N, b_N), with t_{N+1} ∈ {'entangled', 'separable'}. The initial state of the process and the measurement setting y_1 for the first particle are generated by the distribution Q_0(t_1, y_1). This is illustrated in Fig. 2.
If the target state ρ, the entanglement of which we wish to detect experimentally, admits an efficient matrix product operator (MPO) decomposition [40], then one can efficiently compute the probability that the process outputs the result 'separable'; see Fig. 3. This probability is usually called a type-II error, or a false negative.
It remains to compute the type-I error, or false positive, of the entanglement-detection protocol encoded in Q: this corresponds to the maximum probability that the process outputs 'entangled' when the input is a fully separable quantum state. In computing this maximum, we can assume, by convexity, that the player has prepared a pure product quantum state. That is, at each time step k, we measure some pure quantum state ρ_k.
Despite this simplification, computing the type-I error of the scheme requires maximizing a multilinear function with an input of size O(N). As N grows, this becomes an increasingly difficult problem.
Like the problem of computing the maximum performance over all d-state automata, maximizing the type-I error over the set of separable states can be phrased as an optimization over the sequential outputs of a deterministic dynamical system. In this case, the internal state s_k of the dynamical system at time step k corresponds to the distribution P_k of (t_k, y_k). The equation of motion of the system is

P_{k+1}(t_{k+1}, y_{k+1}) = Σ_{t_k, y_k, b_k} Q_k(t_{k+1}, y_{k+1}|t_k, b_k) tr(ρ_k M^k_{b_k|y_k}) P_k(t_k, y_k),

where the quantum state ρ_k can be interpreted as an uncontrolled external variable h influencing the evolution of the system. At time step N, the system outputs a deterministic outcome, namely the probability that the final guess t_{N+1} reads 'entangled'.

C. Independent identically distributed (IID) preparation strategies in quantum preparation games

The notion of preparation games aims to capture the structure of general (possibly adaptive) protocols where quantum systems are probed sequentially [2]. More precisely, a quantum preparation game is an N-round task involving a player and a referee. In each round, the player prepares a quantum state, which is then probed by the referee. In round k, before carrying out his measurement, the referee's current knowledge of the source used by the player is encoded in a variable t_k ∈ T_k, with |T_k| < ∞, called the game configuration. This variable depends nontrivially on the past history of measurements and measurement results: in each round, it guides the referee in deciding which measurement to perform next and changes depending on its outcome. This double role of the game configuration can be encoded in the POVM used by the referee for that round. That is, assuming that the game configuration before the measurement is t ∈ T_k, the probability that the new game configuration is t′ ∈ T_{k+1} is given by tr(ρ_k M^k_{t′|t}), where ρ_k is the player's kth state preparation, and {M^k_{t′|t} : t′ ∈ T_{k+1}} is the POVM implemented by the referee when the game configuration is t. At the end of the preparation game, a score g(t_{N+1}), where t_{N+1} denotes the final game configuration, is generated by the referee according to some scoring system g(t) for t ∈ T_{N+1}.
In some circumstances (e.g., in an entanglement detection protocol where the player tries to honestly prepare a specific entangled state), it makes sense to consider a player that follows an independent identically distributed (IID) strategy, meaning that the player produces the same state ρ in each round.
Consider thus the problem of computing the maximum game score, for a fixed scoring system, achievable with preparation strategies of the form ρ^{⊗N}, with ρ ∈ C, where C is some set of states. In Ref. [2], it has been shown how to compute the maximum score of a preparation game under IID strategies when the set C of feasible preparations has finite cardinality. In this work, we consider the case in which C is continuous.
This problem can be modeled through a dynamical system, the internal state s_k of which at time k corresponds to P_k(t_k), the distribution of game configurations at the beginning of round k. The equation of motion of the system is

P_{k+1}(t_{k+1}) = Σ_{t_k} tr(ρ M^k_{t_{k+1}|t_k}) P_k(t_k),

where the unknown evolution parameter λ := ρ ∈ C corresponds to the player's preparation. The dynamical system outputs a reward at time N, namely

r_N = Σ_{t_{N+1}} g(t_{N+1}) P_{N+1}(t_{N+1}).

Our goal is thus to maximize the reward of the system over all possible values of the evolution parameter λ, i.e., over all ρ ∈ C.
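For a fixed ρ, propagating P_k and reading off the score is straightforward; the hard part, which the rest of the paper addresses, is maximizing over a continuous set C. The sketch below evaluates the IID score for a toy two-configuration game of our own design (diagonal states and a computational-basis measurement, so the example is effectively classical) and grid-searches a one-parameter family ρ(θ) as a crude heuristic lower bound.

```python
import math

# Evaluate the IID strategy rho^{(x)N}: propagate the configuration
# distribution with P_{k+1}(t') = sum_t tr(rho M^k_{t'|t}) P_k(t).
# The game, states, and POVMs below are our own toy choices, not Ref. [2].

def tr_prod(A, B):
    """tr(A B) for 2x2 real matrices stored as nested lists."""
    return sum(A[i][j] * B[j][i] for i in range(2) for j in range(2))

def iid_game_score(rho, povms, g, p1):
    p = dict(p1)
    for M in povms:  # M maps (t_next, t) -> POVM element
        q = {}
        for (tn, t), eff in M.items():
            q[tn] = q.get(tn, 0.0) + tr_prod(rho, eff) * p.get(t, 0.0)
        p = q
    return sum(g[t] * pt for t, pt in p.items())

# Game: configuration t = 1 is absorbing ("measurement fired on |1>");
# final score is 1 iff the computational-basis outcome 1 never occurred.
P0, P1 = [[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
M = {(0, 0): P0, (1, 0): P1, (1, 1): I2}
g, p1, N = {0: 1.0, 1: 0.0}, {0: 1.0}, 2

def rho(theta):  # one-parameter family of diagonal states
    c = math.cos(theta) ** 2
    return [[c, 0.0], [0.0, 1.0 - c]]

# grid search over theta: a heuristic lower bound on the IID maximum
best = max(iid_game_score(rho(k * math.pi / 100), [M] * N, g, p1)
           for k in range(51))
```

Here the grid search is exact by accident (the optimum sits at θ = 0); in general, discretizing C only yields lower bounds, which is why the upper-bound machinery of Sec. III is needed.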
As an aside, we note that, in some circumstances, it becomes necessary to compute the maximum average game score achievable by an adversarial player, who has access to the current game configuration and can thus adjust their preparation in each round accordingly. Curiously enough, this problem is easier than optimizing over IID strategies, and, in fact, a general solution is presented in Ref. [2]. Notwithstanding, we show below that the computation of the maximum game score of a preparation game under adaptive strategies can also be regarded as a particular class of the sequential problems considered in this paper.

III. SEQUENTIAL MODELS AND THEIR OPTIMIZATION
The three problems described above are, even though rather different from a physical point of view, structurally very similar. Indeed, they are all examples of the type of problem that we introduce more abstractly in the following. Consider a scenario in which the state of a dynamical system is fully described at time step k by a variable s_k ∈ S_k. Between time steps k and k+1, the system transitions from s_k to another state s_{k+1}, depending on s_k, a number of uncontrolled variables h_k ∈ H_k, and a number of unknown evolution parameters λ ∈ Λ. The equation of motion that describes this transition is

s_{k+1} = f_k(s_k, h_k, λ),

where S_k, H_k (for all k) and Λ are sets of parameter values.

FIG. 4. A pictorial representation of a sequential model. Note that the equation of motion itself may capture several interactions or be composed of several steps (e.g., the measurement of a quantum system followed by the update of a finite-state automaton).
Our goal is to optimize over this type of system, according to some figure of merit that is given by the problem at hand. We model this by assuming that, in each time step, the system emits a reward (or penalty) r_k(s_k, h_k, λ). Note that this reward may be the zero function for certain k, so that problems where only the final outcome matters are included as a special case. We shall refer to such a system from now on as a sequential model (see Fig. 4).
Our goal is to estimate the maximum total reward ν⋆ over all possible values of λ, h_1, ..., h_N. That is, we wish to solve the problem

ν⋆ = max_{λ, h_1, ..., h_N} Σ_{k=1}^{N} r_k(s_k, h_k, λ), subject to s_{k+1} = f_k(s_k, h_k, λ).

We generally assume that the initial state s_1 is known; otherwise, we can absorb it into the definition of λ, in the sense that we add one time step that generates the initial state. In the absence of λ, the above is mathematically equivalent to a deterministic Markov decision problem, if we regard h as an action [41].

A. Reformulation of the optimization over sequential models
To compute the optimal value ν⋆, we consider an approach based on dynamic programming methods [41].
It consists in defining value functions V_k(s_k, λ), which represent the maximum reward achievable between time steps k and N over all possible values of h_k, ..., h_N, starting from the state s_k, and under the assumption that the evolution parameters take the value λ. Approximating these functions with polynomials and invoking known characterizations of positive polynomials on a semialgebraic set, we manage to derive monotone, converging sequences of tractable upper bounds on ν⋆. We remark that similar ideas have been explored in the context of continuous-time optimal-control problems, which can be formulated as an infinite-dimensional linear program, for which semidefinite-program (SDP) relaxations based on polynomial optimization techniques exist; see Ref. [42] and references therein.
In terms of the reward functions, the value functions are given by

V_N(s_N, λ) = max_{h_N} r_N(s_N, h_N, λ),  (11)

V_k(s_k, λ) = max_{h_k} [ r_k(s_k, h_k, λ) + V_{k+1}(f_k(s_k, h_k, λ), λ) ], for k = 1, ..., N−1,  (12)

and we recover ν⋆ in the final step as ν⋆ = max_λ V_1(s_1, λ). The problem of finding ν⋆ can thus be reduced to the minimization, over ν and the functions V_1, ..., V_N, of

min ν
s.t.  ν ≥ V_1(s_1, λ), ∀λ ∈ Λ,
      V_k(s_k, λ) ≥ r_k(s_k, h_k, λ) + V_{k+1}(f_k(s_k, h_k, λ), λ), ∀s_k, h_k, λ, k = 1, ..., N−1,
      V_N(s_N, λ) ≥ r_N(s_N, h_N, λ), ∀s_N, h_N, λ.  (13)

Indeed, on the one hand, any feasible point of the problem given in Eq. (13) provides an upper bound on ν⋆. On the other hand, the V_1, ..., V_N, ν⋆ as defined by Eqs. (11), (12) are also a feasible point of (13). Hence the solution of the problem given in Eq. (13) is ν⋆.
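When the state and action spaces are finite and there is no λ, the recursion (11)-(12) can be run exactly as a table, without any polynomial approximation. The following sketch does this for a toy "walker" model of our own invention and checks the result against brute force over all action sequences.

```python
import itertools

# Backward dynamic programming, V_k(s) = max_h [ r_k(s,h) + V_{k+1}(f_k(s,h)) ],
# tabulated exactly on a finite toy model (no lambda, finite h).

def value_functions(f, r, N, states, actions):
    """Return the table V_1, starting from V_{N+1} = 0."""
    V = {s: 0.0 for s in states}                    # V_{N+1}
    for k in range(N, 0, -1):
        V = {s: max(r(k, s, h) + V[f(k, s, h)] for h in actions)
             for s in states}
    return V

# Toy model (our own): a walker on {0, 1, 2}; action h in {0, 1} moves it
# right (capped at 2), and a unit reward is paid whenever it sits at s = 2.
f = lambda k, s, h: min(s + h, 2)
r = lambda k, s, h: 1.0 if s == 2 else 0.0
V1 = value_functions(f, r, N=4, states=[0, 1, 2], actions=[0, 1])

# sanity check: enumerate all action sequences from s_1 = 0
def total_reward(hs, s=0):
    tot = 0.0
    for k, h in enumerate(hs, start=1):
        tot += r(k, s, h)
        s = f(k, s, h)
    return tot

brute = max(total_reward(hs) for hs in itertools.product([0, 1], repeat=4))
```

When S_k or Λ is continuous, this tabulation is no longer possible, which is exactly why the paper replaces the tables V_k by polynomials and the pointwise inequalities of (13) by positivity constraints.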
Unfortunately, unless their domain is finite, there is no general method for optimization over arbitrary functions. What is feasible is to solve the problem given in Eq. (13) under the assumption that {V_k(s_k, λ)}_k belong to a class of functions F described by finitely many parameters. Since, in general, the optimal functions V_k(s_k, λ) might not belong to F, the result ν of such an optimization is an upper bound on the actual solution ν⋆ of the problem. The class F of polynomial functions is very handy, as it is closed under composition and dense in the set of continuous functions with respect to the uniform norm. This is the class we work with in this paper.
For the rest of the paper, we thus assume that, for all k ∈ {1, 2, ..., N}, f_k(s_k, h_k, λ) and r_k(s_k, h_k, λ) are polynomials in s_k, h_k, λ, and that the sets S_k, H_k, and Λ are bounded sets in some metric space that can be described by a finite number of polynomial inequalities. Note that, under these assumptions, the value functions are continuous in s_k and λ.
Call, then, ν_n ≥ ν⋆ the result of Eq. (13) under the constraint that {V_k(s_k, λ)}_k are polynomials of a given degree n. Since any continuous function defined on a bounded set can be arbitrarily well approximated by polynomials, it follows that lim_{n→∞} ν_n = ν⋆.
There is the added difficulty that solving (13) entails enforcing positivity constraints of the form

p(x) ≥ 0, ∀x ∈ X,

where p(x) is a polynomial in the vector variable x ∈ R^m and X is a bounded region of R^m defined by a finite number of polynomial inequalities (i.e., a basic semialgebraic set). Fortunately, there exist several complete (infinite) hierarchies of sufficient criteria to prove the positivity of a polynomial on a semialgebraic set [43-46]. In algebraic geometry, any such hierarchy of criteria is called a positivstellensatz.
Two prominent positivstellensätze are Schmüdgen's [45] and Putinar's [46]. Both hierarchies have the peculiarity that, for a fixed index k, the set of all polynomials satisfying the kth criterion is representable through an SDP. The application of Schmüdgen's and Putinar's positivstellensätze to polynomial optimization thus leads to two complete hierarchies of SDP relaxations. The SDP relaxation based on Putinar's positivstellensatz is known as the Lasserre-Parrilo hierarchy [47,48]. The reader can find a description of both hierarchies in App. A. While the hierarchy based on Schmüdgen's positivstellensatz is more computationally demanding than the Lasserre-Parrilo hierarchy, it converges faster. In our numerical calculations below, we use a hybrid of the two.
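To make the kind of certificate these hierarchies search for concrete, here is a hand-made example (the polynomial is our own toy choice, not one from the paper): a sum-of-squares decomposition of a globally nonnegative univariate polynomial, written as a positive-semidefinite Gram matrix over a monomial basis. An SDP relaxation à la Lasserre-Parrilo finds such Gram matrices automatically.

```python
# p(x) = x^4 - 2x^2 + 1 is nonnegative on all of R because it is a sum of
# squares: p(x) = (x^2 - 1)^2.  Equivalently, p(x) = v(x)^T Q v(x) for the
# monomial basis v(x) = (1, x, x^2) and a PSD Gram matrix Q.  Here we
# exhibit the rank-one certificate Q = q q^T by hand.

q = [-1.0, 0.0, 1.0]                       # coefficients of x^2 - 1 in v(x)
Q = [[qi * qj for qj in q] for qi in q]    # Q = q q^T: rank one, hence PSD

def p(x):
    return x ** 4 - 2 * x ** 2 + 1

def gram_form(x):
    """Evaluate v(x)^T Q v(x); should reproduce p(x) identically."""
    v = [1.0, x, x * x]
    return sum(Q[i][j] * v[i] * v[j] for i in range(3) for j in range(3))
```

For positivity on a bounded semialgebraic set X (the case needed in (13)), the certificates additionally weight such SOS terms by the polynomials defining X; that is the content of Schmüdgen's and Putinar's theorems.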
The use of the kth (sufficient) positivity test of a given positivstellensatz while optimizing over polynomial value functions of degree n results in an upper bound ν_k^n on ν_n, with lim_{k→∞} ν_k^n = ν_n. For our purposes, the take-home message is that ν_k^n ≥ ν⋆ for all k, n, and lim_{n→∞} lim_{k→∞} ν_k^n = ν⋆.

We finish this section by noting that there is nothing fundamental about the use of polynomials to approximate value functions. In fact, there exist other complete families of functions that one might use to solve the problem given in Eq. (13). A promising such class is the set of signomials, or linear combinations of exponentials, for which a number of positivstellensätze also exist in the literature [49,50]. In this case, the corresponding sets of positive signomials are not SDP representable but are nonetheless tractable convex sets. As such, they are suitable for optimization.

B. Quantum preparation games as sequential models
Sequential models capture the structure of various types of time-ordered processes, including the three introduced in Sec. II. Here, we argue that sequential models are actually rather general: optimizations over adversarial strategies in quantum preparation games can be modeled as sequential problems, too. In fact, their resolution by means of value functions allows one to rederive previous results on the topic [2].
Let S be a set of quantum states and consider a preparation game where the player is allowed to prepare, at each round k, any state ρ(t_k) ∈ S, depending on the current game configuration t_k. This preparation game can be interpreted as a sequential model with Λ = ∅, in which the state s_k is the distribution p_k of game configurations, the uncontrolled variables h_k = (ρ(t) ∈ S : t ∈ T_k) are the player's preparations, and the equation of motion reads

p_{k+1}(t′) = Σ_{t∈T_k} tr(ρ(t) M^k_{t′|t}) p_k(t).

We consider the problem of maximizing the average game score (or reward)

r_{N+1}(p_{N+1}) = Σ_{t∈T_{N+1}} g(t) p_{N+1}(t),

where g is the score function of the game [2].
In principle, we could formulate this problem as in Eq. (13). Note that V_{N+1} = r_{N+1} is linear in p_{N+1}. This motivates us to consider an ansatz of linear value functions V_k(p_k) for this problem. That is, we aim to solve the problem given in Eq. (13) under the assumption that

V_k(p_k) = Σ_{t∈T_k} μ_k(t) p_k(t),

for some vector (μ_k(t) : t ∈ T_k). Setting p_k(t) = δ_{t,j}, for j ∈ T_k, we arrive at the recursion relations

μ_{N+1}(t) = g(t),   μ_k(t) = max_{ρ∈S} Σ_{t′∈T_{k+1}} tr(ρ M^k_{t′|t}) μ_{k+1}(t′),

which allow us to compute V_1(t_0). This quantity is, in principle, an upper bound on the actual solution of the problem in Eq. (13), because we have enforced the extra constraint that the value functions be linear. The upper bound is, however, tight, as can be seen by computing the average score of the preparation game under the preparation strategy given by the maximizers ρ_k(t) of the problem in Eq. (19). In fact, all of the above is but a complicated way to arrive at the recursion relations provided in Ref. [2] to compute the maximum score of a preparation game.
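For a finite set S of allowed preparations, the recursion above can be run directly. The sketch below does so for a toy two-configuration game of our own design (diagonal states, so the example is effectively classical); it is not the construction of Ref. [2], only an illustration of the backward pass.

```python
# Backward recursion for an adaptive player over a finite preparation set S:
# mu_{N+1}(t) = g(t),
# mu_k(t)     = max_{rho in S} sum_{t'} tr(rho M^k_{t'|t}) mu_{k+1}(t').
# Game, states, and POVMs below are our own toy choices.

def tr_prod(A, B):
    """tr(A B) for 2x2 real matrices stored as nested lists."""
    return sum(A[i][j] * B[j][i] for i in range(2) for j in range(2))

def max_adaptive_score(povms, g, S, t0):
    mu = dict(g)                                # mu_{N+1} = g
    for M in reversed(povms):                   # M maps (t_next, t) -> element
        mu = {t: max(sum(tr_prod(rho, eff) * mu[tn]
                         for (tn, tt), eff in M.items() if tt == t)
                     for rho in S)
              for t in {tt for (_, tt) in M}}
    return mu[t0]

# Game: configuration t = 1 is absorbing; score 1 iff t never leaves 0.
P0, P1 = [[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
M = {(0, 0): P0, (1, 0): P1, (1, 1): I2}
g = {0: 1.0, 1: 0.0}
S = [[[1.0, 0.0], [0.0, 0.0]],   # |0><0|
     [[0.0, 0.0], [0.0, 1.0]],   # |1><1|
     [[0.5, 0.0], [0.0, 0.5]]]   # maximally mixed
best = max_adaptive_score([M, M], g, S, t0=0)
```

In this toy game the adaptive optimum (prepare |0><0| in every round) reaches score 1. The paper's point is that the same recursion breaks down for IID players over a continuous C, which is where the polynomial value functions of Sec. III come in.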

IV. APPLICATION TO OPTIMIZING TIME-ORDERED PROCESSES
All the sequential problems considered in this work could, in principle, be solved (or at least approximated) through a straightforward application of the Lasserre-Parrilo hierarchy [47,48]. However, the degree of the polynomials in such problems grows linearly with the number of steps N, thus making the corresponding SDP intractable for more than a few iterations. Our reformulation of the sequential optimization problem as in (13), on the other hand, allows us to circumvent this issue and optimize sequential models for much higher values of N.

A. Optimizations over finite-state automata: Probability bounds
In this section, we investigate the maximum probability with which a classical FSA can generate a given output sequence. This problem has been extensively investigated for input-output sequences [12], as well as for sequences with outputs only [23,24]. In particular, we consider the so-called one-tick sequence, 00...01, introduced in Ref. [23] in connection with the problem of optimal clocks. For a sequence of total length L, we use the compact notation 0^{L−1}1. Here, we use the general techniques developed in Sec. III to tackle this problem. Note that this is an instance of the type of problem introduced in Sec. II A.
Consider a d-state automaton without inputs and with binary outputs, i.e., Y = ∅, B = {0, 1}. We wish to compute the maximum probability that the automaton outputs the one-tick sequence 0^{L−1}1, denoted P^d_max(L). For this purpose, let us construct a simple process Q that, coupled to the automaton, produces an expected reward that equals P(0^{L−1}1). Q has two internal states, i.e., t_k ∈ {0, 1}, and no outputs (C = ∅). Its initial state is t_1 = 1, and its transition matrices are

Q_{k+1}(t_{k+1}|t_k, b_k) = δ_{t_{k+1}, t_k(1−b_k)} for k = 1, ..., L−1, and Q_{L+1}(t_{L+1}|t_L, b_L) = δ_{t_{L+1}, t_L b_L}.

For time steps k = 1, ..., L−1, the process Q therefore remains in state 1 as long as the automaton outputs 0's; if the automaton outputs any 1, the state of the process changes to 0 and stays there until the end of the game. At time step L, process Q receives the output b_L. If b_L = 0 or t_L = 0, then t_{L+1} = 0; otherwise, t_{L+1} = 1. The probability that the automaton produces the sequence 0^{L−1}1 thus corresponds to the probability that t_{L+1} = 1. This corresponds to the reward function α_{L+1}(t_{L+1}) = δ_{t_{L+1},1}, with zero reward at all earlier time steps. Using the techniques developed in Sec. III, we map this problem to a polynomial optimization problem. Due to the choice of reward (together with the process Q), we see that the only terms from the equation of motion given in Eq. (3) that contribute to the final reward are the ones with the correct b_k. This allows a further simplification of the model.
First, let us redefine the internal states of the sequential model to be the vector described by d parameters s^σ_k, with constraints s^σ_k ≥ 0 for σ = 0, ..., d−1, and 1 − Σ_σ s^σ_k ≥ 0. Next, we redefine the unknown evolution parameter λ to have dimension d², through λ := (P(σ′, b = 0|σ) : σ, σ′ = 0, ..., d−1), with constraints

P(σ′, b = 0|σ) ≥ 0,   1 − Σ_{σ′} P(σ′, b = 0|σ) ≥ 0,  (24)

while the reward function reads

r_L = Σ_σ (1 − Σ_{σ′} P(σ′, b = 0|σ)) s^σ_L.

Now, we apply an SDP relaxation à la Lasserre [47] to obtain an upper bound on the solution of the sequential problem for the cases of d = 2 and d = 3. In the latter case, no nontrivial analytical upper bound had previously been known [23,24]. For details regarding the exact implementation, we refer to App. B 1. Figures 5 and 6 display the upper bounds computed with this method. We find that, in every case, the obtained upper bound matches the output probability of the automaton proposed in Refs. [23,24], up to numerical precision. This implies that the upper bounds computed are all tight. The optimal models for these sequences, except in the case L = d + 1, correspond to the so-called cyclic model [23]. The model consists of deterministic transitions from one state to the next, and the probability of emitting 1 is nonzero only in the last state. As a consequence, the bound depends only on ⌊L/d⌋, corresponding to the 2- and 3-periodic plateaus observed in Figs. 5 and 6.

FIG. 5. Optimization over two-state automata. The upper bounds (red) follow a pattern where, from L = 5 on, pairs of consecutive values coincide. Our bounds furthermore match the explicit automata found in Ref. [23] up to numerical precision, thus providing a tight bound and proving the optimality of the models found in Refs. [23,24]. The optimization has been performed with MOSEK [51] and CVXPY [52]. Each solution provided by the solver has been tested to verify that linear and positivity constraints are satisfied up to numerical precision.
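The SDP relaxation certifies upper bounds; candidate automata give the matching lower bounds. Evaluating a candidate only requires the forward pass below. The d = 2 cyclic-style automaton, with a free emission probability p in its second state, is our own illustrative parametrization, not the exact optimal model of Refs. [23,24].

```python
# Forward-pass evaluation of P(0^{L-1} 1) for a concrete input-free,
# binary-output automaton.  This certifies lower bounds only; the SDP of
# Sec. III provides the matching upper bounds.

def one_tick_prob(P_emit0, P_emit1, p_init, L):
    """P(0^{L-1} 1), with P_emitb[s_next][s] = P(s_next, b | s)."""
    d = len(p_init)
    m = list(p_init)
    for _ in range(L - 1):  # the L-1 leading zeros
        m = [sum(P_emit0[sn][s] * m[s] for s in range(d)) for sn in range(d)]
    m = [sum(P_emit1[sn][s] * m[s] for s in range(d)) for sn in range(d)]
    return sum(m)

# Candidate d = 2 automaton (our own choice): state 0 always emits 0 and
# moves to state 1; state 1 emits 0 w.p. 1-p (moving to 0) or 1 w.p. p.
p = 0.5
P_emit0 = [[0.0, 1.0 - p],
           [1.0, 0.0]]
P_emit1 = [[0.0, p],
           [0.0, 0.0]]
prob_L4 = one_tick_prob(P_emit0, P_emit1, [1.0, 0.0], 4)  # = p * (1 - p)
```

Comparing such candidate values against the SDP upper bounds is exactly how the tightness claims in Figs. 5 and 6 are checked.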
In order to compare the performance of our algorithm with that of the previously known method, namely the Lasserre-Parrilo hierarchy [47,48], we also performed a numerical optimization of our problem with the latter. While our method has been able to reach L = 50 for d = 2 and L = 10 for d = 3, the Lasserre-Parrilo hierarchy could not go beyond the case L = 7 for d = 2. This boost in performance was to be expected, as the degree of the polynomials in the Lasserre-Parrilo hierarchy grows linearly with the number of steps L, whereas in our method it remains constant. For more details, see Apps. A and B.
In addition to the numerical values for P^d_max, we can also extract the value functions from our optimizations and use those to prove our bounds analytically. After deriving those value functions, we can further use them to find optimal automata for the problem at hand.

FIG. 6. Optimization over three-state automata. Our method also allows us to optimize over three-state automata. Again, we recover up to numerical precision the bound computed with the explicit model found in Refs. [23,24], proving that our bound is tight and that the model previously found was optimal. The optimization has been performed with SCS [53] and CVXPY [52]. Each solution provided by the solver has been tested to verify that linear and positivity constraints are satisfied up to numerical precision.

For extracting value functions, let us first note that the coefficients of the polynomials V_k can be read off directly from the numerical solutions, as these are optimization variables of the problem. However, the value functions may not be unique, and extracting polynomials with suitable (rational or integer) coefficients from the numerics is more challenging. Ways to further simplify these value functions are elaborated in App. B 2.

The method for optimizing sequential models employed here allows us to derive upper bounds in terms of value functions, without treating the actual variables of the model, in this case the {p_k}_k and P, explicitly [54]. A priori, the solution hence does not tell us what the values of the optimal dynamical systems achieving the bounds may be, even if the bounds are tight. However, this can be remedied by first extracting the value functions from the optimization and then running a second optimization, this time fixing the value functions and optimizing over the variables of the sequential model, here {p_k}_k and P. More precisely, if we found an optimal upper bound ν⋆, then all value-function inequalities have to be tight and there is an optimal dynamical system that achieves the bound. Now, if we can find a system that satisfies all inequalities with equality, then it will automatically be optimal. To find such a model, we can thus consider the corresponding maximization problem. This is again a polynomial optimization problem that can be solved with an SDP relaxation. In the following examples we use the SOLVEMOMENT functionality of YALMIP [55] at level 4 and the solver MOSEK [51] to solve it [56]. We find that in these cases the problem is feasible, and we extract optimal automata.
Note that there are usually many different value functions that solve the problem. Extracting value functions with few terms and integer or rational coefficients from the optimization problem, however, requires some tuning (see App. B 2). The process becomes more cumbersome as we increase the number of variables. We thus do not provide value functions for larger L here. We remark that extracting valid value functions from the optimization is, nevertheless, always possible.

B. Entanglement detection in many-body systems: the automaton-guided GHZ game
To illustrate the performance of our SDP relaxations in solving entanglement-certification problems in many-body systems, we focus our attention on the N-qubit Greenberger-Horne-Zeilinger (GHZ) state of size up to N = 35. This is a timely application, as GHZ states of sizes up to 20 qubits can already be prepared [61-63]. The main goal in devising such protocols is to obtain a procedure that requires as few state preparations as possible to reliably certify entanglement.
In this section we propose such a one-shot protocol and demonstrate that following this procedure indeed needs fewer repetitions n than a family of protocols based on shadow tomography [64].
The main feature of our method is that we work in a hypothesis-testing setting. This means that we are also able to prove bounds on the performance of the protocol for the worst-case separable state (corresponding to the type-I error), which is difficult to compute and generally not provided when estimating fidelities. Our method has the additional advantage of relying only on single-qubit quantum operations (as compared, e.g., to the 18-qubit example from Ref. [61], which uses two-qubit gates not just in the preparation but also in the disentangling phase), and it can also deal with protocols that are adaptive (see Sec. IV C), as opposed to determining the expectation of a fixed operator, e.g., the parity [62,63], or evaluating Bell inequalities [65]. This latter aspect is what generally allows us to reduce the number of repetitions of our protocols.
It is worth remarking that, in any one-shot entanglement-detection protocol that aims to detect the pure GHZ state, the type-I and type-II errors e_I, e_II satisfy the relation in Eq. (36). To see why, call M the N-qubit effective POVM element (depending on the protocol, the corresponding POVM will be global, LOCC or one-way LOCC) associated with declaring the system "entangled", and let σ be the corresponding separable state. Then Tr(Mσ) can be bounded in terms of the GHZ detection probability, where e_I(σ) := Tr(Mσ) is the probability of declaring σ "entangled". Since e_I ≥ e_I(σ), we arrive at Eq. (36).
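The elided steps can plausibly be reconstructed as follows (our reading of the argument, based on the surrounding text; the specific form of σ is an assumption):

```latex
% Assumed reconstruction of the elided displays around Eq. (36).
% With |GHZ_\pm> = (|0>^{\otimes N} \pm |1>^{\otimes N})/\sqrt{2},
\sigma = \tfrac{1}{2}\bigl(|0\rangle\langle 0|^{\otimes N} + |1\rangle\langle 1|^{\otimes N}\bigr)
       = \tfrac{1}{2}\bigl(|\mathrm{GHZ}_+\rangle\langle\mathrm{GHZ}_+|
                          + |\mathrm{GHZ}_-\rangle\langle\mathrm{GHZ}_-|\bigr),
\qquad
e_I(\sigma) = \operatorname{Tr}(M\sigma)
  \;\geq\; \tfrac{1}{2}\operatorname{Tr}\bigl(M\,|\mathrm{GHZ}_+\rangle\langle\mathrm{GHZ}_+|\bigr)
  = \tfrac{1}{2}\,(1 - e_{II}).
```

Together with e_I ≥ e_I(σ), this would give e_I ≥ (1 − e_II)/2 as the candidate form of Eq. (36), which is consistent with the later statement that the protocol saturates it with e_I → 1/2 and e_II = 0.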
The protocol that we propose in the following saturates the relation in Eq. (36) in the limit N → ∞. In this sense, despite relying solely on one-qubit measurements, our protocol is asymptotically optimal.
Let us, then, describe our entanglement-detection protocol. The protocol is inspired by the GHZ game previously considered in Refs. [66,67], where here the role of the memory is taken by a classical FSA. We thus call this protocol the automaton-guided GHZ game. Specifically, picture a protocol for many-body entanglement verification where, for each of the first N − 1 particles, one of two fixed single-qubit measurements (in the bases {|+⟩, |−⟩} and {|i⟩, |−i⟩}, respectively) is performed with probability 1/2. This process is used to update the state of a four-state automaton that is initialized in state t_1 = 1, as displayed in Fig. 7. For the final particle, we choose measurement 1 for automaton states t_N = ±1 and measurement 2 for t_N = ±i. The verification of an entangled state is considered successful if the outcome b_N of this additional measurement is b_N = 1 for t_N ∈ {1, i} or b_N = 2 for t_N ∈ {−1, −i}, in which case we update t_{N+1} to "entangled"; otherwise to "separable".
This protocol succeeds with probability 1 on a GHZ state of any size. Indeed, consider a k-partite GHZ-like state of the form shown in Eq. (39), with phase α ∈ {1, −1, i, −i}. Suppose that we measure one of the qubits in the basis {|+⟩, |−⟩} ({|i⟩, |−i⟩}) and obtain the result |±⟩ (|±i⟩). Then the postselected state is a (k − 1)-partite state of the form given in Eq. (39), but this time with phase ±α (∓iα). Due to its transition matrix, the internal state of the automaton in Fig. 7 keeps track of the phase of the GHZ-like state as its qubits get sequentially measured, i.e., t = α. The reward function simply verifies that the state of the last qubit corresponds to (|0⟩ + α|1⟩)/√2.

FIG. 7. The memory update of the four-state automaton in the GHZ game. The transitions between states depend deterministically on b and y, which are used to label the edges as b|y. In each round of the associated sequential model, this updates the automaton state t ∈ {1, −1, i, −i}.

The procedure introduced in Sec. III allows us to derive upper bounds on the maximum probability of success achievable with separable states, i.e., the worst-case type-I error e_I. We display our bounds for different system sizes N in Fig. 8. Note that the size of the optimization problem allows us to reach system sizes as high as N = 35. As the reader can appreciate, e_I seems to converge to 1/2 in the limit N → ∞, thus saturating the inequality in Eq. (36). Details on the implementation (including precision issues and how to resolve them in this case) are presented in App. B 3.
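The phase-tracking argument above can be simulated classically. The following sketch (our own illustration, assuming an ideal GHZ input and the phase-collapse rule stated in the text) checks that the automaton state always mirrors the residual phase, so the final measurement succeeds with certainty:

```python
import random

def ghz_game_round(N, rng):
    # One run of the automaton-guided GHZ game on an ideal N-qubit GHZ state.
    # Collapse rule from the text: an X-basis outcome |+>/|-> maps the residual
    # phase alpha to +alpha/-alpha; a Y-basis outcome |i>/|-i> maps it to
    # -i*alpha/+i*alpha. Each outcome occurs with probability 1/2.
    alpha = 1          # phase of the residual GHZ-like state (Eq. (39))
    t = 1              # four-state automaton of Fig. 7; should track alpha
    for _ in range(N - 1):
        sign = rng.choice((1, -1))
        if rng.random() < 0.5:     # measurement 1: {|+>, |->} basis
            alpha *= sign
            t *= sign              # Fig. 7 update mirrors the collapse
        else:                      # measurement 2: {|i>, |-i>} basis
            alpha *= -1j * sign
            t *= -1j * sign
    # The last qubit is left in (|0> + alpha|1>)/sqrt(2); since t == alpha,
    # the measurement basis dictated by t identifies alpha with certainty.
    return t == alpha

rng = random.Random(0)
assert all(ghz_game_round(12, rng) for _ in range(1000))
```

This only verifies the success probability 1 on the GHZ state; the nontrivial separable-state bound e_I is what requires the SDP machinery of Sec. III.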
We further compare the detection performance of the automaton-guided GHZ game with a family of entanglement-detection protocols based on shadow tomography [64], in the following called shadow protocols.
There exist different protocols for shadow tomography, depending on the type of measurements conducted on the qubits. Since in the automaton-guided GHZ game we only allow measurements of each qubit individually, we opt for shadow protocols involving random measurements in the three Pauli bases. Our family of shadow protocols for entanglement detection thus works as follows:

1. A random measurement in one of the Pauli bases is carried out on each qubit, and both the result and the basis are recorded. This procedure is repeated for n experimental rounds (or state preparations).
2. The resulting classical shadow is then used to produce an estimator f for the underlying fidelity of the quantum state with respect to the GHZ state.
The protocol declares the state "entangled" if the estimator f exceeds a threshold θ. There is thus one shadow protocol for each value of θ.
The computation of the type-II error e_II(θ) for this family of protocols is straightforward; not so the computation of e_I(θ), due to the need to optimize the probability that the protocol outputs "entangled" over the set of separable input states. Because of this, we replace the maximization over all separable states by a maximization over the three separable states |0⟩^⊗N, |+⟩^⊗N and |+i⟩^⊗N. The resulting quantity ẽ_I(θ) is therefore a lower bound on the actual value of e_I(θ), i.e., e_I(θ) ≥ ẽ_I(θ). Since our aim is to show that the automaton-guided GHZ game outperforms the shadow protocols, underestimating the error of the latter is sufficient, as it provides a lower bound on the advantage that we observe for the automaton-guided GHZ game.
For different values of θ, we have estimated the type-I and type-II errors ẽ_I(θ), e_II(θ) of the shadow protocols for the 5-partite GHZ state in n = 50 experimental rounds [68]. To compare these numbers against the performance of the one-round automaton-guided GHZ game, we extended the latter to n experimental repetitions in the following way:

1. Run the automaton-guided GHZ game for n experimental rounds, noting down the number of times n_e that the protocol outputs "entangled".

2. Declare the state "entangled" if n_e ≥ ϕn, for some threshold ϕ ∈ [0, 1].
Pairs of values (e_I, e_II) for both families of protocols are plotted in Fig. 9. Note that the curve for the automaton-guided GHZ games lies completely below that of the shadow protocols. In fact, for some values of the threshold ϕ, the automaton-guided GHZ game exhibits e_I ≈ 0 and e_II = 0. This was to be expected. Indeed, let ē_I ≈ 0.53 be the type-I error of the one-shot GHZ game. Then, for ϕ = 1, the entanglement of the GHZ state is still detected with probability 1 (and so e_II = 0). Furthermore, a separable state can only be mislabeled as entangled with probability upper bounded by e_I = ē_I^50 ≈ 0. One might have expected that shadow-based protocols have the upper hand for large values of the system size N. Nonetheless, from Fig. 8, it is clear that the (e_I(ϕ), e_II(ϕ)) curve of the automaton-guided protocol will improve with increasing N. This is not the case for shadow-based GHZ fidelity estimation (see Ref. [64]). Thus, for larger N, the advantage of our automaton-guided GHZ game will be even larger. Finally, we remark that we have compared our automaton-guided GHZ game with the shadow-based GHZ fidelity estimation of Ref. [64]. However, a better estimate of the GHZ fidelity can be obtained by selecting a different sampling rule, e.g., Paulis from the stabilizer group, following the idea of Ref. [69].
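The amplification argument can be checked with elementary binomial arithmetic. This is our own sketch of the thresholded repetition rule described above (the numbers ē_I ≈ 0.53, n = 50 and ϕ = 1 are taken from the text):

```python
from math import comb

def binom_tail(p, n, k):
    """P(X >= k) for X ~ Binomial(n, p): probability of at least k successes."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

n, e1_I = 50, 0.53   # one-shot type-I error of the GHZ game, from the text
# Threshold phi = 1: declare "entangled" only if all n rounds output "entangled".
e_I = binom_tail(e1_I, n, n)   # a separable state must be mislabeled n times
e_II = 0.0                     # the ideal GHZ state wins every round
assert abs(e_I - e1_I**n) < 1e-20
assert e_I < 1e-13             # vanishingly small, matching "e_I ~ 0" above
```

Intermediate thresholds ϕ < 1 trade type-I against type-II error via the two binomial tails, tracing out the curve shown in Fig. 9.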

C. Entanglement detection in many-body systems:
The bit-sequence protocol

To illustrate how our method allows us to optimize protocols that are genuinely adaptive, we consider here the following family of N-round games, which we call bit-sequence protocols.
In the first round, we perform, with equal probability, each of the following measurements and record the outcome. In each of the N − 1 subsequent rounds, we then use the following procedure: in round k, we take the measurement outcome (1 or 2) of round k − 1 as the new measurement choice. We perform the respective measurement and record the new outcome. After the last measurement in round N, we check whether the final outcome is 2, in which case we assign a reward of 1; otherwise the reward is 0.

FIG. 9. Type-I errors versus type-II errors. The blue squares denote the automaton method; the red circles, the shadow protocols. For some values of the threshold ϕ, the automaton method is essentially perfect.
For each N, the bit-sequence protocol defines an N-qubit operator that is applied to the N-qubit state whose entanglement we aim to certify. In the following, we consider the N-qubit state ρ_N that is the eigenstate corresponding to the maximal eigenvalue of this operator.
In Fig. 10, we compare the score, denoted S_max, that we obtain with ρ_N with the maximum score that can be obtained with a separable state. The score for separable states is computed using the procedure from Sec. III; see App. B 4.
In this protocol, the maximal score obtained with separable states corresponds, as in the protocol of Sec. IV B, to the maximal type-I error e_I, while the score for an entangled state ρ_N corresponds to 1 − e_II(ρ_N).
We observe a strict separation between the scores for ρ_N and the upper bounds we obtain for separable states. This demonstrates that the bit-sequence protocols can certify the entanglement of this family of states. Our single-shot protocols can furthermore be turned into n-round protocols using, e.g., the ideas of meta-games from Ref. [2], in order to amplify this separation.

V. ASYMPTOTIC BEHAVIOR OF SEQUENTIAL MODELS
In some scenarios, the considered sequential problem admits a natural generalization to arbitrarily many rounds or time steps N. In that case, we may also be interested in the asymptotic behavior of the total reward as N → ∞. However, as we shall see in Sec. V A, there are sequential models for which approximately computing the asymptotic maximum reward (or penalty) is an undecidable problem. Thus, the best we can hope for are heuristics that work well in practice.
In Sec. V B we introduce such heuristic methods, and in Sec. VI we demonstrate that they lead to useful bounds for the problems that we have considered in the finite regime in Secs. IV A and IV B.

A. Sequential models with uncomputable asymptotics
To see that there are sequential models for which the asymptotic maximum reward cannot be well approximated, we rely on an intermediate result presented in Ref. [30], where specific probabilistic finite-state automata whose asymptotic behavior cannot be well approximated were constructed.
Consider a finite-state automaton A with finite state and input sets Σ and X, initial state σ_0 ∈ Σ, and no outputs. The automaton is said to be freezable if there exists an input x_0 ∈ X that keeps the automaton in the same internal state. Call some subset A ⊂ Σ the set of accepted states. Then we can consider how likely it is to bring the state of the automaton from σ_0 to some accepted state in n time steps by suitably choosing the input word (x_1, ..., x_n) ∈ X^×n; denote this maximum probability by val_n(A). The value of the automaton, val(A), is defined as sup_n val_n(A). Note that, if A is freezable, then val_n(A) is nondecreasing in n; in that case, val(A) = lim_{n→∞} val_n(A). Given a freezable outputless automaton A, consider the sequential model with S_k being the set of probability distributions over Σ, H_k = X, Λ = ∅ and s_1 = σ_0. The equation of motion is given by the automaton's transition matrix P_A, where P_{k+1} ∈ S_{k+1} and P_k ∈ S_k.
In a sequential problem with n rounds, we define the reward of the system as r_k = 0 for all k < n and r_n(s_n) = Σ_{σ∈A} P_n(σ). From all the above, it is clear that the maximum total reward of the system corresponds to ν⋆_n := val_n(A). Hence, lim_{n→∞} ν⋆_n = val(A). How difficult is it to compute the asymptotic total reward for this kind of sequential model? The next result provides a very pessimistic answer [70].

Lemma 1 (Lemma 1 from [30]). For any rational λ ∈ (0, 1], there exists a family T_λ of freezable probabilistic finite-state automata, with alphabet size |X| = 5 and with |Σ| = 62 states, such that, for all A ∈ T_λ, (a) val(A) ≥ λ or val(A) ≤ λ/2, and (b) it is undecidable which is the case.

Now, consider the class C of sequential problems for which the asymptotic value exists and is contained in [0, 1]. This class contains the sequential problem of computing the value of any automaton in T_1. It follows that there is no general algorithm to compute the asymptotic value of an arbitrary sequential problem in C up to an error strictly smaller than 1/4. Otherwise, one could use it to discriminate between the automata in T_1 with value at most 1/2 and those with value 1, in contradiction with the above lemma.
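For small instances, val_n(A) can be computed by brute force over input words. The following toy example (our own construction, much smaller than the 62-state family of Lemma 1) illustrates the definitions, including the freezing input and the resulting monotonicity of val_n:

```python
import itertools
import numpy as np

# Toy freezable automaton: input 'a' freezes (identity transition matrix),
# input 'b' moves state 0 to state 1 with probability 1/2 and keeps state 1.
# Accepted set A = {1}; initial state sigma_0 = 0.
P = {'a': np.eye(2), 'b': np.array([[0.5, 0.5], [0.0, 1.0]])}
accept = np.array([0.0, 1.0])
sigma0 = np.array([1.0, 0.0])

def val_n(n):
    """max over input words of length n of P(final state is accepted)."""
    return max(
        float(np.linalg.multi_dot([sigma0] + [P[x] for x in word] + [accept]))
        for word in itertools.product('ab', repeat=n)
    )

vals = [val_n(n) for n in range(1, 6)]
assert vals[0] == 0.5
# Freezability: appending the freezing input never hurts, so val_n is
# nondecreasing, as claimed in the text.
assert all(v2 >= v1 for v1, v2 in zip(vals, vals[1:]))
```

The brute force is exponential in n, which is exactly why the asymptotic value val(A) is only accessible through bounds; Lemma 1 shows that in general no algorithm can do much better.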

B. Heuristic methods for computing upper bounds on the asymptotics of sequential models
Upper bounds on the asymptotic value of a sequential model as N → ∞ can be obtained by deriving a function ν(N) that upper bounds the solution of the corresponding sequential problem of size N for every N and taking the limit. A way to achieve this is to encode, via functional constraints, recursion relations that define value functions for arbitrarily high N. For simplicity of presentation, let us consider a sequential model where r_k = 0 for k ≠ N and f_k, S_k, and H_k are independent of k [71]. In the following, we show how to derive asymptotic bounds in the case of convergence that is polynomial in 1/N, as well as for exponential decay in N. Similar methods can also be used to bound the solution of the sequential problem even when the latter diverges, e.g., to prove bounds of the form ν(N) ≤ 4N/3, although in this case reducing the corresponding constraints to optimizations over functions of bounded variables might require some regularization trick.

1. Sequential models with polynomial speed of convergence in 1/N
Let us assume that, for high N, ν(N) ≈ ν + A/N, and we wish to compute ν. This problem might arise, e.g., when we wish to study the asymptotic type-I error of a quantum preparation game [2]. To simplify the notation, we use negative indices −j to denote N − j. Following the formulation in Eq. (13), in order to obtain an upper bound on the solution of the sequential problem of size N, we need to find ν, {V_{−j}(s, λ)}_{j=0}^{N−1} satisfying the corresponding constraints. The value functions V_{−j} obtained according to Eq. (13) are generally different for each problem size N, so they do not help us solve the asymptotic problem.
Introducing an extra parameter into the value functions, in order to account for the increase in j, however, allows us to use the same value function from some k on. To see this, suppose that we manage to show that there exist functions {W_{−j}(s, λ)}_{j=0}^{k}, W(α, s, λ) satisfying the relations in Eqs. (47)-(49). Then the corresponding functions satisfy the constraints in Eqs. (43) and (44). Intuitively, in Eqs. (48) and (49), the variable α must be understood to represent 1/j, in which case α/(α + 1) corresponds to 1/(j + 1). The constraint in Eq. (49) hence implies the recursion relation in Eq. (44) for all j ≥ k, while the case j < k is taken care of by Eq. (47). Note that the value of α is bounded; this is important if we wish to use polynomial optimization methods to enforce the inequality constraints in Eq. (49) (see Sec. VI).
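The substitution underlying this construction is elementary and can be checked directly; the following snippet (our own sanity check, not part of the paper's implementation) verifies that the update α → α/(1 + α) tracks 1/j and stays bounded:

```python
# If alpha represents 1/j, then alpha/(alpha + 1) represents 1/(j + 1):
# starting from alpha = 1 (j = 1) and iterating stays on the curve 1/j.
alpha = 1.0
for j in range(1, 50):
    assert abs(alpha - 1 / j) < 1e-12
    alpha = alpha / (1 + alpha)   # advance j -> j + 1
# alpha remains in (0, 1], i.e., the extra variable is bounded, which is what
# makes the inequality constraints amenable to polynomial optimization.
assert 0.0 < alpha <= 1.0
```

This boundedness is the reason the single function W(α, s, λ) can stand in for the whole tail of value functions.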
The previous scheme is expected to lead to good upper bounds in situations in which, for high N, the optimal value functions V_0, ..., V_{−(N−1)} of the problem in Eq. (13) satisfy V_{−(N−1)} ≈ V + Ṽ/N, for some functions V, Ṽ. That is, the scheme works well as long as lim_{N→∞} V_{−(N−1)} exists. Experience with dynamical systems suggests, though, that even if the solution of the sequential problem tends to a given value ν, the sequence (V_{−(N−1)})_N might asymptotically approach a limit cycle. This means that for many problems only every Cth value function follows the required pattern: in such situations, there exist a natural number C (the period) and functions V_0, ..., V_{C−1} such that, for high enough N, V_{−N} ≈ V_R, with R = N (mod C). Thus, instead of the single recursion relation in Eq. (49), we need to construct a family of such relations.
This situation can thus be accounted for by considering functions as in Eq. (52).

2. Sequential models with exponential speed of convergence in N

Suppose that the speed of convergence of the sequential model is exponential, i.e., ν(N) ≈ ν + Aγ^N, for γ < 1. Assuming, for simplicity, that lim_{N→∞} V_{−(N−1)} exists, how can we estimate both ν and γ?
One way to check that the postulated convergence rate holds is to verify the existence of functions satisfying the analogues of the above constraints together with Eq. (51). The last line of the corresponding equation is the induction constraint, where this time α is to be interpreted as γ^j. As before, we have defined α in terms of j such that the former quantity is bounded. To treat this problem within the formalism of convex optimization, one would try minimizing ν for different values of γ. Each such minimization, if feasible, would result in an upper bound ν(N) on the solution of the sequential problem with ν(N) ≈ ν + O(γ^N).
For small values of δ, the conditions in Eqs. (43)-(48) and (51) are not problematic, in the sense that a small infeasibility of any of these inequalities leads to a constant increase of the value function by that amount. This is why, in the resolution of Eq. (13), we have so far not considered numerical precision issues explicitly. Indeed, let {δ_j}_j denote the infeasibilities of the respective inequalities. Then the shifted functions W̃_{−j} := W_{−j} + Σ_{i=0}^{j} δ_i and W̃ := W + Σ_{i=0}^{k+1} δ_i satisfy the conditions in Eqs. (43)-(48). If, in addition, the condition in Eq. (49) is satisfied by the original functions, then ν plus the accumulated corrections is a sound upper bound on the asymptotic value of the game. However, precision issues become more problematic if the relation in Eq. (49) is only satisfied up to some error δ. In that case, for N > k, the optimal reward of the Nth game round is only upper bounded by a quantity whose error term grows linearly with N. The asymptotic upper bounds ν obtained numerically can then only be considered valid for N ≪ O(1/δ).

VI. APPLICATIONS OF ASYMPTOTIC OPTIMIZATION OF TIME-ORDERED PROCESSES
In this section, we bound the asymptotic behavior of the time-ordered processes treated in the finite regime in Secs. IV A and IV B, respectively, relying on the techniques from Sec. V. While the former is an example of polynomial speed of convergence in 1/N, the latter illustrates the technique in the exponential case.

A. Asymptotic behavior of finite-state automata
The probability that a two-state automaton generates a one-tick sequence of length L seems to converge to 0 as O(1/L). It can also be verified that this is the convergence rate of the optimal model found in Ref. [23]. In fact, this model gives a one-tick-sequence probability of order 2/(eL), where e is the base of natural logarithms, and satisfies p(2k − 1) = p(2k). This explicit example provides a bound on the convergence rate of ν(L): ν(L) cannot converge to zero faster than 2/(eL). Similarly to this model, the value ν(L) of the sequential game computed numerically satisfies ν(2k) = ν(2k − 1) from k = 2 onward. This suggests that we should apply the relaxation in Eq. (52), with C = 2. Remember that s_k := (p_k(0), p_k(1)), where p_k(t) denotes the probability that the automaton has so far produced the outcome 0 and is in state t, and λ := (P(σ′, 0|σ) : σ, σ′), the transition matrix of the automaton. We allow the discrete value functions W_{−k}(s, λ) to be linear combinations of all monomials of the form s^a λ^b, with a, b = 0, 1, 2. The continuous value functions W^1, W^2 are linear combinations of the monomials s^a λ^b α^c, with a, b, c = 0, 1, 2. Positivity constraints such as the last line of Eq. (52) are mapped to a polynomial problem by multiplying the whole expression by (1 + α)².
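The quoted 2/(eL) rate can be checked against the cyclic model. The closed form below is our own reconstruction (the cyclic model's success probability for L = 2k takes the form (1 − q)^{k−1} q, maximized at q = 1/k), consistent with the rate quoted above:

```python
import math

def cyclic_value(k):
    # max over q of (1 - q)^(k-1) * q: the two-state cyclic model's
    # probability of the one-tick sequence of length L = 2k (optimum q = 1/k).
    return (1 - 1 / k) ** (k - 1) / k

for k in (10, 100, 1000):
    L = 2 * k
    ratio = cyclic_value(k) / (2 / (math.e * L))
    assert ratio > 1          # the finite-L value lies above the asymptote
# The ratio tends to 1: the model's value behaves as 2/(eL) for large L.
assert abs(cyclic_value(1000) * math.e * 1000 - 1) < 1e-3
```

This confirms that any upper bound decaying slower than 2/(eL), such as the numerical 0.0265 discussed below, is compatible with the conjectured asymptotics.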
To solve this problem, we have used the SDP solver MOSEK [51] in combination with YALMIP [55]. After 28 iterations, MOSEK returned the upper bound 0.0265, together with the UNKNOWN status message. The solution found by MOSEK was, however, almost feasible, with parameters PFEAS = 4 × 10^−7, DFEAS = 8.6 × 10^−7, GFEAS = 7.5 × 10^−11 and MU = 1.3 × 10^−7. It is therefore reasonable to assume [73] that the actual solution of the SDP, which for some reason MOSEK could not find, is indeed close to 2 × 10^−2. This figure is greater than the conjectured null asymptotic value, but is nonetheless a nontrivial upper bound thereof.

B. Asymptotics of entanglement-detection protocols
We observe that for small N, p^sep_max behaves as given in Eq. (59). This value of p^sep can also be achieved by preparing the separable state |+⟩^⊗N; for completeness, we illustrate this in App. C. This shows that our bounds from Fig. 8 for the finite-N regime are tight (up to numerical precision).
The behavior in Eq. (59) also implies that we expect an asymptotic value of 1/2. Implementing the procedure to bound the asymptotic behavior from Sec. V B 2, we have been able to confirm an upper bound of 0.5004 before running into precision problems with YALMIP [55] and the solver MOSEK [51]. For the implementation, we have extended the one described in App. C with the additional relations from Sec. V B 2. Note that a value of 1/2 corresponds to randomly guessing whether the state is entangled and is achievable for any N with the state in Eq. (37); it is thus a lower bound on e_I. Note also that the gap between the success probability of 1 for the GHZ state and approximately 1/2 for the worst-case separable state approximately reaches the maximal possible value of 1/2 for such a gap (as explained in Eq. (38)). This illustrates that the multiround GHZ game performs essentially optimally for the task of certifying the entanglement of a GHZ state.

VII. POLICIES WITH GUARANTEED MINIMUM REWARD
Let us complicate our sequential model a little by adding a policy. The idea is that the parameters x describing our policy also affect the evolution of the system, cf. Eq. (60), as well as the reward r_k(s_k, h_k, λ, x) obtained in each time step.
Ideally, we would like to find the policy x that minimizes the maximum penalty. That is, we wish to solve the corresponding min-max problem under the assumption that Eq. (60) holds and that we know the initial state s_1.
For fixed x, though, it might be intractable to compute the minimum expected reward. A more realistic and practical goal is thus to find a policy x and a sufficiently small upper bound ν on its maximum reward (or penalty) ν⋆(x). This is the subject of the next section.

A. Optimization over policies
For any x, let ν(x) denote an upper bound on ν⋆(x), obtained through some relaxation of the problem in Eq. (13). We propose to use standard gradient-descent methods, or some variant thereof such as Adam [74], to identify a policy x⋆ with an acceptable guaranteed minimum reward ν(x⋆).
More concretely, call X the set of admissible policies, and assume that X is convex. Starting from some guess x^(0) for the optimal policy, and given some decreasing sequence of non-negative step sizes (ϵ_k)_k, projected gradient descent works by applying the iterative equation in Eq. (62), where π_Z(x) denotes the projection of the vector x onto the set Z, i.e., the vector z ∈ Z closest to x in Euclidean norm.
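The iteration is standard and can be sketched in a few lines. The objective and the box-shaped policy set below are toy stand-ins of our own choosing (the paper's ν(x) comes from an SDP relaxation, not a closed form):

```python
import numpy as np

def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^d, a convex policy set X."""
    return np.clip(x, lo, hi)

def projected_gradient_descent(grad, x0, steps, eps):
    """x^{(k+1)} = pi_X( x^{(k)} - eps_k * grad(x^{(k)}) )."""
    x = np.asarray(x0, dtype=float)
    for k in range(steps):
        x = project_box(x - eps[k] * grad(x), 0.0, 1.0)
    return x

# Toy objective nu(x) = ||x - c||^2 with unconstrained minimizer c outside X:
c = np.array([2.0, -1.0])
grad = lambda x: 2 * (x - c)
eps = [0.5 / (k + 1) for k in range(200)]     # decreasing step sizes
x_star = projected_gradient_descent(grad, [0.5, 0.5], 200, eps)
assert np.allclose(x_star, [1.0, 0.0], atol=1e-6)   # projection of c onto X
```

In the paper's setting, grad would be replaced by the gradient estimate of the SDP bound ν(x) constructed in the remainder of this section.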
For k ≫ 1, we expect the policy x^(k) to be close to minimal in terms of the guaranteed reward ν(x^(k)). The use of gradient methods requires estimating ∇_x ν(x) for arbitrary policies x. Considering that we have so far obtained the bounds ν(x) (for fixed x) numerically from relaxations of Eq. (13), this task requires additional theoretical insight. The rest of this section is thus concerned with formulating an optimization problem to estimate the gradient of ν(x).
Suppose, for simplicity, that x consists of just one real parameter, i.e., we wish to compute dν(x)/dx. Fix the policy to be x, let δx > 0, and consider the problem in Eq. (13). To arrive at a tractable problem, one normally considers a simplification of the type in Eq. (63). Here the relation a ≥_Q b signifies that the expression a − b, evaluated on some convex superset Q of the set of probability distributions over the variables {s_k, h_k}_k, λ, is non-negative. In the polynomial problems considered in this paper, this superset Q is, in fact, dual to the positivstellensatz used to enforce the positivity constraints.
In the case of Putinar's positivstellensatz, Q would correspond to the set of monomial averages {⟨z_{i_1} ... z_{i_N}⟩} that, arranged in certain ways, define positive semidefinite matrices (for details, see [47]).
Since Q contains all probability distributions, a ≥_Q b implies that a ≥ b, and so the solution of Eq. (63) is a feasible point of Eq. (13), and hence an upper bound ν(x) on ν⋆(x). For the time being, we assume that the minimizer V_1, ..., V_N of the problem in Eq. (63) is unique for policy x; we will drop this assumption later. Now, let us consider how the problem in Eq. (63) changes when we replace x by x + δx. We assume not only that the solution satisfies ν(x + δx) ≈ ν(x) + µδx, but also that the optimal slack variables V_1, ..., V_N experience an infinitesimal change, as in Eq. (64). That is, not only is the solution differentiable, but so is the minimizer of the optimization problem.
Ignoring all terms of the form o(δx) in Eq. (64), it is clear that ν + µδx is an upper bound on ν⋆(x + δx) if and only if the conditions in Eqs. (65)-(67) hold. Here, the symbol σ is used to denote the coordinates of s_{k+1}.
Our goal is now to turn this into an optimization problem that allows us to find µ. For this purpose it is useful to invoke the fact that the slack variables {V_k}_k are optimal, i.e., they not only satisfy the conditions of the problem in Eq. (63), but all inequalities must be saturated. Namely, for each k, there exists q ∈ Q for which the corresponding inequality holds with equality, cf. Eq. (68). To proceed, we need the following lemma, which allows us to phrase the constraints in Eqs. (65)-(67) in terms of the quantities V̇_k.
Lemma 2. Let a ≥_Q 0, and let the set Q′ := {q : q ∈ Q, ⟨a⟩_q = 0} be nonempty. Then, a + âδx ≥_Q 0 in the limit δx → 0⁺ if and only if â ≥_{Q′} 0. To see the converse implication, note that, for q ∈ Q \ Q′, ⟨a⟩_q > 0 and therefore lim_{δx→0⁺} ⟨a + âδx⟩_q > 0. To determine whether ⟨a + âδx⟩_q ≥ 0 for all q ∈ Q in the limit δx → 0⁺, it is thus sufficient to check that the property holds for q ∈ Q′, i.e., that â ≥_{Q′} 0.

By duality, the condition â ≥_{Q′} 0 admits a dual reformulation.
From the above equation and the relations in Eq. (68), it follows that (under the hypothesis that the minimizer V is differentiable and unique at x) the limit can be computed accordingly. If, in addition, the sets Q′_k each have cardinality 1, i.e., Q′_k = {q_k}, the problem to solve becomes even simpler. The derivation of Eq. (70) relies on the hypothesis that the optimal slack variables V := {V_k}_k for policy x are unique. Should this not be true, the procedure to compute D⁺ν(x) requires solving a nonconvex optimization problem; the reader can find it in App. D, together with a heuristic to tackle it. Now, let us return to the case of a multivariate policy x. If the gradient ∇_x ν(x) exists, then its ith entry corresponds to the directional derivative along the ith coordinate. Each of these entries can be computed, in turn, via the procedure sketched above. This requires solving an optimization problem of complexity comparable to that of computing ν(x). Moreover, each such optimization can be performed separately for each coordinate of x.
Namely, the process of computing the gradient of ν(x) can be parallelized.This allows us, through Eq. ( 62), to optimize over policies consisting of arbitrarily many parameters, as long as we have enough resources to compute ν(x).
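The envelope-type reasoning behind differentiating an optimum can be illustrated on a toy problem with a closed-form solution (our own example; the paper's procedure operates on the SDP relaxation, not on a scalar maximization):

```python
def nu(x):
    # Toy parametrized optimum nu(x) = max_{s in [0,1]} (s*x - s^2),
    # with unique maximizer s* = clip(x/2, 0, 1).
    s_star = min(max(x / 2, 0.0), 1.0)
    return s_star * x - s_star**2, s_star

x, dx = 0.8, 1e-6
v, s_star = nu(x)
numeric = (nu(x + dx)[0] - v) / dx   # one-sided finite difference
# Danskin/envelope reasoning: when the optimizer is unique, the derivative of
# the optimum equals the partial derivative of the objective at s*, here s*.
assert abs(numeric - s_star) < 1e-4
```

When the optimizer is not unique, the one-sided derivative becomes an extremum over the set of optimizers, mirroring the nonconvex problem relegated to App. D.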

VIII. APPLICATION: OPTIMIZATION OF ADAPTIVE PROTOCOLS FOR MAGIC STATE DETECTION
Magic states, i.e., states that cannot be expressed as convex combinations of stabilizer states, are a known resource for quantum computation: together with Clifford gates, they enable universal quantum computation [75]. This raises the problem of certifying whether a given source of states is able to produce them [76].
In the qubit case, magic states are those that cannot be expressed as a convex combination of the eigenvectors of the three Pauli matrices. Equivalently, magic states are those whose Bloch vector n⃗ violates at least one of the inequalities ∑_j a_j n_j ≤ 1, a ∈ {−1, 1}³ (for an illustration, see Fig. 11).
FIG. 11. Qubit magic states. Represented in the Bloch sphere, the magic states of a qubit are all those that lie outside the blue octahedron. In this section the goal is to certify whether a general state is magic or not.
We wish to devise an N-round adaptive measurement protocol that tells us whether a given source is preparing magic states. Namely, if the source can only produce nonmagic states, we expect the protocol to declare the source "magic" with low probability e_I (the protocol's type-I error). If, however, the source is actually preparing independent copies of a quantum state violating one of these inequalities by an amount greater than or equal to δ, we wish the protocol to declare the source "nonmagic" with low probability e_II (the protocol's type-II error), see Fig. 12.
To do so, we formulate the protocol as a time-independent preparation game [2]. That is, in every measurement round k, our knowledge of the magic of the state is encoded in the configuration t ∈ T of the game, with |T| < ∞. The state prepared by the source is then measured by means of the POVM with elements (M_{t′|t} : t′ ∈ T) ⊂ B(C²)^{×|T|}. The result t′ of this measurement is the new configuration of the game. When we exceed the total number of measurement rounds n, we must guess, based on the current game configuration, whether the state source is magic or not. For simplicity, we declare the source to be magic if t ∈ A ⊂ T. The initial configuration of the game, t₁, is chosen such that t₁ ∉ A.

FIG. 12. An illustration of magic state detection. We display a cut through the Bloch sphere showing nonmagic states (blue square) and magic states (region outside the square). The goal is to devise an N-round protocol that is able to reliably classify states into magic and nonmagic ones, as long as the considered magic states are at least δ away from the nonmagic ones. The protocol derived below, based on a two-state automaton, is valid in one of the regions outside the square of nonmagic states.
The game is thus defined by the POVMs {M_{t′|t} : t′ ∈ T}_t. For computing e_I, we assume that the 'dishonest' player, who claims to produce magic states but does not, knows at every round the current game configuration. They can thus adapt their state preparation correspondingly to increase the type-I error. As explained in Sec. III B, the computation of the maximum score of a preparation game under adaptive strategies can be cast as a sequential problem with h_k := (ρ_k(t) : t ∈ T_k). In this case, the set of feasible states, C, is generated by convex combinations of {ψ_i}_{i=1}^6, the eigenvectors of the three Pauli matrices: linear maximizations over C therefore correspond to maximizations over these six "vertices". Putting it all together, we find that computing the type-I error of the game amounts to applying a recursion relation over the game configurations. Let us assume that all the maximizations above have a unique maximizer, and call i(k, t) the maximizer corresponding to the kth round and game configuration t. Then, the gradient of e_I with respect to the POVM element M_{τ′|τ} satisfies analogous recursion relations and can hence be computed efficiently in the number of measurement rounds. Now, consider the computation of the type-II error under IID strategies, whose formulation as a sequential problem can be found in Sec. II C.
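The backward recursion for the adaptive type-I error can be sketched as follows. The two-configuration POVM below is a hypothetical example (not the protocol derived in the text), and the recursion shape is our reading of the description above:

```python
import numpy as np

# Eigenvectors of the three Pauli matrices, as density matrices.
def bloch_state(n):
    X = np.array([[0, 1], [1, 0]], complex)
    Y = np.array([[0, -1j], [1j, 0]], complex)
    Z = np.array([[1, 0], [0, -1]], complex)
    return 0.5 * (np.eye(2) + n[0] * X + n[1] * Y + n[2] * Z)

VERTICES = [bloch_state(s * e) for e in np.eye(3) for s in (+1, -1)]

def type_I_error(M, A, t1, n):
    """Backward recursion for the adaptive (dishonest) type-I error.
    M[t][tp] is the POVM element taking configuration t to tp; in each
    round the dishonest player picks the best of the six "vertices"."""
    T = range(len(M))
    v = np.array([1.0 if t in A else 0.0 for t in T])  # final "magic" guess
    for _ in range(n):
        v = np.array([max(sum(np.trace(M[t][tp] @ psi).real * v[tp]
                              for tp in T) for psi in VERTICES)
                      for t in T])
    return v[t1]

# Hypothetical 2-configuration game: always measure Pauli Z.
P0 = np.diag([1.0, 0.0]).astype(complex)
P1 = np.diag([0.0, 1.0]).astype(complex)
M = [[P0, P1], [P0, P1]]          # configuration t' = outcome of Z
eI = type_I_error(M, A={1}, t1=0, n=4)
```

Here eI = 1: a dishonest player sending the (nonmagic) Z eigenstate |1⟩⟨1| drives the automaton into the accepting configuration with certainty, so a single projective test of this kind is useless, as expected.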
We first focus on the set S_a of qubit states with Bloch vector n⃗ satisfying ∑_j a_j n_j ≥ 1 + δ, for some a₁, a₂, a₃ ∈ {−1, 1}. Applied to a source that always produces the same state ρ ∈ S_a, the magic detection protocol can be modeled through a sequential model, with internal state s_k at time k given by the probability distribution P_k(t) over the game configurations in round k, just before the measurement. The state ρ ∈ S_a prepared by the source is to be identified with the evolution parameters λ via the model's equation of motion. To calculate the type-II error, we assign the model the rewards r_k = 0 for k = 1, ..., N − 1, together with a suitable final reward r_N. We can thus compute an upper bound on e_II(S_a), the maximum type-II error achieved with states in class S_a, via the SDP relaxation of Eq. (13), and its gradient with respect to the POVM element M_{t′|t} via Eq. (70) [77]. We assume that the solution of the problem given in Eq. (13) is unique, so the techniques of App. D are not necessary.
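A plausible form of the model's equation of motion for an IID source (the displayed equation is missing from the extracted text, so the update rule below is our assumption): the distribution over game configurations evolves as P_{k+1}(t′) = Σ_t P_k(t) tr(M_{t′|t} ρ). A toy sketch with a hypothetical Z-measurement game:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], complex)
Z = np.array([[1, 0], [0, -1]], complex)

def evolve(P, M, rho):
    """One round of the sequential model for an IID source rho:
    P_{k+1}(t') = sum_t P_k(t) tr(M_{t'|t} rho)  (assumed update rule)."""
    T = len(P)
    return np.array([sum(P[t] * np.trace(M[t][tp] @ rho).real
                         for t in range(T)) for tp in range(T)])

# Toy two-configuration game measuring Pauli Z in every round.
P0 = np.diag([1.0, 0.0]).astype(complex)   # outcome -> configuration 0
P1 = np.diag([0.0, 1.0]).astype(complex)   # outcome -> configuration 1
M = [[P0, P1], [P0, P1]]
rho = 0.5 * (np.eye(2) + (X + Z) / np.sqrt(2))  # a magic state on the X-Z face
P = np.array([1.0, 0.0])                   # game starts in configuration t1 = 0
for _ in range(6):
    P = evolve(P, M, rho)
# With A = {1}, the probability of (wrongly) declaring "nonmagic" is P[0].
```

The distribution stays normalized, and for this magic state the toy Z-only game declares "nonmagic" with probability (1 + 1/√2)/2 ≈ 0.85, illustrating why an adaptive choice of POVMs is needed.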
Our original problem, though, is to compute the type-II error over all states violating at least one of the facets of the magic polytope by an amount ≥ δ. Thus we have that e_II = max_a e_II(S_a). The gradient (or, more properly, subgradient) of this function is ∇⃗ e_II(S_{a⋆}), where a⋆ is the argument of max_a e_II(S_a).

FIG. 13. The upper bounds for e_I + e_II as a function of the number of gradient iterations. Starting from a random preparation game, we have used the Adam algorithm [74] to decrease the sum of the type-I and type-II errors of a magic state detection protocol. For this computation, we set the learning rate of the algorithm to 0.01.
We are now ready to apply gradient descent to minimize the combined error e_I + e_II. We have used the gradient method Adam [74], followed at each iteration by a projection onto the (convex) set of valid protocols. For |T| = 2, n = 6, gradient descent converges to a value e_I + e_II ≈ 1, probably an indication that one cannot detect all magic states with a two-state automaton. If, however, we restrict the problem to detecting those states that satisfy the inequality ∑_i n_i ≥ 1 + δ, we arrive at the plot shown in Fig. 13.
As the reader can appreciate, after a few iterations the algorithm successfully identifies a strategy with e_I + e_II < 1. Beyond 50 iterations, the objective value does not seem to decrease much further. This curve should be understood as a proof of principle that gradient methods are useful for devising preparation games. Note that, for a general preparation game, the number of components of the gradient is 4|T|². Optimizations over preparation games with large |T| would thus benefit from the use of parallel processing to compute the whole gradient vector.
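A minimal sketch of the projected-Adam loop used here. The projection onto the probability simplex is a simple stand-in for the projection onto the set of valid protocols, and the quadratic objective is a toy example, not the actual e_I + e_II:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto the probability simplex (a stand-in
    for the projection onto the convex set of valid protocols)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def projected_adam(grad, x0, steps=1000, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """Adam update followed, at each iteration, by a projection step."""
    x, m, v = x0.copy(), np.zeros_like(x0), np.zeros_like(x0)
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        mh, vh = m / (1 - b1 ** t), v / (1 - b2 ** t)
        x = project_simplex(x - lr * mh / (np.sqrt(vh) + eps))
    return x

# Toy objective ||x - c||^2 with c outside the simplex; the constrained
# minimizer is the projection of c, here (0.55, 0.45, 0).
c = np.array([0.9, 0.8, -0.1])
x = projected_adam(lambda x: 2 * (x - c), np.ones(3) / 3)
```

For the real problem, `grad` would be the (sub)gradient of e_I + e_II assembled from the recursion relations above, and the projection would map back to normalized POVM elements.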

IX. CONCLUSIONS
We have proposed a method to relax sequential problems involving a finite number of rounds. This method allows us to solve optimization problems of sizes that were not tractable with previous convex optimization techniques. Moreover, a variant of the method generates upper bounds on the solution of the problem in the asymptotic limit of infinitely many rounds.
We have demonstrated the practical use of our methods by solving an open problem: namely, we have computed the maximum probability with which a finite-state automaton of a given dimension can produce certain sequences, thus proving that the values conjectured in Refs. [23, 24] are optimal (up to numerical precision). We further illustrated how to turn such bounds into analytical results.
As we have shown, the methods are also relevant in the context of certifying properties of quantum systems, especially in entanglement detection. Specifically, our method enables us to bound the misclassification error for separable states in multiround schemes, as well as in the asymptotic limit, which we illustrate by means of a GHZ-game-inspired protocol, for which we also show that our bounds are tight.
Finally, we have explained how to combine these methods with gradient-descent techniques to optimize over sequential models with a certified maximum reward. This has allowed the computer to generate an adaptive protocol for magic-state detection.
In some of our optimization problems, we have found that otherwise reliable SDP solvers, such as MOSEK [51], fail in some cases to output a trustworthy result. The cause of this atypical behavior is unclear, and we lack a general solution to the issue. That said, the results of the numerical optimizations have, in most cases, still allowed us to extract and confirm reliable bounds.
In two of the considered applications, we found tight upper bounds on the quantities of interest. This may indicate that, at least in reasonably simple cases, our hierarchical method converges at relatively low levels of complexity. Finding conditions for convergence at such low levels is an interesting question that we leave open.
Quantum technologies are currently on the verge of producing computing devices that can be used beyond purely academic purposes. These devices involve more than a few particles, and thus the problem of certifying properties of systems of intermediate size is becoming crucial. This is relevant not only from the perspective of a company building such devices, but also from that of a user who may want to test them independently. This progress is somewhat in tension with the state of research in certification, e.g., in entanglement theory, where methods abound only in the few-particle regime. Our GHZ game illustrates that automaton-guided protocols may bridge this gap: modeling one-way LOCC protocols through sequential processes might allow the certification of large classes of many-body systems, thus offering an alternative to protocols based on shadow tomography [64, 78].
Our methods are furthermore able to certify systems beyond the linear regime of the usual witnesses, which, from a mathematical viewpoint, essentially distinguish convex sets by finding a separating hyperplane. As the magic-state example suggests, going beyond this regime also means that we can analyse more general classes of systems simultaneously. Indeed, using a finite-state automaton of large enough dimension, we expect to be able to detect any state that is at least a certain distance away from the set of nonmagic states. As optimization methods improve, we expect such universal protocols to become more and more common, as they are applicable with minimal knowledge about the state of the system at hand.
Finally, many different problems encountered in quantum information theory have a sequential structure similar to the one analyzed in this paper. We thus expect that our methods will soon find applications beyond those studied in the present work. In this regard, it would be interesting to adapt our techniques to the analysis of quantum communication protocols. Another natural research line would be to extend our methods to model interactions with an evolving quantum system, rather than a (classical) finite-state automaton.

Appendix A: The Lasserre-Parrilo hierarchy and variants

The Lasserre-Parrilo method provides a complete hierarchy of sufficient conditions to certify that a polynomial is non-negative when evaluated in a compact region defined by a finite number of polynomial inequalities [47, 48]. Given polynomials g₁(z), ..., g_u(z), ḡ₁(z), ..., ḡ_v(z), we define the region Z ⊂ R^n as

Z = {z ∈ R^n : g₁(z), ..., g_u(z) ≥ 0, ḡ₁(z) = ... = ḡ_v(z) = 0}. (A1)

Let p(z) be a polynomial, and consider the problem of certifying that p(z) ≥ 0 for all z ∈ Z. A sufficient condition is that p(z) can be expressed as in Eq. (A2), where f̄_i(z), f_i(z), f_ij(z) are polynomials in the vector variable z. Such an expression is called a sum-of-squares (SOS) decomposition of the polynomial p(z). Deciding whether p(z) admits an SOS decomposition with polynomials f̄_i(z), f_i(z), f_ij(z) of bounded degree can be cast as a semidefinite program (SDP) [47]. Moreover, as shown in [47], as long as Z is a bounded set and p(z) is strictly positive on Z, such a decomposition always exists [79].
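To make the SOS idea concrete: certifying p(z) ≥ 0 reduces to finding a positive-semidefinite Gram matrix Q with p(z) = m(z)ᵀ Q m(z) for a monomial vector m(z), and the search for Q under the linear coefficient-matching constraints is an SDP. Below is a minimal numerical check for a hand-picked polynomial and Gram matrix (not one of the paper's instances):

```python
import numpy as np

# p(z) = z^4 - 2 z^2 + 1 = (z^2 - 1)^2, monomial basis m(z) = (1, z, z^2).
# An SOS certificate is a PSD Gram matrix Q with p(z) = m^T Q m; searching
# for such a Q subject to the coefficient-matching constraints is an SDP.
Q = np.array([[ 1.0, 0.0, -1.0],
              [ 0.0, 0.0,  0.0],
              [-1.0, 0.0,  1.0]])

# Q is PSD, so p is certified non-negative on all of R.
assert np.all(np.linalg.eigvalsh(Q) >= -1e-12)

def p_from_gram(z, Q):
    m = np.array([1.0, z, z * z])
    return m @ Q @ m

# m^T Q m reproduces p(z) at a few sample points.
for z in (-2.0, -0.5, 0.0, 1.3):
    assert abs(p_from_gram(z, Q) - (z**4 - 2 * z**2 + 1)) < 1e-9
```

In the full hierarchy one additionally multiplies the constraint polynomials g_i, ḡ_i by such SOS terms, and the SDP searches over all Gram matrices simultaneously.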
This gives, at each time step (except the initial and final ones), four variables for the states s⁰_k, s¹_k, s⁰_{k+1}, s¹_{k+1} and four variables for the transition parameters {λ_{σ,σ′}}_{σ,σ′=0,1}. These are the variables in which the polynomials appearing in Eq. (A1) are built.
To impose the problem's constraints, we need the two bases of monomials MB_in and MB_out discussed in App. A. These allow us to write the polynomials g_i, ḡ_i appearing in the constraints in Eq. (A1), as well as the polynomials in Eq. (A2) and (possibly) Eq. (A3), as vectors. For the specific calculations of the probabilities of the one-tick sequences (Figs. 5 and 6 of the main text), it was sufficient to use Putinar's Positivstellensatz, i.e., Eq. (A2). The input and output bases of monomials are chosen as follows.
• Combine these three sets and add monomials of the form s^i_k λ_{σσ′} and s^i_{k+1} λ_{σσ′}, for σ, σ′, i = 0, 1. This operation concludes the construction of the input basis MB_in.
• Construct in a similar way two other bases, MB_k and MB_{k+1}, defined as follows. MB_k is the union of the monomials of degree two in the variables (s⁰_k, s¹_k) and {λ_{σσ′}}_{σσ′}, respectively, together with the monomials of the form s^i_k λ_{σσ′}, for σ, σ′, i = 0, 1. MB_{k+1} has the same construction, with s^i_k replaced by s^i_{k+1}. In other words, these bases are defined in the same way as MB_in, but excluding all monomials containing s_{k+1} or s_k, respectively.
• The output basis is given by the tensor product of three copies of the input basis. In other words, each monomial in this list is a product of three monomials appearing in the input list. This concludes the construction of the output basis MB_out.
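A minimal sketch of this basis construction, representing each monomial as a sorted tuple of variable names (the variable names, and the exact composition of MB_in, are our reading of the description above, not the paper's code):

```python
from itertools import combinations_with_replacement, product

# Variables: s^0_k, s^1_k, s^0_{k+1}, s^1_{k+1} and the four lambdas.
S_K  = ["s0_k", "s1_k"]
S_K1 = ["s0_k1", "s1_k1"]
LAM  = ["l00", "l01", "l10", "l11"]

def monomial(*factors):
    """A monomial as a sorted tuple of variable names (commutativity)."""
    return tuple(sorted(factors))

def degree_two(vs):
    return {monomial(a, b) for a, b in combinations_with_replacement(vs, 2)}

# Input basis: degree-2 monomials in s_k, s_{k+1} and lambda separately,
# plus the cross terms s^i lambda_{ss'} (cf. the construction above).
MB_in = (degree_two(S_K) | degree_two(S_K1) | degree_two(LAM)
         | {monomial(s, l) for s, l in product(S_K + S_K1, LAM)})

# Output basis: all products of three input monomials (exponents add).
MB_out = {monomial(*(a + b + c)) for a, b, c in product(MB_in, repeat=3)}
```

Under this reading, MB_in has 3 + 3 + 10 + 16 = 32 monomials, and every element of MB_out has total degree 6.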
Once we have these bases, we want to implement the constraints, where SOS_ℓ(g_i, ḡ_i) denotes that the SOS constraints are implemented up to a level ℓ that depends on the choice of the input (and possibly output) basis above. The function V_k(s_k, λ) is a polynomial given by a linear combination of monomials in the list MB_k. Similarly, V_{k+1}(s_{k+1}, λ) is a linear combination of elements of MB_{k+1}. The condition of belonging to SOS_ℓ(g_i, ḡ_i) means that V_k(s_k, λ) − V_{k+1}(s_{k+1}, λ) can be written in the form of Eq. (A2) for properly chosen polynomials f̄_i, f_i, f_ij. In our implementation, f̄_i, f_i, f_ij are linear combinations of monomials in the MB_in basis. By the way MB_out is chosen, all possible monomials arising from the products of f̄_i, f_i, f_ij with ḡ_i, g_i appear in MB_out.
For the case d = 2, the SDP is run with CVXPY [52] and MOSEK [51] up to L = 50. The values coincide with those of the explicit model found in [23] up to the fifth decimal digit. For the case d = 3, the SDP is run with CVXPY [52] and SCS [53] up to L = 10. The numerical solutions coincide with the optimal model found in [23] and the values explicitly reported in [24] up to the fourth decimal digit. For each explicit solution provided by the solvers, we verified that the linear and positivity constraints are indeed satisfied up to numerical precision.

Extracting value functions for the one-tick-sequences and finding optimal automata
In order to extract value functions that are suitable for further treatment, it is desirable to simplify the output of the polynomial optimization (from App. B 1). Here it is useful to impose any symmetries and additional knowledge regarding the structure of the problem at hand. This may allow us, for instance, to impose that certain monomials share the same coefficient and thus to reduce the number of independent variables.
In cases where redundant variables are involved, we can further realize this by inspecting the value functions. If we had, for instance, chosen an implementation of the problem that additionally involves the variables P(0, 1|0) and P(0, 1|1), we could find value functions that do not involve any of the monomials containing them. Conversely, we can check that we indeed need all variables used in the implementation of App. B 1; more precisely, while we could remove, e.g., P(0, 0|1) or P(1, 0|1) in the d = 2, L = 3 case, these are needed for higher L.
As the numerical optimization tends to assign nonzero values to all optimization variables, it is further useful to simplify the value functions by eliminating irrelevant monomials, i.e., setting their coefficients to zero. It turns out that this works well for the problem at hand when following a heuristic procedure: first compute the optimum P^{d*}_max and then iterate the following two steps.
• Fix a threshold ν and set all coefficients of the V_k smaller than ν to zero for the next optimization. If there are no such coefficients, increase ν.
• Solve the problem of optimizing P^d_max and check that the optimal value is still P^{d*}_max. If not, put coefficients back until P^{d*}_max is recovered.
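The two steps above can be sketched as a generic pruning loop; `solve` is a hypothetical stand-in for re-running the optimization with the masked coefficients fixed to zero (not the paper's actual solver interface):

```python
import numpy as np

def prune_value_functions(solve, p_star, nu0=1e-6, grow=10.0, tol=1e-8):
    """Heuristic from the two steps above: zero out small coefficients,
    re-solve, and restore them if the optimum P*_max degrades.
    `solve(mask)` returns (optimum, coeffs) with masked-out coeffs fixed
    to zero; `solve(None)` solves the unrestricted problem."""
    _, coeffs = solve(None)
    mask = np.ones_like(coeffs, dtype=bool)   # True = coefficient kept
    nu = nu0
    while nu < 1.0:
        small = mask & (np.abs(coeffs) < nu)
        if not small.any():
            nu *= grow                        # no candidates: raise threshold
            continue
        trial = mask & ~small
        val, new_coeffs = solve(trial)
        if abs(val - p_star) < tol:           # optimum preserved: accept
            mask, coeffs = trial, new_coeffs
        else:                                 # optimum lost: restore, move on
            nu *= grow
    return mask

# Toy model: the optimum only needs the two large coefficients.
true_c = np.array([0.7, 1e-9, 0.3, 2e-7])
def solve(mask):
    c = true_c if mask is None else np.where(mask, true_c, 0.0)
    return c[0] + c[2], c   # "optimum" ignores the tiny coefficients
mask = prune_value_functions(solve, p_star=1.0)
```

On the toy model, the heuristic keeps exactly the two large coefficients and discards the numerically irrelevant ones.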
After this procedure, the polynomials can be further adjusted manually, e.g., by requiring that certain coefficients be equal.

The automaton-guided GHZ game is conducted using a 4-state automaton that is updated by measuring a qubit state in each round. Due to normalization, this means that we need polynomials in 3 variables for the 4-state automaton, {p_k(t)}_{t∈{1,−1,i}}, and 2 variables for the Bloch vector of the state in each round, n₁, n₂, where n⃗ = (n₁, n₂, n₃) denotes the Bloch vector of the state in round k (except in the very last round, where we generate the reward as a function of s_{N+1} in a way that updates it through multiplication with the success probability of the last measurement in each case; due to the loss of normalization, the last step thus needs one additional polynomial variable).
The equation of motion is in this case of degree 1 in p_k, p_{k+1}, ρ_k, so that, to exhaust the degree at our disposal, we can multiply it with polynomials of degree 1 in p_k, p_{k+1}, ρ_k and degree 2 in ρ_{k+1}. The squared polynomials multiplying the products g = g_j g_k g_l ... are chosen to be linear combinations of monomials of degree at most 1 in the variables that are not part of g. We implemented the above problem using YALMIP [55] and the solver MOSEK [51] for solving the semidefinite programs. To confirm that our results remain trustworthy for large N, we further extracted all value functions from the solution and then checked each inequality V_N ≥ r_N, V_k ≥ V_{k+1} ∀k, ν ≥ V_1 separately. For this purpose, we set up a constrained polynomial optimization problem (constrained by the positivity constraints and the equation of motion) to maximize the violation of each inequality separately. As opposed to the original optimization, we did not set up the semidefinite formulation manually but used the SOLVEMOMENT functionality of YALMIP (setting level=3, except for k=30, where we used level=2) and MOSEK for this step. The sum of the separate violations gives the error estimate for our results (see also Sec. V C of the main text).
For N = 35 we obtained a value of 0.5025, where the upper bound was computed following the procedure of Eq. (50) applied to the finite-N case, with ν = 0.5002.

Entanglement detection by means of bit-sequence protocols
The bit-sequence protocol is performed using a 2-state automaton that is updated by measuring a qubit state in each round. This automaton saves the outcome of the last measurement, based on which the next one is decided. Due to normalization, this means that we need polynomials in 1 variable for the 2-state automaton, {p_k(t)}_{t∈{1}}, and 2 variables for the Bloch vector of the state in each round, n₁, n₃, where n⃗ = (n₁, n₂, n₃) denotes the Bloch vector of the state in round k [81].
The value functions are constructed following an adaptation of Schmüdgen's Positivstellensatz in Eq. (A3). The polynomials V_k are taken to be of mixed degree 2, i.e., with terms of order up to p_k(t)p_k(t′)n_s n_{s′}, t, t′ = 1, s, s′ ∈ {1, 2}. This means that for an inequality involving V_k and V_{k+1} we consider polynomials of degree up to 2 in each of {p_k(t)}_1, {p_{k+1}(t)}_1, n₁, n₂, n′₁, n′₂ (where n⃗ = (n₁, n₂, n₃) denotes the Bloch vector of the state in round k and n⃗′ = (n′₁, n′₂, n′₃) the Bloch vector in round k+1), that is, monomials of the form p_k(t)p_k(t′)p_{k+1}(t′′)p_{k+1}(t′′′)n_s n_{s′} n′_{s′′} n′_{s′′′}. The polynomials g_i are here given by the positivity of the probabilities of the automaton (p_k(t=1) ≥ 0, 1 − p_k(t=1) ≥ 0, and the same for p_{k+1}) as well as the normalization of the Bloch vectors (1 − n₁² − n₂² ≥ 0 and 1 − n′₁² − n′₂² ≥ 0). The equation of motion (see Eq. (5) of the main text) defines the ḡ_i for this example.
Appendix C: Lower bound on the maximum winning probability in the automaton-guided GHZ game with separable states

The winning probability in the N-round GHZ game can be lower bounded by computing the score of any N-partite separable state. In the following we show that the optimal separable strategy consists in preparing the state |+⟩^{⊗N}, which indeed recovers our bounds (up to numerical precision).
For this state, the measurement outcome of measurement 1 is 1 in each round, while for measurement 2 both outcomes occur with probability 1/2. This means that the states of the automaton in round k have probabilities p_k = (p_k(1), p_k(−1), p_k(i), p_k(−i)) = (1/4 + 1/2^n, 1/4 − 1/2^n, 1/4, 1/4), computed as p_k = M^{k−1} p₁, where p₁ = (1, 0, 0, 0) is the initial state of the sequential model and M is the corresponding transition matrix.

Appendix D: Computing the gradient under non-uniqueness of the optimal value functions

The derivation of Eq. (66) of the main text relies on the assumption that the optimal slack variables V̄ := {V̄_k}_k for policy x = x̄ are unique. In that case, it seemed natural to postulate that they are differentiable at x = x̄, i.e., that the optimal slack variables of the perturbed problem, with policy x = x̄ + δx, should be of the form of Eq. (60) of the main text.
If there exists more than one minimizer of problem (59) of the main text, though, it is still natural to postulate that there exists at least one differentiable minimizer. However, there is no reason why the value of this minimizer should coincide with the value returned by the numerical solver. In other words, given a minimizer V̄ of (59), the optimization problem (66) in general returns an upper bound on D⁺ν(x̄), rather than its exact value.
To tackle the possible non-uniqueness of the optimal slack variables V̄, consider, for fixed ξ, the variant (D1) of problem (66), which optimizes not just over V, but also over the minimizers V̄ of problem (59).
A coordinate-descent-based heuristic to compute the limit D⁺ν(x̄) is thus the following: given a minimizer V̄^{(j)} of the unperturbed problem, we solve problem (66), hence obtaining the slack variables ξ^{(j)}. Next, we solve the problem in Eq. (D1) for ξ = ξ^{(j)}, obtaining the new unperturbed solution V̄^{(j+1)}. Starting from an initial solution V̄^{(0)} of the unperturbed problem, we hence generate a sequence of minimizers V̄^{(1)}, V̄^{(2)}, ... of problem (59). The corresponding sequence of solutions of problem (66), with V̄ = V̄^{(1)}, V̄^{(2)}, ..., is evidently decreasing and, if the initial seed V̄^{(0)} is close enough to the optimal solution, one would expect it to converge to D⁺ν(x̄).
Since the estimation of D⁺ν(x̄) requires solving a nonconvex optimization problem, it is possible that the solution reached through coordinate descent is not optimal. A simple consistency check to verify that coordinate descent did not get stuck in a local minimum consists in computing the converse limit D⁻ν(x̄) := lim_{δx→0⁺} [ν(x̄) − ν(x̄ − δx|i⟩)]/δx and checking whether both quantities coincide (they should, if ν(x) is differentiable at x = x̄). It is not difficult to adapt the derivation of Eqs. (66) and (D1) to estimate D⁻ν(x̄) instead of D⁺ν(x̄).
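The consistency check can be illustrated with finite differences on a toy piecewise-linear ν: at a differentiable point the two one-sided estimates agree, while at a kink they do not (all names and the toy ν below are illustrative):

```python
import numpy as np

def one_sided(nu, x, i, delta=1e-6):
    """Finite-difference estimates of D+ nu(x) and D- nu(x) along
    coordinate i; they agree whenever nu is differentiable at x."""
    e = np.zeros_like(x)
    e[i] = delta
    d_plus = (nu(x + e) - nu(x)) / delta
    d_minus = (nu(x) - nu(x - e)) / delta
    return d_plus, d_minus

# Max of linear functions: smooth away from the kink at x[0] = 0.
nu = lambda x: max(x[0], -x[0]) + x[1]

dp, dm = one_sided(nu, np.array([1.0, 0.0]), 0)   # differentiable: dp == dm
kp, km = one_sided(nu, np.array([0.0, 0.0]), 0)   # kink: D+ = 1, D- = -1
```

A mismatch between the two estimates is thus a cheap flag that either ν is genuinely nondifferentiable at x̄ or the coordinate descent got stuck.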
that, for some K > 0, the polynomial K − ∑_i z_i² admits an SOS decomposition. The latter, in turn, implies that Z is bounded.
[80] This adaptation of the more common Lasserre-Parrilo hierarchy relies instead on Schmüdgen's Positivstellensatz [45] and is employed here to obtain better numerical performance in the problems considered later.
[81] For convenience, we choose the first and third entry here since we perform Pauli measurements and thus this way the equation of motion can be formulated more directly.

FIG. 2. The sequential model for entanglement detection. The particles of an N-party state (green) are measured one by one. The kth measurement y_k depends on the internal state of a finite-state automaton, and the outcome b_k affects the transition of the automaton according to Q_k.

FIG. 8. The upper bounds for the winning probability of the best separable states in the GHZ game. We observe that the winning probability (blue) drops quickly toward 0.5 (green line). The bounds are displayed up to N = 9 for aesthetic reasons; in fact, we have checked that we can derive bounds up to N = 35 with this method. With our asymptotic hierarchy (see Sec. VI B), we further confirm an upper bound of 0.5004 on the asymptotic value as N → ∞.

3. Entanglement detection in the automaton-guided GHZ game
a. Implementation of a polynomial optimization problem to bound the performance of separable states in the automaton-guided GHZ game
b. Resolution of numerical problems in the optimization of the automaton-guided GHZ game
Upper bounds for the winning probability of the best separable states compared to ρ_N in a single shot. The green dots give an upper bound on the score that can be obtained with separable states in the bit-sequence game. We observe that this value lies strictly below the value obtained with the states ρ_N (blue).