Ticking-clock performance enhanced by nonclassical temporal correlations

We investigate the role of nonclassical temporal correlations in enhancing the performance of ticking clocks in a discrete-time scenario. We show that the problem of optimal models for ticking clocks is related to the violation of Leggett-Garg-type temporal inequalities formulated in terms of, possibly invasive, sequential measurements, but on a system with a bounded memory capacity. Ticking clocks inspire the derivation of a family of temporal inequalities showing a gap between classical and quantum correlations, despite involving no input. We show that quantum ticking-clock models achieving accuracy beyond the classical bound are also those violating Leggett-Garg-type temporal inequalities for finite sequences and we investigate their continuous-time limit. Interestingly, we show that optimal classical clock models in the discrete-time scenario do not have a well-defined continuous-time limit, a feature that is absent in quantum models.

The first approach to the study of temporal correlations dates back to Leggett and Garg [33], who defined classical correlations from two assumptions: macrorealism per-se (MR), i.e., the existence of a definite value for a physical quantity at any time, and noninvasive measurability (NIM), i.e., the possibility of measuring such a quantity without altering its value. From this, they derived the so-called Leggett-Garg inequalities (LGI), which were tested in a wide variety of physical systems (see the review [34] and Refs. [35][36][37][38][39][40][41][42][43] for more recent experiments). A major challenge is the clumsiness loophole [38,[43][44][45][46][47]: the possibility that the NIM condition is violated due to an imperfect, or clumsy, measurement rather than quantum effects. This strong restriction on allowed operations makes it difficult to discuss in terms of LGI quantum advantages in information-theoretic tasks involving sequential operations: Any classical device with an internal memory that is updated sequentially would violate NIM. To overcome this problem, a new framework has been proposed [48], which relaxes NIM assumption to that of bounded internal memory: The operations are allowed to be invasive, but they can modify an internal memory of at most n bits; NIM is recovered for the special case n = 0.
Here, we address the problem of distinguishing quantum from classical clocks based on temporal correlations in the spirit of Leggett-Garg. We start from a discretetime structure and then analyze its continuous-time limit. We first show how the notion of accuracy of a ticking clock gives rise to a family of temporal inequalities and prove analytically the bound for the bit case. Such inequalities, involving first and second moments of the ticks distribution, however, need infinitely long sequences. We, then, derive another family of inequalities, which discriminates classical and quantum systems for finite sequences. For the finite sequences, experimental tests can be easily designed, in analogy with standard LGI tests. Finally, we show that quantum models that achieve an accuracy above the classical bound are also those that violate such inequalities. Interestingly, optimal classical clocks do not always have a well-defined continuous limit, in contrast with quantum clocks, which are readily extended to the continuous-time limit in all cases.

II. PRELIMINARY NOTIONS
To relax the constraint of NIM, we adopt the framework developed in [48][49][50], based on the notion of a finitestate machine: a box that receives an input x ∈ X , produces an output a ∈ A, then receives another input y and produces another output b, and so on. The classical version of a finite-state machine of dimension d can be compactly represented as p(ab|xy) = πT (a|x)T (b|y)η. (1) Here π ∈ R d is a (row) vector with components [π] j ≥ 0 representing the probability of being in the state j, and arXiv:2005.04241v2 [quant-ph] 29 Jul 2021 satisfying j [π] j = 1, the matrix T (o|i), associated with output o and input i, is d × d and substochastic, i.e., [T (o|i)] jk ≥ 0 and k [T (o|i)] jk ≤ 1, such that o T (o|i) is a stochastic matrix, i.e., k,o [T (o|i)] jk = 1, where each entry [T (o|i)] jk represents the probability of transitioning from the state i to the state j when the input i is given and the output o is observed. Finally, η ∈ R d is the column vector (1, . . . , 1), which provides a sum over all possible final states of the machine. Notice that the set of operations {T (o|i)} o,i is assumed to be fixed along the sequence, whenever the same pair of input-output is observed, the same state transformation is applied.
In contrast, in quantum mechanics, probabilities are given by p(ab|xy) := tr[I b|y • I a|x (ρ)], where ρ is the d-dimension initial state, {I o|i } o is the quantum instrument associated with the input i, i.e., I o|i is a completely positive (CP) map for all o and o I o|i is a completely-positive trace-preserving (CPTP) map, and • denotes composition. Similarly, {I o|i } o,i are assumed to be fixed along the sequence. The classical case, extensively discussed in the literature (see, e.g., [51]), corresponds to the quantum case of states and operations diagonal in the same basis, as can be easily verified, e.g., via the Choi-Jamiołkowski isomorphism. Classical and quantum models provide different correlations that can be distinguished via Leggett-Gargtype inequalities [48]. The finite memory constraint can be interpreted as a relaxation of Leggett and Garg's NIM assumption: Measurements are now allowed to be invasive, but up to a limited amount fixed by the system memory. The constraint of bounded memory is fundamental in the investigation of nonclassical temporal correlations, since classical system with unbounded memory are able to simulate all temporal correlations [52], see also the discussion in [48,49].
In its simplest instance, a ticking clock is modeled by a physical system generating an output 1 ="tick" at some instants of time. In our model, time is discrete. Our clock models are given by finite-state machines without input (or, equivalently, with X = {0} consisting of only one symbol), where the dynamics is described by either transition matrices {T 0 , T 1 } or a quantum instrument {I 0 , I 1 } associated with outputs A = {0, 1} = {"tick , "no tick }. We call this model a discrete-time ticking clock. The choice of no input is required for the clock to be autonomous, namely, not relying on some external input to generate the time signal. In contrast to previous models, e.g., [3], we do not require a so-called gear system assigning the output-memory allocation, and our model closely resembles a discrete-time version of [5]. This is the simplest measurement model one can imagine in the temporal scenario. Nevertheless, due to its nontrivial causal structure, this model already features observable differences between classical and quantum correlations, in contrast to the standard Bell and Leggett-Garg scenarios, in which an external input is needed.

III. ACCURACY OF DISCRETE-TIME CLOCKS
A figure of merit of the quality of a ticking clock (for a clock that resets itself after each tick) is the accuracy [1] where µ is the mean time interval between ticks and σ 2 its variance. This quantity corresponds to the average number of ticks that the clock can make before its uncertainty is larger than µ. With this figure of merit, it has been shown that continuous-time quantum clocks can outperform classical clocks of the same dimension [2]. For a discrete-time clock, defining by p(L) := p(00 . . . 01) the probability of the first occurrence of the outcome "1" at time-step L, one defines the mean and the variance of such a distribution as µ := ∞ L=1 Lp(L) and Using the arrow-of-time conditions [53], namely, the condition of no signaling from the future to the past, one can write where we defined the corresponding f : N → R. This expression has a convenient form that allows the employment of the notion of Z-transform to simplify the calculations. This is defined for f : N → R as where z is a complex variable. The Z-transform satisfies a series of properties, such as linearity, bijectivity, or properties associated with shifts and derivatives that will be useful in the following (see, e.g., [54]). In particular, we use that which can be straightforwardly verified. Combining Eqs. (4) and (7), we can write From the explicit expression Q(z) = ∞ L=0 p(L)z −L , one sees that Q(z) is also a generating function for the moments of the probability distribution p(L), namely which combined with Eq. (8) give wheref := ∂f ∂z . Note that so far the calculations do not involve any particular model (i.e., classical or quantum) to compute p(L). In the following, we specialize first to the classical case, for which the linearity of the Ztransform can be fully exploited.

A. Classical case
For a classical automaton with transition matrices given by (T 0 , T 1 ) for the two outcomes "0" and "1", we have (see Eq. (1)) where π 0 is the initial state.
, which can be computed as follows. Let us denote by π(L) the state at the time step L and by T 0 the transition matrix, we have the relation By applying the Z-transform on both sides of the second equation and using Eq. (6) and linearity, we obtain This equation has solution iff z is greater than the spectral range of T 0 (cf. Th.2.1 of Ref. [55]). We then have Thus, simplifying a bit the expression (15) and transforming back, we obtain where C is a closed region inside the convergence region. Note that the matrix inside the square parentheses is nothing more than T L−1 0 (1 1 − T 0 ). Interestingly, the above integral representation resemble the spectral representation for Hermitian operators, where the matrix (z1 1 − T 0 ) −1 , called the resolvent of T 0 , plays a role similar to the spectral projections in computing a function of T 0 , a polynomial in this case, but for a matrix T 0 that is in general not Hermitian or even diagonalizable.
In turn, the integral in (16) can be calculated from the residues of the function The moments of p(L) can, then, be calculated from Eq. (11) as and where in the last line we used the property that So far, the derivations have been carried out for an arbitrary dimension. However, the actual calculation of the moments and the p(L) become quickly very complicated as the dimension grows. In the following, we provide the full solution for the most accurate classical clock model in the bit case.

Results in the bit case
Let us consider here a general transition matrix for an automaton in d = 2: with a, b, c, d ≥ 0 and a + b, c + d ≤ 1. We can now directly compute the accuracy as a function of the model parameters a, b, c, d via Eq. (18). Note that since p(L) is a linear function of π, by convexity we can choose the initial state to be pure, i.e., either π 0 = (1, 0) or π 0 = (0, 1). Up to a relabeling of the basis elements, we can fix the initial state to be π 0 = (1, 0). Our goal here is to maximize the accuracy. However, it is easy to see that for µ = 2 the accuracy can be infinity, as one can easily construct a bit clock that ticks every two time-steps. More in general, one can see that accuracy of a clock can be infinite if the mean is smaller or equal than the dimension. A non-trivial solution can be obtained only if we fix µ > 2, or µ > d in the general case.
Interestingly, it is possible to solve analytically for a generic µ > 2, using the Karush-Kuhn-Tucker (KKT) conditions [56], a generalization of Lagrange multiplier for inequality constraints. The solution is given by b = 2/µ, a = d = 1−b, c = 0. See Appendix A for the details. As a solution we find that, as anticipated, for µ ≤ 2 one can obtain a perfect clock: σ 2 = 0 and R = ∞. For µ > 2 the maximum accuracy is The classical discrete ladder clock model versus its quantum counterpart. In the classical model, the system transition with probability q to the next state or remains with probability (1 − q). The probability of "tick" is zero for all states, except the last one, where it is (1 − q). In the quantum model, the "no-tick" instrument is of the form I0 = exp(iHθ) √ E0 and the initial state is a superposition of the eigenbasis of H. Similarly to the classical case, the probability of "tick" is non-zero only in the last state (in E0 eigenbasis), however the system is coherently rotated, by an angle θ at each step, such that all this intermediate states have a non-zero overlap with the last state.

achieved by
which we call one-way or discrete ladder clock model. From this solution we can extract an inequality for twostate machines: We have proven analytically the condition in Eq. (22) in the case of machine of dimension two, however, on the basis of an extensive numerical search, we expect to generalize to arbitrary dimension as See the discussion in Sec. VI and in Appendix F.

B. Qubit clock beating the classical accuracy
Inspired by the classical cyclic model, we construct a qubit model able to violate Eq. (22), for any µ > 2, in terms of only two real parameters q and u. The model is described by a single Kraus operator for the "no-tick" see also Fig. 1. Analogously to the classical case, µ and σ 2 can be computed via the Z-transform method, where now the moment generating function is (25) and from Eq. (2) we find again considering simply a pure state by a similar convexity argument. Note that in this case we cannot simplify much further the problem with the aid of the Ztransform, due to the fact that the expression is quadratic in the parameters of |ψ rather than linear, thus rendering a full optimization more complicated in this case. Nevertheless, for the particular model defined by Eq. (24), we can optimize the parameters q and u to find the optimal violation of Eq. (22). To do so, we minimize σ 2 over q and u for fixed µ(q, u). Full details are presented in Appendix C.
The result of this optimization provides the values u = 2q/(1 + q 2 ), and q = 1 − 2 µ . This solution gives for the l.h.s. of Eq. (22) (for µ > 2) and This model closely resembles a discrete version of the quasi-ideal clock presented in [2,5,57], in that it has a single Kraus operator with free evolution (i.e., U 0 ) and measurement (i.e., E 0 ) having eigenbases related by the (discrete in our case) Fourier transform.

IV. TEMPORAL INEQUALITIES FOR FINITE-LENGTH SEQUENCES
A. Bit case Equation (22) can be thought of as Leggett-Garg-type inequality providing a witness for nonclassical temporal correlations, w.r.t. the finite-state machine framework. In contrast to usual Leggett-Garg-type inequalities, however, it involves an infinite sequence of measurements. Inspired by the above construction, we find a family of temporal inequalities for finite length L: where Ω C d=2,L is an upper bound valid for all classical finite-state machines with d = 2. By optimizing the As a subsequent step, the computation is performed on a refined lattice, with refined error, corresponding to the blue region in (b). Only a subset of those points are able to violate the maximum (new blue region), on which a new refined lattice is defined. These steps are repeated until the desired error is reached. Obviously, the same algorithm works even without an initial estimate of the maximum (orange line), by simply using the maximum on the lattice as an estimate above probability a second model other than the ladder clock arises, which we call the cyclic model: where · is the ceiling function. Still, the ladder model gives the optimum for certain specific L, and we similarly obtain q = 1 − 2 L as the optimal corresponding value of the model parameter. More details can be found in Appendix B, where we also study generalizations of such model for higher dimensions.
For the case d = 2 we are indeed able to prove, up to L = 20, an analytical upper bound to Ω C 2,L , with a gap of 10 −4 w.r.t. the maximum achieved by the cyclic and the one-way models. Such precise upper bound can be obtained via grid-search methods, exploiting the fact that the probability function depends only on four parameters, i.e., the entries of T 0 defined in Eq. (19). Let us give a brief sketch of the main ideas. Given a function f : R D → R with proper continuity properties, we can write it in the lowest order in Taylor expansion with Lagrange remainder as In our case, f represents the probability p(L) as a function of a, b, c, d in Eq. (19), on a compact subset B defined by positivity and normalization constraints. First, notice that f and ∇f are continuous functions and B is compact, so they have a maximum in B. Thus, if instead of evaluating f on x, we evaluate on some other point y, e.g., belonging to a (hyper)cubic lattice of step δ, L δ := (δZ D ) ∩ B, we can bound the error in this approximation as the maximum of the gradient times the maximal distance between B and L δ , namely, where we used that δ √ D/2 is the maximal distance between B and L δ . A peculiar property of our probability function f is that also the components of its gradient can be interpreted as probabilities, hence, they are bounded between 0 and 1. The details of this calculation are presented in Appendix E. The evaluation of the function on a lattice offers two advantages that are central for obtaining a feasible computation. First, the algorithm can be highly parallelized, as each lattice point can be evaluated independently of the others. Second, the algorithm can be made iterative: one first gets a rough estimate, in which many points of the lattice can already removed, and then refines the lattice for the remaining points. A representation of this method is shown in Fig. 2. This allows us to estimate our functions on lattices of approximately 10 18 and 10 19 points. See Appendix E for further details.
A natural question arises regarding the relation between the saturation of the temporal inequality in Eq. (28) and the optimal accuracy of classical clocks. We see that the optimal clock in terms of accuracy, i.e., the one-way model, is also optimal for the finite sequence with L = 3, but in the other cases it is outperformed by the cyclic model. The latter, in turn, has a non-optimal accuracy and, as shown in Sec. V, does not have a continuous limit. Both models can be seen as a special instance of what we call the multicyclic model that generalizes to arbitrary dimension, and for which p(L), µ, σ, and R have analytic expressions in terms of the single parameter q. The multicylic model is discussed more in Sec. VI, and its properties are summarized in Table I. See also Appendix B for more details on the derivation of such properties.

B. Quantum violations
Again, given a classical bound as in Eq. (28) we can look for violations with a quantum model of the same dimension. For the qubit case, it turns out that the same model described in the previous section is also able to violate the classical inequality in Eq. (28). In fact, substituting in Eq. (24) u = 2q/(1 + q 2 ) and q = 1 − 2 µ , as before, and choosing µ = L, one obtains a violation of the classical bound p(L) ≤ Ω C 2,L . A maximization over u and q as free parameters, on the other hand, provides only a slightly better value. The result of this optimization are shown in Fig. 3. A similar model can be observed to outperform the multicyclic models also in higher dimensions. Details of the calculation are presented in Appendix C.

V. CONTINUOUS LIMIT OF CLOCK MODELS
We now investigate the limit of continuous time and how this recovers known results in the literature. Let us denote by δ > 0 the discrete time-step, and {I (δ) 0 , I (δ) 1 } the corresponding quantum instrument. The continuoustime limit corresponds to δ → 0 + , so in order to characterize the case of "no tick" in a finite interval [0, t], we need to consider a diverging number N of application of the instrument, namely N = t/δ ∈ N, with the composite instrument To find the form of I (t) 0 , we impose some very natural properties, which we can compactly write as and interpret as follows. First, if no time has passed, nothing happens, Eq. (33); second, the application is divisible, Eq.(34); third, we want the clock not to move instantaneously, Eq. (35). For every map that satisfy these properties (see [58]), there exists a liner operatorL such that I (t) 0 forms a semigroup with a generator representation, namely I (t) 0 = exp(tL). It is then straightforward to verify that the choice I (t/N ) 0 = exp (t/N )L in Eq. (32) implies that the conditions are satisfied. This guarantees self-consistency between definition (32) and the properties it must satisfy, i.e., (33)- (35). At this stage, we have not used the fact that the map I (t/N ) 0 is completely positive. Now, in order to prove that the generatorL must be of the Lindblad form, it suffices to note that I is. Therefore, by theorem 2.2 in [59], it follows that the generatorL must be of the Lindblad form, namely there exists Hamiltonian H, and operators A m such that for a d dimensional clock with I (t) 0 = exp tL, and I (δ) 0 = exp δL. Note the addition of the operator A 0 . This is because the channel is not trace preserving but rather trace non-increasing. Therefore, the generator L can always be found by evaluating The classical case is a special case of the above, and leads to the substochastic matrices T (t) 0 = e tA0 , and T (δ) 0 = e δA0 where A 0 is a substochastic generator giving rise to an evolution π(t) = π 1 e tA0 for t ≥ 0.
From the continuity condition, lim δ→0 + T (δ) 0 −id = 0, it is clear that the cyclic model does not have a continuous limit. In the one-way model, the condition is satisfied for q(0) = 1. For δ ≈ 0, we can approximate q via a Taylor expansion as q(δ) ≈ 1−αδ, with α > 0. The condition on the first derivative comes from the fact that q ≤ 1. 1 For the d-dimensional cyclic model (see Appendix B), the generator is A 0 = lim δ→0 + T0−1 1 δ and consists of a bidiagonal matrix with −1 on the diagonal and 1 on the upper diagonal. This is the ladder clock defined in [4] and proven to be the most accurate classical continuoustime clock in [2] -achieving an accuracy R = d. This clock can also be approached thermodynamically [1], in the limit of semi-classical dynamics and infinite entropy cost.
Similarly, we investigate the limit of the quantum model in Eq. (24), and compare it with the quasiideal clock from [2,5,57]. For a single Kraus operator, a general continuous-time evolution can be written as |ψ(t) = e (iHt−V t) |ψ(0) , with H Hermitian and V Within each block of length k = 2, the machine cycles with probability q or moves forward with probability 1 − q. When the last state of the last block is reached, the machine can output 1 with probability 1 − q.
positive semidefinite. The discrete evolution is From this relation and with q = q(δ), we can associate, similarly to the classical case, −V + iH = lim δ→0 + K0−1 1 δ . By substituting explicitly the quantum model from Eq. (24) and computing the limit (see Appendix D) we identify V = (1 1 + σ z )/2 and H = σ y / √ 2 and obtain in this way the continuous limit of our quantum model, which indeed closely resembles the quasi-ideal clock of [2,5,57], in that the Lindblad operator decomposes into a free evolution (H) and a measurement (V ), with the corresponding basis of eigenvectors related by Fourier transform.

VI. OUTLOOK ON HIGHER DIMENSIONAL CLOCKS
The classical models arising for d = 2 can be generalized to a family of higher dimensional models that we call multicyclic models. They are parametrized by a positive integer k, which gives the size of each block within which the behaviour is cyclic, but with the possibility of a transition from one block to the other, as in the oneway model. Only once the last state is reached, there is a nonzero probability of emitting the output "tick". See Fig. 4 for a pictorial description and Appendix B for further details. The one-way and cyclic models are recovered for k = 1 and k = d, respectively. The main properties of the multicyclic model, namely, the expressions for the probability, mean, variance, and accuracy in terms of the model's parameters, are collected in Table I.
We investigate the optimality of the multicyclic model via an extensive numerical search based on techniques from machine learning, specifically the Adam algo- rithm [60]. For Eq. (23) we search for a violation for d = 3, . . . , 10, and we explore the bounds for Eq. (28) for d = 3, . . . , 10 and L = d + 1, . . . , d + 10 and compare the results with the multicyclic model. In no case the numerical search provides better solutions. Details on numerical searches are collected in Appendix F. Moreover, the conjectured inequality presented in Eq. (23) is also supported by the optimality of the one-way model in the continuoustime limit [2]. These generalized classical models inspired quantum models beyond the qubit case, which are presented in Appendix C. Recently, [64] extended the multicyclic model to the so-called enhanced multicyclic model, able to provide higher values for p(L) for certain lengths and dimensions, specifically for (L, d) = (7, 5), (9, 7), (10,7). Moreover, the preprint also presents a high dimensional quantum model, based on the ideas described here, for the special case L = d+1 and saturating the algebraic maximum, p(L) = 1, in the limit L → ∞.

VII. DISCUSSION AND CONCLUSIONS
We have investigated the operational notion of ticking clocks in the context of temporal sequences of measurements and have observed how the performance of such clocks is connected to a memory resource. We have shown that genuine quantum measurements outperform classical ones as regularly-ticking devices by making a more efficient use of memory. It is natural to ask what is the relation between our results and present-day timekeeping devices such as the atomic fountain clocks currently providing time and frequency standards [61]. In simple terms, an atomic clock works by tuning a microwave synthetizer to the frequency corresponding to the hyperfine atomic transition of the ground state of the Caesium-133 atom. All relevant operations involve transitions or measurements in the energy eigenbasis, hence, they do not involve typical quantum elements such as non-commuting/incompatible measurements, and can straightforwardly be described by a classical hidden variable model. In this description, an atomic clock produces ticks, e.g., every second, by counting approximately 9 × 10 9 oscillations, for which a high internal (classical) memory is needed. Our work instead, despite being still far from practical applications and involving simple (low memory) systems, focuses on genuinely quantum models, i.e., models that feature a more efficient generation of information exploiting quantum coherence, or, in other words, truly quantum temporal correlations. In this respect, our results complement recent literature [1,2,5,6,57,62] in contributing to rethinking what are the fundamental quantum resources involved in the generation of time signals and possibly inspire future designs of truly quantum-improved clocks.
From the point of view of quantum clock models, our results confirm recent intuitions on optimal quantum clocks [2], and extend them to the discrete-time scenario, revealing its richer mathematical structure: We found that new optimal classical models arise that do not have a continuous-time limit, a feature absent in the quantum case, where the continuous limit is always valid. This suggests that some fundamental difference between classical and quantum models arises also in this respect.
Complementing recent works [1,2,6], we understand quantum advantages in a general framework of nonclassical temporal correlations. Notwithstanding the central role of LGI in a foundational perspective, such inequalities are not suitable for discussing tasks involving microscopic systems and invasive operations. By allowing for a relaxation of NIM to "limited invasivity", we are able to discuss nonclassical temporal correlations in relation with technological applications such as the design of time-keeping devices.
Interestingly, the gap found between classical and quantum clocks also reveals what is the simplest, yet nontrivial temporal correlation scenario, namely, a scenario with just 1 input and 2 outputs. This is in contrast with standard Bell and Leggett-Garg inequalities, where at least 2 inputs are needed to find a gap between classical and quantum correlations. This is due to the nontrivial causal structure associated with the finite-state machine model (not even representable by a directed acyclic graph), which is analogous to recently investigated nontrivial causal structures in nonlocality where no inputs are needed to witness a gap between classical and quantum correlations (e.g., the triangle scenario analysed in [63]). The analytical and numerical tools developed here, together with the limited number of parameters of our models, suggest that it may be possible to derive inequalities for arbitrary sequences and dimensions. This will be the object of future research.
Finally, finite and discrete-time sequences are, similarly to Bell inequalities, more experimentally friendly and we hope our results will stimulate also an experimental investigation of temporal correlations in the finite memory scenario in the near future. We obtain and in turn the accuracy reads .
The above expression can be optimized analytically for any fixed value of µ > 2, while for µ ≤ 2 the accuracy tends to infinity. To perform the optimization with a fixed mean µ, it is helpful to express the parameter d as a function of a, b, c, µ: which can be verified by direct substitution in Eq. (A1).
With the above substitution, we are now able to prove that the solution is given by the one way model, namely b = 2/µ, a = d = 1 − b, c = 0, where our parameter q appearing in Eq. (21) of the main text is simply q = a = 1 − 2/µ. The proof is based on the Karush-Kuhn-Tucker (KKT) conditions [56], a generalization of Lagrange multiplier for inequality constraints, that can be applied to a problem of the form where the optimization is performed over a convex set in R n and both the objective function f and the constraints g i satisfy some regularity conditions. In our case, the constraints g i are all affine functions of the parameters, which guarantees that for a global maximum x * the KKT conditions are satisfied [56]. A necessary condition for a point x * to be a global maximum is that there exists λ i for i = 1, . . . , m such that In our case, we want to optimize the function R(a, b, c) for a given µ, namely with the constraints g i (a, b, c) ≤ 0 defined via Now the set of equations can be solved explicitly, with the aid of a computer algebra system, for µ > 2 and give, for the case b = 0, i.e., g 2 = 0, the solution and for the case b > 0, the solution It is straightforward to check that only the second solution corresponds to a maximum, giving R = 2µ µ−2 , and it is precisely the one obtained by the one-way model in the bit case.
Finally, it is interesting to compute explicitly the probability p(L), as a function of L and the model parameter a, b, c, d for the bit case. This amounts to calculating explicitly the integral form from the residues of the function where adj(A) denotes the adjugate matrix of A, i.e., [adj(A)] ij = (−1) i+j M ji , where M ji is the (j, i)-minor of the matrix A, namely, the determinant of the matrix obtained by deleting the row j and column i of A (see, e.g., [65] for a textbook reference). In the bit case, the adjugate of z1 1 − T 0 is and thus, fixing the initial state as π 0 = (1, 0), we have π 0 · adj(z1 1 − T 0 ) · η = (z − d + b). By Eq. (A12), we have First, let us assume that T 0 has two distinct eigenvalues t + = t − . Calculating the residues of the function in Eq. (A14) we obtain the expression for the probability as (A15) In the case of identical eigenvalues, i.e., t + = t − = t 0 either T 0 is proportional to the identity, i.e., a = d = t 0 and b = c = 0, or T 0 is not diagonalizable, i.e., a = d and c = 0 = b. The former case is trivial and the latter can be computed similarly to the previous case. Notice, however, that the function z L−1 (1 − z)(z − d + b)/(z − t 0 ) 2 has a second order pole in t 0 , so we cannot directly substitute t + , t − → t 0 in Eq. (A15), but the residue must be computed with the formula for the second order poles.

Appendix B: Classical clock models for general dimension
In the following, we discuss in detail the classical models presented in the main text, which are all special instances of what we call the multicyclic model. As we discussed in the main text, in the bit case two classical models arise, namely, the one-way model and the cyclic model. It is instructive to summarize their main properties to understand how to generalize them. In the one way model, the machine either remains in the same state with probability q or transitions to the subsequent state with probability 1 − q, always emitting the output 0. When the last state is reached, the output 1 is emitted with probability 1 − q. In the cyclic model, the machine transitions from one state to the next with probability one always emitting the output 0, except in the last state where it may cycle, i.e., go back to the first state emitting 0, with probability q, or emit the output 1 with probability 1 − q.
The multicyclic model generalizes both ideas to arbitrary dimension. A simple example for dimension 6 is depicted in Fig. 4 and is described by the matrix: The behaviour of the machine can be interpreted as follows.
(ii) When the last state of the first block is reached, the machine may cycle with probability q, i.e., go back to the first state of the block, or transition to the next block with probability 1 − q, again emitting the output 0 with probability 1. The same behaviour is repeated in the second block, and all the other blocks except the last one.
(iii) When the last state of the last block is reached, the machine may cycle to the first state of the block with probability q and emit the output 0, or emit the output 1, for which the sequence is terminated, so the subsequent state needs not to be specified.
For the bit case, i.e., d = 2, the multicyclic model corresponds to the one-way model in the case of blocks of size k = 1, and to the cyclic model in the case of one block of size k = 2. We generalize this terminology to arbitrary dimensions, i.e., we call a multicyclic model with k = 1 a one-way model and one with k = d a cyclic model.

Probability p(L)
To compute the probability p(L), let us first consider the case in which d = nk and L = mk with k, m, n ∈ N + , and m > n. Then, the probability p(L) can be written as The expression in Eq. (B2) can be understood as follows. The output 1 is generated only in the last state, giving a factor (1 − q) to the total expression. A factor (1 − q) n−1 comes from the probability of transitioning from the first to the last block, as each transition contributes to a factor (1−q) and we need n−1 of them, n being the number of blocks. The binomial coefficient m−1 n−1 comes as a combinatorial factor from the possible choices of n − 1 transitions out of m − 1 possibilities, as the total length is L = mk and each block has size k. Finally, q m−n is the probability of cycling m − n times, i.e., in the remaining (m − n)k steps, in order to output 1 at the correct step L = mk.
The expression in Eq. (B2) can be optimized over q simply by taking the derivative of the expression (1 − q) n q m−n , giving For a fixed initial state π 1 = (1, 0, . . . , 0), the length of the sequence must be a multiple of the block length k in order to have a non-zero probability of outputting 1 at the correct time, namely otherwise .
(B4) In order to maximize p(L) for different lengths, one may decide to start from a different initial state, within the first block, namely, from the s-th state π s where s + L − 1 = 0 mod k, and the −1 comes from starting counting from 1, i.e, 1st, 2nd, etc. Intuitively, since the first k − 1 transitions are deterministic, the probability obtained with the initial state π s is equivalent to the probability obtained starting from the initial state π 1 for length L + s − 1. As a consequence, instead of m = L k appearing in Eq. (B2), we have a factor m = L k . Alternatively, one can verify by direct computation that for each block of size k where the n-th power is given by (B7) As an example, we provide in Table II the optimal k maximizing the expression p(L) for d = 3, . . . , 10 and L = d + 1, . . . , d + 10, obtained simply by comparing all multicyclic models with kn = d.

Accuracy
Using Eq. (B4), we can compute directly the mean and variance of the distribution p(L) and consequently its accuracy R, for a multicyclic model of dimension d and block size k, with d = nk, n, k ∈ N + . First, let us notice that from the normalization condition we obtain We can now proceed to calculate µ.
By using the following identity we can write The accuracy can, then, be written as We then recover the fact that the accuracy is optimal for the one-way model, i.e., k = 1, as it is for the bit case, whereas the cyclic model gives the worst accuracy, i.e., R = 1/q, which is even independent of the dimension.

Appendix C: Quantum models
Here we present an explicit construction of quantum clock models that are able to outperform the optimal classical clock in d = 2 and the multicyclic clocks in d = 3. First, let us derive in detail the qubit clock model described in the main text and observe how it violates the classical bounds on our temporal inequalities in d = 2.

Qubit clock model
An explicit quantum model that violates the inequality (7) of the main text for arbitrary values of µ is constructed as follows: consider a qubit in the initial state |ψ = (1, 0) that evolves via a single Kraus operator K 0 associated with the outcome 0, i.e., I 0 (ρ) = K 0 ρK † 0 , defined as K 0 = U 0 √ E 0 , with

13
The mean value and variance of the corresponding distribution can be obtained via the Z-transform method, where now the moment generating function is To write down the above expression, it is helpful to consider K 0 in its Jordan normal form K 0 = P Λ K P −1 , where for our model we have Λ K = diag(κ + , κ − ), i.e., the Kraus operator is diagonalizable. In particular, we have that K 0 has eigenvalues κ ± = 1 2 (1 + q) √ u ± Γ q,u , where Γ q,u = (1 + q) 2 u − 4q and (not orthonormal) eigenvectors given by which corresponds to the columns of P . Given this decomposition we write where we used the fact that our P is real, and we define |P −1 ψ = P −1 |ψ = This where we used that v + v − = κ + κ − = q. At this point, we can calculate the probability as which results in Furthermore, we can now calculatef qbit (z) from the element-wise Z-transform of F L , which is given bỹ where we used the known Z-transform Z[a L ] = 1 1−az −1 . Now, for calculating the mean and variance of our probability distribution we need to calculatef qbit (1) and f qbit (1). For the former, we can just substitute the value 1 intoF (z) and multiply by the vectors P −1 ψ| and |P −1 ψ , and we obtaiñ (C9) Similarly, for calculating the first derivative we can first derivateF (z) entrywise: and then substitute the value 1 and multiply, obtaining In turn, those lead to the following expressions for the mean and variance: The expression on the left hand side of Eq. (22) of the main text reads and we can try to maximize it over the parameters q and u for each fixed value of µ. Note that, for fixed µ the above expression can be maximized by just minimizingf qbit (1). At first, since we want µ to be fixed, we can invert Eq. (C12) and find a functional dependence of q(µ, u) that is given by: where ν := uµ − (µ + u). Then, by substituting this expression for q into Eq. (C11) we obtain where N ν = √ u 2 + ν 2 + 2ν. Using a computer algebra system, we can finally maximize the above expression over 0 ≤ u ≤ 1 for fixed µ > 2 and we find the maximum value which is obtained for Substituting back this solution into the above relations between q, u and µ we get which is the model described in the main text. With the above substitution, we obtain and the corresponding expression for the accuracy giving R = 4 = d 2 for µ → ∞. We can also write U 0 as U 0 = exp(iθσ y ) where σ y is the Pauli y matrix and θ = arctan √ 1 − u/ √ u , which becomes, with the optimal value for u and q θ = arctan (1 − q)/ 2q = arctan √ 2/ µ(µ − 2) . (C22) This will be useful later, in Sect. D, in order to compute the continuous limit. As a final remark for this section we note that with this relation between u and q we get and substituting the optimal value for q we get . (C24)

Generalizations to higher dimensions
As an outlook to construct a quantum clock model for general dimensions, we can now make a series of considerations from what we have learned from the classical case and from the qubit clock model described above. First of all, by convexity one can fix the initial state to be an arbitrary pure state, say the state |0 of the computational basis, and since we are interested in the sequence 00 . . . 01, it is enough to define only the map I 0 , which can be parametrized, e.g., in terms of its Kraus operators However, such a problem is already too complex to be treated, e.g., to perform some basic numerical optimization. Since we are now interested only in obtain a quantum model that violates the classical bound, and not to compute the quantum bound, we can simplify the problem assuming only one Kraus operator K 0 , i.e., I 0 (ρ) := K 0 ρK † 0 . This is a simplified model and it is not guaranteed to give the maximal quantum value. Even in this simplified scenario, a full optimization is still very difficult to do, even in small dimensions. However, we impose the form of the Kraus operator by analogy with the classical strategy and based on what we learned in the qubit case. Then, one possible solution has the following properties: 1. There is a single Kraus operator for the outcome "0", that has the form K 0 = U √ E 0 2. The initial state is |ψ = |0 in the computational basis 3. The effect E 0 has the diagonal form E 0 = diag(1, . . . , 1, q 2 ) in computational basis 4. The unitary is a one parameter family of the form where H clock = F ( k k|k k|)F † is obtained by Fourier transforming the computational basis.
Afterwards, there is still to optimize over the parameters q and θ. This over-simplified optimization, however, is still too hard to solve for general dimensions.
In the following, we provide some example of a quantum model beating the (conjectured) classical bound for the case d = 3. To use real parameters we parametrized the unitary as where 0 ≤ u ≤ 1. Numerical results are shown in Fig. 5, where we plot: (left) the probability p(L) for the quantum model described above, optimized over q and u together with the maximum achieved by the multicyclic models and (right) the value of the expression 3σ 2 − µ(µ − 3) for the same model, but where we also fix the functional relation between the parameters q and u to be the same as what we found in the qubit case, namely: and the value of q as Thus, we find from numerics that such a model, even if resulting from an over-simplified optimization, already outperforms the classical multicyclic models for d = 3 in both cases. Thus, we believe that a similar model would work for general dimension.

Appendix D: Continuous limit of quantum clock models
We now compute explicitly the continuous limit of the quantum model presented in the main text. For an initial pure state and single-Kraus-operator evolution, we write |ψ(t) = e (iHt−V t) |ψ(0) , with H Hermitian and V positive semidefinite. The Kraus operator is K 0 = U 0 √ E 0 , so that the discrete evolution is |ψ n = K n 0 |ψ 0 and we can put K 0 = e δ(−V +iH)) ≈ 1 1 + δ(−V + iH) for δ ≈ 0. From this relation and with q = q(δ), we can associate To calculate the logarithm of K 0 we can make use of the Baker-Campbell-Hausdorff formula and obtain log K 0 = log(q(δ))

Appendix E: Computing lower and upper bounds on maximum classical correlations via grid-search methods
In this section, we show how the maximum classical correlations can be estimated via grid-search methods. Intuitively, since the function we want to maximize is a polynomial defined on a compact set (a cartesian product of simplexes, each associated with the parameters of a row of the transition matrix T 0 ), its gradient will be bounded. Hence, it is possible to obtain an approximation with a bounded error, by evaluating the function only on a finite number of lattice points in the parameter space.

General considerations
Consider a function f : B ⊂ R D → R, where B is a compact set, and f is at least C 1 . Let us consider the following optimization problem max: f (x) subject to: x ∈ B (E1) Since f and ∇f are continuous functions and B is compact, they have a maximum in B. For δ ∈ R + , let us define L δ = (δZ D ) ∩ B, i.e., the intersection of the cubic lattice of step δ with the compact set B. By construction, we have that each point in B is at a distance at most δ √ D/2 from L δ , namely, where x corresponds to the intersection of all the diagonals of the D-dimensional hypercube with edge of length δ.
We can now prove a simple upper and lower bound for the problem in Eq. (E1) (E3) The proof is straightforward: for the first inequality it is sufficient to use the inclusion L δ ⊂ B, whereas for the second it is sufficient to use the 0 th -order Taylor expansion with the Lagrange remainder, i.e., where z = λx + (1 − λ)y for some 0 ≤ λ ≤ 1.
Notwithstanding the generality of this idea, it may work only if, a) the set B is easily characterized (e.g., a product of simplexes for the classical model), b) the space of parameters is small (e.g., small dimensional machines), and c) the gradient is easily upper bounded (e.g., for a sequence of length L generated by a classical machine the gradient can be interpreted as a vector of probabilities, where each component is bounded by one).
In fact, the number of points of the lattice scales polynomially with the inverse of the error , e.g., if we are inside the [0, 1] D cube and we take δ = 1/N , we need to compute N D points. Moreover, since the error can be evaluated at the beginning (e.g., as δ √ DL) and computations on a lattice points are independent of each other, this process can be highly parallelized, e.g., on a GPU.

Classical case
We now specialize the above result to the case of classical d-state machines, corresponding to D = d 2 parameters. Eq. (E1) for the case of p L (T ) then becomes We can compute the partial derivate of the above expression as where J ij is the matrix with a 1 in position (i, j) and 0 otherwise. It is convenient to represent using the Dirac notation as J ij = |i j|, even though the model is classical. Let us denote also π = 1| and η = |η . We can then write where p α,β (0 k 1 h ) is defined as the probability of outputting first k zeros, then h ones, when starting from the state α and ending in the state β, whereas the notation p j,− denotes a sum over the final state. One can prove that the expression in Eq. (E7) is smaller than one in absolute value, giving max T ∈B ∇p L (T ) ≤ √ D = d. Moreover, by the geometry of the set B we can have as a maximal distance δ √ D/2, providing Let us prove the above estimate for the gradient. First, notice that, since both terms are positive, We can then proceed to bound each term separately. Clearly p 1,i (0 L−1 ) ≤ 1, since it is a probability. Actually, one can also prove that i p 1,i (0 L−1 ) = p 1,− (0 L−1 ) ≤ 1.
For the other term, we just notice that where we used that p 1,i (0 r ) ≤ 1 and we identified ∞ r=0 p j,− (0 r 1) with the probability of occurrence of the output 1 for a given machine starting in the state j, which is either 0, when the outcome 1 never appears, or 1. Again, notice that we obtain that also the sum over i, i.e., i L−2 r=0 p 1,i (0 r )p j,− (0 L−r−2 1), is smaller than 1. Since each term ∂ ij p L (T ) ≤ 1, we obtain ∇p L (T ) ≤ √ D = d. Given the above estimate of the gradient, we can compute an upper bound on the maximum of the problem in Eq. (E5) by calculating the value of p L (T ) on a lattice in the space of parameters with lattice step δ. Moreover, since in our case we already have a guess of the optimal solution, such a computation can be reduced by an iterative method that evaluate the function on more and more refined lattices, defined as follows: (i) We start with an estimated optimal value p Est C max , a lattice step δ and the corresponding lattice L δ , and an error (δ) := δd 2 (we omit the factor 1/2 for reason that will be clear later). For each T ∈ L δ if p L (T ) + (δ) < p max , we remove the point from the lattice. After this procedure, we obtain a new lattice L δ . If for some T , p L (T ) > p Est C max , we update p Est C max with the this new value. (ii) For each of the remaining points in L δ , create a new lattice of length δ in each direction and lattice step δ 2 . Let us denote this lattice as L 1 δ 2 and the corresponding error (δ 2 ). Again, for each T ∈ L 1 δ 2 , is the value obtained numerically for the quantum case with the explicit model described in Sec. C 1; p Est C max is the estimated optimal solution coming from the cyclic model, except for L = 3 where the one-way model is better; p UB C max is the analytical upper bound for the classical case computed via lattice calculations.
if p L (T ) + (δ 2 ) < p Est C max , we remove the point from the lattice. We obtain the new lattice L 1 δ 2 and we update p Est C max if a better estimate is found.
(iii) Iterate the procedure until the desired error is reached.
Notice that, with the exception of the last iteration, the factor 1/2 for the error estimate cannot be used as a consequence of the way we construct the refined lattice. Such a procedure is illustrated for the simple one-dimensional case in Fig. 2.
With the above iterative method, we were able to evaluate max p L (T ) for d = 2 and L = 3, 4, . . . , 9 over a lattice of 10 18 points, and L = 10, 11, . . . , 20 over a lattice of 10 19 points. As a result, we can certify that the one-way model for L = 3 and the cyclic model for L = 4, . . . , 20 provide the optimal value for the problem in Eq. (E5), up to an error of 10 −4 for L = 3, . . . , 9 and of 3 × 10 −5 for L = 10, . . . , 20. The results are collected in Tab. III. The letter F denotes that the optimal model has been found by the numerical search, up to the a numerical precision of 10 −5 . In all other cases, the numerical search found a worse result, the ratio between the gap and the optimal value is indicated in percentage, e.g., for d = 3 and L = 8 the algorithm found an optimum which has a gap of 28% with respect to the optimum of the cyclic (k = 3) model, i.e., (opt cy − opt Adam )/opt cy = 28%.

Appendix F: Numerical search for inequalities in higher dimension
In this section, we discuss the numerical search for violation of the generalized inequalities To perform the optimization, we used the Adam algorithm [60] implementation present in the pytorch package [66]. The space of parameters for our optimization consists in the space of substochastic matrices, which are constrained by positivity and normalization conditions. We first transform the problem into an unconstrained one. For a given dimension d, we construct our transition matrices as follows. Let B 0 , B 1 be two d×d matrices with real coefficients. We define , for k = 0, 1, (F3) such that T 0 and T 1 are substochastic matrices and T 0 + T 1 is stochastic, for any choices of B 0 , B 1 . This transforms the constrained optimization problem into an unconstrained one, which can be attacked with the Adam algorithm. We fix the parameter of the optimization, i.e., number of steps, "learning rate", i.e., the size of each step, and number of repetitions of the optimization after few simple tests. We tried to perform the same optimization with the stochastic gradient descent algorithm, but obtained worse results.

Accuracy
By using Eq. (18), we have We recall that (1 1 − T 0 ) −1 = The package pytorch automatically computes the gradient of the expression in Eq. (F5) and performs the optimization. For each d ∈ {3, . . . , 10}, we performed 100 times the optimization starting from a random initial point and with 10 4 optimization steps and a "learning rate" of 0.005. For all dimensions, the optimization converges to a value of 10 −5 , i.e., approximately 0, consistent with the conjectured inequality and optimality of the one-way model.

Finite sequences
Using again the parametrization of T 0 , T 1 in terms of which we optimize again via Adam for d ∈ {3, . . . , 10} and L ∈ {d + 1, . . . , d + 10}. For each pair (d, L) we repeat the optimization 100 times with a randomly generated initial point, 10 4 steps for each optimization, and a learning rate of 0.005. Typically, Adam is able to find the correct value for low dimension and short sequences, as it was to be expected. In no case the algorithm found a better value than those already known. The results are summarized in Table IV.