Universal structure of objective states in all fundamental causal theories

A crucial question is how objective and classical behavior arises from a fundamental physical theory. Here we provide a natural definition of a decoherence process valid in all causal theories, and show how its behavior can be extremely different from the quantum one. Remarkably, despite this, we prove that the so-called spectrum broadcast structure characterizes all objective states in every fundamental causal theory, exactly as in quantum mechanics. Our results show a stark contrast between the extraordinarily diverse decoherence behavior and the universal features of objectivity.


I. INTRODUCTION
A common experience in our everyday life is that different observers agree on their observations. This agreement means that macroscopic physics is objective. Note that this is a general feature of classical physics, but it contrasts with quantum theory, where states are generally disturbed by the act of observation, and sometimes an agreement between observers is impossible [1,2]. Nevertheless, in quantum mechanics there are objective states, in the sense that various observers can determine them without disturbance [3]. It is argued that such objective states may indeed be responsible for the objectivity we experience in our everyday life [3,4]. In quantum mechanics, first the theory of decoherence [5-8], later quantum Darwinism [3,9-13], and the presence of the so-called spectrum broadcast states (SBSs) [4,14-23] have been proposed as explanations for the emergence of classicality and objectivity out of the quantum world.
In this paper, we extend the study of the emergence of objectivity beyond quantum theory, to arbitrary physical theories [24-30]. First, this enables us to identify which basic part of quantum mechanics is actually responsible for objectivity, by looking at it from the outside, in a landscape of conceivable alternative theories. Second, this analysis can be used as a test of physical consistency of post-quantum theories in the quest for quantum gravity [31-33], as every quantum extension must still account for objective macroscopic physics. This paper is organized as follows. In section II we give a brief overview of the formalism to address arbitrary physical theories. In section III we explain how we can identify classical sub-theories of a given physical theory (if they exist). The notion of decoherence is introduced and examined in section IV, while objectivity and the universal form of objective states are studied in section V. Finally, in section VI we identify two axioms that guarantee a local behavior in the emergence of classicality in composite systems. Conclusions and further directions are discussed in section VII.

II. A GENERAL FRAMEWORK FOR PHYSICAL THEORIES
Our first challenge is to choose a suitable formalism for the study of arbitrary physical theories. We do this by adopting the framework of general probabilistic theories (GPTs). For more details, we refer the reader to appendix A.
The state ρ of a physical system A is associated with a preparation of it; after that, one can manipulate it by applying some transformation T, which can possibly transform the input system into another system B. Finally, one can measure the final system B by applying an effect e to it: in this case, the system does not exist any more, but is destroyed in the process. By repeating this experiment several times, the experimenter can estimate the probability of the overall process, denoted by (e|T|ρ). Note that here a state is viewed as a particular kind of transformation: a transformation without an input system. Similarly, an effect is a transformation without an output system. In this setting, one can set up a suitable notion of sequential and parallel composition; the former is denoted by AB, where A comes after B, the latter is denoted by A ⊗ B.
The application of a generic non-deterministic device in an experiment can be described as a collection of mutually exclusive processes $\{T_i\}_{i\in X}$, where $i \in X$ represents the (classical) outcome read by the experimenter. We will call such a collection $\{T_i\}_{i\in X}$ a test (a measurement if it is a collection of effects). If a test is deterministic (i.e. there is a single outcome), we will call it a channel.
A state is said to be pure if the only way to write it as a sum of other states is the trivial way: ρ is pure if $\rho = \sum_i \rho_i$ implies $\rho_i = p_i \rho$, with $\{p_i\}$ a probability distribution. A non-pure state is called mixed.
In our analysis we assume the fundamental axiom of Causality [27], satisfied by both classical and quantum theory: Axiom 1 (Causality). The probability that a transformation occurs is independent of the choice of tests performed on its output.
Causality is equivalent to the existence of a unique deterministic effect u for every system [27], which can be used as the analog of the partial trace to discard systems in multipartite settings.
In causal theories, we can restrict ourselves to preparations ρ that are performed with certainty (i.e. those for which (u|ρ) = 1) [27]. These are called deterministic states and, given a measurement $\{a_i\}_{i\in X}$, they satisfy $\sum_{i\in X} (a_i|\rho) = 1$. In this situation, if all probabilities are allowed, the theory is convex [27]. In particular, this means that the state space of a theory is convex, and that all conical combinations of valid effects that lead to a valid effect are allowed. The latter means that effects span a convex cone.

III. CLASSICAL SUB-THEORIES
The states of finite-dimensional classical theory are probability distributions over a finite set, and the effects are all the linear functionals that yield a number in [0, 1] on states. A most notable feature of classical theory is that classical pure states can be jointly perfectly distinguished in a single-shot measurement.
Therefore, to find classical sub-theories of a given physical theory, we need to find pure states that are perfectly distinguishable. The states $\{\rho_i\}_{i=1}^n$ are said to be perfectly distinguishable if there exists a measurement $\{a_i\}_{i=1}^n$ such that $(a_i|\rho_j) = \delta_{ij}$ for all i, j. In addition, if there is no other state $\rho_0$ such that the states $\{\rho_i\}_{i=0}^n$ are perfectly distinguishable, the set $\{\rho_i\}_{i=1}^n$ is said to be maximal. One may wonder why we are interested specifically in perfectly distinguishable pure states as far as classical sub-theories are concerned, rather than generic states. The reason is that it is not restrictive to assume those perfectly distinguishable states to be pure. Indeed, if $\{\rho_i\}_{i=1}^n$ are mixed and perfectly distinguishable, then it is possible to find pure states $\{\alpha_i\}_{i=1}^n$ that are perfectly distinguishable: it is enough to take $\alpha_i$ to be any pure state in a convex decomposition of $\rho_i$ into pure states. More details are provided in appendix B.
We are interested in the largest classical theory that can arise from a given set of perfectly distinguishable pure states S, therefore we will look for maximal sets of perfectly distinguishable pure states. Picking one such set $S = \{\alpha_i\}_{i=1}^d$, we define the classical set α of dimension d as $\alpha := \mathrm{Conv}\{\alpha_i,\ i = 1, \dots, d\}$. This is the simplex generated by the $\alpha_i$'s, and it represents the states of a particular classical sub-theory. At this point, we restrict the effects of the original theory to the classical set α, identifying those that give the same probabilities on all classical states. These will be the classical effects. In appendix B 1, we show that, in this way, we get precisely all effects of the classical theory with α as the state space. In other words, with this strategy, it is enough to choose a classical set α to find a classical sub-theory of a causal theory.
Figure 2. A cross-section of the effect convex cone of a classical trit (in orange) and a restricted trit (in blue). $a_1, a_2, a_3$ are linear functionals such that $(a_i|\alpha_j) = \delta_{ij}$, but they are not allowed effects of the restricted trit: the only pure effects are $e_{12}, e_{13}, e_{23}$. The restriction on effects is evident.
All physical theories require a classical interface by which the observer reads the outcome of an experiment. Every fundamental theory should be able to describe this classical interface [34], otherwise we would be forced to accept an insurmountable division between the underlying physical world, and the macroscopic one, in which the observer performs their experiments. Consequently, a fundamental theory should obey the following principle: Condition 1 (Emergence of Classical Concepts). A fundamental physical theory of nature must contain classical states (or an arbitrarily good approximation thereof).
Not all theories obey condition 1: in figs. 1 and 2 we depict the state and effect spaces of a theory that violates it. The states of the basic system are the same as in the classical trit (with pure states $\alpha_1, \alpha_2, \alpha_3$), but not all linear functionals are allowed, i.e. there is an intrinsic restriction on the effect space. Indeed, the only pure effects we have are $e_{12}, e_{13}, e_{23}$, with $e_{ij} = \frac{1}{2}(a_i + a_j)$, where $a_i$ is the linear functional of the trit such that $(a_i|\alpha_j) = \delta_{ij}$. This restricted trit theory has no subsets of pure states that can be distinguished perfectly in a single shot, and therefore no classical states. Moreover, this absence persists in all composite systems (details in appendix C).
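The absence of perfectly distinguishable pure states in the restricted trit can be checked numerically. The following is a minimal sketch, not the argument of appendix C. It assumes that the effect cone is generated by $e_{12}, e_{13}, e_{23}$ alone, and that a two-outcome measurement $\{a, u - a\}$ is valid only if both of its effects lie in that cone. The function names and the sampling grid are hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import combinations

# An effect is stored by its values on the pure states (alpha_1, alpha_2, alpha_3).
U = (Fraction(1), Fraction(1), Fraction(1))  # deterministic effect u


def in_restricted_cone(a):
    """True if a = c12*e12 + c13*e13 + c23*e23 with all c >= 0, e_ij = (a_i + a_j)/2."""
    x, y, z = a
    # Solving the linear system gives c12 = x+y-z, c13 = x+z-y, c23 = y+z-x.
    return x + y - z >= 0 and x + z - y >= 0 and y + z - x >= 0


def in_classical_cone(a):
    """Unrestricted trit: every functional that is non-negative on states is allowed."""
    return all(v >= 0 for v in a)


def distinguishable(i, j, in_cone, grid=20):
    """Sample measurements {a, u - a} with (a|alpha_i) = 1 and (a|alpha_j) = 0."""
    k = ({0, 1, 2} - {i, j}).pop()
    for n in range(grid + 1):
        a = [None] * 3
        a[i], a[j], a[k] = Fraction(1), Fraction(0), Fraction(n, grid)
        complement = tuple(u - v for u, v in zip(U, a))
        if in_cone(tuple(a)) and in_cone(complement):
            return True
    return False


pairs = list(combinations(range(3), 2))
restricted = [distinguishable(i, j, in_restricted_cone) for i, j in pairs]
classical = [distinguishable(i, j, in_classical_cone) for i, j in pairs]
print(restricted, classical)
```

Under these assumptions no pair of pure states is distinguishable in the restricted trit, while every pair is in the classical trit. The grid only samples the free value of the effect on the third pure state, so this is an illustration rather than a proof.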
The restriction plays a crucial role: we show that if there is no restriction on the effects, a theory admits at least the classical bit as a sub-theory (appendix B 2).
In the following, we will always assume that a theory satisfies condition 1. Alternatively, the presence of classical states can be postulated directly [35,36], or enforced by mathematical (appendix B 2) or physical [37,38] principles.

IV. DECOHERENCE
For a complete description of the emergence of classicality we need to find a transition towards classical theory among the transformations of a given theory. This provides a classical interface that emerges from the physical description of nature [34].
In analogy to the well-known process of quantum decoherence [5-8], a similar mechanism in GPTs was studied in refs. [34,39-42]. Our approach to GPT decoherence is different, as it focuses on its minimal properties as a process. First, note that if classicality were reached only probabilistically, it would be an intrinsically unstable theory, contrary to experimental evidence. This motivates searching for decoherence among deterministic processes, i.e. among the channels of the theory. Moreover, a complete decoherence should send all states to classical states, and preserve classical states themselves. This motivates the following definition, which characterizes decoherence as a resource-destroying map [43]:
Definition 1. Given the classical set α, a channel $D_\alpha$ is a complete decoherence if
1. $D_\alpha \rho \in \alpha$ for every state ρ;
2. $D_\alpha \gamma = \gamma$ for every $\gamma \in \alpha$.
One can apply the complete decoherence to all effects of the theory, which naturally produces the set of classical effects defined through the restriction procedure introduced above (appendix D). The question is whether, given a classical set, a complete decoherence on it always exists. Consider a measurement $\{a_i\}_{i=1}^d$ that distinguishes the pure states $\{\alpha_i\}_{i=1}^d$ perfectly. We can construct the measure-and-prepare test $\{|\alpha_i)(a_i|\}_{i=1}^d$. Proposition 1. The channel $D_\alpha = \sum_{i=1}^d |\alpha_i)(a_i|$ is a complete decoherence. The proof is in appendix D 1. In the light of this result, we will call every channel of the form $D_\alpha = \sum_{i=1}^d |\alpha_i)(a_i|$ a test-induced decoherence (TID) with respect to the fixed classical set α.
Proposition 1 implies that in all causal theories there always exists a complete decoherence on every classical set, induced by measuring and forgetting the outcome.
Figure 3. In this GPT, the state space is a square, and the vertical side in black is the classical set $\alpha = \mathrm{Conv}\{\alpha_1, \alpha_2\}$. The action of the test-induced decoherence $D_\alpha$ on the mixed state ρ is represented by a black arrow. Notice that ρ is decohered to the pure state $\alpha_1$; so the test-induced decoherence in GPTs can increase the purity of a state, contrary to what happens in quantum theory.
Despite this universal form, given a classical set α, such a complete decoherence can be highly non-unique (appendix D 1), a fact that was missed by previous works [39,41]. Furthermore, as opposed to quantum mechanics, there are GPTs where mixed states can be decohered to pure ones by the TID, as represented in fig. 3 (see appendix D 1 for details). All this shows how, in general, GPTs differ from quantum even concerning decoherence. Nevertheless, as we show below, in general the emergence of objectivity is something remarkably different from decoherence.
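The purity-increasing behavior of fig. 3 can be reproduced in a few lines. This is a minimal sketch under assumed coordinates: normalized states of the square GPT are taken to be points (x, y) of the unit square, the classical set is the side x = 0, and the distinguishing effects are $a_1 = 1 - y$ and $a_2 = y$. None of these choices is taken from appendix D 1; they are hypothetical and serve only to illustrate Definition 1.

```python
from fractions import Fraction as F

# Normalized states of the square GPT: points (x, y) with 0 <= x, y <= 1 (assumed coordinates).
ALPHA_1 = (F(0), F(0))
ALPHA_2 = (F(0), F(1))   # classical set = vertical side x = 0
VERTICES = [(F(0), F(0)), (F(1), F(0)), (F(1), F(1)), (F(0), F(1))]


def a1(state):
    """Effect with (a1|alpha_1) = 1 and (a1|alpha_2) = 0; values in [0, 1] on the square."""
    return 1 - state[1]


def a2(state):
    """Complementary effect, so that a1 + a2 = u."""
    return state[1]


def tid(state):
    """Test-induced decoherence D = |alpha_1)(a1| + |alpha_2)(a2|."""
    p1, p2 = a1(state), a2(state)
    return tuple(p1 * s1 + p2 * s2 for s1, s2 in zip(ALPHA_1, ALPHA_2))


def in_classical_set(state):
    return state[0] == 0 and 0 <= state[1] <= 1


# Condition 1 of Definition 1, checked on the vertices (it extends to all states by convexity).
lands_in_alpha = all(in_classical_set(tid(v)) for v in VERTICES)
# Condition 2: classical states are left untouched.
gamma = (F(0), F(1, 3))
fixes_alpha = tid(gamma) == gamma
# A mixed state on the bottom edge is sent to the pure state alpha_1.
rho = (F(1, 2), F(0))
print(lands_in_alpha, fixes_alpha, tid(rho) == ALPHA_1)
```

Here ρ = (1/2, 0) is the midpoint of two vertices, hence mixed, yet its image is the vertex $\alpha_1$.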

V. OBJECTIVITY GAME
The existence of classical states and decoherence processes is still not enough to reproduce the full classical picture, as from our everyday experience we know that the results of measurements are objective [9,11]. To address this issue, we use the setting of quantum Darwinism [9,11], where a system is surrounded by several fragments of environment, each of which is accessible to one observer. In quantum theory, objective states are SBS states [4,14,16,18,20,21,44,45], i.e. states of the form $\rho = \sum_j p_j |j\rangle\langle j|_S \otimes \rho_{j,E_1} \otimes \dots \otimes \rho_{j,E_n}$, where for every environment fragment $E_k$ the states $\{\rho_{j,E_k}\}$ have orthogonal support.
Recall that a state of a system is objective if multiple observers can find it out without perturbing it [3,9]. Moreover, each observer should always be able to repeat their measurement, and always obtain the same result. To model non-disturbance, we extend the so-called Bohr non-disturbance criterion presented in refs. [4,45-47] to arbitrary physical theories (cf. also ref. [48]). We can recast the concept of objectivity as a multiplayer game, called the objectivity game (OG), inspired by ref. [4]. In this game, the goal is to determine the state of a target system S which decoheres to a classical set α. We assume that there is a special observer on system S acting as a referee checking the findings of n players, who act independently by testing some environment fragment $E_k$, correlated with the system. They win if they are able to determine the state of the target system S without disturbing the joint state $\rho_{SE_1 \dots E_n}$. The Bohr non-disturbance criterion is argued to be the right concept here [4]. We insist that the n players should act independently in this game, therefore we enforce the following condition [4,21,45], which is widely accepted in the quantum research on objectivity: Condition 2 (Strong independence). The only correlation between the players is the common information about the system.
In general, to determine a state, several rounds of tests are necessary, but the players cannot change their devices between the various rounds. For this reason, all the observers, including the referee, want to be able to repeat their tests several times, without affecting the outcome. This is also a necessary condition for objectivity: if something is objective, each observer must be able to obtain the same outcome when they probe a system. These tests are called sharply repeatable tests (SRTs).
SRTs are an operational characterization of tests that can be repeated several times, always yielding the same outcome. This is a highly desirable feature for a test, but do such tests exist at all? The answer is positive in causal theories that admit perfectly distinguishable states. Indeed, if $\{\rho_i\}_{i=1}^n$ is a set of perfectly distinguishable states, and $\{a_i\}_{i=1}^n$ is the associated measurement, by Causality we can consider the measure-and-prepare test $\{|\rho_i)(a_i|\}_{i=1}^n$. In quantum theory, these measure-and-prepare tests are quantum instruments of the form $\{\mathcal{M}_j\}$, where $\mathcal{M}_j(\rho) = \mathrm{tr}(E_j \rho)\,\sigma_j$, the $\sigma_j$'s have orthogonal supports, and $E_j$ is the orthogonal projector onto the support of $\sigma_j$. In general, however, not every SRT needs to be of this form: in quantum theory, a von Neumann measurement with projectors of rank greater than 1 is obviously an SRT, but it is not of the measure-and-prepare type. For the scope of the OG, it is not important to characterize all the SRTs of a theory; it is enough to know that they exist. Indeed, non-disturbing SRTs were identified as providing objective information in causal theories in ref. [48].
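To see concretely why a measure-and-prepare test built from perfectly distinguishable states can be repeated without changing the outcome, one can compose it with itself. The sketch below assumes a particular formalization of sharp repeatability, namely $T_j T_i = \delta_{ij} T_i$, which is our reading of "repeated several times, always yielding the same outcome" and not a definition quoted from the text. The states and effects are a hypothetical example taken from a square state space, written as vectors (x, y, 1).

```python
def outer(state, effect):
    """Matrix of the measure-and-prepare transformation |state)(effect|."""
    return [[s * e for e in effect] for s in state]


def matmul(m, n):
    """Plain matrix product, i.e. sequential composition 'm after n'."""
    return [[sum(m[r][k] * n[k][c] for k in range(len(n))) for c in range(len(n[0]))]
            for r in range(len(m))]


# Hypothetical example: two perfectly distinguishable states written as vectors (x, y, 1),
# with their distinguishing effects as covectors acting on those vectors.
states = [(0, 0, 1), (0, 1, 1)]
effects = [(0, -1, 1), (0, 1, 0)]   # a1 = 1 - y, a2 = y

test = [outer(s, a) for s, a in zip(states, effects)]
zero = [[0] * 3 for _ in range(3)]

# Running the test twice: outcome j after outcome i occurs only if i == j,
# and in that case the overall transformation is unchanged.
sharply_repeatable = all(
    matmul(test[j], test[i]) == (test[i] if i == j else zero)
    for i in range(2) for j in range(2)
)
print(sharply_repeatable)
```

The identity follows from $|\rho_j)(a_j|\rho_i)(a_i| = \delta_{ij} |\rho_i)(a_i|$, so the check does not depend on the specific numbers chosen.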
The first move of the OG is from the referee who performs the SRT associated with some classical set α of S (see appendix E 1). Since the outcome is not communicated to the players, the system is decohered to $\rho_S = \sum_{i=1}^r p_i \alpha_i$, where $p_i > 0$ for every i (recall that such a decoherence is always guaranteed to exist). The players win the game if they all correctly guess the outcome of the referee. On the other hand, the players are not restricted to this form of SRT. This is because in the setting of quantum Darwinism, they represent fractions of the environment, which have many more degrees of freedom than the referee's system.
The operational meaning of objectivity is the agreement of all the involved observers (including the referee). Therefore, an objective state is the joint state of the referee and the players that allows them to win the OG. To this end, we introduce SBS states for causal theories: these are states of the form $\rho_{SE_1 \dots E_n} = \sum_i p_i\, \alpha_i \otimes \rho_{i,E_1} \otimes \dots \otimes \rho_{i,E_n}$, where the $\alpha_i$'s are perfectly distinguishable pure states, and for every k, the states $\{\rho_{i,E_k}\}$ are perfectly distinguishable too.
It is not hard to show that states in the SBS form are objective (appendix E 2), so every causal theory has objective states. However, the fundamental question is the characterization of all objective states of a theory. Our main result is that the states of the SBS form are the only objective states in every fundamental causal theory. Theorem 1. In any fundamental causal theory (i.e. obeying condition 1), the players can win the OG if and only if the joint state is an SBS state.
The proof is in appendix E 2. This result is exceptionally general, for it demonstrates that only the principle of Causality is enough to ensure the emergence of objectivity. Furthermore, the structure of objective states is universal and isomorphic to the quantum one.
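As an illustration of the easy direction of Theorem 1, the following sketch plays the guessing part of the OG on an SBS-form state in the special case of classical theory, where reading a fragment does not disturb it. All numbers, the symbol alphabet, and the function names are hypothetical. The sketch does not model the non-disturbance criterion in a non-classical theory.

```python
from fractions import Fraction as F
from itertools import product

p = {0: F(1, 3), 1: F(2, 3)}   # referee's outcome distribution p_j
# Conditional fragment states rho_{j,E_k}, as distributions over symbols.
# For fixed k, different j have disjoint supports (perfect distinguishability).
fragments = [
    {0: {"a": F(1, 2), "b": F(1, 2)}, 1: {"c": F(1)}},    # E_1
    {0: {"a": F(1)}, 1: {"b": F(1, 4), "d": F(3, 4)}},    # E_2
]


def joint():
    """Yield ((j, e_1, e_2), probability) for the SBS-form distribution."""
    for j, pj in p.items():
        supports = [frag[j].items() for frag in fragments]
        for combo in product(*supports):
            prob = pj
            for _, q in combo:
                prob *= q
            yield (j, *[sym for sym, _ in combo]), prob


def guess(k, symbol):
    """Player k reads a symbol from E_k and reports the unique j whose support contains it."""
    return next(j for j, dist in fragments[k].items() if symbol in dist)


# Probability that every player's guess coincides with the referee's outcome j.
win_probability = sum(
    prob for (j, *symbols), prob in joint()
    if all(guess(k, s) == j for k, s in enumerate(symbols))
)
print(win_probability)
```

Because the conditional states of each fragment have disjoint supports, every player decodes j with certainty and the players win with probability one.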

VI. EMERGENCE OF COMPOSITE CLASSICAL THEORIES
Our results obtained so far have focused only on single systems, but here we identify the minimal assumptions to ensure classicality in composite systems. Specifically, it is enough to impose the following two axioms: Axiom 2. The product of two pure states is a pure state.
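Axiom 3. If $\{\alpha_i\}_{i=1}^{d_A}$ is a maximal set of perfectly distinguishable pure states of A, and $\{\beta_j\}_{j=1}^{d_B}$ is a maximal set of perfectly distinguishable pure states of B, then $\{\alpha_i \otimes \beta_j\}$, with $i = 1, \dots, d_A$ and $j = 1, \dots, d_B$, is a maximal set of perfectly distinguishable pure states of AB.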
These axioms represent a "locality constraint" in the emergence of classicality. Indeed if these two axioms fail, the informational content of the classical composite system cannot be reduced to the informational contents of each classical subsystem. If there are no "delocalized" classical systems, we expect that decohering AB will be the same as decohering A and B separately, i.e. $D_{\alpha\beta} = D_\alpha \otimes D_\beta$. Even if the theory satisfies both axioms 2 and 3, this property may not be satisfied by general complete decoherences. However, it is so for TIDs (appendix F).
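The factorization of TIDs can be illustrated at the level of linear maps. The sketch below is not the proof of appendix F. It assumes that product states and product effects of AB are represented by Kronecker products of the local vectors, and it uses hypothetical local data (a square-like system A and a classical bit B).

```python
def outer(state, effect):
    """Matrix of |state)(effect|."""
    return [[s * e for e in effect] for s in state]


def add(m, n):
    return [[x + y for x, y in zip(r, s)] for r, s in zip(m, n)]


def kron_vec(v, w):
    return [x * y for x in v for y in w]


def kron_mat(m, n):
    return [[x * y for x in row_m for y in row_n] for row_m in m for row_n in n]


def tid(states, effects):
    """Sum of the measure-and-prepare terms |state_i)(effect_i|."""
    total = [[0] * len(effects[0]) for _ in states[0]]
    for s, e in zip(states, effects):
        total = add(total, outer(s, e))
    return total


# Hypothetical local data: system A (vectors (x, y, 1)) and a classical bit B.
alphas, a_effects = [(0, 0, 1), (0, 1, 1)], [(0, -1, 1), (0, 1, 0)]
betas, b_effects = [(1, 0), (0, 1)], [(1, 0), (0, 1)]

# Decohering A and B separately.
local = kron_mat(tid(alphas, a_effects), tid(betas, b_effects))
# TID of AB built from the product states alpha_i (x) beta_j and product effects.
product_states = [kron_vec(s, t) for s in alphas for t in betas]
product_effects = [kron_vec(a, b) for a in a_effects for b in b_effects]
joint = tid(product_states, product_effects)
print(local == joint)
```

The equality is a consequence of bilinearity of the Kronecker product; the physical content of axioms 2 and 3 is that the product set is indeed a maximal classical set of AB.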

VII. CONCLUSIONS AND DISCUSSION
In summary, Causality alone, in conjunction with the principle of Emergence of Classical Concepts (condition 1), is the backbone of our results. Here we solve the problem of the emergence of objectivity beyond quantum, identifying SBS and the process leading to it as its ultimate origin. Our analysis has striking outcomes: unlike in quantum mechanics [3,9], objectivity and decoherence are two distinct phenomena. Indeed, decoherence behavior can be exceedingly different from the quantum case, while objectivity and SBS are universal across causal theories. For instance, in appendix D 1, we show that complete decoherences can even increase the purity of a state in some GPTs; or in another example, there is an uncountable number of distinct complete decoherences. This also demonstrates how GPTs can differ radically from quantum theory, even within the scope of our analysis. In light of this, our result about the universality of the SBS form is even more surprising: it shows that the emergence of objectivity is, instead, a transversal phenomenon in physics that unifies physical theories with quite different behaviors, as illustrated in fig. 4.
However, we note that Causality is not enough to guarantee the emergence of localized classical composite systems: it is not always possible to reduce the information of the decohered system to the information of its sub-systems. We showed two additional axioms that are sufficient to ensure an appropriate non-holistic classical behavior.
Our results support the approach to objectivity presented in refs. [4, 15-20, 44, 45] based on SBS states (recently shown to be stronger than the notion of quantum Darwinism [21,45,49]). In particular, they identify the validity of the SBS approach far beyond the limits of quantum mechanics. Our findings also suggest that other approaches to classicality, such as quantum Darwinism and the associated broadcasting of information to the environment [40,50], could be extended to GPTs, opening a fruitful research field we intend to investigate in subsequent works. On the other hand, the universality of the shape of objective states suggests that Causality is really a strong assumption and, to find some unique behaviors, one should weaken it, e.g. by bringing in the structure of relativistic space-time [51-53]. Last but not least, concerning the increase of purity by decoherence in some non-quantum GPTs, one faces an interesting alternative: either we have some paradoxical process where decoherence increases the information about a system (if we follow the standard quantum intuition), or one must reconsider the concept of information of the system itself, at least partially decoupling it from the concept of purity [54-56]. We leave this as an open issue for further research.
GPTs are a framework for a theory-independent description of physical probabilistic processes. They have been used in several successful reconstructions of quantum theory from information-theoretic postulates [24,35,37,57-60], and they are the subject of active research in the quantum information community [36,61-68]. The essence of this approach is that any physical theory must describe experiments performed in a laboratory, and predict the probabilities of their outcomes. These experiments are usually carried out by connecting several devices. Each device represents a physical process, and wires connecting them carry physical systems.
Therefore every physical transformation T transforming system A into system B (e.g. a beam splitter, a Stern-Gerlach magnet, etc.) is naturally represented as a box with an input wire A and an output wire B. Some devices have no input, others have no output. Processes with no input are called states, and those with no output are called effects. If the output system of a transformation A matches the input system of another transformation B, we can apply B after A, and get a new transformation, denoted BA. This corresponds to connecting the two associated devices in sequence. Similarly, two transformations A and B can be applied independently, at the same time, on different systems. In this case, the resulting transformation is denoted by A ⊗ B, and the two associated devices are composed in parallel. One can build arbitrary circuits by connecting these devices. Such a circuit can be read at the same time as an instruction about how to build an actual experiment, and as the way physical processes are connected in the same experiment. This framework allows one to treat states, effects, and transformations on equal footing, by introducing a special system, the trivial system I, which represents the lack of a system. For this reason, the composition of system A with the trivial system does not involve any change: AI = IA = A.
In general, the experimenter does not have full control over the transformation they can implement; this is because in nature there are also non-deterministic processes. Therefore, what we can say is that every device in an experiment implements a collection of mutually exclusive alternatives. Only one of them can occur in a run of the experiment, and the experimenter can read which process actually occurred by looking at the outcome of the experiment. For this reason, we can associate a collection of transformations $\{T_i\}_{i\in X}$, called a test, with every device, where i is the outcome, and X the set of outcomes. A special kind of test are measurements (or observation-tests) $\{a_i\}_{i\in X}$, which are collections of effects. It is therefore natural to ask ourselves about the probability that a particular transformation occurs in an experiment. Probabilities are represented by circuits with no external wires. Such a circuit represents the joint probability of observing all the specific transformations appearing in it (e.g. $p_{ijklmn}$ for a circuit made of six tests with outcomes i, j, k, l, m, n). If a test is deterministic, i.e. there is only one transformation associated with it, it is called a channel.
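We will often make use of the following short-hand notations, inspired by quantum theory, to mean some common diagrams occurring in our analysis. 1. (a|ρ) denotes the probability that the effect a occurs on the state ρ. 2. (a|T|ρ) denotes the probability that the effect a occurs after the transformation T is applied to the state ρ. 3. |ρ)(a| denotes the transformation in which the effect a is followed by the preparation of the state ρ.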
In particular, the transformation |ρ)(a| is called a measure-and-prepare transformation, because first the effect a (representing a measurement outcome) occurs, and then the state ρ is prepared. Now, for any state ρ and every effect a, (a|ρ) ∈ [0, 1], whereby a state of system A becomes a map from the set of effects Eff(A) of A to the unit interval [0, 1]: ρ : Eff(A) → [0, 1]. This leads naturally to the following definition. Definition 5. Two states ρ and σ on the same system are tomographically distinct if there exists an effect a such that (a|ρ) ≠ (a|σ).
The idea behind this definition is that two states (i.e. the associated preparations of a system) are indistinguishable if there is no measurement to witness their difference. In this case they must be identified as states.
Similarly, every effect gives rise to a map from the set of states St(A) to the unit interval, and one identifies effects that produce the same probabilities on all states.
As states and effects are maps to a subset of real numbers, they can be summed, and one can take their multiple by a real number. In this way, the set of states and the set of effects become spanning sets of real vector spaces, denoted $\mathrm{St}_{\mathbb{R}}(A)$ and $\mathrm{Eff}_{\mathbb{R}}(A)$, which we assume to be finite-dimensional. If one considers only linear combinations with non-negative coefficients (conical combinations), one obtains the cone of states $\mathrm{St}_+(A)$ and the cone of effects $\mathrm{Eff}_+(A)$. In this way, one can see how the convex geometric approach to GPTs arises [25,30]. Once the cone of states is defined, one can consider the dual cone $\mathrm{St}_+^*(A)$: this is the cone of linear functionals that are non-negative on $\mathrm{St}_+(A)$. Clearly $\mathrm{Eff}_+(A) \subseteq \mathrm{St}_+^*(A)$, but we will discuss this inclusion in greater detail in appendix B, to examine its consequences for the emergence of classicality.
In this setting, a transformation from A to B is a completely positive map from $\mathrm{St}_{\mathbb{R}}(A)$ to $\mathrm{St}_{\mathbb{R}}(B)$. Here a positive map is a map sending an element of the input cone of states to an element of the output cone of states. A map T is completely positive if $T \otimes I_S$ is a positive map, for any system S, where $I_S$ is the identity on system S. Complete positivity plays a crucial role in defining tomographically distinct transformations [27,69].
Figure 5. The outcome set X of the test $\{A_i\}_{i\in X}$ has 10 outcomes. To perform a coarse-graining of it, we lump together some of its outcomes, relabeling them as a new outcome. For example, the outcomes $i_1$, $i_2$, and $i_3$ are relabeled as $j_1$. This gives rise to a partition $\{X_j\}_{j\in Y}$ of X. We associate a new transformation with each set in the partition, such that it is the sum of the transformations associated with the outcomes contained in that set. Thus $B_{j_1} = A_{i_1} + A_{i_2} + A_{i_3}$. The new test $\{B_j\}_{j\in Y}$ has 5 outcomes.
Definition 6. Two transformations $\mathcal{A}$ and $\mathcal{B}$ from system A to system B are tomographically distinct if there exists a system S and a bipartite state ρ ∈ St(AS) such that $(\mathcal{A} \otimes \mathcal{I}_S)\rho \neq (\mathcal{B} \otimes \mathcal{I}_S)\rho$. If the theory satisfies an axiom called Local Tomography [24,27,37,59,70,71], by which product effects are enough to do tomography on bipartite states, as in quantum theory, then the ancillary system S is not necessary to distinguish transformations [27].
The linear structure introduced above allows us to talk about the coarse-graining of tests. Suppose one has the test $\{A_i\}_{i\in X}$. The action of coarse-graining means joining together some of the outcomes of this test to build a different test. This concept is easily explained by fig. 5: given a partition $\{X_j\}_{j\in Y}$ of X, the coarse-grained test $\{B_j\}_{j\in Y}$ is defined by $B_j = \sum_{i\in X_j} A_i$. We also say that every transformation $A_i$, with $i \in X_j$, is a refinement of the transformation $B_j$. Clearly, by performing the coarse-graining over all the outcomes of a test, we obtain a deterministic test, i.e. a channel: $\sum_{i\in X} A_i$. The natural question at this stage is to understand when a transformation T is of "primitive nature", or instead arises from the coarse-graining of other transformations in some experiment.
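Coarse-graining amounts to summing the transformations inside each block of a partition of the outcome set. A minimal sketch for a measurement on a classical bit follows; the numerical effects and the partition are hypothetical, chosen only to mirror the 10-to-5 outcome reduction described for fig. 5.

```python
from fractions import Fraction as F


def coarse_grain(test, partition):
    """Sum the elements of the test in each block X_j: B_j = sum_{i in X_j} A_i."""
    return [tuple(sum(col) for col in zip(*(test[i] for i in block)))
            for block in partition]


# Effects of a 10-outcome measurement on a classical bit (hypothetical numbers):
# each pair gives the probability of that outcome on the two pure states.
test = [(F(1, 10), F(i, 55)) for i in range(1, 11)]
# Hypothetical partition of the 10 outcomes into 5 blocks; the first block merges three outcomes.
partition = [[0, 1, 2], [3, 4], [5], [6, 7], [8, 9]]

coarse = coarse_grain(test, partition)
# Lumping all outcomes together gives the deterministic effect u = (1, 1).
total = coarse_grain(test, [list(range(10))])[0]
print(len(coarse), coarse[0], total)
```

The last line shows the complete coarse-graining: for a measurement it returns the deterministic effect, in line with the statement that coarse-graining over all outcomes yields a deterministic test.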

Definition 7. A transformation T is pure if all its refinements are of the form $p_i T$, where $\{p_i\}$ is a probability distribution. A non-pure transformation is called mixed.

Causality and its consequences
In our research, the main requirement we impose on a physical theory is that the propagation of information follows a "temporal order", therefore the result of a process can influence a future process, but never a process in the past. Causal theories [27] are those that satisfy this requirement, which is expressed precisely by the following axiom: Axiom 4 (Causality [27]). For every state ρ, take two measurements $\{a_i\}_{i\in X}$ and $\{b_j\}_{j\in Y}$. One has $\sum_{i\in X} (a_i|\rho) = \sum_{j\in Y} (b_j|\rho)$. Causality is also equivalent to the existence of a unique deterministic effect u [27]. We can use this deterministic effect to discard systems when dealing with composite systems. The marginal of a bipartite state $\rho_{AB}$ can be defined as $\rho_A := \mathrm{tr}_B\, \rho_{AB} = (I_A \otimes u_B)\rho_{AB}$, where $I_A$ denotes the identity channel on system A. Sometimes we will keep tr as a notation for the unique deterministic effect when it is applied directly to states.
In causal theories, the set of states St(A) of a system A has a particular structure: it can be divided in two disjoint subsets, the set for which (u|ρ) = 1, and the set for which (u|ρ) < 1. The former is called the set of normalized states, and corresponds to states that can be prepared deterministically. States of the latter form are called sub-normalized states, which correspond to states that can be prepared only probabilistically, where (u|ρ) gives exactly that probability. In causal theories, we can always write a sub-normalized state as a probabilistic rescaling of a normalized state, therefore it is enough for our analysis to consider only state spaces of normalized states.
This gives rise naturally to the notion of a cone of states, denoted by St + (A). A geometric picture of this is given in fig. 6.
It is possible to show that in a causal theory where probabilities in the whole range [0, 1] are allowed, the state space is a compact convex set [27,60,69]. Note that given a measurement $\{a_i\}_{i\in X}$, we have $\sum_{i\in X} a_i = u$, because the sum means a coarse-graining over all the effects of the measurement, and therefore it must be the unique deterministic effect. Hence when we apply that measurement on a state ρ, it yields a probability distribution $\{p_i\}_{i\in X}$, where $p_i := (a_i|\rho)$. This shows that every state is a probabilistic assignment to any measurement, therefore recovering the usual picture of states in the convex approach to GPTs [25,26,30]. Similarly, it is easy to prove that channels preserve the deterministic effect: uC = u. This is the generalization of the quantum property that channels are trace-preserving. In particular, if we have a test $\{A_i\}_{i\in X}$, one has $\sum_{i\in X} u A_i = u$ [27].
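These normalization conditions are easy to verify on a small example. The sketch below uses hypothetical data: states are vectors (x, y, 1), so that the deterministic effect is the covector u = (0, 0, 1), and the test is a two-outcome measure-and-prepare test. It checks that the effects sum to u, that the outcome probabilities on a state sum to one, and that $\sum_i u A_i = u$.

```python
from fractions import Fraction as F

U = (0, 0, 1)                       # deterministic effect on vectors (x, y, 1)
effects = [(0, -1, 1), (0, 1, 0)]   # hypothetical measurement: a1 = 1 - y, a2 = y
states = [(0, 0, 1), (0, 1, 1)]     # states prepared after each outcome


def apply(effect, state):
    """Probability (effect|state)."""
    return sum(e * s for e, s in zip(effect, state))


def row_times_matrix(row, m):
    """Effect 'row' composed with transformation m, as a covector."""
    return tuple(sum(row[k] * m[k][c] for k in range(len(m))) for c in range(len(m[0])))


# The effects of a measurement sum to the deterministic effect.
effect_sum = tuple(sum(col) for col in zip(*effects))
# Hence the outcome probabilities on any normalized state sum to one.
rho = (F(1, 4), F(2, 3), 1)
probabilities = [apply(a, rho) for a in effects]

# For the test {|state_i)(effect_i|}, the sum over outcomes of u A_i equals u.
test = [[[s * e for e in a] for s in st] for st, a in zip(states, effects)]
u_after_test = tuple(sum(col) for col in zip(*(row_times_matrix(U, t) for t in test)))
print(effect_sum == U, sum(probabilities), u_after_test == U)
```

The third check is the analog of trace preservation for the channel obtained by coarse-graining the test.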
If in a theory information flows from the past to the future, like in causal theories, it is possible to choose what experiment to perform now based on the outcome of a previous one [27,69]. This fact is so strongly linked to Causality that the ability to perform all classically-controlled experiments (i.e. chosen according to the outcome of a previous experiment) is equivalent to Causality itself [60,69]. A particular example of a classically-controlled test is a measure-and-prepare test.
The channel $\mathcal{A} = \sum_{i\in X} |\rho_i)(a_i|$ is said to be measure-and-prepare as well.
A measure-and-prepare test is a classically-controlled test, because the state $\rho_i$ is prepared only if the effect $a_i$ has occurred before. Note that the channel $\mathcal{A}$ is the complete coarse-graining over the outcomes of the measure-and-prepare test $\{|\rho_i)(a_i|\}_{i\in X}$.
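As a rough illustration of the statements above, the following sketch builds a measure-and-prepare channel for a classical bit and checks that it preserves the deterministic effect. The vector representation (states as columns, effects as rows), the helper names, and the particular states and effects are assumptions made only for this example; they are not taken from the formalism of [27].

```python
from fractions import Fraction as F


def apply_effect(effect, state):
    """Pairing (a|rho): a row vector applied to a column vector."""
    return sum(e * s for e, s in zip(effect, state))


def measure_and_prepare(states, effects):
    """Matrix of A = sum_i |rho_i)(a_i|, a sum of outer products."""
    dim = len(states[0])
    return [[sum(rho[r] * a[c] for rho, a in zip(states, effects))
             for c in range(dim)] for r in range(dim)]


def apply_channel(matrix, state):
    """Column vector obtained by applying the channel to a state."""
    return [sum(m * s for m, s in zip(row, state)) for row in matrix]


def effect_after_channel(effect, matrix):
    """Row vector of the effect composed with the channel."""
    dim = len(effect)
    return [sum(effect[r] * matrix[r][c] for r in range(dim))
            for c in range(dim)]


# Assumed example: a classical bit, whose deterministic effect is u = (1, 1).
u = [F(1), F(1)]
a = [[F(1), F(0)], [F(0), F(1)]]                      # measurement, a_1 + a_2 = u
rho_prep = [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]]   # state prepared on each outcome

A = measure_and_prepare(rho_prep, a)
rho = [F(1, 3), F(2, 3)]
probs = [apply_effect(ai, rho) for ai in a]   # probability distribution p_i = (a_i|rho)
out = apply_channel(A, rho)                   # equals sum_i p_i rho_i
```

Here `probs` sums to 1 because the effects of the measurement sum to `u`, and `effect_after_channel(u, A)` returns `u` again, which is the statement that channels preserve the deterministic effect.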

Appendix B: Classicality
In this appendix we collect interesting and useful facts about classicality in GPTs. We start from some of the fundamental properties of classical states. Geometrically, the state space is a simplex, with all point-like probability distributions ($(1\ 0\ \dots\ 0)^T$ and permutations) as vertices (fig. 7), and the unique deterministic effect is the row vector $u = (1\ \dots\ 1)$. Now, we explain how classical theory can be singled out among all other causal theories. It turns out that its key feature is that all pure states are jointly perfectly distinguishable. This means that there exists a measurement that distinguishes them perfectly in a single shot, as explained in section III.
Proposition 2. If all pure states of a causal theory are perfectly distinguishable, the theory is classical.
Proof. Suppose we have n pure states $\{\psi_i\}_{i=1}^n$ of some system A. Requiring that they be perfectly distinguishable implies that they are linearly independent as vectors in $\mathrm{St}_{\mathbb{R}}(A)$. To see it, let $\{a_i\}_{i=1}^n$ be the associated measurement. If we consider $\sum_{i=1}^n \lambda_i\psi_i = 0$, where $\lambda_i\in\mathbb{R}$, and we apply $a_j$ to both sides, we get $\lambda_j = 0$ for all $j = 1,\dots,n$. Being n linearly independent vectors, they span an n-dimensional vector space. This means that the state space is a simplex. Now, let us examine the set of effects. By a similar argument, the effects $\{a_i\}_{i=1}^n$ are linearly independent. Let us also show that these effects are pure. Suppose by contradiction that a generic $a_i$ is not pure, i.e. $a_i = \sum_k e_{k,i}$. Taking $j\neq i$, the fact that $(a_i|\psi_j) = 0$ implies that $(e_{k,i}|\psi_j) = 0$ for every k. Since effects are completely defined by their action on a basis of the vector space of states, $e_{k,i} = \lambda_{k,i}a_i$, where $\lambda_{k,i}$ is a non-negative number. The condition $a_i = \sum_k e_{k,i}$ for every i implies that $\{\lambda_{k,i}\}_k$ is a probability distribution. This shows that every $a_i$ is a pure effect. Hence the cone spanned by the $\{a_i\}_{i=1}^n$, which coincides with the dual cone $\mathrm{St}_+^*(A)$, is the whole effect cone $\mathrm{Eff}_+(A)$. In this case all allowed mathematical effects are physical too, so there is no restriction on the effects. This is therefore classical theory.
This motivates the choice of classical states as the convex hull of a maximal set of perfectly distinguishable pure states. Considering only pure states is not restrictive. Indeed, suppose a causal theory has n perfectly distinguishable mixed states $\{\rho_i\}_{i=1}^n$, which are distinguished perfectly by the measurement $\{a_i\}_{i=1}^n$. Then, for every mixed state $\rho_i$, we can find a pure state $\alpha_i$ such that $\rho_i = p_i\alpha_i + (1-p_i)\sigma_i$, where $p_i\in(0,1)$ and $\sigma_i$ is another state, for every $i = 1,\dots,n$. Then, for $j\neq i$, the fact that $(a_j|\rho_i) = 0$ implies that $p_i(a_j|\alpha_i) + (1-p_i)(a_j|\sigma_i) = 0$. Since all terms are non-negative, the only possibility is that $(a_j|\alpha_i) = 0$ (and $(a_j|\sigma_i) = 0$). Additionally, since $(a_i|\rho_i) = 1$, we have $p_i(a_i|\alpha_i) + (1-p_i)(a_i|\sigma_i) = 1$. This convex combination of non-negative real numbers less than or equal to 1 can attain its maximum 1 if and only if $(a_i|\alpha_i) = 1$ (and $(a_i|\sigma_i) = 1$). Therefore, the pure states $\{\alpha_i\}_{i=1}^n$ are perfectly distinguished by the same measurement $\{a_i\}_{i=1}^n$ that distinguishes the mixed states $\{\rho_i\}_{i=1}^n$, as $(a_j|\alpha_i) = \delta_{ij}$. This shows that in any physical theory with perfectly distinguishable states, it is always possible to pick perfectly distinguishable pure states to construct classical sub-theories.

Effects for a classical sub-theory
Once we pick a classical set α, which is the state space of a classical sub-theory of a given theory, we must find what effects to consider. Indeed, the counterexample of the restricted trit in appendix C will show that the choice of effects can have dramatic consequences for the structure of a theory. Even if the state space looks classical, as for the restricted trit, the theory can be very different from classical theory.
Given a classical set α, the natural way to assign effects to this classical sub-theory is to restrict the effects of the original theory to the set α, identifying those that are not tomographically distinct on α. More precisely, let us introduce the following equivalence relation on the original set of effects Eff (A): e ∼ α f if (e|γ) = (f |γ) for every classical state γ in α. The set of effects of the classical sub-theory is the set of equivalence classes Eff (A) /α := Eff (A) / ∼ α .
We need to show that this restricted set of effects $\mathrm{Eff}(A)/\alpha$ is actually the set of effects of some classical theory. Recall that in classical theory, every element in the cone of effects arises as a conical combination of the effects that distinguish the pure states perfectly. In our setting, this means checking that every element of $\mathrm{Eff}_+(A)/\alpha$ arises as a conical combination of the equivalence classes $[a_i]$ of the effects that distinguish the pure states $\alpha_i$ in α. Note that it is not hard to see that $\mathrm{Eff}_+(A)/\alpha$ is still a cone, with the sum and the multiplication by a scalar inherited from $\mathrm{Eff}_+(A)$. Consider a generic element ξ in $\mathrm{Eff}_+(A)$, and let us show that it is in the same equivalence class as $\xi' = \sum_{i=1}^d \lambda_i a_i$, where $\lambda_i = (\xi|\alpha_i)$ for all i. By linearity, to check the equivalence of two elements of $\mathrm{Eff}_+(A)$, it is enough to check that they produce the same numbers when applied to all pure states $\alpha_j$. Now,
$$(\xi'|\alpha_j) = \sum_{i=1}^d \lambda_i (a_i|\alpha_j) = \lambda_j = (\xi|\alpha_j).$$
This shows that the effect cone $\mathrm{Eff}_+(A)/\alpha$ of the sub-theory is actually a classical effect cone, generated by the effects that perfectly distinguish the pure states in α.

Classical sub-theories from the no-restriction hypothesis
In appendix A, we saw how one can define the cone of states $\mathrm{St}_+(A)$ and its dual $\mathrm{St}_+^*(A)$, given by linear functionals that are non-negative on $\mathrm{St}_+(A)$. We also saw that one can define the cone of effects $\mathrm{Eff}_+(A)$, generated by conical combinations of effects. Clearly, all the elements of $\mathrm{Eff}_+(A)$ are linear functionals that yield a non-negative number when applied to elements of $\mathrm{St}_+(A)$. Therefore, one has $\mathrm{Eff}_+(A)\subseteq\mathrm{St}_+^*(A)$. It is interesting to study when one has the equality in this inclusion.

Condition 3 (No-restriction hypothesis [27]). We say that a theory is non-restricted, or that it satisfies the no-restriction hypothesis, if $\mathrm{Eff}_+(A) = \mathrm{St}_+^*(A)$ for every system.
While this may look like just a statement of mathematical interest, it has some important physical implications. Consider the subset of St * + (A) made of linear functionals f such that (f |ρ) ∈ [0, 1] for all states ρ. In a non-restricted theory, these elements f are also valid effects. In other words, the no-restriction hypothesis states that every mathematically allowed effect is also a physical effect. Clearly, the no-restriction hypothesis concerns the mathematical structure of the theory more than its operational one. Indeed, it is the duty of the physical theory to specify which objects are to be considered physical effects, even if they are admissible in principle on the basis of their mathematical properties. For this reason, the no-restriction hypothesis has been questioned various times on the basis of its lack of operational motivation [27,72,73]. Moreover, it has recently been shown that theories with almost quantum correlations [74] violate the no-restriction hypothesis [75].
Examples of theories that satisfy the no-restriction hypothesis are classical and quantum theory. The theory of restricted trits in appendix C, instead, explicitly violates it. This theory also has no classical states (see appendix C); this is not a coincidence, for one of the most important consequences of the no-restriction hypothesis is that a nonrestricted theory always admits at least the classical bit as a sub-theory.
Proposition 3. In a non-restricted theory, for every pure state ψ 1 there exists another pure state ψ 2 such that {ψ 1 , ψ 2 } are perfectly distinguishable.
Proof. Let $\psi_1$ be a pure state. The proof consists of several steps. In the first step, let us prove that there exists a non-trivial element f of the dual cone $\mathrm{St}_+^*(A)$ such that $(f|\psi_1) = 0$. Note that being pure, $\psi_1$ lies in some supporting hyperplane through the origin of the cone $\mathrm{St}_+(A)$ [76]. Such a hyperplane must have equation $(f|x) = 0$, with $x\in\mathrm{St}_{\mathbb{R}}(A)$, where f is some non-trivial linear functional on $\mathrm{St}_{\mathbb{R}}(A)$, otherwise it would not pass through the origin (i.e. the zero vector). Being a supporting hyperplane, we can choose f to be in the dual cone $\mathrm{St}_+^*(A)$ [76]. Thus we have found $f\in\mathrm{St}_+^*(A)$ such that $(f|\psi_1) = 0$. Let us consider the maximum of f on the state space. Since f is continuous and the state space is compact, it achieves its maximum $\lambda^*$ on some state $\rho^*$. Note that $\lambda^* > 0$, otherwise f would be the zero functional. Let us show that the maximum is attained on some pure state. If $\rho^*$ is already a pure state, there is nothing to prove. If it is not, consider a refinement of $\rho^*$ in terms of pure states, $\rho^* = \sum_i p_i\psi_i$, where $\{p_i\}$ is a probability distribution. Apply f to $\rho^*$:
$$\lambda^* = (f|\rho^*) = \sum_i p_i (f|\psi_i).$$
Clearly $\lambda^*\le\max_i (f|\psi_i)$, but being $\lambda^*$ the maximum of f, in fact $\lambda^* = \max_i (f|\psi_i)$. This means that there is a pure state $\psi_2$, chosen among these $\psi_i$'s, on which f attains its maximum. Now consider the functional $a_2 := f/\lambda^*$, which takes values in the interval [0, 1] when applied to states. Specifically $(a_2|\psi_2) = 1$ and $(a_2|\psi_1) = 0$. By the no-restriction hypothesis, it is a valid effect, so we can construct the measurement $\{a_1, a_2\}$, where $a_1 := u - a_2$, which perfectly distinguishes between $\psi_1$ and $\psi_2$.
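The construction in the proof can be traced numerically. The sketch below does so for the square bit of example 1 in appendix D, which satisfies the no-restriction hypothesis. The coordinates of the pure states, the choice of the functional `f`, and all function names are assumptions introduced for illustration only.

```python
from fractions import Fraction as F

# Assumed coordinates for the four pure states of the square bit;
# the third component encodes normalization.
PURE_STATES = [(-1, 1, 1), (-1, -1, 1), (1, -1, 1), (1, 1, 1)]
U = (0, 0, 1)  # deterministic effect: yields 1 on every normalized state


def pair(effect, state):
    """Pairing (e|rho) between a row vector and a column vector."""
    return sum(F(e) * F(s) for e, s in zip(effect, state))


def distinguishing_measurement(f, pure_states, u):
    """Normalize f by its maximum over the pure states, as in the proof."""
    values = [pair(f, psi) for psi in pure_states]
    if min(values) < 0:
        raise ValueError("f is not in the dual cone")
    lam = max(values)                 # lambda*, attained on a pure state
    if lam == 0:
        raise ValueError("f is the zero functional on the state space")
    k = values.index(lam)             # index of psi_2
    a2 = [F(x) / lam for x in f]      # a_2 := f / lambda*
    a1 = [F(ui) - x for ui, x in zip(u, a2)]  # a_1 := u - a_2
    return k, a1, a2


# A functional in the dual cone that vanishes on psi_1 = PURE_STATES[0].
f = (1, -1, 2)
k, a1, a2 = distinguishing_measurement(f, PURE_STATES, U)
```

With this choice of `f`, the maximum is attained on the vertex opposite to the first pure state, and `{a1, a2}` distinguishes the two perfectly. Checking non-negativity only on the pure states is enough for dual-cone membership here, because the pure states generate the cone of states.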
The essence of this proposition is that in every non-restricted physical theory there are at least two perfectly distinguishable pure states. By possibly adding other pure states so that the overall set is perfectly distinguishable, one can find a maximal set of perfectly distinguishable pure states. In this way one can always construct a classical set for every system, of dimension at least 2.
Even though the no-restriction hypothesis guarantees the existence of classical sets, we do not wish to assume it for its lack of operational motivation, preferring to stick to condition 1, which is agnostic about the reason why classical states arise in a theory.

Appendix C: A theory with no classical states
In this appendix we present a theory from which classical theory cannot emerge in any way, neither through decoherence, nor by taking arbitrarily large systems. This is the theory of restricted trits, which describes a classical trit (and its composites) when we fundamentally restrict its possible measurements. Remarkably, the effect of this restriction is that the theory contains no classical states, nor effective approximations of them.
To construct this theory, we start from the state space of the classical trit, represented in fig. 1, with pure states $\alpha_1, \alpha_2, \alpha_3$. Let $\{a_1, a_2, a_3\}$ be the measurement that perfectly distinguishes them in a single shot: $(a_i|\alpha_j) = \delta_{ij}$. Instead of allowing the full set of effects of classical theory, suppose that, for some reason, the most fine-grained effects that are allowed are $e_{ij} = \frac{1}{2}(a_i + a_j)$, with $i < j$. A section of the dual cone (the same as the effect cone of classical theory) and of the effect cone of the restricted trit is represented in fig. 2 in the main text.
Since we have a smaller set of effects than the original classical trit, we must check what happens to the state space. Indeed it may happen that two states become tomographically indistinguishable because there are not enough effects to witness their difference (cf. definition 5). However, this is not the case for the restricted trit, because the effects e ij are linearly independent. As such, they span exactly the same effect vector space as the effects a i , which is what determines the tomographic power of a theory. Therefore the state space of the restricted trit coincides with that of the classical trit (cf. fig. 1 in the main text).
Nevertheless, the restriction on the allowed effects has a dramatic consequence: there are no perfectly distinguishable pure states, and therefore no classical states, even when composing an arbitrarily large number of restricted trits. To see this, first let us show that $\{\alpha_1, \alpha_2, \alpha_3\}$ are no longer perfectly distinguishable. Consider a generic element of the effect cone, $e = \lambda_{12}e_{12} + \lambda_{13}e_{13} + \lambda_{23}e_{23}$, where $\lambda_{ij}\ge 0$. This effect yields 0 on $\alpha_2$ and $\alpha_3$ if and only if $\lambda_{12} = \lambda_{13} = \lambda_{23} = 0$, but this would be the zero effect, which cannot yield 1 on $\alpha_1$. This means that the $\alpha_i$'s cannot be jointly perfectly distinguishable.
Maybe we can still find a pair of α i 's that are perfectly distinguishable? The answer is again negative. To see it, take e.g. the pair {α 1 , α 2 } (for the others the argument is the same). The only element in the effect cone that yields 1 on α 1 and 0 on α 2 is 2e 13 , but this is not a physical effect, because u − 2e 13 = a 2 , which is not an effect. In other words, 2e 13 cannot exist in a measurement of the form {2e 13 , u − 2e 13 }, but all effects must be part of some measurement! In conclusion, the restricted trit has no classical states.
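The two arguments above can be checked mechanically. In the sketch below an effect is stored through its coefficients on $(a_1, a_2, a_3)$, and its unique decomposition on the extreme effects $e_{12}, e_{13}, e_{23}$ is computed by inverting the relation $e_{ij} = \frac{1}{2}(a_i + a_j)$. The function names are hypothetical, and the criterion in `is_physical_effect` (both $e$ and $u - e$ lie in the restricted cone) is an assumption that encodes the requirement, used in the text, that every effect be part of some measurement.

```python
from fractions import Fraction as F


def cone_coefficients(effect):
    """Return (l12, l13, l23) with effect = l12*e12 + l13*e13 + l23*e23.

    `effect` lists the coefficients on (a1, a2, a3). Since
    e_ij = (a_i + a_j)/2, twice those coefficients equal
    (l12 + l13, l12 + l23, l13 + l23), which is inverted below.
    """
    c1, c2, c3 = (2 * F(x) for x in effect)
    return ((c1 + c2 - c3) / 2, (c1 + c3 - c2) / 2, (c2 + c3 - c1) / 2)


def in_effect_cone(effect):
    """True if the effect is a conical combination of e12, e13, e23."""
    return all(l >= 0 for l in cone_coefficients(effect))


def is_physical_effect(effect):
    """Assumed criterion: e and its complement u - e are both in the cone."""
    complement = tuple(1 - F(x) for x in effect)   # u = a1 + a2 + a3
    return in_effect_cone(effect) and in_effect_cone(complement)


# Candidates yielding 1 on alpha_1 and 0 on alpha_2 have the form (1, 0, x).
candidates = [F(k, 4) for k in range(0, 9)]        # x = 0, 1/4, ..., 2
in_cone = [x for x in candidates if in_effect_cone((1, 0, x))]
```

Only `x = 1` survives in `in_cone`, which is the element $2e_{13}$; its complement is $a_2$, whose decomposition has a negative coefficient, so `is_physical_effect((1, 0, 1))` is false.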
What about the other systems of this theory? They are generated by composing restricted trits using the minimal tensor product [30,77]: the normalized states of the composite system AB are given by the convex hull of product states of A and B:
$$\mathrm{St}_1(AB) = \mathrm{Conv}\{\alpha\otimes\beta : \alpha\in\mathrm{St}_1(A),\ \beta\in\mathrm{St}_1(B)\},$$
where the subscript 1 means that we are only considering normalized states. This is how ordinary classical systems compose. Similarly the effect cones (generated by the extreme effects $e_{ij}$) are composed using the minimal tensor product of cones, whereby
$$\mathrm{Eff}_+(AB) = \mathrm{Con}\{a\otimes b : a\in\mathrm{Eff}_+(A),\ b\in\mathrm{Eff}_+(B)\}, \tag{C1}$$
where Con denotes the conical hull. The generic composite system is obtained by composing N restricted trits. Therefore, it has $3^N$ pure states and $3^N$ extreme effects, given by $e_{i_1j_1}\otimes\dots\otimes e_{i_Nj_N}$, where for each $k\in\{1,\dots,N\}$ one has $i_k < j_k$, with $i_k, j_k\in\{1,2,3\}$, and $e_{i_kj_k} = \frac{1}{2}(a_{i_k} + a_{j_k})$. Given that the $\{e_{i_kj_k}\}$ are linearly independent for every $k\in\{1,\dots,N\}$, the effects $\{e_{i_1j_1}\otimes\dots\otimes e_{i_Nj_N}\}$ are still linearly independent, therefore they span the same vector space as the effects $\{a_{i_1}\otimes\dots\otimes a_{i_N}\}$. This means that all states will stay tomographically distinct, and the state space of a composite of N restricted trits will look like the composite of N classical trits, namely, like a simplex with $3^N$ vertices.
Let us show that, even in composites, we still have a restriction on the mathematically allowed effects, represented by the dual cone. To this end, let us show that, for instance, we cannot obtain the effect $a_1^{\otimes N}$ out of conical combinations of the extreme effects $e_{i_1j_1}\otimes\dots\otimes e_{i_Nj_N}$ (for other products of the $a_i$'s the argument is the same). Our goal is to determine the non-negative coefficients $\lambda_{i_1j_1,\dots,i_Nj_N}$ such that
$$a_1^{\otimes N} = \sum \lambda_{i_1j_1,\dots,i_Nj_N}\, e_{i_1j_1}\otimes\dots\otimes e_{i_Nj_N}.$$
Recalling the definition of $e_{i_kj_k}$ we have
$$a_1^{\otimes N} = \frac{1}{2^N}\sum \lambda_{i_1j_1,\dots,i_Nj_N}\,(a_{i_1} + a_{j_1})\otimes\dots\otimes(a_{i_N} + a_{j_N}). \tag{C2}$$
Unfolding the above expression, we get
$$a_1^{\otimes N} = \frac{1}{2^N}\sum_{j_1,\dots,j_N}\lambda_{1j_1,\dots,1j_N}\, a_1^{\otimes N} + {\sum}'\dots,$$
where primed summations indicate the other summation terms in eq. (C2). Clearly, primed summations must vanish, but since $\lambda_{i_1j_1,\dots,i_Nj_N}\ge 0$, all coefficients $\lambda_{i_1j_1,\dots,i_Nj_N}$ in primed summations must vanish. Note that the coefficients $\lambda_{1j_1,\dots,1j_N}$ arise among the coefficients in the primed summations, which are all zero. This means that $a_1^{\otimes N} = 0$, which contradicts the hypothesis. It follows that it is not possible to obtain products of the $a_i$'s from conical combinations of the extreme effects $e_{i_1j_1}\otimes\dots\otimes e_{i_Nj_N}$. In other words, we are still in the presence of a restriction on the set of mathematically allowed effects.
Let us show that even in composites of the restricted trit there are no perfectly distinguishable pure states. To this end, it is enough to show that there are no pairs of perfectly distinguishable pure states. Indeed, if the states $\{\psi_i\}_{i=1}^n$ are perfectly distinguished by a measurement $\{a_i\}_{i=1}^n$, then any pair of them, say $\{\psi_1, \psi_2\}$, is also a set of perfectly distinguishable states: they are perfectly distinguished by the measurement $\{a_1, u - a_1\}$. Therefore no pairs of perfectly distinguishable pure states implies no sets of perfectly distinguishable pure states. Note that, in all composites, pure states are only of the product form; specifically they are the states $\alpha_{i_1}\otimes\dots\otimes\alpha_{i_N}$ for the composition of N restricted trits. Now we will show by induction on N that there are no pairs of perfectly distinguishable pure states in any composite in the theory of restricted trits. For N = 1, we have already proved it. Now suppose this is true for N, and let us show that it is valid also for N + 1. Take system A to be the composition of N restricted trits, and let B be a single restricted trit. Suppose by contradiction that system AB, given by the composition of N + 1 restricted trits, has two perfectly distinguishable pure states. They must be of the form $\{\alpha_1\otimes\beta_1, \alpha_2\otimes\beta_2\}$, where $\{\alpha_1, \alpha_2\}$ are pure states of A, and $\{\beta_1, \beta_2\}$ are pure states of B. Since $\{\alpha_1\otimes\beta_1, \alpha_2\otimes\beta_2\}$ are perfectly distinguishable, there exists a measurement $\{E_1, E_2\}$ on AB such that
$$(E_i|\alpha_j\otimes\beta_j) = \delta_{ij}.$$
Now, by eq. (C1), $E_1$ is a conical combination of products of the extreme effects:
$$E_1 = \sum_i \lambda_{i,1}\, a_{i,1}\otimes b_{i,1},\qquad \lambda_{i,1}\ge 0,$$
where $a_{i,1}$ and $b_{i,1}$ are extreme effects of A and B respectively. Then we have
$$0 = (E_1|\alpha_2\otimes\beta_2) = \sum_i \lambda_{i,1}\,(a_{i,1}|\alpha_2)\,(b_{i,1}|\beta_2).$$
Note that not all $\lambda_{i,1}$ can be zero, otherwise $E_1$ would be the zero vector. Therefore we have two possibilities, which can be both true at the same time: 1.
One has In conclusion, we have proved that, in all composite systems in the theory of restricted trits, there are no perfectly distinguishable pure states. This means that no suitable classical limit can exist for this theory, not even considering an extremely large system.

Appendix D: Complete decoherence
From the definition of complete decoherence on a classical set α (definition 1 in the main text), it is immediate to see that applying the same decoherence twice on a single system is like applying it once. In other words, $D_\alpha^2\rho = D_\alpha\rho$ for every state ρ. Indeed, by definition $D_\alpha\rho$ is a classical state γ, and applying the complete decoherence again, this classical state stays the same.
From a physical point of view, the fact that $D_\alpha^2\rho = D_\alpha\rho$ means that once a (single) system is decohered, classicality is reached, and there is nothing left to decohere. Note that $D_\alpha^2\rho = D_\alpha\rho$ for every ρ is not enough to conclude that $D_\alpha^2 = D_\alpha$, unless the theory satisfies Local Tomography [27], because, in general, transformations are defined by their action on half of a bipartite state, not on a state of a single system (see appendix A).
After understanding the behavior of complete decoherence on states, we need to look at what happens if we apply it to the effects of the theory. As it maps every state to a classical state, we expect that it does the same with effects: every effect becomes classical. This is indeed the case, as shown by the following: Proof. To show that the two sets coincide, we will actually show that there is a canonical bijection between the set of decohered effects {eD α } and the set Eff (A) /α. This bijection associates the equivalence class [eD α ] in Eff (A) /α with every decohered effect eD α . Let us prove that this is indeed a bijection. To this end, first observe that two decohered effects eD α and f D α are equal if and only if e ∼ α f . Indeed, eD α = f D α if and only if (e|D α |ρ) = (f |D α |ρ), for every state ρ. Now, define γ := D α ρ, which is a classical state. Therefore, eD α = f D α if and only if (e|γ) = (f |γ) for every classical state γ ∈ α, which means e ∼ α f . Let us prove that the mapping eD α → [eD α ] is injective. Assume [eD α ] = [f D α ], and let us show that eD α = f D α . If [eD α ] = [f D α ], then eD α ∼ α f D α , which means (e|D α |γ) = (f |D α |γ) for every classical state γ ∈ α. Now, D α γ = γ, so we have (e|γ) = (f |γ) for every γ ∈ α. This means e ∼ α f , which allows us to conclude that eD α = f D α . Now, let us prove that the mapping eD α → [eD α ] is surjective too. Take any equivalence class [e] in Eff (A) /α, and let us show that [e] = [eD α ], so the equivalence class [e] is associated with the decohered effect eD α . Now, for every classical state γ ∈ α, (e|D α |γ) = (e|γ) because D α γ = γ. This shows that e ∼ α eD α , so [e] = [eD α ].
Finally, let us show that the mapping eD α → [eD α ] respects the sums defined in the respective sets. Suppose e + f is a valid effect of the theory, where e and f are two valid effects, then eD α + f D α = (e + f ) D α is a valid decohered effect. With this effect we associate the equivalence class [(e + f ) D α ] = [eD α ] + [f D α ], which shows that the mapping eD α → [eD α ] respects the sums. This allows us to conclude that all classical effects can be regarded as decohered effects and vice versa.
Thus, complete decoherence maps all effects to classical effects, but it is not obvious if it leaves classical effects invariant: in general, it just sends them to an equivalent effect on α. From this point of view, the most general definition of complete decoherence (definition 1 in the main text) is asymmetric since classical states are left invariant, but not classical effects in general.
Definition 1 in the main text is so general that, in principle, given a classical set α, there may be more than one channel that is a complete decoherence on α. There are essentially two possible ways in which a complete decoherence on α might be non-unique.
2. More subtly, if the theory does not satisfy Local Tomography, two complete decoherences $D_{1,\alpha}$ and $D_{2,\alpha}$ on α can be indistinguishable at the level of single systems, namely $D_{1,\alpha}\rho = D_{2,\alpha}\rho$ for every ρ, but they can differ when applied only to part of a bipartite state: $(D_{1,\alpha}\otimes\mathcal{I})\rho_{AB}\neq(D_{2,\alpha}\otimes\mathcal{I})\rho_{AB}$ for some bipartite state $\rho_{AB}$.

For quantum theory, however, definition 1 in the main text is enough to pick a unique decoherence for every fixed orthonormal basis, which is clearly the TID on that basis. The proof is not included here since it is not relevant for the present paper. In appendix D 1, we present an example of a GPT in which the decoherence on a classical set is highly non-unique (cf. example 2), in clear contrast with the quantum behavior.

Test-induced decoherence and its properties
The next relevant question we investigate is whether a complete decoherence actually exists in every fundamental causal theory. In quantum theory, the decoherence on a classical set α, described by an orthonormal basis $\{|\alpha_j\rangle\}_{j=1}^d$, is obtained from the von Neumann measurement on that orthonormal basis. Indeed, if we sum over all outcomes, we get the complete decoherence.
In appendix A 1, we noted that in causal theories one can always construct measure-and-prepare tests. Now we build one out of the pure states $\{\alpha_i\}_{i=1}^d$ of a classical set and their associated distinguishing measurement $\{a_i\}_{i=1}^d$: the test $\{|\alpha_i)(a_i|\}_{i=1}^d$, which can be viewed as a non-demolition measurement on the classical set α. Taking the coarse-graining over all d outcomes yields a measure-and-prepare channel
$$D_\alpha = \sum_{i=1}^d |\alpha_i)(a_i|.$$
It is straightforward to show that $D_\alpha$ is a complete decoherence, which we term the test-induced decoherence (TID).
Proof of proposition 1 in the main text. We must check if D α satisfies the two properties defining a complete decoherence (cf. definition 1 in the main text).
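1. For any state ρ,
$$D_\alpha\rho = \sum_{i=1}^d \alpha_i\,(a_i|\rho) = \sum_{i=1}^d p_i\alpha_i,$$
where we have set $p_i := (a_i|\rho)$. Note that $p_i\in[0,1]$, and that $\sum_{i=1}^d p_i = 1$ because $\{a_i\}_{i=1}^d$ is a measurement. Therefore $D_\alpha\rho$ is a classical state, lying in the simplex generated by the $\alpha_i$'s.

2. For any pure state $\alpha_j$ in α,
$$D_\alpha\alpha_j = \sum_{i=1}^d \alpha_i\,(a_i|\alpha_j) = \sum_{i=1}^d \delta_{ij}\,\alpha_i = \alpha_j.$$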
This means that D α preserves all pure states in α, and by linearity it preserves all classical states in α.
This shows that a complete decoherence always exists in causal theories. The TID enjoys some remarkable properties that make it a physically motivated form of decoherence.
Let $\{a_i\}_{i=1}^d$ be the measurement associated with the classical set α. The TID $D_\alpha$ satisfies the following properties: 1. $D_\alpha^2 = D_\alpha$. 2. $a_iD_\alpha = a_i$ for every i.
Proof. Let us prove the two properties.
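1. Let us compose the TID with itself:
$$D_\alpha^2 = \sum_{i,j}|\alpha_i)(a_i|\alpha_j)(a_j| = \sum_{i,j}\delta_{ij}\,|\alpha_i)(a_j| = \sum_i|\alpha_i)(a_i| = D_\alpha.$$

2. It is a straightforward calculation. Indeed
$$a_iD_\alpha = \sum_j (a_i|\alpha_j)\,(a_j| = \sum_j\delta_{ij}\,a_j = a_i.$$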
Note that property 1 means that the TID satisfies a stronger idempotence property than a generic complete decoherence: not only is this property valid on single systems, but also when the TID is applied to a part of a bipartite state. Again, this means that to decohere a system completely, it is enough to apply the TID just once: further applications of the TID will not change anything. Property 2 states that the TID preserves the effects that perfectly distinguish the pure states in α. Since all classical effects arise as suitable conical combinations of these effects, it means that the TID preserves each classical effect. This property removes the asymmetry we observed in the behavior of complete decoherences, which in general preserve only classical states, but not classical effects. The TID, instead, treats classical states and effects on equal footing, doing nothing to both of them. This makes it more physically appealing.
Recall that, in quantum theory, decoherence is always associated with the presence of an environment where information is leaked [7,8,40]. Instead in definition 1 in the main text, as well as in some other proposals in the GPT literature [39,41], the environment does not seem to play any explicit role in the process. However, in the TID, the environment and external observers are again present, albeit implicitly. Indeed, the fact that the TID arises as the coarse-graining of a test means that, at least in principle, an external observer is present in the process of decoherence.
Previous contributions on decoherence in GPTs [39,41] required the complete decoherence to be strictly purity-decreasing [39], or alternatively, that if a decohered state is pure, the original state was pure too [41]. In the following counterexample we show that the TID does not satisfy these desiderata in general: in some theories mixed states can be decohered to pure states. This behavior sharply contrasts with the one observed in quantum theory.

Example 1. Let us consider the square bit [25]. Here the state space is a square, and the pure states are its vertices. This theory satisfies the no-restriction hypothesis, so all mathematically allowed effects are valid effects. Fig. 8 shows the state space. The pure states are the vectors
$$\alpha_1 = (-1,\ 1,\ 1)^T,\quad \alpha_2 = (-1,\ -1,\ 1)^T,\quad \alpha_3 = (1,\ -1,\ 1)^T,\quad \alpha_4 = (1,\ 1,\ 1)^T,$$
where the third component represents the fact that these states are normalized.

Figure 8. The state space of the square bit. Here α1, α2, α3, and α4 are pure states. The classical set α = Conv {α1, α2} is shown in black.

Figure 9. A particularly useful parametrization of a generic state ρ of the square bit.

Now consider the effects
$$a_1 = \left(0,\ \tfrac{1}{2},\ \tfrac{1}{2}\right),\qquad a_2 = \left(0,\ -\tfrac{1}{2},\ \tfrac{1}{2}\right).$$
They make up a measurement $\{a_1, a_2\}$ that perfectly distinguishes the pure states $\{\alpha_1, \alpha_2\}$ in a single shot. Therefore, we can consider the classical set $\alpha = \mathrm{Conv}\{\alpha_1, \alpha_2\}$, which is simply the segment connecting $\alpha_1$ and $\alpha_2$ (see fig. 8). Now consider the TID $D_\alpha = |\alpha_1)(a_1| + |\alpha_2)(a_2|$. Note that $D_\alpha$ decoheres the mixed state $\frac{1}{2}(\alpha_1 + \alpha_4)$ to the pure state $\alpha_1$. This TID definitely increases purity! Its matrix representation is
$$D_\alpha = \begin{pmatrix} 0 & 0 & -1\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix}.$$
From this expression, it is immediate to see that $D_\alpha (x, y, 1)^T = (-1, y, 1)^T$; in other words, the TID horizontally projects all states onto the set α (cf. the parametrization in fig. 9). This is illustrated in fig. 10. From this geometric picture, it is clear that $D_\alpha$ decoheres all the mixed states of the form $p\alpha_1 + (1-p)\alpha_4$ and $p\alpha_2 + (1-p)\alpha_3$, with $p\in(0,1)$, to pure states ($\alpha_1$ and $\alpha_2$, respectively).
The natural question is when this counter-intuitive, purity-increasing behavior of the TID can be observed in a physical theory. In general, it is enough that one of the distinguishing effects $\{a_i\}_{i=1}^d$, say $a_1$, gives 1 on another pure state ψ not in the classical set α. To show it, first note that since $\{a_i\}_{i=1}^d$ is a measurement, if $(a_1|\psi) = 1$, then $(a_i|\psi) = 0$ for $i > 1$. Now take any mixed state of the form $p\alpha_1 + (1-p)\psi$, with $p\in(0,1)$; the TID $D_\alpha = \sum_{i=1}^d |\alpha_i)(a_i|$ decoheres it to the pure state $\alpha_1$. In example 1, $a_1$ gave 1 also on $\alpha_4$, which was not in the classical set. Similarly, $a_2$ yielded 1 on $\alpha_3$ too, again not in the classical set.
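The properties of the TID discussed above can be verified on the square bit with a few lines of exact arithmetic. The sketch below is only illustrative: the coordinates of the pure states and of the effects $a_1, a_2$ are assumed (they are chosen so that $u = (0\ 0\ 1)$ yields 1 on all pure states and $\{a_1, a_2\}$ distinguishes $\alpha_1$ from $\alpha_2$), and the helper names are hypothetical.

```python
from fractions import Fraction as F

# Assumed coordinates for the square bit.
ALPHA = {1: [-1, 1, 1], 2: [-1, -1, 1], 3: [1, -1, 1], 4: [1, 1, 1]}
A1 = [F(0), F(1, 2), F(1, 2)]    # yields 1 on alpha_1 and alpha_4
A2 = [F(0), F(-1, 2), F(1, 2)]   # yields 1 on alpha_2 and alpha_3


def tid(states, effects):
    """Matrix of D = sum_i |alpha_i)(a_i| as a sum of outer products."""
    return [[sum(s[r] * e[c] for s, e in zip(states, effects))
             for c in range(3)] for r in range(3)]


def matmul(x, y):
    """Product of two 3x3 matrices (composition of transformations)."""
    return [[sum(x[r][k] * y[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def apply(matrix, state):
    """Column vector obtained by applying a transformation to a state."""
    return [sum(m * s for m, s in zip(row, state)) for row in matrix]


def effect_times(effect, matrix):
    """Row vector of an effect composed with a transformation."""
    return [sum(effect[r] * matrix[r][c] for r in range(3)) for c in range(3)]


D = tid([ALPHA[1], ALPHA[2]], [A1, A2])

# A mixed state outside the classical set: the midpoint of alpha_1 and alpha_4.
mixed = [F(x + y, 2) for x, y in zip(ALPHA[1], ALPHA[4])]
decohered = apply(D, mixed)
```

In this sketch `matmul(D, D)` equals `D` (property 1), `effect_times(A1, D)` returns `A1` (property 2), and `decohered` coincides with the pure state `ALPHA[1]`, which is the purity-increasing behavior described in the text.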
Finally, using the toy model of the square bit, we can study the uniqueness of complete decoherence on some classical sets, showing that, for some of them, the decoherence is unique (and therefore the TID), while in others it is highly non-unique. This is another important illustration of how different GPTs can be from the quantum case.

Example 2. Consider the classical set $\alpha = \mathrm{Conv}\{\alpha_1, \alpha_2\}$ in example 1 again. Now we prove that the TID $D_\alpha = |\alpha_1)(a_1| + |\alpha_2)(a_2|$ is the only complete decoherence for that classical set. To this end, let us consider a generic transformation D on the square bit, which can be represented as a square matrix of order 3:
$$D = \begin{pmatrix} d_{11} & d_{12} & d_{13}\\ d_{21} & d_{22} & d_{23}\\ d_{31} & d_{32} & d_{33}\end{pmatrix}. \tag{D2}$$
We want this matrix to represent a complete decoherence $D_\alpha$ on α. The first condition is to require it to be a channel; therefore $uD = u$, where $u = (0\ 0\ 1)$ is the deterministic effect (it yields 1 on all the pure states presented in example 1). This condition implies $d_{31} = d_{32} = 0$ and $d_{33} = 1$. $D_\alpha$ being a complete decoherence on α, we have $D_\alpha\alpha_1 = \alpha_1$, $D_\alpha\alpha_2 = \alpha_2$, and $D_\alpha\alpha_3 = p\alpha_1 + (1-p)\alpha_2$, for $p\in[0,1]$, and $D_\alpha\alpha_4 = q\alpha_1 + (1-q)\alpha_2$, for $q\in[0,1]$. Recalling the expression of the pure states in example 1, namely $\alpha_1 = (-1, 1, 1)^T$, $\alpha_2 = (-1, -1, 1)^T$, $\alpha_3 = (1, -1, 1)^T$ and $\alpha_4 = (1, 1, 1)^T$, the conditions $D_\alpha\alpha_1 = \alpha_1$, $D_\alpha\alpha_2 = \alpha_2$, and $D_\alpha\alpha_3 = p\alpha_1 + (1-p)\alpha_2$ yield linear systems for the remaining entries, with $p\in[0,1]$. Solving them, we find that
$$D_\alpha = \begin{pmatrix} 0 & 0 & -1\\ p & 1 & p\\ 0 & 0 & 1\end{pmatrix}.$$
Let us see if this matrix is compatible with the condition $D_\alpha\alpha_4 = q\alpha_1 + (1-q)\alpha_2$ for $q\in[0,1]$. We have
$$D_\alpha\alpha_4 = (-1,\ 2p+1,\ 1)^T.$$
On the other hand,
$$q\alpha_1 + (1-q)\alpha_2 = (-1,\ 2q-1,\ 1)^T.$$
This means that $2q - 1 = 2p + 1$, that is, $q = p + 1$. The only case in which $q\in[0,1]$ when $p\in[0,1]$ is $p = 0$. This means that we have a unique complete decoherence on α, which is
$$D_\alpha = \begin{pmatrix} 0 & 0 & -1\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix}.$$
This coincides with the TID $D_\alpha = |\alpha_1)(a_1| + |\alpha_2)(a_2|$, as it is easy to check. This means that on the classical set $\alpha = \mathrm{Conv}\{\alpha_1, \alpha_2\}$ there is only one complete decoherence, which is exactly the TID. By a symmetry argument, we have the same situation whenever we take a classical set corresponding to a side of the square.
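The uniqueness argument for a side of the square can be replayed as a small scan over the free parameter. The sketch below assumes coordinates for the pure states consistent with $u = (0\ 0\ 1)$, and it takes as given the one-parameter family of matrices obtained by imposing $uD = u$, $D\alpha_1 = \alpha_1$, $D\alpha_2 = \alpha_2$ and $D\alpha_3 = p\alpha_1 + (1-p)\alpha_2$; the function names and the grid of values of `p` are choices made only for illustration.

```python
from fractions import Fraction as F

# Assumed coordinates for the pure states of the square bit.
ALPHA1, ALPHA2 = [-1, 1, 1], [-1, -1, 1]
ALPHA3, ALPHA4 = [1, -1, 1], [1, 1, 1]
PURE = [ALPHA1, ALPHA2, ALPHA3, ALPHA4]


def candidate(p):
    """Family solving the linear conditions on alpha_1, alpha_2, alpha_3."""
    return [[0, 0, -1], [p, 1, p], [0, 0, 1]]


def apply(matrix, state):
    """Column vector obtained by applying a transformation to a state."""
    return [sum(m * s for m, s in zip(row, state)) for row in matrix]


def in_classical_set(state):
    """Membership in Conv{alpha_1, alpha_2}: the segment x = -1, |y| <= 1."""
    x, y, z = state
    return x == -1 and z == 1 and -1 <= y <= 1


def is_complete_decoherence(matrix):
    """Fixes alpha_1, alpha_2 and maps every pure state into the segment.

    Checking pure states suffices, since all other states are convex
    combinations of them and the classical set is convex.
    """
    fixes = apply(matrix, ALPHA1) == ALPHA1 and apply(matrix, ALPHA2) == ALPHA2
    lands = all(in_classical_set(apply(matrix, s)) for s in PURE)
    return fixes and lands


grid = [F(k, 10) for k in range(11)]   # p = 0, 1/10, ..., 1
admissible = [p for p in grid if is_complete_decoherence(candidate(p))]
```

On this grid only `p = 0` is admissible, because the image of $\alpha_4$ has second component $2p + 1$, which leaves the segment as soon as `p` is positive. The scan does not test whether the candidate is a physical transformation on composite systems; it only reproduces the single-system conditions used in the text.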
Something completely different, instead, happens when we take the classical set to be a diagonal of the square. Take e.g. α = Conv {α 1 , α 3 } (fig. 11). Now we show that in this case we can find an uncountable number of TIDs! To see it, let us characterize all the distinguishing measurements for {α 1 , α 3 }. Consider a generic effect e = e 1 e 2 e 3 ; to be part of a distinguishing measurement, without loss of generality we can assume (e|α 1 ) = 1 and (e|α 3 ) = 0. These conditions imply that e must be of the form This is not enough to guarantee that this is indeed an effect, because it must give a valid probability on α 2 and α 4 as well. In other words, we must impose that (e|α 2 ) ∈ [0, 1] and (e|α 4 ) ∈ [0, 1]. This gives the following system of inequalities where the solution is e 2 ∈ 0, 1 2 . For these values of e 2 , e is both a mathematically and a physically allowed effect, because the no-restriction hypothesis is assumed for the square bit [25]. Similarly, the effect e = u − e = −e 2 + 1 2 , −e 2 , 1 2 is also a mathematically allowed effect. Therefore {e, e } is a distinguishing measurement for {α 1 , α 3 } for any e 2 ∈ 0, 1 2 . This gives rise to a family of TIDs on the classical set α = {α 1 , α 3 } parameterized by a continuous parameter in 0, 1 2 : where we have set t := e 2 for simplicity of notation, and t ∈ 0, 1 2 . Using the parametrization of a generic state in eq. (D1), as depicted in fig. 12, we can exemplify the behavior of the family of TIDs with the two extreme cases of t = 0 and t = 1 2 .
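This is illustrated in fig. 13. This means that $D_{\alpha',0}$ projects every state onto the diagonal along the vertical sides, whereas $D_{\alpha',\frac{1}{2}}$ projects every state onto the diagonal along the horizontal sides.
Figure 12. Using the intercept theorem, we can map the coefficients of convex combinations from the sides of the square to its diagonal. The blue segment is $1 - p$ times the diagonal, while the magenta segment is $\lambda$ times the diagonal.
Even in this case, there are no other complete decoherences besides the TIDs. Indeed, the generic matrix is as in eq. (D2), with unknown entries $d_{ij}$ in the first two rows and last row $(0\ 0\ 1)$. This time we require $D_{\alpha'} \alpha_1 = \alpha_1$, $D_{\alpha'} \alpha_3 = \alpha_3$, $D_{\alpha'} \alpha_2 = p\alpha_1 + (1-p)\alpha_3$ for some $p \in [0,1]$, and $D_{\alpha'} \alpha_4 = q\alpha_1 + (1-q)\alpha_3$ for some $q \in [0,1]$. From the conditions $D_{\alpha'} \alpha_1 = \alpha_1$, $D_{\alpha'} \alpha_3 = \alpha_3$, and $D_{\alpha'} \alpha_2 = p\alpha_1 + (1-p)\alpha_3$ we obtain the linear systems
\[ \begin{cases} -d_{11} + d_{12} + d_{13} = -1 \\ d_{11} - d_{12} + d_{13} = 1 \\ -d_{11} - d_{12} + d_{13} = 1 - 2p \end{cases} \qquad \begin{cases} -d_{21} + d_{22} + d_{23} = 1 \\ d_{21} - d_{22} + d_{23} = -1 \\ -d_{21} - d_{22} + d_{23} = 2p - 1 \end{cases} \]
for $p \in [0,1]$. Solving them, we find that
\[ D_{\alpha'} = \begin{pmatrix} p & p - 1 & 0 \\ -p & 1 - p & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{D4} \]
for $p \in [0,1]$. Let us check whether this matrix is compatible with the condition $D_{\alpha'} \alpha_4 = q\alpha_1 + (1-q)\alpha_3$ for $q \in [0,1]$. We have
\[ D_{\alpha'} \alpha_4 = \begin{pmatrix} 2p - 1 \\ -2p + 1 \\ 1 \end{pmatrix}. \]
On the other hand,
\[ q\alpha_1 + (1-q)\alpha_3 = \begin{pmatrix} -2q + 1 \\ 2q - 1 \\ 1 \end{pmatrix}. \]
This implies that $2p - 1 = -2q + 1$, whence $q = 1 - p$. If $p \in [0,1]$, this guarantees that $q \in [0,1]$ too. There are no other constraints, so the most general complete decoherence on $\alpha'$ is given by the matrix (D4), with $p \in [0,1]$. A straightforward check shows that the TIDs in eq. (D3) with $t \in \left[0, \tfrac{1}{2}\right]$ cover all the complete decoherences in eq. (D4) once we set $p := -2t + 1$. This means that there are no complete decoherences on $\alpha'$ other than the TIDs.
Lemma 1. Let $\{P_i\}_{i \in X}$ be an SRT, and let $\rho$ be a generic state, possibly not normalized. If the subset of non-vanishing states of $\{P_i \rho\}_{i \in X}$ contains more than one element, then these elements can be renormalized so that they are perfectly distinguishable.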
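Proof. Let $I$ be the subset of $X$ of the indices labeling the non-vanishing elements in $\{P_i \rho\}_{i \in X}$. First, let us show that $I$ is always non-empty. Suppose by contradiction that it is empty; then $P_i \rho = 0$ for every $i$. By Causality, $u = \sum_{i \in X} u P_i$, so $(u|\rho) = \sum_{i \in X} (u|P_i|\rho) = 0$, which is impossible for a physical state. Suppose now $|I| > 1$. Let us first renormalize the states $\{P_i \rho\}_{i \in I}$ by considering $\frac{P_i \rho}{(u|P_i|\rho)}$. Let us prove that they are perfectly distinguished by the measurement $\{a_i\}_{i \in I}$, where
\[ a_i := \begin{cases} u P_i & i \neq i_0 \\ u P_{i_0} + \sum_{i' \notin I} u P_{i'} & i = i_0 \end{cases} \]
for some arbitrary choice of $i_0 \in I$. For $i \neq i_0$ and $j \in I$,
\[ \frac{(a_i|P_j|\rho)}{(u|P_j|\rho)} = \frac{(u|P_i P_j|\rho)}{(u|P_j|\rho)} = \delta_{ij}, \]
where we have used the definition of SRT. Finally, for $i = i_0$,
\[ \frac{(a_{i_0}|P_j|\rho)}{(u|P_j|\rho)} = \frac{(u|P_{i_0} P_j|\rho)}{(u|P_j|\rho)} + \sum_{i \notin I} \frac{(u|P_i P_j|\rho)}{(u|P_j|\rho)}, \]
but the second term always vanishes because $P_i \rho = 0$ for $i \notin I$, so the right-hand side equals $\delta_{i_0 j}$.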
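As a sanity check of the construction in the proof, one can instantiate it in classical probability theory, which is a particular causal theory. In the sketch below (an illustration under stated assumptions, not part of the original argument) a state is a vector of non-negative weights, the deterministic effect sums the entries, and an SRT is assumed to be given by a partition of the outcomes into blocks, with $P_i$ zeroing every entry outside block $i$. All names are hypothetical.

```python
from fractions import Fraction as F


def project(block, rho):
    """Classical SRT element: keep the entries of rho inside `block`, zero the rest."""
    return [x if k in block else F(0) for k, x in enumerate(rho)]


def norm(rho):
    """Deterministic effect (u|rho): total weight of the state."""
    return sum(rho)


def lemma1_measurement(blocks, rho):
    """Return (effects, states) as in the proof of lemma 1.

    a_i = u P_i for i != i0, and a_{i0} = u P_{i0} + sum over i not in I of u P_i,
    where I labels the non-vanishing branches P_i rho.
    """
    branches = [project(set(b), rho) for b in blocks]
    I = [i for i, br in enumerate(branches) if norm(br) != 0]
    i0 = I[0]  # arbitrary choice of i0 in I
    effects = {}
    for i in I:
        support = set(blocks[i])
        if i == i0:
            for j, b in enumerate(blocks):
                if j not in I:
                    support |= set(b)
        effects[i] = [F(1) if k in support else F(0) for k in range(len(rho))]
    states = {i: [x / norm(branches[i]) for x in branches[i]] for i in I}
    return effects, states


def pair(e, v):
    return sum(x * y for x, y in zip(e, v))
```

With `blocks = [[0, 1], [2, 3], [4]]` and a state supported on the first two blocks only, the third block is absorbed into the effect labeled by `i0`, the effects sum to the deterministic effect, and `pair(effects[i], states[j])` is 1 for `i == j` and 0 otherwise.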

The general form of objective states in causal theories
It is not hard to show that if the joint state in the OG is an SBS, the players can win the game. Indeed, suppose the joint state is
\[ \rho_{SE} = \sum_{i=1}^{r} p_i\, \alpha_i \otimes \rho_{i,E_1} \otimes \ldots \otimes \rho_{i,E_n}, \tag{E1} \]
with $p_i > 0$, where $E$ denotes the joint environment composed of all the fragments controlled by the players, $E = E_1 \ldots E_n$. Note that this state respects the strong independence condition: the states of the various players are only correlated through the index $i$ labeling the outcome found by the referee on $S$. In this game, the referee applies the SRT containing the transformations $|\alpha_i)(a_i|$. If $\{\alpha_i\}_{i=1}^{r}$ is a maximal set of perfectly distinguishable pure states, $\{|\alpha_i)(a_i|\}_{i=1}^{r}$ is a test; otherwise it is enough to add other pure states to $\{\alpha_i\}_{i=1}^{r}$ until it becomes a maximal set $\{\alpha_i\}_{i=1}^{d}$, with $d > r$. In this latter case, the SRT performed by the referee is $\{|\alpha_i)(a_i|\}_{i=1}^{d}$. What about the other players? What is their strategy to win the game? Since the states $\{\rho_{i,E_k}\}_{i=1}^{r}$ are perfectly distinguishable for every $k$, each player just needs to perform the SRT $\{P_{i,E_k}\}$ associated with them, built from the measurement that distinguishes them. Note that $P_{i,E_k} \rho_{j,E_k} = \delta_{ij} \rho_{i,E_k}$, so this SRT does not disturb the state (E1) in a strong sense. This shows that every causal theory has objective states.
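The winning strategy can be illustrated in the simplest causal theory, classical probability theory. The sketch below is a toy model under explicit assumptions: the system states $\alpha_i$ are point distributions labeled by `i`, each conditional fragment state $\rho_{i,E_k}$ is a probability table, perfect distinguishability means disjoint supports, and player `k`'s SRT element $P_{i,E_k}$ keeps only the outcomes in the support of $\rho_{i,E_k}$. All function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product


def sbs_state(p, env_states):
    """Joint distribution of sum_i p_i * delta_i (x) rho_{i,E_1} (x) ... (x) rho_{i,E_n}.

    env_states[i][k] is a dict {outcome: probability} for fragment k given index i.
    Keys of the result are tuples (i, outcome_1, ..., outcome_n).
    """
    joint = {}
    for i, p_i in enumerate(p):
        tables = [list(frag.items()) for frag in env_states[i]]
        for outcomes in product(*tables):
            weight = p_i
            for _, prob in outcomes:
                weight *= prob
            key = (i,) + tuple(o for o, _ in outcomes)
            joint[key] = joint.get(key, F(0)) + weight
    return joint


def supports(env_states, k):
    """Support of rho_{i,E_k} for each i; disjoint when the states are distinguishable."""
    return [{o for o, pr in env_states[i][k].items() if pr > 0}
            for i in range(len(env_states))]


def player_branch(joint, k, support):
    """Unnormalized state after player k's SRT element projecting onto `support`."""
    return {key: w for key, w in joint.items() if key[1 + k] in support}
```

Summing the branches of any single player reproduces the joint state (no disturbance), and every entry in branch `i` carries the referee's index `i`, so all players agree with the referee.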
The non-trivial part is to show that these are the only objective states. The key step is the following lemma.
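Lemma 2. Let $\rho_{SE}$ be a state such that $\mathrm{tr}_E\, \rho_{SE} = \sum_{i=1}^{r} p_i \alpha_i$, where $p_i > 0$ for all $i$, and the $\alpha_i$'s are the pure states of a $d$-dimensional classical set $\alpha$, with $d \geq r$. If
\[ \sum_{i} \left( |\alpha_i)(a_i| \otimes P_i \right) \rho_{SE} = \rho_{SE}, \]
where the $P_i$'s are transformations in an SRT on $E$, then $\rho_{SE}$ must be of the form
\[ \rho_{SE} = \sum_{i=1}^{r} p_i\, \alpha_i \otimes \rho_i, \]
where the $\rho_i$'s are perfectly distinguishable states of $E$.
Now, let us define $P_i := P_{i,E_1} \otimes \ldots \otimes P_{i,E_n}$, which is an SRT on $E$. Imposing the Bohr non-disturbance condition (cf. definition 2 in the main text) on the test $\{|\alpha_i)(a_i| \otimes P_i\}$, we find
\[ \sum_{i} \left( |\alpha_i)(a_i| \otimes P_i \right) \rho_{SE} = \rho_{SE}. \]
We are now in the situation of lemma 2, so we know that $\rho_{SE}$ must be of the form $\rho_{SE} = \sum_{i=1}^{r} p_i\, \alpha_i \otimes \rho_i$, where the $\rho_i$'s are perfectly distinguishable states of $E$. Imposing the strong independence condition, $\rho_i$ must be a product state, with the only correlations given by the index $i$:
\[ \rho_i = \rho_{i,E_1} \otimes \ldots \otimes \rho_{i,E_n}, \]
where, for every $k$, the states $\{\rho_{i,E_k}\}_{i=1}^{r}$ are perfectly distinguishable. This concludes the proof.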

Appendix F: Emergence of composite classical systems
In this appendix we elaborate on the composition of classical sub-theories of a given causal theory. For simplicity, we focus on bipartite systems; the generalization to more than two parties is straightforward.
Consider now a bipartite system $AB$ of a generic causal theory, and suppose system $A$ has the classical set $\alpha$, and system $B$ the classical set $\beta$. If $\alpha$ is to represent an actual classical sub-theory for system $A$, and $\beta$ an actual classical sub-theory for system $B$, it is natural to expect that the classical set for the composite system should mirror the composition rule of classical theory. Consequently, we would like to define the composite classical set as
\[ \alpha\beta := \mathrm{Conv}\left\{ \alpha_i \otimes \beta_j \right\}_{i=1,\, j=1}^{d_A,\, d_B}. \tag{F1} \]
In particular, this definition implies that the pure states of the classical set for $AB$ should all be of the form $\alpha_i \otimes \beta_j$, where $\alpha_i$ is a pure state of $\alpha$, and $\beta_j$ a pure state of $\beta$. However, here we face two problems. The first is that the product of two pure states may not be pure in general, as shown in ref. [78]; the second is that the set $\{\alpha_i \otimes \beta_j\}$ may not be maximal for the composite system $AB$, as shown in ref. [79], which means that there are extra pure states to add. Axioms 2 and 3 in the main text are introduced precisely to rule out this pathological holistic behavior. Indeed, if the first of these axioms (axiom 2) fails, and the product of two pure states is not pure, the idea that the classical states of a composite system are reducible to the classical states of its components faces a considerable difficulty. In this case, since the product states are mixed, the theory is so holistic that, to construct the classical set for the composite system, we have to look for completely different states. If, instead, the theory satisfies axiom 2 but fails axiom 3, we can construct the classical set for $AB$ partially out of $\alpha$ and $\beta$, but we need some extra pure states of $AB$ to make it maximal. Even in this case the theory shows a holistic behavior, and does not support the emergence of proper classical composite systems. If both axioms are satisfied, the classical set for the composite system $AB$ is given by (F1).
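In a similar spirit, in the presence of axioms 2 and 3 in the main text, it is natural to expect that the decoherence process on a bipartite system is reducible to the decoherence of the two components [34,39,40,42]. In formula,
\[ D_{\alpha\beta}\, \rho_{AB} = \left( D_\alpha \otimes D_\beta \right) \rho_{AB} \]
for every bipartite state $\rho_{AB}$. However, using the most general definition of complete decoherence (definition 1 in the main text) we cannot compare the action of $D_\alpha \otimes D_\beta$ to the action of $D_{\alpha\beta}$, as there is no specific recipe for decohering states. Let us see, instead, what happens when we consider TIDs. Let $\{a_i\}_{i=1}^{d_A}$ and $\{b_j\}_{j=1}^{d_B}$ be the measurements associated with $\alpha$ and $\beta$ respectively; the measurement associated with $\alpha\beta$ is then $\{a_i \otimes b_j\}_{i=1,\, j=1}^{d_A,\, d_B}$. Therefore, the TID $D_{\alpha\beta}$ is the channel
\[ D_{\alpha\beta} = \sum_{i=1}^{d_A} \sum_{j=1}^{d_B} |\alpha_i \otimes \beta_j)(a_i \otimes b_j|. \]
An easy rewriting shows that
\[ D_{\alpha\beta} = \sum_{i=1}^{d_A} |\alpha_i)(a_i| \otimes \sum_{j=1}^{d_B} |\beta_j)(b_j| = D_\alpha \otimes D_\beta. \]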
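The factorization of the TID on a composite classical set is a purely linear-algebraic identity, and it can be checked on concrete matrices. The sketch below assumes that parallel composition of states, effects, and transformations is represented by the Kronecker product, and it uses illustrative square-bit data (a side of the square for $A$, a diagonal with $t = \tfrac{1}{4}$ for $B$); these coordinates and all helper names are assumptions for the sake of the example.

```python
from fractions import Fraction as F


def outer(v, e):
    """Matrix of the transformation |v)(e|."""
    return [[x * y for y in e] for x in v]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def kron_vec(v, w):
    """Kronecker product of two vectors (product state or product effect)."""
    return [x * y for x in v for y in w]


def kron_mat(A, B):
    """Kronecker product of two matrices, with the same index ordering as kron_vec."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def tid(states, effects):
    """TID sum_i |state_i)(effect_i| as a square matrix."""
    n = len(states[0])
    D = [[F(0)] * n for _ in range(n)]
    for s, e in zip(states, effects):
        D = add(D, outer(s, e))
    return D


# Assumed data for system A: a side of the square and its distinguishing measurement.
STATES_A = [[F(-1), F(1), F(1)], [F(-1), F(-1), F(1)]]
EFFECTS_A = [[F(0), F(1, 2), F(1, 2)], [F(0), F(-1, 2), F(1, 2)]]
# Assumed data for system B: a diagonal of the square, with t = 1/4.
STATES_B = [[F(-1), F(1), F(1)], [F(1), F(-1), F(1)]]
EFFECTS_B = [[F(-1, 4), F(1, 4), F(1, 2)], [F(1, 4), F(-1, 4), F(1, 2)]]

# TID on the composite classical set, built from product states and product effects.
COMPOSITE_TID = tid(
    [kron_vec(s, t) for s in STATES_A for t in STATES_B],
    [kron_vec(e, f) for e in EFFECTS_A for f in EFFECTS_B],
)
# Tensor product of the two local TIDs.
LOCAL_PRODUCT = kron_mat(tid(STATES_A, EFFECTS_A), tid(STATES_B, EFFECTS_B))
```

Here `COMPOSITE_TID` and `LOCAL_PRODUCT` are the same 9 by 9 matrix, as the rewriting above predicts.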