On Phase Transitions to Cooperation in the Prisoner's Dilemma

Game theory formalizes certain interactions between physical particles or between living beings in biology, sociology, and economics, and quantifies the outcomes by payoffs. The prisoner's dilemma (PD) describes situations in which it is profitable if everybody cooperates rather than defects (free-rides or cheats), but as cooperation is risky and defection is tempting, the expected outcome is defection. Nevertheless, some biological and social mechanisms can support cooperation by effectively transforming the payoffs. Here, we study the related phase transitions, which can be of first order (discontinuous) or of second order (continuous), implying a variety of different routes to cooperation. After classifying the transitions into cases of equilibrium displacement, equilibrium selection, and equilibrium creation, we show that a transition to cooperation may take place even if the stationary states and the eigenvalues of the replicator equation for the PD stay unchanged. Our example is based on adaptive group pressure, which makes the payoffs dependent on the endogenous dynamics in the population. The resulting bistability can invert the expected outcome in favor of cooperation.

When two entities characterized by the states, "strategies", or "behaviors" i and j interact with each other, game theory formalizes the result by payoffs P ij , and the structure of the payoff matrix (P ij ) determines the kind of game. The dynamics of a system of such entities is often described by the so-called replicator equations

dp(i, t)/dt = p(i, t) [ Σ_j P ij p(j, t) − Σ_{j,l} p(l, t) P lj p(j, t) ]   (1)

[3]. Here, p(i, t) represents the relative frequency of behavior i in the system, which increases when the expected "success" F i = Σ_j P ij p(j, t) exceeds the average one, Σ_i F i p(i, t). Many collective phenomena in physics such as agglomeration or segregation phenomena can be studied in a game-theoretical way [5,6]. Applications also include the theory of evolution [10] and the study of ecosystems [11]. Another exciting research field is the study of mechanisms supporting the cooperation between selfish individuals [1][2][3] in situations like the "prisoner's dilemma" or public goods game, where they would usually defect (free-ride or cheat). Contributing to public goods and sharing them constitute ubiquitous situations, where cooperation is crucial, for example, in order to maintain a sustainable use of natural resources or a well-functioning health or social security system.
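The replicator dynamics (1) can be illustrated with a simple numerical sketch (not code from this paper; the explicit Euler scheme, step sizes, and payoff values are illustrative assumptions):

```python
def replicator_step(p, P, dt=0.01):
    """One explicit Euler step of dp_i/dt = p_i [ (P p)_i - p . P p ]."""
    n = len(p)
    fitness = [sum(P[i][j] * p[j] for j in range(n)) for i in range(n)]
    avg = sum(p[i] * fitness[i] for i in range(n))
    return [p[i] + dt * p[i] * (fitness[i] - avg) for i in range(n)]

def evolve(p, P, steps=10000, dt=0.01):
    """Integrate the replicator dynamics for steps * dt time units."""
    for _ in range(steps):
        p = replicator_step(p, P, dt)
    return p

# Prisoner's dilemma payoffs (P_21 > P_11 > P_22 > P_12); strategy 1 = cooperate:
PD = [[3, 0],
      [5, 1]]
p_final = evolve([0.9, 0.1], PD)
# Even a 90% cooperative initial state converges to almost full defection.
```

Note that the Euler step conserves Σ_i p(i, t) exactly, since the growth terms sum to zero by construction.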
In the following, we will give an overview of the stationary solutions of the replicator equations (1) and their stability properties. Based on this, we will discuss several "routes to cooperation", which transform the prisoner's dilemma into other games via different sequences of continuous or discontinuous phase transitions. These routes will then be connected to different biological or social mechanisms accomplishing such phase transitions [12]. Finally, we will introduce the concept of "equilibrium creation" and distinguish it from routes to cooperation based on "equilibrium selection" or "equilibrium displacement". A new cooperation-promoting mechanism based on adaptive group pressure will exemplify it. Stability properties of different games. When studying games with two strategies i only, the replicator equations (1) simplify, and we remain with

dp/dt = p(t)[1 − p(t)][λ 1 (1 − p(t)) − λ 2 p(t)],   (2)

where p(t) = p(1, t) represents the fraction of cooperators and 1 − p(t) = p(2, t) the fraction of defectors. λ 1 = P 12 − P 22 and λ 2 = P 21 − P 11 are the eigenvalues of the two stationary solutions p = p 1 = 0 and p = p 2 = 1.
For the sake of our discussion, we imagine an additional fluctuation term ξ(t) on the right-hand side of Eq. (2), reflecting small perturbations of the strategy distribution. Four different cases can be distinguished [3]: (1) If λ 1 < 0 and λ 2 > 0, the stationary solution p 1 corresponding to defection by everybody is stable, while the stationary solution p 2 corresponding to cooperation by everyone is unstable. That is, any small perturbation will drive the system away from full cooperation towards full defection. This situation applies to the prisoner's dilemma (PD) defined by payoffs with P 21 > P 11 > P 22 > P 12 . According to this, strategy i = 1 ("cooperation") is risky, as it can yield the lowest payoff P 12 , while strategy i = 2 ("defection") is tempting, since it can give the highest payoff P 21 . (2) If λ 1 > 0 and λ 2 < 0, the stationary solution p 1 is unstable, while p 2 is stable. This means that the system will end up with cooperation by everybody. Such a situation occurs for the so-called harmony game (HG) with P 11 > P 21 > P 12 > P 22 , as mutual cooperation gives the highest payoff P 11 . (3) If λ 1 > 0 and λ 2 > 0, the stationary solutions p 1 and p 2 are unstable, but there exists a third stationary solution p 3 , which turns out to be stable. As a consequence, the system is driven towards a situation, where a fraction p 3 of cooperators is expected to coexist with a fraction (1 − p 3 ) of defectors. Such a situation occurs for the snowdrift game (SD) (also known as hawk-dove or chicken game). This game is characterized by P 21 > P 11 > P 12 > P 22 and assumes that unilateral defection is tempting, as it yields the highest payoff P 21 , but also risky, as mutual defection gives the lowest payoff P 22 . (4) If λ 1 < 0 and λ 2 < 0, the stationary solutions p 1 and p 2 are both stable, while the stationary solution p 3 is unstable. As a consequence, full cooperation is possible, but not guaranteed.
In fact, the final state of the system depends on the initial condition p(0) (the "history"): If p(0) < p 3 , the system is expected to end up in the stationary solution p 1 , i.e. with full defection. If p(0) > p 3 , the system is expected to move towards p 2 = 1, corresponding to cooperation by everybody. The history-dependence implies that the system is multistable (here: bistable), as it has several (locally) stable solutions. This case is found for the stag hunt game (SH) (also called assurance). This game is characterized by P 11 > P 21 > P 22 > P 12 , i.e. cooperation is rewarding, as it gives the highest payoff P 11 in case of mutual cooperation, but it is also risky, as it yields the lowest payoff P 12 , if the interaction partner is uncooperative.
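The four-case classification above amounts to checking the signs of λ 1 and λ 2 . A small helper sketches this (the payoff matrices used as checks are illustrative, not from the paper):

```python
def classify(P):
    """Classify a symmetric 2x2 game by the signs of the eigenvalues
    lambda_1 = P_12 - P_22 (stability of p_1 = 0, full defection) and
    lambda_2 = P_21 - P_11 (stability of p_2 = 1, full cooperation)."""
    l1 = P[0][1] - P[1][1]
    l2 = P[1][0] - P[0][0]
    if l1 < 0 and l2 > 0:
        return "PD"  # defection stable, cooperation unstable
    if l1 > 0 and l2 < 0:
        return "HG"  # cooperation by everybody is the stable outcome
    if l1 > 0 and l2 > 0:
        return "SD"  # stable interior mixture p_3 = l1 / (l1 + l2)
    return "SH"      # bistable: outcome depends on the initial condition
```

For example, `classify([[3, 0], [5, 1]])` returns `"PD"`, since these payoffs satisfy P 21 > P 11 > P 22 > P 12 .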
Phase transitions and routes to cooperation. When facing a prisoner's dilemma, it is of vital interest to transform the payoffs in such a way that cooperation between individuals is supported. Starting with the payoffs P 0 ij of a prisoner's dilemma, one can reach different payoffs P ij , for example, by introducing strategy-dependent taxes T ij = P 0 ij − P ij > 0. When the taxes T ij are increased from 0, the eigenvalues will change from λ 0 1 = P 0 12 − P 0 22 and λ 0 2 = P 0 21 − P 0 11 to λ 1 = λ 0 1 + T 22 − T 12 and λ 2 = λ 0 2 + T 11 − T 21 . In this way, one can create a variety of routes to cooperation, which are characterized by different kinds of phase transitions. We define route 1 [PD→HG] by a direct transition from a prisoner's dilemma to a harmony game. It is characterized by a discontinuous transition from a system, in which defection by everybody is stable, to a system, in which cooperation by everybody is stable (see Fig. 1a). Route 2 [PD→SH] is defined by a direct transition from the prisoner's dilemma to a stag hunt game. After the moment t ∗ , where λ 2 changes from positive to negative values, the system behavior becomes history-dependent: When the fluctuations ξ(t) for t > t ∗ exceed the critical threshold p 3 (t) = λ 1 /[λ 1 + λ 2 (t)], the system will experience a sudden transition to cooperation by everybody. Otherwise one will find defection by everyone, as in the prisoner's dilemma (see Fig. 1b). In order to make sure that the perturbations ξ(t) will eventually exceed p 3 (t) and trigger cooperation, the value of λ 2 must be reduced to sufficiently large negative values. It is also possible to have a continuous rather than sudden transition to cooperation: We define route 3 [PD→SD] by a transition from a prisoner's dilemma to a snowdrift game. As λ 1 is changed from negative to positive values, a fraction p 3 (t) = λ 1 (t)/[λ 1 (t) + λ 2 ] of cooperators is expected to result (see Fig. 1c).
When increasing λ 1 , this fraction rises continuously. One may also implement more complicated transitions. Route 4, for example, establishes the transition sequence PD→SD→HG (see Fig. 1d), while we define route 5 by the transition PD→SH→HG (see Fig. 1e). One may also implement the transition PD→SD→HG→SH (route 6, see Fig. 1f), establishing a path-dependence, which can guarantee cooperation by everybody in the end. (When using route 2, the system remains in a defective state if the perturbations do not exceed the critical value p 3 .) Relationship with cooperation-supporting mechanisms. We will now discuss the relationship of the routes to cooperation introduced above with biological and social mechanisms ("rules") promoting the evolution of cooperation. Martin A. Nowak performs his analysis of five such rules with the reasonable specifications T = b > 0, R = b − c > 0, S = −c < 0, and P = 0 in the limit of weak selection [12]. Cooperation is assumed to require a contribution c > 0 and to produce a benefit b > c for the interaction partner, while defection generates no payoff (P = 0). As most mechanisms leave λ 1 or λ = (λ 1 + λ 2 )/2 unchanged, we will now focus on the payoff-dependent parameters λ 1 and λ (rather than λ 1 and λ 2 ). The basic prisoner's dilemma is characterized by λ 0 1 = −c and λ 0 = 0.
According to the Supporting Online Material of Ref. [12], kin selection (genetic relatedness) transforms the payoffs into P 11 = P 0 11 + r(b − c), P 12 = P 0 12 + br, P 21 = P 0 21 − cr, and P 22 = P 0 22 . Therefore, it leaves λ unchanged and increases λ 1 by T 22 − T 12 = br, where r represents the degree of genetic relatedness. Direct reciprocity (repeated interaction) does not change λ 1 , but it changes λ by −w(b − c)/[2(1 − w)] < 0, where w is the probability of a future interaction. Network reciprocity (clustering of individuals playing the same strategy) leaves λ unchanged and increases λ 1 by H(k), where H(k) is a function of the number k of neighbors. Finally, group selection (competition between different populations) increases λ 1 by (b − c)(m − 1), where m is the number of groups, while λ is not modified. However, λ 1 and λ may also change simultaneously. For example, indirect reciprocity (based on trust and reputation) increases λ 1 by cq and changes λ by −(b − c)q/2 < 0, where q quantifies social acquaintanceship.
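The kin-selection shift can be verified directly from the transformed payoffs (a sketch; the values b = 3, c = 1, r = 0.5 are illustrative, and the base payoffs follow the specifications P 0 11 = b − c, P 0 12 = −c, P 0 21 = b, P 0 22 = 0 given above):

```python
def kin_selection(b, c, r):
    """Payoffs transformed by kin selection, as quoted from Ref. [12]'s
    Supporting Online Material; returns (lambda_1, lambda)."""
    P11 = (b - c) + r * (b - c)   # P_11^0 + r (b - c)
    P12 = -c + b * r              # P_12^0 + b r
    P21 = b - c * r               # P_21^0 - c r
    P22 = 0.0                     # unchanged
    l1 = P12 - P22
    l2 = P21 - P11
    return l1, (l1 + l2) / 2.0

l1, lbar = kin_selection(b=3.0, c=1.0, r=0.5)
# lambda_1 rises from -c = -1 to -c + b r = 0.5, while lambda stays at 0.
```

With these numbers, λ 1 turns positive as soon as r > c/b, in line with the kin-selection rule b/c > 1/r.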
Summarizing this, kin selection, network reciprocity, and group selection preserve λ = 0 and increase the value of λ 1 (see route 1 in Fig. 2). Direct reciprocity, in contrast, preserves the value of λ 1 and reduces λ (see route 2a in Fig. 2). Indirect reciprocity promotes the same transition (see route 2b in Fig. 2). In addition, one can analyze costly punishment. Using the payoff specifications made in the Supporting Information of Ref. [14], costly punishment changes λ by −(β + γ)/2 < 0 and λ 1 by −γ [14], i.e. when γ is increased, the values of λ and λ 1 are simultaneously reduced (see route 2c in Fig. 2). Here, γ > 0 represents the punishment cost invested by a cooperator to impose a punishment fine β > 0 on a defector, which decreases the payoffs of both interaction partners. Route 3 can be generated by the formation of friendship networks [13]. Route 4 may occur by kin selection, network reciprocity, or group selection, when starting with a prisoner's dilemma with λ 0 < 0 (rather than λ 0 = 0 as assumed before). Route 5 may be generated by the same mechanisms, if λ 0 > 0. Finally, route 6 can be implemented by time-dependent taxation (see Fig. 2).
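Such route constructions reduce to bookkeeping on the eigenvalue shifts λ 1 = λ 0 1 + T 22 − T 12 and λ 2 = λ 0 2 + T 11 − T 21 . A sketch (the tax values are illustrative assumptions, not from the paper):

```python
def taxed_eigenvalues(l1_0, l2_0, T):
    """Eigenvalues after applying strategy-dependent taxes, with
    T[i][j] standing for T_(i+1)(j+1) in the text's notation."""
    l1 = l1_0 + T[1][1] - T[0][1]  # lambda_1 = lambda_1^0 + T_22 - T_12
    l2 = l2_0 + T[0][0] - T[1][0]  # lambda_2 = lambda_2^0 + T_11 - T_21
    return l1, l2

# Route 1 (PD -> HG): tax defection (T_21, T_22 > 0) until both signs flip.
l1, l2 = taxed_eigenvalues(-1.0, 2.0, [[0.0, 0.0], [3.0, 2.0]])
# Now l1 = 1 > 0 and l2 = -1 < 0: full cooperation is the only stable state.
```

Other routes correspond to flipping only one sign, or to flipping the two signs one after the other.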
Further kinds of transitions to cooperation. The routes to cooperation discussed so far change the eigenvalues λ 1 and λ 2 , and leave the stationary solutions p 1 and p 2 unchanged. However, transitions to cooperation can also be generated by shifting the stationary solutions or creating new ones, as we will show now. For this, we generalize the replicator equation (2) by replacing λ 1 with f (p) and λ with g(p), and by adding a term h(p), which can describe effects of spontaneous transitions like mutations. To guarantee 0 ≤ p(t) ≤ 1, we must have h(p) = v(p) − p w(p) with functions w(p) ≥ v(p) ≥ 0. The resulting equation is dp/dt = F (p(t)) with

F (p) = (1 − p)[f (p) − 2g(p)p]p + h(p),

and its stationary solutions p k are given by F (p k ) = (1 − p k )[f (p k ) − 2g(p k )p k ]p k + h(p k ) = 0. The associated eigenvalues λ k = dF (p k )/dp determining the stability of the stationary solutions p k are

λ k = (1 − 2p k )[f (p k ) − 2g(p k )p k ] + p k (1 − p k )[f ′ (p k ) − 2g ′ (p k )p k − 2g(p k )] + h ′ (p k ),

where f ′ (p k ), g ′ (p k ), and h ′ (p k ) are the derivatives of the functions f (p), g(p), and h(p) in the points p = p k .
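The stationarity condition F (p k ) = 0 and the eigenvalues λ k = dF (p k )/dp can also be evaluated numerically (a sketch with assumed illustrative parameters; the sign-change scan with bisection is a generic root finder, not the paper's method):

```python
def stationary_points(F, n=1000):
    """Roots of F in [0, 1]: grid scan for exact zeros and sign changes,
    the latter refined by bisection."""
    roots = [i / n for i in range(n + 1) if F(i / n) == 0.0]
    for i in range(n):
        a, b = i / n, (i + 1) / n
        if F(a) * F(b) < 0:
            for _ in range(60):  # bisection on the bracketing interval
                m = 0.5 * (a + b)
                if F(a) * F(m) <= 0:
                    b = m
                else:
                    a = m
            roots.append(0.5 * (a + b))
    return sorted(roots)

def eigenvalue(F, p, eps=1e-6):
    """lambda_k = F'(p_k) by central difference; F is a polynomial here,
    so evaluating slightly outside [0, 1] is harmless."""
    return (F(p + eps) - F(p - eps)) / (2 * eps)

# Plain prisoner's dilemma: f(p) = lambda_1^0 = -1, g(p) = lambda^0 = 0.5, h = 0.
F = lambda p: p * (1 - p) * (-1.0 - 2 * 0.5 * p)
roots = stationary_points(F)
# roots are p_1 = 0 (eigenvalue -1, stable) and p_2 = 1 (eigenvalue +2, unstable).
```

This reproduces the PD case of the classification above: λ 1 = λ 0 1 < 0 and λ 2 = 2λ 0 − λ 0 1 > 0.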
Classification. We can now distinguish different kinds of transitions from defection to cooperation: If the stationary solutions p 1 = 0 and p 2 = 1 of the prisoner's dilemma are modified, we talk about transitions to cooperation by equilibrium displacement. This case occurs, for example, when random mutations are not weak (h ≠ 0). If the eigenvalues λ 1 or λ 2 of the stationary solutions p 1 = 0 and p 2 = 1 are changed, we speak of equilibrium selection. This case applies to all routes to cooperation discussed before. If a new stationary solution appears, we speak of equilibrium creation. The different cases often appear in combination with each other (see the Summary below). In the following, we will discuss an interesting case, where cooperation occurs solely through equilibrium creation, i.e. the stationary solutions p 1 and p 2 of the replicator equation for the prisoner's dilemma as well as their eigenvalues λ 1 and λ 2 remain unchanged. We illustrate this by the example of an adaptive kind of group pressure that rewards mutual cooperation (T 11 < 0) or sanctions unilateral defection (T 21 > 0). Both rewarding and sanctioning reduce the value of λ 2 , while λ 1 remains unchanged. Assuming here that the group pressure vanishes when everybody cooperates (as it is not needed then), while it is maximum when everybody defects (to encourage cooperation) [15], we may set f (p) = λ 0 1 and g(p) = λ 0 − K(1 − p) with a group pressure strength K ≥ 0. It is obvious that we still have the two stationary solutions p 1 = 0 and p 2 = 1 with the eigenvalues λ 1 = λ 0 1 < 0 and λ 2 = 2λ 0 − λ 0 1 > 0 of the original prisoner's dilemma with parameters λ 0 1 and λ 0 2 or λ 0 = (λ 0 1 + λ 0 2 )/2. However, for large enough values of K [namely for K > K 0 = λ 0 + |λ 0 1 | + √(|λ 0 1 |(2λ 0 + |λ 0 1 |))], we find two additional stationary solutions p ± = {K − λ 0 ± √[(K − λ 0 )² − 2K|λ 0 1 |]}/(2K). p − is an unstable stationary solution with p 1 < p − < p + and λ − = dF (p − )/dp > 0, while p + is a stable stationary solution with p − < p + < p 2 and λ + = dF (p + )/dp < 0 (see inset of Fig. 2).
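These additional fixed points can be checked numerically (a sketch; the parameter values λ 0 1 = −1 and λ 0 = 0.5 are illustrative, and the linear profile g(p) = λ 0 − K(1 − p) with f (p) = λ 0 1 is the assumed form of the adaptive group pressure, chosen so that g(1) = λ 0 ):

```python
import math

l1_0, l0 = -1.0, 0.5   # prisoner's dilemma: lambda_1^0 < 0; here lambda^0 = 0.5
K0 = l0 + abs(l1_0) + math.sqrt(abs(l1_0) * (2 * l0 + abs(l1_0)))
K = 2 * K0             # group pressure strong enough for bistability

def F(p):
    """Generalized dynamics with f(p) = l1_0 and g(p) = l0 - K (1 - p)."""
    g = l0 - K * (1 - p)
    return p * (1 - p) * (l1_0 - 2 * g * p)

# The interior fixed points solve l1_0 - 2 g(p) p = 0, a quadratic in p
# whose discriminant is positive exactly when K > K0:
disc = (K - l0) ** 2 + 2 * K * l1_0
p_minus = ((K - l0) - math.sqrt(disc)) / (2 * K)
p_plus = ((K - l0) + math.sqrt(disc)) / (2 * K)
```

For K just below K 0 the discriminant turns negative and only p 1 = 0 and p 2 = 1 survive, which is the threshold character of K 0 .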
Hence, the assumed dependence of the payoffs on the proportion p of cooperators generates a bistable situation (BISTAB), with the possibility of a coexistence of a few defectors with a large proportion p + of cooperators, given K > K 0 . If p(0) < p − , where p(0) denotes the initial condition, defection by everybody results, while a stationary proportion p + of cooperators is established for p − < p(0) < 1. Surprisingly, in the limit K → ∞, cooperation is established for any initial condition p(0) > 0 (or through fluctuations). Summary. We have discussed from a physical point of view what must happen so that social or biological, payoff-changing interaction mechanisms can create cooperation in the prisoner's dilemma. The possible ways are (i) moving the stable stationary solution away from pure defection (routes 3, 4, and 6), (ii) changing the eigenvalues of the stationary solutions and thereby stabilizing the unstable solution (routes 1, 2, 4, 5 and 6), or (iii) creating new stationary solutions, which are stable (routes 3, 4 and 6). Several of these points can be combined. If (i) is fulfilled, we speak of "equilibrium displacement", if (ii) applies, we call this "equilibrium selection", and if (iii) is the case, we talk of "equilibrium creation". The first case can result from mutations, the second one applies to many social or biological cooperation-enhancing mechanisms [12]. We have discussed an interesting case of equilibrium creation, in which the outcome of the replicator equation is changed, although the stationary solutions of the PD and their eigenvalues remain unchanged. This can, for example, occur by adaptive group pressure [15], which introduces an adaptive feedback mechanism and thereby increases the order of non-linearity of the replicator equation. Surprisingly, already a linear dependence of the payoff values P ij on the endogenous dynamics p(t) of the system is enough to destabilize defection and stabilize cooperation, thereby inverting the outcome of the prisoner's dilemma.