Correlation of positive and negative reciprocity fails to confer an evolutionary advantage: Phase transitions to elementary strategies

Economic experiments reveal that humans value cooperation and fairness. Punishing unfair behavior is therefore common, and according to the theory of strong reciprocity, it is also directly related to rewarding cooperative behavior. However, empirical data fail to confirm that positive and negative reciprocity are correlated. Inspired by this disagreement, we determine whether the combined application of reward and punishment is evolutionarily advantageous. We study a spatial public goods game, where in addition to the three elementary strategies of defection, rewarding, and punishment, a fourth strategy that combines the latter two competes for space. We find rich dynamical behavior that gives rise to intricate phase diagrams where continuous and discontinuous phase transitions occur in succession. We observe indirect territorial competition, the spontaneous emergence of cyclic dominance, as well as divergent fluctuations of oscillations that terminate in an absorbing phase. Yet despite the high complexity of these solutions, the combined strategy can survive only in very narrow and unrealistic parameter regions. Elementary strategies, either in pure or mixed phases, are much more common and likely to prevail. Our results highlight the importance of patterns and structure in human cooperation, which should be considered in future experiments.


I. INTRODUCTION
Humans have mastered the art of cooperation like no other species [1,2]. Regardless of kinship and individual loss, we work together to achieve feats that are impossible to achieve alone. We have developed a very keen sense of fairness to uphold cooperative behavior in our societies [3,4], and we frequently punish those that do not cooperate in the pursuit of personal benefits and elevated status [5][6][7]. There also exists evidence that common marmosets and chimpanzees show similar preferences regarding altruism and reward division [8,9], suggesting a long evolutionary history to the human sense of fairness. Although the origins of this behavior are not fully understood, there exists evidence for between-group conflicts [10] and the provisioning for someone else's young [11] as viable mechanisms for igniting the evolution of the remarkable other-regarding abilities of the genus Homo.
Like the origins of cooperative behavior, so too its later development and evolution continue to intrigue and stimulate new research across the social and natural sciences [12][13][14][15][16]. Although key mechanisms have been identified that promote the evolution of cooperation [17], there is still disagreement between theory and experiment on many key issues. Two examples have recently attracted notable interest. The first concerns network reciprocity [18][19][20][21], according to which cooperators are able to exploit the structure of interaction networks to offset inherent evolutionary disadvantages relative to defectors. Recent large-scale human experiments, however, fail to provide evidence in support of network reciprocity [22]. The second example is of direct relevance for the present work, and it concerns the strong reciprocity model [23][24][25][26]. The latter postulates that positive and negative reciprocity are directly correlated. In theory, it indeed seems reasonable to assume that rewarding cooperative behavior and punishing unfair behavior are two sides of the same preference for fairness. Yet recently gathered empirical data suggest otherwise [27,28]. In fact, Yamagishi et al. [27] have performed a series of experiments and concluded that there is no correlation between the tendency to reject unfair offers in the ultimatum game [29] and the tendency to exhibit prosocial behavior in other games [30,31]. Moreover, the analysis of private household data from the Socio-Economic Panel of the German Institute for Economic Research presented by Egloff et al. [28] has revealed that positive and negative reciprocity vary independently of each other, thus providing a severe challenge to the strong reciprocity model of the evolution of human cooperation.
While the rejection of unfair offers, which ought to be seen as equivalent to punishing defection [32], is simply a tacit strategy for avoiding the imposition of an inferior status, the act of cooperating appears to have an altogether different motivational background.
The described disagreement between the strong reciprocity model and empirical data invites an interdisciplinary approach, which promises to shed light on the subject from a different perspective. In the present paper, we therefore apply evolutionary game theory [33][34][35][36][37] and methods of statistical physics [38,39] to determine whether there are evolutionary advantages to be gained by adopting a strategy that punishes defectors as well as rewards cooperators, as opposed to doing just one or the other. While the elementary strategies of rewarding and punishment have received ample attention in the recent past [40][41][42][43][44][45], little is known about their combined effectiveness. To remedy this, we propose and study a modified spatial public goods game [46,47], where defectors compete with cooperators that punish defectors, cooperators that reward other cooperators, and cooperators that do both. We intentionally leave out cooperators that neither reward nor punish in order to avoid the second-order free-riding problem [48,49], and to thus be able to focus solely on the effectiveness of the combined strategy against the three elementary strategies of defection, rewarding and punishment.
As we will show in what follows, although the spatiotemporal dynamics of the evolutionary game is very complex and interesting from the physics point of view, there exist only narrow and realistically unlikely parameter regions where the combined strategy is able to survive. Given the lack of notable evolutionary advantages of correlating positive and negative reciprocity, the outcome of the experiments by Yamagishi et al. [27] and Egloff et al. [28] can thus be better understood, though the complexity of solutions also lends some support to the strong reciprocity hypothesis as being viable at least under certain special circumstances. We will present compelling evidence to support these conclusions in section III, while in the next section we first describe the studied spatial public goods game and the methods in more detail.

II. PUBLIC GOODS GAME WITH POSITIVE AND NEGATIVE RECIPROCITY
As a frequently used paradigm of social conflicts and human cooperation, the public goods game is staged on a square lattice with periodic boundary conditions, where L² players are arranged into overlapping groups of size G = 5 such that everyone is connected to its G − 1 nearest neighbors. Accordingly, each individual belongs to g = 1, . . . , G different groups. The square lattice is the simplest of networks that allows us to go beyond the unrealistic well-mixed population assumption, and as such it allows us to take into account the fact that the interactions among humans are inherently structured rather than random. By using the square lattice, we also continue a long-standing history that began with the work of Nowak and May [18], who were the first to show that the most striking differences in the outcome of an evolutionary game emerge when the assumption of a well-mixed population is abandoned in favor of a structured population. Many have since followed the same practice [46,50,51] (for a review see [14]), and there exists ample evidence in support of the claim that, especially for games that are governed by group interactions [47,52], using the square lattice suffices to reveal all the feasible evolutionary outcomes, and also that these are qualitatively independent of the interaction structure.
Initially each player on site x is designated either as a defector (s_x = D), a cooperator that punishes defectors (s_x = P), a cooperator that rewards other cooperators (s_x = R), or a cooperator that both punishes defectors and rewards other cooperators (s_x = B), each with equal probability. All three cooperative strategies (P, R and B) contribute a fixed amount (here set equal to 1 without loss of generality) to the public good, while defectors contribute nothing. The sum of all contributions in each group is multiplied by the synergy factor r, and the resulting public goods are distributed equally amongst all the group members irrespective of their strategies. In addition, a defector suffers a fine β/(G − 1) from each punisher (P or B) within the interaction neighborhood, which in turn requires the punisher to bear the cost γ/(G − 1) for each defecting individual in the group. A defector thus suffers the maximal fine β if it is surrounded solely by punishers, while a lonely punisher bears the largest cost γ if it is surrounded solely by defectors. Similarly, every cooperator is given the reward β/(G − 1) from every R and B player within the group, while each of them has to bear the cost of rewarding γ/(G − 1) for every cooperator that is rewarded. As a technical comment, we note that the application of payoffs normalized by G − 1 enables relevant comparisons with the evolutionary outcomes on other interaction networks where players might differ in their degree and group size. Moreover, we use an equally strong fine and reward at the same cost, technically the same pair of (β, γ) values for reward and punishment, which ensures a fair evaluation of the evolutionary advantage of both strategies. Decoupling these parameters, for example by administering high fines and low rewards at the same cost, would confer an unfair advantage to punishment, because punishment would then be relatively less costly than rewarding.
Since giving equal chances for success is of paramount importance for assessing evolutionary viability, we do not decouple β and γ for reward and punishment, and we also award limitless resources to all competing strategies.
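The overlapping group structure described above can be made concrete in code. The following minimal Python sketch (our own illustrative construction, not the authors' implementation) builds, for a given site on the L × L periodic lattice, the G = 5 groups it belongs to: one centered on the site itself and one centered on each of its four nearest neighbors.

```python
# Sketch of the overlapping G = 5 groups on the periodic square lattice.
# A group consists of a central site plus its four von Neumann neighbors;
# every site is a member of the group it centers and of the four groups
# centered on its neighbors.

def von_neumann(site, L):
    """The four nearest neighbors of `site` with periodic boundaries."""
    i, j = site
    return [((i + 1) % L, j), ((i - 1) % L, j),
            (i, (j + 1) % L), (i, (j - 1) % L)]

def groups_of(site, L):
    """All G = 5 groups that contain `site`."""
    centers = [site] + von_neumann(site, L)
    return [[c] + von_neumann(c, L) for c in centers]
```

Each of the five returned groups has five members, and the focal site appears in every one of them, which is exactly the membership pattern assumed when a player accumulates payoffs from all of its groups.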
In agreement with the described rules of the game, the payoff values of the four competing strategies obtained from each group g are thus:

Π_D^g = r(N_P + N_R + N_B)/G − (N_P + N_B) β/(G − 1),
Π_P^g = r(N_P + N_R + N_B + 1)/G − 1 − N_D γ/(G − 1) + (N_R + N_B) β/(G − 1),
Π_R^g = r(N_P + N_R + N_B + 1)/G − 1 + (N_R + N_B) β/(G − 1) − (N_P + N_R + N_B) γ/(G − 1),
Π_B^g = r(N_P + N_R + N_B + 1)/G − 1 − N_D γ/(G − 1) + (N_R + N_B) β/(G − 1) − (N_P + N_R + N_B) γ/(G − 1),

where N_{s_x} denotes the number of other players with strategy s_x in the group.
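The group-payoff rules described in the preceding paragraphs translate directly into code. The Python fragment below is an illustrative sketch (our own function and variable names, not the authors' code) that computes the payoff of one player from a single group, given the counts of the other group members' strategies.

```python
# Sketch of the group payoffs defined in the text. N holds the numbers
# of OTHER players of each strategy in the group; G is the group size.

G = 5  # focal player plus its four nearest neighbors

def group_payoff(s, N, r, beta, gamma):
    """Payoff of a player with strategy s ('D', 'P', 'R' or 'B')
    from one group, following the fine/reward rules of the model."""
    n_coop = N['P'] + N['R'] + N['B']               # contributions of others
    if s == 'D':
        pool = r * n_coop / G                        # defector contributes 0
        fine = beta * (N['P'] + N['B']) / (G - 1)    # fined by each punisher
        return pool - fine
    pool = r * (n_coop + 1) / G - 1                  # own contribution costs 1
    payoff = pool + beta * (N['R'] + N['B']) / (G - 1)   # rewards received
    if s in ('P', 'B'):                              # punishing cost per defector
        payoff -= gamma * N['D'] / (G - 1)
    if s in ('R', 'B'):                              # rewarding cost per cooperator
        payoff -= gamma * n_coop / (G - 1)
    return payoff
```

For example, a defector surrounded by four P players collects r·4/G from the pool but suffers the maximal fine β, while a lonely punisher among four defectors bears the maximal cost γ, in line with the limiting cases stated in the text.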
Monte Carlo simulations of the public goods game are carried out by repeating the following elementary steps. A randomly selected player x plays the public goods game with its G − 1 partners as a member of all the g = 1, . . . , G groups, whereby its overall payoff Π_{s_x} is the sum of all the payoffs Π_{s_x}^g acquired in the individual groups. Next, player x chooses one of its nearest neighbors at random, and the chosen co-player y acquires its payoff Π_{s_y} in the same way. Finally, player x enforces its strategy s_x onto player y with a probability given by the Fermi function w(s_x → s_y) = 1/{1 + exp[(Π_{s_y} − Π_{s_x})/K]}, where K = 0.5 quantifies the uncertainty of strategy adoptions [47]. This implies that the strategies of better performing players are readily adopted, although it is not impossible to adopt the strategy of a player performing worse. Such errors in decision making can be attributed to mistakes and external influences that adversely affect the evaluation of the opponent. Each Monte Carlo step (MCS) gives every player a chance to enforce its strategy onto one of its neighbors once on average. The average fractions of defectors (ρ_D), cooperators that punish (ρ_P), cooperators that reward (ρ_R), and cooperators that do both (ρ_B) on the square lattice were determined in the stationary state after a sufficiently long relaxation time. Depending on the actual conditions (proximity to phase transition points and the typical size of emerging spatial patterns), the linear system size was varied from L = 400 to 7200 and the relaxation time from 10^4 to 10^5 MCS, to ensure that the statistical error is comparable with the line thickness in the figures. We note that a random initial state may not necessarily yield relaxation to the most stable solution of the game, even at such a large system size (L = 7200). To verify the stability of different solutions, we have therefore applied prepared initial states (see Fig. 10 in [53]), and we have followed the same procedure as described previously in [54]. Next we proceed with presenting the main results.
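The elementary Monte Carlo step described above can be sketched as follows. This is an illustrative fragment (our own names; the payoff accumulation is passed in as a function, which is an assumption about how one might organize the code), showing the random neighbor selection and the Fermi adoption rule.

```python
import math
import random

K = 0.5  # uncertainty of strategy adoptions

def fermi(pi_x, pi_y, K=K):
    """Probability w(s_x -> s_y) that player x enforces its strategy on y."""
    return 1.0 / (1.0 + math.exp((pi_y - pi_x) / K))

def elementary_step(lattice, L, total_payoff, rng=random):
    """One elementary step: pick a random player x, pick a random
    nearest neighbor y on the periodic L x L lattice, and let x's
    strategy invade y with the Fermi probability."""
    i, j = rng.randrange(L), rng.randrange(L)
    di, dj = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
    x, y = (i, j), ((i + di) % L, (j + dj) % L)
    if rng.random() < fermi(total_payoff(lattice, x), total_payoff(lattice, y)):
        lattice[y] = lattice[x]
```

One full MCS then consists of L² such elementary steps, so that every player gets a chance to enforce its strategy once on average; `total_payoff` would sum the group payoffs of a site over all G of its groups.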

III. RESULTS
Systematic Monte Carlo simulations are performed to reveal phase diagrams for two representative values of the synergy factor r. In the absence of reward and punishment, cooperators survive only if r > 3.74, and they are able to defeat defectors completely for r > 5.49 [47]. Taking these as benchmark values, we focus on r = 4.5 and r = 2.5.

A. Synergy factor r = 4.5

For r = 4.5 cooperators are able to coexist with defectors even without additional support. The phase diagram depicted in Fig. 1 suggests that at such a high value of r the far more effective action against defectors is punishment rather than rewarding. The pure D phase (an absorbing phase, because the applied dynamical rule leaves it unchanged once the system arrives there) in the upper left corner of the β−γ plane first gives way to the mixed D + P phase, and subsequently to the pure P phase as the fine increases (or the cost decreases). Only if the cost is negligible and the fine/reward is large are rewarding strategies able to survive. In this case defectors die out very soon, and from there on strategies R and B become equivalent since there is nobody left to punish. For the same reason strategy P reduces to ordinary cooperation. Accordingly, strategies R and B are able to coexist alongside strategy P as long as the cost of rewarding is sufficiently small to offset the second-order free-riding (P players do not contribute to rewarding other cooperators). The phase is denoted accordingly as P + (RB) in the lower right corner of Fig. 1.
Yet Fig. 1 fails to convey the full story behind the depicted phase diagram. For very low values of γ (∼10^3 times smaller than β), the studied spatial public goods game reveals its true potential to yield rich dynamical behavior that gives rise to a truly intricate phase diagram. As can be observed in Fig. 2, no less than seven successive phase transitions can occur upon varying a single parameter (increasing β at a fixed value of γ). In addition to the pure P phase, we can observe the two-strategy D + P, D + B and P + (RB) (note that here R and B are equivalent strategies) phases, the three-strategy D + P + B and D + P + R phases, and even the four-strategy D + P + R + B phase. While the majority of phase transitions is continuous, the D + P → D + B phase transition is discontinuous due to an indirect territorial competition (see [55,56] for further examples of this phenomenon) between strategies P and B. The two compete independently against the defectors, and the victor is determined by whoever is more effective. The nature of the other phase transitions is illustrated quantitatively in Fig. 3, which shows a cross-section across β for the most interesting value of γ.

Despite the complexity of solutions, the relevance of the presented results for the main question addressed in this study is quickly revealed. The dashed blue line in Fig. 2, marking the discontinuous D + P → D + B phase transition, conveys directly that the combined strategy B is more effective than the elementary strategy P only as β increases [that is, only if the conditions for rewarding and sanctioning become more lenient, with the two actions becoming less costly], and even then only when the costs are already negligible (∼10^3 times smaller than the administered rewards and fines). Accordingly, we conclude that, at least for high values of the synergy factor r, there are no notable evolutionary advantages associated with correlating positive (rewarding) and negative (punishment) reciprocity in a single strategy. This agrees with the empirical data presented by Yamagishi et al. [27] and Egloff et al. [28], who failed to observe such a correlation in human experiments. On the other hand, it should not be overlooked that the combined strategy B is viable and that it does convey some advantages (albeit in very narrow and rather unrealistic parameter regions), which thus also lends some support to the strong reciprocity model [24].
To demonstrate how the combined strategy B may survive, we show in Fig. 4 a series of snapshots from a prepared initial state (applied solely to allow the usage of a relatively small system size), which eventually evolves towards the three-strategy D + P + B phase. Based on this example, it could be argued that adopting strategy B does in fact confer an advantage over strategy R, which succumbs to the evolutionary pressure stemming from the three surviving strategies. However, as can be observed at a glance from the phase diagrams presented in Figs. 1 and 2, this advantage is limited to a very narrow and specific parameter range, which is practically invisible at normal resolution (see Fig. 1). In addition, we emphasize that strategy B is slightly less effective than strategy P (see Fig. 3 for the stationary fractions of the two strategies). Thus, although the combined strategy B might appear to be a good choice in some of the parameter regions within Fig. 2, it is still second best to the elementary strategy P, which adopts punishment alone.

B. Synergy factor r = 2.5
If the conditions for the evolution of public cooperation become harsh, as is the case for r = 2.5, the relations between the competing strategies change quite significantly. The phase diagram presented in Fig. 5 reveals that, besides the expected extension of the pure D phase, the parameter region where strategy B can survive also becomes larger. Furthermore, there is a significant change in the nature of the phase transitions. Unlike at r = 4.5 (see Fig. 1), here discontinuous phase transitions dominate, which has to do with the spontaneous emergence of cyclic dominance [57][58][59][60] between strategies D, P and B. In particular, within the three-strategy D + P + B phase strategy D outperforms strategy P, strategy P outperforms strategy B, while strategy B in turn outperforms strategy D. It is important to note that at r = 4.5 the stability of none of the three-strategy phases, and also not of the four-strategy phase, has been due to cyclic dominance. Instead, as Fig. 4 illustrates, there the stability was warranted by the stable coexistence of the strategies, rather than by oscillations that are brought about by cyclic dominance.
As was frequently the case before [53,54,61,62], here too the spontaneous emergence of cyclic dominance brings with it fascinating dynamical processes that are driven by pattern formation, by means of which the phase may terminate. Figures 6 and 7 feature two characteristic cross-sections of the phase diagram presented in Fig. 5, which reveal two qualitatively very different ways for the D +P +B cyclic dominance phase to give way to the D(B) phase [here D(B) indicates that either a pure D or a pure B phase can be the final state if starting from random initial conditions]. The process depicted in Fig. 6 is relatively straightforward. Here the average fractions of strategies P and B decay due to the increasing cost γ, which ultimately results in the vanishing average value of the fraction of strategy P . The closed cycle of dominance is therefore interrupted and the D + P + B phase terminates.
The situation for β = 0.55 is much more peculiar and interesting. As the results presented in Fig. 7 demonstrate, here the average values of all three strategies remain finite. Hence, the termination of the D + P + B phase must have a different origin than at β = 0.37 presented in Fig. 6. In fact, for β = 0.55 it is the amplitude of oscillations that increases with increasing values of γ. And it is this increase in the amplitude that ultimately results in a uniform absorbing phase, regardless of the system size. At this point it is crucial to emphasize that the increase of the amplitude of oscillations is not a finite-size effect. Although in spatial systems with cyclic dominance it is typical to observe oscillations with increasingly smaller amplitude as the system size is increased, this does not hold in the present case. To demonstrate this, we measure the fluctuations in the stationary state according to

χ = L² ⟨(ρ_D − ρ̄_D)²⟩,

where ρ̄_D is the average value of the fraction of defectors and the average is taken over time in the stationary state (a similar quantity can be calculated for the other strategies as well). As Fig. 8 shows, the scaled quantity χ is size-independent, thus indicating a divergent fluctuation as γ approaches the critical value. The three-strategy D + P + B phase is therefore unable to exist beyond this value, despite the fact that the average fractions of all three strategies are far from zero. Instead, the phase terminates via a discontinuous phase transition towards the D(B) phase, as depicted in Fig. 5. Notably, within the D(B) phase either the pure D or the pure B phase can be the final state, depending on which strategy dies out first.

[Figure caption fragment:] The termination is similar to that in Fig. 6, only that here it is the fraction of strategy B that decays to zero as β increases, which thus interrupts the closed cycle of dominance (in Fig. 6 it is the fraction of strategy P, as γ increases, that has the same effect).

[Figure caption fragment:] At 0 MCS the game is initiated from the same prepared initial state, and for the same reason, as described in the caption of Fig. 4. At 100 MCS, it can be observed that strategies P (green) and R (dark blue) are both weaker than strategy D (red), and accordingly their isolated islands shrink. Conversely, the combined strategy B is more effective when competing alone against the defectors, and thus the light blue island grows. Moreover, in the absence of defectors strategy P can exploit rewarding strategies and spread fast in the bulk of the mixed domain (upper left circle). It is also worth pointing out that the B domain would grow endlessly in the sea of defectors if it did not meet the elementary strategy P, which is able to exploit it. At 1800 MCS the final solution is practically formed, and from here on traveling waves dominate the spatial grid. At 4100 MCS it can be observed that strategy B can spread towards strategy D, and thereby it may control a significant portion of the lattice for a short period of time. At 4240 MCS, however, strategy P easily invades the bulk of the B domain, but in the absence of the latter it itself becomes vulnerable against the defectors. This cycle of dominance is repeated from 4610 MCS onwards, which shows a configuration very similar to the one at 1800 MCS. Naturally, the oscillations become more intense as we approach the edge of the D + P + B phase in Fig. 5, and the evolution can easily terminate in an absorbing phase if the system size is not sufficiently large.
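As an illustration, the scaled fluctuation χ can be estimated from a recorded time series of the defector fraction in the stationary state. The sketch below is our own helper, assuming the standard scaled-variance form χ = L²⟨(ρ_D − ρ̄_D)²⟩, which the text does not spell out explicitly; the same function applies to the other strategies' fractions as well.

```python
# Sketch of the fluctuation measure chi (our own helper, assuming the
# scaled time variance chi = L^2 * <(rho - <rho>)^2> over the stationary
# time series of a strategy fraction).

def scaled_fluctuation(rho_series, L):
    """Scaled fluctuation of a strategy fraction on an L x L lattice."""
    n = len(rho_series)
    mean = sum(rho_series) / n
    var = sum((rho - mean) ** 2 for rho in rho_series) / n
    return L * L * var
```

Plotting this quantity against γ for several system sizes would reproduce the kind of size-independent, diverging curves reported in Fig. 8.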
With this, however, we have not yet covered all the details of the phase diagram presented in Fig. 5. In addition, the same pure P and two-strategy P + (RB) phases that we have already reported above for r = 4.5 (see Fig. 1) are observable here as well, only that at r = 2.5 they are shifted further towards higher values of β. This is understandable, given that the weaker support for public cooperation due to a lower value of the synergy factor needs to be offset by higher fines and rewards. Moreover, we must not overlook the existence, albeit a very subtle one, of the two-strategy D + B phase, the emergence of which is quantitatively described in the cross-section presented in Fig. 9. This is the only stable solution where the combined strategy B alone coexists with defectors, and where the correlation of negative and positive reciprocity thus truly outperforms the elementary strategies P and R. As in all the previously outlined cases, however, this advantage too is minute and limited to a very narrow region of the phase diagram.
In general, harsher conditions for the evolution of public cooperation lend more support to the survival of the combined strategy, as the regions of the β − γ parameter plane where B can prevail indeed become quite extensive at smaller values of the synergy factor r. This extends the credibility of the strong reciprocity model, and it indicates that, if at all, the evolutionary advantages of correlated positive and negative reciprocity ought to manifest themselves more clearly under extreme adversity.
In future experiments, it may thus be worthwhile to work towards such conditions if the goal is to discern possible actual advantages of correlated reciprocities, and to thus further support the assumptions of strong reciprocity theory with empirical data. A word of caution is, however, in order to end the presentation of our results. As the series of final snapshots presented in Fig. 10 clearly demonstrates (and to no lesser extent also the series of snapshots presented in Fig. 4), conditions for pattern formation and complex strategic configurations must be met for the subtle solutions, here identified by means of extensive and systematic Monte Carlo simulations, to emerge and be stable. Such conditions appear to be very difficult to achieve in experiments with humans, which is why efforts towards large-scale implementations, as recently reported in [22,63], are very encouraging and certainly worth developing further in the future.

IV. DISCUSSION
Our goal in the present paper was to determine whether there are evolutionary advantages associated with correlating positive and negative reciprocity in a single strategy, as opposed to adopting solely reward or punishment as an elementary strategy. Systematic Monte Carlo simulations have revealed that, regardless of the synergy factor governing the public goods game, elementary strategies, and punishment in particular, are in general significantly more effective in deterring defection than the combined strategy. Although there exist narrow and rather unrealistic parameter regions where the correlation of positive and negative reciprocity can outperform a particular elementary strategy, these advantages are highly unlikely to play a role in human experiments, and they also frequently come second to the evolutionary success that is warranted by punishment alone under the same conditions. The presented results thus lend support to the empirical data published in [27,28], which fail to support the central assumption of the strong reciprocity model, namely that negative and positive reciprocity are correlated.
The studied four-strategy spatial public goods game gives rise to fascinating evolutionary outcomes that are separated by continuous and discontinuous phase transitions. We have demonstrated, for example, that indirect territorial competition may lend some credibility to the combined strategy, as the latter is sometimes more effective against the defectors than solely rewarding. In special parameter regions, the combination of positive and negative reciprocity can thus crowd out cooperators that reward other cooperators. Under the same conditions, however, cooperators that punish but do not reward can be more effective still, so overall it is difficult to argue in favor of choosing the combined strategy over an elementary one. Moreover, stationary solutions that are governed by indirect territorial competition terminate suddenly via discontinuous phase transitions, and accordingly, they are difficult to identify and are unlikely to seriously challenge conclusions based on empirical data.
For low synergy factors, we have shown that the spontaneous emergence of cyclic dominance between strategies D, P and B is also a possible solution of the system, and indeed it significantly extends the parameter region where the correlation of positive and negative reciprocity is viable. Within the cyclic phase defectors outperform punishers, punishers outperform the combined strategy, and the combined strategy is able to invade defectors, thereby closing the loop of dominance. In this case, it can again be argued in favor of the combined strategy over solely rewarding, but since the remaining three strategies become spontaneously entangled in a cycle of dominance, the advantage warranted by the correlation of negative and positive reciprocity is indirect and circumstantial at best. Furthermore, we have demonstrated that the cyclic dominance can terminate in very different ways. Either the average fraction of one strategy vanishes, or, more intriguingly, the amplitude of oscillations diverges in a system-size independent manner. Thus, although the average fractions of all three strategies are far from zero, the cyclic dominance phase may end abruptly via a discontinuous phase transition. Although phenomena like indirect territorial competition, cyclic dominance, divergent fluctuations of the amplitude of oscillations, as well as previously reported critical phenomena in evolutionary games [64], self-organized adaptation [65,66] and in-group favoritism [67], are all of significant interest to physicists, we emphasize that they would likely require massive efforts to be observed in human experiments. Nevertheless, recent large-scale attempts in this direction promise exciting times ahead [22,63].
Lastly, it remains to emphasize that punishment is the elementary strategy that is definitively more effective than the combined strategy, while rewarding is not necessarily so. However, rewarding can be made much more potent if rewards are administered not to all cooperators, but only to those who themselves reward others. In this case, rewarding can completely outperform punishment at low γ and high β values, while the situation reverses only if the costs become relatively high compared to the rewards and fines. Yet in this modified scenario, the act of punishing yields no extra advantages, and in general strategy B can survive only when strategy R can survive too. Therefore, even under such altered, rewarding-friendly conditions, there are no notable evolutionary advantages to be gained by adopting a strategy that combines both positive and negative reciprocity. With this conclusion, we hope that our study will inspire further research aimed at investigating the role of correlated strategies in evolutionary games, and we also hope that more experimental work will be carried out to clarify their role in the evolution of human cooperation.