# Observations on Cooperation

## Abstract

We study environments in which agents are randomly matched to play a Prisoner’s Dilemma, and each player observes a few of the partner’s past actions against previous opponents. We depart from the existing related literature by allowing a small fraction of the population to be commitment types. The presence of committed agents destabilizes previously proposed mechanisms for sustaining cooperation. We present a novel intuitive combination of strategies that sustains cooperation in various environments. Moreover, we show that under an additional assumption of stationarity, this combination of strategies is essentially the unique mechanism to support full cooperation, and it is robust to various perturbations. Finally, we extend the results to a setup in which agents also observe actions played by past opponents against the current partner, and we characterize which observation structure is optimal for sustaining cooperation.

## 1. Introduction

Consider the following example of a simple yet fundamental economic interaction. Alice has to trade with another agent, Bob, whom she does not know. Both sides have opportunities to cheat, to their own benefit, at the expense of the other. Alice is unlikely to interact with Bob again, and thus her ability to retaliate, in case Bob acts opportunistically, is restricted. The effectiveness of external enforcement is also limited, e.g. due to incompleteness of contracts, non-verifiability of information, and court costs. Thus cooperation may be impossible to achieve. Alice searches for information about Bob’s past behaviour, and she obtains anecdotal evidence about Bob’s actions in a couple of past interactions. Alice considers this information when she decides how to act. Alice also takes into account that her behaviour towards Bob in the current interaction may be observed by her future partners.
Historically, the above-described situation was a challenge to the establishment of long-distance trade (Milgrom et al., 1990; Greif, 1993), and it continues to play an important role in the modern economy, in both offline (Bernstein, 1992; Dixit, 2003) and online interactions (Resnick and Zeckhauser, 2002; Jøsang et al., 2007). Several papers have studied the question of how cooperation can be supported by means of community enforcement. Most of these papers assume that all agents in the community are rational and, in equilibrium, best reply to what everyone else is doing. As argued by Ellison (1994, p. 578), this assumption may be fairly implausible in large populations. It seems quite likely that, in a large population, there will be at least some agents who fail to best respond to what the others are doing, either because they are boundedly rational, have idiosyncratic preferences, or because their expectations about other agents’ behaviour are incorrect. Motivated by this argument, we allow a few agents in the population to be committed to behaviours that do not necessarily maximize their payoffs. It turns out that this seemingly small modification completely destabilizes existing mechanisms for sustaining cooperation when agents are randomly matched with new partners in each period. Specifically, both the contagious equilibria (Kandori, 1992; Ellison, 1994) and the “belief-free” equilibria (Takahashi, 2010; Deb, 2017) fail in the presence of a small fraction of committed agents.1 Our key results are as follows. First, we show that always defecting is the unique perfect equilibrium, regardless of the number of observed actions, provided that the bonus of defection in the underlying Prisoner’s Dilemma is larger when the partner cooperates than when the partner defects. 
Second, in the opposite case, when the bonus of defection is larger when the partner defects than when the partner cooperates, we present a novel and essentially unique combination of strategies that sustains cooperation: all agents cooperate when they observe no defections and defect when they observe at least two defections.2 Some of the agents also defect when observing a single defection. Importantly, this cooperative behaviour is robust to various perturbations, and it appears consistent with experimental data. Third, we extend the model to environments in which an agent also obtains information about the behaviour of past opponents against the current partner. We show that in this setup cooperation can be sustained if and only if the bonus of defection of a player is less than half the loss she induces a cooperative partner to suffer. Finally, we characterize an observation structure that allows cooperation to be supported as a perfect equilibrium outcome in all Prisoner’s Dilemma games. In all observation structures, we use the same essentially unique construction to sustain cooperation.

### 1.1. Overview of the model

Agents in an infinite population are randomly matched into pairs to play the Prisoner’s Dilemma game, in which each player decides simultaneously whether to cooperate or defect (see the payoff matrix in Table 1). If both players cooperate they obtain a payoff of one, if both defect they obtain a payoff of zero, and if one of the players defects, the defector gets $$1+g$$, while the cooperator gets $$-l$$, where $$g,l>0$$ and $$g<l+1$$. (The latter inequality implies that mutual cooperation is the efficient outcome that maximizes the sum of payoffs.)

TABLE 1 Matrix payoffs of Prisoner’s Dilemma games (the row player’s payoff is listed first)

|  | $$c$$ | $$d$$ |
| --- | --- | --- |
| $$c$$ | $$1,\;1$$ | $$-l,\;1+g$$ |
| $$d$$ | $$1+g,\;-l$$ | $$0,\;0$$ |

Before playing the game, each agent privately draws a random sample of $$k$$ actions that have been played by her partner against other opponents in the past.
The assumption that a small random sample is taken from the entire history of the partner is intended to reflect a setting in which the memory of past interactions is long and accurate but dispersed. This means that the information that reaches an agent about her partner (through gossip) arrives in a non-deterministic fashion and may stem from any point in the past. We require each agent to follow a stationary strategy, $$i.e.$$ a mapping that assigns a mixed action to each signal that the agent may observe about the current partner. (That is, the action is not allowed to depend on calendar time or on the agent’s own history.) A steady state of the environment is a pair consisting of: (1) a distribution of strategies with a finite support that describes the fractions of the population following the different strategies, and (2) a signal profile that describes the distribution of signals that is observed when an agent is matched with a partner playing any of the strategies present in the population. The signal profile is required to be consistent with the distribution of strategies in the sense that a population of agents who follow the distribution of strategies and observe signals about the partners sampled from the signal profile will behave in a way that induces the same signal profile.3 Our restriction to stationary strategies and our focus on consistent steady states allow us to relax the standard assumption that there is an initial time zero at which an entire community starts to interact. In various real-life situations, the interactions within the community have been going on from time immemorial. Consequently, the participants may have only a vague idea of the starting point. Arguably, agents might therefore be unable to condition their behaviour on everything that has happened since the beginning of the interactions. 
We perturb the environment by introducing a small fraction $$\epsilon$$ of committed agents, each of whom follows one strategy from an arbitrary finite set of commitment strategies. We assume that at least one of the commitment strategies is totally mixed, which implies that all signals (i.e. all sequences of $$k$$ actions) are observed with positive probability. A steady state in a perturbed environment describes a population in which $$1-\epsilon$$ of the agents are normal, i.e. they play strategies that maximize their long-run payoffs, while $$\epsilon$$ of the agents follow commitment strategies. We adapt the notions of Nash equilibrium and perfect equilibrium (Selten, 1975) to our setup. A steady state is a Nash equilibrium if no normal agent can gain in the long run by deviating to a different strategy (the agents are assumed to be arbitrarily patient). The deviator’s payoff is calculated in the new steady state that emerges following her deviation. A steady state is a perfect equilibrium if it is the limit of a sequence of Nash equilibria in a converging sequence of perturbed environments.4

### 1.2. Summary of results

We start with a simple result (Prop. 1) that shows that defection is a perfect equilibrium outcome for any number of observed actions. We say that a Prisoner’s Dilemma game is offensive if there is a stronger incentive to defect against a cooperator than against a defector (i.e. $$g>l$$); in a defensive Prisoner’s Dilemma the opposite holds (i.e. $$g<l$$). Our first main result (Theorem 1) shows that always defecting is the unique perfect equilibrium in any offensive Prisoner’s Dilemma game for any number of observed actions. The result assumes a mild regularity condition on the set of commitment strategies (Def. 3), namely, that this set is rich enough that, in any steady state of the perturbed environment, at least one of the commitment strategies induces agents to defect with a different probability than that of some of the normal agents.
The intuition is as follows. The mild assumption that not all agents defect with exactly the same probability implies that the signal that Alice observes about her partner Bob is not completely uninformative. In particular, the more often Alice observes Bob to defect, the more likely Bob will defect against Alice. In offensive games, it is better to defect against partners who are likely to cooperate than to defect against partners who are likely to defect. This implies that a deviator who always defects is more likely to induce normal partners to cooperate. Consequently, such a deviator will outperform any agent who cooperates with positive probability. Theorem 1 may come as a surprise in light of a number of existing papers that have presented various equilibrium constructions that support cooperation in any Prisoner’s Dilemma game that is played in a population of randomly matched agents. Our result demonstrates that, in the presence of a small fraction of committed agents, the mechanisms that have been proposed to support cooperation fail, regardless of how these committed agents play (except in the “knife-edge” case of $$g=l$$; see Dilmé, 2016 and Remark 7 in Section 4.3). Thus, our article provides an explanation of why experimental evidence suggests that subjects’ behaviour corresponds neither to contagious equilibria (see, $$e.g.$$Duffy and Ochs, 2009) nor to belief-free equilibria (see, $$e.g.$$Matsushima et al., 2013). The empirical predictions of our model are discussed in Supplementary Appendix B. Our second main result (Theorem 2) shows that cooperation is a perfect equilibrium outcome in any defensive Prisoner’s Dilemma game ($$g<l$$) when players observe at least two actions. 
Moreover, there is an essentially unique distribution of strategies that support cooperation, according to which: (1) all agents cooperate when observing no defections, (2) all agents defect when observing at least 2 defections, (3) the normal agents defect with an average probability of $$0<q<1$$ when observing a single defection. The intuition for the result is as follows. Defection yields a direct gain that is increasing in the partner’s probability of defection (due to the game being defensive). In addition, defection results in an indirect loss because it induces future partners to defect when they observe the current defection. This indirect loss is independent of the current partner’s behaviour. One can show that there always exists a probability $$q$$ such that the above distribution of strategies balances the direct gain and the indirect loss of defection, conditional on the agent observing a single defection. Furthermore, cooperation is the unique best reply conditional on the agent observing no defections, and defection is the unique best reply conditional on the agent observing at least two defections. Next, we analyse the case of the observation of a single action ($$i.e.$$$$k=1$$). Proposition 2 shows that cooperation is a perfect equilibrium outcome in a defensive Prisoner’s Dilemma if and only if the bonus of defection is not too large (specifically, $$g\leq1)$$. The intuition is that similar arguments used to obtain the result above imply that there exists a unique mean probability $$q<1$$ by which agents defect when observing a defection in any cooperative perfect equilibrium. This implies that a deviator who always defects succeeds in getting a payoff of $$1+g$$ in a fraction $$1-q$$ of the interactions, and that such a deviator outperforms the incumbents if $$g$$ is too large. 1.3. 
Observations based on action profiles So far we have assumed that each agent observes only the partner’s (Bob’s) behaviour against other opponents, but that she cannot observe the behaviour of the past opponents against Bob. In Section 5 we relax this assumption. Specifically, we study three observation structures: the first two seem to be empirically relevant, and the third one is theoretically important since it allows us to construct an equilibrium that sustains cooperation in all Prisoner’s Dilemma games. (1) Observing conflicts: Each agent observes, in each of the $$k$$ sampled interactions of her partner, whether there was mutual cooperation ($$i.e.$$ no conflict: both partners are “happy”) or not ($$i.e.$$ partners complain about each other, but it is too costly for an outside observer to verify who actually defected). Such an observation structure (which we have not seen described in the existing literature) seems like a plausible way to capture non-verifiable feedback about the partner’s behaviour. (2) Observing action profiles: Each agent observes the full action profile in each of the sampled interactions. (3) Observing actions against cooperation: Each agent observes, in each of the sampled interactions, what action the partner took provided that the partner’s opponent cooperated. If the partner’s opponent defected then there is no information about what the partner did. It turns out that the stability of cooperation in the first two observation structures crucially depends on a novel classification of Prisoner’s Dilemma games. We say that a Prisoner’s Dilemma game is acute if $$g>\frac{l+1}{2}$$, and mild if $$g<\frac{l+1}{2}$$. The threshold between the two categories, namely, $$g=\frac{l+1}{2}$$, is characterized by the fact that the gain from a single unilateral defection is exactly half the loss incurred by the partner who is the sole cooperator. 
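The two classifications used in this introduction (offensive versus defensive from Section 1.2, and acute versus mild as just defined) depend only on the parameters $$g$$ and $$l$$. The following Python sketch makes the thresholds concrete. The function name `classify_prisoners_dilemma` and the returned labels are illustrative choices, not notation from the article.

```python
def classify_prisoners_dilemma(g: float, l: float) -> dict:
    """Classify a Prisoner's Dilemma with payoffs 1 (mutual cooperation),
    0 (mutual defection), 1+g (sole defector) and -l (sole cooperator).

    Hypothetical helper: the name and the label strings are assumptions
    made for illustration only.
    """
    # Parameter restrictions stated in Section 1.1.
    if not (g > 0 and l > 0 and g < l + 1):
        raise ValueError("the model requires g, l > 0 and g < l + 1")

    # Offensive: g > l. Defensive: g < l. The case g = l is the knife-edge.
    if g > l:
        incentive = "offensive"
    elif g < l:
        incentive = "defensive"
    else:
        incentive = "knife-edge"

    # Acute: g > (l + 1) / 2. Mild: g < (l + 1) / 2.
    threshold = (l + 1) / 2
    if g > threshold:
        severity = "acute"
    elif g < threshold:
        severity = "mild"
    else:
        severity = "threshold"

    return {"incentive": incentive, "severity": severity}


example = classify_prisoners_dilemma(g=1.8, l=2.0)
```

In this example the game is defensive (since $$1.8<2$$) and acute (since $$1.8>1.5$$), which illustrates that the two classifications are distinct.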
Consider a setup in which an agent is deterred from unilaterally defecting because it induces future partners to unilaterally defect against the agent with some probability. Deterrence in acute Prisoner’s Dilemmas requires this probability to be more than 50%, while a probability of below 50% is enough to deter deviations in mild PDs. Figure 1 (in Section 5.2) illustrates the classification of Prisoner’s Dilemma games.

Figure 1. Classification of Prisoner’s Dilemma games

Our next results (Theorems 3 and 4) show that in both observation structures (conflicts or action profiles, and any $$k\geq2$$) cooperation is a perfect equilibrium outcome if and only if the underlying Prisoner’s Dilemma game is mild. Moreover, cooperation is supported by essentially the same unique behaviour as in Theorem 2.

The intuition for why cooperation cannot be sustained in acute games with observation of conflicts is as follows. To support cooperation agents should be deterred from defecting against cooperators. As discussed above, in acute games, such deterrence requires that each such defection induce future partners to defect with a probability of at least 50%. However, this requirement implies that defection is contagious: each defection by an agent makes it possible that future partners observe a conflict both when being matched with the defecting agent and when being matched with the defecting agent’s partner. Such future partners defect with a probability of at least 50% when making such observations. Thus the fraction of defections grows steadily, until all normal agents defect with high probability.

The intuition for why cooperation cannot be sustained in acute games with observation of action profiles is as follows.
The fact that deterring defections in acute games requires future partners to defect with a probability of at least 50% when observing a defection implies that when an agent (Alice) observes her partner (Bob) to defect against a cooperative opponent, then Bob is more likely to do so because he is a normal agent who observed his past opponent to defect than because Bob is a committed agent. This implies that Alice puts a higher probability on Bob defecting against her if she observes Bob to have defected against a partner who also defected than she does if she observes Bob to have defected against an opponent who cooperated. Thus, defecting is the unique best reply when observing the partner defect against a defector, but it removes the incentives required to support stable cooperation. Finally, we show that the third observation structure, observing actions against cooperation, is optimal in the sense that it sustains cooperation as a perfect equilibrium outcome for any Prisoner’s Dilemma game (Theorem 5). The intuition for this result is that not allowing Alice to observe Bob’s behaviour against a defector helps to sustain cooperation because it implies that defecting against a defector does not have any negative indirect effect (in any steady state) because it is never observed by future opponents. This encourages agents to defect against partners who are more likely to defect (regardless of the values of $$g$$ and $$l$$). 1.4. Conventional model and unrestricted strategies In Supplementary Appendix A, we relax the assumption that agents are restricted to choosing only stationary strategies. We present a conventional model of repeated games with random matching that differs from the existing literature only by our introducing a few committed agents. We show that this difference is sufficient to yield most of our key results. 
Specifically, the characterization of the conditions under which cooperation can be sustained as a perfect equilibrium outcome (as summarized in Table 1 in Section 5.3) holds also when agents are not restricted to stationary strategies, and even when agents observe the most recent past actions of the partner. On the other hand, the relaxation of the stationarity assumption in Supplementary Appendix A weakens the uniqueness results of the main model in two respects: (1) rather than showing that defection is the unique equilibrium outcome in offensive games, we show only that it is impossible to sustain full cooperation in such games; and (2) while a variant of the simple strategy of the main model still supports cooperation when the set of strategies is unrestricted, we are no longer able to show that this strategy is the unique way to support full cooperation. 1.5. Structure Section 2 presents the model. Our solution concept is described in Section 3. Section 4 studies the observation of actions. Section 5 extends the model to deal with general observation structures. We discuss the related literature in Section 6, and conclude in Section 7. Supplementary Appendix A adapts our key result to a conventional model with an unrestricted set of strategies. Supplementary Appendix B discusses our empirical predictions. Supplementary Appendix C presents technical definitions. In Supplementary Appendix D we present the refinements of strict perfection, evolutionary stability, and robustness. The formal proofs appear in Supplementary Appendix E. Supplementary Appendix F studies the introduction of cheap talk to our setup. 2. Stationary Model 2.1. Environment We model an environment in which patient agents in a large population are randomly matched in each round to play a two-player symmetric one-shot game. 
For tractability we assume throughout the article that the population is a continuum.5 We further assume that the agents are infinitely lived and do not discount the future ($$i.e.$$ they maximize the average per-round long-run payoff). Alternatively, our model can be interpreted as representing interactions between finitely lived agents who belong to infinitely lived dynasties, such that an agent who dies is succeeded by a protégé who plays the same strategy as the deceased mentor, and each agent observes $$k$$ random actions played by the partner’s dynasty. Before playing the game, each agent (she) privately observes $$k$$ random actions that her partner (he) played against other opponents in the past. As described in detail below, agents are restricted to using only stationary strategies, such that each agent’s behaviour depends only on the signal about the partner, and not on the agent’s own past play or on time. Thus, if all agents observe signals that come from a stationary distribution then the agents’ behaviour will result in a well-defined aggregate distribution of actions that is also stationary. We focus on steady states of the population, in which the distribution of actions, and hence the distribution of signals, is indeed stationary. In such steady states, the $$k$$ actions that an agent observes about her partner are drawn independently from the partner’s stationary distribution of actions. This sampling procedure may be interpreted as the limit of a process in which each agent randomly observes $$k$$ actions that are uniformly sampled from the last $$n$$ interactions of the partner, as $$n\rightarrow\infty$$. To simplify the notation, we assume that the underlying game has two actions, though all our concepts are applicable to games with any finite number of actions. 
An environment is a pair $$E=\left(G,k\right)$$, where $$G=\left(A=\left\{ c,d\right\} ,\pi\right)$$ is a two-player symmetric normal-form game, and $$k\in\mathbb{N}$$ is the number of observed actions. Let $$\pi:A\times A\rightarrow\mathbb{R}$$ be the payoff function of the underlying game. We refer to action $$c$$ (resp., $$d$$) as cooperation (resp., defection), since we will focus on the Prisoner’s Dilemma in our results. Let $$\Delta\left(A\right)$$ denote the set of mixed actions (distributions over $$A$$), and let $$\pi$$ be extended to mixed actions in the usual linear way. We use the letter $$a$$ (resp., $$\alpha$$) to denote a typical pure (mixed) action. With a slight abuse of notation let $$a\in A$$ also denote the element in $$\Delta\left(A\right)$$ that assigns probability 1 to $$a$$. We adopt this convention for all probability distributions throughout the article. 2.2. Stationary strategy The signal observed about the partner is the number of times he played each action $$a\in A$$ in the sample of $$k$$ observed actions. Let $$M=\left\{ 0,...,k\right\}$$ denote the set of feasible signals, where signal $$m\in M$$ is interpreted as the number of times that the partner defected in the sampled $$k$$ observations.6 Given a distribution of actions $$\alpha\in\Delta\left(A\right)$$ and an environment $$E=\left(G,k\right)$$, let $$\nu_{\alpha}\left(m\right)$$ be the probability of an agent observing signal $$m$$ conditional on being matched with a partner who plays on average the distribution of actions $$\alpha$$. That is, $$\nu\left(\alpha\right):=\nu_{\alpha}\in\Delta\left(M\right)$$ is a binomial signal distribution that describes a sample of $$k$$ i.i.d. 
actions, where each action is distributed according to $$\alpha$$:

$$\forall m\in M,\,\,\,\,\nu_{\alpha}\left(m\right)=\frac{k!\cdot\left(\alpha\left(d\right)\right)^{m}\cdot\left(\alpha\left(c\right)\right)^{\left(k-m\right)}}{m!\cdot\left(k-m\right)!}.$$ (1)

A stationary strategy (henceforth, strategy) is a mapping $$s:M\rightarrow\Delta\left(A\right)$$ that assigns a mixed action to each possible signal. Let $$s_{m}\in\Delta\left(A\right)$$ denote the mixed action assigned by strategy $$s$$ after observing signal $$m$$. That is, for each action $$a\in A$$, $$s_{m}\left(a\right)=s\left(m\right)\left(a\right)$$ is the probability that a player who follows strategy $$s$$ plays action $$a$$ after observing signal $$m$$. We also let $$a$$ denote the strategy $$s$$ that plays action $$a$$ regardless of the signal, i.e. $$s_{m}\left(a\right)=1$$ for all $$m\in M$$. Strategy $$s$$ is totally mixed if, for each action $$a\in A$$ and signal $$m\in M$$, $$s_{m}\left(a\right)>0$$. Let $$\mathcal{S}$$ denote the set of all strategies. Given strategy $$s$$ and distribution of signals $$\nu\in\Delta\left(M\right)$$, let $$s\left(\nu\right)\in\Delta\left(A\right)$$ be the distribution of actions played by an agent who follows strategy $$s$$ and observes a signal sampled from $$\nu$$:

$$\forall a\in A,\,\,\,\,s\left(\nu\right)\left(a\right)=\sum_{m\in M}\nu\left(m\right)\cdot s_{m}\left(a\right).$$

### 2.3. Signal profile and steady state

Fix an environment and a finite set of strategies $$S$$. A signal profile $$\theta:S\rightarrow\Delta\left(M\right)$$ is a function that assigns a distribution of signals for each strategy in $$S$$. We interpret $$\theta_{s}\left(m\right)$$ as the probability that signal $$m$$ is observed when a partner playing strategy $$s$$ is encountered. Let $$O_{S}$$ be the set of all signal profiles defined over $$S$$.
Given a strategy $$\sigma\in\Delta\left(S\right)$$ and a signal profile $$\theta\in O_{S}$$, let $$\theta_{\sigma}\in\Delta\left(M\right)$$ be the average distribution of signals in the population, $$i.e.$$$$\theta_{\sigma}\left(m\right):=\sum_{s\in S}\sigma\left(s\right)\cdot\theta_{s}\left(m\right)$$. We say that a signal profile $$\theta:S\rightarrow\Delta\left(M\right)$$ is consistent with distribution of strategies $$\sigma\in\Delta\left(S\right)$$ if   $$\forall m\in M,\,\,s\in S,\,\,\,\,\theta_{s}\left(m\right)=\nu\left(s\left(\theta_{\sigma}\right)\right)\left(m\right).$$ (2) The interpretation of the consistency requirement is that a population of agents who follow the distribution of strategies $$\sigma$$ and observe signals about the partners sampled from the profile $$\theta$$ have to behave in a way that induces the same profile of signal distributions $$\theta$$. Specifically, when Alice, who follows strategy $$s$$, is being matched with a random partner whose strategy is sampled according to $$\sigma$$, she observes a random signal according to the “current” average distribution of signals in the population $$\theta_{\sigma}$$. As a result her distribution of actions is $$s\left(\theta_{\sigma}\right)$$, and thus her behaviour induces the signal distribution $$\nu\left(s\left(\theta_{\sigma}\right)\right)$$. Consistency requires that this induced signal distribution coincide with $$\theta_{s}$$. A steady state is a triple consisting of (1) a finite set of strategies $$S$$ interpreted as the strategies that are played by the agents in the population, (2) a distribution $$\sigma$$ over $$S$$ interpreted as a description of the fraction of agents following each strategy, and (3) a consistent signal profile $$\theta:S\rightarrow\Delta\left(M\right)$$. Formally: Definition 1. 
A steady state (or state for short) of an environment $$\left(G,k\right)$$ is a triple $$\left(S,\sigma,\theta\right)$$ where $$S\subseteq\mathcal{S}$$ is a finite set of strategies, $$\sigma\in\Delta\left(S\right)$$ is a distribution with full support over $$S$$, and $$\theta:S\rightarrow\Delta\left(M\right)$$ is a consistent signal profile. When the set of strategies is a singleton, $$i.e.$$$$S=\left\{ s\right\}$$, we omit the degenerate distribution assigning a mass of one to $$s$$, and we write the steady state as a pair $$\left(\left\{ s\right\} ,\theta\right).$$ We adopt this convention, of omitting reference to degenerate distributions, throughout the article. A standard argument shows that any distribution of strategies admits a consistent signal profile (Lemma 1 in Supplementary Appendix C). Some distributions induce multiple consistent profiles of signal distributions. For example, suppose that $$k=3$$, and everyone follows the strategy of playing the most frequently observed action ($$i.e.$$ defecting iff $$m\geq2$$). In this setting there are three consistent signal profiles: one in which everyone cooperates, one in which everyone defects, and one in which everyone plays (on average) uniformly.7 2.4. Perturbed environment As discussed in the Introduction, and as argued by Ellison (1994, p. 578), it seems implausible that in large populations all agents are rational and know exactly the strategies played by other agents in the community. Motivated by this observation, we introduce the notion of a perturbed environment in which a small fraction of agents in the population are committed to playing specific strategies, even though these strategies are not necessarily payoff-maximizing. 
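The multiplicity in the $$k=3$$ majority-rule example of Section 2.3 can be checked numerically. When everyone follows the same strategy, consistency reduces to a fixed-point condition on the population defection probability $$p$$: the probability that at least two of three sampled actions are defections must equal $$p$$ itself. The sketch below uses a hypothetical helper name, and the candidate values are taken from the three profiles described in the text.

```python
from math import comb


def majority_response(p: float, k: int = 3) -> float:
    """Probability that a strict majority of k i.i.d. sampled actions are
    defections when each sampled action is a defection with probability p.
    This is the defection probability of an agent who plays the most
    frequently observed action."""
    majority = k // 2 + 1
    return sum(
        comb(k, m) * p**m * (1 - p) ** (k - m) for m in range(majority, k + 1)
    )


# Candidate steady states: everyone cooperates, everyone plays uniformly
# on average, everyone defects.
candidates = [0.0, 0.5, 1.0]
residuals = [abs(majority_response(p) - p) for p in candidates]

# A value that is not a fixed point, for contrast.
off_fixed_point_gap = abs(majority_response(0.3) - 0.3)
```

For $$k=3$$ the condition reads $$p=3p^{2}-2p^{3}$$, whose solutions are exactly $$0$$, $$\frac{1}{2}$$, and $$1$$, so the residuals above are zero while the gap at $$p=0.3$$ is not.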
A perturbed environment is a tuple consisting of (1) an environment, (2) a distribution $$\lambda$$ over a set of commitment strategies $$S^{C}$$ that includes a totally mixed strategy, and (3) a number $$\epsilon$$ representing the share of agents who are committed to playing strategies in $$S^{C}$$ (henceforth, committed agents). The remaining $$1-\epsilon$$ share of the agents can play any strategy in $$\mathcal{S}$$ (henceforth, normal agents). Formally: Definition 2. A perturbed environment is a tuple $$E_{\epsilon}=\left(\left(G,k\right),\left(S^{C},\lambda\right),\epsilon\right)$$, where $$G$$ is the underlying game, $$k\in\mathbb{N}$$ is the number of observed actions, $$S^{C}$$ is a non-empty finite set of strategies (called, commitment strategies) that includes a totally mixed strategy, $$\lambda\in\Delta\left(S^{C}\right)$$ is a distribution with full support over the commitment strategies, and $$\epsilon\geq0$$ is the mass of committed agents in the population. We require $$S^{C}$$ to include at least one totally mixed strategy because we want all signals to be observed with positive probability in a perturbed environment when $$\epsilon>0$$. (This is analogous to the requirement in Selten, 1975, that all actions be played with positive probability in the perturbations defining a perfect equilibrium.) Throughout the article we look at the limit in which the share of committed agents, $$\epsilon$$, converges to zero. This is the only limit taken in the article. We use the notation of $$O\left(\epsilon\right)$$ (resp., $$O\left(\epsilon^{2}\right)$$) to refer to functions that are in the order of magnitude of $$\epsilon$$ (resp., $$\epsilon^{2}$$), $$i.e.$$$$\frac{f\left(\epsilon\right)}{\epsilon}\rightarrow_{\epsilon\rightarrow0}0$$ (resp., $$\frac{f\left(\epsilon\right)}{\epsilon^{2}}\rightarrow_{\epsilon\rightarrow0}0$$). We refer to $$\left(S^{C},\lambda\right)$$ as a distribution of commitments. 
With a slight abuse of notation, we identify an unperturbed environment$$\left(\left(G,k\right),\left(S^{C},\lambda\right),\epsilon=0\right)$$ with the equivalent environment $$\left(G,k\right)$$. Remark 1. To simplify the presentation, the definition of perturbed environment includes only commitment strategies, and it does not allow “trembling hand” mistakes. As discussed in Remark 6 in Section 4.3, the results also hold in a setup in which agents also tremble, as long as the probability by which a normal agent trembles is of the same order of magnitude as the frequency of committed agents. One of our main results (Theorem 1) requires an additional mild assumption on the perturbed environment that rules out the knife-edge case in which all agents (committed and non-committed alike) behave exactly the same. Specifically, a set of commitments is regular if for each distribution of actions $$\alpha$$, there exists a committed strategy $$s$$ that does not play distribution $$\alpha$$ when observing the signal distribution induced by $$\alpha$$. Formally: Definition 3. A set of commitment strategies $$S^{C}$$ is regular if for each distribution of actions $$\alpha\in\Delta\left(A\right)$$, there exists a strategy $$s\in S^{C}$$ such that $$s_{\nu\left(\alpha\right)}\neq\alpha$$. If the set of commitments is regular, then we say that the distribution $$\left(S^{C},\lambda\right)$$ and the perturbed environment $$\left(\left(G,k\right),\left(S^{C},\lambda\right),\epsilon\right)$$ are regular. An example of a regular set of commitments is the set that includes strategies $$s\equiv\alpha_{1}$$ and $$s'\equiv\alpha_{2}$$ that induce agents to play mixed actions $$\alpha_{1}\neq\alpha_{2}$$ regardless of the observed signal. 2.5. 
Steady state in a perturbed environment Fix a perturbed environment $$E_{\epsilon}=\left(\left(G,k\right),\left(S^{C},\lambda\right),\epsilon\right)$$ and a finite set of strategies $$S^{N}$$, interpreted as the strategies followed by the normal agents in the population. We redefine a signal profile $$\theta:S^{C}\cup S^{N}\rightarrow\Delta\left(M\right)$$ as a function that assigns a binomial distribution of signals to each strategy in $$S^{C}\cup S^{N}$$. Given a distribution over strategies of the normal agents $$\sigma\in\Delta\left(S^{N}\right)$$ and a signal profile $$\theta\in O_{S^{C}\cup S^{N}}$$, let $$\theta_{\left(\left(1-\epsilon\right)\cdot\sigma+\epsilon\cdot\lambda\right)}\in\Delta\left(M\right)$$ be the average distribution of signals in the population, $$i.e.$$$$\theta_{\left(\left(1-\epsilon\right)\cdot\sigma+\epsilon\cdot\lambda\right)}\left(m\right):=\sum_{s\in S^{C}\cup S^{N}}\left(\left(1-\epsilon\right)\cdot\sigma+\epsilon\cdot\lambda\right)\left(s\right)\cdot\theta_{s}\left(m\right)$$. We adapt the definitions of a consistent signal profile and of a steady state to perturbed environments. This straightforward adaptation is presented in detail in Supplementary Appendix C. The following example demonstrates a specific steady state in a specific perturbed environment. The example is intended to clarify the various definitions of this section and, in particular, the consistency requirement. Later, we revisit the same example to explain the essentially unique perfect equilibrium that supports cooperation. Example 1. Consider the perturbed environment $$\left(\left(G,k=2\right),\left(\left\{ s^{u}\equiv0.5\right\} \right),\epsilon\right)$$, in which each agent observes two of her partner’s actions, there is a single commitment strategy, denoted by $$s^{u}$$, which is followed by a fraction $$0<\epsilon<<1$$ of committed agents, who choose each action with probability $$0.5$$ regardless of the observed signal. 
Let $$\left(S=\left\{ s^{1},s^{2}\right\} ,\sigma=\left(\frac{1}{6},\frac{5}{6}\right),\theta\right)$$ be the following steady state. The state includes two normal strategies: $$s^{1}$$ and $$s^{2}$$. The strategy $$s^{1}$$ defects iff $$m\geq1$$, and the strategy $$s^{2}$$ defects iff $$m\geq2$$. The distribution $$\sigma$$ assigns a mass of $$\frac{1}{6}$$ to $$s^{1}$$ and a mass of $$\frac{5}{6}$$ to $$s^{2}$$. The consistent signal profile $$\theta$$ is defined as follows (neglecting terms of $$O\left(\epsilon^{2}\right)$$ throughout the example):   $$\theta_{s^{u}}\left(m\right)=\begin{cases} 25\% & \text{if }m=0\\[3pt] 50\% & \text{if }m=1\\[3pt] 25\% & \text{if }m=2, \end{cases}\,\,\,\,\,\theta_{s^{1}}\left(m\right)=\begin{cases} 1-3.5\cdot\epsilon & \text{if }m=0\\[3pt] 3.5\cdot\epsilon & \text{if }m=1\\[3pt] 0 & \text{if }m=2 \end{cases}\,\,\,\,\,\theta_{s^{2}}\left(m\right)=\begin{cases} 1-0.5\cdot\epsilon & \text{if }m=0\\[3pt] 0.5\cdot\epsilon & \text{if }m=1\\[3pt] 0 & \text{if }m=2. \end{cases}$$ (3) To confirm the consistency of $$\theta$$, we have first to calculate the average distribution of signals in the population:   $\theta_{\left(\left(1-\epsilon\right)\cdot\sigma+\epsilon\cdot\lambda\right)}\left(m\right)=\begin{cases} 1-1.75\cdot\epsilon & \text{if }m=0\\[3pt] 1.5\cdot\epsilon & \text{if }m=1\\[3pt] 0.25\cdot\epsilon & \text{if }m=2. \end{cases}$ Using $$\theta_{\left(\left(1-\epsilon\right)\cdot\sigma+\epsilon\cdot\lambda\right)}$$, we confirm the consistency of $$\theta_{s^{1}}$$ and $$\theta_{s^{2}}$$ (the consistency of $$\theta_{s^{u}}$$ is immediate). 
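The numbers in Example 1 can also be reproduced numerically. The following Python sketch is an illustration only: the function names, the constant `K`, and the naive fixed-point iteration are assumptions introduced here and are not part of the model. It iterates the map from each strategy's defection probability to the induced binomial signal distributions until the profile is (numerically) consistent, and then reports the coefficients of $$\epsilon$$.

```python
from math import comb

K = 2  # number of observed actions in Example 1

def binomial(k, p):
    """Distribution of the number of defections among k i.i.d. observations."""
    return [comb(k, m) * p**m * (1 - p)**(k - m) for m in range(k + 1)]

# Probability of defecting after observing m = 0, 1, 2 defections.
STRATEGIES = {
    "s1": [0.0, 1.0, 1.0],  # defects iff m >= 1
    "s2": [0.0, 0.0, 1.0],  # defects iff m >= 2
    "su": [0.5, 0.5, 0.5],  # commitment strategy, ignores the signal
}

def population_weights(eps):
    """(1 - eps) * sigma on the normal strategies plus eps on the commitment."""
    return {"s1": (1 - eps) / 6, "s2": (1 - eps) * 5 / 6, "su": eps}

def average_signals(theta, weights):
    """Average distribution of signals in the population."""
    return [sum(weights[n] * theta[n][m] for n in theta) for m in range(K + 1)]

def consistent_profile(eps, iterations=200):
    """Naive fixed-point iteration (an assumption of this sketch)."""
    weights = population_weights(eps)
    p = {n: (0.5 if n == "su" else 0.0) for n in STRATEGIES}
    for _ in range(iterations):
        theta = {n: binomial(K, p[n]) for n in STRATEGIES}
        avg = average_signals(theta, weights)
        # Defection probability of each strategy against a random partner.
        p = {n: sum(avg[m] * STRATEGIES[n][m] for m in range(K + 1))
             for n in STRATEGIES}
    theta = {n: binomial(K, p[n]) for n in STRATEGIES}
    return theta, average_signals(theta, weights)

eps = 1e-3
theta, avg = consistent_profile(eps)
print("average signals, coefficients of eps:", avg[1] / eps, avg[2] / eps)
print("theta_s1(1)/eps:", theta["s1"][1] / eps)
print("theta_s2(1)/eps:", theta["s2"][1] / eps)
```

For small $$\epsilon$$ the printed ratios should be close to the coefficients stated in the example, up to the neglected higher-order terms.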
The confirmation proceeds by calculating the distribution of actions played by a player following strategy $$s^{i}$$ who observes the distribution of actions of a random partner:   $\begin{array}{ccc} s^{1}\left(\theta_{\left(\left(1-\epsilon\right)\cdot\sigma+\epsilon\cdot\lambda\right)}\right)\left(c\right) & = & 1-1.75\cdot\epsilon\\ s^{1}\left(\theta_{\left(\left(1-\epsilon\right)\cdot\sigma+\epsilon\cdot\lambda\right)}\right)\left(d\right) & = & 1.75\cdot\epsilon \end{array}\,\,\,\,\,\,\,\,\,\begin{array}{ccc} s^{2}\left(\theta_{\left(\left(1-\epsilon\right)\cdot\sigma+\epsilon\cdot\lambda\right)}\right)\left(c\right) & = & 1-0.25\cdot\epsilon,\\ s^{2}\left(\theta_{\left(\left(1-\epsilon\right)\cdot\sigma+\epsilon\cdot\lambda\right)}\right)\left(d\right) & = & 0.25\cdot\epsilon. \end{array}$ Note that $$s^{1}\left(\theta_{\left(\left(1-\epsilon\right)\cdot\sigma+\epsilon\cdot\lambda\right)}\right)\left(d\right)=1-\theta_{\left(\left(1-\epsilon\right)\cdot\sigma+\epsilon\cdot\lambda\right)}\left(2\cdot c\right)$$ and $$s^{2}\left(\theta_{\left(\left(1-\epsilon\right)\cdot\sigma+\epsilon\cdot\lambda\right)}\right)\left(d\right)=\theta_{\left(\left(1-\epsilon\right)\cdot\sigma+\epsilon\cdot\lambda\right)}\left(2\cdot d\right)$$. The final step in showing that $$\theta$$ is a consistent profile is the observation that each $$\theta_{s^{i}}$$ coincides with the binomial distribution that is induced by $$s^{i}\left(\theta_{\left(\left(1-\epsilon\right)\cdot\sigma+\epsilon\cdot\lambda\right)}\right)$$. 2.6. Discussion of the model Our model differs from most of the existing literature on community enforcement in three key dimensions (see, $$e.g.$$ Kandori, 1992; Ellison, 1994; Dixit, 2003; Deb, 2017; Deb and González-Díaz, 2014). In what follows we discuss these three key differences, and their implications for our results. (1) The presence of a few committed agents. 
If one removes the commitment types from our setup, then one can show (by using belief-free equilibria, as in Takahashi, 2010) that: (1) it is always possible to support full cooperation as an equilibrium outcome, and (2) there are various strategies that sustain full cooperation. The results of this article show that the introduction of a few committed agents, regardless of how they behave, implies very different results: (1) defection is the unique equilibrium payoff in offensive Prisoner’s Dilemmas (Theorem 1), and (2) there is an essentially unique strategy combination that supports a cooperative equilibrium in defensive Prisoner’s Dilemmas. The intuition is that the presence of committed agents implies that observation of past actions must have some influence on the likely behaviour of the partner in the current match (more detailed discussions of this issue follow Theorem 1 and Remark 10). (2) Restriction to Stationary Strategies. In our model, we restrict agents to using stationary strategies that condition only on the number of times they observed each of the partner’s actions being played in past interactions. We allow agents to condition their play neither on the order in which the observed actions were played in the past, nor on the agent’s own history of play, nor on calendar time. The assumption simplifies the presentation of the model and results. In addition, the assumption allows us to achieve uniqueness results that might not hold without stationarity (as discussed in Section A.3). (3) Not having a “global time zero.” Most of the existing literature represents interactions within a community as a repeated game that has a “global time zero”, in which the first ever interaction takes place. In many real-life situations, the interactions within a community began a long time ago and have continued, via overlapping generations, to the present day. 
It seems implausible that today’s agents condition their behaviour on what happened in the remote past (or on calendar time). For example, trade interactions have been taking place from time immemorial. It seems unreasonable to assume that Alice’s behaviour today is conditioned on what transpired in some long-forgotten time $$t=0$$, when, say, two hunter-gatherers were involved in the first ever trade. We suggest that, even though real-world interactions obviously begin at some definite date, a good way of modelling what the interacting agents think about the situation may be to get rid of global time zero and focus on strategies that do not condition on what happened in the remote past. The lack of a global time zero is the reason why, unlike in repeated games, a distribution of strategies does not uniquely determine the behaviour and the payoffs of the agents, so that one must explicitly add the consistent signal profile $$\theta$$ as part of the description of the state of the population. It is possible to interpret a steady state $$\left(S,\sigma,\theta\right)$$ as a kind of initial condition for society, in which agents already have a long-existing past. That is, we begin our analysis of community interaction at a point in time when agents have for a long time followed the strategy distribution $$\left(S,\sigma\right)$$ yielding the consistent signal profile $$\theta$$. We then ask whether any patient agent has a profitable deviation from her strategy. If not, then the steady state $$\left(S,\sigma,\theta\right)$$ is likely to persist. This approach stands in contrast to the standard approach that studies whether or not agents have a profitable deviation at a time $$t>>1$$ following a long history that started with the first ever interaction at $$t=0$$. In Supplementary Appendix A, we present a conventional repeated game model that differs from the existing literature in only one key aspect: the presence of a few committed agents. 
In particular, this alternative model features standard calendar time, and agents discount the future, observe the most recent past actions of the partner, and are not limited to choosing only stationary strategies. We show that most of our results hold also in this setup. We feel that this alternative model, while being closer to the existing literature than the main model, suffers from added technical complexity that may hinder the model from being insightful and accessible. 3. Solution Concept 3.1. Long-run payoff In this subsection, we define the long-run average (per-round) payoff of a patient agent who follows a stationary strategy $$s$$, given a steady state $$\left(S^{N},\sigma,\theta\right)$$ of a perturbed environment $$\left(\left(G,k\right),\left(S^{C},\lambda\right),\epsilon\right)$$. The same definition, when taking $$\epsilon=0$$, holds for an unperturbed environment. We begin by extending the definition of a consistent signal profile $$\theta$$ to non-incumbent strategies. For each non-incumbent strategy $$\hat{s}\in\mathcal{S}\backslash\left(S^{N}\cup S^{C}\right)$$, define $$\theta\left(\hat{s}\right)=\theta_{\hat{s}}$$ as the distribution of signals induced by a deviating agent who follows strategy $$\hat{s}$$ and observes the distribution of signals induced by a random partner in the population (sampled according to $$\left(1-\epsilon\right)\cdot\sigma\left(s'\right)+\epsilon\cdot\lambda\left(s'\right)$$). 
That is, for each strategy $$\hat{s}\in\mathcal{S}\backslash\left(S^{N}\cup S^{C}\right)$$, and each signal $$m\in M$$, we define   $\theta_{\hat{s}}\left(m\right)=\left(\nu\left(\hat{s}\left(\theta_{\left(\left(1-\epsilon\right)\cdot\sigma+\epsilon\cdot\lambda\right)}\right)\right)\right)\left(m\right).$ We define the long-run payoff of an agent who follows an arbitrary strategy $$s\in\mathcal{S}$$ as:   $$\pi_{s}\left(S^{N},\sigma,\theta\right)=\sum_{s'\in S^{N}\cup S^{C}}\left(\left(1-\epsilon\right)\cdot\sigma\left(s'\right)+\epsilon\cdot\lambda\left(s'\right)\right)\cdot\left(\sum_{\left(a,a'\right)\in A\times A}s_{\theta\left(s'\right)}\left(a\right)\cdot s'_{\theta\left(s\right)}\left(a'\right)\cdot\pi\left(a,a'\right)\right).$$ (4) Equation (4) is straightforward. The inner (right-hand) sum ($$i.e.$$ $$\sum_{\left(a,a'\right)\in A\times A}s_{\theta\left(s'\right)}\left(a\right)\cdot s'_{\theta\left(s\right)}\left(a'\right)\cdot\pi\left(a,a'\right)$$) calculates the expected payoff of Alice who follows strategy $$s$$ conditional on being matched with a partner who follows strategy $$s'$$. The outer sum weighs these conditional expected payoffs according to the frequency of each incumbent strategy $$s'$$ ($$i.e.$$ $$\left(\left(1-\epsilon\right)\cdot\sigma\left(s'\right)+\epsilon\cdot\lambda\left(s'\right)\right)$$), which yields the expected payoff of Alice against a random partner in the population. Let $$\pi\left(S^{N},\sigma,\theta\right)$$ be the average payoff of the normal agents in the population:   $\pi\left(S^{N},\sigma,\theta\right)=\sum_{s\in S^{N}}\sigma\left(s\right)\cdot\pi_{s}\left(S^{N},\sigma,\theta\right).$ 3.2. Nash and perfect equilibrium A steady state is a Nash equilibrium if no agent can obtain a higher payoff by a unilateral deviation. Formally: Definition 4. 
The steady state $$\left(S^{N},\sigma,\theta\right)$$ of perturbed environment $$\left(\left(G,k\right),\left(S^{C},\lambda\right),\epsilon\right)$$ is a Nash equilibrium if for each strategy $$s\in\mathcal{S}$$, it is the case that $$\pi_{s}\left(S^{N},\sigma,\theta\right)\leq\pi\left(S^{N},\sigma,\theta\right)$$. Note that the $$1-\epsilon$$ normal agents in such a Nash equilibrium must obtain the same maximal payoff. That is, each normal strategy $$s\in S^{N}$$ satisfies $$\pi_{s}\left(S^{N},\sigma,\theta\right)=\pi\left(S^{N},\sigma,\theta\right)\geq\pi_{s'}\left(S^{N},\sigma,\theta\right)$$ for each strategy $$s'\in\mathcal{S}$$. However, the $$\epsilon$$ committed agents may obtain lower payoffs. A steady state is a (regular) perfect equilibrium if it is the limit of Nash equilibria of (regular) perturbed environments when the frequency of the committed agents converges to zero. Formally (where the standard definitions of convergence of strategies, distributions and states is presented in Supplementary Appendix C): Definition 5. A steady state $$\left(S^{*},\sigma^{*},\theta^{*}\right)$$ of the environment $$\left(G,k\right)$$ is a (regular) perfect equilibrium if there exist a (regular) distribution of commitments $$\left(S^{C},\lambda\right)$$ and converging sequences $$\left(S_{n}^{N},\sigma_{n},\theta_{n}\right)_{n}\rightarrow_{n\rightarrow\infty}\left(S^{*},\sigma^{*},\theta^{*}\right)$$ and $$\left(\epsilon_{n}>0\right)_{n}\rightarrow_{n\rightarrow\infty}0$$, such that for each $$n$$, the state $$\left(S_{n}^{N},\sigma_{n},\theta_{n}\right)$$ is a Nash equilibrium of the perturbed environment $$\left(\left(G,k\right),\left(S^{C},\lambda\right),\epsilon_{n}\right)$$. In this case, we say that $$\left(S^{*},\sigma^{*},\theta^{*}\right)$$ is a (regular) perfect equilibrium with respect to distribution of commitments $$\left(S^{C},\lambda\right)$$. If $$\theta^{*}\equiv a$$, we say that action $$a\in A$$ is a (regular) perfect equilibrium action. 
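The long-run payoff of equation (4) and the inequality in Definition 4 can be sketched in code. The following Python fragment is a minimal illustration under stated assumptions: there are two actions, `"c"` and `"d"`; a stationary strategy is represented as a list whose entry `m` is the probability of playing `"d"` after observing `m` occurrences of `"d"`; and all function names, the dictionary layout, and the payoff numbers in the usage lines are hypothetical choices made here. The check covers only a finite list of candidate deviations, so it is not a full verification over $$\mathcal{S}$$.

```python
from math import comb

def binomial(k, p):
    """nu(.): distribution of the number of observed "d" among k i.i.d. draws."""
    return [comb(k, m) * p**m * (1 - p)**(k - m) for m in range(k + 1)]

def defect_prob(strategy, signal_dist):
    """Probability of playing "d" when the signal m is drawn from signal_dist."""
    return sum(prob_m * strategy[m] for m, prob_m in enumerate(signal_dist))

def long_run_payoff(s, incumbents, weights, theta, pi):
    """Equation (4). weights[name] stands for (1-eps)*sigma + eps*lambda,
    theta[name] is the signal distribution of incumbent `name`, and
    pi[(a, a_partner)] is the stage-game payoff of the agent."""
    k = len(s) - 1
    avg = [sum(weights[n] * theta[n][m] for n in incumbents) for m in range(k + 1)]
    theta_s = binomial(k, defect_prob(s, avg))  # signals induced by s itself
    total = 0.0
    for name, s_prime in incumbents.items():
        d_me = defect_prob(s, theta[name])           # s observes the partner
        d_partner = defect_prob(s_prime, theta_s)    # the partner observes s
        expected = 0.0
        for a, pa in (("c", 1 - d_me), ("d", d_me)):
            for b, pb in (("c", 1 - d_partner), ("d", d_partner)):
                expected += pa * pb * pi[(a, b)]
        total += weights[name] * expected
    return total

def no_listed_deviation_profits(candidates, normal, sigma, incumbents,
                                weights, theta, pi, tol=1e-12):
    """Definition 4, restricted to the finite list `candidates`."""
    average = sum(sigma[n] * long_run_payoff(normal[n], incumbents, weights, theta, pi)
                  for n in normal)
    return all(long_run_payoff(c, incumbents, weights, theta, pi) <= average + tol
               for c in candidates)

# Usage: unperturbed state (eps = 0) with k = 2 in which everyone always plays "d".
pi = {("c", "c"): 1.0, ("c", "d"): -2.0, ("d", "c"): 2.0, ("d", "d"): 0.0}
always_d, always_c = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
incumbents = {"always_d": always_d}
weights = {"always_d": 1.0}
theta = {"always_d": [0.0, 0.0, 1.0]}
print(long_run_payoff(always_c, incumbents, weights, theta, pi))
print(no_listed_deviation_profits([always_c], incumbents, {"always_d": 1.0},
                                  incumbents, weights, theta, pi))
```

Computing $$\theta_{s}$$ from the population average for every strategy, incumbent or not, is harmless in a consistent state, because for an incumbent the result coincides with its entry in the signal profile.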
By standard arguments, any perfect equilibrium is a Nash equilibrium of the unperturbed environment. In Supplementary Appendix C.4 we show that any symmetric (perfect) Nash equilibrium of the underlying game corresponds to a (perfect) Nash equilibrium of the environment in which all normal agents ignore the observed signal. 3.3. Stronger refinements of perfect equilibrium In Supplementary Appendix D we present three refinements of perfect equilibrium: strict perfection, evolutionary stability, and robustness. The first refinement (strict perfection) is satisfied by the equilibria constructed in Proposition 1, Theorem 2, and Theorem 3. The remaining refinements (evolutionary stability and robustness) are satisfied by all the equilibria constructed in the article. The notion of perfect equilibrium might be considered too weak because it may crucially depend on a specific set of commitment strategies. The refinement of strict perfection (à la Okada, 1981) requires the equilibrium outcome to be sustained regardless of which commitment strategies are present in the population. The notion of perfect equilibrium considers only deviations by a single agent (who has mass zero in the infinite population). The refinement of an evolutionarily stable strategy (à la Maynard Smith and Price, 1973) requires stability against a group of agents with a small positive mass who jointly deviate. The outcome of a perfect equilibrium may be non-robust in the sense that small perturbations of the distribution of observed signals may induce a change of behaviour that moves the population away from the consistent signal profile. We address this issue by introducing a refinement that we call robustness, which requires that if we slightly perturb the distribution of observed signals, then the agents still play the same equilibrium outcome with a probability very close to one (in the spirit of the notion of Lyapunov stability). 4. Prisoner’s Dilemma and Observation of Actions 4.1. 
The prisoner’s dilemma Our results focus on environments in which the underlying game is the Prisoner’s Dilemma (denoted by $$G_{PD}$$), which is described in Table 2. The class of Prisoner’s Dilemma games is fully described by two positive parameters $$g$$ and $$l$$. The two actions are denoted $$c$$ and $$d$$, representing cooperation and defection, respectively. When both players cooperate they both get a high payoff (normalized to one), and when they both defect they both get a low payoff (normalized to zero). When a single player defects he obtains a payoff of $$1+g$$ ($$i.e.$$ an additional payoff of $$g$$) while his opponent gets $$-l$$. TABLE 2 Matrix payoffs of Prisoner’s Dilemma games     Following Dixit (2003) we classify Prisoner’s Dilemma games into two kinds: offensive and defensive.8 In an offensive Prisoner’s Dilemma there is a stronger incentive to defect against a cooperator than against a defector ($$i.e.$$ $$g>l$$); in a defensive PD the opposite holds ($$i.e.$$ $$l>g$$). If cooperating is interpreted as exerting high effort, then the defensive PD exhibits strategic complementarity; increasing one’s effort from low to high is less costly if the opponent exerts high effort. 4.2. Stability of defection We begin by showing that defection is a regular perfect equilibrium action in any Prisoner’s Dilemma game and for any $$k$$. Formally: Proposition 1. Let $$E=\left(G_{PD},k\right)$$ be an environment. Defection is a regular perfect equilibrium action. The intuition is straightforward. Consider any distribution of commitment strategies. Consider the steady state in which all the normal incumbents defect regardless of the observed signal. It is immediate that this strategy is the unique best reply to itself. This implies that if the share of committed agents is sufficiently small, then always defecting is also the unique best reply in the slightly perturbed environment. 
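The payoff structure of Section 4.1 and the direct incentive to defect can be written down compactly. The Python sketch below is illustrative only; the function names are hypothetical, and the numerical values of $$g$$ and $$l$$ in the usage lines are arbitrary. It builds the payoff matrix from the verbal description above, classifies a game as offensive or defensive, and computes the direct gain from defecting rather than cooperating against a partner who defects with a given probability.

```python
def pd_payoffs(g, l):
    """Payoffs (row player, column player) of the Prisoner's Dilemma
    with parameters g > 0 and l > 0, as described in the text."""
    if g <= 0 or l <= 0:
        raise ValueError("g and l must both be positive")
    return {
        ("c", "c"): (1.0, 1.0),
        ("c", "d"): (-l, 1.0 + g),
        ("d", "c"): (1.0 + g, -l),
        ("d", "d"): (0.0, 0.0),
    }

def classify(g, l):
    """Offensive if g > l, defensive if l > g, borderline if g == l."""
    if g > l:
        return "offensive"
    if l > g:
        return "defensive"
    return "borderline"

def direct_gain_from_defecting(g, l, partner_defect_prob):
    """Expected payoff of d minus expected payoff of c in the current match:
    g against a cooperator and l against a defector."""
    payoffs = pd_payoffs(g, l)
    p = partner_defect_prob
    payoff_d = (1 - p) * payoffs[("d", "c")][0] + p * payoffs[("d", "d")][0]
    payoff_c = (1 - p) * payoffs[("c", "c")][0] + p * payoffs[("c", "d")][0]
    return payoff_d - payoff_c

# Usage with arbitrary illustrative parameters.
print(classify(2.0, 1.0), classify(1.0, 2.0))
print(direct_gain_from_defecting(1.0, 2.0, 0.0),
      direct_gain_from_defecting(1.0, 2.0, 1.0))
```

The direct gain is positive for every partner behaviour, which is the sense in which defection is dominant in the stage game; in a defensive game it increases with the partner's probability of defecting, and in an offensive game it decreases.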
Our first main result shows that defection is the unique regular perfect equilibrium in offensive games. Theorem 1. Let $$E=\left(G_{PD},k\right)$$ be an environment, where $$G$$ is an offensive Prisoner’s Dilemma ($$i.e.$$$$g>l$$). If $$\left(S^{*},\sigma^{*},\theta^{*}\right)$$ is a regular perfect equilibrium, then $$S^{*}=\left\{ d\right\}$$ and $$\theta^{*}=k$$. Sketch of Proof. The payoff of a strategy can be divided into two components: (1) a direct component: defecting yields additional $$g$$ points if the partner cooperates and additional $$l$$ points if the partner defects, and (2) an indirect component: the strategy’s average probability of defection determines the distribution of signals observed by the partners, and thereby determines the partner’s probability of defecting. For each fixed average probability of defection $$q$$ the fact that the Prisoner’s Dilemma is offensive implies that the optimal strategy among all those who defect with an average probability of $$q$$ is to defect, with the maximal probability, against the partners who are most likely to cooperate. This implies that all agents who follow incumbent strategies are more likely to defect against partners who are more likely to cooperate. As a result, mutants who always defect outperform incumbents because they both have a strictly higher direct payoff (since defection is a dominant action) and a weakly higher indirect payoff (since incumbents are less likely to defect against them). ∥ Discussion of Theorem 1. The proof of Theorem 1 relies on the assumption that agents are limited to choosing only stationary strategies. The stationarity assumption implies that a partner who has been observed to defect more in the past is more likely to defect in the current match. However, this may no longer be true in a non-stationary environment. 
In Supplementary Appendix A we analyse the classic setup of repeated games, in which agents can choose non-stationary strategies and observe the opponent’s recent actions. In that setup we are able to prove a weaker version of Theorem 1 (namely, Theorem 6) which states that full cooperation cannot be supported as a perfect equilibrium outcome in offensive Prisoner’s Dilemmas ($$i.e.$$ cooperation is not a perfect equilibrium action in offensive games). Several papers in the existing literature present various mechanisms to support cooperation in any Prisoner’s Dilemma game. Kandori (1992, Theorem 1) and Ellison (1994) show that in large finite populations cooperation can be supported by contagious equilibria even when an agent does not observe any signal about her partner ($$i.e.$$$$k=0$$). In these equilibria each agent starts the game by cooperating, but she starts defecting forever as soon as any partner has defected against her. As pointed out by Ellison (1994, p. 578), if we consider a large population in which at least one “crazy” agent defects with positive probability in all rounds regardless of the observed signal, then Kandori’s and Ellison’s equilibria fail because agents assign high probability to the event that the contagion process has already begun, even after having experienced a long period during which no partner defected against them. Recently, Dilmé (2016) presented a novel “tit-for-tat”-like contagious equilibrium that is robust to the presence of committed agents, but only for the borderline case of $$g=l$$ (as discussed in Remark 7 below). Sugden (1986) and Kandori (1992, Theorem 2) show that cooperation can be a perfect equilibrium in a setup in which each player observes a binary signal about his partner, either a “good label” or a “bad label”. All players start with a good label. This label becomes bad if a player defects against a “good” partner. 
The equilibrium strategy that supports full cooperation in this setup is to cooperate against good partners and defect against bad partners. Theorem 1 reveals that the presence of a small fraction of committed agents does not allow the population to maintain such a simple binary reputation under an observation structure in which players observe an arbitrary number of past actions taken by their partners. The theorem shows this indirectly, because if it were possible to derive binary reputations from this information structure, then it would have been possible to support cooperation as a perfect equilibrium action. Moreover, Theorem 4 shows that cooperation is not a perfect equilibrium action in acute games when players observe action profiles. This suggests that the presence of a few committed agents does not allow us to maintain the seemingly simple binary reputation mechanisms of Sugden (1986) and Kandori (1992), even under observation structures in which each agent observes the whole action profile of many of her opponent’s past interactions. The mild restriction to a regular perfect equilibrium is necessary for Theorem 1 to go through. Example 5 in Supplementary Appendix G demonstrates the existence of a non-regular perfect equilibrium of an offensive PD, in which players cooperate with positive probability. This non-robust equilibrium is similar to the “belief-free” sequential equilibria that support cooperation in offensive Prisoner’s Dilemma games in Takahashi (2010), which have the property that players are always indifferent between their actions, but they choose different mixed actions depending on the signal they obtain about the partner. 4.3. Stability of cooperation in defensive Prisoner’s Dilemmas Our next result shows that if players observe at least two actions, then cooperation is a regular perfect equilibrium action in any defensive Prisoner’s Dilemma. 
Moreover, it shows that there is essentially a unique combination of strategies that supports full cooperation in the Prisoner’s Dilemma game, according to which: (1) all agents cooperate when observing no defections, (2) all agents defect when observing at least 2 defections, (3) sometimes (but not always) agents defect when observing a single defection. Theorem 2. Let $$E=\left(G_{PD},k\right)$$ be an environment with observations of actions, where $$G_{PD}$$ is a defensive Prisoner’s Dilemma ($$g<l$$), and $$k\geq2$$. (1)If $$\left(S^{*},\sigma^{*},\theta^{*}\equiv0\right)$$ is a perfect equilibrium then: (a) for each $$s\in S^{*}$$, $$s_{0}\left(c\right)=1$$ and $$s_{m}\left(d\right)=1$$ for each $$m\geq2$$; and (b) there exist $$s,s'\in S^{*}$$ such that $$s_{1}\left(d\right)<1$$ and $$s'_{1}\left(d\right)>0$$. (2)Cooperation is a regular perfect equilibrium action. Sketch of Proof. Suppose that $$\left(S^{*},\sigma^{*},\theta^{*}\equiv0\right)$$ is a perfect equilibrium. The fact that the equilibrium induces full cooperation, in the limit when the mass of commitment strategies converges to zero, implies that all normal agents must cooperate when they observe no defections, $$i.e.$$$$s_{0}\left(c\right)=1$$ for each $$s\in S^{*}$$. Next we show that there is a normal strategy that induces the agent to defect with positive probability when observing a single defection, $$i.e.$$$$s_{1}\left(d\right)>0$$ for some $$s\in S^{*}$$. Assume to the contrary that $$s_{1}\left(c\right)=1$$ for each $$s\in S^{*}$$. If an agent (Alice) deviates and defects with small probability $$\epsilon<<1$$ when observing no defections, then she outperforms the incumbents. On the one hand, the fact that she occasionally defects when observing $$m=0$$ gives her a direct gain of at least $$\epsilon\cdot g$$. 
On the other hand, the probability that a partner observes her defecting twice or more is $$O\left(\epsilon^{2}\right)$$; therefore her indirect loss from these additional $$\epsilon$$ defections is at most $$O\left(\epsilon^{2}\right)\cdot\left(1+l\right)$$, and therefore for a sufficiently small $$\epsilon>0$$, Alice strictly outperforms the incumbents. The fact that $$s_{1}\left(d\right)>0$$ for some $$s\in S^{*}$$ implies that defection is a best reply conditional on an agent observing $$m=1$$. The direct gain from defecting is strictly increasing in the probability that the partner defects (because the game is defensive), while the indirect influence of defection on the behaviour of future partners is independent of the partner’s play. This implies that defection must be the unique best reply when an agent observes $$m\geq2$$, since such an observation implies a higher probability that the partner is going to defect relative to the observation of a single defection. This establishes that $$s_{m}\left(d\right)=1$$ for all $$m\geq2$$ and all $$s\in S^{*}$$. To demonstrate that there is a strategy $$s$$ such that $$s_{1}\left(d\right)<1$$, assume to the contrary that $$s_{1}\left(d\right)=1$$ for each $$s\in S^{*}$$. Suppose that the average probability of defection in the population is $$0<\Pr\left(d\right)$$. Since there is full cooperation in the limit we have $$\Pr\left(d\right)=O\left(\epsilon\right)$$. This implies that a random partner is observed to defect at least once with a probability of $$k\cdot\Pr\left(d\right)+O\left(\epsilon^{2}\right)$$. This in turn induces the defection of a fraction $$k\cdot\Pr\left(d\right)+O\left(\epsilon^{2}\right)$$ of the normal agents (under the assumption that $$s_{1}\left(d\right)=1$$). Since the normal agents constitute a fraction $$1-O\left(\epsilon\right)$$ of the population we must have $$\Pr\left(d\right)=k\cdot\Pr\left(d\right)+O\left(\epsilon^{2}\right)$$, which leads to a contradiction for any $$k\geq2$$. 
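One way to spell out this contradiction is given below. The sketch relies on one extra step that is an assumption of this illustration rather than a statement from the proof sketch: because the commitment strategies include a totally mixed strategy and $$\lambda$$ has full support, committed agents alone defect at a rate proportional to $$\epsilon$$, so that $$\Pr\left(d\right)\geq c\cdot\epsilon$$ for some constant $$c>0$$ (a symbol introduced here).

```latex
\begin{align*}
\Pr\left(d\right) = k\cdot\Pr\left(d\right)+O\left(\epsilon^{2}\right)
  &\;\Longrightarrow\; \left(k-1\right)\cdot\Pr\left(d\right)=O\left(\epsilon^{2}\right)
  \;\Longrightarrow\; \Pr\left(d\right)=O\left(\epsilon^{2}\right)
  \quad\text{(since } k\geq2\text{)},\\
\Pr\left(d\right) &\;\geq\; c\cdot\epsilon \quad\text{for some } c>0 .
\end{align*}
```

For sufficiently small $$\epsilon$$ the two lines are incompatible, since a quantity bounded below by a multiple of $$\epsilon$$ cannot be of order $$\epsilon^{2}$$.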
Thus, if $$s_{1}\left(d\right)=1$$, then defections are “contagious”, and so there is no steady state in which only a fraction $$O\left(\epsilon\right)$$ of the population defects. This completes the sketch of the proof of part 1. To prove part 2 of the theorem, let $$s^{1}$$ and $$s^{2}$$ be the strategies that defect iff $$m\geq1$$ and $$m\geq2$$, respectively. Consider the state $$\left(\left\{ s^{1},s^{2}\right\} ,\left(q^{*},1-q^{*}\right),\theta^{*}\equiv0\right)$$. The direct gain from defecting (relative to cooperating) when observing a single defection is   $\Pr\left(m=1\right)\cdot\left(\left(l\cdot\Pr\left(d|m=1\right)\right)+g\cdot\Pr\left(c|m=1\right)\right),$ where $$\Pr\left(d|m=1\right)$$ ($$\Pr\left(c|m=1\right)$$) is the probability that a random partner is going to defect (cooperate) conditional on the agent observing $$m=1$$, and $$\Pr\left(m=1\right)$$ is the average probability of observing signal $$m=1$$. The indirect loss from defection, relative to cooperation, conditional on the agent observing a single defection, is   $q^{*}\cdot\left(k\cdot\Pr\left(m=1\right)\right)\cdot\left(l+1\right)+O\left(\left(\Pr\left(m=1\right)\right)^{2}\right).$ To see this, note that a random partner defects with an average probability of $$q^{*}$$ if he observes a single defection (which occurs with probability $$k\cdot\Pr\left(m=1\right)$$ when the partner makes $$k$$ i.i.d. observations, each of which has a probability of $$\Pr\left(m=1\right)$$ of being a defection), and each defection induces a loss of $$l+1$$ to the agent (who obtains $$-l$$ instead of 1). The fact that some normal agents cooperate and others defect when observing a single defection implies that in an equilibrium both actions have to be best replies conditional on the agent observing $$m=1$$. 
This implies that the indirect loss from defecting is exactly equal to the direct gain (up to $$O\left(\left(\Pr\left(m=1\right)\right)^{2}\right)$$), $$i.e.$$  $\Pr\left(m=1\right)\cdot\left(\left(l\cdot\Pr\left(d|m=1\right)\right)+g\cdot\Pr\left(c|m=1\right)\right)=q^{*}\cdot\left(k\cdot\Pr\left(m=1\right)\right)\cdot\left(l+1\right)$   $$\Rightarrow q^{*}=\frac{\left(l\cdot\Pr\left(d|m=1\right)\right)+g\cdot\Pr\left(c|m=1\right)}{k\cdot\left(l+1\right)}.\label{eq:q-indifference-equation}$$ (5) The probability $$\Pr\left(d|m=1\right)$$ depends on the distribution of commitments. Yet, one can show that for every distribution of commitment strategies $$\left(S^{C},\lambda\right)$$, there is a unique value of $$q^{*}\in\left(0,\frac{1}{k}\right)$$ that solves equation (5) and that, given this $$q^{*}$$, both $$s^{1}$$ and $$s^{2}$$ (and only these strategies) are best replies. This means that the steady state $$\left(\left\{ s^{1},s^{2}\right\} ,\left(q^{*},1-q^{*}\right),\theta^{*}\equiv0\right)$$ is a perfect equilibrium. ∥ Discussion of Theorem 2. We comment on a few issues related to Theorem 2. (1) In the formal proof of Theorem 2 we show that cooperation satisfies the stronger refinements of strict perfection, evolutionary stability, and robustness (see Section 3.3 and Supplementary Appendix D). (2) Each distribution of commitment strategies induces a unique frequency $$q^{*}\in\left(0,\frac{1}{k}\right)$$ of $$s^{1}$$-agents, which yields a perfect equilibrium. One may wonder whether a population starting from a different share $$q_{0}\neq q^{*}$$ of $$s^{1}$$-agents is likely to converge to the equilibrium frequency $$q^{*}$$. It is possible to show that the answer is affirmative. 
Specifically, given any initial low frequency $$q_{0}\in\left(0,q^{*}\right)$$, the $$s^{1}$$-agents achieve a higher payoff than the $$s^{2}$$-agents and, given any initial high frequency $$q_{0}\in\left(q^{*},\frac{1}{k}\right)$$, the $$s^{1}$$-agents achieve a lower payoff than the $$s^{2}$$-agents. Thus, under any smooth monotonic dynamic process in which a more successful strategy gradually becomes more frequent, the share of $$s^{1}$$-agents will shift from any initial value in the interval $$q_{0}\in\left(0,\frac{1}{k}\right)$$ to the exact value of $$q^{*}$$ that induces a perfect equilibrium. (3) As discussed in the formal proof in Supplementary Appendix E.3, some distributions of commitment strategies may induce a slightly different perfect equilibrium, in which the population is homogeneous, and each agent in the population defects with probability $$q^{*}\left(\mu\right)$$ when observing a single defection (contrary to the heterogeneous deterministic behaviour described above). (4) Random number of observed actions. Consider a random environment$$\left(G_{PD},p\right)$$, where $$p\in\Delta\left(\mathbb{N}\right)$$ is a distribution with a finite support, and each agent privately observes $$k$$ actions of the partner with probability $$p\left(k\right)$$. Theorem 2 (and, similarly, Theorems 3–5) can be extended to this setup for any random environment in which the probability of observing at least two interactions is sufficiently high. The perfect equilibrium has to be adapted as follows. As in the main model, all normal agents cooperate (defect) when observing no (at least two) defections. 
In addition, there will be a value $$\bar{k}\in\mathrm{supp}\left(p\right)$$ and a probability $$q\in\left[0,1\right]$$ (which depend on the distribution of commitment strategies), such that all normal agents cooperate (defect) when observing a single defection out of $$k>\bar{k}$$ ($$k<\bar{k}$$) observations, and a fraction $$q$$ of the normal agents defect when observing a single defection out of $$\bar{k}$$ observations.

(5) Cheap talk. In Supplementary Appendix F we discuss how the introduction of pre-play (slightly costly) cheap-talk communication affects Theorems 1 and 2. In this setup one can show that: (a) Offensive games: No stable state exists. Both defection and cooperation are only “quasi-stable”: the population state occasionally changes between these two states, based on the occurrence of rare random experimentations. The argument is adapted from Wiseman and Yilankaya (2001). (b) Defensive games (and $$k\geq2$$): The introduction of cheap talk destabilizes all inefficient equilibria, leaving cooperation as the unique stable outcome. The argument is adapted from Robson (1990).

(6) General noise structures. In the model described above we deal with perturbed environments that include a single kind of noise, namely, committed agents who follow commitment strategies. It is possible to extend our results to include additional sources of noise: specifically, observation noise and/or trembles. We redefine a perturbed environment as a tuple $$E_{\epsilon,\delta}=\left(\left(G,k\right),\left(S^{C},\lambda\right),\alpha,\epsilon,\delta\right)$$, where $$\left(G,k\right),\left(S^{C},\lambda\right),\epsilon$$ are defined as in the main model, $$0<\delta\ll1$$ is the probability of error in each observed action of a player, and $$\alpha\in\Delta\left(A\right)$$ is a totally mixed distribution from which the observed action is sampled in the event of an observation error.
Alternatively, these errors can also be interpreted as actions played by mistake by the partner due to trembling hands. One can show that all of our results can be adapted to this setup in a relatively straightforward way. In particular, our results hold also in environments in which most of the noise is due to observation errors, provided that there is a small positive share of committed agents (possibly much smaller than the probability of an observation error). (7) The borderline case between defensiveness and offensiveness: $$g=l$$. Such a Prisoner’s Dilemma can be interpreted as a game in which each of the players simultaneously decides whether to sacrifice a personal payoff of $$g$$ in order to induce a gain of $$1+g$$ to her partner. One can show that cooperation is also a perfect equilibrium action in this setup, and that it can be supported by the same kind of perfect equilibrium as described above. However, in this case the uniqueness result (part 1 of Theorem 2) is no longer true. The reason for this is that when $$g=l$$ an agent has the same incentive to defect regardless of the signal she observes about the partner (because the direct bonus of defection is equal to $$g=l$$ regardless of the partner’s behaviour). This implies that cooperation can be supported by a large variety of strategies (including belief-free-like strategies as in Takahashi, 2010; Dilmé, 2016). We note that none of these strategies satisfy the refinement of evolutionary stability (Supplementary Appendix D). One can adapt the proof of Theorem 1 to show that defection is the unique evolutionarily stable outcome when $$g=l$$. The following example demonstrates the existence of a perfect equilibrium that supports cooperation when the unique commitment strategy is to play each action uniformly. Example 2. (Example 1 revisited: illustration of the perfect equilibrium that supports cooperation). 
Consider the perturbed environment $$\left(G_{D},2,\left\{ s^{u}\equiv0.5\right\} ,\epsilon\right)$$, where $$G_{D}$$ is the defensive Prisoner’s Dilemma game with the parameters $$g=1$$ and $$l=3$$ (as presented in Table 2 above). Consider the steady state $$\left(\left\{ s^{1},s^{2}\right\} ,\left(\frac{1}{6},\frac{5}{6}\right),\theta^{*}\right)$$, where $$\theta^{*}$$ is defined as in (3) in Example 1 above. A straightforward calculation shows that the average probability with which a normal agent observes $$m=1$$ when being matched with a random partner is

 $\Pr\left(m=1\right)=\epsilon\cdot0.5+3.5\cdot\epsilon\cdot\frac{1}{6}+0.5\cdot\epsilon\cdot\frac{5}{6}+O\left(\epsilon^{2}\right)=1.5\cdot\epsilon+O\left(\epsilon^{2}\right).$

The probability that the partner is a committed agent conditional on observing a single defection is:

 $\Pr\left(s^{u}|m=1\right)=\frac{\epsilon\cdot0.5}{1.5\cdot\epsilon}=\frac{1}{3}\,\,\Rightarrow\,\,\Pr\left(d|m=1\right)=\frac{1}{3}\cdot0.5=\frac{1}{6},$

which yields the conditional probability that the partner of a normal agent will defect. Next we calculate the direct gain from defecting conditional on the agent observing a single defection ($$m=1$$):

 $\Pr\left(m=1\right)\cdot\left(\left(l\cdot\Pr\left(d|m=1\right)\right)+g\cdot\Pr\left(c|m=1\right)\right)=1.5\cdot\epsilon\cdot\left(3\cdot\frac{1}{6}+1\cdot\frac{5}{6}\right)+O\left(\epsilon^{2}\right)=2\cdot\epsilon+O\left(\epsilon^{2}\right).$

The indirect loss from defecting conditional on the agent observing a single defection is:

 $q\cdot\left(k\cdot\Pr\left(m=1\right)\right)\cdot\left(l+1\right)=q\cdot2\cdot1.5\cdot\epsilon\cdot\left(3+1\right)+O\left(\epsilon^{2}\right)=12\cdot q\cdot\epsilon+O\left(\epsilon^{2}\right).$

When taking $$q=\frac{1}{6}$$ the indirect loss from defecting is exactly equal to the direct gain (up to $$O\left(\epsilon^{2}\right)$$).

Stability of cooperation when observing a single action.
We conclude this section by showing that in defensive Prisoner’s Dilemmas with $$k=1$$, cooperation is a regular perfect equilibrium action iff $$g<1$$.

Proposition 2. Let $$E=\left(G_{PD},1\right)$$ be an environment where $$G_{PD}$$ is a defensive Prisoner’s Dilemma ($$g<l$$). Cooperation is a (regular) perfect equilibrium action iff $$g<1$$.

Sketch of Proof. Similar arguments to those presented in part 1 of Theorem 2 imply that any distribution of commitment strategies induces a unique average probability $$q$$ with which normal agents defect when observing $$m=1$$, in any cooperative perfect equilibrium. This implies that a deviator who always defects gets a payoff of $$1+g$$ in a fraction $$1-q$$ of the interactions. One can show that such a deviator outperforms the incumbents if $$g>1$$ (whereas, if $$g<1$$, there are distributions of commitment for which $$1-q$$ is sufficiently low such that the deviator is outperformed). ∥

Proposition 2 is immediately implied by Proposition 4 in Supplementary Appendix C.5, which characterizes which distributions of commitments support cooperation as a perfect equilibrium outcome in a defensive Prisoner’s Dilemma when $$k=1$$.

5. General Observation Structures

In this section, we extend our analysis to general observation structures in which the signal about the partner may also depend on the behaviour of other opponents against the partner.

5.1. Definitions

An observation structure is a tuple $$\Theta=\left(k,B,o\right)$$, where $$k\in\mathbb{N}$$ is the number of observed interactions, $$B=\left\{ b_{1},\ldots,b_{\left|B\right|}\right\}$$ is a finite set of observations that can be made in each interaction, and the mapping $$o:A\times A\rightarrow B$$ describes the observed signal as a function of the action profile played in the interaction (where the first action is the one played by the current partner, and the second action is the one played by his opponent).
Note that observing actions (which was analysed in the previous section) is equivalent to having $$B=A$$ and $$o\left(a,a'\right)=a$$. In the results of this section we focus on three observation structures: (1) Observation of action profiles:$$B=A^{2}$$ and $$o\left(a,a'\right)=\left(a,a'\right).$$ In this observation structure, each agent observes, in each sampled interaction of her partner, both the action played by her partner and the action played by her partner’s opponent. (2) Observation of conflicts: observing whether or not there was mutual cooperation. That is, $$B=\left\{ C,D\right\}$$, $$o\left(c,c\right)=C$$, and $$o\left(a,a'\right)=D$$ for any $$\left(a,a'\right)\neq\left(c,c\right)$$. Such an observation structure (which we have not seen in the existing literature) seems like a plausible way to capture non-verifiable feedback about the partner’s behaviour. The agent can observe, in each sampled past interaction of the partner, whether both partners were “happy” ($$i.e.$$ mutual cooperation) or whether the partners complained about each other ($$i.e.$$ there was a conflict, at least one of the players defected, and it is too costly for an outside observer to verify who actually defected). (3) Observation of actions against cooperation: $$B=\left\{ CC,DC,*D\right\}$$ and $$o\left(c,c\right)=CC$$, $$o\left(d,c\right)=DC$$, and $$o\left(c,d\right)=o\left(d,d\right)=*D$$. That is, each agent (Alice) observes a ternary signal about each sampled interaction of her partner (Bob): either both players cooperated, or Bob unilaterally defected, or Bob’s partner defected (and in this latter case Alice cannot observe Bob’s action). We analyse this observation structure because it turns out to be an “optimal” observation structure that allows cooperation to be supported as a perfect equilibrium action in any Prisoner’s Dilemma. 
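The observation mappings listed above can be written out as a short Python sketch. It is only an illustration of the definitions of $$o$$: the function names, the string encoding of actions (`"c"`, `"d"`) and the dictionary `STRUCTURES` are assumptions made here, not notation from the paper. In every function the first argument is the action of the current partner and the second is the action of the partner's past opponent.

```python
from itertools import product

ACTIONS = ("c", "d")  # cooperate, defect (illustrative encoding)


def observe_actions(a, a_opp):
    """B = A: only the partner's own action is observed."""
    return a


def observe_action_profiles(a, a_opp):
    """B = A^2: the full action profile is observed."""
    return (a, a_opp)


def observe_conflicts(a, a_opp):
    """B = {C, D}: 'C' for mutual cooperation, 'D' for any conflict."""
    return "C" if (a, a_opp) == ("c", "c") else "D"


def observe_actions_against_cooperation(a, a_opp):
    """B = {CC, DC, *D}: the partner's action is hidden if the opponent defected."""
    if a_opp == "d":
        return "*D"
    return "CC" if a == "c" else "DC"


STRUCTURES = {
    "actions": observe_actions,
    "action profiles": observe_action_profiles,
    "conflicts": observe_conflicts,
    "actions against cooperation": observe_actions_against_cooperation,
}


def signal_table(o):
    """Tabulate o(a, a') over all four action profiles."""
    return {(a, a_opp): o(a, a_opp) for a, a_opp in product(ACTIONS, repeat=2)}


if __name__ == "__main__":
    for name, o in STRUCTURES.items():
        print(name, signal_table(o))
```

Tabulating each mapping makes the informational ranking visible: action profiles distinguish all four profiles, actions against cooperation distinguish three signals, and both actions and conflicts distinguish two (which two differs between them).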
In each of these cases, we let the mapping $$o$$ and the set of signals $$B$$ be implied by the context, and identify the observation structure $$\Theta$$ with the number of observed interactions $$k$$. In what follows we present the definitions of the main model (Sections 2 and 3) that have to be changed to deal with the general observation structure.

Before playing the game, each player independently samples $$k$$ independent interactions of her partner. Let $$M$$ denote the set of feasible signals:

 $M=\left\{ m\in\mathbb{N}^{\left|B\right|}\left|\sum_{i}m_{i}=k\right.\right\} ,$

where $$m_{i}$$ is interpreted as the number of times that observation $$b_{i}$$ has been observed in the sample. When agents observe conflicts, we simplify the notation by letting $$M=\left\{ 0,...,k\right\}$$, and interpreting $$m\in\left\{ 0,...,k\right\}$$ as the number of observed conflicts. The definitions of a strategy and a perturbed environment remain the same. Given a distribution of action profiles $$\psi\in\Delta\left(A\times A\right)$$, let $$\nu_{\psi}=\nu\left(\psi\right)\in\Delta\left(M\right)$$ be the multinomial distribution of signals that is induced by the distribution of action profiles $$\psi$$, $$i.e.$$

 $\nu_{\psi}\left(m_{1},...,m_{\left|B\right|}\right)=\frac{k!}{m_{1}!\cdot...\cdot m_{\left|B\right|}!}\cdot\prod_{i=1}^{\left|B\right|}\left(\sum_{\left\{ \left(a,a'\right)\in A\times A|o\left(a,a'\right)=b_{i}\right\} }\psi\left(a,a'\right)\right)^{m_{i}}.$

The definition of a steady state is adapted as follows.

Definition 6. (Adaptation of Def. 6). A steady state (or state) of a perturbed environment $$\left(\left(G,k\right),\left(S^{C},\lambda\right),\epsilon\right)$$ is a triple $$\left(S^{N},\sigma,\theta\right)$$, where $$S^{N}\subseteq\mathcal{S}$$ is a finite set of strategies, $$\sigma\in\Delta\left(S^{N}\right)$$ is a distribution, and $$\theta:\left(S^{N}\cup S^{C}\right)\rightarrow\Delta\left(M\right)$$ is a profile of signal distributions that satisfies for each signal $$m$$ and each strategy $$s$$ the consistency requirement (7) below. Let $$\psi_{s}\in\Delta\left(A\times A\right)$$ be the (possibly correlated) distribution of action profiles that is played when an agent with strategy $$s\in S^{N}\cup S^{C}$$ is matched with a random partner (given $$\sigma$$ and $$\theta$$); $$i.e.$$ for each $$\left(a,a'\right)\in A\times A$$, where $$a$$ is interpreted as the action of the agent with strategy $$s$$, and $$a'$$ is interpreted as the action of her partner, let

$$\psi_{s}\left(a,a'\right)=\sum_{s'\in S^{N}\cup S^{C}}\left(\left(1-\epsilon\right)\cdot\sigma\left(s'\right)+\epsilon\cdot\lambda\left(s'\right)\right)\cdot s\left(\theta_{s'}\right)\left(a\right)\cdot s'\left(\theta_{s}\right)\left(a'\right).\label{eq:psi-s}$$ (6)

The consistency requirement that the mapping $$\theta$$ has to satisfy is

$$\forall m\in M,\,\,s\in S^{N}\cup S^{C},\,\,\,\,\theta_{s}\left(m\right)=\nu\left(\psi_{s}\right)\left(m\right).\label{eq:consistency-1}$$ (7)

The definition of the long-run payoff of an incumbent agent remains unchanged. We now adapt the definition of the payoff of an agent (Alice) who deviates and plays a non-incumbent strategy. Unlike in the basic model, in this extension there might be multiple consistent outcomes following Alice’s deviation, as demonstrated in Example 3.

Example 3. Consider an unperturbed environment $$\left(G_{PD},3\right)$$ with an observation of $$k=3$$ action profiles.
Consider a homogeneous incumbent population in which all agents play the following strategy: $$s^{*}\left(m\right)=d$$ if $$m$$ includes at least 2 interactions with $$\left(d,d\right)$$, and $$s^{*}\left(m\right)=c$$ otherwise. Consider the state $$\left(\left\{ s^{*}\right\} ,\theta^{*}=0\right)$$ in which everyone cooperates. Consider a deviator (Alice) who follows the strategy of always defecting. Then there exist three consistent post-deviation steady states (in all of which the incumbents continue to cooperate among themselves): (1) all the incumbents defect against Alice, (2) all the incumbents cooperate against Alice, and (3) all the incumbents defect against Alice with a probability of 50%. (If the incumbents defect against Alice with probability $$p$$, then each of her three sampled interactions is $$\left(d,d\right)$$ with probability $$p$$, so consistency requires $$p=3p^{2}\left(1-p\right)+p^{3}$$, which holds exactly for $$p\in\left\{ 0,\frac{1}{2},1\right\}$$.)

Formally, we define a consistent distribution of signals for a deviator as follows.

Definition 7. Given steady state $$\left(S^{N},\sigma,\theta\right)$$ and non-incumbent strategy $$\hat{s}\in\mathcal{S}\backslash\left(S^{N}\cup S^{C}\right)$$, we say that a distribution of signals $$\theta_{\hat{s}}\in\Delta\left(M\right)$$ is consistent if

 $\forall m\in M,\,\,\,\,\theta_{\hat{s}}\left(m\right)=\nu\left(\psi_{\hat{s}}\right)\left(m\right),$

where $$\psi_{\hat{s}}\in\Delta\left(A\times A\right)$$ is defined as in (6) above. Let $$\Theta_{\hat{s}}\subseteq\Delta\left(M\right)$$ be the set of all consistent signal distributions of strategy $$\hat{s}$$. Given steady state $$\left(S,\sigma,\theta\right)$$, non-incumbent strategy $$\hat{s}\in\mathcal{S}\backslash\left(S^{N}\cup S^{C}\right)$$, and consistent signal distribution $$\theta\left(\hat{s}\right)\equiv\theta_{\hat{s}}\in\Delta\left(M\right)$$, let $$\pi_{\hat{s}}\left(S,\sigma,\theta|\theta_{\hat{s}}\right)$$ denote the deviator’s (long-run) payoff given that in the post-deviation steady state the deviator’s distribution of signals is $$\theta_{\hat{s}}$$.
Formally:

 $\pi_{\hat{s}}\left(S,\sigma,\theta|\theta_{\hat{s}}\right)=\sum_{s'\in S^{N}\cup S^{C}}\left(\left(1-\epsilon\right)\cdot\sigma\left(s'\right)+\epsilon\cdot\lambda\left(s'\right)\right)\cdot\left(\sum_{\left(a,a'\right)\in A\times A}\hat{s}_{\theta\left(s'\right)}\left(a\right)\cdot s'_{\theta\left(\hat{s}\right)}\left(a'\right)\cdot\pi\left(a,a'\right)\right).$

Let $$\pi_{\hat{s}}\left(S,\sigma,\theta\right)$$ be the maximal (long-run) payoff for a deviator who follows strategy $$\hat{s}$$ in a post-deviation steady state:

$$\pi_{\hat{s}}\left(S,\sigma,\theta\right):=\max_{\theta_{\hat{s}}\in\Theta_{\hat{s}}}\pi_{\hat{s}}\left(S,\sigma,\theta|\theta_{\hat{s}}\right).\label{eq:deviator-payoff-extension}$$ (8)

Remark 2. Our results remain the same if one replaces the maximum function in (8) with a minimum function.

5.2. Acute and mild Prisoner’s Dilemma

In this subsection, we present a novel classification of Prisoner’s Dilemma games that plays an important role in the results of this section. Recall that the parameter $$g$$ of a Prisoner’s Dilemma game may take any value in the interval $$\left[0,l+1\right]$$ (if $$g>l+1$$, then mutual cooperation is no longer the efficient outcome that maximizes the sum of payoffs). We say that a Prisoner’s Dilemma game is acute if $$g$$ is in the upper half of this interval ($$i.e.$$ if $$g>\frac{l+1}{2}$$), and mild if it is in the lower half ($$i.e.$$ if $$g<\frac{l+1}{2}$$). The threshold, $$g=\frac{l+1}{2}$$, is characterized by the fact that the gain from a single unilateral defection is exactly half the loss incurred by the partner who is the sole cooperator. Hence, unilateral defection is mildly tempting in mild games and acutely tempting in acute games. An interpretation of this threshold comes from a setup (which will be important for our results) in which an agent is deterred from unilaterally defecting because it induces future partners to unilaterally defect against the agent with some probability.
Deterrence in acute games requires this probability of being punished to be more than 50%, while a probability of below 50% is enough for mild games. Figure 1 illustrates the classification of games into offensive/defensive and mild/acute.

Example 4. Table 3 demonstrates the payoffs of specific acute ($$G_{A}$$) and mild ($$G_{M}$$) Prisoner’s Dilemma games. In both examples $$g=l$$, $$i.e.$$ the Prisoner’s Dilemma game is “linear”. This means that it can be described as a “helping game” in which agents have to decide simultaneously whether to give up a payoff of $$g$$ in order to create a benefit of $$1+g$$ for the partner. In the acute game ($$G_{A}$$) on the left, $$g=3$$ and the loss of a helping player amounts to more than half of the benefit to the partner who receives the help ($$\frac{3}{3+1}=\frac{3}{4}>\frac{1}{2}$$), while in the mild game ($$G_{M}$$) on the right, $$g=0.2$$ and the loss of the helping player is less than half of the benefit to the partner who receives the help ($$\frac{0.2}{0.2+1}=\frac{1}{6}<\frac{1}{2}$$).

TABLE 3 Matrix payoffs of acute and mild Prisoner’s Dilemma games

5.3. Analysis of the stability of cooperation

We first note that Proposition 1 is valid also in this extended setup, with minor adaptations to the proof. Thus, always defecting is a perfect equilibrium regardless of the observation structure. Next we analyse the stability of cooperation in each of the three interesting observation structures. The following two results show that under either observation of conflicts or observation of action profiles, cooperation is a perfect equilibrium iff the Prisoner’s Dilemma is mild. Moreover, in mild Prisoner’s Dilemma games there is essentially a unique strategy distribution that supports cooperation (which is analogous to the essentially unique strategy distribution in Theorem 2). Formally:

Theorem 3.
Let $$E=\left(G_{PD},k\right)$$ be an environment with observation of conflicts with $$k\geq2$$.

(1) If $$G_{PD}$$ is a mild PD ($$g<\frac{l+1}{2}$$), then: (a) If $$\left(S^{*},\sigma^{*},\theta^{*}\equiv0\right)$$ is a perfect equilibrium then (i) for each $$s\in S^{*}$$, $$s_{0}\left(c\right)=1$$ and $$s_{m}\left(d\right)=1$$ for each $$m\geq2$$, and (ii) there exist $$s,s'\in S^{*}$$ such that $$s_{1}\left(d\right)<1$$ and $$s'_{1}\left(d\right)>0$$. (b) Cooperation is a regular perfect equilibrium action.

(2) If $$G_{PD}$$ is an acute PD ($$g>\frac{l+1}{2}$$), then cooperation is not a perfect equilibrium action.

Sketch of Proof. The argument for part 1(a) is analogous to Theorem 2. In what follows we sketch the proofs of part 1(b) and part 2. Fix a distribution of commitments, and a commitment level $$\epsilon\in\left(0,1\right)$$. Let $$m$$ denote the number of observed conflicts and define $$s^{1}$$ and $$s^{2}$$ as before, but with the new meaning of $$m$$. Consider the following candidate for a perfect equilibrium $$\left(\left\{ s^{1},s^{2}\right\} ,\left(q,1-q\right),\theta^{*}\equiv0\right)$$. Here, the probability $$q$$ will be determined such that both actions are best replies when an agent observes a single conflict. That is, the direct benefit from her defecting when observing $$m=1$$ (the LHS of the equation below) must balance the indirect loss due to inducing future partners who observe these conflicts to defect (the RHS, neglecting terms of $$O\left(\epsilon\right)$$). The RHS is calculated by noting that defection induces an additional conflict only if the current partner has cooperated and that, in expectation, each such additional conflict is observed by $$k$$ future partners, each of whom defects with an average probability of $$q$$. Recall that $$\Pr\left(d|m=1\right)$$ ($$\Pr\left(c|m=1\right)$$) is the probability that a random partner is going to defect (cooperate) conditional on the agent observing $$m=1$$.
$\Pr\left(m=1\right)\cdot\left(\left(l\cdot\Pr\left(d|m=1\right)\right)+g\cdot\Pr\left(c|m=1\right)\right)=\Pr\left(m=1\right)\cdot k\cdot q\cdot\Pr\left(c|m=1\right)\cdot\left(l+1\right)$   $$\Leftrightarrow q\cdot k=\frac{\left(l\cdot\Pr\left(d|m=1\right)\right)+g\cdot\Pr\left(c|m=1\right)}{\Pr\left(c|m=1\right)\cdot\left(l+1\right)}.\label{eq:q-k-conflict}$$ (9) One can see that the RHS is increasing in $$\Pr\left(d|m=1\right)$$. The minimal bound on the value of $$q$$ is obtained when $$\Pr\left(d|m=1\right)=0$$. In this case $$q\cdot k=\frac{g}{l+1}$$. Suppose that the game is acute. In this case $$q\cdot k>0.5$$. Suppose that the average probability of defection in the population is $$\Pr\left(d\right)$$. Since there is full cooperation in the limit we have $$\Pr\left(d\right)=O\left(\epsilon\right)$$. This implies that a fraction $$2\cdot\Pr\left(d\right)+O\left(\epsilon^{2}\right)$$ of the population is involved in conflicts. This in turn induces the defection of a fraction $$2\cdot\Pr\left(d\right)\cdot k\cdot q+O\left(\epsilon^{2}\right)$$ of the normal agents (because a normal agent defects with probability $$q$$ upon observing at least one conflict in the $$k$$ sampled interactions). Since the normal agents constitute a fraction $$1-O\left(\epsilon\right)$$ of the population we must have $$\Pr\left(d\right)=2\cdot\Pr\left(d\right)\cdot k\cdot q+O\left(\epsilon^{2}\right)$$. However, in an acute game, $$2\cdot k\cdot q>1$$ leads to the contradiction that $$\Pr\left(d\right)<\Pr\left(d\right)$$. Thus, if $$2\cdot k\cdot q>1$$, then defections are contagious, and so there is no steady state in which only a fraction $$O\left(\epsilon\right)$$ of the population defects. Suppose that the game is mild. One can show that $$\Pr\left(d|m=1\right)$$ is decreasing in $$q$$, and that it converges to zero when $$k\cdot q\nearrow0.5$$. 
(The reason is that when $$k\cdot q$$ is close to 0.5 each defection by a committed agent induces many defections by normal agents and, conditional on observing $$m=1$$, the partner is likely to be normal and to cooperate when being matched with a normal agent.) It follows that the RHS of equation (9) is decreasing in $$q$$ and approaches the value $$\frac{g}{l+1}$$ when $$k\cdot q\nearrow0.5$$. Since the game is mild, $$\frac{g}{l+1}<0.5$$. Hence there is some $$q\cdot k<0.5$$ that solves equation (9), and in which the normal agents defect with a low probability of ($$O\left(\epsilon\right)$$). ∥ Theorem 4. Let $$E=\left(G_{PD},k\right)$$ be an environment with observation of action profiles and $$k\geq2$$. (1)If $$G_{PD}$$ is a mild PD ($$g<\frac{l+1}{2}$$), then cooperation is a regular perfect equilibrium action. (2)If $$G_{PD}$$ is an acute PD ($$g>\frac{l+1}{2}$$), then cooperation is not a perfect equilibrium action. Sketch of Proof. Using arguments that are familiar from above one can show that in any perfect equilibrium that supports cooperation, normal agents have to defect with an average probability of $$q\in\left(0,1\right)$$ when observing a single unilateral defection (and $$k-1$$ mutual cooperations), and defect with a smaller probability when observing a single mutual defection (since this is necessary in order for a normal agent to have better incentives to cooperate against a partner who is more likely to cooperate). The value of $$q$$ is determined by equation (9) above, implying that both actions are best replies conditional on an agent observing the partner to be the sole defector once, and to be involved in mutual cooperation in the remaining $$k-1$$ observed action profiles. Let $$\epsilon$$ be the share of committed agents, and let $$\varphi$$ be the average probability that a committed agent unilaterally defects. 
To simplify the sketch of the proof, we will focus on the case in which the committed agents defect with a small probability when observing the partner to have been involved only in mutual cooperations, which implies, in particular, that $$\varphi\ll1$$ (the formal proof in the Supplementary Appendix does not make this simplifying assumption). The unilateral defections of the committed agents induce a fraction $$\epsilon\cdot\varphi\cdot k\cdot q+O\left(\epsilon^{2}\right)+O\left(\varphi^{2}\right)$$ of the normal agents to defect when being matched against committed agents (because a normal agent defects with probability $$q$$ upon observing a single unilateral defection in the $$k$$ sampled interactions). These unilateral defections of normal agents against committed agents induce a further $$\left(\epsilon\cdot\varphi\cdot k\cdot q\right)\cdot k\cdot q+O\left(\epsilon^{2}\right)$$ defections of normal agents against other normal agents. Repeating this argument we come to the conclusion that the average probability of a normal agent being the sole defector is (neglecting terms of $$O\left(\epsilon^{2}\right)$$ and $$O\left(\varphi^{2}\right)$$):

 $\epsilon\cdot\varphi\cdot k\cdot q\cdot\left(1+k\cdot q+\left(k\cdot q\right)^{2}+...\right)=\epsilon\cdot\varphi\cdot\frac{k\cdot q}{1-k\cdot q}.$

As discussed above, in acute games, the value of $$k\cdot q$$ must be larger than $$0.5$$, which implies that $$\frac{k\cdot q}{1-k\cdot q}>1$$. This implies that conditional on an agent observing the partner to be the sole defector once, the posterior probability that the partner is normal is:

 $\frac{\epsilon\cdot\varphi\cdot\frac{k\cdot q}{1-k\cdot q}}{\epsilon\cdot\varphi+\epsilon\cdot\varphi\cdot\frac{k\cdot q}{1-k\cdot q}}=\frac{\frac{k\cdot q}{1-k\cdot q}}{1+\frac{k\cdot q}{1-k\cdot q}}=k\cdot q>0.5.$

Thus, normal agents are more likely to unilaterally defect than committed agents.
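The geometric-series step and the resulting posterior can be checked numerically. The following Python sketch works only to first order, under the same simplification as the proof sketch (terms of $$O\left(\epsilon^{2}\right)$$ and $$O\left(\varphi^{2}\right)$$ are ignored); the function names and the argument `kq`, standing for the product $$k\cdot q$$, are illustrative assumptions rather than notation from the paper.

```python
def sole_defector_rate_normal(eps, phi, kq, rounds=None):
    """First-order probability that a normal agent is the sole defector.

    With rounds=None the closed form eps*phi*kq/(1-kq) is returned;
    otherwise the series eps*phi*kq*(1 + kq + kq^2 + ...) is truncated
    after `rounds` terms.
    """
    if not 0 <= kq < 1:
        raise ValueError("the series converges only for 0 <= k*q < 1")
    if rounds is None:
        return eps * phi * kq / (1 - kq)
    return eps * phi * kq * sum(kq ** n for n in range(rounds))


def posterior_normal(eps, phi, kq):
    """Posterior that a partner seen as sole defector once is a normal agent."""
    normal = sole_defector_rate_normal(eps, phi, kq)
    committed = eps * phi  # unilateral defections by committed agents
    return normal / (committed + normal)


if __name__ == "__main__":
    for kq in (0.4, 0.5, 0.6):
        print(kq, posterior_normal(eps=0.01, phi=0.1, kq=kq))
```

Under these assumptions the posterior simplifies algebraically to $$k\cdot q$$ itself and does not depend on $$\epsilon$$ or $$\varphi$$, so it exceeds $$0.5$$ exactly when $$k\cdot q>0.5$$.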
One can show that when there is a mutual defection, it is most likely that at least one of the agents involved is committed. This implies that the partner is more likely to defect when he is observed to be involved in mutual defection relative to being observed to be the sole defector. This implies that defection is the unique best reply when observing a single mutual defection, and this contradicts the assumption that normal agents cooperate with positive probability when observing a single mutual defection. When the game is mild, a construction similar to the previous proofs supports cooperation as a perfect equilibrium. ∥ Our last result studies the observation of actions against cooperation, and it shows that cooperation is a perfect equilibrium action in any underlying Prisoner’s Dilemma. Formally: Theorem 5. Let $$E=\left(G_{PD},k\right)$$ be an environment with observation of actions against cooperation and $$k\geq2$$. Then cooperation is a regular perfect equilibrium action. The intuition behind the proof is as follows. Not allowing Alice to observe Bob’s behaviour when his past opponent has defected helps to sustain cooperation because it implies that defecting against a defector does not have any negative indirect effect (in any steady state) because it is never observed by future opponents. This encourages agents to defect against partners who are more likely to defect, and allows cooperation to be sustained regardless of the values of $$g$$ and $$l$$. TABLE 4 summarizes our analysis and shows the characterization of the conditions under which cooperation can be sustained as a perfect equilibrium outcome in environments in which agents observe at least 2 actions. TABLE 4 Summary of key results: when is cooperation a perfect equilibrium outcome? 
Columns give the observation structure (any $$k\geq2$$); each entry states whether cooperation is a perfect equilibrium outcome, as implied by Theorems 1–5.

| Category of PD | Parameters | Actions | Conflicts | Action profiles | Actions against cooperation |
|---|---|---|---|---|---|
| Mild & Defensive | $$g<\min\left(l,\frac{l+1}{2}\right)$$ | Yes | Yes | Yes | Yes |
| Mild & Offensive | $$l<g<\frac{l+1}{2}$$ | No | Yes | Yes | Yes |
| Acute & Defensive | $$\frac{l+1}{2}<g<l$$ | Yes | No | No | Yes |
| Acute & Offensive | $$\max\left(l,\frac{l+1}{2}\right)<g$$ | No | No | No | Yes |
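The classification summarized in Table 4 can also be expressed as a small lookup. The Python sketch below encodes only what Theorems 1–5 state for $$k\geq2$$ and strict inequalities; the function names and the string labels for the observation structures are illustrative assumptions, and the borderline cases $$g=l$$ and $$g=\frac{l+1}{2}$$ are deliberately rejected rather than guessed.

```python
def classify_pd(g, l):
    """Return (severity, kind) for a Prisoner's Dilemma with parameters g and l.

    severity: 'mild' if g < (l+1)/2, 'acute' if g > (l+1)/2
    kind:     'defensive' if g < l,  'offensive' if g > l
    """
    if g == l or 2 * g == l + 1:
        raise ValueError("borderline case: not covered by the strict inequalities")
    severity = "mild" if 2 * g < l + 1 else "acute"
    kind = "defensive" if g < l else "offensive"
    return severity, kind


def cooperation_supported(g, l, structure):
    """Is cooperation a perfect equilibrium outcome for k >= 2 (Theorems 1-5)?"""
    severity, kind = classify_pd(g, l)
    if structure == "actions":
        return kind == "defensive"          # Theorems 1 and 2
    if structure in ("conflicts", "action_profiles"):
        return severity == "mild"           # Theorems 3 and 4
    if structure == "actions_against_cooperation":
        return True                         # Theorem 5
    raise ValueError(f"unknown observation structure: {structure}")


if __name__ == "__main__":
    # g=1, l=3 is the defensive game of Example 2; the others are made-up values.
    for g, l in [(1, 3), (0.4, 0.2), (2.5, 3), (3.5, 3)]:
        row = {s: cooperation_supported(g, l, s)
               for s in ("actions", "conflicts", "action_profiles",
                         "actions_against_cooperation")}
        print((g, l), classify_pd(g, l), row)
```

The four parameter pairs in the usage loop fall into the four categories of the table, one each.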
6. Related Literature

In what follows we discuss related literature that was not discussed above. Related experimental literature is discussed in Supplementary Appendix B.

6.1. Models with rare committed types

Various papers have shown that when a patient long-run agent (she) plays a repeated game against partners who can observe her entire history of play, and there is a small probability of the agent being a commitment type, then the agent can guarantee herself a high payoff in any equilibrium by mimicking an irrational type committed to Stackelberg-leader behaviour ($$e.g.$$ Kreps et al., 1982; Fudenberg and Levine, 1989; Celetani et al., 1996; see Mailath and Samuelson, 2006, for a textbook analysis and survey).
When both sides of the game are equally patient, and, possibly, both sides have a small probability of being a commitment type, then the specific details about the set of feasible commitment types, the underlying game, and the discount factor are important in determining whether an agent can guarantee a high Stackelberg-leader payoff or whether a folk theorem result holds and the set of equilibrium payoffs is the same as in the case of complete information (see, $$e.g.$$Cripps and Thomas, 1995; Chan, 2000; Cripps et al., 2005; Hörner and Lovo, 2009; Atakan and Ekmekci, 2011; Pęski, 2014). One contribution of our article is to demonstrate that the introduction of a small probability that an agent is committed may have qualitatively different implications in repeated games with random matching.10 In defensive games, the presence of a few committed agents in the population implies that there is a unique stationary strategy to sustain full cooperation. In offensive games with observation of actions, the presence of committed agents implies that the low payoff of zero (of mutual defection) is the unique equilibrium payoff in the stationary model (and it rules out the highest symmetric payoff of 1 in the conventional model).11 6.2. Image scoring In an influential paper, Nowak and Sigmund (1998) present the mechanism of image scoring to support cooperation when agents from a large community are randomly matched and each agent observes the partner’s past actions. In their setup, each agent observes the last $$k$$ past actions of the partner, and she defects if and only if the partner has defected at least $$m$$ times in the last $$k$$ observed actions. A couple of papers have raised concerns about the stability of cooperation under image-scoring mechanisms. 
Specifically, Leimar and Hammerstein (2001) demonstrate in simulations that cooperation is unstable, and Panchanathan and Boyd (2003) analytically study the case in which each agent observes the last action.12 Our article makes two key contributions to this literature. First, we introduce a novel variant of image scoring that is essentially the unique stationary way to support cooperation as a perfect equilibrium outcome when agents observe actions. Second, we show that the classification of Prisoner’s Dilemma games into offensive and defensive games is critical to the stability of cooperation when agents observe actions (and image scoring fails in offensive Prisoner’s Dilemma games). 6.3. Structured populations and voluntarily separable interactions A few papers have studied the scope of cooperation in the case where players do not have any information about their current partner but the matching of agents is not uniformly random. That is, the population is assumed to have some structure such that some agents are more likely to be matched to some partners than to other partners. Van Veelen et al. (2012) and Alger and Weibull (2013) show that it is possible to sustain cooperation with no information about the partner’s behaviour if matching is sufficiently assortative, $$i.e.$$ if cooperators are more likely to interact with other cooperators. Ghosh and Ray (1996) and Fujiwara-Greve and Okuno-Fujiwara (2009, 2017) show how to sustain cooperation in a related setup in which matching is random, but each pair of matched agents may unanimously agree to keep interacting without being rematched to other agents.13 Our paper shows that letting players observe the partner’s behaviour in two interactions is sufficient to sustain cooperation without assuming assortativity or repeated interactions with the same partner. 6.4. Models without calendar time The present paper differs from most of the literature on community enforcement by having a model without a global time zero. 
To the best of our knowledge, Rosenthal (1979) is the first paper to present the notion of a steady-state Nash equilibrium in environments in which each player observes the partner’s last action, and apply it to the study of the Prisoner’s Dilemma. Rosenthal focuses only on pure steady states (in which everyone uses the same pure strategy), and concludes that defection is the unique pure stationary Nash equilibrium action except in a few knife-edge cases. The methodology is further developed in Okuno-Fujiwara and Postlewaite (1995). Other papers following a related approach include Rubinstein and Wolinsky (1985), who study bargaining, and Phelan and Skrzypacz (2006) who study repeated games with private monitoring. Our methodological contribution to the previous literature is that (1) we allow each agent to observe the behaviour of the partner in several past interactions with other opponents, and (2) we combine the steady-state analysis with the presence of a few committed agents and present a novel notion of a perfect equilibrium to analyse this setup. 7. Conclusion In many situations, people engage in short-term interactions where they are tempted to behave opportunistically but there is a possibility that future partners will obtain some information about their behaviour today. We propose a new modelling approach based on the premises that (1) an equilibrium has to be robust to the presence of a few committed agents, and (2) the community has been interacting from time immemorial (though this latter assumption is relaxed in Supplementary Appendix A). We develop a novel methodology that allows for a tractable analysis of these seemingly complicated environments. We apply this methodology to the study of Prisoner’s Dilemma games, and we obtain sharp testable predictions for the equilibrium outcomes, and the exact conditions under which cooperation can be sustained as an equilibrium outcome. 
Finally, we show that whenever cooperation is sustainable, there is a unique (and novel) way to support it that has a few appealing properties: (1) agents behave in an intuitive and simple way, and (2) the equilibrium is robust, $$e.g.$$ to deviations by a group of agents, or to the presence of any kind of committed agents. We believe that our modelling approach will be helpful in understanding various interactions in future research. Acknowledgements A previous version of this article was circulated under the title “Stable observable behaviour”. We have benefited greatly from discussions with Vince Crawford, Eddie Dekel, Christoph Kuzmics, Ariel Rubinstein, Larry Samuelson, Bill Sandholm, Rann Smorodinsky, Rani Spiegler, Balázs Szentes, Satoru Takahashi, Jörgen Weibull, and Peyton Young. We would like to express our deep gratitude to seminar/workshop participants at the University of Amsterdam (CREED), University of Bamberg, Bar Ilan University, Bielefeld University, University of Cambridge, Hebrew University of Jerusalem, Helsinki Center for Economic Research, Interdisciplinary Center Herzliya, Israel Institute of Technology, Lund University, University of Oxford, University of Pittsburgh, Stockholm School of Economics, Tel Aviv University, NBER Theory Workshop at Wisconsin-Madison, KAEA session at the ASSA 2015, the Biological Basis of Preference conference at Simon Fraser University, and the 6th workshop on stochastic methods in game theory at Erice, for many useful comments. Danial Ali Akbari provided excellent research assistance. Yuval Heller is grateful to the European Research Council for its financial support (Starting Grant #677057). Erik Mohlin is grateful to Handelsbankens forskningsstiftelser (grant #P2016-0079:1), the Swedish Research Council (grant #2015-01751), and the Knut and Alice Wallenberg Foundation (Wallenberg Academy Fellowship #2016-0156) for their financial support. Finally, we thank Renana Heller for suggesting the title. 
Supplementary Data Supplementary data are available at Review of Economic Studies online. Footnotes 1. In contagious equilibria players start by cooperating. If one player defects at stage $$t$$, her partner defects at stage $$t+1$$, infecting another player who defects at stage $$t+2$$, and so on. In belief-free equilibria players are always indifferent between their actions, but they choose different mixed actions depending on the signal they obtain about the partner. We discuss the non-robustness of these classes of equilibria at the end of Section 4.2. 2. As discussed later, our uniqueness results also rely on an additional assumption that agents are restricted to choose stationary strategies, which depend only on the signal about the partner. As shown in Supplementary Appendix A, all other results hold also in a standard setup without the restriction to stationary strategies. 3. The reason why the consistent signal profile is required to be part of the description of a steady state, rather than being uniquely determined by the distribution of strategies, is that our environment, unlike a standard repeated game, lacks a global starting time that determines the initial conditions. An example of a strategy that has multiple consistent signal profiles is as follows. The parameter $$k$$ is equal to three, and everyone plays the most frequently observed action in the sample of the three observed actions. There are three behaviours that are consistent with this population: one in which everyone cooperates, one in which everyone defects, and one in which everyone plays (on average) uniformly. 4. 
In Supplementary Appendix D we show that all the equilibria presented in this article satisfy two additional refinements: (1) evolutionary stability (Maynard Smith, 1974): any small group of agents who jointly deviate are outperformed, and (2) robustness: no small perturbation in the distribution of observed signals can move the population’s behaviour away from a situation in which everyone plays the equilibrium outcome. In addition, most of these equilibria also satisfy the refinement of strict perfection (Okada, 1981): the equilibrium remains stable with respect to all commitment strategies. 5. The results can be adapted to a setup with a large finite population. We do not formalize a large finite population, as this adds much complexity to the model without giving substantial new insights. Most of the existing literature also models large populations as continua (see, $$e.g.$$ Rubinstein and Wolinsky, 1985; Weibull, 1995; Dixit, 2003; Herold and Kuzmics, 2009; Sakovics and Steiner, 2012; Alger and Weibull, 2013). Kandori (1992) and Ellison (1994) show that large finite populations differ from infinite populations because only the former can induce contagious equilibria. However, as noted by Ellison (1994, p. 578), and as discussed in Section 4.2, these contagious equilibria fail in the presence of a single “crazy” agent who always defects. 6. We do not allow agents to manipulate the observed signals. In our companion paper (Heller and Mohlin, 2017a) we study a related setup in which agents are allowed to exert effort in deception by influencing the signal observed by the opponent. 7. In a companion paper (Heller and Mohlin, 2017b), we study in a broader setup necessary and sufficient conditions for a strategy distribution admitting a unique consistent signal profile. 8. Takahashi (2010) calls offensive (defensive) Prisoner’s Dilemmas submodular (supermodular). 9. 
In environments with $$k\geq2$$, a deviator who always defects gets a payoff of zero, regardless of the value of $$q$$ (because all agents observe $$m=k$$ when being matched with such a deviator). 10. We are aware of only one paper that introduces rare commitment types to repeated games with random matching. Dilmé (2016) constructs cooperative “tit-for-tat”-like equilibria that are robust to the presence of committed agents, in the borderline case in which the underlying Prisoner’s Dilemma has $$g=l$$ (see the discussion of the case of $$g=l$$ in Remark 7 in Section 4.3). Ghosh and Ray (1996) study a somewhat related setup (which is further discussed below) in which the presence of a non-negligible share of agents who are committed to always defecting allows cooperation to be sustained among the normal agents in voluntarily separable interactions. 11. Ely et al. (2008) show a related result in a setup in which a long-run player faces a sequence of short-run players. They show that if the participation of the short-run players is optional, and if every action of the long-run player that makes the short-run players want to participate can be interpreted as a signal that the long-run player is “bad,” then reputation uniquely chooses a low equilibrium payoff to the long-run player. 12. See Berger and Grüne (2016) who study observation of $$k$$ actions, but restrict agents to play only image-scoring-like strategies. 13. See also Herold (2012) who studies a “haystack” model in which individuals interact within separate groups.

REFERENCES
ALGER I. and WEIBULL J. W. (2013), “Homo Moralis – Preference Evolution under Incomplete Information and Assortative Matching”, Econometrica, 81, 2269–2302.
ATAKAN A. E. and EKMEKCI M. (2011), “Reputation in Long-run Relationships”, The Review of Economic Studies, 79, 451–480.
BERGER U. and GRÜNE A. (2016), “On the Stability of Cooperation under Indirect Reciprocity with First-order Information”, Games and Economic Behavior, 98, 19–33.
BERNSTEIN L. (1992), “Opting Out of the Legal System: Extralegal Contractual Relations in the Diamond Industry”, The Journal of Legal Studies, 21, 115–157.
CELENTANI M., FUDENBERG D., LEVINE D. K., et al. (1996), “Maintaining a Reputation Against a Long-lived Opponent”, Econometrica, 64, 691–704.
CHAN J. (2000), “On the Non-existence of Reputation Effects in Two-person Infinitely-Repeated Games” (Discussion Paper, Working Papers, The Johns Hopkins University, Department of Economics).
CRIPPS M. W., DEKEL E. and PESENDORFER W. (2005), “Reputation with Equal Discounting in Repeated Games with Strictly Conflicting Interests”, Journal of Economic Theory, 121, 259–272.
CRIPPS M. W. and THOMAS J. P. (1995), “Reputation and Commitment in Two-person Repeated Games without Discounting”, Econometrica, 63, 1401–1419.
DEB J. (2017), “Cooperation and Community Responsibility: A Folk Theorem for Repeated Matching Games with Names” (Mimeo).
DEB J. and GONZÁLEZ-DÍAZ J. (2014), “Community Enforcement Beyond the Prisoner’s Dilemma” (Mimeo).
DILMÉ F. (2016), “Helping Behavior in Large Societies”, International Economic Review, 57, 1261–1278.
DIXIT A. (2003), “On Modes of Economic Governance”, Econometrica, 71, 449–481.
DUFFY J. and OCHS J. (2009), “Cooperative Behavior and the Frequency of Social Interaction”, Games and Economic Behavior, 66, 785–812.
ELLISON G. (1994), “Cooperation in the Prisoner’s Dilemma with Anonymous Random Matching”, The Review of Economic Studies, 61, 567–588.
ELY J., FUDENBERG D. and LEVINE D. K. (2008), “When is Reputation Bad?”, Games and Economic Behavior, 63, 498–526.
FUDENBERG D. and LEVINE D. K. (1989), “Reputation and Equilibrium Selection in Games with a Patient Player”, Econometrica, 57, 759–778.
FUJIWARA-GREVE T. and OKUNO-FUJIWARA M. (2009), “Voluntarily Separable Repeated Prisoner’s Dilemma”, The Review of Economic Studies, 76, 993–1021.
FUJIWARA-GREVE T. and OKUNO-FUJIWARA M. (2017), “Long-term Cooperation and Diverse Behavior Patterns under Voluntary Partnerships” (Mimeo).
GHOSH P. and RAY D. (1996), “Cooperation in Community Interaction without Information Flows”, The Review of Economic Studies, 63, 491–519.
GREIF A. (1993), “Contract Enforceability and Economic Institutions in Early Trade: The Maghribi Traders’ Coalition”, The American Economic Review, 83, 525–548.
HELLER Y. and MOHLIN E. (2017a), “Coevolution of Deception and Preferences: Darwin and Nash Meet Machiavelli” (Mimeo).
HELLER Y. and MOHLIN E. (2017b), “When Is Social Learning Path-Dependent?”.
HEROLD F. (2012), “Carrot or stick? The Evolution of Reciprocal Preferences in a Haystack Model”, American Economic Review, 102(2), 914–940.
HEROLD F. and KUZMICS C. (2009), “Evolutionary Stability of Discrimination under Observability”, Games and Economic Behavior, 67, 542–551.
HÖRNER J. and LOVO S. (2009), “Belief-free Equilibria in Games With Incomplete Information”, Econometrica, 77, 453–487.
JØSANG A., ISMAIL R. and BOYD C. (2007), “A Survey of Trust and Reputation Systems for Online Service Provision”, Decision Support Systems, 43, 618–644.
KANDORI M. (1992), “Social Norms and Community Enforcement”, The Review of Economic Studies, 59, 63–80.
KREPS D. M., MILGROM P., ROBERTS J., et al. (1982), “Rational Cooperation in the Finitely Repeated Prisoners’ Dilemma”, Journal of Economic Theory, 27, 245–252.
LEIMAR O. and HAMMERSTEIN P. (2001), “Evolution of Cooperation through Indirect Reciprocity”, Proceedings of the Royal Society of London. Series B: Biological Sciences, 268, 745–753.
MAILATH G. J. and SAMUELSON L. (2006), Repeated Games and Reputations, vol. 2 (Oxford: Oxford University Press).
MATSUSHIMA H., TANAKA T. and TOYAMA T. (2013), “Behavioral Approach to Repeated Games with Private Monitoring” (University of Tokyo Faculty of Economics Discussion paper).
MAYNARD SMITH J. (1974), “The Theory of Games and the Evolution of Animal Conflicts”, Journal of Theoretical Biology, 47, 209–221.
MAYNARD SMITH J. and PRICE G. R. (1973), “The Logic of Animal Conflict”, Nature, 246, 15.
MILGROM P., NORTH D. C. and WEINGAST B. R. (1990), “The Role of Institutions in the Revival of Trade: The Law Merchant, Private Judges, and the Champagne Fairs”, Economics and Politics, 2, 1–23.
NOWAK M. A. and SIGMUND K. (1998), “Evolution of Indirect Reciprocity by Image Scoring”, Nature, 393(6685), 573–577.
OKADA A. (1981), “On Stability of Perfect Equilibrium Points”, International Journal of Game Theory, 10, 67–73.
OKUNO-FUJIWARA M. and POSTLEWAITE A. (1995), “Social Norms and Random Matching Games”, Games and Economic Behavior, 9, 79–109.
PANCHANATHAN K. and BOYD R. (2003), “A Tale of Two Defectors: The Importance of Standing for Evolution of Indirect Reciprocity”, Journal of Theoretical Biology, 224, 115–126.
PĘSKI M. (2014), “Repeated Games with Incomplete Information and Discounting”, Theoretical Economics, 9, 651–694.
PHELAN C. and SKRZYPACZ A. (2006), “Private Monitoring with Infinite Histories” (Discussion Paper, Federal Reserve Bank of Minneapolis).
RESNICK P. and ZECKHAUSER R. (2002), “Trust Among Strangers in Internet Transactions: Empirical Analysis of eBay’s Reputation System”, The Economics of the Internet and E-commerce, 11, 23–25.
ROBSON A. J. (1990), “Efficiency in Evolutionary Games: Darwin, Nash, and the Secret Handshake”, Journal of Theoretical Biology, 144, 379–396.
ROSENTHAL R. W. (1979), “Sequences of Games with Varying Opponents”, Econometrica, 47, 1353–1366.
RUBINSTEIN A. and WOLINSKY A. (1985), “Equilibrium in a Market with Sequential Bargaining”, Econometrica, 53, 1133–1150.
SAKOVICS J. and STEINER J. (2012), “Who Matters in Coordination Problems?”, The American Economic Review, 102, 3439–3461.
SELTEN R. (1975), “Reexamination of the Perfectness Concept for Equilibrium Points in Extensive Games”, International Journal of Game Theory, 4, 25–55.
SUGDEN R. (1986), The Economics of Rights, Co-operation and Welfare (Oxford: Blackwell Publishers).
TAKAHASHI S. (2010), “Community Enforcement when Players Observe Partners’ Past Play”, Journal of Economic Theory, 145, 42–62.
VAN VEELEN M., GARCÍA J., RAND D. G., et al. (2012), “Direct Reciprocity in Structured Populations”, Proceedings of the National Academy of Sciences, 109, 9929–9934.
WEIBULL J. W. (1995), Evolutionary Game Theory (Cambridge, MA: MIT Press).
WISEMAN T. and YILANKAYA O. (2001), “Cooperation, Secret Handshakes, and Imitation in the Prisoners’ Dilemma”, Games and Economic Behavior, 37, 216–242.

© The Author 2017. Published by Oxford University Press on behalf of The Review of Economic Studies Limited. Advance access publication 20 December 2017
The Review of Economic Studies, Oxford University Press

# Observations on Cooperation

The Review of Economic Studies, Volume Advance Article – Dec 20, 2017
30 pages

Publisher: Oxford University Press
ISSN: 0034-6527
eISSN: 1467-937X
DOI: 10.1093/restud/rdx076
### Abstract

Abstract We study environments in which agents are randomly matched to play a Prisoner’s Dilemma, and each player observes a few of the partner’s past actions against previous opponents. We depart from the existing related literature by allowing a small fraction of the population to be commitment types. The presence of committed agents destabilizes previously proposed mechanisms for sustaining cooperation. We present a novel intuitive combination of strategies that sustains cooperation in various environments. Moreover, we show that under an additional assumption of stationarity, this combination of strategies is essentially the unique mechanism to support full cooperation, and it is robust to various perturbations. Finally, we extend the results to a setup in which agents also observe actions played by past opponents against the current partner, and we characterize which observation structure is optimal for sustaining cooperation. 1. Introduction Consider the following example of a simple yet fundamental economic interaction. Alice has to trade with another agent, Bob, whom she does not know. Both sides have opportunities to cheat, to their own benefit, at the expense of the other. Alice is unlikely to interact with Bob again, and thus her ability to retaliate, in case Bob acts opportunistically, is restricted. The effectiveness of external enforcement is also limited, $$e.g.$$ due to incompleteness of contracts, non-verifiability of information, and court costs. Thus cooperation may be impossible to achieve. Alice searches for information about Bob’s past behaviour, and she obtains anecdotal evidence about Bob’s actions in a couple of past interactions. Alice considers this information when she decides how to act. Alice also takes into account that her behaviour towards Bob in the current interaction may be observed by her future partners. 
Historically, the above-described situation was a challenge to the establishment of long-distance trade (Milgrom et al., 1990; Greif, 1993), and it continues to play an important role in the modern economy, in both offline (Bernstein, 1992; Dixit, 2003) and online interactions (Resnick and Zeckhauser, 2002; Jøsang et al., 2007). Several papers have studied the question of how cooperation can be supported by means of community enforcement. Most of these papers assume that all agents in the community are rational and, in equilibrium, best reply to what everyone else is doing. As argued by Ellison (1994, p. 578), this assumption may be fairly implausible in large populations. It seems quite likely that, in a large population, there will be at least some agents who fail to best respond to what the others are doing, either because they are boundedly rational, have idiosyncratic preferences, or because their expectations about other agents’ behaviour are incorrect. Motivated by this argument, we allow a few agents in the population to be committed to behaviours that do not necessarily maximize their payoffs. It turns out that this seemingly small modification completely destabilizes existing mechanisms for sustaining cooperation when agents are randomly matched with new partners in each period. Specifically, both the contagious equilibria (Kandori, 1992; Ellison, 1994) and the “belief-free” equilibria (Takahashi, 2010; Deb, 2017) fail in the presence of a small fraction of committed agents.1 Our key results are as follows. First, we show that always defecting is the unique perfect equilibrium, regardless of the number of observed actions, provided that the bonus of defection in the underlying Prisoner’s Dilemma is larger when the partner cooperates than when the partner defects. 
Second, in the opposite case, when the bonus of defection is larger when the partner defects than when the partner cooperates, we present a novel and essentially unique combination of strategies that sustains cooperation: all agents cooperate when they observe no defections and defect when they observe at least two defections.2 Some of the agents also defect when observing a single defection. Importantly, this cooperative behaviour is robust to various perturbations, and it appears consistent with experimental data. Third, we extend the model to environments in which an agent also obtains information about the behaviour of past opponents against the current partner. We show that in this setup cooperation can be sustained if and only if the bonus of defection of a player is less than half the loss she induces a cooperative partner to suffer. Finally, we characterize an observation structure that allows cooperation to be supported as a perfect equilibrium outcome in all Prisoner’s Dilemma games. In all observation structures, we use the same essentially unique construction to sustain cooperation. 1.1. Overview of the model Agents in an infinite population are randomly matched into pairs to play the Prisoner’s Dilemma game, in which each player decides simultaneously whether to cooperate or defect (see the payoff matrix in Table 1). If both players cooperate they obtain a payoff of one, if both defect they obtain a payoff of zero, and if one of the players defects, the defector gets $$1+g$$, while the cooperator gets $$-l$$, where $$g,l>0$$ and $$g<l+1$$. (The latter inequality implies that mutual cooperation is the efficient outcome that maximizes the sum of payoffs.)

TABLE 1 Matrix payoffs of Prisoner’s Dilemma games

| Row \ Column | Cooperate | Defect |
| --- | --- | --- |
| Cooperate | $$1,\ 1$$ | $$-l,\ 1+g$$ |
| Defect | $$1+g,\ -l$$ | $$0,\ 0$$ |

Before playing the game, each agent privately draws a random sample of $$k$$ actions that have been played by her partner against other opponents in the past. 
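The payoffs and the sampling step described above can be sketched in a few lines of Python. This is only an illustration: the function names `payoffs` and `draw_signal` are hypothetical, and representing a partner's past as a finite list sampled without replacement is a simplifying assumption (the model itself has an infinite population and no global starting time).

```python
import random


def payoffs(action_1, action_2, g, l):
    """Return (payoff_1, payoff_2) for actions 'C' (cooperate) or 'D' (defect).

    Assumes the parameter restrictions stated in the text: g, l > 0 and g < l + 1.
    """
    if not (g > 0 and l > 0 and g < l + 1):
        raise ValueError("need g > 0, l > 0 and g < l + 1")
    table = {
        ("C", "C"): (1.0, 1.0),
        ("D", "D"): (0.0, 0.0),
        ("D", "C"): (1.0 + g, -l),
        ("C", "D"): (-l, 1.0 + g),
    }
    return table[(action_1, action_2)]


def draw_signal(history, k, rng):
    """Sample k past actions of the partner and count the observed defections.

    `history` is a finite list of 'C'/'D' entries (an illustrative assumption).
    """
    sample = rng.sample(history, k)
    return sample.count("D")
```

For instance, with `g = 0.5` and `l = 1.0` a unilateral defector earns `1.5` while the cooperator earns `-1.0`, so the joint payoff `0.5` is below the joint payoff `2.0` of mutual cooperation, as the restriction $$g<l+1$$ requires.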
The assumption that a small random sample is taken from the entire history of the partner is intended to reflect a setting in which the memory of past interactions is long and accurate but dispersed. This means that the information that reaches an agent about her partner (through gossip) arrives in a non-deterministic fashion and may stem from any point in the past. We require each agent to follow a stationary strategy, $$i.e.$$ a mapping that assigns a mixed action to each signal that the agent may observe about the current partner. (That is, the action is not allowed to depend on calendar time or on the agent’s own history.) A steady state of the environment is a pair consisting of: (1) a distribution of strategies with a finite support that describes the fractions of the population following the different strategies, and (2) a signal profile that describes the distribution of signals that is observed when an agent is matched with a partner playing any of the strategies present in the population. The signal profile is required to be consistent with the distribution of strategies in the sense that a population of agents who follow the distribution of strategies and observe signals about the partners sampled from the signal profile will behave in a way that induces the same signal profile.3 Our restriction to stationary strategies and our focus on consistent steady states allow us to relax the standard assumption that there is an initial time zero at which an entire community starts to interact. In various real-life situations, the interactions within the community have been going on from time immemorial. Consequently, the participants may have only a vague idea of the starting point. Arguably, agents might therefore be unable to condition their behaviour on everything that has happened since the beginning of the interactions. 
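The consistency requirement, and the reason the signal profile must be stated as part of a steady state, can be illustrated with the majority-rule example from footnote 3: $$k=3$$ and every agent plays the action observed most often in the sample. The sketch below assumes, as a simplification, that each sampled action is an independent draw that is a defection with probability $$p$$, where $$p$$ is the population-wide defection rate; the function names are hypothetical. A rate $$p$$ is then consistent when the behaviour it induces reproduces $$p$$.

```python
from fractions import Fraction
from math import comb


def induced_defection_rate(p, k=3):
    """Probability that a strict majority of k sampled actions are defections,
    when each sampled action is a defection with probability p (assumed i.i.d.)."""
    threshold = k // 2 + 1
    return sum(
        comb(k, j) * p**j * (1 - p) ** (k - j) for j in range(threshold, k + 1)
    )


def consistent_rates(k=3, grid=100):
    """Scan an exact rational grid for defection rates that reproduce themselves."""
    candidates = (Fraction(i, grid) for i in range(grid + 1))
    return [p for p in candidates if induced_defection_rate(p, k) == p]
```

Under these assumptions the fixed-point condition for $$k=3$$ is $$p^3+3p^2(1-p)=p$$, whose solutions are $$p=0$$, $$p=\frac{1}{2}$$, and $$p=1$$. These correspond to the three behaviours named in the footnote: everyone cooperates, everyone plays uniformly on average, and everyone defects.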
We perturb the environment by introducing $$\epsilon$$ committed agents who each follow one strategy from an arbitrary finite set of commitment strategies. We assume that at least one of the commitment strategies is totally mixed, which implies that all signals ($$i.e.$$ all sequences of $$k$$ actions) are observed with positive probability. A steady state in a perturbed environment describes a population in which $$1-\epsilon$$ of the agents are normal; $$i.e.$$ they play strategies that maximize their long-run payoffs, while $$\epsilon$$ of the agents follow commitment strategies. We adapt the notions of Nash equilibrium and perfect equilibrium (Selten, 1975) to our setup. A steady state is a Nash equilibrium if no normal agent can gain in the long run by deviating to a different strategy (the agents are assumed to be arbitrarily patient). The deviator’s payoff is calculated in the new steady state that emerges following her deviation. A steady state is a perfect equilibrium if it is the limit of a sequence of Nash equilibria in a converging sequence of perturbed environments.4 1.2. Summary of results We start with a simple result (Prop. 1) that shows that defection is a perfect equilibrium outcome for any number of observed actions. We say that a Prisoner’s Dilemma game is offensive if there is a stronger incentive to defect against a cooperator than against a defector ($$i.e.$$ $$g>l$$); in a defensive Prisoner’s Dilemma the opposite holds ($$i.e.$$ $$g<l$$). Our first main result (Theorem 1) shows that always defecting is the unique perfect equilibrium in any offensive Prisoner’s Dilemma game ($$i.e.$$ $$g>l$$) for any number of observed actions. The result assumes a mild regularity condition on the set of commitment strategies (Def. 3), namely, that this set is rich enough that, in any steady state of the perturbed environment, at least one of the commitment strategies induces agents to defect with a different probability than that of some of the normal agents. 
The intuition is as follows. The mild assumption that not all agents defect with exactly the same probability implies that the signal that Alice observes about her partner Bob is not completely uninformative. In particular, the more often Alice observes Bob to defect, the more likely Bob will defect against Alice. In offensive games, it is better to defect against partners who are likely to cooperate than to defect against partners who are likely to defect. This implies that a deviator who always defects is more likely to induce normal partners to cooperate. Consequently, such a deviator will outperform any agent who cooperates with positive probability. Theorem 1 may come as a surprise in light of a number of existing papers that have presented various equilibrium constructions that support cooperation in any Prisoner’s Dilemma game that is played in a population of randomly matched agents. Our result demonstrates that, in the presence of a small fraction of committed agents, the mechanisms that have been proposed to support cooperation fail, regardless of how these committed agents play (except in the “knife-edge” case of $$g=l$$; see Dilmé, 2016 and Remark 7 in Section 4.3). Thus, our article provides an explanation of why experimental evidence suggests that subjects’ behaviour corresponds neither to contagious equilibria (see, $$e.g.$$ Duffy and Ochs, 2009) nor to belief-free equilibria (see, $$e.g.$$ Matsushima et al., 2013). The empirical predictions of our model are discussed in Supplementary Appendix B. Our second main result (Theorem 2) shows that cooperation is a perfect equilibrium outcome in any defensive Prisoner’s Dilemma game ($$g<l$$) when players observe at least two actions. 
Moreover, there is an essentially unique distribution of strategies that supports cooperation, according to which: (1) all agents cooperate when observing no defections, (2) all agents defect when observing at least two defections, (3) the normal agents defect with an average probability of $$0<q<1$$ when observing a single defection. The intuition for the result is as follows. Defection yields a direct gain that is increasing in the partner’s probability of defection (due to the game being defensive). In addition, defection results in an indirect loss because it induces future partners to defect when they observe the current defection. This indirect loss is independent of the current partner’s behaviour. One can show that there always exists a probability $$q$$ such that the above distribution of strategies balances the direct gain and the indirect loss of defection, conditional on the agent observing a single defection. Furthermore, cooperation is the unique best reply conditional on the agent observing no defections, and defection is the unique best reply conditional on the agent observing at least two defections. Next, we analyse the case of the observation of a single action ($$i.e.$$ $$k=1$$). Proposition 2 shows that cooperation is a perfect equilibrium outcome in a defensive Prisoner’s Dilemma if and only if the bonus of defection is not too large (specifically, $$g\leq1$$). The intuition is that arguments similar to those used to obtain the result above imply that there exists a unique mean probability $$q<1$$ by which agents defect when observing a defection in any cooperative perfect equilibrium. This implies that a deviator who always defects succeeds in getting a payoff of $$1+g$$ in a fraction $$1-q$$ of the interactions, and that such a deviator outperforms the incumbents if $$g$$ is too large. 1.3. 
Observations based on action profiles So far we have assumed that each agent observes only the partner’s (Bob’s) behaviour against other opponents, but that she cannot observe the behaviour of the past opponents against Bob. In Section 5 we relax this assumption. Specifically, we study three observation structures: the first two seem to be empirically relevant, and the third one is theoretically important since it allows us to construct an equilibrium that sustains cooperation in all Prisoner’s Dilemma games. (1) Observing conflicts: Each agent observes, in each of the $$k$$ sampled interactions of her partner, whether there was mutual cooperation ($$i.e.$$ no conflict: both partners are “happy”) or not ($$i.e.$$ partners complain about each other, but it is too costly for an outside observer to verify who actually defected). Such an observation structure (which we have not seen described in the existing literature) seems like a plausible way to capture non-verifiable feedback about the partner’s behaviour. (2) Observing action profiles: Each agent observes the full action profile in each of the sampled interactions. (3) Observing actions against cooperation: Each agent observes, in each of the sampled interactions, what action the partner took provided that the partner’s opponent cooperated. If the partner’s opponent defected then there is no information about what the partner did. It turns out that the stability of cooperation in the first two observation structures crucially depends on a novel classification of Prisoner’s Dilemma games. We say that a Prisoner’s Dilemma game is acute if $$g>\frac{l+1}{2}$$, and mild if $$g<\frac{l+1}{2}$$. The threshold between the two categories, namely, $$g=\frac{l+1}{2}$$, is characterized by the fact that the gain from a single unilateral defection is exactly half the loss incurred by the partner who is the sole cooperator. 
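The threshold can be traced back to a short calculation. The sketch below assumes the payoff normalization $$\pi(c,c)=1$$, $$\pi(d,d)=0$$, $$\pi(d,c)=1+g$$ and $$\pi(c,d)=-l$$. This normalization is an assumption of the sketch: it is consistent with the payoff of $$1+g$$ mentioned above for a defector facing a cooperator, but the payoff matrix is not stated in this part of the article. The symbol $$p$$, the probability that a unilateral defection triggers one future unilateral defection against the defector, is likewise introduced only for illustration.

```latex
\begin{align*}
\text{gain of the unilateral defector:}\quad & \pi(d,c)-\pi(c,c) = g,\\
\text{loss of the sole cooperator:}\quad & \pi(c,c)-\pi(c,d) = 1+l,\\
\text{threshold:}\quad & g=\tfrac{1}{2}\,(1+l)
  \iff g=\frac{l+1}{2},\\
\text{heuristic deterrence condition:}\quad & p\cdot(1+l)\geq g
  \iff p\geq\frac{g}{1+l},
  \qquad \frac{g}{1+l}>\frac{1}{2}\iff g>\frac{l+1}{2}.
\end{align*}
```

Under these assumptions, the required punishment probability exceeds one half exactly when the game is acute.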
Consider a setup in which an agent is deterred from unilaterally defecting because it induces future partners to unilaterally defect against the agent with some probability. Deterrence in acute Prisoner’s Dilemmas requires this probability to be more than 50%, while a probability of below 50% is enough to deter deviations in mild Prisoner’s Dilemmas. Figure 1 (in Section 5.2) illustrates the classification of Prisoner’s Dilemma games. [Figure 1: Classification of Prisoner’s Dilemma games] Our next results (Theorems 3 and 4) show that in both observation structures (conflicts or action profiles, and any $$k\geq2$$) cooperation is a perfect equilibrium outcome if and only if the underlying Prisoner’s Dilemma game is mild. Moreover, cooperation is supported by essentially the same unique behaviour as in Theorem 2. The intuition for why cooperation cannot be sustained in acute games with observation of conflicts is as follows. To support cooperation agents should be deterred from defecting against cooperators. As discussed above, in acute games, such deterrence requires that each such defection induce future partners to defect with a probability of at least 50%. However, this requirement implies that defection is contagious: each defection by an agent makes it possible that future partners observe a conflict both when being matched with the defecting agent and when being matched with the defecting agent’s partner. Such future partners defect with a probability of at least $$50\%$$ when making such observations. Thus the fraction of defections grows steadily, until all normal agents defect with high probability. The intuition for why cooperation cannot be sustained in acute games with observation of action profiles is as follows. 
The fact that deterring defections in acute games requires future partners to defect with a probability of at least 50% when observing a defection implies that when an agent (Alice) observes her partner (Bob) to defect against a cooperative opponent, then Bob is more likely to do so because he is a normal agent who observed his past opponent to defect than because Bob is a committed agent. This implies that Alice puts a higher probability on Bob defecting against her if she observes Bob to have defected against a partner who also defected than she does if she observes Bob to have defected against an opponent who cooperated. Thus, defecting is the unique best reply when observing the partner defect against a defector, but it removes the incentives required to support stable cooperation. Finally, we show that the third observation structure, observing actions against cooperation, is optimal in the sense that it sustains cooperation as a perfect equilibrium outcome for any Prisoner’s Dilemma game (Theorem 5). The intuition for this result is that not allowing Alice to observe Bob’s behaviour against a defector helps to sustain cooperation because it implies that defecting against a defector does not have any negative indirect effect (in any steady state) because it is never observed by future opponents. This encourages agents to defect against partners who are more likely to defect (regardless of the values of $$g$$ and $$l$$). 1.4. Conventional model and unrestricted strategies In Supplementary Appendix A, we relax the assumption that agents are restricted to choosing only stationary strategies. We present a conventional model of repeated games with random matching that differs from the existing literature only by our introducing a few committed agents. We show that this difference is sufficient to yield most of our key results. 
Specifically, the characterization of the conditions under which cooperation can be sustained as a perfect equilibrium outcome (as summarized in Table 1 in Section 5.3) holds also when agents are not restricted to stationary strategies, and even when agents observe the most recent past actions of the partner. On the other hand, the relaxation of the stationarity assumption in Supplementary Appendix A weakens the uniqueness results of the main model in two respects: (1) rather than showing that defection is the unique equilibrium outcome in offensive games, we show only that it is impossible to sustain full cooperation in such games; and (2) while a variant of the simple strategy of the main model still supports cooperation when the set of strategies is unrestricted, we are no longer able to show that this strategy is the unique way to support full cooperation. 1.5. Structure Section 2 presents the model. Our solution concept is described in Section 3. Section 4 studies the observation of actions. Section 5 extends the model to deal with general observation structures. We discuss the related literature in Section 6, and conclude in Section 7. Supplementary Appendix A adapts our key result to a conventional model with an unrestricted set of strategies. Supplementary Appendix B discusses our empirical predictions. Supplementary Appendix C presents technical definitions. In Supplementary Appendix D we present the refinements of strict perfection, evolutionary stability, and robustness. The formal proofs appear in Supplementary Appendix E. Supplementary Appendix F studies the introduction of cheap talk to our setup. 2. Stationary Model 2.1. Environment We model an environment in which patient agents in a large population are randomly matched in each round to play a two-player symmetric one-shot game. 
For tractability we assume throughout the article that the population is a continuum.5 We further assume that the agents are infinitely lived and do not discount the future ($$i.e.$$ they maximize the average per-round long-run payoff). Alternatively, our model can be interpreted as representing interactions between finitely lived agents who belong to infinitely lived dynasties, such that an agent who dies is succeeded by a protégé who plays the same strategy as the deceased mentor, and each agent observes $$k$$ random actions played by the partner’s dynasty. Before playing the game, each agent (she) privately observes $$k$$ random actions that her partner (he) played against other opponents in the past. As described in detail below, agents are restricted to using only stationary strategies, such that each agent’s behaviour depends only on the signal about the partner, and not on the agent’s own past play or on time. Thus, if all agents observe signals that come from a stationary distribution then the agents’ behaviour will result in a well-defined aggregate distribution of actions that is also stationary. We focus on steady states of the population, in which the distribution of actions, and hence the distribution of signals, is indeed stationary. In such steady states, the $$k$$ actions that an agent observes about her partner are drawn independently from the partner’s stationary distribution of actions. This sampling procedure may be interpreted as the limit of a process in which each agent randomly observes $$k$$ actions that are uniformly sampled from the last $$n$$ interactions of the partner, as $$n\rightarrow\infty$$. To simplify the notation, we assume that the underlying game has two actions, though all our concepts are applicable to games with any finite number of actions. 
An environment is a pair $$E=\left(G,k\right)$$, where $$G=\left(A=\left\{ c,d\right\} ,\pi\right)$$ is a two-player symmetric normal-form game, and $$k\in\mathbb{N}$$ is the number of observed actions. Let $$\pi:A\times A\rightarrow\mathbb{R}$$ be the payoff function of the underlying game. We refer to action $$c$$ (resp., $$d$$) as cooperation (resp., defection), since we will focus on the Prisoner’s Dilemma in our results. Let $$\Delta\left(A\right)$$ denote the set of mixed actions (distributions over $$A$$), and let $$\pi$$ be extended to mixed actions in the usual linear way. We use the letter $$a$$ (resp., $$\alpha$$) to denote a typical pure (mixed) action. With a slight abuse of notation let $$a\in A$$ also denote the element in $$\Delta\left(A\right)$$ that assigns probability 1 to $$a$$. We adopt this convention for all probability distributions throughout the article. 2.2. Stationary strategy The signal observed about the partner is the number of times he played each action $$a\in A$$ in the sample of $$k$$ observed actions. Let $$M=\left\{ 0,...,k\right\}$$ denote the set of feasible signals, where signal $$m\in M$$ is interpreted as the number of times that the partner defected in the sampled $$k$$ observations.6 Given a distribution of actions $$\alpha\in\Delta\left(A\right)$$ and an environment $$E=\left(G,k\right)$$, let $$\nu_{\alpha}\left(m\right)$$ be the probability of an agent observing signal $$m$$ conditional on being matched with a partner who plays on average the distribution of actions $$\alpha$$. That is, $$\nu\left(\alpha\right):=\nu_{\alpha}\in\Delta\left(M\right)$$ is a binomial signal distribution that describes a sample of $$k$$ i.i.d. 
actions, where each action is distributed according to $$\alpha$$:   $$\forall m\in M,\,\,\,\,\nu_{\alpha}\left(m\right)=\frac{k!\cdot\left(\alpha\left(d\right)\right)^{m}\cdot\left(\alpha\left(c\right)\right)^{\left(k-m\right)}}{m!\cdot\left(k-m\right)!}.$$ (1) A stationary strategy (henceforth, strategy) is a mapping $$s:M\rightarrow\Delta\left(A\right)$$ that assigns a mixed action to each possible signal. Let $$s_{m}\in\Delta\left(A\right)$$ denote the mixed action assigned by strategy $$s$$ after observing signal $$m$$. That is, for each action $$a\in A$$, $$s_{m}\left(a\right)=s\left(m\right)\left(a\right)$$ is the probability that a player who follows strategy $$s$$ plays action $$a$$ after observing signal $$m$$. We also let $$a$$ denote the strategy $$s$$ that plays action $$a$$ regardless of the signal, $$i.e.$$ $$s_{m}\left(a\right)=1$$ for all $$m\in M$$. Strategy $$s$$ is totally mixed if, for each action $$a\in A$$ and signal $$m\in M$$, $$s_{m}\left(a\right)>0$$. Let $$\mathcal{S}$$ denote the set of all strategies. Given strategy $$s$$ and distribution of signals $$\nu\in\Delta\left(M\right)$$, let $$s\left(\nu\right)\in\Delta\left(A\right)$$ be the distribution of actions played by an agent who follows strategy $$s$$ and observes a signal sampled from $$\nu$$:   $\forall a\in A,\,\,\,\,s\left(\nu\right)\left(a\right)=\sum_{m\in M}\nu\left(m\right)\cdot s_{m}\left(a\right).$ 2.3. Signal profile and steady state Fix an environment and a finite set of strategies $$S$$. A signal profile $$\theta:S\rightarrow\Delta\left(M\right)$$ is a function that assigns a distribution of signals for each strategy in $$S$$. We interpret $$\theta_{s}\left(m\right)$$ as the probability that signal $$m$$ is observed when a partner playing strategy $$s$$ is encountered. Let $$O_{S}$$ be the set of all signal profiles defined over $$S$$. 
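The two objects defined in Section 2.2, the binomial signal distribution of Equation (1) and the induced action distribution $$s\left(\nu\right)$$, can be sketched in a few lines. This is a minimal illustration, not part of the original model description. The function names and the list encoding of a strategy (entry $$m$$ holds the probability of defecting after signal $$m$$) are assumptions made here for convenience.

```python
from math import comb


def signal_distribution(p_defect, k):
    """Equation (1): probability of each signal m = 0..k when the partner
    defects with probability p_defect in each of k i.i.d. sampled actions."""
    return [comb(k, m) * p_defect**m * (1 - p_defect) ** (k - m)
            for m in range(k + 1)]


def induced_defection_prob(strategy, nu):
    """s(nu)(d) = sum over m of nu(m) * s_m(d).

    `strategy` is a hypothetical encoding: strategy[m] is the probability
    of defecting after observing signal m."""
    return sum(nu_m * s_m for nu_m, s_m in zip(nu, strategy))


# A partner who mixes uniformly, observed through k = 2 sampled actions.
nu_uniform = signal_distribution(0.5, 2)

# A strategy that defects iff it observes at least two defections.
defect_iff_two = [0.0, 0.0, 1.0]
p_d = induced_defection_prob(defect_iff_two, nu_uniform)
print(nu_uniform, p_d)
```

The cooperation probability follows as one minus the returned value, since the game has only two actions.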
Given a strategy $$\sigma\in\Delta\left(S\right)$$ and a signal profile $$\theta\in O_{S}$$, let $$\theta_{\sigma}\in\Delta\left(M\right)$$ be the average distribution of signals in the population, $$i.e.$$$$\theta_{\sigma}\left(m\right):=\sum_{s\in S}\sigma\left(s\right)\cdot\theta_{s}\left(m\right)$$. We say that a signal profile $$\theta:S\rightarrow\Delta\left(M\right)$$ is consistent with distribution of strategies $$\sigma\in\Delta\left(S\right)$$ if   $$\forall m\in M,\,\,s\in S,\,\,\,\,\theta_{s}\left(m\right)=\nu\left(s\left(\theta_{\sigma}\right)\right)\left(m\right).$$ (2) The interpretation of the consistency requirement is that a population of agents who follow the distribution of strategies $$\sigma$$ and observe signals about the partners sampled from the profile $$\theta$$ have to behave in a way that induces the same profile of signal distributions $$\theta$$. Specifically, when Alice, who follows strategy $$s$$, is being matched with a random partner whose strategy is sampled according to $$\sigma$$, she observes a random signal according to the “current” average distribution of signals in the population $$\theta_{\sigma}$$. As a result her distribution of actions is $$s\left(\theta_{\sigma}\right)$$, and thus her behaviour induces the signal distribution $$\nu\left(s\left(\theta_{\sigma}\right)\right)$$. Consistency requires that this induced signal distribution coincide with $$\theta_{s}$$. A steady state is a triple consisting of (1) a finite set of strategies $$S$$ interpreted as the strategies that are played by the agents in the population, (2) a distribution $$\sigma$$ over $$S$$ interpreted as a description of the fraction of agents following each strategy, and (3) a consistent signal profile $$\theta:S\rightarrow\Delta\left(M\right)$$. Formally: Definition 1. 
A steady state (or state for short) of an environment $$\left(G,k\right)$$ is a triple $$\left(S,\sigma,\theta\right)$$ where $$S\subseteq\mathcal{S}$$ is a finite set of strategies, $$\sigma\in\Delta\left(S\right)$$ is a distribution with full support over $$S$$, and $$\theta:S\rightarrow\Delta\left(M\right)$$ is a consistent signal profile. When the set of strategies is a singleton, $$i.e.$$$$S=\left\{ s\right\}$$, we omit the degenerate distribution assigning a mass of one to $$s$$, and we write the steady state as a pair $$\left(\left\{ s\right\} ,\theta\right).$$ We adopt this convention, of omitting reference to degenerate distributions, throughout the article. A standard argument shows that any distribution of strategies admits a consistent signal profile (Lemma 1 in Supplementary Appendix C). Some distributions induce multiple consistent profiles of signal distributions. For example, suppose that $$k=3$$, and everyone follows the strategy of playing the most frequently observed action ($$i.e.$$ defecting iff $$m\geq2$$). In this setting there are three consistent signal profiles: one in which everyone cooperates, one in which everyone defects, and one in which everyone plays (on average) uniformly.7 2.4. Perturbed environment As discussed in the Introduction, and as argued by Ellison (1994, p. 578), it seems implausible that in large populations all agents are rational and know exactly the strategies played by other agents in the community. Motivated by this observation, we introduce the notion of a perturbed environment in which a small fraction of agents in the population are committed to playing specific strategies, even though these strategies are not necessarily payoff-maximizing. 
A perturbed environment is a tuple consisting of (1) an environment, (2) a distribution $$\lambda$$ over a set of commitment strategies $$S^{C}$$ that includes a totally mixed strategy, and (3) a number $$\epsilon$$ representing the share of agents who are committed to playing strategies in $$S^{C}$$ (henceforth, committed agents). The remaining $$1-\epsilon$$ share of the agents can play any strategy in $$\mathcal{S}$$ (henceforth, normal agents). Formally: Definition 2. A perturbed environment is a tuple $$E_{\epsilon}=\left(\left(G,k\right),\left(S^{C},\lambda\right),\epsilon\right)$$, where $$G$$ is the underlying game, $$k\in\mathbb{N}$$ is the number of observed actions, $$S^{C}$$ is a non-empty finite set of strategies (called, commitment strategies) that includes a totally mixed strategy, $$\lambda\in\Delta\left(S^{C}\right)$$ is a distribution with full support over the commitment strategies, and $$\epsilon\geq0$$ is the mass of committed agents in the population. We require $$S^{C}$$ to include at least one totally mixed strategy because we want all signals to be observed with positive probability in a perturbed environment when $$\epsilon>0$$. (This is analogous to the requirement in Selten, 1975, that all actions be played with positive probability in the perturbations defining a perfect equilibrium.) Throughout the article we look at the limit in which the share of committed agents, $$\epsilon$$, converges to zero. This is the only limit taken in the article. We use the notation of $$O\left(\epsilon\right)$$ (resp., $$O\left(\epsilon^{2}\right)$$) to refer to functions that are in the order of magnitude of $$\epsilon$$ (resp., $$\epsilon^{2}$$), $$i.e.$$$$\frac{f\left(\epsilon\right)}{\epsilon}\rightarrow_{\epsilon\rightarrow0}0$$ (resp., $$\frac{f\left(\epsilon\right)}{\epsilon^{2}}\rightarrow_{\epsilon\rightarrow0}0$$). We refer to $$\left(S^{C},\lambda\right)$$ as a distribution of commitments. 
With a slight abuse of notation, we identify an unperturbed environment$$\left(\left(G,k\right),\left(S^{C},\lambda\right),\epsilon=0\right)$$ with the equivalent environment $$\left(G,k\right)$$. Remark 1. To simplify the presentation, the definition of perturbed environment includes only commitment strategies, and it does not allow “trembling hand” mistakes. As discussed in Remark 6 in Section 4.3, the results also hold in a setup in which agents also tremble, as long as the probability by which a normal agent trembles is of the same order of magnitude as the frequency of committed agents. One of our main results (Theorem 1) requires an additional mild assumption on the perturbed environment that rules out the knife-edge case in which all agents (committed and non-committed alike) behave exactly the same. Specifically, a set of commitments is regular if for each distribution of actions $$\alpha$$, there exists a committed strategy $$s$$ that does not play distribution $$\alpha$$ when observing the signal distribution induced by $$\alpha$$. Formally: Definition 3. A set of commitment strategies $$S^{C}$$ is regular if for each distribution of actions $$\alpha\in\Delta\left(A\right)$$, there exists a strategy $$s\in S^{C}$$ such that $$s_{\nu\left(\alpha\right)}\neq\alpha$$. If the set of commitments is regular, then we say that the distribution $$\left(S^{C},\lambda\right)$$ and the perturbed environment $$\left(\left(G,k\right),\left(S^{C},\lambda\right),\epsilon\right)$$ are regular. An example of a regular set of commitments is the set that includes strategies $$s\equiv\alpha_{1}$$ and $$s'\equiv\alpha_{2}$$ that induce agents to play mixed actions $$\alpha_{1}\neq\alpha_{2}$$ regardless of the observed signal. 2.5. 
Steady state in a perturbed environment Fix a perturbed environment $$E_{\epsilon}=\left(\left(G,k\right),\left(S^{C},\lambda\right),\epsilon\right)$$ and a finite set of strategies $$S^{N}$$, interpreted as the strategies followed by the normal agents in the population. We redefine a signal profile $$\theta:S^{C}\cup S^{N}\rightarrow\Delta\left(M\right)$$ as a function that assigns a binomial distribution of signals to each strategy in $$S^{C}\cup S^{N}$$. Given a distribution over strategies of the normal agents $$\sigma\in\Delta\left(S^{N}\right)$$ and a signal profile $$\theta\in O_{S^{C}\cup S^{N}}$$, let $$\theta_{\left(\left(1-\epsilon\right)\cdot\sigma+\epsilon\cdot\lambda\right)}\in\Delta\left(M\right)$$ be the average distribution of signals in the population, $$i.e.$$ $$\theta_{\left(\left(1-\epsilon\right)\cdot\sigma+\epsilon\cdot\lambda\right)}\left(m\right):=\sum_{s\in S^{C}\cup S^{N}}\left(\left(1-\epsilon\right)\cdot\sigma+\epsilon\cdot\lambda\right)\left(s\right)\cdot\theta_{s}\left(m\right)$$. We adapt the definitions of a consistent signal profile and of a steady state to perturbed environments. This straightforward adaptation is presented in detail in Supplementary Appendix C. The following example demonstrates a specific steady state in a specific perturbed environment. The example is intended to clarify the various definitions of this section and, in particular, the consistency requirement. Later, we revisit the same example to explain the essentially unique perfect equilibrium that supports cooperation. Example 1. Consider the perturbed environment $$\left(\left(G,k=2\right),\left(\left\{ s^{u}\equiv0.5\right\} \right),\epsilon\right)$$, in which each agent observes two of her partner’s actions and there is a single commitment strategy, denoted by $$s^{u}$$. This strategy is followed by a fraction $$0<\epsilon\ll1$$ of committed agents, who choose each action with probability $$0.5$$ regardless of the observed signal. 
Let $$\left(S=\left\{ s^{1},s^{2}\right\} ,\sigma=\left(\frac{1}{6},\frac{5}{6}\right),\theta\right)$$ be the following steady state. The state includes two normal strategies: $$s^{1}$$ and $$s^{2}$$. The strategy $$s^{1}$$ defects iff $$m\geq1$$, and the strategy $$s^{2}$$ defects iff $$m\geq2$$. The distribution $$\sigma$$ assigns a mass of $$\frac{1}{6}$$ to $$s^{1}$$ and a mass of $$\frac{5}{6}$$ to $$s^{2}$$. The consistent signal profile $$\theta$$ is defined as follows (neglecting terms of $$O\left(\epsilon^{2}\right)$$ throughout the example):   $$\theta_{s^{u}}\left(m\right)=\begin{cases} 25\% & \text{if }m=0\\[3pt] 50\% & \text{if }m=1\\[3pt] 25\% & \text{if }m=2, \end{cases}\,\,\,\,\,\theta_{s^{1}}\left(m\right)=\begin{cases} 1-3.5\cdot\epsilon & \text{if }m=0\\[3pt] 3.5\cdot\epsilon & \text{if }m=1\\[3pt] 0 & \text{if }m=2, \end{cases}\,\,\,\,\,\theta_{s^{2}}\left(m\right)=\begin{cases} 1-0.5\cdot\epsilon & \text{if }m=0\\[3pt] 0.5\cdot\epsilon & \text{if }m=1\\[3pt] 0 & \text{if }m=2. \end{cases}$$ (3) To confirm the consistency of $$\theta$$, we first have to calculate the average distribution of signals in the population:   $\theta_{\left(\left(1-\epsilon\right)\cdot\sigma+\epsilon\cdot\lambda\right)}\left(m\right)=\begin{cases} 1-1.75\cdot\epsilon & \text{if }m=0\\[3pt] 1.5\cdot\epsilon & \text{if }m=1\\[3pt] 0.25\cdot\epsilon & \text{if }m=2. \end{cases}$ Using $$\theta_{\left(\left(1-\epsilon\right)\cdot\sigma+\epsilon\cdot\lambda\right)}$$, we confirm the consistency of $$\theta_{s^{1}}$$ and $$\theta_{s^{2}}$$ (the consistency of $$\theta_{s^{u}}$$ is immediate). 
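As a numerical cross-check of the first-order values in Equation (3), the consistency requirement can be iterated to a fixed point for a small $$\epsilon$$. The sketch below is an illustration only. The function names, the dictionary encoding of the strategies, the choice $$\epsilon=0.001$$, and the use of simple fixed-point iteration started near full cooperation are all assumptions made here, not part of the article.

```python
from math import comb


def binomial_signal(p_defect, k=2):
    """Signal distribution over m = 0..k for a given defection probability."""
    return [comb(k, m) * p_defect**m * (1 - p_defect) ** (k - m)
            for m in range(k + 1)]


def example1_profile(eps, iterations=200):
    """Iterate the consistency map of Example 1 (k = 2) to a fixed point.

    Each strategy is encoded (hypothetically) as the list of defection
    probabilities after observing m = 0, 1, 2 defections."""
    strategies = {
        "su": [0.5, 0.5, 0.5],  # committed agents: uniform mixing
        "s1": [0.0, 1.0, 1.0],  # defect iff m >= 1
        "s2": [0.0, 0.0, 1.0],  # defect iff m >= 2
    }
    weights = {"su": eps, "s1": (1 - eps) / 6, "s2": (1 - eps) * 5 / 6}
    # Start close to the cooperative state described in the example.
    theta = {"su": binomial_signal(0.5),
             "s1": binomial_signal(0.0),
             "s2": binomial_signal(0.0)}
    for _ in range(iterations):
        # Average distribution of signals in the population.
        avg = [sum(weights[n] * theta[n][m] for n in strategies)
               for m in range(3)]
        # Each strategy's induced defection probability, then its signals.
        theta = {n: binomial_signal(sum(a * s for a, s in zip(avg, strat)))
                 for n, strat in strategies.items()}
    return theta


eps = 0.001
theta = example1_profile(eps)
print(theta["s1"][1] / eps, theta["s2"][1] / eps)  # close to 3.5 and 0.5
```

The printed ratios differ from 3.5 and 0.5 only by terms of order $$\epsilon$$, in line with the example neglecting terms of $$O\left(\epsilon^{2}\right)$$ in the profile itself.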
The confirmation proceeds by calculating the distribution of actions played by a player following strategy $$s^{i}$$ who observes the signal about a random partner:   $\begin{array}{ccc} s^{1}\left(\theta_{\left(\left(1-\epsilon\right)\cdot\sigma+\epsilon\cdot\lambda\right)}\right)\left(c\right) & = & 1-1.75\cdot\epsilon\\ s^{1}\left(\theta_{\left(\left(1-\epsilon\right)\cdot\sigma+\epsilon\cdot\lambda\right)}\right)\left(d\right) & = & 1.75\cdot\epsilon \end{array}\,\,\,\,\,\,\,\,\,\begin{array}{ccc} s^{2}\left(\theta_{\left(\left(1-\epsilon\right)\cdot\sigma+\epsilon\cdot\lambda\right)}\right)\left(c\right) & = & 1-0.25\cdot\epsilon,\\ s^{2}\left(\theta_{\left(\left(1-\epsilon\right)\cdot\sigma+\epsilon\cdot\lambda\right)}\right)\left(d\right) & = & 0.25\cdot\epsilon. \end{array}$ Note that $$s^{1}\left(\theta_{\left(\left(1-\epsilon\right)\cdot\sigma+\epsilon\cdot\lambda\right)}\right)\left(d\right)=1-\theta_{\left(\left(1-\epsilon\right)\cdot\sigma+\epsilon\cdot\lambda\right)}\left(2\cdot c\right)$$ and $$s^{2}\left(\theta_{\left(\left(1-\epsilon\right)\cdot\sigma+\epsilon\cdot\lambda\right)}\right)\left(d\right)=\theta_{\left(\left(1-\epsilon\right)\cdot\sigma+\epsilon\cdot\lambda\right)}\left(2\cdot d\right)$$, where $$2\cdot c$$ and $$2\cdot d$$ denote the signals of two observed cooperations ($$m=0$$) and two observed defections ($$m=2$$), respectively. The final step in showing that $$\theta$$ is a consistent profile is the observation that each $$\theta_{s^{i}}$$ coincides with the binomial distribution that is induced by $$s^{i}\left(\theta_{\left(\left(1-\epsilon\right)\cdot\sigma+\epsilon\cdot\lambda\right)}\right)$$. 2.6. Discussion of the model Our model differs from most of the existing literature on community enforcement in three key dimensions (see, $$e.g.$$ Kandori, 1992; Ellison, 1994; Dixit, 2003; Deb, 2017; Deb and González-Díaz, 2014). In what follows we discuss these three key differences, and their implications for our results. (1) The presence of a few committed agents. 
If one removes the commitment types from our setup, then one can show (by using belief-free equilibria, as in Takahashi, 2010) that: (1) it is always possible to support full cooperation as an equilibrium outcome, and (2) there are various strategies that sustain full cooperation. The results of this article show that the introduction of a few committed agents, regardless of how they behave, implies very different results: (1) defection is the unique equilibrium payoff in offensive Prisoner’s Dilemmas (Theorem 1), and (2) there is an essentially unique strategy combination that supports a cooperative equilibrium in defensive Prisoner’s Dilemmas. The intuition is that the presence of committed agents implies that observation of past actions must have some influence on the likely behaviour of the partner in the current match (more detailed discussions of this issue follow Theorem 1 and Remark 10). (2) Restriction to Stationary Strategies. In our model, we restrict agents to using stationary strategies that condition only on the number of times they observed each of the partner’s actions being played in past interactions. We allow agents to condition their play neither on the order in which the observed actions were played in the past, nor on the agent’s own history of play, nor on calendar time. The assumption simplifies the presentation of the model and results. In addition, the assumption allows us to achieve uniqueness results that might not hold without stationarity (as discussed in Section A.3). (3) Not having a “global time zero.” Most of the existing literature represents interactions within a community as a repeated game that has a “global time zero”, in which the first ever interaction takes place. In many real-life situations, the interactions within a community began a long time ago and have continued, via overlapping generations, to the present day. 
It seems implausible that today’s agents condition their behaviour on what happened in the remote past (or on calendar time). For example, trade interactions have been taking place from time immemorial. It seems unreasonable to assume that Alice’s behaviour today is conditioned on what transpired in some long-forgotten time $$t=0$$, when, say, two hunter-gatherers were involved in the first ever trade. We suggest that, even though real-world interactions obviously begin at some definite date, a good way of modelling what the interacting agents think about the situation may be to get rid of global time zero and focus on strategies that do not condition on what happened in the remote past. The lack of a global time zero is the reason why, unlike in repeated games, a distribution of strategies does not uniquely determine the behaviour and the payoffs of the agents, so that one must explicitly add the consistent signal profile $$\theta$$ as part of the description of the state of the population. It is possible to interpret a steady state $$\left(S,\sigma,\theta\right)$$ as a kind of initial condition for society, in which agents already have a long-existing past. That is, we begin our analysis of community interaction at a point in time when agents have for a long time followed the strategy distribution $$\left(S,\sigma\right)$$ yielding the consistent signal profile $$\theta$$. We then ask whether any patient agent has a profitable deviation from her strategy. If not, then the steady state $$\left(S,\sigma,\theta\right)$$ is likely to persist. This approach stands in contrast to the standard approach that studies whether or not agents have a profitable deviation at a time $$t\gg1$$ following a long history that started with the first ever interaction at $$t=0$$. In Supplementary Appendix A, we present a conventional repeated game model that differs from the existing literature in only one key aspect: the presence of a few committed agents. 
In particular, this alternative model features standard calendar time, and agents discount the future, observe the most recent past actions of the partner, and are not limited to choosing only stationary strategies. We show that most of our results hold also in this setup. We feel that this alternative model, while being closer to the existing literature than the main model, suffers from added technical complexity that may hinder the model from being insightful and accessible. 3. Solution Concept 3.1. Long-run payoff In this subsection, we define the long-run average (per-round) payoff of a patient agent who follows a stationary strategy $$s$$, given a steady state $$\left(S^{N},\sigma,\theta\right)$$ of a perturbed environment $$\left(\left(G,k\right),\left(S^{C},\lambda\right),\epsilon\right)$$. The same definition, when taking $$\epsilon=0$$, holds for an unperturbed environment. We begin by extending the definition of a consistent signal profile $$\theta$$ to non-incumbent strategies. For each non-incumbent strategy $$\hat{s}\in\mathcal{S}\backslash\left(S^{N}\cup S^{C}\right)$$, define $$\theta\left(\hat{s}\right)=\theta_{\hat{s}}$$ as the distribution of signals induced by a deviating agent who follows strategy $$\hat{s}$$ and observes the distribution of signals induced by a random partner in the population (sampled according to $$\left(1-\epsilon\right)\cdot\sigma\left(s'\right)+\epsilon\cdot\lambda\left(s'\right)$$). 
That is, for each strategy $$\hat{s}\in\mathcal{S}\backslash\left(S^{N}\cup S^{C}\right)$$, and each signal $$m\in M$$, we define   $\theta_{\hat{s}}\left(m\right)=\left(\nu\left(\hat{s}\left(\theta_{\left(\left(1-\epsilon\right)\cdot\sigma+\epsilon\cdot\lambda\right)}\right)\right)\right)\left(m\right).$ We define the long-run payoff of an agent who follows an arbitrary strategy $$s\in\mathcal{S}$$ as:   $$\pi_{s}\left(S^{N},\sigma,\theta\right)=\sum_{s'\in S^{N}\cup S^{C}}\left(\left(1-\epsilon\right)\cdot\sigma\left(s'\right)+\epsilon\cdot\lambda\left(s'\right)\right)\cdot\left(\sum_{\left(a,a'\right)\in A\times A}s_{\theta\left(s'\right)}\left(a\right)\cdot s'_{\theta\left(s\right)}\left(a'\right)\cdot\pi\left(a,a'\right)\right).$$ (4) Equation (4) is straightforward. The inner (right-hand) sum ($$i.e.$$ $$\sum_{\left(a,a'\right)\in A\times A}s_{\theta\left(s'\right)}\left(a\right)\cdot s'_{\theta\left(s\right)}\left(a'\right)\cdot\pi\left(a,a'\right)$$) calculates the expected payoff of Alice who follows strategy $$s$$ conditional on being matched with a partner who follows strategy $$s'$$. The outer sum weighs these conditional expected payoffs according to the frequency of each incumbent strategy $$s'$$ ($$i.e.$$ $$\left(\left(1-\epsilon\right)\cdot\sigma\left(s'\right)+\epsilon\cdot\lambda\left(s'\right)\right)$$), which yields the expected payoff of Alice against a random partner in the population. Let $$\pi\left(S^{N},\sigma,\theta\right)$$ be the average payoff of the normal agents in the population:   $\pi\left(S^{N},\sigma,\theta\right)=\sum_{s\in S^{N}}\sigma\left(s\right)\cdot\pi_{s}\left(S^{N},\sigma,\theta\right).$ 3.2. Nash and perfect equilibrium A steady state is a Nash equilibrium if no agent can obtain a higher payoff by a unilateral deviation. Formally: Definition 4. 
The steady state $$\left(S^{N},\sigma,\theta\right)$$ of perturbed environment $$\left(\left(G,k\right),\left(S^{C},\lambda\right),\epsilon\right)$$ is a Nash equilibrium if for each strategy $$s\in\mathcal{S}$$, it is the case that $$\pi_{s}\left(S^{N},\sigma,\theta\right)\leq\pi\left(S^{N},\sigma,\theta\right)$$. Note that the $$1-\epsilon$$ normal agents in such a Nash equilibrium must obtain the same maximal payoff. That is, each normal strategy $$s\in S^{N}$$ satisfies $$\pi_{s}\left(S^{N},\sigma,\theta\right)=\pi\left(S^{N},\sigma,\theta\right)\geq\pi_{s'}\left(S^{N},\sigma,\theta\right)$$ for each strategy $$s'\in\mathcal{S}$$. However, the $$\epsilon$$ committed agents may obtain lower payoffs. A steady state is a (regular) perfect equilibrium if it is the limit of Nash equilibria of (regular) perturbed environments when the frequency of the committed agents converges to zero. Formally (where the standard definitions of convergence of strategies, distributions and states are presented in Supplementary Appendix C): Definition 5. A steady state $$\left(S^{*},\sigma^{*},\theta^{*}\right)$$ of the environment $$\left(G,k\right)$$ is a (regular) perfect equilibrium if there exist a (regular) distribution of commitments $$\left(S^{C},\lambda\right)$$ and converging sequences $$\left(S_{n}^{N},\sigma_{n},\theta_{n}\right)_{n}\rightarrow_{n\rightarrow\infty}\left(S^{*},\sigma^{*},\theta^{*}\right)$$ and $$\left(\epsilon_{n}>0\right)_{n}\rightarrow_{n\rightarrow\infty}0$$, such that for each $$n$$, the state $$\left(S_{n}^{N},\sigma_{n},\theta_{n}\right)$$ is a Nash equilibrium of the perturbed environment $$\left(\left(G,k\right),\left(S^{C},\lambda\right),\epsilon_{n}\right)$$. In this case, we say that $$\left(S^{*},\sigma^{*},\theta^{*}\right)$$ is a (regular) perfect equilibrium with respect to distribution of commitments $$\left(S^{C},\lambda\right)$$. If $$\theta^{*}\equiv a$$, we say that action $$a\in A$$ is a (regular) perfect equilibrium action. 
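The structure of equation (4) and the inequality in Definition 4 can be illustrated with a minimal Python sketch. It assumes that the conditional cooperation probabilities (the terms $$s_{\theta\left(s'\right)}$$ and $$s'_{\theta\left(s\right)}$$) have already been computed and are passed in directly. The function name `long_run_payoff`, the strategy labels, and the numbers $$g=1$$, $$l=3$$, $$\epsilon=0.1$$ are illustrative assumptions. The sketch compares a single deviation only, whereas Definition 4 requires the inequality for every strategy in $$\mathcal{S}$$.

```python
from itertools import product

ACTIONS = ("c", "d")


def long_run_payoff(partner_weights, own_coop, partner_coop, payoff):
    """Expected payoff against a random partner, following the shape of eq. (4).

    partner_weights[s'] : share of partner strategy s' in the population,
                          i.e. (1 - eps) * sigma(s') + eps * lambda(s').
    own_coop[s']        : probability that the agent cooperates against s'.
    partner_coop[s']    : probability that an s'-partner cooperates against the agent.
    payoff[(a, a')]     : stage payoff to the agent (a = own action, a' = partner's).
    """
    total = 0.0
    for s_prime, weight in partner_weights.items():
        p_own = {"c": own_coop[s_prime], "d": 1.0 - own_coop[s_prime]}
        p_partner = {"c": partner_coop[s_prime], "d": 1.0 - partner_coop[s_prime]}
        # Inner sum of eq. (4): expected payoff conditional on the partner's strategy.
        inner = sum(
            p_own[a] * p_partner[b] * payoff[(a, b)]
            for a, b in product(ACTIONS, repeat=2)
        )
        # Outer sum of eq. (4): weight by the frequency of the partner's strategy.
        total += weight * inner
    return total


# Illustrative numbers only (assumptions, not taken from the text above).
g, l, eps = 1.0, 3.0, 0.1
payoff_pd = {("c", "c"): 1.0, ("c", "d"): -l, ("d", "c"): 1.0 + g, ("d", "d"): 0.0}
weights = {"always_defect": 1.0 - eps, "uniform_commitment": eps}
# Neither partner type conditions on the signal, so these hold for any agent.
partner_coop = {"always_defect": 0.0, "uniform_commitment": 0.5}

incumbent = long_run_payoff(weights, {s: 0.0 for s in weights}, partner_coop, payoff_pd)
deviator = long_run_payoff(weights, {s: 1.0 for s in weights}, partner_coop, payoff_pd)

# All normal agents follow the same strategy here, so the average normal
# payoff equals the incumbent payoff; check the inequality for this deviator.
deviation_unprofitable = deviator <= incumbent
print(incumbent, deviator, deviation_unprofitable)
```

Passing the conditional probabilities in as data keeps the sketch independent of how the consistent signal profile $$\theta$$ is obtained, which is a separate fixed-point computation.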
By standard arguments, any perfect equilibrium is a Nash equilibrium of the unperturbed environment. In Supplementary Appendix C.4 we show that any symmetric (perfect) Nash equilibrium of the underlying game corresponds to a (perfect) Nash equilibrium of the environment in which all normal agents ignore the observed signal. 3.3. Stronger refinements of perfect equilibrium In Supplementary Appendix D we present three refinements of perfect equilibrium: strict perfection, evolutionary stability, and robustness. The first refinement (strict perfection) is satisfied by the equilibria constructed in Proposition 1, Theorem 2, and Theorem 3. The remaining refinements (evolutionary stability and robustness) are satisfied by all the equilibria constructed in the article. The notion of perfect equilibrium might be considered too weak because it may crucially depend on a specific set of commitment strategies. The refinement of strict perfection (à la Okada, 1981) requires the equilibrium outcome to be sustained regardless of which commitment strategies are present in the population. The notion of perfect equilibrium considers only deviations by a single agent (who has mass zero in the infinite population). The refinement of an evolutionarily stable strategy (à la Maynard Smith and Price, 1973) requires stability against a group of agents with a small positive mass who jointly deviate. The outcome of a perfect equilibrium may be non-robust in the sense that small perturbations of the distribution of observed signals may induce a change of behaviour that moves the population away from the consistent signal profile. We address this issue by introducing a refinement that we call robustness, which requires that if we slightly perturb the distribution of observed signals, then the agents still play the same equilibrium outcome with a probability very close to one (in the spirit of the notion of Lyapunov stability). 4. Prisoner’s Dilemma and Observation of Actions 4.1. 
The prisoner’s dilemma Our results focus on environments in which the underlying game is the Prisoner’s Dilemma (denoted by $$G_{PD}$$), which is described in Table 2. The class of Prisoner’s Dilemma games is fully described by two positive parameters $$g$$ and $$l$$. The two actions are denoted $$c$$ and $$d$$, representing cooperation and defection, respectively. When both players cooperate they both get a high payoff (normalized to one), and when they both defect they both get a low payoff (normalized to zero). When a single player defects he obtains a payoff of $$1+g$$ ($$i.e.$$ an additional payoff of $$g$$) while his opponent gets $$-l$$. TABLE 2 Matrix payoffs of Prisoner’s Dilemma games     Following Dixit (2003) we classify Prisoner’s Dilemma games into two kinds: offensive and defensive.$$^{8}$$ In an offensive Prisoner’s Dilemma there is a stronger incentive to defect against a cooperator than against a defector ($$i.e.$$ $$g>l$$); in a defensive PD the opposite holds ($$i.e.$$ $$l>g$$). If cooperating is interpreted as exerting high effort, then the defensive PD exhibits strategic complementarity; increasing one’s effort from low to high is less costly if the opponent exerts high effort. 4.2. Stability of defection We begin by showing that defection is a regular perfect equilibrium action in any Prisoner’s Dilemma game and for any $$k$$. Formally: Proposition 1. Let $$E=\left(G_{PD},k\right)$$ be an environment. Defection is a regular perfect equilibrium action. The intuition is straightforward. Consider any distribution of commitment strategies. Consider the steady state in which all the normal incumbents defect regardless of the observed signal. It is immediate that this strategy is the unique best reply to itself. This implies that if the share of committed agents is sufficiently small, then always defecting is also the unique best reply in the slightly perturbed environment. 
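The payoff normalization and the offensive/defensive classification of Section 4.1 can be written down as a short Python sketch. The helper names `pd_payoffs`, `classify`, and `defection_bonus` are hypothetical, and the "borderline" label for $$g=l$$ is an illustrative choice for the case the classification leaves out.

```python
def pd_payoffs(g, l):
    """Payoff to the row player in the normalized Prisoner's Dilemma.

    Keys are (own action, partner's action); g and l must be positive.
    """
    if g <= 0 or l <= 0:
        raise ValueError("g and l must be positive")
    return {
        ("c", "c"): 1.0,      # mutual cooperation, normalized to one
        ("c", "d"): -l,       # sole cooperator
        ("d", "c"): 1.0 + g,  # unilateral defector
        ("d", "d"): 0.0,      # mutual defection, normalized to zero
    }


def classify(g, l):
    """Offensive if g > l, defensive if l > g, borderline if g == l."""
    if g > l:
        return "offensive"
    if l > g:
        return "defensive"
    return "borderline"


def defection_bonus(payoff, partner_action):
    """Direct gain from defecting rather than cooperating against a fixed action."""
    return payoff[("d", partner_action)] - payoff[("c", partner_action)]


table = pd_payoffs(g=1.0, l=3.0)
print(classify(1.0, 3.0))           # defensive
print(defection_bonus(table, "c"))  # equals g
print(defection_bonus(table, "d"))  # equals l
```

Because both bonuses are positive for any positive $$g$$ and $$l$$, defection is a dominant action in the stage game, which is the fact that the intuition for Proposition 1 relies on.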
Our first main result shows that defection is the unique regular perfect equilibrium in offensive games. Theorem 1. Let $$E=\left(G_{PD},k\right)$$ be an environment, where $$G$$ is an offensive Prisoner’s Dilemma ($$i.e.$$$$g>l$$). If $$\left(S^{*},\sigma^{*},\theta^{*}\right)$$ is a regular perfect equilibrium, then $$S^{*}=\left\{ d\right\}$$ and $$\theta^{*}=k$$. Sketch of Proof. The payoff of a strategy can be divided into two components: (1) a direct component: defecting yields additional $$g$$ points if the partner cooperates and additional $$l$$ points if the partner defects, and (2) an indirect component: the strategy’s average probability of defection determines the distribution of signals observed by the partners, and thereby determines the partner’s probability of defecting. For each fixed average probability of defection $$q$$ the fact that the Prisoner’s Dilemma is offensive implies that the optimal strategy among all those who defect with an average probability of $$q$$ is to defect, with the maximal probability, against the partners who are most likely to cooperate. This implies that all agents who follow incumbent strategies are more likely to defect against partners who are more likely to cooperate. As a result, mutants who always defect outperform incumbents because they both have a strictly higher direct payoff (since defection is a dominant action) and a weakly higher indirect payoff (since incumbents are less likely to defect against them). ∥ Discussion of Theorem 1. The proof of Theorem 1 relies on the assumption that agents are limited to choosing only stationary strategies. The stationarity assumption implies that a partner who has been observed to defect more in the past is more likely to defect in the current match. However, this may no longer be true in a non-stationary environment. 
In Supplementary Appendix A we analyse the classic setup of repeated games, in which agents can choose non-stationary strategies and observe the opponent’s recent actions. In that setup we are able to prove a weaker version of Theorem 1 (namely, Theorem 6) which states that full cooperation cannot be supported as a perfect equilibrium outcome in offensive Prisoner’s Dilemmas ($$i.e.$$ cooperation is not a perfect equilibrium action in offensive games). Several papers in the existing literature present various mechanisms to support cooperation in any Prisoner’s Dilemma game. Kandori (1992, Theorem 1) and Ellison (1994) show that in large finite populations cooperation can be supported by contagious equilibria even when an agent does not observe any signal about her partner ($$i.e.$$$$k=0$$). In these equilibria each agent starts the game by cooperating, but she starts defecting forever as soon as any partner has defected against her. As pointed out by Ellison (1994, p. 578), if we consider a large population in which at least one “crazy” agent defects with positive probability in all rounds regardless of the observed signal, then Kandori’s and Ellison’s equilibria fail because agents assign high probability to the event that the contagion process has already begun, even after having experienced a long period during which no partner defected against them. Recently, Dilmé (2016) presented a novel “tit-for-tat”-like contagious equilibrium that is robust to the presence of committed agents, but only for the borderline case of $$g=l$$ (as discussed in Remark 7 below). Sugden (1986) and Kandori (1992, Theorem 2) show that cooperation can be a perfect equilibrium in a setup in which each player observes a binary signal about his partner, either a “good label” or a “bad label”. All players start with a good label. This label becomes bad if a player defects against a “good” partner. 
The equilibrium strategy that supports full cooperation in this setup is to cooperate against good partners and defect against bad partners. Theorem 1 reveals that the presence of a small fraction of committed agents does not allow the population to maintain such a simple binary reputation under an observation structure in which players observe an arbitrary number of past actions taken by their partners. The theorem shows this indirectly, because if it were possible to derive binary reputations from this information structure, then it should have been possible to support cooperation as a perfect equilibrium action. Moreover, Theorem 4 shows that cooperation is not a perfect equilibrium action in acute games when players observe action profiles. This suggests that the presence of a few committed agents does not allow us to maintain the seemingly simple binary reputation mechanisms of Sugden (1986) and Kandori (1992), even under observation structures in which each agent observes the whole action profile of many of her opponent’s past interactions. The mild restriction to a regular perfect equilibrium is necessary for Theorem 1 to go through. Example 5 in Supplementary Appendix G demonstrates the existence of a non-regular perfect equilibrium of an offensive PD, in which players cooperate with positive probability. This non-robust equilibrium is similar to the “belief-free” sequential equilibria that support cooperation in offensive Prisoner’s Dilemma games in Takahashi (2010), which have the property that players are always indifferent between their actions, but they choose different mixed actions depending on the signal they obtain about the partner. 4.3. Stability of cooperation in defensive Prisoner’s Dilemmas Our next result shows that if players observe at least two actions, then cooperation is a regular perfect equilibrium action in any defensive Prisoner’s Dilemma. 
Moreover, it shows that there is essentially a unique combination of strategies that supports full cooperation in the Prisoner’s Dilemma game, according to which: (1) all agents cooperate when observing no defections, (2) all agents defect when observing at least 2 defections, (3) sometimes (but not always) agents defect when observing a single defection. Theorem 2. Let $$E=\left(G_{PD},k\right)$$ be an environment with observations of actions, where $$G_{PD}$$ is a defensive Prisoner’s Dilemma ($$g<l$$), and $$k\geq2$$. (1)If $$\left(S^{*},\sigma^{*},\theta^{*}\equiv0\right)$$ is a perfect equilibrium then: (a) for each $$s\in S^{*}$$, $$s_{0}\left(c\right)=1$$ and $$s_{m}\left(d\right)=1$$ for each $$m\geq2$$; and (b) there exist $$s,s'\in S^{*}$$ such that $$s_{1}\left(d\right)<1$$ and $$s'_{1}\left(d\right)>0$$. (2)Cooperation is a regular perfect equilibrium action. Sketch of Proof. Suppose that $$\left(S^{*},\sigma^{*},\theta^{*}\equiv0\right)$$ is a perfect equilibrium. The fact that the equilibrium induces full cooperation, in the limit when the mass of commitment strategies converges to zero, implies that all normal agents must cooperate when they observe no defections, $$i.e.$$$$s_{0}\left(c\right)=1$$ for each $$s\in S^{*}$$. Next we show that there is a normal strategy that induces the agent to defect with positive probability when observing a single defection, $$i.e.$$$$s_{1}\left(d\right)>0$$ for some $$s\in S^{*}$$. Assume to the contrary that $$s_{1}\left(c\right)=1$$ for each $$s\in S^{*}$$. If an agent (Alice) deviates and defects with small probability $$\epsilon<<1$$ when observing no defections, then she outperforms the incumbents. On the one hand, the fact that she occasionally defects when observing $$m=0$$ gives her a direct gain of at least $$\epsilon\cdot g$$. 
On the other hand, the probability that a partner observes her defecting twice or more is $$O\left(\epsilon^{2}\right)$$; therefore her indirect loss from these additional $$\epsilon$$ defections is at most $$O\left(\epsilon^{2}\right)\cdot\left(1+l\right)$$, and therefore for a sufficiently small $$\epsilon>0$$, Alice strictly outperforms the incumbents. The fact that $$s_{1}\left(d\right)>0$$ for some $$s\in S^{*}$$ implies that defection is a best reply conditional on an agent observing $$m=1$$. The direct gain from defecting is strictly increasing in the probability that the partner defects (because the game is defensive), while the indirect influence of defection on the behaviour of future partners is independent of the partner’s play. This implies that defection must be the unique best reply when an agent observes $$m\geq2$$, since such an observation implies a higher probability that the partner is going to defect relative to the observation of a single defection. This establishes that $$s_{m}\left(d\right)=1$$ for all $$m\geq2$$ and all $$s\in S^{*}$$. To demonstrate that there is a strategy $$s$$ such that $$s_{1}\left(d\right)<1$$, assume to the contrary that $$s_{1}\left(d\right)=1$$ for each $$s\in S^{*}$$. Suppose that the average probability of defection in the population is $$0<\Pr\left(d\right)$$. Since there is full cooperation in the limit we have $$\Pr\left(d\right)=O\left(\epsilon\right)$$. This implies that a random partner is observed to defect at least once with a probability of $$k\cdot\Pr\left(d\right)+O\left(\epsilon^{2}\right)$$. This in turn induces the defection of a fraction $$k\cdot\Pr\left(d\right)+O\left(\epsilon^{2}\right)$$ of the normal agents (under the assumption that $$s_{1}\left(d\right)=1$$). Since the normal agents constitute a fraction $$1-O\left(\epsilon\right)$$ of the population we must have $$\Pr\left(d\right)=k\cdot\Pr\left(d\right)+O\left(\epsilon^{2}\right)$$, which leads to a contradiction for any $$k\geq2$$. 
Thus, if $$s_{1}\left(d\right)=1$$, then defections are “contagious”, and so there is no steady state in which only a fraction $$O\left(\epsilon\right)$$ of the population defects. This completes the sketch of the proof of part 1. To prove part 2 of the theorem, let $$s^{1}$$ and $$s^{2}$$ be the strategies that defect iff $$m\geq1$$ and $$m\geq2$$, respectively. Consider the state $$\left(\left\{ s^{1},s^{2}\right\} ,\left(q^{*},1-q^{*}\right),\theta^{*}\equiv0\right)$$. The direct gain from defecting (relative to cooperating) when observing a single defection is   $\Pr\left(m=1\right)\cdot\left(\left(l\cdot\Pr\left(d|m=1\right)\right)+g\cdot\Pr\left(c|m=1\right)\right),$ where $$\Pr\left(d|m=1\right)$$ ($$\Pr\left(c|m=1\right)$$) is the probability that a random partner is going to defect (cooperate) conditional on the agent observing $$m=1$$, and $$\Pr\left(m=1\right)$$ is the average probability of observing signal $$m=1$$. The indirect loss from defection, relative to cooperation, conditional on the agent observing a single defection, is   $q^{*}\cdot\left(k\cdot\Pr\left(m=1\right)\right)\cdot\left(l+1\right)+O\left(\left(\Pr\left(m=1\right)\right)^{2}\right).$ To see this, note that a random partner defects with an average probability of $$q$$ if he observes a single defection (which occurs with probability $$k\cdot\Pr\left(m=1\right)$$ when the partner makes $$k$$ i.i.d. observations, each of which has a probability of $$\Pr\left(m=1\right)$$ of being a defection), and each defection induces a loss of $$l+1$$ to the agent (who obtains $$-l$$ instead of 1). The fact that some normal agents cooperate and others defect when observing a single defection implies that in an equilibrium both actions have to be best replies conditional on the agent observing $$m=1$$. 
This implies that the indirect loss from defecting is exactly equal to the direct gain (up to $$O\left(\left(\Pr\left(m=1\right)\right)^{2}\right)$$), $$i.e.$$  $\Pr\left(m=1\right)\cdot\left(\left(l\cdot\Pr\left(d|m=1\right)\right)+g\cdot\Pr\left(c|m=1\right)\right)=q^{*}\cdot\left(k\cdot\Pr\left(m=1\right)\right)\cdot\left(l+1\right)$   $$\Rightarrow q^{*}=\frac{\left(l\cdot\Pr\left(d|m=1\right)\right)+g\cdot\Pr\left(c|m=1\right)}{k\cdot\left(l+1\right)}.\label{eq:q-indifference-equation}$$ (5) The probability $$\Pr\left(d|m=1\right)$$ depends on the distribution of commitments. Yet, one can show that for every distribution of commitment strategies $$\left(S^{C},\lambda\right)$$, there is a unique value of $$q^{*}\in\left(0,\frac{1}{k}\right)$$ that solves equation (5) and that, given this $$q^{*}$$, both $$s^{1}$$ and $$s^{2}$$ (and only these strategies) are best replies. This means that the steady state $$\left(\left\{ s^{1},s^{2}\right\} ,\left(q^{*},1-q^{*}\right),\theta^{*}\equiv0\right)$$ is a perfect equilibrium. ∥ Discussion of Theorem 2. We comment on a few issues related to Theorem 2. (1) In the formal proof of Theorem 2 we show that cooperation satisfies the stronger refinements of strict perfection, evolutionary stability, and robustness (see Section 3.3 and Supplementary Appendix D). (2) Each distribution of commitment strategies induces a unique frequency $$q^{*}\in\left(0,\frac{1}{k}\right)$$ of $$s^{1}$$-agents, which yields a perfect equilibrium. One may wonder whether a population starting from a different share $$q_{0}\neq q^{*}$$ of $$s^{1}$$-agents is likely to converge to the equilibrium frequency $$q^{*}$$. It is possible to show that the answer is affirmative. 
Specifically, given any initial low frequency $$q_{0}\in\left(0,q^{*}\right)$$, the $$s^{1}$$-agents achieve a higher payoff than the $$s^{2}$$-agents and, given any initial high frequency $$q_{0}\in\left(q^{*},\frac{1}{k}\right)$$, the $$s^{1}$$-agents achieve a lower payoff than the $$s^{2}$$-agents. Thus, under any smooth monotonic dynamic process in which a more successful strategy gradually becomes more frequent, the share of $$s^{1}$$-agents will shift from any initial value in the interval $$q_{0}\in\left(0,\frac{1}{k}\right)$$ to the exact value of $$q^{*}$$ that induces a perfect equilibrium. (3) As discussed in the formal proof in Supplementary Appendix E.3, some distributions of commitment strategies may induce a slightly different perfect equilibrium, in which the population is homogeneous, and each agent in the population defects with probability $$q^{*}\left(\mu\right)$$ when observing a single defection (contrary to the heterogeneous deterministic behaviour described above). (4) Random number of observed actions. Consider a random environment$$\left(G_{PD},p\right)$$, where $$p\in\Delta\left(\mathbb{N}\right)$$ is a distribution with a finite support, and each agent privately observes $$k$$ actions of the partner with probability $$p\left(k\right)$$. Theorem 2 (and, similarly, Theorems 3–5) can be extended to this setup for any random environment in which the probability of observing at least two interactions is sufficiently high. The perfect equilibrium has to be adapted as follows. As in the main model, all normal agents cooperate (defect) when observing no (at least two) defections. 
In addition, there will be a value $$\bar{k}\in supp\left(p\right)$$ and a probability $$q\in\left[0,1\right]$$ (which depend on the distribution of commitment strategies), such that all normal agents cooperate (defect) when observing a single defection out of $$k>\bar{k}$$ ($$k<\bar{k}$$), and a fraction $$q$$ of the normal agents defect when observing a single defection out of $$\bar{k}$$ observations. (5) Cheap talk. In Supplementary Appendix F we discuss the influence on Theorems 1 and 2 of the introduction of pre-play (slightly costly) cheap-talk communication. In this setup one can show that: (a) Offensive games: No stable state exists. Both defection and cooperation are only “quasi-stable”: the population state occasionally changes between these two states, based on the occurrence of rare random experimentations. The argument is adapted from Wiseman and Yilankaya (2001). (b) Defensive games (and $$k\geq2$$): The introduction of cheap talk destabilizes all inefficient equilibria, leaving cooperation as the unique stable outcome. The argument is adapted from Robson (1990). (6) General Noise Structures: In the model described above we deal with perturbed environments that include a single kind of noise, namely, committed agents who follow commitment strategies. It is possible to extend our results to include additional sources of noise: specifically, observation noise and/or trembles. We redefine a perturbed environment as a tuple $$E_{\epsilon,\delta}=\left(\left(G,k\right),\left(S^{C},\lambda\right),\alpha,\epsilon,\delta\right)$$, where $$\left(G,k\right),\left(S^{C},\lambda\right),\epsilon$$ are defined as in the main model, $$0<\delta\ll1$$ is the probability of error in each observed action of a player, and $$\alpha\in\Delta\left(A\right)$$ is a totally mixed distribution from which the observed action is sampled in the event of an observation error. 
Alternatively, these errors can also be interpreted as actions played by mistake by the partner due to trembling hands. One can show that all of our results can be adapted to this setup in a relatively straightforward way. In particular, our results hold also in environments in which most of the noise is due to observation errors, provided that there is a small positive share of committed agents (possibly much smaller than the probability of an observation error). (7) The borderline case between defensiveness and offensiveness: $$g=l$$. Such a Prisoner’s Dilemma can be interpreted as a game in which each of the players simultaneously decides whether to sacrifice a personal payoff of $$g$$ in order to induce a gain of $$1+g$$ to her partner. One can show that cooperation is also a perfect equilibrium action in this setup, and that it can be supported by the same kind of perfect equilibrium as described above. However, in this case the uniqueness result (part 1 of Theorem 2) is no longer true. The reason for this is that when $$g=l$$ an agent has the same incentive to defect regardless of the signal she observes about the partner (because the direct bonus of defection is equal to $$g=l$$ regardless of the partner’s behaviour). This implies that cooperation can be supported by a large variety of strategies (including belief-free-like strategies as in Takahashi, 2010; Dilmé, 2016). We note that none of these strategies satisfy the refinement of evolutionary stability (Supplementary Appendix D). One can adapt the proof of Theorem 1 to show that defection is the unique evolutionarily stable outcome when $$g=l$$. The following example demonstrates the existence of a perfect equilibrium that supports cooperation when the unique commitment strategy is to play each action uniformly. Example 2. (Example 1 revisited: illustration of the perfect equilibrium that supports cooperation). 
Consider the perturbed environment $$\left(G_{D},2,\left\{ s^{u}\equiv0.5\right\} ,\epsilon\right)$$, where $$G_{D}$$ is the defensive Prisoner’s Dilemma game with the parameters $$g=1$$ and $$l=3$$ (as presented in Table 2 above). Consider the steady state $$\left(\left\{ s^{1},s^{2}\right\} ,\left(\frac{1}{6},\frac{5}{6}\right),\theta^{*}\right)$$, where $$\theta^{*}$$ is defined as in (3) in Example 1 above. A straightforward calculation shows that the average probability in which a normal agent observes $$m=1$$ when being matched with a random partner is   $\Pr\left(m=1\right)=\epsilon\cdot0.5+3.5\cdot\epsilon\cdot\frac{1}{6}+0.5\cdot\epsilon\cdot\frac{5}{6}+O\left(\epsilon^{2}\right)=1.5\cdot\epsilon+O\left(\epsilon^{2}\right).$ The probability that the partner is a committed agent conditional on observing a single defection is:   $\Pr\left(s^{u}|m=1\right)=\frac{\epsilon\cdot0.5}{1.5\cdot\epsilon}=\frac{1}{3}\,\,\Rightarrow\,\,\Pr\left(d|m=1\right)=\frac{1}{3}\cdot0.5=\frac{1}{6},$ which yields the conditional probability that the partner of a normal agent will defect. Next we calculate the direct gain from defecting conditional on the agent observing a single defection ($$m=1$$):   $\Pr\left(m=1\right)\cdot\left(\left(l\cdot\Pr\left(d|m=1\right)\right)+g\cdot\Pr\left(c|m=1\right)\right)=1.5\cdot\epsilon\cdot\left(3\cdot\frac{1}{6}+1\cdot\frac{5}{6}\right)+O\left(\epsilon^{2}\right)=2\cdot\epsilon+O\left(\epsilon^{2}\right).$ The indirect loss from defecting conditional on the agent observing a single defection is:   $q\cdot\left(k\cdot\Pr\left(m=1\right)\right)\cdot\left(l+1\right)+O\left(\epsilon^{2}\right)=q\cdot2\cdot1.5\cdot\epsilon\cdot\left(3+1\right)=12\cdot q\cdot\epsilon+O\left(\epsilon^{2}\right).$ When taking $$q=\frac{1}{6}$$ the indirect loss from defecting is exactly equal to the direct gain (up to $$O\left(\epsilon^{2}\right)$$). Stability of cooperation when observing a single action. 
We conclude this section by showing that in defensive Prisoner’s Dilemmas with $$k=1$$, cooperation is a regular perfect equilibrium action iff $$g<1$$. Proposition 2. Let $$E=\left(G_{PD},1\right)$$ be an environment where $$G_{PD}$$ is a defensive Prisoner’s Dilemma ($$g<l$$). Cooperation is a (regular) perfect equilibrium action iff $$g<1$$. Sketch of Proof. Similar arguments to those presented in part 1 of Theorem 2 imply that any distribution of commitment strategies induces a unique average probability $$q$$ with which normal agents defect when observing $$m=1$$, in any cooperative perfect equilibrium. This implies that a deviator who always defects gets a payoff of $$1+g$$ in a fraction $$1-q$$ of the interactions. One can show that such a deviator outperforms the incumbents if$$^{9}$$ $$g>1$$ (whereas, if $$g<1$$, there are distributions of commitment for which $$1-q$$ is sufficiently low such that the deviator is outperformed). ∥ Proposition 2 is immediately implied by Proposition 4 in Supplementary Appendix C.5, which characterizes which distributions of commitments support cooperation as a perfect equilibrium outcome in a defensive Prisoner’s Dilemma when $$k=1$$. 5. General Observation Structures In this section, we extend our analysis to general observation structures in which the signal about the partner may also depend on the behaviour of other opponents against the partner. 5.1. Definitions An observation structure is a tuple $$\Theta=\left(k,B,o\right)$$, where $$k\in\mathbb{N}$$ is the number of observed interactions, $$B=\left\{ b_{1},..,b_{\left|B\right|}\right\}$$ is a finite set of observations that can be made in each interaction, and the mapping $$o:A\times A\rightarrow B$$ describes the observed signal as a function of the action profile played in the interaction (where the first action is the one played by the current partner, and the second action is the one played by his opponent). 
Note that observing actions (which was analysed in the previous section) is equivalent to having $$B=A$$ and $$o\left(a,a'\right)=a$$. In the results of this section we focus on three observation structures: (1) Observation of action profiles:$$B=A^{2}$$ and $$o\left(a,a'\right)=\left(a,a'\right).$$ In this observation structure, each agent observes, in each sampled interaction of her partner, both the action played by her partner and the action played by her partner’s opponent. (2) Observation of conflicts: observing whether or not there was mutual cooperation. That is, $$B=\left\{ C,D\right\}$$, $$o\left(c,c\right)=C$$, and $$o\left(a,a'\right)=D$$ for any $$\left(a,a'\right)\neq\left(c,c\right)$$. Such an observation structure (which we have not seen in the existing literature) seems like a plausible way to capture non-verifiable feedback about the partner’s behaviour. The agent can observe, in each sampled past interaction of the partner, whether both partners were “happy” ($$i.e.$$ mutual cooperation) or whether the partners complained about each other ($$i.e.$$ there was a conflict, at least one of the players defected, and it is too costly for an outside observer to verify who actually defected). (3) Observation of actions against cooperation: $$B=\left\{ CC,DC,*D\right\}$$ and $$o\left(c,c\right)=CC$$, $$o\left(d,c\right)=DC$$, and $$o\left(c,d\right)=o\left(d,d\right)=*D$$. That is, each agent (Alice) observes a ternary signal about each sampled interaction of her partner (Bob): either both players cooperated, or Bob unilaterally defected, or Bob’s partner defected (and in this latter case Alice cannot observe Bob’s action). We analyse this observation structure because it turns out to be an “optimal” observation structure that allows cooperation to be supported as a perfect equilibrium action in any Prisoner’s Dilemma. 
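The mappings $$o:A\times A\rightarrow B$$ listed above can be spelled out as a small Python sketch. The function names and the helper `signal_counts` are hypothetical. The first argument is the action of the current partner and the second is the action of the partner's past opponent, following the convention in the definition of $$o$$. The sampled interactions are made-up illustrative data.

```python
from collections import Counter


def observe_actions(a, a_opp):
    """Observation of actions: B = A and o(a, a') = a."""
    return a


def observe_profiles(a, a_opp):
    """Observation of action profiles: B = A x A and o(a, a') = (a, a')."""
    return (a, a_opp)


def observe_conflicts(a, a_opp):
    """Observation of conflicts: 'C' only under mutual cooperation, else 'D'."""
    return "C" if (a, a_opp) == ("c", "c") else "D"


def observe_against_cooperation(a, a_opp):
    """Observation of actions against cooperation: B = {CC, DC, *D}."""
    if a_opp == "d":
        return "*D"  # the partner's own action is hidden in this case
    return "CC" if a == "c" else "DC"


def signal_counts(o, sampled_interactions):
    """Count how often each observation in B appears in a sample of k interactions."""
    return Counter(o(a, a_opp) for a, a_opp in sampled_interactions)


# Illustrative sample of k = 3 past interactions of the partner.
sample = [("c", "c"), ("d", "c"), ("d", "d")]
print(signal_counts(observe_conflicts, sample))
print(signal_counts(observe_against_cooperation, sample))
```

In this sample the conflict structure cannot distinguish the unilateral defection from the mutual defection, whereas the third structure separates them while hiding the partner's action in the last interaction.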
In each of these cases, we let the mapping $$o$$ and the set of signals $$B$$ be implied by the context, and identify the observation structure $$\Theta$$ with the number of observed interactions $$k$$. In what follows we present the definitions of the main model (Sections 2 and 3) that have to be changed to deal with the general observation structure. Before playing the game, each player independently samples $$k$$ independent interactions of her partner. Let $$M$$ denote the set of feasible signals:   $M=\left\{ m\in\mathbb{N}^{\left|B\right|}\left|\sum_{i}m_{i}=k\right.\right\} ,$ where $$m_{i}$$ is interpreted as the number of times that observation $$b_{i}$$ has been observed in the sample. When agents observe conflicts, we simplify the notation by letting $$M=\left\{ 0,...,k\right\}$$, and interpreting $$m\in\left\{ 0,...,k\right\}$$ as the number of observed conflicts. The definitions of a strategy and a perturbed environment remain the same. Given a distribution of action profiles $$\psi\in\Delta\left(A\times A\right)$$, let $$\nu_{\psi}=\nu\left(\psi\right)\in\Delta\left(M\right)$$ be the multinomial distribution of signals that is induced by the distribution of action profiles $$\psi$$, $$i.e.$$   $\nu_{\psi}\left(m_{1},...,m_{\left|B\right|}\right)=\frac{k!}{m_{1}!\cdot...\cdot m_{\left|B\right|}!}\cdot\prod_{i=1}^{\left|B\right|}\left(\sum_{\left\{ \left(a,a'\right)\in A\times A|o\left(a,a'\right)=b_{i}\right\} }\psi\left(a,a'\right)\right)^{m_{i}}.$ The definition of a steady state is adapted as follows. Definition 6. (Adaptation of Def. 
6).A steady state (or state) of a perturbed environment $$\left(\left(G,k\right),\left(S^{C},\lambda\right),\epsilon\right)$$ is a triple $$\left(S^{N},\sigma,\theta\right)$$, where $$S^{N}\subseteq\mathcal{S}$$ is a finite set of strategies, $$\sigma\in\Delta\left(S^{N}\right)$$ is a distribution, and $$\theta:\left(S^{N}\cup S^{C}\right)\rightarrow\Delta\left(M\right)$$ is a profile of signal distributions that satisfies for each signal $$m$$ and each strategy $$s$$ the consistency requirement (7) below. Let $$\psi_{s}\in\Delta\left(A\times A\right)$$ be the (possibly correlated) distribution of action profiles that is played when an agent with strategy $$s\in S^{N}\cup S^{C}$$ is matched with a random partner (given $$\sigma$$ and $$\theta$$); $$i.e.$$ for each $$\left(a,a'\right)\in A\times A$$, where $$a$$ is interpreted as the action of the agent with strategy $$s$$, and $$a'$$ is interpreted as the action of her partner, let  $$\psi_{s}\left(a,a'\right)=\sum_{s'\in S^{N}\cup S^{C}}\left(\left(1-\epsilon\right)\cdot\sigma\left(s'\right)+\epsilon\cdot\lambda\left(s'\right)\right)\cdot s\left(\theta_{s'}\right)\left(a\right)\cdot s'\left(\theta_{s}\right)\left(a'\right).\label{eq:psi-s}$$ (6) The consistency requirement that the mapping $$\theta$$ has to satisfy is  $$\forall m\in M,\,\,s\in S^{N}\cup S^{C},\,\,\,\,\theta_{s}\left(m\right)=\nu\left(\psi_{s}\right)\left(m\right).\label{eq:consistency-1}$$ (7) The definition of the long-run payoff of an incumbent agent remains unchanged. We now adapt the definition of the payoff of an agent (Alice) who deviates and plays a non-incumbent strategy. Unlike in the basic model, in this extension there might be multiple consistent outcomes following Alice’s deviation, as demonstrated in Example 3. Example 3. Consider an unperturbed environment $$\left(G_{PD},3\right)$$ with an observation of $$k=3$$ action profiles. 
Consider a homogeneous incumbent population in which all agents play the following strategy: $$s^{*}\left(m\right)=d$$ if $$m$$ includes at least 2 interactions with $$\left(d,d\right)$$, and $$s^{*}\left(m\right)=c$$ otherwise. Consider the state $$\left(\left\{ s^{*}\right\} ,\theta^{*}=0\right)$$ in which everyone cooperates. Consider a deviator (Alice) who follows the strategy of always defecting. Then there exist three consistent post-deviation steady states (in all of which the incumbents continue to cooperate among themselves): (1) all the incumbents defect against Alice, (2) all the incumbents cooperate against Alice, and (3) all the incumbents defect against Alice with a probability of 50%. Formally, we define a consistent distribution of signals for a deviator as follows. Definition 7. Given steady state $$\left(S^{N},\sigma,\theta\right)$$ and non-incumbent strategy $$\hat{s}\in\mathcal{S}\backslash\left(S^{N}\cup S^{C}\right)$$, we say that a distribution of signals $$\theta_{\hat{s}}\in\Delta\left(M\right)$$ is consistent if  $\forall m\in M,\,\,\,\,\theta_{\hat{s}}\left(m\right)=\nu\left(\psi_{\hat{s}}\right)\left(m\right),$ where $$\psi_{\hat{s}}\in\Delta\left(A\times A\right)$$ is defined as in (6) above, with $$\hat{s}$$ in place of $$s$$. Let $$\Theta_{\hat{s}}\subseteq\Delta\left(M\right)$$ be the set of all consistent signal distributions of strategy $$\hat{s}$$. Given steady state $$\left(S,\sigma,\theta\right)$$, non-incumbent strategy $$\hat{s}\in\mathcal{S}\backslash\left(S^{N}\cup S^{C}\right)$$, and consistent signal distribution $$\theta\left(\hat{s}\right)\equiv\theta_{\hat{s}}\in\Delta\left(M\right)$$, let $$\pi_{\hat{s}}\left(S,\sigma,\theta|\theta_{\hat{s}}\right)$$ denote the deviator’s (long-run) payoff given that in the post-deviation steady state the deviator’s distribution of signals is $$\theta_{\hat{s}}$$. 
Formally:   $\pi_{\hat{s}}\left(S,\sigma,\theta|\theta_{\hat{s}}\right)=\sum_{s'\in S^{N}\cup S^{C}}\left(\left(1-\epsilon\right)\cdot\sigma\left(s'\right)+\epsilon\cdot\lambda\left(s'\right)\right)\cdot\left(\sum_{\left(a,a'\right)\in A\times A}\hat{s}_{\theta\left(s'\right)}\left(a\right)\cdot s'_{\theta\left(\hat{s}\right)}\left(a'\right)\cdot\pi\left(a,a'\right)\right).$ Let $$\pi_{\hat{s}}\left(S,\sigma,\theta\right)$$ be the maximal (long-run) payoff for a deviator who follows strategy $$\hat{s}$$ in a post-deviation steady state:   $$\pi_{\hat{s}}\left(S,\sigma,\theta\right):=\max_{\theta_{\hat{s}}\in\Theta_{\hat{s}}}\pi_{\hat{s}}\left(S,\sigma,\theta|\theta_{\hat{s}}\right).\label{eq:deviator-payoff-extension}$$ (8) Remark 2. Our results remain the same if one replaces the maximum function in (8) with a minimum function.

5.2. Acute and mild Prisoner’s Dilemma

In this subsection, we present a novel classification of Prisoner’s Dilemma games that plays an important role in the results of this section. Recall that the parameter $$g$$ of a Prisoner’s Dilemma game may take any value in the interval $$\left[0,l+1\right]$$ (if $$g>l+1$$, then mutual cooperation is no longer the efficient outcome that maximizes the sum of payoffs). We say that a Prisoner’s Dilemma game is acute if $$g$$ is in the upper half of this interval ($$i.e.$$ if $$g>\frac{l+1}{2}$$), and mild if it is in the lower half ($$i.e.$$ if $$g<\frac{l+1}{2}$$). The threshold, $$g=\frac{l+1}{2}$$, is characterized by the fact that the gain from a single unilateral defection is exactly half the loss incurred by the partner who is the sole cooperator. Hence, unilateral defection is mildly tempting in mild games and acutely tempting in acute games. An interpretation of this threshold comes from a setup (which will be important for our results) in which an agent is deterred from unilaterally defecting because it induces future partners to unilaterally defect against the agent with some probability.
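The classification and the deterrence interpretation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it takes the gain from a unilateral defection to be $$g$$ and the loss from being unilaterally defected against to be $$l+1$$ (as in the description of the threshold above), so that a punishment probability $$p$$ deters defection when $$p\cdot\left(l+1\right)\geq g$$. The function names and the sample parameter values are hypothetical and are not part of the model.

```python
from fractions import Fraction


def classify_pd(g, l):
    """Label a Prisoner's Dilemma as 'mild', 'acute', or 'threshold'.

    Hypothetical helper: g is the gain from a unilateral defection and
    l + 1 is the loss of the sole cooperator, with 0 <= g <= l + 1.
    """
    g, l = Fraction(g), Fraction(l)
    if not 0 <= g <= l + 1:
        raise ValueError("g must lie in the interval [0, l + 1]")
    midpoint = (l + 1) / 2
    if g > midpoint:
        return "acute"
    if g < midpoint:
        return "mild"
    return "threshold"


def min_punishment_probability(g, l):
    """Smallest p with p * (l + 1) >= g (assumed deterrence condition)."""
    g, l = Fraction(g), Fraction(l)
    return g / (l + 1)


if __name__ == "__main__":
    # Illustrative parameter values only; exact arithmetic via Fraction.
    for g, l in [(3, 3), ("0.2", "0.2")]:
        print(g, l, classify_pd(g, l), min_punishment_probability(g, l))
```

Under these assumptions the required punishment probability is $$\frac{g}{l+1}$$, which exceeds one half exactly when the game is acute.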
Deterrence in acute games requires the probability of being punished to be more than 50%, while a probability below 50% is enough for mild games. Figure 1 illustrates the classification of games into offensive/defensive and mild/acute.

Example 4. Table 3 demonstrates the payoffs of specific acute ($$G_{A}$$) and mild ($$G_{M}$$) Prisoner’s Dilemma games. In both examples $$g=l$$, $$i.e.$$ the Prisoner’s Dilemma game is “linear”. This means that it can be described as a “helping game” in which agents have to decide simultaneously whether to give up a payoff of $$g$$ in order to create a benefit of $$1+g$$ for the partner. In the acute game ($$G_{A}$$) on the left, $$g=3$$ and the loss of a helping player amounts to more than half of the benefit to the partner who receives the help ($$\frac{3}{3+1}=\frac{3}{4}>\frac{1}{2}$$), while in the mild game ($$G_{M}$$) on the right, $$g=0.2$$ and the loss of the helping player is less than half of the benefit to the partner who receives the help ($$\frac{0.2}{0.2+1}=\frac{1}{6}<\frac{1}{2}$$).

TABLE 3 Matrix payoffs of acute and mild Prisoner’s Dilemma games

5.3. Analysis of the stability of cooperation

We first note that Proposition 1 is valid also in this extended setup, with minor adaptations to the proof. Thus, always defecting is a perfect equilibrium regardless of the observation structure. Next we analyse the stability of cooperation in each of the three interesting observation structures. The following two results show that under either observation of conflicts or observation of action profiles, cooperation is a perfect equilibrium iff the Prisoner’s Dilemma is mild. Moreover, in mild Prisoner’s Dilemma games there is essentially a unique strategy distribution that supports cooperation (which is analogous to the essentially unique strategy distribution in Theorem 2). Formally:

Theorem 3.
Let $$E=\left(G_{PD},k\right)$$ be an environment with observation of conflicts with $$k\geq2$$.

(1) If $$G_{PD}$$ is a mild PD ($$g<\frac{l+1}{2}$$), then:

(a) If $$\left(S^{*},\sigma^{*},\theta^{*}\equiv0\right)$$ is a perfect equilibrium then (i) for each $$s\in S^{*}$$, $$s_{0}\left(c\right)=1$$ and $$s_{m}\left(d\right)=1$$ for each $$m\geq2$$, and (ii) there exist $$s,s'\in S^{*}$$ such that $$s_{1}\left(d\right)<1$$ and $$s'_{1}\left(d\right)>0$$.

(b) Cooperation is a regular perfect equilibrium action.

(2) If $$G_{PD}$$ is an acute PD ($$g>\frac{l+1}{2}$$), then cooperation is not a perfect equilibrium action.

Sketch of Proof. The argument for part 1(a) is analogous to Theorem 2. In what follows we sketch the proofs of part 1(b) and part 2. Fix a distribution of commitments, and a commitment level $$\epsilon\in\left(0,1\right)$$. Let $$m$$ denote the number of observed conflicts and define $$s^{1}$$ and $$s^{2}$$ as before, but with the new meaning of $$m$$. Consider the following candidate for a perfect equilibrium $$\left(\left\{ s^{1},s^{2}\right\} ,\left(q,1-q\right),\theta^{*}\equiv0\right)$$. Here, the probability $$q$$ will be determined such that both actions are best replies when an agent observes a single conflict. That is, the direct benefit from her defecting when observing $$m=1$$ (the LHS of the equation below) must balance the indirect loss due to inducing future partners who observe these conflicts to defect (the RHS, neglecting terms of $$O\left(\epsilon\right)$$). The RHS is calculated by noting that defection induces an additional conflict only if the current partner has cooperated and that, in expectation, each such additional conflict is observed by $$k$$ future partners, each of whom defects with an average probability of $$q$$. Recall that $$\Pr\left(d|m=1\right)$$ ($$\Pr\left(c|m=1\right)$$) is the probability that a random partner is going to defect (cooperate) conditional on the agent observing $$m=1$$.   
$\Pr\left(m=1\right)\cdot\left(\left(l\cdot\Pr\left(d|m=1\right)\right)+g\cdot\Pr\left(c|m=1\right)\right)=\Pr\left(m=1\right)\cdot k\cdot q\cdot\Pr\left(c|m=1\right)\cdot\left(l+1\right)$   $$\Leftrightarrow q\cdot k=\frac{\left(l\cdot\Pr\left(d|m=1\right)\right)+g\cdot\Pr\left(c|m=1\right)}{\Pr\left(c|m=1\right)\cdot\left(l+1\right)}.\label{eq:q-k-conflict}$$ (9) One can see that the RHS is increasing in $$\Pr\left(d|m=1\right)$$. The minimal bound on the value of $$q$$ is obtained when $$\Pr\left(d|m=1\right)=0$$. In this case $$q\cdot k=\frac{g}{l+1}$$. Suppose that the game is acute. In this case $$q\cdot k>0.5$$. Suppose that the average probability of defection in the population is $$\Pr\left(d\right)$$. Since there is full cooperation in the limit we have $$\Pr\left(d\right)=O\left(\epsilon\right)$$. This implies that a fraction $$2\cdot\Pr\left(d\right)+O\left(\epsilon^{2}\right)$$ of the population is involved in conflicts. This in turn induces the defection of a fraction $$2\cdot\Pr\left(d\right)\cdot k\cdot q+O\left(\epsilon^{2}\right)$$ of the normal agents (because a normal agent defects with probability $$q$$ upon observing at least one conflict in the $$k$$ sampled interactions). Since the normal agents constitute a fraction $$1-O\left(\epsilon\right)$$ of the population we must have $$\Pr\left(d\right)=2\cdot\Pr\left(d\right)\cdot k\cdot q+O\left(\epsilon^{2}\right)$$. However, in an acute game, $$2\cdot k\cdot q>1$$ leads to the contradiction that $$\Pr\left(d\right)<\Pr\left(d\right)$$. Thus, if $$2\cdot k\cdot q>1$$, then defections are contagious, and so there is no steady state in which only a fraction $$O\left(\epsilon\right)$$ of the population defects. Suppose that the game is mild. One can show that $$\Pr\left(d|m=1\right)$$ is decreasing in $$q$$, and that it converges to zero when $$k\cdot q\nearrow0.5$$. 
(The reason is that when $$k\cdot q$$ is close to 0.5 each defection by a committed agent induces many defections by normal agents and, conditional on observing $$m=1$$, the partner is likely to be normal and to cooperate when being matched with a normal agent.) It follows that the RHS of equation (9) is decreasing in $$q$$ and approaches the value $$\frac{g}{l+1}$$ when $$k\cdot q\nearrow0.5$$. Since the game is mild, $$\frac{g}{l+1}<0.5$$. Hence there is some $$q\cdot k<0.5$$ that solves equation (9), and for which the normal agents defect with a low probability, of order $$O\left(\epsilon\right)$$. ∥

Theorem 4. Let $$E=\left(G_{PD},k\right)$$ be an environment with observation of action profiles and $$k\geq2$$.

(1) If $$G_{PD}$$ is a mild PD ($$g<\frac{l+1}{2}$$), then cooperation is a regular perfect equilibrium action.

(2) If $$G_{PD}$$ is an acute PD ($$g>\frac{l+1}{2}$$), then cooperation is not a perfect equilibrium action.

Sketch of Proof. Using arguments that are familiar from above one can show that in any perfect equilibrium that supports cooperation, normal agents have to defect with an average probability of $$q\in\left(0,1\right)$$ when observing a single unilateral defection (and $$k-1$$ mutual cooperations), and defect with a smaller probability when observing a single mutual defection (since this is necessary in order for a normal agent to have better incentives to cooperate against a partner who is more likely to cooperate). The value of $$q$$ is determined by equation (9) above, implying that both actions are best replies conditional on an agent observing the partner to be the sole defector once, and to be involved in mutual cooperation in the remaining $$k-1$$ observed action profiles. Let $$\epsilon$$ be the share of committed agents, and let $$\varphi$$ be the average probability that a committed agent unilaterally defects. 
To simplify the sketch of the proof, we will focus on the case in which the committed agents defect with a small probability when observing the partner to have been involved only in mutual cooperations, which implies, in particular, that $$\varphi\ll1$$ (the formal proof in the Supplementary Appendix does not make this simplifying assumption). The unilateral defections of the committed agents induce a fraction $$\epsilon\cdot\varphi\cdot k\cdot q+O\left(\epsilon^{2}\right)+O\left(\varphi^{2}\right)$$ of the normal agents to defect when being matched against committed agents (because a normal agent defects with probability $$q$$ upon observing a single unilateral defection in the $$k$$ sampled interactions). These unilateral defections of normal agents against committed agents induce a further $$\left(\epsilon\cdot\varphi\cdot k\cdot q\right)\cdot k\cdot q+O\left(\epsilon^{2}\right)$$ defections of normal agents against other normal agents. Repeating this argument we come to the conclusion that the average probability of a normal agent being the sole defector is (neglecting terms of $$O\left(\epsilon^{2}\right)$$ and $$O\left(\varphi^{2}\right)$$):   $\epsilon\cdot\varphi\cdot k\cdot q\cdot\left(1+k\cdot q+\left(k\cdot q\right)^{2}+...\right)=\epsilon\cdot\varphi\cdot\frac{k\cdot q}{1-k\cdot q}.$ As discussed above, in acute games, the value of $$k\cdot q$$ must be larger than $$0.5$$, which implies that $$\frac{k\cdot q}{1-k\cdot q}>1$$. This implies that conditional on an agent observing the partner to be the sole defector once, the posterior probability that the partner is normal is:   $\frac{\epsilon\cdot\varphi\cdot\frac{k\cdot q}{1-k\cdot q}}{\epsilon\cdot\varphi+\epsilon\cdot\varphi\cdot\frac{k\cdot q}{1-k\cdot q}}=\frac{\frac{k\cdot q}{1-k\cdot q}}{1+\frac{k\cdot q}{1-k\cdot q}}>0.5.$ Thus, normal agents are more likely to unilaterally defect than committed agents. 
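The geometric-series step and the posterior above can be checked numerically. The Python sketch below is only illustrative: the function names and the sample values of $$\epsilon$$, $$\varphi$$, and $$k\cdot q$$ are hypothetical, and higher-order terms are ignored as in the text. It also shows that the posterior simplifies algebraically to $$k\cdot q$$ itself, so it exceeds 0.5 exactly when $$k\cdot q>0.5$$.

```python
def normal_sole_defector_mass(eps, phi, kq, rounds=None):
    """Return eps * phi * kq * (1 + kq + kq**2 + ...).

    With rounds=None the closed form eps * phi * kq / (1 - kq) is used;
    otherwise only the first `rounds` terms of the series are summed.
    The series converges only for 0 <= kq < 1.
    """
    if not 0 <= kq < 1:
        raise ValueError("the series requires 0 <= kq < 1")
    if rounds is None:
        return eps * phi * kq / (1 - kq)
    return eps * phi * kq * sum(kq ** n for n in range(rounds))


def posterior_normal(eps, phi, kq):
    """Posterior that a partner observed as sole defector once is normal."""
    normal = normal_sole_defector_mass(eps, phi, kq)
    committed = eps * phi  # unilateral defections by committed agents
    return normal / (normal + committed)


if __name__ == "__main__":
    eps, phi = 0.01, 0.1  # hypothetical illustrative values
    for kq in (0.4, 0.6):
        partial = normal_sole_defector_mass(eps, phi, kq, rounds=200)
        closed = normal_sole_defector_mass(eps, phi, kq)
        print(kq, partial, closed, posterior_normal(eps, phi, kq))
```

The posterior does not depend on $$\epsilon$$ or $$\varphi$$ in this approximation, since both cancel in the ratio.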
One can show that when there is a mutual defection, it is most likely that at least one of the agents involved is committed. This implies that the partner is more likely to defect when he is observed to be involved in mutual defection relative to being observed to be the sole defector. This implies that defection is the unique best reply when observing a single mutual defection, and this contradicts the assumption that normal agents cooperate with positive probability when observing a single mutual defection. When the game is mild, a construction similar to the previous proofs supports cooperation as a perfect equilibrium. ∥ Our last result studies the observation of actions against cooperation, and it shows that cooperation is a perfect equilibrium action in any underlying Prisoner’s Dilemma. Formally: Theorem 5. Let $$E=\left(G_{PD},k\right)$$ be an environment with observation of actions against cooperation and $$k\geq2$$. Then cooperation is a regular perfect equilibrium action. The intuition behind the proof is as follows. Not allowing Alice to observe Bob’s behaviour when his past opponent has defected helps to sustain cooperation because it implies that defecting against a defector does not have any negative indirect effect (in any steady state) because it is never observed by future opponents. This encourages agents to defect against partners who are more likely to defect, and allows cooperation to be sustained regardless of the values of $$g$$ and $$l$$. TABLE 4 summarizes our analysis and shows the characterization of the conditions under which cooperation can be sustained as a perfect equilibrium outcome in environments in which agents observe at least 2 actions. TABLE 4 Summary of key results: when is cooperation a perfect equilibrium outcome? 

| Category of PD | Parameters | Actions | Conflicts | Action profiles | Actions against cooperation |
|---|---|---|---|---|---|
| Mild & Defensive | $$g<\min\left(l,\frac{l+1}{2}\right)$$ | Yes | Yes | Yes | Yes |
| Mild & Offensive | $$l<g<\frac{l+1}{2}$$ | No | Yes | Yes | Yes |
| Acute & Defensive | $$\frac{l+1}{2}<g<l$$ | Yes | No | No | Yes |
| Acute & Offensive | $$\max\left(l,\frac{l+1}{2}\right)<g$$ | No | No | No | Yes |

The last four columns list the observation structures (any $$k\geq2$$). The entries are reconstructed from the results on observation of actions and from Theorems 3, 4, and 5: "Yes" means that cooperation is a perfect equilibrium outcome.

6. Related Literature

In what follows we discuss related literature that was not discussed above. Related experimental literature is discussed in Supplementary Appendix B.

6.1. Models with rare committed types

Various papers have shown that when a patient long-run agent (she) plays a repeated game against partners who can observe her entire history of play, and there is a small probability of the agent being a commitment type, then the agent can guarantee herself a high payoff in any equilibrium by mimicking an irrational type committed to Stackelberg-leader behaviour ($$e.g.$$ Kreps et al., 1982; Fudenberg and Levine, 1989; Celetani et al., 1996; see Mailath and Samuelson, 2006, for a textbook analysis and survey). 
When both sides of the game are equally patient, and, possibly, both sides have a small probability of being a commitment type, then the specific details about the set of feasible commitment types, the underlying game, and the discount factor are important in determining whether an agent can guarantee a high Stackelberg-leader payoff or whether a folk theorem result holds and the set of equilibrium payoffs is the same as in the case of complete information (see, $$e.g.$$Cripps and Thomas, 1995; Chan, 2000; Cripps et al., 2005; Hörner and Lovo, 2009; Atakan and Ekmekci, 2011; Pęski, 2014). One contribution of our article is to demonstrate that the introduction of a small probability that an agent is committed may have qualitatively different implications in repeated games with random matching.10 In defensive games, the presence of a few committed agents in the population implies that there is a unique stationary strategy to sustain full cooperation. In offensive games with observation of actions, the presence of committed agents implies that the low payoff of zero (of mutual defection) is the unique equilibrium payoff in the stationary model (and it rules out the highest symmetric payoff of 1 in the conventional model).11 6.2. Image scoring In an influential paper, Nowak and Sigmund (1998) present the mechanism of image scoring to support cooperation when agents from a large community are randomly matched and each agent observes the partner’s past actions. In their setup, each agent observes the last $$k$$ past actions of the partner, and she defects if and only if the partner has defected at least $$m$$ times in the last $$k$$ observed actions. A couple of papers have raised concerns about the stability of cooperation under image-scoring mechanisms. 
Specifically, Leimar and Hammerstein (2001) demonstrate in simulations that cooperation is unstable, and Panchanathan and Boyd (2003) analytically study the case in which each agent observes the last action.12 Our article makes two key contributions to this literature. First, we introduce a novel variant of image scoring that is essentially the unique stationary way to support cooperation as a perfect equilibrium outcome when agents observe actions. Second, we show that the classification of Prisoner’s Dilemma games into offensive and defensive games is critical to the stability of cooperation when agents observe actions (and image scoring fails in offensive Prisoner’s Dilemma games). 6.3. Structured populations and voluntarily separable interactions A few papers have studied the scope of cooperation in the case where players do not have any information about their current partner but the matching of agents is not uniformly random. That is, the population is assumed to have some structure such that some agents are more likely to be matched to some partners than to other partners. van Veelen, et al. (2012) and Alger and Weibull (2013) show that it is possible to sustain cooperation with no information about the partner’s behaviour if matching is sufficiently assortative, $$i.e.$$ if cooperators are more likely to interact with other cooperators. Ghosh and Ray (1996) and Fujiwara-Greve and Okuno-Fujiwara (2009, 2017) show how to sustain cooperation in a related setup in which matching is random, but each pair of matched agents may unanimously agree to keep interacting without being rematched to other agents.13 Our paper shows that letting players observe the partner’s behaviour in two interactions is sufficient to sustain cooperation without assuming assortativity or repeated interactions with the same partner. 6.4. Models without calendar time The present paper differs from most of the literature on community enforcement by having a model without a global time zero. 
To the best of our knowledge, Rosenthal (1979) is the first paper to present the notion of a steady-state Nash equilibrium in environments in which each player observes the partner’s last action, and apply it to the study of the Prisoner’s Dilemma. Rosenthal focuses only on pure steady states (in which everyone uses the same pure strategy), and concludes that defection is the unique pure stationary Nash equilibrium action except in a few knife-edge cases. The methodology is further developed in Okuno-Fujiwara and Postlewaite (1995). Other papers following a related approach include Rubinstein and Wolinsky (1985), who study bargaining, and Phelan and Skrzypacz (2006) who study repeated games with private monitoring. Our methodological contribution to the previous literature is that (1) we allow each agent to observe the behaviour of the partner in several past interactions with other opponents, and (2) we combine the steady-state analysis with the presence of a few committed agents and present a novel notion of a perfect equilibrium to analyse this setup. 7. Conclusion In many situations, people engage in short-term interactions where they are tempted to behave opportunistically but there is a possibility that future partners will obtain some information about their behaviour today. We propose a new modelling approach based on the premises that (1) an equilibrium has to be robust to the presence of a few committed agents, and (2) the community has been interacting from time immemorial (though this latter assumption is relaxed in Supplementary Appendix A). We develop a novel methodology that allows for a tractable analysis of these seemingly complicated environments. We apply this methodology to the study of Prisoner’s Dilemma games, and we obtain sharp testable predictions for the equilibrium outcomes, and the exact conditions under which cooperation can be sustained as an equilibrium outcome. 
Finally, we show that whenever cooperation is sustainable, there is a unique (and novel) way to support it that has a few appealing properties: (1) agents behave in an intuitive and simple way, and (2) the equilibrium is robust, $$e.g.$$ to deviations by a group of agents, or to the presence of any kind of committed agents. We believe that our modelling approach will be helpful in understanding various interactions in future research. Acknowledgements A previous version of this article was circulated under the title “Stable observable behaviour”. We have benefited greatly from discussions with Vince Crawford, Eddie Dekel, Christoph Kuzmics, Ariel Rubinstein, Larry Samuelson, Bill Sandholm, Rann Smorodinsky, Rani Spiegler, Balázs Szentes, Satoru Takahashi, Jörgen Weibull, and Peyton Young. We would like to express our deep gratitude to seminar/workshop participants at the University of Amsterdam (CREED), University of Bamberg, Bar Ilan University, Bielefeld University, University of Cambridge, Hebrew University of Jerusalem, Helsinki Center for Economic Research, Interdisciplinary Center Herzliya, Israel Institute of Technology, Lund University, University of Oxford, University of Pittsburgh, Stockholm School of Economics, Tel Aviv University, NBER Theory Workshop at Wisconsin-Madison, KAEA session at the ASSA 2015, the Biological Basis of Preference conference at Simon Fraser University, and the 6th workshop on stochastic methods in game theory at Erice, for many useful comments. Danial Ali Akbari provided excellent research assistance. Yuval Heller is grateful to the European Research Council for its financial support (Starting Grant #677057). Erik Mohlin is grateful to Handelsbankens forskningsstiftelser (grant #P2016-0079:1), the Swedish Research Council (grant #2015-01751), and the Knut and Alice Wallenberg Foundation (Wallenberg Academy Fellowship #2016-0156) for their financial support. Finally, we thank Renana Heller for suggesting the title. 
Supplementary Data Supplementary data are available at Review of Economic Studies online. Footnotes 1. In contagious equilibria players start by cooperating. If one player defects at stage $$t$$, her partner defects at stage $$t+1$$, infecting another player who defects at stage $$t+2$$, and so on. In belief-free equilibria players are always indifferent between their actions, but they choose different mixed actions depending on the signal they obtain about the partner. We discuss the non-robustness of these classes of equilibria at the end of Section 4.2. 2. As discussed later, our uniqueness results also rely on an additional assumption that agents are restricted to choose stationary strategies, which depend only on the signal about the partner. As shown in Supplementary Appendix A, all other results hold also in a standard setup without the restriction to stationary strategies. 3. The reason why the consistent signal profile is required to be part of the description of a steady state, rather than being uniquely determined by the distribution of strategies, is that our environment, unlike a standard repeated game, lacks a global starting time that determines the initial conditions. An example of a strategy that has multiple consistent signal profiles is as follows. The parameter $$k$$ is equal to three, and everyone plays the most frequently observed action in the sample of the three observed actions. There are three behaviours that are consistent with this population: one in which everyone cooperates, one in which everyone defects, and one in which everyone plays (on average) uniformly. 4. 
In Supplementary Appendix D we show that all the equilibria presented in this article satisfy two additional refinements: (1) evolutionary stability (Maynard Smith, 1974)—any small group of agents who jointly deviate are outperformed, and (2) robustness—no small perturbation in the distribution of observed signals can move the population’s behaviour away from a situation in which everyone plays the equilibrium outcome. In addition, most of these equilibria also satisfy the refinement of strict perfection (Okada, 1981)—the equilibrium remains stable with respect to all commitment strategies. 5. The results can be adapted to a setup with a large finite population. We do not formalize a large finite population, as this adds much complexity to the model without giving substantial new insights. Most of the existing literature also models large populations as continua (see, $$e.g.$$Rubinstein and Wolinsky, 1985; Weibull, 1995; Dixit, 2003; Herold and Kuzmics, 2009; Sakovics and Steiner, 2012; Alger and Weibull, 2013). Kandori (1992) and Ellison (1994) show that large finite populations differ from infinite populations because only the former can induce contagious equilibria. However, as noted by Ellison (1994, p. 578), and as discussed in Section 4.2, these contagious equilibria fail in the presence of a single “crazy” agent who always defects. 6. We do not allow agents to manipulate the observed signals. In our companion paper (Heller and Mohlin, 2017a) we study a related setup in which agents are allowed to exert effort in deception by influencing the signal observed by the opponent. 7. In a companion paper (Heller and Mohlin, 2017b), we study in a broader setup necessary and sufficient conditions for a strategy distribution admitting a unique consistent signal profile. 8. Takahashi (2010) calls offensive (defensive) Prisoner’s Dilemmas submodular (supermodular). 9. 
In environments with $$k\geq2$$, a deviator who always defects gets a payoff of zero, regardless of the value of $$q$$ (because all agents observe $$m=k$$ when being matched with such a deviator). 10. We are aware of only one paper that introduces rare commitment types to repeated games with random matching. Dilmé (2016) constructs cooperative “tit-for-tat”-like equilibria that are robust to the presence of committed agents, in the borderline case in which $$g=l$$ is the underlying Prisoner’s Dilemma (see the discussion of the case of $$g=l$$ in Remark 7 in Section 4.3). Ghosh and Ray (1996) study a somewhat related setup (which is further discussed below) in which the presence of a non-negligible share of agents who are committed to always defecting allows cooperation to be sustained among the normal agents in voluntarily separable interactions. 11. Ely et al. (2008) show a related result in a setup in which a long-run player faces a sequence of short-run players. They show that if the participation of the short-run players is optional, and if every action of the long-run player that makes the short-run players want to participate can be interpreted as a signal that the long-run player is “bad,” then reputation uniquely chooses a low equilibrium payoff to the long-run player. 12. See Berger and Grüne (2016) who study observation of $$k$$ actions, but restrict agents to play only image-scoring-like strategies. 13. See also Herold (2012) who studies a “haystack” model in which individuals interact within separate groups. REFERENCES ALGER I. and WEIBULL J. W. ( 2013), “Homo Moralis – Preference Evolution under Incomplete Information and Assortative Matching”, Econometrica , 81, 2269– 2302. Google Scholar CrossRef Search ADS   ATAKAN A. E. and EKMEKCI M. ( 2011), “Reputation in Long-run Relationships”, The Review of Economic Studies , 79, 451– 480. Google Scholar CrossRef Search ADS   BERGER U. and GRÜNE A. 
( 2016), “On the Stability of Cooperation under Indirect Reciprocity with First-order Information”, Games and Economic Behavior , 98, 19– 33. Google Scholar CrossRef Search ADS   BERNSTEIN L. ( 1992), “Opting Out of the Legal System: Extralegal Contractual Relations in the Diamond Industry”, The Journal of Legal Studies , 21, 115– 157. Google Scholar CrossRef Search ADS   CELETANI M., FUDENBERG D., LEVINE D. K., et al.   ( 1996), “Maintaining a Reputation Against a Long-lived Opponent”, Econometrica , 64, 691– 704. Google Scholar CrossRef Search ADS   CHAN J. ( 2000), “On the Non-existence of Reputation Effects in Two-person Infinitely-Repeated Games” ( Discussion Paper, Working Papers, The Johns Hopkins University, Department of Economics). CRIPPS M. W., DEKEL E. and PESENDORFER W. ( 2005), “Reputation with Equal Discounting in Repeated Games with Strictly Conflicting Interests”, Journal of Economic Theory , 121, 259– 272. Google Scholar CrossRef Search ADS   CRIPPS M. W. and THOMAS J. P. ( 1995), “Reputation and Commitment in Two-person Repeated Games without Discounting”, Econometrica , 6, 1401– 1419. Google Scholar CrossRef Search ADS   DEB J. ( 2017), “Cooperation and Community Responsibility: A Folk Theorem for Repeated Matching Games with Names” ( Mimeo). Google Scholar CrossRef Search ADS   DEB J. and GONZÁLEZ-DÍAZ J. ( 2014), “Community Enforcement Beyond the Prisoner’s Dilemma” ( Mimeo). DILMé F. ( 2016), “Helping Behavior in Large Societies”, International Economic Review , 57, 1261– 1278. Google Scholar CrossRef Search ADS   DIXIT A. ( 2003), “On Modes of Economic Governance”, Econometrica , 71, 449– 481. Google Scholar CrossRef Search ADS   DUFFY J. and OCHS J. ( 2009), “Cooperative Behavior and the Frequency of Social Interaction”, Games and Economic Behavior , 66, 785– 812. Google Scholar CrossRef Search ADS   ELLISON G. ( 1994), “Cooperation in the Prisoner’s Dilemma with Anonymous Random Matching”, The Review of Economic Studies , 61, 567– 588. 
ELY J., FUDENBERG D. and LEVINE D. K. (2008), “When is Reputation Bad?”, Games and Economic Behavior, 63, 498–526.

FUDENBERG D. and LEVINE D. K. (1989), “Reputation and Equilibrium Selection in Games with a Patient Player”, Econometrica, 57, 759–778.

FUJIWARA-GREVE T. and OKUNO-FUJIWARA M. (2009), “Voluntarily Separable Repeated Prisoner’s Dilemma”, The Review of Economic Studies, 76, 993–1021.

FUJIWARA-GREVE T. and OKUNO-FUJIWARA M. (2017), “Long-term Cooperation and Diverse Behavior Patterns under Voluntary Partnerships” (Mimeo).

GHOSH P. and RAY D. (1996), “Cooperation in Community Interaction without Information Flows”, The Review of Economic Studies, 63, 491–519.

GREIF A. (1993), “Contract Enforceability and Economic Institutions in Early Trade: The Maghribi Traders’ Coalition”, The American Economic Review, 83, 525–548.

HELLER Y. and MOHLIN E. (2017a), “Coevolution of Deception and Preferences: Darwin and Nash Meet Machiavelli” (Mimeo).

HELLER Y. and MOHLIN E. (2017b), “When Is Social Learning Path-Dependent?”.

HEROLD F. (2012), “Carrot or stick? The Evolution of Reciprocal Preferences in a Haystack Model”, American Economic Review, 102(2), 914–940.

HEROLD F. and KUZMICS C. (2009), “Evolutionary Stability of Discrimination under Observability”, Games and Economic Behavior, 67, 542–551.

HÖRNER J. and LOVO S. (2009), “Belief-free Equilibria in Games With Incomplete Information”, Econometrica, 77, 453–487.

JØSANG A., ISMAIL R. and BOYD C. (2007), “A Survey of Trust and Reputation Systems for Online Service Provision”, Decision Support Systems, 43, 618–644.
KANDORI M. (1992), “Social Norms and Community Enforcement”, The Review of Economic Studies, 59, 63–80.

KREPS D. M., MILGROM P., ROBERTS J., et al. (1982), “Rational Cooperation in the Finitely Repeated Prisoners’ Dilemma”, Journal of Economic Theory, 27, 245–252.

LEIMAR O. and HAMMERSTEIN P. (2001), “Evolution of Cooperation through Indirect Reciprocity”, Proceedings of the Royal Society of London. Series B: Biological Sciences, 268, 745–753.

MAILATH G. J. and SAMUELSON L. (2006), Repeated Games and Reputations, vol. 2 (Oxford: Oxford University Press).

MATSUSHIMA H., TANAKA T. and TOYAMA T. (2013), “Behavioral Approach to Repeated Games with Private Monitoring” (University of Tokyo Faculty of Economics Discussion paper).

MAYNARD SMITH J. (1974), “The Theory of Games and the Evolution of Animal Conflicts”, Journal of Theoretical Biology, 47, 209–221.

MAYNARD SMITH J. and PRICE G. R. (1973), “The Logic of Animal Conflict”, Nature, 246, 15.

MILGROM P., NORTH D. C. and WEINGAST B. R. (1990), “The Role of Institutions in the Revival of Trade: The Law Merchant, Private Judges, and the Champagne Fairs”, Economics and Politics, 2, 1–23.

NOWAK M. A. and SIGMUND K. (1998), “Evolution of Indirect Reciprocity by Image Scoring”, Nature, 393(6685), 573–577.

OKADA A. (1981), “On Stability of Perfect Equilibrium Points”, International Journal of Game Theory, 10, 67–73.

OKUNO-FUJIWARA M. and POSTLEWAITE A. (1995), “Social Norms and Random Matching Games”, Games and Economic Behavior, 9, 79–109.

PANCHANATHAN K. and BOYD R.
(2003), “A Tale of Two Defectors: The Importance of Standing for Evolution of Indirect Reciprocity”, Journal of Theoretical Biology, 224, 115–126.

PĘSKI M. (2014), “Repeated Games with Incomplete Information and Discounting”, Theoretical Economics, 9, 651–694.

PHELAN C. and SKRZYPACZ A. (2006), “Private Monitoring with Infinite Histories” (Discussion Paper, Federal Reserve Bank of Minneapolis).

RESNICK P. and ZECKHAUSER R. (2002), “Trust Among Strangers in Internet Transactions: Empirical Analysis of eBay’s Reputation System”, The Economics of the Internet and E-commerce, 11, 23–25.

ROBSON A. J. (1990), “Efficiency in Evolutionary Games: Darwin, Nash, and the Secret Handshake”, Journal of Theoretical Biology, 144, 379–396.

ROSENTHAL R. W. (1979), “Sequences of Games with Varying Opponents”, Econometrica, 47, 1353–1366.

RUBINSTEIN A. and WOLINSKY A. (1985), “Equilibrium in a Market with Sequential Bargaining”, Econometrica, 53, 1133–1150.

SAKOVICS J. and STEINER J. (2012), “Who Matters in Coordination Problems?”, The American Economic Review, 102, 3439–3461.

SELTEN R. (1975), “Reexamination of the Perfectness Concept for Equilibrium Points in Extensive Games”, International Journal of Game Theory, 4, 25–55.

SUGDEN R. (1986), The Economics of Rights, Co-operation and Welfare (Oxford: Blackwell Publishers).

TAKAHASHI S. (2010), “Community Enforcement when Players Observe Partners’ Past Play”, Journal of Economic Theory, 145, 42–62.

VAN VEELEN M., GARCÍA J., RAND D. G., et al. (2012), “Direct Reciprocity in Structured Populations”, Proceedings of the National Academy of Sciences, 109, 9929–9934.
WEIBULL J. W. (1995), Evolutionary Game Theory (Cambridge, MA: MIT Press).

WISEMAN T. and YILANKAYA O. (2001), “Cooperation, Secret Handshakes, and Imitation in the Prisoners’ Dilemma”, Games and Economic Behavior, 37, 216–242.

© The Author 2017. Published by Oxford University Press on behalf of The Review of Economic Studies Limited. Advance access publication 20 December 2017.

### Journal

The Review of Economic Studies, Oxford University Press

Published: Dec 20, 2017
