Abstract. In the payoff assessment model of choice (Sarin and Vahid, 1999), only the assessment of the chosen strategy is updated. We extend that model to allow the agent to also update the assessments of strategies that the agent thinks are similar to the chosen strategy. We use this model to explain observed behaviour in a recent experiment. Statistical tests cannot distinguish between the payoff distributions generated by the model and the observed payoff distributions in almost every period.

We consider an individual who knows the set of available strategies. The set of strategies is ordered and the individual knows the order. However, the agent has little idea about the consequences of choosing any particular strategy. In particular, she does not know what payoff she will get from any choice, nor the distribution according to which a payoff might be selected. In fact, the agent may not even know, or think about, all the factors on which such a distribution may depend. After choosing a strategy, the agent observes its payoff. No further information about the payoff environment is obtained. In such an environment, we provide an interpretation of the very old idea: people expect similar results from similar causes.1 Specifically, we suppose that people expect similar payoffs from similar strategies.2 The strategies that are similar to the chosen one are those that are close to it.3 The same is true for the payoffs that are considered similar. We propose a specific functional form that formalises the idea that nearby strategies give nearby payoffs.

The model of choice in which we incorporate the idea that nearby strategies give nearby payoffs is the payoff assessment model (Sarin and Vahid, 1999). The model supposes that the agent knows the strategies available to her, but knows little about the payoff function she faces. With each strategy the individual associates a number called its payoff assessment. This represents the payoff that the agent subjectively assesses she will obtain from the strategy. The agent is assumed to choose, at each time, the strategy with the highest payoff assessment. After the agent chooses a strategy she observes its payoff. The agent uses this payoff information to update her assessment of the chosen strategy by moving that assessment in the direction of the received payoff. In the payoff assessment model, the assessments of strategies other than the chosen one are not updated.

We incorporate the idea that nearby strategies give nearby payoffs into the payoff assessment model by supposing that the individual also updates the assessments of strategies near the chosen strategy. The closer a strategy is to the chosen one, the more its assessment is updated. Hence, introducing similarity in our model results in the agent using the information directly relevant to one strategy also for nearby strategies. Two specific functional forms of similarity functions are considered. Each has only one parameter, which governs the number of strategies on either side of the chosen strategy that are updated.

We use the payoff assessment model with similarity to explain observed behaviour in an experiment conducted by Van Huyck, Battalio and Rankin (1996) (henceforth VBR). In the VBR experiment 8 groups, with 5 players in each, repeatedly played a coordination game for 75 periods. In the stage game, each player had 101 strategies. Each of the strategies of the players was labelled in the same way across the players.
Specifically, the 101 strategies for each player were labelled 0, 1, …, 99, 100. The subjects did not know the payoff function of the game they were playing, nor did they know the strategies available to the other players. At each repetition of the game each subject observed only the payoff she obtained from the strategy chosen. The stage game had two strict Nash equilibria. The equilibria were symmetric and efficient. Hence, traditional refinements could not reduce the set of Nash equilibria. The experimental subjects, however, managed to coordinate on the same equilibrium in each experimental session. An even more surprising finding was that the subjects managed to coordinate on the equilibrium in a remarkably short time: typically the median player had chosen the equilibrium strategy within the first 25 periods and had converged to choosing only this strategy within 40 periods.

The information the agents had available about their payoff environment in the VBR experiment roughly coincided with the assumptions about the information available to agents in reinforcement learning models. VBR's simulations of a model of reinforcement learning, due to Cross (1973), revealed that it could generate a final distribution of choices similar to that of the experimental subjects. However, it took the Cross model no less than 750 periods to exhibit such a final distribution, whereas the subjects in the experiment had only 75 periods. Hence, VBR concluded that reinforcement learning models were unable to capture learning behaviour in their experiment.

In contrast to the VBR analysis of the Cross learning model, we obtain striking support for the payoff assessment model with similarity. Simulations of the model suggest that the median player in the model converges to choose the same equilibrium as chosen by the experimental subjects. Furthermore, the model converges to that equilibrium in roughly the same number of repetitions as did the experimental subjects. Statistical tests could not distinguish the payoff distributions generated by the model from the observed payoff distributions in almost every period. Our findings suggest that people did use similarity among strategies to simplify the 'large' strategy spaces they faced in the VBR experiment.

Given the success of introducing similarity among strategies in the payoff assessment model, we also modified the Cross model to include such similarity. We also modified the Cross model to allow the step-size, or learning rate, to decrease over time. With the latter modification the Cross dynamic becomes very much like the basic model studied in Roth and Erev (1995), even though it has more attractive short-run properties.4 Surprisingly, neither modification improves the predictive power of the Cross model.

A huge literature on similarity in decision making exists in cognitive psychology. A recent survey of some of the issues dealt with in this literature is provided by Hahn and Ramscar (2001). Researchers in machine learning have also been paying an increasing amount of attention to this subject, e.g., Kaelbling et al. (1996). The method of case based reasoning uses similarity among problems or cases encountered in the past as the building block for how agents choose in new situations, e.g., Riesbeck and Schank (1989). Gilboa and Schmeidler (1995, 1997) adapt case based reasoning to decision theory. They assume that a similarity relation between cases and/or acts is given.
In Gilboa and Schmeidler (1995) an action is evaluated according to how it has performed in similar decision problems, or cases, encountered in the past. In Gilboa and Schmeidler (1997) the evaluation of an action is also allowed to depend on the performance of similar actions. LiCalzi (1995) analyses the consequences, for the convergence properties of a fictitious-play-like learning rule, of assuming that all 2 × 2 games are similar. Economists have also attempted to explain choice behaviour assuming similarity. Rubinstein (1988) postulated a similarity relation among actions in a decision problem under risk to explain Allais paradox type behaviour. Roth and Erev (1995) assume that adjacent strategies are similar when using a reinforcement learning model to explain behaviour in extensive form games.

A few papers written after the first version of our paper was circulated also use similarity among strategies to explain behaviour observed in experiments. Camerer et al. (2002) consider experiments on bilateral call markets. They extend the basic EWA model of Camerer and Ho (1999) to allow for some private values to be more similar to others. Chen and Khoroshilov (2003) use similarity among nearby strategies in cost sharing games. They use the payoff assessment model and, like us, assume that the strategies5 and payoffs that are nearby are similar. Chen and Gazzale (2002) use the same notion of similarity among strategies6 and payoffs as in the payoff assessment model with similarity to explain behaviour in experiments on the compensation mechanism.

There are two other learning processes that bear some relation to the model studied in this paper: directional learning and hill climbing. In both approaches the agent implicitly updates assessments of unchosen strategies. In directional learning (Selten and Buchta, 1999), the agent moves in the direction of the higher payoff. Information about this direction is obtained from the payoff received from the chosen action and from the agent's assumed knowledge of the payoff function. Clearly, the agent uses more information about her environment in updating the assessments of unplayed strategies than is used by the agent in the payoff assessment model with similarity. In hill climbing, the agent uses her information about two consecutive action-payoff pairs to determine the direction in which to move. In this approach, too, the agent uses more information than is used in payoff assessment learning with similarity. Huck et al. (1999) provide evidence for hill climbing behaviour in Cournot oligopoly experiments.

This paper is structured as follows. The next section presents the model of this paper. Section 2 summarises the VBR experiments. In Section 3 we present simulations of our model and compare the results with those obtained in the VBR experiments. Also, in Section 3, we estimate the parameters of our model which lead to the best fit over the entire path of play in the experimental sessions. Section 4 considers variants of the Cross model and compares their predictions with the experimental data. Section 5 concludes by discussing some of the limitations of our study.

1. The Model

Let I denote the set of players in the game, and let Si denote the pure strategy set of player i ∈ I. We shall assume that there are a finite number m of players. For notational simplicity suppose that each player has a finite number J of strategies. By s we shall denote the strategy profile (s1, …, sm) actually played by the m players.
Denote by u_j^i(n) the subjective assessment of player i for her strategy j, j ∈ Si, at repetition n, and let u^i(n) denote the vector of her subjective payoff assessments of the payoff for each of her strategies. Hence, at time n, the game player i subjectively assesses that she is playing is given by Γ^i(n) = [Si, u^i(n)]. At each time n, each player i chooses the strategy that has the highest assessment. Let s^i(n) denote the strategy that player i assesses to have the highest payoff at time n. After each player i at time n chooses s^i(n), she obtains a payoff π^i(n).

After receiving the payoff each player updates her assessment of the chosen strategy. We suppose that the individual takes a weighted average of her assessment and the payoff she receives to form her updated assessment. Specifically, if at time n player i chooses strategy s^i(n) = s_j and receives a payoff π^i(n), then

u_j^i(n + 1) = (1 − λ) u_j^i(n) + λ π^i(n).    (1)

In the payoff assessment model the agent does not update the assessment of unchosen strategies. To specify how the assessment of unplayed strategies may be updated, consider the 'similarity' function f^i : Si × Si → ℜ which maps the similarity between any two strategies to a real number. A value of f^i equal to 1 indicates that player i sees the two strategies as identical, and a value of 0 indicates that she sees no similarity between them at all. We may now write the manner in which the agent updates her assessments. If at time n player i chooses strategy s^i(n) = s_j and receives a payoff of π^i(n), then for every strategy s_k ∈ Si:

u_k^i(n + 1) = [1 − λ f^i(s_j, s_k)] u_k^i(n) + λ f^i(s_j, s_k) π^i(n).    (2)

Note that (2) reduces to the payoff assessment model (1) if there is no similarity among strategies, i.e. f^i(s_j, s_k) = 0 for all s_k ≠ s_j, and because a strategy is identical to itself, i.e. f^i(s_j, s_j) = 1 for all s_j. The model implies that the assessment of a played action is moved in the direction of the obtained payoff by an amount λ. The assessments of unplayed actions are also updated. The payoff received from the played action is used to update the assessments of unplayed actions by an amount λf^i.

Observe that we assume that the similarity function is constant over time. This may appear to be a restrictive assumption if the strategy space is small, the environment is nonstationary and the decision maker knows that she will be facing the same decision problem sufficiently often to experience each of the available strategies many times. However, for an agent lacking any specific knowledge about the stationarity of the environment and facing a large set of strategies, it is intuitive to consider, for example, that the strategy labelled 19 stays close to the strategy labelled 20 no matter how non-stationary the environment is over time. In such situations it does not appear overly restrictive to suppose that the similarity function is constant over time. Such an assumption on the similarity function also greatly simplifies the analysis and estimation.

Next we specify the similarity functions that we consider in this paper. Both assume that the strategies are ordered and can be represented by numbers. In both, player i conjectures all strategies within a symmetric 'window' around the played strategy s^i(n) to be similar to it, and conjectures that nearer strategies are more similar. The size of the window, which specifies the number of strategies on either side of the chosen strategy which are updated, is the single parameter that describes these similarity functions.
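To make the updating rule concrete, the following sketch (our illustration, not the authors' code; the function and argument names are ours) applies one round of the update in (1) and (2) for an arbitrary similarity function.

```python
import numpy as np

def update_assessments(u, chosen, payoff, lam, similarity):
    """One round of payoff assessment updating with similarity, as in (2).

    u          : array of current assessments, one per strategy
    chosen     : index of the strategy that was played
    payoff     : payoff received from the played strategy
    lam        : updating parameter lambda in (0, 1]
    similarity : function giving f(chosen, k) in [0, 1] for every strategy k
    """
    u_new = u.copy()
    for k in range(len(u)):
        f = similarity(chosen, k)          # f = 1 when k == chosen, 0 when dissimilar
        u_new[k] = (1 - lam * f) * u_new[k] + lam * f * payoff
    return u_new

# The next period's choice is the strategy with the highest updated assessment:
# choice = int(np.argmax(u_new))
```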
The first similarity function we consider, which we call the Bartlett similarity function,7 supposes that the degree of similarity decreases linearly as the distance between the played strategy and others increases. Hence, the closeness between strategies 7 and 8 is like that between 8 and 9. The Bartlett similarity function is given by:

f^i(s_j, s_k) = 1 − |j − k|/h   if |j − k| < h,   and   f^i(s_j, s_k) = 0   otherwise.

The parameter h determines the h − 1 unplayed strategies on either side of the played strategy whose assessments are updated. In the second similarity function we consider, which we call the Parzen similarity function,8 similarity decreases nonlinearly, with nearby strategies being closer per unit than further ones.9 The Parzen similarity function is given by:

f^i(s_j, s_k) = 1 − 6(|j − k|/h)^2 + 6(|j − k|/h)^3   if |j − k| ≤ h/2,
f^i(s_j, s_k) = 2(1 − |j − k|/h)^3   if h/2 < |j − k| ≤ h,
f^i(s_j, s_k) = 0   otherwise.

The parameter h plays a similar role here. The shapes of the Bartlett and Parzen similarity functions for values of h ∈ {0, 6, 12} are given in Figure 1.

Fig. 1. Bartlett and Parzen Similarity Functions

Using either similarity function would lead the player to move the assessments of all strategies within the window in the direction of the received payoff, with the assessments of strategies closer to the chosen strategy being updated more. Hence, both similarity functions embody the idea that nearby strategies give nearby payoffs. In environments in which players know little about the payoff environment other than the ordering of the strategies, and observe little at each stage, having a similarity relation between strategies may allow the agent to utilise the limited information actually obtained more 'efficiently'. In particular, if the actual payoff environment the players face is characterised by nearby strategies yielding nearby payoffs, then using a similarity relation among strategies that has this feature allows information regarding payoffs from played strategies to be used in updating the assessments of unplayed strategies. This helps move the assessments of unchosen strategies in the correct direction. Of course, using such a similarity relation may also mislead the players if the payoff environment does not give nearby payoffs to nearby strategies.
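In code, the two window functions can be sketched as follows (our illustration; the piecewise forms follow the standard Bartlett and Parzen windows, which is the assumption behind the formulas above). Either function can be passed to `update_assessments`, e.g. `similarity = lambda j, k: parzen_similarity(j, k, h=12)`.

```python
def bartlett_similarity(j, k, h):
    """Linear (Bartlett-window) similarity between strategies j and k."""
    d = abs(j - k)
    if h == 0 or d >= h:              # h = 0 degenerates to no similarity at all
        return 1.0 if d == 0 else 0.0
    return 1.0 - d / h

def parzen_similarity(j, k, h):
    """Nonlinear (Parzen-window) similarity between strategies j and k."""
    d = abs(j - k)
    if h == 0 or d > h:
        return 1.0 if d == 0 else 0.0
    x = d / h
    return 1 - 6 * x**2 + 6 * x**3 if x <= 0.5 else 2 * (1 - x)**3
```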
2. The VBR Experiments10

Eight groups, with five individuals in each, repeatedly played the same (coordination) game for 75 periods. Each individual i had the same set of 101 pure strategies Si = {0, 1, …, 100}. In each period, each individual chose one strategy and received a payoff. Let si denote the pure strategy chosen by player i, and s−i denote the strategies chosen by players other than player i. The payoff of each player i was a deterministic function of her own choice and of the median of the group's choices, expressed in terms of ei = 1 − si/100, where e = (e1, …, e5), M(e) is the median of the profile e chosen by the 5 players, and ω is a constant (which took a value approximately equal to 2.44 for some groups and 3.85 for the other groups). Players were not informed about the payoff function, or the strategies available to the other players. The game has two strict Nash equilibria which are symmetric and efficient. The equilibrium strategy of a player depends not only on the strategies chosen by the other players, but also on ω. We refer to the game with ω approximately equal to 2.44 as G(2.44) and the game with ω approximately equal to 3.85 as G(3.85). The strict equilibria of G(2.44) involved each player choosing either 41 or 100. For G(3.85) the strict equilibria were 26 and 100. Each of the equilibria involves the player choosing her strategy so that ei = ωM(e)[1 − M(e)], which leads to a payoff of 50 cents.

Every player in the game had information on (a) the number of repetitions, (b) the fact that the payoff each player would obtain depended on her own choice and those of the four other people in the group, and (c) the fact that the payoff function was deterministic (so that if all players in the group took the same actions as previously then they would receive the same payoffs as previously).

The experimental results revealed the following. For both values of ω statistical tests could not reject the null hypothesis that initial play was uniform. In all eight groups (of 5 players each), the median player converged to the interior equilibrium. Choices of the median player in each of the groups are illustrated in Figures 2a and 2b. The four groups who played G(2.44) first chose the interior equilibrium (41) between periods 10 and 18 and got 'absorbed' between periods 18 and 25. The four groups who played G(3.85) first chose the interior equilibrium (26) between periods 10 and 32 and got 'absorbed' between periods 14 and 55. In the final period of the stage game most players were playing the interior equilibrium strategy. Those who were not were playing strategies that earned them within 10 cents of what the equilibrium strategy paid. The configuration of the end period payoffs of the 20 subjects who played each game is given in the table below:

Fig. 2. Observed Median Choices of the Four Cohorts (a) playing G(2.44) (the interior equilibrium of G(2.44) is 0.59), (b) playing G(3.85) (the interior equilibrium of G(3.85) is 0.74)

G(2.44)                    G(3.85)
Payoff    Frequency        Payoff    Frequency
41c       1                40c       1
46c       1                47c       3
49c       1                48c       2
50c       17               50c       14

There were two remarkable features of the experimental results. The first was that in each group (there were 8 in all, with 4 playing each value of ω) play of the median player always converged exactly to the interior equilibrium. This was remarkable for two reasons. Firstly, each of the equilibria was strict, symmetric and efficient and hence it was difficult to explain why the interior equilibrium proved to be 'focal'. Secondly, the mechanism that resulted in the median player choosing exactly the interior equilibrium in each group must have involved adequate 'experimentation', given that nobody knew the payoff function. However, the quick convergence to the equilibria suggests that not much 'experimentation' was engaged in. The second feature, which was even more remarkable than the first, was that the convergence to the interior equilibrium was extremely rapid. In particular, the median subject finally got absorbed in the interior equilibrium well before any subject could have sampled, even once, more than 50% of the available strategies. Even trying a strategy once could not have been particularly informative as the number of payoff relevant states for each strategy of each player was huge.11 Each player's lack of knowledge of the true payoff function made the choice problem each faced even more daunting. It was this rapid convergence that led us to suspect that the labelling of the strategies, which was common across the players, played a role in the players being able to coordinate so quickly (on the interior equilibrium).
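The interior equilibria quoted above and in Figure 2 can be checked directly from the best-response condition; the following short derivation is ours, not part of the original text.

```latex
% At a symmetric profile every player chooses the same e, so M(e) = e and the
% best-response condition e_i = \omega M(e)\,[1 - M(e)] reduces to
e = \omega e\,(1 - e)
\quad\Longrightarrow\quad
e = 0 \;\;\text{or}\;\; e = 1 - \frac{1}{\omega}.
% Since e_i = 1 - s_i/100, e = 0 is the corner equilibrium s = 100, while
% \omega \approx 2.44 gives e \approx 0.59 (s = 41) and
% \omega \approx 3.85 gives e \approx 0.74 (s = 26),
% matching the interior equilibria of G(2.44) and G(3.85).
```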
3. Simulation and Estimation

In this Section we report the results of the simulations we performed with the payoff assessment model with similarity. Then, we estimate the parameters which best fit the data. We use these parameter estimates to simulate the model and compare the distribution of payoffs it generates with those obtained in the experiment.

3.1. Simulations

In this subsection we investigate, by way of simulations, whether the model specified in Section 1 produces median player dynamics similar to the observed median dynamics in the VBR experiment (see Figure 2). To simulate the model we need to specify values of the initial payoff assessments of the players, values of the update parameter λ in (2), and the similarity functions. Given this information, and the knowledge of the payoff function used in the VBR experiment, we can simulate the model. We considered the Bartlett and Parzen similarity functions for window sizes h = 0, 6, 12. We performed simulations for four values of the update parameter, specifically λ ∈ {0.25, 0.50, 0.75, 1.00}. For the initial assessments, we consider six possible cases: in five, the assessments are drawn randomly from different distributions, and in one they lie on a random smooth polynomial. In particular, we looked at initial assessments that were (i) uniformly distributed on [−1.00, 1.00]; (ii) normally distributed with mean 0.25 and standard deviation 0.25; (iii) distributed according to the symmetric triangular distribution on [−0.50, 1.00]; (iv) uniformly distributed on [0.50, 1.00]; (v) uniformly distributed on [0.00, 0.50]; (vi) lying on a smooth polynomial. We looked at (i), (ii) and (iii) because they seemed reasonable, given both the reputation of the Economics Laboratory at Texas A&M University, and the hint in the instructions provided to the students that payoffs might be negative. We looked at (iv) because it represented 'optimistic' assessments, and such assessments played a special role in Sarin and Vahid (1999). We looked at (v) because it represented the case where the range of the assessments was roughly the same as the range of the actual payoffs in the experiment. Lastly, in (vi) we considered initial assessments that embodied the idea that similar strategies also have similar initial assessments. This was implemented by choosing the assessments of the 6 strategies 0, 20, 40, 60, 80, 100 independently from a uniform distribution on [0, 1], and then finding the polynomial of degree 5 which passed through these points. The assessments of the 101 strategies were then read off this polynomial. An example of such a configuration of initial assessments is given in Figure 3.

Fig. 3. Smooth Initial Assessments
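Case (vi) can be generated as in the following sketch (ours; the function name is illustrative).

```python
import numpy as np

def smooth_initial_assessments(rng=np.random.default_rng()):
    """Case (vi): draw assessments at strategies 0, 20, ..., 100 and interpolate.

    Fits the unique degree-5 polynomial through the six drawn points and reads
    off the assessments of all 101 strategies from it.
    """
    knots = np.array([0, 20, 40, 60, 80, 100])
    values = rng.uniform(0.0, 1.0, size=6)          # independent U[0, 1] draws
    coeffs = np.polyfit(knots, values, deg=5)       # exact fit: 6 points, degree 5
    return np.polyval(coeffs, np.arange(101))       # assessments of strategies 0..100
```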
It should be noted that the initial assessments play an important role in the payoff assessment model (with and without similarity). If initial assessments are too low, then players feel satisfied with all possible payoffs and there is a tendency for players to lock into their initial choices. Conversely, if initial assessments are too high, then agents tend to keep 'experimenting' for a long time as they feel dissatisfied with all the payoffs they receive. The chosen supports for the initial assessments seemed suitable given what we thought the subjects might expect to be paid over the course of the experiment, and, as we will see, they worked quite well.

To summarise, the design space of the simulations is the cross product of six distributions of initial assessments, five similarity functions (the Bartlett and Parzen functions with zero window width both degenerate to the same case in which there is no similarity among the actions), and four update parameters, for each of the two games G(2.44) and G(3.85). To ensure that any particular draw of the random initial assessments is not given too much weight in our conclusions, we perform 100 simulations for each case. We do not report the detailed results from our simulations. These can be obtained from the earlier version of our paper, which is available at http://www.buseco.monash.edu.au/Depts/EBS/Pubs/WPapers/2001/wp8-01.pdf. The details of the simulations are also available at this Journal's web site. A compact sketch of the simulation procedure is given at the end of this subsection. Our simulations suggest the following.

The updating parameter must be large (λ > 0.50) in order for the simulated median player to learn as quickly as in the observed data. At first glance, this seems at odds with previous estimates of the updating parameter in payoff assessment models fitted to experimental data; see, for example, Sarin and Vahid (2001). However, previous studies concern choice environments in which each player has very few (mostly two) strategies at each stage. Considering that in the present experiment subjects have 101 strategies at each stage and know that they only have 75 opportunities to make a decision, it is quite intuitive that they would place a large weight on observed payoffs when updating their assessments. We conjecture that if adaptive models are to be fitted to a cross section of repeated games with vastly different numbers of strategies, the updating parameter will have to be a function of the number of strategies that the decision maker faces.

Similarity matters. Allowing for similarity brings the distribution of the choice of the median player closer to the observed median at every stage. For λ > 0.50 and h ≥ 6, the model produces median behaviour that closely resembles that of the observed data. In particular, the mode of the distribution of the median choices converges to the interior equilibrium as fast as the observed median choice does. This fast convergence to the interior equilibrium is a property that other models do not seem to be capable of matching.12

The above results are robust. We observe the above in both games, using either of the two similarity functions (Bartlett or Parzen), and for each of the distributions of initial assessments we considered. We infer from these results that, although the data support adaptive behaviour and the updating of assessments of similar strategies, they cannot provide sharp point estimates for λ and h. In other words, any objective function that depends on the distance between the predictions of the model and the observed data is likely to be flat on a large subset of the parameter space which does not include λ = 0 and h = 0. This is a nice result because it implies that the assumptions of the same updating parameter and the same similarity function with the same window width across all agents are not essential, and the model can allow for some heterogeneity among agents. Furthermore, the invariance of the results to reasonable variations in the distribution of initial assessments is fortunate, since the data reveal very little information about the initial assessments, especially the assessments of strategies that are never played. Hence, we assume that the initial assessments for each strategy and each player were drawn independently from a uniform distribution over [−1, 1], we fix the similarity function to be the Parzen similarity function, and then estimate λ and h.
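The simulation procedure referred to above can be organised roughly as follows. This sketch is our illustration, not the authors' code: it reuses `update_assessments` and the similarity functions sketched earlier, and `vbr_payoff` is a placeholder, since the VBR payoff function itself is not reproduced in this paper.

```python
import numpy as np

def vbr_payoff(choices, omega):
    """Placeholder for the (unreproduced) VBR payoff function.

    Should return one payoff per player given the 5 chosen strategies and the
    constant omega; this stub is purely illustrative.
    """
    raise NotImplementedError

def simulate_group(initial_assessments, lam, similarity, omega, periods=75, players=5):
    """Simulate one 5-player group for 75 periods under the model of Section 1."""
    u = [a.copy() for a in initial_assessments]      # one assessment vector per player
    medians = []
    for _ in range(periods):
        choices = [int(np.argmax(ui)) for ui in u]   # each player picks her highest assessment
        payoffs = vbr_payoff(choices, omega)
        for i in range(players):                     # update every player's assessments via (2)
            u[i] = update_assessments(u[i], choices[i], payoffs[i], lam, similarity)
        medians.append(int(np.median(choices)))
    return medians
```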
3.2. Estimation

Since h is restricted to positive integers smaller than 50 and λ is between 0 and 1, we estimate these parameters by a grid search. We find the optimal λ for each h, and then choose the (λ, h) pair that optimises the objective function. It should be noted that even though, for a particular value of h, there is only one unknown parameter, the maximum likelihood estimation of this parameter is not feasible. Since, at every stage, there are 101 possible choices, and the game is repeated 75 times, the likelihood of an observed sequence of choices implied by our structural model involves the evaluation of many-fold integrals. This makes the evaluation of the likelihood computationally infeasible even by simulation methods. Hence, we estimate λ by matching the moments of the simulated and real data.13 For a given h, the parameter λ is chosen to minimise the weighted distance between the first and second sample moments of the observed payoffs and the first and second moments of the payoffs from the theoretical model at each round. This amounts to 75 first moments and 75 second moments, a total of 150 moment conditions.14 We match moments of payoffs rather than choices because payoffs reflect information about the median choices as well as individual choices. The first moment of the payoffs implied by the model is approximated by averaging 2,500 simulated payoff paths, and the average of the observed payoff paths is taken as an estimate of the expected path of the actual payoffs. Theoretical and observed second moments are calculated analogously using squared payoffs. Our estimates of (λ, h) minimise

g(ω1)(h, λ)′ W(ω1) g(ω1)(h, λ) + g(ω2)(h, λ)′ W(ω2) g(ω2)(h, λ),

where, for each game, g(h, λ) = [y − ȳ(h, λ); z − z̄(h, λ)] stacks the deviations of the observed from the simulated first and second moments into a 150 × 1 vector. The superscripts (ω1) and (ω2) indicate the two different games that the subjects were playing, y is the 75 × 1 vector of sample first moments of actual payoffs, i.e. the vector of average observed payoffs at round t, for t = 1, …, 75, and ȳ(h, λ) is the 75 × 1 vector of average simulated payoffs for a particular (h, λ) pair at round t, for t = 1, …, 75. Similarly, z and z̄(h, λ) are the corresponding 75 × 1 vectors of average squared observed and simulated payoffs for t = 1, …, 75. The W matrices are 150 × 150 symmetric positive definite matrices that weight these moments appropriately before combining them. The choice of W is explained below. It is not necessary to use both first and second moments. One could use only the first moments, or even only a subset of the first moments. There is no general econometric result on the superiority of the GMM estimator that uses more moment conditions when samples are small and optimal weight matrices W are estimated.15 Any W that converges in probability to a positive definite matrix leads to consistent estimators for h and λ, but these estimators can have very large standard errors if W is not chosen wisely. For example, even though letting W be an identity matrix simplifies the problem, it is not a good choice because yt and zt have different scales and can be highly correlated.16 We use an estimate of the 'optimal' weighting matrix based on first stage estimates of (h, λ). In the first stage, we used the first moments only with identity weight matrices and obtained initial estimates of h and λ. Based on these estimates, we obtained estimates of the optimal W(ω1) and W(ω2), and used these weight matrices and both first and second moments in the final stage of estimation. Technical details about the optimal weighting matrices are provided in the Appendix.
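The grid search over (h, λ) can be organised as in the following sketch; this is our illustration, and `simulate_payoff_paths` is a hypothetical helper standing in for the 2,500-path simulator described above.

```python
import numpy as np

def smm_objective(h, lam, observed_payoffs, simulate_payoff_paths, W):
    """Weighted distance between observed and simulated payoff moments (rounds 1..75).

    observed_payoffs      : (subjects x 75) array of payoffs from one game
    simulate_payoff_paths : function returning a (paths x 75) array of simulated payoffs
    W                     : 150 x 150 positive definite weight matrix
    """
    sims = simulate_payoff_paths(h, lam)                              # e.g. 2,500 simulated paths
    g = np.concatenate([
        observed_payoffs.mean(axis=0) - sims.mean(axis=0),            # 75 first moments
        (observed_payoffs**2).mean(axis=0) - (sims**2).mean(axis=0),  # 75 second moments
    ])
    return g @ W @ g

# Grid search: best lambda for each h, then the best (h, lambda) pair overall, e.g.
# best = min(((h, l) for h in range(1, 21) for l in np.arange(0.0, 1.01, 0.01)),
#            key=lambda p: smm_objective(*p, obs, simulate_payoff_paths, W))
```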
Figure 4 shows the value of the objective function at {0.00, 0.01, …, 1.00} × {1, 2, …, 20}. Consistent with our simulations, it can be seen from this graph that the objective function improves as h increases for almost all λ and that the objective function is almost flat on h ≥ 10 and 0.5 ≤ λ ≤ 1.0. It is evident that the objective function flattens out as h becomes large. The near flatness of the objective function somewhat assures us that the assumption of identical updating parameters and similarity weights across the population is not too costly, in the sense that the simplified model is almost observationally equivalent to a model where the subjects have different h and λ, as long as for all subjects h ≥ 10 and 0.5 ≤ λ ≤ 1.0. The minimum is achieved when h = 17 and λ = 0.70. It is interesting that these final estimates are very close to the initial estimates, which used the first moments only. It tells us that the behavioural model calibrated to fit only the first moments of the data provides a good fit for the second moments as well. The histogram of the median choice in simulations of the estimated model for G(2.44) is shown in Figure 5. The mode of this histogram is the interior equilibrium, and there is very little dispersion around this mode. The same is true for G(3.85), so the plot of the histogram for G(3.85) is not provided, to save space.

Fig. 4. The Objective Function

Fig. 5. Histogram of the Median Choices in Period 75 in 500 Simulations of G(2.44) Played by Agents with Uniform [−1,1] Initial Assessments with Parzen Similarity, h = 17 and λ = 0.71

Under the assumption of correct specification, the simulated method of moments estimator will be consistent as long as the number of simulations is always a multiple of the sample size (i.e. it grows to infinity at least at the same rate). However, the strong dependence and non-stationarity of the distribution of observations over time make the development of classical tests of model adequacy very difficult. Hence, we follow VBR and use Kolmogorov-Smirnov tests (Berry and Lindgren, 1996, pp. 534–5) of whether the observed and simulated data could have come from the same (unconditional) distributions. While the rejection of such a test at the terminal stage of the game, a stage in which most subjects have converged to repeating the same strategy, is sufficiently strong evidence for the dismissal of the theoretical model, not rejecting the test at just one stage is only weak evidence for it.
We therefore perform this test for every stage of the game.17 Table 1 reports the Kolmogorov-Smirnov test statistics for the null hypothesis that the observed and the simulated strategy choices come from the same distribution, for every stage of the game. Overall, the null of equality of the CDFs of the observed and simulated individual choices is rejected for 4 out of 75 periods in G(3.85) and 2 out of 75 in G(2.44) at the 5% level of significance.18 We take this as strong evidence that our simple model of choice, and the postulated rule for updating the assessment of payoffs of similar strategies, can describe the behaviour of decision makers who have very limited knowledge.19

Table 1
Smirnov Test Statistics of the Equality of the Observed and the Simulated Empirical Distributions of the Agents' Choices for the Payoff Assessment Model with Parzen Similarity

Period  G(2.44)  G(3.85)    Period  G(2.44)  G(3.85)    Period  G(2.44)  G(3.85)
 1      0.17     0.19       26      0.22     0.17       51      0.29     0.25
 2      0.11     0.24       27      0.27     0.17       52      0.24     0.19
 3      0.14     0.22       28      0.27     0.13       53      0.27     0.24
 4      0.20     0.17       29      0.21     0.13       54      0.22     0.18
 5      0.34     0.23       30      0.32     0.13       55      0.21     0.24
 6      0.30     0.35       31      0.30     0.23       56      0.26     0.29
 7      0.28     0.26       32      0.32     0.14       57      0.25     0.29
 8      0.17     0.16       33      0.26     0.14       58      0.24     0.29
 9      0.25     0.32       34      0.30     0.13       59      0.21     0.29
10      0.17     0.32       35      0.30     0.14       60      0.19     0.20
11      0.30     0.27       36      0.30     0.14       61      0.23     0.29
12      0.19     0.11       37      0.28     0.13       62      0.23     0.29
13      0.16     0.17       38      0.31     0.14       63      0.22     0.29
14      0.17     0.13       39      0.31     0.19       64      0.22     0.29
15      0.25     0.17       40      0.28     0.19       65      0.22     0.29
16      0.24     0.11       41      0.28     0.19       66      0.22     0.29
17      0.19     0.10       42      0.28     0.19       67      0.16     0.34
18      0.19     0.12       43      0.30     0.20       68      0.21     0.29
19      0.24     0.13       44      0.30     0.25       69      0.21     0.29
20      0.24     0.14       45      0.30     0.23       70      0.21     0.29
21      0.20     0.19       46      0.31     0.15       71      0.21     0.29
22      0.15     0.25       47      0.35     0.23       72      0.21     0.29
23      0.16     0.15       48      0.28     0.20       73      0.20     0.29
24      0.23     0.15       49      0.26     0.25       74      0.21     0.24
25      0.22     0.12       50      0.29     0.25       75      0.21     0.29

Note: The Smirnov test statistic, better known as the two-sample Kolmogorov-Smirnov statistic (Berry and Lindgren, 1996, p. 535), is based on the maximum vertical distance between the empirical distributions of the observed and the simulated choices. That is, maxx |Fm(x) − Gn(x)|, where m and n are the numbers of simulated and observed choices and Fm and Gn are the empirical cumulative distribution functions of the simulated and observed choices, respectively. The 5% and 1% critical values for this test are 0.31 and 0.37 respectively.
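A per-period version of this comparison can be computed with a standard two-sample Kolmogorov-Smirnov routine, as in the following sketch (ours; the arrays of observed and simulated choices per period are assumed given).

```python
from scipy.stats import ks_2samp

def per_period_ks(observed, simulated):
    """Two-sample KS statistic for each period.

    observed  : list of arrays, observed choices of the 20 subjects per period
    simulated : list of arrays, simulated choices per period
    Returns the KS statistic (max vertical distance between the empirical CDFs)
    for every period, to be compared against the critical values in Table 1.
    """
    return [ks_2samp(obs, sim).statistic for obs, sim in zip(observed, simulated)]
```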
4. The Cross Model and Similarity

In the previous section we saw that allowing for similarity gave us very favourable results. A natural question to ask is whether incorporating considerations of similarity among the strategies in the Cross model would improve its convergence properties. In this section we modify the Cross model to allow the agent to take into account the similarity among strategies. Another modification of the Cross model we consider allows for a declining step-size or learning rate over time.

The Cross model assumes that an agent has a finite set S = {s1, …, sJ} of strategies. The agent is assumed to know S but not the payoff function she faces.
At time n she chooses among her strategies according to a J × 1 probability vector p(n). Upon choosing a strategy she receives a payoff. She uses this payoff to update the vector p. If at time n she chooses strategy sj and receives a payoff of πj then she updates her probability vector as follows:

p(n + 1) = p(n) + α πj [ej − p(n)],    (3)

where α ∈ (0, 1] is the 'step size' or learning rate parameter and ej is a J × 1 unit vector with a one in the j-th position and zeroes elsewhere. For the Cross learning rule to be well-defined we require the 'effective payoffs' απj to be normalised to lie between zero and one.

VBR performed an extensive set of simulations with the Cross model. They assumed the initial mixed strategy vector p(0) to be uniformly distributed over the 101 strategies available to each player. Simulations were performed for the two payoff functions (G(2.44) and G(3.85)) and for different values of α. For α ≥ 0.1 they observed that the Cross model did not necessarily converge to the interior equilibrium. For α ≤ 0.05 they observed that the Cross model did usually converge to the interior equilibrium. This convergence, however, took a very large number of repetitions. For α = 0.05, they observed that the Cross model converged in about 750 repetitions. This was very different from the experimental results, in which subjects converged in fewer than 50 repetitions.

To answer whether introducing similarity relations into the Cross model would help in explaining the data, we performed a series of simulations with a modified version of the Cross model in which the agent shifts probability mass to all similar strategies. As in the previous Section, we suppose that similar strategies are those that lie within a certain window around the played strategy. This is achieved by replacing the unit vector ej with a vector whose components sum to one, whose components decrease in value as they get further from the played strategy, and which has zeroes elsewhere. Specifically, ej in (3) is replaced by the J × 1 vector dj which has in its k-th row the term f(sj, sk)/∑l f(sj, sl). The numerator is the similarity function defined in Section 1, and the denominator is required because probabilities need to sum to one. Formally, the Cross model with similarity is given by

p(n + 1) = p(n) + α πj [dj − p(n)],    (4)

where dj is defined above. As before, we considered different window lengths and allowed for different values of α. Our simulations revealed that the convergence property of the Cross model was destroyed altogether. That is, the Cross model with similarity did not converge to the interior equilibrium. The top row of Figure 6 shows the histograms of the median choices in periods 75 and 750 in simulations of the Cross model with α = 0.05 and no similarity (h = 0) in G(2.44), and the second row shows the same objects for the modified Cross model with Parzen similarity with h = 12. As is readily seen, supposing that players update the assessments of nearby strategies in a similar way worsens the convergence properties of the Cross dynamics. The same is true in G(3.85).20 The intuition for this result is that, because the agent updates similar but unplayed strategies, the probability of these strategies does not decline as is assumed in the Cross model. This prevents convergence of the median choice to the interior equilibrium, even when learning is slow.

Fig. 6. Histograms of the Median Choices in Periods 75 and 750 in 100 Simulations of G(2.44) Played by Cross Reinforcement Learners with and without Similarity
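A sketch of the Cross updating rules (3) and (4), including the declining step-size variant considered below, is as follows (our illustration, not the authors' code).

```python
import numpy as np

def cross_update(p, j, payoff, alpha, similarity=None):
    """One Cross (1973) update, optionally spreading probability to similar strategies.

    With similarity=None this is rule (3); otherwise ej is replaced by the
    normalised similarity vector dj, as in rule (4). Effective payoffs
    alpha * payoff are assumed to lie in [0, 1].
    """
    J = len(p)
    if similarity is None:
        d = np.zeros(J)
        d[j] = 1.0                                   # unit vector ej
    else:
        d = np.array([similarity(j, k) for k in range(J)])
        d /= d.sum()                                 # components of dj sum to one
    return p + alpha * payoff * (d - p)              # probabilities still sum to one

# A declining step size (the second modification considered in this Section)
# amounts to passing alpha / n at repetition n instead of a constant alpha.
```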
We also consider the case in which the learning rate parameter (the 'step-size') of the Cross model declines over time. Specifically, we consider the case where the step size decreases at rate 1/n. We hoped that such a modification of the Cross model might improve its convergence properties, with or without incorporating similarity among the strategies. Our simulations, reported in the bottom two rows21 of Figure 6, reveal that this is not the case. An inspection of these figures shows that the convergence properties of the model deteriorate with a declining step-size, and that this is true with or without the consideration of similarity among nearby strategies.

Our conclusion in this Section is that modifying the Cross model to allow for either similarity among strategies or a decreasing learning rate parameter does not improve its predictive power. Whereas the unmodified Cross learning rule does converge to the equilibrium selected by the subjects in the VBR experiments, even though it takes a much longer time to do so, modifying the Cross dynamic in the above mentioned ways makes it fail to converge to any equilibrium at all. Comparing the histograms in Figure 6 with those in Figure 5, and recalling that in the VBR experiment the median choice always converged to the interior equilibrium (Figure 2), reveals that the payoff assessment model with similarity is much better suited than the Cross model for modelling learning behaviour in large coordination games with very little information.

5. Discussion

This paper shows that the payoff assessment model with similarity among strategies does very well in predicting behaviour in the VBR minimal information coordination games. The fact that, with the estimated values of the two parameters, it generated distributions of choices that could not be distinguished from those obtained in the experiment in almost every period is very strong confirmation of the model. We now turn to discuss some of the limitations of the present study.

The specific contribution of this paper is to introduce a similarity relation among strategies. This similarity relation suggested that nearby strategies lead to nearby payoffs. We believe the performance of the payoff assessment model with similarity was so good because the payoff function players in the VBR experiment faced had such a relation between strategies and their payoffs, even though the players did not know this. It is not hard to think of examples in which nearby strategies give very different payoffs. In such situations we do not expect the similarity functions we proposed in this paper to perform so well in predicting the data.22

It has been conjectured by one of our referees that the model of directional learning or that of hill climbing23 would perform equally well on the VBR experiment. Directional learning requires that the agent have more information about the payoff function than was available to the subjects in the VBR experiment. Hence, directional learning is not a well defined learning procedure with the VBR data. Hill climbing can, however, be implemented, even though it also requires more information than the payoff assessment model with similarity. We implemented a hill climbing procedure in some simulations we performed (and have not reported).
Our observation was that a hill climbing procedure gets stuck in one or the other of the two equilibria and, hence, is unable to explain the behaviour of the experimental subjects. Intuitively, the VBR payoff function has two hills of equal height and, depending on where the subjects start, a hill climbing procedure results in convergence to either of the hills.24,25

It is also worth stating that an agent is unlikely to use a similarity relation between strategies if the strategy set is small and the agent has enough time to experience all possible strategies many times. This was, for instance, the situation in the data set(s) studied by Erev and Roth (1998) and further analysed in Sarin and Vahid (2001). In many real world situations, however, the set of strategies available to the decision maker is very large and the prior information she has about her environment is very incomplete. We think it is in such situations that similarity functions will find their greatest use.

Appendix: Calculation of the Optimal Weight Matrix

We assume that the payoffs of players who are playing G(2.44) and those who are playing G(3.85) are independent of each other, which is a reasonable assumption. The weighting matrix in each game combines the deviations of the sample means of payoffs and squared payoffs from their respective theoretical means implied by the model. We refer to these deviations as 'errors'. Although any positive definite choice for the weight matrix leads to consistent estimates of the parameters, the 'optimal' weight matrix, i.e. the weight matrix that leads to the asymptotically most efficient GMM estimator, is the inverse of the variance-covariance matrix of the errors. To obtain a consistent estimate of the optimal weight matrix, we need an initial consistent estimate of the parameters. We use the identity matrix as the weight matrix and the first moments of the payoffs only in the first stage estimation. That is, the first stage estimates of h and λ minimise

[y(ω1) − ȳ(ω1)(h, λ)]′ [y(ω1) − ȳ(ω1)(h, λ)] + [y(ω2) − ȳ(ω2)(h, λ)]′ [y(ω2) − ȳ(ω2)(h, λ)],

using the same notation as in Section 3 of the text. We then use these first stage estimates to obtain a consistent estimate of the variance-covariance matrix of the sample moments for each game, using the fact that the variance of the sample mean of an i.i.d. sample is equal to the variance of each observation divided by the number of observations. However, there is a small sample problem in this case. The cross section dimension in this problem is too small to allow us to estimate the weight matrices using the variation in the observed sample. Only 20 subjects played each game, so the rank of a weight matrix estimated from these observations can be at most 20. Hence, we use the variation in the simulated payoffs to estimate the variance-covariance matrix of a sample of 20 observations.

There is potentially another interesting econometric problem here. If the model predicts that every subject converges to choosing a single strategy after a certain number of repetitions, the estimated variance-covariance matrix of payoffs will be singular, even if the model allows different people to converge to choosing different strategies. This happens because the chosen strategies of each player, and hence her payoffs, will be perfectly correlated after a certain number of repetitions. In such a case, there is no extra information in observed payoffs after convergence.26 Fortunately, this did not happen in the payoffs simulated from our model with the first stage estimated parameters, and therefore we do not discuss it any further.
Nevertheless, it is an interesting econometric problem pertinent to any learning model that implies convergence, and it is worthy of future research.

A Technical Appendix is available for this paper: http://www.res.org.uk/economic/ta/tahome.asp

Footnotes

1. This idea goes at least as far back as Hume (1748) who argued: 'From causes which appear similar we expect similar effects. This is the sum of all our experimental conclusions.'
2. Such forms of similarity among strategies are likely to be present in a wide variety of circumstances. For example, firms choose among prices or quantities and are concerned with profits. Hence, both their strategy and payoff sets are ordered and it seems natural for them to expect that similar prices or quantities would lead to similar profits.
3. In the natural metric.
4. These are largely a consequence of it being linear in payoffs. See Börgers et al. (2004) for the details.
5. Which are the demands for output in their setting.
6. Which is the compensation per unit of output to ask from the (polluting) upstream factory.
7. This similarity function is inspired by the Bartlett window; see, e.g., Brockwell and Davis (1991).
8. This similarity function is inspired by the Parzen window; see, e.g., Brockwell and Davis (1991).
9. A referee pointed out to us that the Parzen similarity function embodies an idea from psychophysics, called the Weber-Fechner law, that judged similarity declines nonlinearly as you move away.
10. These experiments were designed to be 'minimal information' treatments of Van Huyck et al. (1994), and were intended to be appropriate for testing reinforcement learning models; e.g. Börgers and Sarin (2000) and Erev and Roth (1998).
11. There were (101)^4 payoff relevant states for each strategy of each player. Of course, no player knew this to be the case.
12. Although VBR obtained convergence to the interior equilibrium in their simulations of the Cross model of reinforcement learning for a sufficiently slow learning rate parameter, this convergence took a very large number of repetitions. This slow convergence is noted also in other work on reinforcement learning models; e.g., Possajennikov (1997), Roth and Erev (1995). It is also known to afflict models of evolutionary game theory, e.g. Kandori et al. (1993).
13. This estimation method is known as the method of simulated moments. See, e.g., Chapter 2 in Gourieroux and Monfort (1996).
14. Note that we match moments at each round, rather than the ensemble moment over the entire 75 rounds. This is because the payoff time series is not stationary.
15. The choice of both first and second moments was recommended to us by two referees.
16. For example, the correlation between a uniform [0, 1] random variable and its square is 0.97.
17. VBR performed the Kolmogorov-Smirnov test only for period 75. They rejected the hypothesis of equality between the observed distribution and those obtained from simulations of the Cross model. In our case, the test does not reject the hypothesis of equality of distributions when we consider only period 75. This is why we perform the stronger test.
18. The test statistics for different periods are not independent, and, therefore, the overall level of significance will be different from 5%.
19. We also calculated the best value of λ assuming that there is no similarity (h = 0) between the strategies. We think this is a useful exercise as our simulations reveal that the model without similarity can also mimic the data rather well.
We find that for the best value of λ = 0.737, the Kolmogorov-Smirnov tests rejected equality of distributions for 52 periods in G(3.85) and for 12 periods in G(2.44). These results confirm that adding similarity to the model considerably improves its explanatory power.
20. Interested readers can refer to the working paper version for figures related to G(3.85).
21. Note that the vertical scale of the bottom two rows is different from that of the top two rows in Figure 6.
22. The information an agent has about her environment may itself suggest a specific similarity function. For instance, if travelling in the city of Florence, the decision maker is advised not to consider the apartment number 37 to be close to either 36 or 38 (as she might suppose from her experience in other cities).
23. Or, even the reinforcement learning model of Barron and Erev (2001) in which one of the 'cognitive strategies' considered by the decision maker is hill climbing.
24. Busemeyer and Myung (1989) also note that hill climbing algorithms perform well when the payoff function is 'single-peaked'.
25. It may be possible to adapt the hill climbing idea appropriately to ensure convergence to the interior equilibrium, for all initial conditions.
26. The lack of information content in observations after a finite time is also the reason why the time dimension cannot be exploited for consistent estimation and asymptotic inference in these models.

References

Barron, G. and Erev, I. (2001). 'Feedback based decisions and their limited correspondence to description based decisions', mimeo, Technion.
Berry, D. and Lindgren, B. (1996). Statistics: Theory and Methods, 2nd edition, Boston: Duxbury Press.
Börgers, T., Morales, A. and Sarin, R. (2004). 'Expedient and monotone learning rules', Econometrica, vol. 72, pp. 383–405.
Börgers, T. and Sarin, R. (2000). 'Naive reinforcement learning with endogenous aspirations', International Economic Review, vol. 41, pp. 921–50.
Brockwell, P. and Davis, R. (1991). Time Series: Theory and Methods, 2nd edition, New York: Springer-Verlag.
Busemeyer, J.R. and Myung, I.J. (1989). 'Resource allocation decision making in an uncertain environment', Acta Psychologica, vol. 66, pp. 1–19.
Camerer, C. and Ho, T. (1999). 'Experience weighted attraction learning in normal form games', Econometrica, vol. 67, pp. 827–74.
Camerer, C., Hsia, D. and Ho, T. (2002). 'EWA learning in bilateral call markets', in (R. Zwick and A. Rapoport, eds.), Experimental Business Research, New York: Kluwer Academic Publishers.
Chen, Y. and Gazzale, R.S. (2002). 'Supermodularity and convergence: an experimental study of the compensation mechanism', mimeo, University of Michigan.
Chen, Y. and Khoroshilov, Y. (2003). 'Learning under limited information', Games and Economic Behavior, vol. 44, pp. 1–25.
Cross, J. (1973). 'A stochastic learning model of economic behavior', Quarterly Journal of Economics, vol. 87, pp. 239–66.
Erev, I. and Roth, A. (1998). 'Predicting how people play games: reinforcement learning in experimental games with unique, mixed strategy equilibrium', American Economic Review, vol. 88, pp. 848–81.
Gilboa, I. and Schmeidler, D. (1995). 'Case-based decision theory', Quarterly Journal of Economics, vol. 110, pp. 605–39.
Gilboa, I. and Schmeidler, D. (1997). 'Act similarity in case-based decision theory', Economic Theory, vol. 9, pp. 47–61.
Gourieroux, C. and Monfort, A. (1996). Simulation-Based Econometric Methods, Oxford: Oxford University Press.
Hahn, U. and Ramscar, M. (2001). Similarity and Categorization, Oxford: Oxford University Press.
Huck, S., Norman, H. and Oechssler, J. (1999). 'Learning in Cournot oligopoly: an experiment', Economic Journal, vol. 109, pp. C80–95.
Hume, D. (1748). An Enquiry Concerning Human Understanding, http://eserver.org/18th/hume-enquiry.html.
Kaelbling, L.P., Littman, M.L. and Moore, A.W. (1996). 'Reinforcement learning: a survey', Journal of Artificial Intelligence Research, vol. 4, pp. 237–85.
Kandori, M., Mailath, G. and Rob, R. (1993). 'Learning, mutation and long run equilibria in games', Econometrica, vol. 61, pp. 29–56.
LiCalzi, M. (1995). 'Fictitious play by cases', Games and Economic Behavior, vol. 11, pp. 64–89.
Possajennikov, A. (1997). 'An analysis of a simple reinforcing dynamic: learning to play an "egalitarian" equilibrium', mimeo, Tilburg University.
Riesbeck, C.K. and Schank, R.C. (1989). Inside Case-Based Reasoning, Cambridge: Lawrence Erlbaum Associates.
Roth, A. and Erev, I. (1995). 'Learning in extensive-form games: experimental data and simple dynamic models in the intermediate term', Games and Economic Behavior, vol. 8, pp. 164–212.
Rubinstein, A. (1988). 'Similarity and decision-making under risk (Is there a utility theory resolution to the Allais paradox?)', Journal of Economic Theory, vol. 46, pp. 145–53.
Sarin, R. and Vahid, F. (1999). 'Payoff assessment without probabilities: a simple dynamic model of choice', Games and Economic Behavior, vol. 28, pp. 294–309.
Sarin, R. and Vahid, F. (2001). 'Predicting how people play games: a simple dynamic model of choice', Games and Economic Behavior, vol. 34, pp. 104–22.
Selten, R. and Buchta, J. (1999). 'Experimental sealed bid first price auctions with directly observed bid functions', in (D.V. Budescu, I. Erev and R. Zwick, eds.), Games and Human Behavior: Essays in Honor of Amnon Rapoport, Cambridge: Lawrence Erlbaum Associates.
Van Huyck, J., Battalio, R. and Rankin, F. (1996). 'Selection dynamics and adaptive behavior without much information', mimeo, Texas A&M University.
Van Huyck, J., Cook, J. and Battalio, R. (1994). 'Selection dynamics, asymptotic stability, and adaptive behavior', Journal of Political Economy, vol. 102, pp. 975–1005.

Author notes: This paper was formerly titled, 'Payoff Assessments without Probabilities: Incorporating "Similarity" among Strategies'. We are grateful to the editor and three anonymous referees for their comments and to Ray Battalio for the data. Rajiv Sarin thanks the programme to enhance scholarly and creative activities and the Bush programme in policy research at Texas A&M University for financial support.

© Royal Economic Society 2004