Abstract

In this paper, we review recent advances in experimental auctions and provide practical advice and guidelines for researchers. We focus on issues related to randomisation to treatment and causal identification of treatment effects, as well as design issues such as the selection between different elicitation formats, multiple auction groups in a single session and house money effects. We also discuss sample size and power analysis issues in relation to recent trends in experimental research about pre-registration and pre-analysis plans. We position our discussion with respect to how the agricultural economics profession could benefit from practices adopted in the experimental economics community. We then present the pros and cons of moving auction studies from the laboratory to the field and review the recent literature on behavioural factors that have been identified as important for auction outcomes.

1 Introduction

In the more than 10 years since the first thorough treatise of auctions as a methodological tool for value elicitation in applied economics and marketing research appeared (Lusk & Shogren, 2007), the literature on experimental auctions has been accumulating at an increasing rate. For example, in the years 2016 to 2018, about 102 papers were published on average per year on ‘experimental auctions’, representing roughly a 32 per cent increase from 2013 to 2015 (own calculations based on a Web of Science search for the terms ‘experimental’ + ‘auctions’; see also Figure 1 for a time trend).

[Fig. 1. Time trend of the number of publications on the topic of ‘experimental’ + ‘auction’.]

Experimental auctions have become a popular method for valuation research because they allow economists, psychologists and marketers, among others, to determine the monetary value people place on non-market goods in order to carry out cost–benefit analysis, to determine the welfare effects of technological innovation or public policy, to forecast new product success and to understand individuals’ behaviour as citizens and consumers (Lusk & Shogren, 2007). Businesses are eager to develop an understanding of the factors affecting consumers’ willingness to pay (WTP) for their products in the hope that this may lead to better product adoption and pricing decisions. One of the big advantages of auctions, often advertised by their proponents, is that (given incentive compatibility) auctions do not, in principle, suffer from the problems of stated preference surveys because they are not hypothetical; i.e. they involve exchanging real money for real goods in an active market. Moreover, in incentive compatible experimental auctions, the price paid is separate from what the winner(s) bid, providing an incentive for bidders to truthfully reveal their preferences (Cox et al., 1982). Finally, in contrast to non-hypothetical choice experiments run with real products and money, where one needs to estimate WTP values using discrete choice models, in auctions one can directly obtain each respondent’s WTP value from the bids. However, in practice, there are a variety of factors that could potentially affect auction outcomes that deserve greater attention. While Lusk & Shogren (2007) provide a thorough discussion of various issues involving experimental auctions (e.g.
training and practice, endowment or full bidding approach, learning and affiliation etc.), there have been significant recent developments on many of these issues. For example, Lusk & Shogren (2007), when discussing the pros and cons of the endowment and the full bidding approach, argue that if there are perfect field substitutes to products offered in the full bidding approach, then the bids for each of the products will be censored at the market price of the products and premiums implied by bids might differ from differences in true values.1 Because field substitutes have no effect on bids in the endowment approach, Lusk & Shogren (2007) recommend the endowment approach when outside options exist for the auction goods. Alfnes (2009), on the other hand, challenges this view and develops a theoretical model showing that if two alternatives offered in an auction differ in only one or two attributes, have the same set of field substitutes and are difficult to resell, then the difference in optimal bids is equal to the difference in value.

Another example is the issue of eliciting valuations over multiple rounds in auctions and posting information (i.e. the price from the previous round) between rounds. While Lusk & Shogren (2007) presented a balanced discussion of the issue of bid affiliation along with some empirical results indicating that posted prices have no effect on bids, their discussion did not settle the issue and instead led to an adversarial collaboration by researchers with different views on this issue. This adversarial collaboration resulted in the paper by Corrigan et al. (2012), which presented results from induced value experiments showing that posting prices between rounds creates larger deviations of bids from induced values. In addition, Corrigan et al. (2012) found that in an auction for lotteries, exposing subjects to price feedback led to more preference reversals. As a consequence, price posting between rounds has become less common in the recent literature.

In the rest of the paper we undertake a more systematic exploration of the practicalities of running an auction. Our general aim is to discuss issues that are important to consider in any experimental design as well as to discuss recent trends in the experimental auctions literature. Our literature review cannot be all-inclusive, since the body of papers published since Lusk & Shogren’s (2007) book is voluminous (take a look at Figure 1 again). Instead, we highlight papers that we believe are important from a methodological point of view; hence, we excluded many papers that only use experimental auctions as a tool for elicitation of values.

In the next section, we begin by discussing an issue that we believe is often ignored in studies that experimentally manipulate a factor in order to establish causality. The issue is randomisation to treatment, which is a concern not just in auction studies but in experimental studies more generally. In Section 3 we discuss more specific design issues that have not been given due attention, such as elicitation formats, the practice of forming multiple auction groups in a single session, house money effects in auctions and more. In Sections 4 and 5 we discuss issues related to sample size and power, as well as the need to focus not just on p-values but also on the magnitude of the estimates. We then discuss standard ways of analysing auction data in Section 6, along with an overview of recent advances.
Then, we go over recent trends in relation to pre-registration and pre-analysis plans as a means to reduce p-hacking and increase replicability in Section 7. In Section 8, we discuss issues related to the conduct of auctions in the field vs. the laboratory. Our penultimate section (Section 9) is devoted to reviewing the literature on behavioural factors that appear to be important when considering what might affect auction outcomes. We then conclude in the final section.

2 Randomisation to treatment in auctions

Many experimental auction studies are used as value elicitation vehicles to uncover consumers’ WTP for novel goods, attributes and services. As a value elicitation mechanism, these auctions do not involve an ‘experiment’ per se in the sense that there is no treatment and control group. Rather, the aim is generally to construct a market that does not exist in the field. Controls are often then used in multivariate regressions to isolate the effect of these variables from other influences with the aim of providing a causal interpretation. For example, one can regress bids on gender and then provide a potentially causal interpretation of gender effects on bidding behaviour.

There are many other cases, however, where auction studies are conducted with the use of experimental designs that hold all other factors constant, so that the change in the outcome of interest can be more cleanly associated with changes in the manipulated factor. This sort of randomised controlled trial is typically thought of as the gold standard for causal inference. For example, in the simplest of experimental designs, the single manipulated factor would be varied at two levels, e.g. providing information and not providing information. Causal interpretation of the manipulated factor (information in this particular example) is then based on the Neyman–Rubin model of causal inference due to Neyman; his original work appeared in Polish in a doctoral thesis submitted to the University of Warsaw (for excerpts reprinted in English and some commentary see Rubin, 1990; Speed, 1990; Splawa-Neyman et al., 1990 and Rubin, 1974). Auction studies that seek to estimate effect sizes from treatment and control groups often pay too little attention to the fact that causal interpretation based on the Neyman–Rubin model rests on the validity of its assumptions.2

The Neyman–Rubin model explains causality through a framework of potential outcomes: each unit has two potential outcomes, one if the unit is treated and another one if it is untreated. A causal effect is then defined as the difference between the two potential outcomes; that is, the response of the same unit under a treatment and a control condition (the counterfactual). In the social sciences, however, we cannot observe both alternative conditions for the same unit because the unit changes irreversibly once it is exposed to a treatment. Note that, in the Neyman–Rubin model, treatment effects from a within-subjects design do not have a causal interpretation and so there is a need to invoke additional assumptions (Holland, 1986; West & Thoemmes, 2010). In order to infer causality, one needs to compare the expected outcome of units that received different treatments in order to estimate the treatment effect. The key point is that by randomly assigning units to the treatments, the difference between the experimental and control units can be considered an unbiased estimate of the causal effect that the researcher is trying to isolate (Rubin, 1974).
But why do we need to emphasise random assignment? Because only by random assignment will groups of units have the same expectation in the distribution of all covariates that are potentially relevant for the treatment outcome.3 There is, however, an important caveat to remember: two randomly assigned groups will be comparable only with large enough sample sizes, so that the sample average of individual characteristics becomes closer to the average of the population.

There is a practical implication coming out of the previous discussion, given that running experimental auctions requires resources. Given a target sample size dictated by budget concerns (although there is an increasing demand for more sophisticated ways of determining sample size; see Section 4), simpler designs (e.g. two treatments), rather than complicated designs (e.g. a $2\times 2$ design), are more likely to achieve balance of characteristics that are potentially relevant for the treatment, i.e. to achieve randomisation to treatment. The reason randomisation to treatment is important is that without it, we cannot be confident about the causal interpretation of the effect of the manipulated factor. For example, Briz et al. (2017) document a failure of randomisation to treatment in experimental auctions by utilising information from the practice auction rounds. While practice auction rounds are normally not reported or analysed in auction studies, Briz et al. (2017) find a statistically significant treatment effect both in the practice as well as in the real auction rounds. However, the treatment in their study was applied only after the practice auction rounds and hence the effect they find using the real auction data cannot be interpreted as causal but is probably a result of some imbalance between the groups in some characteristics.

While it is quite a popular practice to use statistical tests to detect imbalance between groups in one or more characteristics, there is increasing discussion about the appropriateness of such tests. This is because any statistical test would test the null $H_0: \mu_A=\mu_B$, where $\mu_A$ and $\mu_B$ are the population means of two treatment groups. However, the researcher is interested in evaluating balance in the sample, not in the population where the samples originate. Thus, the issue of balance does not involve inference to populations (Ho et al., 2007; Imai et al., 2008). As far as the economics literature is concerned, the pitfalls of using balance tests are discussed in Deaton & Cartwright (2016) (see also Mutz et al., 2019, for a more thorough treatise).4 Deaton & Cartwright (2016) advise that instead of reporting balance tests, researchers should report the standardised difference in means (Imbens & Wooldridge, 2009; Imbens & Rubin, 2016), calculated as $|\bar{x}_1-\bar{x}_2|/\sqrt{(s_1^2+s_2^2)/2}$ for continuous variables and as $|p_1-p_2|/\sqrt{(p_1(1-p_1)+p_2(1-p_2))/2}$ for dichotomous variables, with $\bar{x}_j$, $p_j$ and $s_j^2$ ($j=1,2$) denoting the group means, prevalences and variances, respectively (Austin, 2009). The standardised difference is a scale-free measure and Cochran & Rubin’s (1973) rule of thumb establishes a threshold of 0.25, below which the effect size of the difference is expected to be small.

Proper randomisation to treatment does not always ensure complete balance. However, imbalance in a baseline variable is only potentially important if that variable is related to the outcome variable (Altman, 1985).
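To make the balance check concrete, the following is a minimal Python sketch of the two standardised-difference formulas above; the function names and the simulated data are ours and purely illustrative.

```python
import numpy as np

def std_diff_continuous(x1, x2):
    """Standardised difference in means for a continuous covariate."""
    s1, s2 = np.var(x1, ddof=1), np.var(x2, ddof=1)
    return abs(np.mean(x1) - np.mean(x2)) / np.sqrt((s1 + s2) / 2)

def std_diff_binary(p1, p2):
    """Standardised difference for a dichotomous covariate, from group prevalences."""
    return abs(p1 - p2) / np.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)

# Illustration with simulated covariates for two randomly assigned groups
rng = np.random.default_rng(1)
age_control, age_treated = rng.normal(35, 10, 60), rng.normal(36, 10, 60)
print(std_diff_continuous(age_control, age_treated))  # compare against the 0.25 rule of thumb
print(std_diff_binary(0.45, 0.55))                    # e.g. share of female subjects per group
```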
Although baseline balance is not required to make valid inferences, the general advice is that even with randomisation to treatment, observed covariates should be taken into account in the analysis (Senn, 1994, 2013).5

In our view, randomisation to treatment in auction studies is complicated by the fact that auctions generally require a group of subjects to gather in a single place at a specified time to perform an auction session. The literature has taken various approaches to form groups. In some studies, a whole session comprises a group over which an auction is performed (e.g. Lee & Fox, 2015), while in other studies, subjects in a given session are split into multiple auction groups (e.g. Drichoutis et al., 2017a). In the latter case, it is possible to apply experimental treatments on a between-subjects basis within the same session. If one considers time of the day and day of the week as additional confounds, then randomisation to treatment will be harder to achieve given a certain sample size, especially when one adopts the one-group-per-session approach rather than the multiple-groups-per-session approach. Our point is even more relevant if there is a lot of session noise, as for example in field experiments where oral instructions, different locations, time-of-day effects etc. are more likely to be an issue.

Table 1. Number of Google Scholar search results citing different elicitation mechanisms

Mechanism                          N of search results
Second- (or 2nd) price auction     13,843
Third- (or 3rd) price auction      159
Fourth- (or 4th) price auction     88
Fifth- (or 5th) price auction      93
(Random) nth-price auction         517
BDM mechanism or BDM auction       1,710

Note: the table shows Google Scholar cumulative search results for the mechanism name as shown in the first column of the table and possible variants of the name. Results were retrieved on June 24 2019 and were not checked as to whether the respective study actually employed the mechanism or was just citing another study that employed the mechanism. The absolute numbers are, therefore, indicative and should only be interpreted with respect to the resulting ranking.
Randomisation to treatment can also be affected even by very subtle factors, so one needs to be very careful and cognisant of confounds. As an example, consider a case where subjects are seated in the laboratory in the order in which they arrive. If randomisation is performed according to where subjects are seated (which could plausibly happen if one uses zTree, for example, and connects computers in sequence to the server), then additional care has to be taken so that not all late arrivals (students sometimes also arrive in groups) are allocated to the same auction group and potentially to the same treatment.

3 Experimental design issues

In this section, we discuss a few issues that we believe can help in establishing good practice in the field.

3.1 Elicitation formats

One of the first design choices a researcher has to make is to decide on the mechanism that will be employed in the study. To get a rough sense of how popular some of these choices are, we used Google Scholar’s search engine to tabulate the number of hits for different auction mechanisms. Table 1 shows that the second-price auction (SPA) is clearly the most popular mechanism in researchers’ toolkit, followed by the Becker-DeGroot-Marschak (BDM) mechanism. While the BDM mechanism is not an auction per se, it is often classified as such because it is seen as an alternative when one cannot easily put together a group of people (e.g. in field settings such as in supermarkets).

So why do some researchers choose other nth-price auctions (NPAs) or the random NPA over the SPA? While we cannot definitively know the answer to this question in every single case, we believe that this choice comes as a response to a popular paper (Shogren et al., 2001) that showed that SPAs fall short in revealing preferences for subjects that are off-margin of the market clearing price (i.e. their value is not close to the second highest price). In the random NPA, even off-margin bidders are not de-motivated to bid their true value because the randomly determined market clearing price may well land close to their bid. The disadvantage of an NPA is that it can logistically have a higher cost, since the number of units sold in each auction increases proportionally with n. Moreover, in the case of a random NPA, one cannot predict how many units of the good will be needed or sold in the auction. Therefore, in cases where the good is hard or costly to produce (remember that in most cases auctions are used to elicit values for products that are not yet in the market), it is likely that researchers will shift their preference to other mechanisms.

The BDM mechanism is popular because of its ability to elicit valuations on an individual basis, i.e. one does not need a group of people. Therefore, it is normally favoured outside of the laboratory, where it is harder to recruit people to participate in an experiment at the same time. This is exactly the reason why the BDM mechanism is the favoured mechanism in neuroimaging studies (see for example Linder et al., 2010; Kang & Camerer, 2013; Lehner et al., 2017; Veling et al., 2017; Tyson-Carr et al., 2018), where interaction between individuals while undertaking a brain scan is almost non-existent. Many neuroimaging studies cite Plassmann et al. (2007) as the earliest demonstration of the use of the BDM task while subjects undergo a brain scan, although it was clearly preceded by Grether et al. (2007) and Rowe (2001).6
Although the BDM mechanism is the second most popular elicitation mechanism in the literature (see Table 1), there are a number of issues that researchers need to know and consider when using this mechanism. To begin with, Karni & Safra (1987) showed that the BDM mechanism is not incentive compatible in valuing lotteries, even for rational agents, under non-expected utility preferences. Even though this should not be a concern for experimental auction studies that seek to elicit valuations of products, Horowitz (2006) pointed out that the BDM mechanism may not be incentive compatible even when the objects involve no uncertainty, as in the case of regular products. Although Horowitz (2006) points out that this non-incentive compatibility also holds for the Vickrey auction and general NPAs, the literature has more often taken aim at the BDM, resulting in an accumulating body of research dealing with the pitfalls of using the BDM. Banerji & Gupta (2014) provided theoretical and experimental results that confirm the role of expectations in the BDM mechanism. Specifically, they varied the support (i.e. the range) of the randomly drawn bid for a chocolate and found a significant difference in valuations, a result that is in accordance with expectation-based reference points. Other relevant studies include Mazar et al. (2013), who tested the sensitivity of valuations to the underlying distribution in the BDM using travel mugs and Amazon vouchers; Urbancic (2011), using a within-subjects design and a gift certificate redeemable for cookies; and Rosato & Tymula (2016), who used products with higher market values such as a backpack, an iPod Shuffle and a pair of noise-cancelling headphones. Cason & Plott (2014) provide evidence that the BDM mechanism is a problematic measurement tool because bids reflect mistakes rather than true preferences due to a failure of game form recognition; i.e. subjects behave as if they bid in a first-price auction. More recently, Vassilopoulos et al. (2018) argued that previous research findings that cast doubt on the incentive compatibility of the BDM mechanism were made on valid grounds. Specifically, they found that bids derived from the BDM mechanism are indeed dependent on the underlying distributions of the random competing bid (due to the expectations they generate) and on the anchoring of bids to the chosen price support.

However, if the BDM mechanism is biased in all the ways described above, what is the alternative? Given that the main reason for using the BDM is that it avoids having to recruit many people to be at the same place at the same time to elicit their valuation, the next best solution could be to employ an SPA with just two subjects forming an auction group. Although this could be considered slightly more complicated than the BDM, it would still be possible to perform this two-person auction in the field. For example, one could have two interviewers who interview subjects almost simultaneously on two sides of a survey location. Instructions could explain to subjects that they are bidding against an unknown bidder on the other side of the location. Bids can then be easily compared to each other. This way, an SPA would be performed without having to resort to the biases associated with the BDM mechanism. There is one potential caveat to our last suggestion.
Subjects with altruistic incentives are likely to submit a zero bid in the hope that their bid would be chosen as the binding price and everyone else participating in the auction would get to buy the item for nothing. If such altruistic motives are prevalent, then small auction groups would enhance these motives. Whether this is a significant problem or not (i.e. how prevalent such behaviour could be in reality) is a topic worthy of further investigation. More generally, people may alter their bidding behaviour if they are able to see the physical characteristics of the other bidder or if they form expectations about other bidders’ behaviour (in the same way that the BDM is influenced by anchors). Furthermore, accumulated knowledge from past experiments dealing with social dilemmas and oligopoly experiments shows a negative relationship between group size and cooperation/collusion. For example, Marwell & Schmitt (1972) and Bonacich et al. (1976) study $n$-person iterated prisoners’ dilemmas where they vary $n$ and find that cooperation rates are lower in larger groups. Huck et al. (2004) investigate how the competitiveness of Cournot markets varies with the number of firms in an industry. In an experiment with two-firm markets (subjects are playing the role of firms) there is evidence for collusion, while markets with four or more firms are never collusive. Nosenzo et al. (2015) find that the negative relationship between group size and cooperation can also be observed in public goods experiments. In the context of auctions, Cox et al. (1982) conduct first-price auctions and SPAs for groups of various sizes ($N=3, 4, 5, 6$ and 9) and find that their results support the assumption of non-cooperative behaviour underlying the Nash and dominant strategy models of bidding, respectively, only for groups larger than three subjects. Taken together, given that altruistic motives and collusive/cooperative outcomes might confound preference elicitation, the results from the cited studies indicate that there could be an optimum size of auction groups, which is not too few and not too many. More research that would unravel the role of these confounds is warranted.

3.2 Number of bidders in a group

An additional design choice that is related to our discussion in the previous subsection is the number of subjects in an auction group. Many studies in the literature have formed auction groups based on the number of subjects that showed up in a given session (e.g. Lee & Fox, 2015). If the number of subjects is not kept constant, e.g. by turning away extra subjects, this will result in auction groups having different numbers of subjects. Given that aversion to turning away subjects should generally be correlated with the difficulty of recruiting subjects, student subjects should be the easiest group of subjects to turn away. Turning away subjects is a very common practice in the experimental economics literature and this is exactly why an experimenter needs to establish explicit show-up fees. However, recent theoretical (Banerji & Gupta, 2014) and empirical studies (Rosato & Tymula, 2016) have shown that if subjects have reference-dependent preferences, then the equilibrium bid is lower when the number of bidders is larger. Hence, one could eliminate a possible confound by keeping the number of bidders constant across auction groups.
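As an illustration of this last point, here is a minimal Python sketch that shuffles a session’s participants into fixed-size auction groups and rotates treatments across groups; all names are hypothetical and the turning-away rule is just one possible convention.

```python
import random

def assign_groups(subject_ids, group_size, treatments, seed=None):
    """Shuffle subjects so that arrival (and seating) order does not drive
    assignment, then fill fixed-size auction groups and rotate treatments
    across groups; subjects beyond the last full group are turned away."""
    rng = random.Random(seed)
    ids = list(subject_ids)
    rng.shuffle(ids)
    n_groups = len(ids) // group_size
    groups = [ids[i * group_size:(i + 1) * group_size] for i in range(n_groups)]
    return [(group, treatments[i % len(treatments)]) for i, group in enumerate(groups)]

# Example: 17 arrivals, groups of 4 -> four groups, one subject turned away
for group, treatment in assign_groups(range(1, 18), 4, ["control", "information"], seed=42):
    print(group, treatment)
```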
Given our discussion in the previous subsection about using a small number of subjects in each auction group as an alternative to the BDM (taking also into consideration the caveats of having too-small groups), one could also design a study using multiple auction groups in a given session (ideally, subjects would be randomly matched into groups). This practice could be beneficial on two more fronts. First, by having multiple groups in a given session (and given that one can probably perform only one session at a time), perfect collinearity between session and auction group is avoided; a given session can confound time-of-day and day-of-the-week effects. This practice may be particularly beneficial in field settings where there might be more session noise. Furthermore, the general experimental economics literature often treats a group of subjects as one independent observation (e.g. Abbink & Hennig-Schmidt, 2006; Keser et al., 1998). Therefore, doing more auction groups in an experiment maximises the number of independent observations. Another advantage of multiple auction groups is that it can reduce the risk of disruption of the experiment in case one of the participants decides to quit in the middle of the experiment; in this case, it would be possible to go forth with the auction, discarding only the auction group affected by the participant’s defection. All in all, there is no harm in doing auction groups of smaller sizes and keeping the group size constant across sessions. In fact, there could be significant practical benefits. The exact number of bidders in a group should be decided after carefully examining the setting of the experiment, also taking into account our discussion in Section 3.1 about the trade-off between too-small and too-large groups.

3.3 House money and experimenter demand effects

Experimenters are sometimes rightfully concerned that when subjects receive an endowment of money, they might feel obliged to freely spend some of it in the experiment since they might consider this not their own money, or they might feel obliged to reciprocate to the experimenter. This effect is called the ‘house money’ effect, or sometimes the ‘windfall money’ effect (e.g. Jacquemet et al., 2009; Corgnet et al., 2014). Thus, observed bids may not reflect subjects’ true valuations for the product, but rather their need to reciprocate to the experimenter or their moral obligation to affirm that the product the experimenter is offering is of good value (i.e. an experimenter demand effect) (Zizzo, 2010).7

A remedy for the house money effect could be to let subjects feel that they earned part of their endowment. Such tasks, often called real effort tasks (e.g. Abeler et al., 2011), have been used extensively in the experimental economics literature and have already found their way into auction studies (Drichoutis et al., 2017a; Kechagia & Drichoutis, 2017). These real effort tasks do not normally require any prior knowledge, offer little learning possibility and are simple enough that everybody can always complete the task successfully. As an example, in some of these tasks, subjects have to count and report the number of zeros shown in a matrix composed of zeros and ones. The difficulty of the task can be varied by varying the dimensions of the matrix. A $4\times 4$ matrix for subjects from the general population and a $5\times 5$ matrix for students can be employed, albeit this decision is rather ad hoc.
This task can be repeated (the elements of the matrix change in each repetition but should be kept the same for all subjects at a given repetition) and subjects can earn a fixed payment of, e.g. €0.50, every time they correctly solve the task within a given amount of time, e.g. 30 seconds. Since the task is purposefully easy, evidence from Drichoutis et al. (2017a) and Kechagia & Drichoutis (2017) shows that the vast majority of subjects solve this task correctly almost all of the time. It is crucial to make this task easy enough so that subjects start off in the auction stage with approximately equal endowments, given that unequal endowments can confound bidding behaviour. There are other real effort tasks that one could use, but we are not aware of any rigorous assessment of the ability of different tasks to mitigate house money effects in the context of experimental auctions. A sketch of the counting task appears at the end of this subsection.

An alternative approach is to let subjects bid with their own money. For example, one could provide gift vouchers as the participation fee but then make clear to subjects that they will have to pay for anything they buy at the auction. Davis et al. (2010) had a group of subjects physically receive a payment at the beginning of the session (to be considered their own money) while other subjects received it at the very end of the session with the rest of their earnings. Subjects that received money at the beginning of the session purchased information more frequently, which is consistent with increasing risk aversion. Rosenboim & Shavit (2012) pre-paid one group of students 2 weeks before the experiment and found that this group put greater effort into reducing their possible losses and that they also bid lower in an SPA. Zhang et al. (2019) used a delayed payment mechanism, where subjects on the day of the experiment had to pay with their own money, while they received their participation fees 2 weeks later. They found that the delayed payment mechanism reduced overbidding behaviour, especially for subjects with liquidity constraints, i.e. subjects that did not bring enough money to the session. Taken together, the results from these studies seem to suggest that on-the-spot payment, i.e. right after the end of a session, induces more risk-loving behaviour (consistent with a house money effect) that results in overbidding. Therefore, a pre-paid mechanism (either 2 weeks before or in the form of a gift voucher that cannot be cashed) may counteract the house money effect and the resulting bias.

Overall, we believe that there remains a need for more research to determine the effect of different payment mechanisms in experimental auction settings. A comprehensive study that compares bidding behaviour across various payment mechanisms would be of interest to the literature. In any case, this topic is strongly affected by regulations and ethical practices that are considered acceptable in the scientific community and by the non-trivial consideration that a subject should perceive that the reward obtained is worth the effort of participating in the study. This is especially challenging in field studies, since obtaining collaboration from operators in the field (for instance, a supermarket chain or a food specialty store hosting the data collection phase) is often conditional on the guarantee (or the expectation) that none of the participants will complain afterwards.
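Returning to the counting-zeros task described earlier in this subsection, here is a minimal Python sketch of one repetition; the seed convention (one common seed per repetition so that all subjects face the same matrix) and the function name are ours.

```python
import random

def make_counting_task(dim, seed):
    """Generate a dim x dim matrix of zeros and ones; using the same seed for
    all subjects keeps the matrix identical within a given repetition."""
    rng = random.Random(seed)
    matrix = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(dim)]
    true_count = sum(row.count(0) for row in matrix)
    return matrix, true_count

# One repetition: a 4 x 4 matrix, paying e.g. EUR 0.50 for a correct count within 30 seconds
matrix, answer = make_counting_task(dim=4, seed=7)
for row in matrix:
    print(row)
print("number of zeros:", answer)
```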
3.4 Number of repetitions and number of auctioned products

Generally, experimenters would prefer doing multiple rounds of an auction given previous evidence that it improves outcomes (Corrigan et al., 2012). However, the exact number of rounds to use in an auction is often a matter of trade-off between getting more observations and subjects spending more time in a session. For computerised experiments, this could be a trivial problem as the automated procedure allows the auction to roll faster than a paper-and-pencil auction experiment (although we do not see much of the latter anymore). The number of rounds could also be dictated by sample size calculations (see Equation (1) in Section 4), but normally a choice between, for example, an eight-round auction and a nine-round auction will not make a big difference in the required sample size.

As for the number of products that one can simultaneously auction, there is normally a trade-off between increasing complexity and eliciting information for more products as the number of auctioned products is increased. One also has to be careful because as the number of different products being auctioned goes up, the ability of subjects to differentiate between the different versions of the products might decrease and confusion could arise. In addition, one has to be cautious about order effects when eliciting valuations for multiple products in multiple valuation tasks (Belton & Sugden, 2018).8 For example, Demont et al. (2012) find that when consumers were asked their WTP for rice obtained through a new parboiler and locally developed parboilers, their WTP was significantly influenced by whether the alternative types were presented in an increasing quality gradient or the presentation order placed the superior (inferior) rice type between the two inferior (superior) rice types.

Given that experimental auctions that use multiple products and multiple rounds generally employ the practice of randomly selecting one product and one round as binding, the consequence of having many products and many rounds is that the probability of any given round or product becoming binding is reduced (with, say, three products and eight rounds, each product–round combination is binding with probability 1/24). This could lead to subjects treating any particular round and product as a low-probability event, and so the expected cost of misbehaving in any round or for any product can become relatively small (the cost of misbehaving was the subject of a heated discussion for first-price auctions in Harrison (1989) and Harrison (1992); see also Lusk et al. (2007)).

3.5 Experimental instructions

We chose to place our last point, about experimental instructions, in this section because we believe that instructions are an integral part of a design. For experimental economics in general, instructions are very important because they help in explaining all aspects of the experiment to potentially unfamiliar subjects. Moreover, instructions can facilitate replicability of experiments as well as help identify small details that might explain some of the results obtained under a particular design. In order to be able to evaluate a specific experimental auction design, one needs to take a look at the instructions that were provided to the subjects. Orally transmitted instructions without written transcripts would make it impossible for anyone to accurately evaluate or replicate a study, not to mention the possibility that the experimenter could introduce unknown confounds between sessions if there are improvisations or deviations from the script.
Therefore, when written instructions are in place, it is also important for the experimenter to strictly follow the script. One way to minimise or eliminate these confounds is by providing all instructions in electronic format (given that the experiment is computerised) and by creating interactive screens where subjects can familiarise themselves with the auction environment and answer practice questions. One should move on with the experimental auction only when all subjects have really understood the whole instructional set (unless an explicit purpose of the experiment is to study confusion or attention). That said, one should also be cognisant of the computer literacy of the subjects that participate in a given session. For example, while students are probably very good at taking instructions in electronic form and at interacting with a computer, this may not be true for subjects from the general population. While it is also important to encourage subjects to ask questions during a session, it is normally preferable that this be done in private so that the experimenter can first filter the question and the answer she wants to provide to the group.

It goes without saying that experimental instructions should always be evaluated along with other methodological and statistical analysis standards. Editorial policies could enforce the provision of instructions by requiring them at the submission stage. From a reviewer’s perspective, one could push for this practice to be uniformly applied in the field by refusing to review submissions without the experimental instructions. From the author’s perspective, the fear of outright rejection could be enough to ensure that instructions are systematically and properly administered in experiments and then submitted to journals for evaluation.

3.6 Experiments in developing countries and the role of culture

While many experimental auction studies have been conducted in developed countries, there are many studies that conduct auctions in developing countries, often because the study is trying to answer an interesting public health policy or development question, e.g. consumer acceptance of biofortified food (Birol et al., 2015). Conducting an experiment in a developing country poses some challenges by itself, which should be carefully taken into account in the design stage of an auction. For example, while haggling is very common in certain cultures, it is unclear how cultures accustomed to haggling might bring their field knowledge into experimental auction settings.9 Because subjects in an auction are asked what their bid is, similar to a first mover in a price negotiation, this could produce a natural reflex of ‘starting low’ price bargaining. On the other hand, in cases where haggling is common, subjects may be more familiar with changing prices and guessing the valuation of the seller. Therefore, culture might influence how people perceive strategies in auctions. Ehmke et al. (2008) showed wide variation in differences between real and hypothetical bids in experimental auctions across different cultures and they speculate that these differences might be related to the prevalence of bargaining in local markets. Nonetheless, the theory of experimental auctions suggests that the weakly dominant strategy is to submit bids equal to true values regardless of culture or where the auction is conducted. Hence, given that the auction mechanism is truly incentive compatible, it should override all other motivations for strategic bidding.
This is only true, however, if people really understand the mechanism and its implications. Based on the above, it is quite plausible to hypothesise that understanding comes easier to subjects in a developing country than in a developed country, because in non-haggling cultures consumers are not trained to think about a reservation price. Moreover, because consumers in developed countries often come across non-negotiable fixed prices, it would perhaps be much easier for these consumers to respond to a choice experiment rather than an auction. To the extent culture influences perceptions of strategy or even how governments might use the results for policy making, the key, as argued by Smith (1982), is dominance: that the incentives in the experiment outweigh nonpecuniary incentives outside the experiment. Using auctions in developing countries where bargaining is more common in local markets has been commonplace in the literature (e.g. de Groote et al., 2011; Demont et al., 2013, 2012) and in some ways, auctions are more familiar (and thus might exhibit greater ecological validity) in such cultures. All of that being said, there is a dearth of evidence on the subject, and this discussion provides some testable implications for future research studies.

There are few studies systematically addressing the specific challenges of conducting auction studies in developing countries (Ehmke & Shogren, 2010). A recent paper by Durand-Morat et al. (2015) reviews the experiences and challenges that researchers may face when conducting contingent valuation (CV) studies in developing countries. We believe that most of the issues discussed in Durand-Morat et al. (2015) within the context of CV studies would also be highly relevant for auction studies. In brief, Durand-Morat et al. (2015) discuss the following: (i) the need for local personnel in the country where the study takes place; the quality of the personnel is even more important when the study involves elicitation of preferences from producers rather than consumers; (ii) generating probabilistic samples is difficult (and sometimes impossible); (iii) paying subjects is, in some cases, expected by the subjects themselves, but there are exceptions where, for example, some respondents were surprised or even suspicious of the monetary compensation (specifically mentioned for countries such as Bangladesh, Colombia, Ghana, Honduras and Tanzania)10; (iv) face-to-face interviews are easier, but surveying farmers (instead of consumers) requires establishing and gaining their trust; (v) the literacy rate is a challenge that is also related to how well subjects can understand the instructions and/or what exactly it is that is being evaluated and (vi) security and safety issues of the researchers can be a challenge or obstacle in some countries, especially when money is involved.

4 Sample size and statistical testing issues

Scientific hypothesis testing relies on methods of statistical inference to empirically establish that an effect is not due to chance alone. This has been the gold standard of science ever since Ronald A. Fisher’s era.
A ‘test of significance’ (Fisher, 1925) of a treatment effect establishes that the effect is statistically significant when the test statistic allows us to reject the null hypothesis of no difference between two conditions based on a pre-specified low-probability threshold.11 All statistical hypothesis tests have a probability of making one of two errors: an incorrect rejection of a true null hypothesis (Type I error), representing a false positive, or a failure to reject a false null hypothesis (Type II error), representing a false negative.12

False positives have received a great deal of attention; academic journals are less likely to publish null results and p-value hacking makes false positives vastly more likely (Simmons et al., 2011).13 False positives may have serious implications by leading the research community into false avenues and wasting resources. The problem of false positives is further exacerbated by the fact that researchers may not only file-drawer entire studies but also file-drawer subsets of analyses that produce non-significant results (Simonsohn et al., 2014). In addition, researchers rarely take the extra step of replicating their original study (but see Kessler & Meier, 2014, for an exception).14 Given the well-known general lack of reproducibility of scientific studies in economics (Camerer et al., 2016) and psychology (Open Science Collaboration, 2015), respectively, there is a growing concern over the credibility of claims of new discoveries based on ‘statistically significant’ findings.15

There have been several calls for action and proposed solutions for the seemingly high false-positive rate. Nuzzo (2014) reports that p-values are widely misinterpreted as showing the exact chance of the result being a false alarm, when such statements are really only qualified if the odds that a real effect is there are known in the first place. General rule-of-thumb conversions cited in Nuzzo (2014) indicate that ‘… a p-value of 0.01 corresponds to a false-alarm probability of at least 11%, depending on the underlying probability that there is a true effect; a p-value of 0.05 raises that chance to at least 29%’. Exact replications are therefore likely to uncover false positives, although the incentives for individual researchers to self-replicate one of their experiments are still currently weak. Other researchers have taken aim at the p-value itself, calling for a change in the default p-value threshold for statistical significance from 0.05 to 0.005 for claims of new discoveries (Benjamin et al., 2018). Another group of researchers responded that instead of adopting another arbitrary threshold, a better solution would be to make academics justify their use of specific p-values (Lakens et al., 2018), while McShane et al. (2019) propose that the p-value should not be used as a threshold and should be treated continuously. Williams (2019) shows that lowering the significance threshold can even increase the rate of false positives by creating a negative selection effect. Moreover, the New England Journal of Medicine has issued new guidelines for statistical reporting ‘… including a requirement to replace P values with estimates of effects or association and 95% confidence intervals when neither the protocol nor the statistical analysis plan has specified methods used to adjust for multiplicity’ (Harrington et al., 2019). The criticism of p-values is not new.
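The rule-of-thumb conversions quoted above can be reproduced from the bound on the Bayes factor in favour of the null derived by Sellke et al. (2001); the Python sketch below is ours and assumes 1:1 prior odds that a real effect exists, which is why the quoted figures are ‘at least’ values.

```python
import math

def min_false_alarm_prob(p, prior_odds_null=1.0):
    """Lower bound on the probability that a result significant at p is a
    false alarm, using the -e * p * ln(p) bound on the Bayes factor for the
    null (Sellke et al., 2001); valid for p < 1/e."""
    bf_null = -math.e * p * math.log(p)          # most favourable case for a real effect
    posterior_odds_null = prior_odds_null * bf_null
    return posterior_odds_null / (1 + posterior_odds_null)

for p in (0.05, 0.01, 0.005):
    print(f"p = {p}: false-alarm probability of at least {min_false_alarm_prob(p):.0%}")
# p = 0.05 -> ~29%; p = 0.01 -> ~11%, matching the conversions cited in Nuzzo (2014)
```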
The epidemiologist Kenneth Rothman founded the journal Epidemiology in 1990 and, as its chief editor for a decade, he enforced the reporting of confidence intervals (CIs) instead of p-values. While his policy was successfully enforced, compliance was superficial, as very few authors referred to CIs when discussing results (Fidler et al., 2004). More recently, journals such as Basic and Applied Social Psychology and Political Analysis have moved one step further by banning p-values altogether (Trafimow & Marks, 2015; Gill, 2018), while at about the same time, the American Statistical Association issued a statement on the misuse of p-values and articulated principles of widespread consensus in the statistical community in order to improve the conduct and interpretation of quantitative science (Wasserstein & Lazar, 2016).16

Simonsohn et al. (2014) introduced p-curve analysis as a way to distinguish between selective reporting of non-existent effects and the truth.17 This approach overcomes the limitations of previous approaches such as the ‘funnel plots’ method (Egger et al., 1997; Duval & Tweedie, 2000), the ‘fail safe’ method (Rosenthal, 1979; Orwin, 1983) and the ‘excessive-significance’ test (Ioannidis & Trikalinos, 2007).18 The p-curve tool requires a set of studies to be included in the analysis and, as such, single papers should contain multiple studies and at least one direct replication of one of the studies (Simonsohn et al., 2014). Given that self-replication is rare in the literature, it is quite hard to detect a false positive from single-paper studies.

Type II errors, on the other hand, have not been given similar attention. Zhang & Ortmann (2013) reviewed 95 papers published in Experimental Economics between 2010 and 2012 and found that only one article mentions statistical power and sample size issues. Replication studies (e.g. Maniadis et al., 2014) are particularly prone to false negatives because they are typically underpowered (Simonsohn et al., 2013).19 False negatives could also have significant real-world impacts that could lead to substantial foregone benefits. For example, when Bonanno et al. (2014) simulate scenarios where rejected truthful health claims are removed because of the enactment of a regulation, they find that both consumers and producers incur welfare losses, but that consumers are penalised more than producers.

As far as experimental auction research is concerned, a priori sample size or power calculations are almost non-existent in agricultural economics journals. However, journals and editors outside the subdiscipline are now embracing the idea of including power calculations; we suspect that this will soon be discussed and considered in agricultural economics journals as well, and it is increasingly being expected in grant applications. While there are many statistical programs and packages that allow power analysis and/or calculation of optimal sample sizes (see for example Bellemare et al., 2016, Table 2), researchers are better off getting their hands dirty (see Ellis (2010) for an accessible introduction to power analysis). In auctions, sample size calculations are facilitated by the continuous nature of the observed variable (the bid) but could get slightly more complicated if the repeated nature of the auction setting needs to be taken into account.
The reason we believe that sample size calculations should be an integral part of any experimental auction study is that researchers may end up with a result that is not statistically significant simply because the sample size was not large enough to detect a difference of practical significance. In addition, resources might be wasted by using a sample size that is much larger than is needed to detect a relevant difference.

In order to calculate an optimal sample size given a continuous outcome of interest (the bid), a dichotomous between-subjects treatment and a multiple-round design (i.e. subjects are asked to submit a bid in multiple rounds for the same product), we first need to assume appropriate values for the Type I and Type II errors. Following the standards in the literature, we can assume $\alpha=0.05$ (Type I error) and $\beta=0.20$ (Type II error). To compare the means from the two treatments, $\mu_0$ and $\mu_1$, with common variance $\sigma^2$, in order to achieve a power of at least $1-\beta$, given a number of repeated measurements $M$ (i.e. auction rounds) as well as a value for the correlation $\rho$ between observations, the per group/treatment minimum sample size is then given by (Kupper & Hafner, 1989; Diggle et al., 2002: 30; Liu & Wu, 2005):

$$n=\frac{2(z_{1-\alpha/2}+z_{1-\beta})^2\,\big(1+(M-1)\rho\big)}{M\left(\frac{\mu_0-\mu_1}{\sigma}\right)^2} \qquad (1)$$

where $z$ is the value of the $z$-statistic. Note that the formula simplifies to Equation 6 in List et al. (2011) in the case of single-round auctions by setting $M=1$.

To calculate a minimum sample size, one needs to insert values for $\sigma$, the minimum meaningful difference $\mu_0-\mu_1$ and $\rho$. The values of $\sigma$ and $\rho$ must not be calculated using the collected data (this is often called retrospective, post hoc or observed power). This is because reporting posterior power (or posterior sample size) is nothing more than reporting the p-value in a different way, since power is a 1:1 function of the p-value (Gelman, 2019; Hoenig & Heisey, 2001). Drichoutis et al. (2015) and Briz et al. (2017) provide some examples of how one can use prior literature to find the relevant parameters for the sample size calculation. In this case, power is called prospective or a priori power (and similarly for sample size). Sample size calculations should be performed before data collection so that there is a clear stopping rule that defines the data-collection plan.

The minimum meaningful difference or expected effect size $d=\mu_0-\mu_1$ can be derived from theoretical predictions or can be defined as the smallest effect size of interest. It can also be the product of a systematic literature review or it could be informed by auxiliary data, meta-analysis etc. (Gelman & Carlin, 2014). Gelman & Carlin (2014) also suggested using a broad range of effect sizes given that past estimates of effect sizes tend to be overestimates. This is because when the true effect is medium-sized, only small studies that (by chance) overestimate the magnitude of the effect will pass the threshold for discovery (Button et al., 2013).20 Given a range of values for $d$, $\sigma$ and $\rho$, one can find an interval of observations per treatment that are needed to detect a given effect size. This is a trivial calculation and a Stata code example is provided in the Supplementary Data (at ERAE online).
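For readers who want a quick check outside Stata, here is a minimal Python sketch of Equation (1); the function name and the illustrative parameter values are ours.

```python
from math import ceil
from statistics import NormalDist

def min_n_per_group(d, sigma, rho, M, alpha=0.05, beta=0.20):
    """Per-group minimum sample size from Equation (1): two-sided test of a
    between-subjects effect d = mu0 - mu1, with M auction rounds and
    correlation rho between a subject's bids across rounds."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(1 - beta)
    n = 2 * z**2 * (1 + (M - 1) * rho) / (M * (d / sigma) ** 2)
    return ceil(n)

# Detecting a 0.50-euro bid difference when sigma = 1 euro under various designs
for M, rho in [(1, 0.0), (5, 0.3), (5, 0.7), (10, 0.3)]:
    print(f"M = {M}, rho = {rho}: n = {min_n_per_group(0.5, 1.0, rho, M)} per treatment")
# M = 1 gives n = 63, the familiar requirement for a medium effect size at 80% power
```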
One can even experiment with the number of rounds at the experimental design stage and find an optimum number of rounds, given that more rounds are inversely related to the number of observations that are needed per treatment group.21

As a side note, Equation (1) was responsible for an interesting exchange of ideas between us and one of our reviewers, which prompted us to further clarify how our discussion of multiple rounds in Section 3.4 is tightly connected to sample size issues. First, note that $\frac{\partial n}{\partial M}<0$ and $\frac{\partial^2 n}{\partial M\,\partial \rho}>0$ (see footnote 21). The first partial derivative suggests that when the experimenter chooses to increase the number of repetitions, the required sample size will be lower for a given value of $\sigma$. However, the benefit in terms of reduced sample size also depends on the value of the correlation between rounds, $\rho$. Since $\frac{\partial^2 n}{\partial M\,\partial \rho}>0$, for increasing values of $\rho$, $\frac{\partial n}{\partial M}$ will increase as well (it will be less negative, since $\frac{\partial n}{\partial M}<0$), so the benefit of doing more rounds in terms of reduced sample size shrinks for higher values of $\rho$. In the extreme case where $\rho=1$, there is no benefit in doing multiple rounds; when $\rho=0$, more repetitions reduce the necessary sample size in multiples of the repetition, i.e. $n_{M=3}=n_{M=1}/3$, or the sample size required with three repetitions is one-third of what is required if a single-round auction is performed.

The discussion in the previous paragraph is for between-subjects designs. However, is there an advantage to choosing a within-subjects design so that the same subjects are exposed to two (or more) treatments? In the case of a within-subjects design with two treatments, the sample size needed is given by (Maxwell & Delaney, 2004: 561): $n_w=n_b(1-r)/2$, where $n_b$ is the sample size needed in a between-subjects design and $r$ is an intraclass correlation coefficient (ICC) with $0 \leq r<1$ (this point is not clear in Maxwell & Delaney (2004) and one has to trace back the formulas in Venter & Maxwell (1999) to clearly see that $r$ refers to an ICC). It is easy to see that the benefit in terms of sample size of a within-subjects design depends on $r$. If $r=0$ (i.e. there are no systematic individual differences in the dependent variable) then $n_w=n_b/2$, so the needed sample is reduced by half. For larger values of $r$, there is a higher benefit in terms of needed sample size: as the proportion of the variation in the dependent variable that can be explained by systematic individual differences goes to 1, the sample size of a within-subjects design tends to zero. Intuitively, when $r$ is closer to 1, the systematic individual differences are sizable, which further reduces the number of subjects needed in the within-subjects design. The $1-r$ term in the numerator therefore reflects the benefit of using each subject as his or her own control, so that there is a big benefit (in terms of lower $n_w$) when systematic individual differences are large. The formula for within-subjects designs can be generalised to $n_w=n_b(1-r)/a$ for treatments with $a$ levels (Maxwell & Delaney, 2004: 562), which points to further economising in terms of sample size than in the simpler two-level design.
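Continuing the sketch above, the within-subjects conversion is a one-liner; the numerical illustration takes the between-subjects requirement of 63 subjects per group from the previous example.

```python
from math import ceil

def min_n_within(n_b, r, a=2):
    """Within-subjects sample size n_w = n_b * (1 - r) / a (Maxwell & Delaney,
    2004), given the between-subjects requirement n_b, the ICC r and a
    treatment levels."""
    return ceil(n_b * (1 - r) / a)

print(min_n_within(n_b=63, r=0.0))  # 32 subjects: half the between-subjects requirement
print(min_n_within(n_b=63, r=0.5))  # 16 subjects: sizable individual differences help further
```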
However, there is a trade-off between power and bias that researchers should be aware of. Czibor et al. (2019) argue that within-subjects designs are not as popular because of the strong assumptions they require for inference (e.g. independence of multiple exposures; the effect is not causal without this assumption), assumptions that are likely not to hold because of learning, history, demand effects etc. Sample size calculations can become more complicated, however, with more than one treatment level. One would first have to decide which contrasts are of interest. For example, consider a three-treatment scenario where the control is compared with two treatments but the two treatments are never compared to each other. List et al. (2011) provide an example where the optimal allocation weights the control more heavily by allocating half of the sample to the control and a quarter to each of the other two treatments. Intuitively, the control is used in two contrasts while the other two treatments are used in just one comparison each; hence, the allocation of observations across treatments should not be equal. Sample size calculations with closed-form expressions, such as the ones laid out above, are typically available only for simple statistical models. For more advanced statistical tests, estimation methods and special design features, one would need to use simulation methods to approximate the power of complicated experimental designs. One recent contribution customised to the needs of experimental economists is the powerBBK package (Bellemare et al., 2016) implemented in Stata. The package allows the user to specify details of the experimental design (e.g. the number of subjects, the number of periods, a within- or between-subjects design and the balance of the design), to model individual heterogeneity by means of random-effects terms, and it accommodates non-linear models (e.g. logit, probit, tobit). Now let us take one step back. In the beginning of Section 2 we discussed how experimental auctions are sometimes used as a value elicitation vehicle where the sole interest is in estimating the mean WTP, $\mu_{wtp}$, of a random sample with variance $\sigma^2$ from a $N(\mu_{wtp},\sigma^2)$ population, but not in estimating a treatment effect. In this case, we can specify the maximum $100(1-\alpha)$ per cent CI width so that $\mu_{wtp}$ is estimated within a tolerance of $\pm A$ units. The minimum sample size $n_m$ needed to achieve this precision is the smallest integer satisfying $n_m \geq [(\sigma/A)z_{1-\alpha/2}]^2$ (Kupper & Hafner, 1989).
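As a quick illustration with assumed values (ours, for exposition only): to estimate mean WTP to within $\pm A = 0.25$ euros with 95 per cent confidence when $\sigma=1.5$ euros, the bound gives

$$n_m \geq \left[\left(\frac{1.5}{0.25}\right)\times 1.96\right]^2 = 11.76^2 \approx 138.3,$$

so at least 139 subjects would be needed.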
5 Economic significance of estimates

One of the main caveats of p-values is that they do not provide a measure of 'how large is large' (McCloskey & Ziliak, 1996) or of the strength of an effect.22 Furthermore, experiments with large sample sizes will be powerful enough to detect as statistically significant even small differences that may not be of practical or economic significance. Standardised effect sizes can be an important complement to statistical significance testing. There are many effect size measures, but one of the most popular is Cohen's (1988) $d$ index, a pure number free from measurement units, like many other effect size measures. The $d$ index is the standardised mean difference of two variables, $b_1$ and $b_2$, with sample sizes $n_1$ and $n_2$, over the pooled standard deviation: $d=\frac{\bar{b}_1-\bar{b}_2}{s}$, where

$$s^2=\frac{\sum_{i=1}^{n_1}(b_{1i}-\bar{b}_1)^2+\sum_{i=1}^{n_2}(b_{2i}-\bar{b}_2)^2}{n_1+n_2-2}=\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}$$

is the common variance pooled over the variances $s_j^2=\frac{\sum_{i=1}^{n_j}(b_{ji}-\bar{b}_j)^2}{n_j-1}$ of the two groups for $j=1,2$. Given that the t-statistic to test whether the means are different is $t=\frac{\bar{b}_1-\bar{b}_2}{s\sqrt{1/n_1+1/n_2}}$, we can write $d=t\sqrt{1/n_1+1/n_2}$ (Cohen, 1988: 67). This last formula is useful because it links the power of the two-sample t-test with the standardised difference $d$, the common standard deviation $s$ and the sample sizes $n_1$ and $n_2$: for a given difference between means, larger samples produce a larger t-statistic and hence greater power. It also implies that, given a total sample size, the standard error $s\sqrt{1/n_1+1/n_2}$ is minimised by having $n_1=n_2$, that is, by splitting the observations equally between the two treatments (see also Kenny, 1987: 213–214). As a crude guide, Cohen (1988) offers conventional operational definitions of 0.20, 0.50 and 0.80 for 'small', 'medium' and 'large' values of $d$, respectively. The values for these effects should be judged relative to the research field or to the specific context of any given investigation. In Cohen's terminology, large or small effect sizes are not meant to classify treatment effects as 'important' or 'not important'. A 'small' effect size is to be interpreted as something that is really happening in the world but which can only be seen through careful study, whereas a 'large' effect size is an effect big enough to be spotted with the 'naked observational eye' (Cohen, 1988: 13). Cohen (1988) notes that 'many effects sought in personality, social, and clinical-psychological research are likely to be small effects'. With respect to our previous discussion about how effect size and sample size are related, and given a certain power level, this implies that if one is to identify a psychological treatment effect in an auction setting, one should aim for a larger sample size, because the effect one is trying to detect is probably small. There are many other versions of standardised differences similar to Cohen's $d$. For example, Hedges's (1981) $g$ applies a correction to the $d$ index, Glass's $\Delta$ uses the standard deviation of the control group in the formula for $d$, while Kline (2013) proposes reporting Glass's $\Delta$ using the standard deviation of each group. This set of measures based on differences of means is often called the 'Difference' family or the '$d$' family for short. These are all trivial to calculate using today's software.
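In Stata, for instance, these measures take a single line. The sketch below assumes a data set with a continuous bid variable and a binary treatment indicator (bid and treat are hypothetical variable names) and also computes $d$ manually from the formulas above:

    * Standardised effect sizes for bids across two treatments.
    esize twosample bid, by(treat) cohensd hedgesg glassdelta

    * Manual computation of Cohen's d from the formulas in the text.
    quietly summarize bid if treat == 0
    local m0 = r(mean)
    local v0 = r(Var)
    local n0 = r(N)
    quietly summarize bid if treat == 1
    local m1 = r(mean)
    local v1 = r(Var)
    local n1 = r(N)
    local s = sqrt(((`n0'-1)*`v0' + (`n1'-1)*`v1') / (`n0' + `n1' - 2))
    display "Cohen's d = " (`m1' - `m0')/`s'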
A second type of effect size measure, called either the 'Correlation' family or the '$r$' family, quantifies the proportion of variance attributable to an effect and is interpreted as the 'proportion of variance explained'. This family includes simple measures such as the correlation coefficient $r$; the index $q$, which measures differences between correlation coefficients; the index eta-squared, $\eta^2$, which is analogous to the regression $R^2$; and the index $\omega^2$, which is analogous to the adjusted $R^2$. These are all effect size measures easily computable with many software packages (for example, with the esize and estat esize commands in Stata). For effect sizes from more advanced econometric models, such as the random-effects models that are often employed for experimental auction data, the variances coming from different sources must be accounted for (Selya et al., 2012). Cohen's (1988) $f^2$, based on the $R^2$ values of different versions of a regression model, can circumvent the shortcomings of other standardised effect size measures. An application of this approach can be found in Briz et al. (2017), and the Supplementary Data (at ERAE online) provide a Stata code example of how to calculate Cohen's (1988) $f^2$. A final way to express a treatment effect in relative terms is to first calculate the predictions of the estimated model, average these predictions (which would correspond to an average predicted WTP) and take the ratio of the estimated coefficient of the treatment effect over the average prediction. This approach has been used by Kechagia & Drichoutis (2017).

6 How to analyse auction data

One of the distinct advantages of the auction approach, relative to discrete choice methods, is that the outcome of interest, WTP, is directly obtained. When combined with an experimental design where individuals are randomly allocated to control and treatment groups, a study's main hypothesis can easily be tested by comparing medians, means or other features of the bid distribution across the groups. Moreover, because bids are a continuous measure of value, the distribution can be inspected without having to make distributional assumptions. As other studies have shown, bid distributions can be highly skewed or even bi-modal (e.g. Lusk et al., 2006b). Because summary statistics for bi-modal or multimodal distributions may be deceptive, identifying the modality of the bid distribution may be useful before further analysis. Simple graphical tools, such as histograms and kernel density plots, or Silverman's (1981) test of multimodality could be useful detection tools.23 Although auctions provide a direct estimate of the monetary value of a good, there is often interest in other measures of demand, such as market shares and elasticities. Auction data can also be used for these purposes without having to resort to econometric models or distributional assumptions. To illustrate, suppose individual i bid $b_i^A$ for good A and $b_i^B$ for good B. Which good would the individual be projected to choose if a retailer sets prices $p^A$ and $p^B$ for goods A and B? An individual would be projected to choose the good that provides the highest net value or consumer surplus, given by the difference between one's value for the good (i.e. the bid in an incentive compatible auction) and its price. Thus, A is predicted to be chosen over B if $b_i^A-p^A>b_i^B-p^B$ and $b_i^A-p^A>0$. If $b_i^A-p^A<0$ and $b_i^B-p^B<0$, then the individual would be predicted to refrain from buying either A or B, as buying either of the goods would generate a loss.24 In a sample of N consumers, the market share of A is simply the number of individuals predicted to choose A divided by N.
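A minimal sketch of this projection in Stata follows; bidA and bidB are hypothetical variable names for each subject's bids, and the prices are assumed values chosen purely for illustration:

    * Project choices and market shares from auction bids at assumed prices.
    local pA = 2.00
    local pB = 2.50
    generate surplusA = bidA - `pA'
    generate surplusB = bidB - `pB'
    generate choice = "none"
    replace  choice = "A" if surplusA >  surplusB & surplusA > 0 & !missing(surplusA, surplusB)
    replace  choice = "B" if surplusB >= surplusA & surplusB > 0 & !missing(surplusA, surplusB)   // ties assigned to B
    * Market shares are the counts of projected choices divided by N.
    tabulate choice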
Own-price and cross-price arc elasticities can be determined by calculating how the market share of A changes when the assumed prices change. For a fuller treatment of this issue, including examples, see Lusk & Shogren (2007) or Lusk (2010). Although key insights from auction data often come from summary statistics, there are common features of auction data that prompt the need for econometric analysis. Most notably, auction bids are often censored from below at zero. It is also the case that auction bids for a conventional good can be censored from above at the field price of the good outside the experiment (Harrison, 2006; Alfnes, 2009). Although auction experiments can be constructed to allow negative bidding (Parkhurst et al., 2004; Lee & Fox, 2015), the practice is still uncommon. However, even in auctions that do not allow negative bids, it is possible to project what the mean bid would have been had negative bids been allowed, through a straightforward application of the Tobit model. Likewise, the Tobit model can be used to estimate the mean bid under the assumption that there was no censoring of bids at field prices. While the Tobit model is widely used, our experience is that the interpretation of the model's coefficients is not well understood. The Tobit model draws a distinction between an uncensored latent or unobservable variable, $b^*$ (typically assumed to be normally distributed), and the censored variable that is actually observed, $b$. In the case of censoring from below at zero, $b=b^*$ if $b^*>0$, but $b=0$ if $b^*\leq 0$. An important point is that the estimates from a Tobit model are the projected impacts on the mean of $b^*$, i.e. the uncensored mean. The parameter from a simple model that includes only a constant term is the estimated mean of $b^*$, the uncensored distribution (i.e. the distribution that theoretically allows negative bids). As a result, when other explanatory variables are added to the model, the estimated coefficients relate to the marginal effects on the uncensored bid distribution. Overall, there are four quantities of potential interest in a Tobit model: marginal effects (i) on the latent, uncensored variable, $\frac{\partial E[b^*|x]}{\partial x}$ (these are the raw coefficient estimates); (ii) on the observed, censored variable, $\frac{\partial E[b|x]}{\partial x}$; (iii) on positive bids, $\frac{\partial E[b|b>0,x]}{\partial x}$; and (iv) on the probability of being uncensored, $\frac{\partial \Pr[b>0|x]}{\partial x}$ (see Drichoutis et al., 2017a, for an application of the Tobit model to auction data).
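These four quantities map directly onto Stata's tobit postestimation options. A minimal sketch under hypothetical variable names (bid, treat and income) follows:

    * Tobit model for bids censored from below at zero.
    tobit bid i.treat income, ll(0)
    * (i)   Effect on the latent mean E[b*|x]: the raw coefficients.
    margins, dydx(treat)
    * (ii)  Effect on the observed, censored mean E[b|x].
    margins, dydx(treat) predict(ystar(0,.))
    * (iii) Effect on positive bids, E[b|b>0,x].
    margins, dydx(treat) predict(e(0,.))
    * (iv)  Effect on the probability of a positive bid, Pr(b>0|x).
    margins, dydx(treat) predict(pr(0,.))
    * A double-hurdle alternative (Cragg, 1971) could be estimated with
    * Stata's churdle command, e.g. churdle linear bid income, select(i.treat income) ll(0)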
However, which marginal effect should one use? The answer depends on the question being asked. An example from outside the world of auctions might help provide some clarity. Imagine that a coach is interested in the marginal effect of halftime entertainment spending at a basketball game on the game's attendance. The data consist of many years of attendance at games and spending at each halftime show. The distribution of attendance is censored from above at the stadium's capacity, and many observations are exactly equal to the capacity limit. The coefficient from the Tobit model associated with halftime spending is the estimate of the marginal effect of spending on the uncensored mean attendance, $\frac{\partial E[b^*|x]}{\partial x}$, but as previously mentioned, it is also possible to estimate the marginal effect of spending on attendance given that attendance is censored from above at capacity. Which estimate should be given to the coach? If there are no plans to expand the stadium in the near future, then we should tell the coach the marginal effect on the censored distribution. If the coach is considering a stadium expansion before the next season, then we should tell him/her the marginal effect on the uncensored distribution. The decision of whether to expand the stadium might also be informed by the probability of attendance being censored at the capacity constraint. A few additional comments about the Tobit model are in order. Just because bids might be censored at zero does not mean that a Tobit model is always required. The extent to which estimates from a Tobit model will diverge from ordinary least squares regression depends on the share of observations that are censored. Unless the share of censored observations is non-trivial (say, more than 5 per cent), a Tobit model probably is not worth the trouble. It is also important to recognise that a Tobit model implicitly imposes the assumption that an explanatory variable affects the probability of censoring and the level of the uncensored outcome through the same mechanism. One can relax this assumption by utilising a double-hurdle model (Cragg, 1971). The double-hurdle model is actually two models: (i) a probit model, where the dependent variable takes the value of 1 if the observation is uncensored (i.e. a positive bid) and zero otherwise, and (ii) a truncated regression model utilising only those observations that are uncensored. It is sometimes the case that the double hurdle will yield insights that differ from the Tobit or ordinary least squares (e.g. Lusk & Fox, 2002). The double-hurdle model allows one to specify a different set of independent variables for the decision to submit a positive bid and for the second stage, which is concerned with the level of the bid given a positive decision in the first stage. In the special case where the set of independent variables is the same for both stages, the Tobit model is nested within Cragg's double-hurdle model (Burke, 2009). Combined with our suggestion of using smaller groups within a given session (see Section 3.2), and provided each subject submits bids in multiple rounds, this design creates a particular nesting with multiple levels. Bids from multiple rounds are nested within an individual, and the individual is nested within an auction group (and groups are nested within a session). This calls for the use of a multilevel mixed-effects model to account for the lack of independence within these groups. An application can be found in Drichoutis et al. (2017a).25 Rather than explicitly modelling these random effects, it is also possible to produce clustered standard errors to account for such groupings. The extent to which one approach is preferred over the other depends on how comfortable one is in making parametric assumptions about the random effects. Typical motivations for estimating regression models in experimental auction studies relate to the desire to control for censoring, as described above, or to control for potential differences in demographics across treatment groups.
Randomisation of participants to treatments or the use of within-subjects designs lowers the need to worry about demographic controls; the general advice in Senn (1994, 2013), however, is that observed covariates should be taken into account in the analysis (see also the discussion in Section 2). There is another reason econometric models might be useful in the analysis of auction data: consumer heterogeneity. Advances in discrete choice modelling have highlighted the importance and pervasiveness of heterogeneity in consumer preferences. As previously mentioned, auction studies are well suited to studying heterogeneity without econometric analysis. However, the latent class (also referred to as finite mixture) models or random coefficient models that have become common in the study of discrete choice data might also be useful in the analysis of auction bids if there is reason to believe there might be heterogeneity in treatment effects. Finite mixture models allow the researcher to assume a finite number of types of subject that differ in (i) the process giving rise to behaviour and (ii) the behaviour itself, while random coefficient models assume that there is an individual-specific parameter indexing behaviour (i.e. infinite types of subject with respect to that parameter) (Moffatt, 2016). Moffatt (2016) provides a nice introduction and several examples (not in the context of auctions, although these can easily be extended to auctions) of how to go about estimating such models in Stata. Heterogeneity in treatment effects might arise in a variety of settings. For instance, many experimental economics studies have explored various price, information or 'nudge' policies as they relate to the healthfulness of food choices (e.g. Ellison et al., 2014; Muller et al., 2017). Imagine an experimental auction study where a control group bids on unlabelled items and a treatment group bids on the same items that now include a red 'traffic light' nutritional label for unhealthy items. A simple regression model to test for the effect of the nutritional label is $b_{ij}=\alpha_0+\alpha_1 T+\beta Z_i+\epsilon_{ij}$, where $b_{ij}$ is individual i's bid on product j; $T$ is an indicator variable taking the value of 1 for the treatment with traffic light labels; $Z$ is a vector of demographic controls; $\epsilon$ is an error term; and $\alpha_0$, $\alpha_1$ and $\beta$ are coefficients to be estimated. The primary interest is in the sign and significance of $\alpha_1$. A reasonable hypothesis is that $\alpha_1<0$, i.e. inclusion of 'red' warning signs will reduce WTP for a food. However, there may be some people who exhibit psychological reactance and respond to the policy in an unanticipated manner (Just & Hanks, 2015). Such a response can be observed, for example, if an individual does not like 'being told what to do' or interprets the policy as a threat to their autonomy or freedom. Thus, there may be some people for whom we might expect $\alpha_1>0$. The simple econometric model outlined above could be modified in several ways to identify heterogeneity in treatment effects. Perhaps the most straightforward approach is to include an interaction between $T$ and some or all of the $Z$s: $b_{ij}=\alpha_0+\alpha_1 T+\beta Z_i+\theta TZ_i+\epsilon_{ij}$. Now the marginal effect of $T$ on $b$ is $\alpha_1+\theta Z_i$. This approach is limited in the sense that heterogeneity in the treatment effect only arises through heterogeneity in observables, $Z_i$ (see the sketch below).
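A minimal sketch of this interaction approach in Stata, under hypothetical variable names (bid, treat, male, age and a subject identifier id), with the extensions discussed next indicated as comments:

    * Heterogeneous treatment effects through observables: interact the
    * treatment indicator with demographics.
    regress bid i.treat##(i.male c.age), vce(cluster id)
    * Treatment effect by gender: alpha_1 + theta*Z evaluated at male = 0, 1.
    margins, dydx(treat) at(male = (0 1))

    * Extensions discussed below (sketches only; identification of a random
    * coefficient on a between-subjects treatment requires bids that vary
    * within subjects):
    * mixed bid i.treat c.age || id: treat
    * fmm 2: regress bid i.treat c.age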
One might hypothesise, for example, that men are more likely to display reactance than women, but surely the effect is more complicated than its relationship to gender. One way to address this concern is to estimate a random coefficient model. In this case, $\alpha_1$ is replaced with an individual-specific parameter, $\alpha_{i1}$, which is then assumed to be, for example, normally distributed: $\alpha_{i1}\sim N(\overline{\alpha}_1,\sigma_{\alpha_1})$. This model allows for the estimation of the mean effect, $\overline{\alpha}_1$, but also allows for differential responses via the estimated standard deviation of the treatment effect, $\sigma_{\alpha_1}$. A challenge with this approach is that it requires an assumption about the distribution of the treatment effect. In the case of the reactance example, it is not clear that one would expect the treatment effect to be normally distributed across people. An alternative approach is the latent class model. In this case, a number of classes, $C$, is specified and one estimates class-specific parameters, $\alpha_{0c}+\alpha_{1c}T+\beta_c Z_i$, along with the probability of respondents falling into each class. Typically, AIC or BIC measures are used to determine the number of classes, $C$. This approach can permit more distinct treatment heterogeneity, where, for example, $\alpha_{1,c=1}$ might be positive and $\alpha_{1,c=2}$ might be negative, with the model revealing the share of the sample fitting each pattern. There is much more that can be done with auction data to generate actionable insights. Lusk (2010) shows, for example, how auction bids can be used to identify consumer segments via cluster analysis or product groupings via factor analysis or multidimensional scaling.

7 Disclosure, pre-registration, pre-analysis, and open data and materials

This section is motivated by the recent move of individuals, scientific societies and journals to embrace and promote more transparency in social science research. Here, we discuss issues that the various Agricultural Economics Associations and journals in the field have been slow to officially embrace.26 We believe that in the near future these issues will become a higher priority and that most experimental research will eventually have to hold up to these standards. At the present time, however, there is only a handful of pre-registered studies related to experimental auctions and consumers' WTP (all of them registered in AEA's RCT registry). We hope that our discussion here will stimulate more interest in the field of experimental auctions for agricultural economics research. Given that experimental research is expanding and the advantages of experimentation are becoming more widely known, and in light of the recent crisis in the replicability of experimental studies, there has been a call to set up practices that will promote transparency in social science research. Miguel et al. (2014) define three core practices for more transparency in social science research: disclosure, pre-registration and pre-analysis plans, and open data/materials. Disclosure is the systematic reporting of details that researchers are (or should be) obliged to provide, such as the measures, manipulations, data exclusions and the final sample size. Many prominent medical journals (BMJ, The Lancet etc.) recommend or require that researchers adhere to the CONSORT statement (Moher et al., 2010). In the absence of an endorsement of similar standards by associations, Simmons et al.
(2012) proposed a 21-word disclosure statement to accompany manuscripts, affirming that the authors did not p-hack: 'We report how we determined our sample size, all data exclusions (if any), all manipulations and all measures in the study'. If needed, supplementary material can be used to support the disclosure. Pre-registration involves specifying in detail, in an online repository, information such as the number of subjects, the treatments and relevant stimuli, the outcome variables, the predictions/hypotheses and a pre-analysis plan, which together constitute the plan for a study. The whole plan can be hosted in one of the available online repositories, where it receives a time stamp and can be accessed by interested parties (editors, reviewers and readers). Although some repositories have time-limited embargoes, the plan can be made public or not at the will of the researchers. For example, AEA's RCT registry can keep key information hidden until the time when the trial is completed. Some of these repositories are the Open Science Framework (https://osf.io/), AsPredicted (https://aspredicted.org/), the American Economic Association's RCT registry (https://www.socialscienceregistry.org/), the Evidence in Governance and Politics registry (http://egap.org/content/registration) etc. Registration for many economic experiments will probably gravitate to AEA's RCT registry since, as of January 2018, registration in the RCT registry has become mandatory for all submissions to AEA journals. A closely related concept that sometimes causes some confusion is registered reports. Registered reports involve peer review of the experimental design prior to any data collection. The manuscript is then provisionally accepted on the condition that the authors follow their protocol, regardless of the results. With pre-registration alone, reviewers could still be biased against null results; registered reports keep the focus on the research questions, effectively eliminating publication bias. Note that a registered report does not necessarily have to be pre-registered, because the journal might not require publication of the protocol. Currently, only a few journals in economics (e.g. the Journal of Development Economics; no journal in agricultural economics yet) allow for registered reports. As one of our reviewers noted, registered reports can also provide additional benefits to science by, for example, incentivising research funders to make funding conditional on a project being accepted as a registered report. As a testament to that, the Research Excellence Framework 2021 (the system for assessing the quality of research in UK higher education institutions) notes in its assessment criteria that registered reports '…contribute to the evaluation of rigour for submitted outputs' (Framework, 2019). Some people share concerns about the time and effort involved with pre-registration and whether pre-registration can effectively prevent deceptive practices. Simonsohn (2018) makes two arguments against these concerns. First, because of what is called self-serving bias, researchers are likely to deceive themselves by becoming convinced, after data collection, that the analysis that worked was what was planned all along. Pre-registration reduces this self-serving bias by removing ambiguity about the research plan. Second, researchers may engage in deceptive practices either by omission or by commission. We are confident that very few researchers would engage in deception by commission.
Pre-registration tackles deception by omission by transforming it into an explicit lie. Thus, with widespread pre-registration, we would expect a significant reduction in deceptive practices, because not many researchers would want to explicitly lie as part of their research agenda. So why would someone want to pre-register their research plan? Advocates claim that this is the only way that criticism about data mining, p-hacking and other questionable research practices can be muted.27 Pre-registration creates a new step in the workflow of research, but it is seen as a good way to produce rigorous results that allow a sharp distinction between confirmatory and exploratory analysis. To incentivise researchers to use pre-registration, many journals outside the discipline of economics are now offering the option of registered reports. If the registered report is given a positive evaluation, the proposed paper is given a conditional acceptance and a promise to publish it regardless of the outcome. Currently, more than 100 journals offer Registered Reports as a regular submission option (the list is maintained by the Center for Open Science: https://perma.cc/KV4F-57ES). The vast majority of these journals are psychology and neuroscience journals; there is one political science journal (the Journal of Experimental Political Science) and one economics journal (the Journal of Development Economics). We feel that the biggest drawback for researchers pre-registering their studies will be the commitment to a specific pre-analysis plan. Committing to a pre-analysis plan includes, among other things, deciding on the precise definition of a primary outcome variable (or multiple co-primary outcomes) and of potential secondary outcomes, specific variable definitions, any inclusion or exclusion restrictions, statistical models, hypothesis testing methods (including corrections for multiple hypothesis testing if many primary outcomes are defined), covariates (potentially including measures of standardised differences as described in Section 2), subgroup analyses etc.28 Olken (2015) summarises the benefits and risks of committing to a pre-analysis plan. Researchers who put a lot of detail and effort into a pre-analysis plan could benefit from the careful thought required in planning the data analysis, selecting which variables to collect and choosing which methods to apply. This step could also involve researchers writing their statistical programs and running them on mock-up data.29 For the research community, it increases confidence that the analysis did not just involve picking and reporting the most significant specification. While fully specifying all the analysis ex ante could be considered ambitious, this does not mean that additional analysis cannot be performed if it was not mentioned in the pre-analysis plan. One could, for example, still include results from analyses that were not pre-specified, but these should be clearly indicated in the paper as not being part of the pre-analysis plan. Coffman & Niederle (2015) offer arguments about the limited upside of pre-analysis plans and propose that economics move towards valuing replications and robustness checks of positive results instead. Pre-analysis plans are not without criticism.
One argument against pre-analysis plans is that if analyses and results not registered in the pre-analysis plan are given second-tier status, then these are likely to be left out of a manuscript before the paper even goes out for review (Sadrieh, 2019). Hence, it is possible that unexpected findings discovered along the way are left out, and only analyses/results that were correctly guessed at the conception stage of the experiment are included. Consequently, in the long run, researchers who made the right guesses are rewarded while others are 'punished'. Alternative incentives for rewarding unexpected discoveries have been termed a full-information solution, whereby all results are published based on the merit of the experimental design. Meta-analysis studies could also paint a more complete picture of the scientific knowledge gained on a specific topic. To address the challenges of committing to a detailed pre-analysis plan, Anderson & Magruder (2017) and Fafchamps & Labonne (2016) revived the idea of testing the out-of-sample performance of predictors in 'hold-out samples'.30 Roughly speaking, the approach involves withholding a fraction of a sample, say half of it, and running an exploratory analysis on those data. One could then choose which hypotheses to test and which methods to use based on the exploratory analysis, and register a better-informed pre-analysis plan using these insights. Once a specific pre-analysis plan has been decided and registered, the researcher can use the other part of the sample to execute the plan. The split approach does come at a cost, however, since the split sample loses power relative to a full-sample pre-analysis plan on hypotheses that were anticipated. The final point in Miguel et al. (2014) concerns open data and materials. We feel that agricultural economics journals have moved in the right direction, consistent with top economics journals. For example, ERAE's guidelines to authors clearly state that 'The editors reserve the right to refuse to publish articles where the data, programs, etc. are not provided and where, in their view, there is no justifiable reason for not making them available'. The creation of journal archives has a long history dating back to Dewald et al. (1986), who exploited a change in the editorial policy of the Journal of Money, Credit and Banking that required authors to make their data available on request, and found that the proportion of authors who submitted programs or data was significantly greater after the introduction of the policy. They then tried to replicate nine papers from authors who had submitted their data to the JMCB and found that only two of these could be replicated in their entirety. Dewald et al. (1986) suggested that journals require data and codes at the time of submission. In response, the JMCB adopted a data and code archive policy. However, later replication attempts at the same journal (McCullough et al., 2006) were only marginally more successful: of the 186 empirical articles, only 69 had archive entries; 7 could not be replicated due to lack of software or the use of proprietary data; and only 14 out of 62 articles could be replicated. McCullough et al.
(2008) examined compliance with data-deposit requirements at journals that had them and found that the Journal of Applied Econometrics had a 99 per cent compliance rate, which they attributed to the facts that (i) the JAE had an editorial position for an archive manager and (ii) a paper is not published until the authors have made their data available. However, replication rates were extremely low, which they attributed to the fact that no code was required to accompany the data files. In the meantime, things have not improved. Recently, Chang & Li (2017, 2018) were able to replicate only 33 per cent of papers from 13 journals independently of the authors, and 49 per cent with help from the authors. As far as experimental auctions research is concerned, we are not aware of any formal attempt yet to replicate results using data and codes from previously published papers. Independent of the current status of replicability of experimental auction results, there are actions we would endorse for the benefit of the profession and science in general. Major agricultural economics journals should make it mandatory to submit data and codes at the time of submission. An editor should then be assigned as the manager of the archive. Having a specialised editor who monitors compliance with data and code submission requirements could improve practice by removing the burden from the editors handling regular submissions. However, we also believe it would be beneficial, for overall transparency's sake, for the editor not only to make sure that data and codes are submitted, but also for any paper to be accepted conditional on the managing editor or a third party being able to exactly replicate the results of the experimental paper using the data and codes provided by the authors during submission (this is the verification process described in Clemens (2015) and Christensen & Miguel (2018)). For example, the American Economic Association, following the appointment of an inaugural Data Editor in January 2018, adopted an updated Data and Code Availability Policy in July 2019, the implementation of which will be overseen by the AEA Data Editor for all the journals of the society.31 This will probably lead other (agricultural) economics journals to follow this practice. In relation to the availability of data and codes, platforms have emerged that allow researchers to store online all the material needed for replication of results. For example, Code Ocean is a research collaboration platform where the researcher can upload code and data, install all the needed packages, save the results and then publish everything in a 'capsule' that anyone else can execute in the cloud by clicking run. This way, a time-stamped version of the code and results is created, which is immune to future updates of commands that might otherwise break the execution of the code. The platform supports C/C++, Java, MATLAB, Python, Stata, R etc. and any programming language available on Linux. As far as availability of materials is concerned, as readers and reviewers of experimental auctions papers, we often come across experimental auctions work that does not include the experimental instructions. We believe this is a crucial omission that is often neglected in the review process and that could make a big difference when evaluating the merit of a paper. This is because even subtle wording or the way phrases are expressed in the instructions could induce effects beyond what the authors believe their manipulation is inducing.
We believe that one possible reason that instructions sometimes are not submitted with the paper is that written instructions were never distributed or shown to subjects; i.e. the experimenter orally explained the mechanics of the auctions and then let subjects bid on the products. As we explain in Section 3.5, this is not a practice we would endorse. A final and perhaps subtle remark concerning transparency is that authors should make sure, when using links pointing to websites that provide complementary information about their methods, analysis or any other material, that these links will outlive the paper. Often this is not the case, because the internet changes rapidly. The solution is to make links permanent via services that archive websites, e.g. http://archive.org/web and https://perma.cc/. The careful reader will notice that most of the links in this paper that point to online content use one of these services. Another solution regarding materials associated with any research project is to post everything in an online repository such as the Open Science Framework (https://osf.io/). An alternative or complement (depending on how one uses them) to repositories for storing data is Data Journals. These journals publish articles that focus specifically on describing research data that have been made publicly available, either through a repository or directly in the Data Journal. The research article published in a Data Journal provides a mechanism for describing the data and its accessibility, along with details regarding the genesis of the data, such as the context of the data collection, the choice of software environment etc.32 There are additional benefits to publishing in a Data Journal, including the fact that data are given a DOI that can then be cited in the research paper. This might lead to increased traffic to both the research article and the data article, which might contribute to more citations for the authors. For the research community in general, Data Journals can facilitate reproducibility and could speed up the research process by allowing other researchers to build on the work of the authors, as well as create new and fruitful collaborations. A possible downside from the authors' side is that they will be more exposed to competition for publishing results out of the data set. However, the authors can choose to embargo their data for the duration of the review process. As a testament to the changes that other fields are undergoing, the journal Cortex (Chambers, 2018) has recently introduced the Transparency and Openness Promotion (TOP) Guidelines (Nosek et al., 2015). The TOP Guidelines are a certification scheme in which journals and research organisations declare their level of adherence to a series of standards for enabling research transparency and reproducibility. These standards include, among others, availability of data (e.g. data must be posted to a repository), analysis code and digital research materials (code and materials must be posted to a repository; a Level 3 adherence standard would require that the analysis is replicated independently before publication), pre-registration of study procedures and analysis plans (e.g. authors declare whether the study has been pre-registered and provide access to reviewers), replication (e.g. the journal uses registered reports as a submission option) etc.
In addition, a number of journals have agreed that publishing peer review reports can be beneficial for the research community by increasing the transparency of the assessment process (available at https://perma.cc/Z7TM-G66C). These benefits might include reviewer and editorial accountability, training opportunities for educating students about the peer review process, as well as a way to provide credit for peer review (since 2012, Publons (https://publons.com) provides a free service for academics to track, verify and showcase their peer review and editorial contributions to academic journals).

8 Field vs. laboratory

Harrison & List (2004) suggested a typology of economic experiments based on six classification criteria: the nature of the subject pool, the information, the good, the task/trading rules, the stakes (amounts involved) and the experimental environment. Harrison & List's (2004) typology distinguishes 'conventional' laboratory experiments from 'artefactual', 'framed' and 'natural' field experiments, based on a growing role of a target population, context-relevant information, goods and tasks, as well as investigations occurring in a natural environment where the subjects are not aware that they are being observed. A number of studies have moved outside the lab setting to the locations where consumers typically make their purchasing decisions (i.e. 'the field'). Gneezy (2016) calls for an increase in the proportion of experimental studies based on field data in marketing research. Vecchio & Borrello (2018) maintain that many researchers who have used experimental auctions in food consumer behaviour studies think that more studies should be performed in real market environments. The choice of whether to conduct research in the field or the laboratory depends on numerous factors (Harrison & List, 2004), the importance of which may change depending on the specific purpose and audience of the study, while keeping in mind that the methodological approach may also be affected by the experimental practices considered acceptable in a specific discipline (Croson, 2005). The core trade-off between performing a laboratory or a field study is often described as one between control and realism or, to put it another way, between internal and external validity (Roe & Just, 2009). Laboratory settings allow researchers a greater degree of control, but often in sterile environments. By contrast, field settings occur in a more natural environment that can include context-relevant information and cues, but they are often harder to control. All in all, researchers must consider that conducting a study in the field, instead of in the laboratory, will necessarily introduce more noise and reduce control over the experimental procedure, which could imply the need for more sophisticated models to account for control variables (Gneezy, 2016; Vecchio & Borrello, 2018). An additional factor that a researcher must consider when moving from the laboratory to the field is economies of scale related to the cost of running the experiment.33 Since auctions require (intensive) training, there are significant economies of scale to be gained in running auctions with many participants at one time in the laboratory, relative to the field, where the experimenter typically has to conduct the training individually or in small groups over and over again.
As we discussed in Section 3.5, uniformity of training is subject to less control when it is delivered in person by the experimenter, because it may change or improve from one time to the next due to experimenter learning or other factors. Laboratory experiments therefore tend to have lower costs of this kind, although in the field such costs could be mitigated with electronic delivery of instructions and training. Harrison & List (2004) maintain that laboratory and field studies should complement each other, since they have different characteristics and since what works in the laboratory does not necessarily work in the field and vice versa. The key issue is that moving from the laboratory to the field can change bids, which are the most relevant information obtained in an experimental auction. Lusk & Fox (2003), for example, show that bids for a food product can increase when moving from a classroom laboratory setting to a bakery setting. The results are not necessarily surprising: people enter a field environment with the intention of making a purchase in the category of the good in question. In addition, in the field, many substitutes for the auctioned product can be readily available. The move to field settings can have important implications for study findings related to the effects and significance of factors and covariates. List (2004) showed that individuals with more experience in the field were less likely to suffer from the endowment effect; List (2003) showed that even experienced subjects, when put in a lab, appear to behave altruistically, but when moved back to a more natural environment behave more in accordance with self-interest; Dyer & Kagel (1996) showed a similar effect with regard to the winner's curse in common value auctions; and Sousa & Munro (2012) confirmed this effect in virtual online experiments. In practical terms, field experiments can often dramatically reduce the cost of subject recruitment because the researcher travels to where the subjects are, unlike most laboratory experiments, where subjects travel to where the laboratory is. Incentives to participate in the field often include food products or coupons, which are an integral part of the design, as in, for instance, Lusk et al. (2001), Lusk et al. (2006a) and Klain et al. (2014). However, until the recent advent of mobile payment systems such as Square, it was generally difficult to arrange payment mechanisms in field settings where people often did not carry cash. One can still encounter challenges in specific field settings, such as grocery stores or supermarkets, if an agreement with store or company managers is required. For example, while traditional surveys are usually well accepted, an experiment involving sales of goods also sold in the store may be questioned or opposed by the managers, who may also be worried about customer complaints arising from a situation that is not under their control. According to Gneezy (2016), collaborative experiments may require a lengthy process of reciprocal understanding between academic and non-academic partners regarding the potential benefits and costs for the latter. Another important aspect to consider when planning a field experiment is the set of issues related to sampling methods and procedures.
While a laboratory experiment can rely on subjects randomly picked from a representative panel, it is usually much more difficult to select a random sample in the field, so the chances of having a biased sample will generally be higher (Belot & James, 2014). Some practical advice can be provided considering the previous discussion:

(a) Given the reduced control over the environment and the unavoidable noise affecting a field experiment, it is highly desirable to reduce the respondent's burden; it is therefore advisable to avoid long experiments and to use very short instructions (one page or less) for the participant. It is also recommended that researchers devote a significant amount of time and effort to training data collectors.

(b) When choosing the auction mechanism, lean towards the simplest ones, such as SPAs with very small groups (two to three subjects).

(c) If the experiment is aimed at providing industry-relevant information and statistical inference is among the expected outcomes, ensure an adequate sample size, use multiple locations in the field to get a representative sample of the target population and enforce measures to reduce selection bias, so as to increase the generalisability of results according to the scope, purpose and audience of the study. If this is not feasible due to financial and time constraints, it is advisable to pair the auction study with a hypothetical survey (e.g. a choice experiment) (Lusk, 2010) and use the auction to calibrate the larger study's results and correct for hypothetical bias (e.g. Fox et al., 1998; Alfnes & Rickertsen, 2007).

(d) Consider the influence of the presence of perfect or partial substitutes in the locations and settings chosen for the field experiment and control for these variables if possible.

(e) Carefully plan the collaboration with the non-academic partners (e.g. store or restaurant managers), setting up agreements that clearly identify advantages (such as purchase of the auctioned products and sharing of business-relevant study results), administrative burdens, costs and commitments.

(f) Plan logistics very carefully, considering the trade-offs between on-site delivery and home delivery of the product: the former can rely on the logistics, storage and payment facilities of the non-academic partner in the field, while the latter can be implemented by collecting bids, payments and delivery information on the spot and sending the purchased product to the respondents' addresses (this may need additional care in managing data because of privacy regulations).

9 Behavioural factors in auctions

A number of behavioural factors can influence bidding behaviour in experimental auctions. In this section, we discuss a number of studies that have directly examined some of these behavioural factors in relation to experimental auctions and their implications for the design of future studies. Since trying to review all experimental auction behavioural studies is a daunting task, we focus on a handful of contributions that directly test the effect of behavioural factors that have not been thoroughly examined in the past. So, while certainly also important, we do not cover issues related to the effect of information, labels, reference prices/products and endowments.

9.1 Personality traits

Personality traits have entered economists' area of interest as important determinants of economic behaviour.
One of the most widely cited papers on the economics and psychology of personality traits (Borghans et al., 2008) makes a convincing case using evidence from the Perry preschool program. The program targeted disadvantaged African–American children in Michigan in the 1960s, and the children were followed up to age 40. The program was found to be successful in changing the personality and motivation of disadvantaged children, which then resulted in measurable success on a variety of measures of socioeconomic achievement over the life cycles of participants (see Borghans et al., 2008, and citations therein). Given the importance of personality in predicting outcomes and explaining variation in economically relevant behaviours, it seems quite possible that personality traits could explain differences in bidding behaviour in experimental auctions. Grebitus et al. (2013) examined this issue in both hypothetical and non-hypothetical settings. Their results suggest that there is heterogeneity in valuation estimates across personality traits and that traits may partly explain the differences in behaviours or valuations from auction and choice experiments found in previous studies (Lusk & Schroeder, 2006; Gracia et al., 2011). Interestingly, their results also suggest that the effects of personality are stronger in non-hypothetical auctions than in hypothetical auctions. The implication is that people will behave differently in real and hypothetical environments depending on their personality type, suggesting that personality traits may well explain a significant portion of hypothetical bias.

9.2 Cognitive ability

It is well known that auction mechanisms may not always provide accurate valuation estimates, given the behavioural anomalies in bidding observed in laboratory experiments. For example, one consistent finding in experimental auction studies is that subjects tend to deviate from rational behaviour and exhibit a pattern of overbidding in SPAs (Kagel et al., 1987; Kagel & Levin, 1993; Andreoni et al., 2007; Cooper & Fang, 2008; Drichoutis et al., 2015; Georganas et al., 2017). Kagel et al. (1987) inferred that subjects submit a higher bid in SPAs due to the impression that submitting higher bids improves the probability of winning with no real cost, because the highest bidder pays the second-highest bid.34 Understanding bid deviations in experimental auctions is important since it can potentially explain subjects' irrational behaviour, and it can also provide more clarity in determining when and how bids should be interpreted when trying to elicit homegrown valuations. Kagel et al. (1987) and Ausubel (2004) argued that the difficulty of understanding the SPA could lead to overbidding in the SPA as compared to the ascending-price English auction, even though both auction mechanisms are strategically equivalent. More recently, Li (2017) showed that overbidding in SPAs is due to the fact that the SPA is not an obviously strategy-proof (OSP) mechanism, where an OSP mechanism is defined as one in which a cognitively limited agent can recognise the weakly dominant strategy. This concept suggests that more cognitively able bidders will understand the strategic properties of an SPA better than less cognitively able bidders. Lee et al. (2017) investigated this relationship directly by examining how individuals' cognitive ability influences bid deviations in SPAs.
They first measured subjects' cognitive ability using the nonverbal Raven's Standard Progressive Matrices (RSPM) test and then classified subjects into two groups (a high cognitive ability group and a low cognitive ability group) based on their RSPM test performance. Each group then participated in a series of induced value SPAs. Their results suggest that more cognitively able subjects behave in closer accordance with theory and that cognitive ability partially explains heterogeneity in bidding behaviour. This is an important finding since it implies that experimental auction researchers must make sure that the auction mechanism used in the study is clearly understood, especially by low cognitive ability subjects.

9.3 Emotions

With the rise of behavioural economics, factors related to emotions and feelings have found their way into economics, permitting researchers to use a wider set of tools to explain decision-making behaviour. In this line of research, Morgan et al. (2003) proposed that participants' bids are driven partly by behavioural motives such as 'spite'. They suggested that subjects overbid in SPAs since the profit earned by a rival bidder can be reduced by a losing bidder's own bid. Overbidding can also be explained by a 'joy of winning', whereby subjects derive extra utility from winning the auction. Interestingly, Cooper & Fang (2008) found that small and medium overbids are consistent with the 'joy of winning' hypothesis, while large overbids are more consistent with the 'spite' hypothesis. Roider & Schmitz (2012) examined the robustness of the standard symmetric sealed-bid auction model with risk-neutral players and independent private values. Specifically, they were interested in assessing subjects' bidding behaviour when they anticipate the positive emotions of winning and the negative emotions of losing. They also investigated whether the introduction of anticipated emotions can shed light on various findings on bidding behaviour in auctions with independent private values. Using a simple extension of the standard model of symmetric auctions – where bidders anticipate some (constant) positive emotions of winning and some (constant) negative emotions of losing – they showed that if bidders anticipate the joy of winning, bids will be larger than in the standard model in both first-price auctions and SPAs. However, if bidders anticipate a disutility from losing, the implications depend on the auction format. In an SPA, bidders with very low valuations will not participate, but those who do participate are more eager to avoid losing and all overbid by the same amount due to the anticipated emotions; in SPAs, the joy of winning and the disutility of losing thus affect bids in the same way. In contrast, in first-price auctions, participating bidders with small valuations who anticipate that losing is painful bid less than in the standard model, while bidders with high valuations bid more. The 'joy of winning' hypothesis is closely related to the 'taste for competition' (Niederle & Vesterlund, 2007), whereby subjects differ in how eager they are to compete (women are often found to be less eager to compete than men in this literature; see, for example, Niederle & Vesterlund, 2011).
In this respect, observed differences between the BDM mechanism (the subject does not win over other subjects) and any NPA (the subject wins over other subjects) confound taste for competition with differences between the formats of the mechanisms. The taste for competition is mediated by differences in beliefs, risk attitudes and other-regarding preferences (Niederle & Vesterlund, 2011), as well as by factors such as visibility of consumption (i.e. conspicuous consumption), which might be one way for someone to express their competitiveness (Clingingsmith & Sheremeta, 2018). Joy of winning (Astor et al., 2013), competitiveness (Drichoutis et al., 2012), warm glow (Andreoni, 1989, 1990) and visibility of consumption are all factors that can probably affect consumption of fair trade, organic and local products, and experimental designs that can disentangle the influence of all these effects would be of value to the literature.

9.4 Mood

Mood states can influence behaviour by affecting both the content and the process of cognition (Capra, 2004). Moods can also play an important role in the construction of preferences that, in turn, influence decision making and judgement (Johnson et al., 2005; Lichtenstein & Slovic, 2006; Payne et al., 1999; Slovic, 1995). A few studies have examined the effect of moods in experimental auctions. Lerner et al. (2004) found that a negative mood state in the form of sadness (disgust) can increase (decrease) WTP, while Capra et al. (2010) found only weak mood effects on WTP. Drichoutis et al. (2014) also explored how positive and negative mood states affect bidding behaviour in experimental auctions, to test the robustness of the findings of these two papers. Their study differs from the other two in that they focused not only on the effect of mood states on WTP but also on people's rationality, as represented by the rate of preference reversal for lotteries. They found that mood states can significantly affect the rate of preference reversal and bidding behaviour in experimental auctions. Specifically, they showed that subjects under a positive mood state exhibit more rational behaviour (i.e. fewer preference reversals) and provide lower bid values than others. Results from these studies suggest that researchers may need to take subjects' moods into account when conducting experimental auctions.

9.5 Other-regarding preferences and motives

The standard theory predicts that altruistic subjects underbid in Vickrey auctions compared to the BDM mechanism, while spiteful subjects overbid in Vickrey auctions. Flynn et al. (2016) were not able to confirm these predictions, however. While they did observe aggregate underbidding in Vickrey auctions, their results were not driven by the choices of altruistic subjects.

9.6 Hormones

Behavioural economics has embraced the view that we can use the lens of biology to look at economic behaviour. By now there is an accumulating literature suggesting that gender differences in preferences and behaviour can be attributed to hormonal differences between males and females. Given that a number of studies have found that females tend to bid higher than males in auctions (Ham & Kagel, 2006; Casari et al., 2007; Chen et al., 2009; Pearson & Schipper, 2013; but see Demont et al., 2017, who find the opposite result in Africa), Chen et al. (2009) and Pearson & Schipper (2013) examined how the bidding and profits of females differ across the menstrual cycle.
Menstrual cycle information can be used as a proxy for the levels of hormones in females, which naturally fluctuate during the cycle. Chen et al. (2009) found that women bid higher than men in all phases of their menstrual cycle in a first-price auction but not in an SPA. Moreover, for first-price auctions they infer that higher bidding in the follicular phase and lower bidding in the luteal phase are driven entirely by oral hormonal contraceptives. Pearson & Schipper (2013) report that naturally cycling women bid significantly higher than men and earn significantly lower profits than men (in a first-price auction), except during the midcycle (when fecundity is highest). They also found that women who use hormonal contraceptives bid significantly higher and earn substantially lower profits than men. The correlation they found between the use of hormonal contraceptives and bidding or profits, however, may be due to a selection effect or to the hormones contained in contraceptives. All hormonal contraceptives contain synthetic versions of the sex hormone progesterone, and some also contain a version of oestradiol. So, the evidence is far from conclusive. Schipper (2015) conducted an auction experiment in which he collected salivary steroid hormones such as testosterone, oestradiol, progesterone and cortisol. He suggested that testosterone may affect bidding and profits via risk aversion. Basal testosterone has also been found to be positively correlated with 'aggression', which may be another channel through which basal testosterone affects bidding in auctions. The results indicate that females bid significantly higher and earn significantly lower profits than males. Moreover, females who use hormonal contraceptives bid significantly higher and earn significantly lower profits. With respect to salivary basal hormones, Schipper (2015) found that bids are significantly positively correlated, and profits negatively correlated, with basal salivary progesterone, but only in females who do not use hormonal contraceptives. He did not find significant correlations between bidding or profits and salivary basal testosterone, oestradiol or cortisol. As one of our reviewers suggested, a relatively unexplored hormone in the context of auctions is ghrelin. Ghrelin (the 'hunger hormone') is known to regulate appetite and plays an important role in regulating reward cognition in dopamine neurons (Inui et al., 2004; Naleid et al., 2005; Burger & Berner, 2014). Given the potential of hunger to affect bidding behaviour in auctions dealing with food products (Briz et al., 2015; but see also Aarœ & Petersen, 2013, for research on the effect of hunger on social and time preferences) and the fact that studies have found WTP to be higher in the morning and sometimes to drop after tasting (see citations in the review of Demont & Ndour, 2015), disentangling the effect of appetite on bidding behaviour could be another area for future research. Tang et al. (2011) provide some suggestive findings that ghrelin can indeed be responsible for increased WTP for food items.

9.7 Sensory cues

Many experimental auction studies conducted by agricultural and applied economists are focused on food products. The literature, however, has remained ambiguous as to whether sensory cues, such as taste or smell, need to be measured in experimental auctions used to elicit consumers' WTP for food.
While a number of studies have included sensory tests in their auctions, many of these studies did not completely isolate the role of taste in bidding behaviour (e.g. Lusk et al., 2001; Umberger et al., 2002; Feuz et al., 2004; Umberger & Feuz, 2004; Platter et al., 2005; Holmquist et al., 2012; Drichoutis et al., 2017a). Typically, in these studies the subjects were asked to taste the food products and were then asked to bid on them, so the effect of taste was not directly tested. In other studies that included taste, a within-subjects design was used in which subjects were progressively given more information about the foods in the auction and then eventually allowed to taste the food products prior to one of the later bidding rounds (Melton et al., 1996; Bi et al., 2012; Demont et al., 2012, 2013; Akaichi et al., 2017).35 One study that directly tested the effect of taste in experimental auctions is Lewis et al. (2016b). They used a between-subjects design and were able to compare bids from an auction where subjects tasted the products with bids from an auction where subjects did not taste the products. Their results suggest that taste influences bids, and they concluded that it would be valuable to include taste in auction designs when evaluating consumers' WTP for food products. Studies that try to isolate the effect of sensory cues other than taste are even rarer, e.g. the effect of food smells (olfactory cues) or the effect of touch for foods that are consumed with the hand (haptic cues). A recent study used controlled laboratory experiments to experimentally manipulate the ambient scent of the laboratory with a citrus fragrance (Kechagia & Drichoutis, 2017). The authors found that subjects who participated in an SPA inside the scented room were willing to pay up to 49 per cent more than subjects who were not exposed to the scent. In addition to the importance of sensory cues, there is also evidence that the presence of the good to be auctioned matters. For example, Bushong et al. (2010) elicited valuations using the BDM mechanism under three different conditions: (i) text displays, (ii) image displays and (iii) displays of the actual items. They found that subjects' bids were 40–61 per cent larger in the real display condition than in the image and text display conditions. Their findings underscore the salience of the 'tangibility' issue in experimental auctions. In particular, follow-up experiments in Bushong et al. (2010) suggest that the presence of real items could trigger preprogrammed consummatory Pavlovian processes that could lead to more appetitive behaviours.

9.8 Attention

Product evaluation can be influenced by the amount of attention paid to product stimuli, which can be linked to eye movement (see Orquin & Mueller Loose, 2013, for an overview of studies). When an individual looks at a stimulus, attention is paid to the stimulus (Wedel & Pieters, 2000). Lewis et al. (2016a) used eye tracking to measure how attention to brand, package attributes and product information impacts consumer WTP for branded energy drinks. They found evidence that attention can explain the variation in consumers' WTP for branded energy drinks containing different sweeteners. In another study, Rihn & Yue (2016) examined the impact of extrinsic cues (specifically production method, origin and nutrient content claim labels) on consumers' WTP for processed foods (apple juice and salad mix) using an experimental auction in combination with eye-tracking analysis.
Their results suggest that consumers' visual attention increases for important product attributes that positively or negatively impact their WTP bids. Orquin et al. (2018) review studies of eye movements in consumer decision making and show that subjects are highly susceptible to visual biases such as visual salience, emotional valence and the number of information elements.

10 Discussion and conclusions

This review provided a state-of-the-art discussion of best practices in the design and execution of experimental auctions, with the aim of improving the cost–benefit and welfare analyses, theory tests and marketing recommendations derived from their findings. Here, in the final section, we offer a brief list of recommendations distilled from our discussion above and then conclude with some suggested areas for future research. Our first recommendation relates to power analysis. Sample size calculations should be performed at the design stage, and these calculations should be used as stopping rules once the desired sample size has been achieved. Given finite resources, adopting this practice will probably lead to simpler designs (i.e. fewer treatments) in order to ensure a sufficient number of subjects (and thus sufficient power) for the main questions of interest. It will also result in WTP estimates that are more precisely measured and treatment effects that are more reliably identified. Another added benefit is that it should improve the credibility and publishability of null results. Second, practice or training rounds in auctions should be more systematically analysed when the aim of the study is to causally identify treatment effects. In order to causally identify a treatment effect, a sufficiently large sample size is needed to ensure that randomisation to treatment was successful. One way to judge whether randomisation was successful is to analyse data from the training rounds as a form of placebo test (Rosenbaum, 2002: 214), i.e. a test that examines the effect of the treatment on a variable known to be unaffected by the cause of interest. We believe that systematic adoption of this practice will increase confidence in the underlying causal mechanisms. Another important motivation for conducting practice and training rounds is that people often have misconceptions about auction mechanisms, which can lead to non-truthful value revelation. Practice auctions, whether with homegrown or induced values, should always be employed, unless inexperience with the mechanism is desirable. Quizzes and simple, detailed, easy-to-understand instructions should be provided to subjects as well. All this material should then be made available to reviewers to allow them to directly assess the experimental design. This would also contribute to greater transparency of research practices. The use of multiple price lists (e.g. Andersen et al., 2006; Klain et al., 2014) or non-hypothetical choice experiments (Lusk & Schroeder, 2004; Alfnes et al., 2006) can be seen as an attempt to utilise more easily understood mechanisms, with the trade-off being less precise information about consumers' WTP than the auction methods that have been the focus of this review. When the experimental design includes multiple rounds of bidding, research results suggest that it is best to avoid providing price feedback between rounds, since this could lead to a number of adverse effects on bidding behaviour.
In addition, data on beliefs about the prices of field substitutes should be collected when possible so that they can be incorporated in subsequent econometric analysis. Field auction studies remain under-utilised, and more of these studies should be conducted and compared with laboratory settings in order to highlight the role of the natural field context. One factor preventing the wider adoption of experimental auctions in field settings is the need to form auction groups. This is one reason why the BDM mechanism has been more popular in field settings than the Vickrey auction. However, given the limitations of the BDM mechanism discussed above, a second-price Vickrey auction could be conducted even with just two subjects in a group. Alternatively, a group can be formed later, with payments and product delivery occurring at a later time; this was the approach used in a few previous studies (e.g. List & Lucking-Reiley, 2000). Research exploiting field-relevant variables in field settings to test hypotheses of interest is also lacking. For example, the natural variation in experience between dealers and non-dealers of sports cards was a key manipulated factor in early field valuation experiments (List & Shogren, 1998; List & Lucking-Reiley, 2000). A field setting of particular interest to food and agricultural economists that remains largely unexploited is farmers' markets (e.g. Toler et al., 2009). Given the growth in the use of experimental auctions, one might suspect that all the low-hanging fruit (research-wise) has been picked. We are more optimistic about the possibilities for new discoveries. What remains to be done? Although mentioned in Lusk & Shogren (2007), there remains a dearth of studies showcasing the external validity of auction studies. To our knowledge, there has been no comparison of experimental auction behaviour with scanner data, for example. There have been a number of such comparisons for other methods, such as choice experiments (e.g. Lusk et al., 2006b; Brooks & Lusk, 2010; Chang et al., 2009), but not for auctions. That is, we need studies that compare auction-generated data with real-world purchases. If auction studies are shown to have good external validity, this will boost researchers' confidence in promoting this value elicitation tool in academic and business circles. There is also much to learn about how values in auctions are influenced by social networks. Demont et al. (2013) and Richards et al. (2014) both show that people's values are significantly influenced by others' values. Understanding how beliefs and new information filter through such networks is key to understanding the acceptance of technology, the effects of media scares and the success of advertising or new product introductions. An avenue that has not yet been significantly explored in the experimental auction literature, although it is a road already taken by other subfields of economics, is the possibility of massively increasing sample sizes by running experiments online.36 Auctions pose an additional challenge because bids must be submitted simultaneously; however, the interactive nature of experiments is not impossible to accommodate online (e.g. Arechar et al., 2018).
Furthermore, studies now consistently show that data obtained online (e.g. via Amazon's Mechanical Turk; but see Dreyfyss (2018) for a warning and some remedies) are not significantly harmed by the lack of control over the conditions under which the responses are recorded (Johnson & Ryan, 2018). A concern for experimental auctions conducted for food items is the low value of the good. Subjects with low induced values tend to bid further away from their induced value (e.g. Drichoutis et al., 2015) than subjects assigned high induced values, and in mechanisms such as the SPA, incentives for truthful value revelation are weaker for lower-value subjects (Lusk et al., 2007). This implies that auctions with low-value items are more likely to suffer from measurement error, which could be partly due to the lower cost of misbehaving for low-value items. How can we increase the rather weak incentives for accurate preference revelation in experimental auctions? Cason & Plott (2014) show that although many subjects did not bid their induced value, they did state their correct valuation in a second round of bidding after being confronted with their mistake, rereading the instructions and receiving feedback. This is consistent with the findings of Malone & Lusk (2018), who, in a discrete choice experiment, provided feedback to inattentive respondents who were subsequently given the opportunity to re-answer a 'trap question' that checks for attentiveness. In Malone & Lusk (2018), individuals who do not correctly revise their responses after missing a trap question have significantly different choice patterns from individuals who correctly answer the trap question. Therefore, nudging individuals towards their true preference could be a way forward for induced value auctions. However, in homegrown value auctions we do not know what a true preference is. In this case, tools from the CV literature could be tested, such as cheap talk scripts, budget constraint reminders and consequentiality scripts (see Drichoutis et al., 2017b, for details on such scripts). Given that the effectiveness of these tools in experimental auctions is not yet known, their proper evaluation is another area of future research worth exploring. There may also be alternative mechanisms that can further sharpen the gradient between participants' bid space and their payoff space. Experimental auctions allow us to elicit WTP or other valuation measures that provide a mapping of preferences onto the monetary space. A key assumption of classical economic analysis is the stability of preferences: individual preferences are considered to be stable over time. Andersen et al. (2008) argue that the appeal of the assumption of stable preferences lies in the ability to assign causation between changing opportunity sets and choices in comparative statics exercises or, in Stigler & Becker's (1977) words, in the claim that 'no significant behaviour has been illuminated by assumptions of differences in tastes'. Harrison et al. (2005) note that if preferences are volatile with respect to the passage of time, researchers and policy makers relying on out-of-sample predictions should worry about their conclusions. However, we know little about the individual and aggregate stability of WTP values over time (although see Lusk, 2017, for a comparison of repeated choice experiments over time, or studies such as Dillaway et al., 2011, and Shogren et al., 2000, that conducted experimental auctions with the same participants at different points in time).
Prospective studies that repeatedly elicit consumer valuations over time for a range of products using experimental auctions could provide some evidence (especially if compared with other elicitation mechanisms) about the appropriateness of auctions as a value elicitation tool that satisfies the assumptions of economic theory. Finally, the rise of the behavioural economics literature clearly shows that decision-making errors and biases can be identified in experimental auction settings. The much more difficult issue is what we do with these biases. One stream of research has sought to develop methods or techniques to eliminate the biases, also referred to as 'debiasing' (e.g. Cherry et al., 2003; Shogren, 2006; Kovalsky & Lusk, 2013), with the presumption that more stable, well-informed, market-disciplined preferences are most suitable for cost–benefit analysis. However, many of the products we are interested in valuing are new or non-market goods (if they were traditional market goods, we could use conventional demand estimation methods applied to revealed preference data). Understanding the process by which people learn and update their preferences for these novel products, even if those preferences are unstable, is important. The findings of studies that examined behavioural biases have undermined some of the conceptual foundations of welfare economics (Just, 2017; Lusk, 2014) but have also raised interesting, researchable questions about how people might act on others' behavioural biases (e.g. Lusk et al., 2014) or respond to paternalism from others (Debnam, 2017; Just & Hanks, 2015).

Acknowledgments

We would like to thank Uri Simonsohn and Jay Corrigan for helpful suggestions. Two reviewers and the editor provided comments and suggestions that greatly improved the manuscript.

Footnotes

1 In the endowment approach, subjects are endowed with one product and are asked their WTP to exchange the endowed product for an upgraded product. In the full bidding approach, subjects bid on the base and upgraded products simultaneously. The endowment approach probably involves a higher cost for the experimenter, given that many units of the good(s) have to be developed as mock-up products or purchased, because most subjects will take home their endowed product.

2 In brief, the basic assumptions are the Stable Unit Treatment Value assumption (SUTVA) and the independence assumption. SUTVA states that the treatment assignment of one person does not affect the potential outcomes of others and that treatments are stable. SUTVA could be violated if, for example, subjects that received the treatment discuss their participation in the experiment with subjects that did not receive the treatment, so that the outcome of the control group is affected by the information exchange between the two groups. Treatment stability in SUTVA would be violated if, for example, the experimenter delivers an informational treatment and, because s/he improvises by not reading from a written script, different subjects are exposed to different pieces of information. The independence assumption posits that if the experimenter assigns treatment at random, then assignment to treatment is statistically independent of any other variable. For a more thorough discussion of the invoked assumptions and some additional assumptions required for within-subjects designs, see Holland (1986) and West & Thoemmes (2010).

3 It can be useful to view randomisation through the lens of identification via instrumental variable (IV) regressions. A valid IV is one that is highly correlated with the explanatory variable of interest and can affect the outcome variable only through this explanatory variable. A coin flip (or random draw) is a perfect IV; it completely determines assignment to treatment or control but is not directly related to the outcome variable.
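A minimal simulation sketch of this point follows (the data-generating process is invented here purely for illustration and is not taken from the paper or the cited studies): a coin flip instruments for treatment take-up under selective non-compliance, so the naive treated-versus-untreated contrast is biased while the IV (Wald) ratio recovers the true effect.

```python
# A minimal sketch (invented data-generating process): a coin flip z is a
# valid instrument for treatment d because it shifts take-up but is unrelated
# to the unobserved taste u that also drives the outcome y.
import random

random.seed(7)
n = 200_000
true_effect = 2.0
z, d, y = [], [], []
for _ in range(n):
    zi = random.randint(0, 1)   # coin-flip assignment
    u = random.gauss(0, 1)      # unobserved confounder
    di = 1 if (zi and u > -1) or (not zi and u > 1) else 0  # selective take-up
    z.append(zi)
    d.append(di)
    y.append(true_effect * di + u + random.gauss(0, 1))

mean = lambda xs: sum(xs) / len(xs)
naive = (mean([yi for yi, di in zip(y, d) if di])
         - mean([yi for yi, di in zip(y, d) if not di]))
wald = ((mean([yi for yi, zi in zip(y, z) if zi])
         - mean([yi for yi, zi in zip(y, z) if not zi]))
        / (mean([di for di, zi in zip(d, z) if zi])
           - mean([di for di, zi in zip(d, z) if not zi])))
print(f"naive contrast: {naive:.2f}, IV (Wald) estimate: {wald:.2f}")
```

Because take-up is correlated with the unobserved taste, the naive contrast overstates the effect, while the Wald ratio stays close to the true value of 2.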
4 Hypothesis testing of imbalance is characterised as superfluous and misleading in the CONSORT (Consolidated Standards of Reporting Trials) statement endorsed by prominent medical journals (BMJ, The Lancet etc.) (Moher et al., 2010: 17).

5 Senn (1994) argues that balance is neither a necessary nor a sufficient condition for the validity of statistical inference. Balance concerns the efficiency of statistical inference, and valid inference depends on correct conditioning. In Senn's (1994) words, '…a conditional analysis of an imbalanced experiment produces a valid inference; an unconditional analysis of a balanced experiment does not'.

6 The only neuroimaging study we are aware of that actually employed a two-person Vickrey auction is the study by Delgado et al. (2008). In this study, subjects met their opponent, who bid according to pre-defined strategies, before entering the scanner. The subject in the scanner randomly received one of four induced values and could then only select one of four bidding options. In an interesting variant of the BDM mechanism, Lehner et al. (2017) use motor effort as the currency: subjects bid how many seconds they would be willing to apply 50 per cent of their maximal grip force in order to receive the displayed reward of either money or food.

7 de Quidt et al. (2018) explicitly induce experimenter demand effects by telling participants that they expect high or low actions. By applying demand treatments to both the control and treatment groups, de Quidt et al. (2018) show how researchers can reduce or even eliminate bias due to experimenter demand. An application of this technique in auctions would provide valuable insights as to the magnitude of experimenter demand effects in the experimental auctions landscape.

8 As one of the reviewers noted, there are important economies of scope to be gained in moving from single- to multiple-product auctions: the experimenter can construct external validity checks by including products that are readily available on the market to check whether WTP is consistent with market prices.

9 In addition, results from studies in developed countries may not always be applicable to developing countries. For example, Depositario et al. (2014) revisit the house money effect and find that it is not relevant for a student sample in the Philippines.

10 There are cases where the experimenter would need advice from local personnel on how best to structure the design. For example, as noted by one of our reviewers, there could be cases where morality is challenged when making participants pay from their own pockets for the auctioned good. This would be a good argument for using, for example, the endowment approach rather than the full bidding approach, because in the endowment approach participants at least receive a product and only pay price premiums out of their own pockets (or from the show-up fee).

11 As a side note, according to Brodeur et al. (2016), Fisher (perhaps apocryphally) decided to establish the 5 per cent significance level since he was earning 5 per cent in royalties for his publications.
12 A Type III error, typically not one that researchers often deal with, occurs when a researcher produces the right answer to the wrong question (Kimball, 1957). Kennedy (2002) warns that this is not to be confused with psychologists' Type III error (Kaiser, 1960), which is concerned with concluding significance in the wrong direction.

13 P-hacking refers to the practice of monitoring the data recording process or the outcomes of an experiment and choosing when to stop recording data, which variables to report, which comparisons to make, which observations to exclude and which statistical methods to use in order to reach a p-value below 0.05. Brodeur et al. (2018) surveyed thousands of hypothesis tests reported in top economics journals in 2015 and show that selective publication and p-hacking are a substantial problem in research employing difference-in-differences methods and instrumental variables, while randomised control trials and regression discontinuity designs are less problematic.

14 Clemens (2015) proposes the terms 'verification' and 'reproduction' to distinguish between replications and the terms 'reanalysis' and 'extension' to distinguish between robustness exercises.

15 For psychological studies, 36 per cent of the replications yielded statistically significant findings, while the mean effect size in the replications was approximately half the magnitude of the mean effect size of the original effects (Open Science Collaboration, 2015). For economic science studies, Camerer et al. (2016) found a significant effect in the same direction as in the original study for 11 replications (roughly 61 per cent), while, on average, the replicated effect size was 66 per cent of the original. More recently, Camerer et al. (2018) replicated 21 experimental studies in the social sciences published in Nature and Science between 2010 and 2015 and found a significant effect in the same direction as the original study for 13 studies (62 per cent), while the effect size of the replications was on average about 50 per cent of the original effect size.

16 Interestingly, Gigerenzer et al. (2004) mention that the Journal of the Experimental Analysis of Behavior and the Journal of Mathematical Psychology were launched as a way to escape The Journal of Experimental Psychology editor's policy that made null hypothesis testing a necessary condition for the acceptance of papers and small p-values the hallmark of excellent experimentation.

17 The P-curve is the distribution of statistically significant p-values for a set of studies. Its shape is diagnostic of when one can rule out selective reporting as the sole explanation of a set of findings. P-curves that are left-skewed suggest the presence of intense p-hacking, i.e. researchers file-drawer the subsets of analyses that produce non-significant results.
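The optional-stopping flavour of p-hacking described in footnote 13 (and its footprint in footnote 17) is easy to reproduce by simulation. The sketch below (an invented illustration, not an analysis from any of the cited papers) tests a true null after every ten observations and stops as soon as p < 0.05; the realised false positive rate ends up well above the nominal 5 per cent:

```python
# A minimal sketch (invented illustration): peeking at a two-sided one-sample
# z test after every `step` observations and stopping at the first p < alpha
# inflates the false positive rate even though the true mean is zero.
import random
from statistics import NormalDist, mean, stdev

def significant_with_peeking(max_n=200, step=10, alpha=0.05):
    xs = []
    while len(xs) < max_n:
        xs += [random.gauss(0, 1) for _ in range(step)]  # the null is true
        zstat = mean(xs) / (stdev(xs) / len(xs) ** 0.5)
        if 2 * (1 - NormalDist().cdf(abs(zstat))) < alpha:
            return True                                  # stop at 'significance'
    return False

random.seed(3)
trials = 2_000
rate = sum(significant_with_peeking() for _ in range(trials)) / trials
print(f"false positive rate with optional stopping: {rate:.1%}")  # well above 5%
```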
18 A 'funnel plot' is a scatterplot designed to check for the existence of publication bias by depicting a treatment effect against a measure of study size, such as the total sample size, the standard error of the treatment effect or the inverse variance of the treatment effect. The 'fail-safe' method consists of an algorithm in which an overall z-score is computed by summing individual z-scores and dividing by the square root of the number of scores; the 'fail-safe' is the number of studies needed to bring a significant overall p-level up to some critical level such as 0.05. The 'excessive-significance' test is an exploratory test for examining whether there is an excess of published statistically significant results as compared with what their true proportion should be in a body of evidence. See Simonsohn et al. (2014) for details on the limitations of these approaches.

19 For example, Simonsohn et al. (2013) argue that the null result obtained in the replication study by Maniadis et al. (2014) is just a noisy estimate and that the relative effect size is comparable to the original study by Ariely et al. (2003).

20 In addition, the pressure of using the criterion of statistical significance may have led published research to systematically overestimate effect sizes (Lane & Dunlap, 1978) and report inflated effects (Fanelli & Ioannidis, 2013). Analysing a large number of test statistics (>50,000) published in the American Economic Review, the Journal of Political Economy and the Quarterly Journal of Economics between 2005 and 2011, Brodeur et al. (2016) found a misallocation pattern in the distribution of the test statistics consistent with inflation bias. That is, researchers inflate the value of almost-rejected tests by choosing a slightly more 'significant' specification, which amounts to 10 to 20 per cent among the tests that are close to the significance threshold. They do not find that this problem arises in randomised control trials or laboratory experiments. Furthermore, Brodeur et al. (2016) mention that inflation is quite low in articles with a theoretical model compared to articles that do not offer an explicit theoretical contribution, which suggests that experimental studies that are stronger on theory suffer from fewer problems. As a side note, a consequence of the inflation bias is that if a replication study is powered based on the effect size of the original study, the power of the replicated study will be lower than intended (Simonsohn, 2015). Simonsohn (2015) suggests that the sample size of the replication study should be set to 2.5 times that of the original study (based on the effect size that would have given the original study 33 per cent power).

21 This is easy to show mathematically. Taking the partial derivative of $n$ with respect to $M$ in Equation (1) gives $$\frac{\partial n}{\partial M}=\frac{2(z_{1-\alpha/2}+z_{1-\beta})^2(\rho-1)}{M^2\left(\frac{\mu_0-\mu_1}{\sigma}\right)^2},$$ and because $-1<\rho<1$, it follows that $\frac{\partial n}{\partial M}<0$, which is to say that $n$ and $M$ are inversely related. In addition, $$\frac{\partial^2 n}{\partial M\,\partial\rho}=\frac{2(z_{1-\alpha/2}+z_{1-\beta})^2}{M^2\left(\frac{\mu_0-\mu_1}{\sigma}\right)^2}>0.$$

22 There is a tendency to describe results with near-threshold p-values as 'approaching' significance (Pritschet et al., 2016), which is consistent with treating significance as a continuum. This is a statistically flawed practice. In a popular blog post, Hankins (2013) lists more than 500 linguistic terms that researchers use in order to report results that fail the significance test. In fact, there is a web application (called Signify: http://perma.cc/MX9X-KA5Z) that lets one type in the p-value of a result and returns a label that makes that p-value sound significant.

23 In Stata, for example, Silverman's (1981) test is available via the silvtest user-written package.
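A numerical sketch of footnote 21 follows, assuming that Equation (1) is the standard repeated-measurements sample size formula $n = 2(z_{1-\alpha/2}+z_{1-\beta})^2\,[1+(M-1)\rho]\,/\,[M((\mu_0-\mu_1)/\sigma)^2]$ (an assumption on our part, since the equation itself appears earlier in the paper). It shows that the required $n$ per group falls as the number of measurements $M$ rises, and more steeply the lower the intra-subject correlation $\rho$:

```python
# A minimal sketch of the sample size formula assumed above: subjects per
# group for a two-sided test at significance level alpha and the stated power.
from math import ceil
from statistics import NormalDist

def n_per_group(delta_over_sigma, M, rho, alpha=0.05, power=0.80):
    z = NormalDist().inv_cdf
    za, zb = z(1 - alpha / 2), z(power)
    return ceil(2 * (za + zb) ** 2 * (1 + (M - 1) * rho)
                / (M * delta_over_sigma ** 2))

for rho in (0.2, 0.8):
    for M in (1, 2, 5):
        # effect size (mu0 - mu1)/sigma = 0.5, i.e. a 'medium' effect
        print(f"rho={rho}, M={M}: n per group = {n_per_group(0.5, M, rho)}")
```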
24 It is a straightforward matter to extend this logic to more than two goods, in which case the individual would be expected to choose the product that provides the highest net value. It is also possible to calculate shares if WTP premiums are elicited rather than full bids for each item (see Lusk, 2010).

25 Linear multi-level and mixed models are straightforward to estimate via the mixed command in Stata. Non-linear models can be estimated in the latest versions of Stata using, for example, the metobit command in version 15 or the gsem command in version 14.

26 However, see Josephson & Michler (2018) for a discussion of ethical issues facing the profession and a few proposals to address these issues. In addition, one topic in which agricultural economists have been moving faster than other economists is the hotly debated issue of whether deception should be allowed under certain circumstances. For many economists, deception in experiments is a red flag and, because of potential undesirable spillovers to subject pools, is outright banned. In agricultural economics the topic is not taboo and has been brought forth by Rousu et al. (2015) and Colson et al. (2015). Rousu et al. (2015) examined applied economists' views of deception and found a clear consensus regarding which forms of deception are perceived as most and least severe. Colson et al. (2015) find consistent support for banning certain practices (e.g. physical or physiological harm) while allowing others (e.g. providing incomplete information) and advise that the most prudent step the profession can take is to carefully define deception and to detail which potentially deceptive practices are allowed and which are prohibited. In response to the views of the profession, the journals of the Agricultural and Applied Economics Association introduced guidelines that allow researchers to publish articles that use some forms of deception. More recently, the journal Food Policy (Bellemare & Mazzocchi, 2019) invited viewpoint articles (Just, 2019; Lusk, 2019) that shaped the current policy of the journal for a case-by-case review (rather than a blanket rule) when dealing with experimental manuscripts where subjects were deceived. In his viewpoint, Lusk (2019) argues that it is important for journals or professions that ban the use of deception to actually define what practices fall under the ban. Just (2019) suggests that it will be useful to determine the impacts of deception, using empirical evidence to guide research norms and practice. We chose not to expand further on the issue of deception here because we wanted to avoid reiterating what has already appeared in the literature cited above.

27 As one of our reviewers noted, blinded analysis is another technique often used in the life sciences, whereby information on treatments is blinded for the analyst (only anonymous codes are used), which helps prevent results from being 'p-hacked'. Blinded analysis has not gained momentum in the social sciences.

28 To pinpoint the pitfalls of subgroup analysis, Christensen & Miguel (2018) cite a humorous but instructive case. When a collaborative group of researchers was asked by journal editors to report subgroup analysis in a trial of aspirin and streptokinase use after heart attacks, the researchers found that the medication was beneficial, except for patients born under the Libra and Gemini astrological signs, for whom there was a harmful effect. Subgroup analysis is also susceptible to Simpson's paradox, in which a trend is evident in subgroups of data but disappears or reverses in sign when these groups are combined.
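A toy numerical example of Simpson's paradox (numbers invented purely for illustration) shows how the reversal can arise when treatment assignment is correlated with subgroup composition:

```python
# A minimal sketch (invented numbers): the treatment has the higher success
# rate inside each subgroup, yet the lower rate once subgroups are pooled,
# because most treated observations sit in the 'hard' subgroup.
data = {
    # subgroup: {"treated": (successes, n), "control": (successes, n)}
    "easy": {"treated": (9, 10),   "control": (80, 100)},
    "hard": {"treated": (30, 100), "control": (2, 10)},
}

pooled = {"treated": [0, 0], "control": [0, 0]}
for subgroup, arms in data.items():
    for arm, (s, n) in arms.items():
        pooled[arm][0] += s
        pooled[arm][1] += n
        print(f"{subgroup:>6} / {arm:>7}: {s / n:.0%}")

for arm, (s, n) in pooled.items():
    print(f"pooled / {arm:>7}: {s / n:.1%}")  # treated 35.5% vs control 74.5%
```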
29 This is certainly possible, for example, if one runs a computerised experiment, e.g. using zTree. Mock-up data files can be generated before ever running an experiment and read into the statistical program of one's choice, which can help in building the code for the statistical analysis.

30 This is an approach that has long been used in psychology and statistics (see citations in Anderson & Magruder, 2017), as well as for judging predictive fit for model selection in the marketing literature (Erdem, 1996; Roy et al., 1996) and in the economics literature (Norwood et al., 2004; Drichoutis & Lusk, 2014, 2016).

31 The AEA Data Editor is responsible for verifying that the data and codes can reproduce the paper's results. The replication archives will be requested prior to acceptance. Moreover, the Data Editor will attempt to conduct a reproducibility check for data that cannot be deposited to the repository, through a third party who has access to the (confidential or restricted) data.

32 The University of Edinburgh maintains a list of data journals and their policies: https://perma.cc/ZT9Y-7JZF. Two of the journals that are open to data sets from social science research are Data in Brief and Scientific Data.

33 This particular discussion was prompted by an excellent comment by one of our reviewers. In addition, moving out of the lab would also include lab-in-the-field experiments where, for example, the experimenter visits subjects in their households.

34 There is a tendency to interpret deviations from rational choice theory as evidence of non-standard preferences. Cason & Plott (2014) show some evidence that the tension between standard and non-standard theory is further exacerbated by the mistakes subjects make when making decisions. For example, if a subject does not understand well the properties of an SPA and overbidding is observed, then it is likely that this overbidding is still rational from the subject's perspective, in the sense that she may erroneously believe that this is the strategy that will maximise her profits. From the experimenter's perspective, it is important to know if and when such mistakes happen because the experimenter will equate choices with preferences, and she must make sure that revealed preferences are not the result of a mistake.

35 Several of the studies that ask subjects to taste a food product use sensory evaluation questions as a scoring variable to predict WTP. As one of our reviewers noted, asking consumers to score a product on one or several dimensions could prime respondents to think more (favourably) about the product (attributes) than they otherwise would. This is because when people are asked to conceptualise something, they may be predisposed to accept it. For example, in one study from social psychology (Janis & King, 1954), when participants were asked to play a role as the advocate of a given viewpoint, they showed significantly more opinion change than a passive control group. Shen & Wyer (2008) simply asked subjects to decide whether they would choose to buy each of a set of products and found that this disposed subjects to search for favourable attributes before unfavourable ones in an unrelated product evaluation situation.
A potential problem is that if product evaluation questions affect liking scores as well as overall preferences for the product expressed through subjects' bids, then this would represent an unmeasured factor influencing both the liking for the product (in the form of a score) and the bids, which would bias the estimated causal relationship. An experiment examining the effect of asking sensory questions on elicited WTP could quantify the extent of such bias, if any. In the case where the research question only concerns comparing different treatments and sensory evaluation is elicited across all treatments, the potential priming issue would most probably not be a concern in the estimation of treatment effects, since we would be interested in the difference between the treatments (although it could still bias the overall magnitude of WTP). Furthermore, depending on the aim of the experiment, this priming effect could be desirable if it mimics what advertising, labelling, discount pricing etc. are aiming for.

36 See also Katkar & Reiley (2007) and Lucking-Reiley et al. (2007) for early studies on auctions using data from eBay and Augenblick (2016) for a more recent study focused on penny auctions.

Review coordinated by Carl Johan Lagerkvist

References

Aarœ, L. and Petersen, M. B. (2013). Hunger games: fluctuations in blood glucose levels influence support for social welfare. Psychological Science 24(12): 2550–2556.
Abbink, K. and Hennig-Schmidt, H. (2006). Neutral versus loaded instructions in a bribery experiment. Experimental Economics 9(2): 103–121.
Abeler, J., Falk, A., Goette, L. et al. (2011). Reference points and effort provision. American Economic Review 101(2): 470–492.
Akaichi, F., Nayga, J. R. M. and Nalley, L. L. (2017). Are there trade-offs in valuation with respect to greenhouse gas emissions, origin and food miles attributes? European Review of Agricultural Economics 44(1): 3–31.
Alfnes, F. (2009). Valuing product attributes in Vickrey auctions when market substitutes are available. European Review of Agricultural Economics 36(2): 133–149.
Alfnes, F., Guttormsen, A. G., Steine, G. and Kolstad, K. (2006). Consumers' willingness to pay for the color of salmon: a choice experiment with real economic incentives. American Journal of Agricultural Economics 88(4): 1050–1061.
Alfnes, F. and Rickertsen, K. (2007). Extrapolating experimental-auction results using a stated choice survey. European Review of Agricultural Economics 34(3): 345–363.
Altman, D. G. (1985). Comparability of randomised groups. Journal of the Royal Statistical Society, Series D (The Statistician) 34(1): 125–136.
Andersen, S., Harrison, G. W., Lau, M. I. et al. (2006). Elicitation using multiple price list formats. Experimental Economics 9(4): 383–405.
Andersen, S., Harrison, G. W., Lau, M. I. et al. (2008). Lost in state space: are preferences stable? International Economic Review 49(3): 1091–1112.
Anderson, M. L. and Magruder, J. (2017). Split-sample strategies for avoiding false discoveries. National Bureau of Economic Research Working Paper Series No. 23544.
Andreoni, J. (1989). Giving with impure altruism: applications to charity and Ricardian equivalence. Journal of Political Economy 97(6): 1447–1458.
Andreoni, J. (1990). Impure altruism and donations to public goods: a theory of warm-glow giving. The Economic Journal 100(401): 464–477.
Andreoni, J., Che, Y.-K. and Kim, J. (2007). Asymmetric information about rivals' types in standard auctions: an experiment. Games and Economic Behavior 59(2): 240–259.
Arechar, A. A., Gächter, S. and Molleman, L. (2018). Conducting interactive experiments online. Experimental Economics 21(1): 99–131.
Ariely, D., Loewenstein, G. and Prelec, D. (2003). 'Coherent arbitrariness': stable demand curves without stable preferences. Quarterly Journal of Economics 118(1): 73–105.
Ashton, L. (2015). Hunger games: does hunger affect time preferences? https://ssrn.com/abstract=2538740. Last accessed on November 7, 2019.
Astor, P. J., Adam, M. T. P., Jähnig, C. et al. (2013). The joy of winning and the frustration of losing: a psychophysiological analysis of emotions in first-price sealed-bid auctions. Journal of Neuroscience, Psychology, and Economics 6(1): 14–30.
Augenblick, N. (2016). The sunk-cost fallacy in penny auctions. The Review of Economic Studies 83(1): 58–86.
Austin, P. C. (2009). Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples. Statistics in Medicine 28(25): 3083–3107.
Ausubel, L. M. (2004). An efficient ascending-bid auction for multiple objects. American Economic Review 94(5): 1452–1475.
Banerji, A. and Gupta, N. (2014). Detection, identification, and estimation of loss aversion: evidence from an auction experiment. American Economic Journal: Microeconomics 6(1): 91–133.
Bellemare, C., Bissonnette, L. and Kröger, S. (2016). Simulating power of economic experiments: the powerbbk package. Journal of the Economic Science Association 2(2): 157–168.
Bellemare, M. F. and Mazzocchi, M. (2019). Editorial focus: on deception in economic experiments. Food Policy 83: 1.
Belot, M. and James, J. (2014). A new perspective on the issue of selection bias in randomized controlled field experiments. Economics Letters 124(3): 326–328.
Belton, C. A. and Sugden, R. (2018). Attention and novelty: an experimental investigation of order effects in multiple valuation tasks. Journal of Economic Psychology 67: 103–115.
Benjamin, D. J., Berger, J. O., Johannesson, M. et al. (2018). Redefine statistical significance. Nature Human Behaviour 2(1): 6–10.
Bi, X., House, L., Gao, Z. et al. (2012). Sensory evaluation and experimental auctions: measuring willingness to pay for specific sensory attributes. American Journal of Agricultural Economics 94(2): 562–568.
Birol, E., Meenakshi, J. V., Oparinde, A. et al. (2015). Developing country consumers' acceptance of biofortified foods: a synthesis. Food Security 7(3): 555–568.
Bonacich, P., Shure, G. H., Kahan, J. P. et al. (1976). Cooperation and group size in the n-person prisoners' dilemma. The Journal of Conflict Resolution 20(4): 687–706.
Bonanno, A., Huang, R. and Liu, Y. (2014). Simulating welfare effects of the European nutrition and health claims' regulation: the Italian yogurt market. European Review of Agricultural Economics 42(3): 499–533.
Borghans, L., Duckworth, A. L., Heckman, J. J. et al. (2008). The economics and psychology of personality traits. Journal of Human Resources 43(4): 972–1059.
Briz, T., Drichoutis, A. C. and House, L. (2015). Examining projection bias in experimental auctions: the role of hunger and immediate gratification. Agricultural and Food Economics 3(22).
Briz, T., Drichoutis, A. C. and Nayga, R. M., Jr (2017). Randomization to treatment failure in experimental auctions: the value of data from training rounds. Journal of Behavioral and Experimental Economics 71: 56–66.
Brodeur, A., Cook, N. and Heyes, A. (2018). Methods matter: P-hacking and causal inference in economics. IZA Discussion Paper No. 11796.
Brodeur, A., Lé, M., Sangnier, M. et al. (2016). Star wars: the empirics strike back. American Economic Journal: Applied Economics 8(1): 1–32.
Brooks, K. and Lusk, J. L. (2010). Stated and revealed preferences for organic and cloned milk: combining choice experiment and scanner data. American Journal of Agricultural Economics 92(4): 1229–1241.
Burger, K. S. and Berner, L. A. (2014). A functional neuroimaging review of obesity, appetitive hormones and ingestive behavior. Physiology & Behavior 136: 121–127.
Burke, W. J. (2009). Fitting and interpreting Cragg's Tobit alternative using Stata. Stata Journal 9(4): 584–592.
Bushong, B., King, L. M., Camerer, C. F. et al. (2010). Pavlovian processes in consumer choice: the physical presence of a good increases willingness-to-pay. American Economic Review 100(4): 1556–1571.
Button, K. S., Ioannidis, J. P. A., Mokrysz, C. et al. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience 14: 365.
Camerer, C. F., Dreber, A., Forsell, E. et al. (2016). Evaluating replicability of laboratory experiments in economics. Science 351(6280): 1433–1436.
Camerer, C. F., Dreber, A., Holzmeister, F. et al. (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour 2(9): 637–644.
Capra, M. C. (2004). Mood-driven behavior in strategic interactions. American Economic Review 94(2): 367–372.
Capra, M. C., Meer, S. and Lanier, K. (2010). The effects of induced mood on bidding in random nth-price auctions. Journal of Economic Behavior & Organization 75(2): 223–234.
Casari, M., Ham, J. C. and Kagel, J. H. (2007). Selection bias, demographic effects, and ability effects in common value auction experiments. American Economic Review 97(4): 1278–1304.
Cason, T. N. and Plott, C. R. (2014). Misconceptions and game form recognition: challenges to theories of revealed preference and framing. Journal of Political Economy 122(6): 1235–1270.
Chambers, C. D. (2018). Introducing the transparency and openness promotion (TOP) guidelines and badges for open practices at Cortex. Cortex 106: 316–318.
Chang, A. C. and Li, P. (2017). A preanalysis plan to replicate sixty economics research papers that worked half of the time. American Economic Review 107(5): 60–64.
Chang, A. C. and Li, P. (2018). Is economics research replicable? Sixty published papers from thirteen journals say "often not". Critical Finance Review 7.
Chang, J. B., Lusk, J. L. and Norwood, F. B. (2009). How closely do hypothetical surveys and laboratory experiments predict field behavior? American Journal of Agricultural Economics 91(2): 518–534.
Chen, Y., Katuscák, P. and Ozdenoren, E. (2009). Why can't a woman bid more like a man? Games and Economic Behavior 77(1): 181–213.
Cherry, T. L., Crocker, T. D. and Shogren, J. F. (2003). Rationality spillovers. Journal of Environmental Economics and Management 45(1): 63–84.
Christensen, G. and Miguel, E. (2018). Transparency, reproducibility, and the credibility of economics research. Journal of Economic Literature 56(3): 920–980.
Clemens, M. A. (2015). The meaning of failed replications: a review and proposal. Journal of Economic Surveys 31(1): 326–342.
Clingingsmith, D. and Sheremeta, R. M. (2018). Status and the demand for visible goods: experimental evidence on conspicuous consumption. Experimental Economics 21(4): 877–904.
Cochran, W. G. and Rubin, D. B. (1973). Controlling bias in observational studies: a review. Sankhyā: The Indian Journal of Statistics, Series A 35(4): 417–446.
Coffman, L. C. and Niederle, M. (2015). Pre-analysis plans have limited upside, especially where replications are feasible. Journal of Economic Perspectives 29(3): 81–98.
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences, 2nd edn. Hillsdale, NJ, USA: Lawrence Erlbaum Associates.
Colson, G., Corrigan, J. R., Grebitus, C. et al. (2015). Which deceptive practices, if any, should be allowed in experimental economics research? Results from surveys of applied experimental economists and students. American Journal of Agricultural Economics 98(2): 610–621.
Cooper, D. J. and Fang, H. (2008). Understanding overbidding in second price auctions: an experimental study. The Economic Journal 118(532): 1572–1595.
Corgnet, B., Hernán-González, R., Kujal, P. et al. (2014). The effect of earned versus house money on price bubble formation in experimental asset markets. Review of Finance 19(4): 1455–1488.
Corrigan, J. R., Drichoutis, A. C., Lusk, J. L. et al. (2012). Repeated rounds with price feedback in experimental auction valuation: an adversarial collaboration. American Journal of Agricultural Economics 94(1): 97–115.
Cox, J. C., Roberson, B. and Smith, V. L. (1982). Theory and behavior of single object auctions. In: V. L. Smith (ed.), Research in Experimental Economics. Greenwich, Connecticut, USA: JAI Press, Inc.
Cragg, J. G. (1971). Some statistical models for limited dependent variables with application to the demand for durable goods. Econometrica 39(5): 829–844.
Croson, R. (2005). The method of experimental economics. International Negotiation 10(1): 131–148.
Czibor, E., Jimenez-Gomez, D. and List, J. A. (2019). The dozen things experimental economists should do (more of). National Bureau of Economic Research Working Paper Series No. 25451.
Davis, L. R., Joyce, B. P. and Roelofs, M. R. (2010). My money or yours: house money payment effects. Experimental Economics 13(2): 189–205.
de Groote, H., Kimenju, S. C. and Morawetz, U. B. (2011). Estimating consumer willingness to pay for food quality with experimental auctions: the case of yellow versus fortified maize meal in Kenya. Agricultural Economics 42(1): 1–16.
de Quidt, J., Haushofer, J. and Roth, C. (2018). Measuring and bounding experimenter demand. American Economic Review 108(11): 3266–3302.
Deaton, A. and Cartwright, N. (2016). Understanding and misunderstanding randomized controlled trials. National Bureau of Economic Research Working Paper No. 22595.
Debnam, J. (2017). Selection effects and heterogeneous demand responses to the Berkeley soda tax vote. American Journal of Agricultural Economics 99(5): 1172–1187.
Delgado, M. R., Schotter, A., Ozbay, E. Y. et al. (2008). Understanding overbidding: using the neural circuitry of reward to design economic auctions. Science 321(5897): 1849–1852.
Demont, M., Fiamohe, R. and Kinkpé, A. T. (2017). Comparative advantage in demand and the development of rice value chains in West Africa. World Development 96: 578–590.
Google Scholar Crossref Search ADS WorldCat Demont , M. and Ndour , M. ( 2015 ). Upgrading rice value chains: experimental evidence from 11 African markets . Global Food Security 5 : 70 – 76 . Google Scholar Crossref Search ADS WorldCat Demont , M. , Rutsaert , P. , Ndour , M. et al. . ( 2013 ). Experimental auctions, collective induction and choice shift: willingness-to-pay for rice quality in Senegal . European Review of Agricultural Economics 40 ( 2 ): 261 – 286 . Google Scholar Crossref Search ADS WorldCat Demont , M. , Zossou , E. , Rutsaert , P. et al. . ( 2012 ). Consumer valuation of improved rice parboiling technologies in Benin . Food Quality and Preference 23 ( 1 ): 63 – 70 . Google Scholar Crossref Search ADS WorldCat Depositario , D. P. T. , Nayga , R. M. , Jr , Zhang , Y. Y. et al. . ( 2014 ). Revisiting cash endowment and house money effects in an experimental auction of a novel Agri-food product in the Philippines . Asian Economic Journal 28 ( 2 ): 201 – 215 . Google Scholar Crossref Search ADS WorldCat Dewald , W. G. , Thursby , J. G. and Anderson , R. G. ( 1986 ). Replication in empirical economics: the Journal of Money, Credit and Banking project . The American Economic Review 76 ( 4 ): 587 – 603 . OpenURL Placeholder Text WorldCat Diggle , P. J. , Heagerty , P. , Liang , K.-Y. et al. . ( 2002 ). Analysis of Longitudinal Data , 2nd edn. New York, USA : Oxford University Press Inc. Google Scholar Google Preview OpenURL Placeholder Text WorldCat COPAC Dillaway , R. , Messer , K. D. , Bernard , J. C. et al. . ( 2011 ). Do consumer responses to media food safety information last? Applied Economic Perspectives and Policy 33 ( 3 ): 363 – 383 . Google Scholar Crossref Search ADS WorldCat Dreyfyss , E. ( 2018 ). A bot panic hits Amazon’s mechanical Turk . Wired . https://perma.cc/42U2-AQ58. OpenURL Placeholder Text WorldCat Drichoutis , A. C. , Klonaris , S. and Papoutsi , G. S. ( 2017a ). Do good things come in small packages? Bottle size effects on willingness to pay for pomegranate wine and grape wine . Journal of Wine Economics 12 ( 1 ): 84 – 104 . Google Scholar Crossref Search ADS WorldCat Drichoutis , A. C. and Lusk , J. L. ( 2014 ). Judging statistical models of individual decision making under risk using in- and out-of-sample criteria . PLoS ONE 9 ( 7 ): e102269. OpenURL Placeholder Text WorldCat Drichoutis , A. C. and Lusk , J. L. ( 2016 ). What can multiple price lists really tell us about risk preferences? Journal of Risk and Uncertainty 53 ( 2 ): 89 – 106 . Google Scholar Crossref Search ADS WorldCat Drichoutis , A. C. , Lusk , J. L. and Nayga , R. M. ( 2015 ). The veil of experimental currency units in second price auctions . Journal of the Economic Science Association 1 ( 2 ): 182 – 196 . Google Scholar Crossref Search ADS WorldCat Drichoutis , A. C. , Nayga , R. M. J. , Lusk , J. L. et al. . ( 2012 ). When a risky prospect is valued more than its best possible outcome . Judgment and Decision Making 7 ( 1 ): 1 – 18 . OpenURL Placeholder Text WorldCat Drichoutis , A. C. , A. Vassilopoulos and J. L. Lusk ( 2014 ). Consumers’ willingness to pay for agricultural products certified to ensure fair working conditions . Report to the John S. Latsis Public Benefit Foundation. https://perma.cc/LFP7-XBJM. Drichoutis , A. C. , Vassilopoulos , A. , Lusk , J. L. et al. . ( 2017b ). Consumer preferences for fair labour certification . European Review of Agricultural Economics 44 ( 3 ): 455 – 474 . Google Scholar Crossref Search ADS WorldCat Durand-Morat , A. , Wailes , E. J. 
, Nayga , J. et al. . ( 2015 ). Challenges of conducting contingent valuation studies in developing countries . American Journal of Agricultural Economics 98 ( 2 ): 597 – 609 . Google Scholar Crossref Search ADS WorldCat Duval , S. and Tweedie , R. ( 2000 ). Trim and fill: a simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis . Biometrics 56 ( 2 ): 455 – 463 . Google Scholar Crossref Search ADS PubMed WorldCat Dyer , D. and Kagel , J. H. ( 1996 ). Bidding in common value auctions: how the commercial construction industry corrects for the winner’s curse . Management Science 42 ( 10 ): 1463 – 1475 . Google Scholar Crossref Search ADS WorldCat Egger , M. , Smith , G. D. , Schneider , M. et al. . ( 1997 ). Bias in meta-analysis detected by a simple, graphical test . BMJ 315 ( 7109 ): 629 – 634 . Google Scholar Crossref Search ADS PubMed WorldCat Ehmke , M. and Shogren , J. F. ( 2010 ). The experimental mindset within development economics: proper use and handling are everything . Applied Economic Perspectives and Policy 32 ( 4 ): 549 – 563 . Google Scholar Crossref Search ADS WorldCat Ehmke , M. D. , Lusk , J. L. and List , J. A. ( 2008 ). Is hypothetical bias a universal phenomenon? A multinational investigation . Land Economics 84 ( 3 ): 489 – 500 . Google Scholar Crossref Search ADS WorldCat Ellis , P. D. ( 2010 ). The Essential Guide to Effect Sizes: Statistical Power, Meta-Analysis, and the Interpretation of Research Results . Cambridge : Cambridge University Press . Google Scholar Crossref Search ADS Google Scholar Google Preview WorldCat COPAC Ellison , B. , Lusk , J. L. and Davis , D. ( 2014 ). The impact of restaurant calorie labels on food choice: results from a field experiment . Economic Inquiry 52 ( 2 ): 666 – 681 . Google Scholar Crossref Search ADS WorldCat Erdem , T. ( 1996 ). A dynamic analysis of market structure based on panel data . Marketing Science 15 ( 4 ): 359 – 378 . Google Scholar Crossref Search ADS WorldCat Fafchamps , M. and Labonne , J. ( 2016 ). Using split samples to improve inference about causal effects . National Bureau of Economic Research Working Paper Series No. 21842 . OpenURL Placeholder Text WorldCat Fanelli , D. and Ioannidis , J. P. A. ( 2013 ). US studies may overestimate effect sizes in softer research . Proceedings of the National Academy of Sciences . 110(37) : 15031 – 15036 . Feuz , D. M. , Umberger , W. J. , Calkins , C. R. et al. . ( 2004 ). U.S. consumers’ willingness to pay for flavor and tenderness in steaks as determined with an experimental auction . Journal of Agricultural and Resource Economics 29 ( 3 ): 501 – 516 . OpenURL Placeholder Text WorldCat Fidler , F. , Thomason , N. , Cumming , G. et al. . ( 2004 ). Editors can lead researchers to confidence intervals, but can’t make them think: statistical reform lessons from medicine . Psychological Science 15 ( 2 ): 119 – 126 . Google Scholar Crossref Search ADS PubMed WorldCat Fisher , R. A. ( 1925 ). Statistical Methods for Research Workers . Edinburgh : Oliver and Boyd . Google Scholar Google Preview OpenURL Placeholder Text WorldCat COPAC Flynn , N. , Kah , C. and Kerschbamer , R. ( 2016 ). Vickrey auction vs BDM: difference in bidding behaviour and the impact of other-regarding motives . Journal of the Economic Science Association 2 ( 2 ): 101 – 108 . Google Scholar Crossref Search ADS WorldCat Fox , J. A. , Shogren , J. F. , Hayes , D. J. et al. . ( 1998 ). CVM-X: calibrating contingent values with experimental auction markets . 
American Journal of Agricultural Economics 80 ( 3 ): 455 – 465 . Google Scholar Crossref Search ADS WorldCat Gelman , A. ( 2019 ). Don’t calculate post-hoc power using observed estimate of effect size . Annals of Surgery 269 ( 1 ): e9-e10 . Google Scholar Crossref Search ADS PubMed WorldCat Gelman , A. and Carlin , J. ( 2014 ). Beyond power calculations: assessing type S (sign) and type M (magnitude) errors . Perspectives on Psychological Science 9 ( 6 ): 641 – 651 . Google Scholar Crossref Search ADS PubMed WorldCat Georganas , S. , Levin , D. and McGee , P. ( 2017 ). Optimistic irrationality and overbidding in private value auctions . Experimental Economics 20 ( 4 ): 772 – 792 . Google Scholar Crossref Search ADS WorldCat Gigerenzer , G. , Krauss , S. and Vitouch , O. ( 2004 ). The Null Ritual: What You Always Wanted to Know About Significance Testing but Were Afraid to Ask . Thousand Oaks, CA : SAGE Publications, Inc. , 391 – 408 . Google Scholar Google Preview OpenURL Placeholder Text WorldCat COPAC Gill , J. ( 2018 ). Comments from the new editor . Political Analysis 26 ( 1 ): 1 – 2 . Google Scholar Crossref Search ADS WorldCat Gneezy , A. ( 2016 ). Field experimentation in marketing research . Journal of Marketing Research 54 ( 1 ): 140 – 143 . Google Scholar Crossref Search ADS WorldCat Gracia , A. , Loureiro , M. L. and Nayga , J. R. M. ( 2011 ). Are valuations from nonhypothetical choice experiments different from those of experimental auctions? American Journal of Agricultural Economics 93 ( 5 ): 1358 – 1373 . Google Scholar Crossref Search ADS WorldCat Grebitus , C. , Lusk , J. L. and Nayga , R. M. ( 2013 ). Explaining differences in real and hypothetical experimental auctions and choice experiments with personality . Journal of Economic Psychology 36 ( Supplement C ): 11 – 26 . Google Scholar Crossref Search ADS WorldCat Grether , D. M. , Plott , C. R. , Rowe , D. B. et al. . ( 2007 ). Mental processes and strategic equilibration: an fMRI study of selling strategies in second price auctions . Experimental Economics 10 ( 2 ): 105 – 122 . Google Scholar Crossref Search ADS WorldCat Ham , J. C. and Kagel , J. H. ( 2006 ). Gender effects in private value auctions . Economics Letters 92 ( 3 ): 375 – 382 . Google Scholar Crossref Search ADS WorldCat Hankins , M. ( 2013 ). Still not significant . Probable Error. http://perma.cc/Z6B9-PHCV. OpenURL Placeholder Text WorldCat Harrington , D. , D’Agostino , R. B. , Gatsonis , C. et al. . ( 2019 ). New guidelines for statistical reporting in the journal . New England Journal of Medicine 381 ( 3 ): 285 – 286 . Google Scholar Crossref Search ADS PubMed WorldCat Harrison , G. W. ( 1989 ). Theory and misbehavior of first-price auctions . The American Economic Review 79 ( 4 ): 749 – 762 . OpenURL Placeholder Text WorldCat Harrison , G. W. ( 1992 ). Theory and misbehavior of first-price auctions: reply . The American Economic Review 82 ( 5 ): 1426 – 1443 . OpenURL Placeholder Text WorldCat Harrison , G. W. ( 2006 ). Experimental evidence on alternative environmental valuation methods . Environmental & Resource Economics 36 : 125 – 162 . Google Scholar Crossref Search ADS WorldCat Harrison , G. W. , Johnson , E. , McInnes , M. M. et al. . ( 2005 ). Temporal stability of estimates of risk aversion . Applied Financial Economics Letters 1 ( 1 ): 31 – 35 . Google Scholar Crossref Search ADS WorldCat Harrison , G. W. and List , J. A. ( 2004 ). Field experiments . Journal of Economic Literature 42 ( 4 ): 1009 – 1055 . 
Google Scholar Crossref Search ADS WorldCat Hedges , L. V. ( 1981 ). Distribution theory for Glass’s estimator of effect size and related estimators . Journal of Educational Statistics 6 ( 2 ): 107 – 128 . Google Scholar Crossref Search ADS WorldCat Ho , D. E. , Imai , K. , King , G. et al. . ( 2007 ). Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference . Political Analysis 15 ( 3 ): 199 – 236 . Google Scholar Crossref Search ADS WorldCat Hoenig , J. M. and Heisey , D. M. ( 2001 ). The abuse of power . The American Statistician 55 ( 1 ): 19 – 24 . Google Scholar Crossref Search ADS WorldCat Holland , P. W. ( 1986 ). Statistics and causal inference . Journal of the American Statistical Association 81 ( 396 ): 945 – 960 . Google Scholar Crossref Search ADS WorldCat Holmquist , C. , McCluskey , J. and Ross , C. ( 2012 ). Consumer preferences and willingness to pay for oak attributes in Washington chardonnays . American Journal of Agricultural Economics 94 ( 2 ): 556 – 561 . Google Scholar Crossref Search ADS WorldCat Horowitz , J. K. ( 2006 ). The Becker–DeGroot–Marschak mechanism is not necessarily incentive compatible, even for non-random goods . Economics Letters 93 ( 1 ): 6 – 11 . Google Scholar Crossref Search ADS WorldCat Huck , S. , Normann , H.-T. and Oechssler , J. ( 2004 ). Two are few and four are many: number effects in experimental oligopolies . Journal of Economic Behavior & Organization 53 ( 4 ): 435 – 446 . Google Scholar Crossref Search ADS WorldCat Imai , K. , King , G. and Stuart , E. A. ( 2008 ). Misunderstandings between experimentalists and observationalists about causal inference . Journal of the Royal Statistical Society: Series A (Statistics in Society) 171 ( 2 ): 481 – 502 . Google Scholar Crossref Search ADS WorldCat Imbens , G. W. and Rubin , D. B. ( 2016 ). Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction . Cambridge and New York : Cambridge University Press . Google Scholar Google Preview OpenURL Placeholder Text WorldCat COPAC Imbens , G. W. and Wooldridge , J. M. ( 2009 ). Recent developments in the econometrics of program evaluation . Journal of Economic Literature 47 ( 1 ): 5 – 86 . Google Scholar Crossref Search ADS WorldCat Inui , A. , Asakawa , A. , Bowers , C. Y. et al. . ( 2004 ). Ghrelin, appetite, and gastric motility: the emerging role of the stomach as an endocrine organ . The FASEB Journal 18 ( 3 ): 439 – 456 . Google Scholar Crossref Search ADS PubMed WorldCat Ioannidis , J. P. A. and Trikalinos , T. A. ( 2007 ). An exploratory test for an excess of significant findings . Clinical Trials 4 ( 3 ): 245 – 253 . Google Scholar Crossref Search ADS PubMed WorldCat Jacquemet , N. , Joule , R.-V. , Luchini , S. et al. . ( 2009 ). Earned wealth, engaged bidders? Evidence from a second-price auction . Economics Letters 105 ( 1 ): 36 – 38 . Google Scholar Crossref Search ADS WorldCat Janis , I. L. and King , B. T. ( 1954 ). The influence of role playing on opinion change . The Journal of Abnormal and Social Psychology 49 ( 2 ): 211 – 218 . Google Scholar Crossref Search ADS WorldCat Johnson , D. and Ryan , J. ( 2018 ). Amazon mechanical Turk workers can provide consistent and economically meaningful data . Munich Personal RePEc Archive No. 88450 . OpenURL Placeholder Text WorldCat Johnson , E. J. , Steffel , M. and Goldstein , D. G. ( 2005 ). Making better decisions: from measuring to constructing preferences . Health Psychology 24 ( 4 ): S17 – S22 . 
Google Scholar Crossref Search ADS PubMed WorldCat Josephson , A. and Michler , J. D. ( 2018 ). Viewpoint: beasts of the field? Ethics in agricultural and applied economics . Food Policy 79 : 1 – 11 . Google Scholar Crossref Search ADS WorldCat Just , D. R. ( 2017 ). The behavioral welfare paradox: practical, ethical and welfare implications of nudging . Agricultural and Resource Economics Review 46 ( 1 ): 1 – 20 . Google Scholar Crossref Search ADS WorldCat Just , D. R. ( 2019 ). Viewpoint: is the ban on deception necessary or even desirable? Food Policy 83 : 5 – 6 . Google Scholar Crossref Search ADS WorldCat Just , D. R. and Hanks , A. S. ( 2015 ). The hidden cost of regulation: emotional responses to command and control . American Journal of Agricultural Economics 97 ( 5 ): 1385 – 1399 . Google Scholar Crossref Search ADS WorldCat Kagel , J. H. , Harstad , R. M. and Levin , D. ( 1987 ). Information impact and allocation rules in auctions with affiliated private values: a laboratory study . Econometrica 55 ( 6 ): 1275 – 1304 . Google Scholar Crossref Search ADS WorldCat Kagel , J. H. and Levin , D. ( 1993 ). Independent private value auctions: bidder behaviour in first-, second- and third-price auctions with varying numbers of bidders . The Economic Journal 103 ( 419 ): 868 – 879 . Google Scholar Crossref Search ADS WorldCat Kaiser , H. F. ( 1960 ). Directional statistical decisions . Psychological Review 67 ( 3 ): 160 – 167 . Google Scholar Crossref Search ADS PubMed WorldCat Kang , M. and Camerer , C. ( 2013 ). Fmri evidence of a hot-cold empathy gap in hypothetical and real aversive choices . Frontiers in Neuroscience 7 : 104 . OpenURL Placeholder Text WorldCat Karni , E. and Safra , Z. ( 1987 ). “Preference reversal” and the observability of preferences by experimental methods . Econometrica 55 ( 3 ): 675 – 685 . Google Scholar Crossref Search ADS WorldCat Katkar , R. and Reiley , D. H. ( 2007 ). Public versus secret reserve prices in ebay auctions: results from a pokemon field experiment . The B.E. Journal of Economic Analysis & Policy 5 ( 2 ). OpenURL Placeholder Text WorldCat Kechagia , V. and Drichoutis , A. C. ( 2017 ). The effect of olfactory sensory cues on willingness to pay and choice under risk . Journal of Behavioral and Experimental Economics 70 : 33 – 46 . Google Scholar Crossref Search ADS WorldCat Kennedy , P. E. ( 2002 ). Sinning in the basement: what are the rules? The ten commandments of applied econometrics . Journal of Economic Surveys 16 ( 4 ): 569 – 589 . Google Scholar Crossref Search ADS WorldCat Kenny , D. A. ( 1987 ). Chapter 13: The two-group design . In: Statistics for the Social and Behavioral Sciences . Little, Brown , 203 – 223 . Google Scholar Google Preview OpenURL Placeholder Text WorldCat COPAC Keser , C. , Ehrhart , K.-M. and Berninghaus , S. K. ( 1998 ). Coordination and local interaction: experimental evidence . Economics Letters 58 ( 3 ): 269 – 275 . Google Scholar Crossref Search ADS WorldCat Kessler , J. B. and Meier , S. ( 2014 ). Learning from (failed) replications: cognitive load manipulations and charitable giving . Journal of Economic Behavior & Organization 102 : 10 – 13 . Google Scholar Crossref Search ADS WorldCat Kimball , A. W. ( 1957 ). Errors of the third kind in statistical consulting . Journal of the American Statistical Association 52 ( 278 ): 133 – 142 . Google Scholar Crossref Search ADS WorldCat Klain , T. J. , Lusk , J. L. , Tonsor , G. T. et al. . ( 2014 ). An experimental approach to valuing information . 
Agricultural Economics 45 ( 5 ): 635 – 648 . Google Scholar Crossref Search ADS WorldCat Kline , R. B. ( 2013 ). Beyond Significance Testing: Statistics Reform in the Behavioral Sciences . American Psychological Association . Google Scholar Crossref Search ADS Google Scholar Google Preview WorldCat COPAC Kovalsky , K. L. and Lusk , J. L. ( 2013 ). Do consumers really know how much they are willing to pay? Journal of Consumer Affairs 47 ( 1 ): 98 – 127 . Google Scholar Crossref Search ADS WorldCat Kupper , L. L. and Hafner , K. B. ( 1989 ). How appropriate are popular sample size formulas? The American Statistician 43 ( 2 ): 101 – 105 . OpenURL Placeholder Text WorldCat Lakens , D. , Adolfi , F. G. , Albers , C. J. et al. . ( 2018 ). Justify your alpha . Nature Human Behaviour 2 ( 3 ): 168 – 171 . Google Scholar Crossref Search ADS WorldCat Lane , D. M. and Dunlap , W. P. ( 1978 ). Estimating effect size: bias resulting from the significance criterion in editorial decisions . British Journal of Mathematical and Statistical Psychology 31 ( 2 ): 107 – 112 . Google Scholar Crossref Search ADS WorldCat Lee , J. Y. and Fox , J. A. ( 2015 ). Bidding behavior in experimental auctions with positive and negative values . Economics Letters 136 : 151 – 153 . Google Scholar Crossref Search ADS WorldCat Lee , J. Y. , Nayga , R. M. J. , Deck , C. et al. . ( 2017 ). Cognitive ability and bidding behavior in second price auctions: an experimental study . Munich Personal RePEc Archive No. 81495 . OpenURL Placeholder Text WorldCat Lehner , R. , Balsters , J. H. , Herger , A. et al. . ( 2017 ). Monetary, food, and social rewards induce similar Pavlovian-to-instrumental transfer effects . Frontiers in Behavioral Neuroscience 10 ( 247 ). OpenURL Placeholder Text WorldCat Lerner , J. , Small , D. and Loewenstein , G. ( 2004 ). Heart strings and purse strings: carryover effects of emotions on economic decisions . Psychological Science 15 ( 5 ): 337 – 341 . Google Scholar Crossref Search ADS PubMed WorldCat Lewis , K. E. , Grebitus , C. and Nayga , R. M. ( 2016a ). The impact of brand and attention on consumers’ willingness to pay: evidence from an eye tracking experiment . Canadian Journal of Agricultural Economics/Revue Canadienne d’Agroeconomie 64 ( 4 ): 753 – 777 . Google Scholar Crossref Search ADS WorldCat Lewis , K. E. , Grebitus , C. and Nayga , R. M. ( 2016b ). The importance of taste in experimental auctions: consumers’ valuation of calorie and sweetener labeling of soft drinks . Agricultural Economics 47 ( 1 ): 47 – 57 . Google Scholar Crossref Search ADS WorldCat Li , S. ( 2017 ). Obviously strategy-proof mechanisms . American Economic Review 107 ( 11 ): 3257 – 3287 . Google Scholar Crossref Search ADS WorldCat Lichtenstein , S. and Slovic , P. ( 2006 ). The Construction of Preference . Cambridge, UK : Cambridge University Press . Google Scholar Crossref Search ADS Google Scholar Google Preview WorldCat COPAC Linder , N. S. , Uhl , G. , Fliessbach , K. et al. . ( 2010 ). Organic labeling influences food valuation and choice . NeuroImage 53 ( 1 ): 215 – 220 . Google Scholar Crossref Search ADS PubMed WorldCat List , J. A. ( 2003 ). Does market experience eliminate market anomalies? The Quarterly Journal of Economics 118 ( 1 ): 41 – 71 . Google Scholar Crossref Search ADS WorldCat List , J. A. ( 2004 ). Neoclassical theory versus prospect theory: evidence from the marketplace . Econometrica 72 ( 2 ): 615 – 625 . Google Scholar Crossref Search ADS WorldCat List , J. A. and Lucking-Reiley , D. ( 2000 ). 
Demand reduction in multiunit auctions: evidence from a sportscard field experiment . The American Economic Review 90 ( 4 ): 961 – 972 . Google Scholar Crossref Search ADS WorldCat List , J. A. , Sadoff , S. and Wagner , M. ( 2011 ). So you want to run an experiment, now what? Some simple rules of thumb for optimal experimental design . Experimental Economics 14 ( 4 ): 439 . Google Scholar Crossref Search ADS WorldCat List , J. A. and Shogren , J. F. ( 1998 ). Calibration of the difference between actual and hypothetical valuations in a field experiment . Journal of Economic Behavior & Organization 37 ( 2 ): 193 – 205 . Google Scholar Crossref Search ADS WorldCat Liu , H. and Wu , T. ( 2005 ). Sample size calculation and power analysis of time-averaged difference . Journal of Modern Applied Statistical Methods 4 ( 2 ): 434 – 445 . Google Scholar Crossref Search ADS WorldCat Lucking-Reiley , D. , Bryan , D. , Prasad , N. et al. . ( 2007 ). Pennies from eBay: the determinants of price in online auctions . The Journal of Industrial Economics 55 ( 2 ): 223 – 233 . Google Scholar Crossref Search ADS WorldCat Lusk , J. ( 2010 ). Experimental auction markets for studying consumer preferences . In: S. R. Jaeger and H. MacFie (eds) , Consumer-Driven Innovation in Food and Personal Care Products, Woodhead Publishing Series in Food Science, Technology and Nutrition . Woodhead Publishing , 332 – 357 . Google Scholar Crossref Search ADS Google Scholar Google Preview WorldCat COPAC Lusk , J. L. ( 2014 ). Are you smart enough to know what to eat? A critique of behavioural economics as justification for regulation . European Review of Agricultural Economics 41 ( 3 ): 355 – 373 . Google Scholar Crossref Search ADS WorldCat Lusk , J. L. ( 2017 ). Consumer research with big data: applications from the food demand survey (foods) . American Journal of Agricultural Economics 99 ( 2 ): 303 – 320 . Google Scholar Crossref Search ADS WorldCat Lusk , J. L. ( 2019 ). Viewpoint: the costs and benefits of deception in economic experiments . Food Policy 83 : 2 – 4 . Google Scholar Crossref Search ADS WorldCat Lusk , J. L. , Alexander , C. and Rousu , M. C. ( 2007 ). Designing experimental auctions for marketing research: the effect of values, distributions, and mechanisms on incentives for truthful bidding . Review of Marketing Science 5 ( 1 ). OpenURL Placeholder Text WorldCat Lusk , J. L. , Daniel , M. S. , Mark , D. R. et al. . ( 2001 ). Alternative calibration and auction institutions for predicting consumer willingness to pay for nongenetically modified corn chips . Journal of Agricultural and Resource Economics 26 ( 1 ): 40 – 57 . OpenURL Placeholder Text WorldCat Lusk , J. L. and Fox , J. A. ( 2002 ). Consumer demand for mandatory labeling of beef from cattle administered growth hormones or fed genetically modified corn . Journal of Agricultural and Applied Economics 34 ( 1 ): 27 – 38 . Google Scholar Crossref Search ADS WorldCat Lusk , J. L. and Fox , J. A. ( 2003 ). Value elicitation in retail and laboratory environments . Economics Letters 79 ( 1 ): 27 – 34 . Google Scholar Crossref Search ADS WorldCat Lusk , J. L. , Marette , S. and Norwood , F. B. ( 2014 ). The paternalist meets his match . Applied Economic Perspectives and Policy 36 ( 1 ): 61 – 108 . Google Scholar Crossref Search ADS WorldCat Lusk , J. L. , Norwood , F. B. and Pruitt , J. R. ( 2006a ). Consumer demand for a ban on antibiotic drug use in pork production . American Journal of Agricultural Economics 88 ( 4 ): 1015 – 1033 . 
Google Scholar Crossref Search ADS WorldCat Lusk , J. L. , Pruitt , J. R. and Norwood , B. ( 2006b ). External validity of a framed field experiment . Economics Letters 93 ( 2 ): 285 – 290 . Google Scholar Crossref Search ADS WorldCat Lusk , J. L. and Schroeder , T. C. ( 2004 ). Are choice experiments incentive compatible? A test with quality differentiated beef steaks . American Journal of Agricultural Economics 86 ( 2 ): 467 – 482 . Google Scholar Crossref Search ADS WorldCat Lusk , J. L. and Schroeder , T. C. ( 2006 ). Auction bids and shopping choices . Advances in Economic Analysis & Policy 6 ( 1 ). OpenURL Placeholder Text WorldCat Lusk , J. L. and Shogren , J. F. ( 2007 ). Experimental Auctions: Methods and Applications in Economic and Marketing Research . Cambridge, UK : Cambridge University Press . Google Scholar Crossref Search ADS Google Scholar Google Preview WorldCat COPAC Malone , T. and Lusk , J. L. ( 2018 ). Releasing the trap: a method to reduce inattention bias in survey data with application to U.S. beer taxes . Economic Inquiry 57 ( 1 ): 584 – 599 . Google Scholar Crossref Search ADS WorldCat Maniadis , Z. , Tufano , F. and List , J. A. ( 2014 ). One swallow doesn’t make a summer: new evidence on anchoring effects . American Economic Review 104 ( 1 ): 277 – 290 . Google Scholar Crossref Search ADS WorldCat Marwell , G. and Schmitt , D. R. ( 1972 ). Cooperation in a three-person prisoner’s dilemma . Journal of Personality and Social Psychology 21 ( 3 ): 376 – 383 . Google Scholar Crossref Search ADS WorldCat Maxwell , S. E. and Delaney , H. D. ( 2004 ). Designing Experiments and Analyzing Data: A Model Comparison Perspective , 2nd edn. Mahwah, N.J. : Lawrence Erlbaum Associates . Google Scholar Google Preview OpenURL Placeholder Text WorldCat COPAC Mazar , N. , Koszegi , B. and Ariely , D. ( 2013 ). True context-dependent preferences? The causes of market-dependent valuations . Journal of Behavioral Decision Making 27(3) : 200 – 208 . OpenURL Placeholder Text WorldCat McCloskey , D. N. and Ziliak , S. T. ( 1996 ). The standard error of regressions . Journal of Economic Literature 34 ( 1 ): 97 – 114 . OpenURL Placeholder Text WorldCat McCullough , B. , McGeary , K. A. and Harrison , T. D. ( 2008 ). Do economics journal archives promote replicable research? Canadian Journal of Economics/Revue Canadienne d’Economique 41 ( 4 ): 1406 – 1420 . Google Scholar Crossref Search ADS WorldCat McCullough , B. D. , McGeary , K. A. and Harrison , T. D. ( 2006 ). Lessons from the JMCB archive . Journal of Money, Credit and Banking 38 ( 4 ): 1093 – 1107 . Google Scholar Crossref Search ADS WorldCat McShane , B. B. , Gal , D. , Gelman , A. et al. . ( 2019 ). Abandon statistical significance . The American Statistician 73 ( supp.1 ): 235 – 245 . Google Scholar Crossref Search ADS WorldCat Melton , B. E. , Huffman , W. E. , Shogren , J. F. et al. . ( 1996 ). Consumer preferences for fresh food items with multiple quality attributes: evidence from an experimental auction of pork chops . American Journal of Agricultural Economics 78 ( 4 ): 916 – 923 . Google Scholar Crossref Search ADS WorldCat Miguel , E. , Camerer , C. , Casey , K. et al. . ( 2014 ). Promoting transparency in social science research . Science 343 ( 6166 ): 30 – 31 . Google Scholar Crossref Search ADS PubMed WorldCat Moffatt , P. G. ( 2016 ). Experimetrics: Econometrics for Experimental Economics . London : Palgrave . Google Scholar Google Preview OpenURL Placeholder Text WorldCat COPAC Moher , D. , Hopewell , S. 
, Schulz , K. F. et al. . ( 2010 ). CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials . BMJ 340 : c332 . Google Scholar Crossref Search ADS PubMed WorldCat Morgan , J. , Steiglitz , K. and Reis , G. ( 2003 ). The spite motive and equilibrium behavior in auctions . Contributions in Economic Analysis & Policy 2 ( 1 ). OpenURL Placeholder Text WorldCat Muller , L. , Anne , L. , Jayson , L. et al. . ( 2017 ). Distributional impacts of fat taxes and thin subsidies . The Economic Journal 127 ( 604 ): 2066 – 2092 . Google Scholar Crossref Search ADS WorldCat Mutz , D. C. , Pemantle , R. and Pham , P. ( 2019 ). The perils of balance testing in experimental design: messy analyses of clean data . The American Statistician 73 ( 1 ): 32 – 42 . Google Scholar Crossref Search ADS WorldCat Naleid , A. M. , Grace , M. K. , Cummings , D. E. et al. . ( 2005 ). Ghrelin induces feeding in the mesolimbic reward pathway between the ventral tegmental area and the nucleus accumbens . Peptides 26 ( 11 ): 2274 – 2279 . Google Scholar Crossref Search ADS PubMed WorldCat Niederle , M. and Vesterlund , L. ( 2007 ). Do women shy away from competition? Do men compete too much? The Quarterly Journal of Economics 122 ( 3 ): 1067 – 1101 . Google Scholar Crossref Search ADS WorldCat Niederle , M. and Vesterlund , L. ( 2011 ). Gender and competition . Annual Review of Economics 3 ( 1 ): 601 – 630 . Google Scholar Crossref Search ADS WorldCat Norwood , B. F. , Roberts , M. C. and Lusk , J. L. ( 2004 ). Ranking crop yield models using out-of-sample likelihood functions . American Journal of Agricultural Economics 86 ( 4 ): 1032 – 1043 . Google Scholar Crossref Search ADS WorldCat Nosek , B. A. , Alter , G. , Banks , G. C. et al. . ( 2015 ). Promoting an open research culture . Science 348 ( 6242 ): 1422 – 1425 . Google Scholar Crossref Search ADS PubMed WorldCat Nosenzo , D. , Quercia , S. and Sefton , M. ( 2015 ). Cooperation in small groups: the effect of group size . Experimental Economics 18 ( 1 ): 4 – 14 . Google Scholar Crossref Search ADS WorldCat Nuzzo , R. ( 2014 ). Scientific method: statistical errors –p values, the ‘gold standard’ of statistical validity, are not as reliable as many scientists assume . Nature 506 : 150 – 152 . Google Scholar Crossref Search ADS PubMed WorldCat Olken , B. A. ( 2015 ). Promises and perils of pre-analysis plans . Journal of Economic Perspectives 29 ( 3 ): 61 – 80 . Google Scholar Crossref Search ADS WorldCat Open Science Collaboration ( 2015 ). Estimating the reproducibility of psychological science . Science 349 ( 6251 ). doi: 10.1126/science.aac4716 . OpenURL Placeholder Text WorldCat Crossref Orquin , J. L. and Mueller Loose , S. ( 2013 ). Attention and choice: a review on eye movements in decision making . Acta Psychologica 144 ( 1 ): 190 – 206 . Google Scholar Crossref Search ADS PubMed WorldCat Orquin , J. L. , Perkovic , S. and Grunert , K. G. ( 2018 ). Visual biases in decision making . Applied Economic Perspectives and Policy 40 ( 4 ): 523 – 537 . Google Scholar Crossref Search ADS WorldCat Orwin , R. G. ( 1983 ). A fail-safe N for effect size in meta-analysis . Journal of Educational Statistics 8 ( 2 ): 157 – 159 . OpenURL Placeholder Text WorldCat Parkhurst , G. M. , Shogren , J. F. and Dickinson , D. L. ( 2004 ). Negative values in Vickrey auctions . American Journal of Agricultural Economics 86 ( 1 ): 222 – 235 . Google Scholar Crossref Search ADS WorldCat Payne , J. W. , Bettman , J. R. and Schkade , D. A. 
( 1999 ). Measuring constructed preferences: towards a building code . Journal of Risk and Uncertainty 19 ( 1 ): 243 – 270 . Google Scholar Crossref Search ADS WorldCat Pearson , M. and Schipper , B. C. ( 2013 ). Menstrual cycle and competitive bidding . Games and Economic Behavior 78 : 1 – 20 . Google Scholar Crossref Search ADS WorldCat Plassmann , H. , O’Doherty , J. and Rangel , A. ( 2007 ). Orbitofrontal cortex encodes willingness to pay in everyday economic transactions . The Journal of Neuroscience 27 ( 37 ): 9984 – 9988 . Google Scholar Crossref Search ADS PubMed WorldCat Platter , W. J. , Tatum , J. D. , Belk , K. E. et al. . ( 2005 ). Effects of marbling and shear force on consumers’ willingness to pay for beef strip loin steaks . Journal of Animal Science 83 ( 4 ): 890 – 899 . Google Scholar Crossref Search ADS PubMed WorldCat Pritschet , L. , Powell , D. and Horne , Z. ( 2016 ). Marginally significant effects as evidence for hypotheses: changing attitudes over four decades . Psychological Science 27 ( 7 ): 1036 – 1042 . Google Scholar Crossref Search ADS PubMed WorldCat Framework , R. E. ( 2019 ) Panel criteria and working methods . REF 2019/02 . Richards , T. J. , Hamilton , S. F. and Allender , W. J. ( 2014 ). Social networks and new product choice . American Journal of Agricultural Economics 96 ( 2 ): 489 – 516 . Google Scholar Crossref Search ADS WorldCat Rihn , A. L. and Yue , C. ( 2016 ). Visual attention’s influence on consumers’ willingness-to-pay for processed food products . Agribusiness 32 ( 3 ): 314 – 328 . Google Scholar Crossref Search ADS WorldCat Roe , B. E. and Just , D. R. ( 2009 ). Internal and external validity in economics research: tradeoffs between experiments, field experiments, natural experiments, and field data . American Journal of Agricultural Economics 91 ( 5 ): 1266 – 1271 . Google Scholar Crossref Search ADS WorldCat Roider , A. and Schmitz , P. W. ( 2012 ). Auctions with anticipated emotions: overbidding, underbidding, and optimal reserve prices . The Scandinavian Journal of Economics 114 ( 3 ): 808 – 830 . OpenURL Placeholder Text WorldCat Rosato , A. and Tymula , A. A. ( 2016 ). Loss aversion and competition in Vickrey auctions: money ain’t no good . The University of Sydney, Economics Working Paper Series 2016-09 . Rosenbaum , P. R. ( 2002 ). Observational Studies , 2nd edn. New York : Springer . Google Scholar Crossref Search ADS Google Scholar Google Preview WorldCat COPAC Rosenboim , M. and Shavit , T. ( 2012 ). Whose money is it anyway? Using prepaid incentives in experimental economics to create a natural environment . Experimental Economics 15 ( 1 ): 145 – 157 . Google Scholar Crossref Search ADS WorldCat Rosenthal , R. ( 1979 ). The file drawer problem and tolerance for null results . Psychological Bulletin 86 ( 3 ): 638 – 641 . Google Scholar Crossref Search ADS WorldCat Rousu , M. C. , Colson , G. , Corrigan , J. R. et al. . ( 2015 ). Deception in experiments: towards guidelines on use in applied economics research . Applied Economic Perspectives and Policy 37 ( 3 ): 524 – 536 . Google Scholar Crossref Search ADS WorldCat Rowe , D. B. ( 2001 ). Bayesian source separation for reference function determination in fMRI . Magnetic Resonance in Medicine 46 ( 2 ): 374 – 378 . Google Scholar Crossref Search ADS PubMed WorldCat Roy , R. , Chintagunta , P. K. and Haldar , S. ( 1996 ). A framework for investigating habits, “the hand of the past,” and heterogeneity in dynamic brand choice . Marketing Science 15 ( 3 ): 280 – 299 . 
Google Scholar Crossref Search ADS WorldCat Rubin , D. B. ( 1974 ). Estimating causal effects of treatments in randomized and nonrandomized studies . Journal of Educational Psychology 66 ( 5 ): 688 – 701 . Google Scholar Crossref Search ADS WorldCat Rubin , D. B. ( 1990 ). [On the application of probability theory to agricultural experiments. Essay on principles. Section 9.] Comment: Neyman (1923) and causal inference in experiments and observational studies . Statistical Science 5 ( 4 ): 472 – 480 . Google Scholar Crossref Search ADS WorldCat Sadrieh , A. ( 2019 ). Outlet for null effects . Online ESA Experimental Methods Discussion comment . https://groups.google.com/d/msg/esa-discuss/oUojJzi7Fg4/Nnu04KshBgAJ. Last accessed on November 7, 2019. Schipper , B. C. ( 2015 ). Sex hormones and competitive bidding . Management Science 61 ( 2 ): 249 – 486 . Google Scholar Crossref Search ADS WorldCat Selya , A. S. , Rose , J. S. , Dierker , L. C. et al. . ( 2012 ). A practical guide to calculating Cohen’s |${f}^2$|⁠, a measure of local effect size, from PROC MIXED . Frontiers in Psychology 3 : 111 . Google Scholar Crossref Search ADS PubMed WorldCat Senn , S. ( 1994 ). Testing for baseline balance in clinical trials . Statistics in Medicine 13 ( 17 ): 1715 – 1726 . Google Scholar Crossref Search ADS PubMed WorldCat Senn , S. ( 2013 ). Seven myths of randomisation in clinical trials . Statistics in Medicine 32 ( 9 ): 1439 – 1450 . Google Scholar Crossref Search ADS PubMed WorldCat Shen , H. and Wyer , R. S. , Jr ( 2008 ). Procedural priming and consumer judgments: effects on the impact of positively and negatively valenced information . Journal of Consumer Research 34 ( 5 ): 727 – 737 . Google Scholar Crossref Search ADS WorldCat Shogren , J. F. ( 2006 ). Valuation in the lab . Environmental & Resource Economics 34 : 163 – 172 . Google Scholar Crossref Search ADS WorldCat Shogren , J. F. , List , J. A. and Hayes , D. J. ( 2000 ). Preference learning in consecutive experimental auctions . American Journal of Agricultural Economics 82 ( 4 ): 1016 – 1021 . Google Scholar Crossref Search ADS WorldCat Shogren , J. F. , M. Margolis , C. Koo , and J. A. List ( 2001 ). A random |$n$|th-price auction. Journal of Economic Behavior and Organization 46 ( 4 ): 409 – 421 . Crossref Search ADS Silverman , B. W. ( 1981 ). Using kernel density estimates to investigate multimodality . Journal of the Royal Statistical Society. Series B (Methodological) 43 ( 1 ): 97 – 99 . Google Scholar Crossref Search ADS WorldCat Simmons , J. , Nelson , L. and Simonsohn , U. ( 2012 ). A 21 word solution . Dialogue: The Official Newsletter of the Society for Personality and Social Psychology 26 ( 2 ): 4 – 7 . OpenURL Placeholder Text WorldCat Simmons , J. P. , Nelson , L. D. and Simonsohn , U. ( 2011 ). False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant . Psychological Science 22 ( 11 ): 1359 – 1366 . Google Scholar Crossref Search ADS PubMed WorldCat Simonsohn , U. ( 2015 ). Small telescopes: detectability and the evaluation of replication results . Psychological Science 26 ( 5 ): 559 – 569 . Google Scholar Crossref Search ADS PubMed WorldCat Simonsohn , U. ( 2018 ). Re: Preregistration for lab experiment? Economic science association experimental methods discussion Google group . https://goo.gl/FQ5W2U. Last accessed on November 7, 2019. Simonsohn , U. , Nelson , L. D. and Simmons , J. P. ( 2014 ). P-curve: a key to the file-drawer . 
Journal of Experimental Psychology: General 143 ( 2 ): 534 – 547 . Google Scholar Crossref Search ADS PubMed WorldCat Simonsohn , U. , Simmons , J. P. and Nelson , L. D. ( 2013 ). Anchoring is not a false-positive: Maniadis, Tufano, and List’s (2014) ‘failure-to-replicate’ is actually entirely consistent with the original . Working paper . http://dx.doi.org/10.2139/ssrn.2351926. Last accessed on November 7, 2019. OpenURL Placeholder Text WorldCat Slovic , P. ( 1995 ). The construction of preference . American Psychologist 50 ( 5 ): 364 – 371 . Google Scholar Crossref Search ADS WorldCat Smith , V. L. ( 1982 ). Microeconomic systems as an experimental science . The American Economic Review 72 ( 5 ): 923 – 955 . OpenURL Placeholder Text WorldCat Sousa , Y. F. D. and Munro , A. ( 2012 ). Truck, barter and exchange versus the endowment effect: virtual field experiments in an online game environment . Journal of Economic Psychology 33 ( 3 ): 482 – 493 . Google Scholar Crossref Search ADS WorldCat Speed , T. P. ( 1990 ). Introductory remarks on Neyman (1923) . Statistical Science 5 ( 4 ): 463 – 464 . Google Scholar Crossref Search ADS WorldCat Splawa-Neyman , J. , Dabrowska , D. M. and Speed , T. P. ( 1990 ). On the application of probability theory to agricultural experiments. Essay on principles. Section 9 . Statistical Science 5 ( 4 ): 465 – 472 . Google Scholar Crossref Search ADS WorldCat Stigler , G. J. and Becker , G. S. ( 1977 ). De Gustibus non Est Disputandum . The American Economic Review 67 ( 2 ): 76 – 90 . OpenURL Placeholder Text WorldCat Tang , D. W. , Han , J. E. , Rangel , A. et al. . ( 2011 ). Ghrelin administration in humans increases bids for food items while decreasing bids for non-food items . Appetite 57 : S42 . Google Scholar Crossref Search ADS WorldCat Toler , S. , Briggeman , B. C. , Lusk , J. L. et al. . ( 2009 ). Fairness, farmers markets, and local production . American Journal of Agricultural Economics 91 ( 5 ): 1272 – 1278 . Google Scholar Crossref Search ADS WorldCat Trafimow , D. and Marks , M. ( 2015 ). Editorial . Basic and Applied Social Psychology 37 ( 1 ): 1 – 2 . Google Scholar Crossref Search ADS WorldCat Tyson-Carr , J. , Kokmotou , K. , Soto , V. et al. . ( 2018 ). Neural correlates of economic value and valuation context: an event-related potential study . Journal of Neurophysiology 119 ( 5 ): 1924 – 1933 . Google Scholar Crossref Search ADS PubMed WorldCat Umberger , W. J. and Feuz , D. M. ( 2004 ). The usefulness of experimental auctions in determining consumers’ willingness-to-pay for quality-differentiated products . Applied Economic Perspectives and Policy 26 ( 2 ): 170 – 185 . OpenURL Placeholder Text WorldCat Umberger , W. J. , Feuz , D. M. , Calkins , C. R. et al. . ( 2002 ). U.S. consumer preference and willingness-to-pay for domestic corn-fed beef versus international grass-fed beef measured through an experimental auction . Agribusiness 18 ( 4 ): 491 – 504 . Google Scholar Crossref Search ADS WorldCat Urbancic , M. ( 2011 ). Testing distributional dependence in the Becker–DeGroot–Marschak mechanism . Working paper, UC Berkeley . Vassilopoulos , A. , Drichoutis , A. C. and Nayga , R. M. , Jr. ( 2018 ). Loss aversion, expectations and anchoring in the BDM mechanism . Munich Personal RePEc Archive No. 85635 . OpenURL Placeholder Text WorldCat Vecchio , R. and Borrello , M. ( 2018 ). Measuring food preferences through experimental auctions: a review . Food Research International. 116 : 1113 – 1120 . 
TI - How to run an experimental auction: a review of recent advances
JF - European Review of Agricultural Economics
DO - 10.1093/erae/jbz038
DA - 2019-12-01
UR - https://www.deepdyve.com/lp/oxford-university-press/how-to-run-an-experimental-auction-a-review-of-recent-advances-eadyMuaZUL
SP - 862
VL - 46
IS - 5
DP - DeepDyve
ER -