
Public Opinion Quarterly
Subject: History
Publisher: Oxford University Press
ISSN: 0033-362X
Scimago Journal Rank: 108

Reducing Turnout Misreporting in Online Surveys

2018 Public Opinion Quarterly

doi: 10.1093/poq/nfy017

Abstract

Assessing individual-level theories of electoral participation requires survey-based measures of turnout. Yet, due to a combination of sampling problems and respondent misreporting, postelection surveys routinely overestimate turnout, often by large margins. Using an online survey experiment fielded after the 2015 British general election, we implement three alternative survey questions aimed at correcting for turnout misreporting and test them against a standard direct turnout question used in postelection studies. Comparing estimated to actual turnout rates, we find that while all question designs overestimate aggregate turnout, the item-count technique alleviates the misreporting problem substantially, whereas a direct turnout question with additional face-saving options and a crosswise model design help little or not at all. Also, regression models of turnout estimated using the item-count measure yield substantively similar inferences regarding the correlates of electoral participation to models estimated using “gold-standard” validated vote measures. These findings stand in contrast to those suggesting that item-count techniques do not help with misreporting in an online setting and are particularly relevant given the increasing use of online surveys in election studies.

Self-reported turnout rates in postelection surveys often considerably exceed official rates.1 This phenomenon of “vote overreporting” (e.g., Bernstein, Chadha, and Montjoy 2001; McDonald 2003) represents a major challenge for election research, raising questions about the validity of turnout models estimated using survey data (e.g., Brehm 1993; Bernstein, Chadha, and Montjoy 2001; Cassel 2003; Karp and Brockington 2005; Jones 2008). While vote overreporting is attributable in part to sampling and survey nonresponse biases (e.g., Brehm 1993; Jackman 1999; Voogt and Saris 2003), much previous research focuses on the tendency of survey respondents—particularly those who did not vote—to misreport their turnout (Presser 1990; Abelson, Loftus, and Greenwald 1992; Holtgraves, Eck, and Lasky 1997; Belli et al. 1999; Belli, Moore, and VanHoewyk 2006; Holbrook and Krosnick 2010b; Hanmer, Banks, and White 2014; Persson and Solevid 2014; Zeglovits and Kritzinger 2014; Thomas et al. 2017).

This paper investigates whether misreporting can be alleviated by different sensitive survey techniques designed to reduce social desirability pressures arising from turnout-related questions. In particular, we examine the crosswise model (CM) and the item-count technique (ICT). Whereas these approaches had been of limited use to scholars estimating multivariate models of turnout, recent methodological advances (Blair and Imai 2010; Imai 2011; Blair and Imai 2012; Jann, Jerke, and Krumpal 2012; Blair, Imai, and Zhou 2015) have made estimating such models relatively straightforward. In an online survey experiment fielded shortly after the 2015 UK general election, we design new CM and ICT turnout questions and test them against a standard direct turnout question and a direct question with face-saving response options. Our findings show that while all question designs overestimate aggregate national turnout, ICT yields more accurate estimates compared to the standard direct question, whereas the face-saving design and CM improve accuracy little or not at all.
Also, regression models of turnout estimated using ICT measures yield inferences regarding the correlates of electoral participation that are more consistent with those from models estimated using “gold-standard” validated vote measures. In contrast to recent studies that cast doubt on the suitability of ICT questions for reducing turnout misreporting in online surveys (Holbrook and Krosnick 2010b; Thomas et al. 2017), we show that ICT questions designed following current best practice appear to substantially reduce turnout misreporting in an online survey. Our results suggest that earlier mixed findings regarding ICT’s effectiveness could be due to the particular ICT designs used in those studies. TURNOUT AS A SENSITIVE TOPIC Existing research has sought to alleviate turnout misreporting in a number of ways. One approach is to disregard self-reports and instead measure respondent turnout using official records. Such “vote validation” exercises have been undertaken in several national election studies (e.g., in Sweden, New Zealand, Norway, the UK, and—until 1990—the United States). Although often considered the gold standard in dealing with misreporting, the vote validation approach in the US context has raised doubts, with Berent, Krosnick, and Lupia (2016) showing that matching errors artificially drive down “validated” turnout rates. While it is an open question to what extent matching errors are an issue outside the US context, vote validation has two additional downsides that limit its utility as a general solution for turnout misreporting. First, in many countries official records of who has voted in an election are not available. Second, these records, when available, are often decentralized, making validation a time-consuming and expensive undertaking. Another set of approaches for dealing with turnout misreporting focus on alleviating social desirability bias (for overviews, see Tourangeau and Yan [2007]; Holbrook and Krosnick [2010b]). Voting is an admired and highly valued civic behavior (Holbrook, Green, and Krosnick 2003; Karp and Brockington 2005; Bryan et al. 2011), creating incentives for nonvoters to deliberately or unconsciously misreport when asked about their electoral participation. Starting from this premise, some suggest that misreporting can be alleviated via appropriate choice of survey mode, with respondents more willing to report sensitive information in self- rather than interviewer-administered surveys (Hochstim 1967). Although Holbrook and Krosnick (2010b) find that turnout misreporting is reduced in self-administered online surveys compared to interviewer-administered telephone surveys, a systematic review of over 100 postelection surveys found no significant difference in turnout misreporting across survey modes (Selb and Munzert 2013). Reviewing studies on a variety of sensitive topics, Tourangeau and Yan (2007, p. 878) conclude that “even when the questions are self-administered... many respondents still misreport.” If choice of survey mode alone cannot resolve the misreporting problem, can we design turnout questions that do? One design-based approach for reducing misreporting is the “bogus pipeline” (Jones and Sigall 1971; Roese and Jamieson 1993), where the interviewer informs the respondent that their answer to the sensitive question will be verified against official records, thus increasing the respondent’s motivation to tell the truth (assuming being caught lying is more embarrassing than admitting to the sensitive behavior). 
Hanmer, Banks, and White (2014) find that this approach significantly reduces turnout misreporting. However, provided researchers do not want to mislead survey respondents, the applicability of the bogus pipeline is limited, since it necessitates vote validation for at least some respondents, which is costly and sometimes impossible. A simple alternative design-based approach is to combine “forgiving” question wording (Fowler 1995), which attempts to normalize nonvoting in the question preamble, with the provision of answer options that permit the respondent to admit nonvoting in a “face-saving” manner. Although turnout misreporting is unaffected by “forgiving” wording2 (Abelson, Loftus, and Greenwald 1992; Holtgraves, Eck, and Lasky 1997; Persson and Solevid 2014) and only moderately reduced by “face-saving” answer options (Belli et al. 1999; Belli, Moore, and VanHoewyk 2006; Persson and Solevid 2014; Zeglovits and Kritzinger 2014), many election studies incorporate one or both of these features in their turnout questions. We therefore include such turnout question designs as comparators in our experiments. Other design-based approaches to the misreporting problem involve indirect questions, which aim to reduce social desirability pressures by protecting privacy such that survey researchers are unable to infer individual respondents’ answers to the sensitive item. The well-known randomized response technique ensures this using a randomization device: Warner (1965), for example, asks respondents to truthfully state either whether they do bear the sensitive trait of interest, or whether they do not bear the sensitive trait of interest, based on the outcome of a whirl of a spinner unobserved by the interviewer. The researcher is thus unaware of which question an individual respondent is answering, but can estimate the rate of the sensitive behavior in the sample because she knows the probability with which respondents answer each item. Research suggests that this design fails to reduce turnout misreporting (Locander, Sudman, and Bradburn 1976; Holbrook and Krosnick 2010a) and raises concerns about its practicality: in telephone and self-administered surveys, it is difficult to ensure that respondents have a randomization device to hand and that they appropriately employ it (Holbrook and Krosnick 2010a).3 Recognizing these practical limitations, researchers have developed variants of the randomized response technique that do not require randomization devices. One recent example is the crosswise model (CM) (Yu, Tian, and Tang 2008; Tan, Tian, and Tang 2009) where respondents are asked two yes/no questions—a nonsensitive question where the population distribution of true responses is known, and the sensitive question of substantive interest—and indicate only whether or not their answers to the questions are identical. Based on respondents’ answers and the known distribution of answers to the nonsensitive item, researchers can again estimate the rate of the sensitive trait. CM has been shown to reduce misreporting on some sensitive topics (e.g., Coutts and Jann 2011; Jann, Jerke, and Krumpal 2012), but is as yet untested with regard to turnout. A final example of indirect questioning is the item-count technique (ICT), or “list experiment.” In this design, respondents are randomized into a control and treatment group. The control group receives a list of nonsensitive items, while the treatment group receives the same list plus the sensitive item. 
Respondents are asked to count the total number of listed items that satisfy certain criteria rather than answering with regard to each individual listed item. The prevalence of the sensitive trait is estimated based on the difference in mean item counts across the two groups (Miller 1984; Droitcour et al. 1991). The ICT performance record is mixed, with regard to both turnout (e.g., Holbrook and Krosnick 2010b; Comşa and Postelnicu 2013; Thomas et al. 2017) and other sensitive survey items (e.g., Tourangeau and Yan 2007; Wolter and Laier 2014). This mixed success may reflect the challenges researchers face in creating valid lists of control items—challenges that have been addressed in a recent series of articles (Blair and Imai 2012; Glynn 2013; Aronow et al. 2015). Below, we investigate whether an ICT question designed according to current best practice can reduce nonvoter misreporting in an online survey.

Methods

EXPERIMENTAL DESIGN

Our survey experiment was designed to test whether new ICT and CM turnout question designs are effective at reducing misreporting, relative to more standard direct turnout questions with forgiving wording and face-saving response options. Our experiment was run online through YouGov across four survey waves in the aftermath of the UK general election on May 7, 2015 (see the Appendix for further sampling details). To limit memory error concerns, fieldwork was conducted soon after the election, June 8–15, 2015, with a sample of 6,228 respondents from the British population. Appendix Table A.2 reports sample descriptives, showing that these are broadly in line with those from the British Election Study (BES) face-to-face postelection survey, a high-quality probability sample, and with census data.

SURVEY INSTRUMENTS

Respondents were randomly assigned to one of four turnout questions.

Direct question: Our baseline turnout question is the direct question used by the BES, which already incorporates a “forgiving” introduction. Respondents were asked: “Talking with people about the recent general election on May 7th, we have found that a lot of people didn’t manage to vote. How about you, did you manage to vote in the general election?” Respondents could answer yes or no, or offer “don’t know.” The estimated aggregate turnout from this question is the (weighted or unweighted) proportion of respondents answering “Yes.”

Direct face-saving question: This variant incorporates the preamble and question wording of the direct question, but response options are now those that Belli, Moore, and VanHoewyk (2006) propose for when data are collected within a few weeks of Election Day: “I did not vote in the general election”; “I thought about voting this time but didn’t”; “I usually vote but didn’t this time”; “I am sure I voted in the general election”; and “Don’t know.” The second and third answer options allow respondents to report nonvoting in the election while also indicating having had some intent to vote or having voted on other occasions, and may therefore make it easier for nonvoters to admit not having voted. Aggregate turnout is estimated as the (weighted or unweighted) proportion of respondents giving the penultimate response.

Crosswise model (CM): Our CM question involves giving respondents the following question: “Talking with people about the recent general election on May 7th, we have found that a lot of people didn’t manage to vote or were reluctant to say whether or not they had voted.
In order to provide additional protection of your privacy, this question uses a method to keep your answer totally confidential, so that nobody can tell for sure whether you voted or not. Please read the instructions carefully before answering the question. Two questions are asked below. Please think about how you would answer each question separately (either with Yes or No). After that please indicate whether your answers to both questions are the same (No to both questions or Yes to both questions) or different (Yes to one question and No to the other).” The two questions were “Is your mother’s birthday in January, February or March?” and “Did you manage to vote in the general election?” This follows Jann, Jerke, and Krumpal (2012) in asking about parental birthdays as the nonsensitive question, as this satisfies key criteria for CM effectiveness (Yu, Tian, and Tang 2008): the probability of an affirmative response is known, unequal to 0.5, and uncorrelated with true turnout. We calculate the probability that a respondent’s mother was born in January, February, or March based on Office for National Statistics data on the birth dates of British women, 1938–1983. The calculated probability is 25.2 percent. So that respondents understand why they are being asked such a complex question, and consistent with Jann, Jerke, and Krumpal (2012), the preamble explicitly states that the question is designed to protect privacy. Following Yu, Tian, and Tang (2008), the CM estimate of aggregate turnout is π̂_CM = (r/n + p − 1)/(2p − 1), where n is the total number of respondents, r is the number who report matching answers, and p is the known probability of an affirmative answer to the nonsensitive question. The standard error is se(π̂_CM) = √[(r/n)(1 − r/n)/((n − 1)(2p − 1)²)].4

Item-count technique (ICT): In the ICT design, respondents were asked: “The next question deals with the recent general election on May 7th. Here is a list of four (five) things that some people did and some people did not do during the election campaign or on Election Day. Please say how many of these things you did.” The list asked respondents whether they had: discussed the election with family and friends; voted in the election (sensitive item); criticized a politician on social media; avoided watching the leaders debate; and put up a poster for a political party in their window or garden. Respondents could provide an answer between 0 and 4 or say they did not know. This design incorporates a number of recommendations from recent studies of ICT effectiveness. First, to avoid drawing undue attention to our sensitive item, each nonsensitive item relates to activities that respondents might engage in during election periods (Kuklinski, Cobb, and Gilens 1997; Aronow et al. 2015; Lax, Phillips, and Stollwerk 2016). This contrasts with existing ICT-based turnout questions, which include non-political behaviors in the control list (Holbrook and Krosnick 2010b; Comşa and Postelnicu 2013; Thomas et al. 2017) and which have had mixed success in reducing misreporting. Second, we are careful to avoid ceiling and floor effects, which occur when a respondent in the treatment group engages in either all or none of the nonsensitive behaviors and therefore perceives that their answer to the sensitive item is no longer concealed from the researcher (Blair and Imai 2012; Glynn 2013).
To minimize such effects, we include a “low-cost” control activity that most respondents should have undertaken (“discussed the election with family and friends”) and a “high-cost” activity that few respondents should have undertaken (“put up a poster for a political party”). In addition to implementing these recommendations, the control list includes some “norm-defiant” behaviors, such as “avoided watching the leaders debate” and “criticised a politician on social media.” Our intent here is to reduce embarrassment at admitting nonvoting by signaling to respondents that it is recognized that some people do not like and/or do not engage with politics. Unlike the CM design, and consistent with standard ICT designs for online surveys (e.g., Aronow et al. 2015; Lax, Phillips, and Stollwerk 2016), the preamble does not explicitly state that the question is designed to protect privacy. Our ICT-based estimate of aggregate turnout is the difference in (weighted or unweighted) mean item counts comparing the control and treatment groups (Blair and Imai 2012). For the weighted estimate, standard errors were calculated using Taylor linearization in the “survey” package (Lumley 2004) in R. Diagnostics reported in Supplementary Materials Section B suggest that this ICT design successfully minimizes ceiling and floor effects and satisfies other key identifying assumptions laid out in Blair and Imai (2012).

RANDOMIZATION

Respondents were randomly assigned to one of the four turnout questions described above. Due to its lower statistical efficiency, ICT received double weight in the randomization. Of the 6,228 respondents, 1,260 received the direct question, 1,153 the direct face-saving question, 2,581 the ICT question, and 1,234 the CM question. Supplementary Materials Section A suggests that randomization was successful.

Results

COMPARING TURNOUT ESTIMATES

We begin our analysis by comparing headline turnout estimates. Figure 1 displays, for each survey technique, weighted and unweighted Britain-wide turnout estimates. Given the similarity between weighted and unweighted estimates, we focus on the former.5

Figure 1. 2015 turnout estimates by estimation method. For each turnout question, points indicate weighted and unweighted point estimates for 2015 general election turnout. Lines indicate 95 percent confidence intervals. The dashed vertical line indicates actual GB turnout.

The standard direct technique performs poorly, yielding a turnout estimate of 91.2 percent [89.3 percent, 93 percent], 24.7 points higher than actual turnout. In line with previous US (Belli, Moore, and VanHoewyk 2006) and Austrian (Zeglovits and Kritzinger 2014) studies, the face-saving question yields a modest improvement. It significantly reduces estimated turnout compared to the direct technique, but still performs poorly in absolute terms, estimating turnout at 86.6 percent [84.1 percent, 89 percent], 20.1 points higher than actual turnout. CM performs worst of all the techniques we test, estimating turnout at 94.3 percent [88.4 percent, 100 percent], 27.9 points higher than actual turnout.
In contrast, while ICT is clearly less efficient (with a relatively wide confidence interval), it nevertheless yields a substantively and statistically significant improvement in turnout estimate accuracy compared to all other techniques.6 Though still 9.2 points higher than actual GB turnout, the ICT estimate of 75.7 percent [66.9 percent, 84.4 percent] represents a two-thirds reduction in error compared to the direct question estimate. Taking the difference between the ICT and direct turnout estimates in our data, one gets an implied misreporting rate of 15.5 percent [6.5 percent, 24.4 percent]. The confidence interval contains—and is therefore consistent with—the 10 percent rate of misreporting found by Rivers and Wells (2015), who validate the votes of a subset of YouGov respondents after the 2015 general election. In sum, the face-saving and ICT questions yield aggregate turnout estimates that are, respectively, moderately and substantially more accurate than those from the direct question, while CM yields no improvement.7 ICT, however, still overestimates actual 2015 turnout, which may partly be because ICT does not correct all misreporting. It may also be partly explained by the fact that while YouGov samples from this period have been found to overestimate aggregate turnout due to both misreporting and oversampling of politically interested individuals who are more likely to vote (Rivers and Wells 2015), ICT tackles only misreporting.

Before probing the face-saving and ICT results using multivariate analysis, we pause to consider why the CM design failed. One possibility is that, faced with a somewhat unusual question and in the absence of a practice run, some respondents found the CM question unduly taxing and simply answered “don’t know.” If the propensity to do so is negatively correlated with turnout, this could explain why CM overestimates turnout. However, table 1 casts doubt on this explanation, showing that the proportion of “don’t know” responses is not substantially higher for CM compared to other treatments.8

Table 1. Rates of “don’t know” responses by treatment group

Method           “Don’t know” rate
Direct           0.020
Face-saving      0.016
ICT control      0.036
ICT sensitive    0.021
CM               0.036

Note.—Entries show, for each treatment group, the rate of “don’t know” responses to the item measuring turnout.

A more plausible explanation for the disappointing performance of our CM lies in a combination of two features of this design. First, while such an unusual design necessitates an explanatory preamble, stating that it represents “an additional protection of your privacy” may heighten the perceived sensitivity of the turnout question for respondents (Clifford and Jerit 2015). Second, in the absence of a run-through illustrating how the design preserves anonymity, respondents whose sensitivity was heightened by the preamble may distrust the design and become particularly susceptible to social desirability bias.
This is consistent with Coutts and Jann (2011), who find that in an online setting, randomized response designs—which share many characteristics with CM—elicit relatively low levels of respondent trust. Solving this problem is not easy: doing a CM run-through in online surveys is time consuming and may frustrate respondents.9

COMPARING TURNOUT MODELS

The improvement in aggregate turnout estimates yielded by face-saving and ICT questions suggests that they may alleviate turnout misreporting compared to the direct question. But do these techniques also yield inferences concerning the predictors of turnout that are more consistent with those drawn from data where misreporting is absent? To address this question, we estimate demographic models of turnout for the 2015 British general election based on direct, face-saving, and ICT questions. (Given its poor performance in estimating aggregate turnout, we do not estimate a model for the CM question.) We then compare each of these models against a benchmark model estimated using validated measures of individual turnout, based on official electoral records rather than respondent self-reports.10

To the best of our knowledge, the only publicly available individual-level validated vote measures for the 2015 general election are those from the postelection face-to-face survey of the 2015 British Election Study (Fieldhouse et al. 2016).11 Generated via probability sampling and persistent recontact efforts, the 2015 BES face-to-face survey is widely considered to be the “gold standard” among 2015 election surveys in terms of survey sample quality (Sturgis et al. 2016, p. 48). If the models estimated from online survey data using our turnout measures yield similar inferences to those estimated from the BES face-to-face data using validated turnout measures, we can be more confident that the former are properly correcting for misreporting.12

We estimate four regression models. First, a benchmark model is estimated using the 1,690 BES face-to-face respondents whose turnout was validated.13 This is a binary logistic regression with a response variable coded as 1 if official records show a respondent voted and 0 otherwise. Our second and third models are binary logistic regressions estimated using our online survey data and have as their response variable turnout as measured by the direct question and the direct face-saving question, respectively. For our fourth model, we use the ICT regression methods developed in Imai (2011) to model the responses to the ICT question in our online survey.14

All four regression models include the same explanatory variables. First, we include a measure of self-reported party identification.15 To avoid unduly small subsamples, respondents are classified into four groups: Conservative identifiers; Labour identifiers; identifiers of any other party; and those who do not identify with any party or who answer “don’t know.” Our second and third explanatory variables are age group (18–24; 25–39; 40–59; 60 and above) and gender (male or female). Our fourth explanatory variable is a respondent’s highest level of educational qualification, classified according to the UK Regulated Qualifications Framework (no qualifications, unclassified qualifications, or don’t know; Levels 1–2; Level 3; Level 4 and above). These predictors constitute the full set of variables that are measured in a comparable format in both our experimental data and the 2015 BES face-to-face data.
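To make the estimation steps concrete, the following is a minimal R sketch of the quantities described above: the CM and ICT aggregate estimators from the Survey Instruments section and the four regression models. It is not the authors' replication code; the data frame d and its column names (cm_match, item_count, ict_treat, turnout_direct, turnout_facesaving, party_id, age_group, gender, education) are hypothetical placeholders, while glm() and the ictreg() function from the "list" package cited in the paper are real.

## Hypothetical data frame `d`, one row per respondent in the online experiment.
library(list)  # Blair and Imai's package for item-count (list experiment) regression

## Crosswise-model aggregate estimate: pi = (r/n + p - 1) / (2p - 1), with
## p = P(mother born January-March) = 0.252 as reported in the text.
p   <- 0.252
r_n <- mean(d$cm_match, na.rm = TRUE)    # share reporting matching answers
n   <- sum(!is.na(d$cm_match))
pi_cm <- (r_n + p - 1) / (2 * p - 1)
se_cm <- sqrt(r_n * (1 - r_n) / ((n - 1) * (2 * p - 1)^2))

## ICT aggregate estimate: difference in mean item counts, treatment minus control.
pi_ict <- mean(d$item_count[d$ict_treat == 1], na.rm = TRUE) -
          mean(d$item_count[d$ict_treat == 0], na.rm = TRUE)

## Binary logistic regressions for the direct and face-saving self-reports.
m_direct <- glm(turnout_direct ~ party_id + age_group + gender + education,
                family = binomial, data = d)
m_face   <- glm(turnout_facesaving ~ party_id + age_group + gender + education,
                family = binomial, data = d)

## ICT regression in the spirit of Imai (2011), via ictreg() in the "list" package;
## J is the number of nonsensitive control items (four in this design).
m_ict <- ictreg(item_count ~ party_id + age_group + gender + education,
                data = d, treat = "ict_treat", J = 4, method = "ml")

Group-specific predicted turnout rates of the kind compared below can then be simulated from these fitted models, along the lines described in footnote 16.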
Logistic regression coefficients are difficult to substantively interpret or compare across models. Therefore, we follow Blair and Imai (2012) and focus on predicted prevalence of the sensitive behavior for different political and demographic groups. Specifically, for a given sample and regression model, we ask what the predicted turnout rate in the sample would be if all BES face-to-face respondents were assigned to a particular category on a variable of interest, while holding all other explanatory variables at their observed values.16

Figure 2 graphs the group-specific predicted turnout rates for the regression models. The left panel shows that the regression based on the direct question (open circles) generates group-specific predicted turnout rates that all far exceed those from the benchmark validated vote model (filled circles).17 It also performs poorly in terms of recovering how turnout is associated with most variables. While the benchmark model yields predicted turnout rates for older and more qualified voters that are noticeably higher than for younger and less qualified voters, there is barely any variation in turnout rates by age group and education according to the direct question model. Only with respect to party identification does the direct question model recover the key pattern present in the benchmark model: that those with no clear party identification are less likely to vote than those who do.

Figure 2. Comparing turnout models against BES validated vote model. Political and demographic groups are listed along the y-axis. For each group, we plot the predicted turnout rates based on a regression model, averaging over the distribution of covariates in the BES validated vote sample. Predicted turnout rates for the direct, face-saving, and ICT regression models are shown, respectively, in the left, middle, and right panels (open circles). Predicted turnout rates from the benchmark BES validated vote model are displayed in every panel (filled circles). Reading example: the filled circle for “Male” in each panel indicates that based on the validated vote regression model, if we set all respondents in the BES sample to “Male” while holding all other explanatory variables at their observed values, the predicted turnout rate would be 73.8 percent [70.1 percent, 76.7 percent].

The middle panel shows that the regression based on the face-saving turnout question (open circles) improves somewhat on the direct question regression.
The group-specific predicted turnout rates are generally slightly closer to the benchmark rates (filled circles), although most remain significantly higher. In terms of relative turnout patterns, there is some evidence of higher predicted turnout rates for higher age groups, but the differences between young and old voters are too small, and predicted turnout rates barely vary by education. In addition, the difference in the predicted turnout rates of those with and without a clear party identity is actually more muted in the face-saving model than in the benchmark model or the direct question model.

The right panel shows that the regression based on the ICT turnout questions (open circles) improves on both the direct and face-saving models. Although the uncertainty surrounding each group-specific turnout rate is considerably greater, most point estimates are closely aligned with the benchmark rates (filled circles). Moreover, this is not simply the result of an intercept shift: the ICT model also recovers relative patterns of turnout that are generally more consistent with the benchmark model. Regarding party identification, the difference in predicted turnout rates of those who do and do not have a clear party identification is of similar magnitude to that in the direct and face-saving models. Regarding age and education, as in the benchmark model, predicted turnout rates increase substantially with age group and qualification level.18 Predicted turnout for 18–24-year-olds seems unduly low. But there is considerable uncertainty surrounding this estimate due to the small proportion of respondents in this age group in the online sample (see Appendix Table A.2).19

Table 2 summarizes the performance of the different models vis-à-vis the benchmark model. The first three columns show the mean, median, and maximum absolute differences in predicted group-specific turnout rates across the 14 political and demographic groups listed in figure 2, comparing the benchmark model with each of the three remaining models. According to all measures, the face-saving model performs slightly better than the direct question model. But the ICT model performs substantially better than both, reducing mean and median discrepancies from the benchmark model by almost two-thirds. The final column of table 2 gives the fraction of group-specific predicted turnout rates that are significantly different from their benchmark model counterpart (0.05 significance level, two-tailed).20 While almost all of the predicted turnout rates from the direct and face-saving models are significantly different from their benchmark counterparts, this is the case for only two of the 14 predicted turnout rates from the ICT model.

Table 2. Summary of differences from benchmark validated vote model

               Absolute differences
Method         Mean    Median    Maximum    Sig. difference
Direct         0.20    0.18      0.34       14/14
Face-saving    0.15    0.14      0.28       13/14
ICT            0.06    0.05      0.18       2/14

Note.—For a given test model (row), the first three columns show the mean, median, and maximum absolute discrepancy between group-specific predicted turnout rates generated by this model and those generated by the benchmark validated vote model. The final column gives the fraction of group-specific predicted turnout rates that are significantly different from their benchmark model counterpart (0.05 significance level, two-tailed).

Overall, this analysis suggests that, as well as generating better aggregate estimates of turnout, ICT outperforms other techniques when it comes to estimating how turnout varies across political and demographic groups.

Conclusions

This paper compared the performance of several sensitive survey techniques designed to reduce turnout misreporting in postelection surveys. To do so, we ran an experiment shortly after the 2015 UK general election. One group of respondents received the standard BES turnout question. Another group received a face-saving turnout question previously untested in the UK. For a third group, we measured turnout using the crosswise model, the first time this has been tested in the turnout context. For a fourth group, we measured turnout using a new item-count question designed following current best practice. ICT estimates of aggregate turnout were significantly closer to the official 2015 turnout rate. We also introduced a more nuanced approach to validating ICT turnout measures: comparing inferences from demographic models of turnout estimated using ICT measures to those from models estimated using validated vote measures. Inferences from the ICT model were consistently closer to, and often statistically indistinguishable from, those from the benchmark validated vote model. Thus, in contrast to Holbrook and Krosnick (2010b) and Thomas et al. (2017), our findings suggest that carefully designed ICTs can significantly reduce turnout misreporting in online surveys. This suggests that in settings where practical or financial constraints make vote validation impossible, postelection surveys might usefully include ICT turnout questions.

We also found that the direct turnout question with face-saving options did improve on the standard direct question, in both the accuracy of aggregate turnout estimates and the validity of demographic turnout models. However, consistent with previous research (e.g., Belli, Moore, and VanHoewyk 2006; Zeglovits and Kritzinger 2014), these improvements were moderate compared to those from ICT. In contrast, CM performed no better, and if anything worse, than the standard direct turnout question in terms of estimating aggregate turnout. Taken together with Holbrook and Krosnick (2010a), this finding highlights the difficulty of successfully implementing randomized response questions and variants thereof in self-administered surveys.

Of course, there are limitations to our findings. First, our evidence comes only from online surveys and the mechanisms behind social desirability bias may be different in this mode compared to when a respondent interacts with a human interviewer by telephone or face-to-face.
That said, other studies do show that ICT reduces misreporting in telephone (Holbrook and Krosnick 2010b) and face-to-face surveys (Comşa and Postelnicu 2013). Second, a well-acknowledged drawback of ICT is its statistical inefficiency. While ICT significantly improves on other techniques despite this inefficiency, future research should investigate whether further efficiency-improving adaptations of the ICT design—such as the “double-list experiment” (Droitcour et al. 1991) and combining direct questions with ICT (Aronow et al. 2015)—are effective in the context of turnout measurement. Finally, our regression validation focused only on how basic descriptive respondent characteristics are correlated with turnout and our survey was conducted during one specific time period in relation to the election. Future research could also validate using attitudinal turnout correlates and could compare turnout questions when fielded closer to and further from Election Day.

Supplementary Data

Supplementary data are freely available at Public Opinion Quarterly online.

The authors thank participants at the North East Research Development (NERD) Workshop and the 2016 Annual Meeting of the European Political Science Association in Brussels, as well as the editors and three anonymous reviewers, for helpful comments. This work was supported solely by internal funding to P.M.K. from the School of Government and International Affairs, Durham University.

Appendix: Information on survey samples

EXPERIMENTAL SURVEY DATA

Our survey experiment was fielded via four online surveys run by YouGov. The fieldwork dates for each survey “wave” were, respectively, June 8–9 (Wave 1), June 9–10 (Wave 2), June 10–11 (Wave 3), and June 11–12, 2015 (Wave 4). Table A.1 reports the sample size for each treatment group in each survey wave.

Table A.1. Treatment sample sizes by survey wave

Wave   Direct   Face-saving   ICT control   ICT sensitive   CM    All
1      307      289           292           333             295   1516
2      335      312           350           342             311   1650
3      326      271           329           290             314   1530
4      292      281           324           321             314   1532

Note.—This table shows the distribution of treatment assignment by survey wave.

The target population for each survey wave was the adult population of Great Britain. YouGov maintains an online panel of over 800,000 UK adults (recruited via their own website, advertising, and partnerships with other websites) and holds data on the sociodemographic characteristics and newspaper readership of each panel member. Drawing on this information, YouGov uses targeted quota sampling, not random probability sampling, to select a subsample of panelists for participation in each survey. Quotas are based on the distribution of age, gender, social grade, party identification, region, and type of newspaper readership in the British adult population.
YouGov has multiple surveys running at any time and uses a proprietary algorithm to determine, on a rolling basis, which panelists to email invites to and how to allocate invitees to surveys when they respond. Any given survey thus contains a reasonable number of panelists who are “slow” to respond to invites. Along with the modest cash incentives YouGov offers to survey participants, this is designed to increase the rate at which less politically engaged panelists take part in a survey. Due to the way respondents are assigned to surveys, YouGov does not calculate a per-survey participation rate. However, the overall rate at which panelists invited to participate in a survey do respond is 21 percent. The average response time for an email invite is 19 hours from the point of sending. Descriptive statistics for the sample are provided in table A.2, and a comparison to 2015 UK population characteristics is provided in table A.3.

Table A.2. Sample characteristics: experimental data versus 2015 BES face-to-face survey

                                         Experimental         BES
                                         N       Mean         N       Mean
Age group 18–24                          6,227   0.08         2,955   0.07
Age group 25–39                          6,227   0.18         2,955   0.21
Age group 40–59                          6,227   0.41         2,955   0.35
Age group 60+                            6,227   0.33         2,955   0.37
Female                                   6,228   0.52         2,987   0.54
Male                                     6,228   0.48         2,987   0.46
Qualifications None/Other/Don’t know     6,228   0.21         2,987   0.30
Qualifications Level 1–2                 6,228   0.22         2,987   0.21
Qualifications Level 3                   6,228   0.19         2,987   0.13
Qualifications Level 4+                  6,228   0.38         2,987   0.36
Party ID None/Don’t know                 6,228   0.17         2,964   0.16
Party ID Conservative                    6,228   0.29         2,964   0.31
Party ID Labour                          6,228   0.29         2,964   0.32
Party ID other party                     6,228   0.24         2,964   0.21
Social grade DE                          6,228   0.20
Social grade C2                          6,228   0.15
Social grade C1                          6,228   0.26
Social grade AB                          6,228   0.39
Wave 1                                   6,228   0.24
Wave 2                                   6,228   0.26
Wave 3                                   6,228   0.25
Wave 4                                   6,228   0.25
Direct treatment                         6,228   0.20
Face-saving treatment                    6,228   0.19
ICT control treatment                    6,228   0.21
ICT sensitive treatment                  6,228   0.21
CM treatment                             6,228   0.20

Note.—All respondent attributes were coded as binary indicators. Columns 1–2 and 3–4 summarize, respectively, the distribution of each indicator in our experimental data and in the 2015 BES face-to-face sample.

Table A.3. Sample characteristics compared to 2011 Census

                   Experimental   BES    Census
Age group 18–24    8.0            7.3    11.8
Age group 25–39    18.3           20.9   25.4
Age group 40–59    41.1           34.9   34.2
Age group 60+      32.6           36.9   28.6
Female             52.3           54.1   51.4
Male               47.7           45.9   48.6

Note.—The first two columns show the relative frequency of age groups and gender in the experimental data and in the 2015 BES face-to-face survey. The final column shows the GB population frequency of each demographic group according to the 2011 Census.
2015 BRITISH ELECTION STUDY FACE-TO-FACE SURVEY

The 2015 British Election Study face-to-face study (Fieldhouse et al. 2016) was funded by the British Economic and Social Research Council (ESRC). Fieldwork was conducted by GfK Social Research between May 8 and September 13, 2015, with 97 percent of the interviews being conducted within three months of the general election date (May 7, 2015). Interviews were carried out via computer-assisted interviewing. Full details of the sampling procedure are given in Moon and Bhaumik (2015). Here we provide a brief overview based on their account. The sample was designed to be representative of all British adults who were eligible to vote in the 2015 general election. It was selected via multistage cluster sampling as follows: first, a stratified random sample of 300 parliamentary constituencies was drawn; second, two Lower Layer Super Output Areas (LSOAs) per constituency were randomly selected, with probability proportional to size; third, household addresses were sampled randomly within each LSOA; and fourth, one individual was randomly selected per household. Overall, 2,987 interviews were conducted. According to the standard AAPOR conventions for reporting response rates, this represents a 55.9 percent response rate (response rate 3). Descriptive statistics for the sample are provided in table A.2, and a comparison to 2015 UK population characteristics is provided in table A.3.

Turnout was validated against the marked electoral register using the name and address information of face-to-face respondents who had given their permission for their voting behavior to be validated. The marked electoral register is the copy of the electoral register used by officials at polling stations on Election Day. Officials at polling stations put a mark on the register to indicate when a listed elector has voted. The marked registers are kept by UK local authorities for 12 months after Election Day. The BES team collaborated with the UK Electoral Commission, which asked local authorities to send copies of marked registers for inspection.21 Respondents were coded into five categories based on inspection of the register (Mellon and Prosser 2015, appendix B):

Voted: The respondent appeared on the electoral register and was marked as having voted.

Not voted—registered: The respondent appeared on the electoral register but was not marked as having voted.

Not voted—unregistered: The respondent did not appear on the electoral register, but there was sufficient information to infer that they were not registered to vote, for example, other people were registered to vote at the address, or if no one was registered at the address people were registered at surrounding addresses.

Insufficient information: We did not have sufficient information in the register to assess whether the respondent was registered and voted, because either we were missing the necessary pages from the register or we had not been sent the register.

Ineligible: The respondent was on the electoral register but was marked ineligible to vote in the general election.

Mellon and Prosser (2015) report that validated turnout for a subset of respondents was coded by multiple coders, and that reliability was high (coders gave the same outcome in 94.8 percent of cases).

Footnotes

1. The average difference between survey and official turnout rate across 150 Comparative Study of Electoral Systems (CSES) postelection surveys is around 12 percentage points (Comparative Study of Electoral Systems 2017).

2.
Footnotes

1. The average difference between survey and official turnout rate across 150 Comparative Study of Electoral Systems (CSES) postelection surveys is around 12 percentage points (Comparative Study of Electoral Systems 2017).

2. Other changes to the preamble of a turnout question aimed at increasing truthful reporting, such as asking for polling station location, were equally unsuccessful (Presser 1990).

3. Rosenfeld, Imai, and Shapiro (2016) do find that a randomized response design appears to reduce misreporting of sensitive vote choices, but also find evidence of potential noncompliance in respondent implementation of the randomization device (796).

4. For weighted CM estimates, we replace the term $r/n$ with $\sum_i y_i w_i$, where $y_i$ is a binary indicator of whether respondent $i$ reports matching answers, and $w_i$ denotes the survey weight for observation $i$. Weights are standardized so that $\sum_i w_i = 1$. We also replace $n$ in the denominator of the standard error equation with the effective sample size based on Kish's approximate formula (Kish 1965).

5. We use standard YouGov weights, generated by raking the sample to the population marginal distributions of age-group × gender, social grade, newspaper readership, region, and party identification.

6. Despite the slight overlap in confidence intervals for weighted estimates, the differences between the weighted ICT and face-saving estimates are statistically significant (weighted, z = 3.39, P-value < 0.01; unweighted, z = 4.33, P-value < 0.01). Schenker and Gentleman (2001) show that overlapping confidence intervals do not necessarily imply nonsignificant differences. The differences between ICT and direct estimates are also significant (weighted, z = 2.34, P-value = 0.019; unweighted, z = 3.43, P-value < 0.01).

7. Supplementary Materials Section C shows that question effects are consistent when each of the four survey waves is treated as a distinct replication of our experiment.

8. The difference in the rate of “don’t know” responses between CM and other treatments is often statistically significant (z = −2.51, P-value = 0.012 for CM vs. direct question; z = −3.06, P-value < 0.01 for CM vs. face-saving question; z = −0.13, P-value = 0.9 for CM vs. ICT control; z = −2.32, P-value = 0.02 for CM vs. ICT sensitive). However, the maximum magnitude of any difference in “don’t know” rates is two percentage points.

9. The complexity of CM designs can lead to noncompliance and misclassification, and thus less accurate measures of sensitive behaviors relative to a direct question (Höglinger and Diekmann 2017).

10. We must estimate distinct regression models for each question type because the ICT turnout measure does not yield individual-level turnout measures and therefore cannot be modeled using standard regression methods.

11. Data from the online survey vote validation study reported in Rivers and Wells (2015) is not currently publicly available.

12. Note that differences between turnout models estimated from the two data sources may be due not only to residual misreporting in the online self-reports, but also to differences in the sample characteristics of a face-to-face versus an online survey. Indeed, Karp and Lühiste (2016) argue that turnout models estimated from online and face-to-face samples yield different inferences regarding the relationship between demographics and political participation. However, their evidence is based on direct and nonvalidated measures of turnout. It is possible that once misreporting is addressed in both types of survey mode, inferences become more similar.

13. Of this subsample, 1,286 (76.1 percent) voted. The 17 respondents who were measured as “ineligible” to vote were coded as having not voted.

14. We estimate the ICT regression model using the “list” package (Blair and Imai 2010) in R.

15. For the online data, this was measured by YouGov right after the 2015 general election.

16. First, we simulate 10,000 Monte Carlo draws of the model parameters from a multivariate normal distribution with mean vector and variance-covariance matrix equal to the estimated coefficients and variance-covariance matrix of the regression model. Second, for each draw, we calculate predicted turnout probabilities for all respondents in the BES face-to-face sample—setting all respondents to be in the political or demographic group of interest and leaving other predictor variables at their actual value—and store the mean turnout probability in the sample. The result is 10,000 simulations of the predicted turnout rate if all respondents in the sample were in a particular category on a particular political or demographic variable, averaging over the sample distribution of the other explanatory variables. The point estimate for the predicted turnout rate is the mean of these 10,000 simulations, and the 95 percent confidence interval is given by the 2.5th and 97.5th percentiles. Our results are substantively unchanged if predicted turnout rates are calculated based on the experimental survey sample instead.

17. Supplementary Materials Section E graphs the corresponding differences in predicted turnout rates, and Section D reports raw regression coefficients for each model.

18. The differences between the group-specific predicted turnout rates from the ICT and direct models imply that younger voters and less qualified voters in particular tend to misreport voting. This is consistent with the differences between the BES benchmark model and the direct model in figure 2 and with earlier UK vote validation studies. Swaddle and Heath (1989), for example, find that “the groups with the lowest turnout are the ones who are most likely to exaggerate their turnout.” This is different from misreporting patterns found in US studies (Bernstein, Chadha, and Montjoy 2001).

19. The confidence interval for this age group is also wide for the direct and face-saving models, but the uncertainty induced by small sample size is amplified by the inefficiency of the ICT measures.

20. Significance tests are based on the Monte Carlo simulations described above.

21. Despite persistent reminders from the BES team and their vote validation partner organization, the Electoral Commission, several local authorities did not supply their marked electoral registers. As a result, overall the validated vote variable is missing for around 15 percent of the face-to-face respondents who agreed to be matched (Mellon and Prosser 2015).
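The simulation procedure in footnote 16 can be sketched as follows. This is only an illustration, assuming a fitted logit model (e.g., from glm) and a data frame of BES respondents in which the grouping variables are 0/1 indicators, as in table A.2; the object and column names are hypothetical, and this is not the authors' replication code.

```r
## Sketch of the Monte Carlo procedure in footnote 16. `fit` is assumed to be a
## fitted logit model and `bes` the BES face-to-face data; the grouping variable
## is assumed to be a 0/1 indicator.
library(MASS)  # for mvrnorm()

predicted_turnout <- function(fit, bes, group_var, group_value = 1, n_sims = 10000) {
  # Step 1: draw parameters from a multivariate normal centered on the estimates
  draws <- mvrnorm(n_sims, mu = coef(fit), Sigma = vcov(fit))

  # Step 2: set every respondent to the group of interest,
  # leaving all other predictors at their observed values
  bes_cf <- bes
  bes_cf[[group_var]] <- group_value
  X <- model.matrix(delete.response(terms(fit)), data = bes_cf)

  # Step 3: for each draw, average predicted turnout probabilities over the sample
  sims <- apply(draws, 1, function(b) mean(plogis(X %*% b)))

  # Point estimate and 95 percent interval from the simulated turnout rates
  c(estimate = mean(sims), quantile(sims, c(0.025, 0.975)))
}
```

For predictors entering a model as factors rather than 0/1 indicators, the counterfactual assignment in step 2 would need to respect the factor levels.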
References

Abelson, Robert P., Elizabeth F. Loftus, and Anthony G. Greenwald. 1992. “Attempts to Improve the Accuracy of Self-Reports of Voting.” In Questions About Questions: Inquiries into the Cognitive Bases of Surveys, edited by Judith M. Tanur, pp. 138–53. New York: Russell Sage Foundation.

Aronow, Peter, Alexander Coppock, Forrest W. Crawford, and Donald P. Green. 2015. “Combining List Experiments and Direct Question Estimates of Sensitive Behavior Prevalence.” Journal of Survey Statistics and Methodology 3:43–66.

Belli, Robert F., Sean E. Moore, and John VanHoewyk. 2006. “An Experimental Comparison of Question Forms Used to Reduce Vote Overreporting.” Electoral Studies 25:751–59.

Belli, Robert F., Michael W. Traugott, Margaret Young, and Katherine A. McGonagle. 1999. “Reducing Vote Over-Reporting in Surveys: Social Desirability, Memory Failure, and Source Monitoring.” Public Opinion Quarterly 63:90–108.

Berent, Matthew K., Jon A. Krosnick, and Arthur Lupia. 2016. “Measuring Voter Registration and Turnout in Surveys: Do Official Government Records Yield More Accurate Assessments?” Public Opinion Quarterly 80:597–621.

Bernstein, Robert, Anita Chadha, and Robert Montjoy. 2001. “Overreporting Voting: Why It Happens and Why It Matters.” Public Opinion Quarterly 65:22–44.

Blair, Graeme, and Kosuke Imai. 2010. “list: Statistical Methods for the Item Count Technique and List Experiment.” Comprehensive R Archive Network (CRAN). Available at http://CRAN.R-project.org/package=list.

———. 2012. “Statistical Analysis of List Experiments.” Political Analysis 20:47–77.

Blair, Graeme, Kosuke Imai, and Yang-Yang Zhou. 2015. “Design and Analysis of the Randomized Response Technique.” Journal of the American Statistical Association 110:1304–19.

Brehm, John. 1993. The Phantom Respondents: Opinion Surveys and Political Representation. Ann Arbor: Michigan University Press.

Bryan, Christopher J., Gregory M. Walton, Todd Rogers, and Carol S. Dweck. 2011. “Motivating Voter Turnout by Invoking the Self.” Proceedings of the National Academy of Sciences 108:12653–56.

Cassel, Carol A. 2003. “Overreporting and Electoral Participation Research.” American Politics Research 31:81–92.

Clifford, Scott, and Jennifer Jerit. 2015. “Do Attempts to Improve Respondent Attention Increase Social Desirability Bias?” Public Opinion Quarterly 79:790–802.

The Comparative Study of Electoral Systems (CSES). 2017. “CSES Module 4 Fourth Advance Release” [dataset]. April 11, 2017 version. doi:10.7804/cses.module4.2017-04-11.

Comşa, Mircea, and Camil Postelnicu. 2013. “Measuring Social Desirability Effects on Self-Reported Turnout Using the Item Count Technique.” International Journal of Public Opinion Research 25:153–72.

Coutts, Elisabeth, and Ben Jann. 2011. “Sensitive Questions in Online Surveys: Experimental Results for the Randomized Response Technique (RRT) and the Unmatched Count Technique (UCT).” Sociological Methods & Research 40:169–93.

Droitcour, Judith, Rachel A. Caspar, Michael L. Hubbard, Teresa L. Parsley, Wendy Visscher, and Trena M. Ezzati. 1991. “The Item Count Technique as a Method of Indirect Questioning: A Review of its Development and a Case Study Application.” In Measurement Errors in Surveys, edited by Paul B. Biemer, Robert M. Groves, Lars E. Lyberg, Nancy A. Mathiowetz, and Seymour Sudman, chapter 11. New York: Wiley. Available at https://onlinelibrary.wiley.com/doi/10.1002/9781118150382.ch11.

Fieldhouse, Ed, Jane Green, Geoffrey Evans, Hermann Schmitt, Cees van der Eijk, Jonathan Mellon, and Chris Prosser. 2016. British Election Study, 2015: Face-to-Face Postelection Survey [data collection]. UK Data Service.

Fowler, Floyd J. 1995. Improving Survey Questions: Design and Evaluation. Thousand Oaks, CA: Sage Publications.

Glynn, Adam N. 2013. “What Can We Learn with Statistical Truth Serum? Design and Analysis of the List Experiment.” Public Opinion Quarterly 77:159–77.

Hanmer, Michael J., Antoine J. Banks, and Ismail K. White. 2014. “Experiments to Reduce the Over-Reporting of Voting: A Pipeline to the Truth.” Political Analysis 22:130–41.

Hochstim, Joseph R. 1967. “A Critical Comparison of Three Strategies of Collecting Data from Households.” Journal of the American Statistical Association 62:976–89.

Höglinger, Marc, and Andreas Diekmann. 2017. “Uncovering a Blind Spot in Sensitive Question Research: False Positives Undermine the Crosswise-Model RRT.” Political Analysis 25:131–37.

Holbrook, Allyson L., Melanie C. Green, and Jon A. Krosnick. 2003. “Telephone vs. Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Bias.” Public Opinion Quarterly 67:79–125.

Holbrook, Allyson L., and Jon A. Krosnick. 2010a. “Measuring Voter Turnout by Using the Randomized Response Technique: Evidence Calling into Question the Method’s Validity.” Public Opinion Quarterly 74:328–43.

———. 2010b. “Social Desirability Bias in Voter Turnout Reports: Tests Using the Item Count Techniques.” Public Opinion Quarterly 74:37–67.

Holtgraves, Thomas, James Eck, and Benjamin Lasky. 1997. “Face Management, Question Wording, and Social Desirability.” Journal of Applied Social Psychology 27:1650–71.

Imai, Kosuke. 2011. “Multivariate Regression Analysis for the Item Count Technique.” Journal of the American Statistical Association 106:407–16.

Jackman, Simon. 1999. “Correcting Surveys for Non-Response and Measurement Error Using Auxiliary Information.” Electoral Studies 18:7–27.

Jann, Ben, Julia Jerke, and Ivar Krumpal. 2012. “Asking Sensitive Questions Using the Crosswise Model: An Experimental Survey Measuring Plagiarism.” Public Opinion Quarterly 76:32–49.

Jones, Edward E., and Harold Sigall. 1971. “The Bogus Pipeline: New Paradigm for Measuring Affect and Attitude.” Psychological Bulletin 76:349–64.

Jones, Emily. 2008. “Vote Overreporting: The Statistical and Policy Implications.” Policy Perspectives 15:83–97.

Karp, Jeffrey A., and David Brockington. 2005. “Social Desirability and Response Validity: A Comparative Analysis of Overreporting Voter Turnout in Five Countries.” Journal of Politics 67:825–40.

Karp, Jeffrey A., and Maarja Lühiste. 2016. “Explaining Political Engagement with Online Panels: Comparing the British and American Election Studies.” Public Opinion Quarterly 80:666–93.

Kish, Leslie. 1965. Survey Sampling. New York: John Wiley and Sons.

Kuklinski, James H., Michael D. Cobb, and Martin Gilens. 1997. “Racial Attitudes and the ‘New South.’” Journal of Politics 59:323–49.

Lax, Jeffrey R., Justin H. Phillips, and Alissa F. Stollwerk. 2016. “Are Survey Respondents Lying about Their Support for Same-Sex Marriage?” Public Opinion Quarterly 80:510–33.

Locander, William, Seymour Sudman, and Norman Bradburn. 1976. “An Investigation of Interview Method, Threat and Response Distortion.” Journal of the American Statistical Association 71:269–75.

Lumley, Thomas. 2004. “Analysis of Complex Survey Samples.” Journal of Statistical Software 9(8). Available at https://www.jstatsoft.org/issue/view/v009.

McDonald, Michael P. 2003. “On the Over-Report Bias of the National Election Study Turnout Rate.” Political Analysis 11:180–86.

Mellon, Jonathan, and Christopher Prosser. 2017. “Missing Nonvoters and Misweighted Samples: Explaining the 2015 Great British Polling Miss.” Public Opinion Quarterly 81(3):661–87.

Miller, Judith D. 1984. “A New Survey Technique for Studying Deviant Behavior.”

Moon, Nick, and Claire Bhaumik. 2015. “British Election Study 2015: Technical Report.” GfK UK Social Research.

Persson, Mikael, and Maria Solevid. 2014. “Measuring Political Participation—Testing Social Desirability Bias in a Web-Survey Experiment.” International Journal of Public Opinion Research 26:98–112.

Presser, Stanley. 1990. “Can Context Changes Reduce Vote Over-Reporting?” Public Opinion Quarterly 54:586–93.

Rivers, Douglas, and Anthony Wells. 2015. “Polling Error in the 2015 UK General Election: An Analysis of YouGov’s Pre- and Postelection Polls.” YouGov Inc.

Roese, Neal J., and David W. Jamieson. 1993. “Twenty Years of Bogus Pipeline Research: A Critical Review and Meta-Analysis.” Psychological Bulletin 114:809–32.

Rosenfeld, Bryn, Kosuke Imai, and Jacob N. Shapiro. 2016. “An Empirical Validation Study of Popular Survey Methodologies for Sensitive Questions.” American Journal of Political Science 60:783–802.

Schenker, Nathaniel, and Jane F. Gentleman. 2001. “On Judging the Significance of Differences by Examining the Overlap Between Confidence Intervals.” American Statistician 55:182–86.

Selb, Peter, and Simon Munzert. 2013. “Voter Overrepresentation, Vote Misreporting, and Turnout Bias in Postelection Surveys.” Electoral Studies 32:186–96.

Sturgis, Patrick, Nick Baker, Mario Callegaro, Stephen Fisher, Jane Green, Will Jennings, Jouni Kuha, Benjamin E. Lauderdale, and Patten Smith. 2016. “Report of the Inquiry into the 2015 British General Election Opinion Polls.” Market Research Society; British Polling Council.

Swaddle, Kevin, and Anthony Heath. 1989. “Official and Reported Turnout in the British General Election of 1987.” British Journal of Political Science 19:537–51.

Tan, Ming T., Guo-Liang Tian, and Man-Lai Tang. 2009. “Sample Surveys with Sensitive Questions: A Nonrandomized Response Approach.” American Statistician 63:9–16.

Thomas, Kathrin, David Johann, Sylvia Kritzinger, Carolina Plescia, and Eva Zeglovits. 2017. “Estimating Sensitive Behavior: The ICT and High Incidence Electoral Behavior.” International Journal of Public Opinion Research 29:157–71.

Tourangeau, Roger, and Ting Yan. 2007. “Sensitive Questions in Surveys.” Psychological Bulletin 133:859–83.

Voogt, Robert J. J., and Willem E. Saris. 2003. “To Participate or Not to Participate: The Link Between Survey Participation, Electoral Participation, and Political Interest.” Political Analysis 11:164–79.

Warner, Stanley L. 1965. “Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias.” Journal of the American Statistical Association 60:63–69.

Wolter, Felix, and Bastian Laier. 2014. “The Effectiveness of the Item Count Technique in Eliciting Valid Answers to Sensitive Questions: An Evaluation in the Context of Self-Reported Delinquency.” Survey Research Methods 8:153–68.

Yu, Jun-Wu, Guo-Liang Tian, and Man-Lai Tang. 2008. “Two New Models for Survey Sampling with Sensitive Characteristic: Design and Analysis.” Metrika 67:251–63.

Zeglovits, Eva, and Sylvia Kritzinger. 2014. “New Attempts to Reduce Overreporting of Voter Turnout and Their Effects.” International Journal of Public Opinion Research 26:224–34.

© The Author(s) 2018. Published by Oxford University Press on behalf of the American Association for Public Opinion Research. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)
Can Reshuffles Improve Government Popularity? Evidence from a “Pooling the Polls” Analysis

2018 Public Opinion Quarterly

doi: 10.1093/poq/nfy015

Abstract

Scholars have recently argued that prime ministers reshuffle their cabinets strategically. Although some scholars assume that cabinet reshuffles help prime ministers increase their government’s popularity, this assumption has not been tested formally because of the endogeneity problem. In Japan, some polling firms provide respondents with cues about a reshuffle when asking about cabinet approval following reshuffles, while others do not. I utilized this convention in the Japanese media to test the assumption that reshuffles increase cabinet approval ratings. Applying a dynamic linear model to pooled poll data from 2001 to 2015, I achieved high internal, external, and ecological validity. The analyses show that cues about reshuffles increase cabinet approval ratings by 2.4 percentage points on average, and the credible interval of the effect does not include zero. This result reinforces the findings of previous research on the theory of cabinet management.

A chain of delegation is the key principle of representative democracy, and the path from a prime minister (PM) to individual ministers is part of that chain (Strøm 2000). Whether and how a PM can delegate power to appropriately qualified ministers and control them are important questions in a parliamentary democracy. Many studies have investigated these questions through formal and empirical analyses (e.g., Dewan and Myatt 2007; Berlinski, Dewan, and Dowding 2010; Kam et al. 2010; Dewan and Hortala-Vallve 2011). Similar questions arise in a presidential democracy, and researchers of presidential countries have studied delegation and accountability between president and cabinet (e.g., Camerlo and Pérez-Liñán 2015a, 2015b).1

Recently, scholars have focused on cabinet reshuffles as an effective way for PMs to improve their strategic government management. However, the empirical literature, with a few exceptions, has not addressed the important question of how cabinet reshuffles affect government popularity. Although this is a very simple question, it is not easy to identify the effect of reshuffles without bias, because reshuffles are strategic and thus endogenous. This study employed a unique research design to overcome the endogeneity problem and attempted to elucidate whether cabinet reshuffles positively impact government popularity. Utilizing the Japanese media environment, which provides conditions similar to a survey experiment, I estimated how knowing that a PM has reshuffled the cabinet affects citizens’ approval of their government. Although the same idea has appeared in previous Japanese literature that has considered a single reshuffle (Sugawara 2009; Suzuki 2009), I generalized the analysis for multiple cabinets and reshuffles. I employed a dynamic linear model for pooled poll data to overcome potential omitted variable bias and obtain a result with high internal, external, and ecological validity. The results show that information on cabinet reshuffles increases government popularity, implying that PMs can increase their popularity by reshuffling their cabinet.

Cause and Effect of Cabinet Reshuffles

Recent theoretical and empirical research on delegation and accountability problems with cabinet members has focused on cabinet reshuffles. Scholars have used formal models and interpreted the aim of cabinet reshuffles in various ways. Indriðason and Kam (2008) viewed cabinet reshuffles as a device through which PMs combat moral hazard.
Quiroz Flores and Smith (2011; see also Quiroz Flores [2016]) interpreted cabinet reshuffles as allowing leaders to deal with internal and external threats (i.e., internal party competitions and elections, respectively). From the reverse perspective, Dewan and Myatt (2007) argued that by not reshuffling (i.e., protecting) ministers hit by scandal, PMs may encourage the policy activism of clean ministers.

In addition to arguments from game theory, a number of empirical analyses have been conducted to examine when and how frequently cabinet reshuffles occur. These studies have revealed the strategic nature of cabinet reshuffles. Kam and Indriðason (2005) showed through cross-country analysis that reshuffles are more likely in situations where a PM’s agency loss to cabinet ministers is high. Huber and Martinez-Gallardo (2008) argued that PMs utilize reshuffles to deal with problems of adverse selection and moral hazard, demonstrating that ministerial turnover is less likely if PMs are careful in screening ministers (e.g., the policy value of a portfolio is high), the talent pool is large, and/or the PM’s ability to replace ministers is constrained by a coalition. Quiroz Flores (2016) confirmed the strategic pattern of cabinet change in parliamentary democracies expected from Quiroz Flores and Smith’s (2011) formal theory, which contends that PMs are likely to depose competent ministers who might be rivals in intraparty competitions. Bäck et al. (2012) explored cabinet reshuffles as a valid measure of PMs’ intra-executive power and found that European integration increased the frequency of reshuffles in Sweden because it strengthened the PM’s power. Martínez-Gallardo’s (2014) and Camerlo and Pérez-Liñán’s (2015a) analyses revealed that the strategic replacement of ministers also occurs in presidential democracies.

Some previous research has assumed that cabinet reshuffles help PMs bolster their government’s popularity. For example, Kam and Indriðason (2005, p. 341) proposed hypotheses such as “reshuffles become more likely as the PM’s personal popularity declines”; Burden (2015) proposed a similar argument and analysis. While other previous research did not make such an assumption, it seems to assume that, at least, cabinet reshuffles do not harm government popularity. In fact, when researchers consider the cost of reshuffles in creating formal models, they usually do not consider the possibility that reshuffles or firing ministers per se leads citizens to form a negative image of the government (Dewan and Myatt 2007; Quiroz Flores and Smith 2011).

Why, then, can cabinet reshuffles improve government popularity? Kam and Indriðason (2005) argued that PMs could arrest the decline in their popularity by firing scandal-hit ministers and ministers responsible for unpopular policy, and a similar view was espoused by Bäck et al. (2012). In fact, Dewan and Dowding (2005) showed that individual ministerial resignations due to scandal or policy failure recover falling government popularity. Another possible mechanism is that the newness or freshness of reshuffled cabinets is, in itself, attractive to voters, as just-inaugurated cabinets or presidents usually enjoy high popularity (e.g., Norpoth 1996). However, to the best of this author’s knowledge, these arguments have not yet been tested formally. It is not self-evident that reshuffles have a positive effect on government popularity or that they do not impair a government’s image.
In fact, some previous studies have referred to the possibility that reshuffles might negatively impact government popularity. Indriðason and Kam (2008, p. 633) pointed out the possibility that “the public interprets reshuffles as a signal of a policy failure, departmental efficiency declines, etc.” Hansen et al. (2013, pp. 228–29) also argued that “cabinet reshuffles and ministerial dismissal may carry considerable cost for the prime minister, such as… the possibility of signaling discontinuity and turmoil to the public.” Moreover, reshuffles prevent ministers from acquiring experience in a particular portfolio, as Huber and Martinez-Gallardo (2004) argued; thus, the public may not welcome reshuffles.

Researchers have most likely hesitated to tackle the question of the effect of cabinet reshuffles on government popularity because serious endogeneity originates from the strategic nature of cabinet reshuffles. If, as Kam and Indriðason (2005) have implied, PMs tend to reshuffle their cabinet when their popularity declines, events that negatively impact popularity are likely to coincide with reshuffles. In fact, Hansen et al. (2013) demonstrated that ministerial turnover is more likely when the unemployment rate is rising. Thus, simple correlation analysis may produce a negatively biased estimate of the effect of reshuffles on government popularity. Another possibility is that PMs tend to make other efforts at the same time as reshuffles in order to recover their declining popularity. In this case, even if we observe that government popularity increases after a cabinet reshuffle, it may be attributable to the PM’s other efforts.

One notable exception to the hesitation shown by researchers is Dewan and Dowding (2005), who demonstrated, using an instrumental variable approach with ministers’ age as an instrument, that ministerial resignations have a positive effect on government popularity when there is high media coverage. Although they appear to have estimated the effect of ministerial resignations precisely, they focused on personal resignations due to “resignation issues” such as scandal and policy failure; thus, it is questionable whether it is possible to generalize their results to cabinet reshuffles.

Research Design

To deal with the endogeneity problem discussed above, I developed a novel research design that exploits the convention of opinion polls about government popularity in the Japanese mass media. Before introducing the research design, I will briefly overview Japanese politics in the period analyzed in this study (2001–2015). During this time, there were two major parties: the Liberal Democratic Party (LDP) and the Democratic Party of Japan (DPJ). The LDP is a right-leaning party, and the DPJ is a center-left party. The LDP has been the dominant party for most of the postwar period. Although the LDP changed its coalition partners several times in the 1990s, it has maintained a coalition with Komeito from 1999 to the present day. The charismatic LDP leader Junichiro Koizumi maintained his cabinet with high popularity ratings from 2001 to 2006, but his successors—Shinzo Abe, Yasuo Fukuda, and Taro Aso—failed to retain their popularity owing to scandals and economic depression, and their cabinets lasted for no more than 12 months. Japanese voters turned away from the LDP regime and instead chose the DPJ, which was founded in 1998 and had been the second-largest party since then, in the 2009 general election.
The DPJ formed a government with the Social Democratic Party and the People’s New Party as junior partners.2 However, the DPJ demonstrated its poor ability to manage government and lack of intra-party governance; accordingly, PMs from the DPJ—Yukio Hatoyama, Naoto Kan, and Yoshihiko Noda—suffered low popularity ratings except for their honeymoon period. The LDP returned to power in the 2012 general election, and its leader Abe made a comeback as PM. Abe’s cabinet maintained relatively high popularity ratings to the end of the period analyzed in this study. Figure 1 shows weekly government popularity (cabinet approval ratings, explained below) during the period of the analysis. The method for estimating popularity is explained in the next section.

Figure 1. Estimated weekly cabinet approval ratings. Solid lines represent point estimates, and shaded bands represent 95 percent credible intervals. Cross marks indicate weeks when the PMs reshuffled their cabinet. Dotted lines indicate Jiji Press’s monthly polling results.

In Japan, government popularity is measured by cabinet approval. Many polling firms report a cabinet approval rating derived from their surveys. Cabinet approval is usually interpreted as the PM’s personal popularity, and it plays a critical role in securing support for the governing party and maintaining the cabinet in Japan. For example, Krauss and Nyblade (2005) showed that an increase in cabinet approval resulted in an increase in the LDP’s share of the vote during the LDP’s regime. Maeda (2013) reported that when the DPJ was in power, cabinet approval had a positive influence on DPJ support. Burden (2015) and Matsumoto and Laver (2015) argued that when the LDP is in power, the party applies pressure to its leader (the PM) to resign or reshuffle the cabinet when cabinet approval is low relative to the party’s popularity. Masuyama (2007) also showed that high cabinet approval is required for the PM to maintain his government. Cabinet approval is influenced by various factors, such as support for the governing party (Nishizawa 2001; Iida 2005; but see also Nakamura [2006]), several economic indicators (Inoguchi 1983), people’s economic evaluation (Nishizawa 2001; McElwain 2015), media coverage (Fukumoto and Mizuyoshi 2007; Hosogai 2010), and foreign disputes (Ohmura and Ohmura 2014).

Some researchers have sought to examine how cabinet reshuffles affect cabinet approval in Japan. Nishizawa (2001) performed a time-series analysis of monthly cabinet approval ratings and found that the coefficient of the dummy variable for cabinet reshuffles was not statistically significant. In contrast, Nakamura (2006) increased the sophistication of Nishizawa’s time-series model and argued that reshuffles have a positive effect on cabinet approval. Ohmura and Ohmura (2014) also analyzed monthly cabinet approval ratings and found that cabinet reshuffles significantly increased such ratings during the Cold War period, but significantly decreased them in the post–Cold War period. However, their analysis suffers from the endogeneity problem discussed in the previous section.
Although details vary depending on polling firms, cabinet approval in Japan is commonly measured by questions such as “Do you support [the PM’s surname] cabinet?” However, the wording of the polling question is sometimes modified when a significant event occurs. One such event is a cabinet reshuffle. Some polling firms provide respondents with information that is henceforth called a “reshuffle cue,” such as telling respondents that the PM recently reshuffled his cabinet prior to asking for cabinet approval, while others do not. A polling firm that provides a reshuffle cue at the time of one reshuffle does not necessarily do so again at the time of another reshuffle. Sugawara (2009) criticized media reports about Yasuo Fukuda’s cabinet reshuffle in August 2008 that were based on opinion polls with and without reshuffle cues. Sugawara pointed out that polls using different wordings for their questions cannot simply be compared and the reported rise in cabinet approval following the reshuffle was an artifact of the reshuffle cue. Suzuki (2009) presented the same argument as Sugawara (2009), independently and at almost the same time. In contrast to Sugawara’s (2009) and Suzuki’s (2009) arguments, I argue that we can exploit this variation in question wording as an opportunity to examine the rise of approval ratings attributable to reshuffle cues. This is a situation similar to a survey experiment in which the assignment to the “treatment group,” whose respondents are provided with a reshuffle cue, can be seen as random (which is potentially problematic, as discussed below), because each survey selects an independent random sample of the electorate. If cabinet reshuffles convey a positive impression to citizens, reshuffle cues will increase their focus on reshuffles and cause an increase in cabinet approval ratings. On the contrary, if cabinet reshuffles bring negative issues to citizens’ minds, such as policy failure, conflict in the government, or the appointment of inexperienced ministers, reshuffle cues will decrease cabinet approval ratings. It should be noted that because Japanese polling firms almost always put the question on cabinet approval at the top of their surveys, there is no risk of carryover effects. Using these experiment-like conditions, we can estimate the effect of cabinet reshuffles on government popularity with high internal, external, and ecological validity. First, it is evident that the research design is ecologically valid because, unlike an experiment with fictitious stimuli, the cabinets and reshuffles concerned existed for respondents. Second, the results obtained through this research design have high generalizability. Some readers may observe that an internally and/or ecologically valid estimate of the effect of reshuffle cues can be obtained by a temporal survey experiment conducted immediately after a reshuffle. Other readers might suspect that it is sufficient if we compare, as did Sugawara (2009) and Suzuki (2009), polling firms’ results with and without a reshuffle cue in only one reshuffle case. However, the results from such a research design cannot be generalizable to other cases, because such onetime results may be attributable to the special circumstances of the reshuffle concerned. Instead, my research design investigates the average effect of reshuffle cues over multiple cases. Two factors, however, may jeopardize the internal validity of the design. The first is house effects. 
Different surveys that share survey contents and are conducted at the same point in time but by different houses (survey organizations) produce different results (Smith 1978). Differences in survey design and implementation, such as sampling procedure, question wording, answer options, and whether interviewers repeatedly ask questions, produce house effects. As discussed later in this paper, some polling firms were likely to provide reshuffle cues and others never did so, which means that house effects should be a confounding factor. To deal with this problem, I pooled a large number of opinion polls regardless of their timing, and estimated house effects as explained in the next section. Heuristically, by including the fixed effects of polling firms to eliminate house effects, we can estimate the effect of reshuffle cues accurately.

The second factor is the average approval rating of PMs, which differs widely from one PM to another. If reshuffle cues were provided more frequently when particular PMs were in power (which is in fact the case in my data), simply comparing polling results with and without a reshuffle cue would be insufficient. However, this problem can be addressed by pooling opinion poll data as well. I modeled a latent time series of cabinet approval ratings for each PM using pooled poll data and detected the effect of reshuffle cues as a deviation from true approval. This procedure allowed me to “control” the factor of each PM’s average approval rating. An additional important assumption for an unbiased estimation of reshuffle cues is that the decision of a particular polling firm to provide a reshuffle cue or not is made regardless of the expected poll result. This problem is discussed after the main results are provided.

One important caveat is that this study estimates the effect of reshuffle cues, not the effect of reshuffles per se. I investigated whether citizens tend to approve or disapprove of a cabinet when they hear it was recently reshuffled; this does not necessarily indicate that reshuffles actually increase or reduce government popularity. In the extreme case, if no one knows there has been a cabinet reshuffle, approval ratings should not change. However, I believe that this study contributes substantively to research on representative democracy because, if the analysis shows that reshuffle cues have positive effects, it will conclusively reject the possibility that, on average, reshuffles impair a government’s image. This is a significant step toward understanding governmental management, given the difficulty of estimating the effect of reshuffles per se without endogeneity using common observational data.

Data and Methods

I used data from opinion polls conducted by 11 polling firms: Jiji Press, Kyodo News, Yomiuri Shimbun, Asahi Shimbun, Mainichi Shimbun, Sankei Shimbun and Fuji News Network (FNN), Nikkei Research, Japanese Broadcasting Corporation (NHK), Japan News Network (JNN), All-Nippon News Network (ANN), and Nippon News Network (NNN).3 I restricted the time period analyzed to between April 26, 2001 (the beginning of the first Koizumi cabinet), and November 29, 2015, for several reasons. First, few opinion polls introduced a reshuffle cue in the earlier period. Second, too long a period would undermine the assumption that house effects are constant during the period. The third, and perhaps most important, reason is that some reshuffles in the earlier period were concurrent with a change in coalition partners.
Cabinet reshuffles after Koizumi’s election did not coincide with coalition changes; therefore, we can focus purely on the effect of cabinet reshuffles. There were 13 cabinet reshuffles during the study period.

I collected information from all opinion polls in the study period that contained a question about cabinet approval, irrespective of whether or not the PM had reshuffled his cabinet just before the poll was conducted. The information includes dates, number of respondents, survey mode, and cabinet approval rating. I examined whether the question in polls conducted immediately after cabinet reshuffles contained a reshuffle cue, and found two types of wording that provide a reshuffle cue. One type (type I) has a lead sentence that tells respondents that the PM has just reshuffled his cabinet. For example, Yomiuri Shimbun usually asks respondents about cabinet approval as follows: “Do you support the Koizumi cabinet or not?” In contrast, a survey conducted by Yomiuri between October 31 and November 1, 2005, just after a cabinet reshuffle, asked a question with the lead sentence “Prime Minister Koizumi reshuffled his cabinet. Do you support Koizumi’s reshuffled cabinet or not?” The other type (type II) does not include such a lead sentence but contains the word “reshuffle” (“kaizo” in Japanese) or hints at this word to respondents. For example, a survey conducted by Yomiuri between September 27 and 28, 2004, asked, “Do you support the reshuffled Koizumi cabinet or not?” Both types contain a reshuffle cue, and an additional analysis that distinguishes the two types was conducted.

I gathered results from 1,958 opinion polls. Of the 125 polls conducted either in the week in which the PM reshuffled his cabinet or the following week, 31 provided a reshuffle cue. The statistics, disaggregated by polling firm, are shown in table 1. They show that polling firms that provide a reshuffle cue in some polls did not always do so. Data sources are listed in Online Appendix A.

Table 1. Polling result statistics by polling firms

                           Start date of        Total     Polls immediately       Providing a reshuffle cue
    Polling firm           the analyzed polls   polls     following a reshuffle   Type I     Type II
 1  Jiji Press             May 13, 2001         175       7                       0          0
 2  Kyodo News             Apr 28, 2001         205       13                      13         0
 3  Yomiuri Shimbun        Apr 28, 2001         235       17                      3          3
 4  Asahi Shimbun          Apr 28, 2001         237       13                      0          1
 5  Mainichi Shimbun       Apr 28, 2001         165       11                      0          0
 6  Sankei-FNN             Apr 29, 2001         131       12                      0          2
 7  Nikkei Research        Apr 29, 2001         152       11                      1          2
 8  NHK                    May 6, 2001          197       10                      0          0
 9  JNN                    Apr 29, 2001         187       11                      1          0
10  ANN                    Sep 28, 2006         119       10                      0          1
11  NNN                    Dec 15, 2002         155       10                      2          2
    Total                                       1,958     125                     20         11

Note.—The numbers that precede the polling firms’ names correspond to the indicator number in the statistical model. For Mainichi Shimbun and Sankei-FNN, the last poll from which results were collected was October 2015, and November 2015 for other firms.

I estimated weekly cabinet approval ratings using the dynamic linear model proposed by Jackman (2005) and Beck, Jackman, and Rosenthal (2006), named “pooling the polls” by Jackman (2009). This model contains two parts: an observational model and a transition model. The former represents how observed variables are generated from the latent variable, with some errors; the latter represents the temporal changes in an unobservable latent variable. In this study, the observed variables comprise the results of opinion polls, while the latent variable comprises the “true” cabinet approval rating. I also altered the original model to estimate the effect of a reshuffle cue.

The observational model describes the result $q_i$ of opinion poll $i$ about PM $p$'s ($p \in \{1, \ldots, P\}$) cabinet approval rating, conducted by polling firm $j_i \in \{1, \ldots, J\}$ using survey mode $m_i \in \{1, \ldots, M\}$ in week $t_i \in \{1, \ldots, T\}$.4 This value is equal to the approval rating that the polling firm intended to measure, $\mu_i$, plus a sampling error. Therefore, $q_i$ follows the distribution

$$q_i \sim N(\mu_i, \sigma_i^2), \qquad (1)$$

where $\sigma_i$ represents the sampling error. In this study, $J = 11$, and table 1 shows which number corresponds to which polling firm. The surveys used here adopted face-to-face interviews or telephone interviews; no polling firm employed internet polling. Thus, $M = 2$; I coded face-to-face interviews as 1 and telephone interviews as 2. In principle, letting $n_i$ denote the number of respondents in poll $i$, the sampling error is equal to $\sqrt{q_i(1-q_i)/n_i}$. However, some survey designs that do not use a random sample violate this formula. The problem here was that telephone surveys based on random digit dialing (RDD) or quota sampling commonly use a weighting method that causes the sampling variance to deviate from the formula above. Unfortunately, we could not identify accurately the extent to which a survey design amplified the sampling error for an individual poll because details of sampling procedures and weighting methods are not available. Therefore, I added a parameter for the design effect $\tau$ introduced by Fisher et al. (2011) and Pickup et al.
(2011):

$$\sigma_i = \tau_{j_i}^{\,d_i}\,\sqrt{\frac{q_i(1-q_i)}{n_i}}, \qquad (2)$$

where $\tau$ represents the ratio of the standard error of survey results based on RDD or quota sampling to that of a random sample. $d_i$ is equal to 1 if survey $i$ used RDD or quota sampling, and is otherwise zero; thus, a design effect is added only when a survey is RDD based or quota sampling based. I estimated a design effect for each polling firm.

We cannot take $\mu_i$ as the true approval rating, because it may be contaminated by other elements such as survey mode, question wording, and answer options. I considered that $\mu_i$ is composed of the true approval rating in week $t_i$, $\alpha_{p,t_i}$; a mode effect, $\gamma_{m_i}$; a house effect, $\delta_{j_i}$; and the effect of a reshuffle cue, $\lambda$. Therefore,

$$\mu_i = \alpha_{p,t_i} + \gamma_{m_i} + \delta_{j_i} + \lambda x_i, \qquad (3)$$

where $x_i$ is a dummy variable that equals 1 if opinion poll $i$ provided respondents with a reshuffle cue. I assumed that mode effects and house effects are constant throughout the analysis period regardless of PM. In addition, I constrained $\sum_{m=1}^{M}\gamma_m = \sum_{j=1}^{J}\delta_j = 0$ for identification. Although Jackman (2005, 2009) and Beck, Jackman, and Rosenthal (2006) did not consider mode effects (see Bowling [2005] for a review), I consider these to be necessary here to ensure that the assumption that house effects are fixed is correct, as one polling firm switched its survey mode from face-to-face to telephone. The main purpose of the analysis was to estimate $\lambda$, which represents the difference in the results of two cabinet approval surveys; the surveys were conducted by the same polling firm using the same method in the same week, but one provided a reshuffle cue and the other did not.

In the transition model, I assumed that the true approval rating $\alpha_{p,t}$ follows the process

$$\alpha_{p,t} \sim N(\alpha_{p,t-1} + \kappa z_{p,t},\, \omega_p^2), \qquad (4)$$

where $t = 2, \ldots, T_p$; $T_p$ is the last week of PM $p$'s tenure; $z_{p,t}$ is a dummy variable that equals 1 if a reshuffle occurred in PM $p$'s week $t$; and $\kappa$ is a coefficient parameter on $z_{p,t}$. This is a random walk–based model. I set different stochastic processes according to the PM; that is, $\alpha_{p,1}$ does not depend on $\alpha_{p-1,T_{p-1}}$.5 As noted earlier, we cannot interpret $\kappa$ as the effect of a reshuffle on a cabinet approval rating because other events that influence cabinet approval might have occurred simultaneously.6

The parameters to be estimated were $\alpha_{p,t}$, $\kappa$, $\omega_p$, $\gamma_m$, $\delta_j$, $\lambda$, and $\tau_j$. The posterior distribution was estimated by the Markov chain Monte Carlo (MCMC) method.7 The prior distributions were set as follows: $\kappa, \gamma_m, \delta_j, \lambda \sim N(0, 100^2)$ and $\omega_p, \tau_j \sim U(0, 100)$. I set the prior distribution of each PM's approval rating in the first week as $\alpha_{p,1} \sim U(q_p^{\mathrm{init}} - 0.2,\, q_p^{\mathrm{init}} + 0.2)$, where $q_p^{\mathrm{init}}$ is the result of Jiji Press's opinion poll conducted for the first time after PM $p$ was inaugurated. I set three chains with different initial values. For each chain, I obtained 2,000 samples at every twentieth interval after 500 iterations as adaptation and a further 500 iterations as burn-in. All chains were judged to converge, as the Gelman-Rubin statistics of all parameters were below 1.1.
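To make the structure of the model concrete, the sketch below writes Equations (1)–(4) in JAGS-style code for a single PM. It is only an illustration under simplifying assumptions: the design-effect term of Equation (2) and the sum-to-zero constraints on the mode and house effects are omitted, the poll-level precisions are assumed to be supplied as data, and all variable names are hypothetical rather than taken from the author's estimation code.

```r
## JAGS-style sketch of Equations (1)-(4) for a single PM (illustrative only).
## prec[i] = 1 / sigma_i^2 is assumed to be computed outside the model; the design
## effect and the sum-to-zero identification constraints are omitted for brevity.
pooling_the_polls <- "
model {
  for (i in 1:N) {
    mu[i] <- alpha[week[i]] + gamma[mode[i]] + delta[house[i]] + lambda * cue[i]  # Eq. (3)
    q[i] ~ dnorm(mu[i], prec[i])                                                  # Eq. (1)
  }
  alpha[1] ~ dunif(qinit - 0.2, qinit + 0.2)       # prior on the first week
  for (t in 2:T) {
    alpha[t] ~ dnorm(alpha[t - 1] + kappa * reshuffle[t], 1 / pow(omega, 2))      # Eq. (4)
  }
  lambda ~ dnorm(0, 1 / pow(100, 2))
  kappa  ~ dnorm(0, 1 / pow(100, 2))
  for (m in 1:M) { gamma[m] ~ dnorm(0, 1 / pow(100, 2)) }
  for (j in 1:J) { delta[j] ~ dnorm(0, 1 / pow(100, 2)) }
  omega ~ dunif(0, 100)
}
"
```

Posterior samples could then be drawn with, for example, rjags::jags.model() and rjags::coda.samples(), and convergence checked with the Gelman-Rubin diagnostic (coda::gelman.diag()).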
Results

I first confirm that the true latent cabinet approval ratings were appropriately estimated. Figure 1 shows the estimated weekly approval ratings. Solid lines represent point estimates, and shaded bands represent 95 percent credible intervals (CIs).8 Cross marks indicate weeks when PMs reshuffled their cabinets. For comparison, dotted lines indicate Jiji Press’s polling results. Jiji’s opinion polls are often used in time-series studies of Japanese politics (e.g., Burden 2015; Matsumoto and Laver 2015) because Jiji has conducted monthly opinion polls in the same manner for many years. The estimated results are nearly parallel to Jiji’s results, although Jiji’s values are lower than the pooled results, as reflected in the house effect shown below. The estimated results capture detailed changes in approval ratings that cannot be derived from Jiji’s monthly polls.

Figure 1 shows that in most cases, cabinet approval ratings increased when a PM reshuffled the cabinet. Indeed, the point estimate of $\kappa$, which represents the average change in approval ratings concurrent with reshuffles, was 0.027, with a 95 percent CI of [0.007, 0.049]. This implies that cabinet approval ratings increase 2.7 percentage points on average in the week of a reshuffle. However, we cannot interpret this as the causal effect of reshuffles, due to the possibility of endogeneity. It may be no more than a correlation.

Figure 2 shows the estimated effects of a reshuffle cue, house effects, and mode effects. Dots represent point estimates, and segments represent 95 percent CIs. The estimation result for $\lambda$ shows that when a reshuffle cue is provided to respondents, cabinet approval ratings increase 2.4 percentage points on average. The 95 percent CI is [0.012, 0.036] and does not include zero, implying that citizens respond favorably to cabinet reshuffles.9

Figure 2. Estimated effects of a reshuffle cue, house effects, and mode effects. Dots represent point estimates, and segments represent 95 percent CIs.

How great is this effect? According to the estimation results of the transition model, the standard deviation of weekly random shocks, $\omega_p$, was between 0.018 and 0.038. Supposing that random shocks follow a normal distribution, a 2.4-percentage-point increase corresponds to the 74th to 92nd percentile. Therefore, the effect of reshuffle cues is substantial.

The effect of information about a cabinet reshuffle may be even greater than the effect of reshuffle cues estimated above. Some respondents hear the news of a cabinet reshuffle and change their stance to approve the cabinet before a survey. Such respondents approve of the cabinet irrespective of whether or not the poll provides a reshuffle cue. Thus, if we compare counterfactual situations—one where a cabinet is reshuffled and all citizens are informed of the reshuffle and one where a cabinet is not reshuffled and no information on a reshuffle is provided to citizens—the difference in the cabinet approval rating is likely to be greater than 2.4 percentage points on average.

This section briefly reviews the remaining results. There are significant house effects. For example, Jiji Press underestimates cabinet approval by 5.2 percentage points and JNN overestimates it by 5.8 percentage points on average. In contrast, the mode effects are small and not statistically distinguishable from zero. The results for the other parameters are shown in table A1 in the Appendix.

SELF-SELECTION BIAS?

One caveat about the above analysis is that there might be self-selection bias. Some readers may suspect that whether polling firms add a reshuffle cue to their questions depends on how much attention the reshuffle received.
Further, the effect of a reshuffle cue is overestimated if polling firms provide a reshuffle cue when they expect the public to welcome the reshuffle. To dispel such concerns, I examined when reshuffle cues tended to be provided. Table 2 shows whether each polling firm provided a reshuffle cue in opinion polls conducted following a cabinet reshuffle. Polling firms did not provide reshuffle cues after the reshuffle of the Fukuda cabinet, probably because Sugawara’s (2009) and Suzuki’s (2009) criticisms impacted the industry. Other than this, there is no notable pattern of cue provision. It is the case that some cabinet reshuffles cause more cue provisions, but figure 1 shows that such reshuffles did not necessarily coincide with a significant fluctuation in cabinet approval ratings. Therefore, it is reasonable to assume that polling firms did not provide reshuffle cues because they expected the reshuffle to receive much attention or because it was welcomed by the public.10 Table 2. Pattern of the provision of reshuffle cues Party Prime minister Koizumi Koizumi Koizumi Koizumi Abe (1) Fukuda Kan Kan Noda Noda Noda Abe (2) Abe (2) LDP LDP LDP LDP LDP LDP DPJ DPJ DPJ DPJ DPJ LDP LDP Month 9 9 9 10 8 8 9 1 1 6 10 9 10 Polling firm Year 02 03 04 05 07 08 10 11 12 12 12 14 15 Total 1 Jiji – – – 0 – – – 0 0 0 0 0 0 0 2 Kyodo 1 1 1 1 1 1 1 1 1 1 1 1 1 13 3 Yomiuri 1 1* 1 1* 1* 1* 0 0 0 0 0 0 0 6 4 Asahi 1 0 0 0 0 0 0 0 0 0 0 0 0 1 5 Mainichi 0 0 0 0 0 0 0 0 0 – – 0 0 0 6 Sankei-FNN 1 0 0 0 0 0 0 0 0 0 0 1 – 2 7 Nikkei 0 0 0 0 1 1 1 0 0 – – 0 0 3 8 NHK – – – 0 0 0 0 0 0 0 0 0 0 0 9 JNN 1 0 0 0 0 0 – 0 0 0 0 0 – 1 10 ANN – – – – 0 1 0 0 0 0 0 0 0 1 11 NNN – – 1 0 0 1 0 1 1 0 – 0 0 4 Total 5 2 3 2 3 5 2 2 2 1 1 2 1 31 Party Prime minister Koizumi Koizumi Koizumi Koizumi Abe (1) Fukuda Kan Kan Noda Noda Noda Abe (2) Abe (2) LDP LDP LDP LDP LDP LDP DPJ DPJ DPJ DPJ DPJ LDP LDP Month 9 9 9 10 8 8 9 1 1 6 10 9 10 Polling firm Year 02 03 04 05 07 08 10 11 12 12 12 14 15 Total 1 Jiji – – – 0 – – – 0 0 0 0 0 0 0 2 Kyodo 1 1 1 1 1 1 1 1 1 1 1 1 1 13 3 Yomiuri 1 1* 1 1* 1* 1* 0 0 0 0 0 0 0 6 4 Asahi 1 0 0 0 0 0 0 0 0 0 0 0 0 1 5 Mainichi 0 0 0 0 0 0 0 0 0 – – 0 0 0 6 Sankei-FNN 1 0 0 0 0 0 0 0 0 0 0 1 – 2 7 Nikkei 0 0 0 0 1 1 1 0 0 – – 0 0 3 8 NHK – – – 0 0 0 0 0 0 0 0 0 0 0 9 JNN 1 0 0 0 0 0 – 0 0 0 0 0 – 1 10 ANN – – – – 0 1 0 0 0 0 0 0 0 1 11 NNN – – 1 0 0 1 0 1 1 0 – 0 0 4 Total 5 2 3 2 3 5 2 2 2 1 1 2 1 31 Note.—Zero means that the polling firm did not provide a reshuffle cue in the poll conducted either the week that the PM reshuffled his cabinet or the following week; 1 means that the polling firm provided a type I reshuffle cue; and an underlined 1 means that the polling firm provided a type II reshuffle cue. A dash means that the polling firm did not conduct an opinion poll immediately following that reshuffle. The last row shows the number of polls that provided either a type I or type II reshuffle cue about that reshuffle, and the last column shows the number of times that a polling firm provided either a type I or type II reshuffle cue during the analysis period. An asterisk means that in either the week the cabinet was reshuffled or the following week, the polling firm conducted more than two opinion polls, one that provided a reshuffle cue and others that did not. View Large Table 2. 
In addition, I reestimated the model using only the following four companies: Jiji Press, Mainichi Shimbun, NHK, and Kyodo News. The first three have never provided a reshuffle cue, while Kyodo News has provided a reshuffle cue in every case; thus, there is no concern about self-selection bias. The estimated model was the same as the original, except that the mode-effect term was omitted because the survey mode was perfectly collinear with the four companies.11 The point estimate was 0.031, slightly higher than the original result. Uncertainty increased because of the reduced sample size, but the 95 percent CI still did not include zero ([0.012, 0.053]). Details of this analysis are shown in Online Appendix C.

WHEN IS THE EFFECT STRONG?

To examine heterogeneity in the effect size of reshuffle cues, I sought to explain it by several factors. I altered equation (3) as follows:

μ_i = α_{p, t_i} + γ_{m_i} + δ_{j_i} + λ_{r_i} x_i,   (5)
λ_r ~ N(θ′w_r, η²),   (6)

where r_i ∈ {1, …, 13} is the index of the reshuffle associated with poll i, w_r is a vector of covariates for reshuffle r (including a 1 for the intercept), θ is the vector of their coefficients, and η² is an error variance.

I considered two variables to explain the size of the effect: the cabinet approval rating in the week immediately preceding a reshuffle (estimated by the dynamic linear model, i.e., α_{p, t_i − 1}) and the number of affected ministers. If reshuffles are effective in correcting falls in popularity, as Kam and Indriðason (2005) and others have argued, then the effect of reshuffle cues should be large when cabinet approval is low. On the other hand, if reshuffles send the public a negative signal of discontinuity and turmoil, as Hansen et al. (2013) have indicated, then the number of affected ministers should reduce the size of the reshuffle-cue effect. Further, I included a dummy variable for the DPJ government and estimated θ simultaneously with the other parameters of the dynamic linear model. Although this analysis cannot exclude the possibility of omitted variables and the effective sample size is small (the number of reshuffle cases is only 13), it provides insight into how citizens respond to reshuffles.
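To make the hierarchical structure in equations (5) and (6) concrete, here is a minimal sketch in Python/PyMC of how reshuffle-specific cue effects can be drawn around θ′w_r and attached to poll-level means. It is not the author's code: the variable names, priors, and synthetic data are my own illustrative assumptions, and the sketch omits the house effects, mode effects, and the latent random-walk component of the full model.

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)

# Synthetic stand-in data (illustrative only, not the paper's replication data).
n_polls, n_reshuffles = 200, 13
alpha_hat = 0.45 + 0.05 * np.sin(np.linspace(0, 6, n_polls))  # latent approval path, treated as known here
x = rng.integers(0, 2, n_polls)                               # 1 if the poll provided a reshuffle cue
r_idx = rng.integers(0, n_reshuffles, n_polls)                # which reshuffle each poll refers to
W = np.column_stack([
    np.ones(n_reshuffles),                 # intercept
    rng.uniform(0.3, 0.6, n_reshuffles),   # lagged cabinet approval
    rng.integers(1, 15, n_reshuffles),     # number of affected ministers
])
se = np.full(n_polls, 0.02)                                   # poll-level sampling standard errors
y = alpha_hat + 0.024 * x + rng.normal(0.0, se)               # observed approval proportions

with pm.Model() as cue_model:
    theta = pm.Normal("theta", mu=0.0, sigma=1.0, shape=W.shape[1])
    eta = pm.HalfNormal("eta", sigma=0.1)
    # Equation (6): reshuffle-specific cue effects centered on theta' w_r.
    lam = pm.Normal("lambda_r", mu=pm.math.dot(W, theta), sigma=eta, shape=n_reshuffles)
    # Equation (5), simplified: mu_i = alpha_{p,t_i} + lambda_{r_i} * x_i.
    mu = alpha_hat + lam[r_idx] * x
    pm.Normal("y_obs", mu=mu, sigma=se, observed=y)
    idata = pm.sample(1000, tune=1000, target_accept=0.9, random_seed=0)
```

In the paper itself, θ is estimated simultaneously with the other parameters of the dynamic linear model rather than with the latent approval path held fixed as in this sketch.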
Table 3 shows the results. Lagged cabinet approval has a negative effect on the size of the reshuffle-cue effect; that is, the lower the cabinet approval rating, the greater the effect of the reshuffle cue.12 This implies that reshuffles have a corrective effect on government popularity, which is in line with Dewan and Dowding's (2005) analysis of resignations of single ministers. The positive coefficient for the number of affected ministers indicates that, on average, citizens do not respond negatively to a discontinuous cabinet; rather, they welcome the freshness of reshuffled cabinets.13 The effect size of reshuffle cues does not depend on the governing party. Details of this analysis are shown in Online Appendix D.

Table 3. Estimated coefficients on the size of the effect of reshuffle cues

                               Point     95% CI
Intercept                       0.028    [−0.066, 0.125]
Lagged cabinet approval        −0.142    [−0.270, −0.014]
Number of altered ministers     0.005    [−0.002, 0.011]
DPJ dummy                       0.008    [−0.037, 0.051]

Note.—The dependent variable is the estimated effect of a reshuffle cue for each reshuffle. Lagged cabinet approval is simultaneously estimated by the dynamic linear model.
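As a purely illustrative reading of table 3, the short calculation below plugs the point estimates into the linear predictor θ′w_r for two hypothetical reshuffles. The covariate values (a 30 percent versus a 50 percent lagged approval rating, ten affected ministers, an LDP government) are my own assumptions, the raw scaling of the covariates is inferred rather than stated in the table, and the calculation ignores estimation uncertainty (two of the four CIs include zero).

```python
# Point estimates from table 3 (linear predictor for the reshuffle-specific cue effect).
intercept, b_lag_approval, b_ministers, b_dpj = 0.028, -0.142, 0.005, 0.008

def expected_cue_effect(lagged_approval: float, n_ministers: int, dpj: int) -> float:
    """Expected cue effect theta'w_r under the (assumed) raw scaling of the covariates."""
    return intercept + b_lag_approval * lagged_approval + b_ministers * n_ministers + b_dpj * dpj

# Hypothetical LDP reshuffle replacing ten ministers.
print(expected_cue_effect(0.30, 10, 0))  # ~0.035, about 3.5 percentage points
print(expected_cue_effect(0.50, 10, 0))  # ~0.007, under 1 percentage point
```

This reproduces the qualitative pattern described in the text: the cue effect shrinks as the pre-reshuffle approval rating rises.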
Additional analyses and robustness checks were conducted (see Online Appendices E, F, and G). Their results indicate that both type I and type II reshuffle cues have a positive effect on cabinet approval ratings, and that the conclusion does not change when the effect of reshuffle cues is reestimated with a simpler frequentist approach. Further, the main results are robust to alternative statistical models and alternative codings of the data.

Conclusions

In Japan, a number of polling firms conduct opinion polls week after week to measure cabinet approval ratings. When a cabinet is reshuffled, some polling companies add information on the reshuffle (a reshuffle cue) to their question on cabinet approval, while others do not. Even a polling firm that provides a reshuffle cue at one reshuffle may not do so at another. I exploited this variation as an opportunity to investigate, without the usual endogeneity problem, whether citizens evaluate reshuffles positively and become more likely to approve of the government. I employed the results of polls conducted from 2001 to 2015 by 11 polling firms and a dynamic linear model that accounts for each PM's individual popularity as well as house effects, mode effects, and sampling uncertainty. The results showed that reshuffle cues increase cabinet approval ratings by 2.4 percentage points on average, and the conclusion that reshuffles can have a positive effect on cabinet approval is statistically credible. Furthermore, the supplementary analysis implies that reshuffles have a corrective effect on declining popularity and that citizens favor the freshness of reshuffled cabinets.

Political scientists have studied cabinet reshuffles as a tool through which PMs effectively delegate power to ministers and successfully manage the government. However, scholars have paid insufficient attention to the potential cost of reshuffles to government popularity, and some have assumed, without empirical evidence, that reshuffles have a positive effect on popularity. The results of this study provide evidence in support of such assumptions and reinforce previous research on the theory of cabinet management. The results also provide some justification for game-theoretic research on reshuffles, most of which assumes that reshuffles carry at least no cost with respect to government popularity, although in some cases such research should incorporate the potential benefit of cabinet reshuffles into its models.

This study has some limitations. First, it was limited to Japan; future research should investigate whether the positive effect of cabinet reshuffles on government popularity is observed in other countries. Second, although I conducted an exploratory analysis of the conditions under which citizens welcome reshuffles, I did not fully investigate the underlying mechanism: whether voters evaluate the competence of reshuffled cabinets, merely respond to their newness, or react for other reasons. Analyses that are free from omitted variables, such as experimental studies, are required to examine the psychological mechanism behind the positive effects of reshuffles. Third, my research design did not enable me to determine how long the effect of a reshuffle lasts; appropriate time-series analyses may resolve this question.

Finally, the implications of this study for survey research are as follows. This study demonstrated that adding only one word to a poll question (changing "cabinet" to "reshuffled cabinet," or "naikaku" to "kaizo naikaku" in Japanese) has a substantive framing effect and changes responses. Textbooks on survey research have repeatedly warned survey designers about this phenomenon, and I have introduced a new case in which a framing effect arises when subtle information is added. However, my results also imply that even when surveys contain different questions and thus are not directly comparable, we can compare and unify the results by using a suitable statistical model.
Certainly, future surveys should be designed carefully to avoid unnecessary framing effects; however, there is sometimes a need to analyze past surveys that were not necessarily designed appropriately. Statistical modeling, such as the pooling-the-polls technique used here, can satisfy this need.

Supplementary Data

Supplementary data are freely available at Public Opinion Quarterly online. Replication files are available at the Harvard Dataverse (http://dx.doi.org/10.7910/DVN/FAU3VR).

The author thanks Yukio Maeda for providing data from Jiji Press and Kyodo News, and Kyodo News and TV Asahi for providing information on the methods used in their opinion polls. An earlier version of this paper was presented at the Workshop on the Frontiers of Statistical Analysis and Formal Theory of Political Science (Gakushuin University, Tokyo, January 5, 2016) and the Workshop on Political Communication (Rikkyo University, Tokyo, August 26, 2016). The author thanks Kentaro Fukumoto, Masataka Harada, Hiroshi Hirano, Noriyo Isozaki, Yukio Maeda, Kenneth Mori McElwain, Kuniaki Nemoto, Ikuma Ogura, Teppei Yamamoto, Soichiro Yamauchi, participants of the workshops, and three anonymous reviewers for their helpful comments. Finally, the author would like to thank Shoko Omori for her research assistance. This work was supported by the Japan Society for the Promotion of Science KAKENHI [13J08571 to H.M.].

Appendix

Table A1. Estimated results of miscellaneous parameters in the main analysis

                                        Point    95% CI
Std. dev. of noise in the random walk ω_p
  Koizumi                               0.029    [0.026, 0.033]
  Abe (1)                               0.031    [0.023, 0.039]
  Fukuda                                0.032    [0.022, 0.042]
  Aso                                   0.029    [0.022, 0.037]
  Hatoyama                              0.030    [0.020, 0.041]
  Kan                                   0.038    [0.030, 0.047]
  Noda                                  0.021    [0.015, 0.027]
  Abe (2)                               0.018    [0.014, 0.022]
Design effect τ_j
  Kyodo News                            1.685    [1.490, 1.886]
  Yomiuri Shimbun                       2.193    [1.917, 2.484]
  Asahi Shimbun                         2.011    [1.783, 2.247]
  Mainichi Shimbun                      2.487    [2.202, 2.793]
  Sankei Shimbun-FNN                    1.832    [1.557, 2.109]
  Nikkei Research                       2.059    [1.775, 2.372]
  NHK                                   2.006    [1.741, 2.299]
  JNN                                   2.614    [2.338, 2.929]
  NNN                                   2.039    [1.718, 2.429]

Note.—The design effects for Jiji Press and ANN were not estimated, because they did not employ RDD or quota sampling.
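As a side note on reading the design effects in table A1: under the standard definition (the ratio of the actual sampling variance to the variance under simple random sampling), a design effect of τ shrinks the effective sample size to n/τ and inflates the sampling standard error by √τ. The sketch below is my own illustration with hypothetical numbers, not part of the paper's analysis.

```python
import math

def effective_n(n: int, deff: float) -> float:
    """Effective sample size when the design effect is deff = var_actual / var_srs."""
    return n / deff

def approval_se(p: float, n: int, deff: float = 1.0) -> float:
    """Sampling standard error of an approval proportion, inflated by the design effect."""
    return math.sqrt(deff * p * (1.0 - p) / n)

# Hypothetical poll: 1,000 respondents, 45 percent approval, a JNN-like design effect of 2.6.
print(round(effective_n(1000, 2.6)))           # ~385 effective respondents
print(round(approval_se(0.45, 1000), 4))       # ~0.0157 under simple random sampling
print(round(approval_se(0.45, 1000, 2.6), 4))  # ~0.0254 once the design effect is applied
```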