Reducing Turnout Misreporting in Online Surveys
Public Opinion Quarterly, 2018
doi: 10.1093/poq/nfy017
Abstract

Assessing individual-level theories of electoral participation requires survey-based measures of turnout. Yet, due to a combination of sampling problems and respondent misreporting, postelection surveys routinely overestimate turnout, often by large margins. Using an online survey experiment fielded after the 2015 British general election, we implement three alternative survey questions aimed at correcting for turnout misreporting and test them against a standard direct turnout question used in postelection studies. Comparing estimated to actual turnout rates, we find that while all question designs overestimate aggregate turnout, the item-count technique alleviates the misreporting problem substantially, whereas a direct turnout question with additional face-saving options and a crosswise model design help little or not at all. Also, regression models of turnout estimated using the item-count measure yield substantively similar inferences regarding the correlates of electoral participation to models estimated using "gold-standard" validated vote measures. These findings stand in contrast to those suggesting that item-count techniques do not help with misreporting in an online setting and are particularly relevant given the increasing use of online surveys in election studies.

Self-reported turnout rates in postelection surveys often considerably exceed official rates.1 This phenomenon of "vote overreporting" (e.g., Bernstein, Chadha, and Montjoy 2001; McDonald 2003) represents a major challenge for election research, raising questions about the validity of turnout models estimated using survey data (e.g., Brehm 1993; Bernstein, Chadha, and Montjoy 2001; Cassel 2003; Karp and Brockington 2005; Jones 2008). While vote overreporting is attributable in part to sampling and survey nonresponse biases (e.g., Brehm 1993; Jackman 1999; Voogt and Saris 2003), much previous research focuses on the tendency of survey respondents—particularly those who did not vote—to misreport their turnout (Presser 1990; Abelson, Loftus, and Greenwald 1992; Holtgraves, Eck, and Lasky 1997; Belli et al. 1999; Belli, Moore, and VanHoewyk 2006; Holbrook and Krosnick 2010b; Hanmer, Banks, and White 2014; Persson and Solevid 2014; Zeglovits and Kritzinger 2014; Thomas et al. 2017). This paper investigates whether misreporting can be alleviated by different sensitive survey techniques designed to reduce social desirability pressures arising from turnout-related questions. In particular, we examine the crosswise model (CM) and the item-count technique (ICT). Whereas these approaches had been of limited use to scholars estimating multivariate models of turnout, recent methodological advances (Blair and Imai 2010; Imai 2011; Blair and Imai 2012; Jann, Jerke, and Krumpal 2012; Blair, Imai, and Zhou 2015) have made estimating such models relatively straightforward. In an online survey experiment fielded shortly after the 2015 UK general election, we design new CM and ICT turnout questions and test them against a standard direct turnout question and a direct question with face-saving response options. Our findings show that while all question designs overestimate aggregate national turnout, ICT yields more accurate estimates compared to the standard direct question, whereas the face-saving design and CM improve accuracy little or not at all.
Also, regression models of turnout estimated using ICT measures yield inferences regarding the correlates of electoral participation that are more consistent with those from models estimated using "gold-standard" validated vote measures. In contrast to recent studies that cast doubt on the suitability of ICT questions for reducing turnout misreporting in online surveys (Holbrook and Krosnick 2010b; Thomas et al. 2017), we show that ICT questions designed following current best practice appear to substantially reduce turnout misreporting in an online survey. Our results suggest that earlier mixed findings regarding ICT's effectiveness could be due to the particular ICT designs used in those studies.

TURNOUT AS A SENSITIVE TOPIC

Existing research has sought to alleviate turnout misreporting in a number of ways. One approach is to disregard self-reports and instead measure respondent turnout using official records. Such "vote validation" exercises have been undertaken in several national election studies (e.g., in Sweden, New Zealand, Norway, the UK, and—until 1990—the United States). Although often considered the gold standard in dealing with misreporting, vote validation has come under scrutiny in the US context, with Berent, Krosnick, and Lupia (2016) showing that matching errors artificially drive down "validated" turnout rates. While it is an open question to what extent matching errors are an issue outside the US context, vote validation has two additional downsides that limit its utility as a general solution for turnout misreporting. First, in many countries official records of who has voted in an election are not available. Second, these records, when available, are often decentralized, making validation a time-consuming and expensive undertaking.

Another set of approaches for dealing with turnout misreporting focuses on alleviating social desirability bias (for overviews, see Tourangeau and Yan [2007]; Holbrook and Krosnick [2010b]). Voting is an admired and highly valued civic behavior (Holbrook, Green, and Krosnick 2003; Karp and Brockington 2005; Bryan et al. 2011), creating incentives for nonvoters to deliberately or unconsciously misreport when asked about their electoral participation. Starting from this premise, some suggest that misreporting can be alleviated via appropriate choice of survey mode, with respondents more willing to report sensitive information in self- rather than interviewer-administered surveys (Hochstim 1967). Although Holbrook and Krosnick (2010b) find that turnout misreporting is reduced in self-administered online surveys compared to interviewer-administered telephone surveys, a systematic review of over 100 postelection surveys found no significant difference in turnout misreporting across survey modes (Selb and Munzert 2013). Reviewing studies on a variety of sensitive topics, Tourangeau and Yan (2007, p. 878) conclude that "even when the questions are self-administered... many respondents still misreport."

If choice of survey mode alone cannot resolve the misreporting problem, can we design turnout questions that do? One design-based approach for reducing misreporting is the "bogus pipeline" (Jones and Sigall 1971; Roese and Jamieson 1993), where the interviewer informs the respondent that their answer to the sensitive question will be verified against official records, thus increasing the respondent's motivation to tell the truth (assuming being caught lying is more embarrassing than admitting to the sensitive behavior).
Hanmer, Banks, and White (2014) find that this approach significantly reduces turnout misreporting. However, provided researchers do not want to mislead survey respondents, the applicability of the bogus pipeline is limited, since it necessitates vote validation for at least some respondents, which is costly and sometimes impossible. A simple alternative design-based approach is to combine “forgiving” question wording (Fowler 1995), which attempts to normalize nonvoting in the question preamble, with the provision of answer options that permit the respondent to admit nonvoting in a “face-saving” manner. Although turnout misreporting is unaffected by “forgiving” wording2 (Abelson, Loftus, and Greenwald 1992; Holtgraves, Eck, and Lasky 1997; Persson and Solevid 2014) and only moderately reduced by “face-saving” answer options (Belli et al. 1999; Belli, Moore, and VanHoewyk 2006; Persson and Solevid 2014; Zeglovits and Kritzinger 2014), many election studies incorporate one or both of these features in their turnout questions. We therefore include such turnout question designs as comparators in our experiments. Other design-based approaches to the misreporting problem involve indirect questions, which aim to reduce social desirability pressures by protecting privacy such that survey researchers are unable to infer individual respondents’ answers to the sensitive item. The well-known randomized response technique ensures this using a randomization device: Warner (1965), for example, asks respondents to truthfully state either whether they do bear the sensitive trait of interest, or whether they do not bear the sensitive trait of interest, based on the outcome of a whirl of a spinner unobserved by the interviewer. The researcher is thus unaware of which question an individual respondent is answering, but can estimate the rate of the sensitive behavior in the sample because she knows the probability with which respondents answer each item. Research suggests that this design fails to reduce turnout misreporting (Locander, Sudman, and Bradburn 1976; Holbrook and Krosnick 2010a) and raises concerns about its practicality: in telephone and self-administered surveys, it is difficult to ensure that respondents have a randomization device to hand and that they appropriately employ it (Holbrook and Krosnick 2010a).3 Recognizing these practical limitations, researchers have developed variants of the randomized response technique that do not require randomization devices. One recent example is the crosswise model (CM) (Yu, Tian, and Tang 2008; Tan, Tian, and Tang 2009) where respondents are asked two yes/no questions—a nonsensitive question where the population distribution of true responses is known, and the sensitive question of substantive interest—and indicate only whether or not their answers to the questions are identical. Based on respondents’ answers and the known distribution of answers to the nonsensitive item, researchers can again estimate the rate of the sensitive trait. CM has been shown to reduce misreporting on some sensitive topics (e.g., Coutts and Jann 2011; Jann, Jerke, and Krumpal 2012), but is as yet untested with regard to turnout. A final example of indirect questioning is the item-count technique (ICT), or “list experiment.” In this design, respondents are randomized into a control and treatment group. The control group receives a list of nonsensitive items, while the treatment group receives the same list plus the sensitive item. 
Respondents are asked to count the total number of listed items that satisfy certain criteria rather than answering with regard to each individual listed item. The prevalence of the sensitive trait is estimated based on the difference in mean item counts across the two groups (Miller 1984; Droitcour et al. 1991). The ICT performance record is mixed, with regard to both turnout (e.g., Holbrook and Krosnick 2010b; Comşa and Postelnicu 2013; Thomas et al. 2017) and other sensitive survey items (e.g., Tourangeau and Yan 2007; Wolter and Laier 2014). This mixed success may reflect the challenges researchers face in creating valid lists of control items—challenges that have been addressed in a recent series of articles (Blair and Imai 2012; Glynn 2013; Aronow et al. 2015). Below, we investigate whether an ICT question designed according to current best practice can reduce nonvoter misreporting in an online survey.

Methods

EXPERIMENTAL DESIGN

Our survey experiment was designed to test whether new ICT and CM turnout question designs are effective at reducing misreporting, relative to more standard direct turnout questions with forgiving wording and face-saving response options. Our experiment was run online through YouGov across four survey waves in the aftermath of the UK general election on May 7, 2015 (see the Appendix for further sampling details). To limit memory error concerns, fieldwork was conducted soon after the election, June 8–15, 2015, with a sample of 6,228 respondents from the British population. Appendix Table A.2 reports sample descriptives, showing that these are broadly in line with those from the British Election Study (BES) face-to-face postelection survey, a high-quality probability sample, and with census data.

SURVEY INSTRUMENTS

Respondents were randomly assigned to one of four turnout questions.

Direct question: Our baseline turnout question is the direct question used by the BES, which already incorporates a "forgiving" introduction. Respondents were asked: "Talking with people about the recent general election on May 7th, we have found that a lot of people didn't manage to vote. How about you, did you manage to vote in the general election?" Respondents could answer yes or no, or offer "don't know." The estimated aggregate turnout from this question is the (weighted or unweighted) proportion of respondents answering "Yes."

Direct face-saving question: This variant incorporates the preamble and question wording of the direct question, but response options are now those that Belli, Moore, and VanHoewyk (2006) propose for when data are collected within a few weeks of Election Day: "I did not vote in the general election"; "I thought about voting this time but didn't"; "I usually vote but didn't this time"; "I am sure I voted in the general election"; and "Don't know." The second and third answer options allow respondents to report nonvoting in the election while also indicating having had some intent to vote or having voted on other occasions, and may therefore make it easier for nonvoters to admit not having voted. Aggregate turnout is estimated as the (weighted or unweighted) proportion of respondents giving the penultimate response.
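For the direct and face-saving questions, the aggregate estimate is simply a (possibly weighted) proportion. A minimal sketch of that calculation, assuming a hypothetical data frame direct_df with a 0/1 indicator voted_report (1 for "Yes", or "I am sure I voted" in the face-saving variant) and a survey weight column w:

```r
library(survey)  # design-based standard errors for the weighted estimate

# Unweighted estimate: sample proportion with a normal-approximation 95% CI.
p_hat <- mean(direct_df$voted_report)
se    <- sqrt(p_hat * (1 - p_hat) / nrow(direct_df))
c(estimate = p_hat, lower = p_hat - 1.96 * se, upper = p_hat + 1.96 * se)

# Weighted estimate and confidence interval.
des <- svydesign(ids = ~1, weights = ~w, data = direct_df)
est <- svymean(~voted_report, des)
est
confint(est)
```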
Crosswise model (CM): Our CM question involves giving respondents the following question: "Talking with people about the recent general election on May 7th, we have found that a lot of people didn't manage to vote or were reluctant to say whether or not they had voted. In order to provide additional protection of your privacy, this question uses a method to keep your answer totally confidential, so that nobody can tell for sure whether you voted or not. Please read the instructions carefully before answering the question." "Two questions are asked below. Please think about how you would answer each question separately (either with Yes or No). After that please indicate whether your answers to both questions are the same (No to both questions or Yes to both questions) or different (Yes to one question and No to the other)." The two questions were "Is your mother's birthday in January, February or March?" and "Did you manage to vote in the general election?" This follows Jann, Jerke, and Krumpal (2012) in asking about parental birthdays as the nonsensitive question, as this satisfies key criteria for CM effectiveness (Yu, Tian, and Tang 2008): the probability of an affirmative response is known, unequal to 0.5, and uncorrelated with true turnout. We calculate the probability that a respondent's mother was born in January, February, or March based on Office for National Statistics data on the birth dates of British women, 1938–1983. The calculated probability is 25.2 percent. So that respondents understand why they are being asked such a complex question, and consistent with Jann, Jerke, and Krumpal (2012), the preamble explicitly states that the question is designed to protect privacy. Following Yu, Tian, and Tang (2008), the CM estimate of aggregate turnout is \(\hat{\pi}_{\mathrm{CM}} = \dfrac{r/n + p - 1}{2p - 1}\), where \(n\) is the total number of respondents, \(r\) is the number who report matching answers, and \(p\) is the known probability of an affirmative answer to the nonsensitive question. The standard error is \(\widehat{\mathrm{se}}(\hat{\pi}_{\mathrm{CM}}) = \sqrt{\dfrac{(r/n)(1 - r/n)}{(n - 1)(2p - 1)^2}}\).4
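A minimal sketch of this estimator, assuming hypothetical inputs r (respondents reporting matching answers), n (number of CM respondents), and p = 0.252 as above; the weighted variant described in footnote 4 would substitute a weighted proportion for r/n:

```r
# Crosswise-model estimate of turnout (Yu, Tian, and Tang 2008).
cm_estimate <- function(r, n, p) {
  lambda <- r / n                           # observed proportion reporting "same answers"
  pi_hat <- (lambda + p - 1) / (2 * p - 1)  # implied turnout rate
  se     <- sqrt(lambda * (1 - lambda) / ((n - 1) * (2 * p - 1)^2))
  c(estimate = pi_hat, lower = pi_hat - 1.96 * se, upper = pi_hat + 1.96 * se)
}

# Illustrative call with made-up inputs (not the study's data):
cm_estimate(r = 372, n = 1234, p = 0.252)
```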
Item-count technique (ICT): In the ICT design, respondents were asked: "The next question deals with the recent general election on May 7th. Here is a list of four (five) things that some people did and some people did not do during the election campaign or on Election Day. Please say how many of these things you did." The list asked respondents whether they had: discussed the election with family and friends; voted in the election (sensitive item); criticized a politician on social media; avoided watching the leaders debate; and put up a poster for a political party in their window or garden. Respondents could provide an answer between 0 and 4 (0 and 5 in the treatment group) or say they did not know. This design incorporates a number of recommendations from recent studies of ICT effectiveness. First, to avoid drawing undue attention to our sensitive item, each nonsensitive item relates to activities that respondents might engage in during election periods (Kuklinski, Cobb, and Gilens 1997; Aronow et al. 2015; Lax, Phillips, and Stollwerk 2016). This contrasts with existing ICT-based turnout questions, which include non-political behaviors in the control list (Holbrook and Krosnick 2010b; Comşa and Postelnicu 2013; Thomas et al. 2017) and which have had mixed success in reducing misreporting. Second, we are careful to avoid ceiling and floor effects, which occur when a respondent in the treatment group engages in either all or none of the nonsensitive behaviors and therefore perceives that their answer to the sensitive item is no longer concealed from the researcher (Blair and Imai 2012; Glynn 2013). To minimize such effects, we include a "low-cost" control activity that most respondents should have undertaken ("discussed the election with family and friends") and a "high-cost" activity that few respondents should have undertaken ("put up a poster for a political party"). In addition to implementing these recommendations, the control list includes some "norm-defiant" behaviors, such as "avoided watching the leaders debate" and "criticized a politician on social media." Our intent here is to reduce embarrassment at admitting nonvoting by signaling to respondents that it is recognized that some people do not like and/or do not engage with politics. Unlike the CM design, and consistent with standard ICT designs for online surveys (e.g., Aronow et al. 2015; Lax, Phillips, and Stollwerk 2016), the preamble does not explicitly state that the question is designed to protect privacy. Our ICT-based estimate of aggregate turnout is the difference in (weighted or unweighted) mean item counts comparing the control and treatment groups (Blair and Imai 2012). For the weighted estimate, standard errors were calculated using Taylor linearization in the "survey" package (Lumley 2004) in R. Diagnostics reported in Supplementary Materials Section B suggest that this ICT design successfully minimizes ceiling and floor effects and satisfies other key identifying assumptions laid out in Blair and Imai (2012).
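A minimal sketch of this difference-in-means estimator, assuming a hypothetical data frame ict_df with the reported item count (count), a 0/1 indicator for receiving the longer list (treat), and a survey weight (w); the svyglm() coefficient on treat is one way to obtain the weighted difference in means with a Taylor-linearized standard error:

```r
library(survey)

# Unweighted ICT estimate: mean count under the five-item list minus
# mean count under the four-item control list.
group_means <- tapply(ict_df$count, ict_df$treat, mean)
unname(group_means["1"] - group_means["0"])

# Weighted estimate with a linearized standard error and 95% CI.
des <- svydesign(ids = ~1, weights = ~w, data = ict_df)
fit <- svyglm(count ~ treat, design = des)
coef(fit)["treat"]        # estimated turnout rate
confint(fit)["treat", ]   # 95% confidence interval
```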
RANDOMIZATION

Respondents were randomly assigned to one of the four turnout questions described above. Due to its lower statistical efficiency, ICT received double weight in the randomization. Of the 6,228 respondents, 1,260 received the direct question, 1,153 the direct face-saving question, 2,581 the ICT question, and 1,234 the CM question. Supplementary Materials Section A suggests that randomization was successful.

Results

COMPARING TURNOUT ESTIMATES

We begin our analysis by comparing headline turnout estimates. Figure 1 displays, for each survey technique, weighted and unweighted Britain-wide turnout estimates. Given the similarity between weighted and unweighted estimates, we focus on the former.5

Figure 1. 2015 turnout estimates by estimation method. For each turnout question, points indicate weighted and unweighted point estimates for 2015 general election turnout. Lines indicate 95 percent confidence intervals. The dashed vertical line indicates actual GB turnout.

The standard direct technique performs poorly, yielding a turnout estimate of 91.2 percent [89.3 percent, 93 percent], 24.7 points higher than actual turnout. In line with previous US (Belli, Moore, and VanHoewyk 2006) and Austrian (Zeglovits and Kritzinger 2014) studies, the face-saving question yields a modest improvement. It significantly reduces estimated turnout compared to the direct technique, but still performs poorly in absolute terms, estimating turnout at 86.6 percent [84.1 percent, 89 percent], 20.1 points higher than actual turnout. CM performs worst of all the techniques we test, estimating turnout at 94.3 percent [88.4 percent, 100 percent], 27.9 points higher than actual turnout. In contrast, while ICT is clearly less efficient (with a relatively wide confidence interval), it nevertheless yields a substantively and statistically significant improvement in turnout estimate accuracy compared to all other techniques.6 Though still 9.2 points higher than actual GB turnout, the ICT estimate of 75.7 percent [66.9 percent, 84.4 percent] represents a two-thirds reduction in error compared to the direct question estimate. Taking the difference between the ICT and direct turnout estimates in our data, one gets an implied misreporting rate of 15.5 percent [6.5 percent, 24.4 percent]. The confidence interval contains—and is therefore consistent with—the 10 percent rate of misreporting found by Rivers and Wells (2015), who validate the votes of a subset of YouGov respondents after the 2015 general election.

In sum, the face-saving and ICT questions yield aggregate turnout estimates that are, respectively, moderately and substantially more accurate than those from the direct question, while CM yields no improvement.7 ICT, however, still overestimates actual 2015 turnout. This may partly be because ICT does not correct all misreporting, and partly because YouGov samples from this period have been found to overestimate aggregate turnout due to both misreporting and oversampling of politically interested individuals who are more likely to vote (Rivers and Wells 2015); ICT tackles only misreporting.

Before probing the face-saving and ICT results using multivariate analysis, we pause to consider why the CM design failed. One possibility is that, faced with a somewhat unusual question and in the absence of a practice run, some respondents found the CM question unduly taxing and simply answered "don't know." If the propensity to do so is negatively correlated with turnout, this could explain why CM overestimates turnout. However, table 1 casts doubt on this explanation, showing that the proportion of "don't know" responses is not substantially higher for CM than for the other treatments.8

Table 1. Rates of "don't know" responses by treatment group

Method           "Don't know" rate
Direct           0.020
Face-saving      0.016
ICT control      0.036
ICT sensitive    0.021
CM               0.036

Note.—Entries show, for each treatment group, the rate of "don't know" responses to the item measuring turnout.
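Footnote 8 reports z-tests for these differences in "don't know" rates. A hedged sketch of one such comparison using base R's two-sample proportion test (asymptotically equivalent to a two-proportion z-test), with counts reconstructed approximately from the Table 1 rates and the treatment-group sizes:

```r
# "Don't know" counts implied (approximately) by Table 1 and the group sizes.
dk     <- c(cm = round(0.036 * 1234), direct = round(0.020 * 1260))
totals <- c(cm = 1234, direct = 1260)

# Test for a difference in "don't know" rates, CM versus direct question;
# correct = FALSE gives the uncorrected chi-square test, equivalent to a z-test.
prop.test(dk, totals, correct = FALSE)
```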
A more plausible explanation for the disappointing performance of our CM lies in a combination of two features of this design. First, while such an unusual design necessitates an explanatory preamble, stating that the question represents "an additional protection of your privacy" may heighten the perceived sensitivity of the turnout question for respondents (Clifford and Jerit 2015). Second, in the absence of a run-through illustrating how the design preserves anonymity, respondents whose sensitivity was heightened by the preamble may distrust the design and become particularly susceptible to social desirability bias. This is consistent with Coutts and Jann (2011), who find that in an online setting, randomized response designs—which share many characteristics with CM—elicit relatively low levels of respondent trust. Solving this problem is not easy: doing a CM run-through in online surveys is time-consuming and may frustrate respondents.9

COMPARING TURNOUT MODELS

The improvement in aggregate turnout estimates yielded by face-saving and ICT questions suggests that they may alleviate turnout misreporting compared to the direct question. But do these techniques also yield inferences concerning the predictors of turnout that are more consistent with those drawn from data where misreporting is absent? To address this question, we estimate demographic models of turnout for the 2015 British general election based on direct, face-saving, and ICT questions. (Given its poor performance in estimating aggregate turnout, we do not estimate a model for the CM question.) We then compare each of these models against a benchmark model estimated using validated measures of individual turnout, based on official electoral records rather than respondent self-reports.10

To the best of our knowledge, the only publicly available individual-level validated vote measures for the 2015 general election are those from the postelection face-to-face survey of the 2015 British Election Study (Fieldhouse et al. 2016).11 Generated via probability sampling and persistent recontact efforts, the 2015 BES face-to-face survey is widely considered to be the "gold standard" among 2015 election surveys in terms of survey sample quality (Sturgis et al. 2016, p. 48). If the models estimated from online survey data using our turnout measures yield similar inferences to those estimated from the BES face-to-face data using validated turnout measures, we can be more confident that the former are properly correcting for misreporting.12

We estimate four regression models. First, a benchmark model is estimated using the 1,690 BES face-to-face respondents whose turnout was validated.13 This is a binary logistic regression with a response variable coded as 1 if official records show a respondent voted and 0 otherwise. Our second and third models are binary logistic regressions estimated using our online survey data and have as their response variable turnout as measured by the direct question and the direct face-saving question, respectively. For our fourth model, we use the ICT regression methods developed in Imai (2011) to model the responses to the ICT question in our online survey.14

All four regression models include the same explanatory variables. First, we include a measure of self-reported party identification.15 To avoid unduly small subsamples, respondents are classified into four groups: Conservative identifiers; Labour identifiers; identifiers of any other party; and those who do not identify with any party or who answer "don't know." Our second and third explanatory variables are age group (18–24; 25–39; 40–59; 60 and above) and gender (male or female). Our fourth explanatory variable is a respondent's highest level of educational qualification, classified according to the UK Regulated Qualifications Framework (no qualifications, unclassified qualifications, or don't know; Levels 1–2; Level 3; Level 4 and above). These predictors constitute the full set of variables that are measured in a comparable format in both our experimental data and the 2015 BES face-to-face data.
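A hedged sketch of how these four models can be fitted in R, using glm() for the three binary-outcome models and ictreg() from the "list" package (Blair and Imai 2010; see footnote 14) for the ICT model; the data frame and variable names (bes_df, online_direct, voted_direct, and so on) are placeholders rather than the study's actual code:

```r
library(list)  # provides ictreg() for item-count (list experiment) regression

predictors <- c("party_id", "age_group", "gender", "education")

# Benchmark: validated turnout from the BES face-to-face sample.
m_bench  <- glm(reformulate(predictors, "validated_vote"),
                family = binomial, data = bes_df)

# Direct and face-saving self-reports from the online experiment.
m_direct <- glm(reformulate(predictors, "voted_direct"),
                family = binomial, data = online_direct)
m_face   <- glm(reformulate(predictors, "voted_facesaving"),
                family = binomial, data = online_face)

# ICT model (Imai 2011): the outcome is the item count, 'treat' flags the
# longer list, and J is the number of nonsensitive control items.
m_ict    <- ictreg(reformulate(predictors, "count"), data = online_ict,
                   treat = "treat", J = 4, method = "ml")
```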
Logistic regression coefficients are difficult to interpret substantively or to compare across models. Therefore, we follow Blair and Imai (2012) and focus on the predicted prevalence of the sensitive behavior for different political and demographic groups. Specifically, for a given sample and regression model, we ask what the predicted turnout rate in the sample would be if all BES face-to-face respondents were assigned to a particular category on a variable of interest, while holding all other explanatory variables at their observed values.16
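Footnote 16 details the Monte Carlo simulation behind these group-specific rates. A minimal sketch of that procedure for one of the logistic regression models (hypothetical object names; grouping variables are assumed to be factors, and the ICT model would use the analogous coefficient and variance estimates from ictreg):

```r
library(MASS)  # mvrnorm() for simulating coefficient vectors

# Predicted turnout rate if every respondent in `data` were set to `level`
# on `var`, averaging over the observed values of the other covariates.
group_rate <- function(fit, data, var, level, n_sims = 10000) {
  draws <- mvrnorm(n_sims, mu = coef(fit), Sigma = vcov(fit))     # parameter uncertainty
  data[[var]] <- factor(level, levels = levels(data[[var]]))      # assign everyone to the group
  X <- model.matrix(delete.response(terms(fit)), data)
  rates <- apply(draws, 1, function(b) mean(plogis(X %*% b)))     # mean predicted probability per draw
  c(estimate = mean(rates), quantile(rates, c(0.025, 0.975)))     # point estimate and 95% interval
}

# Example: predicted turnout rate with all BES respondents set to ages 18-24.
group_rate(m_bench, bes_df, var = "age_group", level = "18-24")
```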
Figure 2 graphs the group-specific predicted turnout rates for the regression models. The left panel shows that the regression based on the direct question (open circles) generates group-specific predicted turnout rates that all far exceed those from the benchmark validated vote model (filled circles).17 It also performs poorly in terms of recovering how turnout is associated with most variables. While the benchmark model yields predicted turnout rates for older and more qualified voters that are noticeably higher than for younger and less qualified voters, there is barely any variation in turnout rates by age group and education according to the direct question model. Only with respect to party identification does the direct question model recover the key pattern present in the benchmark model: that those with no clear party identification are less likely to vote than those who do identify with a party.

Figure 2. Comparing turnout models against BES validated vote model. Political and demographic groups are listed along the y-axis. For each group, we plot the predicted turnout rates based on a regression model, averaging over the distribution of covariates in the BES validated vote sample. Predicted turnout rates for the direct, face-saving, and ICT regression models are shown, respectively, in the left, middle, and right panels (open circles). Predicted turnout rates from the benchmark BES validated vote model are displayed in every panel (filled circles). Reading example: the filled circle for "Male" in each panel indicates that based on the validated vote regression model, if we set all respondents in the BES sample to "Male" while holding all other explanatory variables at their observed values, the predicted turnout rate would be 73.8 percent [70.1 percent, 76.7 percent].

The middle panel shows that the regression based on the face-saving turnout question (open circles) improves somewhat on the direct question regression. The group-specific predicted turnout rates are generally slightly closer to the benchmark rates (filled circles), although most remain significantly higher. In terms of relative turnout patterns, there is some evidence of higher predicted turnout rates for higher age groups, but the differences between young and old voters are too small, and predicted turnout barely varies by education. In addition, the difference in the predicted turnout rates of those with and without a clear party identity is actually more muted in the face-saving model than in the benchmark model or the direct question model.

The right panel shows that the regression based on the ICT turnout questions (open circles) improves on both the direct and face-saving models. Although the uncertainty surrounding each group-specific turnout rate is considerably greater, most point estimates are closely aligned with the benchmark rates (filled circles). Moreover, this is not simply the result of an intercept shift: the ICT model also recovers relative patterns of turnout that are generally more consistent with the benchmark model. Regarding party identification, the difference in predicted turnout rates of those who do and do not have a clear party identification is of similar magnitude to that in the direct and face-saving models. Regarding age and education, as in the benchmark model, predicted turnout rates increase substantially with age group and qualification level.18 Predicted turnout for 18–24-year-olds seems unduly low, but there is considerable uncertainty surrounding this estimate due to the small proportion of respondents in this age group in the online sample (see Appendix Table A.2).19

Table 2 summarizes the performance of the different models vis-à-vis the benchmark model. The first three columns show the mean, median, and maximum absolute differences in predicted group-specific turnout rates across the 14 political and demographic groups listed in figure 2, comparing the benchmark model with each of the three remaining models. According to all measures, the face-saving model performs slightly better than the direct question model. But the ICT model performs substantially better than both, reducing mean and median discrepancies from the benchmark model by almost two-thirds. The final column of table 2 gives the fraction of group-specific predicted turnout rates that are significantly different from their benchmark model counterpart (0.05 significance level, two-tailed).20 While almost all of the predicted turnout rates from the direct and face-saving models are significantly different from their benchmark counterparts, this is the case for only two of the 14 predicted turnout rates from the ICT model.

Table 2. Summary of differences from benchmark validated vote model

               Absolute differences
Method         Mean     Median    Maximum    Sig. difference
Direct         0.20     0.18      0.34       14/14
Face-saving    0.15     0.14      0.28       13/14
ICT            0.06     0.05      0.18       2/14

Note.—For a given test model (row), the first three columns show the mean, median, and maximum absolute discrepancy between group-specific predicted turnout rates generated by this model and those generated by the benchmark validated vote model. The final column gives the fraction of group-specific predicted turnout rates that are significantly different from their benchmark model counterpart (0.05 significance level, two-tailed).

Overall, this analysis suggests that, as well as generating better aggregate estimates of turnout, ICT outperforms the other techniques when it comes to estimating how turnout varies across political and demographic groups.

Conclusions

This paper compared the performance of several sensitive survey techniques designed to reduce turnout misreporting in postelection surveys. To do so, we ran an experiment shortly after the 2015 UK general election. One group of respondents received the standard BES turnout question. Another group received a face-saving turnout question previously untested in the UK. For a third group, we measured turnout using the crosswise model, the first time this has been tested in the turnout context. For a fourth group, we measured turnout using a new item-count question designed following current best practice. ICT estimates of aggregate turnout were significantly closer to the official 2015 turnout rate than those from the other question designs. We also introduced a more nuanced approach to validating ICT turnout measures: comparing inferences from demographic models of turnout estimated using ICT measures to those from models estimated using validated vote measures. Inferences from the ICT model were consistently closer to, and often statistically indistinguishable from, those from the benchmark validated vote model. Thus, in contrast to Holbrook and Krosnick (2010b) and Thomas et al. (2017), our findings suggest that carefully designed ICTs can significantly reduce turnout misreporting in online surveys. This suggests that in settings where practical or financial constraints make vote validation impossible, postelection surveys might usefully include ICT turnout questions.

We also found that the direct turnout question with face-saving options did improve on the standard direct question, in both the accuracy of aggregate turnout estimates and the validity of demographic turnout models. However, consistent with previous research (e.g., Belli, Moore, and VanHoewyk 2006; Zeglovits and Kritzinger 2014), these improvements were moderate compared to those from ICT. In contrast, CM performed no better, and if anything worse, than the standard direct turnout question in terms of estimating aggregate turnout. Taken together with Holbrook and Krosnick (2010a), this finding highlights the difficulty of successfully implementing randomized response questions and variants thereof in self-administered surveys.

Of course, there are limitations to our findings. First, our evidence comes only from online surveys, and the mechanisms behind social desirability bias may be different in this mode compared to when a respondent interacts with a human interviewer by telephone or face-to-face.
That said, other studies do show that ICT reduces misreporting in telephone (Holbrook and Krosnick 2010b) and face-to-face surveys (Comşa and Postelnicu 2013). Second, a well-acknowledged drawback of ICT is its statistical inefficiency. While ICT significantly improves on other techniques despite this inefficiency, future research should investigate whether further efficiency-improving adaptations of the ICT design—such as the "double-list experiment" (Droitcour et al. 1991) and combining direct questions with ICT (Aronow et al. 2015)—are effective in the context of turnout measurement. Finally, our regression validation focused only on how basic descriptive respondent characteristics are correlated with turnout, and our survey was conducted during one specific time period in relation to the election. Future research could also validate using attitudinal turnout correlates and could compare turnout questions when fielded closer to and further from Election Day.

Supplementary Data

Supplementary data are freely available at Public Opinion Quarterly online.

The authors thank participants at the North East Research Development (NERD) Workshop and the 2016 Annual Meeting of the European Political Science Association in Brussels, as well as the editors and three anonymous reviewers, for helpful comments. This work was supported solely by internal funding to P.M.K. from the School of Government and International Affairs, Durham University.

Appendix: Information on survey samples

EXPERIMENTAL SURVEY DATA

Our survey experiment was fielded via four online surveys run by YouGov. The fieldwork dates for each survey "wave" were, respectively, June 8–9 (Wave 1), June 9–10 (Wave 2), June 10–11 (Wave 3), and June 11–12, 2015 (Wave 4). Table A.1 reports the sample size for each treatment group in each survey wave.

Table A.1. Treatment sample sizes by survey wave

         Treatment group
Wave     Direct    Face-saving    ICT control    ICT sensitive    CM     All
1        307       289            292            333              295    1516
2        335       312            350            342              311    1650
3        326       271            329            290              314    1530
4        292       281            324            321              314    1532

Note.—This table shows the distribution of treatment assignment by survey wave.

The target population for each survey wave was the adult population of Great Britain. YouGov maintains an online panel of over 800,000 UK adults (recruited via their own website, advertising, and partnerships with other websites) and holds data on the sociodemographic characteristics and newspaper readership of each panel member. Drawing on this information, YouGov uses targeted quota sampling, not random probability sampling, to select a subsample of panelists for participation in each survey. Quotas are based on the distribution of age, gender, social grade, party identification, region, and type of newspaper readership in the British adult population.
YouGov has multiple surveys running at any time and uses a proprietary algorithm to determine, on a rolling basis, which panelists to email invites to and how to allocate invitees to surveys when they respond. Any given survey thus contains a reasonable number of panelists who are "slow" to respond to invites. Along with the modest cash incentives YouGov offers to survey participants, this is designed to increase the rate at which less politically engaged panelists take part in a survey. Due to the way respondents are assigned to surveys, YouGov does not calculate a per-survey participation rate. However, the overall rate at which panelists invited to participate in a survey do respond is 21 percent. The average response time for an email invite is 19 hours from the point of sending. Descriptive statistics for the sample are provided in table A.2, and a comparison to 2015 UK population characteristics is provided in table A.3.

Table A.2. Sample characteristics: experimental data versus 2015 BES face-to-face survey

                                         Experimental         BES
                                         N        Mean        N        Mean
Age group 18–24                          6,227    0.08        2,955    0.07
Age group 25–39                          6,227    0.18        2,955    0.21
Age group 40–59                          6,227    0.41        2,955    0.35
Age group 60+                            6,227    0.33        2,955    0.37
Female                                   6,228    0.52        2,987    0.54
Male                                     6,228    0.48        2,987    0.46
Qualifications None/Other/Don't know     6,228    0.21        2,987    0.30
Qualifications Level 1–2                 6,228    0.22        2,987    0.21
Qualifications Level 3                   6,228    0.19        2,987    0.13
Qualifications Level 4+                  6,228    0.38        2,987    0.36
Party ID None/Don't know                 6,228    0.17        2,964    0.16
Party ID Conservative                    6,228    0.29        2,964    0.31
Party ID Labour                          6,228    0.29        2,964    0.32
Party ID other party                     6,228    0.24        2,964    0.21
Social grade DE                          6,228    0.20
Social grade C2                          6,228    0.15
Social grade C1                          6,228    0.26
Social grade AB                          6,228    0.39
Wave 1                                   6,228    0.24
Wave 2                                   6,228    0.26
Wave 3                                   6,228    0.25
Wave 4                                   6,228    0.25
Direct treatment                         6,228    0.20
Face-saving treatment                    6,228    0.19
ICT control treatment                    6,228    0.21
ICT sensitive treatment                  6,228    0.21
CM treatment                             6,228    0.20

Note.—All respondent attributes were coded as binary indicators. Columns 1–2 and 3–4 summarize, respectively, the distribution of each indicator in our experimental data and in the 2015 BES face-to-face sample.

Table A.3. Sample characteristics compared to 2011 Census

                    Experimental    BES     Census
Age group 18–24     8.0             7.3     11.8
Age group 25–39     18.3            20.9    25.4
Age group 40–59     41.1            34.9    34.2
Age group 60+       32.6            36.9    28.6
Female              52.3            54.1    51.4
Male                47.7            45.9    48.6

Note.—The first two columns show the relative frequency of age groups and gender in the experimental data and in the 2015 BES face-to-face survey. The final column shows the GB population frequency of each demographic group according to the 2011 Census.
2015 BRITISH ELECTION STUDY FACE-TO-FACE SURVEY

The 2015 British Election Study face-to-face survey (Fieldhouse et al. 2016) was funded by the British Economic and Social Research Council (ESRC). Fieldwork was conducted by GfK Social Research between May 8 and September 13, 2015, with 97 percent of the interviews being conducted within three months of the general election date (May 7, 2015). Interviews were carried out via computer-assisted interviewing. Full details of the sampling procedure are given in Moon and Bhaumik (2015); here we provide a brief overview based on their account. The sample was designed to be representative of all British adults who were eligible to vote in the 2015 general election. It was selected via multistage cluster sampling as follows: first, a stratified random sample of 300 parliamentary constituencies was drawn; second, two Lower Layer Super Output Areas (LSOAs) per constituency were randomly selected, with probability proportional to size; third, household addresses were sampled randomly within each LSOA; and fourth, one individual was randomly selected per household. Overall, 2,987 interviews were conducted. According to the standard AAPOR conventions for reporting response rates, this represents a 55.9 percent response rate (response rate 3). Descriptive statistics for the sample are provided in table A.2, and a comparison to 2015 UK population characteristics is provided in table A.3.

Turnout was validated against the marked electoral register using the name and address information of face-to-face respondents who had given their permission for their voting behavior to be validated. The marked electoral register is the copy of the electoral register used by officials at polling stations on Election Day. Officials at polling stations put a mark on the register to indicate when a listed elector has voted. The marked registers are kept by UK local authorities for 12 months after Election Day. The BES team collaborated with the UK Electoral Commission, which asked local authorities to send copies of marked registers for inspection.21 Respondents were coded into five categories based on inspection of the register (Mellon and Prosser 2015, appendix B):

Voted: The respondent appeared on the electoral register and was marked as having voted.

Not voted—registered: The respondent appeared on the electoral register but was not marked as having voted.

Not voted—unregistered: The respondent did not appear on the electoral register, but there was sufficient information to infer that they were not registered to vote; for example, other people were registered to vote at the address, or, if no one was registered at the address, people were registered at surrounding addresses.

Insufficient information: We did not have sufficient information in the register to assess whether the respondent was registered and voted, because either we were missing the necessary pages from the register or we had not been sent the register.

Ineligible: The respondent was on the electoral register but was marked ineligible to vote in the general election.

Mellon and Prosser (2015) report that validated turnout for a subset of respondents was coded by multiple coders, and that reliability was high (coders gave the same outcome in 94.8 percent of cases).

Footnotes

1. The average difference between survey and official turnout rate across 150 Comparative Study of Electoral Systems (CSES) postelection surveys is around 12 percentage points (Comparative Study of Electoral Systems 2017).

2.
Other changes to the preamble of a turnout question aimed at increasing truthful reporting, such as asking for polling station location, were equally unsuccessful (Presser 1990). 3. Rosenfeld, Imai, and Shapiro (2016) do find that a randomized response design appears to reduce misreporting of sensitive vote choices, but also find evidence of potential noncompliance in respondent implementation of the randomization device (796). 4. For weighted CM estimates, we replace the term r/n with ∑yiwi, where yi is a binary indicator of whether respondent i reports matching answers, and wi denotes the survey weight for observation i. Weights are standardized so that ∑wi=1. We also replace n in the denominator of the standard error equation with effective sample size based on Kish’s approximate formula (Kish 1965). 5. We use standard YouGov weights, generated by raking the sample to the population marginal distributions of age-group × gender, social grade, newspaper readership, region, and party identification. 6. Despite the slight overlap in confidence intervals for weighted estimates, the differences between the weighted ICT and face-saving estimates are statistically significant (weighted, z= 3.39, P-value < 0.01; un-weighted, z= 4.33, P-value < 0.01). Schenker and Gentleman (2001) show that overlapping confidence intervals do not necessarily imply non-significant differences. The differences between ICT and direct estimates are also significant (weighted, z= 2.34, P-value = 0.019; un-weighted, z= 3.43, P-value < 0.01). 7. Supplementary Materials Section C shows that question effects are consistent when each of the four survey waves is treated as a distinct replication of our experiment. 8. The difference in the rate of “don’t know” responses between CM and other treatments is often statistically significant ( z= –2.51, P-value = 0.012 for CM vs. direct question; z= –3.06, P-value < 0.01 for CM vs. face-saving question; z= –0.13, P-value = 0.9 for CM vs. ICT control; z= –2.32, P-value = 0.02 for CM vs. ICT sensitive). However, the maximum magnitude of any difference in “don’t know” rates is two percentage points. 9. The complexity of CM designs can lead to noncompliance and misclassification, and thus less accurate measures of sensitive behaviors relative to a direct question (Höglinger and Diekmann 2017). 10. We must estimate distinct regression models for each question type because the ICT turnout measure does not yield individual-level turnout measures and therefore cannot be modeled using standard regression methods. 11. Data from the online survey vote validation study reported in Rivers and Wells (2015) is not currently publicly available. 12. Note that differences between turnout models estimated from the two data sources may be due not only to residual misreporting in the online self-reports, but also to differences in the sample characteristics of a face-to-face versus an online survey. Indeed, Karp and Lühiste (2016) argue that turnout models estimated from online and face-to-face samples yield different inferences regarding the relationship between demographics and political participation. However, their evidence is based on direct and nonvalidated measures of turnout. It is possible that once misreporting is addressed in both types of survey mode, inferences become more similar. 13. Of this subsample, 1,286 (76.1 percent) voted. The 17 respondents who were measured as “ineligible” to vote were coded as having not voted. 14. 
We estimate the ICT regression model using the “list” package (Blair and Imai 2010) in R. 15. For the online data, this was measured by YouGov right after the 2015 general election. 16. First, we simulate 10,000 Monte Carlo draws of the model parameters from a multivariate normal distribution with mean vector and variance-covariance matrix equal to the estimated coefficients and variance-covariance matrix of the regression model. Second, for each draw, we calculate predicted turnout probabilities for all respondents in the BES face-to-face sample—setting all respondents to be in the political or demographic group of interest and leaving other predictor variables at their actual value—and store the mean turnout probability in the sample. The result is 10,000 simulations of the predicted turnout rate if all respondents in the sample were in a particular category on a particular political or demographic variable, averaging over the sample distribution of the other explanatory variables. The point estimate for the predicted turnout rate is the mean of these 10,000 simulations, and the 95 percent confidence interval is given by the 2.5th and 97.5th percentiles. Our results are substantively unchanged if predicted turnout rates were calculated based on the experimental survey sample. 17. Supplementary Materials Section E graphs the corresponding differences in predicted turnout rates, and Section D reports raw regression coefficients for each model. 18. The differences between the group-specific predicted turnout rates from the ICT and direct models imply that younger voters and less qualified voters in particular tend to misreport voting. This is consistent with the differences between the BES benchmark model and the direct model in figure 2 and with earlier UK vote validation studies. Swaddle and Heath (1989), for example, find that “the groups with the lowest turnout are the ones who are most likely to exaggerate their turnout.” This is different from misreporting patterns found in US studies (Bernstein, Chadha, and Montjoy 2001). 19. The confidence interval for this age group is also wide for the direct and face-saving models, but the uncertainty induced by small sample size is amplified by the inefficiency of the ICT measures. 20. Significance tests are based on the Monte Carlo simulations described above. 21. Despite persistent reminders from the BES team and their vote validation partner organization, the Electoral Commission, several local authorities did not supply their marked electoral registers. As a result, overall the validated vote variable is missing for around 15 percent of the face-to-face respondents who agreed to be matched (Mellon and Prosser 2015). References Abelson , Robert P. , Elizabeth F. Loftus , and Anthony G. Greenwald . 1992 . “ Attempts to Improve the Accuracy of Self-Reports of Voting .” In Questions About Questions: Inquiries into the Cognitive Bases of Surveys , edited by Judith M. Tanur , pp. 138 – 53 . New York : Russell Sage Foundation . Aronow , Peter , Alexander Coppock , Forrest W. Crawford , and Donald P. Green . 2015 . “ Combining List Experiments and Direct Question Estimates of Sensitive Behavior Prevalence .” Journal of Survey Statistics and Methodology 3 : 43 – 66 . Google Scholar CrossRef Search ADS PubMed Belli , Robert F. , Sean E. Moore , and John VanHoewyk . 2006 . “ An Experimental Comparison of Question Forms Used to Reduce Vote Overreporting .” Electoral Studies 25 : 751 – 59 . Google Scholar CrossRef Search ADS Belli , Robert F. 
Belli, Robert F., Michael W. Traugott, Margaret Young, and Katherine A. McGonagle. 1999. "Reducing Vote Over-Reporting in Surveys: Social Desirability, Memory Failure, and Source Monitoring." Public Opinion Quarterly 63:90–108.
Berent, Matthew K., Jon A. Krosnick, and Arthur Lupia. 2016. "Measuring Voter Registration and Turnout in Surveys: Do Official Government Records Yield More Accurate Assessments?" Public Opinion Quarterly 80:597–621.
Bernstein, Robert, Anita Chadha, and Robert Montjoy. 2001. "Overreporting Voting: Why It Happens and Why It Matters." Public Opinion Quarterly 65:22–44.
Blair, Graeme, and Kosuke Imai. 2010. "list: Statistical Methods for the Item Count Technique and List Experiment." Comprehensive R Archive Network (CRAN). Available at http://CRAN.R-project.org/package=list.
———. 2012. "Statistical Analysis of List Experiments." Political Analysis 20:47–77.
Blair, Graeme, Kosuke Imai, and Yang-Yang Zhou. 2015. "Design and Analysis of the Randomized Response Technique." Journal of the American Statistical Association 110:1304–19.
Brehm, John. 1993. The Phantom Respondents: Opinion Surveys and Political Representation. Ann Arbor: Michigan University Press.
Bryan, Christopher J., Gregory M. Walton, Todd Rogers, and Carol S. Dweck. 2011. "Motivating Voter Turnout by Invoking the Self." Proceedings of the National Academy of Sciences 108:12653–56.
Cassel, Carol A. 2003. "Overreporting and Electoral Participation Research." American Politics Research 31:81–92.
Clifford, Scott, and Jennifer Jerit. 2015. "Do Attempts to Improve Respondent Attention Increase Social Desirability Bias?" Public Opinion Quarterly 79:790–802.
The Comparative Study of Electoral Systems (CSES). 2017. "CSES Module 4 Fourth Advance Release" [dataset]. April 11, 2017 version. doi:10.7804/cses.module4.2017-04-11.
Comşa, Mircea, and Camil Postelnicu. 2013. "Measuring Social Desirability Effects on Self-Reported Turnout Using the Item Count Technique." International Journal of Public Opinion Research 25:153–72.
Coutts, Elisabeth, and Ben Jann. 2011. "Sensitive Questions in Online Surveys: Experimental Results for the Randomized Response Technique (RRT) and the Unmatched Count Technique (UCT)." Sociological Methods & Research 40:169–93.
Droitcour, Judith, Rachel A. Caspar, Michael L. Hubbard, Teresa L. Parsley, Wendy Visscher, and Trena M. Ezzati. 1991. "The Item Count Technique as a Method of Indirect Questioning: A Review of Its Development and a Case Study Application." In Measurement Errors in Surveys, edited by Paul B. Biemer, Robert M. Groves, Lars E. Lyberg, Nancy A. Mathiowetz, and Seymour Sudman, chapter 11. New York: Wiley. Available at https://onlinelibrary.wiley.com/doi/10.1002/9781118150382.ch11.
Fieldhouse, Ed, Jane Green, Geoffrey Evans, Hermann Schmitt, Cees van der Eijk, Jonathan Mellon, and Chris Prosser. 2016. British Election Study, 2015: Face-to-Face Postelection Survey [data collection]. UK Data Service.
Fowler, Floyd J. 1995. Improving Survey Questions: Design and Evaluation. Thousand Oaks, CA: Sage Publications.
Glynn, Adam N. 2013. "What Can We Learn with Statistical Truth Serum? Design and Analysis of the List Experiment." Public Opinion Quarterly 77:159–77.
Hanmer, Michael J., Antoine J. Banks, and Ismail K. White. 2014. "Experiments to Reduce the Over-Reporting of Voting: A Pipeline to the Truth." Political Analysis 22:130–41.
Hochstim, Joseph R. 1967. "A Critical Comparison of Three Strategies of Collecting Data from Households." Journal of the American Statistical Association 62:976–89.
Höglinger, Marc, and Andreas Diekmann. 2017. "Uncovering a Blind Spot in Sensitive Question Research: False Positives Undermine the Crosswise-Model RRT." Political Analysis 25:131–37.
Holbrook, Allyson L., Melanie C. Green, and Jon A. Krosnick. 2003. "Telephone vs. Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Bias." Public Opinion Quarterly 67:79–125.
Holbrook, Allyson L., and Jon A. Krosnick. 2010a. "Measuring Voter Turnout by Using the Randomized Response Technique: Evidence Calling into Question the Method's Validity." Public Opinion Quarterly 74:328–43.
———. 2010b. "Social Desirability Bias in Voter Turnout Reports: Tests Using the Item Count Techniques." Public Opinion Quarterly 74:37–67.
Holtgraves, Thomas, James Eck, and Benjamin Lasky. 1997. "Face Management, Question Wording, and Social Desirability." Journal of Applied Social Psychology 27:1650–71.
Imai, Kosuke. 2011. "Multivariate Regression Analysis for the Item Count Technique." Journal of the American Statistical Association 106:407–16.
Jackman, Simon. 1999. "Correcting Surveys for Non-Response and Measurement Error Using Auxiliary Information." Electoral Studies 18:7–27.
Jann, Ben, Julia Jerke, and Ivar Krumpal. 2012. "Asking Sensitive Questions Using the Crosswise Model: An Experimental Survey Measuring Plagiarism." Public Opinion Quarterly 76:32–49.
Jones, Edward E., and Harold Sigall. 1971. "The Bogus Pipeline: New Paradigm for Measuring Affect and Attitude." Psychological Bulletin 76:349–64.
Jones, Emily. 2008. "Vote Overreporting: The Statistical and Policy Implications." Policy Perspectives 15:83–97.
Karp, Jeffrey A., and David Brockington. 2005. "Social Desirability and Response Validity: A Comparative Analysis of Overreporting Voter Turnout in Five Countries." Journal of Politics 67:825–40.
Karp, Jeffrey A., and Maarja Lühiste. 2016. "Explaining Political Engagement with Online Panels: Comparing the British and American Election Studies." Public Opinion Quarterly 80:666–93.
Kish, Leslie. 1965. Survey Sampling. New York: John Wiley and Sons.
Kuklinski, James H., Michael D. Cobb, and Martin Gilens. 1997. "Racial Attitudes and the 'New South.'" Journal of Politics 59:323–49.
Lax, Jeffrey R., Justin H. Phillips, and Alissa F. Stollwerk. 2016. "Are Survey Respondents Lying about Their Support for Same-Sex Marriage?" Public Opinion Quarterly 80:510–33.
Locander, William, Seymour Sudman, and Norman Bradburn. 1976. "An Investigation of Interview Method, Threat and Response Distortion." Journal of the American Statistical Association 71:269–75.
Lumley, Thomas. 2004. "Analysis of Complex Survey Samples." Journal of Statistical Software 9(8). Available at https://www.jstatsoft.org/issue/view/v009.
McDonald, Michael P. 2003. "On the Over-Report Bias of the National Election Study Turnout Rate." Political Analysis 11:180–86.
Mellon, Jonathan, and Christopher Prosser. 2017. "Missing Nonvoters and Misweighted Samples: Explaining the 2015 Great British Polling Miss." Public Opinion Quarterly 81(3):661–87.
Miller, Judith D. 1984. "A New Survey Technique for Studying Deviant Behavior."
Moon, Nick, and Claire Bhaumik. 2015. "British Election Study 2015: Technical Report." GfK UK Social Research.
Persson, Mikael, and Maria Solevid. 2014. "Measuring Political Participation—Testing Social Desirability Bias in a Web-Survey Experiment." International Journal of Public Opinion Research 26:98–112.
Presser, Stanley. 1990. "Can Context Changes Reduce Vote Over-Reporting?" Public Opinion Quarterly 54:586–93.
Rivers, Douglas, and Anthony Wells. 2015. "Polling Error in the 2015 UK General Election: An Analysis of YouGov's Pre- and Postelection Polls." YouGov Inc.
Roese, Neal J., and David W. Jamieson. 1993. "Twenty Years of Bogus Pipeline Research: A Critical Review and Meta-Analysis." Psychological Bulletin 114:809–32.
Rosenfeld, Bryn, Kosuke Imai, and Jacob N. Shapiro. 2016. "An Empirical Validation Study of Popular Survey Methodologies for Sensitive Questions." American Journal of Political Science 60:783–802.
Schenker, Nathaniel, and Jane F. Gentleman. 2001. "On Judging the Significance of Differences by Examining the Overlap Between Confidence Intervals." American Statistician 55:182–86.
Selb, Peter, and Simon Munzert. 2013. "Voter Overrepresentation, Vote Misreporting, and Turnout Bias in Postelection Surveys." Electoral Studies 32:186–96.
Sturgis, Patrick, Nick Baker, Mario Callegaro, Stephen Fisher, Jane Green, Will Jennings, Jouni Kuha, Benjamin E. Lauderdale, and Patten Smith. 2016. "Report of the Inquiry into the 2015 British General Election Opinion Polls." Market Research Society; British Polling Council.
Swaddle, Kevin, and Anthony Heath. 1989. "Official and Reported Turnout in the British General Election of 1987." British Journal of Political Science 19:537–51.
Tan, Ming T., Guo-Liang Tian, and Man-Lai Tang. 2009. "Sample Surveys with Sensitive Questions: A Nonrandomized Response Approach." American Statistician 63:9–16.
Thomas, Kathrin, David Johann, Sylvia Kritzinger, Carolina Plescia, and Eva Zeglovits. 2017. "Estimating Sensitive Behavior: The ICT and High Incidence Electoral Behavior." International Journal of Public Opinion Research 29:157–71.
Tourangeau, Roger, and Ting Yan. 2007. "Sensitive Questions in Surveys." Psychological Bulletin 133:859–83.
Voogt, Robert J. J., and Willem E. Saris. 2003. "To Participate or Not to Participate: The Link Between Survey Participation, Electoral Participation, and Political Interest." Political Analysis 11:164–79.
Warner, Stanley L. 1965. "Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias." Journal of the American Statistical Association 60:63–69.
Wolter, Felix, and Bastian Laier. 2014. "The Effectiveness of the Item Count Technique in Eliciting Valid Answers to Sensitive Questions: An Evaluation in the Context of Self-Reported Delinquency." Survey Research Methods 8:153–68.
Yu, Jun-Wu, Guo-Liang Tian, and Man-Lai Tang. 2008. "Two New Models for Survey Sampling with Sensitive Characteristic: Design and Analysis." Metrika 67:251–63.
Zeglovits, Eva, and Sylvia Kritzinger. 2014. "New Attempts to Reduce Overreporting of Voter Turnout and Their Effects." International Journal of Public Opinion Research 26:224–34.
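As referenced in footnote 16 above, the group-specific predicted turnout rates are obtained by simulation. The following is a minimal sketch of that procedure for a direct-question logit model, assuming a fitted coefficient vector and variance-covariance matrix are available; all function and variable names are illustrative rather than taken from the authors' replication materials, and for the ICT models the fitted probabilities from the R "list" package would take the place of the logistic transform used here.

```python
import numpy as np

rng = np.random.default_rng(2015)

def simulated_turnout_rates(beta_hat, vcov_hat, X, group_col, n_sims=10_000):
    """Simulate predicted turnout rates for one category of a 0/1 predictor.

    beta_hat : (k,) estimated coefficients of a logit turnout model
    vcov_hat : (k, k) estimated variance-covariance matrix
    X        : (n, k) design matrix for the benchmark sample
    group_col: column index of the 0/1 group indicator to manipulate
    """
    # draw parameter vectors to propagate estimation uncertainty
    draws = rng.multivariate_normal(beta_hat, vcov_hat, size=n_sims)
    X1 = X.copy()
    X1[:, group_col] = 1.0                      # everyone set to the group of interest
    linpred = X1 @ draws.T                      # (n, n_sims) linear predictors
    rates = (1.0 / (1.0 + np.exp(-linpred))).mean(axis=0)   # sample-average turnout per draw
    point = rates.mean()
    lo, hi = np.percentile(rates, [2.5, 97.5])
    return point, (lo, hi)

# toy example with synthetic data (intercept, group dummy, one covariate)
n = 500
X = np.column_stack([np.ones(n), rng.integers(0, 2, n), rng.normal(size=n)])
beta_hat = np.array([-0.2, 0.8, 0.1])
vcov_hat = np.diag([0.02, 0.03, 0.01])
print(simulated_turnout_rates(beta_hat, vcov_hat, X, group_col=1))
```

Running the same function with the indicator set to 0 gives the comparison category, and differencing the two sets of simulations yields the uncertainty of the contrast used for the significance tests in footnote 20.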
Can Reshuffles Improve Government Popularity? Evidence from a "Pooling the Polls" Analysis (2018, Public Opinion Quarterly)
doi: 10.1093/poq/nfy015
Abstract Scholars have recently argued that prime ministers reshuffle their cabinets strategically. Although some scholars assume that cabinet reshuffles help prime ministers increase their government’s popularity, this assumption has not been tested formally because of the endogeneity problem. In Japan, polling firms sometimes provide respondents with cues about a reshuffle when asking about cabinet approval following reshuffles, while others do not. I utilized this convention in the Japanese media to test the assumption that reshuffles increase cabinet approval ratings. Applying a dynamic linear model to pooled poll data from 2001 to 2015, I achieved high internal, external, and ecological validity. The analyses show that cues about reshuffles increase cabinet approval ratings by 2.4 percentage points on average, and the credible interval of the effect does not include zero. This result reinforces the findings of previous research on the theory of cabinet management. A chain of delegation is the key principle of representative democracy, and the path from a prime minister (PM) to individual ministers is part of that chain (Strøm 2000). Whether and how a PM can delegate power to appropriately qualified ministers and control them are important questions in a parliamentary democracy. Many studies have investigated these questions through formal and empirical analyses (e.g., Dewan and Myatt 2007; Berlinski, Dewan, and Dowding 2010; Kam et al. 2010; Dewan and Hortala-Vallve 2011). Similar questions arise in a presidential democracy, and researchers of presidential countries have studied delegation and accountability between president and cabinet (e.g., Camerlo and Pérez-Liñán 2015a, 2015b).1 Recently, scholars have focused on cabinet reshuffles as an effective way for PMs to improve their strategic government management. However, the empirical literature, with a few exceptions, has not addressed the important question of how cabinet reshuffles affect government popularity. Although this is a very simple question, it is not easy to identify the effect of reshuffles without bias, because reshuffles are strategic and thus endogenous. This study employed a unique research design to overcome the endogeneity problem and attempted to elucidate whether cabinet reshuffles positively impact government popularity. Utilizing the Japanese media environment, which provides conditions similar to a survey experiment, I estimated how knowing that a PM has reshuffled the cabinet affects citizens’ approval of their government. Although the same idea has appeared in previous Japanese literature that has considered a single reshuffle (Sugawara 2009; Suzuki 2009), I generalized the analysis for multiple cabinets and reshuffles. I employed a dynamic linear model for pooled poll data to overcome potential omitted variable bias and obtain a result with high internal, external, and ecological validity. The results show that information on cabinet reshuffles increases government popularity, implying that PMs can increase their popularity by reshuffling their cabinet. Cause and Effect of Cabinet Reshuffles Recent theoretical and empirical research on delegation and accountability problems with cabinet members has focused on cabinet reshuffles. Scholars have used formal models and interpreted the aim of cabinet reshuffles in various ways. Indriðason and Kam (2008) viewed cabinet reshuffles as a device through which PMs combat moral hazard. 
Quiroz Flores and Smith (2011; see also Quiroz Flores [2016]) interpreted that cabinet reshuffles allow leaders to deal with internal and external threats (i.e., internal party competitions and elections, respectively). From the reverse perspective, Dewan and Myatt (2007) argued that by not reshuffling (i.e., protecting) ministers hit by scandal, PMs may encourage the policy activism of clean ministers. In addition to arguments from game theory, a number of empirical analyses have been conducted to examine when and how frequently cabinet reshuffles occur. These studies have revealed the strategic nature of cabinet reshuffles. Kam and Indriðason (2005) showed through cross-country analysis that reshuffles are more likely in situations where a PM’s agency loss to cabinet ministers is high. Huber and Martinez-Gallardo (2008) argued that PMs utilize reshuffles to deal with problems of adverse selection and moral hazard by demonstrating that ministerial turnover is less likely if PMs are careful in screening ministers (e.g., the policy value of a portfolio is high), the talent pool is large, and/or the PM’s ability to replace ministers is constrained by a coalition. Quiroz Flores (2016) confirmed the strategic cabinet change in parliamentary democracies expected from Quiroz Flores and Smith’s (2011) formal theory, which contends that PMs are likely to depose competent ministers who might be rivals in intraparty competitions. Bäck et al. (2012) explored cabinet reshuffles as a valid measure of PMs’ intra-executive power and identified that European integration increased the frequency of reshuffles in Sweden because it strengthened the PM’s power. Martínez-Gallardo’s (2014) and Camerlo and Pérez-Liñán’s (2015a) analyses revealed that the strategic replacement of ministers also occurs in presidential democracies. Some previous research has assumed that cabinet reshuffles help PMs bolster their government’s popularity. For example, Kam and Indriðason (2005, p. 341) proposed hypotheses such as “reshuffles become more likely as the PM’s personal popularity declines”; and Burden (2015) proposed a similar argument and analysis. While other previous research did not make such an assumption, it seems to assume that, at least, cabinet reshuffles do not harm government popularity. In fact, when researchers consider the cost of reshuffles in creating formal models, they usually do not consider the possibility that reshuffles or firing ministers per se leads citizens to form a negative image of the government (Dewan and Myatt 2007; Quiroz Flores and Smith 2011). Why, then, can cabinet reshuffles improve government popularity? Kam and Indriðason (2005) argued that PMs could arrest the decline in their popularity by firing scandal-hit ministers and ministers responsible for unpopular policy, while a similar view was espoused by Bäck et al. (2012). In fact, Dewan and Dowding (2005) showed that individual ministerial resignations due to scandal or policy failure recover falling government popularity. Another possible mechanism is that newness or freshness of reshuffled cabinets is, in itself, attractive for voters, as just-inaugurated cabinets or presidents usually enjoy high popularity (e.g., Norpoth 1996). However, to the best of this author’s knowledge, these arguments have not yet been tested formally. It is not self-evident that reshuffles have a positive effect on government popularity or that they do not impair a government’s image. 
In fact, some previous studies have referred to the possibility that reshuffles might negatively impact government popularity. Indriðason and Kam (2008, p. 633) pointed out the possibility that “the public interprets reshuffles as a signal of a policy failure, departmental efficiency declines, etc.” Hansen et al. (2013, pp. 228–29) also argued that “cabinet reshuffles and ministerial dismissal may carry considerable cost for the prime minister, such as… the possibility of signaling discontinuity and turmoil to the public.” Moreover, reshuffles prevent ministers from acquiring experience in a particular portfolio, as Huber and Martinez-Gallardo (2004) argued; thus, the public may not welcome reshuffles. Researchers have most likely hesitated to tackle the question of the effect of cabinet reshuffles on government popularity because serious endogeneity originates from the strategic nature of cabinet reshuffles. If, as Kam and Indriðason (2005) have implied, PMs tend to reshuffle their cabinet when their popularity declines, events that negatively impact popularity are likely to coincide with reshuffles. In fact, Hansen et al. (2013) demonstrated that ministerial turnover is more likely when the unemployment rate is rising. Thus, simple correlation analysis may produce a negatively biased estimate of the effect of reshuffles on government popularity. Another possibility is that PMs tend to make other efforts at the same time as reshuffles in order to recover their declining popularity. In this case, even if we observe that government popularity increases after a cabinet reshuffle, it may be attributable to the PM’s other efforts. One notable exception to the hesitation shown by researchers is Dewan and Dowding (2005), who demonstrated, using an instrumental variable approach with ministers’ age as an instrument, that ministerial resignations have a positive effect on government popularity when there is high media coverage. Although they appear to have estimated the effect of ministerial resignations precisely, they focused on personal resignations due to “resignation issues” such as scandal and policy failure; thus, it is questionable whether it is possible to generalize their results to cabinet reshuffles. Research Design To deal with the endogeneity problem discussed above, I developed a novel research design that exploits the convention of opinion polls about government popularity in the Japanese mass media. Before introducing the research design, I will briefly overview Japanese politics in the period analyzed in this study (2001–2015). During this time, there were two major parties: the Liberal Democratic Party (LDP) and the Democratic Party of Japan (DPJ). The LDP is a right-leaning party, and the DPJ is a center-left party. The LDP has been the dominant party for most of the postwar period. Although the LDP changed its coalition partners several times in the 1990s, it has maintained a coalition with Komeito from 1999 to the present day. The charismatic LDP leader Junichiro Koizumi maintained his cabinet with high popularity ratings from 2001 to 2006, but his successors—Shinzo Abe, Yasuo Fukuda, and Taro Aso—failed to retain their popularity owing to scandals and economic depression, and their cabinets lasted for no more than 12 months. Japanese voters turned away from the LDP regime and instead chose the DPJ, founded in 1998 and the second party since then, in the 2009 general election. 
The DPJ formed a government with the Social Democratic Party and the People's New Party as junior partners.2 However, the DPJ demonstrated its poor ability to manage government and its lack of intra-party governance; accordingly, PMs from the DPJ (Yukio Hatoyama, Naoto Kan, and Yoshihiko Noda) suffered low popularity ratings except during their honeymoon periods. The LDP returned to power in the 2012 general election, and its leader Abe made a comeback as PM. Abe's cabinet maintained relatively high popularity ratings through the end of the period analyzed in this study. Figure 1 shows weekly government popularity (cabinet approval ratings, explained below) during the period of the analysis. The method for estimating popularity is explained in the next section.

Figure 1. Estimated weekly cabinet approval ratings. Solid lines represent point estimates, and shaded bands represent 95 percent credible intervals. Cross marks indicate weeks when the PMs reshuffled their cabinet. Dotted lines indicate Jiji Press's monthly polling results.

In Japan, government popularity is measured by cabinet approval. Many polling firms report a cabinet approval rating derived from their surveys. Cabinet approval is usually interpreted as the PM's personal popularity, and it plays a critical role in securing support for the governing party and maintaining the cabinet in Japan. For example, Krauss and Nyblade (2005) showed that an increase in cabinet approval resulted in an increase in the LDP's share of the vote during the LDP's regime. Maeda (2013) reported that when the DPJ was in power, cabinet approval had a positive influence on DPJ support. Burden (2015) and Matsumoto and Laver (2015) argued that when the LDP is in power, the party applies pressure to its leader (the PM) to resign or reshuffle the cabinet when cabinet approval is low relative to the party's popularity. Masuyama (2007) also showed that high cabinet approval is required for the PM to maintain his government. Cabinet approval is influenced by various factors, such as support for the governing party (Nishizawa 2001; Iida 2005; but see also Nakamura [2006]), several economic indicators (Inoguchi 1983), people's economic evaluations (Nishizawa 2001; McElwain 2015), media coverage (Fukumoto and Mizuyoshi 2007; Hosogai 2010), and foreign disputes (Ohmura and Ohmura 2014). Some researchers have sought to examine how cabinet reshuffles affect cabinet approval in Japan. Nishizawa (2001) performed a time-series analysis of monthly cabinet approval ratings and found that the coefficient of the dummy variable for cabinet reshuffles was not statistically significant. In contrast, Nakamura (2006) increased the sophistication of Nishizawa's time-series model and argued that reshuffles have a positive effect on cabinet approval. Ohmura and Ohmura (2014) also analyzed monthly cabinet approval ratings and found that cabinet reshuffles significantly increased such ratings during the Cold War period, but significantly decreased them in the post–Cold War period. However, their analysis suffers from the endogeneity problem discussed in the previous section.
Although details vary depending on polling firms, cabinet approval in Japan is commonly measured by questions such as “Do you support [the PM’s surname] cabinet?” However, the wording of the polling question is sometimes modified when a significant event occurs. One such event is a cabinet reshuffle. Some polling firms provide respondents with information that is henceforth called a “reshuffle cue,” such as telling respondents that the PM recently reshuffled his cabinet prior to asking for cabinet approval, while others do not. A polling firm that provides a reshuffle cue at the time of one reshuffle does not necessarily do so again at the time of another reshuffle. Sugawara (2009) criticized media reports about Yasuo Fukuda’s cabinet reshuffle in August 2008 that were based on opinion polls with and without reshuffle cues. Sugawara pointed out that polls using different wordings for their questions cannot simply be compared and the reported rise in cabinet approval following the reshuffle was an artifact of the reshuffle cue. Suzuki (2009) presented the same argument as Sugawara (2009), independently and at almost the same time. In contrast to Sugawara’s (2009) and Suzuki’s (2009) arguments, I argue that we can exploit this variation in question wording as an opportunity to examine the rise of approval ratings attributable to reshuffle cues. This is a situation similar to a survey experiment in which the assignment to the “treatment group,” whose respondents are provided with a reshuffle cue, can be seen as random (which is potentially problematic, as discussed below), because each survey selects an independent random sample of the electorate. If cabinet reshuffles convey a positive impression to citizens, reshuffle cues will increase their focus on reshuffles and cause an increase in cabinet approval ratings. On the contrary, if cabinet reshuffles bring negative issues to citizens’ minds, such as policy failure, conflict in the government, or the appointment of inexperienced ministers, reshuffle cues will decrease cabinet approval ratings. It should be noted that because Japanese polling firms almost always put the question on cabinet approval at the top of their surveys, there is no risk of carryover effects. Using these experiment-like conditions, we can estimate the effect of cabinet reshuffles on government popularity with high internal, external, and ecological validity. First, it is evident that the research design is ecologically valid because, unlike an experiment with fictitious stimuli, the cabinets and reshuffles concerned existed for respondents. Second, the results obtained through this research design have high generalizability. Some readers may observe that an internally and/or ecologically valid estimate of the effect of reshuffle cues can be obtained by a temporal survey experiment conducted immediately after a reshuffle. Other readers might suspect that it is sufficient if we compare, as did Sugawara (2009) and Suzuki (2009), polling firms’ results with and without a reshuffle cue in only one reshuffle case. However, the results from such a research design cannot be generalizable to other cases, because such onetime results may be attributable to the special circumstances of the reshuffle concerned. Instead, my research design investigates the average effect of reshuffle cues over multiple cases. Two factors, however, may jeopardize the internal validity of the design. The first is house effects. 
Different surveys that share survey contents and are conducted at the same point in time but by different houses (survey organizations) produce different results (Smith 1978). Differences in survey design and implementation, such as sampling procedure, question wording, answer options, and whether interviewers repeatedly ask questions, produce house effects. As discussed later in this paper, some polling firms were likely to provide reshuffle cues and others never did so, which means that house effects should be a confounding factor. To deal with this problem, I pooled a large number of opinion polls regardless of their timing, and estimated house effects as explained in the next section. Heuristically, by including the fixed effects of polling firms to eliminate house effects, we can estimate the effect of reshuffle cues accurately. The second factor is the average approval rating of PMs, which differs widely from one PM to another. If reshuffle cues were provided more frequently when particular PMs were in power (which is factually correct according to my data), simply comparing polling results with and without a reshuffle cue would be insufficient. However, this problem can be addressed by pooling opinion poll data as well. I modeled a latent time series of cabinet approval ratings for each PM using pooled poll data and detected the effect of reshuffle cues as a deviation from true approval. This procedure allowed me to “control” the factor of each PM’s average approval rating. An additional important assumption for an unbiased estimation of reshuffle cues is whether the decision of a particular polling firm to provide a reshuffle cue or not is taken regardless of the expected poll result. This problem is discussed after the main results are provided. One important caveat is that this study estimates the effect of reshuffle cues, not the effect of reshuffles per se. I investigated whether citizens tend to approve or disapprove of a cabinet when they hear it was recently reshuffled; this does not necessarily indicate that reshuffles actually increase or reduce government popularity. In the extreme case, if no one knows there has been a cabinet reshuffle, approval ratings should not change. However, I believe that this study contributes substantively to research on representative democracy because, if the analysis shows that reshuffle cues have positive effects, it will reject conclusively the possibility that, on average, reshuffles impair a government’s image. This is a significant step toward understanding governmental management, given the difficulty of estimating the effect of reshuffles per se without endogeneity, using common observational data. Data and Methods I used data from opinion polls conducted by 11 polling firms: Jiji Press, Kyodo News, Yomiuri Shimbun, Asahi Shimbun, Mainichi Shimbun, Sankei Shimbun and Fuji News Network (FNN), Nikkei Research, Japanese Broadcasting Corporation (NHK), Japan News Network (JNN), All-Nippon News Network (ANN), and Nippon News Network (NNN).3 I restricted the time period analyzed to between April 26, 2001 (the beginning of the first Koizumi cabinet), and November 29, 2015, for several reasons. First, few opinion polls introduced a reshuffle cue in the earlier period. Second, too long a period would undermine the assumption that house effects are constant during the period. The third, and perhaps most important, reason is that some reshuffles in the earlier period were concurrent with a change in coalition partners. 
Cabinet reshuffles after Koizumi took office did not coincide with coalition changes; therefore, we can focus purely on the effect of cabinet reshuffles. There were 13 cabinet reshuffles during the study period. I collected information from all opinion polls in the study period that contained a question about cabinet approval, irrespective of whether or not the PM had reshuffled his cabinet just before the poll was conducted. The information includes dates, number of respondents, survey mode, and cabinet approval rating (a minimal sketch of one such record follows table 1 below). I examined whether the question in polls conducted immediately after cabinet reshuffles contained a reshuffle cue, and found two types of wording that provide a reshuffle cue. One type (type I) has a lead sentence that tells respondents that the PM has just reshuffled his cabinet. For example, Yomiuri Shimbun usually asks respondents about cabinet approval as follows: "Do you support the Koizumi cabinet or not?" In contrast, a survey conducted by Yomiuri between October 31 and November 1, 2005, just after a cabinet reshuffle, asked a question with the lead sentence "Prime Minister Koizumi reshuffled his cabinet. Do you support Koizumi's reshuffled cabinet or not?" The other type (type II) does not include such a lead sentence but contains the word "reshuffle" ("kaizo" in Japanese) or hints at this word to respondents. For example, a survey conducted by Yomiuri between September 27 and 28, 2004, asked, "Do you support the reshuffled Koizumi cabinet or not?" Both types contain a reshuffle cue, and an additional analysis that distinguishes the two types was conducted. I gathered results from 1,958 opinion polls. Of the 125 polls conducted either in the week in which the PM reshuffled his cabinet or the following week, 31 provided a reshuffle cue. The statistics, disaggregated by polling firm, are shown in table 1. They show that polling firms that provided a reshuffle cue in some polls did not always do so. Data sources are listed in Online Appendix A.

Table 1. Polling result statistics by polling firm

   Polling firm       Start of analyzed polls   Total polls   Polls immediately following a reshuffle   Type I cue   Type II cue
 1 Jiji Press         May 13, 2001                  175                       7                              0            0
 2 Kyodo News         Apr 28, 2001                  205                      13                             13            0
 3 Yomiuri Shimbun    Apr 28, 2001                  235                      17                              3            3
 4 Asahi Shimbun      Apr 28, 2001                  237                      13                              0            1
 5 Mainichi Shimbun   Apr 28, 2001                  165                      11                              0            0
 6 Sankei-FNN         Apr 29, 2001                  131                      12                              0            2
 7 Nikkei Research    Apr 29, 2001                  152                      11                              1            2
 8 NHK                May 6, 2001                   197                      10                              0            0
 9 JNN                Apr 29, 2001                  187                      11                              1            0
10 ANN                Sep 28, 2006                  119                      10                              0            1
11 NNN                Dec 15, 2002                  155                      10                              2            2
   Total                                          1,958                     125                             20           11

Note.—The numbers that precede the polling firms' names correspond to the indicator numbers in the statistical model. The last two columns count the polls conducted immediately following a reshuffle that provided a reshuffle cue of each type. For Mainichi Shimbun and Sankei-FNN, the last poll from which results were collected was October 2015, and November 2015 for the other firms.
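As referenced above, each collected poll can be represented by a small record. The sketch below is illustrative only; the field names are my own and need not match the variable names in the replication data.

```python
from dataclasses import dataclass

@dataclass
class PollResult:
    """One opinion poll as used in the pooled analysis (field names illustrative)."""
    firm: int           # polling-firm index j in 1..11 (see table 1)
    week: int           # week index t within the analysis window
    pm: int             # prime-minister index p
    mode: int           # 1 = face-to-face, 2 = telephone
    n: int              # number of respondents
    approval: float     # reported cabinet approval rating q_i, as a proportion
    rdd_or_quota: bool  # True if RDD or quota sampling (design effect applies)
    cue: bool           # True if the question carried a reshuffle cue (type I or II)

# example: a hypothetical Yomiuri telephone poll conducted in a reshuffle week
example = PollResult(firm=3, week=180, pm=1, mode=2, n=1064,
                     approval=0.48, rdd_or_quota=True, cue=True)
```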
I estimated weekly cabinet approval ratings using the dynamic linear model proposed by Jackman (2005) and Beck, Jackman, and Rosenthal (2006), named "pooling the polls" by Jackman (2009). This model contains two parts: an observational model and a transition model. The former represents how observed variables are generated from the latent variable, with some error; the latter represents the temporal changes in an unobservable latent variable. In this study, the observed variables comprise the results of opinion polls, while the latent variable comprises the "true" cabinet approval rating. I also altered the original model to estimate the effect of a reshuffle cue.

The observational model describes the result $q_i$ of opinion poll $i$ on the approval rating of PM $p$'s cabinet ($p \in \{1, \dots, P\}$), conducted by polling firm $j_i \in \{1, \dots, J\}$ using survey mode $m_i \in \{1, \dots, M\}$ in week $t_i \in \{1, \dots, T\}$.4 This value is equal to the approval rating $\mu_i$ that the polling firm intended to measure plus a sampling error. Therefore, $q_i$ follows the distribution

$$q_i \sim N(\mu_i, \sigma_i^2), \qquad (1)$$

where $\sigma_i$ represents the sampling error. In this study, $J = 11$, and table 1 shows which number corresponds to which polling firm. The surveys used here adopted face-to-face interviews or telephone interviews; no polling firm employed internet polling. Thus, $M = 2$; I coded face-to-face interviews as 1 and telephone interviews as 2. In principle, letting $n_i$ denote the number of respondents in poll $i$, the sampling error is equal to $\sqrt{q_i(1 - q_i)/n_i}$. However, some survey designs that do not use a simple random sample violate this formula. The problem here is that telephone surveys based on random digit dialing (RDD) or quota sampling commonly use a weighting method that causes the sampling variance to deviate from the formula above. Unfortunately, we cannot accurately identify the extent to which a survey design amplifies the sampling error for an individual poll, because details of the sampling procedures and weighting methods are not available. Therefore, I added a design-effect parameter $\tau$, introduced by Fisher et al. (2011) and Pickup et al. (2011):

$$\sigma_i = \tau_{j_i}^{d_i} \sqrt{\frac{q_i(1 - q_i)}{n_i}}, \qquad (2)$$

where $\tau$ represents the ratio of the standard error of survey results based on RDD or quota sampling to that of a simple random sample, and $d_i$ equals 1 if survey $i$ used RDD or quota sampling and is otherwise zero; thus, a design effect is applied only when a survey is RDD based or quota-sampling based. I estimated a design effect for each polling firm.

We cannot take $\mu_i$ as the true approval rating, because it may be contaminated by other elements such as survey mode, question wording, and answer options. I consider that $\mu_i$ is composed of the true approval rating $\alpha_{p,t_i}$ in week $t_i$, a mode effect $\gamma_{m_i}$, a house effect $\delta_{j_i}$, and the effect of a reshuffle cue $\lambda$. Therefore,

$$\mu_i = \alpha_{p,t_i} + \gamma_{m_i} + \delta_{j_i} + \lambda x_i, \qquad (3)$$

where $x_i$ is a dummy variable that equals 1 if opinion poll $i$ provided respondents with a reshuffle cue. I assumed that mode effects and house effects are constant throughout the analysis period regardless of PM. In addition, I constrained $\sum_{m=1}^{M}\gamma_m = \sum_{j=1}^{J}\delta_j = 0$ for identification. Although Jackman (2005, 2009) and Beck, Jackman, and Rosenthal (2006) did not consider mode effects (see Bowling [2005] for a review), I consider them necessary here to ensure that the assumption of fixed house effects is correct, as one polling firm switched its survey mode from face-to-face to telephone. The main purpose of the analysis is to estimate $\lambda$, which represents the difference between the results of two cabinet approval surveys conducted by the same polling firm using the same method in the same week, one providing a reshuffle cue and the other not.

In the transition model, I assumed that the true approval rating $\alpha_{p,t}$ follows the process

$$\alpha_{p,t} \sim N(\alpha_{p,t-1} + \kappa z_{p,t},\ \omega_p^2), \qquad (4)$$

where $t = 2, \dots, T_p$, $T_p$ is the last week of PM $p$'s tenure, $z_{p,t}$ is a dummy variable that equals 1 if a reshuffle occurred in PM $p$'s week $t$, and $\kappa$ is the coefficient on $z_{p,t}$. This is a random-walk-based model. I set a different stochastic process for each PM; that is, $\alpha_{p,1}$ does not depend on $\alpha_{p-1,T_{p-1}}$.5 As noted earlier, we cannot interpret $\kappa$ as the effect of a reshuffle on the cabinet approval rating, because other events that influence cabinet approval might have occurred simultaneously.6

The parameters to be estimated are $\alpha_{p,t}$, $\kappa$, $\omega_p$, $\gamma_m$, $\delta_j$, $\lambda$, and $\tau_j$. The posterior distribution was estimated by the Markov chain Monte Carlo (MCMC) method.7 The prior distributions were set as follows: $\kappa, \gamma_m, \delta_j, \lambda \sim N(0, 100^2)$ and $\omega_p, \tau_j \sim U(0, 100)$. I set the prior distribution of each PM's approval rating in the first week as $\alpha_{p,1} \sim U(q_p^{\mathrm{init}} - 0.2,\ q_p^{\mathrm{init}} + 0.2)$, where $q_p^{\mathrm{init}}$ is the result of the first Jiji Press opinion poll conducted after PM $p$ was inaugurated. I set three chains with different initial values. For each chain, I obtained 2,000 samples at every twentieth interval after 500 iterations as adaptation and a further 500 iterations as burn-in. All chains were judged to converge, as the Gelman-Rubin statistics of all parameters were below 1.1.

Results

I first confirm that the true latent cabinet approval ratings were appropriately estimated. Figure 1 shows the estimated weekly approval ratings. Solid lines represent point estimates, and shaded bands represent 95 percent credible intervals (CIs).8 Cross marks indicate weeks when PMs reshuffled their cabinets. For comparison, dotted lines indicate Jiji Press's polling results.
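Before examining Jiji's series and the parameter estimates in more detail, the following minimal sketch forward-simulates the structure of equations (1) through (4) for a single hypothetical PM, to make the roles of the house, mode, design-effect, and reshuffle-cue terms concrete. All numerical values are illustrative rather than the paper's estimates; in the paper the parameters are estimated jointly by MCMC rather than simulated.

```python
import numpy as np

rng = np.random.default_rng(1)

T = 52                      # weeks in office for one hypothetical PM
reshuffle_week = 30
kappa, lam = 0.027, 0.024   # reshuffle shift and reshuffle-cue effect (eqs 3-4), illustrative
omega = 0.03                # sd of weekly random-walk noise, illustrative

# transition model (eq 4): latent approval alpha_t follows a random walk with a shift
alpha = np.empty(T)
alpha[0] = 0.50
for t in range(1, T):
    z = 1.0 if t == reshuffle_week else 0.0
    alpha[t] = alpha[t - 1] + kappa * z + rng.normal(0.0, omega)

# observational model (eqs 1-3): one poll by a hypothetical firm in week t
def simulate_poll(t, n=1000, delta_j=-0.02, gamma_m=0.0, tau_j=2.0,
                  rdd=True, cue=False):
    mu = alpha[t] + gamma_m + delta_j + lam * cue        # eq 3
    sigma = (tau_j ** rdd) * np.sqrt(mu * (1.0 - mu) / n)  # eq 2 (design effect if RDD)
    return rng.normal(mu, sigma)                          # eq 1

with_cue = simulate_poll(reshuffle_week, cue=True)
without_cue = simulate_poll(reshuffle_week, cue=False)
print(round(with_cue - without_cue, 3))  # centered on lambda (0.024), plus sampling noise
```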
Jiji's opinion polls are often used in time-series studies of Japanese politics (e.g., Burden 2015; Matsumoto and Laver 2015) because Jiji has conducted monthly opinion polls in the same manner for many years. The estimated results are nearly parallel to Jiji's results, although Jiji's values are lower than the pooled results, as reflected in the house effect shown below. The estimated results capture detailed changes in approval ratings that cannot be derived from Jiji's monthly polls. Figure 1 shows that in most cases, cabinet approval ratings increased when a PM reshuffled the cabinet. Indeed, the point estimate of $\kappa$, which represents the average change in approval ratings concurrent with reshuffles, is 0.027, and the 95 percent CI is [0.007, 0.049]. This implies that cabinet approval ratings increase 2.7 percentage points on average in the week of a reshuffle. However, we cannot interpret this as the causal effect of reshuffles, due to the possibility of endogeneity. It may be no more than a correlation.

Figure 2 shows the estimated effects of a reshuffle cue, house effects, and mode effects. Dots represent point estimates, and segments represent 95 percent CIs. The estimation result for $\lambda$ shows that when a reshuffle cue is provided to respondents, cabinet approval ratings increase 2.4 percentage points on average. The 95 percent CI is [0.012, 0.036] and does not include zero, implying that citizens respond favorably to cabinet reshuffles.9

Figure 2. Estimated effects of a reshuffle cue, house effects, and mode effects. Dots represent point estimates, and segments represent 95 percent CIs.

How great is this effect? According to the estimation results of the transition model, the standard deviation of weekly random shocks $\omega_p$ was between 0.018 and 0.038. Supposing that random shocks follow a normal distribution, a 2.4-percentage-point increase corresponds to the 74th to 92nd percentile. Therefore, the effect of reshuffle cues is substantial. The effect of information about a cabinet reshuffle may be even greater than the effect of reshuffle cues estimated above. Some respondents hear the news of a cabinet reshuffle and change their stance to approve the cabinet before being surveyed. Such respondents approve of the cabinet irrespective of whether or not the poll provides a reshuffle cue. Thus, if we compare two counterfactual situations (one where a cabinet is reshuffled and all citizens are informed of the reshuffle, and one where a cabinet is not reshuffled and no information on a reshuffle is provided to citizens), the difference in the cabinet approval rating is likely to be greater than 2.4 percentage points on average.

This section briefly reviews the remaining results. There are significant house effects. For example, Jiji Press underestimates cabinet approval by 5.2 percentage points and JNN overestimates it by 5.8 percentage points on average. In contrast, the mode effects are small and not statistically distinguishable from zero. The results for other parameters are shown in table A1 in the Appendix.

SELF-SELECTION BIAS?

One caveat about the above analysis is that there might be self-selection bias. Some readers may suspect that whether polling firms add a reshuffle cue to their questions depends on how much attention the reshuffle received.
Further, the effect of a reshuffle cue is overestimated if polling firms provide a reshuffle cue when they expect the public to welcome the reshuffle. To dispel such concerns, I examined when reshuffle cues tended to be provided. Table 2 shows whether each polling firm provided a reshuffle cue in opinion polls conducted following a cabinet reshuffle. Polling firms became less likely to provide reshuffle cues after the reshuffle of the Fukuda cabinet, probably because Sugawara's (2009) and Suzuki's (2009) criticisms impacted the industry. Other than this, there is no notable pattern of cue provision. It is true that some cabinet reshuffles prompted more cue provisions than others, but figure 1 shows that such reshuffles did not necessarily coincide with significant fluctuations in cabinet approval ratings. Therefore, it is reasonable to assume that polling firms' decisions to provide reshuffle cues were not driven by expectations that a reshuffle would receive much attention or be welcomed by the public.10

Table 2. Pattern of the provision of reshuffle cues

   Prime minister   Koizumi  Koizumi  Koizumi  Koizumi  Abe (1)  Fukuda   Kan    Kan    Noda   Noda   Noda   Abe (2)  Abe (2)
   Party              LDP      LDP      LDP      LDP      LDP     LDP     DPJ    DPJ    DPJ    DPJ    DPJ     LDP      LDP
   Month/Year        9/02     9/03     9/04    10/05     8/07    8/08    9/10   1/11   1/12   6/12  10/12    9/14    10/15    Total
 1 Jiji                –        –        –        0        –       –       –      0      0      0      0       0        0        0
 2 Kyodo               1        1        1        1        1       1       1      1      1      1      1       1        1       13
 3 Yomiuri             1        1*       1        1*       1*      1*      0      0      0      0      0       0        0        6
 4 Asahi               1        0        0        0        0       0       0      0      0      0      0       0        0        1
 5 Mainichi            0        0        0        0        0       0       0      0      0      –      –       0        0        0
 6 Sankei-FNN          1        0        0        0        0       0       0      0      0      0      0       1        –        2
 7 Nikkei              0        0        0        0        1       1       1      0      0      –      –       0        0        3
 8 NHK                 –        –        –        0        0       0       0      0      0      0      0       0        0        0
 9 JNN                 1        0        0        0        0       0       –      0      0      0      0       0        –        1
10 ANN                 –        –        –        –        0       1       0      0      0      0      0       0        0        1
11 NNN                 –        –        1        0        0       1       0      1      1      0      –       0        0        4
   Total               5        2        3        2        3       5       2      2      2      1      1       2        1       31

Note.—Zero means that the polling firm did not provide a reshuffle cue in the poll conducted either the week that the PM reshuffled his cabinet or the following week; 1 means that the polling firm provided a type I reshuffle cue; and an underlined 1 means that the polling firm provided a type II reshuffle cue. A dash means that the polling firm did not conduct an opinion poll immediately following that reshuffle. The last row shows the number of polls that provided either a type I or type II reshuffle cue about that reshuffle, and the last column shows the number of times that a polling firm provided either a type I or type II reshuffle cue during the analysis period. An asterisk means that in either the week the cabinet was reshuffled or the following week, the polling firm conducted more than two opinion polls, one that provided a reshuffle cue and others that did not.
In addition, I reestimated the model using only the following four companies: Jiji Press, Mainichi Shimbun, NHK, and Kyodo News. The former three never provided a reshuffle cue, while Kyodo News provided a reshuffle cue in every case. Thus, there is no concern about self-selection bias. The estimated model was the same as the original, except that the mode-effect term was omitted because survey mode was perfectly collinear with the four companies.11 The point estimate was 0.031, slightly higher than the original result. Uncertainty increased due to the reduction in sample size, but the 95 percent CI still did not include zero ([0.012, 0.053]). Details of this analysis are shown in Online Appendix C.

WHEN IS THE EFFECT STRONG?

To examine heterogeneity in the effect size of reshuffle cues, I sought to explain it by several factors. I altered equation (3) as follows:

$$\mu_i = \alpha_{p,t_i} + \gamma_{m_i} + \delta_{j_i} + \lambda_{r_i} x_i, \qquad (5)$$

$$\lambda_r \sim N(\theta' w_r,\ \eta^2), \qquad (6)$$

where $r_i \in \{1, \dots, 13\}$ is the index of the reshuffle associated with poll $i$, $w_r$ is a vector of covariates for reshuffle $r$ (including 1 for an intercept), $\theta$ is a vector of their coefficients, and $\eta^2$ is an error variance.
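A minimal sketch of how equations (5) and (6) relate the reshuffle-specific cue effects to reshuffle-level covariates is shown below. It uses the point estimates reported in table 3 for θ but placeholder covariate values; in the paper, θ and the λ_r are estimated jointly with the dynamic linear model rather than simulated.

```python
import numpy as np

rng = np.random.default_rng(7)

R = 13                                    # number of reshuffles in the study period
# covariates w_r: intercept, lagged approval, number of altered ministers, DPJ dummy
w = np.column_stack([
    np.ones(R),
    rng.uniform(0.3, 0.6, R),             # placeholder lagged approval alpha_{p, t-1}
    rng.integers(5, 15, R),               # placeholder count of altered ministers
    (rng.random(R) < 0.3).astype(float),  # placeholder DPJ-government dummy
])
theta = np.array([0.028, -0.142, 0.005, 0.008])  # point estimates from table 3
eta = 0.01                                       # illustrative residual sd

lam_r = w @ theta + rng.normal(0.0, eta, R)      # eq 6: reshuffle-specific cue effects
print(lam_r.round(3))
```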
I considered two variables to explain the size of the effect: the cabinet approval rating in the week immediately preceding a reshuffle (estimated by the dynamic linear model, i.e., $\alpha_{p,t_i-1}$) and the number of affected ministers. If reshuffles are effective in correcting falls in popularity, as Kam and Indriðason (2005) and others have argued, then the effect of reshuffle cues should be large when cabinet approval is low. On the other hand, if reshuffles provide a negative signal to the public implying discontinuity and turmoil, as Hansen et al. (2013) have indicated, the number of affected ministers should negatively impact the effect of the reshuffle cues. Further, I included a dummy variable for the DPJ government, and estimated $\theta$ simultaneously with the other parameters of the dynamic linear model. Although this analysis does not exclude the possibility of omitted variables and the effective sample size is small (the number of reshuffle cases is only 13), it provides insight into how citizens respond to reshuffles.

Table 3 shows the results. The lagged cabinet approval has a negative impact on the effect of reshuffle cues; that is, the lower the cabinet approval rating, the greater the effect of the reshuffle cue.12 This implies that reshuffles have a corrective effect on government popularity, which is in line with Dewan and Dowding's (2005) analysis of resignations of single ministers. A positive coefficient for the number of affected ministers indicates that, on average, citizens do not respond negatively to a discontinuous cabinet; rather, they welcome the freshness of reshuffled cabinets.13 The effect size of reshuffle cues does not depend on the governing party. Details of this analysis are shown in Online Appendix D.

Table 3. Estimated coefficients on the size of the effect of reshuffle cues

                                  Point      95% CI
Intercept                         0.028      [−0.066, 0.125]
Lagged cabinet approval          −0.142      [−0.270, −0.014]
Number of altered ministers       0.005      [−0.002, 0.011]
DPJ dummy                         0.008      [−0.037, 0.051]

Note.—The dependent variable is the estimated effect of a reshuffle cue for each reshuffle. Lagged cabinet approval is simultaneously estimated by the dynamic linear model.

Additional analyses and robustness checks were conducted (see Online Appendices E, F, and G). Their results indicate that both type I and type II reshuffle cues have a positive effect on cabinet approval ratings, and that the conclusion does not change even when the effect of reshuffle cues is reestimated by a simpler frequentist approach. Further, the main results are robust to alterations of the statistical model and to alternative codings of the data.

Conclusions

In Japan, a number of polling firms conduct opinion polls week after week to measure cabinet approval ratings.
When a cabinet is reshuffled, some polling companies add information on the reshuffle (a reshuffle cue) to their questionnaire on cabinet approval, while others do not. Even if a polling firm provides a reshuffle cue at the time of a particular reshuffle, it may not necessarily do so at the time of another reshuffle. I exploited this situation as an opportunity to investigate, free of endogeneity concerns, whether citizens evaluate reshuffles positively and become more likely to approve of the government. I employed the results of polls conducted from 2001 to 2015 by 11 polling firms and a dynamic linear model that accounts for a PM's individual popularity, as well as house effects, mode effects, and sampling uncertainty. The results showed that reshuffle cues increase cabinet approval ratings by 2.4 percentage points on average, and the conclusion that reshuffles can have a positive effect on cabinet approval is statistically credible. Furthermore, the supplementary analysis implies that reshuffles have a corrective effect on declining popularity and that citizens favor the freshness of reshuffled cabinets. Political scientists have studied cabinet reshuffles as a tool through which PMs effectively delegate power to ministers and successfully manage the government. However, scholars have paid insufficient attention to the potential cost of reshuffles to government popularity, while some have assumed without empirical evidence that reshuffles have a positive effect on popularity. The results of this study provide evidence in support of such assumptions and reinforce previous research on the theory of cabinet management. The results also provide some justification for game-theoretic research on reshuffles, most of which assumes that reshuffles at least carry no cost in terms of government popularity; in some cases, however, such research should also incorporate the potential popularity benefit of reshuffles into its models. This study has some limitations. First, it was limited to Japan; future research should investigate whether the positive effect of cabinet reshuffles on government popularity is observed in other countries. Second, despite conducting an exploratory analysis of the conditions under which citizens welcome reshuffles, I did not fully investigate the underlying mechanism—whether voters evaluate the competence of reshuffled cabinets, merely respond to their newness, or react for other reasons. More analyses that are free from omitted-variable concerns, such as experimental studies, are required to examine the psychological mechanism behind the positive effects of reshuffles. While my research design did not enable me to determine how long the effect of a reshuffle lasts, appropriate time-series analyses may resolve this question. Finally, the implications of this study for survey research are as follows. This study demonstrated that adding only one word to the poll question (changing "cabinet" to "reshuffled cabinet," or "naikaku" to "kaizo naikaku" in Japanese) has a substantial framing effect and results in changes in responses. Various textbooks about survey research have repeatedly warned survey designers about this phenomenon. I have introduced a new case that shows a framing effect when subtle information is added. However, my results also imply that even when surveys contain different questions and thus are not simply comparable, we can compare and unify the results by using a suitable statistical model.
Certainly, future surveys should be designed carefully to avoid unnecessary framing effects; however, on occasion, there is a need to analyze past surveys that were not necessarily designed appropriately. Statistical modeling, such as the pooling-the-polls technique, can satisfy this need.

Supplementary Data

Supplementary data are freely available at Public Opinion Quarterly online. Replication files are available at the Harvard Dataverse (http://dx.doi.org/10.7910/DVN/FAU3VR). The author thanks Yukio Maeda for providing data from Jiji Press and Kyodo News, and Kyodo News and TV Asahi for providing information on the method used in their opinion polls. An earlier version of this paper was presented at the Workshop on the Frontiers of Statistical Analysis and Formal Theory of Political Science (Gakushuin University, Tokyo, January 5, 2016) and the Workshop on Political Communication (Rikkyo University, Tokyo, August 26, 2016). The author thanks Kentaro Fukumoto, Masataka Harada, Hiroshi Hirano, Noriyo Isozaki, Yukio Maeda, Kenneth Mori McElwain, Kuniaki Nemoto, Ikuma Ogura, Teppei Yamamoto, Soichiro Yamauchi, participants of the workshops, and three anonymous reviewers for their helpful comments. Finally, the author would like to thank Shoko Omori for her research assistance. This work was supported by the Japan Society for the Promotion of Science KAKENHI [13J08571 to H.M.].

Appendix

Table A1. Estimated results of miscellaneous parameters in the main analysis

                                            Point     95% CI
Std. dev. of noise in the random walk ω_p
  Koizumi                                   0.029     [0.026, 0.033]
  Abe (1)                                   0.031     [0.023, 0.039]
  Fukuda                                    0.032     [0.022, 0.042]
  Aso                                       0.029     [0.022, 0.037]
  Hatoyama                                  0.030     [0.020, 0.041]
  Kan                                       0.038     [0.030, 0.047]
  Noda                                      0.021     [0.015, 0.027]
  Abe (2)                                   0.018     [0.014, 0.022]
Design effect τ_j
  Kyodo News                                1.685     [1.490, 1.886]
  Yomiuri Shimbun                           2.193     [1.917, 2.484]
  Asahi Shimbun                             2.011     [1.783, 2.247]
  Mainichi Shimbun                          2.487     [2.202, 2.793]
  Sankei Shimbun-FNN                        1.832     [1.557, 2.109]
  Nikkei Research                           2.059     [1.775, 2.372]
  NHK                                       2.006     [1.741, 2.299]
  JNN                                       2.614     [2.338, 2.929]
  NNN                                       2.039     [1.718, 2.429]

Note.—The design effects for Jiji Press and ANN were not estimated, because they did not employ RDD or quota sampling.
Footnotes

1. I used PMs, not presidents, as the subject of reshuffles in my research, because previous research on cabinet reshuffles focuses primarily on parliamentary democracies, and Japan, the case I study here, is a parliamentary democracy.

2. The Social Democratic Party left the coalition in May 2010 because of policy disagreement. This coalition change did not coincide with a cabinet reshuffle.

3. Details of data sources, wording of all questions containing a reshuffle cue, and supplementary information on polling data are presented in Online Appendix A.

4. A week is defined as Monday through Sunday, and the last day of the survey period determined the week in which a survey was conducted.

5. In Japan, a cabinet is nominally counted as a new cabinet each time the PM is formally appointed by the emperor. I ignored this nominal distinction, however, and assumed that the same stochastic process of an approval rating continues until the PM finally resigns. However, Shinzo Abe's two discontinuous regimes (the first ended on September 26, 2007, and the second began on December 26, 2012) are distinguished from each other.

6. In fact, the term κz_{p,t} does not necessarily have to be included in order to estimate the effect of a reshuffle cue correctly. I included this term to emphasize that my estimation strategy is capable of solving the endogeneity problem.

7. I used R version 3.3.1 (R Core Team 2016) to analyze the data throughout this study. I used JAGS 4.2.0 (Plummer 2003) and the runjags package version 2.0.4–2 (Denwood 2016) to implement MCMC sampling and the coda package version 0.18.1 (Plummer et al. 2006) to analyze MCMC samples.

8. I employed a posterior mean as a point estimate and the highest posterior density interval as a CI throughout this paper.

9. Some may raise concerns that this study measures the impact of additional information rather than the impact of reshuffle cues. To address such concerns, I examined the effect of additional information on the inauguration of a cabinet (indeed, polling firms modified the wording of questions only in the case of either inaugurations or reshuffles). The additional analysis showed that the effect of an inauguration cue was small and its 95 percent CI includes zero (0.006 [−0.008, 0.021]). Therefore, the increase in approval ratings found in the main analysis can be attributed not to additional information in general, but to reshuffle cues. The details are shown in Online Appendix B.

10. To further address the question of whether the assignment of reshuffle cues may be treated as ignorable, I rebut the possibility that providing reshuffle cues is politically motivated; that is, that the use of reshuffle cues relates to the ideological positions of polling firms and to which party (the LDP or the DPJ) is in power. See Online Appendix H for details.

11.
The MCMC estimation procedure for this analysis (and the other additional analyses shown below and in the Online Appendix) was almost the same as that of the original analysis. The prior distribution of new parameters was U(0, 100) for variance parameters and N(0, 100²) for others.

12. I conducted the same analysis as the main analysis separately for each PM and reached similar conclusions. See Online Appendix G.

13. The 80 percent CI of the coefficient of the number of affected ministers does not include zero ([0.001, 0.009]), although the 95 percent CI shown in table 3 does.

References

Bäck, Hanna, Henk Erik Meier, Thomas Persson, and Jörn Fischer. 2012. "European Integration and Prime Ministerial Power: A Differential Impact on Cabinet Reshuffles in Germany and Sweden." German Politics 21:184–208.

Beck, Nathaniel, Simon Jackman, and Howard Rosenthal. 2006. "Presidential Approval: The Case of George W. Bush." Unpublished manuscript.

Berlinski, Samuel, Torun Dewan, and Keith Dowding. 2010. "The Impact of Individual and Collective Performance on Ministerial Tenure." Journal of Politics 72:559–71.

Bowling, Ann. 2005. "Mode of Questionnaire Administration Can Have Serious Effects on Data Quality." Journal of Public Health 27:281–91.

Burden, Barry C. 2015. "Economic Accountability and Strategic Calibration: The Case of Japan's Liberal Democratic Party." Party Politics 21:346–56.

Camerlo, Marcelo, and Aníbal Pérez-Liñán. 2015a. "Minister Turnover, Critical Events, and the Electoral Calendar in Presidential Democracies." Journal of Politics 77:608–19.

Camerlo, Marcelo, and Aníbal Pérez-Liñán. 2015b. "The Politics of Minister Retention in Presidential Systems: Technocrats, Partisans, and Government Approval." Comparative Politics 47:315–33.

Denwood, Matthew J. 2016. "runjags: An R Package Providing Interface Utilities, Model Templates, Parallel Computing Methods and Additional Distributions for MCMC Models in JAGS." Journal of Statistical Software 71(9):1–25.

Dewan, Torun, and Keith Dowding. 2005. "The Corrective Effect of Ministerial Resignations on Government Popularity." American Journal of Political Science 49:46–56.

Dewan, Torun, and Rafael Hortala-Vallve. 2011. "The Three As of Government Formation: Appointment, Allocation, and Assignment." American Journal of Political Science 55:610–27.

Dewan, Torun, and David P. Myatt. 2007. "Scandal, Protection, and Recovery in the Cabinet." American Political Science Review 101:63–77.

Fisher, Stephen D., Robert Ford, Will Jennings, Mark Pickup, and Christopher Wlezien. 2011. "From Polls to Votes to Seats: Forecasting the 2010 British General Election." Electoral Studies 30:250–57.

Fukumoto, Kentaro, and Asami Mizuyoshi. 2007. "Koizumi naikaku no shijiritsu to media no ryogisei" [The approval rate of Koizumi cabinet and the ambivalent media]. Gakushuin daigaku hogakkai zasshi 43(1):1–21.
Hansen, Martin Ejnar, Robert Klemmensen, Sara B. Hobolt, and Hanna Bäck. 2013. "Portfolio Saliency and Ministerial Turnover: Dynamics in Scandinavian Postwar Cabinets." Scandinavian Political Studies 36:227–48.

Hosogai, Ryo. 2010. "Media ga naikaku shiji ni ataeru eikyoryoku to sono jikanteki henka: Shimbun shasetsu no naiyo bunseki o baikai ni shite" [The time-varying effect of the media coverage on the cabinet approval rate]. Masu komyunikeshon kenkyu 77:225–42.

Huber, John D., and Cecilia Martínez-Gallardo. 2004. "Cabinet Instability and the Accumulation of Experience: The French Fourth and Fifth Republics in Comparative Perspective." British Journal of Political Science 34:27–48.

Huber, John D., and Cecilia Martínez-Gallardo. 2008. "Replacing Cabinet Ministers: Patterns of Ministerial Stability in Parliamentary Democracies." American Political Science Review 102:169–80.

Iida, Takeshi. 2005. "Seito shiji no naikaku shiji e no eikyo no jikanteki henka: ARFIMA moderu to jihen parameta o mochiita jikeiretsu bunseki" [The change in the effect of party support on cabinet support over time]. Senkyo gakkai kiyo 4:41–61.

Indriðason, Indridi H., and Christopher Kam. 2008. "Cabinet Reshuffles and Ministerial Drift." British Journal of Political Science 38:621–56.

Inoguchi, Takashi. 1983. Gendai Nihon seiji keizai no kozu: Seifu to shijo [The structure of contemporary Japanese political economy]. Tokyo, Japan: Toyo Keizai Shinposha.

Jackman, Simon. 2005. "Pooling the Polls over an Election Campaign." Australian Journal of Political Science 40:499–517.

Jackman, Simon. 2009. Bayesian Analysis for the Social Sciences. Chichester, UK: Wiley.

Kam, Christopher, William T. Bianco, Itai Sened, and Regina Smyth. 2010. "Ministerial Selection and Intraparty Organization in the Contemporary British Parliament." American Political Science Review 104:289–306.

Kam, Christopher, and Indriði Indriðason. 2005. "The Timing of Cabinet Reshuffles in Five Westminster Parliamentary Systems." Legislative Studies Quarterly 30:327–63.

Krauss, Ellis S., and Benjamin Nyblade. 2005. "'Presidentialization' in Japan? The Prime Minister, Media and Elections in Japan." British Journal of Political Science 35:357–68.

Maeda, Yukio. 2013. "The Development of DPJ Partisanship from a Fraction to a Majority (and Back Again?)." In Japan under the DPJ: The Politics of Transition and Governance, edited by Kenji E. Kushida and Phillip Y. Lipscy, pp. 191–218. Stanford, CA: Walter H. Shorenstein Asia-Pacific Research Center.

Martínez-Gallardo, Cecilia. 2014. "Designing Cabinets: Presidential Politics and Ministerial Instability." Journal of Politics in Latin America 6(2):3–38.

Masuyama, Mikitaka. 2007. "The Survival of Prime Ministers and the House of Councillors." Social Science Japan Journal 10:81–93.

Matsumoto, Tomoko, and Michael Laver. 2015. "Public Opinion Feedback between Elections, and Stability of Single-Party Majority Governments." Electoral Studies 40:308–14.
McElwain, Kenneth Mori. 2015. "Kabuka ka kakusa ka: Naikaku shijiritsu no kyakkanteki shukanteki keizai yoin" [Inequality or growth? Subjective economic beliefs and government approval in Japan]. Leviathan 57:72–95.

Nakamura, Etsuhiro. 2006. "Tahenryo choki kioku moderu o mochiita seito shiji to naikaku shiji no kankeisei no bunseki" [Analyzing the dynamic relationship between the cabinet support rate and the LDP support rate using the multivariate long memory model]. Senkyo gakkai kiyo 6:107–26.

Nishizawa, Yoshitaka. 2001. "Jiminto shiji to keizai gyoseki hyoka" [LDP support and evaluation of economic performance]. In 55 Nen taiseika no seiji to keizai: Jiji yoron chosa deta no bunseki, edited by Ichiro Miyake, Yoshitaka Nishizawa, and Masaru Kohno, pp. 139–59. Tokyo, Japan: Bokutakusha.

Norpoth, Helmut. 1996. "Presidents and the Prospective Voter." Journal of Politics 58:776–92.

Ohmura, Hirotaka, and Hanako Ohmura. 2014. "Buryoku shototsu to Nihon no yoron no hannou" [The "rally 'round the flag" effect and public opinion in Japan]. Leviathan 54:70–90.

Pickup, Mark, J. Scott Matthews, Will Jennings, Robert Ford, and Stephen D. Fisher. 2011. "Why Did the Polls Overestimate Liberal Democrat Support? Sources of Polling Error in the 2010 British General Election." Journal of Elections, Public Opinion and Parties 21:179–209.

Plummer, Martyn. 2003. "JAGS: A Program for Analysis of Bayesian Graphical Models Using Gibbs Sampling." In Proceedings of the 3rd International Workshop on Distributed Statistical Computing, 1–10.

Plummer, Martyn, Nicky Best, Kate Cowles, and Karen Vines. 2006. "CODA: Convergence Diagnosis and Output Analysis for MCMC." R News 6(1):7–11.

Quiroz Flores, Alejandro. 2016. Ministerial Survival during Political and Cabinet Change: Foreign Affairs, Diplomacy and War. London, UK: Routledge.

Quiroz Flores, Alejandro, and Alastair Smith. 2011. "Leader Survival and Cabinet Change." Economics & Politics 23:345–66.

R Core Team. 2016. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.

Smith, Tom W. 1978. "In Search of House Effects: A Comparison of Responses to Various Questions by Different Survey Organizations." Public Opinion Quarterly 42:443–63.

Strøm, Kaare. 2000. "Delegation and Accountability in Parliamentary Democracies." European Journal of Political Research 37:261–90.

Sugawara, Taku. 2009. Yoron no kyokkai: Naze Jiminto wa taihai shita no ka [Distorted public opinion]. Tokyo, Japan: Kobunsha.

Suzuki, Tokuhisa. 2009. "Yoron chosa no saikin no doko" [Recent trends in opinion polls]. Shakai to chosa 3:13–19.

© The Author(s) 2018. Published by Oxford University Press on behalf of the American Association for Public Opinion Research. All rights reserved.
Electoral Institutions and Democratic Legitimacy
2018 Public Opinion Quarterly
doi: 10.1093/poq/nfy007
Abstract The Voting Rights Act of 1965 was widely heralded as a solution to persistently high levels of Black political alienation and cynicism. But despite the importance of the Voting Rights Act for the political representation of historically marginalized groups, little is known about how citizens protected by key provisions of the Act viewed democratic institutions. Integrating insights from the policy feedback literature with studies on the relationship between electoral institutions and attitudes toward government, we predict that the voting protections embedded in the Voting Rights Act led to more favorable attitudes toward government among affected communities. Analyses of data from 1972 to 1998 show that Black citizens in jurisdictions covered by Section 5 of the Voting Rights Act, the preclearance provision, exhibited consistently higher levels of trust in government and more positive perceptions of governmental responsiveness. However, we find no evidence that preclearance was associated with similar patterns among whites. Our results may have especially important contemporary relevance given recent controversies over changes to state and local election laws. Legal scholars, historians, political scientists, public officials, and community activists argue that the Voting Rights Act of 1965 (VRA) was among the most important legislative enactments of the twentieth century. According to Issacharoff (2013, p. 95), the Voting Rights Act "was pivotal in bringing black Americans to the broad currents of political life—a transformation that shook the foundations of Jim Crow, triggered the realignment of partisan politics, and set the foundations for the election of an African American president." The VRA contributed to increased voter registration and turnout among Blacks and linguistic minorities (Tate 1993; Jones-Correa 2005; Fraga 2016) and improved descriptive (e.g., Grofman and Handley 1991; Lien et al. 2007) and substantive (e.g., Whitby and Gilliam 1991; Lublin 1997; Whitby 2000) representation of people of color. In this paper, we study the consequences of the VRA for public opinion. Prior to the VRA, political mistrust, alienation, and cynicism were significantly higher among Blacks than whites (e.g., Aberbach and Walker 1970; Abramson 1983). Black political attitudes, including orientations toward the political system, have been shaped by historical legacies of slavery, the failures of Reconstruction, and Jim Crow (e.g., Dawson 1994), which may explain relatively low perceptions of legitimacy among Blacks a half-century ago.1 Successful efforts by civil rights leaders and activists to secure guaranteed voting rights from the federal government, however, may have significantly reshaped attitudes toward government among members of historically marginalized groups. We offer two main contributions to the study of public opinion and race. First, while existing research devotes significant attention to how descriptive representation affects political attitudes among racial minority groups (e.g., Gay 2002; Marschall and Shah 2007), we focus on how attitudes toward government are shaped by perceptions of democratic legitimacy. Just as trust in government is shaped by electoral (e.g., Rahn and Rudolph 2005) and governing institutions (e.g., Marschall and Shah 2007), political inclusion is an important determinant of how citizens feel about their government.
Second, we integrate insights from research on policy feedback (see, e.g., Soss 1999; Campbell 2003; Weaver and Lerman 2010; Erikson and Stoker 2011) with the study of electoral institutions. Our argument predicts that individuals affected by the provisions of the VRA developed more favorable attitudes toward government because of the opportunities it provided for political inclusion. We test our argument in the context of the preclearance requirement outlined in Section 5 of the VRA. This provision prohibited certain jurisdictions from changing election laws without federal approval and provided security for the voting rights of historically marginalized groups living in those areas. Data from the American National Election Studies conducted from 1972 to 1998 show that Black citizens living in counties subject to federal preclearance reported significantly higher levels of trust in government, evaluations of government responsiveness, and approval of political institutions. These results are robust to a number of model specifications and empirical strategies. Moreover, we find no discernible evidence that preclearance had an effect on white political attitudes. Our findings have important implications for how democratization and electoral institutions affect attitudes toward the state.

Electoral Institutions, Political Representation, and Attitudes toward Government

The maintenance of democratic political institutions requires both diffuse and specific support. Diffuse support characterizes the public's respect for and recognition of the political authority vested in institutions, while specific support refers to the public's evaluations of incumbent political authorities (Easton 1975). These two dimensions of support make it possible for citizens to be personally upset by, for instance, the behavior of incumbent legislators (specific support) but still respect the authority of Congress to make the nation's laws (diffuse support). Diffuse support thus helps maintain political institutions even in the face of dissatisfaction with particular political officials. Earlier scholarship has found that feelings of political alienation are associated with lower levels of support both for incumbent officeholders and for the system as a whole (Citrin et al. 1975). We argue that the protection of voting rights in democracies affects citizens' attitudes toward government. As Levi (1998, p. 90) argued, "The belief in government fairness requires the perception that all relevant interests have been considered, that the game is not rigged." Electoral institutions play a key role in structuring beliefs in fairness and the representation of interests. By prescribing the "rules of the game," electoral institutions affect citizens' ability to participate in politics, influence election outcomes, and affect government policymaking. When electoral institutions and the outcomes they generate are perceived as biased, the public is likely to express greater disapproval of both the officeholders and the government system more generally.2 Our argument posits a feedback loop that links citizens' political inclusion to their attitudes toward government. Just as participation in social welfare programs (e.g., Soss 1999), the criminal justice system (e.g., Weaver and Lerman 2010), and the Vietnam draft (e.g., Erikson and Stoker 2011) affects political attitudes and behaviors, citizens are likely to view government as legitimate to the extent they can influence it.
Policies that expand or restrict voting access shape the ability of citizens to express political voice, and with them citizens’ affective orientations toward government. This argument builds upon several strands of related research. For instance, local districting and electoral institutions have been shown to affect government trust among historically marginalized groups (e.g., Rahn and Rudolph 2005), largely because these institutions shape the opportunities for residents to meaningfully effect political change in their communities. Research in comparative politics finds that government trust in post-communist nations was higher in societies that protected individual liberties (Mishler and Rose 1997, 2001), while other kinds of political institutions, including the nature of party (Miller and Listhaug 1990) and electoral systems (Banducci, Donovan, and Karp 1999), are also associated with government evaluations. Similarly, electoral fraud and corruption, which may reduce citizens’ perceptions that they can influence the political system, are associated with decreased levels of trust in government (e.g., McCann and Dominguez 1998). Across these diverse literatures, electoral and political institutions affect citizens’ evaluations of government based on how these institutions create opportunities for political inclusion and representation. When electoral institutions are changed, as in the case of the VRA, individuals can update their attitudes about government based on their own political inclusion. Thus, when election policies change, an individual’s evaluations toward government are more positive (negative) if the policies provide that person with enhanced (reduced) political inclusion, and remain stable if the policies have no implications for her level of political inclusion. This expectation contrasts with a sociotropic perspective rooted in core values such as equality or egalitarianism (e.g., Feldman 1988). The sociotropic perspective suggests that individual attitudes toward government respond to changes to electoral institutions based on whether the policy changes expand or reduce political inclusion generally, irrespective of the specific implications for the individual’s political inclusion. Electoral Institutions and the Voting Rights Act Prior to the VRA, electoral institutions including poll taxes and literacy tests limited opportunities for political participation among Blacks and language minorities (Davidson 1992). For instance, only about a quarter of eligible Blacks registered to vote in the middle of the twentieth century (Garrow 1978). Civil rights leaders focused their attention on voting rights during the early 1960s and pressed the Kennedy and Johnson administrations to guarantee voting rights to all eligible Americans. By prohibiting electoral institutions—formal or informal—that produced racial disparities in ballot access, civil rights leaders argued that the VRA would provide greater political power to Blacks (and language-minority groups, with the 1975 amendments). The elimination of restrictive electoral provisions was therefore expected to improve representation for Blacks and other historically marginalized groups and generate more positive evaluations of government. Several of the VRA’s provisions specifically address ballot access and electoral rules. Legal scholars argue that the combination of Section 4(b) and Section 5 is its critical component (MacCoon 1979; Motomura 1983). 
Section 5 requires certain jurisdictions with a history of voter disenfranchisement to receive federal approval before changing election laws or voting procedures, and Section 4(b) identifies these jurisdictions. The coverage formula specified in Section 4(b) originally identified jurisdictions that used a test or device in the November 1964 presidential election, and in which less than half of the jurisdiction's eligible citizens were registered or voted in the November 1964 election.3 Thus, while the VRA guaranteed ballot access to all citizens nationwide, it offered the strongest protections to residents of communities that were subject to preclearance. An important body of research credits the VRA with increasing Black voter registration and turnout (Tate 1993) and improving Black descriptive and substantive political representation (Schuit and Rogowski 2017).4 Scholars have devoted less attention to studying how the VRA may have affected how citizens evaluated government. According to our argument, the electoral protections in the VRA should increase Black citizens' evaluations of political institutions, elected officials, and government more generally by enhancing Blacks' perceptions of representation and political inclusion. In earlier work, Abramson (1983) finds that Black citizens typically exhibited lower levels of trust in government than whites, and that the size and direction of the gap varied with the federal government's efforts to ensure racial equality, but does not focus specifically on the VRA. Previous research on the attitudinal consequences of the VRA has instead focused mostly on the effects of Section 2(b), which prohibits minority vote dilution when drawing electoral districts and led to the creation of majority-minority districts and increased descriptive representation. However, scholars have found limited effects of descriptive representation on attitudes toward government. In one of the most important studies on this topic, Gay (2002) finds little evidence that trust in government or perceptions of Congress among Blacks are affected by descriptive representation.5 To the extent the VRA affected attitudes toward government, therefore, it seems unlikely to have done so through Section 2(b). In contrast, our argument predicts that Sections 4(b) and 5 are more likely to have implications for Black political attitudes. Moreover, these provisions may have affected perceptions of political inclusion apart from citizens' knowledge about the particulars of the electoral institutions used in their local communities. Local organizations and entrepreneurial candidates used the provisions of the VRA to galvanize and mobilize local Black communities in the aftermath of its passage,6 and activities such as these could have raised the visibility of federal electoral protections. Contemporary debates over voting access often focus on whether such provisions have disproportionate effects across groups, particularly along racial and ethnic lines, and underscore the importance of documenting the effects of the VRA. Research on voter identification requirements, however, finds that these laws have little effect on citizens' perceptions of vote fraud (Ansolabehere and Persily 2008), and raises questions about whether restrictions such as these affect other dimensions of public opinion. Thus, understanding how voting protections influence Black attitudes toward government may also provide important insight into the potential effects of contemporary electoral reforms.
Data and Methods

We study the effects of the VRA on attitudes toward government using the American National Election Studies from 1972 to 1998. We use this time span because federal preclearance under Section 4(b) of the VRA was generally applied at the county level and the county indicators in the ANES are restricted after 1998.7 Our primary analyses focus specifically on Black respondents and on the comparison between Black and white respondents. Unfortunately, very small sample sizes for other minority groups in many of these surveys preclude comparison. In addition, small samples of Black respondents in earlier versions of the ANES limit our ability to include respondents from those years in our analyses. Although this is an obvious limitation of our ability to identify baseline attitudes among Blacks prior to the Act's passage in 1965, it does not diminish the importance of the analyses. As our discussion above highlighted, Black citizens living in jurisdictions identified by Section 4(b) had greater federal protection against discriminatory election laws because those jurisdictions were required to receive federal preclearance before modifying their electoral laws or voting procedures. An ideal scenario would randomly assign individuals to geographic locations that were or were not covered by the preclearance provisions. However, the coverage formula in Section 4(b) was not randomly assigned to jurisdictions across the country; in fact, the government argued in South Carolina v. Katzenbach and Shelby v. Holder that Congress had reverse-engineered the coverage formula to identify areas with "reliable evidence" of voting discrimination. Thus, we compared individuals' attitudes based on whether they lived in a jurisdiction that was subject to the preclearance requirement. Based on the geographic information provided about respondents' locations in the ANES, we identified whether respondents lived in a county that was subject to the preclearance requirement. Counties constituted the vast majority of jurisdictions subject to preclearance, and we suspect our empirical approach represents a conservative strategy for reasons we detail below. All else equal, Black respondents living in counties subject to preclearance should express more positive attitudes toward government than Black respondents living in jurisdictions that were not subject to preclearance.8 We do not expect to observe differences in attitudes toward government on the basis of preclearance among whites, who were generally not subject to the patterns of disenfranchisement in jurisdictions covered by Section 5. While observational research designs like this confront unavoidable challenges due to endogeneity, this concern weighs against finding a positive relationship between preclearance and attitudes toward government. If the coverage formula was applied to the jurisdictions with the most egregious histories of voter discrimination, residents of those jurisdictions should have the most negative attitudes toward government. When comparing attitudes among respondents who live in covered and noncovered jurisdictions, then, we would find a negative relationship between coverage and evaluations of government. It is considerably more difficult to envision a scenario in which endogeneity would explain a positive relationship between coverage and government attitudes, and thus the results that follow may underestimate the effect of preclearance.
Our key independent variable, VRA Coverage, is an indicator for whether the ANES respondent lived in a county that was covered by the preclearance requirement as a result of the original Voting Rights Act of 1965 or any of its later amendments, or on the basis of whether it was covered under the bail-in provision.9 In the absence of random assignment, we rely on two key sources of variation to identify the effect of the VRA. Though the coverage formula specified in Section 4(b) identified jurisdictions that were subject to preclearance, this list of covered jurisdictions has changed over the years due to the "bail-in" and "bail-out" provisions. The bail-in provision is specified in Section 3(c) and allows federal courts to subject jurisdictions that fall outside the coverage formula in Section 4(b) to the preclearance requirement if the jurisdiction has enacted voting laws that are racially discriminatory.10 The bail-out provision in Section 4(a) allowed jurisdictions to seek exemption from preclearance by showing that they had not used a voting test or device with discriminatory intent and by documenting registration and turnout rates among majority and minority citizens.11 These provisions produced temporal variation in the counties subject to preclearance. Though we report results for a variety of dependent variables, we begin our analyses using the ANES measure of trust in government. Our measure of trust in government is based on the traditional question "How much of the time do you think you can trust the government in Washington to do what is right?" Over the past forty years, the question has generally been asked on a three-point scale that ranges from "some of the time" (1) to "just about always" (3), though some respondents volunteered that they trust government "none of the time." To facilitate interpretation, we code trust in government dichotomously, where responses of trusting government "most of the time" or "just about always" are coded 1, and other responses are coded zero. The trust in government questions were "designed to tap the basic evaluative orientations toward the national government" (Stokes 1962, p. 64). The more trustworthy citizens perceive government to be, the more likely they are to comply with and consent to its demands and regulations (Tyler 1990). Greater trust benefits elected officials and political institutions by "provid[ing] leaders more leeway to govern effectively and institutions a larger store of support regardless of the performance of those running the government" (Hetherington 1998, p. 803). Figure 1 below compares the level of trust in government among Black respondents who did (solid line) and did not (dotted line) live in areas that were subject to the preclearance provision. With only two exceptions (1984 and 1998), trust in government was higher among Black respondents who lived in areas subject to federal preclearance. Moreover, aggregating over the entire time period, Black respondents living in preclearance jurisdictions reported significantly higher levels of trust than Blacks who did not (difference = .09; p < .01). These raw data provide preliminary support for the hypothesis that the VRA influenced perceptions of democratic legitimacy.

Figure 1. The Voting Rights Act, federal preclearance, and trust in government, 1972–1998. Black respondents only, American National Election Studies. Plot shows the percentages of Black respondents who indicated they trusted the government to do what was right "most of the time" or "just about always." The size of the points is proportional to sample size.
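As a concrete illustration of the coding just described, the snippet below dichotomizes the trust item and computes the weighted share of trusting Black respondents inside and outside preclearance counties. It is a minimal sketch under assumed variable names (a data frame anes with columns trust_raw, black, vra_coverage, and weight); the actual ANES variable names and response codes differ.

library(dplyr)

# `anes` is assumed to be a respondent-level data frame already merged with
# the county-level preclearance indicator.
anes <- anes %>%
  mutate(trust_binary = as.integer(trust_raw %in%
                                     c("most of the time", "just about always")))

# Weighted trust rates for Black respondents, by preclearance status.
anes %>%
  filter(black == 1) %>%
  group_by(vra_coverage) %>%
  summarise(trust = weighted.mean(trust_binary, w = weight, na.rm = TRUE))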
Statistical Models

To examine the effect of the preclearance requirement on attitudes toward government, we estimate a series of logistic regressions. The unit of analysis is an individual survey respondent i living in county j in year t, and the main dependent variable is the Trust in government measure described above.12 Generally speaking, our statistical model takes the form

Pr(y_{ijt} = 1) = logit⁻¹[β_0 + β_1 VRA Coverage_{jt} + β_2 Black_i + β_3 (VRA Coverage_{jt} × Black_i) + X_{ijt}Ω + D_t + ε_{ijt}], (1)

where y_{ijt} is respondent i's degree of trust in government; VRA Coverage is an indicator for whether respondent i's county j was subject to the preclearance provision under the VRA in year t; Black indicates whether the survey respondent identified as Black or African American; X is a matrix of potential confounding variables that may also be associated with trust in government, which are described below, and with the corresponding coefficient estimates contained in Ω; D_t indicates the year of survey administration to account for any year-specific differences in government trust; and β_0 and ε_{ijt} are constant and error terms, respectively. All data in the analyses are weighted to national population parameters and standard errors are clustered on county-years. Equation (1) allows us to evaluate our main hypothesis in two complementary ways. Primarily, a positive coefficient for the interaction term (β_3) between VRA Coverage and Black would indicate that the preclearance provision of the VRA contributed to higher levels of trust in government among Black respondents. This specification also allows us to conduct a placebo test by comparing the effects of VRA Coverage between Black and white respondents. Given that the preclearance provision was targeted specifically to communities that had historically low levels of registration and turnout among people of color, we do not expect an association between preclearance and attitudes toward government among white respondents. Thus, the coefficient estimate (β_1) for VRA Coverage should be close to zero, which would indicate that the preclearance provision had little effect among white respondents, while positive and statistically significant results for the interaction term would provide evidence that preclearance affected attitudes primarily among Blacks. However, our conclusions remain unchanged if our models include only respondents who identify as Black or African American (see the Supplementary Materials). We consider two sources of potential confounders: individual-level characteristics that may be associated with trust in government (e.g., education level, an indicator for female respondents, and age), and county-level covariates that were likely correlated with whether a particular county was subject to preclearance (two variables based on the 1964 presidential election: Democratic vote share13 and turnout14).
We also accounted for the percentage of the county nonwhite and urban populations.15 Each measurement was taken prior to passage of the 1965 VRA. Finally, we include an indicator for counties located in the South as defined by the Census.16 We point out that the model excludes several individual and contextual characteristics that are identified by previous literature as potential predictors of trust, including partisanship, whether an individual reported being mobilized, level of political interest, and whether Black respondents are represented by a coracial legislator. Each of these characteristics was likely affected by the VRA, and including these variables in our models could introduce unknown degrees of post-treatment bias (though the results with these characteristics are reported in the Supplementary Materials).

Results

Table 1 displays estimates of the relationship between federal preclearance and trust in government. The first column reports coefficients from a simple model in which Trust in government is regressed on VRA Coverage, Black, its interaction, and the year-specific indicators. The model reported in column (2) accounts for education, sex, and age, and column (3) reports results when the county-level variables are included.

Table 1. Logistic regressions predicting trust in government

                                       Model 1            Model 2            Model 3
                                       coef. (s.e.)       coef. (s.e.)       coef. (s.e.)
Intercept                              0.22* (0.10)       0.17 (0.09)        0.11 (0.09)
Black                                  -0.59* (0.07)      -0.58* (0.07)      -0.57* (0.07)
VRA coverage                           0.02 (0.05)        0.01 (0.05)        0.02 (0.06)
Black x VRA coverage                   0.48* (0.12)       0.51* (0.12)       0.52* (0.12)
Education                                                 0.07* (0.02)       0.07* (0.02)
Female                                                    -0.02 (0.03)       -0.02 (0.03)
Age                                                       -0.02* (0.01)      -0.02* (0.01)
Democratic vote share, 1964                                                  0.13 (0.17)
Democratic vote share, 1964 (squared)                                        0.11 (0.45)
Voter turnout, 1964                                                          0.11 (0.25)
Nonwhite population, 1960                                                    -0.46 (0.24)
Urban population, 1960                                                       0.00 (0.06)
South                                                                        0.15* (0.05)
N                                      21,777             21,777             21,777
Clusters                               1,477              1,477              1,477
Log-likelihood                         -14,184.89         -14,168.45         -14,161.04
χ2                                     673.31*            692.87*            724.13*
Change in Pr(y_{ijt} = 1)              0.10* (0.02)       0.11* (0.02)       0.11* (0.03)

Source.—American National Election Studies, 1972–1998. Entries are logistic regression coefficients with standard errors in parentheses, clustered on county-year. The dependent variable is measured by responses to the question "How much of the time do you think you can trust the government in Washington to do what is right?," where responses of "most of the time" and "just about always" are coded 1 and other responses are coded 0. Indicators for year are also included but not reported. *p < .05 (two-tailed tests).

The results are quite consistent across each model. The coefficients for VRA Coverage are very small in magnitude and not statistically distinguishable from zero, providing no evidence of an association between preclearance and trust in government among whites. The coefficients for Black are negative and statistically significant, indicating that Black respondents who lived in areas not subject to preclearance reported significantly lower levels of trust relative to whites. More importantly, the coefficients for the interaction between VRA Coverage and Black were consistently positive and statistically significant, indicating that federal preclearance significantly increased trust in government among Black respondents. The magnitudes of the coefficients for the interaction term, moreover, are nearly identical to those for Black. This finding indicates that while there was a significant Black-white gap in trust in jurisdictions not covered by preclearance, there was no such gap in preclearance counties. The bottom panel of Table 1 reports the substantive magnitudes of VRA Coverage on trust. The row labeled Change in Pr(y_{ijt} = 1) displays the increase in the predicted probability of reporting trust in government among Black respondents living in counties subject to preclearance.
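To show how estimates like these could be produced in practice, the sketch below fits a weighted logit corresponding to Equation (1) with county-year clustered standard errors and then computes a change in predicted probability of the kind reported in the bottom row of Table 1. It is a minimal illustration under assumed variable names (the data frame anes and columns such as county_year, dem_share_1964, and weight are hypothetical), not the authors' replication code, and the authors' exact procedure for holding covariates constant may differ.

library(sandwich)
library(lmtest)

# Weighted logit with the Black x VRA coverage interaction and year indicators.
# (Survey weights trigger a harmless non-integer-successes warning here.)
fit <- glm(trust_binary ~ vra_coverage * black + education + female + age +
             dem_share_1964 + I(dem_share_1964^2) + turnout_1964 +
             nonwhite_1960 + urban_1960 + south + factor(year),
           data = anes, family = binomial(), weights = weight)

# Coefficients with standard errors clustered on county-year.
coeftest(fit, vcov. = vcovCL(fit, cluster = ~ county_year))

# Change in predicted probability among Black respondents: set black = 1 and
# toggle the preclearance indicator, averaging over the observed covariates.
d1 <- d0 <- transform(anes, black = 1)
d1$vra_coverage <- 1
d0$vra_coverage <- 0
mean(predict(fit, newdata = d1, type = "response") -
       predict(fit, newdata = d0, type = "response"), na.rm = TRUE)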
These estimates were generated by holding constant the values of all other covariates and comparing the predicted probability of reporting trust in government among respondents who were and were not living in areas covered by the preclearance provision. The entries in parentheses report the standard errors associated with these increased probabilities. Across the three models, VRA Coverage is associated with a significant increase—between 10 and 11 percentage points—in the probability of trusting government among Black constituents. We find some evidence of an association between trust in government and our control variables. Respondents with higher levels of education reported greater trust, while trust decreased with age. Women were less likely to report trust in government, but this relationship is not statistically significant. Respondents living in counties with higher nonwhite populations in 1960 were less likely to report trust in government, while individuals in the South reported higher trust in government. The coefficients are not statistically significant for 1964 Democratic presidential vote share or its quadratic, 1964 voter turnout, or urban percentage of the 1960 population. These results are robust across a range of additional supplementary analyses. While jurisdictions subject to federal preclearance were distributed across the nation, they were disproportionately concentrated in the South. Thus, we reestimated our models limiting our analyses to respondents living in Southern states. As mentioned above, we also estimated models that accounted for other potential confounding variables that were likely also affected by the VRA, including mobilization, party identification, political interest, and representation by a Black legislator. In addition, models were estimated that included as covariates income (which is missing for several thousand respondents and thus omitted from the models above) and respondents' perceptions of racial discrimination (measures that are available for only four of the ANES years under study17). We also estimated models that allowed the coefficients for the individual-level covariates to vary across Black and white respondents. Moreover, rather than including our county-level variables as controls, we used them as instruments for federal preclearance in an instrumental variables framework. Finally, we used genetic matching (Diamond and Sekhon 2013) to address concerns about common support and identified counties that were not subject to preclearance but were otherwise similar to counties that were subject to preclearance on the basis of the county-level variables described above and whether counties were located in the South. After preprocessing the data, we reestimated the models shown above. Across all these additional analyses, the evidence supports our results in Table 1.18 In addition, though a full examination is beyond the scope of this paper, we note that our argument does not apply solely to Black Americans. Indeed, the VRA extended protections to other historically marginalized groups, including language minorities. The Supplementary Materials show that these results extend to nonwhites in general and Latino/as in particular. We are reluctant to overinterpret these results because the samples of Latino/as are quite small, particularly prior to the mid-1980s.
However, this preliminary analysis provides evidence that the VRA increased trust in government among members of other protected groups living in jurisdictions subject to the preclearance provision. Evaluations of Government Responsiveness We explore the robustness of our results above by examining the relationship between VRA preclearance and several other system-level evaluations of government. Specifically, if preclearance helped secure perceptions of political inclusion, the guarantee of voting rights should have feedback effects on how individuals perceive American elections. We study how respondents evaluate the relationship between elections and government responsiveness, and expect that Black respondents living in preclearance jurisdictions would have more positive assessments of the capacity for elections to produce political change, as civil rights leaders hoped. We evaluate three additional indicators found in the ANES. The first, Government, is measured using respondents’ answers to the question “How much attention do you feel the government pays to what the people think when it decides what to do?” The second indicator, Elections, links elections to government performance by gauging respondents’ answers to the question “How much do you feel that having elections makes the government pay attention to what the people think?” The third indicator, Legislators, reports responses to the question “How much attention do you think most Congressmen pay to the people who elect them when they decide what to do in Congress?”19 Each question addresses some dimension of respondents’ assessments that the country’s electoral system helps provide effective representation. Responses to these questions were used to create the dependent variables, which were coded 1 if respondents answered “a good deal” and 0 if respondents answered “some” or “not much.”20 Logit models were estimated with these dependent variables and the same independent variables used in the full model in table 1. As shown in table 2, the patterns are consistent with the results in table 1. Across the three dependent variables, the preclearance provision significantly increased perceptions of responsiveness among Black respondents. As the bottom row of the table indicates, the magnitude of these relationships was between seven and 12 percentage points. Again, no systematic evidence exists that preclearance was associated with heightened perceptions of responsiveness among whites. Interestingly, the results do indicate that Black respondents living in preclearance areas had substantially more positive assessments of electoral institutions than white respondents, whether or not they lived in preclearance jurisdictions. Table 2. Logistic regressions predicting evaluations of government responsiveness Government Elections Legislators coef. (s.e.) coef. (s.e.) coef. (s.e.) 
Table 2. Logistic regressions predicting evaluations of government responsiveness

                                        Government       Elections        Legislators
                                        coef. (s.e.)     coef. (s.e.)     coef. (s.e.)
Intercept                               -2.43* (0.13)    -0.41* (0.10)    -1.78* (0.16)
Black                                    0.03  (0.10)    -0.06  (0.08)    -0.17  (0.13)
VRA coverage                             0.10  (0.09)     0.08  (0.08)     0.07  (0.10)
Black x VRA coverage                     0.51* (0.15)     0.40* (0.13)     0.38# (0.23)
Education                                0.23* (0.03)     0.19* (0.02)     0.25* (0.03)
Female                                  -0.30* (0.05)    -0.11* (0.03)    -0.18* (0.06)
Age                                      0.09* (0.01)     0.06* (0.01)     0.06* (0.02)
Democratic vote share, 1964              0.00  (0.26)     0.15  (0.21)     0.48  (0.37)
Democratic vote share, 1964 (squared)    0.02  (0.65)     0.44  (0.53)     0.59  (0.83)
Voter turnout, 1964                      0.16  (0.37)     0.19  (0.30)     0.24  (0.39)
Nonwhite population, 1960               -0.31  (0.30)    -0.36  (0.30)    -0.31  (0.39)
Urban population, 1960                   0.21* (0.08)     0.07  (0.06)     0.01  (0.11)
South                                    0.18* (0.08)     0.03  (0.07)     0.14  (0.09)
N                                       16,838           15,679           7,952
Clusters                                 1,134            1,035             486
Log-likelihood                          -6,960.32       -11,520.30       -4,617.52
χ2                                       271.70*          292.47*          138.65*
Change in Pr(yijt = 1)                   0.08* (0.02)     0.12* (0.03)     0.07# (0.04)

Source.—American National Election Studies, 1972–1998. Entries are logistic regression coefficients with standard errors in parentheses, clustered on county-year. The dependent variables are coded 1 for responses of “a good deal” and 0 for responses of “some” or “not much” to the questions described in the text. Indicators for year are also included but not reported. #p < .10; *p < .05 (two-tailed tests).

The results shown in table 2 suggest that the VRA helped create more positive evaluations of government among Black citizens by guaranteeing the right to vote in places where such guarantees had been largely absent. The results further implicate the role of elections specifically as a mechanism through which citizens can influence government. By ensuring ballot access, the VRA generated more favorable perceptions of democratic legitimacy among traditionally marginalized groups.

Perceptions of Political Inclusiveness

As noted earlier, federal preclearance may have increased citizens’ feelings of political inclusion by guaranteeing voting access. We study this potential mechanism using responses to the question “Would you say the government is pretty much run by a few big interests looking out for themselves (coded 0) or that it is run for the benefit of all the people (coded 1)?” Though the question wording may not be ideal for testing perceptions of political inclusion, it does tap into respondents’ perceptions of whether government serves the mass public, including themselves. If the VRA helped create the perception of increased political inclusion, Black respondents living in areas subject to preclearance should be more likely than other Black respondents to believe that government is run for the benefit of all people.

The results are shown below in table 3. The coefficient for the interaction between VRA Coverage and Black is again positive and statistically significant, indicating that Black respondents living in areas subject to federal preclearance were more likely to report feeling that government is responsive to all people compared with Black respondents living in areas that were not covered by preclearance. The constituent term for VRA Coverage is not statistically significant and provides no evidence of a similar relationship among whites. The bottom panel of the table indicates that Black respondents in preclearance areas were about 11 percentage points more likely to report that government was run for the benefit of all the people rather than just a few big interests.
Table 3. Logistic regressions predicting perceptions of political inclusion

                                        Government run for benefit of all people
                                        coef. (s.e.)
Intercept                               -0.28* (0.10)
Black                                   -0.24* (0.08)
VRA coverage                             0.07  (0.07)
Black x VRA coverage                     0.45* (0.18)
Education                                0.05* (0.02)
Female                                   0.01  (0.03)
Age                                     -0.06* (0.01)
Democratic vote share, 1964              0.22  (0.20)
Democratic vote share, 1964 (squared)    0.41  (0.56)
Voter turnout, 1964                     -0.21  (0.30)
Nonwhite population, 1960               -0.58* (0.28)
Urban population, 1960                   0.15* (0.07)
South                                    0.17* (0.06)
N                                       19,075
Clusters                                 1,353
Log-likelihood                         -11,880.36
χ2                                       416.08*
Change in Pr(yijt = 1)                   0.11* (0.03)

Source.—American National Election Studies, 1972–1998. Entries are logistic regression coefficients with standard errors in parentheses, clustered on county-year. The dependent variable is measured by responses to the question “Would you say the government is pretty much run by a few big interests looking out for themselves (0) or that it is run for the benefit of all the people (1)?” Indicators for year are also included but not reported. *p < .05 (two-tailed tests).

Figure 2 summarizes the results from tables 1 through 3, and contrasts the relationship between preclearance and attitudes toward government for Black and white respondents.
The plotted points display the predicted increase in the probability of providing a positive response to each of the dependent variables shown on the y-axis as a function of preclearance. The x-axis displays these increased probabilities in percentage points, where positive numbers indicate that preclearance is associated with more favorable attitudes toward government. Black respondents are shown with the darker circles, and white respondents are shown with gray circles. The horizontal lines are the 95 percent confidence intervals associated with the estimates. The dashed vertical line at zero indicates where these points would fall under the null hypothesis of no association between federal preclearance and attitudes toward government.

Figure 2. Federal preclearance and attitudes toward government among Blacks and whites, 1972–1998. The x-axis shows the increased probability (in percentage points) of providing a favorable attitude toward government for each dependent variable among respondents living in areas subject to federal preclearance under the Voting Rights Act. Positive numbers indicate that preclearance is associated with more favorable attitudes, and negative numbers indicate that preclearance is associated with less favorable attitudes. The plotted points are the predicted increases in the probability of providing positive evaluations, and the horizontal lines are the 95 percent confidence intervals associated with these estimates.

Across all five dependent variables, the predicted difference in attitudes among Black respondents is positive, and both statistically and substantively larger than the predicted change in attitudes among white respondents. Moreover, none of the predicted differences in attitudes among white respondents is statistically distinguishable from zero. Thus, while there is considerable evidence that Black political attitudes were responsive to the efforts of Congress and the federal government to secure voting rights, no evidence exists of such an association among whites.

In combination, the results presented here indicate that the provisions of the VRA significantly increased perceptions of democratic legitimacy among Black citizens. Consistent with our argument, Black respondents who lived in areas with the strongest voting rights protections were significantly more likely to report positive evaluations of government and electoral institutions compared with Blacks living in jurisdictions without preclearance. Further, there is no association between preclearance and government attitudes among whites.
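For readers who want to reproduce a figure-2-style display, a minimal sketch follows. It uses only the changes in Pr(y = 1) and standard errors reported for Black respondents in tables 2 and 3 above, with normal-approximation intervals (estimate plus or minus 1.96 standard errors); the published figure additionally plots the trust-in-government estimate and the corresponding estimates for white respondents, which are not reproduced here, and the outcome labels are shorthand, not the article’s.

```python
# Simplified, illustrative version of a figure-2-style plot; estimates and standard
# errors are the Black-respondent "Change in Pr(y = 1)" rows from tables 2 and 3.
import matplotlib.pyplot as plt
import numpy as np

outcomes = ["Gov't pays attention", "Elections matter",
            "Legislators pay attention", "Run for benefit of all"]
est = np.array([0.08, 0.12, 0.07, 0.11])   # changes in predicted probability
se = np.array([0.02, 0.03, 0.04, 0.03])    # reported standard errors

y = np.arange(len(outcomes))
fig, ax = plt.subplots(figsize=(6, 3))
ax.errorbar(est, y, xerr=1.96 * se, fmt="o", color="black", capsize=3)
ax.axvline(0, linestyle="--", color="gray")  # null of no association
ax.set_yticks(y)
ax.set_yticklabels(outcomes)
ax.set_xlabel("Change in predicted probability (Black respondents, preclearance)")
fig.tight_layout()
plt.show()
```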
Conclusion

Scholarly accounts of Black attitudes toward government in the 1950s, 1960s, and 1970s routinely emphasized elevated levels of distrust, cynicism, and alienation (see Walton 1985). By all accounts, the struggle for civil rights achieved a resounding victory with the Voting Rights Act of 1965, which guaranteed ballot access to all citizens and put into place a variety of protections to ensure that those rights were not infringed. Our findings build upon research on the effects of the VRA on Black voter registration, turnout, and descriptive representation, and suggest that the guarantee of political inclusion helped reduce Black political alienation and improved affective evaluations of government.

Our findings complement other research on the attitudinal consequences of other provisions of the VRA. Increases in the number of Black elected officials over the past half-century, which stemmed in part from the creation of new majority-minority districts pursuant to voting rights jurisprudence, were widely posited to strengthen the linkages between Black citizens and government. However, the findings in the literature are decidedly mixed (e.g., Gay 2002; Tate 2003; Scherer and Curry 2010). Our research suggests that while descriptive representation and other consequences of the VRA may well have important implications for normatively desirable attitudinal outcomes, the security of voting rights itself plays an important role in shaping Black citizens’ orientations toward and evaluations of democratic institutions. Consistent with other research on the importance of procedural fairness (e.g., Tyler, Casper, and Fisher 1989), attitudes toward government depend not only on who is elected but also on whether individuals feel their right to participate in the selection process is guaranteed.

By design, however, the findings presented here have important limitations. Though we estimated a wide range of statistical models and conducted numerous empirical checks and placebo tests, these data do not permit us to conclusively identify a causal relationship between the preclearance provision of the VRA and political attitudes. Moreover, the focus of this study is limited primarily to Black Americans due to the relatively small samples of Asian Americans, Latino/as, and other groups in the ANES. Neither limitation, however, is cause for dismissing the importance of the findings. Given the central role the VRA has played in American politics for a half-century, further research is needed to understand the effects its specific provisions have had on the political inclusion of historically marginalized groups.

Importantly, though, our research design does not allow us to distinguish the effects of the passage of new policy from those generated by its implementation. It is therefore unclear whether the enactment of new voter ID laws (for instance) would be sufficient to reduce government evaluations or whether such relationships would not be observed until these laws produced material changes in the political representation of Black communities. Consequently, changes to election laws may not produce immediate and dramatic effects on public opinion; their effects may instead cumulate over time. Additional research is needed to understand how policy feedback effects are produced in these contexts and to identify their implications for policymakers.

Supplementary Data

Supplementary data are freely available at Public Opinion Quarterly online.

The authors thank the Office of Undergraduate Research and the Department of Political Science at Washington University in St. Louis for generous research support.
Chris Elmendorf, Jim Gibson, Andrew Reeves, Maya Sen, and Doug Spencer provided helpful comments. This is one in a series of papers by the authors, and the ordering of authors’ names reflects the principle of rotation. An earlier version of this paper was presented at the 2016 Annual Meeting of the Midwest Political Science Association.

Appendix

Data

The data used in this study are taken from the American National Election Studies Time Series Cumulative File, 1948–2012 (The American National Election Studies 2010). The surveys were conducted by the University of Michigan and were supported by the National Science Foundation (SBR-9707741, SBR-9317631, SES-9209410, SES-9009379, SES-8808361, SES-8341310, SES-8207580, and SOC77-08885). The study population is the American electorate, and eligible voters are the sample universe. The ANES uses a probability sample of adult citizens, which is designed to be representative of the target population (US adults 18 years and older). The data used in this research were collected via face-to-face and telephone interviews and mail surveys. The sample sizes and AAPOR RR1 response rates for each survey wave are below (American Association for Public Opinion Research 2016):

Year    N      RR1 (%)
1972    2705   75.0
1974    1575   70.0
1976    2248   70.4
1978    2304   68.9
1980    1614   71.8
1982    1418   72.3
1984    2257   72.1
1986    2176   67.7
1988    2040   70.5
1990    1980   70.6
1992    1126   74.0
1994    1036   72.1
1996     398   59.8
1998    1281   63.8

Question Wording

DEPENDENT VARIABLES

How much of the time do you think you can trust the government in Washington to do what is right—just about always, most of the time, only some of the time? (None of the time/never; Some of the time; Most of the time; Just about always)

Over the years, how much attention do you feel the government pays to what the people think when it decides what to do—a good deal, some, or not much? (Not much; Some; A good deal)

How much do you feel that having elections makes the government pay attention to what the people think—a good deal, some, or not much? (Not much; Some; A good deal)

How much attention do you think most Congressmen pay to the people who elect them when they decide what to do in Congress—a good deal, some, or not much? (Not much; Some; A good deal)

Would you say the government is pretty much run by a few big interests looking out for themselves or that it is run for the benefit of all the people?
(Few big interests; Benefit of all)

INDEPENDENT VARIABLES

Female—coded from respondent’s sex (male = 0; female = 1)
Black—coded from respondent’s race (not black = 0; black = 1)
Education—coded from respondent’s educational attainment (no high school degree = 1; high school degree, no college = 2; some college = 3; four-year college degree or more = 4)
Age—coded from respondent’s reported birth year

Footnotes

1. The historical legacies of slavery have also shaped political attitudes among whites (Acharya, Blackwell, and Sen 2016).
2. Citizens may also use motivated reasoning and related processes to evaluate government and political figures. For instance, citizens who support their political officials may view those officials as responsive to their political interests and values (e.g., Lenz 2012). Our analysis cannot rule out this possibility. However, it is relatively uncontroversial to say that, in the context of our analysis of the VRA, attitudes toward government among historically marginalized groups had been structured by generations of institutional and political inequalities rather than by those groups’ decisions to oppose government and hold more negative attitudes as a result.
3. Subsequent revisions to the VRA expanded these dates to include the 1968 and 1972 presidential elections.
4. The remainder of the paper focuses mostly on the effect of the VRA on Blacks, although our argument also applies to other historically marginalized groups whose political representation may have also been affected by the VRA.
5. This is not to say that descriptive representation is not an important political goal. As Tate (2003) shows, Blacks prefer increased minority representation to color-blind congressional districting.
6. For instance, see Morrison’s (1987) account of an aldermanic primary election in Mississippi in 1969.
7. Replicating our analyses using congressional districts rather than counties as the relevant geographic units for assigning preclearance coverage generates identical patterns of findings to those reported here.
8. As one might expect, the vast majority (90 percent) of Black respondents in our sample from states subject to preclearance lived in Southern states. The remaining 10 percent were predominantly from New York. Sample sizes of Black respondents in states with preclearance requirements are as follows: Alabama (189), Arizona (8), Connecticut (3 in preclearance areas; 33 not in preclearance areas), Georgia (340), Louisiana (84), Mississippi (14), North Carolina (27 in preclearance areas; 98 not in preclearance areas), New York (82 in preclearance areas; 128 not in preclearance areas), South Carolina (15), Texas (152 under preclearance; 16 not subject to preclearance), and Virginia (60).
9. We relied primarily on Hancock and Tredway (1985) to identify covered jurisdictions, supplemented with information available on the Department of Justice website: http://www.justice.gov/crt/about/vot/misc/sec_4.php and http://www.justice.gov/crt/about/vot/sec_5/covered.php (accessed July 9, 2014).
10. For instance, the states of Alaska, Arizona, and Texas were subject to preclearance beginning in 1975 due to the bail-in provision.
11. Between 1984 and 1998, dozens of jurisdictions successfully bailed out of preclearance coverage. For a list, see https://www.justice.gov/crt/section-4-voting-rights-act (accessed April 19, 2016).
12. Our substantive findings are robust to estimating ordered logistic regression and multinomial logistic regression using the original response options.
For ease of interpretation, however, we present results using logistic regression.
13. We also include its quadratic to account for the possibility that assignment to preclearance decreased for some values of Democratic vote share before increasing once the values of this variable pass a certain threshold. This could indicate, for instance, that while preclearance was generally less likely to be assigned to increasingly Democratic constituencies, this relationship could reverse among counties with extremely high levels of support for Democrats—such as those in the South.
14. These data come from ICPSR study #8611, “Electoral Data for Counties in the United States: Presidential and Congressional Races, 1840–1972.”
15. These data come from ICPSR study #2896, “Historical, Demographic, Economic, and Social Data: The United States, 1790–2002.”
16. These states include AL, AR, DE, DC, FL, GA, KY, LA, MD, MS, NC, OK, SC, TN, TX, VA, and WV.
17. We measured perceptions of racial discrimination with responses to the question “Generations of slavery and discrimination have created conditions that make it difficult for blacks to work their way out of the lower class.” Both this question and our measure of trust were asked of respondents in 1988, 1990, 1992, and 1994.
18. These tables are shown in tables A.1 through A.5 in the Supplementary Materials. The interaction term in the matching analysis is positive and relatively large in magnitude but falls short of statistical significance, possibly due to the substantial decrease in sample size and statistical power.
19. While not all of these questions explicitly reference the federal government, respondents were likely primed to answer these questions with the federal government in mind because they were administered after respondents received the following prompt: “People have different ideas about the government in Washington. These ideas don’t refer to Democrats or Republicans in particular, but just to the government in general.”
20. Similar results were obtained when coding both “a good deal” and “some” as 1 and “not much” as 0, and when estimating linear, multinomial, and ordinal models. For consistency with the other analyses, the binary dependent variable was used.

References

Aberbach, Joel D., and Jack L. Walker. 1970. “Political Trust and Racial Ideology.” American Political Science Review 64:1199–1219.
Abramson, Paul R. 1983. Political Attitudes in America: Formation and Change. San Francisco: W. H. Freeman and Company.
Acharya, Avidit, Matthew Blackwell, and Maya Sen. 2016. “The Political Legacy of American Slavery.” Journal of Politics 78:621–41.
American Association for Public Opinion Research. 2016. Standard Definitions: Final Dispositions of Case Codes and Outcome Rates for Surveys, 9th ed.
Ansolabehere, Stephen, and Nathaniel Persily. 2008. “Vote Fraud in the Eye of the Beholder: The Role of Public Opinion in the Challenge to Voter Identification Requirements.” Harvard Law Review 121:1737–74.
Banducci, Susan A., Todd Donovan, and Jeffrey A. Karp. 1999. “Proportional Representation and Attitudes about Politics: Results from New Zealand.” Electoral Studies 18:533–55.
Campbell, Andrea L. 2003. How Policies Make Citizens: Senior Political Activism and the American Welfare State. Princeton, NJ: Princeton University Press.
Citrin, Jack, Herbert McClosky, J. Merrill Shanks, and Paul M. Sniderman. 1975. “Personal and Political Sources of Political Alienation.” British Journal of Political Science 5:1–31.
Davidson, Chandler. 1992. “The Voting Rights Act: A Brief History.” In Controversies in Minority Voting: The Voting Rights Act in Perspective, edited by B. Grofman and C. Davidson, pp. 7–34. Washington, DC: Brookings Institution Press.
Dawson, Michael C. 1994. Behind the Mule: Race and Class in African-American Politics. Princeton, NJ: Princeton University Press.
Diamond, Alexis, and Jasjeet S. Sekhon. 2013. “Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies.” Review of Economics and Statistics 95:932–45.
Easton, David. 1975. “A Re-Assessment of the Concept of Political Support.” British Journal of Political Science 5:435–57.
Erikson, Robert S., and Laura Stoker. 2011. “Caught in the Draft: The Effects of Vietnam Draft Lottery Status on Political Attitudes.” American Political Science Review 105:221–37.
Feldman, Stanley. 1988. “Structure and Consistency in Public Opinion: The Role of Core Beliefs and Values.” American Journal of Political Science 32:416–40.
Fraga, Bernard L. 2016. “Redistricting and the Causal Impact of Race on Voter Turnout.” Journal of Politics 78:19–34.
Garrow, David J. 1978. Protest at Selma: Martin Luther King, Jr., and the Voting Rights Act of 1965. New Haven, CT: Yale University Press.
Gay, Claudine. 2002. “Spirals of Trust? The Effect of Descriptive Representation on the Relationship between Citizens and Their Government.” American Journal of Political Science 46:717–32.
Grofman, Bernard, and Lisa Handley. 1991. “The Impact of the Voting Rights Act on Black Representation in Southern State Legislatures.” Legislative Studies Quarterly 16:111–28.
Hancock, Paul F., and Lora L. Tredway. 1985. “The Bailout Standard of the Voting Rights Act: An Incentive to End Discrimination.” Urban Lawyer 17:379–425.
Hetherington, Marc J. 1998. “The Political Relevance of Political Trust.” American Political Science Review 92:791–808.
Issacharoff, Samuel. 2013. “Beyond the Discrimination Model on Voting.” Harvard Law Review 127:95–126.
Jones-Correa, Michael. 2005. “Language Provisions Under the Voting Rights Act: How Effective Are They?” Social Science Quarterly 85:549–64.
Lenz, Gabriel S. 2012. Follow the Leader? Chicago: University of Chicago Press.
Levi, Margaret. 1998. “A State of Trust.” In Trust and Governance, edited by V. Braithwaite and M. Levi, pp. 77–101. New York: Russell Sage Foundation.
Lien, Pei-te, Dianne M. Pinderhughes, Carol Hardy-Fanta, and Christine M. Sierra. 2007. “The Voting Rights Act and the Election of Nonwhite Officials.” PS: Political Science & Politics 40:489–94.
Lublin, David. 1997. The Paradox of Representation: Racial Gerrymandering and Minority Influence in Congress. Princeton, NJ: Princeton University Press.
MacCoon, John P. 1979. “The Enforcement of the Preclearance Requirement of Section 5 of the Voting Rights Act of 1965.” Catholic University Law Review 29:107–28.
Marschall, Melissa, and Paru R. Shah. 2007. “The Attitudinal Effects of Minority Incorporation: Examining the Racial Dimensions of Trust in Urban America.” Urban Affairs Review 42:629–58.
McCann, James A., and Jorge I. Dominguez. 1998. “Mexicans React to Electoral Fraud and Political Corruption: An Assessment of Public Opinion and Voting Behavior.” Electoral Studies 17:483–503.
Miller, Arthur H., and Ola Listhaug. 1990. “Political Parties and Confidence in Government: A Comparison of Norway, Sweden and the United States.” British Journal of Political Science 20:357–89.
Mishler, William, and Richard Rose. 1997. “Trust, Distrust and Skepticism: Popular Evaluations of Civil and Political Institutions in Post-Communist Societies.” Journal of Politics 59:418–51.
Mishler, William, and Richard Rose. 2001. “What Are the Origins of Political Trust? Testing Institutional and Cultural Theories in Post-Communist Societies.” Comparative Political Studies 34:30–62.
Morrison, Minion K. C. 1987. Black Political Mobilization, Leadership, Power and Mass Behavior. Albany: State University of New York Press.
Motomura, Hiroshi. 1983. “Preclearance Under Section 5 of the Voting Rights Act.” North Carolina Law Review 61:189–246.
Rahn, Wendy M., and Thomas J. Rudolph. 2005. “A Tale of Political Trust in American Cities.” Public Opinion Quarterly 69:530–60.
Scherer, Nancy, and Brett Curry. 2010. “Does Descriptive Race Representation Enhance Institutional Legitimacy? The Case of the U.S. Courts.” Journal of Politics 72:90–104.
Schuit, Sophie, and Jon C. Rogowski. 2017. “Race, Representation, and the Voting Rights Act.” American Journal of Political Science 61:513–26.
Soss, Joe. 1999. “Lessons of Welfare: Policy Design, Political Learning, and Political Action.” American Political Science Review 93:363–80.
Stokes, Donald E. 1962. “Popular Evaluations of Government: An Empirical Assessment.” In Ethics and Bigness: Scientific, Academic, Religious, Political, and Military, edited by H. Cleveland and H. D. Lasswell, pp. 61–72. New York: Harper and Brothers.
Tate, Katherine. 1993. From Protest to Politics: The New Black Voters in American Elections. Cambridge, MA: Harvard University Press.
Tate, Katherine. 2003. Black Faces in the Mirror: African Americans and Their Representatives in the U.S. Congress. Princeton, NJ: Princeton University Press.
The American National Election Studies. 2010. Time Series Cumulative Data File. Stanford University and the University of Michigan. www.electionstudies.org.
Tyler, Tom R. 1990. Why People Obey the Law: Procedural Justice, Legitimacy, and Compliance. New Haven, CT: Yale University Press.
Tyler, Tom R., Jonathan D. Casper, and Bonnie Fisher. 1989. “Maintaining Allegiance toward Political Authorities: The Role of Prior Attitudes and the Use of Fair Procedures.” American Journal of Political Science 33:629–52.
Walton, Hanes, Jr. 1985. Invisible Politics: Black Political Behavior. Albany: State University of New York Press.
Weaver, Vesla, and Amy Lerman. 2010. “Political Consequences of the Carceral State.” American Political Science Review 104:817–33.
Whitby, Kenny J. 2000. The Color of Representation: Congressional Behavior and Black Interests. Ann Arbor: University of Michigan Press.
Whitby, Kenny J., and Franklin D. Gilliam. 1991. “A Longitudinal Analysis of Competing Explanations for the Transformation of Southern Congressional Politics.” Journal of Politics 53:504–18.
Manuscript Referees, 2017
2018 Public Opinion Quarterly
doi: 10.1093/poq/nfy021
The editors wish to express their appreciation to the following reviewers, who have generously contributed their time and skills to Public Opinion Quarterly by refereeing manuscripts from January 1, 2017 through December 31, 2017. Alan Abramowitz Hovannes Abramyan Eulàlia Puig Abril Feray Adigüzel Douglas Ahler John Ahlquist Bethany Albertson Scott Althaus Sowmya Anand Ioannis Andreadis Christopher Antoun Sarah Anzia Kevin Arceneaux Allison Archer Sveinung Arnesen Lonna Atkeson Kay Axhausen Anna Bagnoli Bart N. Bakker Maria Teresa Balaguer-Coll Alexa Bankert Michael Barber David Barker John Bartle Nichole Bauer Paul C. Bauer Frank Baumgartner Jody Baumgartner René Bautista Paul Beatty Emily Beaulieu Arthur Beckman Robert F. Belli Lindsay J. Benstead Matt Berent Rosa Berganza Matthew Bergbower Adam J. Berinsky Michael B. Berkman David Berrigan Jelke Bethlehem Shaun Bevan Matthew Blackwell Edward Blair André Blais Scott Blinder Stephen Blumberg Tamas Bodor Frederick Boehmke Toby Bolsen Brittany Bond Cheryl Boudreau Florence Bouvet John Boyle Philip S. Brenner Paul R. Brewer J. Michael Brick David Broockman Charles Bullock John Bullock Barry C. Burden Camille D. Burge Diana Burlacu Craig M. Burnett Mario Callegaro Andrea Campbell W. Keith Campbell Marta Cantijoch Cunill Christine Carabain Ryan Carlin Barbara Lepidus Carlson Christopher Carman Jamie Carson Philip G. Chen Young Cho Erin Cikanek Christopher Claassen Ryan L. Claassen Scott Clifford Kevin Coe Jonathan Collins Luke Condra Frederick Conrad Jeff Conroy-Kutz Alex Coppock Mick P. Couper Stephen C. Craig Mathew Creighton Richard Curtin Marcel Das Lauren Davenport Darren Davis Christopher Dawes Anna DeCastellarnau Maggie Deichert Julie de Jong Michael X. Delli Carpini Scott de Marchi Ken Demarree Louis DeSipio Kristof Dhont Joseph DiGrazia Don A. Dillman Elias Dinas Peter Thisted Dinesen Tessa Ditonto David Doherty Kathleen Dolan Salima Douhou Conor Dowling Joerg Dreschler Dominik Duell Bernard Dugoni Delia Dumitrescu Johanna L. Dunaway Gabriele B. Durrant Shira Dvir-Gvirsman Joshua Dyck Jennifer Dykema Jennifer Edgar Patrick Egan Laurel Elder Christopher R. Ellis Sherry L. Emery Peter Enns Ryan Enos Derek Epp Robert S. Erikson Cengiz Erisen Lawrence Ezrow Mansour Fahimi C. Christine Fair Christopher M. Federico Lauren Feldman Stanley Feldman Alexandra Filindra Henning Finseraas Quiroz Flores Brian J. Fogarty Angela Fontes Floyd J. Fowler Jr. Anthony Fowler Annie Franco Michael Franz Kim Fridkin Marek Fuchs Dana Garbarski Francisco Garfias R. Kelly Garrett Joyce Gelb Andrew Gelman Benny Geys Elisabeth Gidengil Martin Gilens Steven Gittelman Sharad Goel Guy Golan Andreas Goldberg Seth Goldman Pablo Gracia Andreas Graefe Melanie Green Eric Greenleaf Beth Ann Griffin Eric Groenendyk Jacob Groshek Kimberly Gross Lauren Guggenheim Alexandra Guisinger Jonathan Haidt Michael Hanmer Laurel Harbridge Bruce Hardy Lauren Harris-Kojetin Todd Hartman Danny Hayes Thomas Hayes John Henderson Michael Henderson PJ Henry Daniel Herda Paul Herrnson Michael Herron Eitan Hersh Marc Hetherington Matthew V. Hibbing Benjamin Highton Seth Hill D. Sunshine Hillygus Jay D. Hmielowski Jennifer Hochschild Allyson L. Holbrook Thomas Holbrook Barry Hollander Gregory Holyk Marc Hooghe Daniel J. Hopkins Veronica Hoyo Greg Huber Connor Huff Ruth Igielnik Indridi Indridason Libby Jenke Will Jennings Jennifer Jerit Julia Jerke Stephen Jessee Martin Johnson Timothy Johnson Christopher Johnston Richard Johnston Kerem Kalkan Nathan Kalmoe Chester Kam Cindy D. 
Kam Arie Kapteyn Jeffrey Karp Chris Karpowitz Scott Keeter Paul Kellstedt Courtney Kennedy Kate Kenski Joshua Kertzer Florian Keusch Kabir Khanna Seihill Kim Ashley Kirzinger Samara Klar Jonathan Klingler Jeffrey W. Koch Thomas Koch Gregory Koger Tobias Konitzer Spyros Kosmidis Phillip Kott Sarah Kreps Dorothy Kronick Jonathan Kropko Yanna Krupnikov Patrick M. Kuhn Simon Kühne Richard A. Kulka Christopher Larimer Lasse Laustsen Howard Lavine Paul J. Lavrakas Jennifer Lawless Eric Lawrence Geoffrey Layman Sophie Lecheler Thomas J. Leeper Yphtach Lelkes Gabriel Lenz Brad Leveck Matthew Levendusky Adam S. Levine Peter Liberman Ulf Liebe Scott Lilienfeld Michael W. Link Phillip Lipscy Glory Liu Mingnan Liu Geert Loosveldt Mary Losch Peter Lugtig Arthur Lupia Noam Lupu Jeffrey Lyons Scott MacKenzie Kristen Malecki Ariel Malka Michele Margolis Thomas Marshall Paul Marx Seth E. Masket Lilliana Hall Mason Winter Mason Aigul Mavletova Katherine McCabe Colleen McClain Max McCombs Corrine McConnaughy Christopher McConnell Rose McDermott Seth C. McKee Patrick Meirick Marc Meredith Daniel M. Merkle Matto Mildenberger Patrick R. Miller Peter V. Miller Lorraine Minnite Zenia N. Mneimneh Cecilia Hyunjung Mo Jeffery Mondak Robert Montgomery Matt Motyl John Mueller Kevin Mullinix Jonathan Mummolo Kevin M. Munger Simon Munzert Joe Murphy Teresa Myers Alessandro Nai Rico Neuman W. Russell Neuman Anja Neundorf Benjamin J. Newman Brian Newman Simeon Nichter David Nickerson Jeff Niederdeppe Lilach Nir Eric Oliver Kristen Olson Heather Ondercin Jean Opsomer Julianna Pacheco Carl L. Palmer Costas Panagopoulos Philip Paolino Chris Parker Jennifer Parsons Joanne Pascale Josh Pasek Teun Pauwels Shanna Pearson-Merkowitz Rasmus Pedersen Steven Pedlow Mark Peffley Mikael Persson Erik Peterson Andraz Petrovcic Gregory Petrow Andy Peytchev Spencer Piston Lana Rakow Carolin Rapp Tim Reeskens Andrew Reeves Sharon Reif Jason Reifler Becky Reimer Steven Rigdon Adrian L. Rinscheid Louis Rizzo Joshua Robison Steven Rogers Margaret Roller Matthijs Rooduijn Femke Roosma Stella Rouse Thomas Rudolph Janet Ruscher John Ryan Timothy J. Ryan Kira Sanbonmatsu Michael W. Sances Elizabeth Saunders Kyle L. Saunders Ariela Schachter Nora Cate Schaeffer Brian Schaffner Michael Schober Barry Schouten Beth E. Schueler Norbert Schwarz Kathleen Searles Jacob Shapiro Robert Y. Shapiro Tamir Sheafer Geoffrey Sheagley Fei Shen John Sides Elizabeth Simas Peter M. Siminski Gabor Simonovits Betsy Sinclair Shane P. Singh Benjamin Skalland Linda J. Skitka Rune Slothuus Corwin D. Smidt Candis Watts Smith Eric R.A.N. Smith Glen Smith Gaurav Sood Kellie Stanfield Mathew Stange Jeff Stec LaFleur Stephens-Dougan Michael Stern Laura Stoker Dietlind Stolle Ineke Stoop Natalie Jomini Stroud Bella Struminskaya Patrick Sturgis Elizabeth Suhay Alexander Tahk Christopher Tausanovitch Margit Tavits Michael Tesler David Tewksbury Adam Thal Randall K. Thomas Danielle Thomsen Lotte Thomsen Judd R. Thornton Michael Ting Roger Tourangeau Michael Traugott Kris-Stella Trump Marc Trussler Chi-Lin Tsai Yariv Tsfati Patrick Tucker Rollin Tusalem Steve Vaisey Nicholas Valentino Steven Van Hauwaert Hester van Herk Stijn van Kessel David L. Vannette Timothy Vercellotti Nick Vivyan Michael Wagner Israel S. Waismel-Manor Stefaan Walgrave Wei Wang Christopher Warshaw Christopher L. Weaver Christopher Weber Justin Wedeking Rebecca Weitz-Shapiro Brady West Sean Westwood Brian D. Williams Christopher J. Williams Laron Williams Meredith Williams David C. 
Wilson Christopher Wlezien Jennifer Wolak Christina Wolbrecht Michael R. Wolf Felix Wolter Cara Wong Nicole Yadon JungHwan Yang David S. Yeager Elizabeth J. Zechmeister Chan Zhang Lawrence J. Zigerell Jr.
The Impact of Greeting Personalization on Prevalence Estimates in a Survey of Sexual Assault Victimization
2018 Public Opinion Quarterly
doi: 10.1093/poq/nfy019
Abstract

Although personalized invitations tend to increase response rates in web surveys, little is known about how personalization impacts data quality. To evaluate the impact of personalization on survey estimates of sensitive items, the effects of personalized and generic greetings in a survey (n = 9,673) on an extremely sensitive topic—sexual assault victimization—were experimentally compared. Personalization was found to have increased response rates with negligible impact on victimization reporting, and this impact was similar across most demographic groups. The findings suggest that future studies may benefit from the use of a personalized greeting when recruiting sample members to participate in a sensitive survey, but that further research is necessary to better understand how the impact of personalization on reporting may differ across some demographic groups.

Background

A common technique to increase survey response rates is to tailor messages or contact materials to individual sample members. This personalization can be done in any mode of contact using information known about sample members, such as their demographic characteristics or interests. In general, experimental (e.g., Muñoz-Leiva et al. 2010; Sauermann and Roach 2013) and meta-analytic research (e.g., Cook, Heath, and Thompson 2000; Edwards et al. 2009) suggests that personalization increases web-survey response rates (see Pearson and Levine [2003]; Porter and Whitcomb [2003]; Joinson, Woodley, and Reips [2007] for exceptions). For instance, a meta-analysis conducted by Edwards et al. (2009) indicates that the odds of response increase by about 25 percent with personalized contact materials. Consistent with this, numerous studies have found that name personalization in invitation emails increases response rates in web surveys of college students (e.g., Heerwegh et al. 2005; Heerwegh 2005; Heerwegh and Loosveldt 2006; Joinson and Reips 2007).

However, little is known about how personalization impacts responses to questions on sensitive topics. Sensitive topics tend to (1) be intrusive and inappropriate in everyday conversation; (2) elicit responses that may be considered socially unacceptable; and (3) raise concerns among respondents about the consequences of answering truthfully, due to the potential threat of disclosure (Tourangeau, Rips, and Rasinski 2000). Self-reports on sensitive topics—such as sexual behavior or socially undesirable behaviors such as intoxication—are often biased toward underreporting, particularly if respondents feel uncomfortable discussing them with others (Bradburn et al. 1978). Consequently, questions on sensitive topics can lead to unit nonresponse (refusing to participate in the survey at all) or item nonresponse (answering only certain questions). Even if sample members do respond, response quality suffers as topics become more sensitive, due to underreporting of socially undesirable behaviors and overreporting of desirable behaviors (Tourangeau and Yan 2007).

Response quality may suffer even further when sensitive topics are paired with personalization, if respondents worry about their identity and responses being linked. These confidentiality concerns were evident in Heerwegh’s (2005) web survey of Belgian college students about attitudes toward marriage/divorce, which included items about sexual attitudes/behavior.
Respondents who received a personalized greeting, “Dear [First Name] [Last Name],” in their email invitation were significantly less likely to feel comfortable responding to the questions honestly and sincerely, compared to students who received a generic greeting, “Dear Student.” Furthermore, of the 13 respondents who mentioned confidentiality concerns in an open-ended debriefing question, all but one had received the personalized greeting, suggesting unintended consequences of personalization.

Aside from Heerwegh’s (2005) findings, there does not appear to be additional research on the impact of personalization on respondents’ perceptions of privacy or the impact of those perceptions on their survey responses, particularly on sensitive topics. Heerwegh and Loosveldt (2006) concluded that additional research is necessary to investigate personalization’s impact on responses to sensitive questions, such as sexual behavior. This is especially true today, as a decade has passed since much of the personalization research was done (e.g., Heerwegh and Loosveldt 2006), concerns about online privacy likely have changed as the internet has evolved, and reactions to personalized greetings also may have evolved. Furthermore, the impact of personalization on reporting sensitive information likely varies across surveys—depending on survey characteristics like topic, confidentiality assurances, sample, and incentives—so further research is beneficial.

Consequently, a personalization experiment was conducted in a web survey of US college students about sexual assault victimization. Tourangeau, Rips, and Rasinski (2000) suggest that sexual behavior is an intimate—and possibly the most intimate—topic. However, sexual assault victimization is probably even more so, because of the difficulty of the experience and the negative emotions victims often experience, such as shame, guilt, regret, and blame. Rather than focusing on response rates, this study investigates the effects on self-reported victimization, hypothesizing that compared with the generic greeting, the personalized greeting results in lower rates of self-reported sexual assault victimization. The study also explores, as a secondary research question, how the greeting impacts victimization rates across varied student characteristics. The impact of personalization on victimization reporting was anticipated to differ across demographic groups, especially for members of minority groups who may perceive a greater threat of disclosure.

Methods

The College Experiences Survey (CES), sponsored by the Bureau of Justice Statistics and conducted by RTI International, invited undergraduate students at nine universities across the United States to participate in a web survey. The aim of the CES was to measure sexual assault victimization and campus climate. Fielded from March through May 2015, the CES included a greeting experiment at five of the nine participating schools. For the experiment, sample members were randomly assigned to receive one of two greetings in their survey invitation and reminder emails: personalized (“Dear [First Name]”) or generic (“Dear [School Name] Student”). Four of the five schools in the greeting experiment were included for the analyses presented in this paper.
One school was excluded because, as the only two-year school in the experiment, it would have skewed the results of the analysis due to different student populations for key characteristics such as age and year of study.1 The four schools included in the analysis were four-year schools in different regions of the United States with undergraduate populations of 2,500 to 10,000 students. Both public and private not-for-profit schools were included.

Sample and Experimental Design

Participating schools provided a roster of all undergraduate students at least 18 years old; these rosters were used as the sampling frame. The sample was a simple random sample, stratified by sex. Females (11,012) and males (8,808) were sampled across the four schools, for a total sample size of 19,820. Of these, 6,283 females and 3,390 males participated in the survey, for an unweighted response rate of 48.8 percent (AAPOR RR3; AAPOR 2015). Random assignment to experimental conditions was done separately for each sex to ensure that the proportion of males to females was equal across conditions.

Communications with Sample Members

Undergraduate students at the participating schools were notified about the survey in an email from their university president. The email encouraged students to participate in the upcoming survey, which it described as being “about the sexual experiences and attitudes of undergraduate students.” The email emphasized that the survey would be completely confidential, that any answers provided would not be linked to the respondent’s identity, and that RTI, a nonprofit research organization, would conduct the survey. This email used the same greeting, “Dear [School Name] Students,” for all students, regardless of their experimental condition. Survey invitations were emailed to students approximately one week later. This email used the personalized or generic greeting to which each sample member was assigned. The message indicated that the survey was about sexual experiences and attitudes and that responses would be kept confidential. The email also mentioned the survey length (approximately 15 minutes) and incentive ($25 gift card after completing the survey). Throughout data collection, nonrespondents were sent up to five email reminders; each used the sample member’s assigned greeting.

Analysis

When calculating victimization rates, “victimization” was defined as unwanted sexual contact that occurred during the current academic year. The survey defined victimization for respondents as “sexual contact that you did not consent to and that you did not want to happen.” The survey included greater detail about this definition and examples of different types of sexual contact (Krebs et al. 2016).

To verify the assumption that the personalized greeting would yield higher response rates, response rates were computed by greeting type and the student characteristics on the sample frame:2 sex, year of study, race/ethnicity, location of residence, transfer status, full- or part-time status, grade point average (GPA), and SAT/ACT score. Table 1 presents the response rates and shows that, as expected, the personalized greeting yielded a higher response rate across nearly all student characteristics; the difference was statistically significant for about half of the characteristics.
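As an illustration of the tabulation just described, the sketch below computes raw response rates by greeting condition and by a frame characteristic. It is a simplified stand-in with hypothetical file and column names: each sampled student carries a greeting assignment and a 0/1 response indicator, and the simple means shown here ignore the eligibility adjustments built into the AAPOR RR3 rate reported above.

```python
# Minimal sketch with hypothetical column names; not the project's actual code.
import pandas as pd

frame = pd.read_csv("ces_sample_frame.csv")  # hypothetical frame + disposition file

# Overall response rate (%) by greeting condition
overall = frame.groupby("greeting")["responded"].mean().mul(100).round(1)

# Response rate (%) by greeting within a frame characteristic, e.g., year of study
by_year = (
    frame.groupby(["year_of_study", "greeting"])["responded"]
    .mean().mul(100).round(1)
    .unstack("greeting")
)

print(overall)
print(by_year)
```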
Table 1. Response rates by greeting and student characteristics

                                   Response rate (%)
                                   Generic greeting    Personalized greeting
All students                       47.1                50.6**
School
  A                                39.2                40.1
  B                                63.2                68.9**
  C                                57.1                62.5**
  D                                42.1                45.8**
Sex
  Male                             36.7                40.4**
  Female                           55.4                58.7**
Year of study(a)
  First                            53.9                57.1
  Second                           48.3                53.0**
  Third                            48.4                48.6
  Fourth or more                   49.0                54.1**
Age
  18                               56.0                57.6
  19                               50.9                55.1**
  20                               48.6                50.8
  21                               44.9                49.2*
  22 or older                      40.8                44.7**
Part-time/full-time status
  Part-time                        29.7                33.2
  Full-time                        48.7                52.3**
Race/Ethnicity(a)
  White, non-Hispanic              51.4                54.9**
  Black, non-Hispanic              45.6                44.7
  Hispanic                         54.0                51.7
  Asian                            44.3                48.0
  Other(b)                         48.0                62.0**
Living status
  On campus                        51.0                54.3**
  Off campus                       42.8                46.4**
Transfer student
  Yes                              48.1                54.8**
  No                               50.0                52.8**
GPA
  0.0–1.0                          30.6                43.5
  1.1–2.0                          41.6                51.2
  2.1–3.0                          49.0                51.9
  3.1–4.0                          50.7                53.5**
SAT/ACT score
  < 1200 / < 16                    52.6                57.8
  1200–1400 / 17–19                45.0                49.5
  1401–1570 / 20–22                46.6                49.0
  1571–1780 / 23–25                58.8                62.1
  > 1780 / > 25                    50.0                51.0

*p < .05, **p < .01. (a) Excludes School D because it did not provide information on the frame. (b) Includes respondents who selected American Indian/Alaska Native, Other Pacific Islander, Other, or multiple races.

To test the hypothesis that, relative to generic greetings, personalized greetings would result in lower rates of self-reported victimization, bivariate and multivariate analyses were conducted. For the bivariate analysis, victimization rates were compared by greeting type and student characteristics, which were self-reported3 and included gender identity, race/ethnicity, year of study, age, and sexual orientation.

Two logistic models were fit for the multivariate analysis. The first logistic model was a main-effects model, with reported victimization status as the dependent variable and the student characteristics and greeting type as the independent variables.4 This model tested whether the type of greeting impacted reporting of victimization after controlling for student characteristics. The second logistic model added to the first model interactions between greeting type and student characteristics. This model tested whether the greeting type impacted how a particular type of student reported victimization status, controlling for other student characteristics. Models were run using SUDAAN Version 11 and accounted for the complex survey design and clustered nature of the data. The models were unweighted because the hypotheses concern students’ responses rather than population generalizations regarding sexual assault. To assess the fit of the models, a Hosmer-Lemeshow goodness-of-fit test was used.
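The article reports this diagnostic from SUDAAN; as a rough illustration of what a Hosmer-Lemeshow test computes, the sketch below groups observations into deciles of predicted risk and compares observed with expected counts. Function and column names are hypothetical, and this simple version does not incorporate the complex-design adjustments used in the paper.

```python
# Illustrative Hosmer-Lemeshow goodness-of-fit test (decile grouping); names are hypothetical.
import numpy as np
import pandas as pd
from scipy import stats

def hosmer_lemeshow(y, p_hat, n_groups=10):
    """Return the HL chi-square statistic and its p-value on (groups - 2) df."""
    d = pd.DataFrame({"y": np.asarray(y), "p": np.asarray(p_hat)})
    d["bin"] = pd.qcut(d["p"], q=n_groups, labels=False, duplicates="drop")
    stat = 0.0
    for _, g in d.groupby("bin"):
        obs1, exp1 = g["y"].sum(), g["p"].sum()       # observed/expected events
        obs0, exp0 = len(g) - obs1, len(g) - exp1     # observed/expected non-events
        stat += (obs1 - exp1) ** 2 / exp1 + (obs0 - exp0) ** 2 / exp0
    dof = d["bin"].nunique() - 2
    return stat, stats.chi2.sf(stat, dof)

# Example (hypothetical): hl_stat, hl_p = hosmer_lemeshow(respondents["victim"], fit.predict(respondents))
```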
The p-values for both models were greater than 0.05, indicating good model fit (p = 0.341 for model 1 and p = 0.297 for model 2).

Results

Bivariate Analysis

Table 2 presents the results of the bivariate analysis. As shown, the personalized greeting produced a modestly lower victimization rate (9.8 percent) than the generic greeting (10.6 percent); this difference was statistically significant. Further, the personalized greeting produced significantly lower victimization rates compared with the generic greeting for students at School B (11.4 percent vs. 13.2 percent), first-year students (9.7 percent vs. 12.4 percent), 18-year-olds (10.9 percent vs. 13.6 percent), and non-Hispanic whites (8.8 percent vs. 9.9 percent).

Table 2. Victimization rates by greeting and student characteristics

                                      Prevalence of sexual assault (%)
                                      Generic greeting   Personalized greeting
  All students                        10.6               9.8*
  School
    A                                 15.4               13.6
    B                                 13.2               11.4*
    C                                 5.0                5.9
    D                                 9.5                8.9
  Gender identity
    Male                              3.6                3.6
    Female                            14.2               13.1
    Transgender/Other                 13.3               20.0
  Year of study
    First                             12.4               9.7**
    Second                            11.2               11.4
    Third                             9.6                9.9
    Fourth or more                    9.2                8.4
  Age
    18                                13.6               10.9*
    19                                12.7               11.5
    20                                10.1               11.2
    21                                11.6               10.2
    22 or older                       7.0                6.4
  Race/Ethnicity
    White, non-Hispanic               9.9                8.8*
    Black, non-Hispanic               12.2               14.6
    Hispanic                          11.7               12.0
    Asian                             7.7                8.3
    Other (a)                         13.2               13.0
  Sexual orientation
    Heterosexual                      9.7                9.0
    Nonheterosexual                   15.9               14.8

  *p < .05, **p < .01.
  (a) Includes respondents who selected American Indian/Alaska Native, Other Pacific Islander, Other, or multiple races.
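As a simple illustration of the kind of bivariate comparison reported in Table 2, the sketch below runs a two-proportion z-test for the overall generic-versus-personalized difference. The counts are hypothetical, and the test ignores the complex survey design and clustering that the published analysis accounts for.

```python
# Hypothetical illustration of one bivariate comparison: do victimization rates
# differ between the generic- and personalized-greeting arms? Counts are made up,
# and no design adjustment is applied.
from statsmodels.stats.proportion import proportions_ztest

victims = [530, 490]        # respondents reporting victimization (generic, personalized)
completes = [5000, 5000]    # completed interviews in each greeting arm

z_stat, p_value = proportions_ztest(count=victims, nobs=completes)
print(f"generic: {victims[0] / completes[0]:.1%}, "
      f"personalized: {victims[1] / completes[1]:.1%}, "
      f"z = {z_stat:.2f}, p = {p_value:.3f}")
```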
Multivariate Analysis

To confirm the bivariate results for the hypothesis, a main-effects model was fit. Table 3 presents the predicted probabilities for greeting type and for each student characteristic, along with the p-value of the adjusted Wald F statistic for each characteristic. The model results indicate that, after controlling for student characteristics, greeting type did not significantly influence how students reported sexual assault (p = 0.278).

Table 3. Predicted probabilities for reporting victimization: main-effects model

  Student characteristics         Predicted probability of        Adjusted Wald-F
                                  sexual assault, % (SE)          p-value
  Greeting type
    Generic                       8.0 (0.3)                       0.2777
    Personalized                  7.6 (0.3)
  School
    A                             12.6 (0.5)                      < 0.0001
    B                             9.7 (0.4)
    C                             4.2 (0.2)
    D                             7.4 (0.4)
  Gender identity
    Male                          3.1 (0.2)                       < 0.0001
    Female                        12.4 (0.3)
    Transgender/Other             11.6 (3.5)
  Year of study
    First                         8.2 (0.4)                       0.0111
    Second                        8.5 (0.4)
    Third                         7.3 (0.4)
    Fourth or more                7.1 (0.4)
  Race/Ethnicity
    White, non-Hispanic           8.0 (0.3)                       < 0.0001
    Black, non-Hispanic           8.4 (0.9)
    Hispanic                      8.7 (0.6)
    Asian                         4.5 (0.4)
    Other (a)                     9.5 (0.9)
  Sexual orientation
    Heterosexual                  7.5 (0.2)                       < 0.0001
    Nonheterosexual               11.9 (0.8)

  (a) Includes respondents who selected American Indian/Alaska Native, Other Pacific Islander, Other, or multiple races.
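The predicted probabilities in Table 3 are model-based quantities; one common way to compute such values is to average each respondent's fitted probability under counterfactual settings of the variable of interest. The sketch below illustrates that approach for greeting type, building on the hypothetical main-effects model sketched earlier. It approximates, rather than reproduces, the design-adjusted estimates and adjusted Wald F tests reported here.

```python
# Hypothetical sketch: average predicted probability of reported victimization
# under each greeting type, holding other covariates at their observed values
# ("average marginal predictions"). Level labels and column names are assumptions.
import pandas as pd

def predicted_probability_by_greeting(model, df: pd.DataFrame) -> pd.Series:
    """Average fitted probability with every respondent set to each greeting type."""
    out = {}
    for level in ["Generic", "Personalized"]:
        counterfactual = df.copy()
        counterfactual["greeting"] = level   # set everyone to the same greeting
        out[level] = model.predict(counterfactual).mean()
    return pd.Series(out, name="predicted_probability")

# Usage:
# predicted_probability_by_greeting(model, df)
```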
To confirm the bivariate results for the research question, interactions between greeting type and the student characteristics were added to the model. Table 4 presents the predicted probabilities for each of the student characteristic interactions by greeting type and the p-value for the adjusted Wald F. The model results indicate that, after controlling for student characteristics, the greeting significantly impacted how students reported sexual assault across schools (p = 0.0401) and year of study (p = 0.0432). Within schools, only School C had a significantly different victimization rate across treatment groups (p = 0.0186), with the personalized greeting yielding a higher rate than the generic greeting (4.7 percent vs. 3.6 percent). Within year of study, no group had significantly different rates at the 0.05 level, and only first-year students were significant at the 0.10 level (p = 0.0756). The Wald test was likely significant because the direction of change for first-year students was the opposite of the other three years of study.

Table 4. Predicted probabilities for reporting victimization: model including interactions of greeting and student characteristics

  Student characteristics    Generic greeting,   Personalized greeting,   Adjusted Wald-F   Pairwise comparison
                             % (SE)              % (SE)                   p-value           p-value (a)
  School
    A                        13.3 (0.8)          11.9 (0.8)               0.0401            0.2089
    B                        9.9 (0.6)           9.4 (0.6)                                  0.5486
    C                        3.6 (0.3)           4.7 (0.4)                                  0.0186
    D                        7.5 (0.5)           7.3 (0.5)                                  0.7327
  Gender identity
    Male                     2.9 (0.3)           3.2 (0.3)                0.6177            n/a
    Female                   12.6 (0.4)          12.2 (0.4)                                 n/a
    Transgender/Other        11.8 (5.0)          11.7 (4.8)                                 n/a
  Year of study
    First                    8.8 (0.6)           7.5 (0.5)                0.0432            0.0756
    Second                   8.0 (0.6)           8.9 (0.6)                                  0.2221
    Third                    6.8 (0.5)           7.9 (0.5)                                  0.1343
    Fourth or more           7.1 (0.5)           7.1 (0.5)                                  0.9816
  Race/Ethnicity
    White, non-Hispanic      8.3 (0.4)           7.8 (0.3)                0.1057            n/a
    Black, non-Hispanic      7.1 (1.1)           9.5 (1.3)                                  n/a
    Hispanic                 8.0 (0.8)           9.4 (0.9)                                  n/a
    Asian                    3.8 (0.5)           5.1 (0.6)                                  n/a
    Other (b)                9.2 (1.2)           9.8 (1.2)                                  n/a
  Sexual orientation
    Heterosexual             7.4 (0.3)           7.5 (0.3)                0.9113            n/a
    Nonheterosexual          11.7 (1.1)          12.0 (1.1)                                 n/a

  Cell entries are predicted probabilities of sexual assault.
  (a) Pairwise comparisons only conducted for student characteristics with a statistically significant overall Wald test.
  (b) Includes respondents who selected American Indian/Alaska Native, Other Pacific Islander, Other, or multiple races.
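A rough sketch of this second, interaction model follows: greeting type is interacted with each student characteristic, and school-specific predicted probabilities are compared across greetings. Variable names are again hypothetical stand-ins, and the adjusted Wald F tests and design-based pairwise comparisons in Table 4 are not reproduced.

```python
# Hypothetical sketch of the interaction model: greeting type interacted with
# each student characteristic. Column names are assumptions; SUDAAN's
# design-adjusted inference is not replicated.
import pandas as pd
import statsmodels.formula.api as smf

def fit_interaction_model(df: pd.DataFrame):
    """Logit with greeting-by-characteristic interaction terms."""
    formula = ("victim ~ C(greeting) * (C(school) + C(gender_identity) + "
               "C(year_of_study) + C(race_ethnicity) + C(sexual_orientation))")
    return smf.logit(formula, data=df).fit(disp=False)

def school_probabilities(model, df: pd.DataFrame) -> pd.DataFrame:
    """Average predicted probability for each school under each greeting type."""
    rows = []
    for school in sorted(df["school"].unique()):
        for greeting in ["Generic", "Personalized"]:
            cf = df[df["school"] == school].copy()
            cf["greeting"] = greeting
            rows.append({"school": school, "greeting": greeting,
                         "predicted_probability": model.predict(cf).mean()})
    return pd.DataFrame(rows)
```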
Discussion

The bivariate analysis indicated that the personalized greeting suppressed sexual assault reporting; however, after controlling for student characteristics, the predicted probabilities of reporting victimization were statistically equivalent. Thus, any differences found in the bivariate analysis were likely due to differences in the composition of respondents across the two greeting types rather than to the greeting itself. There is no statistical evidence to support the hypothesis that the personalized greeting would suppress sexual assault reporting. In practical terms, at the aggregate level, the personalized greeting increased response by 3.5 percentage points without affecting the reported sexual assault prevalence rate. Further findings indicate that, after controlling for student characteristics, victimization reporting differed by greeting only with respect to school attended and year of study. Race, for which non-Hispanic whites were significantly different in the bivariate analysis, was no longer significant after controlling for the other characteristics.
For school attended, the statistical significance between greeting types was driven by School C, the only school with statistically different rates, which had a higher predicted prevalence rate when the personalized greeting was used, whereas at the other three schools the personalized greeting led to lower prevalence rates. Other measures in the survey, such as students' perceptions of campus climate, could be examined to see whether School C's climate differs in a way that might explain why personalization uniquely affected prevalence rates at this school. Unfortunately, those findings could be paired with previously published findings from the study to compromise the confidentiality of the schools' identities, so this possible explanation was not examined.5 One note of caution, therefore, is that for some schools, greeting type may have unexpected effects on reporting of sexual assault. For studies making estimates at the primary sampling unit level, this caution may be important.

For year of study, the statistical difference by greeting type arose because second- through fourth-year students had higher reported prevalence rates when the personalized greeting was used, unlike first-year students, who had lower rates. Why personalization affected students differently is unclear; further experiments are needed to replicate and explain this finding. Perhaps second- through fourth-year students are more trusting of their school's administration, more aware of how common sexual assault is, more aware of the importance of reporting victimization and the positive impact that can result, or have more established support systems and so would be better equipped to handle a breach of confidentiality.

Consistent with prior research, the personalized greeting resulted in higher response rates than the generic greeting. Response rates were about 3.5 percentage points higher for the personalized greeting, a less pronounced effect than observed in prior research. This muted effect of personalization could be due to a couple of factors. First, including the school name in the generic greeting made it semi-personalized; this was done deliberately to reap some benefits of personalization without introducing privacy concerns. The relatively small difference between the generic and personalized response rates suggests the semi-personalization was effective, although this is speculative, as a completely impersonalized condition (e.g., "Dear Student") was not examined. Second, response rates were already relatively high. Sample members may have been motivated to participate by the survey's desirable incentives, support from trusted school administrators, encouragement from participating friends, and recent widespread media attention to the topic. Some of these factors also may have affected victimization reporting. Thus, the finding that the personalized greeting had little impact on victimization reporting may not hold across all surveys, particularly when sample members lack a pre-established, trusting relationship with the survey sponsor.

A limitation of this research is that the sample is not nationally representative. The selected schools could differ from a nationally representative sample of schools on factors that contribute to reporting victimization, such as campus climate. Future research should examine the extent to which these findings differ across a larger, randomly selected sample of schools.
This is especially important given the unexpected finding that School C students who received the personalized greeting were more likely to report victimization. Further research also is needed with the general population, to which findings about students are not generalizable. Personalization may affect other segments of the population differently, based on characteristics such as age, education, or digital literacy.

Overall, the findings did not provide strong enough evidence to discourage the use of a personalized greeting in an extremely sensitive survey. The findings suggest that the personalized greeting increased response rates without affecting overall rates of reported sexual assault victimization, except for School C and first-year students. At School C, the personalized greeting produced a higher victimization rate. This suggests that personalization produced a more accurate estimate and is consistent with the recommendation to use a personalized greeting, even on sensitive surveys. However, more research is needed to explore reporting differences among first-year students. Other areas for future research include (1) examining the impact of personalization on sensitive surveys when sample members are less familiar with or trusting of the survey sponsor; and (2) examining the impact on the general population, especially across the subgroups noted above that are likely to be differentially affected. These investigations would be important for determining how best to personalize contacts in sensitive surveys.

The authors thank the Bureau of Justice Statistics (BJS) for its valuable contributions to and sponsorship of this research. The views expressed in this manuscript are those of the authors only and do not reflect the views or position of BJS or the Department of Justice. Funding for this research was provided by the Department of Justice, Office of Justice Programs, through cooperative agreement 2011-NV-CX-K068 to C.P.K.

Footnotes

1. Two-year schools do not have third- or fourth-year students, and the average age of first- and second-year students at these schools tends to be higher than at traditional four-year schools.
2. The frame data were used over self-reported data because only frame data were available on nonrespondents.
3. Self-reported characteristics were used over frame characteristics because the former included data on more characteristics and those characteristics were of greater interest for the analysis. Furthermore, some of the self-reported variables (e.g., gender identity) were assumed to provide a more accurate representation of students than the corresponding frame variables (e.g., sex).
4. Age was not included in the logistic model because of its high collinearity with year of study.
5. One of the agreements between BJS and the participating schools was that the schools would not be identified.
Kevin Arceneaux and Ryan Vander Wielen. Taming Intuition: How Reflection Minimizes Partisan Reasoning and Promotes Democratic Accountability. New York: Cambridge University Press. 2017. 260 pp. $99.99 (cloth). $34.99 (paper). 2018 Public Opinion Quarterly
doi: 10.1093/poq/nfy012
Are voters rationally capable of holding politicians accountable, or are political attitudes determined solely by partisan identity? Few questions are more central to the study of political behavior and public opinion. In Taming Intuition, Kevin Arceneaux and Ryan Vander Wielen make a detailed and convincing case that the answer depends on voters’ preferences toward second-guessing themselves. Arceneaux and Vander Wielen’s book makes substantial progress toward closing the gap between psychological and economic theories of voter rationality by showing how partisan perception and motivated skepticism can be overcome—through reflection—by voters who enjoy thinking. Previous research on heuristics and the rationality of low-information partisan voting undertheorizes voters’ preference for cognitive effort. Partisan identification is universally described as central to voter decision-making: psychological theories treat party identity as a perceptual screen that leads to motivated reasoning, while economic perspectives view partisan labels as heuristics that help rational voters assess politicians’ performance over time. Simply because partisan identification is a useful heuristic, however, does not mean that all voters are looking for a shortcut. The authors describe how political information is processed through two separate psychological systems: first through quick affective judgment, and second through effortful, cognitive reflection, which some may prefer to use more than others. This conception builds on and extends Lodge and Taber (2013) by incorporating individual-level differences into their model of conscious processing. Intuition’s ability to overwhelm rationality depends upon the relative balance of voters’ preferences for feeling and thinking. These preferences are captured in continuous scales measuring “need for affect” (NFA; Maio and Esses 2001) and “need for cognition” (NFC; Petty and Cacioppo 1986), reflecting individuals’ propensity to either seek a broad range of emotional experiences or use thought to make sense of the world, respectively. This focus on “cognitive style” follows approaches in psychology, such as the Elaboration Likelihood Model, that have made an impact in political science as well. Since measures of NFC and NFA are uncommon in political surveys and experimental research, the authors conducted five original studies: three large-N internet surveys (matched to quotas or the Census), and two smaller experiments drawn from the internet and college students. The authors deftly apply the psychological implications of cognitive style to the processes of democratic accountability. When politicians’ behavior runs counter to voters’ expectations—with failed policies, ideologically inconsistent policy positions, or corruption—voters’ first impulse will be to maintain loyalty to partisan identity, leading to reinforcement of their attitudes and continued support for their party’s politicians. Only by accessing the second system, reflection, can attitudes toward politicians and policies be adjusted and democratic accountability be activated. These emotional and rational tendencies are associated with strength of partisan identities in predictable ways. Strong partisans are more attached to their partisan identity, of course, but partisans with higher NFA exhibit much stronger partisan attitudes while levels of NFC are unrelated to party attachment. 
Among independents, higher NFA is unrelated to their attachment to their independent identity, while higher levels of NFC are negatively related. Voters with higher need for affect rely more strongly on party cues when receiving new information, rejecting pro-attitudinal information when it is conveyed by an out-party politician. Higher need for cognition, when accompanied by lower need for affect, predicts weakening party identity over the course of a campaign. When presented with negative descriptions about the effects of policies of an in-party governor, for instance, only the combination of high need for cognition and low need for affect led voters to evaluate that governor less favorably. By examining these predispositions at their relative extremes, the authors convincingly show that relative values of emotion and rationality moderate citizens’ ability to hold politicians accountable. When voters have higher need for cognition and lower need for affect, they have higher rates of split-ticket voting, lower affective polarization, higher positive ratings for the out-party, and lower levels of anger. These effects of reflection seem to reconcile many of the democratic dilemmas posed by scholars and observers of American politics in our recent polarized and contentious times. For several empirical and theoretical reasons, however, the book falls short of its promise for providing democratic remedies. The findings are exclusively portrayed as predicted probabilities for reflective and non-reflective individuals, but their data show that these categories combined represent a minority of observed respondents, depending upon the cutpoint used. If these categories are made using only the medians of NFA and NFC, 40 percent of the sample fall into one of the two categories; with more stringent criteria, using the seventieth and thirtieth percentiles for each scale, that proportion drops to 11 percent (p. 62). If the observed categories of reflective and non-reflective voters are such a small percent of the population, the authors’ findings are less impactful than they initially seem to be. The dynamics of media use are also critical for determining voters’ exposure to counter-attitudinal information. Their theory begins with the reception of new political information, but the authors do not address how reflective and non-reflective people might differ in their exposure to political information. If people are gravitating toward ideologically congenial media that omit damaging information about the politicians from their party, or misrepresent the facts of policy disputes, that will interfere with the process of reflection. Future research should take up the question of selective exposure considering this book’s findings: do more reflective people also consume a greater diversity of media perspectives? Finally, if reflectiveness is classified as a trait, that limits its usefulness as a cure for American democracy’s ills. It is useful to know that the quality of being reflective is helpful for democratic accountability, but the reader is left without any guidance on how to develop that quality in citizens, or whether that is even possible. Can early childhood socialization inculcate reflective qualities? Civic education? Social capital? If reflection can “minimize partisan reasoning and promote democratic accountability,” as the book’s subtitle argues, the necessary question becomes how to promote reflection in the public. 
The authors' proposed solution, crafting institutions that minimize the need for reflection by the public (p. 171), falls short of describing how reflection might be encouraged. Taming Intuition makes an important contribution to the literature on the psychology of democratic accountability. Measures of need for cognition and need for affect should be included in more political surveys, given their demonstrable impact on attitude formation. Reflection is clearly a powerful force, and Taming Intuition should be commended for calling attention to its political effects. There is great potential for future research on this topic, and certainly much to reflect on in the book itself.

References

Lodge, Milton, and Charles S. Taber. 2013. The Rationalizing Voter. Cambridge: Cambridge University Press.
Maio, Gregory R., and Victoria M. Esses. 2001. "The Need for Affect: Individual Differences in the Motivation to Approach or Avoid Emotions." Journal of Personality 69(4):583–614.
Petty, Richard E., and John T. Cacioppo. 1986. "The Elaboration Likelihood Model of Persuasion." Advances in Experimental Social Psychology 19:123–205.
Harold D. Clarke, Matthew Goodwin, and Paul Whiteley. Brexit: Why Britain Voted to Leave the European Union. Cambridge: Cambridge University Press. 2017. 256 pp. $19.99 (paper). 2018 Public Opinion Quarterly
doi: 10.1093/poq/nfy010
This is the first book-length attempt using academically designed polling data to understand the outcome of the referendum on the UK’s membership of the European Union in June 2016. Other texts—most notably by Oliver (2016) and Shipman (2016)—provide excellent journalistic accounts of the stratagems and maneuvering of the key actors in the drama that preceded the vote, but they lack much in the way of evidence as to how and why the audience reacted as they did. Clarke, Goodwin, and Whiteley’s efforts fill that gap. That said, the book ranges across a much wider canvas than simply providing a forensic examination of the referendum vote itself. It begins by outlining the key events immediately leading up to the referendum and, in particular, the efforts of the then prime minister, David Cameron, in 2015–2016 to renegotiate the terms of the UK’s membership of the EU. It reminds readers, too, of the key messages deployed by the two sides and how, according to the polls, those messages were received by voters. This introductory material is followed by multivariate analyses of some of the backdrop to the vote, which was promised by Cameron following an increase in support for the anti-EU United Kingdom Independence Party (UKIP) in 2012—an increase that seemed to threaten the re-election chances of Cameron’s Conservative Party. First, there is an analysis of the key influences on British attitudes toward Europe during the course of the decade leading up to the referendum, and, second, a discussion of the social character and ideological predispositions of UKIP supporters together with an examination of what encouraged voters to support UKIP in the ballot box. Only then do we come to what is just a single chapter on why people voted as they did. Meanwhile, before reaching a conclusion and, in what might be regarded as a particularly unusual inclusion in a text on electoral choice, the book undertakes empirical analysis of the validity of some of the arguments that were deployed during the referendum campaign, looking in particular at the historical impact of EU membership on economic growth, immigration, and the quality of governance. The chapters that provide the analytic core of the book have one valuable feature. They all address their subject matter at more than one level. For example, in analyzing attitudes toward the EU before the referendum, Clarke et al. undertake both an aggregate-level analysis of the dynamics of support during the previous decade, as registered by regular monthly polling of attitudes toward the EU they themselves had directed, and an individual-level analysis of attitudes toward EU membership that helps identify both the voter characteristics and the contextual circumstances that are correlated with higher or lower levels of support. A similar approach is then adopted in the analysis of electoral support for UKIP, and of how people voted in the EU referendum (where the aggregate-level analysis is of the geography of the vote). This dualism is to be welcomed; not only is an explanation that operates at more than one level more convincing, but it is also too often forgotten that the overall outcome of a vote cannot be adequately explained solely by an examination of the correlates of voting behavior. Attention also has to be paid to marginal distributions and what causes them to rise or fall. 
However, there is an obvious challenge facing a book that contains so much modeling of diverse data sets: developing a coherent narrative that infuses the presentation from beginning to end. The authors do have an overall story to tell: the three key issues of immigration, the economy, and sovereignty had between them long ago created the potential for a successful Leave campaign. All three were important influences on attitudes toward the EU before the referendum was held, on willingness to join UKIP and to vote for the party, and subsequently on how people voted in the referendum. To that extent, how people voted in the referendum was, as the authors put it, "baked in" long before the referendum was called. However, the eventual success of the Leave side was also facilitated by the actions of politicians. UKIP's initial electoral rise (which led to the promise to hold a referendum) was assisted by the Liberal Democrats' decision in 2010 to enter into coalition with the Conservatives, thereby removing themselves from their role as the vehicle for midterm anti-government protest. Meanwhile, the Leave campaign secured a significant boost from the decision of the charismatic former Conservative mayor of London, Boris Johnson, to campaign for Leave, whereas the cues provided by the various pro-Remain parties and leaders were all relatively ineffective. This is a familiar tale to anyone who has followed British politics in recent years. Of course, that familiarity does nothing to devalue the merits of the book: social science interrogation that confirms existing conventional wisdom is just as valuable as that which overturns it. But what is missing is the sense of an emerging narrative as the reader moves from chapter to chapter; consequently, there is a risk that the chapters seem somewhat disparate from one another. The presentation of a clear narrative is, perhaps, also not helped by the authors' attempt in Chapter 4 to argue that the outcome of the referendum can best be viewed through the lens of a "valence" theory of voting behavior, a perspective long promoted by two of the authors, which holds that voting behavior is nowadays driven primarily by perceptions of the competence of political parties rather than by disagreements about the ends of public policy. One suspects that many readers will feel that voters do disagree fundamentally about what the aims and objectives of immigration policy should be, and about the merits or otherwise of sharing sovereignty with other European states. Indeed, even the arguments about the economic benefits or otherwise of EU membership might be thought to reflect differences of interest between those in different positions in the labor market. In any event, after Chapter 4 there is little more than the occasional glancing reference to valence theory, and it plays little role when the authors bring the threads of their analysis together in the final chapter. There is, though, one piece of analysis in the book that does challenge much conventional wisdom. In their penultimate chapter, Clarke et al. argue that EU membership has had little discernible impact on Britain's economic prosperity, and they suggest that because the level of EU migration has oscillated in tandem with the health of Britain's economy (unlike non-EU immigration), ending the automatic right of EU citizens to come to the UK to live and work (one likely consequence of leaving the EU) would see immigration fall.
This analysis seems highly supportive of some of the key arguments put forward by the Leave side. Should we take that to mean that the authors feel that ultimately Britain voted to leave the EU because the Leave side had the better tunes? On that, tantalizingly, the book remains silent.

References

Oliver, Craig. 2016. Unleashing Demons: The Inside Story of Brexit. London: Hodder and Stoughton.
Shipman, Tim. 2016. All Out War: The Full Story of How Brexit Sank Britain's Political Class. London: William Collins.
Mary Layton Atkinson. Combative Politics: The Media and Public Perceptions of Lawmaking. Chicago: University of Chicago Press. 2017. 208 pp. $85.00 (cloth). $27.50 (paper). 2018 Public Opinion Quarterly
doi: 10.1093/poq/nfy011
Combative Politics, a compelling new book by Mary Layton Atkinson, weaves together several important threads of American politics research of recent years: the dominance of game-frame news coverage, the renewed emphasis on citizen discontent, and the gap between public preferences and policy outcomes in often-gridlocked Washington. This project uses a range of evidence to make its case, including content analysis of media reports, multivariate analyses of public opinion research, and experiments that expose subjects to genuine and subtly altered news reports to measure reader responses to different media content. This combination of research approaches, something of a mash-up of Out of Order (Thomas Patterson), News That Matters (Shanto Iyengar and Donald Kinder), Governing with the News (Tim Cook), and Stealth Democracy (John Hibbing and Elizabeth Theiss-Morse), provides an effective and comprehensive framework for examining the roles that media content plays in increasing public cynicism and in creating obstacles to the passage of even popular legislation. This well-written book offers compelling evidence that journalistic norms work against both sound policy development and legislative compromise. Chapter 1 examines public support for public issues during the time they are considered in Congress, noting that even popular policies become objects of scorn "because of their association with the unpopular, contentious process of policy making" (p. 10). Atkinson's content analysis focuses on the New York Times, a media outlet selected both because it spans the several decades of policy coverage considered here and because of the author's quite reasonable view that this newspaper would be more inclined to provide depth and background to its policy stories than nearly any other prominent news outlet. In Chapter 2, Atkinson uses content analysis to examine policy news stories relating to health care, K–12 education, and social welfare between January 1980 and December 2010. The content analysis revealed that conflict narratives were prominent in more than two-thirds of the policy-oriented stories analyzed, presenting legislative debates as battles decade after decade. Because these stories only rarely draw links between problems and desired outcomes, the partisan disputes on Capitol Hill rarely appear to be about finding the best policy path forward. The experimental efforts, which used a student sample and were then largely replicated in the 2012 Cooperative Congressional Election Study (CCES), found that support for policy initiatives was considerably greater among those exposed to largely hypothetical civil-debate media treatments than among those exposed to news stories reporting on heated debate over the legislation. Too often citizens, particularly less informed ones, imagine that there is an optimal policy solution readily apparent to everyone, which leads to a belief that conflict serves the personal ends of the combatants. But even more informed readers who consume conflict-focused news are less supportive of policies than those who consume news that emphasizes cooperation. (Since reporters do not write a lot of cooperation-focused legislative news, the author was obligated to create versions of existing news stories that emphasized compromise; these experimental procedures are carefully detailed in the book's appendices.)
The consequences of conflict-laden news were confirmed in public opinion data from 2004 and 2005 that compared support for the proposed federal constitutional amendment to ban gay marriage in states that had separate state-level gay marriage bans on their ballots with support in states that did not. Once again, the gay marriage case demonstrated that heightened controversy led to reduced support for a policy measure as discussion increased. Of course, most policy issues involve greater complexity than a gay marriage ban. Citizens trying to untangle the effects of a health care bill, for example, are far more dependent on the media when it comes to making sense of the legislation than when it comes to divining their own views on same-sex marriage. In studies that examine the trajectory of public opinion relating to health care bills backed by President Bill Clinton and President Obama, Atkinson demonstrates that conflict reporting depresses support for the measures. Citizens with less formal education are particularly inclined to believe that policy disputes exist because politicians want to score political points more than because they have genuine policy disagreements. Atkinson wisely demonstrates exceptions to these overall patterns of combative rhetoric and combative news. In those rare situations where an important policy issue did not sustain partisan controversy, like the Americans with Disabilities Act of 1990, there was little media coverage and no decline in public support for the measure during the time Congress debated the bill. While the Brady Bill endured considerable criticism from gun control advocates, the clear linking of the measure to a popular and specific policy outcome allowed it to pass despite opposition from many in Congress. For Atkinson, many public misconceptions about policy and policymakers can be laid at the feet of journalists. By focusing on extreme rhetoric and the most controversial legislation, reporters confirm the most cynical citizen interpretations of business in Washington. Yet incentives for reporters reward that behavior: the for-profit media marketplace will respond to public preferences for the bad and the ugly in Washington. Ambitious lawmakers will attack each other, responding to external pressures, as they seek to become more prominent nationally. Show horses, not workhorses, make the paper. The book's evidence demonstrates that journalistic norms, particularly the use of conflict frames, discourage responsible lawmaking and reward divisiveness, conflict, and policy stalemate. Although Atkinson does link this circle of cynicism to the election of Donald Trump in this book, future research might examine how the media denigration of experience and of the struggle to compromise led to the election of a president with profound disdain for conventional lawmaking by an electorate that prized his lack of government experience. Reporters and editors focus on conflict in policy debates, Atkinson argues, because new disputes give news outlets something new to say about a specific bill, which otherwise changes little on most days. The evidence from decades of news coverage in the New York Times demonstrates that reporters, in their desire for novelty, pay little attention to moderate voices in any debate, and likewise largely ignore moderate policies that generate little opposition.
Because rewarded behavior is repeated behavior, politicians and journalists gravitate toward comments and news reports that emphasize the most uncompromising commentary. While some critics may object to the absence of television and online content in these pages, any study of media impact on national public opinion across decades could hardly ignore the direct (and arguably greater indirect) influence of the Times when it comes to shaping media and political discourse in a variety of venues. Future scholars, of course, can use this work as a template for subsequent studies about the impact of other media sources on lawmaking and public opinion. As a whole, Combative Politics provides comprehensive evidence, effective and accessible writing, and well-founded conclusions, making this a compelling classroom choice for advanced undergraduate- and graduate-level classes in public policy, legislative politics, and political communication.
Affective Polarization or Partisan Disdain? Untangling a Dislike for the Opposing Party from a Dislike of Partisanship. 2018 Public Opinion Quarterly
doi: 10.1093/poq/nfy014
Abstract Recent scholarship suggests that American partisans dislike other party members so much that partisanship has become the main social divide in modern politics. We argue that at least one measure of this “affective polarization” conflates a dislike for members of the other party with a dislike for partisanship in general. The measure asks people how they feel about their child marrying someone from another party. What seems like negative affect toward the other party is, in fact, negative affect toward partisans from either side of the aisle and political discussion in general. Relying on two national experiments, we demonstrate that although some Americans are politically polarized, more simply want to avoid talking about politics. In fact, many people do not want their child to marry someone from their own party if that hypothetical in-law were to discuss politics frequently. Supplementary analyses using ANES feeling thermometers show that inparty feeling thermometer ratings have decreased in recent years among weak and leaning partisans. As a result, the feeling thermometer results confirm the conclusion from the experiments. Polarization is a phenomenon concentrated in the one-third of Americans who consider themselves strong partisans. More individuals are averse to partisan politics. The analyses demonstrate how affective polarization exists alongside weakening partisan identities. Contemporary scholars and journalists show great interest in the growing partisan divide among Americans. Although scholars debate whether this divide is based on actual differences in issue preferences or merely on perceived differences (Mason 2015; Levendusky and Malhotra 2016), most acknowledge that “affective” polarization—“view[ing] opposing partisans negatively and copartisans positively” (Iyengar and Westwood 2015, p. 691)—appears to be increasing. In this study, however, we find that at least one widely cited measure of affective polarization overstates the amount of affective polarization.1 The measure asks partisans how they would feel if their child were to marry someone from the other political party. As many as half of respondents report that their child marrying someone from the other party would make them unhappy, indicating, according to Iyengar, Sood, and Lelkes (2012), a large amount of affective polarization. Among the potential issues with this measure, the most problematic is that it conflates two distinct phenomena: (1) affective polarization; and (2) a dislike for political parties generally (Klar and Krupnikov 2016). Further, the measure is included alongside other political questions, heightening the salience of partisan considerations. Finally, respondents hear only about the hypothetical in-law’s partisanship, but not other traits or descriptors, implying that partisanship is a particularly important aspect of the potential in-law’s identity. To examine the effects of these features on measured affective polarization, we conduct a survey experiment on two nationally representative samples at two points in time during 2016, with some respondents interviewed in both samples. The experiment separates a dislike for parties in general from a specific dislike for the outparty by asking respondents how they would feel about a child marrying an individual from both parties. Further, the experiment is part of a survey that does not ask other political questions. 
Finally, we tell respondents how frequently the hypothetical in-law discusses politics, thereby providing context about how important partisan politics is to the hypothetical in-law. Results show that affective polarization is a dominant trait only among the one-third of respondents who identify as strong partisans, and they become more polarized during the 2016 presidential campaign. The majority of individuals are not “affectively polarized”; rather, many are averse to partisan politics. Because this question is only one possible measure of affective polarization, in the Online Appendix we use ANES feeling thermometers to confirm that affective polarization is largely confined to strong partisans. Affective Polarization We suggest that studies measuring affective polarization inadvertently measure two distinct concepts: (1) dislike for the outparty; and (2) dislike for partisanship in general. As affective polarization has ostensibly grown, the percentage of people reporting that they dislike both parties also has increased (Smith 2015). Moreover, both a dislike of partisanship and a genuine dislike of the outparty manifest themselves in lower ratings of the outparty because people are more willing to publicly denigrate the outparty even if they actually dislike both parties (Groenendyk 2013). Of primary interest in this paper is a measure of affective polarization based on the Social Distance Scale, originally developed by Bogardus (1926) to measure social distance between racial and ethnic groups. The original scale included seven items, with willingness to marry a member of a particular group indicating the least social distance. Almond and Verba (1963) adapted the scale to political partisanship, asking respondents if they would feel pleased, displeased, or indifferent if their child were to marry “across party lines.” They found that about 5 percent of partisans would be displeased and a similar number reported they would be pleased. Hence, 90 percent of partisans were “indifferent” about their child marrying someone of the outparty. One problem with this question as a measure of affective polarization is that it asks only about the child marrying someone from the outparty. In order to measure affective polarization properly, one must identify those who both dislike the outparty and like their inparty. When researchers ask only about dislike for the other party, they run the risk of overestimating affective polarization. For example, Klar and Krupnikov (2016) find that respondents dislike working with someone who talks about politics even when that hypothetical colleague agrees with their political views. Further, an often overlooked confound arises in surveys when respondents infer omitted information (Dafoe, Zhang, and Caughey 2016). Some respondents will assume that partisanship is an important identity for the hypothetical child-in-law if the question mentions only partisanship in describing this individual. This can increase the probability that the respondent is unhappy, because the majority of Americans dislike strong partisans (Klar and Krupnikov 2016). Finally, measuring affective polarization in the context of a larger political survey can prime partisan considerations. In particular, questions about partisan politics may bring to mind political polarization, leading respondents to believe that the hypothetical in-law is more extreme, as Americans tend to overestimate the ideological extremity of the other party’s members (Levendusky and Malhotra 2016). 
Methods

DATA

Respondents to our two survey experiments were members of the nationally representative GfK sample via Time-Sharing Experiments for Social Sciences. The first survey took place from January 21 to February 1, 2016. The second began July 19 and finished on August 10, 2016 (during the Republican and Democratic conventions). The first sample consisted of 2,030 adult Americans; the second, of 2,136 adult Americans, 1,428 of whom had participated in the first survey.2 In the first survey, respondents also participated in an unrelated experiment about wages and the stock market. No partisan political actors were mentioned. In the second survey, respondents answered no questions prior to our experiment. Hence, nothing should prime partisan considerations.

EXPERIMENT

Both samples were randomly assigned to one of three groups. Respondents who participated in both surveys were assigned to the same treatment, allowing us to look at changes over time. First, following Iyengar, Sood, and Lelkes (2012), one-third of the sample was asked both of the following questions: "How would you feel if you had a son or daughter who married someone who votes for the Democratic Party? Would you feel unhappy or happy?" and "How would you feel if you had a son or daughter who married someone who votes for the Republican Party? Would you feel unhappy or happy?"3 Responses fell on a five-point scale ranging from "very unhappy" to "very happy." The second group received the same questions with one important change: the hypothetical child-in-law was described as someone who "talks about politics rarely." The final third of the sample read about a child-in-law who discusses politics "frequently." These treatments eliminate the need for the respondent to infer the importance of partisanship to the child-in-law. Within each of the three experimental groups, we randomly assigned respondents to read about the hypothetical child-in-law's partisan affiliation in one of two ways: half read about an in-law who "supports the Democratic Party [Republican Party]," and the other half read about an in-law who "supports local Democratic [Republican] candidates." This allows us to distinguish dislike for national-level parties from dislike for any candidate affiliated with that party. In addition to the experimental measures, the instrument included items related to partisanship, education levels, gender, race, and census region. These measures allow for the identification of types of individuals who are more prone toward affective polarization.
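To make this 3 x 2 design concrete, the sketch below randomly assigns respondents to a discussion-frequency condition (control, rarely, frequently) crossed with a party-framing condition (national party vs. local candidates) and assembles an approximate question wording. It is an illustrative reconstruction only; the function names, assignment mechanism, and wording template are assumptions, not the authors' instrument.

```python
# Hypothetical sketch of the 3 x 2 random assignment described above.
# The wording template approximates, but is not identical to, the survey items.
import random

FREQUENCY = ["control", "rarely", "frequently"]   # discussion-frequency arms
FRAMING = ["national_party", "local_candidates"]  # party-framing arms

def assign(respondent_id: int, seed: int = 2016) -> dict:
    """Reproducible random assignment of one respondent to a 3 x 2 cell."""
    rng = random.Random(seed * 100_000 + respondent_id)
    return {"frequency": rng.choice(FREQUENCY), "framing": rng.choice(FRAMING)}

def question_text(party: str, cell: dict) -> str:
    """Approximate item wording for one party, given the assigned cell."""
    target = (f"supports the {party} Party" if cell["framing"] == "national_party"
              else f"supports local {party} candidates")
    frequency_clause = {"control": "",
                        "rarely": " and talks about politics rarely",
                        "frequently": " and talks about politics frequently"}[cell["frequency"]]
    return (f"How would you feel if you had a son or daughter who married someone "
            f"who {target}{frequency_clause}? Would you feel unhappy or happy?")

# Each respondent rates both parties on a five-point scale from
# "very unhappy" to "very happy".
cell = assign(respondent_id=42)
for party in ["Democratic", "Republican"]:
    print(question_text(party, cell))
```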
When the hypothetical in-law rarely discusses politics (light gray bars), the percentage reporting unhappiness drops by about five percentage points (p < .05, one-tailed test). When the hypothetical in-law discusses politics frequently (dark gray bars), however, there is a larger and statistically significant difference in reported unhappiness (about 10 percentage points). This suggests that many individuals are more averse to disagreeable political discussion in their family than to members of the other party per se. Important patterns emerge when the analysis considers partisan strength. Not surprisingly, weak/leaning respondents in the control group are about 30 percentage points less unhappy about partisan intermarriage than are strong partisans. Differences in treatment effects become apparent as well: while weak/leaning partisans are affected only when the child-in-law discusses politics frequently, strong partisans are more affected when discussion is rare (the decrease is statistically significant in Wave 1 but not Wave 2). The second panel in both graphs illustrates a previously underexplored aspect of affective polarization: happiness with their child marrying someone from their own party. The vast majority of people do not care if their child marries someone from their own party: only about 35 percent of respondents in the control group would be happy if this occurred. That number drops below 30 percent in both treatment groups in both surveys (p < .05, two-tailed tests). The treatment effects are largest among strong partisans. Compared with the control group, both experimental conditions lower strong partisans’ reported happiness with their child marrying a copartisan. Strong partisans are less happy if their child marries someone from their party who rarely talks about politics, presumably because in that situation partisanship is irrelevant. But they are also less happy if their child marries someone from their party who frequently talks about politics. Even strong partisans dislike too much political discussion, even agreeable discussion. POLARIZATION The dependent variable in figure 2 is our best measure of affective polarization. It is a dummy variable coded 1 if a respondent is happy about his/her child marrying a copartisan and unhappy about his/her child marrying an opposing partisan and 0 in all other cases.4 In the control group, about 25 percent of respondents in both Wave 1, January 2016 (top panel), and Wave 2, July/August 2016 (bottom panel), are affectively polarized; that is, they are happier when their child marries someone from the inparty than the outparty. The control group, however, cannot distinguish people emotionally invested in partisanship from those who want to engage in only agreeable political discussion (see Huckfeldt and Mendez [2008]). Figure 2. Subjects who are affectively polarized by partisan strength and treatment. All estimates adjusted using probability weights. Hence, it is important to look at the Rarely treatment.
If a partisan respondent gives polarized responses even when they know they will rarely have to engage in political discussion with the opposing partisan, then that person is affectively polarized. Only about 15 percent of all respondents are affectively polarized in both surveys. That number is less than 10 percent among weak/leaning respondents and about 25 percent among strong partisans. The Online Appendix includes an ordered logit model that estimates an individual’s level of polarization. A few consistent findings emerge from this model. First, strength of partisanship increases polarization. Second, Republicans are more polarized, but only if they do not have a college degree. Third, college graduates are more polarized, but only if they are Democrats. The most important result confirms that previous polarization results are driven by a fear of disagreeable conversation and not pure affect, as the Rarely treatment always reduces polarization. In Wave 2 only, the Local Candidates treatment lowers the level of affective polarization, suggesting that some of the polarization in August is a reaction to a dislike of supporters of Donald Trump and Hillary Clinton and not necessarily the parties themselves. Depending on how one conceptualizes affective polarization, individuals who are responding only to presidential candidates might not be polarized because they are responding to specific political figures and not political groups. STABILITY OF POLARIZATION The final analysis examines the stability of polarization using the measure constructed for figure 2’s analysis. Across all treatments, about 65 percent of respondents had the same level of polarization in both surveys. Of those who gave different responses, 57 percent became more polarized in the summer and 43 percent became less polarized. Table 1’s multinomial logit provides a closer look at these differences. The dependent variable has three categories: (1) the respondent is less polarized in the summer; (2) the respondent is more polarized in the summer; and (3) there is no change in polarization (the reference category). The main independent variables are the treatment indicators, with partisan strength interacted with both the Frequently and Rarely treatments. We control for polarization level in Wave 1. We also include a series of control variables measured by GfK prior to our survey. Inclusion of control variables is necessary to avoid omitted variable bias with the partisan strength variable (Kam and Trussler forthcoming), but no substantive conclusions change if they are omitted (see Online Appendix A3).

Table 1. Predicting changes in polarization levels

                              Less polarized      More polarized
                              b (SE)              b (SE)
Rarely                        0.46 (0.34)         –0.52 (0.27)#
Frequently                    0.23 (0.39)         –0.05 (0.25)
Strong partisan               –0.96 (0.46)*       0.43 (0.32)
Rarely*Strong partisan        0.52 (0.57)         1.07 (0.41)**
Frequently*Strong partisan    1.01 (0.59)#        0.35 (0.40)
Local treatment               0.67 (0.22)**       –0.06 (0.16)
January polarization          1.24 (0.09)**       0.11 (0.09)
Republican                    –0.52 (0.31)        0.31 (0.21)
College degree                –0.56 (0.32)#       0.66 (0.22)**
Republican*College            0.74 (0.46)         –0.92 (0.33)**
Male                          0.49 (0.23)*        0.06 (0.16)
White                         –0.38 (0.44)        0.38 (0.41)
Black                         –0.85 (0.65)        0.06 (0.49)
Hispanic                      –0.86 (0.60)        0.16 (0.48)
Midwest                       –0.31 (0.37)        –0.50 (0.24)*
South                         –0.08 (0.32)        –0.32 (0.21)
West                          0.27 (0.35)         0.03 (0.24)
Constant                      –3.05 (0.65)**      –1.76 (0.44)**
Number of respondents         1,336
A.I.C.                        1914.97

Note.—Estimates from a multinomial logit model. The dependent variable has three categories: (-1) respondent is less polarized in the summer than in January; (0) respondent has no change in polarization levels; and (1) respondent is more polarized in the summer than in January. The no change category is excluded as the reference. All estimates adjusted using probability weights. #p < .10; *p < .05; **p < .01 in two-tailed tests.
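For readers who want to see how such a measure and model can be put together, the sketch below constructs the binary polarization indicator used for figure 2 and the three-category change variable modeled in table 1, then fits a multinomial logit. The file and column names are hypothetical placeholders rather than the authors’ actual variables, the direction of the 1–5 happiness coding is an assumption, and the sketch is unweighted and omits the interaction and demographic terms that the published model in table 1 includes.

import pandas as pd
import statsmodels.api as sm

# Hypothetical panel file; the 1-5 happiness items are assumed to be coded
# 1 = very unhappy ... 5 = very happy.
df = pd.read_csv("tess_panel.csv")

def polarized(data, prefix):
    # 1 if happy (4-5) about a child marrying a co-partisan AND unhappy (1-2)
    # about a child marrying an out-party supporter; 0 otherwise.
    return ((data[prefix + "happy_inparty"] >= 4) &
            (data[prefix + "happy_outparty"] <= 2)).astype(int)

df["polarized_w1"] = polarized(df, "w1_")  # January wave
df["polarized_w2"] = polarized(df, "w2_")  # July/August wave

# Three-category outcome for the table 1 model:
# 0 = no change (reference), 1 = less polarized in summer, 2 = more polarized.
df["change_cat"] = (df["polarized_w2"] - df["polarized_w1"]).map({0: 0, -1: 1, 1: 2})

# Unweighted multinomial logit; the published model also applies probability
# weights and the treatment-by-partisan-strength interactions and controls.
X = sm.add_constant(df[["rarely", "frequently", "strong_partisan",
                        "local_treatment", "polarized_w1"]].astype(float))
fit = sm.MNLogit(df["change_cat"], X).fit()
print(fit.summary())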
Figure 3 presents predicted probabilities by treatment and strength of partisanship. In the left panel, among weak/leaning partisans, respondents in the Control and Frequently conditions were about as likely to become less polarized as to become more polarized, indicating no aggregate change.
In the Rarely treatment, however, respondents were twice as likely to become less polarized as more polarized. Strong partisans, shown in the right panel, were more likely to become more polarized than less polarized in all three treatments. Figure 3. Polarization effects by partisan strength and treatment. Predicted probabilities calculated using values in table 1. Interestingly, subjects who were not in the Local Candidates treatment were more likely to become more polarized. Again, this may suggest that measures of affective polarization often capture dislike for the partisan politics respondents see in the news instead of dislike for citizens who are Democrats or Republicans. This could also explain why weak/leaning partisans became less polarized in the Rarely treatment. Spending time with anyone who will not talk about politics, even someone from the other party, is preferred. Conclusion In this paper, we argue that the extent to which modern Americans are “affectively” polarized may be overstated. Rather, there are two distinct phenomena that are easily conflated: affective polarization and a desire to avoid partisan politics. Many strong partisans are affectively polarized. This can make it appear that everyone is polarized, because ideologically extreme partisans are the most politically engaged (Klar 2013). During the presidential campaign, they became even more polarized. But for many Americans, the weak/leaning partisans, the thought of having to discuss politics even with someone from their own party is unappetizing. It is important to note that these results do not imply that affective polarization has not increased; indeed, it has. Nor do they mean that many people actually like the other party. Rather, scholars are underestimating how much people dislike their own party. In the Online Appendix, we analyze ANES feeling thermometers and find an increasing dislike of the inparty in recent years. These data are consistent with a theory of “negative partisanship” (Abramowitz and Webster 2016), in which individuals support their own party mainly because they dislike the other party, but they are not consistent with affective polarization. The results in this study help improve our understanding of how affective polarization exists alongside weakening partisan identities (Klar and Krupnikov 2016). The implications of these results extend beyond how to measure polarization. Respondents in our surveys appear willing to spend time with individuals with whom they disagree as long as they do not talk about politics. The frequency of disagreement is one of the key variables in the social networks literature (e.g., Huckfeldt, Johnson, and Sprague 2004; Mutz 2006; Mutz and Mondak 2006; Ahn, Huckfeldt, and Ryan 2014). This study further demonstrates the difficulties with the conceptualization and measurement of social network disagreement (Klofstad, Sokhey, and McClurg 2013). Supplementary Data Supplementary data are freely available at Public Opinion Quarterly online. This research was conducted via Time-Sharing Experiments for the Social Sciences (TESS3 197-Klar and TESS3 221-Klar), awarded to S.K. Footnotes 1. As of September 2017, Iyengar, Sood, and Lelkes (2012), who include the measure, have been cited between 150 times (Scopus) and 408 times (Google Scholar). Not all citations are to the measure. 2. GfK recruits panel members using address-based sampling.
Panel members are emailed when they have been assigned to a study, which may be completed online. The first survey’s completion rate was 62 percent; the cumulative response rate, accounting for panel recruitment and retention, was 5.3 percent (Callegaro and DiSogra 2008). For the second survey, GfK contacted 1,921 of the first survey’s respondents (74.3 percent completion rate) and 1,208 new respondents (58.6 percent completion rate). This study had a 6.0 percent cumulative response rate. Pure independents are excluded from analyses. 3. The question order was randomized. 4. We replicate figure 2’s analysis by age cohort in Online Appendix A4.
References
Abramowitz, Alan I., and Steven Webster. 2016. “The Rise of Negative Partisanship and the Nationalization of U.S. Elections in the 21st Century.” Electoral Studies 41:12–22.
Ahn, T. K., Robert Huckfeldt, and John Barry Ryan. 2014. Experts, Activists, and Democratic Politics: Are Electorates Self-Educating? New York: Cambridge University Press.
Almond, Gabriel A., and Sidney Verba. 1963. The Civic Culture: Political Attitudes and Democracy in Five Nations. Princeton, NJ: Princeton University Press.
Bogardus, Emory S. 1926. “Social Distance in the City.” Proceedings and Publications of the American Sociological Society 20:40–46.
Callegaro, Mario, and Charles DiSogra. 2008. “Computing Response Metrics for Online Panels.” Public Opinion Quarterly 72:1008–32.
Dafoe, Allan, Baobao Zhang, and Devin Caughey. 2016. “Confounding in Survey Experiments: Diagnostics and Solutions.” Working Paper. http://www.allandafoe.com/confounding.
Groenendyk, Eric. 2013. Competing Motives in the Partisan Mind: How Loyalty and Responsiveness Shape Party Identification and Democracy. New York: Oxford University Press.
Huckfeldt, Robert, Paul E. Johnson, and John Sprague. 2004. Political Disagreement: The Survival of Diverse Opinions within Communication Networks. New York: Cambridge University Press.
Huckfeldt, Robert, and Jeanette Morehouse Mendez. 2008. “Moths, Flames, and Political Engagement: Managing Disagreement within Communication Networks.” Journal of Politics 70:83–96.
Iyengar, Shanto, Gaurav Sood, and Yphtach Lelkes. 2012. “Affect, Not Ideology: A Social Identity Perspective on Polarization.” Public Opinion Quarterly 76:405–31.
Iyengar, Shanto, and Sean J. Westwood. 2015. “Fear and Loathing across Party Lines: New Evidence on Group Polarization.” American Journal of Political Science 59:690–707.
Kam, Cindy D., and Marc J. Trussler. Forthcoming. “At the Nexus of Experimental and Observational Research: Theory, Specification, and Analysis of Experiments with Heterogeneous Treatment Effects.” Political Behavior. doi:10.1007/s11109-016-9379-z.
Klar, Samara. 2013. “Identity and Engagement among Political Independents in America.” Political Psychology 35:577–91.
Klar, Samara, and Yanna Krupnikov. 2016. Independent Politics: How American Disdain for Parties Leads to Political Inaction. New York: Cambridge University Press.
Klofstad, Casey A., Anand Edward Sokhey, and Scott D. McClurg. 2013. “Disagreeing about Disagreement: How Conflict in Social Networks Affects Political Behavior.” American Journal of Political Science 57:120–34.
Levendusky, Matthew S., and Neil Malhotra. 2016. “(Mis)Perceptions of Partisan Polarization in the American Public.” Public Opinion Quarterly 80:387–91.
Mason, Lilliana. 2015. “‘I Disrespectfully Agree’: The Differential Effects of Partisan Sorting on Social and Issue Polarization.” American Journal of Political Science 59:128–45.
Mutz, Diana C. 2006. Hearing the Other Side: Deliberative versus Participatory Democracy. New York: Cambridge University Press.
Mutz, Diana C., and Jeffery J. Mondak. 2006. “The Workplace as a Context for Cross-Cutting Political Discourse.” Journal of Politics 68:140–55.
Smith, Samantha. 2015. “24 Percent of Americans Now View Both GOP and Democratic Party Unfavorably.” Pew Research Center, August 21. http://www.pewresearch.org/fact-tank/2015/08/21/24-of-americans-now-view-both-gop-and-democratic-party-unfavorably/.
Implications of Moving Public Opinion Surveys to a Single-Frame Cell-Phone Random-Digit-Dial Design 2018 Public Opinion Quarterly
doi: 10.1093/poq/nfy016
Abstract With the share of US adults living in households with cell phones climbing over 90 percent (Blumberg and Luke 2016), survey researchers are considering whether it is necessary to continue sampling landlines in addition to cell phones in RDD surveys. This transition raises a number of questions, including whether it will systematically change public opinion estimates, subgroup estimates, and long-standing trends, as well as the precision of estimates. To address these questions, we analyzed data from Pew Research Center national dual-frame RDD surveys conducted from 2012 to 2015. We compared the final survey estimates to those computed with an experimental weight that excluded the landline sample, thus simulating a single-frame cell-phone design. Analysis of more than 250 survey questions shows that when landlines are excluded, estimates change by less than one percentage point, on average. In addition, responding samples of adults reached via cell phone are much more demographically representative of the United States than responding samples of adults reached via landline, thus requiring less aggressive weighting and yielding lower design effects and smaller margins of error, relative to dual-frame RDD. While most estimates commonly reported from public opinion surveys seem unaffected by this design change, there are some exceptions. While considerations of cost and concerns about maximizing the population coverage rate may continue to favor dual-frame RDD for some time, these results demonstrate that by and large public opinion researchers have little to fear and potentially much to gain when landlines become a thing of the past. Introduction Since the late 2000s, numerous organizations conducting telephone surveys have gradually increased the ratio of cell-phone random-digit-dial (RDD) interviews to landline RDD interviews in response to the public’s shifting telephone usage patterns. Many surveys that featured majority landline interviewing a decade ago now feature majority cell-phone interviewing (Langer 2015; Kaiser Family Foundation 2016; McGeeney 2016; Newport 2016; SSRS 2018). With cell-phone household penetration among US adults climbing over 90 percent and landline penetration declining each year (Blumberg and Luke 2016), this transition is approaching its natural conclusion: RDD designs that abandon landline interviewing altogether.1 While this major shift for the telephone survey industry is near at hand, to date just one peer-reviewed study has explored its implications. Using 2010 data from the CDC’s National Intimate Partner and Sexual Violence Survey (NISVS), Peytchev and Neely (2013) compared that study’s dual-frame estimates to estimates from several simulated designs, including one that used only cell-phone interviews. They found that, depending on the estimate of interest, the variance efficiencies from a single-frame cell-phone design may outweigh the coverage bias from excluding adults not reachable on a cell phone. While that study demonstrated the potential for superior data quality with a single-frame cell-phone design, it did not lay to rest all concerns surrounding this transition. For public opinion researchers, a key question is whether the Peytchev and Neely results, based on an analysis of 16 health and victimization items in a federally sponsored study, generalize to fast-turnaround, low-response-rate opinion surveys measuring attitudes and behaviors on a range of different topics.
The impact of moving to a single-frame cell-phone design is fundamentally a question-level outcome. While it may be safe to exclude landlines when asking some questions, that may not be the case for other questions probing different topics. For example, a study by Reifman and Niehuis (2015) suggests that dropping the landline sample may have a negative effect on the accuracy of likely voter estimates in some low-turnout elections. The accuracy of estimates from single-frame cell-phone samples could differ from those of dual-frame samples for a number of reasons. Perhaps most important is the impact of single-frame designs on coverage of the population of interest. Estimates from the 2015 National Health Interview Survey (NHIS) indicate that 91 percent of US adults live in a household with a cell phone, 6 percent have no cell phone but do have a residential landline, and another 3 percent have no phone (Blumberg and Luke 2016). This means that a dual-frame RDD design covers 97 percent of adults, while a single-frame cell-phone RDD design covers 91 percent. Interestingly, while the cell-phone-only rate has increased steadily in recent years (at about four percentage points a year), the overall share of US adults in households with a cell phone appears to have plateaued, hovering in the range of 90 to 91 percent for the past five years (Blumberg and Luke 2016). The expectation that the coverage provided by a single-frame cell-phone RDD design will soon match that provided by a dual-frame RDD design is not well supported by data. Another potential source of difference is nonresponse. While response rates on landlines and cell phones are very similar (Dutwin and Lavrakas 2016), research has shown that different kinds of people are more comfortable responding to surveys on one type of device or the other (e.g., Kennedy 2007). A third source of possible differences between dual-frame and single-frame cell-phone surveys is measurement error. Published research suggests that there is little difference between landlines and cell phones with respect to measurement error, but there are theoretical reasons for expecting that differences could occur (Kennedy and Everett 2011; Lynn and Kaminska 2013). All of these sources of potential differences between dual-frame and single-frame cell-phone surveys could affect estimates for individual questions, but the impact of this transition extends even further. What impact would moving to a single-frame cell-phone design have on public opinion trends, such as presidential approval, political party affiliation, opinion about gun control, and so on? If a survey transitions to single-frame cell-phone interviewing, will that lead to a disruption in the time series? We expect that the answer depends on the survey estimate of interest and the ratio of cell-phone to landline interviewing used in the survey during the time series. To our knowledge, no published work has examined the consequences for public opinion trends associated with moving to a single-frame cell-phone design. This transition also raises questions about subgroup analyses. Even if full sample estimates (e.g., based on all adults) are unaffected by dropping landlines, this does not mean that subgroup estimates would be unaffected. Cell-phone penetration rates vary across demographic groups and by geography (Blumberg and Luke 2016). 
For subgroups where cell-phone adoption has been slower (e.g., the elderly), discarding landline samples may have a more noticeable and problematic effect on survey estimates. The NISVS study only considered differences by gender, which is not particularly correlated with cell-phone penetration. Finally, moving to single-frame cell-phone interviewing has largely unknown consequences for the estimated design effect and the 95 percent margin of error commonly reported for public opinion surveys. Peytchev and Neely (2013) found that moving to a single-frame cell-phone design may reduce design effects and standard errors, but the vintage of their data is a limitation. Since the 2010 NISVS data were collected, the share of US adults who are cell-phone-only has nearly doubled, from 25 percent to 48 percent (Blumberg and Luke 2016). As a consequence, the variance in weights one would expect under a dual-frame design versus a single-frame cell-phone design stands to be different today than in 2010. In anticipation of the impending transition to single-frame cell-phone surveys, the Pew Research Center has been generating an experimental weight in nearly all of its national RDD surveys since 2012. The experimental weight uses only interviews conducted with the cell-phone sample, simulating a single-frame cell-phone design. In this paper, we bring data from this collection of national telephone surveys to bear on these questions about survey data quality when landlines are no longer sampled. We examine the impact of dropping landlines on estimates for a wide range of topics, ranging from attitudes about abortion, homosexuality, and foreign policy to opinions about online dating, personal finances, and views on the 2016 election. We break out results for key subgroups, including adults age 65 and over who, by virtue of their lower cell-phone adoption rate, are more likely to be affected by this design change. We also present average approximate design effects and 95 percent margins of error with and without the landline sample, so that practitioners can estimate the impact of this transition on survey precision. Data and Methods This analysis draws on 28 Pew Research Center surveys conducted from 2012 to 2015 and one survey from 2008. Different sections of the analysis leverage different surveys from that time period, based on the research question being addressed. All of the surveys analyzed are dual-frame landline and cell-phone RDD surveys of US adults conducted by a pair of firms (PSRAI or Abt SRBI) using protocols that varied very little over time, aside from the ratio of landline to cell-phone interviews. Most of these surveys employ a design similar to that used by other prominent public polls, such as those conducted by major newspapers, national television networks, and many university survey centers. Such surveys typically have very short field periods (e.g., four or five days), with multiple calls to each sampled number (Pew Research Center uses a six-call design). Within-household selection in the landline sample entails asking for the youngest adult male/female (randomly assigned) 18 years of age or older who is currently at home. Cell phones are treated as personal devices; an interview is attempted with the adult who answers the phone.2 Each survey featured two weights: the production weight (the regular weight used in all reporting of the survey) and an experimental weight. 
The production weight was based on the total sample of landline and cell-phone interviews and was computed in two main stages. The first stage used a single-frame estimation approach (Bankier 1986; Kalton and Anderson 1986) to correct for different probabilities of selection associated with the number of adults in each household as well as the overlapping landline and cell sample frames. Single-frame estimation treats all the respondents as having been sampled from one frame, with adjusted weights for cases in the intersection domain (e.g., adults with both a landline and cell phone) relying on the inclusion probabilities for each frame. The second stage of weighting used raking to balance sample demographics to population parameters. The responding sample is balanced to match national population parameters for sex, age, education, race, Hispanic origin, region (U.S. Census definitions), population density, and telephone usage. Univariate marginal distributions are used as parameters for region, population density, and phone status. Other raking parameters are cross-classified: age by education, age by sex, education by sex, and race by Hispanic ethnicity. Hispanic ethnicity was cross-classified with nativity (US-born Hispanics versus non-US-born Hispanics). The white, non-Hispanic subgroup is also balanced on age, education, and region. The basic weighting parameters come from the most recently available one-year American Community Survey data (typically available about eight months after data collection for a given year is complete). The population density parameter was derived from the 2010 Decennial Census; based on their county of residence, respondents are sorted by density and grouped into quintiles for the raking. The telephone usage parameter was derived from the most recent published estimates from the NHIS. The experimental weight was computed using the same procedures as the full sample weight, except that there was no first-stage adjustment and a phone use parameter was not included in the raking. No first-stage adjustment was computed because no within-household selection was used with cell-phone cases and discarding the landline sample eliminated the sampling frame overlap. The response rate (AAPOR3) in these surveys ranged from 8 percent to 12 percent in the landline sample and 6 percent to 11 percent in the cell-phone sample (see Online Appendix). These rates are similar to those of polls conducted by other major media and opinion research organizations (Dutwin and Lavrakas 2016). Like many telephone survey organizations in the United States, the Pew Research Center has increased the share of interviews conducted on cell phones over time in response to shifting telephone service patterns documented in the NHIS (Blumberg and Luke 2016). In typical Pew surveys, this proportion has increased from 25 percent in 2007 to 75 percent in 2016. This means that any differences between cell-phone sample estimates and total sample estimates would be expected to narrow over time, as the former constitutes an increasingly large share of the latter. Given that this type of transition is common among RDD public opinion polls, we treat this as a natural feature of the data examined. No attempt is made to “control” for this change over time in the analysis of weighted estimates. 
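To make the second-stage adjustment concrete, the sketch below implements a bare-bones raking (iterative proportional fitting) loop in Python: starting weights are repeatedly adjusted so that the weighted marginal distribution of each raking variable matches a population target. This is a minimal sketch with hypothetical file and variable names and only two univariate margins (sex and age group, using ACS-style shares); the production weights described above use many more parameters, including cross-classified margins, and, for the dual-frame weight, a prior first-stage selection adjustment.

import numpy as np
import pandas as pd

def rake(df, base_weight, targets, max_iter=50, tol=1e-6):
    # Iteratively adjust weights so that the weighted marginal distribution of
    # each variable in `targets` matches the target proportions.
    w = df[base_weight].to_numpy(dtype=float)
    for _ in range(max_iter):
        max_adj = 0.0
        for var, dist in targets.items():
            # current weighted share of each category of this raking variable
            current = pd.Series(w).groupby(df[var].to_numpy()).sum()
            current = current / current.sum()
            # multiply each respondent's weight by target share / current share
            factors = df[var].map({c: dist[c] / current[c] for c in dist})
            w = w * factors.to_numpy(dtype=float)
            max_adj = max(max_adj, float(np.abs(factors.to_numpy() - 1).max()))
        if max_adj < tol:
            break
    return w * len(w) / w.sum()  # rescale so weights average 1

# Hypothetical cell-phone-only respondent file; every respondent is assumed to
# fall into one of the target categories below.
resp = pd.read_csv("cell_sample.csv")
resp["base_w"] = 1.0  # cell-only design: no frame-overlap or within-household adjustment
targets = {
    "sex": {"male": 0.48, "female": 0.52},
    "agegrp": {"18-29": 0.22, "30-49": 0.34, "50-64": 0.26, "65+": 0.19},
}
resp["raked_w"] = rake(resp, "base_w", targets)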
Results DEMOGRAPHIC PROFILE OF RESPONDENTS Before turning to the substantive findings, it is instructive to examine historical RDD data to see how moving to a single-frame cell-phone design could affect responding sample composition, specifically with respect to the demographic profile of respondents. While responding cell-phone samples have become more representative of the US public, the trend with responding landline samples has been just the opposite (see Figure 1). In 2008 (the year Pew Research Center began routinely using dual-frame samples), both responding cell-phone and responding landline samples had lopsided age distributions, albeit in different directions. In a typical 2008 Pew Research Center survey (from September), the unweighted responding cell-phone sample underrepresented older adults—9 percent were age 65 and older, compared with an American Community Survey (ACS) benchmark of 19 percent—and overrepresented young adults, as 31 percent were age 18 to 29, compared with an ACS benchmark of 21 percent. At the same time, the responding landline sample was underrepresenting young adults (with only 9 percent in this age group) while overrepresenting older adults (25 percent). Figure 1. Unweighted age profile of cell phone and landline RDD samples in 2008 and 2015. “Don’t know” and refused responses are not shown. Data source: Pew Research Center surveys conducted September 9–14, 2008, and September 22–27, 2015. The story is quite different today, as cell-phone use has grown among all age groups. In a September 2015 survey, the unweighted age distribution of the responding cell-phone sample was very close to ACS population benchmarks (Table 1), including for those age 65 and older (16 percent in the cell-phone sample versus 19 percent in ACS). Just as the improvement in responding cell-phone samples has been rapid and dramatic, so has the deterioration of responding landline samples. From 2008 to 2015, the proportion of adults interviewed on landlines who are age 50 or older has ballooned from 58 percent to 76 percent.

Table 1. Demographic profile of US adults, cell-phone RDD sample respondents, and landline RDD sample respondents

                                 US adult benchmarks (%)   Cell-phone sample (%)   Landline sample (%)
                                 (weighted)                (unweighted)            (unweighted)
White, non-Hispanic              65                        64**                    82
Hispanic                         15                        14**                    6
Black, non-Hispanic              12                        11**                    6
Other, non-Hispanic              8                         9**                     3
18–29                            22                        21**                    6
30–49                            34                        34**                    17
50–64                            26                        28**                    35
65+                              19                        16**                    41
Male                             48                        56**                    45
Female                           52                        44**                    55
High school grad or less         41                        30                      30
Some college/associate degree    31                        28                      26
Bachelor’s degree or more        28                        41                      44
Unweighted n                     2,403,157                 977                     525

Note.—“Don’t know” and refused responses are not shown.
Source.—Cell-phone and landline sample data are from a Pew survey conducted September 22–27, 2015; US adult benchmarks are from the 2014 American Community Survey; some percentages do not equal 100 due to rounding; asterisks reflect significance of the difference of proportions z-test for the cell-phone sample estimate versus the landline sample estimate: **p < .01. There are also stark differences between sample types on race and ethnicity. In the survey fielded in September 2015, about one in 10 cell-phone respondents (11 percent) were black and about one in seven (14 percent) were Hispanic. These figures closely mirror the actual size of these groups in the US adult population (12 percent and 15 percent, respectively). The responding landline sample, by contrast, skewed heavily white non-Hispanic (82 percent) relative to the adult population (65 percent). While responding cell-phone RDD samples look quite representative on age and race/ethnicity, they perform less well on other dimensions. For one thing, they tend to skew male. The responding cell-phone sample in the September 2015 survey was 56 percent male, 44 percent female, falling short of the ACS benchmark for women by eight percentage points. This type of result is fairly common in cell-phone RDD samples (Kennedy 2007; Link et al. 2007). The mechanism(s) behind this gender pattern are not well understood.3 Education is a less mysterious but no less important challenge with both cell-phone and landline samples. Both types of samples underrepresent adults with lower levels of educational attainment. Adults with a high school education or less constitute 41 percent of the public but less than a third of the responding landline and cell-phone samples (30 percent for each sample type) in the September 2015 survey. To address the gender and education disparities, as well as other demographic imbalances, Pew Research Center and other major survey organizations use weighting, in particular techniques such as raking, to align the survey sample to the population benchmarks.
KEY PUBLIC OPINION TRENDS If moving to a single-frame cell-phone design disrupts public opinion trends, this would pose a problem for the rich trove of RDD time series collected during the past several decades. To assess this risk, we examined two key opinion trends for which we have both total sample (landline and cell-phone) design and simulated single-frame cell-phone design estimates. Figure 2 shows the 2012 to 2015 trend for the percent of US adults identifying as Republican or leaning Republican and the trend for Barack Obama’s presidential job approval rating. These two outcomes in particular were examined because they are among the few constructs measured routinely in Pew Research Center political surveys and they are widely used by opinion researchers as foundational measures of public sentiment. Figure 2. Trends in Republican Party affiliation (left panel) and presidential approval (right panel) based on estimates using dual-frame RDD (solid line) versus the cell-phone sample only (dotted line). The trend lines produced with the two different designs largely overlap for both outcomes. In April 2012, the share of US adults identifying as Republican or leaning Republican was nearly identical when computed using the total sample (39 percent) versus just the cell-phone sample (40 percent). The result was highly similar in a September 2015 survey (41 percent for the total sample versus 42 percent for the cell-phone sample). The contours of the trends also appear to be largely unaffected by excluding landlines. Both the total sample estimates and the cell-phone sample estimates show Obama’s job approval numbers increasing from the mid-40s to the low 50s in 2012, dipping in 2013–2014, and then rebounding slightly in 2015. These results suggest that for organizations that have been increasing the share of cell-phone interviews over time, a shift to a single-frame cell-phone design seems unlikely to disrupt such time series. TESTING OVER 250 SURVEY QUESTIONS While it may be safe to exclude landlines when asking some questions, that may not be the case for other questions probing different topics. To address this, we examined 278 questions from eight national dual-frame RDD surveys conducted between September 2014 and September 2015. The studies were selected from a broad pool to represent a wide range of topics, ranging from attitudes about abortion, homosexuality, and foreign policy to opinions about online dating, personal finances, and views on the 2016 election. We could have included many more surveys on politics, for example, but these would be largely redundant with respect to many of the measures already covered in the eight studies chosen. A complete listing of the questions is provided in the Online Appendix. For each question, we computed the difference between the weighted estimate based on the total sample (landline and cell phone) and the weighted estimate based on the cell-phone sample alone. The average difference (absolute value) between these two estimates was 0.75 points. The vast majority of estimates (87 percent) showed either a 0- or 1-point difference.
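As a rough sketch of this question-by-question comparison: for each item, compute the weighted share giving a particular response under the production (dual-frame) weight and under the experimental (cell-phone-only) weight, then summarize the absolute differences. The file, column, and item names below are hypothetical placeholders rather than the actual Pew variables, and significance testing (discussed next) would additionally need to account for the design effects and the overlap between the two samples.

import pandas as pd

def weighted_share(data, item, weight, category=1):
    # Weighted proportion giving `category` on `item`, among valid responses.
    valid = data[item].notna()
    w = data.loc[valid, weight]
    return (w * (data.loc[valid, item] == category)).sum() / w.sum()

df = pd.read_csv("survey.csv")      # hypothetical respondent-level file
items = ["q1", "q2", "q3"]          # stand-ins for the 278 questions

diffs = []
for item in items:
    p_dual = weighted_share(df, item, "production_weight")            # full dual-frame sample
    p_cell = weighted_share(df[df["frame"] == "cell"], item,
                            "experimental_weight")                    # cell-phone sample only
    diffs.append(abs(p_dual - p_cell) * 100)                          # percentage points

print(pd.Series(diffs, index=items).describe())  # mean absolute difference, etc.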
For each of the 278 pairs of estimates, we computed a difference of proportions test accounting for the overlap in the samples being compared as well as the approximate design effect from weighting. Some 12 survey estimates, less than 5 percent of the total number tested, were significantly different at the .05 level when based on the total sample weight versus the single-frame cell-phone design weight.4 The differences that passed the significance threshold do not exhibit much in the way of a substantive pattern. In one survey, the estimated share of adults saying the Democratic Party was doing an “excellent” job of standing up for its traditional positions was 8 percent with the total sample weight versus 10 percent with the single-frame cell-phone weight. In another survey, the estimated share of adults in favor of building a fence along the entire border with Mexico was 46 percent with the total sample weight versus 49 percent with the single-frame cell-phone weight. Overall, the differences between the two sets of estimates tended to be quite small and did not reveal a systematic pattern in terms of the attitudes or behaviors measured. While this analysis tested a large number of questions typical of those asked in public opinion surveys, it does not cover all possible survey domains. For example, questions about health, transportation, or employment were not covered in the source surveys and therefore were not included in the analysis. As a result, researchers should be cautious about generalizing results from this study to other fields of inquiry. SUBGROUP ESTIMATES Examining the effect of excluding landlines on subgroup estimates is challenging because sample sizes for key subgroups are often relatively small and, therefore, subject to a fair amount of noise due to sampling error. For example, a Pew Research Center survey in September 2015 featured 1,502 total interviews, but the cell-phone sample alone had only 106 interviews with non-Hispanic blacks, 141 with Hispanics, and 161 with adults age 65 and over. These sample sizes do not provide strong statistical power to detect differences in total sample versus cell-phone sample estimates. Our solution was to perform the subgroup analysis using two very large national dual-frame RDD surveys. The 2014 political polarization and typology survey featured 10,013 interviews (5,010 on landlines and 5,003 on cell phones). The 2015 survey on government featured 6,004 interviews (2,113 on landlines and 3,891 on cell phones). These studies have large enough subgroup sample sizes to support reliable tests of differences when simulating the exclusion of landlines. They also feature some overlap in questionnaire content. To reduce the risk of type I error in our conclusions, we focused the analysis on questions asked in both of these large surveys. If dropping the landline sample produced a change in a subgroup estimate, we would have more confidence that the pattern is robust if it occurred in both surveys rather than just one. Table 2 shows comparisons for several variables.
Table 2. Percentage point differences between weighted subgroup estimates under simulated single-frame cell-phone design and weighted subgroup estimates using dual-frame design

Subgroup                                   All adults  Men    Women  Age 18–29  Age 30–49  Age 50–64  Age 65+  White non-Hisp.  Black non-Hisp.  Hispanic
2015 Survey on Government
Republican/lean Republican                  0           0      0     –1          0         –1          2        0               –2*              1
Consistently/mostly conservative            0           0      0      0          0          1          1        0               –1               0
Registered voter                           –1**        –1*    –2**   –2**        0         –2**        0       –1               –2              –1
Internet user                               3**         1**    4**    0          1*         1*        11**      3**              4**             0
Protestant                                  0           0      0      0          2*        –1          0        0               –2               0
Attend religious services weekly or more   –1*         –1*    –1      0          0         –1         –3*      –1               –3*             –1
Family income <$30,000                      0           0     –1      0          1          0         –3*      –1**              2               0
Unweighted cell-phone interviews            3,891       2,204  1,687  848        1,281      1,069      636      2,564            401             563
2014 Political Polarization and Typology Survey
Republican/lean Republican                 –1           0     –1      0         –1         –1          1       –1                0               1
Consistently/mostly conservative           –1          –1     –1      0         –1         –1          2       –1               –1               1
Registered voter                           –2**        –2**   –3**    0         –3**       –3**       –2       –2**             –2              –2
Internet user                               2**         1**    3**    0          0          1         10**      2**              2               3**
Protestant                                  0          –1*     2*     0          0          1          2        1               –2               0
Attend religious services weekly or more   –2**        –2**   –2*     0         –1         –2         –5**     –2**             –1              –2
Family income <$30,000                      2**         1      4**    0          4**        2*         2        2**              4**             2
Unweighted cell-phone interviews            5,003       2,876  2,127  1,208      1,598      1,363      787      3,289            588             615

Note.—The consistently/mostly conservative estimate is based on an ideological consistency scale created from 10 questions about political values and positions. Significance of the difference of proportions z-test for the simulated single-frame cell-phone design estimate versus the dual-frame estimate, accounting for sample overlap and multiple weights: *p < .05, **p < .01.
The analysis indicates that the effect on major subgroup estimates from dropping the landline sample tends to be small and is often inconsistent. For example, in the 2015 survey on government, the estimated share of Hispanics identifying as Republican or leaning Republican is 26 percent under the dual-frame design versus 27 percent based on the simulated single-frame cell-phone design. Similarly, on a 10-item index measuring ideological consistency,5 the estimated share of black non-Hispanics classified as consistently or mostly conservative is 8 percent under the dual-frame design versus 7 percent based on the simulated single-frame cell-phone design. That aside, not all of the differences are minor. When landline interviews are excluded, the estimated share of adults age 65 and older who use the internet increases by at least 10 percentage points in both surveys. Dropping the landline interviews also reduces the estimated share of adults attending religious services weekly or more often.
This is true for most of the major subgroups, but the change is largest for those age 65 and over. CHARACTERISTICS OF DUAL USERS To further understand why there is generally little effect on weighted public opinion estimates from dropping the landline sample, we examined cases in the overlap. In the context of dual-frame RDD, the overlap refers to the fact that some people have both landlines and cell phones, meaning they could be sampled on either. In each Pew Research Center telephone survey, landline respondents are asked if they have a working cell phone and cell-phone respondents are asked if they have a working landline. Respondents who report having both types of phone service are known as dual users. In thinking about excluding the landline sample in future surveys, it is worth considering how well the dual users reached by cell phone represent those reached by landline. Data for this analysis come from the 2015 survey on government mentioned above, which featured 6,004 interviews total, including 1,767 interviews with dual users reached on a landline and 1,590 dual users reached on a cell phone (Table 3). Upon initial examination, dual users from the cell-phone sample look to be poor proxies for those reached by landline. Cell-phone sample dual users are significantly younger, more male, and more racially diverse relative to their landline sample counterparts—findings that all comport with earlier research on dual users by Kennedy (2007). These groups also differ on key attitudinal and behavioral variables. For example, 37 percent of dual users from the cell-phone sample report attending religious services weekly or more often, versus 44 percent among dual users reached by landline. Dual users from the cell-phone sample are also less likely to consider themselves Republicans relative to their counterparts from the landline sample. On a bivariate basis, the unweighted difference between the groups of dual users was statistically significant on each of seven key variables examined.

Table 3. Unweighted demographic profile of dual users by sample type

                                  Dual users reached on landline (%)   Dual users reached on cell phone (%)
White, non-Hispanic               78**                                 73
Hispanic                          7**                                  9
Black, non-Hispanic               8*                                   10
Other, non-Hispanic               5                                    6
18–29                             5**                                  13
30–49                             18**                                 27
50–64                             33                                   33
65+                               43**                                 25
Male                              46**                                 55
Female                            54**                                 45
High school grad or less          26                                   23
Some college/associate’s degree   29                                   29
Bachelor’s degree or more         45*                                  48
Unweighted n                      1,767                                1,590

Source.—Pew Research Center survey conducted August 27–October 4, 2015. Some percentages do not equal 100 due to rounding. Significance of the difference of proportions z-test for the landline sample dual-user estimate versus the cell-phone sample dual-user estimate: *p < .05, **p < .01.
Upon closer examination, however, the differences on key attitudinal and behavioral outcomes are largely explained by the demographic differences between landline and cell-phone respondents discussed above. We regressed each of the seven outcomes using a model that controlled for age, sex, race/ethnicity, and education. Table 4 shows these results. Depending on the nature of the variable, a linear, ordinal, or binary logistic model was used. In each model, the outcome was estimated using only dual users and regressed on sample type (landline vs. cell phone), age, sex, race/ethnicity, and education. If the effect of sample type, which was significant in each of the bivariate analyses, remained significant in the multivariate regression analysis, this would suggest that dual users from the landline sample were distinct from those reached by cell phone—even after controlling for the known demographic differences.

Table 4. Attitudinal differences between dual users reached by landline versus cell phone, and whether the differences are explained by demographics

                                           Dual users reached in the ...                 Is the difference significant ...
                                           Landline sample (%)  Cell-phone sample (%)  Diff (%)  Without controlling for demographics?  Controlling for demographics?
Republican/lean Republican                 49                   46                     3         No                                     No
Consistently/mostly conservative           38**                 33                     5         Yes                                    No
Registered voter (“certain”)               89**                 84                     5         Yes                                    No
Use the internet at least occasionally     89**                 95                     –6        Yes                                    No
Protestant                                 54**                 47                     7         Yes                                    No
Attend religious services weekly or more   44**                 37                     7         Yes                                    No
Annual HH income < $30,000                 21*                  17                     4         Yes                                    Yes

Note.—Estimates are unweighted. The multivariate analysis (controlling for demographics) consists of linear, ordinal, or logistic regression modeling, depending on the nature of the outcome variable. Each model regressed the outcome on the sample type (landline or cell phone) as well as four demographics: sex, age, education, and race/ethnicity. Source.—Pew Research Center survey conducted August 27–October 4, 2015. Significance of the difference of proportions z-test for the landline sample dual-user estimate versus the cell-phone sample dual-user estimate: *p < .05, **p < .01.
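As a rough sketch of how one of these multivariate checks might look for a binary outcome, the code below fits an unweighted logistic regression among dual users, with sample type plus the four demographic controls as predictors. The file name, variable names, and category codings are hypothetical placeholders rather than the actual Pew variables, and the real analysis also used linear and ordinal models depending on the outcome.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical respondent-level file with a dual-user flag and recoded demographics.
dual = pd.read_csv("gov2015.csv").query("dual_user == 1")

# Binary outcome (e.g., attends religious services weekly or more) regressed on
# sample type plus sex, age group, education, and race/ethnicity, all categorical.
model = smf.logit(
    "attend_weekly ~ C(sample_type) + C(sex) + C(agegrp) + C(educ) + C(raceeth)",
    data=dual,
).fit()
print(model.summary())

# If the sample-type coefficient loses significance once the demographics are added,
# the bivariate landline/cell difference is consistent with being driven by the
# demographic composition of the two groups (the pattern reported in table 4).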
As it turns out, for six of the seven outcomes the effect associated with sample type is no longer statistically significant once the demographics are controlled. This suggests that the dual users reached by cell phone are reasonable proxies for those reached by landline, after accounting for the demographic differences that are adjusted for in the survey weighting. Household income is the one estimate in this analysis where the difference by sample type is not fully explained by demographics: dual users reached by cell phone tend to have higher incomes than those reached by landline, even after controlling for sex, age, education, and race/ethnicity.

DESIGN EFFECT AND MARGIN OF ERROR

The effect on the precision of estimates is another important consideration in the transition to a single-frame cell-phone design. For this analysis, we examined 27 national surveys of adults conducted by Pew Research Center between 2012 and 2015, a window chosen to capture a longer period of transition as both the share of the population that is cell-phone-only and the share of interviews conducted by cell phone steadily increased. We computed two approximate design effect6 values for each survey: one using the total sample weight based on the dual-frame design and another using the experimental weight based on the single-frame cell-phone design. The approximate design effect is a useful metric for several reasons.
It summarizes the penalty in the precision of estimates stemming from the choice between a dual-frame and a single-frame design. In addition, the approximate design effect is well suited for comparison across surveys because, unlike the margin of error, it is not a function of the number of interviews.

Between 2012 and 2015, the average approximate design effect of the weight using just the cell-phone sample was 1.22, compared with an average of 1.32 using the total sample weight based on both the landline and cell-phone samples. For a hypothetical survey with 1,000 interviews, this difference translates into a margin of error at the 95 percent confidence level of +/–3.4 percentage points with the single-frame cell-phone design versus +/–3.6 points with the dual-frame design. In terms of statistical power, the single-frame cell-phone design would provide roughly 60 more effective cases than the dual-frame design, a finding similar to those reported in Peytchev and Neely (2013). In other words, all else being equal, single-frame cell-phone designs tend to yield a larger effective sample size than dual-frame RDD designs (a computational sketch follows Table 5).

One limitation of figure 3 is that it gives no indication of how much of the variance in the total sample weight is attributable to the dual-frame design versus the raking used to adjust for differential nonresponse across subgroups. The dual-frame design necessitates two weighting adjustments in the Pew Research Center protocol: an adjustment for differential probabilities of selection between frame segments under the single-frame estimation approach, and an adjustment for the within-household selection procedure used in landline interviews. The single-frame cell-phone design avoids the loss in precision stemming from those adjustments. Table 5 reports the approximate design effect associated with the individual adjustments made for the total sample weight, using the September 2015 survey; results are highly similar for the other surveys analyzed. The table shows that the adjustment for probability of selection in the frame segment and the raking procedure carry similar design effects (1.27 and 1.25, respectively). By comparison, the within-household adjustment carries a small design effect (1.07). These findings lend empirical support to the concern that adjusting dual-frame surveys for factors such as differential probabilities of selection and within-household selection reduces their overall precision relative to single-frame cell-phone designs.

Table 5. Decomposition of the design effect in a national dual-frame RDD poll

Weight component                                                  Approximate design effect
(a) Adjustment for probability of selection in frame segment      1.27
(b) Adjustment for within-household selection                     1.07
(c) Adjustment for raking                                         1.25
Final total sample weight                                         1.32

Source.—Pew Research Center survey conducted September 22–27, 2015, with n = 1,502 interviews (65 percent cell phone, 35 percent landline). Approximate design effect values were computed using the Kish (1992) formula. Results from other polls examined in this study are highly similar.
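The arithmetic behind these comparisons is straightforward. Following footnote 6, the approximate design effect is 1 plus the squared coefficient of variation of the survey weights (Kish 1992); dividing the nominal sample size by the design effect gives the effective sample size, and the margin of error scales with the square root of the design effect. The Python sketch below shows the computation and reproduces the +/–3.4 versus +/–3.6 point comparison for a hypothetical 1,000-interview survey, plugging in the average design effects reported above rather than actual respondent weights.

    import numpy as np

    def approx_design_effect(weights):
        """Kish (1992) approximation: 1 + squared coefficient of variation of the weights."""
        w = np.asarray(weights, dtype=float)
        cv = w.std() / w.mean()
        return 1.0 + cv ** 2

    def moe_95(deff, n, p=0.5):
        """95% margin of error for a proportion p, inflated by the design effect."""
        return 1.96 * np.sqrt(deff * p * (1 - p) / n)

    # With respondent-level weights in hand, one would call approx_design_effect(weights);
    # here we use the average design effects reported in the text.
    n = 1000
    for label, deff in [("single-frame cell", 1.22), ("dual-frame", 1.32)]:
        print(f"{label}: effective n = {n / deff:.0f}, "
              f"MOE = +/-{100 * moe_95(deff, n):.1f} points")
    # single-frame cell: effective n = 820, MOE = +/-3.4 points
    # dual-frame:        effective n = 758, MOE = +/-3.6 points

The roughly 60-case gap in effective sample size (820 versus 758) is the "statistical power" difference described in the text.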
Figure 3. Approximate design effect in Pew Research Center surveys from 2012 to 2015: the left panel compares dual-frame RDD (solid line) with the cell-phone sample only (dotted line), and the right panel decomposes the design effect. Figures plotted are six-month rolling averages.

Discussion

In this study, we have brought a large amount of data to bear on the consequences, for public opinion surveys, of moving from a dual-frame landline and cell-phone RDD design to one that uses just a cell-phone RDD sample. For analyses based on all adults, this transition appears to have little impact on the vast majority of survey estimates examined, provided that the dual-frame design being compared already features a sizable share of interviewing by cell phone (and if it does not, it is likely to have significant biases and/or a large design effect). Analysis of more than 250 survey questions shows that when landlines are excluded and the cell-phone sample is reweighted, estimates change by less than one percentage point, on average. Similarly, estimates for most major subgroups tend to show only small, inconsistent changes when the landline sample is excluded.

Perhaps the most compelling reason to consider moving to a single-frame cell-phone RDD design is that cell-phone RDD samples are now quite representative of the US public on a number of key dimensions, particularly age, race, and ethnicity. Landline samples, by contrast, increasingly skew very old and white relative to the public. Because samples of adults reached by cell phone are much more demographically representative of the United States than samples reached by landline, replacing landline interviews with cell-phone interviews reduces the extent to which the survey data need to be weighted to represent US adults. This, in turn, improves the precision of survey estimates. Other factors contributing to the precision advantage of single-frame cell-phone designs have to do with how survey weights are computed.
When a survey has both a landline sample and a cell-phone sample, the weight must include an adjustment for the fact that people with both landlines and cell phones could have been reached in either sample and thus have a higher chance of selection than adults with just one type of phone.7 This adjustment for the overlap in the sampling frames increases the variability of the weights and, in turn, the design effect. Surveys that sample only cell-phone numbers do not need such an adjustment and avoid the penalty in precision.

Dropping the landline sample also arguably eliminates the need for a weighting adjustment for respondent selection. Landlines are generally treated as a household-level device: when interviewers dial landlines, they typically select one adult to interview from among all the adults in the household, and researchers adjust for this by weighting landline interviews up in proportion to the number of adults in the household. Cell-phone samples by and large do not feature this adjustment, because most survey researchers treat cell phones as a person-level rather than a household-level device (Link et al. 2007; Battaglia, Frankel, and Mokdad 2008; Langer 2015; Kaiser Family Foundation 2016; SSRS 2018; Newport 2016; Pew Research Center 2016).8 Because cell-phone surveys require fewer weighting corrections than dual-frame surveys, they achieve greater overall precision.

The analysis presented here indicates that migrating to single-frame cell-phone samples will have minimal impact on substantive survey estimates. Although dual-frame designs have a six-percentage-point advantage in coverage over single-frame cell-phone designs, large differences in estimates are observed for only a few questions among small subgroups, such as adults age 65 and older. This suggests that survey designs should be carefully tailored for surveys focused on populations with lower levels of cell-phone adoption.

Any analysis of survey design must also consider costs. Landline interviewing remains less expensive on a per-interview basis than cell-phone interviewing for a number of reasons covered in other sources (AAPOR Cellphone Task Force 2008, 2010), although there is evidence that landline and cell-phone interviewing costs are converging (AAPOR Cellphone Task Force 2017, appendix E). Moreover, even with a higher per-interview cost, cell-phone samples may provide larger effective sample sizes than dual-frame samples with the same number of interviews. A change in the regulatory environment could make cell-phone RDD interviewing even more cost competitive, but it is impossible to know if or when such changes might occur.

When survey researchers first noticed the impact of cell-phone adoption on the composition of their landline samples in the early 2000s, there was considerable concern about how the change could disrupt this sector of the survey research field. In fact, cell phones have arguably turned out to be a boon to the field, despite the higher cost of interviewing by cell phone. Market forces have made cell phones much more affordable than landlines for low-income people and minorities, expanding overall telephone coverage. The rapid adoption of cell phones by segments of the population that have been harder for conventional surveys to reach (younger adults in particular) has improved the representation of these groups in surveys.
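To make the two dual-frame adjustments described earlier in this discussion concrete, the following sketch applies them to a toy respondent file. The data frame, column names, and the simple halving of dual users’ weights are illustrative assumptions only, not Pew’s actual weighting protocol, which uses the single-frame estimation approach plus raking.

    import pandas as pd

    # Toy respondent file; all column names are hypothetical stand-ins.
    df = pd.DataFrame({
        "sample":       ["landline", "landline", "cell", "cell"],
        "dual_user":    [True, False, True, False],
        "adults_in_hh": [2, 1, 1, 1],   # only relevant for landline interviews
    })

    # Placeholder base weight (a real design would use the inverse probability
    # of the phone number's selection within its frame).
    df["weight"] = 1.0

    # (a) Overlap adjustment: dual users could be sampled from either frame, so a
    # simple correction halves their weight to offset the doubled chance of selection.
    df.loc[df["dual_user"], "weight"] *= 0.5

    # (b) Within-household adjustment: landline interviews select one adult per
    # household, so weight them up by the number of adults in the household.
    ll = df["sample"] == "landline"
    df.loc[ll, "weight"] *= df.loc[ll, "adults_in_hh"]

    # A cell-only design skips both adjustments, which is one source of its
    # smaller design effect (see Table 5).
    print(df)

Both adjustments add variability to the weights, which is exactly the precision penalty that Table 5 decomposes.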
At the same time, landline samples have become much less representative of the demographic profile of US adults on key measures such as ethnicity and age. These biases require substantial weighting to align the samples with population parameters, increasing the design effect and reducing precision in the process. For researchers looking to survey the US public in all its diversity, cell-phone RDD has proven to be a very effective, if somewhat more expensive, methodology. Relative to current dual-frame designs that field a large share of interviews by cell phone, the data presented here indicate that moving to a single-frame cell-phone design would not meaningfully change most estimates.

Supplementary Data

Supplementary data are freely available at Public Opinion Quarterly online.

The authors are grateful to Claudia Deane, Rachel Weisel, David Kent, and Travis Mitchell for their contributions to this study. They also thank the editors and three anonymous reviewers for helpful comments.

References

AAPOR Cellphone Task Force. 2008. “Guidelines and Considerations for Survey Researchers When Planning and Conducting RDD and Other Telephone Surveys in the U.S. with Respondents Reached via Cellphone Numbers.” Available at http://www.aapor.org/AAPOR_Main/media/MainSiteFiles/Final_AAPOR_Cell_Phone_TF_report_041208.pdf. Accessed April 17, 2018.

AAPOR Cellphone Task Force. 2010. “New Considerations for Survey Researchers When Planning and Conducting RDD Telephone Surveys in the U.S. with Respondents Reached via Cellphone Numbers.” Available at http://www.aapor.org/AAPOR_Main/media/MainSiteFiles/2010AAPORCellphoneTFReport.pdf. Accessed April 17, 2018.

AAPOR Task Force. 2017. “The Future of U.S. General Population Telephone Survey Research.” Available at http://www.aapor.org/getattachment/Education-Resources/Reports/Future-of-Telephone-Survey-Research-Report.pdf.aspx. Accessed April 17, 2018.

Bankier, Michael D. 1986. “Estimators Based on Several Stratified Samples with Applications to Multiple Frame Surveys.” Journal of the American Statistical Association 81:1074–79.

Battaglia, Michael P., Martin R. Frankel, and Ali H. Mokdad. 2008. “Statistical Challenges Facing Cellphone Surveys.” Proceedings of the Survey Research Methods Section, pp. 883–90.

Blumberg, Stephen J., and Julian V. Luke. 2016. “Wireless Substitution: Early Release of Estimates from the National Health Interview Survey, July–December 2015.” National Center for Health Statistics. Available at http://www.cdc.gov/nchs/nhis.htm.

Brick, J. Michael, W. Sherman Edwards, and Sunghee Lee. 2007. “Sampling Telephone Numbers and Adults, Interview Length, and Weighting in the California Health Interview Survey Cell Phone Pilot Study.” Public Opinion Quarterly 71:793–813.

Dutwin, David, and Paul Lavrakas. 2016. “Trends in Telephone Outcomes, 2008–2015.” Survey Practice 9(3). Available at http://www.surveypractice.org/index.php/SurveyPractice/article/view/346/html_62. Accessed June 14, 2017.

Jiang, Charley, James M. Lepkowski, Tuba Suzer-Gurtekin, Michael Sadowsky, Richard Curtin, Rebecca McBee, and Dan Zahs. 2015. “Transition from Landline-Cell to Cell Frame Design: Surveys of Consumers.” Paper presented at the Annual Conference of the American Association for Public Opinion Research, Hollywood, FL, USA.

Kaiser Family Foundation. 2016. “Kaiser Health Tracking Poll: January 2016.” Available at http://files.kff.org/attachment/topline-methodology-kaiser-health-tracking-poll-january-2016.
Kalton, Graham, and Dallas W. Anderson. 1986. “Sampling Rare Populations.” Journal of the Royal Statistical Society, Series A 149:65–82.

Kennedy, Courtney. 2007. “Evaluating the Effects of Screening for Telephone Service in Dual Frame RDD Surveys.” Public Opinion Quarterly 71:750–71.

Kennedy, Courtney, and Stephen E. Everett. 2011. “Use of Cognitive Shortcuts in Landline and Cell Phone Surveys.” Public Opinion Quarterly 75:336–48.

Kish, Leslie. 1992. “Weighting for Unequal Pi.” Journal of Official Statistics 8:183–200.

Langer, Gary. 2015. “ABC News’ Polling Methodology and Standards.” ABC News, July 23. Available at http://abcnews.go.com/US/PollVault/abc-news-polling-methodology-standards/story?id=145373.

Link, Michael W., Michael P. Battaglia, Martin R. Frankel, Larry Osborn, and Ali H. Mokdad. 2007. “Reaching the U.S. Cellphone Generation: Comparison of Cellphone Survey Results with an Ongoing Landline Telephone Survey.” Public Opinion Quarterly 71:814–39.

Lynn, Peter, and Olena Kaminska. 2013. “The Impact of Mobile Phones on Survey Measurement Error.” Public Opinion Quarterly 77:586–605.

McGeeney, Kyley. 2016. “Pew Research Center Will Call 75% Cellphones for Surveys in 2016.” Pew Research Center, January 5. Available at http://www.pewresearch.org/fact-tank/2016/01/05/pew-research-center-will-call-75-cellphones-for-surveys-in-2016/.

Newport, Frank. 2016. “Less Than Half of Republicans Pleased with Trump as Nominee.” Gallup, August 19. Available at http://www.gallup.com/poll/194738/less-half-republicans-pleased-trump-nominee.aspx?g_source=Election%202016&g_medium=newsfeed&g_campaign=tiles.

Pew Research Center. 2016. “Our Survey Methodology in Detail.” Available at http://www.pewresearch.org/methodology/u-s-survey-research/our-survey-methodology-in-detail/.

Peytchev, Andy, and Benjamin Neely. 2013. “RDD Telephone Surveys: Towards a Single-Frame Cell-Phone Design.” Public Opinion Quarterly 77:283–304.

Reifman, Alan, and Sylvia Niehuis. 2015. “Pollsters’ Cell-Phone Proportions and Accuracy in 2014 US Senate Races.” Survey Practice 8(5). Available at https://surveypractice.scholasticahq.com/article/2836-pollsters-cell-phone-proportions-and-accuracy-in-2014-u-s-senate-races. Accessed April 17, 2018.

SSRS. 2018. “SSRS Omnibus: National Dual-Frame Telephone Omnibus Survey.” Available at https://ssrs.com/wp-content/uploads/2018/01/SSRS-Omnibus-Methodology-2018.pdf. Accessed April 17, 2018.

Footnotes

1. One ongoing RDD study, the University of Michigan Surveys of Consumers, recently made the transition to single-frame cell-phone interviewing in January 2015 (Jiang et al. 2015). To our knowledge, no other ongoing national surveys have done this.

2. Other approaches to handling respondent selection with cell phones are possible (Brick, Edwards, and Lee 2007). Many cell phones are shared within households. However, we have not seen persuasive evidence that within-household selection with cell-phone respondents produces estimates with lower mean square error than estimates based on the protocol used by Pew.
3. For decades, responding landline samples have skewed female and continue to do so, but for reasons that are fairly well understood. Women tend to live longer than men, and landline samples skew old. In addition, traditional gender roles may have contributed to women being somewhat more likely to be at home and to answer the phone on behalf of the household. For cell phones, there is no comparably obvious explanation for the gender skew.

4. The significance testing used in this analysis accounts for the overlap in samples when comparing estimates based on the full sample design versus estimates based on the cell-phone sample alone. Specifically, the analysis uses z-tests for the difference between two weighted proportions based on partially overlapping samples, with different weights for the subsample proportion and the full-sample proportion.

5. A detailed discussion of this index is available at http://www.people-press.org/2014/06/12/appendix-a-the-ideological-consistency-scale/.

6. The approximate design effect is computed as 1 plus the squared coefficient of variation of the survey weights, as suggested in Kish (1992).

7. This assumes that the landline and cell-phone samples are overlapping. Some surveys, particularly in the early years of dialing cell phones, used the cell-phone sample only to interview people who had no residential landline. Such nonoverlapping designs do not use this particular weighting adjustment.

8. The fact that some people share their cell phones means that this assumption does not always hold. But it is not clear from research to date that alternative approaches (e.g., within-household selection with cell phones and/or a weighting correction for sharing) are more effective at reducing the total amount of error in the survey estimates.

© The Author(s) 2018. Published by Oxford University Press on behalf of the American Association for Public Opinion Research. All rights reserved. For permissions, please e-mail: [email protected]. This article is published and distributed under the terms of the Oxford University Press Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices).