Mitigating Satisficing in Cognitively Demanding Grid Questions: Evidence from Two Web-Based Experiments

Abstract

Satisficing has often been assumed to be a hazard to response quality in web surveys because interview supervision is limited in the absence of a human interviewer. Devising methods that help to mitigate satisficing therefore poses an important challenge to survey methodology. The present article examines whether splitting up cognitively demanding grid questions into single items can be an effective means to reduce measurement error and nonresponse resulting from survey satisficing. Furthermore, we investigate whether modifying the question design decreases the adverse effects of low respondent ability and motivation on response quality. The statistical analyses in our study relied on data from two web-based experiments with respondents from an opt-in and a probability-based online panel. Our results showed that using single items increased response times compared to standard grid questions, which might indicate a greater response burden. However, the use of single items significantly reduced the amount of response nondifferentiation and nonsubstantive responses. Our results further showed that the impact of respondent ability and motivation on the likelihood of satisficing was moderated by the question design.

1. INTRODUCTION

While the presentation of questions is an important aspect of survey design, it is especially relevant in web surveys because of the absence of an interviewer who can guide respondents through the answering process (Couper 2008). Respondents can therefore obtain information about what answers are being sought only from the design and presentation of the questions. The question design also affects how efficiently, in terms of cognitive effort and time, respondents complete web questionnaires (Couper, Traugott, and Lamias 2001). Consequently, the question design affects the response burden and thus may influence the motivation of respondents to provide accurate and meaningful responses (cf. Krosnick 1991, 1999).

These considerations are particularly relevant in the decision for or against the use of grids, which are a widely used question format in web surveys (Couper, Tourangeau, Conrad, and Zhang 2013). Grid (or matrix) questions present respondents with a set of items that they are asked to answer using a common response scale (usually a Likert-type or rating scale). Although grids have often been assumed to be an efficient question format, previous research has shown that their use can lead to higher rates of survey breakoff, higher rates of missing or nonsubstantive answers, and higher levels of straightlining (i.e., nondifferentiated answers) compared to the use of single-item designs (Couper et al. 2013). While a few studies have explicitly acknowledged that nondifferentiated answers in grid questions in particular might be the result of survey satisficing (e.g., Couper et al. 2013; Zhang and Conrad 2014), most studies have fallen short of providing an empirically testable theoretical framework that explains the effects of using grids on measurement error and nonresponse. The present study therefore proposes satisficing (Krosnick 1991, 1999) as a theoretical framework from which to derive testable hypotheses about the implications of question design decisions for the quality of respondents’ answers.
In this regard, the main aim of our study is to assess whether splitting up cognitively demanding grid questions into single items can be an effective means to reduce measurement error resulting from survey satisficing. Furthermore, while a sizeable body of research has compared grid and single-item designs (e.g., Couper et al. 2001; Tourangeau, Couper, and Conrad 2004; Liu and Cernat 2016), none of these studies has systematically explored whether the effects of question design on measurement error and nonresponse interact with respondents’ abilities and motivation. Drawing on satisficing theory, we derived further testable hypotheses about how the complex relationship between task difficulty, ability, and motivation affects the quality of respondents’ answers. Using data from two web-based experiments with respondents from an opt-in and a probability-based online panel, respectively, the present study set out to examine whether modifying the design of two cognitively demanding questions, one with five and one with seven items and each with an eleven-point response scale, decreases the adverse effects of low ability and motivation on the selection of satisficing as a response strategy.

The following section provides a detailed overview of previous research, establishes the theoretical framework, and presents the hypotheses that guide our statistical analyses. The next sections present the data, methods, and results. We conclude with a discussion of the implications of our findings for web survey methodology, the limitations of our study, and possible areas for further research.

2. THEORY

Previous research on designing survey questions has suggested that arranging multiple items in a compact matrix makes answering questions faster (Couper et al. 2001; Tourangeau et al. 2004), easier, and more enjoyable (Krosnick 1991) because the perceived length of the questionnaire is reduced and the context of the questions is harmonized (Heerwegh 2009). However, some researchers have argued that answering grid questions, particularly in self-administered surveys, is more complex and taxing for respondents than answering single-item questions. Although the response scale has to be introduced only once, grids often present a lot of text within a limited space and thus require respondents to process a large amount of information. In addition, with increasing size of the grid (i.e., an increasing number of items and/or scale points), navigation becomes more difficult than answering questions on separate pages because respondents have to locate a response option in two dimensions instead of one (Couper et al. 2013). Large grids may also require respondents to scroll vertically and/or horizontally on the screen, which further increases their response burden. Although answering grid questions does not always have to be more taxing than answering a series of separate questions, respondents may perceive answering grids as more burdensome (Couper et al. 2013; Liu and Cernat 2016). Due to their actual or perceived complexity, the (extensive) use of grids in surveys may discourage respondents from exerting sufficient effort in the cognitive processes involved in answering survey questions: comprehension, information retrieval, judgment, and response selection (Tourangeau, Rips, and Rasinski 2000).
Consequently, respondents may process the available information less thoroughly or completely skip the cognitive processes of comprehension, information retrieval, and judgment; in other words, they use satisficing as a response strategy (Krosnick 1991, 1999). With regard to survey questions that are arranged in grids, satisficing might manifest in at least four different forms.

First, according to the theory of satisficing, respondents often fail to differentiate between the items in grids. Under the condition of strong satisficing, respondents might select a somewhat reasonable response option for the first item in the grid and rate all (or almost all) the remaining items with exactly the same response value (Krosnick 1991).1 This response pattern is referred to as nondifferentiation (Krosnick and Alwin 1988; Krosnick 1991; Krosnick, Narayan, and Smith 1996; Revilla and Ochoa 2014) or straightlining (Kaminska, McCutcheon, and Billiet 2010; Couper et al. 2013; Zhang and Conrad 2014; Schonlau and Toepoel 2015; Struminskaya, Weyandt, and Bosnjak 2015). Previous research lends support to the expectation that grids may invoke satisficing. Referring to the near means related heuristic, which states that respondents expect items that are grouped together to be closely related conceptually, a study by Tourangeau et al. (2004) investigated the effects of placing eight items in one large grid on a single page, in two smaller grids on separate pages, and on eight separate pages. Their results showed that the correlations between the items were significantly higher when the items were grouped in one large grid than when they were presented in two smaller grids or on single screens. As Tourangeau et al. (2004) pointed out, these differences could partly be accounted for by higher levels of straightlining in the single-grid condition compared to the two-grid and single-item conditions. Thus, with respect to satisficing theory and previous research, we derive our first hypothesis on the implications of using (seemingly) complex grids compared to single-item designs.

H1: The use of grids, compared to single-item designs, increases the rate of straightlining response patterns.

Yet some respondents may fail to differentiate between only some of the items in a grid, and may simply differentiate their answers to a lesser degree than optimizing respondents. Thus, weaker forms of satisficing might result in lower response differentiation (cf. Krosnick and Alwin 1988).

H2: The use of grids, compared to single-item designs, leads to less differentiated responses.

Third, satisficing theory suggests that respondents seek to reduce the response burden by superficially comprehending questions or even completely skipping the cognitive processes of information retrieval and judgment. In line with this assumption, research has shown that satisficing respondents are more likely to provide nonsubstantive responses such as “don’t know” or “no opinion” than optimizing respondents (Krosnick et al. 1996; Krosnick et al. 2002). In web surveys, particularly in the absence of explicit “don’t know” and “no answer” response options, a nonsubstantive response might also be leaving a question blank (Heerwegh 2005). The grid question design may aggravate the use of nonsubstantive responses for at least two reasons. First, respondents are likely to assume that the items in grid questions are conceptually related (Tourangeau et al. 2004).
Consequently, respondents should be more inclined to select the same response option for the question items in a grid than in a single-item design. Thus, if satisficing respondents perceive selecting an explicit “don’t know” response option for the first item in the grid as a viable strategy to reduce their response burden, the reasonable consequence might be to answer all or almost all of the remaining items nonsubstantively as well. Second, the grid question design might invite satisficing respondents to click through the response options quickly, and thus to fail to accurately click on some of the response fields. In line with these assumptions, previous research has shown that grouping multiple questions on a single page of a survey can increase the rate of nonsubstantive or missing answers relative to single-item-per-page designs (Peytchev, Couper, McCabe, and Crawford 2006; Toepoel, Das, and Van Soest 2009; Mavletova and Couper 2015).2 Thus, based on satisficing theory, we put forward our third hypothesis:

H3: The use of grids, compared to asking questions separately, increases the rate of nonsubstantive responses.

Fourth, and finally, as satisficing implies, respondents may perform the cognitive processes of comprehension, information retrieval, and judgment less thoroughly or skip them altogether, which yields comparatively short response times (Zhang and Conrad 2014; Greszki, Meyer, and Schoen 2015). Research on survey and question design has provided ample evidence that single-item-per-page designs take respondents longer to answer than multiple-items-per-page designs (Couper et al. 2001; Tourangeau et al. 2004; Toepoel et al. 2009; Mavletova and Couper 2015). While it is obvious that answering questions in grids requires fewer physical actions by respondents than answering the questions separately, we propose that satisficing also contributes substantially to shorter response times for grids.

H4: The use of grids, compared to single-item designs, leads to shorter response times.

Although previous research has compared the impact of grid and single-item designs on indicators of response quality (e.g., Couper et al. 2001; Tourangeau et al. 2004), to date only one study has systematically explored whether the implications of question design are moderated by respondents’ abilities (Toepoel et al. 2009). In their survey experiment, these authors examined the effects of placing forty items with a five-point Likert scale on forty separate pages, on ten pages with four items each, on four pages with ten items each, or all forty items on a single page. Notably, they found that the number of item nonresponses differed significantly across the four designs for the group of low-educated respondents, whereas they found nonsignificant differences for the groups of respondents with intermediate or high levels of education. This finding fits well with the assumption of satisficing theory that task difficulty, respondents’ abilities, and respondents’ motivation interact in determining the response strategy (cf. Krosnick 1991). While previous research has shown that nondifferentiation is more common among less educated respondents (e.g., Krosnick and Alwin 1988; Krosnick et al. 1996; Roberts, Gillian, Allum, and Lynn 2010) and that nonsubstantive answers are more likely among less able and motivated respondents (e.g., Lenzner 2012), the findings by Toepoel et al. (2009) suggest that the effect of respondents’ abilities on item nonresponse is moderated by the complexity of the task (which corresponds to the experimental variation of the survey design).
With respect to our theoretical framework, we assume that the effects of respondents’ abilities and motivation on satisficing differ between grid and single-item designs. Assuming that answering grids is more burdensome and less enjoyable for respondents than answering single items, we suggest that the impact of low ability or low motivation on response outcomes should be stronger for grids than for the single-item alternative.

H5: The impact of low ability on survey satisficing is stronger for grids than single-item designs.

H6: The impact of low motivation on survey satisficing is stronger for grids than single-item designs.

Taking these six hypotheses as a basis, the present study assesses whether modifying the question design decreases the adverse effects of respondents’ abilities and motivation on survey satisficing.

3. EXPERIMENT 1

3.1 Data

The first web-based experiment was conducted with respondents from the German opt-in online panel of the Respondi AG, which had about 100,000 active members in 2012. The experiment was included in a web survey on political attitudes and behaviors in Germany, which was fielded between December 6 and December 16, 2012. The web survey was quota sampled using reference distributions for gender, age, and education from the target population of the German electorate. Overall, 3,478 panelists accepted the survey invitation, of whom 535 were screened out, 440 broke off, and 2,503 completed the survey. The break-off rate is 15.0 percent (cf. Callegaro and DiSogra 2008).

To examine the effects of the design on the quality of responses, we used a cognitively demanding question on the expected policy positions of five potential government coalitions on taxes and welfare spending in Germany. Respondents answered the question using an eleven-point bipolar rating scale with labeled endpoints. This question also featured an explicit “don’t know” response option, which was visually separated from the response scale. The respondents were randomly assigned to either a standard grid question with five items or a single-item design in which the five questions were presented on separate pages (see Appendix figures 6 and 7). Bivariate analyses on gender (χ² = 0.176, df = 1, p = 0.675), age (χ² = 0.304, df = 4, p = 0.990), education (χ² = 1.357, df = 2, p = 0.507), region (χ² = 2.873, df = 1, p = 0.090), and interest in politics (χ² = 7.189, df = 4, p = 0.126) indicated that the randomization of respondents to the experimental conditions had been successful.
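Balance checks of this kind amount to Pearson chi-square tests of each covariate against the assigned experimental condition. The following is a minimal sketch in Python (pandas/SciPy) of how such checks can be computed; the data frame layout and column names are our illustrative assumptions, not the authors’ code.

```python
# Minimal sketch of the randomization checks reported above: a Pearson
# chi-square test of each covariate against the experimental condition.
# Hypothetical layout: one row per respondent, 'condition' = grid vs. single.
import pandas as pd
from scipy.stats import chi2_contingency

def randomization_checks(df: pd.DataFrame,
                         covariates=("gender", "age_group", "education",
                                     "region", "political_interest")):
    results = {}
    for var in covariates:
        table = pd.crosstab(df[var], df["condition"])
        chi2, p, dof, _ = chi2_contingency(table, correction=False)
        results[var] = {"chi2": round(chi2, 3), "df": dof, "p": round(p, 3)}
    return results
```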
3.2 Methods

We experimentally tested our hypotheses on the effects of question design on response quality (H1 to H4) by analyzing the differences in means and proportions. Accordingly, we created five indicators for weak and strong forms of satisficing.

3.2.1 Response time. We used server-side response times (in seconds) as an indicator of the respondents’ speed of processing and answering a question.

3.2.2 Response differentiation. The responses to a set of items can differ in at least two aspects (McCarty and Shrum 2000): first, with respect to the number of different scale points used (i.e., the variability of the responses), and second, with respect to the distance among the scale points used by respondents (i.e., the extremity of the responses). To take these two aspects into account, we created two indicators of the extent to which respondents differentiated their answers to the five questions. With respect to the variability of the responses, we computed the probability of differentiation P_d as

\[ P_d = 1 - \sum_{i=1}^{n} P_i^2 , \]

where P_i is the proportion of the values rated at a given point on the rating scale and n is the number of rating points (Krosnick and Alwin 1988). P_d takes values between zero and close to one. While zero means that all items were given the same rating (i.e., straightlining), larger values indicate that more scale points were used. In addition, we computed the coefficient of variation, which is a measure of the distance among the scale points used by respondents, and thus a measure of the extremity of the responses (McCarty and Shrum 2000). We computed the coefficient of variation (CV) as

\[ CV = \frac{s}{\bar{x}} , \]

where s is the standard deviation and x̄ is the mean of the responses over items.3 The coefficient of variation takes values greater than or equal to zero. Again, zero corresponds to straightlining response patterns, while larger values indicate more differentiated answers.

3.2.3 Straightlining. While weak forms of satisficing may cause less differentiated answers to question items, strong satisficing can result in nondifferentiated answers. Thus, we further created a dummy variable for straightlining response patterns (0 = differentiated response and 1 = straightlining) as an indicator of strong satisficing.4

3.2.4 Nonsubstantive responses. To study the impact of question design on nonsubstantive responses, we generated a dummy variable for extensively providing “don’t know” responses or refusing to answer across the five questions (0 = fewer than four nonsubstantive responses and 1 = four to five nonsubstantive responses) as an additional indicator of strong forms of satisficing.
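To make these definitions concrete, the sketch below computes the four indicators for one respondent, following the coding rules in footnotes 3 and 4. It is an illustration, not the authors’ implementation; in particular, the encoding of nonsubstantive answers (“dk” for don’t know, None for no answer) is an assumption.

```python
# Minimal sketch (Python/NumPy) of the satisficing indicators described above.
import numpy as np

def satisficing_indicators(responses, nonsub_cutoff=4):
    """responses: one respondent's answers to the item battery; substantive
    answers are ints on the 1..11 scale, nonsubstantive answers are "dk"/None.
    nonsub_cutoff: 4 of 5 items in Experiment 1 (6 of 7 in Experiment 2)."""
    substantive = np.array([r for r in responses if isinstance(r, int)],
                           dtype=float)

    # P_d = 1 - sum(P_i^2) over the proportions of the answer categories used;
    # per footnote 3, nonsubstantive categories are included, and P_d is set
    # to missing if fewer than two substantive responses were given.
    _, counts = np.unique([str(r) for r in responses], return_counts=True)
    shares = counts / counts.sum()
    p_d = 1.0 - np.sum(shares ** 2) if len(substantive) >= 2 else np.nan

    # CV = s / mean and the straightlining dummy treat nonsubstantive answers
    # as missing; straightlining requires at least two identical substantive
    # answers, i.e., a standard deviation of zero (footnote 4).
    if len(substantive) >= 2:
        cv = substantive.std(ddof=0) / substantive.mean()
        straightlining = int(substantive.std(ddof=0) == 0.0)
    else:
        cv, straightlining = np.nan, np.nan

    # Strong-satisficing dummy for an elevated use of nonsubstantive options.
    nonsubstantive = int(len(responses) - len(substantive) >= nonsub_cutoff)
    return p_d, cv, straightlining, nonsubstantive
```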
In addition, we analyzed the effects of the question design on respondents’ substantive answers to the five items. These additional analyses explored the implications for observed means and for the internal consistency of the answers.

3.2.5 Observed means. For each item of the question, we tested whether the observed mean for the single-item condition differed significantly from the observed mean for the grid condition.

3.2.6 Internal consistency. Previous studies have provided mixed evidence for the assumption that interitem correlations are lower in single-item designs than in grid designs (Couper et al. 2001; Tourangeau et al. 2004; Toepoel et al. 2009). Taking up their approach, we examined whether a measure of internal consistency among the answers to the five items, Cronbach’s α (Cronbach 1951), was significantly lower for the single-item design than for the grid design, using the Feldt test (Feldt 1969; Charter and Feldt 1996).

Finally, to assess H5 and H6, we ran regression models with the indicators for straightlining, response differentiation (P_d), and nonsubstantive responses as dependent variables. All models included the respondents’ gender (0 = male and 1 = female) and age (in years) as controls. A dummy variable indicated whether respondents answered the grid or the single-item version of the question (0 = grid and 1 = single items). In addition, we used two personal characteristics to model respondents’ propensity to satisfice.

3.2.7 Ability. The respondents’ level of cognitive sophistication reflects their general ability to perform complex mental operations (Krosnick 1991). In conformance with previous research, we used the level of formal education (0 = low, 1 = intermediate, and 2 = high) as an indicator of respondents’ cognitive sophistication (e.g., Krosnick and Alwin 1987; Krosnick et al. 1996; Holbrook, Krosnick, Moore, and Tourangeau 2007). We coded respondents without a school-leaving certificate or who held the lowest formal qualification of Germany’s tripartite secondary school system as having low education. We coded respondents with an intermediary secondary qualification as having intermediate education, and we assigned respondents who were entitled to study at a college or university to high education (see Appendix Table 5 for the wording of the question).

3.2.8 Motivation. According to Krosnick (1991), motivation is affected by the degree to which the topic of a question is personally important. As we assumed that respondents perceived the questions as more important when they were more interested in the survey topic (cf. Holbrook et al. 2014), we used a rescaled five-point measure of the respondents’ interest in politics as an indicator of motivation (the rescaled variable ranged from 0 = low to 1 = high).

To assess whether the effects of respondents’ ability and motivation differed across the question designs, we included interaction terms between the regressors and the indicator for the single-item design in each of the regression models. The logistic regression models with straightlining and nonsubstantive responses as dependent variables (y_i) can be expressed as

\[
\operatorname{logit} \Pr(y_i = 1) = \beta_0 + \beta_1\,\mathrm{gender}_i + \beta_2\,\mathrm{age}_i + \beta_3\,\mathrm{ability}_i + \beta_4\,\mathrm{motivation}_i + \beta_5\,\mathrm{single\ items}_i + \beta_{15}\,\mathrm{gender}_i \times \mathrm{single\ items}_i + \beta_{25}\,\mathrm{age}_i \times \mathrm{single\ items}_i + \beta_{35}\,\mathrm{ability}_i \times \mathrm{single\ items}_i + \beta_{45}\,\mathrm{motivation}_i \times \mathrm{single\ items}_i .
\]

Similarly, the linear regression model for response differentiation (P_d) as the dependent variable (y_i) is

\[
y_i = \beta_0 + \beta_1\,\mathrm{gender}_i + \beta_2\,\mathrm{age}_i + \beta_3\,\mathrm{ability}_i + \beta_4\,\mathrm{motivation}_i + \beta_5\,\mathrm{single\ items}_i + \beta_{15}\,\mathrm{gender}_i \times \mathrm{single\ items}_i + \beta_{25}\,\mathrm{age}_i \times \mathrm{single\ items}_i + \beta_{35}\,\mathrm{ability}_i \times \mathrm{single\ items}_i + \beta_{45}\,\mathrm{motivation}_i \times \mathrm{single\ items}_i .
\]

Since the interpretation of interactions in terms of probabilities can change due to the nonadditivity and nonlinearity of the logistic model (Best and Wolf 2015), we plotted the predicted probabilities for the low, intermediate, and high levels of our indicators of ability and motivation to examine their effects on the dependent variables. Correspondingly, we plotted linear predictions to study the effects of ability and motivation on response differentiation (P_d).
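A specification of this kind can be written compactly with interaction notation in standard statistical software. The following is a minimal sketch in Python with statsmodels’ formula API; the data frame layout and the variable names (straightlining, p_d, female, age, ability, motivation, single_items) are illustrative assumptions, not the authors’ code, and the nonsubstantive-responses model would be specified analogously to the straightlining model.

```python
# Minimal sketch of the interaction models above. Hypothetical layout: one row
# per respondent, 'ability' coded low/intermediate/high, 'single_items' 0/1.
import pandas as pd
import statsmodels.formula.api as smf

# (a + b + ...) * d expands to all main effects plus their interactions with d.
RHS = ("(female + age + C(ability, Treatment(reference='low')) + motivation)"
       " * single_items")

def fit_models(df: pd.DataFrame):
    logit_straightlining = smf.logit("straightlining ~ " + RHS, data=df).fit()
    ols_differentiation = smf.ols("p_d ~ " + RHS, data=df).fit()
    return logit_straightlining, ols_differentiation

def predicted_probabilities(model, motivation=0.5, age=50.0, female=0.5):
    """Predicted probability of straightlining by ability level and design,
    with the remaining covariates held at illustrative (assumed) values."""
    profiles = pd.DataFrame(
        [(a, d) for a in ("low", "intermediate", "high") for d in (0, 1)],
        columns=["ability", "single_items"],
    )
    profiles["motivation"] = motivation
    profiles["age"] = age
    profiles["female"] = female
    profiles["p_hat"] = model.predict(profiles)
    return profiles
```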
3.3 Results

In line with the expectation of H1, we found that the rate of straightlining was significantly higher for the grid question (M_Grid = 0.16) than for the single-item design (M_Single = 0.10, z = –3.836, p = 0.000). However, despite the larger proportion of straightlining in the grid condition, the answers to the grid question (M_Grid = 0.31) were not significantly less differentiated than the answers to the single questions (M_Single = 0.31, t = 0.121, p = 0.904) with respect to the coefficient of variation. At the same time, our analysis showed that respondents differentiated their responses to a significantly lesser degree in the grid question (M_Grid = 0.56) than in the single-item design (M_Single = 0.61, t = 4.218, p = 0.000) in terms of the coefficient P_d. Taken together, these results showed that respondents differentiated their answers to a moderately higher degree in the single-item design than in the grid design in terms of the number of scale points used. However, their responses were not more differentiated in the single-item design with respect to the distance among the answers. In other words, respondents in the single-item condition used slightly more points of the response scale, although the additional scale points they used appear to have been rather proximate to the scale points they would have selected in the grid condition. Overall, these results from Experiment 1 provide partial support for H2, which suggests that the use of grids leads to less differentiated responses than single-item designs.

As expected with respect to H3, our results showed that the rate of an elevated use of nonsubstantive response options was significantly higher for the grid design (M_Grid = 0.30) than for the single-item design (M_Single = 0.19, z = –6.050, p = 0.000).5 In line with the findings from previous research, our analysis supported the assumption of H4 that response times (in seconds) were markedly shorter for the grid question (M_Grid = 29.14) than for asking the questions separately (M_Single = 38.64, t = 11.512, p = 0.000). However, this analysis of response times did not enable us to determine whether the shorter response times for the grid condition resulted from higher levels of satisficing or from the lower number of physical actions taken by respondents.

In addition, we studied whether the observed differences in our indicators of satisficing were also reflected in differences in the observed means and the internal consistency of the answers. Notably, we did not find any significant difference in item means between the grid design and the single-item design (see Appendix Table 2). However, we did find slightly higher internal consistency in the grid design than in the single-item design (Cronbach’s α_Grid = 0.68, α_Single = 0.64, F = 1.133, p = 0.026). Thus, our results supported the finding by Tourangeau et al. (2004) that respondents gave answers of higher consistency in a grid design than in a single-item design. Apparently, this finding can be attributed to the higher rate of straightlining in the grid condition; if we excluded straightlining respondents, the difference in internal consistency between the designs disappeared (α_Grid = 0.54, α_Single = 0.53, F = 1.021, p = 0.381).
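For readers unfamiliar with the Feldt test used here, the sketch below computes Cronbach’s α per condition and the test statistic for two independent alphas. It assumes a complete respondents-by-items matrix of substantive answers; the W statistic and its F degrees of freedom follow a common independent-samples formulation of the test (cf. Charter and Feldt 1996) and are our assumption, not a transcription of the authors’ code.

```python
# Minimal sketch: Cronbach's alpha per experimental condition and Feldt's test
# for the equality of two alpha coefficients from independent samples.
import numpy as np
from scipy.stats import f as f_dist

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x items array without missing values."""
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - sum_item_var / total_var)

def feldt_test(alpha_hi: float, n_hi: int, alpha_lo: float, n_lo: int):
    """One-sided test of H0: alpha_hi == alpha_lo against alpha_hi > alpha_lo.
    W = (1 - alpha_lo) / (1 - alpha_hi); the (n_lo - 1, n_hi - 1) degrees of
    freedom are an assumption for the independent-samples case."""
    w = (1 - alpha_lo) / (1 - alpha_hi)
    return w, f_dist.sf(w, n_lo - 1, n_hi - 1)
```

As a check against the figures reported above, the Experiment 1 alphas give W = (1 − 0.64)/(1 − 0.68) ≈ 1.13, in line with the reported F = 1.133 once rounding of the printed alphas is taken into account.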
Finally, as described in the Methods section, we used linear and logistic regression models to assess hypotheses H5 and H6 for those indicators that showed significant differences between the two designs. Figure 1 presents the effects of respondents’ ability and motivation on the predicted probability of straightlining across the experimental conditions. In line with H5, we found that respondents with low ability were significantly more likely to straightline in the grid design than in the single-item design, whereas we found no significant difference for respondents with intermediate or high ability. Despite the relatively large confidence intervals, the results also suggested that respondents with low or intermediate motivation were more likely to straightline if the items were arranged in a grid rather than presented separately.6

Figure 1. The effects of ability and motivation on the predicted probability of straightlining in the grid and single-item conditions (Experiment 1).

Figure 2 presents linear predictions of the effects of respondents’ ability and motivation on response differentiation (P_d) across the experimental conditions. As expected, the findings are very similar to those for straightlining; in particular, respondents with low ability and low or intermediate levels of motivation gave less differentiated responses with regard to the number of scale points used, whereas we did not observe significant differences for respondents with intermediate or high ability or high levels of motivation.

Figure 2. Linear predictions of the effects of ability and motivation on response differentiation (P_d) in the grid and single-item conditions (Experiment 1).

Finally, figure 3 depicts the effects of respondents’ ability and motivation on the predicted probability of nonsubstantive responses for the grid design and the single-item design. In contrast to the models for straightlining and response differentiation, the results did not support the assumptions of H5 and H6. Contradicting our expectations, we found that the probability of nonsubstantive responses was significantly lower for highly educated respondents in the single-item condition than in the grid condition.

Figure 3. The effects of ability and motivation on the predicted probability of nonsubstantive responses in the grid and single-item conditions (Experiment 1).

4. EXPERIMENT 2

To verify the robustness of our results across different questions and to gain further in-depth insights into how question design affects response behavior, we replicated our analyses with a slightly modified experimental design.

4.1 Data

We conducted the second web-based experiment between February 27 and April 1, 2013 with respondents from the GESIS Online Panel Pilot (see Struminskaya, Kaczmirek, Schaurer, and Bandilla 2014 for a detailed description of the online panel). The respondents for this online panel were recruited in telephone interviews using a probability sample of the German-speaking population in Germany aged eighteen years and older who used the Internet for non-work-related purposes (Struminskaya et al. 2014). Of the 859 eligible panel members who were invited, 529 completed the survey, yielding a completion rate of 61.6 percent. The cumulative response rate two equaled 2.6 percent (cf. Callegaro and DiSogra 2008; DiSogra and Callegaro 2015). To adjust the sample to the German online population, we calculated weights based on reference distributions for gender, age, education, and region as provided by the (N)Onliner Atlas 2010 (Initiative D21 and TNS Infratest 2010).

We randomly assigned participants to either a grid or a single-item version of a cognitively demanding question concerning the respondents’ evaluation of seven major German political parties (i.e., CDU, CSU, SPD, FDP, Die Linke, Bündnis 90/Die Grünen, and the Piratenpartei). The question featured an eleven-point rating scale with verbally labeled endpoints and a visually separated “unable to evaluate” response option (see Appendix figures 8 and 9).
Bivariate analyses on gender (χ² = 3.576, df = 1, p = 0.059), age (χ² = 2.696, df = 4, p = 0.610), education (χ² = 1.906, df = 2, p = 0.386), region (χ² = 0.407, df = 1, p = 0.523), and interest in politics (χ² = 4.105, df = 4, p = 0.392) suggested that the randomization had been successful.

4.2 Methods

With two notable exceptions, the second web-based experiment used the same methods and indicators of satisficing as Experiment 1. First, response times were measured in milliseconds using a client-side paradata script. In addition, the script captured the number of mouse clicks and the time between those click events. Thus, the measure was not affected by the speed of the Internet connection of the respondent’s computer. Consequently, client-side response times enable a more precise analysis of the response process than server-side measures (Kaczmirek 2009). Second, we coded an extensive use of nonsubstantive responses if respondents provided fewer than two substantive responses across the seven questions (0 = fewer than six nonsubstantive responses and 1 = six to seven nonsubstantive responses).

4.3 Results

Again in support of H1, we observed that the rate of straightlining was significantly higher in the grid design (M_Grid = 0.10) than in the single-item design (M_Single = 0.04, z = –2.970, p = 0.003). Largely confirming the findings from Experiment 1, we found that the answers to the grid question were not less differentiated than the answers to the single questions (M_Grid = 0.56, M_Single = 0.60, t = 1.651, p = 0.099) with respect to the coefficient of variation, but they differed moderately in terms of the coefficient P_d (M_Grid = 0.64, M_Single = 0.69, t = 2.925, p = 0.004). Again, these results provide partial support for H2. Lending further support to H3, our results showed that the rate of nonsubstantive responses was significantly higher for the grid design (M_Grid = 0.06) than for the single-item design (M_Single = 0.01, z = –2.609, p = 0.009).

In line with the findings from Experiment 1, our analyses revealed that response times were again shorter for the grid question (M_Grid = 37.16 s) than for asking the questions separately (M_Single = 48.12 s, t = 6.503, p = 0.000). However, taking advantage of the client-side paradata, we were able to assess the association between design and response behavior in more detail. Confirming the assumption that single-item designs require respondents to take more physical actions, we found that the average number of mouse clicks was significantly higher if the questions were asked separately (M_Grid = 8.27, M_Single = 14.94, t = 34.026, p = 0.000). However, even if we subtracted the times for loading the web pages and for clicking on the continue button from the overall response time, and only looked at the time respondents spent reading the questions and selecting a response, answering the grid question was still faster than answering single items (M_Grid = 34.98 s, M_Single = 42.62 s, t = 4.779, p = 0.000). In addition, by examining the first item exclusively, we discovered that respondents did not need significantly more time to read and answer the question item in the grid than in the single-item design (M_Grid = 14.21 s, M_Single = 14.66 s, t = 0.590, p = 0.555). We take this finding as an indication that respondents spent more time processing the adjacent items in particular if these were presented on separate pages of the survey.
Thus, although respondents have to perform fewer physical actions and do not need to double-check whether scale labels have changed when answering a grid question, we suggest that the latter result also lends support to the notion that satisficing contributes to shorter response times in grid designs compared to single-item designs.

Providing further support for the results from Experiment 1, we did not observe significant differences in observed means between the grid and single-item conditions (see Appendix Table 4). Also in line with the findings from Experiment 1, we found considerably higher consistency among the answers for the grid design than for the single-item design (α_Grid = 0.60, α_Single = 0.41, F = 1.470, p = 0.001). Again, we can attribute this difference to the higher rate of straightlining in the grid condition; if we excluded straightlining respondents, the difference in internal consistency between the designs vanished (α_Grid = 0.38, α_Single = 0.37, F = 1.014, p = 0.456).

To further assess H5 and H6, we ran a logistic regression with straightlining and a linear regression with response differentiation (P_d) as dependent variables. The indicators of respondents’ ability and motivation were the same as in Experiment 1. Figure 4 depicts the effects of respondents’ ability and motivation on the predicted probability of straightlining across the experimental conditions.

Figure 4. The effects of ability and motivation on the predicted probability of straightlining in the grid and single-item conditions (Experiment 2).

Again confirming H5 and H6, our results showed that the probability of straightlining was significantly higher for respondents with low ability and motivation if the items were arranged in a grid rather than presented on separate pages of the survey, whereas the probability did not differ significantly between the designs for respondents with intermediate or high levels of ability and motivation. The results from the analyses of the degree to which respondents differentiated their responses (P_d) confirm the findings from Experiment 1. They show that particularly respondents with low ability and low or intermediate levels of motivation provided less differentiated responses with regard to the number of scale points used, whereas there were no significant differences for respondents with intermediate or high ability or high levels of motivation. Thus, these findings provide further evidence in support of H5 and H6 (see figure 5).7

Figure 5. Linear predictions of the effects of ability and motivation on response differentiation (P_d) in the grid and single-item conditions (Experiment 2).

5. CONCLUSIONS

With regard to the widely shared assumption that web survey respondents might perceive answering grid questions as more complex and taxing than answering single-item questions (cf. Couper et al. 2013; Liu and Cernat 2016), our study has sought to examine whether splitting up cognitively demanding grid questions into single items can be an effective means to mitigate survey satisficing.
In line with three hypotheses derived from satisficing theory, our results from two web-based experiments showed that the use of grids resulted in moderately lower response differentiation with regard to the number of scale points used (i.e., lower variability of the responses), higher rates of straightlining, and more nonsubstantive answers compared to single-item designs. In particular, the rates of nonsubstantive answers and straightlining were substantially lower if the items were presented on separate pages of the survey. Remarkably, these findings hold for two different questions on political attitudes and two different sampling sources (opt-in and probability-based online panels), which supports the robustness of our results.

Most importantly, our study yielded valuable insights for survey methodologists by showing that the effects of respondents’ abilities and motivation on survey satisficing are moderated by the question design. Our findings support the assumption that using a single-item design instead of a grid design decreases the adverse effects of low abilities and motivation on the selection of satisficing as a response strategy. Specifically, we observed that the probability of straightlining and of low response differentiation (as measured by P_d) was substantially lower among respondents with low abilities and low motivation. The results of our study suggest that the single-item design particularly enabled respondents with low abilities and low motivation to provide answers of higher quality. Therefore, we recommend that for cognitively demanding survey questions, researchers should generally favor single-item designs over grid designs because of their positive effects on response quality.

However, the results of our analyses are consistent with previous research demonstrating that grids take less time to answer than single-item designs (e.g., Couper et al. 2001; Tourangeau et al. 2004). Although it seems obvious that answering grid questions requires respondents to take fewer physical actions than answering single-item questions, we also found that answering a grid question was still faster than answering single items even if we only considered the time respondents needed to read the questions and to select their responses. In conjunction with the observation that the response times for the first item of a question did not differ between the grid and single-item designs, we interpreted this finding as an indication that respondents spent more time processing adjacent items in single-item designs. Thus, we inferred that satisficing is likely to contribute to the shorter response times for grid designs.

However, we need to acknowledge that grids are an efficient question format in that respondents can easily recognize that the items refer to an overarching topic and share a common response scale. Although grid questions may appear very complex and daunting to respondents, answering them is slightly more comfortable than answering single-item designs because respondents do not need to double-check whether scale labels have changed, and they do not need to submit their response to every single item. Thus, we need to be aware that substituting grids with single-item designs might not be feasible or desirable in every situation.
Particularly if a questionnaire includes several multi-item scales or thematically related sets of items, using single-item designs instead of grids can quickly increase the length of a survey to an extent where the positive effects of the question design are outweighed by the negative effects of the increased response burden on response quality and the willingness to continue survey participation (cf. Galesic and Bosnjak 2009; Roberts et al. 2010). We encourage further research that explores to what extent grids can be substituted appropriately with alternative designs before the negative effects of an increasing survey length on measurement error and nonresponse bias prevail.

Furthermore, the results of both experiments suggest that the lower response differentiation in the grid design was associated with a higher consistency of answers compared to the single-item design. While we did not find any evidence that the question design affected observed item means, its effect on the consistency of answers might have serious implications for analyses based on latent means. However, it is beyond the scope of the present study to draw generalizable conclusions about whether a lower consistency of answers is desirable or not.

Finally, future research could address two limitations of the present study. First, while our analyses provided consistent evidence across the two experiments that the impact of low ability and motivation on the probability of straightlining and on response differentiation was stronger in the grid condition than in the single-item condition, we did not find evidence for the expected impact on the probability of nonsubstantive responses. Thus, we recommend that future research examine the effects of question design on nonsubstantive responses. Second, both experiments used cognitively demanding questions with eleven-point end-labeled rating scales. Previous research has suggested that response burden likely varies with the length and labeling of the response scale (Krosnick and Presser 2010). For instance, a recent study by Liu and Cernat (2016) has provided first evidence that differences in response quality between grid and single-item designs might be more pronounced for questions with nine- and eleven-point rating scales than for those with seven or fewer scale points. Thus, we see merit in replications of our study that compare grid and single-item designs for questions with response scales of different lengths or with different labeling. Such research designs could provide valuable insights into the role of task difficulty in explaining satisficing.

Footnotes

1. Often, this will be the response option that respondents expect to reflect a common or average response (e.g., the midpoint of the response scale). In other situations, it will be the response option that respondents believe conforms to the researcher’s expectations. This might be one of the extreme options of the response scale (Krosnick and Alwin 1988).

2. However, a few studies present conflicting results. For example, Couper et al. (2001) examined whether grouping five related knowledge items in one grid and eleven related attitude items in three grids on separate pages yielded better response quality than presenting the items on five and eleven separate pages. They found that the number of nonsubstantive responses (don’t know/not applicable) was lower in the grid condition.
Using a slightly different design, Mavletova and Couper (2014) did not find significant differences in the amount of nonsubstantive responses when the questions were placed on seventeen separate pages or on two pages of their survey.

3. We treated “don’t know” and “no answer” responses as missing in the calculation of CV (and the straightlining indicator). Most importantly, computing the mean and the standard deviation requires assigning numeric values to “don’t know” and “no answer” responses that cannot be justified from the response scale of the item (e.g., assigning values of twelve to “don’t know” and thirteen to “no answer” responses if an ordinal response scale runs from one to eleven). In contrast, the computation of P_d required including “don’t know” and “no answer” responses to obtain the sum of the proportions of the values rated at given points on the rating scale (which is not affected by the values assigned to “don’t know” and “no answer” responses). However, to enhance the comparability of both measures of response differentiation (i.e., CV and P_d), we set P_d to missing if the number of nonmissing responses was lower than two.

4. We derived the indicator from the standard deviation of the responses across the items. Accordingly, straightlining was indicated if all nonmissing responses were identical across the items (i.e., at least two nonmissing responses with a standard deviation equal to zero).

5. Notably, the results of a series of additional logistic regression models provided evidence in favor of our assumption that respondents who provided a nonsubstantive response (“don’t know” or no answer) to the first item were more likely to answer nonsubstantively to the remaining items in a grid design than in a single-item design (in each model, the coefficient of the interaction between a nonsubstantive answer to the first item and the indicator for the single-item design was negative and p < 0.01).

6. Note that figures 1 through 4 present predicted probabilities and linear predictions with 95% confidence intervals for low, intermediate, and high levels of ability and motivation. For the logistic and linear regression models that were used to compute the predicted probabilities and linear predictions, see Appendix Tables 1 and 3. These tables also provide information about whether the overall effects of ability and motivation differed significantly between the experimental conditions. Given the known problems in the interpretation of interactions in nonlinear models (see Methods section), we referred to predicted values in discussing the results of our analyses.

7. We could not replicate the analysis for nonsubstantive responses due to the low number of relevant cases in Experiment 2 (n = 16).

References

Best H., Wolf C. (2015), “Logistic Regression,” in Regression Analysis and Causal Inference, eds. Best H., Wolf C., pp. 153–171, Los Angeles, CA: Sage.

Callegaro M., DiSogra C. (2008), “Computing Response Metrics for Online Panels,” Public Opinion Quarterly, 72(5), 1008–1032.

Charter R. A., Feldt L. S. (1996), “Testing the Equality of Two Alpha Coefficients,” Perceptual and Motor Skills, 82(3), 763–768.

Couper M. P. (2008), Designing Effective Web Surveys, Cambridge: Cambridge University Press.

Couper M. P., Traugott M. W., Lamias M. J. (2001), “Web Survey Design and Administration,” Public Opinion Quarterly, 65(2), 230–253.
Couper M. P., Tourangeau R., Conrad F. G., Zhang C. (2013), “The Design of Grids in Web Surveys,” Social Science Computer Review, 31(3), 322–345.

Cronbach L. J. (1951), “Coefficient Alpha and the Internal Structure of Tests,” Psychometrika, 16(3), 297–334.

DiSogra C., Callegaro M. (2015), “Metrics and Design Tool for Building and Evaluating Probability-Based Online Panels,” Social Science Computer Review, 34(1), 26–40.

Feldt L. S. (1969), “A Test of the Hypothesis That Cronbach’s Alpha or Kuder-Richardson Coefficient Twenty Is the Same for Two Tests,” Psychometrika, 34(3), 363–373.

Galesic M., Bosnjak M. (2009), “Effects of Questionnaire Length on Participation and Indicators of Response Quality in a Web Survey,” Public Opinion Quarterly, 73(2), 349–360.

Greszki R., Meyer M., Schoen H. (2015), “Exploring the Effects of Removing ‘Too Fast’ Responses and Respondents from Web Surveys,” Public Opinion Quarterly, 79(2), 471–503.

Heerwegh D. (2005), Web Surveys: Explaining and Reducing Unit Nonresponse, Item Nonresponse and Partial Nonresponse, Leuven, Belgium: Katholieke Universiteit Leuven.

Heerwegh D. (2009), “Mode Differences between Face-to-Face and Web Surveys: An Experimental Investigation of Data Quality and Social Desirability Effects,” International Journal of Public Opinion Research, 21(1), 111–121.

Holbrook A. L., Anand S., Johnson T. P., Cho Y. I., Shavitt S., Chávez N., Weiner S., et al. (2014), “Response Heaping in Interviewer-Administered Surveys: Is It Really a Form of Satisficing?,” Public Opinion Quarterly, 78, 591–633.

Holbrook A. L., Krosnick J. A., Moore D., Tourangeau R. (2007), “Response Order Effects in Dichotomous Categorical Questions Presented Orally: The Impact of Question and Respondent Attributes,” Public Opinion Quarterly, 71(3), 325–348.

Initiative D21 and TNS Infratest (2010), “(N)Onliner Atlas 2010,” available at http://initiatived21.de/app/uploads/2017/02/nonliner2010.pdf, last accessed July 25, 2017.

Kaczmirek L. (2009), Human-Survey Interaction: Usability and Nonresponse in Online Surveys, Köln, Germany: Halem.

Kaminska O., McCutcheon A. L., Billiet J. (2010), “Satisficing among Reluctant Respondents in a Cross-National Context,” Public Opinion Quarterly, 74(5), 956–984.

Krosnick J. A. (1991), “Response Strategies for Coping with the Cognitive Demands of Attitude Measures in Surveys,” Applied Cognitive Psychology, 5(3), 213–236.

Krosnick J. A. (1999), “Survey Research,” Annual Review of Psychology, 50, 537–567.

Krosnick J. A., Alwin D. F. (1987), “An Evaluation of a Cognitive Theory of Response-Order Effects in Survey Measurement,” Public Opinion Quarterly, 51(2), 201–219.

Krosnick J. A., Alwin D. F. (1988), “A Test of the Form-Resistant Correlation Hypothesis: Ratings, Rankings, and the Measurement of Values,” Public Opinion Quarterly, 52(4), 526–538.
Krosnick J. A., et al. (2002), “The Impact of ‘No Opinion’ Response Options on Data Quality: Non-Attitude Reduction or an Invitation to Satisfice?,” Public Opinion Quarterly, 66(3), 371–403.

Krosnick J. A., Narayan S., Smith W. R. (1996), “Satisficing in Surveys: Initial Evidence,” in Advances in Survey Research: New Directions for Evaluation, eds. Braverman M. T., Slater J. K., pp. 29–44, Jossey-Bass: Wiley.

Krosnick J. A., Presser S. (2010), “Question and Questionnaire Design,” in Handbook of Survey Research, eds. Marsden P. V., Wright J. D., pp. 263–313, Bingley: Emerald.

Lenzner T. (2012), “Effects of Survey Question Comprehensibility on Response Quality,” Field Methods, 24, 409–428.

Liu M., Cernat A. (2016), “Item-by-Item Versus Matrix Questions: A Web Survey Experiment,” Social Science Computer Review, 1–17.

Mavletova A., Couper M. P. (2014), “Mobile Web Survey Design: Scrolling Versus Paging, SMS Versus E-Mail Invitations,” Journal of Survey Statistics and Methodology, 2(4), 498–518.

Mavletova A., Couper M. P. (2015), “Grouping of Items in Mobile Web Questionnaires,” Field Methods, 28(2), 170–193.

McCarty J. A., Shrum L. J. (2000), “The Measurement of Personal Values in Survey Research: A Test of Alternative Rating Procedures,” Public Opinion Quarterly, 64(3), 271–298.

Peytchev A., Couper M. P., McCabe S. E., Crawford S. D. (2006), “Web Survey Design: Paging Versus Scrolling,” Public Opinion Quarterly, 70(4), 596–607.

Revilla M., Ochoa C. (2014), “What Are the Links in a Web Survey among Response Time, Quality, and Auto-Evaluation of the Efforts Done?,” Social Science Computer Review, 33(1), 97–114.

Roberts C., Gillian E., Allum N., Lynn P. (2010), “Data Quality in Telephone Surveys and the Effect of Questionnaire Length: A Cross-National Experiment,” ISER Working Paper Series, No. 2010-36.

Schonlau M., Toepoel V. (2015), “Straightlining in Web Survey Panels over Time,” Survey Research Methods, 9(2), 125–137.

Struminskaya B., Kaczmirek L., Schaurer I., Bandilla W. (2014), “Assessing Representativeness of a Probability-Based Online Panel in Germany,” in Online Panel Research: A Data Quality Perspective, eds. Callegaro M., Baker R., Bethlehem J., Göritz A. S., Krosnick J. A., Lavrakas P. J., pp. 61–84, New York: Wiley.

Struminskaya B., Weyandt K., Bosnjak M. (2015), “The Effects of Questionnaire Completion Using Mobile Devices on Data Quality: Evidence from a Probability-Based General Population Panel,” Methods, Data, Analyses, 9(2), 261–292.

Toepoel V., Das M., Van Soest A. (2009), “Design of Web Questionnaires: The Effects of the Number of Items Per Screen,” Field Methods, 21, 200–213.

Tourangeau R., Rips L. J., Rasinski K. (2000), The Psychology of Survey Response, Cambridge: Cambridge University Press.

Tourangeau R., Couper M. P., Conrad F. (2004), “Spacing, Position, and Order: Interpretive Heuristics for Visual Features of Survey Questions,” Public Opinion Quarterly, 68(3), 368–393.
Zhang C., Conrad F. G. (2014), “Speeding in Web Surveys: The Tendency to Answer Very Fast and Its Association with Straightlining,” Survey Research Methods, 8(2), 127–135.

Appendix

Table 1. Linear and Logistic Regressions on Indicators of Response Quality (Experiment 1)

                                        (1)                 (2)                 (3)
                                        Straightlining      Differentiation     Nonsubstantive
                                        (0/1)               (P_d)               responses (0/1)
Gender: male                            Ref.                Ref.                Ref.
  female                                0.486* (0.197)      –0.038* (0.016)     0.599*** (0.138)
Age                                     –0.026*** (0.007)   0.002*** (0.001)    –0.016** (0.005)
Ability: low                            Ref.                Ref.                Ref.
  intermediate                          –0.575** (0.215)    0.055** (0.019)     0.098 (0.160)
  high                                  –1.565*** (0.299)   0.140*** (0.021)    –0.148 (0.189)
Motivation                              –1.485*** (0.392)   0.154*** (0.032)    –2.959*** (0.281)
Single items                            –0.751** (0.270)    0.078*** (0.022)    –0.495* (0.222)
Gender: female × single items           –0.522 (0.297)      0.046* (0.022)      0.219 (0.221)
Age × single items                      0.003 (0.011)       –0.001 (0.001)      –0.008 (0.008)
Ability: intermediate × single items    0.743* (0.325)      –0.057* (0.026)     –0.396 (0.246)
Ability: high × single items            0.716 (0.442)       –0.068* (0.028)     –0.778* (0.307)
Motivation × single items               0.690 (0.594)       –0.101* (0.045)     –0.886 (0.456)
Constant                                –1.260*** (0.175)   0.499*** (0.016)    –1.280*** (0.144)
N                                       1,880               1,880               2,491
McFadden’s R²                           0.083                                   0.186
Adjusted R²                                                 0.074

*p < 0.05, **p < 0.01, ***p < 0.001.
Note: Models 1 and 3 are logistic regressions, and Model 2 is a linear regression. Cell entries are unstandardized regression coefficients with standard errors in parentheses.

Table 2. Mean Differences in the Substantive Answers to Five Items (Experiment 1)

Items     M_Grid   M_Single   Δ        t         p
Item A    5.19     5.16       –0.03    –0.271    0.786
Item B    6.67     6.53       –0.14    –1.231    0.218
Item C    6.02     5.92       –0.10    –0.970    0.332
Item D    6.77     6.78       0.01     0.078     0.938
Item E    5.77     5.74       –0.03    –0.300    0.764

Table 3. Linear and Logistic Regressions on Indicators of Response Quality (Experiment 2)

                                        (1)                 (2)
                                        Straightlining      Differentiation
                                        (0/1)               (P_d)
Gender: male                            Ref.                Ref.
  female                                0.136 (0.639)       –0.034 (0.027)
Age                                     0.051* (0.026)      –0.000 (0.001)
Ability: low                            Ref.                Ref.
  intermediate                          –3.487*** (0.948)   0.128*** (0.032)
  high                                  –2.525** (0.822)    0.165*** (0.034)
Motivation                              –8.394*** (1.892)   0.227*** (0.060)
Single items                            –1.794 (0.955)      0.096** (0.035)
Gender: female × single items           0.817 (0.985)       0.007 (0.036)
Age × single items                      –0.103* (0.042)     0.002 (0.001)
Ability: intermediate × single items    2.235 (1.366)       –0.064 (0.043)
Ability: high × single items            1.979 (1.124)       –0.076 (0.045)
Motivation × single items               5.074* (2.516)      –0.197* (0.081)
Constant                                –2.136*** (0.548)   0.553*** (0.027)
N                                       511                 509
McFadden’s R²                           0.369
Adjusted R²                                                 0.133

*p < 0.05, **p < 0.01, ***p < 0.001.
Note: Model 1 is a logistic and Model 2 a linear regression. Cell entries are unstandardized regression coefficients with standard errors in parentheses.

Table 4.
Mean Differences in the Substantive Answers to Seven Items (Experiment 2) Items MGrid MSingle ΔM t p Item A 6.00 5.91 –0.09 –0.321 0.748 Item B 4.98 4.63 –0.35 –1.227 0.220 Item C 6.46 6.48 0.01 0.046 0.963 Item D 3.91 3.78 –0.13 –0.510 0.610 Item E 3.90 3.85 –0.05 –0.195 0.846 Item F 6.08 6.16 0.08 0.286 0.775 Item G 3.40 3.60 0.20 0.829 0.408 Items MGrid MSingle ΔM t p Item A 6.00 5.91 –0.09 –0.321 0.748 Item B 4.98 4.63 –0.35 –1.227 0.220 Item C 6.46 6.48 0.01 0.046 0.963 Item D 3.91 3.78 –0.13 –0.510 0.610 Item E 3.90 3.85 –0.05 –0.195 0.846 Item F 6.08 6.16 0.08 0.286 0.775 Item G 3.40 3.60 0.20 0.829 0.408 Table 4. Mean Differences in the Substantive Answers to Seven Items (Experiment 2) Items MGrid MSingle ΔM t p Item A 6.00 5.91 –0.09 –0.321 0.748 Item B 4.98 4.63 –0.35 –1.227 0.220 Item C 6.46 6.48 0.01 0.046 0.963 Item D 3.91 3.78 –0.13 –0.510 0.610 Item E 3.90 3.85 –0.05 –0.195 0.846 Item F 6.08 6.16 0.08 0.286 0.775 Item G 3.40 3.60 0.20 0.829 0.408 Items MGrid MSingle ΔM t p Item A 6.00 5.91 –0.09 –0.321 0.748 Item B 4.98 4.63 –0.35 –1.227 0.220 Item C 6.46 6.48 0.01 0.046 0.963 Item D 3.91 3.78 –0.13 –0.510 0.610 Item E 3.90 3.85 –0.05 –0.195 0.846 Item F 6.08 6.16 0.08 0.286 0.775 Item G 3.40 3.60 0.20 0.829 0.408 Table 5. Wording and Response Options of the Questions on Gender, Age, Education, and Interest in Politics (Experiment 1 & 2) Question Question wording Response scale Gender Please state your gender. (1) male (2) female Age Please enter the year you were born in. open-ended question that permitted numerical responses in the range of 1900 to 1994 in Experiment 1 and 1900 to 1995 in Experiment 2 Education What general school-leaving certificate do you have? (1) finished school without school-leaving certificate (2) lowest formal qualification of Germany’s tripartite secondary school system, after 8 or 9 years of schooling (3) intermediary secondary qualification, after 10 years of schooling (4) certificate fulfilling entrance requirements to study at a polytechnical college (5) higher qualification, entitling holders to study at a university (6) university degree (7) I am still in high school Interest in politics In general terms: how interested in politics are you? (1) very interested (2) interested (3) moderately interested (4) slightly interested (5) not interested at all (99) no answer Question Question wording Response scale Gender Please state your gender. (1) male (2) female Age Please enter the year you were born in. open-ended question that permitted numerical responses in the range of 1900 to 1994 in Experiment 1 and 1900 to 1995 in Experiment 2 Education What general school-leaving certificate do you have? (1) finished school without school-leaving certificate (2) lowest formal qualification of Germany’s tripartite secondary school system, after 8 or 9 years of schooling (3) intermediary secondary qualification, after 10 years of schooling (4) certificate fulfilling entrance requirements to study at a polytechnical college (5) higher qualification, entitling holders to study at a university (6) university degree (7) I am still in high school Interest in politics In general terms: how interested in politics are you? (1) very interested (2) interested (3) moderately interested (4) slightly interested (5) not interested at all (99) no answer Note.— Authors’ translations of the question and the response option wording. The sixth response option of the question on education (university degree) was not provided in Experiment 2. Table 5. 
Wording and Response Options of the Questions on Gender, Age, Education, and Interest in Politics (Experiment 1 & 2) Question Question wording Response scale Gender Please state your gender. (1) male (2) female Age Please enter the year you were born in. open-ended question that permitted numerical responses in the range of 1900 to 1994 in Experiment 1 and 1900 to 1995 in Experiment 2 Education What general school-leaving certificate do you have? (1) finished school without school-leaving certificate (2) lowest formal qualification of Germany’s tripartite secondary school system, after 8 or 9 years of schooling (3) intermediary secondary qualification, after 10 years of schooling (4) certificate fulfilling entrance requirements to study at a polytechnical college (5) higher qualification, entitling holders to study at a university (6) university degree (7) I am still in high school Interest in politics In general terms: how interested in politics are you? (1) very interested (2) interested (3) moderately interested (4) slightly interested (5) not interested at all (99) no answer Question Question wording Response scale Gender Please state your gender. (1) male (2) female Age Please enter the year you were born in. open-ended question that permitted numerical responses in the range of 1900 to 1994 in Experiment 1 and 1900 to 1995 in Experiment 2 Education What general school-leaving certificate do you have? (1) finished school without school-leaving certificate (2) lowest formal qualification of Germany’s tripartite secondary school system, after 8 or 9 years of schooling (3) intermediary secondary qualification, after 10 years of schooling (4) certificate fulfilling entrance requirements to study at a polytechnical college (5) higher qualification, entitling holders to study at a university (6) university degree (7) I am still in high school Interest in politics In general terms: how interested in politics are you? (1) very interested (2) interested (3) moderately interested (4) slightly interested (5) not interested at all (99) no answer Note.— Authors’ translations of the question and the response option wording. The sixth response option of the question on education (university degree) was not provided in Experiment 2. Figure 6. View largeDownload slide Screenshot of the question in the grid design (Experiment 1). Note.—Authors’ translations of the question and the response option wording. Figure 6. View largeDownload slide Screenshot of the question in the grid design (Experiment 1). Note.—Authors’ translations of the question and the response option wording. Figure 7. View largeDownload slide Screenshot of the first item in the single-item design (Experiment 1).Note.—Authors’ translations of the question and the response option wording. Figure 7. View largeDownload slide Screenshot of the first item in the single-item design (Experiment 1).Note.—Authors’ translations of the question and the response option wording. Figure 8. View largeDownload slide Screenshot of the question in the grid design (Experiment 2).Note.—Authors’ translations of the question and the response option wording. Figure 8. View largeDownload slide Screenshot of the question in the grid design (Experiment 2).Note.—Authors’ translations of the question and the response option wording. Figure 9. View largeDownload slide Screenshot of the first item in the single-item design (Experiment 2).Note.— Authors’ translations of the question and the response option wording. Figure 9. 
View largeDownload slide Screenshot of the first item in the single-item design (Experiment 2).Note.— Authors’ translations of the question and the response option wording. © The Author 2017. Published by Oxford University Press on behalf of the American Association for Public Opinion Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Journal of Survey Statistics and Methodology Oxford University Press
Consequently, respondents may process the available information less thoroughly or completely skip the cognitive processes of comprehension, information retrieval, and judgment; in other words, they use satisficing as a response strategy (Krosnick 1991, 1999).
With regard to survey questions that are arranged in grids, satisficing might manifest in at least four different forms. First, according to the theory of satisficing, respondents often fail to differentiate between the items in grids. Under the condition of strong satisficing, respondents might select a somewhat reasonable response option for the first item in the grid and rate all (or almost all) of the remaining items with exactly the same response value (Krosnick 1991).[1] This response pattern is referred to as nondifferentiation (Krosnick and Alwin 1988; Krosnick 1991; Krosnick, Narayan, and Smith 1996; Revilla and Ochoa 2014) or straightlining (Kaminska, McCutcheon, and Billiet 2010; Couper et al. 2013; Zhang and Conrad 2014; Schonlau and Toepoel 2015; Struminskaya, Weyandt, and Bosnjak 2015). Previous research lends support to the expectation that grids may invoke satisficing. Referring to the "near means related" heuristic, which states that respondents expect items that are grouped together to be closely related conceptually, a study by Tourangeau et al. (2004) investigated the effects of placing eight items in one large grid on a single page, in two smaller grids on separate pages, and on eight separate pages. Their results showed that the correlations between the items were significantly higher when the items were grouped in one large grid than when they were presented in two smaller grids or on single screens. As Tourangeau et al. (2004) pointed out, these differences could partly be accounted for by higher levels of straightlining in the single-grid condition compared to the two-grids and single-item conditions. Thus, with respect to satisficing theory and previous research, we derive our first hypothesis on the implications of using (seemingly) complex grids rather than single-item designs.

H1: The use of grids, compared to single-item designs, increases the rate of straightlining response patterns.

Second, some respondents may fail to differentiate between only some of the items in a grid and may simply differentiate their answers to a lesser degree than optimizing respondents. Thus, weaker forms of satisficing might result in lower response differentiation (cf. Krosnick and Alwin 1988).

H2: The use of grids, compared to single-item designs, leads to less differentiated responses.

Third, satisficing theory suggests that respondents seek to reduce the response burden by superficially comprehending questions or even completely skipping the cognitive processes of information retrieval and judgment. In line with this assumption, research has shown that satisficing respondents are more likely to provide nonsubstantive responses such as "don't know" or "no opinion" than optimizing respondents (Krosnick et al. 1996; Krosnick et al. 2002). In web surveys, particularly in the absence of explicit "don't know" and "no answer" response options, a nonsubstantive response might also be leaving a question blank (Heerwegh 2005). The grid question design may aggravate the use of nonsubstantive responses for at least two reasons. First, respondents are likely to assume that the items in grid questions are conceptually related (Tourangeau et al. 2004). Consequently, respondents should be more inclined to select the same response option for the question items in a grid than in a single-item design.
Thus, if satisficing respondents perceive selecting an explicit "don't know" response option for the first item in the grid as a viable strategy to reduce their response burden, the reasonable consequence might be to answer all or almost all of the remaining items nonsubstantively as well. Second, the grid question design might invite satisficing respondents to click quickly through the response options, and thus to fail to accurately click on some of the response fields. In line with these assumptions, previous research has shown that grouping multiple questions on a single page of a survey can increase the rate of nonsubstantive or missing answers relative to single-item-per-page designs (Peytchev, Couper, McCabe, and Crawford 2006; Toepoel, Das, and Van Soest 2009; Mavletova and Couper 2015).[2] Thus, based on satisficing theory, we put forward our third hypothesis:

H3: The use of grids, compared to asking questions separately, increases the rate of nonsubstantive responses.

Fourth, and finally, as satisficing implies, respondents may perform the cognitive processes of comprehension, information retrieval, and judgment less thoroughly or skip them altogether, which yields comparatively short response times (Zhang and Conrad 2014; Greszki, Meyer, and Schoen 2015). Research on survey and question design has provided ample evidence that single-item-per-page designs take respondents longer to answer than multiple-items-per-page designs (Couper et al. 2001; Tourangeau et al. 2004; Toepoel et al. 2009; Mavletova and Couper 2015). While it is obvious that answering questions in grids requires fewer physical actions by respondents than answering the questions separately, we propose that satisficing also contributes substantially to the shorter response times for grids.

H4: The use of grids, compared to single-item designs, leads to shorter response times.

Although previous research has compared the impact of grid and single-item designs on indicators of response quality (e.g., Couper et al. 2001; Tourangeau et al. 2004), to date only one study has systematically explored whether the implications of question design are moderated by respondents' abilities (Toepoel et al. 2009). In their survey experiment, these authors examined the effects of placing forty items with a five-point Likert scale on forty separate pages, on ten pages with four items each, on four pages with ten items each, or all on a single page. Notably, they found that the number of item nonresponses differed significantly across the four designs for the group of low-educated respondents, whereas the differences were nonsignificant for respondents with intermediate or high levels of education. This finding fits well with the assumption of satisficing theory that task difficulty, respondents' abilities, and respondents' motivation interact in determining the response strategy (cf. Krosnick 1991). While previous research has shown that nondifferentiation is more common among less educated respondents (e.g., Krosnick and Alwin 1988; Krosnick et al. 1996; Roberts, Gillian, Allum, and Lynn 2010) and that nonsubstantive answers are more likely among less able and less motivated respondents (e.g., Lenzner 2012), the findings by Toepoel et al. (2009) suggest that the effect of respondents' abilities on item nonresponse is moderated by the complexity of the task (which corresponds to the experimental variation of the survey design).
With respect to our theoretical framework, we assume that the effects of respondents' abilities and motivation on satisficing differ between grid and single-item designs. Assuming that answering grids is more burdensome and less enjoyable for respondents than answering single items, we suggest that the impact of low ability or low motivation on response outcomes should be stronger for grids than for the single-item alternative.

H5: The impact of low ability on survey satisficing is stronger for grids than single-item designs.

H6: The impact of low motivation on survey satisficing is stronger for grids than single-item designs.

Taking these six hypotheses as a basis, the present study assesses whether modifying the question design decreases the adverse effects of respondents' abilities and motivation on survey satisficing.

3. EXPERIMENT 1

3.1 Data

The first web-based experiment was conducted with respondents from the German opt-in online panel of the Respondi AG, which had about 100,000 active members in 2012. The experiment was included in a web survey on political attitudes and behaviors in Germany, which was fielded between December 6 and December 16, 2012. The web survey was quota sampled using reference distributions for gender, age, and education from the target population of the German electorate. Overall, 3,478 panelists accepted the survey invitation, of whom 535 were screened out, 440 broke off, and 2,503 completed the survey. The break-off rate was 15.0 percent (cf. Callegaro and DiSogra 2008). To examine the effects of the design on the quality of responses, we used a cognitively demanding question on the expected policy positions of five potential government coalitions on taxes and welfare spending in Germany. Respondents answered the question using an eleven-point bipolar rating scale with labeled endpoints. This question also featured an explicit "don't know" response option, which was visually separated from the response scale. The respondents were randomly assigned to either a standard grid question with five items or a single-item design in which the five questions were presented on separate pages (see Appendix figures 6 and 7). Bivariate analyses on gender (χ2 = 0.176, df = 1, p = 0.675), age (χ2 = 0.304, df = 4, p = 0.990), education (χ2 = 1.357, df = 2, p = 0.507), region (χ2 = 2.873, df = 1, p = 0.090), and interest in politics (χ2 = 7.189, df = 4, p = 0.126) indicated that the randomization of respondents to the experimental conditions had been successful.
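As an illustration, such a balance check amounts to a χ2 test of independence between the assigned condition and each respondent characteristic. A minimal sketch follows; the data file and column names are our own illustration, not the authors' materials.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Cross-tabulate the experimental condition against each respondent
# characteristic and test for independence; a successful randomization
# should yield nonsignificant differences. Column names are illustrative.
df = pd.read_csv("experiment1.csv")  # hypothetical data file
for var in ["gender", "age_group", "education", "region", "interest"]:
    table = pd.crosstab(df["condition"], df[var])
    chi2, p, dof, _ = chi2_contingency(table, correction=False)
    print(f"{var}: chi2 = {chi2:.3f}, df = {dof}, p = {p:.3f}")
```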
3.2 Methods

We experimentally tested our hypotheses on the effects of question design on response quality (H1 to H4) by analyzing differences in means and proportions. Accordingly, we created five indicators for weak and strong forms of satisficing.

3.2.1 Response time. We used server-side response times (in seconds) as an indicator of the respondents' speed of processing and answering a question.

3.2.2 Response differentiation. The responses to a set of items can differ in at least two respects (McCarty and Shrum 2000): first, with respect to the number of different scale points used (i.e., the variability of the responses), and second, with respect to the distance among the scale points used by respondents (i.e., the extremity of the responses). To take these two aspects into account, we created two indicators of the extent to which respondents differentiated their answers to the five questions. With respect to the variability of the responses, we computed the probability of differentiation as $P_d = 1 - \sum_{i=1}^{n} P_i^2$, where $P_i$ is the proportion of the values rated at a given point on the rating scale, and $n$ is the number of rating points (Krosnick and Alwin 1988). Pd takes values between zero and close to one. While zero means that all items were given the same rating (i.e., straightlining), larger values indicate that more scale points were used. In addition, we computed the coefficient of variation, which measures the distance among the scale points used by respondents, and thus the extremity of the responses (McCarty and Shrum 2000). We computed the coefficient of variation as $CV = s/\bar{x}$, where $s$ is the standard deviation and $\bar{x}$ is the mean of the responses over the items.[3] The coefficient of variation takes values greater than or equal to zero. Again, zero corresponds to straightlining response patterns, while larger values indicate more differentiated answers.

3.2.3 Straightlining. While weak forms of satisficing may cause less differentiated answers to question items, strong satisficing can result in nondifferentiated answers. Thus, we further created a dummy variable for straightlining response patterns (0 = differentiated response and 1 = straightlining) as an indicator of strong satisficing.[4]

3.2.4 Nonsubstantive responses. To study the impact of question design on nonsubstantive responses, we generated a dummy variable for extensively providing "don't know" responses or refusing to answer across the five questions (0 = fewer than four nonsubstantive responses and 1 = four or five nonsubstantive responses) as an additional indicator of strong forms of satisficing.
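To make these definitions concrete, the following sketch computes the differentiation and nonresponse indicators for one respondent's answers (response times come directly from the server logs). The function name, data layout, and 'DK'/'NA' codes are our assumptions, not the authors' code.

```python
import numpy as np

def satisficing_indicators(responses, nonsub_threshold=4):
    """Indicators from Sections 3.2.2-3.2.4 for one respondent's answers to
    a multi-item question. `responses` holds scale values (1-11) or the
    strings 'DK'/'NA' for nonsubstantive answers; layout and names are our
    assumptions, not the authors' code."""
    substantive = np.array([x for x in responses if not isinstance(x, str)],
                           dtype=float)

    # Pd = 1 - sum_i Pi^2 over all answer categories used (nonsubstantive
    # answers included, cf. footnote 3); set to missing if fewer than two
    # substantive answers were given.
    _, counts = np.unique([str(x) for x in responses], return_counts=True)
    shares = counts / counts.sum()
    pd_ = 1 - np.sum(shares ** 2) if len(substantive) >= 2 else np.nan

    # Coefficient of variation CV = s / mean over substantive answers only.
    cv = (substantive.std(ddof=1) / substantive.mean()
          if len(substantive) >= 2 else np.nan)

    # Straightlining: at least two substantive answers, all identical
    # (standard deviation of zero, cf. footnote 4).
    straightlining = int(len(substantive) >= 2 and substantive.std() == 0)

    # Elevated use of nonsubstantive responses (Experiment 1: four or more
    # of the five items answered 'DK' or left unanswered).
    n_nonsub = sum(isinstance(x, str) for x in responses)
    nonsubstantive = int(n_nonsub >= nonsub_threshold)
    return pd_, cv, straightlining, nonsubstantive

# Example: a straightliner who rates every item 6 yields Pd = 0, CV = 0,
# straightlining flag 1, nonsubstantive flag 0.
print(satisficing_indicators([6, 6, 6, 6, 6]))
```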
In addition, we analyzed the effects of the question design on respondents' substantive answers to the five items. These additional analyses explored the implications for observed means and the internal consistency of the answers.

3.2.5 Observed means. For each item of the question, we tested whether the observed mean for the single-item condition differed significantly from the observed mean for the grid condition.

3.2.6 Internal consistency. Previous studies have provided mixed evidence for the assumption that interitem correlations are lower in single-item designs than in grid designs (Couper et al. 2001; Tourangeau et al. 2004; Toepoel et al. 2009). Taking up their approach, we examined whether a measure of internal consistency among the answers to the five items, Cronbach's α (Cronbach 1951), was significantly lower for the single-item design than for the grid design using the Feldt test (Feldt 1969; Charter and Feldt 1996).

Finally, to assess H5 and H6, we ran regression models with the indicators for straightlining, response differentiation (Pd), and nonsubstantive responses as dependent variables. All models included the respondents' gender (0 = male and 1 = female) and age (in years) as controls. A dummy variable indicated whether respondents answered the grid or the single-item version of the question (0 = grid and 1 = single items). In addition, we used two personal characteristics to model respondents' propensity to satisfice.

3.2.7 Ability. The respondents' level of cognitive sophistication reflects their general ability to perform complex mental operations (Krosnick 1991). In conformance with previous research, we used the level of formal education (0 = low, 1 = intermediate, and 2 = high) as an indicator of respondents' cognitive sophistication (e.g., Krosnick and Alwin 1987; Krosnick et al. 1996; Holbrook, Krosnick, Moore, and Tourangeau 2007). We coded respondents who had no school-leaving certificate or who held the lowest formal qualification of Germany's tripartite secondary school system as having low education. We coded respondents with an intermediary secondary qualification as having intermediate education, and we assigned respondents who were entitled to study at a college or university to high education (see Appendix Table 5 for the wording of the question).

3.2.8 Motivation. According to Krosnick (1991), motivation is affected by the degree to which the topic of a question is personally important. As we assumed that respondents perceived the questions as more important when they were more interested in the survey topic (cf. Holbrook et al. 2014), we used a rescaled five-point measure of the respondents' interest in politics as an indicator of motivation (the rescaled variable ranged from 0 = low to 1 = high).

To assess whether the effects of respondents' ability and motivation differed across the question designs, we included interaction terms between the regressors and the indicator for the single-item design in each of the regression models. The logistic regression models for straightlining and nonsubstantive responses as dependent variables ($y_i$) can be expressed as

$\operatorname{logit} \Pr(y_i = 1) = \beta_0 + \beta_1\,\mathrm{gender}_i + \beta_2\,\mathrm{age}_i + \beta_3\,\mathrm{ability}_i + \beta_4\,\mathrm{motivation}_i + \beta_5\,\mathrm{single\,items}_i + \beta_{15}\,\mathrm{gender}_i \times \mathrm{single\,items}_i + \beta_{25}\,\mathrm{age}_i \times \mathrm{single\,items}_i + \beta_{35}\,\mathrm{ability}_i \times \mathrm{single\,items}_i + \beta_{45}\,\mathrm{motivation}_i \times \mathrm{single\,items}_i.$

Similarly, the linear regression model for response differentiation (Pd) as the dependent variable ($y_i$) is

$y_i = \beta_0 + \beta_1\,\mathrm{gender}_i + \beta_2\,\mathrm{age}_i + \beta_3\,\mathrm{ability}_i + \beta_4\,\mathrm{motivation}_i + \beta_5\,\mathrm{single\,items}_i + \beta_{15}\,\mathrm{gender}_i \times \mathrm{single\,items}_i + \beta_{25}\,\mathrm{age}_i \times \mathrm{single\,items}_i + \beta_{35}\,\mathrm{ability}_i \times \mathrm{single\,items}_i + \beta_{45}\,\mathrm{motivation}_i \times \mathrm{single\,items}_i.$

Since the interpretation of interactions in terms of probabilities can change due to the nonadditivity and nonlinearity of the logistic model (Best and Wolf 2015), we plotted the predicted probabilities for low, intermediate, and high levels of our indicators of ability and motivation to examine their effects on the dependent variables. Correspondingly, we plotted linear predictions to study the effects of ability and motivation on response differentiation (Pd).
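Models of this form can be estimated with standard software; the following sketch uses Python's statsmodels, although the authors do not report which package they used. The data file, column names, and covariate profile are our own illustration.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Fit the logistic and linear moderation models given above; every regressor
# is interacted with the design dummy. Illustrative columns: straightlining
# (0/1), pd (probability of differentiation), female (0/1), age (years),
# ability ('low'/'intermediate'/'high'), motivation (0-1), and
# single_items (0 = grid, 1 = single-item design).
df = pd.read_csv("experiment1.csv")  # hypothetical data file

rhs = ("(female + age + C(ability, Treatment(reference='low')) + motivation)"
       " * single_items")
logit_fit = smf.logit("straightlining ~ " + rhs, data=df).fit()
ols_fit = smf.ols("pd ~ " + rhs, data=df).fit()

# Interaction coefficients in a logit are hard to read directly (Best and
# Wolf 2015), so compare predicted probabilities at chosen covariate values
# rather than the raw coefficients.
profile = pd.DataFrame({"female": 0, "age": 45, "ability": "low",
                        "motivation": [0.0, 0.5, 1.0], "single_items": 0})
print(logit_fit.predict(profile))                         # grid design
print(logit_fit.predict(profile.assign(single_items=1)))  # single items
```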
3.3 Results

In line with the expectation of H1, we found that the rate of straightlining was significantly higher for the grid question (MGrid = 0.16) than for the single-item design (MSingle = 0.10, z = –3.836, p < 0.001). However, despite the larger proportion of straightlining in the grid condition, the answers to the grid question (MGrid = 0.31) were not significantly less differentiated than the answers to the single questions (MSingle = 0.31, t = 0.121, p = 0.904) with respect to the coefficient of variation. At the same time, our analysis showed that respondents differentiated their responses to a significantly lesser degree in the grid question (MGrid = 0.56) than in the single-item design (MSingle = 0.61, t = 4.218, p < 0.001) in terms of the coefficient Pd. Taken together, these results showed that respondents differentiated their answers to a moderately higher degree in the single-item design than in the grid design in terms of the number of scale points used. However, their responses were not more differentiated in the single-item design with respect to the distance among the answers. In other words, respondents in the single-item condition used slightly more points of the response scale, although the additional scale points they used appear to have been rather proximate to the scale points they would have selected in the grid condition. Overall, these results from Experiment 1 provide partial support for H2, which suggests that the use of grids leads to less differentiated responses compared to single-item designs.

As expected with respect to H3, our results showed that the rate of an elevated use of nonsubstantive response options was significantly higher for the grid design (MGrid = 0.30) than for the single-item design (MSingle = 0.19, z = –6.050, p < 0.001).[5] In line with the findings from previous research, our analysis supported the assumption of H4 that response times (in seconds) are markedly shorter for the grid question (MGrid = 29.14) than for asking the questions separately (MSingle = 38.64, t = 11.512, p < 0.001). However, this analysis of response times did not enable us to determine whether the shorter response times for the grid condition resulted from higher levels of satisficing or from the lower number of physical actions taken by respondents.

In addition, we studied whether the observed differences in our indicators of satisficing were also reflected in differences in the observed means and the internal consistency of the answers. Notably, we did not find any significant differences in item means between the grid design and the single-item design (see Appendix Table 2). However, we did find slightly higher internal consistency in the grid design than in the single-item design (Cronbach's αGrid = 0.68, αSingle = 0.64, F = 1.133, p = 0.026). Thus, our results supported the finding by Tourangeau et al. (2004) that respondents gave answers of higher consistency in a grid design than in a single-item design. Apparently, this finding can be attributed to the higher rate of straightlining in the grid condition; if we excluded straightlining respondents, the difference in the internal consistency between the designs disappeared (αGrid = 0.54, αSingle = 0.53, F = 1.021, p = 0.381).
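For readers who wish to reproduce the internal-consistency comparison, a rough sketch of Cronbach's α and a Feldt-type test follows; degrees-of-freedom conventions for comparing two independent alphas vary, so this is an approximation under stated assumptions rather than the authors' exact procedure (cf. Feldt 1969; Charter and Feldt 1996).

```python
import numpy as np
from scipy import stats

def cronbach_alpha(items):
    """Cronbach's alpha (Cronbach 1951) for an (n_respondents, k_items)
    array of substantive answers without missing values."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    total_scores = items.sum(axis=1)
    return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                          / total_scores.var(ddof=1))

def feldt_test(alpha_1, n_1, alpha_2, n_2):
    """Feldt-type test of the equality of two independent alphas: under the
    null hypothesis, W = (1 - alpha_2) / (1 - alpha_1) is approximately F
    distributed. A sketch using n - 1 degrees of freedom per sample; see
    Feldt (1969) and Charter and Feldt (1996) for exact conventions."""
    w = (1 - alpha_2) / (1 - alpha_1)
    p = stats.f.sf(w, n_1 - 1, n_2 - 1)  # one-sided upper-tail p-value
    return w, p
```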
Finally, as described in the Methods section, we used linear and logistic regression models to assess hypotheses H5 and H6 for those indicators that showed significant differences between the two designs. Figure 1 presents the effects of respondents' ability and motivation on the predicted probability of straightlining across the experimental conditions. In line with H5, we found that respondents with lower ability were significantly more likely to straightline in the grid design than in the single-item design, whereas we found no significant difference for respondents with intermediate or high ability. Despite the relatively large confidence intervals, the results also suggested that respondents with low or intermediate motivation were more likely to straightline if the items were arranged in a grid rather than presented separately.[6]

Figure 1. The effects of ability and motivation on the predicted probability of straightlining in the grid and single-item conditions (Experiment 1).

Figure 2 presents linear predictions of the effects of respondents' ability and motivation on response differentiation (Pd) across the experimental conditions. As expected, the findings are very similar to those for straightlining; in particular, respondents with low ability and low or intermediate levels of motivation gave less differentiated responses with regard to the number of scale points used, whereas we did not observe significant differences for respondents with intermediate or high ability, or with high levels of motivation.

Figure 2. Linear prediction of the effects of ability and motivation on response differentiation (Pd) in the grid and single-item conditions (Experiment 1).

Finally, figure 3 depicts the effects of respondents' ability and motivation on the predicted probability of nonsubstantive responses for the grid design and the single-item design. In contrast to the models for straightlining and response differentiation, the results did not support the assumptions of H5 and H6. Contradicting our expectations, we found that the probability of nonsubstantive responses was significantly lower for highly educated respondents in the single-item condition than in the grid condition.

Figure 3. The effects of ability and motivation on the predicted probability of nonsubstantive responses in the grid and single-item conditions (Experiment 1).

4. EXPERIMENT 2

To verify the robustness of our results across different questions and to gain further in-depth insights into how the design of questions affects response behavior, we replicated our analyses with a slightly modified experimental design.

4.1 Data

We conducted the second web-based experiment between February 27 and April 1, 2013, with respondents from the GESIS Online Panel Pilot (see Struminskaya, Kaczmirek, Schaurer, and Bandilla 2014 for a detailed description of the online panel). The respondents for this online panel were recruited in telephone interviews using a probability sample of the German-speaking population in Germany aged eighteen years and older who used the Internet for non-work-related purposes (Struminskaya et al. 2014). Of the 859 eligible panel members who were invited, 529 completed the survey, yielding a completion rate of 61.6 percent. The cumulative response rate 2 equaled 2.6 percent (cf. Callegaro and DiSogra 2008; DiSogra and Callegaro 2015). To adjust the sample to the German online population, we calculated weights based on reference distributions for gender, age, education, and region as provided by the (N)Onliner Atlas 2010 (Initiative D21 and TNS Infratest 2010). We randomly assigned participants to either a grid or a single-item version of a cognitively demanding question concerning the respondents' evaluation of seven major German political parties (i.e., CDU, CSU, SPD, FDP, Die Linke, Bündnis 90/Die Grünen, and the Piratenpartei). The question featured an eleven-point rating scale with verbally labeled endpoints and a visually separated "unable to evaluate" response option (see Appendix figures 8 and 9).
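The weighting step described above can be implemented as simple cell weighting, as in the following sketch; the population shares and column names are invented for illustration and are not the actual (N)Onliner Atlas figures.

```python
import pandas as pd

# Cell weighting: each respondent's weight is the population share of his or
# her gender x age cell divided by that cell's sample share. The shares below
# are invented; the study used gender, age, education, and region benchmarks
# from the (N)Onliner Atlas 2010.
df = pd.read_csv("experiment2.csv")  # hypothetical data file
population_share = {("male", "18-39"): 0.28, ("male", "40+"): 0.24,
                    ("female", "18-39"): 0.26, ("female", "40+"): 0.22}
sample_share = (df.groupby(["gender", "age_group"]).size() / len(df)).to_dict()
df["weight"] = [population_share[cell] / sample_share[cell]
                for cell in zip(df["gender"], df["age_group"])]
```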
Bivariate analyses on gender (χ2 = 3.576, df = 1, p = 0.059), age (χ2 = 2.696, df = 4, p = 0.610), education (χ2 = 1.906, df = 2, p = 0.386), region (χ2 = 0.407, df = 1, p = 0.523), and interest in politics (χ2 = 4.105, df = 4, p = 0.392) suggested that the randomization had been successful.

4.2 Methods

With two notable exceptions, the second web-based experiment used the same methods and indicators of satisficing as Experiment 1. First, response times were measured in milliseconds using a client-side paradata script. In addition, the script captured the number of mouse clicks and the time between those click events. The measure was therefore not affected by the speed of the respondent's Internet connection. Consequently, client-side response times enable a more precise analysis of the response process than server-side measures (Kaczmirek 2009). Second, we coded an extensive use of nonsubstantive responses if respondents provided fewer than two substantive responses across the seven questions (0 = fewer than six nonsubstantive responses and 1 = six or seven nonsubstantive responses).

4.3 Results

Again in support of H1, we observed that the rate of straightlining was significantly higher in the grid design (MGrid = 0.10) than in the single-item design (MSingle = 0.04, z = –2.970, p = 0.003). Largely confirming the findings from Experiment 1, we found that the answers to the grid question were not less differentiated than the answers to the single questions (MGrid = 0.56, MSingle = 0.60, t = 1.651, p = 0.099) with respect to the coefficient of variation, but they differed moderately in terms of the coefficient Pd (MGrid = 0.64, MSingle = 0.69, t = 2.925, p = 0.004). Again, these results provide partial support for H2. Lending further support to H3, our results showed that the rate of nonsubstantive responses was significantly higher for the grid design (MGrid = 0.06) than for the single-item design (MSingle = 0.01, z = –2.609, p = 0.009). In line with the findings from Experiment 1, our analyses revealed that response times were again shorter for the grid question (MGrid = 37.16 s) than for asking the questions separately (MSingle = 48.12 s, t = 6.503, p < 0.001).

However, taking advantage of the client-side paradata, we were able to assess the association between design and response behavior in more detail. Confirming the assumption that single-item designs require respondents to take more physical actions, we found that the average number of mouse clicks was significantly higher if the questions were asked separately (MGrid = 8.27, MSingle = 14.94, t = 34.026, p < 0.001). However, even if we subtracted the times for loading the web pages and for clicking on the continue button from the overall response time, and only looked at the time respondents spent reading the questions and selecting a response, answering the grid question was still faster than answering single items (MGrid = 34.98 s, MSingle = 42.62 s, t = 4.779, p < 0.001). In addition, by examining the first item exclusively, we discovered that respondents did not need significantly more time to read and answer the question item in the grid than in the single-item design (MGrid = 14.21 s, MSingle = 14.66 s, t = 0.590, p = 0.555). We take this finding as an indication that respondents particularly spent more time processing the adjacent items if these were presented on separate pages of the survey. Thus, although respondents have to perform fewer physical actions and do not need to double-check whether scale labels have changed when answering a grid question, we suggest that the latter result also lends support to the notion that satisficing contributes to the shorter response times in grid designs compared to single-item designs.
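To illustrate how such client-side paradata can be analyzed, the sketch below derives a net reading-and-answering time per page from time-stamped click events and compares the conditions with a t-test; the event names and numbers are our assumptions, not the actual instrument or data.

```python
import numpy as np
from scipy import stats

def net_answer_time(events):
    """Approximate net reading-and-answering time (in seconds) for one
    question page from client-side paradata. `events` is a time-ordered list
    of (timestamp_ms, kind) tuples; the event names are our assumptions
    about such a paradata script, not the instrument actually used."""
    loaded = next(t for t, kind in events if kind == "page_loaded")
    continued = next(t for t, kind in events if kind == "click_continue")
    return (continued - loaded) / 1000.0

# Welch's t-test comparing net response times between conditions
# (illustrative values, not the reported data).
grid = np.array([31.2, 36.8, 35.4, 33.9])
single = np.array([41.7, 44.2, 39.8, 45.1])
t_stat, p_value = stats.ttest_ind(grid, single, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```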
Providing further support for the results from Experiment 1, we did not observe significant differences in observed means between the grid and single-item conditions (see Appendix Table 4). Also in line with the findings from Experiment 1, we found considerably higher consistency among the answers for the grid design than for the single-item design (αGrid = 0.60, αSingle = 0.41, F = 1.470, p = 0.001). Again, we can attribute this difference to the higher rate of straightlining in the grid condition; if we excluded straightlining respondents, the difference in the internal consistency between the designs vanished (αGrid = 0.38, αSingle = 0.37, F = 1.014, p = 0.456).

To further assess H5 and H6, we ran a logistic regression with straightlining and a linear regression with response differentiation (Pd) as dependent variables. The indicators of respondents' ability and motivation were the same as in Experiment 1. Figure 4 depicts the effects of respondents' ability and motivation on the predicted probability of straightlining across the experimental conditions.

Figure 4. The effects of ability and motivation on the predicted probability of straightlining in the grid and single-item conditions (Experiment 2).

Again confirming H5 and H6, our results showed that the probability of straightlining was significantly higher for respondents with low ability and motivation if the items were arranged in a grid rather than presented on separate pages of the survey, whereas the probability did not differ significantly between the designs for respondents with intermediate or high levels of ability and motivation. The results from the analyses of the degree to which respondents differentiated their responses (Pd) confirm the findings from Experiment 1. They show that particularly respondents with low ability and low or intermediate levels of motivation provided less differentiated responses with regard to the number of scale points used, whereas we found no significant differences for respondents with intermediate or high ability or with high levels of motivation. Thus, these findings provide further evidence in support of H5 and H6 (see figure 5).[7]

Figure 5. Linear prediction of the effects of ability and motivation on response differentiation (Pd) in the grid and single-item conditions (Experiment 2).

5. CONCLUSIONS

With regard to the widely shared assumption that web survey respondents might perceive answering grid questions as more complex and taxing than answering single-item questions (cf. Couper et al. 2013; Liu and Cernat 2016), our study set out to examine whether splitting up cognitively demanding grid questions into single items can be an effective means to mitigate survey satisficing.
In line with three hypotheses derived from satisficing theory, our results from two web-based experiments showed that, compared to single-item designs, the use of grids resulted in moderately lower response differentiation with regard to the number of scale points used (i.e., lower variability of the responses), higher rates of straightlining, and more nonsubstantive answers. In particular, the rates of nonsubstantive answers and straightlining were substantially lower if the items were presented on separate pages of the survey. Remarkably, these findings held true for two different questions on political attitudes and two different sampling sources (an opt-in and a probability-based online panel), which supports the robustness of our results.

Most importantly, our study yields valuable insights for survey methodologists by showing that the effects of respondents' abilities and motivation on survey satisficing are moderated by the question design. Our findings support the assumption that using a single-item design instead of a grid design decreases the adverse effects of low abilities and low motivation on the selection of satisficing as a response strategy. Specifically, we observed that the probability of straightlining and of low response differentiation (as measured by Pd) was substantially lower among respondents with low abilities and low motivation. The results of our study suggest that the single-item design particularly enabled respondents with low abilities and low motivation to provide answers of higher quality. Therefore, we recommend that, for cognitively demanding survey questions, researchers generally favor a single-item design over grid designs because of its positive effects on response quality.

However, the results of our analyses are consistent with previous research demonstrating that grids take less time to answer than single-item designs (e.g., Couper et al. 2001; Tourangeau et al. 2004). Although it seems obvious that answering grid questions requires respondents to take fewer physical actions than answering single-item questions, we also found that answering a grid question was still faster than answering single items even if we only considered the time respondents needed to read the questions and to select their responses. In conjunction with the observation that the response times for the first item of a question did not differ between the grid and single-item designs, we interpreted this finding as an indication that respondents spent more time processing adjacent items in single-item designs. Thus, we inferred that satisficing is likely to contribute to the shorter response times for grid designs. However, we need to acknowledge that grids are an efficient question format in that respondents can easily recognize that the items refer to an overarching topic and share a common response scale. Although grid questions may appear very complex and daunting to respondents, answering them is slightly more comfortable than answering single-item designs because respondents do not need to double-check whether scale labels have changed, and they do not need to submit their response to every single item. Thus, we need to be aware that substituting grids with single-item designs might not be feasible or desirable in every situation.
In particular, if a questionnaire includes several multi-item scales or thematically related sets of items, using single-item designs instead of grids can quickly increase the length of a survey to an extent where the positive effects of the question design are outweighed by the negative effects of the increased response burden on response quality and on the willingness to continue survey participation (cf. Galesic and Bosnjak 2009; Roberts et al. 2010). We encourage further research that explores to what extent grids can appropriately be substituted with alternative designs before the negative effects of increasing survey length on measurement error and nonresponse bias prevail.

Furthermore, the results of both experiments suggest that the lower response differentiation in the grid design was associated with a higher consistency of answers compared to the single-item design. While we did not find any evidence that the question design affected observed item means, its effect on the consistency of answers might have serious implications for analyses based on latent means. However, it is beyond the scope of the present study to draw generalizable conclusions about whether a lower consistency of answers is desirable or not.

Finally, future research could address two limitations of the present study. First, while our analyses provided consistent evidence across the two experiments that the effects of low ability and motivation on the probability of straightlining and on response differentiation were stronger in the grid condition than in the single-item condition, we did not find evidence for the expected effect on the probability of nonsubstantive responses. Thus, we recommend that future research examine the effects of question design on nonsubstantive responses. Second, both experiments used cognitively demanding questions with eleven-point end-labeled rating scales. Previous research has suggested that response burden is likely to vary with the length and labeling of the response scale (Krosnick and Presser 2010). For instance, a recent study by Liu and Cernat (2016) has provided first evidence that differences in response quality between grid and single-item designs might be more pronounced for questions with nine- and eleven-point rating scales than for those with seven or fewer scale points. Thus, we see merit in replications of our study that compare grid and single-item designs for questions with response scales of different lengths or with different labeling. Such research designs could provide valuable insights into the role of task difficulty in explaining satisficing.

Footnotes

1. Often, this will be the response option that respondents expect to reflect a common or average response (e.g., the midpoint of the response scale). In other situations, it will be the response option that respondents believe conforms to the researcher's expectations, which might be one of the extreme options of the response scale (Krosnick and Alwin 1988).

2. However, a few studies present conflicting results. For example, Couper et al. (2001) examined whether grouping five related knowledge items in one grid and eleven related attitude items in three grids on separate pages yielded better response quality than presenting the items on five and eleven separate pages. They found that the number of nonsubstantive responses (don't know/not applicable) was lower in the grid condition.
Finally, future research could address two limitations of the present study. First, while our analyses provided consistent evidence across the two experiments that the impact of low ability and motivation on the probability of straightlining and on response differentiation was stronger in the grid condition than in the single-item condition, we did not find evidence for the expected impact on the probability of nonsubstantive responses. Thus, we recommend that future research examine the effects of question design on nonsubstantive responses. Second, both experiments used cognitively demanding questions with eleven-point end-labeled rating scales. Previous research has suggested that response burden likely varies with the length and labeling of the response scale (Krosnick and Presser 2010). For instance, Liu and Cernat (2016) provided first evidence that differences in response quality between grid and single-item designs might be more pronounced for questions with nine- and eleven-point rating scales than for those with seven or fewer scale points. Thus, we see merit in replications of our study that compare grid and single-item designs for questions with response scales of different lengths or with different labeling. Such research designs could provide valuable insights into the role of task difficulty in explaining satisficing.

FOOTNOTES

1. Often, this will be the response option that respondents expect to reflect a common or average response (e.g., the midpoint of the response scale). In other situations, it will be the response option that respondents believe conforms to the researcher's expectations. This might be one of the extreme options of the response scale (Krosnick and Alwin 1988).

2. However, a few studies present conflicting results. For example, Couper et al. (2001) examined whether grouping five related knowledge items in one grid and eleven related attitude items in three grids on separate pages yielded better response quality than presenting the items on five and eleven separate pages. They found that the number of nonsubstantive responses (don't know/not applicable) was lower in the grid condition. Using a slightly different design, Mavletova and Couper (2014) did not find significant differences in the amount of nonsubstantive responses when the questions were placed on seventeen separate pages or on two pages of their survey.

3. We treated "don't know" and "no answer" responses as missing in the calculation of CV (and the straightlining indicator). Most importantly, computing the mean and the standard deviation requires assigning numeric values to "don't know" and "no answer" responses that cannot be justified from the response scale of the item (e.g., assigning values of twelve to "don't know" and thirteen to "no answer" responses if an ordinal response scale runs from one to eleven). In contrast, the computation of Pd required including "don't know" and "no answer" responses to obtain the sum of the proportions of the values rated at given points on the rating scale (which is not affected by the values assigned to "don't know" and "no answer" responses). However, to enhance the comparability of both measures of response differentiation (i.e., CV and Pd), we set Pd to missing if the number of nonmissing responses was lower than two.

4. We derived the indicator from the standard deviation of the responses across the items. Accordingly, straightlining was indicated if all nonmissing responses were identical across the items (i.e., at least two nonmissing responses with a standard deviation equal to zero).

5. Notably, the results of a series of additional logistic regression models provided evidence in favor of our assumption that respondents who provided a nonsubstantive response ("don't know" or no answer) to the first item were more likely to answer nonsubstantively to the remaining items in a grid design than in a single-item design (in each model, the coefficient of the interaction between a nonsubstantive answer to the first item and the indicator for the single-item design was negative and p < 0.01).

6. Figures 1 through 4 present predicted probabilities and linear predictions with 95% confidence intervals for low, intermediate, and high levels of ability and motivation. For the logistic and linear regression models used to compute these predicted values, see Appendix Tables 1 and 3. These tables also provide information about whether the overall effects of ability and motivation differed significantly between the experimental conditions. Given the known problems in interpreting interactions in nonlinear models (see the Methods section), we referred to predicted values in discussing the results of our analyses.

7. We could not replicate the analysis for nonsubstantive responses due to the low number of relevant cases in Experiment 2 (n = 16).
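Footnotes 3 and 4 fully specify how the response-quality indicators were computed, so they can be reproduced directly. The following Python sketch implements the straightlining indicator, the coefficient of variation (CV), and the probability of differentiation Pd for one respondent. The numeric codes for "don't know" and "no answer" are hypothetical placeholders, and the Pd formula follows the formulation commonly attributed to McCarty and Shrum (2000), Pd = 1 minus the sum of squared category proportions.

import numpy as np

DK, NA = 98, 99  # hypothetical numeric codes for "don't know" / "no answer"

def quality_indicators(responses):
    """Straightlining, CV, and Pd for one respondent's answers to a
    multi-item question (scale values 1-11 plus DK/NA codes)."""
    r = np.asarray(responses, dtype=float)
    substantive = r[(r != DK) & (r != NA)]  # DK/NA treated as missing

    # Footnote 4: straightlining = at least two nonmissing responses,
    # all identical (standard deviation of zero).
    straightlining = substantive.size >= 2 and substantive.std(ddof=0) == 0

    # Footnote 3: CV computed on substantive responses only.
    cv = (substantive.std(ddof=1) / substantive.mean()
          if substantive.size >= 2 else np.nan)

    # Pd keeps DK/NA as categories of their own, but is set to missing
    # if fewer than two substantive responses were given (footnote 3).
    if substantive.size >= 2:
        _, counts = np.unique(r, return_counts=True)
        proportions = counts / r.size
        pd_index = 1 - np.sum(proportions ** 2)
    else:
        pd_index = np.nan
    return straightlining, cv, pd_index

print(quality_indicators([5, 5, 5, 5, 5]))   # straightliner: (True, 0.0, 0.0)
print(quality_indicators([2, 7, 9, DK, 4]))  # differentiated answers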
REFERENCES

Best, H., and Wolf, C. (2015), "Logistic Regression," in Regression Analysis and Causal Inference, eds. H. Best and C. Wolf, pp. 153–171, Los Angeles, CA: Sage.
Callegaro, M., and DiSogra, C. (2008), "Computing Response Metrics for Online Panels," Public Opinion Quarterly, 72(5), 1008–1032.
Charter, R. A., and Feldt, L. S. (1996), "Testing the Equality of Two Alpha Coefficients," Perceptual and Motor Skills, 82(3), 763–768.
Couper, M. P. (2008), Designing Effective Web Surveys, Cambridge: Cambridge University Press.
Couper, M. P., Traugott, M. W., and Lamias, M. J. (2001), "Web Survey Design and Administration," Public Opinion Quarterly, 65(2), 230–253.
Couper, M. P., Tourangeau, R., Conrad, F. G., and Zhang, C. (2013), "The Design of Grids in Web Surveys," Social Science Computer Review, 31(3), 322–345.
Cronbach, L. J. (1951), "Coefficient Alpha and the Internal Structure of Tests," Psychometrika, 16(3), 297–334.
DiSogra, C., and Callegaro, M. (2015), "Metrics and Design Tool for Building and Evaluating Probability-Based Online Panels," Social Science Computer Review, 34(1), 26–40.
Feldt, L. S. (1969), "A Test of the Hypothesis That Cronbach's Alpha or Kuder-Richardson Coefficient Twenty Is the Same for Two Tests," Psychometrika, 34(3), 363–373.
Galesic, M., and Bosnjak, M. (2009), "Effects of Questionnaire Length on Participation and Indicators of Response Quality in a Web Survey," Public Opinion Quarterly, 73(2), 349–360.
Greszki, R., Meyer, M., and Schoen, H. (2015), "Exploring the Effects of Removing 'Too Fast' Responses and Respondents from Web Surveys," Public Opinion Quarterly, 79(2), 471–503.
Heerwegh, D. (2005), Web Surveys. Explaining and Reducing Unit Nonresponse, Item Nonresponse and Partial Nonresponse, Katholieke Universiteit Leuven, Belgium.
Heerwegh, D. (2009), "Mode Differences between Face-to-Face and Web Surveys: An Experimental Investigation of Data Quality and Social Desirability Effects," International Journal of Public Opinion Research, 21(1), 111–121.
Holbrook, A. L., Anand, S., Johnson, T. P., Cho, Y. I., Shavitt, S., Chávez, N., Weiner, S., et al. (2014), "Response Heaping in Interviewer-Administered Surveys: Is It Really a Form of Satisficing?," Public Opinion Quarterly, 78, 591–633.
Holbrook, A. L., Krosnick, J. A., Moore, D., and Tourangeau, R. (2007), "Response Order Effects in Dichotomous Categorical Questions Presented Orally: The Impact of Question and Respondent Attributes," Public Opinion Quarterly, 71(3), 325–348.
Initiative D21 and TNS Infratest (2010), "(N)Onliner Atlas 2010," available at http://initiatived21.de/app/uploads/2017/02/nonliner2010.pdf, last accessed July 25, 2017.
Kaczmirek, L. (2009), Human-Survey Interaction: Usability and Nonresponse in Online Surveys, Köln, Germany: Halem.
Kaminska, O., McCutcheon, A. L., and Billiet, J. (2010), "Satisficing among Reluctant Respondents in a Cross-National Context," Public Opinion Quarterly, 74(5), 956–984.
Krosnick, J. A. (1991), "Response Strategies for Coping with the Cognitive Demands of Attitude Measures in Surveys," Applied Cognitive Psychology, 5(3), 213–236.
Krosnick, J. A. (1999), "Survey Research," Annual Review of Psychology, 50, 537–567.
Krosnick, J. A., and Alwin, D. F. (1987), "An Evaluation of a Cognitive Theory of Response-Order Effects in Survey Measurement," Public Opinion Quarterly, 51(2), 201–219.
Krosnick, J. A., and Alwin, D. F. (1988), "A Test of the Form-Resistant Correlation Hypothesis. Ratings, Rankings, and the Measurement of Values," Public Opinion Quarterly, 52(4), 526–538.
Krosnick, J. A., et al. (2002), "The Impact of 'No Opinion' Response Options on Data Quality. Non-Attitude Reduction or an Invitation to Satisfice?," Public Opinion Quarterly, 66(3), 371–403.
Krosnick, J. A., Narayan, S., and Smith, W. R. (1996), "Satisficing in Surveys: Initial Evidence," in Advances in Survey Research: New Directions for Evaluation, eds. M. T. Braverman and J. K. Slater, pp. 29–44, Jossey-Bass: Wiley.
Krosnick, J. A., and Presser, S. (2010), "Question and Questionnaire Design," in Handbook of Survey Research, eds. P. V. Marsden and J. D. Wright, pp. 263–313, Bingley: Emerald.
Lenzner, T. (2012), "Effects of Survey Question Comprehensibility on Response Quality," Field Methods, 24, 409–428.
Liu, M., and Cernat, A. (2016), "Item-by-Item Versus Matrix Questions: A Web Survey Experiment," Social Science Computer Review, 1–17.
Mavletova, A., and Couper, M. P. (2014), "Mobile Web Survey Design: Scrolling Versus Paging, SMS Versus E-Mail Invitations," Journal of Survey Statistics and Methodology, 2(4), 498–518.
Mavletova, A., and Couper, M. P. (2015), "Grouping of Items in Mobile Web Questionnaires," Field Methods, 28(2), 170–193.
McCarty, J. A., and Shrum, L. J. (2000), "The Measurement of Personal Values in Survey Research: A Test of Alternative Rating Procedures," Public Opinion Quarterly, 64(3), 271–298.
Peytchev, A., Couper, M. P., McCabe, S. E., and Crawford, S. D. (2006), "Web Survey Design: Paging Versus Scrolling," Public Opinion Quarterly, 70(4), 596–607.
Revilla, M., and Ochoa, C. (2014), "What Are the Links in a Web Survey among Response Time, Quality, and Auto-Evaluation of the Efforts Done?," Social Science Computer Review, 33(1), 97–114.
Roberts, C., Gillian, E., Allum, N., and Lynn, P. (2010), "Data Quality in Telephone Surveys and the Effect of Questionnaire Length: A Cross-National Experiment," ISER Working Paper Series, No. 2010-36.
Schonlau, M., and Toepoel, V. (2015), "Straightlining in Web Survey Panels over Time," Survey Research Methods, 9(2), 125–137.
Struminskaya, B., Kaczmirek, L., Schaurer, I., and Bandilla, W. (2014), "Assessing Representativeness of a Probability-Based Online Panel in Germany," in Online Panel Research: A Data Quality Perspective, eds. M. Callegaro, R. Baker, J. Bethlehem, A. S. Göritz, J. A. Krosnick, and P. J. Lavrakas, pp. 61–84, New York: Wiley.
Struminskaya, B., Weyandt, K., and Bosnjak, M. (2015), "The Effects of Questionnaire Completion Using Mobile Devices on Data Quality. Evidence from a Probability-Based General Population Panel," Methods, Data, Analyses, 9(2), 261–292.
Toepoel, V., Das, M., and Van Soest, A. (2009), "Design of Web Questionnaires: The Effects of the Number of Items Per Screen," Field Methods, 21, 200–213.
Tourangeau, R., Rips, L. J., and Rasinski, K. (2000), The Psychology of Survey Response, Cambridge: Cambridge University Press.
Tourangeau, R., Couper, M. P., and Conrad, F. G. (2004), "Spacing, Position, and Order. Interpretive Heuristics for Visual Features of Survey Questions," Public Opinion Quarterly, 68(3), 368–393.
Zhang, C., and Conrad, F. G. (2014), "Speeding in Web Surveys: The Tendency to Answer Very Fast and Its Association with Straightlining," Survey Research Methods, 8(2), 127–135.

APPENDIX

Table 1. Linear and Logistic Regressions on Indicators of Response Quality (Experiment 1)

                                       (1) Straightlining (0/1)  (2) Differentiation (Pd)  (3) Nonsubstantive responses (0/1)
Gender: male                           Ref.                      Ref.                      Ref.
 female                                0.486* (0.197)            –0.038* (0.016)           0.599*** (0.138)
Age                                    –0.026*** (0.007)         0.002*** (0.001)          –0.016** (0.005)
Ability: low                           Ref.                      Ref.                      Ref.
 intermediate                          –0.575** (0.215)          0.055** (0.019)           0.098 (0.160)
 high                                  –1.565*** (0.299)         0.140*** (0.021)          –0.148 (0.189)
Motivation                             –1.485*** (0.392)         0.154*** (0.032)          –2.959*** (0.281)
Single items                           –0.751** (0.270)          0.078*** (0.022)          –0.495* (0.222)
Gender: female × single items          –0.522 (0.297)            0.046* (0.022)            0.219 (0.221)
Age × single items                     0.003 (0.011)             –0.001 (0.001)            –0.008 (0.008)
Ability: intermediate × single items   0.743* (0.325)            –0.057* (0.026)           –0.396 (0.246)
Ability: high × single items           0.716 (0.442)             –0.068* (0.028)           –0.778* (0.307)
Motivation × single items              0.690 (0.594)             –0.101* (0.045)           –0.886 (0.456)
Constant                               –1.260*** (0.175)         0.499*** (0.016)          –1.280*** (0.144)
N                                      1,880                     1,880                     2,491
McFadden's R2                          0.083                                               0.186
Adjusted R2                                                      0.074

*p < 0.05, **p < 0.01, ***p < 0.001.
Note.—Models 1 and 3 are logistic regressions; Model 2 is a linear regression. Cell entries are unstandardized regression coefficients with standard errors in parentheses.
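As a companion to Table 1, the following sketch shows how a logistic regression with question-design interactions of this form can be fitted with statsmodels. The simulated data frame and its column names are illustrative assumptions; only the model specification mirrors the table, and the fitted coefficients will not reproduce the reported values.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data; variable names mirror the rows of Table 1 but the
# values are random draws, not the study's data.
rng = np.random.default_rng(1)
n = 1880
df = pd.DataFrame({
    "straightline": rng.integers(0, 2, n),
    "female": rng.integers(0, 2, n),
    "age": rng.integers(18, 80, n),
    "ability": rng.choice(["low", "intermediate", "high"], n),
    "motivation": rng.uniform(0, 1, n),
    "single_items": rng.integers(0, 2, n),
})

# Model 1 of Table 1: logistic regression of straightlining on respondent
# characteristics, the design indicator, and their interactions.
formula = (
    "straightline ~ female + age + C(ability, Treatment('low')) + motivation"
    " + single_items"
    " + (female + age + C(ability, Treatment('low')) + motivation):single_items"
)
fit = smf.logit(formula, data=df).fit(disp=0)
print(fit.summary())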
Table 2. Mean Differences in the Substantive Answers to Five Items (Experiment 1)

Items    M_grid   M_single   ΔM      t        p
Item A   5.19     5.16       –0.03   –0.271   0.786
Item B   6.67     6.53       –0.14   –1.231   0.218
Item C   6.02     5.92       –0.10   –0.970   0.332
Item D   6.77     6.78        0.01    0.078   0.938
Item E   5.77     5.74       –0.03   –0.300   0.764
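The comparisons in Tables 2 and 4 are mean differences in substantive answers between the two between-subjects conditions, which can be reproduced as independent-samples t-tests. A minimal sketch, with simulated scores standing in for the actual item responses:

import numpy as np
from scipy import stats

# Simulated substantive answers to one item on an eleven-point scale;
# the values are illustrative, not the study's data.
rng = np.random.default_rng(7)
grid = rng.integers(1, 12, size=300).astype(float)
single = rng.integers(1, 12, size=300).astype(float)

# Independent-samples t-test of the mean difference between conditions,
# analogous to one row of Table 2 (delta = M_single - M_grid).
t, p = stats.ttest_ind(grid, single)
print(f"M_grid = {grid.mean():.2f}, M_single = {single.mean():.2f}, "
      f"delta = {single.mean() - grid.mean():.2f}, t = {t:.3f}, p = {p:.3f}")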
Table 3. Linear and Logistic Regressions on Indicators of Response Quality (Experiment 2)

                                       (1) Straightlining (0/1)  (2) Differentiation (Pd)
Gender: male                           Ref.                      Ref.
 female                                0.136 (0.639)             –0.034 (0.027)
Age                                    0.051* (0.026)            –0.000 (0.001)
Ability: low                           Ref.                      Ref.
 intermediate                          –3.487*** (0.948)         0.128*** (0.032)
 high                                  –2.525** (0.822)          0.165*** (0.034)
Motivation                             –8.394*** (1.892)         0.227*** (0.060)
Single items                           –1.794 (0.955)            0.096** (0.035)
Gender: female × single items          0.817 (0.985)             0.007 (0.036)
Age × single items                     –0.103* (0.042)           0.002 (0.001)
Ability: intermediate × single items   2.235 (1.366)             –0.064 (0.043)
Ability: high × single items           1.979 (1.124)             –0.076 (0.045)
Motivation × single items              5.074* (2.516)            –0.197* (0.081)
Constant                               –2.136*** (0.548)         0.553*** (0.027)
N                                      511                       509
McFadden's R2                          0.369
Adjusted R2                                                      0.133

*p < 0.05, **p < 0.01, ***p < 0.001.
Note.—Model 1 is a logistic and Model 2 a linear regression. Cell entries are unstandardized regression coefficients with standard errors in parentheses.
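Footnote 6 notes that the figures report predicted probabilities with 95% confidence intervals rather than raw interaction coefficients, which sidesteps the interpretation problems of interactions in nonlinear models. The sketch below illustrates that approach with a binomial GLM on simulated data; the variable names and the reduced specification are illustrative assumptions, not the paper's exact models.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Simulated data with illustrative variable names (not the study's data).
rng = np.random.default_rng(3)
n = 511
df = pd.DataFrame({
    "straightline": rng.integers(0, 2, n),
    "ability": rng.choice(["low", "intermediate", "high"], n),
    "motivation": rng.uniform(0, 1, n),
    "single_items": rng.integers(0, 2, n),
})

# Binomial GLM (logit link) with an ability-by-design interaction.
fit = smf.glm(
    "straightline ~ C(ability, Treatment('low')) * single_items + motivation",
    data=df, family=sm.families.Binomial(),
).fit()

# Predicted probabilities of straightlining by ability level and design,
# holding motivation at its sample mean, with delta-method 95% CIs.
new = pd.DataFrame(
    [(a, s, df["motivation"].mean())
     for a in ("low", "intermediate", "high") for s in (0, 1)],
    columns=["ability", "single_items", "motivation"],
)
pred = fit.get_prediction(new)
out = new.copy()
out["prob"] = pred.predicted_mean
ci = pred.conf_int()
out["ci_low"], out["ci_high"] = ci[:, 0], ci[:, 1]
print(out)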
Table 4. Mean Differences in the Substantive Answers to Seven Items (Experiment 2)

Items    M_grid   M_single   ΔM      t        p
Item A   6.00     5.91       –0.09   –0.321   0.748
Item B   4.98     4.63       –0.35   –1.227   0.220
Item C   6.46     6.48        0.01    0.046   0.963
Item D   3.91     3.78       –0.13   –0.510   0.610
Item E   3.90     3.85       –0.05   –0.195   0.846
Item F   6.08     6.16        0.08    0.286   0.775
Item G   3.40     3.60        0.20    0.829   0.408

Table 5. Wording and Response Options of the Questions on Gender, Age, Education, and Interest in Politics (Experiments 1 and 2)

Gender
 Wording: Please state your gender.
 Response scale: (1) male (2) female

Age
 Wording: Please enter the year you were born in.
 Response scale: open-ended question that permitted numerical responses in the range of 1900 to 1994 in Experiment 1 and 1900 to 1995 in Experiment 2

Education
 Wording: What general school-leaving certificate do you have?
 Response scale: (1) finished school without school-leaving certificate (2) lowest formal qualification of Germany's tripartite secondary school system, after 8 or 9 years of schooling (3) intermediary secondary qualification, after 10 years of schooling (4) certificate fulfilling entrance requirements to study at a polytechnical college (5) higher qualification, entitling holders to study at a university (6) university degree (7) I am still in high school

Interest in politics
 Wording: In general terms: how interested in politics are you?
 Response scale: (1) very interested (2) interested (3) moderately interested (4) slightly interested (5) not interested at all (99) no answer

Note.—Authors' translations of the question and the response option wording. The sixth response option of the question on education (university degree) was not provided in Experiment 2.
[Screenshots not reproduced; figure captions follow.]
Figure 6. Screenshot of the question in the grid design (Experiment 1).
Figure 7. Screenshot of the first item in the single-item design (Experiment 1).
Figure 8. Screenshot of the question in the grid design (Experiment 2).
Figure 9. Screenshot of the first item in the single-item design (Experiment 2).
Note.—Authors' translations of the question and the response option wording.
© The Author 2017. Published by Oxford University Press on behalf of the American Association for Public Opinion Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com.
