Motivated Misreporting in Web Panels

Abstract

Previous studies of reporting to filter questions have shown that respondents learn to say “No” to filter questions to shorten the interview, a phenomenon called motivated misreporting. Similar learning effects have been observed in panel surveys: respondents seem to recall the structure of a survey from earlier waves and, in subsequent waves, give responses that shorten the interview. Hence, concerns arise that misreporting to filter questions worsens over time in a panel study. We conducted an experiment using filter questions in two consecutive waves of a monthly online panel to study how misreporting to filter questions changes over time. While we replicate previous findings on the filter question format effect, we do not find any support for the hypothesis that responses to filter questions worsen over time. Our findings add to the literature on data quality in web panels, panel conditioning, and motivated misreporting.

1. BACKGROUND

Motivated misreporting refers to the phenomenon whereby respondents deliberately give inaccurate survey responses to reduce the burden of the survey. Motivated misreporting has been shown either within a single survey or over two or more waves of a panel survey, but we are not aware of any study that has analyzed motivated misreporting both within and between waves of a panel study. We test whether motivated misreporting worsens over two waves of a panel survey. To begin with, we review relevant findings and theoretical explanations in the literature.

Many surveys use filter questions to determine respondent eligibility for follow-up questions. Their purpose is to reduce burden for respondents by asking only relevant questions. Two common formats are used for these filter questions. In the interleafed format, respondents are asked a filter question, and the follow-up questions, if triggered, follow immediately. In the grouped format, respondents are first asked all filters before answering the follow-ups that apply. Several studies have shown that respondents trigger fewer follow-ups in the interleafed format than in the grouped format (Kessler, Wittchen, Abelson, McGonagle, Schwarz, et al. 1998; Duan, Alegria, Canino, McGuire, Takeuchi 2007; Kreuter, McCulloch, Presser, Tourangeau 2011; Eckman, Kreuter, Kirchner, Jäckle, Tourangeau, et al. 2014). In the interleafed format, respondents learn how the survey is structured and begin to say “No” to the filter questions to avoid the follow-up questions. Comparisons with administrative records have shown that the grouped format collects more accurate reports (Eckman et al. 2014). Consistent with respondents’ learning, the difference in reporting between the two filter question formats is larger for items asked later in a survey (Kreuter et al. 2011). Duan et al. (2007) call such misreporting due to learning the structure of the interview “survey conditioning,” because respondents’ reporting behavior is conditioned by survey participation.

A similar mechanism underlies one form of panel conditioning. Respondents may learn in an earlier wave how the questionnaire is structured and use this information to misreport. They then become worse reporters over time in a panel survey (Bailar 1989; Waterton and Lievesley 1989; Cantor 2010; Yan and Eckman 2012).
Other forms of panel conditioning are also possible, such as respondents becoming better reporters (Kroh, Winter, Schupp 2016) or panel conditioning resulting in changes in respondents’ attitudes (Warren and Halpern-Manners 2012) or behavior (Bach and Eckman 2017). However, given the evidence from survey conditioning (i.e., respondents’ tendency to become worse reporters due to learning the structure of the questionnaire), we focus on the worse respondents mechanism in this study.[1]

Halpern-Manners and Warren (2012) find evidence for the worse reporters hypothesis in the Current Population Survey. Some respondents interviewed for the second time overreport employment. The authors suspect that respondents remember from the first round that reporting unemployment triggers extensive follow-up questions and thus misreport unemployment in the second wave to skip the follow-up questions. However, given the topic of unemployment, social desirability may also influence respondents’ reports. Additional support for the worse reporters hypothesis has been found in reports of home alteration and repair jobs (Neter and Waksberg 1964), functional limitations of elderly people (Mathiowetz and Lair 1994), and everyday personal hygiene product use (Nancarrow and Cartwright 2007). Schonlau and Toepoel (2015) find support for the worse reporters hypothesis in the Longitudinal Internet Studies for the Social Sciences (LISS) panel, which also provides the data for our analysis. They show that straightlining (i.e., the tendency to give the same responses to a series of questions with identical answer choices) increases with respondents’ panel experience. On the other hand, using different studies, Cohen and Burt (1985) and Struminskaya (2016) find no evidence that panel respondents become worse reporters over time. Regarding the likelihood of panel conditioning, van Landeghem (2012) reports that panel conditioning becomes more likely the higher the number of exposures to the same survey. Moreover, Halpern-Manners, Warren, and Torche (2014) report that panel conditioning is more likely the shorter the interval between waves.

Although the studies cited above have analyzed panel and survey conditioning separately, the literature falls short of testing these two effects jointly. Given the consistent findings on survey conditioning and the evidence from previous studies of panel conditioning, a careful test of both phenomena in a joint context is needed. A joint test is also desirable because both forms of conditioning share the same theoretical mechanism, a desire by respondents to reduce the burden of the survey. In this paper, we use a two-by-two design in which half of the respondents in wave one receive the questions in the interleafed format and half receive them in the grouped format. In wave two, half of the respondents are assigned to the same format as in wave one and the other half switch to the other format.
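To make the resulting two-by-two design concrete, the following sketch simulates independent random assignment to the two formats in each of two waves. It is purely illustrative: the sample size, seed, and function name are placeholders, not the LISS assignment procedure.

    import random

    random.seed(42)  # reproducible illustration

    FORMATS = ["interleafed", "grouped"]

    def assign_two_wave_design(n_respondents):
        """Assign each respondent a format independently in wave one and wave two."""
        cells = {}
        for respondent_id in range(n_respondents):
            wave_one = random.choice(FORMATS)
            wave_two = random.choice(FORMATS)  # re-randomized, independent of wave one
            cells.setdefault((wave_one, wave_two), []).append(respondent_id)
        return cells

    # Roughly a quarter of respondents ends up in each of the four cells.
    for cell, members in sorted(assign_two_wave_design(2000).items()):
        print(cell, len(members))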
In this study, we test three hypotheses. We first replicate findings on misreporting to filter questions (i.e., survey conditioning). In line with previous research, we expect that the percentage of filter questions triggered will be greater in the grouped format than in the interleafed format in wave one. We call this first hypothesis the survey conditioning hypothesis. In addition to replicating survey conditioning, we explore whether survey conditioning depends on the education of a respondent. To date, there is no evidence regarding an interaction between survey conditioning and education or ability.

Second, we aim to determine whether motivated misreporting persists over waves. Kreuter et al. (2011) show that misreporting in the interleafed format is worse for later items in a survey and conclude that respondents learn how the survey is structured. When respondents are presented with the same questions in wave two, they already know the structure and can misreport from the very first filter question. Thus, we hypothesize that respondents become worse reporters (i.e., the percentage of filters triggered will be smaller in the second wave among respondents interviewed in the same format in both waves). We call this the panel conditioning hypothesis. Respondents in the interleafed format can learn in wave one and underreport in wave two. Respondents in the grouped format have no opportunity to learn in wave one, but they may nevertheless recall the structure in wave two. Consequently, we expect a decrease in filter questions triggered in wave two among respondents interviewed in the same format in both waves.

Third, we study how panel conditioning affects misreporting when respondents are interviewed in different formats in each wave (i.e., when the format changes from grouped to interleafed, or vice versa, over time). We refer to the effects of changing the format over time as the changed format panel conditioning hypothesis. Disentangling panel conditioning from survey conditioning (i.e., changes in reporting due to changing the format) is harder in this context, since both panel conditioning and the format may influence responses at the same time, possibly even in different directions. If the format is switched from grouped to interleafed, misreporting should increase over time, due to both survey conditioning and panel conditioning. However, when switching from interleafed to grouped, the effect is somewhat more complex. We expect that respondents recall the questionnaire from the first wave, and thus, due to panel conditioning, misreporting should increase. This is the worse reporters mechanism. At the same time, however, the grouped format usually collects more reports. Thus, we would expect a decrease in misreporting after switching the format. Consequently, we cannot predict whether misreporting will increase or decrease over time for this group. Theory does not predict whether gains in data quality from changing to the better grouped format are outweighed by losses due to panel conditioning. Yet, this question is especially important for panel surveys using the interleafed format that wish to change to the grouped format to reduce motivated misreporting.[2] Table 1 summarizes the second and third hypotheses and the expected effects on the level of misreporting.

Table 1. Panel Conditioning Hypotheses

  Format, wave one   Format, wave two   Hypothesis                   Effect on misreporting
  Interleafed        Interleafed        Panel conditioning           Increase (a)
  Grouped            Grouped            Panel conditioning           Increase (a)
  Grouped            Interleafed        Changed format panel cond.   Increase (a)
  Interleafed        Grouped            Changed format panel cond.   Unclear

  (a) Corresponds to a decrease in the number of filters triggered.
In addition to testing these three hypotheses, we explore whether both survey conditioning and panel conditioning depend on respondents’ cognitive abilities. Evidence from studies on response order effects, non-differentiation, and satisficing shows that respondents with low education are most susceptible to such effects (e.g., Krosnick and Alwin 1987; Narayan and Krosnick 1996; Holbrook, Krosnick, Moore, Tourangeau 2007). Similarly, Binswanger, Schunk, Toepoel (2013) find a panel conditioning effect in difficult attitudinal questions only among low educated respondents. In our case, survey conditioning may, on the one hand, increase with cognitive ability, because respondents with higher cognitive abilities may be faster in discovering repetitive patterns in the questionnaire. On the other hand, respondents with higher cognitive abilities might instead show less misreporting in the interleafed format, as they may be more aware of the importance of accurate survey reports for scientific purposes. The same arguments may hold for panel conditioning: respondents with higher cognitive abilities may be better at recalling having answered the same questionnaire one month ago. However, their better understanding of the importance of accurate survey reports for research may again counteract this effect.

2. THE STUDY

The data we use for our analysis of survey and panel conditioning come from the Dutch LISS panel, a longstanding Internet panel based on a probability sample. Sample members complete questionnaires of about 15 to 30 minutes on a monthly basis (Scherpenzeel and Das 2010). In 2012, we fielded several filter question experiments in two consecutive waves of the LISS panel. In the first wave (April 2012), LISS participants (n = 3,330) were randomly assigned to either the interleafed or the grouped filter question format. In the second wave (May 2012), all panel members were again randomly assigned to one of the two formats. The resulting design has four cells (two formats by two waves), as shown in table 5.[3]
Table 5. Percent of Filter Questions Triggered, by Experimental Condition

  Format, wave one   Format, wave two   % triggered, wave one   % triggered, wave two     n   |Z-statistic| (a)   p-value
  Interleafed        Interleafed        36.1 (0.007)            36.5 (0.007)             571     0.86             0.353
  Grouped            Grouped            43.3 (0.007)            43.4 (0.007)             565     0.00             0.948
  Grouped            Interleafed        42.8 (0.008)            35.0 (0.007)             517   116.34            <0.001
  Interleafed        Grouped            36.4 (0.007)            43.3 (0.008)             509   107.87            <0.001

  Note: Standard errors (in parentheses) clustered at the respondent level.
  (a) H0: % triggered in wave one = % triggered in wave two.

In the interleafed format, respondents were asked 13 filter questions in random order. Two follow-up questions were asked immediately after each filter, if applicable. In the grouped format, respondents were asked all 13 filter questions (in random order) before answering any applicable follow-ups. Apart from the filter questions and a handful of questions asking about respondents’ experience with the questionnaire, no other questions were asked in this module.[4] Given the small number of questions, the median response time was a little more than three minutes in both waves, and 95 percent of all respondents answered the questionnaire in less than eight minutes. All filter questions asked about purchases of items such as groceries, clothes, or movie tickets during the last month (see the original questions in the Online Supplementary Materials). We chose these questions to ensure that most respondents triggered at least a few follow-up questions in each wave. We also chose items whose purchase should not be influenced by circumstances, such as seasonality, that would lead to a real change in purchasing behavior between the two waves of the experiment. Thus, we expect that any differences in reporting between the two waves are, on average, caused by a change in reporting and not by a change in behavior.
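The following sketch contrasts the two routing schemes described above. It is a minimal illustration rather than the actual LISS instrument: the item list, the ask() helper, and the simulated answers are placeholders, and the real module used 13 purchase filters with two follow-ups each. Only in the interleafed flow does a “No” visibly shorten the questionnaire right away.

    import random

    ITEMS = ["groceries", "clothes", "movie tickets"]  # placeholder items

    def ask(question):
        """Stand-in for presenting a question; here we simply simulate a yes/no answer."""
        print(question)
        return random.choice([True, False])

    def interleafed(items):
        # Follow-ups come immediately after each triggered filter.
        for item in random.sample(items, len(items)):  # filters in random order
            if ask(f"Did you buy {item} last month?"):
                ask(f"How much did you spend on {item}?")
                ask(f"Where did you buy {item}?")

    def grouped(items):
        # All filters are asked first; follow-ups come later, so saying "No"
        # brings no immediate reward during the filter block.
        shuffled = random.sample(items, len(items))
        triggered = [item for item in shuffled if ask(f"Did you buy {item} last month?")]
        for item in triggered:
            ask(f"How much did you spend on {item}?")
            ask(f"Where did you buy {item}?")

    random.seed(1)
    interleafed(ITEMS)
    grouped(ITEMS)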
2.2 Nonresponse and Attrition

About 79% of the LISS panel members selected for the study participated in the first wave of the experiment (American Association for Public Opinion Research (AAPOR) RR1). Conditional on wave one response, the participation rate in wave two is 82% (third column in table 2). For the analysis sample of our research questions, however, we discard those who responded only in wave one (n = 478) or only in wave two (n = 326). Panel attrition can easily be mistaken for panel conditioning, and disentangling the two effects is one of the major challenges when analyzing panel conditioning (Bach and Eckman 2017). Using only two-wave respondents allows us to eliminate any confounding effects of attrition (Warren and Halpern-Manners 2012). Two additional respondents are excluded because they broke off the survey, one in wave one and one in wave two. The resulting analysis sample consists of 2,162 people (third column in table 2).

Table 2. Response Rates by Filter Question Format and Wave

  Filter question format in wave one    Wave one n   Response rate (a)   Wave two n (b)   Response rate (c)
  Interleafed    Respondents                 1,303   78.9%                        1,080   81.8%
                 Nonrespondents                347                                  242
  Grouped        Respondents                 1,337   81.0%                        1,082   82.0%
                 Nonrespondents                313                                  238
  Total number of respondents                2,640   79.3%                        2,162   81.9%
  |Z-statistic| (d)                                  1.48                                 0.10
  p-value                                            0.139                               0.919

  (a) AAPOR RR1.
  (b) Among wave one respondents.
  (c) AAPOR RR1, conditional on wave one response.
  (d) H0: response rate interleafed = response rate grouped.
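As a rough illustration of the comparison reported in table 2, the sketch below recomputes the format-specific wave-one response rates from the cell counts and a pooled two-proportion z-test for the difference. The z-test is our assumption about the type of test; the paper does not spell out the exact statistic behind the reported |Z| values.

    from math import sqrt

    # Wave-one cell counts taken from table 2.
    respondents = {"interleafed": 1303, "grouped": 1337}
    nonrespondents = {"interleafed": 347, "grouped": 313}

    def response_rate(fmt):
        return respondents[fmt] / (respondents[fmt] + nonrespondents[fmt])

    def two_proportion_z(p1, n1, p2, n2):
        """Pooled two-proportion z statistic for H0: p1 == p2."""
        p_pool = (p1 * n1 + p2 * n2) / (n1 + n2)
        se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
        return (p1 - p2) / se

    n_il = respondents["interleafed"] + nonrespondents["interleafed"]
    n_gr = respondents["grouped"] + nonrespondents["grouped"]
    p_il, p_gr = response_rate("interleafed"), response_rate("grouped")

    print(f"{p_il:.1%} vs {p_gr:.1%}")  # close to the 78.9% and 81.0% in table 2
    print(f"|z| = {abs(two_proportion_z(p_il, n_il, p_gr, n_gr)):.2f}")  # about 1.48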
Before we report the results of our analysis, however, we make sure that sub-setting the analysis sample to two-wave respondents does not confound the experimental manipulations of the filter question formats and, thus, our estimates of survey and panel conditioning. To do so, we check whether nonresponse in wave one and attrition differ by format and wave. Regarding nonresponse in wave one, response rates do not differ between the two formats (second column in table 2), and we do not find any substantial or significant differences in sociodemographic variables between the formats (results not shown). Moreover, nonresponse over time (i.e., attrition) does not seem to be strongly related to sociodemographic variables: we find only one substantial and significant difference between those who respond to both waves and those who participate in the first wave only. Attriters are younger, but do not differ from two-wave respondents in gender, education, marital status, household composition, or their response behavior in previous waves of the LISS panel (results not shown). Importantly, neither the filter question format in wave one, the number of filters triggered in wave one, nor their interaction is predictive of attrition (table 3). Wave two response rates do not differ between the two wave-one formats (fourth column in table 2).

Table 3. Predictors of Attrition

                                                   Coefficient
  Grouped (ref.: interleafed)                      0.031 (0.037)
  No. of filters triggered in wave one             0.002 (0.005)
  Grouped × No. of filters triggered in wave one   −0.002 (0.007)
  n                                                2,640 (a)

  Note: Linear probability model. Attrition (Yes/No). Standard errors in parentheses. Constant not shown.
  (a) All wave one respondents.
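A minimal sketch of the kind of attrition model summarized in table 3 follows, assuming a linear probability model estimated by OLS, with an attrition indicator regressed on the wave-one format, the number of filters triggered in wave one, and their interaction. The toy data frame and variable names are placeholders; only the model specification mirrors the table.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical wave-one respondent-level data; values and names are placeholders.
    df = pd.DataFrame({
        "attrited": [0, 1, 0, 0, 1, 0, 1, 0],            # 1 = responded in wave one only
        "grouped": [1, 0, 1, 0, 1, 0, 0, 1],              # 1 = grouped format in wave one
        "filters_triggered_w1": [5, 3, 7, 4, 2, 6, 1, 8],
    })

    # Linear probability model: attrition on format, filters triggered, and their interaction.
    model = smf.ols("attrited ~ grouped * filters_triggered_w1", data=df).fit()
    print(model.params)
    print(model.bse)  # standard errors for each coefficient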
Since all LISS participants were randomly assigned to the two filter question formats in wave two, we also test whether there are any important differences between the two-wave respondents and all respondents of wave two (i.e., whether dropping some respondents from wave two interferes with the randomization to the filter question formats in wave two). After sub-setting the sample to two-wave respondents, we do not find any significant differences in sociodemographic variables or in the number of filters triggered in wave one between the two formats in wave two (results not shown). Therefore, we are confident that nonresponse and attrition patterns do not differ among the subgroups. Importantly, attrition is not influenced by the experimental conditions or the filter question experience in wave one and thus should not bias the results of our analysis. Although the sample restrictions limit the representativeness of the data (external validity), they do not jeopardize our interpretation of the experimental results (internal validity). In the next two sections, we present and discuss the results of our analysis regarding our three hypotheses (i.e., whether respondents learn to misreport within a single wave of the survey and whether misreporting persists over two waves).

3. RESULTS

As predicted by the survey conditioning hypothesis, the percentage of filter questions triggered in wave one is significantly smaller in the interleafed format than in the grouped format: 36% versus 43% (i.e., 4.7 versus 5.6 filters triggered; see table 4). Survey conditioning thus takes place in wave one, as we expected. When we also include the position of a filter question (i.e., whether a filter was asked in the first, second, or final third of the section) in the analysis, we find that the probability of triggering a filter in the interleafed format decreases significantly with the position of the filter in the questionnaire (results not shown). This finding further supports our first hypothesis that respondents learn to misreport in the interleafed format.

Table 4. Percent of Filter Questions Triggered in Wave One by Format

  Format              Wave one % triggered
  Interleafed         36.3 (0.005)
  Grouped             43.1 (0.006)
  Total               39.7 (0.004)
  |Z-statistic| (a)   84.83
  p-value             <0.001
  N                   2,162

  Note: Standard errors (in parentheses) clustered at the respondent level.
  (a) H0: % triggered interleafed = % triggered grouped.

Regarding the second hypothesis, panel conditioning, we expect the percentage of filters triggered to decrease over time due to learning when filter questions are asked in the same format in each wave (rows one and two of table 1). Row one of table 5 shows that respondents who were interviewed in the interleafed format in both waves show no significant change in reporting across time. The same result holds for respondents interviewed in the grouped format in both waves. Thus, we do not find any support for the panel conditioning hypothesis: respondents do not trigger fewer filters when interviewed for the second time in the same format.

Next, we turn to our third hypothesis and those respondents who changed formats from wave one to wave two (rows three and four of table 1). We expect that respondents who switched from grouped to interleafed underreport in wave two due to panel conditioning (see table 1). Row three of table 5 shows, as expected, that these respondents trigger significantly fewer filters in wave two than in wave one and thus increase misreporting. Switching from interleafed to grouped, however, leads to a significant increase in filters triggered and thus less misreporting (row four). Both effects have about the same absolute size. Moreover, the effect size is approximately equal to the difference between the interleafed format and the grouped format in wave one. For this reason, we interpret the changes over time seen in the bottom rows of table 5 as driven by the format change and not by panel conditioning. Thus, we find no support for the third hypothesis that panel conditioning influences misreporting when the filter question format changes between waves.
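The percentages in tables 4 and 5 come with standard errors clustered at the respondent level, and the |Z| statistics test the equality of two triggering rates. A minimal sketch of one such comparison follows, assuming filter-level data in long format and a linear probability model of the triggered indicator on a format dummy with cluster-robust standard errors; the simulated data, rates, and variable names are placeholders rather than the authors’ actual estimation code.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)

    # Hypothetical long-format data: one row per respondent x filter item in wave one.
    n_respondents, n_filters = 500, 13
    grouped_assignment = rng.integers(0, 2, n_respondents)
    df = pd.DataFrame({
        "respondent": np.repeat(np.arange(n_respondents), n_filters),
        "grouped": np.repeat(grouped_assignment, n_filters),
    })
    # Simulate a lower triggering rate in the interleafed condition (placeholder rates).
    df["triggered"] = rng.binomial(1, np.where(df["grouped"] == 1, 0.43, 0.36))

    # Linear probability model of the triggered indicator on the format dummy,
    # with standard errors clustered at the respondent level.
    fit = smf.ols("triggered ~ grouped", data=df).fit(
        cov_type="cluster", cov_kwds={"groups": df["respondent"]}
    )
    print(fit.params)   # intercept: interleafed rate; slope: grouped minus interleafed
    print(fit.tvalues)  # z statistic for the format difference under clustering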
Both panel conditioning hypotheses assume that respondents learn from participation in the first wave of the experiment how follow-up questions can be avoided. However, a few respondents (n = 48) did not trigger a single filter question in wave one and therefore could not learn. To assess whether this confounds the results shown in table 5, we rerun the analyses without these 48 respondents. The results do not differ substantially from those shown in table 5 (results not shown).

Regarding the influence of cognitive ability, measured by respondents’ highest educational degree achieved, we find that survey conditioning does not depend on cognitive ability. In other words, the difference between filters triggered in the interleafed format and filters triggered in the grouped format does not vary with respondents’ education (results not shown). Similarly, we do not find a panel conditioning effect when replicating the results of table 5 in subgroups defined by respondents’ highest educational degree. We interpret these findings as evidence that survey conditioning and panel conditioning do not depend on respondents’ cognitive ability.

To sum up the results presented here, we find strong evidence for survey conditioning. Fewer filters are triggered in the interleafed format than in the grouped format, which is additional evidence that respondents in the interleafed format learn how the survey is structured and use that information to reduce the burden of the survey by misreporting. However, there is no evidence that respondents remember the survey structure over time in a panel survey context and misreport more in the second wave. Therefore, there is no support for the worse respondents mechanism.

4. DISCUSSION

Survey researchers are increasingly aware that asking filter questions in different formats can affect responses and measurement error. Studies on misreporting in cross-sectional surveys have shown that respondents asked in an interleafed format trigger fewer follow-ups than respondents interviewed in a grouped format. The common explanation for such survey conditioning is that respondents learn how a survey is structured and use this information to speed through the interview and reduce the burden of the survey. Similar concerns have been raised in the context of panel surveys: if respondents remember the structure of a survey from prior waves, they might recall that giving certain answers can reduce the length of the survey.

We first hypothesized that respondents surveyed in the interleafed filter question format would misreport more than respondents surveyed in the grouped format. We found in both waves of our experiment that respondents in the interleafed format trigger fewer filters than respondents in the grouped format. Our interpretation of this finding is that misreporting is more pronounced in the interleafed format. Studies that compared responses to filter questions with validation records (see section 1) showed that the grouped format produces more correct responses. Thus, we are confident in our interpretation of the grouped format as more accurate than the interleafed format. These results are in line with previous studies.

In our second and third hypotheses, we suspected that we would see more misreporting in a second wave of a panel study. If respondents remember the structure of a questionnaire from prior participation, then misreporting in a second wave should increase: respondents would recognize the filter questions and trigger fewer filters in a second wave. In this study, the two waves were separated by only four weeks, which should make panel conditioning more likely (Warren and Halpern-Manners 2012; Halpern-Manners et al. 2014). However, we did not find support for either of the two panel conditioning hypotheses.
Respondents interviewed in the same format in both waves do not misreport more in the second wave (the panel conditioning hypothesis). We did see differences in triggering rates for those respondents whose formats changed across waves (the changed format panel conditioning hypothesis). However, these changes in the levels of misreporting are likely due only to the format itself and not to the worse respondents (or better respondents) mechanism. Interestingly, neither survey conditioning nor panel conditioning seems to vary with respondents’ cognitive abilities, as measured by respondents’ highest educational degrees. To ensure that our findings are not distorted by attrition, we considered only those respondents who participated in both waves of the survey. Furthermore, we chose a battery of filter questions that should not lead to changes-in-behavior panel conditioning.

One might argue that the absence of panel conditioning is due to LISS respondents being very good or very motivated respondents who are not susceptible to motivated misreporting. Yet, the study by Schonlau and Toepoel (2015) regarding straightlining (see section 1) and the clear signs of misreporting in the interleafed format in wave one (see section 3) show that LISS respondents are as good (or bad) as any other respondents. Even so, we did not find a panel conditioning effect in our filter questions.

This lack of a panel conditioning effect is somewhat surprising. While respondents show a strong tendency to remember the structure of a questionnaire within a single survey, they do not seem to recall this information when the same questions are repeated four weeks later. Even respondents in the interleafed format in wave one, many of whom misreported, no longer show any signs of misreporting when interviewed for a second time in the grouped format. The most likely explanation for this finding is the resetting effect reported by Kreuter et al. (2011). They find that respondents’ misreporting is reset when a new section within a questionnaire starts. A similar logic might apply to the panel survey context: if misreporting resets when a new section starts, then the same should hold for the start of a new wave in a panel survey.

The degree of panel conditioning also seems to depend on the topic and the burden of the questions asked in a survey (Warren and Halpern-Manners 2012). The questions we asked in the LISS panel were fairly easy to answer, dealt with a possibly boring topic (e.g., “What did you buy last month?”), and took respondents only a few minutes to answer (see section 2). In addition, LISS respondents were exposed to our questionnaire only twice and have answered many other, potentially more interesting, novel, or burdensome LISS modules in the past. Responding to our filter question module may not have been distinct from the general experience of participating in the LISS panel, and thus, respondents may not have remembered the questions (Tourangeau, Rips, Rasinski 2000, chapter 3). Questions dealing with more burdensome or sensitive topics may produce different results, potentially even resulting in respondents becoming better reporters over time (compare the studies cited in section 1). Moreover, results may differ when respondents are exposed to the survey more than twice (see section 3). Thus, more research is needed on how topic, survey burden, and the level of exposure to the survey affect motivated misreporting and panel conditioning, a point also made by Eckman et al. (2014).
Our finding that motivated misreporting does not increase over waves of a panel survey is, however, good news for survey practitioners. Although misreporting can pose a serious problem, repeated participation in the same survey does not make the problem worse. Importantly, for panel surveys that used the interleafed format in the past, our work suggests that changing to a grouped format is a good idea, as it can help increase data quality by eliminating survey conditioning without introducing panel conditioning.

Supplementary Materials

Supplementary materials are available online at http://www.oxfordjournals.org/our_journals/jssam/.

Acknowledgments

The authors would like to thank Frauke Kreuter, Georg-Christoph Haas, the Editor-in-Chief, and three anonymous reviewers who provided helpful comments on earlier drafts of this paper. Carola Krimmer provided research assistance. The LISS panel data were collected by CentERdata (Tilburg University, The Netherlands) through its MESS project, funded by the Netherlands Organization for Scientific Research.

Footnotes

[1] An extensive discussion of the theoretical mechanisms underlying panel conditioning, their outcomes, and a review of relevant studies is provided by Warren and Halpern-Manners (2012).
[2] Other panel surveys might consider a switch from the grouped to the interleafed format, for reasons we do not discuss here for brevity. See Kreuter et al. (2018) for a discussion of the advantages of the interleafed format.
[3] A third format was included in the original experiment. However, we do not use these cases. Results concerning that format are reported in Kreuter et al. (2018).
[4] While no other questions were asked in this module, respondents are usually invited to answer more than one module per month. We do not have any information on other modules our respondents answered in these months.

REFERENCES

Bach R. L., Eckman S. (2017), “Does Participating in a Panel Survey Change Respondents’ Labor Market Behavior?” IAB Discussion Paper, 15/2017, available at http://doku.iab.de/discussionpapers/2017/dp1517.pdf, last accessed October 9, 2017.
Bailar B. A. (1989), “Information Needs, Surveys, and Measurement Errors,” in Panel Surveys, eds. Kasprzyk D., Duncan G. J., Kalton G., Singh M. P., pp. 1–24, New York: Wiley.
Binswanger J., Schunk D., Toepoel V. (2013), “Panel Conditioning in Difficult Attitudinal Questions,” Public Opinion Quarterly, 77, 783–797.
Cantor D. (2010), “A Review and Summary of Studies on Panel Conditioning,” in Handbook of Longitudinal Research: Design, Measurement and Analysis, ed. Menard S., pp. 123–138, Amsterdam: Elsevier.
Cohen S. B., Burt V. L. (1985), “Data Collection Frequency Effect in the National Medical Care Expenditure Survey,” Journal of Economic and Social Measurement, 13, 125–151.
Duan N., Alegria M., Canino G., McGuire T., Takeuchi D. (2007), “Survey Conditioning in Self-Reported Mental Health Service Use: Randomized Comparison of Alternative Instrument Formats,” Health Research and Educational Trust, 42, 890–907.
Eckman S., Kreuter F., Kirchner A., Jäckle A., Tourangeau R., Presser S. (2014), “Assessing the Mechanisms of Misreporting to Filter Questions in Surveys,” Public Opinion Quarterly, 78, 721–733.
Halpern-Manners A., Warren J. R. (2012), “Panel Conditioning in Longitudinal Studies: Evidence from Labor Force Items in the Current Population Survey,” Demography, 49, 1499–1519.
Halpern-Manners A., Warren J. R., Torche F. (2014), “Panel Conditioning in a Longitudinal Study of Illicit Behaviors,” Public Opinion Quarterly, 78, 565–590.
Holbrook A. L., Krosnick J. A., Moore D., Tourangeau R. (2007), “Response Order Effects in Dichotomous Categorical Questions Presented Orally: The Impact of Question and Respondent Attributes,” Public Opinion Quarterly, 71, 1–25.
Kessler R. C., Wittchen H.-U., Abelson J. M., McGonagle K., Schwarz N., Kendler K. S., Knäuper B., Zhao S. (1998), “Methodological Studies of the Composite International Diagnostic Interview (CIDI) in the US National Comorbidity Survey (NCS),” International Journal of Methods in Psychiatric Research, 7, 33–55.
Kreuter F., Eckman S., Tourangeau R. (2018), “Salience of Burden and Its Effects on Response Behavior to Skip Questions: Experimental Results from Telephone and Web Surveys,” in Advances in Questionnaire Design, Development, Evaluation and Testing, eds. Beatty P., Collins D., Kaye L., Padilla J., Willis G., Wilmot A., Hoboken, NJ: Wiley.
Kreuter F., McCulloch S., Presser S., Tourangeau R. (2011), “The Effects of Asking Filter Questions in Interleafed Versus Grouped Format,” Sociological Methods and Research, 40, 88–104.
Kroh M., Winter F., Schupp J. (2016), “Using Person-Fit Measures to Assess the Impact of Panel Conditioning on Reliability,” Public Opinion Quarterly, 80, 914–942.
Krosnick J. A., Alwin D. F. (1987), “An Evaluation of a Cognitive Theory of Response-Order Effects in Survey Measurement,” Public Opinion Quarterly, 51, 201–219.
Mathiowetz N. A., Lair T. J. (1994), “Getting Better? Change or Error in the Measurement of Functional Limitations,” Journal of Economic and Social Measurement, 20, 237–262.
Nancarrow C., Cartwright T. (2007), “Online Access Panels and Tracking Research: The Conditioning Issue,” International Journal of Market Research, 49, 573–594.
Narayan S., Krosnick J. A. (1996), “Education Moderates Some Response Effects in Attitude Measurement,” Public Opinion Quarterly, 60, 58–88.
Neter J., Waksberg J. (1964), “Conditioning Effects from Repeated Household Interviews,” Journal of Marketing, 28, 51–56.
Scherpenzeel A., Das M. (2010), “True Longitudinal and Probability-Based Internet Panels: Evidence from the Netherlands,” in Social and Behavioral Research and the Internet: Advances in Applied Methods and Research Strategies, eds. Das M., Ester P., Kaczmirek L., pp. 77–104, Boca Raton, FL: Taylor & Francis.
Schonlau M., Toepoel V. (2015), “Straightlining in Web Survey Panels Over Time,” Survey Research Methods, 9, 125–137.
Struminskaya B. (2016), “Respondent Conditioning in Online Panel Surveys: Results of Two Field Experiments,” Social Science Computer Review, 34, 95–115.
Tourangeau R., Rips L. J., Rasinski K. (2000), The Psychology of Survey Response, Cambridge: Cambridge University Press.
van Landeghem B. (2012), “Panel Conditioning and Self-Reported Satisfaction: Evidence from International Panel Data and Repeated Cross-Sections,” SOEPpapers, 484, available at https://www.diw.de/documents/publikationen/73/diw_01.c.408184.de/diw_sp0484.pdf, last accessed October 9, 2017.
Warren J. R., Halpern-Manners A. (2012), “Panel Conditioning in Longitudinal Social Science Surveys,” Sociological Methods and Research, 41, 491–534.
Waterton J., Lievesley D. (1989), “Evidence of Conditioning Effects in the British Social Attitudes Panel,” in Panel Surveys, eds. Kasprzyk D., Duncan G. J., Kalton G., Singh M. P., pp. 319–339, New York: Wiley.
Yan T., Eckman S. (2012), “Panel Conditioning: Change in True Value versus Change in Self-Report,” Proceedings of the American Statistical Association, Survey Research Methods Section, pp. 4726–4736.

© The Author 2017. Published by Oxford University Press on behalf of the American Association for Public Opinion Research. All rights reserved. For Permissions, please email: journals.permissions@oup.com

Motivated Misreporting in Web Panels

Loading next page...
 
/lp/ou_press/motivated-misreporting-in-web-panels-ZST09dz7kv
Publisher
Oxford University Press
Copyright
© The Author 2017. Published by Oxford University Press on behalf of the American Association for Public Opinion Research. All rights reserved. For Permissions, please email: journals.permissions@oup.com
ISSN
2325-0984
eISSN
2325-0992
D.O.I.
10.1093/jssam/smx030
Publisher site
See Article on Publisher Site

Abstract

Previous studies of reporting to filter questions have shown that respondents learn to say “No” to filter questions to shorten the interview, a phenomenon called motivated misreporting. Similar learning effects have been observed in panel surveys: respondents seem to recall the structure of a survey from earlier waves and, in subsequent waves, give responses that shorten the interview. Hence, concerns arise that misreporting to filter questions worsens over time in a panel study. We conducted an experiment using filter questions in two consecutive waves of a monthly online panel to study how misreporting to filter questions changes over time. While we replicate previous findings on the filter question format effect, we do not find any support for the hypothesis that responses to filter questions worsen over time. Our findings add to the literature on data quality in web panels, panel conditioning, and motivated misreporting. 1. BACKGROUND Motivated misreporting refers to the phenomenon whereby respondents deliberately give inaccurate survey responses to reduce the burden of the survey. Motivated misreporting has been shown either within a single survey or over two or more waves of a panel survey, but we are not aware of any study that has analyzed motivated misreporting within and between waves of a panel study. We test whether motivated misreporting worsens over two waves of a panel survey. To begin with, we review relevant findings and theoretical explanations in the literature. Many surveys use filter questions to determine respondent eligibility for follow-up questions. Their purpose is to reduce burden for respondents by asking only relevant questions. Two common formats are used for these filter questions. In the interleafed format, respondents are asked a filter question, and the follow-up questions, if triggered, follow immediately. In the grouped format, respondents are first asked all filters before answering the follow-ups that apply. Several studies have shown that respondents trigger fewer follow-ups in the interleafed format than in the grouped format (Kessler, Wittchen, Abelson, McGonagle, Schwarz, et al. 1998; Duan, Alegria, Canino, McGuire, Takeuchi 2007; Kreuter, McCulloch, Presser, Tourangeau 2011; Eckman, Kreuter, Kirchner, Jäckle, Tourangeau, et al. 2014). In the interleafed format, respondents learn how the survey is structured and begin to say “No” to the filter questions to avoid the follow-up questions. Comparisons with administrative records have shown that the grouped format collects more accurate reports (Eckman et al. 2014). Consistent with respondents’ learning, the difference in reporting between the two filter question formats is larger for items asked later in a survey (Kreuter et al. 2011). Duan et al. (2007) call such misreporting due to learning the structure of the interview survey conditioning, because respondents’ reporting behavior is conditioned by survey participation. A similar mechanism underlies one form of panel conditioning. Respondents may learn in an earlier wave how the questionnaire is structured and use this information to misreport. They then become worse reporters over time in a panel survey (Bailar 1989; Waterton and Lievesley 1989; Cantor 2010; Yan and Eckman 2012). Other forms of panel conditioning are also possible, such as respondents becoming better reporters (Kroh, Winter, Schupp 2016) or panel conditioning resulting in changes in respondents’ attitudes (Warren and Halpern-Manners 2012) or behavior (Bach and Eckman 2017). 
However, given the evidence from survey conditioning, (i.e., respondents’ tendency to become worse reporters due to learning the structure of the questionnaire), we focus on the worse respondents mechanism in this study.1 Halpern-Manners and Warren (2012) find evidence for the worse reporters hypothesis in the Current Population Survey. Some respondents interviewed for the second time overreport employment. The authors suspect that respondents remember from the first round that reporting unemployment triggers extensive follow-up questions and thus misreport unemployment in the second wave to skip follow-up questions. However, given the topic of unemployment, social desirability may also influence respondents’ reports. Additional support for the worse reporters hypothesis has been found in reports of home alteration and repair jobs (Neter and Waksberg 1964), functional limitations of elderly people (Mathiowetz and Lair 1994), and everyday personal hygiene product use (Nancarrow and Cartwright 2007). Schonlau and Toepoel (2015) find support for the worse reporters hypothesis in the Longitudinal Internet Studies for the Social Sciences (LISS) panel, which also provides the data for our analysis. They show that straightlining, (i.e., the tendency to give the same responses to a series of questions with identical answer choices) increases with respondents’ panel experience. On the other hand, using different studies, Cohen and Burt (1985) and Struminskaya (2016) find no evidence that panel respondents become worse reporters over time. Regarding the likelihood of the occurrence of panel conditioning, van Landeghem (2012) reports that panel conditioning becomes more likely the higher the number of exposures to the same survey. Moreover, Halpern-Manners, Warren, and Torche (2014) report that panel conditioning is more likely the shorter the interval between waves. Although the studies cited above have analyzed panel and survey conditioning separately, the literature falls short of testing these two effects jointly. Given the consistent findings on survey conditioning and the evidence from previous studies of panel conditioning, a careful test of both phenomena in a joint context is needed. A joint test is also desirable because both forms of conditioning share the same theoretical mechanism, a desire by respondents to reduce the burden of the survey. In this paper, we use a two-by-two design where half of the respondents in wave one receives the questions in the interleafed format and half of the respondents receives the questions in the grouped format. In wave two, half of the respondents is assigned to the same format as in wave one and the other half switches to the other format. In this study, we test three hypotheses. We first replicate findings on misreporting to filter questions (i.e., survey conditioning). In line with previous research, we expect that the percentage of filter questions triggered will be greater in a grouped format than in an interleafed format in wave one. We call this first hypothesis the survey conditioning hypothesis. In addition to replicating survey conditioning, we explore whether survey conditioning depends on the education of a respondent. To date, there is no evidence regarding an interaction between survey conditioning and education or ability. Second, we aim to determine whether motivated misreporting persists over waves. Kreuter et al. 
(2011) show that misreporting in an interleafed format is worse in later items in a survey and conclude that respondents learn how the survey is structured. When respondents are presented with the same questions in wave two, they already know the structure and can misreport from the very first filter question. Thus, we hypothesize that respondents become worse respondents (i.e., the percentage of filters triggered will be smaller in the second wave among respondents interviewed in the same format in both waves). We call this the panel conditioning hypothesis. Respondents in the interleafed format can learn in wave one and underreport in wave two. Respondents in the grouped format have no opportunity to learn in wave one, but they may nevertheless recall the structure in wave two. Consequently, we expect a decrease in filter questions triggered in wave two among respondents interviewed in the same format in both waves. Third, we study how panel conditioning affects misreporting when respondents are interviewed in different formats in each wave (i.e., when the format changes from grouped to interleafed—and vice versa—over time). We refer to the effects of changing the format over time as the changed format panel conditioning hypothesis. Recognizing response patterns in this context and disentangling panel and survey conditioning (i.e., changes in reporting due to changing the format) is harder, since both panel conditioning and the format may influence responses at the same time, possibly even in different directions. If the questions’ formatting is switched from grouped to interleafed, misreporting should increase over time, due to both survey conditioning and panel conditioning. However, when switching from interleafed to grouped, the effect is somewhat more complex. We expect that respondents recall the questionnaire from the first wave, and thus, due to panel conditioning, misreporting should increase. This is the worse reporters mechanism. At the same time, however, the grouped format usually collects more reports. Thus, we would expect a decrease in misreporting after switching the format. Consequently, we cannot predict whether misreporting will increase or decrease over time for this group. Theory does not predict whether gains in data quality from changing to the better grouped format are outweighed by losses due to panel conditioning. Yet, this question is especially important for panel surveys using the interleafed format that wish to change to the grouped format to reduce motivated misreporting.2Table 1 summarizes the second and third hypotheses and the expected effects on the level of misreporting. Table 1. Panel Conditioning Hypotheses Filter question format   Hypothesis  Effect on misreporting  Wave one  Wave two  Interleafed  Interleafed  Panel conditioning  Increasea  Grouped  Grouped  Panel conditioning  Increasea  Grouped  Interleafed  Changed format panel cond.  Increasea  Interleafed  Grouped  Changed format panel cond.  Unclear  Filter question format   Hypothesis  Effect on misreporting  Wave one  Wave two  Interleafed  Interleafed  Panel conditioning  Increasea  Grouped  Grouped  Panel conditioning  Increasea  Grouped  Interleafed  Changed format panel cond.  Increasea  Interleafed  Grouped  Changed format panel cond.  Unclear  a Corresponds to a decrease in the number of filters triggered. Table 1. 
Panel Conditioning Hypotheses Filter question format   Hypothesis  Effect on misreporting  Wave one  Wave two  Interleafed  Interleafed  Panel conditioning  Increasea  Grouped  Grouped  Panel conditioning  Increasea  Grouped  Interleafed  Changed format panel cond.  Increasea  Interleafed  Grouped  Changed format panel cond.  Unclear  Filter question format   Hypothesis  Effect on misreporting  Wave one  Wave two  Interleafed  Interleafed  Panel conditioning  Increasea  Grouped  Grouped  Panel conditioning  Increasea  Grouped  Interleafed  Changed format panel cond.  Increasea  Interleafed  Grouped  Changed format panel cond.  Unclear  a Corresponds to a decrease in the number of filters triggered. In addition to testing these three hypotheses, we explore whether both survey conditioning and panel conditioning depend on respondents’ cognitive abilities. Evidence from studies on response order effects, non-differentiation, and satisficing show that respondents with low education are most susceptible to such effects (e.g., Krosnick and Alwin 1987; Narayan and Krosnick 1996; Holbrook, Krosnick, Moore, Tourangeau 2007). Similarly, Binswanger, Schunk, Toepoel (2013) find a panel conditioning effect in difficult attitudinal questions only among low educated respondents. In our case, survey conditioning may, on the one hand, increase with cognitive ability, because respondents with higher cognitive abilities may be faster in discovering repetitive patterns of the questionnaire. On the other hand, respondents with higher cognitive abilities might instead show less misreporting in the interleafed format as they may be more aware of the importance of accurate survey reports for scientific purposes. The same arguments may hold for panel conditioning: respondents with higher cognitive abilities may be better at recalling having answered the same questionnaire one month ago. However, their better understanding of the importance of accurate survey reports for research may again counteract. 2. THE STUDY The data we use for our analysis of survey and panel conditioning comes from the Dutch LISS panel, a longstanding Internet panel based on a probability sample. Sample members complete questionnaires of about 15 to 30 minutes on a monthly basis (Scherpenzeel and Das 2010). In 2012, we put several filter question experiments in two consecutive waves of the LISS panel. In the first wave (April 2012), LISS participants (n = 3330) were randomly assigned to either the interleafed or grouped filter question format. In the second wave (May 2012), all panel members were again randomly assigned to one of the two formats. The resulting design has four cells (two formats by two waves), as shown in table 5.3 Table 5. 
Percent of Filter Questions Triggered, by Experimental Condition Format   Percent of filters triggered   n  |Z-statistic|a  p-value  Wave one  Wave two  Wave one  Wave two  Interleafed  Interleafed  36.1 (0.007)  36.5 (0.007)  571  0.86  0.353  Grouped  Grouped  43.3 (0.007)  43.4 (0.007)  565  0.00  0.948  Grouped  Interleafed  42.8 (0.008)  35.0 (0.007)  517  116.34  <0.001  Interleafed  Grouped  36.4 (0.007)  43.3 (0.008)  509  107.87  <0.001  Format   Percent of filters triggered   n  |Z-statistic|a  p-value  Wave one  Wave two  Wave one  Wave two  Interleafed  Interleafed  36.1 (0.007)  36.5 (0.007)  571  0.86  0.353  Grouped  Grouped  43.3 (0.007)  43.4 (0.007)  565  0.00  0.948  Grouped  Interleafed  42.8 (0.008)  35.0 (0.007)  517  116.34  <0.001  Interleafed  Grouped  36.4 (0.007)  43.3 (0.008)  509  107.87  <0.001  Note.— Standard errors (in parentheses) clustered at the respondent level. a H0: % triggered interleafed = % triggered grouped. Table 5. Percent of Filter Questions Triggered, by Experimental Condition Format   Percent of filters triggered   n  |Z-statistic|a  p-value  Wave one  Wave two  Wave one  Wave two  Interleafed  Interleafed  36.1 (0.007)  36.5 (0.007)  571  0.86  0.353  Grouped  Grouped  43.3 (0.007)  43.4 (0.007)  565  0.00  0.948  Grouped  Interleafed  42.8 (0.008)  35.0 (0.007)  517  116.34  <0.001  Interleafed  Grouped  36.4 (0.007)  43.3 (0.008)  509  107.87  <0.001  Format   Percent of filters triggered   n  |Z-statistic|a  p-value  Wave one  Wave two  Wave one  Wave two  Interleafed  Interleafed  36.1 (0.007)  36.5 (0.007)  571  0.86  0.353  Grouped  Grouped  43.3 (0.007)  43.4 (0.007)  565  0.00  0.948  Grouped  Interleafed  42.8 (0.008)  35.0 (0.007)  517  116.34  <0.001  Interleafed  Grouped  36.4 (0.007)  43.3 (0.008)  509  107.87  <0.001  Note.— Standard errors (in parentheses) clustered at the respondent level. a H0: % triggered interleafed = % triggered grouped. In the interleafed format, respondents were asked 13 filter questions in random order. Two follow-up questions were asked immediately after each filter, if applicable. In the grouped format, respondents were asked all 13 filter questions (in random order) before answering any applicable follow-ups. Apart from the filter questions and a handful of questions asking for respondents’ experience with the questionnaire, no other questions were asked in this module.4 Given the small number of questions, the median response time was a little more than three minutes in both waves and 95 percent of all respondents answered the questionnaire in less than eight minutes. All filter questions asked about purchases of items such as groceries, clothes, or tickets for movies during the last month (see the original questions in the Online Supplementary Materials). We chose these questions to ensure that most respondents triggered at least a few follow-up questions in each wave. Also, we chose items that should not be influenced by circumstances, such as seasonality, that would lead to a real change of purchasing behavior between the two waves of the experiment. Thus, we expect that any differences in reporting between the two waves are, on average, caused by a change in reporting and not by a change in behavior. 2.2 Nonresponse and Attrition About 79% of the LISS panel members selected for the study participated in the first wave of the experiment (American Association for Public Opinion Research (AAPOR) RR1). 
Conditional on wave one response, the participation rate in wave two is 82% (fourth column in table 2). For the analysis of our research questions, however, we discard those who responded only in wave one (n = 478) or only in wave two (n = 326). Panel attrition can easily be mistaken for panel conditioning, and disentangling the two effects is one of the major challenges when analyzing panel conditioning (Bach and Eckman 2017). Using only two-wave respondents allows us to eliminate any confounding effects of attrition (Warren and Halpern-Manners 2012). Two additional respondents are excluded because they broke off the survey, one in wave one and one in wave two. The resulting analysis sample consists of 2,162 people (third column in table 2).

Table 2. Response Rates by Filter Question Format and Wave

Filter question format in wave one            Wave one                     Wave two
                                              n        Response rate^a     n^b       Response rate^c
Interleafed     Respondents                   1,303    78.9%               1,080     81.8%
                Nonrespondents                347                          242
Grouped         Respondents                   1,337    81.0%               1,082     82.0%
                Nonrespondents                313                          238
Total number of respondents                   2,640    79.3%               2,162     81.9%
|Z-statistic|^d                                        1.48                          0.10
p-value                                                0.139                         0.919

^a AAPOR RR1.
^b Among wave one respondents.
^c AAPOR RR1, conditional on wave one response.
^d H0: response rate interleafed = response rate grouped.

Before we report the results of our analysis, we verify that restricting the analysis sample to two-wave respondents does not confound the experimental manipulation of the filter question formats and, thus, our estimates of survey and panel conditioning. To do so, we check whether nonresponse in wave one and attrition differ by format and wave. Regarding nonresponse in wave one, response rates do not differ between the two formats (second column in table 2), and we do not find any substantial or significant differences in sociodemographic variables between the formats (results not shown).
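For readers who want to reproduce the format comparison in table 2: the text does not spell out the test behind the |Z-statistic| rows, but a standard pooled two-proportion z-test applied to the wave one counts comes very close to the reported values. A minimal sketch, assuming that test (the choice of test is an assumption, not the authors' stated procedure; the counts are taken from table 2):

```python
from math import sqrt
from scipy.stats import norm

# Wave one invited samples and respondent counts by format, from table 2
n_inter = 1303 + 347            # interleafed: respondents + nonrespondents
n_group = 1337 + 313            # grouped: respondents + nonrespondents
resp_inter, resp_group = 1303, 1337

p_inter = resp_inter / n_inter  # ~0.789 -> 78.9%
p_group = resp_group / n_group  # ~0.810 -> 81.0%

# Pooled two-proportion z-test (assumed test; the article does not specify one)
p_pool = (resp_inter + resp_group) / (n_inter + n_group)
se = sqrt(p_pool * (1 - p_pool) * (1 / n_inter + 1 / n_group))
z = (p_group - p_inter) / se
p_value = 2 * norm.sf(abs(z))   # two-sided p-value

print(f"|Z| = {abs(z):.2f}, p = {p_value:.3f}")  # approximately |Z| = 1.48, p = 0.139
```

The analogous calculation for wave two also yields a small, nonsignificant difference, although the exact published figures may rest on a slightly different test.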
Moreover, nonresponse over time (i.e., attrition) does not seem to be related to sociodemographic variables: we find only one substantial and significant difference between those who respond to both waves and those who participate in the first wave only. Attriters are younger, but they do not differ from two-wave respondents in gender, education, marital status, household composition, or their response behavior in previous waves of the LISS panel (results not shown). Importantly, neither the filter question format in wave one, the number of filters triggered in wave one, nor their interaction is predictive of attrition (table 3). Wave two response rates do not differ between the two wave one formats (fourth column in table 2).

Table 3. Predictors of Attrition

                                                        Coefficient
Grouped (ref.: interleafed)                             0.031 (0.037)
No. of filters triggered in wave one                    0.002 (0.005)
Grouped × No. of filters triggered in wave one          −0.002 (0.007)
n                                                       2,640^a

Note.— Linear probability model. Attrition (Yes/No). Standard errors in parentheses. Constant not shown.
^a All wave one respondents.

Since all LISS participants were randomly assigned to the two filter question formats in wave two, we also test whether there are any important differences between the two-wave respondents and all respondents of wave two (i.e., whether dropping some respondents from wave two interferes with the randomization to the filter question formats in wave two). After restricting the sample to two-wave respondents, we do not find any significant differences in sociodemographic variables or in the number of filters triggered in wave one between the two formats in wave two (results not shown). Therefore, we are confident that nonresponse and attrition patterns do not differ among the subgroups. Importantly, attrition is not influenced by the experimental conditions or by the filter question experience in wave one and thus should not bias the results of our analysis. Although the sample restrictions limit the representativeness of the data (external validity), they do not jeopardize our interpretation of the experimental results (internal validity). In the next two sections, we present and discuss the results of our analysis regarding our three hypotheses (i.e., whether respondents learn to misreport within a single wave of the survey and whether misreporting persists over two waves).

3. RESULTS

As predicted by the survey conditioning hypothesis, the percentage of filter questions triggered in wave one is significantly smaller in the interleafed format than in the grouped format: 36% versus 43% (i.e., 4.7 versus 5.6 of the 13 filters triggered; see table 4). Survey conditioning thus takes place in wave one, as we expected.
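The format comparison in table 4 can be illustrated in the same spirit. Because each respondent contributes 13 filter observations, the standard errors are clustered at the respondent level; one common way to implement such a comparison is a linear probability model with respondent-clustered standard errors. A minimal sketch under that assumption (the modeling choice, the file name, and the variable names triggered, grouped, and resp_id are illustrative, not the authors' code):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per respondent-filter observation in wave one.
# Illustrative columns: resp_id (respondent identifier), grouped (1 = grouped format,
# 0 = interleafed), triggered (1 = filter answered "Yes", 0 = "No").
df = pd.read_csv("wave1_filters_long.csv")   # hypothetical file

# Triggering rate by format (compare the 36.3% / 43.1% in table 4)
print(df.groupby("grouped")["triggered"].mean())

# Linear probability model of triggering on format, clustering SEs by respondent
model = smf.ols("triggered ~ grouped", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["resp_id"]}
)
print(model.params["grouped"], model.bse["grouped"])  # format difference and clustered SE
```

The coefficient on grouped is the difference in triggering rates between the two formats (about 6.8 percentage points in table 4), and its test statistic provides a format contrast analogous to the one reported there; the published |Z-statistic| may be computed on a different scale.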
When we also include the position of a filter question (i.e., whether a filter was asked in the first, second, or final third of the section) in the analysis, we find that the probability of triggering a filter in the interleafed format decreases significantly with the position of the filter in the questionnaire (results not shown). This finding further supports our first hypothesis that respondents learn to misreport in the interleafed format.

Table 4. Percent of Filter Questions Triggered in Wave One by Format

Format              Wave one % triggered
Interleafed         36.3 (0.005)
Grouped             43.1 (0.006)
Total               39.7 (0.004)
|Z-statistic|^a     84.83
p-value             <0.001
N                   2,162

Note.— Standard errors (in parentheses) clustered at the respondent level.
^a H0: % triggered interleafed = % triggered grouped.

Turning to the second hypothesis, panel conditioning, we expect the percentage of filters triggered to decrease over time due to learning when filter questions are asked in the same format in each wave (rows one and two of table 1). Row one of table 5 shows that respondents who were interviewed in the interleafed format in both waves show no significant change in reporting across time. The same result holds for respondents interviewed in the grouped format in both waves. Thus, we do not find any support for the panel conditioning hypothesis: respondents do not trigger fewer filters when interviewed for the second time in the same format. Next, we turn to our third hypothesis and to those respondents who changed formats from wave one to wave two (rows three and four of table 1). We expect that respondents who switched from grouped to interleafed underreport in wave two due to panel conditioning (see table 1). Row three of table 5 shows, as expected, that these respondents trigger significantly fewer filters in wave two than in wave one and thus misreport more. Switching from interleafed to grouped, however, leads to a significant increase in filters triggered and thus to less misreporting (row four). Both effects have about the same absolute size. Moreover, the effect size is approximately equal to the difference between the interleafed and grouped formats in wave one. For this reason, we interpret the changes over time seen in the bottom rows of table 5 as driven by the format change and not by panel conditioning. Thus, we find no support for the third hypothesis that panel conditioning influences misreporting when the filter question format changes between waves. Both panel conditioning hypotheses assume that respondents learn from participating in the first wave of the experiment how follow-up questions can be avoided. However, a few respondents (n = 48) did not trigger a single filter question in wave one and therefore could not have learned this.
To assess whether this confounds the results shown in table 5, we rerun the analyses without these 48 respondents. The results do not differ substantially from those shown in table 5 (results not shown). Regarding the influence of cognitive ability, we find that survey conditioning does not depend on cognitive ability, measured by respondents' highest educational degree achieved. In other words, the difference between the filters triggered in the interleafed format and the filters triggered in the grouped format does not vary with respondents' education (results not shown). Similarly, we do not find a panel conditioning effect when replicating the results of table 5 in subgroups defined by respondents' highest educational degree. We interpret these findings as evidence that survey conditioning and panel conditioning do not depend on respondents' cognitive ability. To sum up, we find strong evidence for survey conditioning. Fewer filters are triggered in the interleafed format than in the grouped format, which is additional evidence that respondents in the interleafed format learn how the survey is structured and use that information to reduce the burden of the survey by misreporting. However, there is no evidence that respondents remember the survey structure over time in a panel context and misreport more in the second wave. Therefore, there is no support for the worse respondents mechanism.

4. DISCUSSION

Survey researchers are increasingly aware that asking filter questions in different formats can affect responses and measurement error. Studies on misreporting in cross-sectional surveys have shown that respondents asked in an interleafed format trigger fewer follow-ups than respondents interviewed in a grouped format. The common explanation for such survey conditioning is that respondents learn how a survey is structured and use this information to speed through the interview and reduce its burden. Similar concerns have been raised in the context of panel surveys: if respondents remember the structure of a survey from prior waves, they might recall that giving certain answers can shorten the interview. We first hypothesized that respondents surveyed in the interleafed filter question format would misreport more than respondents surveyed in the grouped format. In both waves of our experiment, respondents in the interleafed format triggered fewer filters than respondents in the grouped format. Our interpretation of this finding is that misreporting is more pronounced in the interleafed format. Studies that compared responses to filter questions with validation records (see section 1) showed that the grouped format produces more accurate responses. Thus, we are confident in interpreting the grouped format as more accurate than the interleafed format. These results are in line with previous studies. In our second and third hypotheses, we expected to see more misreporting in the second wave of a panel study. If respondents remember the structure of a questionnaire from prior participation, then misreporting in a second wave should increase: respondents would recognize the filter questions and trigger fewer of them. In this study, the two waves were separated by only four weeks, which should make panel conditioning more likely (Warren and Halpern-Manners 2012; Halpern-Manners et al. 2014). However, we did not find support for either of the two panel conditioning hypotheses.
Respondents interviewed in the same format in both waves do not misreport more in the second wave (the panel conditioning hypothesis). We did see differences in triggering rates for respondents whose format changed across waves (the changed format panel conditioning hypothesis). However, these changes in the level of misreporting are likely due to the format change itself and not to the worse respondents (or better respondents) mechanism. Interestingly, neither survey conditioning nor panel conditioning seems to vary with respondents' cognitive abilities, as measured by their highest educational degrees. To ensure that our findings are not distorted by attrition, we considered only those respondents who participated in both waves of the survey. Furthermore, we chose a battery of filter questions that should not lead to changes-in-behavior panel conditioning. One might argue that the absence of panel conditioning is due to LISS respondents being very good or very motivated respondents who are not susceptible to motivated misreporting. Yet the study by Schonlau and Toepoel (2015) on straightlining (see section 1) and the clear signs of misreporting in the interleafed format in wave one (see section 3) show that LISS respondents are as good (or bad) as any other respondents. Even so, we did not find a panel conditioning effect in our filter questions. This lack of a panel conditioning effect is somewhat surprising. While respondents show a strong tendency to learn the structure of a questionnaire within a single survey, they do not seem to recall this information when the same questions are repeated four weeks later. Even respondents in the interleafed format in wave one, many of whom misreported, no longer show any signs of misreporting when interviewed for a second time in the grouped format. The most likely explanation for this finding is the resetting effect reported by Kreuter et al. (2011). They find that respondents' misreporting is reset when a new section within a questionnaire starts. A similar logic might apply to the panel context: if misreporting resets when a new section starts, then the same should hold for the start of a new wave in a panel survey. The degree of panel conditioning also seems to depend on the topic and the burden of the questions asked in a survey (Warren and Halpern-Manners 2012). The questions we asked in the LISS panel were fairly easy to answer, dealt with a possibly boring topic (e.g., "What did you buy last month?"), and took respondents only a few minutes to answer (see section 2). In addition, LISS respondents were exposed to our questionnaire only twice and have answered many other, potentially more interesting, novel, or burdensome LISS modules in the past. Responding to our filter question module may not have been distinct from the general experience of participating in the LISS panel, and thus respondents may not have remembered the questions (Tourangeau, Rips, Rasinski 2000, chapter 3). Questions dealing with more burdensome or sensitive topics may produce different results, potentially even resulting in respondents becoming better reporters over time (compare the studies cited in section 1). Moreover, results may differ when respondents are exposed to the survey more than twice (see section 3). Thus, more research is needed on how topic, survey burden, and the level of exposure to the survey affect motivated misreporting and panel conditioning, a point also made by Eckman et al. (2014).
Our finding, however, that motivated misreporting does not increase over waves of a panel survey is good news for survey practitioners. Although misreporting can pose a serious problem, repeated participation in the same survey does not make the problem worse. Importantly, for panel surveys that used the interleafed format in the past, our work suggests that changing to a grouped format is a good idea, as it can help increase data quality by eliminating survey conditioning without introducing panel conditioning.2

Supplementary Materials

Supplementary materials are available online at http://www.oxfordjournals.org/our_journals/jssam/.

Acknowledgments

The authors would like to thank Frauke Kreuter, Georg-Christoph Haas, the Editor-in-Chief, and three anonymous reviewers who provided helpful comments on earlier drafts of this paper. Carola Krimmer provided research assistance. The LISS panel data were collected by CentERdata (Tilburg University, The Netherlands) through its MESS project, funded by the Netherlands Organization for Scientific Research.

Footnotes

1 An extensive discussion of the theoretical mechanisms underlying panel conditioning, their outcomes, and a review of relevant studies is provided by Warren and Halpern-Manners (2012).
2 Other panel surveys might consider a switch from the grouped to the interleafed format, for reasons we do not discuss here for brevity. See Kreuter et al. (2018) for a discussion of the advantages of the interleafed format.
3 A third format was included in the original experiment, but we do not use those cases here. Results for that format are reported in Kreuter et al. (2018).
4 While no other questions were asked in this module, respondents are usually invited to answer more than one module per month. We do not have any information on the other modules our respondents answered in these months.

REFERENCES

Bach R. L., Eckman S. (2017), "Does Participating in a Panel Survey Change Respondents' Labor Market Behavior?" IAB Discussion Paper 15/2017, available at http://doku.iab.de/discussionpapers/2017/dp1517.pdf, last accessed October 9, 2017.
Bailar B. A. (1989), "Information Needs, Surveys, and Measurement Errors," in Panel Surveys, eds. Kasprzyk D., Duncan G. J., Kalton G., Singh M. P., pp. 1–24, New York: Wiley.
Binswanger J., Schunk D., Toepoel V. (2013), "Panel Conditioning in Difficult Attitudinal Questions," Public Opinion Quarterly, 77, 783–797.
Cantor D. (2010), "A Review and Summary of Studies on Panel Conditioning," in Handbook of Longitudinal Research: Design, Measurement and Analysis, ed. Menard S., pp. 123–138, Amsterdam: Elsevier.
Cohen S. B., Burt V. L. (1985), "Data Collection Frequency Effect in the National Medical Care Expenditure Survey," Journal of Economic and Social Measurement, 13, 125–151.
Duan N., Alegria M., Canino G., McGuire T., Takeuchi D. (2007), "Survey Conditioning in Self-Reported Mental Health Service Use: Randomized Comparison of Alternative Instrument Formats," Health Services Research, 42, 890–907.
Eckman S., Kreuter F., Kirchner A., Jäckle A., Tourangeau R., Presser S. (2014), "Assessing the Mechanisms of Misreporting to Filter Questions in Surveys," Public Opinion Quarterly, 78, 721–733.
Halpern-Manners A., Warren J. R. (2012), "Panel Conditioning in Longitudinal Studies: Evidence from Labor Force Items in the Current Population Survey," Demography, 49, 1499–1519.
Halpern-Manners A., Warren J. R., Torche F. (2014), "Panel Conditioning in a Longitudinal Study of Illicit Behaviors," Public Opinion Quarterly, 78, 565–590.
Holbrook A. L., Krosnick J. A., Moore D., Tourangeau R. (2007), "Response Order Effects in Dichotomous Categorical Questions Presented Orally: The Impact of Question and Respondent Attributes," Public Opinion Quarterly, 71, 1–25.
Kessler R. C., Wittchen H.-U., Abelson J. M., McGonagle K., Schwarz N., Kendler K. S., Knäuper B., Zhao S. (1998), "Methodological Studies of the Composite International Diagnostic Interview (CIDI) in the US National Comorbidity Survey (NCS)," International Journal of Methods in Psychiatric Research, 7, 33–55.
Kreuter F., Eckman S., Tourangeau R. (2018), "Salience of Burden and Its Effects on Response Behavior to Skip Questions: Experimental Results from Telephone and Web Surveys," in Advances in Questionnaire Design, Development, Evaluation and Testing, eds. Beatty P., Collins D., Kaye L., Padilla J., Willis G., Wilmot A., Hoboken, NJ: Wiley.
Kreuter F., McCulloch S., Presser S., Tourangeau R. (2011), "The Effects of Asking Filter Questions in Interleafed Versus Grouped Format," Sociological Methods and Research, 40, 88–104.
Kroh M., Winter F., Schupp J. (2016), "Using Person-Fit Measures to Assess the Impact of Panel Conditioning on Reliability," Public Opinion Quarterly, 80, 914–942.
Krosnick J. A., Alwin D. F. (1987), "An Evaluation of a Cognitive Theory of Response-Order Effects in Survey Measurement," Public Opinion Quarterly, 51, 201–219.
Mathiowetz N. A., Lair T. J. (1994), "Getting Better? Change or Error in the Measurement of Functional Limitations," Journal of Economic and Social Measurement, 20, 237–262.
Nancarrow C., Cartwright T. (2007), "Online Access Panels and Tracking Research: The Conditioning Issue," International Journal of Market Research, 49, 573–594.
Narayan S., Krosnick J. A. (1996), "Education Moderates Some Response Effects in Attitude Measurement," Public Opinion Quarterly, 60, 58–88.
Neter J., Waksberg J. (1964), "Conditioning Effects from Repeated Household Interviews," Journal of Marketing, 28, 51–56.
Scherpenzeel A., Das M. (2010), "True Longitudinal and Probability-Based Internet Panels: Evidence from the Netherlands," in Social and Behavioral Research and the Internet: Advances in Applied Methods and Research Strategies, eds. Das M., Ester P., Kaczmirek L., pp. 77–104, Boca Raton, FL: Taylor & Francis.
Schonlau M., Toepoel V. (2015), "Straightlining in Web Survey Panels Over Time," Survey Research Methods, 9, 125–137.
Struminskaya B. (2016), "Respondent Conditioning in Online Panel Surveys: Results of Two Field Experiments," Social Science Computer Review, 34, 95–115.
Tourangeau R., Rips L. J., Rasinski K. (2000), The Psychology of Survey Response, Cambridge: Cambridge University Press.
van Landeghem B. (2012), "Panel Conditioning and Self-Reported Satisfaction: Evidence from International Panel Data and Repeated Cross-Sections," SOEPpapers 484, available at https://www.diw.de/documents/publikationen/73/diw_01.c.408184.de/diw_sp0484.pdf, last accessed October 9, 2017.
Warren J. R., Halpern-Manners A. (2012), "Panel Conditioning in Longitudinal Social Science Surveys," Sociological Methods and Research, 41, 491–534.
Waterton J., Lievesley D. (1989), "Evidence of Conditioning Effects in the British Social Attitudes Panel," in Panel Surveys, eds. Kasprzyk D., Duncan G. J., Kalton G., Singh M. P., pp. 319–339, New York: Wiley.
Yan T., Eckman S. (2012), "Panel Conditioning: Change in True Value versus Change in Self-Report," Proceedings of the American Statistical Association, Survey Research Methods Section, pp. 4726–4736.

© The Author 2017. Published by Oxford University Press on behalf of the American Association for Public Opinion Research. All rights reserved. For Permissions, please email: journals.permissions@oup.com
