Kirchner, Antje

Abstract

Asking questions fluently, exactly as worded, and at a reasonable pace is a fundamental part of a survey interviewer’s role. Doing so allows the question to be asked as intended by the researcher and may decrease the risk of measurement error and contribute to rapport. Despite the central importance placed on reading questions exactly as worded, interviewers commonly misread questions, and it is not always clear why. Thus, understanding the risk of measurement error requires understanding how different interviewers, respondents, and question features may trigger question reading problems. In this article, we evaluate the effects of question features on question asking behaviors, controlling for interviewer and respondent characteristics. We also examine how question asking behaviors are related to question-asking time. Using two nationally representative telephone surveys in the United States, we find that longer questions and questions with transition statements are less likely to be read exactly and fluently, that questions with higher reading levels and parentheticals are less likely to be read exactly across both surveys, and that disfluent readings decrease as interviewers gain experience across the field period. Other question characteristics vary in their associations with the outcomes across the two surveys. We also find that inexact and disfluent question readings are longer, but read at a faster pace, than exact and fluent question readings. We conclude with implications for interviewer training and questionnaire design.

1. INTRODUCTION

Asking a question fluently, exactly as worded, and at a reasonable pace is the “first principle of good interviewing technique” and a fundamental part of a survey interviewer’s job (Fowler and Mangione 1990, p. 34). Doing so allows the question to be asked as intended by the researcher (Brenner 1982) and helps move efficiently through the questionnaire, which can build rapport with respondents (Garbarski, Schaeffer, and Dykema 2016). Yet research has shown that questions are read exactly as worded as little as 28 percent and as much as 97 percent of the time (Ongena and Dijkstra 2006). Even well-trained interviewers deviate from exact and fluent question reading (Billiet and Loosveldt 1988; Fowler and Mangione 1990).

Idiosyncrasies in interviewer deviations from exact question reading make it difficult to anticipate a direct effect of question misreading on measurement error (e.g., Fowler and Mangione 1990; Mangione, Fowler, and Louis 1992; Dykema, Lepkowski, and Blixt 1997; Schaeffer, Dykema, and Maynard 2010). But deviations from “ideal” or “paradigmatic” question-answer sequences consistently predict increased measurement error (Schaeffer and Dykema 2011). Deviations in question-asking behaviors also may trigger interactional problems later in the survey, reducing adequate answers and increasing problematic answers (Holbrook, Johnson, Cho, Shavitt, and Chavez 2015, 2016; Johnson et al. 2015). Misreadings are also associated with other interviewer skills such as probing and accurately recording answers (Fowler and Mangione 1990), which in turn are associated with measurement errors (Mangione et al. 1992).

Disfluent readings (i.e., with “ums” and “uhs”) may also affect respondent processing. Disfluencies occur when words or thoughts are new or less rehearsed (e.g., Arnold, Fagnano, and Tanenhaus 2003) and indicate uncertainty about what is being said (e.g., Bortfeld, Leon, Bloom, Schober, and Brennan 2001).
Disfluent speakers are generally perceived as less confident, trustworthy, credible, or easily understood (e.g., Ketrow 1990; Oksenberg and Cannell 1988; Bortfeld et al. 2001; Ehlen, Schober, and Conrad 2007; Conrad, Schober, and Dijkstra 2008; Charoenruk and Olson 2018). Yet disfluencies can provide more time for respondents to process requests (Bradburn, Rips, and Shevell 1987; Brennan and Schober 2001). Thus, question reading disfluencies may affect both perceptions of interviewers and respondent processing time.

This article examines (1) what question characteristics are associated with exact question readings versus misreadings, (2) what question characteristics are associated with disfluent question readings, and (3) how these deviations from exact and fluent readings are associated with the time and pace of administration. The latter is important because the pace of question reading affects respondent processing time and thus respondents’ task difficulty. Interviewers are commonly trained to read questions at a two words-per-second (wps) pace (slower than the three-to-four-wps pace of everyday speech; Smith and Shaffer 1991, 1995) to move efficiently through the questionnaire (Cannell, Miller, and Oksenberg 1981; Viterna and Maynard 2002). However, the empirical association between question administration pace and data quality outcomes is mixed (Groves and Magilavy 1986; Mingay and Greenwell 1989; Krosnick and Presser 2010; Schaeffer and Dykema 2011). To our knowledge, whether reading questions fluently and exactly as written is actually associated with the pace of administration has not been evaluated.

Unlike most behavior coding studies, which evaluate in-person surveys (Ongena and Dijkstra 2006), this article addresses these research gaps in two national telephone surveys, enabling replication across studies conducted by different survey firms with different questionnaires. Question reading in telephone surveys is especially important to understand because telephone surveys rely solely on spoken words; there are no other channels of communication (e.g., show cards) (Schwarz, Strack, Hippler, and Bishop 1991; de Leeuw 1992). Furthermore, mixed-mode survey designs increasingly use telephone for nonresponse follow-up (e.g., de Leeuw 2018; Dillman, Smyth, and Christian 2014), making it essential to minimize interviewer error so that modes remain comparable.

2. EXISTING MODELS FOR INTERVIEWER READING BEHAVIORS

Even though the influence of question characteristics on interviewer reading behaviors has long been recognized (Fowler and Mangione 1990; Presser and Zhao 1992), existing theoretical models of interviewer reading ascribe misreadings largely to characteristics of interviewers, with only modest mention of possible question-level influences (Sander, Conrad, Mullin, and Herrmann 1992; Ongena and Dijkstra 2007). Interviewers are said to deviate because they are not committed to verbatim reading, the question is hard to read, the reading task is unclear, or they are trying to help respondents (Fowler and Mangione 1990; Fowler 1991; Sander et al. 1992; Ongena and Dijkstra 2007). Moreover, studies of reading behaviors generally look at individual questions or subsets of questions in isolation, providing a more qualitative evaluation of the question characteristics associated with problems (Brick, Calahan, Gray, Severynse, and Stowe 1994; Hess, Rothgeb, and Zukerberg 1997; Esposito 2002; Ongena and Dijkstra 2006).
3. A MODEL FOR THE EFFECT OF QUESTION CHARACTERISTICS ON READING BEHAVIORS

In addition to interviewer skills, experience, and motivation, survey questions also contain packages of characteristics that may jointly affect question reading. We identify four groups of question characteristics (figure 1).

Figure 1. Conceptual Model for the Effect of Question Characteristics on Reading Behaviors.

3.1 Question Length

Long questions increase the risk of words being omitted or changed during question administration and therefore have higher rates of question misreadings (Presser and Zhao 1992; Calahan, Mitchell, Gray, Chen, and Tsapogas 1997; Childs and Landreth 2006; Ongena and Dijkstra 2007; Dykema, Schaeffer, Garbarski, and Hout in press; Holbrook et al. 2015, 2016). For example, if interviewers feel information is redundant or unnecessary, they may omit it (e.g., response options; Childs and Landreth 2006). In normal conversations, longer sentences are more likely to contain disfluencies (e.g., Shriberg 1996; Bortfeld et al. 2001). Thus, we expect longer questions will be less likely to be read exactly and more likely to have disfluencies.

Transition statements alert respondents to new question topics but also lengthen the question and often simply repeat the question’s meaning. As such, interviewers may omit them (Brick, Tubbs, Collins, Nolin, Cantor, Levin, and Carnes 1997). Thus, we expect questions with transition statements will be less likely to be read exactly and more likely to have disfluencies.

3.2 Question Complexity

Complex questions will be more challenging to read accurately and quickly (Fowler and Mangione 1990; Ongena and Dijkstra 2007). Prior evaluations have operationalized question complexity using question reading level (Lenzner 2014; Olson and Smyth 2015; Holbrook et al. 2016), syntactic properties of the question (Graesser, Cai, Louwerse, and Daniel 2006; Schaeffer and Dykema 2011; Lenzner 2012), and the presence of unknown terms (Olson and Smyth 2015). For example, questions with higher Flesch reading grade levels, those judged as “difficult” by raters, and those with more problems identified by the Question Understanding Aid (QUAID) tool and the Problem Classification Coding System have a higher probability of being misread (Dykema et al. in press; Fowler and Mangione 1990; Holbrook et al. 2015). The QUAID syntactic complexity measures include unfamiliar technical terms (i.e., words unlikely to be known), vague or imprecise relative terms (i.e., adverbs and adjectives), vague or ambiguous noun-phrases (i.e., nouns with multiple meanings and pronouns), complex syntax (i.e., sentences with multiple clauses), and indications of working memory overload (i.e., multiple ideas presented simultaneously) (Graesser et al. 2006). Questions with complex syntax may be misread if interviewers attempt to add information or reword the question to aid comprehension. Thus, we expect questions with higher reading grade levels and more syntactic complexity to be less likely to be read exactly and more likely to have disfluencies.

3.3 Questions Requiring Interviewer Decisions

The question features summarized here give interviewers discretion over what to read, increasing interviewer burden (Japec 2008).
Parenthetical information in a question may be skipped by interviewers, especially if it seems ancillary (Olson and Smyth 2015; Dykema, Schaeffer, Garbarski, Nordheim, and Banghart 2016). Similarly, for many battery items, reading the question stem and response options is optional (i.e., for items appearing after the first or second item) (Ongena and Dijkstra 2007; Olson, Smyth, and Cochran 2018). Additionally, emphasized words indicate that a particular word is important but do not specify how interviewers should read the emphasis (Olson and Smyth 2015, 2017). Thus, interviewers may change question wording to communicate to respondents that the word is important. Interviewer instructions, which are intended to cue interviewers about how to ask a question, may distract from the question text itself, leading to misreadings. In general, we hypothesize that discretionary information will decrease the chance that the question is read exactly and increase the chance that it is read with disfluencies.

3.4 Highly Practiced Questions

Practice makes reading tasks easier and more fluent (Ongena and Dijkstra 2007). Thus, we expect questions that are more practiced, such as demographic items that are common to multiple surveys, will be more likely to be read exactly and fluently (Cannell, Marquis, and Laurent 1977) than less practiced questions such as survey-specific attitudinal or behavioral items. Alternatively, interviewers may be more likely to try to administer these questions from memory, making them less likely to be read exactly (Ongena and Dijkstra 2007).

Similar to normal conversation, where disfluencies occur more often at the beginning of interactions and less as the interaction progresses (Bortfeld et al. 2001), disfluencies and question misreadings may also decrease with experience within a survey. Any initial hesitation or rapport challenges leading to disfluencies or misreadings are likely to be worked out on early questions, giving later questions an advantage. Existing research, however, has found no difference in reading errors across early and later questions (Presser and Zhao 1992; Holbrook et al. 2015). Finally, interviewers gain experience (i.e., practice) with questions over the field period (Olson and Peytchev 2007; Olson and Bilgen 2011; Kirchner and Olson 2017), which may reduce misreadings and disfluencies. Alternatively, this experience may reveal common respondent difficulties on questions, leading interviewers to make changes to avoid respondent problems.

4. INTERVIEWER QUESTION READING AND ADMINISTRATION TIME

Interviewers’ question administration time may cue respondents about expected response quality, with a fast delivery suggesting that less careful answers may be acceptable (Cannell et al. 1981; Fowler and Mangione 1990; Loosveldt and Beullens 2013). Although it is well established that interview length varies across interviewers (e.g., Olson and Peytchev 2007; Olson and Bilgen 2011; Loosveldt and Beullens 2013; Kirchner and Olson 2017; Vandenplas, Loosveldt, Beullens, and Denies 2017), it is less well understood why this variation occurs. One reason might be differences in how interviewers ask survey questions. No known previous study has evaluated the association between question asking behaviors and the length or pace of question administration. Question misreadings might occur because interviewers omit words or phrases, shortening question administration times. However, misreadings may include added words, restarts, or repeats of the question, lengthening question times.
We anticipate disfluencies will lengthen question times because they add utterances (e.g., “uh,” “um”) and may be surrounded by pauses, stutters, or other vocal cues.

Question administration time can be measured several ways. Total time spent on question-asking (i.e., not answering, probing, etc.) is one measure. However, lengthier questions will necessarily have longer total times. To account for question length, a second measure of question administration time is the number of words asked per second, or pace. While interviewers are commonly instructed to read questions at two wps (Cannell et al. 1981; Fowler and Mangione 1990), they may speak more quickly when recovering from a misreading or a disfluency. As such, we examine both measures.

In sum, we examine the association of question characteristics related to question length, question complexity, interviewer decisions, and interviewer practice with two interviewer reading behaviors, exact question reading and disfluencies, and whether these two reading behaviors are associated with question administration time.

5. DATA AND METHODS

The data come from two telephone surveys: the fifty-four-item Work and Leisure Today (WLT) survey of landline telephone households, conducted by Abt SRBI during August 2013 (AAPOR RR3 = 6.3 percent), and the seventy-three-item US/Japan Newspaper Opinion Poll (NOP) survey of landline and cell phone households, conducted by Gallup during November 2013 (AAPOR RR1 = 7.4 percent). Two surveys conducted by different organizations permit us to evaluate whether the question features that predict misreadings replicate across two survey “houses” with (potentially) different training and monitoring procedures. Full questionnaires for both studies are in the online supplementary material.

All 450 WLT interviews, conducted by twenty-two interviewers, were audio recorded, transcribed, and behavior coded. From NOP, a stratified random subset of 438 of the 992 audio-recorded interviews was selected. To select this subset, interviewers who conducted fewer than ten interviews were first excluded to help stabilize the multilevel models (van Breukelen and Moerbeek 2013; Vassallo, Durrant, and Smith 2017). Remaining interviewers were then stratified by overall experience (less than one year versus one year or more), and a random subset of interviewers was selected from each of these groups. All of the interviews for the thirty-one selected interviewers were transcribed and behavior coded.

Behavior coding was conducted using Sequence Viewer (Dijkstra 2016). First, each audio recording was synced at the conversational-turn level to its transcript. Trained undergraduate behavior coders identified the time in deciseconds in the audio file at which each conversational turn began (onset time) and ended (offset time), allowing us to isolate question reading time. For this analysis, coders identified the actor, the initial behavior (question asking), details about the initial behavior (exactly as written), and whether or not there were any disfluencies for each conversational turn. A random 10 percent subsample of coded interviews was coded by two master coders to assess reliability; in both surveys, weighted kappas exceed 0.99 for the actor, 0.90 for the initial action, 0.69 for the details about the initial action, and 0.87 for any disfluencies, all above the common cutpoint of 0.40 (Bilgen and Belli 2010).
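To make the reliability check concrete, the following is a minimal sketch assuming hypothetical ordinal behavior codes and scikit-learn’s cohen_kappa_score; the article reports weighted kappas without stating the weighting scheme, so linear weights are an assumption.

```python
# Minimal sketch of the double-coding reliability check described above.
# Data are hypothetical; the weighting scheme ("linear") is an assumption,
# as the article does not specify which weights were used.
from sklearn.metrics import cohen_kappa_score

# Codes assigned to the same conversational turns by the production coder
# and a master coder (hypothetical ordinal behavior codes).
production_codes = [1, 1, 2, 3, 1, 2, 2, 1, 3, 1, 2, 2]
master_codes     = [1, 1, 2, 3, 1, 2, 1, 1, 3, 1, 2, 2]

kappa = cohen_kappa_score(production_codes, master_codes, weights="linear")
print(f"weighted kappa = {kappa:.2f}")  # compared against the 0.40 cutpoint
```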
5.1 Dependent Variables—Interviewer Reading Behaviors

The first dependent variable is a dichotomous measure of whether the interviewer read the question exactly as written the first time it was read (the initial question asking). Interviewers read questions exactly in 49.86 percent of WLT and 64.03 percent of NOP initial question administrations (table 1).

Table 1. Means and Percentages for Interviewer Reading Behaviors, Question Characteristics, Respondent Characteristics, and Interviewer Characteristics

| | WLT n | WLT percent or mean | WLT SD | NOP n | NOP percent or mean | NOP SD |
| --- | --- | --- | --- | --- | --- | --- |
| Dependent variables | | | | | | |
| Exact question reading | 20,927 | 49.86% | | 30,079 | 64.03% | |
| Disfluencies | 20,927 | 18.67% | | 30,079 | 17.34% | |
| Total number of seconds | 20,926 | 6.57 | 4.95 | 30,078 | 6.01 | 5.24 |
| Words per second | 20,926 | 2.76 | 0.97 | 30,078 | 2.22 | 0.99 |
| Question length | | | | | | |
| Number of words | 54 | 14.56 | 12.71 | 73 | 25.88 | 10.15 |
| Transition statement | 54 | 12.96% | | 73 | 4.11% | |
| Question complexity | | | | | | |
| Question reading level | 54 | 6.64 | 4.76 | 73 | 7.49 | 3.21 |
| QUAID: unfamiliar technical term | 54 | 46.30% | | 73 | 57.53% | |
| QUAID: vague or imprecise relative term | 54 | 79.63% | | 73 | 43.84% | |
| QUAID: vague or ambiguous noun-phrase | 54 | 37.04% | | 73 | 16.44% | |
| QUAID: complex syntax | 54 | 5.56% | | 73 | 4.11% | |
| QUAID: working memory overload | 54 | 5.56% | | 73 | 15.07% | |
| Questions requiring interviewer decisions | | | | | | |
| Parentheses | 54 | 9.26% | | 73 | 4.11% | |
| Emphasis | 54 | 14.81% | | 73 | 46.58% | |
| Interviewer instructions | 54 | 37.04% | | 73 | 34.25% | |
| First question in battery | 4 | 7.41% | | 1,751 | 5.82% | |
| Later question in battery | 18 | 33.33% | | 16,163 | 53.74% | |
| Not in battery | 32 | 59.26% | | 12,165 | 40.44% | |
| Highly practiced questions | | | | | | |
| Attitude or behavior | 40 | 74.07% | | 64 | 87.67% | |
| Demographic | 14 | 25.93% | | 9 | 12.33% | |
| Sequential question number | 54 | 27.50 | 15.48 | 73 | 46.41 | 22.27 |
| Interviewer within-study experience | 22 | 12.12 | 7.27 | 31 | 14.45 | 4.50 |
| Response option format (question controls) | | | | | | |
| Open-ended | 22 | 40.74% | | 4 | 5.48% | |
| Closed-nominal | 6 | 11.11% | | 30 | 41.10% | |
| Closed-ordinal | 18 | 33.33% | | 16 | 21.92% | |
| Yes/No | 8 | 14.81% | | 23 | 31.51% | |
| Respondent characteristics | | | | | | |
| Female = 1 | 449 | 63.92% | | 438 | 41.78% | |
| Age | 449 | 61.34 | 16.72 | 438 | 54.95 | 17.43 |
| HS degree or less = 1 | 449 | 28.51% | | 438 | 26.26% | |
| Employed = 1 | 449 | 40.98% | | 438 | 47.03% | |
| Cell phone | n/a | | | 438 | 42.69% | |
| Interviewer characteristics | | | | | | |
| Interviewer female = 1 | 22 | 54.55% | | 31 | 58.06% | |
| Interviewer nonwhite = 1 | 22 | 59.09% | | 31 | 0.00% | |
| Interviewer 1+ year experience = 1 | 22 | 68.18% | | 31 | 51.61% | |

Note.—For battery items, WLT ns count questions, whereas NOP ns count question administrations because NOP battery order was randomized across interviews.

The second dependent variable is a dichotomous measure of whether the interviewer had any disfluencies in the initial question asking, regardless of whether the question was read exactly as written. Disfluencies included any sounds such as “uh,” “um,” “oh,” and “eh,” as well as stutters, repairs, and restarted statements. Interviewers had disfluencies in 18.67 percent of WLT and 17.34 percent of NOP initial question administrations.

These two indicators are not mutually exclusive—questions read exactly as written can include disfluencies. However, misreadings and disfluencies are related. In WLT, 7.63 percent of exact question readings contained disfluencies compared with 29.67 percent of inexact question readings. Similarly, in NOP, 9.26 percent of exact question readings contained disfluencies compared with 37.71 percent of inexact question readings. (The online supplementary material contains models predicting exact question reading excluding disfluencies.) Thus, interviewers cue respondents to their own problems with reading the question through disfluencies.

5.2 Dependent Variables—Question Administration Time

The first measure of question administration time is the number of seconds the interviewer spent in the initial question asking, calculated by subtracting the synced audio file onset time from the offset time and dividing by ten (converting from deciseconds to seconds). The average number of seconds per question was 6.57 in WLT and 6.01 in NOP.

The second measure of question administration time is pace (i.e., wps), measured by dividing the number of words in the initial question reading, including disfluencies (calculated using Stata’s word count function), by the number of seconds (calculated previously) for the initial question reading. The average number of wps is 2.76 in WLT and 2.22 in NOP, both faster than the recommended two wps. In each survey, there was one audio file that did not sync with Sequence Viewer, and thus timing data are missing for those files.
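Both timing measures follow directly from the synced onset/offset stamps and the transcript. The sketch below illustrates the two computations; the question text and the onset and offset values are invented for illustration, and the word count stands in for what the article computes with Stata’s word count function.

```python
# Sketch of the two administration-time measures, using hypothetical
# onset/offset stamps (deciseconds) and an invented question transcript.
question_reading = "During the past week, how many hours did you work for pay?"
onset_ds, offset_ds = 1250, 1318  # hypothetical synced turn boundaries

seconds = (offset_ds - onset_ds) / 10    # deciseconds -> seconds
n_words = len(question_reading.split())  # stands in for Stata's wordcount()
pace_wps = n_words / seconds             # pace in words per second

print(f"{seconds:.1f} s, {n_words} words, {pace_wps:.2f} wps")
# -> 6.8 s, 12 words, 1.76 wps (slower than the two-wps training standard)
```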
5.3 Independent Variables—Question Characteristics

5.3.1 Question length

Two variables measure question length. The first is the number of words in the question as it appeared on the interviewer’s CATI screen, including response options for questions in which they were read to respondents. Average question length was 14.56 words in WLT and 25.88 words in NOP. In the analyses, the number of words is grand-mean centered. The second measure is an indicator variable for the presence of a transition statement in the survey question, coded as 1 = transition statement present (WLT: 12.96 percent of questions; NOP: 4.11 percent of questions).

5.3.2 Question complexity

We include six measures of question complexity. The first is the question’s Flesch-Kincaid reading grade level, obtained using Microsoft Word; low grade levels indicate that the question is easier to read. The average Flesch-Kincaid reading grade level is 6.64 for WLT and 7.49 for NOP, indicating average reading levels between sixth and eighth grade. Reading grade level is grand-mean centered. The next five measures of question complexity come from the online QUAID tool (Graesser et al. 2006), providing evaluations of the linguistic properties of questions and response options. We create five indicator variables: whether the question or response options contain an unfamiliar technical term (WLT: 46.30 percent; NOP: 57.53 percent), a vague or imprecise relative term, including vague quantifiers (WLT: 79.63 percent; NOP: 43.84 percent), a vague or ambiguous noun-phrase (WLT: 37.04 percent; NOP: 16.44 percent), complex syntax (WLT: 5.56 percent; NOP: 4.11 percent), and working memory overload (WLT: 5.56 percent; NOP: 15.07 percent). Models containing a count of the number of QUAID-identified problems, rather than indicators for each problem, are available in the online supplementary material.
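The article obtained reading grade levels from Microsoft Word. As a rough illustration, the sketch below implements the standard Flesch-Kincaid grade-level formula with a naive vowel-group syllable counter, so its output only approximates Word’s dictionary-based counts; the example item is invented.

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: one syllable per run of consecutive vowels.
    # Microsoft Word uses dictionary-based counts, so this is approximate.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Standard Flesch-Kincaid grade-level formula.
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

# Hypothetical survey item; short, simple items score at low grade levels.
print(round(flesch_kincaid_grade(
    "During the past week, how many hours did you work for pay?"), 1))  # ~2.9
```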
5.3.3 Question features requiring interviewer decisions

There are four measures of question characteristics that require interviewer decision-making. The first two are dichotomous indicators for whether the question includes parentheticals (WLT: 9.26 percent; NOP: 4.11 percent) or emphasis in the question stem (WLT: 14.81 percent; NOP: 46.58 percent). In WLT, emphasis was used on words in the question stem, often related to time domains (e.g., “usually,” “in the last week”), but in NOP, it was often placed on the word “or” in a series of interviewer-read response options (e.g., “very much,” “some,” “not very much,” OR “not at all”). The third measure is an indicator of whether any interviewer instructions (often indicating what to read or not to read) occurred (WLT: 37.04 percent; NOP: 34.25 percent).

The fourth is a set of indicators for battery items that were operationalized differently across the studies. In WLT, items within batteries were administered in a fixed order, and as such, three indicator variables capture whether each item was the first question in a battery (7.41 percent), a later question in a battery (33.33 percent), or not in a battery (59.26 percent). In NOP, the order of items within a battery was randomized. Because each battery item appeared as both the first and a later question in a battery across interviews, we code each question administration as the first question administered in the battery (5.82 percent), a later question administered in a battery (53.74 percent), or a question that was not part of a battery (40.44 percent). Thus, the interpretation of the first/later battery items in WLT is confounded with the question content, but the interpretation in NOP is not.

5.3.4 Highly practiced questions

Three variables capture interviewer practice on questions. The first is a question-level indicator for whether the question is demographic (i.e., common to multiple surveys) (WLT: 25.93 percent; NOP: 12.33 percent) versus attitudinal or behavioral (WLT: 74.07 percent; NOP: 87.67 percent). Although virtually all of the NOP items are attitudinal (the only behavioral items relate to type of telephone ownership), the WLT is roughly evenly split between attitudinal and behavioral items. The second practice variable is the sequential question number for each item administered in the questionnaire (up to fifty-four in WLT and seventy-three in NOP). Finally, we include a respondent-level measure of each interviewer’s within-study experience by assigning a value of one to the first respondent that the interviewer interviewed, two to the second respondent, and so on, reflecting the practice the interviewers get on these items over the field period. Appendix tables 1 and 2 contain correlation matrices for the question characteristics in each survey.

Table 2. Variance Components and Proportion of Variance (Variance Partition Coefficients) due to Interviewers, Questions, and Respondents

| | WLT variance | WLT variance partition coefficient | NOP variance | NOP variance partition coefficient |
| --- | --- | --- | --- | --- |
| Exact question reading | | | | |
| Interviewer | 0.697 | 0.12 | 0.321 | 0.07 |
| Question | 1.354 | 0.24 | 0.922 | 0.19 |
| Respondent | 0.250 | 0.04 | 0.280 | 0.06 |
| Likelihood-ratio test | 6127.34**** | | 6346.32**** | |
| Disfluencies | | | | |
| Interviewer | 1.094 | 0.22 | 0.976 | 0.20 |
| Question | 0.442 | 0.09 | 0.436 | 0.09 |
| Respondent | 0.189 | 0.04 | 0.247 | 0.05 |
| Likelihood-ratio test | 3700.69**** | | 4325.40**** | |
| Total seconds | | | | |
| Interviewer | 1.292 | 0.05 | 0.419 | 0.02 |
| Question | 16.186 | 0.67 | 20.811 | 0.75 |
| Respondent | 0.400 | 0.02 | 0.081 | 0.003 |
| Residual | 6.400 | 0.26 | 6.294 | 0.23 |
| Likelihood-ratio test | 27065.24**** | | 43374.05**** | |
| Words per second | | | | |
| Interviewer | 0.177 | 0.19 | 0.078 | 0.08 |
| Question | 0.251 | 0.28 | 0.548 | 0.55 |
| Respondent | 0.041 | 0.04 | 0.016 | 0.02 |
| Residual | 0.442 | 0.49 | 0.359 | 0.36 |
| Likelihood-ratio test | 14965.5**** | | 29069.51**** | |

Note.—*p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001.

5.4 Control Variables

We control for type of response option through indicator variables for open-ended (WLT: 40.74 percent; NOP: 5.48 percent), closed-nominal (WLT: 11.11 percent; NOP: 41.10 percent), closed-ordinal (WLT: 33.33 percent; NOP: 21.92 percent), and yes/no formats (WLT: 14.81 percent; NOP: 31.51 percent) because these features strongly influence interviewer tasks (e.g., Mathiowetz and Cannell 1980; Sykes and Collins 1992; Childs and Landreth 2006; Dykema et al. in press; Holbrook et al. 2015). However, we do not have a clear theoretical basis for an association between question format and misreadings, in part because questions that share a format can vary widely in other relevant features. The type of response option also co-occurs with many of the question characteristics of interest, especially in NOP. As this may lead to multicollinearity issues in estimation, models excluding this control variable are available for both surveys in the online supplementary material.

Because interviewers may adapt question reading to respondents who are particularly likely to have problems (Dykema et al. 2016), we control for common proxies for respondent cognitive ability, age and education (e.g., Belli, Weiss, and Lepkowski 1999; Cannell, Fowler, and Marquis 1968). Age in years is included as a grand-mean-centered continuous variable (mean WLT: 61.34; NOP: 54.95). Respondent education is operationalized as an indicator of a high school degree or less (WLT: 28.51 percent; NOP: 26.26 percent) versus some college or more.
In addition, we control for whether the respondent is female (WLT: 63.92 percent; NOP: 41.78 percent) because conversational norms differ for males and females, although we do not anticipate this translating into different question reading (Dykema et al. 2016). We also control for whether the respondent is employed (a skip pattern trigger) (WLT: 40.98 percent; NOP: 47.03 percent) and, in NOP, whether the interview was conducted on a cell phone (42.69 percent) (Timbrook, Olson, and Smyth 2018).

Finally, interviewer overall experience, gender, and/or race may also affect pace of speech and disfluencies (Charoenruk and Olson 2018), although we have no expectations about how question reading behaviors will differ by these characteristics. Thus, we control for whether interviewers are female in both studies (WLT: 54.55 percent; NOP: 58.06 percent) and nonwhite in WLT (59.09 percent); all interviewers in NOP are white. Interviewer job experience is measured by an indicator variable for one or more years of experience (WLT: 68.18 percent; NOP: 51.61 percent).

5.5 Analysis

Reading behaviors occur at the question level. Each question-level outcome is cross-classified within questions and respondents; questions and respondents are nested within interviewers. Thus, we use cross-classified random effects logistic regression models to predict exact reading and any disfluencies with question, respondent, and interviewer characteristics (Raudenbush and Bryk 2002; Beretvas 2011). In particular, we predict the logit of the probability of exact question reading and of any disfluencies for each question, where $Y_{i(j_1,j_2)k}=1$ indicates that the question was read exactly as worded or that there was a disfluent reading. The base model with no covariates contains the overall mean ($\beta_0$) and random effects for the respondent ($u_{j_1}$), the question ($u_{j_2}$), and the interviewer ($\upsilon_k$). All random effects are assumed to be normally distributed with mean zero and variances $\tau_{u_{j_1}}$ for respondents, $\tau_{u_{j_2}}$ for questions, and $\tau_{\upsilon_k}$ for interviewers (Beretvas 2011). We first calculate the proportion of the variance in each outcome associated with questions, respondents, and interviewers (i.e., the base model). We then add covariates related to question characteristics, respondent characteristics, and interviewer characteristics:

$$\operatorname{logit}\{\Pr(Y_{i(j_1,j_2)k}=1)\} = \beta_0 + \sum_{q=1}^{Q}\beta_q\,\mathrm{QuestionChar}_{j_2k} + \sum_{r=1}^{R}\beta_r\,\mathrm{RespondentChar}_{j_1k} + \sum_{w=1}^{W}\beta_w\,\mathrm{InterviewerChar}_{k} + \upsilon_k + u_{j_1} + u_{j_2}$$

We report both odds ratios and average marginal effects (AMEs) for the predictor variables. The AME for a categorical variable is the difference between the average predicted probability of the event when all cases are assigned the focal category and the average when all cases are assigned the reference category, holding all other variables at their observed values (Williams 2012). These models were estimated using Stata 15.1’s meqrlogit command with random intercepts for questions, respondents, and interviewers and a QR decomposition for the variance components (Rabe-Hesketh and Skrondal 2012).
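The AME definition is easy to illustrate outside the multilevel setting. The sketch below computes an AME for a binary question feature from a plain logistic regression fit to simulated data; it is a stand-in for the paper’s cross-classified models (which were estimated in Stata), not a reproduction of them, and all variable names and coefficients are invented.

```python
# AME of a binary predictor: average predicted probability with the focal
# variable set to 1 for all cases, minus the average with it set to 0,
# other covariates held at their observed values (Williams 2012).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
transition = rng.integers(0, 2, n)  # focal indicator (e.g., transition statement)
words = rng.normal(0, 10, n)        # grand-mean-centered question length
true_logit = 0.5 - 0.8 * transition - 0.03 * words
y = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))  # 1 = exact question reading

X = np.column_stack([transition, words])
model = LogisticRegression().fit(X, y)

X1, X0 = X.copy(), X.copy()
X1[:, 0], X0[:, 0] = 1, 0
ame = model.predict_proba(X1)[:, 1].mean() - model.predict_proba(X0)[:, 1].mean()
print(f"AME of the focal indicator: {ame:.3f}")  # negative in this simulation
```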
To test the association between question reading and the dependent variables of administration time (seconds) and pace (words per second), we use a cross-classified multilevel linear model with the same basic structure as the question reading models, but we include the two reading behaviors as independent variables and an additional residual term ($e_{i(j_1,j_2)k}$):

$$Y_{i(j_1,j_2)k} = \beta_0 + \beta_1\,\mathrm{ExactQn}_{i(j_1,j_2)k} + \beta_2\,\mathrm{Disfluencies}_{i(j_1,j_2)k} + \sum_{q=1}^{Q}\beta_q\,\mathrm{QuestionChar}_{j_2k} + \sum_{r=1}^{R}\beta_r\,\mathrm{RespondentChar}_{j_1k} + \sum_{w=1}^{W}\beta_w\,\mathrm{InterviewerChar}_{k} + \upsilon_k + u_{j_1} + u_{j_2} + e_{i(j_1,j_2)k}$$

The administration time models were estimated using Stata 15.1’s mixed command and restricted maximum likelihood. Because the question, respondent, and interviewer characteristics are of secondary importance in the administration time models, the full models are shown in the online supplementary material.

6. FINDINGS

6.1 Variance Components for Question Reading Behaviors

Table 2 contains the variance components and proportions of variance for all four outcomes in both surveys; these are remarkably consistent across the two surveys. For both surveys, more of the variation in the probability of reading questions exactly as worded is at the question level. In WLT, about 24 percent of the variance in exact question reading is at the question level, compared with 12 percent at the interviewer level and only 4 percent at the respondent level. In NOP, 19 percent of the variance in exact question reading is at the question level, compared with 6–7 percent for interviewers and respondents. In contrast, for disfluent reading, around 20 percent of the variation is at the interviewer level in both surveys, compared with about 9 percent at the question level and only 4–5 percent at the respondent level. The contributions of questions to the total variance are even larger for total administration time (WLT: 67 percent; NOP: 75 percent) and words per second (WLT: 28 percent; NOP: 55 percent). In both studies, between 2 and 5 percent of the variation in total time and between 8 and 19 percent of the variation in pace is at the interviewer level, and between 0 and 4 percent of the variation in timing is at the respondent level.
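The variance partition coefficients in Table 2 can be recovered directly from the variance components. For the two logistic outcomes, the sketch below assumes the standard latent-variable approximation that fixes the level-1 variance at pi squared over three; the article does not state its formula explicitly, but this assumption reproduces the published shares.

```python
# Variance partition coefficients for WLT exact question reading, from the
# Table 2 variance components. Fixing the level-1 variance at pi^2 / 3 is
# the standard latent-variable approximation for multilevel logistic models
# (an assumption here, but it reproduces the published VPCs).
import math

components = {"interviewer": 0.697, "question": 1.354, "respondent": 0.250}
total = sum(components.values()) + math.pi ** 2 / 3  # add logistic level-1 variance

for level, variance in components.items():
    print(f"{level}: VPC = {variance / total:.2f}")
# interviewer: 0.12, question: 0.24, respondent: 0.04 -- matching Table 2
```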
Odds Ratios and 95 Percent Confidence Intervals Predicting Exact Question Reading and Any Disfluencies, Initial Question Reading, WLT and NOP Survey Exact question reading Any disfluencies WLT NOP WLT NOP Odds Ratio 95% CI Odds Ratio 95% CI Odds Ratio 95% CI Odds Ratio 95% CI Question length  Number of words 0.971** (0.950, 0.991) 0.916**** (0.875, 0.959) 1.011 (0.995, 1.027) 1.027** (1.012, 1.043)  Transition statement 0.444* (0.218, 0.904) 0.089*** (0.017, 0.465) 2.650*** (1.564, 4.491) 3.142**** (1.790, 5.516) Question complexity  Question reading level 0.949* (0.910, 0.989) 0.881* (0.787, 0.987) 1.020 (0.989, 1.053) 1.038+ (0.999, 1.078)  Unfamiliar Technical Term 0.745 (0.481, 1.153) 0.545* (0.322, 0.923) 1.081 (0.780, 1.500) 1.234* (1.030, 1.477)  Vague or Imprecise Relative Term 1.807* (1.067, 3.059) 1.093 (0.526, 2.275) 0.672* (0.452, 0.998) 0.910 (0.712, 1.164)  Vague or Ambiguous Noun-phrase 1.078 (0.737, 1.576) 0.816 (0.393, 1.694) 1.148 (0.865, 1.522) 0.727* (0.563, 0.937)  Complex Syntax 0.615 (0.239, 1.585) 4.805* (1.271, 18.171) 1.396 (0.656, 2.969) 0.675 (0.435, 1.047)  Working Memory Overload 2.100 (0.882, 5.000) 1.194 (0.545, 2.615) 0.680 (0.355, 1.301) 1.096 (0.845, 1.421) Question features requiring interviewer decisions  Parentheses 0.112**** (0.054, 0.234) 0.087**** (0.027, 0.285) 2.002* (1.173, 3.416) 1.860** (1.222, 2.831)  Interviewer instructions 0.870 (0.470, 1.610) 0.053**** (0.013, 0.222) 1.515 (0.950, 2.415) 2.464* (1.516, 4.005)  Emphasis 1.030 (0.541, 1.963) 7.959**** (3.596, 17.614) 1.154 (0.712, 1.871) 0.730* (0.538, 0.992)  First question in battery 0.765 (0.358, 1.634) 5.893** (1.698, 20.454) 0.995 (0.567, 1.747) 2.001** (1.230, 3.273)  Later question in battery 0.445* (0.231, 0.858) 52.336**** (15.519, 176.499) 1.227 (0.756, 1.990) 0.316*** (0.199, 0.501) Highly practiced questions  Demographic 0.494* (0.269, 0.906) 1.138 (0.365, 3.545) 1.293 (0.817, 2.047) 1.253 (0.856, 1.835)  Sequential question number 0.995 (0.980, 1.010) 1.088*** (1.073, 1.102) 0.989 (0.976, 1.002) 1.000 (0.993, 1.007)  Interviewer Within-study experience 1.004 (0.996, 1.012) 1.028*** (1.015, 1.041) 0.980**** (0.972, 0.988) 0.959*** (0.947, 0.971)  Interviewer 1+ Year experience = 1 0.597 (0.270, 1.322) 0.775 (0.494, 1.217) 2.806* (1.179, 6.681) 1.131 (0.563, 2.272) Question controls  Open-ended 0.296** (0.125, 0.704) 1.972 (0.484, 8.028) 1.279 (0.672, 2.434) 0.967 (0.602, 1.554)  Closed-nominal — — — —  Closed-ordinal 0.337* (0.136, 0.834) 5.680** (1.732, 18.325) 1.785 (0.909, 3.506) 1.490+ (0.985, 2.254)  Yes/No 0.767 (0.347, 1.697) 0.222* (0.049, 0.998) 1.174 (0.647, 2.129) 1.842* (1.102, 3.080) Respondent characteristics  Female = 1 0.939 (0.836, 1.054) 1.013 (0.888, 1.155) 0.865* (0.768, 0.974) 0.978 (0.862, 1.110)  Age 0.993*** (0.989, 0.997) 0.998 (0.994, 1.002) 1.005** (1.001, 1.009) 0.998 (0.994, 1.002)  HS degree or less = 1 0.860* (0.756, 0.979) 0.915 (0.791, 1.058) 0.989 (0.864, 1.132) 0.963 (0.837, 1.108)  Employed = 1 1.173 (0.995, 1.382) 1.065 (0.923, 1.228) 0.968 (0.825, 1.136) 1.042 (0.908, 1.195)  Cell phone = 1 n/a 1.020 (0.860, 1.209) n/a 0.956 (0.811, 1.126) Interviewer characteristics  Interviewer Female = 1 0.928 (0.462, 1.866) 1.029 (0.653, 1.623) 0.585 (0.274, 1.252) 0.487* (0.241, 0.986)  Interviewer Nonwhite = 1 0.895 (0.428, 1.875) n/a 1.532 (0.684, 3.428) n/a  Intercept 6.686** (1.684, 26.537) 0.033*** (0.004, 0.248) 0.079**** (0.022, 0.286) 0.196*** (0.076, 0.503)  Observations 20927 30079 20927 30079  Model Fit:  Variance Interviewer 0.628 0.352 0.749 
0.895  Variance Question 0.313 0.841 0.157 0.073  Variance Respondent 0.219 0.318 0.155 0.221  Likelihood ratio test for variance components 2881.96**** 3122.48**** 2226.40**** 2868.23****  Log-likelihood −11381.15 −15375.888 −8173.89 −11245.03  Wald chi-square 204.78**** 1773.10**** 136.52**** 1101.04****  AIC 22822.31 30811.78 16407.78 22550.06 Exact question reading Any disfluencies WLT NOP WLT NOP Odds Ratio 95% CI Odds Ratio 95% CI Odds Ratio 95% CI Odds Ratio 95% CI Question length  Number of words 0.971** (0.950, 0.991) 0.916**** (0.875, 0.959) 1.011 (0.995, 1.027) 1.027** (1.012, 1.043)  Transition statement 0.444* (0.218, 0.904) 0.089*** (0.017, 0.465) 2.650*** (1.564, 4.491) 3.142**** (1.790, 5.516) Question complexity  Question reading level 0.949* (0.910, 0.989) 0.881* (0.787, 0.987) 1.020 (0.989, 1.053) 1.038+ (0.999, 1.078)  Unfamiliar Technical Term 0.745 (0.481, 1.153) 0.545* (0.322, 0.923) 1.081 (0.780, 1.500) 1.234* (1.030, 1.477)  Vague or Imprecise Relative Term 1.807* (1.067, 3.059) 1.093 (0.526, 2.275) 0.672* (0.452, 0.998) 0.910 (0.712, 1.164)  Vague or Ambiguous Noun-phrase 1.078 (0.737, 1.576) 0.816 (0.393, 1.694) 1.148 (0.865, 1.522) 0.727* (0.563, 0.937)  Complex Syntax 0.615 (0.239, 1.585) 4.805* (1.271, 18.171) 1.396 (0.656, 2.969) 0.675 (0.435, 1.047)  Working Memory Overload 2.100 (0.882, 5.000) 1.194 (0.545, 2.615) 0.680 (0.355, 1.301) 1.096 (0.845, 1.421) Question features requiring interviewer decisions  Parentheses 0.112**** (0.054, 0.234) 0.087**** (0.027, 0.285) 2.002* (1.173, 3.416) 1.860** (1.222, 2.831)  Interviewer instructions 0.870 (0.470, 1.610) 0.053**** (0.013, 0.222) 1.515 (0.950, 2.415) 2.464* (1.516, 4.005)  Emphasis 1.030 (0.541, 1.963) 7.959**** (3.596, 17.614) 1.154 (0.712, 1.871) 0.730* (0.538, 0.992)  First question in battery 0.765 (0.358, 1.634) 5.893** (1.698, 20.454) 0.995 (0.567, 1.747) 2.001** (1.230, 3.273)  Later question in battery 0.445* (0.231, 0.858) 52.336**** (15.519, 176.499) 1.227 (0.756, 1.990) 0.316*** (0.199, 0.501) Highly practiced questions  Demographic 0.494* (0.269, 0.906) 1.138 (0.365, 3.545) 1.293 (0.817, 2.047) 1.253 (0.856, 1.835)  Sequential question number 0.995 (0.980, 1.010) 1.088*** (1.073, 1.102) 0.989 (0.976, 1.002) 1.000 (0.993, 1.007)  Interviewer Within-study experience 1.004 (0.996, 1.012) 1.028*** (1.015, 1.041) 0.980**** (0.972, 0.988) 0.959*** (0.947, 0.971)  Interviewer 1+ Year experience = 1 0.597 (0.270, 1.322) 0.775 (0.494, 1.217) 2.806* (1.179, 6.681) 1.131 (0.563, 2.272) Question controls  Open-ended 0.296** (0.125, 0.704) 1.972 (0.484, 8.028) 1.279 (0.672, 2.434) 0.967 (0.602, 1.554)  Closed-nominal — — — —  Closed-ordinal 0.337* (0.136, 0.834) 5.680** (1.732, 18.325) 1.785 (0.909, 3.506) 1.490+ (0.985, 2.254)  Yes/No 0.767 (0.347, 1.697) 0.222* (0.049, 0.998) 1.174 (0.647, 2.129) 1.842* (1.102, 3.080) Respondent characteristics  Female = 1 0.939 (0.836, 1.054) 1.013 (0.888, 1.155) 0.865* (0.768, 0.974) 0.978 (0.862, 1.110)  Age 0.993*** (0.989, 0.997) 0.998 (0.994, 1.002) 1.005** (1.001, 1.009) 0.998 (0.994, 1.002)  HS degree or less = 1 0.860* (0.756, 0.979) 0.915 (0.791, 1.058) 0.989 (0.864, 1.132) 0.963 (0.837, 1.108)  Employed = 1 1.173 (0.995, 1.382) 1.065 (0.923, 1.228) 0.968 (0.825, 1.136) 1.042 (0.908, 1.195)  Cell phone = 1 n/a 1.020 (0.860, 1.209) n/a 0.956 (0.811, 1.126) Interviewer characteristics  Interviewer Female = 1 0.928 (0.462, 1.866) 1.029 (0.653, 1.623) 0.585 (0.274, 1.252) 0.487* (0.241, 0.986)  Interviewer Nonwhite = 1 0.895 (0.428, 1.875) n/a 1.532 (0.684, 3.428) 
n/a  Intercept 6.686** (1.684, 26.537) 0.033*** (0.004, 0.248) 0.079**** (0.022, 0.286) 0.196*** (0.076, 0.503)  Observations 20927 30079 20927 30079  Model Fit:  Variance Interviewer 0.628 0.352 0.749 0.895  Variance Question 0.313 0.841 0.157 0.073  Variance Respondent 0.219 0.318 0.155 0.221  Likelihood ratio test for variance components 2881.96**** 3122.48**** 2226.40**** 2868.23****  Log-likelihood −11381.15 −15375.888 −8173.89 −11245.03  Wald chi-square 204.78**** 1773.10**** 136.52**** 1101.04****  AIC 22822.31 30811.78 16407.78 22550.06 Note.— * p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001. Open in new tab Table 3. Odds Ratios and 95 Percent Confidence Intervals Predicting Exact Question Reading and Any Disfluencies, Initial Question Reading, WLT and NOP Survey Exact question reading Any disfluencies WLT NOP WLT NOP Odds Ratio 95% CI Odds Ratio 95% CI Odds Ratio 95% CI Odds Ratio 95% CI Question length  Number of words 0.971** (0.950, 0.991) 0.916**** (0.875, 0.959) 1.011 (0.995, 1.027) 1.027** (1.012, 1.043)  Transition statement 0.444* (0.218, 0.904) 0.089*** (0.017, 0.465) 2.650*** (1.564, 4.491) 3.142**** (1.790, 5.516) Question complexity  Question reading level 0.949* (0.910, 0.989) 0.881* (0.787, 0.987) 1.020 (0.989, 1.053) 1.038+ (0.999, 1.078)  Unfamiliar Technical Term 0.745 (0.481, 1.153) 0.545* (0.322, 0.923) 1.081 (0.780, 1.500) 1.234* (1.030, 1.477)  Vague or Imprecise Relative Term 1.807* (1.067, 3.059) 1.093 (0.526, 2.275) 0.672* (0.452, 0.998) 0.910 (0.712, 1.164)  Vague or Ambiguous Noun-phrase 1.078 (0.737, 1.576) 0.816 (0.393, 1.694) 1.148 (0.865, 1.522) 0.727* (0.563, 0.937)  Complex Syntax 0.615 (0.239, 1.585) 4.805* (1.271, 18.171) 1.396 (0.656, 2.969) 0.675 (0.435, 1.047)  Working Memory Overload 2.100 (0.882, 5.000) 1.194 (0.545, 2.615) 0.680 (0.355, 1.301) 1.096 (0.845, 1.421) Question features requiring interviewer decisions  Parentheses 0.112**** (0.054, 0.234) 0.087**** (0.027, 0.285) 2.002* (1.173, 3.416) 1.860** (1.222, 2.831)  Interviewer instructions 0.870 (0.470, 1.610) 0.053**** (0.013, 0.222) 1.515 (0.950, 2.415) 2.464* (1.516, 4.005)  Emphasis 1.030 (0.541, 1.963) 7.959**** (3.596, 17.614) 1.154 (0.712, 1.871) 0.730* (0.538, 0.992)  First question in battery 0.765 (0.358, 1.634) 5.893** (1.698, 20.454) 0.995 (0.567, 1.747) 2.001** (1.230, 3.273)  Later question in battery 0.445* (0.231, 0.858) 52.336**** (15.519, 176.499) 1.227 (0.756, 1.990) 0.316*** (0.199, 0.501) Highly practiced questions  Demographic 0.494* (0.269, 0.906) 1.138 (0.365, 3.545) 1.293 (0.817, 2.047) 1.253 (0.856, 1.835)  Sequential question number 0.995 (0.980, 1.010) 1.088*** (1.073, 1.102) 0.989 (0.976, 1.002) 1.000 (0.993, 1.007)  Interviewer Within-study experience 1.004 (0.996, 1.012) 1.028*** (1.015, 1.041) 0.980**** (0.972, 0.988) 0.959*** (0.947, 0.971)  Interviewer 1+ Year experience = 1 0.597 (0.270, 1.322) 0.775 (0.494, 1.217) 2.806* (1.179, 6.681) 1.131 (0.563, 2.272) Question controls  Open-ended 0.296** (0.125, 0.704) 1.972 (0.484, 8.028) 1.279 (0.672, 2.434) 0.967 (0.602, 1.554)  Closed-nominal — — — —  Closed-ordinal 0.337* (0.136, 0.834) 5.680** (1.732, 18.325) 1.785 (0.909, 3.506) 1.490+ (0.985, 2.254)  Yes/No 0.767 (0.347, 1.697) 0.222* (0.049, 0.998) 1.174 (0.647, 2.129) 1.842* (1.102, 3.080) Respondent characteristics  Female = 1 0.939 (0.836, 1.054) 1.013 (0.888, 1.155) 0.865* (0.768, 0.974) 0.978 (0.862, 1.110)  Age 0.993*** (0.989, 0.997) 0.998 (0.994, 1.002) 1.005** (1.001, 1.009) 0.998 (0.994, 1.002)  HS degree or less = 1 0.860* (0.756, 0.979) 
0.915 (0.791, 1.058) 0.989 (0.864, 1.132) 0.963 (0.837, 1.108)  Employed = 1 1.173 (0.995, 1.382) 1.065 (0.923, 1.228) 0.968 (0.825, 1.136) 1.042 (0.908, 1.195)  Cell phone = 1 n/a 1.020 (0.860, 1.209) n/a 0.956 (0.811, 1.126) Interviewer characteristics  Interviewer Female = 1 0.928 (0.462, 1.866) 1.029 (0.653, 1.623) 0.585 (0.274, 1.252) 0.487* (0.241, 0.986)  Interviewer Nonwhite = 1 0.895 (0.428, 1.875) n/a 1.532 (0.684, 3.428) n/a  Intercept 6.686** (1.684, 26.537) 0.033*** (0.004, 0.248) 0.079**** (0.022, 0.286) 0.196*** (0.076, 0.503)  Observations 20927 30079 20927 30079  Model Fit:  Variance Interviewer 0.628 0.352 0.749 0.895  Variance Question 0.313 0.841 0.157 0.073  Variance Respondent 0.219 0.318 0.155 0.221  Likelihood ratio test for variance components 2881.96**** 3122.48**** 2226.40**** 2868.23****  Log-likelihood −11381.15 −15375.888 −8173.89 −11245.03  Wald chi-square 204.78**** 1773.10**** 136.52**** 1101.04****  AIC 22822.31 30811.78 16407.78 22550.06 Exact question reading Any disfluencies WLT NOP WLT NOP Odds Ratio 95% CI Odds Ratio 95% CI Odds Ratio 95% CI Odds Ratio 95% CI Question length  Number of words 0.971** (0.950, 0.991) 0.916**** (0.875, 0.959) 1.011 (0.995, 1.027) 1.027** (1.012, 1.043)  Transition statement 0.444* (0.218, 0.904) 0.089*** (0.017, 0.465) 2.650*** (1.564, 4.491) 3.142**** (1.790, 5.516) Question complexity  Question reading level 0.949* (0.910, 0.989) 0.881* (0.787, 0.987) 1.020 (0.989, 1.053) 1.038+ (0.999, 1.078)  Unfamiliar Technical Term 0.745 (0.481, 1.153) 0.545* (0.322, 0.923) 1.081 (0.780, 1.500) 1.234* (1.030, 1.477)  Vague or Imprecise Relative Term 1.807* (1.067, 3.059) 1.093 (0.526, 2.275) 0.672* (0.452, 0.998) 0.910 (0.712, 1.164)  Vague or Ambiguous Noun-phrase 1.078 (0.737, 1.576) 0.816 (0.393, 1.694) 1.148 (0.865, 1.522) 0.727* (0.563, 0.937)  Complex Syntax 0.615 (0.239, 1.585) 4.805* (1.271, 18.171) 1.396 (0.656, 2.969) 0.675 (0.435, 1.047)  Working Memory Overload 2.100 (0.882, 5.000) 1.194 (0.545, 2.615) 0.680 (0.355, 1.301) 1.096 (0.845, 1.421) Question features requiring interviewer decisions  Parentheses 0.112**** (0.054, 0.234) 0.087**** (0.027, 0.285) 2.002* (1.173, 3.416) 1.860** (1.222, 2.831)  Interviewer instructions 0.870 (0.470, 1.610) 0.053**** (0.013, 0.222) 1.515 (0.950, 2.415) 2.464* (1.516, 4.005)  Emphasis 1.030 (0.541, 1.963) 7.959**** (3.596, 17.614) 1.154 (0.712, 1.871) 0.730* (0.538, 0.992)  First question in battery 0.765 (0.358, 1.634) 5.893** (1.698, 20.454) 0.995 (0.567, 1.747) 2.001** (1.230, 3.273)  Later question in battery 0.445* (0.231, 0.858) 52.336**** (15.519, 176.499) 1.227 (0.756, 1.990) 0.316*** (0.199, 0.501) Highly practiced questions  Demographic 0.494* (0.269, 0.906) 1.138 (0.365, 3.545) 1.293 (0.817, 2.047) 1.253 (0.856, 1.835)  Sequential question number 0.995 (0.980, 1.010) 1.088*** (1.073, 1.102) 0.989 (0.976, 1.002) 1.000 (0.993, 1.007)  Interviewer Within-study experience 1.004 (0.996, 1.012) 1.028*** (1.015, 1.041) 0.980**** (0.972, 0.988) 0.959*** (0.947, 0.971)  Interviewer 1+ Year experience = 1 0.597 (0.270, 1.322) 0.775 (0.494, 1.217) 2.806* (1.179, 6.681) 1.131 (0.563, 2.272) Question controls  Open-ended 0.296** (0.125, 0.704) 1.972 (0.484, 8.028) 1.279 (0.672, 2.434) 0.967 (0.602, 1.554)  Closed-nominal — — — —  Closed-ordinal 0.337* (0.136, 0.834) 5.680** (1.732, 18.325) 1.785 (0.909, 3.506) 1.490+ (0.985, 2.254)  Yes/No 0.767 (0.347, 1.697) 0.222* (0.049, 0.998) 1.174 (0.647, 2.129) 1.842* (1.102, 3.080) Respondent characteristics  Female = 1 0.939 (0.836, 1.054) 
We start with the indicators of question length. In both surveys, longer questions are less likely to be read exactly as written (WLT: OR = 0.971, AME = −0.006; NOP: OR = 0.916, AME = −0.015). There is no significant association between number of words and disfluencies in WLT, although in NOP each additional word increases the probability of reading the question disfluently (OR = 1.027, AME = 0.003). Transition statements exacerbate the problem: questions with a transition statement are significantly less likely to be read exactly as worded (WLT: OR = 0.444, AME = −0.166; NOP: OR = 0.089, AME = −0.404) and more likely to contain disfluencies (WLT: OR = 2.650, AME = 0.122; NOP: OR = 3.142, AME = 0.125). Thus, as hypothesized, question length is negatively associated with reading the question exactly as written and positively associated with disfluent readings.

6.3 Question Complexity

Associations are mixed across the six indicators of question complexity. As hypothesized, in both surveys, questions with higher Flesch reading grade levels are significantly less likely to be read exactly as worded (WLT: OR = 0.949, AME = −0.011; NOP: OR = 0.881, AME = −0.021). However, there is no statistically significant association between Flesch reading grade levels and disfluencies. The measures of question complexity from QUAID are not consistently associated with either reading outcome across the two studies and, when statistically different from zero, are often in the direction opposite the hypothesized one. Thus, the question complexity findings are mixed: a high reading level generally poses problems for reading questions exactly as worded, but the other complexity features do not necessarily have the same effect.

6.4 Question Features Requiring Interviewer Decisions

The next set of question characteristics requires interviewers to make administration decisions.
As hypothesized, questions with parenthetical insertions are less likely to be read exactly as worded (WLT: OR = 0.112, AME = −0.447; NOP: OR = 0.087, AME = −0.408) and more likely to be read with disfluencies (WLT: OR = 2.002, AME = 0.087; NOP: OR = 1.860, AME = 0.068) than those without, even though interviewers were trained to read the information in parentheses. We see less consistency in direction and significance across the two surveys for the other question characteristics requiring interviewer decisions. In NOP but not WLT, questions with interviewer instructions are less likely to be read exactly as worded (OR = 0.053, AME = −0.489) and more likely to be read with disfluencies (OR = 2.464, AME = 0.098) than questions without instructions. Emphasis likewise matters only in NOP, where it increases the probability of a question being read exactly as worded (OR = 7.959, AME = 0.346) and decreases the probability of it being read with disfluencies (OR = 0.730, AME = −0.034); there is no significant association between emphasis and either outcome in WLT. In WLT, there is no difference in exact readings or disfluencies between items that are first in a battery and items not in a battery. In NOP, however, first items in a battery are both more likely to be read exactly as written (OR = 5.893, AME = 0.240) and more likely to be read disfluently (OR = 2.001, AME = 0.124) than questions not in a battery. In WLT, items later in a battery are less likely to be read exactly as written (OR = 0.445, AME = −0.164) than those not in a battery but are no more or less likely to be read with disfluencies. In contrast, in NOP, later questions in a battery are more likely to be read exactly as written (OR = 52.336, AME = 0.556) and less likely to be read disfluently (OR = 0.316, AME = −0.123).

The lack of replication for battery items likely reflects substantial differences in how battery items appear in the two surveys. WLT does not randomly rotate battery items, so item content and order are fully confounded; NOP does randomly rotate them. Additionally, all of the NOP battery items used dichotomous response options ("yes/no," "do/do not," "concerned/not concerned"), whereas WLT used a variety of response formats across battery items. Finally, WLT contained sensitive items and unfamiliar terms in batteries, whereas NOP did not. In sum, interviewer decisions related to the presence of words in parentheses disrupt reading behaviors in both surveys; the other question characteristics requiring interviewer decisions are less consistently associated with reading behaviors across these two surveys.

6.5 Highly Practiced Questions

Interviewer practice with questions shows mixed associations with exact question reading but more consistent associations with disfluencies. Within-survey experience (i.e., within-study practice) consistently predicts fewer disfluencies for interviews conducted later in the field period in both surveys (WLT: OR = 0.980, AME = −0.003; NOP: OR = 0.959, AME = −0.005). In NOP but not WLT, questions read in later interviews are also more likely to be read exactly as written (OR = 1.028, AME = 0.005). Thus, practicing interviews over the course of the field period improves reading fluency and, in the NOP survey, exact reading. Demographic questions are less likely to be read exactly as worded in WLT (OR = 0.494, AME = −0.138), but there is no association in NOP and no association with disfluencies in either survey.
Questions that appear later in the survey are more likely to be read exactly as written in NOP (OR = 1.088, AME = 0.014, p < 0.0001) but not in WLT, and there is no association with disfluencies in either survey.

6.6 Explained Variance in Question Reading Behaviors

One measure of model fit is the proportion of variance explained at each level of analysis: here, the variance that can be associated with questions, respondents, and interviewers. The variables included in our models explain 77 percent of the question-level variance in exact question reading for WLT but only 9 percent for NOP (see the online supplementary material). The covariates explain 64 percent of the question-level variance in disfluencies for WLT and 83 percent for NOP. Thus, in both surveys we are able to predict variation in the types of questions that will be read disfluently; we can also anticipate the types of questions that will be read inexactly in WLT, but with less success in NOP. For exact question reading in NOP, the interviewer-related and respondent-related variance increases when covariates are added to the models; in WLT, our covariates explain only 10 percent of the interviewer-related variance and 12 percent of the respondent-related variance in exact question reading. For disfluencies, we explain 32 percent of the interviewer-related variance in WLT and 8 percent in NOP, and 18 percent of the respondent-related variance in WLT and 11 percent in NOP.

6.7 Question Reading Behaviors and Efficiency

To address whether questions read exactly as worded, or without disfluencies, are administered more efficiently, we first evaluate the number of seconds interviewers spent asking each question (Table 4; see the online supplementary material). On average, a question read exactly as worded took 5.54 seconds to ask in WLT and 5.07 seconds in NOP. When questions were misread, administration time increased by about two seconds in WLT and nearly three seconds in NOP. Similarly, questions asked without disfluencies took about 6.30 seconds in WLT and 5.50 seconds in NOP; disfluencies increased administration time by 1.44 seconds in WLT and 3.45 seconds in NOP. Although these increases are only a few seconds per question, they accumulate across items and respondents.

Table 4. Mean Length of Time Asking Question and Mean Reading Pace by Question Reading Behaviors

Columns: Total time (seconds), WLT; NOP | Reading speed (words per second), WLT; NOP.

Questions read exactly as worded: 5.539; 5.071 | 2.712; 2.013
Questions not read exactly as worded: 7.592; 7.931 | 2.805; 2.587
z-test: −14.82****; −38.93**** | −7.54****; −43.37****
n: 20,926; 30,078 | 20,926; 30,078
Questions read without disfluencies: 6.300; 5.502 | 2.706; 2.132
Questions read with disfluencies: 7.738; 8.947 | 2.989; 2.637
z-test: 11.05****; 29.41**** | 0.75; −0.30
n: 20,926; 30,078 | 20,926; 30,078

Note.— Means are unadjusted sample means. Z-tests are from a cross-classified multilevel linear model predicting pace with both exact question reading and disfluencies, controlling for question characteristics, respondent characteristics, and interviewer characteristics. *p < 0.05; **p < 0.01; ***p < 0.001; ****p < 0.0001.
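The z-tests in the note to Table 4 come from models in which interviewers and respondents are crossed rather than nested. The sketch below, assuming synthetic data, hypothetical names, and a much simpler specification than ours, illustrates one way such crossed random effects can be fit with variance components in statsmodels:

```python
# Illustrative sketch only: synthetic data, hypothetical names, and a far
# simpler specification than the article's. Fits a linear model of reading
# pace with crossed (non-nested) interviewer and respondent random effects.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n_iwer, n_resp, n_obs = 15, 150, 1500
df = pd.DataFrame({
    "iwer": rng.integers(0, n_iwer, n_obs),  # interviewer id
    "resp": rng.integers(0, n_resp, n_obs),  # respondent id
    "inexact": rng.integers(0, 2, n_obs),    # 1 = not read exactly as worded
    "disfluent": rng.integers(0, 2, n_obs),  # 1 = read with a disfluency
})
iwer_fx = rng.normal(0, 0.3, n_iwer)
resp_fx = rng.normal(0, 0.2, n_resp)
df["wps"] = (2.1 + 0.5 * df["inexact"] + 0.2 * df["disfluent"]
             + iwer_fx[df["iwer"]] + resp_fx[df["resp"]]
             + rng.normal(0, 0.4, n_obs))    # pace in words per second

# Crossed random effects: one all-encompassing group, with interviewer and
# respondent entering as variance components rather than nested levels.
model = smf.mixedlm(
    "wps ~ inexact + disfluent",
    data=df,
    groups=np.ones(n_obs),
    vc_formula={"iwer": "0 + C(iwer)", "resp": "0 + C(resp)"},
)
print(model.fit().summary())  # fixed effects give the adjusted pace gaps
```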
Although questions read inexactly take longer than questions read exactly, they are read at a faster pace (more words per second; inexact: WLT 2.81 wps, NOP 2.59 wps; exact: WLT 2.71 wps, NOP 2.01 wps). In both studies, these differences hold in multivariate models. Disfluencies follow a similar pattern in that questions read without disfluencies are read at a slower speed than questions read with disfluencies, but this difference is not statistically significant in either survey in multivariate models. Although the effects of this pace difference on the answers that are provided cannot be directly evaluated from this analysis, it is clear that, in addition to the wording deviations of inexact readings, respondents also receive different stimuli in the form of how quickly questions are read.

7. CONCLUSION AND DISCUSSION

We examined the role of question characteristics in question reading behaviors and administration time in two CATI surveys. We found that multiple question characteristics do matter for question reading behaviors. Furthermore, misreadings and disfluencies slow efficient progress through the questionnaire: questions that are misread or read disfluently take longer to administer, yet are read at a faster pace, than questions read as written and without disfluencies. Thus, misreading questions has meaningful cost implications beyond potential data quality disruptions.

Although the number of words in a question contributes to misreading and to reading with disfluencies, other question characteristics, such as transition statements, measures of complexity (e.g., reading level), and features requiring interviewer decisions (e.g., presence of parentheses), also affect whether a question is misread and/or read with disfluencies. Thus, it is not simply question length that causes reading problems; other features of the question also matter.
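The question features implicated here can also be checked mechanically when drafting questions. The sketch below uses hypothetical heuristics and a naive syllable counter of our own devising, not the QUAID or Microsoft Word measures used in the analysis; the Flesch-Kincaid constants are the standard published formula:

```python
# Illustrative sketch only: hypothetical heuristics, not the authors' coding
# scheme. Screens a draft question for features found problematic here:
# length, parentheticals, a leading transition statement, and reading level.
import re

def naive_syllables(word: str) -> int:
    # Crude vowel-group count; real readability tools use better rules.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    # Standard Flesch-Kincaid grade-level formula.
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(naive_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59

def screen_question(text: str) -> dict:
    words = re.findall(r"[A-Za-z']+", text)
    return {
        "n_words": len(words),
        "has_parenthetical": "(" in text,
        # Very rough cue for a transition statement opening the question
        "has_transition": bool(re.match(r"\s*(Now|Next|The next questions)\b", text)),
        "fk_grade": round(flesch_kincaid_grade(text), 1),
    }

print(screen_question(
    "Now I have some questions about work. During the last week, did you "
    "(or your spouse) work for pay?"
))
```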
In short, the more work an interviewer has to do, the more likely they are to read a question incorrectly or disfluently.

The various measures of question complexity obtained from QUAID inconsistently predicted these question reading behaviors. Complexity problems such as vague words or working memory overload did not lead to less accurate question reading, and in NOP, questions with complex syntax were more likely, not less likely, to be read exactly as worded, even though this measure specifically flags questions that are difficult to read.

Although the Flesch reading grade level is significantly associated with question misreading, the measure presents challenges. Readability formulas were developed for long passages of text, not the short passages represented in survey questions (Lenzner 2014; Zhou, Jeong, and Green 2017). We used Microsoft Word to calculate the Flesch reading grade level of our questions, as recommended by Zhou et al. (2017). Yet the Flesch reading grade level is only one measure of readability; other measures have been developed, with only limited evaluation of their efficacy for survey questions (Lenzner 2014). The online supplementary material contains the correlations between different readability measures for the WLT survey. Future work should evaluate alternative measures of question complexity, perhaps including human coders' ratings of perceived complexity across a variety of domains.

As interviewers continue to practice survey questions over the course of the field period, they read them more fluently. This reduction in disfluencies, and the increase in exact question reading in NOP, could be one reason that interviews tend to shorten in total length over the course of the field period (e.g., Olson and Peytchev 2007). Future work will examine other interview behaviors related to the length of the interview.

In this article, we pursued the goal of replication, examining two surveys with different interviewer training and monitoring procedures and with notable differences in questionnaire design, content, and format. Survey organizations vary in their clients, surveys, and interviewer employee pools: interviewers at an organization that mainly conducts attitudinal surveys will have a different set of experiences than interviewers at an organization that mainly conducts behavioral surveys, and organizations differ in their training and monitoring rules. Thus, we expect variation in how survey questions are administered across survey organizations, although it is difficult to anticipate exactly how this will occur. Despite these differences, there was a good deal of replication in which question features predicted reading behaviors across the two surveys. This is reassuring and provides confidence that these findings are not due to a particular interviewer corps. Future research should field identical questionnaires at multiple organizations to more clearly disentangle the effects of question characteristics from house effects, and should examine additional features that may affect question reading behaviors but could not be included here because they did not appear in both surveys or were too rare (e.g., question sensitivity).

One challenge to evaluating question characteristics is that they appear in packages and are thus collinear. For instance, in NOP, all battery items had dichotomous nominal response options.
Because of this collinearity, we were limited in the set of question features we could directly measure in our models across both surveys. These packages of question features may also contribute to the lack of replication for some question characteristics across the two studies. Future research should include surveys with more variation in how packages of question characteristics occur, to the extent possible.

The implications of these findings for questionnaire design are clear. Questionnaire designers should write simpler questions that require fewer interviewer decisions. In particular, parentheticals should be avoided. Questions that have high reading levels, are long, or contain transition statements are also more likely to pose reading problems and should be avoided where possible. If transition statements are considered essential for cognitive processing, the researcher should anticipate that interviewers will misread these statements or cue respondents to a change in topic with a disfluent reading.

The implications for interviewer training are also clear. To maintain efficiency in survey administration, it is important to train interviewers to read questions exactly as worded. Trainers should anticipate that interviewers will need more practice on certain types of questions: long questions and questions with transition statements, parentheticals, instructions, or higher reading levels should receive special attention during interviewer training. With additional training on these questions, it may be possible to prevent some of the misreadings and disfluencies observed in the two surveys examined here, and to achieve a more efficient, shorter interview.

In sum, question misreadings and disfluencies are strongly associated with decisions made by survey researchers when writing questions. Interviewers explain less of the variance in question reading than the questions themselves, suggesting a smaller role for interviewer motivation to adhere to the tenets of standardized interviewing than for questionnaire design decisions. Writing better questions should improve question reading and, we hope, data quality.

Supplementary Material

Supplementary materials are available online at academic.oup.com/jssam.

Kristen Olson and Jolene D. Smyth are with the Department of Sociology, University of Nebraska-Lincoln, 711 Oldfather Hall, Lincoln, NE 68588-0324, USA. Antje Kirchner is with the Program for Research in Survey Methodology, Survey Research Division, RTI International, 3040 Cornwallis Road, Research Triangle Park, NC 27709, USA. This article was presented at the 2017 American Association for Public Opinion Research annual conference, New Orleans, Louisiana, May 2017. This material is based upon work supported by the National Science Foundation [grant number SES-1132015 to KO]. The authors thank the editor and anonymous reviewers for helpful feedback. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

REFERENCES

Arnold J. E., Fagnano M., Tanenhaus M. K. (2003), "Disfluencies Signal Theee, Um, New Information," Journal of Psycholinguistic Research, 32, 25–36.
Belli R. F., Weiss P. S., Lepkowski J. M. (1999), "Dynamics of Survey Interviewing and the Quality of Survey Reports: Age Comparisons," in Cognition, Aging, and Self-Reports, eds. Schwarz N., Park D. C., pp. 303–325, Hove, England: Psychology Press/Erlbaum (UK) Taylor & Francis.
Beretvas S. N. (2011), "Cross-Classified and Multiple-Membership Models," in Handbook of Advanced Multilevel Analysis, eds. Hox J. J., Roberts J. K., pp. 313–334, New York: Routledge.
Bilgen I., Belli R. F. (2010), "Comparison of Verbal Behaviors between Calendar and Standardized Conventional Questionnaires," Journal of Official Statistics, 26, 481–505.
Billiet J., Loosveldt G. (1988), "Improvement of the Quality of Responses to Factual Survey Questions by Interviewer Training," Public Opinion Quarterly, 52, 190–211.
Bortfeld H., Leon S. D., Bloom J. E., Schober M. F., Brennan S. E. (2001), "Disfluency Rates in Conversation: Effects of Age, Relationship, Topic, Role, and Gender," Language and Speech, 44, 123–147.
Bradburn N., Rips L. J., Shevell S. K. (1987), "Answering Autobiographical Questions: The Impact of Memory and Inference on Surveys," Science, 236, 157–161.
Brennan S. E., Schober M. F. (2001), "How Listeners Compensate for Disfluencies in Spontaneous Speech," Journal of Memory and Language, 44, 274–296.
Brenner M. (1982), "Response-Effects of 'Role-Restricted' Characteristics of the Interviewer," in Response Behaviour in the Survey-Interview, eds. Dijkstra W., van der Zouwen J., pp. 131–165, London: Academic Press.
Brick J. M., Tubbs E., Collins M. A., Nolin M. J., Cantor D., Levin K., Carnes Y. (1997), "Telephone Coverage Bias and Recorded Interviews in the 1993 National Household Education Survey (NHES:93)," Washington, DC: U.S. Department of Education, National Center for Education Statistics.
Brick J. M., Calahan M., Gray L., Severynse J., Stowe P. (1994), A Study of Selected Nonsampling Errors in the 1991 Survey of Recent College Graduates, NCES 95-640, Washington, DC: U.S. Department of Education, Office of Educational Research and Improvement.
Calahan M., Mitchell S., Gray L., Chen S., Tsapogas J. (1997), "Recorded Interview Behavior Coding Study National Study of Recent College Graduates," JSM Proceedings, Survey Research Methods Section, pp. 846–851, Alexandria, VA: American Statistical Association.
Cannell C. F., Fowler F. J., Marquis K. H. (1968), The Influence of Interviewer and Respondent Psychological and Behavioral Variables on the Reporting in Household Interviews, Rockville, MD: U.S. Department of Health, Education, and Welfare, Public Health Service, National Center for Health Statistics.
Cannell C. F., Marquis K. H., Laurent A. (1977), A Summary of Studies of Interviewing Methodology, Rockville, MD: U.S. Department of Health, Education, and Welfare, Public Health Service, Health Resources Administration, National Center for Health Statistics.
Cannell C. F., Miller P. V., Oksenberg L. (1981), "Research on Interviewing Techniques," Sociological Methodology, 12, 389–437.
Charoenruk N., Olson K. (2018), "Do Listeners Perceive Interviewers' Attributes from Their Voices and Do Perceptions Differ by Question Type?," Field Methods, 30, 312–328.
Childs J. H., Landreth A. (2006), "Analyzing Interviewer/Respondent Interactions While Using a Mobile Computer-Assisted Personal Interview Device," Field Methods, 18, 335–351.
Conrad F. G., Schober M. F., Dijkstra W. (2008), "Cues of Communication Difficulty in Telephone Interviews," in Advances in Telephone Survey Methodology, eds. Lepkowski J. M., Tucker C., Brick J. M., de Leeuw E. D., Japec L., Lavrakas P. J., Link M. W., Sangster R. L., pp. 212–230, Hoboken, NJ: John Wiley & Sons.
de Leeuw E. (1992), Data Quality in Mail, Face to Face, and Telephone Surveys, Amsterdam: TT-Publikates.
de Leeuw E. (2018), "Mixed-Mode: Past, Present, and Future," Survey Research Methods, 12, 75–89.
Dijkstra W. (2016), "Sequence Viewer 6.1," available at http://www.sequenceviewer.nl/index.html, last accessed May 5, 2017.
Dillman D. A., Smyth J. D., Christian L. M. (2014), Internet, Phone, Mail, and Mixed Mode Surveys: The Tailored Design Method, Hoboken, NJ: John Wiley & Sons.
Dykema J., Lepkowski J. M., Blixt S. (1997), "The Effect of Interviewer and Respondent Behavior on Data Quality: Analysis of Interaction Coding in a Validation Study," in Survey Measurement and Process Quality, eds. Lyberg L., Biemer P., Collins M., de Leeuw E., Dippo C., Schwarz N., Trewin D., pp. 287–310, New York: John Wiley & Sons.
Dykema J., Schaeffer N. C., Garbarski D., Nordheim E. V., Banghart M., Cyffka K. (2016), "The Impact of Parenthetical Phrases on Interviewers' and Respondents' Processing of Survey Questions," Survey Practice, 9(2).
Dykema J., Schaeffer N. C., Garbarski D., Hout M. (forthcoming), "The Role of Question Characteristics in Designing and Evaluating Survey Questions," in Advances in Questionnaire Design, Development, Evaluation, and Testing, eds. Beatty P., Collins D., Kaye L., Padilla J.-L., Willis G., Wilmot A., Hoboken, NJ: Wiley.
Ehlen P., Schober M. F., Conrad F. G. (2007), "Modeling Speech Disfluency to Predict Conceptual Misalignment in Speech Survey Interfaces," Discourse Processes, 44, 245–265.
Esposito J. L. (2002), "Iterative, Multiple-Method Questionnaire Evaluation Research: A Case Study," International Conference on Questionnaire Development, Evaluation and Testing (QDET) Methods, Charleston, SC, 14–17 November 2002.
Fowler F. J. (1991), "Reducing Interviewer-Related Error through Interviewer Training, Supervision, and Other Means," in Measurement Errors in Surveys, eds. Biemer P., Groves R. M., Lyberg L., Mathiowetz N. A., Sudman S., pp. 259–278, New York: John Wiley & Sons.
Fowler F. J., Mangione T. W. (1990), Standardized Survey Interviewing: Minimizing Interviewer-Related Error, Newbury Park, CA: Sage Publications.
Garbarski D., Schaeffer N. C., Dykema J. (2016), "Interviewing Practices, Conversational Practices, and Rapport: Responsiveness and Engagement in the Standardized Survey Interview," Sociological Methodology, 46, 1–38, doi: 10.1177/0081175016637890.
Graesser A. C., Cai Z., Louwerse M. M., Daniel F. (2006), "Question Understanding Aid (QUAID): A Web Facility That Helps Survey Methodologists Improve the Comprehensibility of Questions," Public Opinion Quarterly, 70, 3–22.
Groves R. M., Magilavy L. J. (1986), "Measuring and Explaining Interviewer Effects in Centralized Telephone Facilities," Public Opinion Quarterly, 50, 251–266.
Hess J., Rothgeb J., Zukerberg A. (1997), "Survey of Program Dynamics: Pretest Evaluation Report," Center for Survey Methods Research, U.S. Bureau of the Census.
Holbrook A. L., Johnson T. P., Cho Y. I., Shavitt S., Chavez N., Weiner S. (2015), "The Effect of Question Characteristics on Respondent and Interviewer Behaviors," Annual Meeting of the American Association for Public Opinion Research, Hollywood, FL.
Holbrook A. L., Johnson T. P., Cho Y. I., Shavitt S., Chavez N., Weiner S. (2016), "Do Interviewer Errors Help Explain the Impact of Question Characteristics on Respondent Difficulties?," Survey Practice, 9(2).
Japec L. (2008), "Interviewer Error and Interviewer Burden," in Advances in Telephone Survey Methodology, eds. Lepkowski J. M., Tucker C., Brick J. M., de Leeuw E. D., Japec L., Lavrakas P. J., Link M. W., Sangster R. L., pp. 185–211, Hoboken, NJ: John Wiley & Sons.
Johnson T. P., Shariff-Marco S., Willis G., Cho Y. I., Breen N., Gee G. C., Krieger N., Grant D., Alegria M., Mays V. M., Williams D. R., Landrine H., Liu B., Reeve B. B., Takeuchi D., Ponce N. A. (2015), "Sources of Interactional Problems in a Survey of Racial/Ethnic Discrimination," International Journal of Public Opinion Research, 27, 244–263.
Ketrow S. M. (1990), "Attributes of a Telemarketer's Voice and Persuasiveness," Journal of Direct Marketing, 4, 7–21.
Kirchner A., Olson K. (2017), "Examining Changes of Interview Length over the Course of the Field Period," Journal of Survey Statistics and Methodology, 5, 84–108.
Krosnick J. A., Presser S. (2010), "Question and Questionnaire Design," in Handbook of Survey Research, Second Edition, eds. Marsden P. V., Wright J. D., pp. 263–313, Bingley, UK: Emerald Group Publishing.
Lenzner T. (2012), "Effects of Survey Question Comprehensibility on Response Quality," Field Methods, 24, 409–428.
Lenzner T. (2014), "Are Readability Formulas Valid Tools for Assessing Survey Question Difficulty?," Sociological Methods & Research, 43, 677–698.
Loosveldt G., Beullens K. (2013), "The Impact of Respondents and Interviewers on Interview Speed in Face-to-Face Interviews," Social Science Research, 42, 1422–1430.
Mangione T. W., Fowler F. J. Jr., Louis T. A. (1992), "Question Characteristics and Interviewer Effects," Journal of Official Statistics, 8, 293–307.
Mathiowetz N. A., Cannell C. F. (1980), "Coding Interviewer Behavior as a Method of Evaluating Performance," JSM Proceedings, Survey Research Methods Section, pp. 525–528, Alexandria, VA: American Statistical Association.
Mingay D. J., Greenwell M. T. (1989), "Memory Bias and Response-Order Effects," Journal of Official Statistics, 5, 253–263.
Oksenberg L., Cannell C. F. (1988), "Effects of Interviewer Vocal Characteristics on Nonresponse," in Telephone Survey Methodology, eds. Groves R. M., Biemer P., Lyberg L., Massey J. T., Nicholls W. L. II, Waksberg J., pp. 257–269, New York: John Wiley & Sons.
Olson K., Peytchev A. (2007), "Effect of Interviewer Experience on Interview Pace and Interviewer Attitudes," Public Opinion Quarterly, 71, 273–286.
Olson K., Bilgen I. (2011), "The Role of Interviewer Experience on Acquiescence," Public Opinion Quarterly, 75, 99–114.
Olson K., Smyth J. D. (2015), "The Effect of CATI Questions, Respondents, and Interviewers on Response Time," Journal of Survey Statistics and Methodology, 3, 361–396.
Olson K., Smyth J. D. (2017), "'During the LAST YEAR, Did You…': The Effect of Emphasis in CATI Survey Questions on Data Quality," Groningen Symposium on Language and Interaction, University of Groningen, The Netherlands, January 2017.
Olson K., Smyth J. D., Cochran B. (2018), "Item Location, the Interviewer–Respondent Interaction, and Responses to Battery Questions in Telephone Surveys," Sociological Methodology, 48, 225–268.
Ongena Y. P., Dijkstra W. (2006), "Methods of Behavior Coding of Survey Interviews," Journal of Official Statistics, 22, 419–451.
Ongena Y. P., Dijkstra W. (2007), "A Model of Cognitive Processes and Conversational Principles in Survey Interview Interaction," Applied Cognitive Psychology, 21, 145–163.
Presser S., Zhao S. (1992), "Attributes of Questions and Interviewers as Correlates of Interviewing Performance," Public Opinion Quarterly, 56, 236–240.
Rabe-Hesketh S., Skrondal A. (2012), Multilevel and Longitudinal Modeling Using Stata, Third Edition, Volume II: Categorical Responses, Counts, and Survival (3rd ed.), College Station, TX: Stata Press.
Raudenbush S. W., Bryk A. S. (2002), Hierarchical Linear Models: Applications and Data Analysis Methods (2nd ed.), Newbury Park, CA: Sage.
Sander J. E., Conrad F. G., Mullin P. A., Herrmann D. J. (1992), "Cognitive Modeling of the Survey Interview," JSM Proceedings, Survey Research Methods Section, pp. 818–823, Alexandria, VA: American Statistical Association.
Schaeffer N. C., Dykema J. (2011), "Response 1 to Fowler's Chapter: Coding the Behavior of Interviewers and Respondents to Evaluate Survey Questions," in Question Evaluation Methods: Contributing to the Science of Data Quality, eds. Madans J., Miller K., Maitland A., Willis G., pp. 23–39, Hoboken, NJ: John Wiley & Sons.
Schaeffer N. C., Dykema J., Maynard D. W. (2010), "Interviewers and Interviewing," in Handbook of Survey Research, eds. Marsden P. V., Wright J. D., pp. 437–470, Bingley, UK: Emerald Group Publishing.
Schwarz N., Strack F., Hippler H. J., Bishop G. (1991), "The Impact of Administration Mode on Response Effects in Survey Measurement," Applied Cognitive Psychology, 5, 193.
Shriberg E. (1996), "Disfluencies in Switchboard," Proceedings, International Conference on Spoken Language Processing (ICSLP '96), Philadelphia, PA.
Smith S. M., Shaffer D. R. (1991), "Celerity and Cajolery: Rapid Speech May Promote or Inhibit Persuasion through Its Impact on Message Elaboration," Personality and Social Psychology Bulletin, 17, 663–669.
Smith S. M., Shaffer D. R. (1995), "Speed of Speech and Persuasion: Evidence for Multiple Effects," Personality and Social Psychology Bulletin, 21, 1051–1060.
Sykes W., Collins M. (1992), "Anatomy of the Survey Interview," Journal of Official Statistics, 8, 277–291.
Timbrook J., Olson K., Smyth J. D. (2018), "Why Do Mobile Interviews Take Longer? A Behavior Coding Perspective," Public Opinion Quarterly, 82, 553–582.
van Breukelen G., Moerbeek M. (2013), "Design Considerations in Multilevel Studies," in The SAGE Handbook of Multilevel Modeling, eds. Scott M. A., Simonoff J. S., Marx B. D., London: SAGE Publications.
Vandenplas C., Loosveldt G., Beullens K., Denies K. (2017), "Are Interviewer Effects on Interview Speed Related to Interviewer Effects on Straight-Lining Tendency in the European Social Survey? An Interviewer-Related Analysis," Journal of Survey Statistics and Methodology, 6, 516–538, doi: 10.1093/jssam/smx034.
Vassallo R., Durrant G., Smith P. (2017), "Separating Interviewer and Area Effects by Using a Cross-Classified Multilevel Logistic Model: Simulation Findings and Implications for Survey Designs," Journal of the Royal Statistical Society: Series A (Statistics in Society), 180, 531–550.
Viterna J., Maynard D. W. (2002), "How Uniform Is Standardization? Variation within and across Survey Research Centers regarding Protocols for Interviewing," in Standardization and Tacit Knowledge: Interaction and Practice in the Survey Interview, eds. Maynard D. W., pp. 365–397, New York: John Wiley & Sons.
Williams R. (2012), "Using the Margins Command to Estimate and Interpret Adjusted Predictions and Marginal Effects," Stata Journal, 12, 308–331.
Zhou S., Jeong H., Green P. A. (2017), "How Consistent Are the Best-Known Readability Equations in Estimating the Readability of Design Standards?," IEEE Transactions on Professional Communication, 60, 97–111.

Appendix Table 1. Correlation Matrix of Question Characteristics, Work and Leisure Today Survey
Entries are pairwise correlations among question characteristics; the lower triangle is shown, with columns in the same order as the rows.

Length
1. Number of words: 1.00
2. Transition statement: 0.48, 1.00
Complexity
3. Question reading level: 0.13, −0.03, 1.00
4. Unfamiliar technical term: −0.09, 0.31, 0.24, 1.00
5. Vague or imprecise relative term: 0.18, 0.06, 0.08, 0.01, 1.00
6. Vague or ambiguous noun-phrase: 0.38, 0.05, 0.25, −0.02, −0.09, 1.00
7. Complex syntax: 0.17, −0.09, 0.14, 0.10, 0.12, −0.02, 1.00
8. Working memory overload: −0.03, −0.09, 0.08, 0.26, 0.12, −0.02, 0.29, 1.00
Decisions
9. Parentheses: 0.17, −0.12, 0.44, 0.09, 0.16, 0.28, −0.08, 0.20, 1.00
10. Interviewer instructions: 0.01, −0.07, −0.06, −0.17, −0.28, 0.05, −0.19, −0.02, −0.11, 1.00
11. Emphasis: 0.23, 0.30, 0.04, 0.03, 0.21, 0.00, −0.10, −0.10, −0.13, 0.33, 1.00
12. Not in battery: 0.26, 0.10, −0.09, 0.09, −0.05, 0.17, 0.04, 0.04, −0.13, 0.17, 0.13, 1.00
13. First question in battery: 0.37, 0.31, 0.06, 0.02, 0.14, 0.08, 0.24, −0.07, −0.09, 0.08, 0.28, −0.34, 1.00
14. Later question in battery: −0.47, −0.27, 0.06, −0.11, −0.03, −0.22, −0.17, 0.00, 0.18, −0.22, −0.29, −0.85, −0.20, 1.00
Practice
15. Demographic: −0.06, −0.10, −0.10, 0.04, −0.02, −0.10, 0.23, 0.23, −0.19, 0.07, −0.25, 0.49, −0.17, −0.42, 1.00
16. Sequential question number: −0.14, 0.06, 0.10, 0.24, −0.34, −0.08, 0.16, 0.19, −0.27, 0.47, 0.09, 0.29, −0.11, −0.24, 0.48, 1.00
Controls
17. Open-ended text: −0.04, −0.12, −0.02, −0.04, −0.16, 0.15, −0.08, 0.20, −0.10, 0.42, −0.13, 0.26, −0.09, −0.23, 0.10, 0.11, 1.00
18. Open-ended numeric: 0.00, 0.21, −0.12, −0.07, −0.15, −0.11, −0.16, −0.16, −0.22, 0.55, 0.62, 0.08, 0.11, −0.14, −0.13, 0.27, −0.22, 1.00
19. Closed-nominal: −0.03, 0.21, −0.07, 0.38, −0.11, −0.03, 0.17, −0.09, −0.11, −0.27, −0.15, 0.29, −0.10, −0.25, 0.46, 0.26, −0.11, −0.24, 1.00
20. Closed-ordinal: −0.10, −0.16, 0.19, −0.11, 0.36, −0.05, 0.00, 0.00, 0.32, −0.46, −0.29, −0.69, 0.10, 0.67, −0.33, −0.42, −0.23, −0.48, −0.25, 1.00
21. Yes/No: 0.18, −0.16, −0.01, −0.07, −0.05, 0.11, 0.13, 0.13, 0.05, −0.21, −0.17, 0.35, −0.12, −0.29, 0.11, −0.11, −0.13, −0.28, −0.15, −0.29, 1.00

Note.— Bolded correlations are statistically significant at a p < 0.05 level.
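As a minimal sketch, with synthetic data and hypothetical column names, a question-level correlation matrix of this kind can be computed directly from a question-characteristics data frame:

```python
# Illustrative sketch only: synthetic data and hypothetical column names.
# Produces a question-level correlation matrix in the spirit of Appendix
# Tables 1 and 2.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_questions = 60
quest = pd.DataFrame({
    "n_words": rng.poisson(20, n_questions),
    "transition": rng.integers(0, 2, n_questions),
    "reading_level": rng.normal(9, 2, n_questions),
    "parentheses": rng.integers(0, 2, n_questions),
    "emphasis": rng.integers(0, 2, n_questions),
})
print(quest.corr().round(2))  # pairwise Pearson correlations
```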
Appendix Table 2. Correlation Matrix of Question Characteristics, US/Japan Newspaper Opinion Poll

Entries are pairwise correlations among question characteristics; the lower triangle is shown, with columns in the same order as the rows. Entries marked x are omitted.

Length
1. Number of words: 1.00
2. Transition statement: 0.05, 1.00
Complexity
3. Question reading level: 0.30, −0.06, 1.00
4. Unfamiliar technical term: 0.12, −0.31, 0.28, 1.00
5. Vague or imprecise relative term: 0.18, −0.42, −0.14, 0.03, 1.00
6. Vague or ambiguous noun-phrase: 0.18, −0.14, 0.19, 0.08, 0.28, 1.00
7. Complex syntax: 0.26, −0.14, −0.03, −0.10, 0.23, 0.09, 1.00
8. Working memory overload: 0.11, −0.29, 0.00, 0.05, 0.32, 0.12, −0.09, 1.00
Decisions
9. Parentheses: −0.15, −0.14, −0.03, 0.04, −0.04, 0.09, −0.04, −0.09, 1.00
10. Interviewer instructions: 0.19, −0.43, −0.16, 0.04, 0.70, 0.23, 0.29, 0.34, −0.15, 1.00
11. Emphasis: 0.45, 0.19, −0.15, −0.20, 0.34, −0.04, 0.08, 0.22, −0.19, 0.48, 1.00
12. Not in battery: 0.05, −0.54, −0.13, 0.09, 0.78, 0.28, 0.23, 0.40, −0.04, 0.82, 0.28, 1.00
13. First question in battery: x (all entries omitted)
14. Later question in battery: x (all entries omitted)
Practice
15. Demographic: −0.22, −0.16, 0.29, 0.07, 0.26, 0.17, 0.13, 0.31, −0.08, 0.43, −0.02, 0.42, x, x, 1.00
16. Sequential question number: −0.07, −0.51, 0.21, 0.30, 0.41, 0.41, 0.10, 0.15, 0.01, 0.43, −0.32, 0.58, x, x, 0.46, 1.00
Controls
17. Open-ended text: −0.19, −0.08, 0.03, 0.10, 0.13, 0.27, −0.02, −0.05, −0.02, 0.16, −0.11, 0.13, x, x, 0.31, 0.17, 1.00
18. Open-ended numeric: −0.12, 0.01, 0.13, −0.10, 0.10, 0.28, 0.30, −0.09, −0.04, 0.29, −0.06, 0.23, x, x, 0.34, 0.27, −0.02, 1.00
19. Closed-nominal: −0.05, 0.75, −0.03, −0.13, −0.18, −0.22, −0.17, −0.20, −0.17, −0.13, 0.34, −0.29, x, x, 0.03, −0.31, −0.10, −0.17, 1.00
20. Closed-ordinal: 0.50, −0.36, −0.20, −0.01, 0.53, 0.21, 0.22, 0.42, −0.11, 0.52, 0.37, 0.60, x, x, −0.10, 0.09, −0.06, −0.11, −0.44, 1.00
21. Yes/No: −0.29, −0.46, 0.15, 0.17, −0.36, −0.14, −0.14, −0.12, 0.31, −0.49, −0.63, −0.36, x, x, −0.16, 0.09, −0.08, −0.14, −0.57, −0.36, 1.00

Note.— Bolded correlations are statistically significant at a p < 0.05 level. Correlations related to first and later questions in a battery are omitted because battery items were randomized within respondents.

© The Author(s) 2019. Published by Oxford University Press on behalf of the American Association for Public Opinion Research. All rights reserved.