How do question evaluation methods compare in predicting problems observed in typical survey conditions?

Abstract

This paper tests two hypotheses about how well five different methods, namely the Question Understanding Aid (QUAID), the Survey Quality Predictor (SQP), expert review, the Questionnaire Appraisal System (QAS), and cognitive interviews, predict problems (as measured by missing data, behavior codes, and response latency) that occur in typical survey conditions. We find partial support for both the complementary methods hypothesis (using the evaluation methods together will yield the best prediction of problems) and the test environment hypothesis (the more directly the evaluation method observes the response process, the better it will predict problems). In addition, we find evidence that the methods perform somewhat differently for items measuring subjective as opposed to objective characteristics.

1. INTRODUCTION

Questionnaire designers rely on various methods to assess their work. Yet studies have found that different methods of evaluating questions often lead to different conclusions (Presser and Blair, 1994; Willis, Schechter, and Whitaker, 1999; Rothgeb, Willis, and Forsyth, 2001; Yan, Kreuter, and Tourangeau, 2012). Thus it is important to understand how well the various question evaluation methods predict problems observed in typical survey conditions. As far as we know, however, only two studies have explored this issue: Forsyth, Rothgeb, and Willis (2004) examined how well assessments of twelve questions (by cognitive interviews, expert review, and the Questionnaire Appraisal System) predicted observable problems (as assessed by behavior coding of interviewer-respondent interactions) when the questions were administered in the field.
And Blair, Ackerman, Piccinino, and Levenstein (2007) examined how well cognitive interview assessments of twenty-four questions predicted the items’ performance in the field (again, as assessed by behavior coding of interviewer-respondent interactions).

We explore this issue in greater depth by comparing the performance of five testing methods (cognitive interviews, expert review, the Questionnaire Appraisal System, and two computer-based approaches, the Survey Quality Predictor and the Question Understanding Aid) in predicting four problems (respondent requests for clarification, initial adequate answers, response latency, and missing data) that arose during the administration of eighty-eight questions in a random digit dial telephone survey. In addition, we examine whether the methods performed differently for items that measured objective versus subjective matters.

Our analysis is guided by two hypotheses. The first hypothesis stems from the finding that evaluation methods often disagree in their diagnoses of problems. This is frequently interpreted to mean that the methods are complementary and therefore it is better to use multiple methods together (Presser, Rothgeb, Couper, Lessler, Martin et al. 2004; Yan, Kreuter, and Tourangeau, 2012). We call this the “complementary methods hypothesis.”

Our second hypothesis, the “test environment hypothesis,” proposes that methods that more closely observe the response process have an advantage over those that observe the process less closely or not at all. The response process is set within a sociocultural context, and some approaches, such as cognitive interviews, allow the researcher to observe the process in that context (Gerber and Wellens 1997; Miller 2011). Expert review does not directly observe the process, but draws on prior experience and research that may be informed by contextual considerations. The Questionnaire Appraisal System draws less on context, and computerized evaluations do so the least.

2. METHODS

Our data come from the 2006 Joint Program in Survey Methodology (JPSM) Survey Practicum. The JPSM Survey Practicum is a two-semester course in which graduate students gain experience developing a questionnaire, sampling a population, collecting and analyzing data, and reporting results. The aim of the 2006 Practicum was to examine the reliability of survey responses. The questionnaire included four types of questions: two types of attitudinal questions, namely questions asking about relatively familiar issues (the Iraq war and wiretapping) and questions about an unfamiliar issue (a new school-based program in mathematics or English); quasi-attitudinal questions (e.g., self-ratings of health); behavioral questions (e.g., doctor visits, trips to movies and to restaurants); and demographic questions. A random digit dial sample of the noninstitutionalized adult population with landlines in the contiguous United States yielded 739 interviews (for an AAPOR Response Rate 2 of 24.8 percent). The computer-assisted interviewing was conducted by Westat telephone interviewers during the summer of 2006.

2.1 Independent Variables

The eighty-eight questions shown in Appendix A were evaluated using a variety of methods. The appendix separates the fifty-two subjective measures from the thirty-six objective ones as this distinction has been shown to affect problems answering questions (e.g., Alwin 2007). Thus, we present our analyses separately for the two kinds of items.

2.1.1 Computer based systems

We used two computer-based systems. The first, the Question Understanding Aid (QUAID), is based on computational models developed in the fields of computer science, computational linguistics, discourse processing, and cognitive science. The software identifies technical features of questions that have the potential to cause comprehension problems.
It rates questions on five classes of comprehension problems: unfamiliar technical terms, vague or imprecise predicate or relative terms, vague or imprecise noun phrases, complex syntax, and working memory overload. QUAID identifies these problems by comparing the words in a question to several databases (e.g., Coltheart’s MRC Psycholinguistic Database).

The second automated system, the Survey Quality Predictor (SQP), which is based on a meta-analysis of multitrait-multimethod (MTMM) studies, predicts the reliability, validity, method effects, and total quality of questions (Saris and Gallhofer 2007). Total quality is the product of reliability and validity. To use SQP, each question is coded according to the variables from the MTMM studies. One of the authors coded the questions using SQP 2.0.

2.1.2 Expert Review

Three reviewers, each of whom had either a PhD in survey methodology or a related discipline or more than five years of experience as a survey researcher, were given the questions and the following instructions:

“Question wordings, introductions associated with questions, and response categories are considered in scope for this evaluation. For each survey question, identify and briefly explain each specific problem you find. Please type a brief description of the problem immediately following the question in the attached document. You may observe multiple problems with a question. Please describe each one. You do not need to type anything after questions for which you do not observe a problem.”

This was an unstructured expert review, as the instructions did not provide a checklist or other specific guidance about how to conduct the evaluation. As far as we are aware, the relative frequency of structured versus unstructured expert reviews is not known. Likewise, we know of no evidence about the frequency with which subject matter experts are included with survey experts.
2.1.3 Forms Appraisal

Students from a JPSM graduate level course on questionnaire design were asked to evaluate the questions using the Questionnaire Appraisal System (QAS). Students were assigned different sections of the questionnaire and each was asked to evaluate independently whether the questions had any of twenty-six potential problems. The form also called for a brief description of each problem found.

2.1.4 Cognitive Interviews

About a month later, the same students who did the QAS coding conducted cognitive interviews of the questions. Ten students did cognitive interviews using only questions they had not coded with the QAS. The other seven students did cognitive interviews in which about one-third of the questions were questions they had coded with the QAS. In the language of Willis (2015), these constituted a single round (as opposed to an iterative round) of reparative (as opposed to interpretive) interviews. Each student was instructed to develop a cognitive protocol, including think-aloud exercises and probes, and then to interview four subjects, recruited from among their friends, neighbors, co-workers, or other convenient populations. Thus, there was substantial variability in the nature of the interviews (e.g., in the balance between concurrent and retrospective probing and in the content of the preplanned probes). All interviews were recorded so that the students could review the recordings when preparing reports that listed the problems they diagnosed (in their own words). Both the reports and audio tapes were turned in, but our analysis is based only on the reports.

There are many ways to conduct cognitive interviews (e.g., Beatty, Paul, and Willis 2007; Miller, Chepp, Willson, and Padilla 2014; Willis 2015). Our approach differs from the commonly recommended one of using a single protocol for each round of interviews.
We know of no estimates of how often the single-protocol approach is used as opposed to our approach (in which the protocols vary across interviews). But even with a single protocol, there is apt to be significant variability in how it is implemented. Indeed, cognitive interviewing is inherently unstandardized because “the objective is not to produce cookie-cutter responses to standard stimuli, but to enable our participants to provide rich elaborated information” (Willis 2015). Based partly on the finding that organizations using different protocols to test a common questionnaire arrived at similar conclusions (Willis, Schechter, and Whitaker 1999), we think our approach is likely to have yielded problems similar to those that would have been produced using a single protocol, though (as we recommend in our conclusion) this is an issue for future research.

2.1.5 Problem Coding

The problems identified from QUAID, QAS, expert review, and cognitive interviews were coded according to the scheme used by Presser and Blair (1994), which has four basic categories of problems: respondent semantic, respondent task, interviewer, and analysis. (SQP yields a single number that does not identify particular problem types.) Respondent semantic problems refer to respondents having difficulty understanding or remembering a question or having diverse understandings of the question’s meaning. They are divided into two types: problems due to the structure of the question or the questionnaire (for instance, item wordiness or connections between questions) and those due to the meaning of terms or concepts in the question. Respondent task problems are of three types: difficulty recalling information or formulating an answer; insufficient response categories; and question sensitivity. Interviewer problems refer to problems reading the question or difficulty understanding how to implement a question.
Analysis problems involve difficulties confronted during data analysis (e.g., lack of variation in responses).

Two research assistants decided which of the Presser-Blair categories best fit each of the 262 problems identified by the cognitive interviews. In a preliminary sample of similar problems double-coded by these research assistants, the overall inter-coder agreement, as measured by Cohen’s kappa, was 0.76, and the kappas by category were respondent semantic, 0.91; respondent task, 0.73; interviewer, 0.68; and analysis, 0.49. The 155 problems identified by expert review were coded into the Presser-Blair categories by the first author. A crosswalk (shown in table 1) was used to systematically code the QUAID and QAS problems into Presser-Blair categories. The first author then determined which problems matched across methods and assigned an identifier to each problem.

Table 1. Crosswalk between QUAID, QAS, and Presser-Blair Codes

| Presser-Blair category | Presser-Blair code | QUAID | QAS |
|---|---|---|---|
| Semantic I: Structure of question or questionnaire | Information overload | Working memory overload | |
| | Structure/organization | Complex syntax | 2a-b: Conflicting or inaccurate instructions, complicated instructions; 3a: Wording: complex syntax |
| | Transition problem | | |
| Semantic II: Meaning of words | Boundary lines | Vague or imprecise relative or technical term, vague or ambiguous noun phrase | 3c-d: Vague, reference period |
| | Technical term not understood | Unfamiliar technical term | 3b & 7c. Technical term |
| | Common term not understood | | |
| | Double-barreled | | 4c. Double barreled |
| Respondent Task I: Recall/Response Formation | Recall/response difficult | | 5d. Computation |
| | Recall/response impossible | | 5a-c: Knowledge, Attitude, Recall |
| | Recall/response redundant | | |
| | Recall/response resisted | | 4a-b: Inappropriate assumptions, Assumes constant behavior |
| Respondent Task II: Response Categories | Overlapping | Vague or imprecise relative or technical term or noun phrase | 7d-e: Vague, Overlapping |
| | Insufficient | | 7a & f: Open ended, Missing |
| | Too fine distinction | | |
| | Inappropriate | | 7b. Mismatch |
| Respondent Task III: Sensitivity | Sensitivity | | 6a-c: Sensitive content, Sensitive wording, Socially acceptable |
| Interviewer | Procedural | | |
| | Reading problem | | 1a-c: What to read, Missing information, How to read |
| | Coding answers to open | | |
| Analysis | Question answered same by all respondents | | |
| | Question suggests answers | | |
| | Acquiescence | | |
| | Order of response categories | | |
| | | | 8. Other |

2.2 Dependent Variables

Our measures of problems in the field are of three kinds: two behavior codes, response latency, and missing data.

Behavior coding involves the coding of behaviors that indicate a breakdown or potential problem with the question-answer process (Fowler 2011). Two results from behavior coding that are commonly used to assess survey questions are the percentage of respondents who provide an adequate answer to a question and the percentage of respondents who request clarification of the question. Although these behaviors may not always stem from problems with a question, Hess, Singer, and Bushery (1999) found that they were significant predictors of the reliability of questions.

Response latency refers to how long it takes respondents to answer a question. Like behavior coding, response latencies provide a quantitative assessment of the difficulty respondents have with a question. The assumption is that problems with a question lead to slower response times, because resolution of the problems requires additional time (Bassili and Scott 1996).
Draisma and Dijkstra (2004) showed that longer response latencies were related to inaccurate responses.

Item nonresponse is one of the most widely used indicators of data quality. Research suggests that question sensitivity and the cognitive effort needed to answer the question are two of the most important determinants of item nonresponse (Pickery and Loosveldt 2001; Shoemaker, Eicholz, and Skewes 2002).

2.2.1 Behavior coding

Resource constraints led us to behavior code only a random subsample of 377 of the 739 interviews. Two research assistants coded 292 interviews and the first author coded eighty-five interviews. The coding scheme included interviewer codes (assessing the extent to which the interviewer read the question exactly and whether or not the interviewer probed or repeated the question) and respondent codes (indicating the adequacy of the respondent’s initial answer, whether the respondent requested clarification, whether the respondent used pauses or fillers, and whether the respondent interrupted the reading of the question). Approximately six percent of the cases were double coded, and Cohen’s kappa indicated very high agreement (0.9) for the two codes we use in our analysis: initial adequate answers and requests for clarification.

2.2.2 Response latency

Again due to resource constraints, only the first 111 cases that were behavior coded were selected for response latency measurement. Response latency was measured as the elapsed time from the end of the interviewer’s reading of a question to the beginning of the respondent’s answer. One research assistant coded eighty-eight interviews, and the first author coded the remaining twenty-three interviews. The 111 cases were asked an average of forty-six questions (many of the survey’s eighty-eight questions were part of wording experiments asked of random half samples) for a total of 5,102 possible response latencies, from which we dropped about five percent (287) because the respondent interrupted the reading of the question.
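The latency measure just described, together with the exclusion of interrupted questions, can be illustrated with a minimal Python sketch. The record layout (`Turn`, `question_end_ms`, `answer_start_ms`, `interrupted`) and the sample timestamps are hypothetical and are not taken from the study's coding files.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """One question-answer exchange (hypothetical record layout)."""
    question_end_ms: int   # when the interviewer finished reading the question
    answer_start_ms: int   # when the respondent began to answer
    interrupted: bool      # respondent spoke before the question was fully read


def response_latencies(turns: List[Turn]) -> List[int]:
    """Return latencies in milliseconds, skipping interrupted exchanges."""
    return [
        t.answer_start_ms - t.question_end_ms
        for t in turns
        if not t.interrupted
    ]


# Made-up timestamps for three exchanges; the second one was interrupted.
turns = [
    Turn(question_end_ms=5_000, answer_start_ms=6_200, interrupted=False),
    Turn(question_end_ms=9_000, answer_start_ms=8_700, interrupted=True),
    Turn(question_end_ms=14_000, answer_start_ms=14_800, interrupted=False),
]
latencies = response_latencies(turns)
print(latencies)  # prints [1200, 800]
```

Interrupted exchanges are dropped rather than recorded as zero or negative latencies, since the respondent began answering before the question was fully read and no elapsed time can be defined for them.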
The latencies were highly skewed, so we assigned observations above the 99th percentile or below the 1st percentile the exact values of those percentiles. We then log transformed the latencies for use in our statistical models.

2.2.3 Item nonresponse

We coded responses to each survey question as zero if the respondent provided a valid answer and one if the respondent gave an invalid answer such as “don’t know” or refused to answer.

3. RESULTS

3.1 Descriptive Statistics

Our primary focus is how well the assessments from the ex-ante (QUAID, SQP, Expert Review, and QAS) and laboratory (cognitive interview) methods predicted problems observed under actual survey conditions, as measured by behavior coding, item nonresponse, and response latency. Descriptive statistics for the four dependent variables are shown in table 2. As can be seen there, nonresponse was higher and more variable for the subjective items than for the objective ones. The means and variances of the remaining three dependent variables were relatively similar across the two types of items.

Table 2. Means, Standard Deviations, and Ranges for Dependent Variables

| Dependent variable | Objective questions (n = 36): Mean | Standard deviation | Min | Max | Subjective questions (n = 52): Mean | Standard deviation | Min | Max |
|---|---|---|---|---|---|---|---|---|
| % adequate answers | 84.06 | 14.03 | 41.97 | 97.16 | 74.35 | 10.2 | 40 | 91.8 |
| % requests for clarification | 5.62 | 5.48 | 0.53 | 25.33 | 7.35 | 5.85 | 0 | 32.76 |
| Log response latency (milliseconds) | 7.16 | 0.63 | 6.15 | 8.86 | 7.36 | 0.47 | 6.2 | 8.41 |
| % item nonresponse | 0.92 | 0.88 | 0 | 2.94 | 4.51 | 5.94 | 0 | 23.08 |

With the exception of the SQP score, the independent variables in the model refer to the number of times that a method diagnosed a particular problem type (see table 1) for a question. Descriptive statistics for each of these diagnoses (as well as for the SQP score) are shown in table 3. Overall, QAS is most apt to identify problems, followed by cognitive interviews, expert review, and QUAID.

Table 3. Means, Standard Deviations, and Ranges for Independent Variables

| Problem type | Method | Objective questions (n = 36): Mean | Standard deviation | Min | Max | Subjective questions (n = 52): Mean | Standard deviation | Min | Max |
|---|---|---|---|---|---|---|---|---|---|
| Semantic I problems: question structure | QUAID | 0.61 | 0.55 | 0 | 2 | 0.54 | 0.58 | 0 | 2 |
| | Expert review | 0.50 | 0.81 | 0 | 3 | 0.25 | 0.56 | 0 | 2 |
| | QAS | 0.83 | 1.08 | 0 | 4 | 0.56 | 0.85 | 0 | 3 |
| | Cognitive interviews | 0 | 0 | 0 | 0 | 0.19 | 0.49 | 0 | 2 |
| Semantic II problems: meaning | QUAID | 2.06 | 1.41 | 0 | 5 | 1.46 | 1.13 | 0 | 4 |
| | Expert review | 0.61 | 0.84 | 0 | 3 | 0.87 | 0.89 | 0 | 3 |
| | QAS | 3.56 | 2.32 | 0 | 8 | 3.38 | 1.59 | 0 | 8 |
| | Cognitive interviews | 1.11 | 1.14 | 0 | 5 | 2.00 | 2.04 | 0 | 8 |
| Respondent task I problems: recall/judgment | QUAID | – | – | – | – | – | – | – | – |
| | Expert review | 0.61 | 0.64 | 0 | 2 | 0.21 | 0.46 | 0 | 2 |
| | QAS | 1.19 | 1.31 | 0 | 5 | 2.06 | 1.84 | 0 | 6 |
| | Cognitive interviews | 0.67 | 1.55 | 0 | 7 | 0.87 | 1.51 | 0 | 6 |
| Respondent task II problems: response categories | QUAID | 0.11 | 0.52 | 0 | 3 | 0.23 | 0.43 | 0 | 1 |
| | Expert review | 0.17 | 0.45 | 0 | 2 | 0.04 | 0.19 | 0 | 1 |
| | QAS | 0.25 | 0.65 | 0 | 3 | 0.92 | 1.08 | 0 | 3 |
| | Cognitive interviews | 0.19 | 0.71 | 0 | 3 | 0.27 | 0.6 | 0 | 3 |
| Respondent task III problems: sensitivity | QUAID | – | – | – | – | – | – | – | – |
| | Expert review | 0 | 0 | 0 | 0 | 0.15 | 0.41 | 0 | 2 |
| | QAS | 0.78 | 0.76 | 0 | 3 | 0.85 | 0.80 | 0 | 3 |
| | Cognitive interviews | 0.44 | 0.81 | 0 | 2 | 0.04 | 0.19 | 0 | 1 |
| Interviewer problems | QUAID | – | – | – | – | – | – | – | – |
| | Expert review | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | QAS | 0.08 | 0.28 | 0 | 1 | 0.02 | 0.14 | 0 | 1 |
| | Cognitive interviews | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Analysis problems | QUAID | – | – | – | – | – | – | – | – |
| | Expert review | 0.11 | 0.32 | 0 | 1 | 0.08 | 0.27 | 0 | 1 |
| | QAS | 0 | 0 | 0 | 0 | 0.06 | 0.31 | 0 | 2 |
| | Cognitive interviews | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Other method | SQP total quality score | 0.6 | 0.06 | 0.49 | 0.71 | 0.56 | 0.05 | 0.46 | 0.63 |

Note: QUAID entries with a “–” represent a problem that method is not designed to identify.

3.2 Multivariate Results

Table 4 shows the Pearson correlations among the diagnoses for the objective items, and table 5 shows the same correlations for the subjective items.
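As a rough illustration of how such pairwise correlations can be computed from per-question diagnosis counts, the following Python sketch uses only the standard library. The function names, the example counts, and the use of a t statistic to judge significance are assumptions made for illustration; the text does not describe the software used or the significance threshold applied.

```python
import math
from typing import Optional, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Pearson correlation of two equal-length sequences.

    Returns None when either sequence has zero variance, for example when
    a method never diagnosed a given problem type for any question.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical counts of one problem type for six questions, by method.
quaid_counts = [0, 1, 1, 2, 0, 1]
expert_counts = [0, 1, 0, 3, 0, 1]
cognitive_counts = [0, 0, 0, 0, 0, 0]  # method never flagged this problem

r = pearson_r(quaid_counts, expert_counts)
print(round(r, 2), round(t_statistic(r, len(quaid_counts)), 2))
print(pearson_r(quaid_counts, cognitive_counts))  # undefined correlation
```

The resulting t statistic would be compared with a critical value for n - 2 degrees of freedom at whatever significance level is chosen. Returning `None` for a zero-variance variable is one way to represent a correlation that cannot be computed.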
There are fewer entries in the former table as, among the objective items, cognitive interviews never identified a question structure problem, expert review never identified a sensitivity problem, and QAS never identified an analysis problem. For the objective items, among the 19 pairs of methods in which more than one method made a particular diagnosis (bolded in the table), 12 of the associations are statistically significant, with the significant correlations ranging between .33 and .81 and all being in the expected direction. For the subjective items, among the 25 pairs of methods in which more than one method made a particular diagnosis (bolded in the table), only 9 of the associations are statistically significant, with the significant correlations ranging between .29 and .68 and one of them being in the wrong direction. Thus, consistent with findings from prior studies, there is only modest agreement between evaluation methods, though the agreement is higher for objective items than for subjective ones.

Table 4. Pearson Correlations among the Method Assessments for the Objective Items

Columns 1 to 22 follow the same order as the numbered rows.

| Variable | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1. Semantic I: QUAID | | 0.39* | 0.51* | – | 0.36* | 0.22 | 0.53* | 0.39* | 0.37* | 0.43* | 0.35* | 0.25 | 0.04 | −0.12 | 0.20 | – | −0.01 | 0.01 | 0.03 | 0.09 | – | 0.26 |
| 2. Semantic I: Expert review | | | 0.81* | – | 0.17 | 0.08 | 0.41* | 0.06 | 0.38* | 0.18 | −0.18 | 0.07 | −0.16 | −0.08 | −0.12 | – | −0.28 | −0.35* | 0.06 | 0.22 | – | 0.42* |
| 3. Semantic I: QAS | | | | – | 0.42* | 0.12 | 0.40* | 0.11 | 0.31 | 0.27 | 0.00 | 0.19 | −0.06 | −0.06 | −0.03 | – | −0.25 | −0.21 | 0.05 | 0.06 | – | 0.30 |
| 4. Semantic I: Cognitive interviews | | | | | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – |
| 5. Semantic II: QUAID | | | | | | 0.28 | 0.16 | 0.26 | 0.21 | 0.09 | 0.10 | −0.09 | 0.12 | −0.14 | 0.05 | – | −0.12 | 0.20 | −0.30 | 0.05 | – | −0.03 |
| 6. Semantic II: Expert review | | | | | | | 0.64* | 0.61* | 0.03 | 0.15 | −0.32 | −0.16 | −0.28 | −0.29 | −0.21 | – | −0.18 | 0.01 | −0.10 | 0.27 | – | 0.17 |
| 7. Semantic II: QAS | | | | | | | | 0.63* | 0.21 | 0.32 | −0.12 | 0.04 | −0.31 | −0.28 | −0.15 | – | −0.04 | 0.02 | 0.19 | 0.26 | – | 0.25 |
| 8. Semantic II: Cognitive interviews | | | | | | | | | −0.09 | 0.08 | −0.11 | −0.12 | −0.21 | −0.31 | −0.06 | – | −0.14 | 0.29 | −0.12 | 0.12 | – | −0.04 |
| 9. Respondent task I: Expert review | | | | | | | | | | 0.67* | 0.44* | 0.13 | −0.07 | −0.24 | −0.02 | – | −0.01 | −0.26 | −0.13 | 0.22 | – | 0.23 |
| 10. Respondent task I: QAS | | | | | | | | | | | 0.33* | 0.13 | −0.06 | −0.26 | −0.01 | – | −0.04 | −0.25 | −0.12 | 0.22 | – | 0.35* |
| 11. Respondent task I: Cognitive interviews | | | | | | | | | | | | 0.58* | 0.45* | −0.06 | 0.53* | – | 0.20 | 0.19 | −0.13 | −0.15 | – | −0.18 |
| 12. Respondent task II: QUAID | | | | | | | | | | | | | 0.41* | −0.08 | 0.40* | – | 0.14 | 0.02 | −0.07 | −0.08 | – | −0.06 |
| 13. Respondent task II: Expert review | | | | | | | | | | | | | | 0.25 | 0.52* | – | 0.11 | −0.05 | −0.11 | −0.13 | – | −0.19 |
| 14. Respondent task II: QAS | | | | | | | | | | | | | | | 0.08 | – | 0.46* | −0.22 | −0.12 | −0.14 | – | −0.02 |
| 15. Respondent task II: Cognitive interviews | | | | | | | | | | | | | | | | – | 0.24 | 0.14 | −0.08 | −0.10 | – | −0.11 |
| 16. Respondent task III: Expert review | | | | | | | | | | | | | | | | | – | – | – | – | – | – |
| 17. Respondent task III: QAS | | | | | | | | | | | | | | | | | | 0.30 | −0.04 | 0.22 | – | −0.25 |
| 18. Respondent task III: Cognitive interviews | | | | | | | | | | | | | | | | | | | 0.08 | −0.20 | – | −0.73* |
| 19. Interviewer: QAS | | | | | | | | | | | | | | | | | | | | −0.11 | – | −0.08 |
| 20. Analysis: Expert review | | | | | | | | | | | | | | | | | | | | | – | 0.24 |
| 21. Analysis: QAS | | | | | | | | | | | | | | | | | | | | | | – |
| 22. Survey Quality Predictor | | | | | | | | | | | | | | | | | | | | | | |

* p < 0.05. Bolded entries are those for which more than one method made a particular diagnosis (pairs of methods within the same problem type).

Table 5. Pearson Correlations among the Method Assessments for the Subjective Items

Columns 1 to 22 follow the same order as the numbered rows.

| Variable | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1. Semantic I: QUAID | | 0.12 | −0.31* | −0.10 | −0.03 | −0.01 | 0.26 | 0.18 | 0.01 | −0.09 | −0.03 | 0.36* | −0.01 | 0.16 | −0.26 | 0.30* | 0.01 | −0.19 | 0.11 | 0.11 | 0.04 | −0.11 |
| 2. Semantic I: Expert review | | | −0.01 | −0.04 | 0.16 | −0.17 | −0.07 | 0.03 | 0.10 | 0.02 | 0.18 | 0.00 | 0.09 | −0.16 | −0.21 | 0.34* | 0.26 | 0.27 | −0.06 | 0.00 | 0.14 | −0.18 |
| 3. Semantic I: QAS | | | | 0.11 | 0.07 | 0.15 | 0.36* | 0.26 | −0.16 | 0.23 | 0.26 | −0.20 | −0.13 | 0.11 | 0.20 | −0.25 | −0.02 | −0.01 | 0.07 | 0.07 | 0.32* | 0.15 |
| 4. Semantic I: Cognitive interviews | | | | | 0.12 | −0.03 | 0.13 | 0.26 | −0.01 | 0.03 | −0.04 | 0.07 | −0.08 | 0.25 | 0.09 | −0.05 | −0.02 | −0.08 | −0.06 | −0.12 | −0.08 | 0.08 |
| 5. Semantic II: QUAID | | | | | | 0.00 | 0.11 | 0.03 | 0.11 | 0.19 | 0.31* | −0.14 | 0.10 | −0.08 | −0.10 | −0.07 | −0.05 | 0.10 | 0.07 | 0.14 | 0.09 | −0.08 |
| 6. Semantic II: Expert review | | | | | | | 0.37* | 0.42* | 0.07 | 0.50* | 0.03 | 0.29* | 0.14 | 0.32* | 0.11 | −0.05 | 0.19 | 0.03 | 0.02 | 0.04 | 0.10 | 0.20 |
| 7. Semantic II: QAS | | | | | | | | 0.44* | −0.01 | 0.24 | 0.10 | 0.13 | −0.11 | 0.06 | 0.07 | 0.09 | 0.09 | 0.01 | −0.03 | −0.02 | 0.11 | 0.12 |
| 8. Semantic II: Cognitive interviews | | | | | | | | | −0.27* | 0.31* | 0.17 | 0.38* | −0.15 | 0.46* | −0.08 | −0.09 | 0.02 | 0.05 | −0.14 | 0.11 | 0.00 | 0.37* |
| 9. Respondent task I: Expert review | | | | | | | | | | −0.04 | 0.10 | −0.26 | −0.09 | −0.32* | −0.14 | 0.03 | 0.25 | 0.13 | −0.07 | −0.13 | −0.09 | −0.30* |
| 10. Respondent task I: QAS | | | | | | | | | | | 0.49* | 0.16 | 0.05 | 0.39* | 0.22 | −0.32* | 0.05 | 0.27 | 0.15 | 0.19 | 0.17 | 0.55* |
| 11. Respondent task I: Cognitive interviews | | | | | | | | | | | | −0.23 | 0.15 | −0.01 | −0.07 | −0.19 | −0.15 | −0.05 | −0.08 | 0.22 | 0.23 | 0.36* |
| 12. Respondent task II: QUAID | | | | | | | | | | | | | 0.13 | 0.68* | −0.17 | 0.13 | 0.16 | −0.11 | 0.26 | 0.18 | −0.10 | 0.19 |
| 13. Respondent task II: Expert review | | | | | | | | | | | | | | 0.11 | 0.08 | 0.41* | 0.16 | −0.04 | −0.03 | −0.06 | −0.04 | −0.12 |
| 14. Respondent task II: QAS | | | | | | | | | | | | | | | 0.21 | −0.15 | 0.08 | −0.08 | 0.27 | 0.09 | −0.16 | 0.44* |
| 15. Respondent task II: Cognitive interviews | | | | | | | | | | | | | | | | −0.17 | −0.12 | 0.08 | −0.06 | −0.13 | 0.02 | 0.43* |
| 16. Respondent task III: Expert review | | | | | | | | | | | | | | | | | 0.37* | −0.07 | −0.05 | −0.11 | −0.07 | −0.50* |
| 17. Respondent task III: QAS | | | | | | | | | | | | | | | | | | 0.29* | −0.15 | 0.15 | 0.04 | −0.34* |
| 18. Respondent task III: Cognitive interviews | | | | | | | | | | | | | | | | | | | −0.03 | −0.06 | −0.04 | 0.06 |
| 19. Interviewer: QAS | | | | | | | | | | | | | | | | | | | | −0.04 | −0.03 | −0.02 |
| 20. Analysis: Expert review | | | | | | | | | | | | | | | | | | | | | 0.42* | 0.12 |
| 21. Analysis: QAS | | | | | | | | | | | | | | | | | | | | | | 0.13 |
| 22. Survey Quality Predictor | | | | | | | | | | | | | | | | | | | | | | |

* p < 0.05. Bolded entries are those for which more than one method made a particular diagnosis (pairs of methods within the same problem type).

Do the diagnoses actually predict problems in the field? Table 6 shows the Pearson correlations between each diagnosis and the four kinds of problems, separately for the subjective and objective items. As may be seen there, initial adequate answers to both subjective and objective items were predicted (p < 0.05) by recall/judgment problems identified by expert review, QAS, and cognitive interviews: the more each of these methods identified such a problem, the lower the proportion of adequate answers produced by the item.
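To make the p < 0.05 flag in the correlation tables concrete, the following minimal sketch approximates the smallest absolute Pearson correlation that would reach two-sided significance with 36 objective items and with 52 subjective items. It relies on the Fisher z transformation, which is an assumption made here for illustration; the paper does not say how its significance tests were computed, and the function name `critical_r` is hypothetical.

```python
import math
from statistics import NormalDist


def critical_r(n_items: int, alpha: float = 0.05) -> float:
    """Approximate two-sided critical |r| for a Pearson correlation.

    Uses the Fisher z transformation: atanh(r) is roughly normal with
    standard error 1 / sqrt(n - 3). This is an approximation chosen for
    illustration, not necessarily the test the authors used.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return math.tanh(z_crit / math.sqrt(n_items - 3))


# Item counts taken from table 3: 36 objective and 52 subjective questions.
for label, n_items in (("objective", 36), ("subjective", 52)):
    print(f"{label} items (n = {n_items}): |r| must reach about {critical_r(n_items):.2f}")
```

Under this approximation the thresholds come out near 0.33 for the objective items and near 0.27 for the subjective items, which is in line with the smallest starred entries in tables 4 to 6. Because the objective item set is smaller, a correlation of a given size is less likely to be flagged there than among the subjective items.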
The same three diagnoses of recall/judgment problems (by expert review, QAS, and cognitive interviews) also predicted clarification requests, though only for the objective items (and the cognitive interview prediction just missed the 0.05 level). In addition, QUAID meaning diagnoses predicted clarification requests, but again only for objective items. By contrast, clarification requests to the subjective items were predicted by cognitive interview diagnoses of meaning problems and expert review diagnoses of sensitivity problems. The more these methods identified such problems, the higher the clarification requests to the item. Response latency was also generally predicted by recall/judgment problems identified by expert review, cognitive interviews, and QAS: the more such problems were diagnosed, the longer the response latency (though the cognitive interview prediction for objective items just missed 0.05, and the expert review prediction did not hold for subjective items). In addition, longer latency to the objective items was predicted by QUAID meaning problems, and longer latency to the subjective items was predicted by cognitive interview meaning problems. Finally, higher nonresponse to the subjective items was predicted by recall/judgment problems identified by both cognitive interviews and QAS, and higher nonresponse to the objective items was predicted by SQP and by sensitivity problems diagnosed by cognitive interviews. Table 6. 
Pearson Correlations between the Predictor Variables (Method Diagnoses) and the Four Measures of Field Problems, Separately for the Objective and Subjective Items Predictor  Log response latency   Adequate Answers   Requests for clarification   Item nonresponse     Objective  Subjective  Objective  Subjective  Objective  Subjective  Objective  Subjective  Semantic I problems: question structure   QUAID  0.08  −0.31*  −0.18  0.1  0.16  0.03  −0.05  −0.37*   Expert review  −0.15  −0.21  0.08  −0.09  −0.15  0.22  −0.15  −0.02   QAS  −0.09  0.25  −0.01  0.05  −0.1  0.03  −0.26  0.13   Cognitive interviews  –  0.2  –  −0.05  –  0.19  –  0.04  Semantic II problems: meaning   QUAID  0.33*  0.13  −0.25  −0.19  0.34*  0.06  0  0.03   Expert review  −0.04  0.11  0.42*  −0.1  0.05  0.15  −0.2  0.23   QAS  −0.18  0.12  0.33  −0.05  −0.05  0.18  −0.29  0.05   Cognitive interviews  −0.21  0.31*  0.36*  −0.13  −0.07  0.33*  −0.11  0.16  Respondent task I problems: recall/judgment   QUAID                   Expert review  0.58*  −0.12  −0.54*  −0.29*  0.57*  −0.1  −0.29  0.23   QAS  0.46*  0.47*  −0.40*  −0.42*  0.45*  0.13  −0.28  0.48*   Cognitive interviews  0.32  0.45*  −0.50*  −0.63*  0.31  0.22  −0.13  0.67*  Respondent task II problems: response categories   QUAID  0.06  −0.14  −0.15  0.14  0.03  0.03  −0.16  −0.21   Expert review  0.13  −0.11  −0.32  −0.07  −0.05  0.14  0.2  0.07   QAS  0.05  0.13  −0.1  0.02  −0.23  −0.02  0.06  0.05   Cognitive interviews  −0.08  0.12  −0.11  −0.07  −0.03  0.02  −0.03  0.07  Respondent task III problems: sensitivity   QUAID  –  –  –  –  –  –  –  –   Expert review  –  −0.27  –  0.25  –  0.31*  –  −0.25   QAS  0.19  −0.06  −0.3  −0.05  0.05  0.1  0.14  0.09   Cognitive interviews  −0.19  0.17  0.05  −0.19  −0.07  0.05  0.37*  0.15  Interviewer problems   QUAID  –  –  –  –  –  –  –  –   Expert review  –  –  –  –  –  –  –  –   QAS  −0.32  0.04  0.16  0.06  −0.18  −0.11  −0.09  −0.08   Cognitive interviews  –  –  –  –  –  –  –  –  Analysis 
problems   QUAID  –  –  –  –  –  –  –     Expert review  −0.03  0.01  0.14  0.03  −0.07  −0.05  −0.14  −0.04   QAS  –  0.13  –  −0.1  –  0.01  –  0.03   Cognitive interviews  –  –  –  –  –  –  –  –  Other method   SQP total quality score  0.13  0.49*  0.04  −0.32*  0.17  0.09  −0.46*  0.40*  Predictor  Log response latency   Adequate Answers   Requests for clarification   Item nonresponse     Objective  Subjective  Objective  Subjective  Objective  Subjective  Objective  Subjective  Semantic I problems: question structure   QUAID  0.08  −0.31*  −0.18  0.1  0.16  0.03  −0.05  −0.37*   Expert review  −0.15  −0.21  0.08  −0.09  −0.15  0.22  −0.15  −0.02   QAS  −0.09  0.25  −0.01  0.05  −0.1  0.03  −0.26  0.13   Cognitive interviews  –  0.2  –  −0.05  –  0.19  –  0.04  Semantic II problems: meaning   QUAID  0.33*  0.13  −0.25  −0.19  0.34*  0.06  0  0.03   Expert review  −0.04  0.11  0.42*  −0.1  0.05  0.15  −0.2  0.23   QAS  −0.18  0.12  0.33  −0.05  −0.05  0.18  −0.29  0.05   Cognitive interviews  −0.21  0.31*  0.36*  −0.13  −0.07  0.33*  −0.11  0.16  Respondent task I problems: recall/judgment   QUAID                   Expert review  0.58*  −0.12  −0.54*  −0.29*  0.57*  −0.1  −0.29  0.23   QAS  0.46*  0.47*  −0.40*  −0.42*  0.45*  0.13  −0.28  0.48*   Cognitive interviews  0.32  0.45*  −0.50*  −0.63*  0.31  0.22  −0.13  0.67*  Respondent task II problems: response categories   QUAID  0.06  −0.14  −0.15  0.14  0.03  0.03  −0.16  −0.21   Expert review  0.13  −0.11  −0.32  −0.07  −0.05  0.14  0.2  0.07   QAS  0.05  0.13  −0.1  0.02  −0.23  −0.02  0.06  0.05   Cognitive interviews  −0.08  0.12  −0.11  −0.07  −0.03  0.02  −0.03  0.07  Respondent task III problems: sensitivity   QUAID  –  –  –  –  –  –  –  –   Expert review  –  −0.27  –  0.25  –  0.31*  –  −0.25   QAS  0.19  −0.06  −0.3  −0.05  0.05  0.1  0.14  0.09   Cognitive interviews  −0.19  0.17  0.05  −0.19  −0.07  0.05  0.37*  0.15  Interviewer problems   QUAID  –  –  –  –  –  –  –  –   Expert review  –  –  –  –  
–  –  –  –   QAS  −0.32  0.04  0.16  0.06  −0.18  −0.11  −0.09  −0.08   Cognitive interviews  –  –  –  –  –  –  –  –  Analysis problems   QUAID  –  –  –  –  –  –  –     Expert review  −0.03  0.01  0.14  0.03  −0.07  −0.05  −0.14  −0.04   QAS  –  0.13  –  −0.1  –  0.01  –  0.03   Cognitive interviews  –  –  –  –  –  –  –  –  Other method   SQP total quality score  0.13  0.49*  0.04  −0.32*  0.17  0.09  −0.46*  0.40*  * p < 0.05. Table 6. Pearson Correlations between the Predictor Variables (Method Diagnoses) and the Four Measures of Field Problems, Separately for the Objective and Subjective Items Predictor  Log response latency   Adequate Answers   Requests for clarification   Item nonresponse     Objective  Subjective  Objective  Subjective  Objective  Subjective  Objective  Subjective  Semantic I problems: question structure   QUAID  0.08  −0.31*  −0.18  0.1  0.16  0.03  −0.05  −0.37*   Expert review  −0.15  −0.21  0.08  −0.09  −0.15  0.22  −0.15  −0.02   QAS  −0.09  0.25  −0.01  0.05  −0.1  0.03  −0.26  0.13   Cognitive interviews  –  0.2  –  −0.05  –  0.19  –  0.04  Semantic II problems: meaning   QUAID  0.33*  0.13  −0.25  −0.19  0.34*  0.06  0  0.03   Expert review  −0.04  0.11  0.42*  −0.1  0.05  0.15  −0.2  0.23   QAS  −0.18  0.12  0.33  −0.05  −0.05  0.18  −0.29  0.05   Cognitive interviews  −0.21  0.31*  0.36*  −0.13  −0.07  0.33*  −0.11  0.16  Respondent task I problems: recall/judgment   QUAID                   Expert review  0.58*  −0.12  −0.54*  −0.29*  0.57*  −0.1  −0.29  0.23   QAS  0.46*  0.47*  −0.40*  −0.42*  0.45*  0.13  −0.28  0.48*   Cognitive interviews  0.32  0.45*  −0.50*  −0.63*  0.31  0.22  −0.13  0.67*  Respondent task II problems: response categories   QUAID  0.06  −0.14  −0.15  0.14  0.03  0.03  −0.16  −0.21   Expert review  0.13  −0.11  −0.32  −0.07  −0.05  0.14  0.2  0.07   QAS  0.05  0.13  −0.1  0.02  −0.23  −0.02  0.06  0.05   Cognitive interviews  −0.08  0.12  −0.11  −0.07  −0.03  0.02  −0.03  0.07  Respondent task III problems: 
sensitivity
QUAID  –  –  –  –  –  –  –  –
Expert review  –  −0.27  –  0.25  –  0.31*  –  −0.25
QAS  0.19  −0.06  −0.3  −0.05  0.05  0.1  0.14  0.09
Cognitive interviews  −0.19  0.17  0.05  −0.19  −0.07  0.05  0.37*  0.15
Interviewer problems
QUAID  –  –  –  –  –  –  –  –
Expert review  –  –  –  –  –  –  –  –
QAS  −0.32  0.04  0.16  0.06  −0.18  −0.11  −0.09  −0.08
Cognitive interviews  –  –  –  –  –  –  –  –
Analysis problems
QUAID  –  –  –  –  –  –  –
Expert review  −0.03  0.01  0.14  0.03  −0.07  −0.05  −0.14  −0.04
QAS  –  0.13  –  −0.1  –  0.01  –  0.03
Cognitive interviews  –  –  –  –  –  –  –  –
Other method
SQP total quality score  0.13  0.49*  0.04  −0.32*  0.17  0.09  −0.46*  0.40*
* p < 0.05.

In addition, however, there were seven significant (p < 0.05) predictions that were in the wrong direction. For the subjective items, higher SQP scores (representing higher quality) predicted fewer adequate answers, higher nonresponse, and longer response latency. Similarly, for the subjective items, more QUAID question structure problems predicted lower nonresponse and shorter response latency. Finally, for the objective items, more expert review meaning problems and more cognitive interview meaning problems predicted more adequate answers. We find all these associations uninterpretable, though we note that five of the seven involve the artificial intelligence approaches (SQP and QUAID). Given that there are 176 predictions from the methods (twenty-two for each of four dependent variables times two for the subjective and objective items), we believe these results are flukes and, thus, exclude them from the multivariate model to which we turn next (though, as reported in note 6, we evaluated whether their exclusion affected conclusions drawn from the model).
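The "flukes" reading can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only and is not part of the original analysis. It assumes, purely for illustration, that all 176 predictions are independent two-sided tests at the 0.05 level with no true effects, so that each has a 0.025 chance of being significant in the wrong direction. Neither assumption holds exactly here (the predictors and outcomes are correlated, and some cells in the table could not be estimated), and the helper name `prob_at_least` is hypothetical.

```python
from math import comb

# Counts taken from the text: 22 predictors x 4 outcomes x 2 item types.
N_PREDICTIONS = 22 * 4 * 2
ALPHA = 0.05          # two-sided significance level used in the text
P_WRONG = ALPHA / 2   # ASSUMPTION: chance of a wrong-direction "hit" under the null


def prob_at_least(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


# Expected number of significant results produced by chance alone.
expected_chance_hits = N_PREDICTIONS * ALPHA
expected_wrong_direction = N_PREDICTIONS * P_WRONG

# Chance of seeing seven or more wrong-direction hits under the assumptions above.
p_seven_or_more = prob_at_least(7, N_PREDICTIONS, P_WRONG)

print(f"expected chance hits (any direction): {expected_chance_hits:.1f}")
print(f"expected wrong-direction hits: {expected_wrong_direction:.1f}")
print(f"P(at least 7 wrong-direction hits): {p_seven_or_more:.2f}")
```

Under these simplified assumptions, about 4.4 wrong-direction results would be expected by chance alone, so observing seven is not far out of line with treating them as flukes.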
Multivariate analysis allows us to assess the extent to which the significant bivariate effects were independent. In addition, if the dependent variables are causally ordered, path analysis allows us to examine whether the association among them (shown in table 7) mediates the connection between method predictions and the measures of quality. Put differently, if any of the dependent variables is a cause of another dependent variable, then some of the methods' predictions may be mediated by these dependent variables. The causal ordering among three of the quality measures (clarification requests, initial adequate answers, and response latency) is unclear. By contrast, causal arrows to nonresponse from clarification requests, initial adequate answers, and response latency seem more likely than arrows in the opposite direction. This causal structure is shown in the path models in figure 1a and 1b, which we estimated using the R package lavaan. (It should be noted that although table 7 shows that all the problem measures are inter-correlated among the subjective items, nonresponse is unrelated to the other three measures among the objective items. This is an indication of the usefulness of separating the two question types, but also means that mediation is only a possibility for the subjective items.)

Table 7. Pearson Correlations among the Dependent Variables for Objective and Subjective Items

Objective items
Dependent variable            Response latency   Adequate answers   Requests for clarification   Item nonresponse
Response latency                                 −0.80*             0.85*                        −0.01
Adequate answers                                                    −0.63*                       −0.13
Requests for clarification                                                                       −0.01

Subjective items
Dependent variable            Response latency   Adequate answers   Requests for clarification   Item nonresponse
Response latency                                 −0.55*             0.51*                        0.65*
Adequate answers                                                    −0.35*                       −0.77*
Requests for clarification                                                                       0.28*

* p < 0.05.

Figure 1. Path diagrams using the evaluation methods to predict problems identified in the field for (a) the objective items and (b) the subjective items.

The results show that the bivariate effects significant at less than the 0.05 level generally hold up in the path model, although a few drop to borderline (or not quite borderline) significance.5 The exceptions mainly involve the QAS diagnoses of recall/judgment problems, most of which lose their predictive power. Thus, these QAS diagnoses are not independent of the predictions made by the other methods (in the case of adequate answers, clarification requests, and response latency) or are mediated by the effects of the other dependent variables (in the case of nonresponse). The only other exceptions involve the prediction of nonresponse by cognitive interviews, which, in the case of objective items, is due to its overlap with the SQP prediction, and in the case of subjective items, is due to mediation by the other dependent variables.6

4. DISCUSSION

Most of the evaluation methods we examined predicted all four of our problem indicators: requests for clarification, adequate answers, response latency, and missing data. The particular combination of methods that was most predictive varied across the four problems. Taken together, the evaluation methods explained on the order of forty percent of the variability in these measures of data quality.

It is often suggested that because of the low level of agreement between evaluation methods, it is best to use multiple methods. On the one hand, this hypothesis was not completely supported, as not all the methods made independent contributions to the predictions.
On the other hand, the hypothesis was significantly supported, since using two methods almost always improved predictions over any one method.

Consistent with the test environment hypothesis, the results generally showed cognitive interviews and expert reviews to provide the best predictions, the computer-based methods (SQP and QUAID) to be least predictive, and QAS to be in the middle. A key exception, however, was the SQP prediction of nonresponse for objective items (though not for subjective ones). We are uncertain why SQP performed well in this way, since the method was developed mainly with subjective items. Thus, although all of our results are in need of replication (as we discuss below), the need for replication seems particularly great for this result.

Overall, our results suggest that researchers who rely on a single testing method (which we believe may be the norm) would benefit from doing more and that the best combination of methods may be expert reviews followed by cognitive interviews.

As is true for all research, our results may have been influenced by how we implemented the methods. With the exception of the computer-based methods, there is no single correct way to implement evaluation methods. Earlier we cited references discussing the many different ways of conducting cognitive interviews. Although it has been much less discussed, there are also many different ways to implement the other methods. Expert reviews, for instance, can be done in groups or individually, by people with only methodological expertise or also by those with subject matter expertise, unstructured or structured by issues to address, and so on. We know virtually nothing about how such factors influence the outcome of the methods. Consequently, we do not know how sensitive our results are to the particular ways we implemented the methods.
Similarly, we don’t know how sensitive our results are to the particular sample of items we used, though our results indicate that some methods perform differently for objective versus subjective items. Thus, further work using other items and other variants of the testing methods is essential.

Although problems observed by behavior coding, nonresponse, and response latency are useful measures of data quality, they are indirect ones. Items may be answered quickly, generate no requests for clarification, no missing data, and no initial inadequate answers, yet still suffer from problems. In other work examining the degree to which different testing methods predicted the reliability of items (Maitland and Presser 2016), we found support for the complementary methods hypothesis, largely consistent with the present work. By contrast, and less consistent with the present work, our study of reliability found little support for the test environment hypothesis. Thus, we need to better understand the connection between observable problems and reliability of measurement. In addition, research that compares how well the different evaluation methods predict validity would be invaluable. Finally, it would be valuable to assess whether some sequences of conducting the methods are better than others as well as whether revisions based on multiple method evaluations are better than those based on single method evaluations.

Footnotes
1 Willis and Schechter (1997) examined the field performance of five questions that had been tested with cognitive interviews, but focused mainly on an indirect measure of problems: differences in item distributions.
2 See Graesser et al. (2006) and http://www.memphis.edu/iis/projects/quaid.php.
3 See http://sqp.upf.edu/.
4 See Willis and Lessler (1999).
5 We designate p < 0.15 in addition to the more conventional significant (< 0.05) and borderline (< 0.10) designations because of our relatively small sample size (n = 52 and 36), but give the two results at that level less credence.
6 If the seven effects in the wrong direction that we described as flukes are added to the figure 1 model, three effects (the effects of SQP on adequate answers and on item nonresponse for subjective questions and the effect of cognitive interview meaning problems on adequate answers for objective questions) are nowhere near significant (p > 0.15). More importantly, the addition of the seven paths generally does not alter the main conclusions from figure 1, the sole exception being that the effect of QAS recall problems on response latency for subjective questions, which was clearly significant (p < 0.05), is not even near significant (p = 0.30). The only other changes involve either small shifts in p values that affect the number of asterisks to be assigned (the effect of cognitive interview meaning problems on response latency for subjective questions changes from p = 0.15 to p = 0.09, and the effect of cognitive interview recall problems on adequate answers for objective questions changes from p < 0.05 to p = 0.08) or larger shifts in p value that mean that a p < 0.15 effect, which we gave less credence, is far from significant (the effect for subjective items of QAS recall problems on adequate answers and of requests for clarification on item nonresponse).

REFERENCES
Alwin D. F. (2007), Margins of Error: A Study of Reliability in Survey Measurement. Hoboken, NJ: John Wiley and Sons, Inc.
Bassili J. N., Scott B. S. (1996), “Response Latency as a Signal to Question Problems in Survey Research.” Public Opinion Quarterly, 60, 390–399.
Beatty P., Willis G. (2007), “Research Synthesis: The Practice of Cognitive Interviewing.” Public Opinion Quarterly, 71, 287–311.
Blair J., Ackerman A., Piccinino L., Levenstein R. (2007), “Using Behavior Coding to Validate Cognitive Interview Findings.” in Proceedings of the ASA Section on Survey Research Methods. Alexandria, VA: American Statistical Association.
Draisma S., Dijkstra W. (2004), “Response Latency and (Para) Linguistic Expressions as Indicators of Response Error.” in Methods for Testing and Evaluating Survey Questionnaires, eds. Presser S., Rothgeb J. M., Couper M. P., Lessler J. T., Martin E. A., Martin J., Singer E., pp. 131–147, Hoboken, NJ: John Wiley and Sons.
Forsyth B., Rothgeb J. M., Willis G. B. (2004), “Does Pretesting Make a Difference? An Experimental Test.” in Methods for Testing and Evaluating Survey Questionnaires, eds. Presser S., Rothgeb J. M., Couper M. P., Lessler J. T., Martin E. A., Martin J., Singer E., pp. 525–546, Hoboken, NJ: John Wiley and Sons.
Fowler F. J. (2011), “Coding the Behavior of Interviewers and Respondents to Evaluate Survey Questions.” in Question Evaluation Methods: Contributing to the Science of Data Quality, eds. Madans J., Miller K., Maitland A., Willis G., pp. 7–21, Hoboken, NJ: Wiley.
Gerber E., Wellens T. (1997), “Perspectives on Pretesting: ‘Cognition’ in the Cognitive Interview?” Bulletin de Methodologie Sociologique, 55, 18–39.
Graesser A. C., Cai Z., Louwerse M. M., Daniel F. (2006), “Question Understanding Aid (QUAID): A Web Facility That Tests Question Comprehensibility.” Public Opinion Quarterly, 70, 3–22.
Hess J., Singer E., Bushery J. (1999), “Predicting Test-Retest Reliability from Behavior Coding.” International Journal of Public Opinion Research, 11, 346–360.
Maitland A., Presser S. (2016), “How Accurately Do Different Evaluation Methods Predict the Reliability of Survey Questions?” Journal of Survey Statistics and Methodology, 4, 362–381.
Miller K. (2011), “Cognitive Interviewing” in Question Evaluation Methods, eds. Madans J., Miller K., Maitland A., Willis G., pp. 51–76. New York: John Wiley and Sons.
Miller K., Chepp V., Willson S., Padilla J. L. (2014), Cognitive Interviewing Methodology. New York: John Wiley and Sons.
Pickery J., Loosveldt G. (2001), “An Exploration of Question Characteristics that Mediate Interviewer Effects on Item Nonresponse.” Journal of Official Statistics, 17, 337–350.
Presser S., Blair J. (1994), “Survey Pretesting: Do Different Methods Produce Different Results?” Sociological Methodology, 24, 73–104.
Presser S., Rothgeb J. M., Couper M. P., Lessler J. T., Martin E., Martin J., Singer E. (2004), Methods for Testing and Evaluating Survey Questionnaires. Hoboken, NJ: John Wiley and Sons.
Rothgeb J. M., Willis G. B., Forsyth B. H. (2001), “Questionnaire Pretesting Methods: Do Different Techniques and Different Organizations Produce Similar Results?” in Proceedings of the ASA Section on Survey Research Methods. Alexandria, VA: American Statistical Association.
Saris W., Gallhofer I. (2007), Design, Evaluation, and Analysis of Questionnaires for Survey Research. Hoboken, NJ: John Wiley and Sons Inc.
Shoemaker P., Eicholz M., Skewes E. (2002), “Item Nonresponse: Distinguishing between Don’t Know and Refuse.” International Journal of Public Opinion Research, 14, 193–201.
Willis G. B. (2015), Analysis of the Cognitive Interview in Questionnaire Design Understanding Qualitative Research. New York: Oxford University Press.
Willis G., Lessler J. (1999), The BRFSS-QAS: A Guide for Systematically Evaluating Survey Question Wording. Rockville, MD: Research Triangle Institute.
Willis G. B., Schechter S. (1997), “Evaluation of Cognitive Interviewing Techniques: Do Results Generalize to the Field?” Bulletin de Methodologie Sociologique, 55, 40–66.
Willis G. B., Schechter S., Whitaker K. (1999), “A Comparison of Cognitive Interviewing, Expert Review, and Behavior Coding: What Do They Tell Us?” in Proceedings of the ASA Section on Survey Research Methods. Alexandria, VA: American Statistical Association.
Yan T., Kreuter F., Tourangeau R. (2012), “Evaluating Survey Questions: A Comparison of Methods.” Journal of Official Statistics, 28, 503–529.

Appendix A. Evaluated Questions (Numbers refer to the order in the questionnaire)

#  Objective Questions
13  Do you have any family members or close friends who are serving or did serve in Iraq? [yes, no]
22  During the last two years, did you work as a volunteer for a political candidate running for national, state, or local office and got no pay at all or only a very small amount of pay for your work? [yes, no]
23  During the last two years, did you contribute money to a political candidate, a political party, a political action committee, or any other organization that supported political candidates? [yes, no]
24  During the last two years, did you work with others in your community or neighborhood to deal with some issue or problem? [yes, no]
25  During the last two years, did you contact a government official in person, by phone, or by letter about a problem or issue? [yes, no]
26  During the last two years, did you ever work as a volunteer for a political candidate running for national, state, or local office and got no pay at all or only a very small amount of pay for your work, or did you never do this? [did, never did this]
27  During the last two years, did you ever contribute money to a political candidate, a political party, a political action committee, or any other organization that supported political candidates, or did you never do this? [did, never did this]
28  During the last two years, did you ever work with others in your community or neighborhood to deal with some issue or problem, or did you never do this? [did, never did this]
29  During the last two years, did you ever contact a government official in person, by phone, or by letter about a problem or issue, or did you never do this? [did, never did this]
30  During the last two years, did you or did you not work as a volunteer for a political candidate running for national, state, or local office and got no pay at all or only a very small amount of pay for your work? [did, did not]
31  During the last two years, did you or did you not contribute money to a political candidate, a political party, a political action committee, or any other organization that supported political candidates? [did, did not]
32  During the last two years, did you or did you not work with others in your community or neighborhood to deal with some issue or problem? [did, did not]
33  During the last two years, did you or did you not contact a government official in person, by phone, or by letter about a problem or issue? [did, did not]
45  Are you or anyone in your household currently employed by a school or educational institution? [yes, no]
46  Have you ever been told by a doctor or other health professional that you had arthritis, also called rheumatism? [yes, no]
47  Have you ever been told by a doctor or other health professional that you had a heart problem? [yes, no]
48  Have you ever been told by a doctor or other health professional that you had hypertension, also called high blood pressure? [yes, no]
49  Have you ever been told by a doctor or other health professional that you had diabetes? [yes, no]
50  Have you ever been told by a doctor or other health professional that you had a kidney, bladder, or renal problem? [yes, no]
51  Have you ever been told by a doctor or other health professional that you had Multiple Sclerosis (MS), or Muscular Dystrophy (MD)? [yes, no]
55  In the last 12 months, about how often did you go to a theater to see a movie? [at least once a week, a few times a month, about once a month, a few times a year, once or twice a year, never]
56  In the last 12 months, about how often did you eat in a restaurant, not including take-out? [at least once a week, a few times a month, about once a month, a few times a year, once or twice a year, never]
57  In the last 12 months, about how often did you exercise, including walking for fitness, gardening, or running? [at least once a week, a few times a month, about once a month, a few times a year, once or twice a year, never]
58  In the last 30 days, how many times did you go to a theater to see a movie? [open-numeric]
59  In the last 30 days, how many times did you eat in a restaurant, not including take-out? [open-numeric]
60  In the last 30 days, how many times did you exercise, including walking for fitness, gardening, or running? [open-numeric]
61  Since January 2006, how many times have you seen a doctor, a dentist, or other health care professional about your own health at a doctor’s office, a clinic, or some other place? [open-numeric]
62  How did you arrive at your answer? [recall each visit and count them, estimate from how often you usually see a doctor, or just guess]
74  In what year were you born?
75  What is the highest level of education that you have completed?
83  Are you Spanish, Hispanic, or Latino?
84  What is your race? Would you say you are White, Black or African-American, Asian, [includes: Asian Indian, Chinese, Filipino, Japanese, Korean, Vietnamese] Pacific Islander, [includes: Native Hawaiian, Guamanian, Samoan], American Indian or Alaska Native, or Some other race?
85  Are you now married, living with a partner, widowed, divorced, separated, or never married?
86  Including yourself, how many people live in your home?
87  How many of these people are age 18 and under?
88  How many of them are currently in school?
#  Subjective Questions  1  Do you think the war in Iraq has helped, hurt, or had no effect on the image of the United States in the world? [helped, hurt, had no effect]  2  Do you think the Iraq war will turn out to be another Vietnam? [yes, no]  3  Over the next year, do you think that the U.S. military in Iraq will suffer more casualties or fewer casualties than it did in the last year? [more, fewer, the same (if volunteered)]  4  When do you think the United States will withdraw all of its troops from Iraq? [in less than a year, one to 3 years from now, more than 3 years from now]  5  Do you think Osama bin Laden is currently planning an attack against the United States? [yes, no]  6  Do you believe the US and its allies will defeat the Al Qaeda terrorist network? [yes, no]  7  How worried are you that there will be another terrorist attack on the United States? [very worried, somewhat worried, not very worried, not worried at all]  8  How worried are you that you or someone in your family will become a victim of terrorism in the United States? [very worried, somewhat worried, not very worried, not worried at all]  9  How do you now feel about continued US military involvement in the Iraq war? [strongly favor it, somewhat favor it, somewhat oppose it, strongly oppose it]  10  How do you now feel about continued US military involvement in the Iraq war? [favor, oppose]  11  How important is the Iraq war to you? [very important, somewhat important, not too important, not important at all]  12  Would you say your views on the Iraq war are mainly on one side of the issue, or are your views about this issue mixed? [mainly on one side, mixed]  14  How important do you think wiretapping is in maintaining the security of the United States? [very important, somewhat important, not very important, not at all important]  15  How do you feel about the government’s monitoring of telephone calls in the United States as a way to reduce the threat of terrorism? 
[strongly approve it, somewhat approve it, neither approve nor disapprove, somewhat disapprove, strongly disapprove it]  16  How important is it to you that the government protects Americans’ right to privacy? [very important, somewhat important, not very important, not at all important]  17  How concerned are you about losing your right to privacy as a result of the steps taken by the government to fight terrorism? [very concerned, somewhat concerned, not very concerned, not at all concerned]  18  How much do you favor or oppose the President authorizing wiretaps of Americans without prior court approval? [strongly favor it, somewhat favor it, somewhat oppose it, strongly oppose it]  19  Do you favor or oppose the President authorizing wiretaps of Americans without prior court approval? [favor, oppose]  20  How important is the wiretapping issue to you? [very important, somewhat important, not too important, not important at all]  21  Would you say your views on the wiretapping issue are mainly on one side of the issue, or are your views about this issue mixed? [mainly on one side, mixed]  34  How important do you think mathematics training in our elementary schools is to the economic success of the United States? [very important, somewhat important, not very important, not at all important]  35  Do you think having good mathematics skills has a positive effect, a negative effect, or no effect at all on the job opportunities available to a recent high school graduate? [positive effect, negative effect, no effect]  36  How much do you think a person’s standard of living in America depends on having good mathematics skills? [a lot on math skills, somewhat, not much, not at all on math skills]  37  Do you think that the mathematics skills of American elementary school students are better, worse, or about the same as those of elementary school students in countries such as Singapore and Japan? 
[better, worse, about the same]  38  How important do you think reading and writing training in our elementary schools is to the economic success of the United States? [very important, somewhat important, not very important, not at all important]  39  Do you think having good reading and writing skills has a positive effect, a negative effect, or no effect at all on the job opportunities available to a recent high school graduate? [positive effect, negative effect, no effect]  40  How much do you think a person’s standard of living in America depends on having good reading and writing skills? [a lot on reading and writing skills, somewhat, not much, not at all on reading and writing skills]  41  Do you think that the reading and writing skills of American elementary school students are better, worse, or about the same as those of elementary school students in countries such as Singapore and Japan? [better, worse, about the same]  42  If there were only resources for only one of these programs, which would you prefer the mathematics program or the reading and writing program? [mathematics, reading and writing]  43  How important is this choice between mathematics versus reading and writing to you? [very important, somewhat important, not very important, not at all important]  44  Would you say your views on the choice between more attention to mathematics versus reading and writing are mainly on one side of the issue, or are your views about this issue mixed? [mainly on one side, mixed]  52  How satisfied are you currently with your life as a whole? [very satisfied, somewhat satisfied, neither satisfied nor dissatisfied, somewhat dissatisfied, very dissatisfied]  53  Would you say that your physical health in general is excellent, very good, good, fair, or poor? [excellent, very good, good, fair, poor]  54  Would you say that your health in general is excellent, very good, good, fair, or poor? 
[excellent, very good, good, fair, poor]  63  How certain are you that you have seen a doctor, dentist or other health care professional [INSERT ANSWER TO 63 OR “zero” IF 63=0] times since January 2006? [very certain, somewhat certain, somewhat uncertain, or very uncertain]  64  How likely is it that you will eat fatty foods in the next couple of weeks? [very likely, somewhat likely, not very likely, or not likely at all]  65  How likely is it that you will eat sweets in the next couple of weeks? [very likely, somewhat likely, not very likely, or not likely at all]  66  How likely is it that you will not eat fatty foods in the next couple of weeks? [very likely, somewhat likely, not very likely, or not likely at all]  67  How likely is it that you will not eat sweets in the next couple of weeks? [very likely, somewhat likely, not very likely, or not likely at all]  68  How likely is it that you will avoid eating fatty foods in the next couple of weeks? [very likely, somewhat likely, not very likely, or not likely at all]  69  How likely is it that you will avoid eating sweets in the next couple of weeks? [very likely, somewhat likely, not very likely, or not likely at all]  70  How likely is it that you will eat fresh fruit, such as apples, strawberries, watermelon, or bananas, in the next couple of weeks? [very likely, somewhat likely, not very likely, or not likely at all]  71  How likely is it that you will eat fresh vegetables, such as lettuce, tomatoes, peppers, or spinach, in the next couple of weeks? [very likely, somewhat likely, not very likely, or not likely at all]  72  How likely is it that you will eat fresh fruit in the next couple of weeks? [very likely, somewhat likely, not very likely, or not likely at all]  73  How likely is it that you will eat fresh vegetables in the next couple of weeks? 
[very likely, somewhat likely, not very likely, or not likely at all]  76  Generally speaking, do you usually think of yourself as a Republican, a Democrat, an Independent, or something else?  77  Would you call yourself a strong Republican, or a not very strong Republican?  78  Would you call yourself a strong Democrat, or a not very strong Democrat?  79  Do you think of yourself as closer to the Republican party, or the Democratic party?  80  When it comes to politics, do you usually think of yourself as liberal, middle of the road, conservative, or haven’t you thought much about this?  81  Would you say you are extremely liberal, liberal, or slightly liberal?  82  Would you say you are extremely conservative, conservative, or slightly conservative?
Appendix A. Evaluated Questions (Numbers refer to the order in the questionnaire)
#  Objective Questions  13  Do you have any family members or close friends who are serving or did serve in Iraq? [yes, no]  22  During the last two years, did you work as a volunteer for a political candidate running for national, state, or local office and got no pay at all or only a very small amount of pay for your work?
[yes, no]  23  During the last two years, did you contribute money to a political candidate, a political party, a political action committee, or any other organization that supported political candidates? [yes, no]  24  During the last two years, did you work with others in your community or neighborhood to deal with some issue or problem? [yes, no]  25  During the last two years, did you contact a government official in person, by phone, or by letter about a problem or issue? [yes, no]  26  During the last two years, did you ever work as a volunteer for a political candidate running for national, state, or local office and got no pay at all or only a very small amount of pay for your work, or did you never do this? [did, never did this]  27  During the last two years, did you ever contribute money to a political candidate, a political party, a political action committee, or any other organization that supported political candidates, or did you never do this? [did, never did this]  28  During the last two years, did you ever work with others in your community or neighborhood to deal with some issue or problem, or did you never do this? [did, never did this]  29  During the last two years, did you ever contact a government official in person, by phone, or by letter about a problem or issue, or did you never do this? [did, never did this]  30  During the last two years, did you or did you not work as a volunteer for a political candidate running for national, state, or local office and got no pay at all or only a very small amount of pay for your work? [did, did not]  31  During the last two years, did you or did you not contribute money to a political candidate, a political party, a political action committee, or any other organization that supported political candidates? [did, did not]  32  During the last two years, did you or did you not work with others in your community or neighborhood to deal with some issue or problem? 
[did, did not]  33  During the last two years, did you or did you not contact a government official in person, by phone, or by letter about a problem or issue? [did, did not]  45  Are you or anyone in your household currently employed by a school or educational institution? [yes, no]  46  Have you ever been told by a doctor or other health professional that you had arthritis, also called rheumatism? [yes, no]  47  Have you ever been told by a doctor or other health professional that you had a heart problem? [yes, no]  48  Have you ever been told by a doctor or other health professional that you had hypertension, also called high blood pressure? [yes, no]  49  Have you ever been told by a doctor or other health professional that you had diabetes? [yes, no]  50  Have you ever been told by a doctor or other health professional that you had a kidney, bladder, or renal problem? [yes, no]  51  Have you ever been told by a doctor or other health professional that you had Multiple Sclerosis (MS), or Muscular Dystrophy (MD)? [yes, no]  55  In the last 12 months, about how often did you go to a theater to see a movie? [at least once a week, a few times a month, about once a month, a few times a year, once or twice a year, never]  56  In the last 12 months, about how often did you eat in a restaurant, not including take-out? [at least once a week, a few times a month, about once a month, a few times a year, once or twice a year, never]  57  In the last 12 months, about how often did you exercise, including walking for fitness, gardening, or running? [at least once a week, a few times a month, about once a month, a few times a year, once or twice a year, never]  58  In the last 30 days, how many times did you go to a theater to see a movie? [open-numeric]  59  In the last 30 days, how many times did you eat in a restaurant, not including take-out? [open-numeric]  60  In the last 30 days, how many times did you exercise, including walking for fitness, gardening, or running? 
[open-numeric]  61  Since January 2006, how many times have you seen a doctor, a dentist, or other health care professional about your own health at a doctor’s office, a clinic, or some other place? [open-numeric]  62  How did you arrive at your answer? [recall each visit and count them, estimate from how often you usually see a doctor, or just guess]  74  In what year were you born?  75  What is the highest level of education that you have completed?  83  Are you Spanish, Hispanic, or Latino?  84  What is your race? Would you say you are White, Black or African-American, Asian, [includes: Asian Indian, Chinese, Filipino, Japanese, Korean, Vietnamese] Pacific Islander, [includes: Native Hawaiian, Guamanian, Samoan], American Indian or Alaska Native, or Some other race?  85  Are you now married, living with a partner, widowed, divorced, separated, or never married?  86  Including yourself, how many people live in your home?  87  How many of these people are age 18 and under?  88  How many of them are currently in school?
#  Subjective Questions  1  Do you think the war in Iraq has helped, hurt, or had no effect on the image of the United States in the world? [helped, hurt, had no effect]  2  Do you think the Iraq war will turn out to be another Vietnam? [yes, no]  3  Over the next year, do you think that the U.S. military in Iraq will suffer more casualties or fewer casualties than it did in the last year? [more, fewer, the same (if volunteered)]  4  When do you think the United States will withdraw all of its troops from Iraq?
[in less than a year, one to 3 years from now, more than 3 years from now]  5  Do you think Osama bin Laden is currently planning an attack against the United States? [yes, no]  6  Do you believe the US and its allies will defeat the Al Qaeda terrorist network? [yes, no]  7  How worried are you that there will be another terrorist attack on the United States? [very worried, somewhat worried, not very worried, not worried at all]  8  How worried are you that you or someone in your family will become a victim of terrorism in the United States? [very worried, somewhat worried, not very worried, not worried at all]  9  How do you now feel about continued US military involvement in the Iraq war? [strongly favor it, somewhat favor it, somewhat oppose it, strongly oppose it]  10  How do you now feel about continued US military involvement in the Iraq war? [favor, oppose]  11  How important is the Iraq war to you? [very important, somewhat important, not too important, not important at all]  12  Would you say your views on the Iraq war are mainly on one side of the issue, or are your views about this issue mixed? [mainly on one side, mixed]  14  How important do you think wiretapping is in maintaining the security of the United States? [very important, somewhat important, not very important, not at all important]  15  How do you feel about the government’s monitoring of telephone calls in the United States as a way to reduce the threat of terrorism? [strongly approve it, somewhat approve it, neither approve nor disapprove, somewhat disapprove, strongly disapprove it]  16  How important is it to you that the government protects Americans’ right to privacy? [very important, somewhat important, not very important, not at all important]  17  How concerned are you about losing your right to privacy as a result of the steps taken by the government to fight terrorism? 
[very concerned, somewhat concerned, not very concerned, not at all concerned]  18  How much do you favor or oppose the President authorizing wiretaps of Americans without prior court approval? [strongly favor it, somewhat favor it, somewhat oppose it, strongly oppose it]  19  Do you favor or oppose the President authorizing wiretaps of Americans without prior court approval? [favor, oppose]  20  How important is the wiretapping issue to you? [very important, somewhat important, not too important, not important at all]  21  Would you say your views on the wiretapping issue are mainly on one side of the issue, or are your views about this issue mixed? [mainly on one side, mixed]  34  How important do you think mathematics training in our elementary schools is to the economic success of the United States? [very important, somewhat important, not very important, not at all important]  35  Do you think having good mathematics skills has a positive effect, a negative effect, or no effect at all on the job opportunities available to a recent high school graduate? [positive effect, negative effect, no effect]  36  How much do you think a person’s standard of living in America depends on having good mathematics skills? [a lot on math skills, somewhat, not much, not at all on math skills]  37  Do you think that the mathematics skills of American elementary school students are better, worse, or about the same as those of elementary school students in countries such as Singapore and Japan? [better, worse, about the same]  38  How important do you think reading and writing training in our elementary schools is to the economic success of the United States? [very important, somewhat important, not very important, not at all important]  39  Do you think having good reading and writing skills has a positive effect, a negative effect, or no effect at all on the job opportunities available to a recent high school graduate? 
[positive effect, negative effect, no effect]  40  How much do you think a person’s standard of living in America depends on having good reading and writing skills? [a lot on reading and writing skills, somewhat, not much, not at all on reading and writing skills]  41  Do you think that the reading and writing skills of American elementary school students are better, worse, or about the same as those of elementary school students in countries such as Singapore and Japan? [better, worse, about the same]  42  If there were only resources for only one of these programs, which would you prefer the mathematics program or the reading and writing program? [mathematics, reading and writing]  43  How important is this choice between mathematics versus reading and writing to you? [very important, somewhat important, not very important, not at all important]  44  Would you say your views on the choice between more attention to mathematics versus reading and writing are mainly on one side of the issue, or are your views about this issue mixed? [mainly on one side, mixed]  52  How satisfied are you currently with your life as a whole? [very satisfied, somewhat satisfied, neither satisfied nor dissatisfied, somewhat dissatisfied, very dissatisfied]  53  Would you say that your physical health in general is excellent, very good, good, fair, or poor? [excellent, very good, good, fair, poor]  54  Would you say that your health in general is excellent, very good, good, fair, or poor? [excellent, very good, good, fair, poor]  63  How certain are you that you have seen a doctor, dentist or other health care professional [INSERT ANSWER TO 63 OR “zero” IF 63=0] times since January 2006? [very certain, somewhat certain, somewhat uncertain, or very uncertain]  64  How likely is it that you will eat fatty foods in the next couple of weeks? [very likely, somewhat likely, not very likely, or not likely at all]  65  How likely is it that you will eat sweets in the next couple of weeks? 
[very likely, somewhat likely, not very likely, or not likely at all]  66  How likely is it that you will not eat fatty foods in the next couple of weeks? [very likely, somewhat likely, not very likely, or not likely at all]  67  How likely is it that you will not eat sweets in the next couple of weeks? [very likely, somewhat likely, not very likely, or not likely at all]  68  How likely is it that you will avoid eating fatty foods in the next couple of weeks? [very likely, somewhat likely, not very likely, or not likely at all]  69  How likely is it that you will avoid eating sweets in the next couple of weeks? [very likely, somewhat likely, not very likely, or not likely at all]  70  How likely is it that you will eat fresh fruit, such as apples, strawberries, watermelon, or bananas, in the next couple of weeks? [very likely, somewhat likely, not very likely, or not likely at all]  71  How likely is it that you will eat fresh vegetables, such as lettuce, tomatoes, peppers, or spinach, in the next couple of weeks? [very likely, somewhat likely, not very likely, or not likely at all]  72  How likely is it that you will eat fresh fruit in the next couple of weeks? [very likely, somewhat likely, not very likely, or not likely at all]  73  How likely is it that you will eat fresh vegetables in the next couple of weeks? [very likely, somewhat likely, not very likely, or not likely at all]  76  Generally speaking, do you usually think of yourself as a Republican, a Democrat, an Independent, or something else?  77  Would you call yourself a strong Republican, or a not very strong Republican?  78  Would you call yourself a strong Democrat, or a not very strong Democrat?  79  Do you think of yourself as closer to the Republican party, or the Democratic party?  80  When it comes to politics, do you usually think of yourself as liberal, middle of the road, conservative, or haven’t you thought much about this?  
81  Would you say you are extremely liberal, liberal, or slightly liberal?  82  Would you say you are extremely conservative, conservative, or slightly conservative?

Author notes

Aaron Maitland conducted this research as a senior survey methodologist at Westat and is currently a branch chief with the National Center for Health Statistics, Hyattsville, MD, USA. Stanley Presser is a Distinguished University Professor in the Joint Program in Survey Methodology and Department of Sociology at the University of Maryland, College Park, MD, USA.

© The Author(s) 2017. Published by Oxford University Press on behalf of the American Association for Public Opinion Research. All rights reserved. For Permissions, please email: journals.permissions@oup.com

Journal of Survey Statistics and Methodology, Oxford University Press

Publisher: Oxford University Press
ISSN: 2325-0984
eISSN: 2325-0992
DOI: 10.1093/jssam/smx036

Abstract

This paper tests two hypotheses about how well five different methods, namely the Question Understanding Aid (QUAID), the Survey Quality Predictor (SQP), expert review, the Questionnaire Appraisal System (QAS), and cognitive interviews, predict problems (as measured by missing data, behavior codes, and response latency) that occur in typical survey conditions. We find partial support for both the complementary methods hypothesis (using the evaluation methods together will yield the best prediction of problems) and the test environment hypothesis (the more directly the evaluation method observes the response process, the better it will predict problems). In addition, we find evidence that the methods perform somewhat differently for items measuring subjective as opposed to objective characteristics.

1. INTRODUCTION

Questionnaire designers rely on various methods to assess their work. Yet studies have found that different methods of evaluating questions often lead to different conclusions (Presser and Blair, 1994; Willis, Schechter, and Whitaker, 1999; Rothgeb, Willis, and Forsyth, 2001; Yan, Kreuter, and Tourangeau, 2012). Thus it is important to understand how well the various question evaluation methods predict problems observed in typical survey conditions.

As far as we know, however, only two studies have explored this issue. Forsyth, Rothgeb, and Willis (2004) examined how well assessments of twelve questions (by cognitive interviews, expert review, and the Questionnaire Appraisal System) predicted observable problems (as assessed by behavior coding of interviewer-respondent interactions) when the questions were administered in the field. And Blair, Ackerman, Piccinino, and Levenstein (2007) examined how well cognitive interview assessments of twenty-four questions predicted the items’ performance in the field (again, as assessed by behavior coding of interviewer-respondent interactions).¹

We explore this issue in greater depth by comparing the performance of five testing methods (cognitive interviews, expert review, the Questionnaire Appraisal System, and two computer-based approaches, the Survey Quality Predictor and the Question Understanding Aid) in predicting four problems (respondent requests for clarification, initial adequate answers, response latency, and missing data) that arose during the administration of eighty-eight questions in a random digit dial telephone survey. In addition, we examine whether the methods performed differently for items that measured objective versus subjective matters.

Our analysis is guided by two hypotheses. The first hypothesis stems from the finding that evaluation methods often disagree in their diagnoses of problems. This is frequently interpreted to mean that the methods are complementary and therefore it is better to use multiple methods together (Presser, Rothgeb, Couper, Lessler, Martin et al. 2004; Yan, Kreuter, and Tourangeau, 2012). We call this the “complementary methods hypothesis.”

Our second hypothesis, the “test environment hypothesis,” proposes that methods that more closely observe the response process have an advantage over those that observe the process less closely or not at all. The response process is set within a sociocultural context, and some approaches, such as cognitive interviews, allow the researcher to observe the process in that context (Gerber and Wellens 1997; Miller 2011). Expert review does not directly observe the process, but draws on prior experience and research that may be informed by contextual considerations. The Questionnaire Appraisal System draws less on context, and computerized evaluations do so the least.

2. METHODS

Our data come from the 2006 Joint Program in Survey Methodology (JPSM) Survey Practicum. The JPSM Survey Practicum is a two-semester course in which graduate students gain experience developing a questionnaire, sampling a population, collecting and analyzing data, and reporting results. The aim of the 2006 Practicum was to examine the reliability of survey responses. The questionnaire included four types of questions: two types of attitudinal questions (those asking about relatively familiar issues, the Iraq war and wiretapping, and those asking about an unfamiliar issue, a new school-based program in mathematics or English); quasi-attitudinal questions (e.g., self-ratings of health); behavioral questions (e.g., doctor visits, trips to movies and to restaurants); and demographic questions.

A random digit dial sample of the noninstitutionalized adult population with landlines in the contiguous United States yielded 739 interviews (an AAPOR Response Rate 2 of 24.8 percent). The computer-assisted interviewing was conducted by Westat telephone interviewers during the summer of 2006.

2.1 Independent Variables

The eighty-eight questions shown in Appendix A were evaluated using a variety of methods. The appendix separates the fifty-two subjective measures from the thirty-six objective ones, as this distinction has been shown to affect problems answering questions (e.g., Alwin 2007). Thus, we present our analyses separately for the two kinds of items.

2.1.1 Computer-based systems

We used two computer-based systems. The first, the Question Understanding Aid (QUAID),² is based on computational models developed in the fields of computer science, computational linguistics, discourse processing, and cognitive science. The software identifies technical features of questions that have the potential to cause comprehension problems.
It rates questions on five classes of comprehension problems: unfamiliar technical terms, vague or imprecise predicate or relative terms, vague or imprecise noun phrases, complex syntax, and working memory overload. QUAID identifies these problems by comparing the words in a question to several databases (e.g., Coltheart’s MRC Psycholinguistic Database).

The second automated system, the Survey Quality Predictor (SQP), which is based on a meta-analysis of multitrait-multimethod (MTMM) studies, predicts the reliability, validity, method effects, and total quality of questions (Saris and Gallhofer 2007). Total quality is the product of reliability and validity. To use SQP, each question is coded according to the variables from the MTMM studies. One of the authors coded the questions using SQP 2.0.³

2.1.2 Expert Review

Three reviewers, each of whom had either a PhD in survey methodology or a related discipline or more than five years of experience as a survey researcher, were given the questions and the following instructions:

Question wordings, introductions associated with questions, and response categories are considered in scope for this evaluation. For each survey question, identify and briefly explain each specific problem you find. Please type a brief description of the problem immediately following the question in the attached document. You may observe multiple problems with a question. Please describe each one. You do not need to type anything after questions for which you do not observe a problem.

This is an unstructured expert review, as it did not provide a checklist or other specific guidance about how to conduct the evaluation. As far as we are aware, the relative frequency of structured versus unstructured expert reviews is not known. Likewise, we know of no evidence about the frequency with which subject matter experts are included with survey experts.
2.1.3 Forms Appraisal

Students from a JPSM graduate level course on questionnaire design were asked to evaluate the questions using the Questionnaire Appraisal System (QAS).⁴ Students were assigned different sections of the questionnaire and each was asked to evaluate independently whether the questions had any of twenty-six potential problems. The form also called for a brief description of each problem found.

2.1.4 Cognitive Interviews

About a month later, the same students who did the QAS coding conducted cognitive interviews of the questions. Ten students did cognitive interviews using only questions they had not coded with the QAS. The other seven students did cognitive interviews in which about one-third of the questions were questions they had coded with the QAS. In the language of Willis (2015), these constituted a single round (as opposed to an iterative round) of reparative (as opposed to interpretive) interviews.

Each student was instructed to develop a cognitive protocol, including think-aloud exercises and probes, and then to interview four subjects, recruited from among their friends, neighbors, co-workers, or other convenient populations. Thus, there was substantial variability in the nature of the interviews (e.g., in the balance between concurrent and retrospective probing and in the content of the preplanned probes). All interviews were recorded so that the students could review the recordings when preparing reports that listed the problems they diagnosed (in their own words). Both the reports and audio tapes were turned in, but our analysis is based only on the reports.

There are many ways to conduct cognitive interviews (e.g., Beatty, Paul, and Willis 2007; Miller, Chepp, Willson, and Padilla 2014; Willis 2015). Our approach differs from the commonly recommended one of using a single protocol for each round of interviews.
We know of no estimates of how often this approach is used as opposed to our approach (in which the protocols vary across interviews). But even with a single protocol, there is apt to be significant variability in how it is implemented. Indeed, cognitive interviewing is inherently unstandardized because “the objective is not to produce cookie-cutter responses to standard stimuli, but to enable our participants to provide rich elaborated information” (Willis 2015). Based partly on the finding that organizations using different protocols to test a common questionnaire arrived at similar conclusions (Willis, Schechter, and Whitaker 1999), we think our approach is likely to have yielded problems similar to those that would have been produced using a single protocol, though (as we recommend in our conclusion) this is an issue for future research.

2.1.5 Problem Coding

The problems identified from QUAID, QAS, expert review, and cognitive interviews were coded according to the scheme used by Presser and Blair (1994), which has four basic categories of problems: respondent semantic, respondent task, interviewer, and analysis. (SQP yields a single number that does not identify particular problem types.)

Respondent semantic problems refer to respondents having difficulty understanding or remembering a question or having diverse understandings of the question’s meaning. They are divided into two types: problems due to the structure of the question or the questionnaire (for instance, item wordiness or connections between questions) and those due to the meaning of terms or concepts in the question. Respondent task problems are of three types: difficulty recalling information or formulating an answer; insufficient response categories; and question sensitivity. Interviewer problems refer to problems reading the question or difficulty understanding how to implement a question.
Analysis problems involve difficulties confronted during data analysis (e.g., lack of variation in responses).

Two research assistants decided which of the Presser-Blair categories best fit each of the 262 problems identified by the cognitive interviews. In a preliminary sample of similar problems double-coded by these research assistants, the overall inter-coder agreement, as measured by Cohen’s kappa, was 0.76, and the kappas by category were respondent semantic, 0.91; respondent task, 0.73; interviewer, 0.68; and analysis, 0.49. The 155 problems identified by expert review were coded into the Presser-Blair categories by the first author. A crosswalk (shown in table 1) was used to systematically code the QUAID and QAS problems into Presser-Blair categories. The first author then determined which problems matched across methods and assigned an identifier to each problem.

Table 1. Crosswalk between QUAID, QAS, and Presser-Blair Codes

| Presser-Blair category | Presser-Blair code | QUAID | QAS |
|---|---|---|---|
| Semantic I: Structure of question or questionnaire | Information overload | Working memory overload | |
| | Structure/organization | Complex syntax | 2a-b: Conflicting or inaccurate instructions, complicated instructions; 3a: Wording: complex syntax |
| | Transition problem | | |
| Semantic II: Meaning of words | Boundary lines | Vague or imprecise relative or technical term, vague or ambiguous noun phrase | 3c-d: Vague, reference period |
| | Technical term not understood | Unfamiliar technical term | 3b & 7c: Technical term |
| | Common term not understood | | |
| | Double-barreled | | 4c: Double barreled |
| Respondent Task I: Recall/Response Formation | Recall/response difficult | | 5d: Computation |
| | Recall/response impossible | | 5a-c: Knowledge, Attitude, Recall |
| | Recall/response redundant | | |
| | Recall/response resisted | | 4a-b: Inappropriate assumptions, Assumes constant behavior |
| Respondent Task II: Response Categories | Overlapping | Vague or imprecise relative or technical term or noun phrase | 7d-e: Vague, Overlapping |
| | Insufficient | | 7a & f: Open ended, Missing |
| | Too fine distinction | | |
| | Inappropriate | | 7b: Mismatch |
| Respondent Task III: Sensitivity | Sensitivity | | 6a-c: Sensitive content, Sensitive wording, Socially acceptable |
| Interviewer | Procedural | | |
| | Reading problem | | 1a-c: What to read, Missing information, How to read |
| | Coding answers to open | | |
| Analysis | Question answered same by all respondents | | |
| | Question suggests answers | | |
| | Acquiescence | | |
| | Order of response categories | | |
| | | | 8: Other |

2.2 Dependent Variables

Our measures of problems in the field are of three kinds: two behavior codes, response latency, and missing data.

Behavior coding involves the coding of behaviors that indicate a breakdown or potential problem with the question-answer process (Fowler 2011). Two results from behavior coding that are commonly used to assess survey questions are the percentage of respondents who provide an adequate answer to a question and the percentage of respondents who request clarification of the question. Although these behaviors may not always stem from problems with a question, Hess, Singer, and Bushery (1999) found that they were significant predictors of the reliability of questions.

Response latency refers to how long it takes respondents to answer a question. Like behavior coding, response latencies provide a quantitative assessment of the difficulty respondents have with a question. The assumption is that problems with a question lead to slower response times, because resolution of the problems requires additional time (Bassili and Scott 1996).
Draisma and Dijkstra (2004) show that longer response latencies were related to inaccurate responses.

Item nonresponse is one of the most widely used indicators of data quality. Research suggests that question sensitivity and the cognitive effort needed to answer the question are two of the most important determinants of item nonresponse (Pickery and Loosveldt 2001; Shoemaker, Eicholz, and Skewes 2002).

2.2.1 Behavior coding

Resource constraints led us to behavior code only a random subsample of 377 of the 739 interviews. Two research assistants coded 292 interviews and the first author coded eighty-five interviews. The coding scheme included interviewer codes (assessing the extent to which the interviewer read the question exactly and whether or not the interviewer probed or repeated the question) and respondent codes (indicating the adequacy of the respondent’s initial answer, whether the respondent requested clarification, whether the respondent used pauses or fillers, and whether the respondent interrupted the reading of the question). Approximately six percent of the cases were double coded, and Cohen’s kappa indicated very high agreement (0.9) for the two codes we use in our analysis: initial adequate answers and requests for clarification.

2.2.2 Response latency

Again, due to resource constraints, only the first 111 cases that were behavior coded were selected for response latency measurement: the elapsed time from the end of the interviewer’s reading of a question to the beginning of the respondent’s answer. One research assistant coded eighty-eight interviews, and the first author coded the remaining twenty-three interviews. The 111 cases were asked an average of forty-six questions (many of the survey’s eighty-eight questions were part of wording experiments asked of random half samples), for a total of 5,102 possible response latencies. From these we dropped about five percent (287) because the respondent interrupted the reading of the question.
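The latency preparation described in this subsection (dropping latencies from interrupted readings, capping extreme values at the 1st and 99th percentiles, and taking logs) can be illustrated with a minimal sketch. It is not the authors' code. The record layout, the function names, and the linear-interpolation percentile rule are all assumptions made for illustration, since the paper does not say how percentiles were computed. The natural log is also an assumption, though it is consistent with the log-millisecond values reported in table 2.

```python
import math


def percentile(sorted_vals, p):
    """Percentile p (0-100) of a pre-sorted list, using linear interpolation.

    The interpolation rule is an assumption; the paper does not specify one.
    """
    if not sorted_vals:
        raise ValueError("no values")
    rank = (len(sorted_vals) - 1) * p / 100.0
    lo = math.floor(rank)
    hi = math.ceil(rank)
    frac = rank - lo
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * frac


def prepare_latencies(records, lower_pct=1.0, upper_pct=99.0):
    """Return natural-log latencies after dropping interruptions and capping tails.

    records: iterable of (latency_ms, interrupted) pairs. This layout is
    hypothetical. Latencies are assumed to be strictly positive.
    """
    # Drop latencies where the respondent interrupted the question reading.
    kept = [ms for ms, interrupted in records if not interrupted]
    ordered = sorted(kept)
    low = percentile(ordered, lower_pct)
    high = percentile(ordered, upper_pct)
    # Set values beyond the cutoffs to the cutoff values themselves.
    capped = [min(max(ms, low), high) for ms in kept]
    # Log transform for use in statistical models.
    return [math.log(ms) for ms in capped]
```

Capping rather than deleting extreme latencies keeps every non-interrupted observation in the analysis while limiting the influence of the tails, which matches the treatment the text describes.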
The latencies were highly skewed, and thus, we assigned observations beyond the upper and lower one percentile the exact values of the upper and lower one percentile, respectively. We then log transformed the latencies for use in our statistical models. 2.2.3 Item nonresponse We coded responses to each survey question as zero if the respondent provided a valid answer and one if the respondent gave an invalid answer such as “don’t know” or refused to answer. 3. RESULTS 3.1 Descriptive Statistics Our primary focus is how well the assessments from the ex-ante (QUAID, SQP, Expert Review, and QAS) and laboratory (cognitive interview) methods predicted problems observed under actual survey conditions, as measured by behavior coding, item nonresponse, and response latency. Descriptive statistics for the four dependent variables are shown in table 2. As can be seen there, nonresponse was higher and more variable to the subjective items than to the objective ones. The means and variances of the remaining three dependent variables were relatively similar across the two types of items. Table 2. 
Means, Standard Deviations, and Ranges for Dependent Variables   Objective questions (n = 36)   Subjective questions (n = 52)   Dependent variable  Mean  Standard deviation  Min  Max  Mean  Standard deviation  Min  Max  % adequate answers  84.06  14.03  41.97  97.16  74.35  10.2  40  91.8  % requests for clarification  5.62  5.48  0.53  25.33  7.35  5.85  0  32.76  Log response latency (milliseconds)  7.16  0.63  6.15  8.86  7.36  0.47  6.2  8.41  % item nonresponse  0.92  0.88  0  2.94  4.51  5.94  0  23.08    Objective questions (n = 36)   Subjective questions (n = 52)   Dependent variable  Mean  Standard deviation  Min  Max  Mean  Standard deviation  Min  Max  % adequate answers  84.06  14.03  41.97  97.16  74.35  10.2  40  91.8  % requests for clarification  5.62  5.48  0.53  25.33  7.35  5.85  0  32.76  Log response latency (milliseconds)  7.16  0.63  6.15  8.86  7.36  0.47  6.2  8.41  % item nonresponse  0.92  0.88  0  2.94  4.51  5.94  0  23.08  Table 2. Means, Standard Deviations, and Ranges for Dependent Variables   Objective questions (n = 36)   Subjective questions (n = 52)   Dependent variable  Mean  Standard deviation  Min  Max  Mean  Standard deviation  Min  Max  % adequate answers  84.06  14.03  41.97  97.16  74.35  10.2  40  91.8  % requests for clarification  5.62  5.48  0.53  25.33  7.35  5.85  0  32.76  Log response latency (milliseconds)  7.16  0.63  6.15  8.86  7.36  0.47  6.2  8.41  % item nonresponse  0.92  0.88  0  2.94  4.51  5.94  0  23.08    Objective questions (n = 36)   Subjective questions (n = 52)   Dependent variable  Mean  Standard deviation  Min  Max  Mean  Standard deviation  Min  Max  % adequate answers  84.06  14.03  41.97  97.16  74.35  10.2  40  91.8  % requests for clarification  5.62  5.48  0.53  25.33  7.35  5.85  0  32.76  Log response latency (milliseconds)  7.16  0.63  6.15  8.86  7.36  0.47  6.2  8.41  % item nonresponse  0.92  0.88  0  2.94  4.51  5.94  0  23.08  With the exception of the SQP score, the independent 
variables in the model refer to the number of times that a method diagnosed a particular problem type (see table 1) for a question. Descriptive statistics for each of these diagnoses (as well as for the SQP score) are shown in table 3. Overall, QAS is most apt to identify problems, followed by cognitive interviews, expert review, and QUAID.

Table 3. Means, Standard Deviations, and Ranges for Independent Variables

                            Objective questions (n = 36)    Subjective questions (n = 52)
Variable                    Mean    SD      Min     Max     Mean    SD      Min     Max
Semantic I problems: question structure
  QUAID                     0.61    0.55    0       2       0.54    0.58    0       2
  Expert review             0.50    0.81    0       3       0.25    0.56    0       2
  QAS                       0.83    1.08    0       4       0.56    0.85    0       3
  Cognitive interviews      0       0       0       0       0.19    0.49    0       2
Semantic II problems: meaning
  QUAID                     2.06    1.41    0       5       1.46    1.13    0       4
  Expert review             0.61    0.84    0       3       0.87    0.89    0       3
  QAS                       3.56    2.32    0       8       3.38    1.59    0       8
  Cognitive interviews      1.11    1.14    0       5       2.00    2.04    0       8
Respondent task I problems: recall/judgment
  QUAID                     –       –       –       –       –       –       –       –
  Expert review             0.61    0.64    0       2       0.21    0.46    0       2
  QAS                       1.19    1.31    0       5       2.06    1.84    0       6
  Cognitive interviews      0.67    1.55    0       7       0.87    1.51    0       6
Respondent task II problems: response categories
  QUAID                     0.11    0.52    0       3       0.23    0.43    0       1
  Expert review             0.17    0.45    0       2       0.04    0.19    0       1
  QAS                       0.25    0.65    0       3       0.92    1.08    0       3
  Cognitive interviews      0.19    0.71    0       3       0.27    0.6     0       3
Respondent task III problems: sensitivity
  QUAID                     –       –       –       –       –       –       –       –
  Expert review             0       0       0       0       0.15    0.41    0       2
  QAS                       0.78    0.76    0       3       0.85    0.80    0       3
  Cognitive interviews      0.44    0.81    0       2       0.04    0.19    0       1
Interviewer problems
  QUAID                     –       –       –       –       –       –       –       –
  Expert review             0       0       0       0       0       0       0       0
  QAS                       0.08    0.28    0       1       0.02    0.14    0       1
  Cognitive interviews      0       0       0       0       0       0       0       0
Analysis problems
  QUAID                     –       –       –       –       –       –       –       –
  Expert review             0.11    0.32    0       1       0.08    0.27    0       1
  QAS                       0       0       0       0       0.06    0.31    0       2
  Cognitive interviews      0       0       0       0       0       0       0       0
Other method
  SQP total quality score   0.6     0.06    0.49    0.71    0.56    0.05    0.46    0.63

Note: QUAID entries with a “–” represent a problem that method is not designed to identify.

3.2 Multivariate Results

Table 4 shows the Pearson correlations among the diagnoses for the objective items, and table 5 shows the same correlations for the subjective items.
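As a rough illustration of the statistic behind tables 4 and 5, the sketch below computes a Pearson correlation between two methods' per-question diagnosis counts, along with the t statistic commonly used to test a correlation against zero. The counts are made-up illustrative values, not data from this study, and the function names are hypothetical. Reading the asterisks as two-tailed p < .05 is also an assumption; under it, the critical t for the 36 objective items (34 degrees of freedom) is roughly 2.03. The sketch returns None when one method's counts never vary, since the correlation is undefined in that case.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Zero variance (e.g., a method that never flags this problem type):
        # the correlation is undefined.
        return None
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical diagnosis counts for six questions (illustrative only).
expert_review = [0, 1, 0, 2, 1, 3]
qas = [1, 2, 0, 3, 1, 4]

r = pearson_r(expert_review, qas)
print(round(r, 3), round(t_statistic(r, len(qas)), 3))

# With n = 36, compare each result against the assumed critical value of about 2.03.
for candidate in (0.32, 0.33):
    print(candidate, round(t_statistic(candidate, 36), 3))
```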
There are fewer entries in the former table as, among the objective items, cognitive interviews never identified a question structure problem, expert review never identified a sensitivity problem, and QAS never identified an analysis problem. For the objective items, among the 19 pairs of methods in which more than one method made a particular diagnosis (bolded in the table), 12 of the associations are statistically significant, with the significant correlations ranging between .33 and .81 and all being in the expected direction. For the subjective items, among the 25 pairs of methods in which more than one method made a particular diagnosis (bolded in the table), only 9 of the associations are statistically significant, with the significant correlations ranging between .29 and .68 and one of them being in the wrong direction. Thus consistent with findings from prior studies, there is only modest agreement between evaluation methods, though the agreement is higher for objective items than for subjective ones.

Table 4. Pearson Correlations among the Method Assessments for the Objective Items

Column numbers refer to the numbered row variables.

Variable            1      2      3      4      5      6      7      8      9      10     11     12     13     14     15     16     17     18     19     20     21     22
Semantic I
   1. QUAID                0.39*  0.51*  –      0.36*  0.22   0.53*  0.39*  0.37*  0.43*  0.35*  0.25   0.04   −0.12  0.20   –      −0.01  0.01   0.03   0.09   –      0.26
   2. Expert review               0.81*  –      0.17   0.08   0.41*  0.06   0.38*  0.18   −0.18  0.07   −0.16  −0.08  −0.12  –      −0.28  −0.35* 0.06   0.22   –      0.42*
   3. QAS                                –      0.42*  0.12   0.40*  0.11   0.31   0.27   0.00   0.19   −0.06  −0.06  −0.03  –      −0.25  −0.21  0.05   0.06   –      0.30
   4. Cog. int.                                 –      –      –      –      –      –      –      –      –      –      –      –      –      –      –      –      –      –
Semantic II
   5. QUAID                                            0.28   0.16   0.26   0.21   0.09   0.10   −0.09  0.12   −0.14  0.05   –      −0.12  0.20   −0.30  0.05   –      −0.03
   6. Expert review                                           0.64*  0.61*  0.03   0.15   −0.32  −0.16  −0.28  −0.29  −0.21  –      −0.18  0.01   −0.10  0.27   –      0.17
   7. QAS                                                            0.63*  0.21   0.32   −0.12  0.04   −0.31  −0.28  −0.15  –      −0.04  0.02   0.19   0.26   –      0.25
   8. Cog. int.                                                             −0.09  0.08   −0.11  −0.12  −0.21  −0.31  −0.06  –      −0.14  0.29   −0.12  0.12   –      −0.04
Respondent task I
   9. Expert review                                                                0.67*  0.44*  0.13   −0.07  −0.24  −0.02  –      −0.01  −0.26  −0.13  0.22   –      0.23
  10. QAS                                                                                 0.33*  0.13   −0.06  −0.26  −0.01  –      −0.04  −0.25  −0.12  0.22   –      0.35*
  11. Cog. int.                                                                                  0.58*  0.45*  −0.06  0.53*  –      0.20   0.19   −0.13  −0.15  –      −0.18
Respondent task II
  12. QUAID                                                                                             0.41*  −0.08  0.40*  –      0.14   0.02   −0.07  −0.08  –      −0.06
  13. Expert review                                                                                            0.25   0.52*  –      0.11   −0.05  −0.11  −0.13  –      −0.19
  14. QAS                                                                                                             0.08   –      0.46*  −0.22  −0.12  −0.14  –      −0.02
  15. Cog. int.                                                                                                              –      0.24   0.14   −0.08  −0.10  –      −0.11
Respondent task III
  16. Expert review                                                                                                                 –      –      –      –      –      –
  17. QAS                                                                                                                                  0.30   −0.04  0.22   –      −0.25
  18. Cog. int.                                                                                                                                   0.08   −0.20  –      −0.73*
Interviewer
  19. QAS                                                                                                                                                −0.11  –      −0.08
Analysis
  20. Expert review                                                                                                                                             –      0.24
  21. QAS                                                                                                                                                              –
Survey Quality Predictor
  22. SQP