The Importance of House Effects for Repeated Public Opinion Surveys

Results of public opinion surveys on the same topic can diverge for various reasons, for example, different survey timings, different operationalizations of the objects of investigation, different target populations, or the fact that the surveys are conducted by different survey agencies (“houses”). The latter phenomenon is conventionally referred to as “house effects” (Smith, 1978, 1982, 2011; Weisberg, 2005), which can occur even if the survey agencies use identical question wordings or target populations (cf., Converse & Traugott, 1986; Erikson & Wlezien, 2001; Flores, 2015; Lau, 1994; Traugott, 2005; Wlezien & Erikson, 2007). However, although the existence of house-related differences in survey results has been well known in public opinion research since Tom W. Smith’s two studies (Smith, 1978, 1982), little work has been done on the relevance of house effects in repeated surveys. Furthermore, the studies that investigated house effects in repeated surveys were based on secondary data which were primarily collected to measure public opinion on certain topics rather than to address methodological research questions (Smith, 1978; Wright, Farrar, & Russell, 2014).1 Thus, in these studies, it was impossible to control for factors influencing differences between houses, such as interviewer instructions, sampling procedures, the order and position of questions in the questionnaire, or basic parameters to be fulfilled by the houses during the fieldwork.

Against this background, we aim to contribute to resolving the issue of house effects by using primary data to address the methodological research question of whether house effects systematically affect the results of repeated public opinion surveys. We simultaneously commissioned two different survey agencies—one located in the south of Germany (hereinafter “Inst_South”), the other in the north of Germany2 (hereinafter “Inst_North”)—with the administration of computer-assisted telephone interviews (CATI) in Germany in 2015, 2016, and 2017. In each of the 3 years, the survey agencies received an identical questionnaire on respondents’ perceptions of the German energy transition (the Energiewende), identical interviewer and sampling instructions, and an identical set of basic parameters to be fulfilled during the fielding (cf., Supplementary Online File); the agencies were not informed about each other in order to ensure the meaningfulness of our analysis. To the best of our knowledge, house effects have not previously been examined in repeated public opinion surveys conducted over 3 years by the same two survey agencies using identical questionnaires and identical interviewer and sampling instructions.3 This standardization allows us to examine whether any differences in the survey results between the two agencies at a specific point in time are replicated at the other two time points; this is expected to increase confidence in our analysis.

The aim of our analysis is to explore which “equivalence problems” (Weisberg, 2005, p. 299) or “comparison errors” (Smith, 2011, p. 475) might be caused by house effects in repeated surveys. This is particularly important for assessing total survey quality, that is, “quality from both the producer and user perspectives” (Biemer, 2010, p. 818).
If data from repeated surveys are not comparable, they lack total survey quality even if they are accurate, because they are unfit for use (Biemer, 2010). Our analyses should provide insights for survey designers as to whether changing survey agencies for a repeated survey undermines the comparability of survey results over time and thereby diminishes the total survey quality.

Methods

In all three survey years, Inst_South completed 1,000 interviews. The American Association for Public Opinion Research (AAPOR) Response Rate 5 for the Inst_South surveys was 23.8% (2015), 23.0% (2016), and 23.4% (2017). Inst_North completed 1,013 (2015), 1,007 (2016), and 1,004 (2017) interviews, respectively. The AAPOR Response Rate 5 for Inst_North was 21.5% (2015), 18.4% (2016), and 18.7% (2017). Thus, in all 3 years, Response Rate 5 was higher and more stable over time for Inst_South than for Inst_North.4 In order to examine possible trends in house effects, we compared two indicators of data quality, namely survey satisficing and task simplification, as well as point estimates and correlations, across the two survey agencies for the 3 years 2015, 2016, and 2017.

Survey Satisficing

To measure survey satisficing, we chose three indicators: item nonresponse, acquiescence, and nondifferentiation (cf., Krosnick, 1991, 1999, 2000; Medway & Tourangeau, 2015). Item nonresponse was analyzed on the basis of all substantive variables in the survey (2015: 111 items, 2016: 115 items, and 2017: 114 items).5 For each respondent, we calculated the number of items that he or she left blank or for which he or she chose the “Don’t know” option. On the basis of all the observations (n = 6,024), we regressed (ordinary least squares) the total number of item nonresponses on a factor variable for “year” and a factor variable for “survey agency,” specifying an interaction between the two factor variables.6 Subsequent to the estimation of this model, we compared the average adjusted predictions of the survey agencies for each year using Stata’s “margins, contrast” routine (StataCorp, 2015b), where we accounted for multiple comparisons by applying the Sidak alpha correction method (Abdi, 2007).7 This analytical strategy was also applied to the following two indicators (i.e., acquiescence and nondifferentiation).

Acquiescence was measured on the basis of all items from the rating scales in a survey (2015: 41 items; 2016: 39 items; 2017: 61 items). For each survey agency and year, we counted the total number of responses to rating scales in the questionnaire where respondents agreed with the prescribed statement by choosing scale points 1–3 on the 7-point answer scale. In order to analyze acquiescence, we applied the same analytical strategy as with item nonresponse.

Nondifferentiation was measured on the basis of the item batteries in a survey (2015: 83 items in 5 item batteries; 2016: 86 items in 7 item batteries; 2017: 64 items in 10 item batteries). For each of the item batteries, we flagged respondents who chose an identical answer option for all items in the respective battery. For each respondent, we then calculated the number of item batteries in which he or she answered all items using an identical answer option. This variable can take values between zero (the respondent applied nondifferentiation to none of the item batteries) and the total number of item batteries in the respective survey year (the respondent applied nondifferentiation to all of them).
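The indicator construction and the per-year contrast strategy, which is shared by all three satisficing indicators, can be illustrated outside Stata. The following Python sketch is our own illustration rather than the authors’ code: it counts item nonresponse per respondent, compares the two agencies within each year with two-sample t-tests, and Sidak-adjusts the resulting p-values, which footnote 7 describes as equivalent to the margins-contrast approach for this model. The data layout, column names, and helper functions are assumptions.

```python
# A minimal sketch (not the authors' code): item-nonresponse counting and
# per-year agency contrasts with a Sidak correction. It assumes a DataFrame
# with one row per respondent, columns "year" and "agency", and item columns
# in which NaN stands for "blank" or "Don't know". All names are hypothetical.
import pandas as pd
from scipy import stats


def item_nonresponse(df: pd.DataFrame, item_cols: list) -> pd.Series:
    """Per respondent: number of items left blank or answered 'Don't know'."""
    return df[item_cols].isna().sum(axis=1)


def sidak(p: float, m: int) -> float:
    """Sidak-adjusted p-value for m comparisons."""
    return 1.0 - (1.0 - p) ** m


def yearly_contrasts(df, outcome, years=(2015, 2016, 2017),
                     agencies=("Inst_South", "Inst_North")):
    """Two-sample t-test of the outcome between the agencies within each year;
    per footnote 7, this mirrors the margins-based contrasts after the OLS
    model with a year-by-agency interaction."""
    out = {}
    for year in years:
        a = df.loc[(df["year"] == year) & (df["agency"] == agencies[0]), outcome]
        b = df.loc[(df["year"] == year) & (df["agency"] == agencies[1]), outcome]
        t_stat, p_raw = stats.ttest_ind(a, b)
        out[year] = {"difference": a.mean() - b.mean(),
                     "p_raw": p_raw,
                     "p_sidak": sidak(p_raw, len(years))}
    return out


# Hypothetical usage:
# item_cols = [c for c in df.columns if c.startswith("q")]
# df["nonresponse"] = item_nonresponse(df, item_cols)
# print(yearly_contrasts(df, "nonresponse"))
```

The same `yearly_contrasts` routine would be applied to per-respondent counts of acquiescent answers and of fully nondifferentiated item batteries.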
Task Simplification

Employees of a survey agency might be motivated to “reduce [the] time and effort necessary to complete interviews or to realize the required sample size” (in a process known as task simplification; Blasius & Thiessen, 2015, p. 481), for example, “by fabricating interviews through copy-and-paste procedures” (Blasius & Thiessen, 2015, p. 480). To measure task simplification, we screened the dataset for nonunique records (NUR) as applied by Slomczynski, Powalko, and Krauze (2017). Slomczynski et al. (2017) defined a NUR as a sequence of all values in a given case (record) that is identical to that of another case in the same dataset. For each survey agency, NUR screening was based on all substantive items in a survey year (2015: 105 items; 2016: 109 items; 2017: 107 items). In other words, variables capturing sociodemographic characteristics were not included in this analysis.

Comparisons of Point Estimates

Furthermore, we compared point estimates for the variables of the energy-related core questions, that is, questions posed in all survey years using identical wordings. These questions measured the respondents’ attitudes toward the use of energy sources, their understanding of different aspects of the Energiewende, their ranking of these aspects, and their awareness of energy topics. For the variables with a 7-point rating scale, we applied two-sample t-tests of the null hypothesis that the mean value of a particular variable in a given year does not differ between the two survey agencies. For the dichotomous variables, we calculated phi values based on chi-square statistics. For the ranking variables, as well as the trichotomous variables, we applied a chi-square test of homogeneity of the null hypothesis that the distribution of a particular variable is identical across the samples of both agencies. In all of these tests, we accounted for multiple comparisons by applying alpha adjustment according to Sidak (Abdi, 2007).

Comparison of Relationships

For each survey agency and year, we calculated Pearson correlations between respondents’ age and all 19 survey items that were measured on a 7-point rating scale. These items measured respondents’ attitudes toward the use of energy sources, interest in politics, satisfaction with political decisions, attitudes toward technology, and attitudes regarding the vulnerability of nature. Furthermore, for each year, we performed the asymptotic chi-square test for the equality of the two agency-specific correlation matrices introduced by Jennrich (1970) and described by Mardia, Kent, and Bibby (1979). The null hypothesis of this test assumes equal correlation matrices across both survey agencies; it was rejected in each year, as the resulting Jennrich chi-square test statistic was significant (p < .001, adjusted for three comparisons).

Results

Survey Satisficing

Item nonresponse

On average, participants in the survey conducted by Inst_South left 5.148 items blank in 2015, while participants in the survey conducted by Inst_North left 4.887 items blank (cf. Table 1, Panel 1). However, this tendency does not hold for all three survey years (Inst_South: 2016 = 0.379, 2017 = 0.315; Inst_North: 2016 = 0.366, 2017 = 0.375). While average item nonresponse in 2015 was significantly higher (p < .01) for Inst_South than for Inst_North, this indicator did not differ significantly between the agencies in 2016 and 2017 (p > .05).
Table 1. Indicators of Survey Satisficing

Panel 1: Item nonresponse—Average adjusted predicted number of items left blank
                                        Survey 2015   Survey 2016   Survey 2017
  Inst_South                                  5.148         0.379         0.315
  Inst_North                                  4.887         0.366         0.375
  Difference (Inst_South − Inst_North)        0.260         0.013         0.060
  p-value                                      .001          .997          .781
  Degrees of freedom                           2011          2005          2002
  Test power (α = .05)                         .605          .085          .749

Panel 2: Acquiescence—Average adjusted predicted number of items that respondents agreed with
                                        Survey 2015   Survey 2016   Survey 2017
  Inst_South                                 12.726        14.928        22.839
  Inst_North                                 14.162        17.215        24.560
  Difference (Inst_South − Inst_North)       −1.436        −2.287        −1.721
  p-value                                      .000          .000          .000
  Degrees of freedom                           2011          2005          2002
  Test power (α = .05)                        1.000         1.000         1.000

Panel 3: Nondifferentiation—Average number of item batteries whose items were answered by choosing the same answer category
                                        Survey 2015   Survey 2016   Survey 2017
  Inst_South                                  0.196         0.634         1.149
  Inst_North                                  0.014         0.252         0.478
  Difference (Inst_South − Inst_North)        0.182         0.382         0.671
  p-value                                      .000          .000          .000
  Degrees of freedom                           2011          2005          2002
  Test power (α = .05)                        1.000         1.000         1.000

Note. p-values based on two-sided hypotheses, adjusted for each indicator for three comparisons (Sidak).

Acquiescence

Respondents in the Inst_North survey (2015 = 14.162, 2016 = 17.215, 2017 = 24.560) agreed with significantly more items (p < .001) on average than respondents in the Inst_South survey (2015 = 12.726, 2016 = 14.928, 2017 = 22.839) in all 3 years (cf., Table 1, Panel 2). Thus, our results reveal that, on average, Inst_North respondents applied this response style significantly more often than Inst_South respondents in each year.

Nondifferentiation

The average number of item batteries in which an identical answer option was given for every item in the battery varied between 0.014 (Inst_North, 2015) and 1.149 (Inst_South, 2017), meaning that nondifferentiation rarely occurred in either survey agency (cf. Table 1, Panel 3). However, our results showed that in each year, respondents applied this response style significantly more often (p < .001) on average in the surveys conducted by Inst_South (2015 = 0.196, 2016 = 0.634, 2017 = 1.149) than in those conducted by Inst_North (2015 = 0.014, 2016 = 0.252, 2017 = 0.478).

Task simplification (NUR)

On the basis of all substantive variables of the surveys over the 3 years, we did not find any nonunique records for either survey agency.
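Screening for nonunique records in the sense of Slomczynski et al. (2017) amounts to checking whether any case shares an identical value sequence with at least one other case on all substantive items. Below is a minimal pandas sketch of this check; it is our illustration, not the authors’ code, and the column lists are assumptions.

```python
# A minimal sketch: flag nonunique records (NUR), i.e., cases whose values on
# all substantive items are identical to those of at least one other case in
# the same dataset (cf. Slomczynski et al., 2017). As in the paper,
# sociodemographic variables are excluded. All names are hypothetical.
import pandas as pd


def nonunique_records(df: pd.DataFrame, substantive_cols: list) -> pd.DataFrame:
    """Return every row that duplicates another row on the substantive items."""
    dupe_mask = df.duplicated(subset=substantive_cols, keep=False)
    return df[dupe_mask]


# Hypothetical usage, run separately per agency and survey year:
# nur_2015_south = nonunique_records(df_2015_south, item_cols_2015)
# print(len(nur_2015_south))  # the paper reports zero NUR throughout
```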
Comparison of point estimates

Figures 1 and 2 depict differences in the mean values of the various energy-related variables and in the percentage values, respectively, for questions asked by both agencies in every survey year (core questions). Figure 1 shows the differences in mean values for the survey items that measured respondents’ attitudes toward the use of different energy sources. Respondents’ answers to these items were registered on a 7-point rating scale ranging from 1 (absolutely opposed) to 7 (absolutely in favor). In 20 out of 24 cases, the difference was positive, meaning that Inst_South reported a higher mean value than Inst_North; the largest deviation of mean values amounted to about one scale point and related to attitudes toward nuclear energy (cf., Figure 1, nuclear energy in 2017). In 10 out of 24 t-tests on the differences between the mean values of the variables, the test, corrected for multiple comparisons, was significant (cf., Supplementary Table 8). Of the 10 significant differences, five occurred in 2017. This pattern points to a systematic rather than unsystematic difference in the survey results across the two survey agencies; it also results, in all three survey years, in higher correlations between the attitudes regarding renewable energy sources for Inst_South compared to Inst_North (cf., Supplementary Tables 12–17).

[Figure 1. Seven-point rating scales—differences in mean values of the results from the two survey organizations.]

[Figure 2. Dichotomous variables—percentage point differences in “yes” answers.]

In five survey items with a dichotomous answer scale, respondents were asked to state whether or not they associate certain aspects (cf., Supplementary Table 9) with the term Energiewende (1 meant “yes” and 0 meant “no”). Figure 2 depicts the differences between the two survey agencies in the proportion of respondents who associate a particular aspect with the term Energiewende. In 13 out of 15 cases, Inst_South reported a lower proportion of respondents who associated a particular aspect with the Energiewende than Inst_North. The largest difference amounts to nearly 19 percentage points in the proportion of “yes” answers (cf., Figure 2, reducing energy consumption in 2016). In 10 out of 15 comparisons, the proportion of “yes” answers differs significantly between the two agencies (cf., Supplementary Table 9). This pattern also points to a systematic rather than unsystematic difference in the survey results across the two survey agencies over the three survey years.

Subsequent to stating their understanding of different aspects of the Energiewende, respondents ranked five aspects of the Energiewende according to the importance they attach to each one (1 meant “This is the most important aspect of the Energiewende for me personally” and 5 meant “This is the least important aspect of the Energiewende for me personally”). Supplementary Figure 3 shows the proportion of respondents who ranked a particular aspect as most, second, third, fourth, and least important, respectively. For each of the five aspects, we applied a chi-square test of homogeneity for each year, taking multiple comparisons into account.
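These item-level comparisons of categorical outcomes follow the same recipe for each item and year: build the agency-by-category table, run a chi-square test of homogeneity, and Sidak-adjust the p-value. The sketch below is an illustration under assumed column names, not the authors’ code; for dichotomous items it also reports phi computed from the chi-square statistic.

```python
# A minimal sketch: chi-square test of homogeneity between the two agencies for
# one categorical item in one survey year, with a Sidak adjustment and, for
# 2x2 tables, the phi coefficient (phi = sqrt(chi2 / n)). Names are hypothetical.
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def homogeneity_test(df: pd.DataFrame, item: str, year: int, n_comparisons: int = 3):
    """Test whether the item's distribution is identical across the agencies."""
    sub = df[df["year"] == year]
    table = pd.crosstab(sub["agency"], sub[item])    # agencies x answer categories
    chi2, p_raw, dof, _ = chi2_contingency(table)
    n = table.to_numpy().sum()
    phi = float(np.sqrt(chi2 / n)) if table.shape == (2, 2) else None
    return {"chi2": chi2, "dof": dof,
            "p_sidak": 1.0 - (1.0 - p_raw) ** n_comparisons,
            "phi": phi}


# Hypothetical usage for one dichotomous association item:
# print(homogeneity_test(df, item="assoc_reducing_consumption", year=2016))
```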
Supplementary Table 10 shows the results of the 15 tests. We can see that the null hypothesis, according to which the distribution of rankings for a particular aspect is identical across the samples, was rejected for three aspects in 2015 (phasing out nuclear power, affordability of electricity, and increasing use of renewable energy sources), four aspects in 2016 (phasing out nuclear power, reducing energy consumption, affordability of electricity, and increasing use of renewable energy sources), and all five aspects in 2017. Hence, the differences in the ranking variables between the two agencies in the three survey years appear to be systematic rather than random.

Finally, we also examined agency-specific differences in participants’ responses to seven survey items that asked about their awareness of seven energy topics. Respondents’ answers were registered on a trichotomous scale (1 = “No, never heard of it,” 2 = “Yes, heard of it but know nothing or hardly anything about it,” and 3 = “Yes, heard of it and know quite a bit or a lot about it”). Supplementary Figures 4a and 4b depict the distribution of responses over the three categories for each of the seven survey items by survey year and agency. We performed a chi-square test of homogeneity for each of the trichotomous variables in each year, taking the multiple comparisons into account. Supplementary Table 11 shows that the chi-square test of homogeneity rejected the null hypothesis, according to which the distribution of the particular trichotomous item was identical across the samples of both agencies, in 16 out of 21 cases. Interestingly, in 2015, the chi-square test of homogeneity was significant for only two out of seven trichotomous variables (carbon capture and storage, and vehicle-to-grid), while in the other 2 years it was significant for all trichotomous variables. This points to systematic differences between the two survey agencies in 2016 and 2017.

Comparison of relationships

While the analyses above aimed to examine potential differences between point estimates across the two agencies, in this section we turn to the analysis of potential differences in the relationships between variables, as examined by Smith (1982). Supplementary Tables 12–17 report the correlations among respondents’ age and all 19 survey items that were measured on a 7-point rating scale. In each of the 3 years, the average absolute deviation of the resulting 190 pairwise correlations across the two agencies amounted to 0.11. The null hypothesis that the correlation matrices of the two survey agencies are equal was rejected in each year, as the resulting Jennrich chi-square test statistic was significant (p < .001, adjusted for three comparisons). The year-specific correlation matrices for Inst_North exhibited very low correlations among all variables: for this agency, the largest positive correlation among the 20 variables amounted to r = .14 in 2016 (cf., Supplementary Table 15), while the largest negative correlation in absolute terms amounted to r = .13 in 2015 (cf., Supplementary Table 13). With regard to the correlations among the renewable energy sources, the comparison of point estimates showed that support for renewable energy sources is systematically higher for Inst_South respondents than for Inst_North respondents; accordingly, the correlations among these variables are also systematically higher for Inst_South than for Inst_North.
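The descriptive part of this comparison—the agency-specific correlation matrices and the average absolute deviation over the 190 variable pairs—is straightforward to reproduce; the Jennrich (1970) test itself is available in standard statistical software (for example, the cortest.jennrich function in the R psych package) and is not re-implemented in the sketch below, which is our illustration under assumed column names rather than the authors’ code.

```python
# A minimal sketch: agency-specific Pearson correlation matrices among age and
# the 19 rating-scale items, and the mean absolute difference between the two
# matrices over the 190 distinct variable pairs (20 variables). The Jennrich
# (1970) chi-square test of matrix equality is not implemented here.
# All names are hypothetical.
import numpy as np
import pandas as pd


def corr_matrix(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Pearson correlation matrix for the given columns."""
    return df[cols].corr(method="pearson")


def mean_abs_deviation(r1: pd.DataFrame, r2: pd.DataFrame) -> float:
    """Average |r1 - r2| over the lower-triangular (distinct) variable pairs."""
    diff = (r1 - r2).to_numpy()
    rows, cols = np.tril_indices_from(diff, k=-1)    # 190 pairs for 20 variables
    return float(np.mean(np.abs(diff[rows, cols])))


# Hypothetical usage for one survey year:
# cols = ["age"] + rating_scale_items               # 20 variables in total
# r_south = corr_matrix(df_south_2015, cols)
# r_north = corr_matrix(df_north_2015, cols)
# print(mean_abs_deviation(r_south, r_north))       # the paper reports about 0.11
```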
Discussion

The results from our study revealed house effects regarding indicators of data quality as well as point estimates and correlations. In line with the findings of Smith (1978, 1982), the data from the two survey agencies differed in 2015 regarding item nonresponse. Moreover, in all survey years we also detected house-specific differences in acquiescence and nondifferentiation. These house-related differences raise the question of whether interviewer training differs between the two survey agencies to such an extent that it results in different styles of interviewing, thus encouraging respondents to a different extent to answer “don’t know,” to respond in the affirmative, or to provide very similar answers to the questions. This might be the case even though both survey agencies are members of the business association of private-sector market and social research agencies in Germany (Arbeitskreis Deutscher Markt- und Sozialforschungsinstitute [ADM]) and therefore should follow the ADM guidelines, which were developed for, among other things, telephone surveys.8 Since the guidelines for telephone surveys do not include any regulations on how interviewer training should be conducted, it is in principle possible that such training differs perceptibly between ADM member agencies and leads to agency-specific interview styles.

The comparison of point estimates revealed systematic house effects with respect to the 7-point rating scales on attitudes regarding the use of energy sources, the dichotomous answer scales on associations with the Energiewende, and different mean ranking scores for the five aspects of the Energiewende. This means that the data from the two houses convey a different picture of the understanding and assessment of the Energiewende among German citizens. Furthermore, in most cases the data from the two agencies showed statistically different distributions of the trichotomous scales in every survey year and different developments of these variables over time. This means that the surveys report different levels of awareness of energy topics among the German public as well as different developments of that awareness over time. Finally, the correlation matrices of Inst_South and Inst_North provided different information about the relationships between respondents’ age and their attitudes toward the use of energy sources, interest in politics, satisfaction with political decisions, attitudes toward technology, and attitudes regarding the vulnerability of nature.

The fact that the surveys from Inst_South and Inst_North differ in terms of data quality and also convey different information about support for energy sources, understanding and assessment of the Energiewende, awareness of energy topics, and relationships between attitudes toward energy sources, political interest, general values, and age might, first, (further) undermine confidence in survey agencies that conduct public surveys. Second, depending on which survey results were used, the implementation of the Energiewende might be undertaken differently by political decision makers, with perceptibly different consequences.9

One potential limitation of our study is that our findings on house effects cannot be generalized to all survey agencies in Germany that are members of the ADM, because we selected the two survey agencies according to purposive (non-random) rules, which are probably very close to the selection rules survey designers use in practice.
For a generalization of our results, it would have been necessary to consider an appropriate number of survey agencies, selected from the list of all ADM members using a probability-based sampling technique. This, however, was not financially feasible given the extensive questionnaires. Despite this limitation regarding the generalizability of our results, the strength of our study is that it addresses the importance of house effects in a longitudinal design using identical questionnaires and identical interviewer instructions.

Conclusions

The results of our comparative study showed that house effects can systematically affect the results of repeated public opinion surveys. This leads to the assumption that changing survey agencies for repeated surveys would considerably influence the total survey quality: even if the survey data were accurate, they would become unfit for use, because the data would not be comparable over time. Furthermore, if survey results differ from one survey agency to the next—even if the agencies use identical questionnaires with identical instructions for interviewers and fulfill the same basic parameters and conditions for the administration of the surveys—this will further reduce confidence in agencies conducting public surveys. However, further studies investigating the importance of house effects in repeated surveys are necessary to complement our results and assess the generalizability of our findings.

Diana Schumann is a social scientist and senior researcher at Forschungszentrum Juelich GmbH, Institute of Energy and Climate Research, Systems Analysis and Technology Evaluation (IEK-STE). Her main field of research is public perception of energy systems’ transformation.

Hawal Shamon is a social scientist working as a senior researcher at Forschungszentrum Juelich GmbH, Institute of Energy and Climate Research, Systems Analysis and Technology Evaluation (IEK-STE). His research is focused on social acceptance of energy systems’ transformation and survey methodology.

Jürgen-Friedrich Hake received his diploma in mathematics/physics and is head of the Institute of Energy and Climate Research, Systems Analysis and Technology Evaluation (IEK-STE). His research and development activities focus on the assessment of energy systems and energy technologies in the context of societal guiding principles, e.g., sustainable development, as well as on the transformation of energy systems.

Footnotes

1 Smith (1978) compared results from repeated public opinion surveys (e.g., the National Opinion Research Center’s General Social Survey) conducted by four survey agencies at several points in time. Wright et al. (2014) investigated polling accuracy for multiparty elections using data from final pre-election polls in New Zealand published by major pollsters for three different election years.

2 The selection of the survey organizations, the mode of data collection, the predefined parameters for data collection and conditions for administering the surveys, as well as information about the questionnaires, numbers of interviewers, and costs of the surveys, are included in the Supplementary Online File.
3 When planning our research design, we followed the suggestion of Tom W. Smith in his 1978 study: “(…) in planning replication studies, close attention should be given to minimizing such possible (procedural) effects by duplicating as far as possible, not just question wording but interviewer specifications, question placement, coding rules, and other features” (Smith, 1978, pp. 458–459).

4 The sampling procedures applied by the survey agencies, the results of our validation of the demographic representativeness of the survey samples, as well as the values of AAPOR Cooperation Rate 1, Cooperation Rate 3, Refusal Rate 3, and Contact Rate 3 for both survey agencies, are included in the Supplementary Online File.

5 For 2017, item nonresponse was calculated on the basis of 114 instead of 115 items, because one item was an open-text field provided for those respondents whose housing situation did not fit the answer categories offered in this item (residual category).

6 Since we only observed 3 years, year was treated as a variable with three categories (k = 3). When a categorical variable is defined as a factor variable in Stata (14.2), the statistical program automatically creates dummy variables for (k − 1) categories. Furthermore, defining categorical variables as factor variables makes it possible to specify interactions between different categorical variables with parsimonious program code (StataCorp, 2015a). We also estimated models in which we took into account respondents’ sociodemographic characteristics (i.e., age, education, and gender). However, the results of these models did not differ substantially from the results reported in this study.

7 In the context of the particular model that we estimated, the described analytical strategy is identical to performing, for each year, a t-test on the mean difference in item nonresponse between the two survey agencies and adjusting for multiple comparisons. However, by following the described analytical strategy, the alpha adjustment is performed by the statistical program rather than by hand.

8 Cf., ADM website (https://www.adm-ev.de/en/, accessed: 2019-04-02).

9 For example, if political decision makers were to assume, on the basis of the results from Inst_North, that the German population assesses the affordability of electricity as not very important to them personally, they might underestimate the role that rising electricity prices could play in citizens’ everyday lives, discussed as “energy poverty” (cf., e.g., Bouzarovski, 2014). Conversely, if, on the basis of the results from Inst_South, political decision makers were aware that German citizens assess the affordability of electricity as very important to themselves, politicians might realize that the successful implementation of the Energiewende depends, among other aspects, on ensuring affordable electricity for all citizens in Germany.

References

Abdi, H. (2007). The Bonferroni and Šidák corrections for multiple comparisons. In N. J. Salkind (Ed.), Encyclopedia of measurement and statistics (pp. 103–107). Thousand Oaks, CA: Sage.

Biemer, P. P. (2010). Total survey error: Design, implementation, and evaluation. Public Opinion Quarterly, 74(5), 817–848.

Blasius, J., & Thiessen, V. (2015). Should we trust survey data? Assessing response simplification and data fabrication. Social Science Research, 52, 479–493.

Bouzarovski, S. (2014). Energy poverty in the European Union: Landscapes of vulnerability. Wiley Interdisciplinary Reviews: Energy and Environment, 3(3), 276–289.
Converse, P. E., & Traugott, M. W. (1986). Assessing the accuracy of polls and surveys. Science, 234(4780), 1094–1098.

Erikson, R. S., & Wlezien, C. (2001). Presidential polls as a time series: The case of 1996. Public Opinion Quarterly, 63(2), 163–177.

Flores, A. R. (2015). Examining variation in surveying attitudes on same-sex marriage: A meta-analysis. Public Opinion Quarterly, 79(2), 580–593.

Jennrich, R. I. (1970). An asymptotic χ2 test for the equality of two correlation matrices. Journal of the American Statistical Association, 65(330), 904–912.

Krosnick, J. A. (1991). Response strategies for coping with the cognitive demands of attitude measures in surveys. Applied Cognitive Psychology, 5(3), 213–236.

Krosnick, J. A. (1999). Survey research. Annual Review of Psychology, 50, 537–567.

Krosnick, J. A. (2000). The threat of satisficing in surveys: The shortcuts respondents take in answering questions. Survey Methods Newsletter, 20(1), 4–8.

Lau, R. R. (1994). An analysis of the accuracy of "trial heat" polls during the 1992 presidential election. Public Opinion Quarterly, 58(1), 2–20.

Mardia, K. V., Kent, J. T., & Bibby, J. M. (1979). Multivariate analysis. London: Academic Press.

Medway, R. L., & Tourangeau, R. (2015). Response quality in telephone surveys: Do prepaid cash incentives make a difference? Public Opinion Quarterly, 79(2), 524–543.

Slomczynski, K. M., Powalko, P., & Krauze, T. (2017). Non-unique records in international survey projects: The need for extending data quality control. Survey Research Methods, 11(1), 16.

Smith, T. W. (1978). In search of house effects: A comparison of responses to various questions by different survey organizations. Public Opinion Quarterly, 42(4), 443–463.

Smith, T. W. (1982). House effects and the reproducibility of survey measurements: A comparison of the 1980 GSS and the 1980 American National Election Study. Public Opinion Quarterly, 46(1), 54–68.

Smith, T. W. (2011). Refining the total survey error perspective. International Journal of Public Opinion Research, 23(4), 464–484.

StataCorp. (2015a). Stata 14. Base reference manual. 11.4.3 Factor variables. College Station, TX: Stata Press.

StataCorp. (2015b). Stata 14. Base reference manual. Margins postestimation—Postestimation tools for margins. College Station, TX: Stata Press.

Traugott, M. W. (2005). The accuracy of the national preelection polls in the 2004 presidential election. Public Opinion Quarterly, 69(5), 642–654.

Weisberg, H. F. (2005). The total survey error approach. Chicago: The University of Chicago Press.

Wlezien, C., & Erikson, R. S. (2007). The horse race: What polls reveal as the election campaign unfolds. International Journal of Public Opinion Research, 19(1), 74–88.
Wright, M. J., Farrar, D. P., & Russell, D. F. (2014). Polling accuracy in a multiparty election. International Journal of Public Opinion Research, 26(1), 113–124.

© The Author(s) 2019. Published by Oxford University Press on behalf of The World Association for Public Opinion Research. All rights reserved. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model).

The Importance of House Effects for Repeated Public Opinion Surveys

Loading next page...
 
/lp/oxford-university-press/the-importance-of-house-effects-for-repeated-public-opinion-surveys-H2K99eHsvP

References (21)

Publisher
Oxford University Press
Copyright
© The Author(s) 2019. Published by Oxford University Press on behalf of The World Association for Public Opinion Research. All rights reserved.
ISSN
0954-2892
eISSN
1471-6909
DOI
10.1093/ijpor/edz039
Publisher site
See Article on Publisher Site

Abstract

Results of public opinion surveys on the same topic can diverge for various reasons, for example, different survey timings, different operationalizations of the objects of investigation, different target populations, or the fact that the surveys are conducted by different survey agencies (“houses”). The latter phenomenon is conventionally referred to as “house effects” (Smith, 1978, 1982, 2011; Weisberg, 2005), which can occur even if the survey agencies use identical question wordings or target populations (cf., Converse & Traugott, 1986; Erikson & Wlezien, 2001; Flores, 2015; Lau, 1994; Traugott, 2005; Wlezien & Erikson, 2007). However, although the existence of house-related differences in survey results has been well known in public opinion research since the two studies by Tom W. Smith from 1978 and 1982, respectively (Smith, 1978, 1982), little work has been done on the relevance of house effects in repeated surveys. Furthermore, the studies that investigated house effects in repeated surveys were based on secondary data which were primarily collected to measure public opinion on certain topics rather than to address methodological research questions (Smith, 1978; Wright, Farrar, & Russell, 2014).1 Thus, in these studies, it was impossible to control for factors influencing differences between houses such as interviewer instructions, sampling procedures, order and position of questions in the questionnaire or basic parameters to be fulfilled by the houses during the field work. Against this background, we aim to contribute to resolving the issue of house effects by using primary data to address the methodological research question of whether house effects systematically affect the results of repeated public opinion surveys. We simultaneously commissioned two different survey agencies—one located in the south of Germany (and thus hereinafter referred to as “Inst_South”), the other in the north of Germany2 (and thus hereinafter referred to as “Inst_North”)—with the administration of computer-assisted telephone interviews (CATI) in Germany in 2015, 2016, and 2017. In each of the 3 years, the survey agencies received an identical questionnaire on respondents’ perceptions of the German energy transition (known as the Energiewende), identical interviewer as well as sampling instructions, an identical set of basic parameters to be fulfilled during the fielding (cf., Supplementary Online File), and were not informed about each other in order to ensure the meaningfulness of our analysis. To the best of our knowledge, house effects have not been examined up to now in repeated public opinion surveys conducted over 3 years by the same two survey agencies using identical questionnaires and identical interviewer and sampling instructions.3 This standardization allows us to examine whether any differences in the survey results between the two agencies at a specific point in time are replicated over the other two time points; this factor is expected to increase confidence in our analysis. The aim of our analysis is to explore which “equivalence problems” (Weisberg, 2005, p. 299) or “comparison errors” (Smith, 2011, p. 475) might be caused by house effects in repeated surveys. This is particularly important for assessing the total survey quality, that is, “quality from both the producer and user perspectives” (Biemer, 2010, p. 818). If data from repeated surveys are not comparable, they lack total survey quality even if they are accurate, because they are unfit for use (Biemer, 2010). 
Our analyses should provide insights for survey designers as to whether changing survey agencies for a repeated survey undermines the comparability of survey results over time and thereby diminishes the total survey quality. Methods In all three survey years, Inst_South completed 1,000 interviews. American Association for Public Opinion Research (AAPOR) Response Rate 5 for the surveys was, for Inst_South, 23.8% (2015), 23.0% (2016) and 23.4% (2017). Inst_North completed 1013 (2015), 1007 (2016) and 1004 (2017) interviews, respectively. The AAPOR Response Rate 5 was, for Inst_North, 21.5% (2015), 18.4% (2015), and 18.7% (2017). Thus, in all 3 years, Response Rate 5 was higher and over time more stable for Inst_South than for Inst_North.4 In order to examine possible trends in house effects, we compared two indicators of data quality, that is, survey satisficing and task simplification as well as point estimates and correlations across the two survey agencies for the 3 years 2015, 2016, and 2017. Survey Satisficing To measure survey satisficing, we chose three indicators: item nonresponse, acquiescence, and nondifferentiation (cf., Krosnick, 1991, 1999, 2000; Medway & Tourangeau, 2015). Item nonresponse was analyzed on the basis of all substantive variables in the survey (2015: 111 items, 2016: 115 items, and 2017: 114 items).5 For each respondent, we calculated the number of items that he or she left blank or where he or she chose the “Don’t know” option. On the basis of all the observations (n = 6,024), we regressed (ordinary least squares) the total number of item nonresponses on a factor variable for “year” and a factor variable for “survey agency,” specifying an interaction between the two factor variables.6 Subsequent to the estimation of this model, we compared the average adjusted predictions of the survey agencies for each year using Stata’s “margins, contrast” routine (StataCorp, 2015b), where we accounted for multiple comparisons by applying the Sidak alpha correction method (Abdi, 2007).7 This analytical strategy was also applied to the following two indicators (i.e., acquiescence and nondifferentiation). Acquiescence was measured on the basis of all items from the rating scales in a survey (2015: 41 items; 2016: 39 items; 2017: 61 items). For each survey agency and year, we counted the total number of responses to rating scales in the questionnaire where respondents agreed to the prescribed answer by choosing scale points 1–3 on the 7-point answer scale. In order to analyze acquiescence, we applied the same analytical strategy as with item nonresponse. Nondifferentiation was measured on the basis of all five-item batteries in a survey (2015: 83 items in 5-item batteries; 2016: 86 items in 7-item batteries; 2017: 64 items in 10-item batteries). For each of the item batteries, we flagged respondents who chose an identical answer option for all items in a respective item battery. For each respondent, we calculated the number of item batteries in which he or she answered items using an identical answer option. This variable can take values between zero (respondents applied nondifferentiation to none of the five-item batteries) and five (respondents applied nondifferentiation to all of the five-item batteries). Task Simplification Employees of a survey agency might be motivated to “reduce [the] time and effort necessary to complete interviews or to realize the required sample size” (in a process known as task simplification; Blasius & Thiessen, 2015, p. 
481), for example, “by fabricating interviews through copy-and-paste procedures” (Blasius & Thiessen, 2015, p. 480). To measure task simplification, we screened the dataset for nonunique records (NUR) as applied by Slomczynski, Powalko, and Krauze (2017). Slomczynski et al. (2017) defined NUR as a sequence of all values in a given case (record), which is identical to that of another case in the same dataset. For each survey agency, NUR was generated on the basis of all substantial items in a survey year (2015: 105 items; 2016: 109 items; 2017: 107 items). In other words, variables capturing sociodemographic characteristics were not included in the analysis. Comparisons of Point Estimates Furthermore, we compared point estimates of the variables of the energy-related core questions, that is, questions posed in all survey years using identical wordings, for the three survey years. These questions measured the respondents’ attitudes toward the use of energy sources, their understanding of different aspects of the Energiewende, their ranking of these aspects, and their awareness of energy topics. In order to examine the variables with a 7-point rating scale, we applied two-sample t-tests testing the null hypothesis that the mean value of a particular variable in a year does not differ between the two survey agencies. For the dichotomous variables, we calculated Phi values based on chi-square statistics. Among the ranking variables, as well as trichotomous variables, we applied a chi-square test of homogeneity testing the null hypothesis that the distribution of a particular variable is identical across the samples of both agencies. In all of these tests, we accounted for multiple comparisons by applying alpha adjustment according to Sidak (Abdi, 2007). Comparison of Relationships For each survey agency and year, we calculated the correlations (Pearson) between respondents’ age and all 19 survey items that were measured on a 7-point rating scale. These items measured respondents’ attitudes toward the use of energy sources, interest in politics, satisfaction with political decisions, attitudes toward technology, and attitudes regarding the vulnerability of nature. Furthermore, for each year, we performed an asymptotic chi-square test for the equality of the two agency-specific correlation matrices introduced by Jennrich (1970) and proposed by Mardia, Kent and Bibby (1979). The null hypothesis of this test assumes equal correlation matrices across both survey agencies and was rejected in each year as the resulting Jennrich chi-square test statistic is significant (p < .001, adjusted for three comparisons). Results Survey Satisficing Item nonresponse On average, participants in the survey conducted by Inst_South left 5,148 items blank in 2015, while participants in the survey conducted by Inst_North left 4,887 items blank (cf. Table 1, Panel 1). However, this tendency does not hold for all three survey years (Inst_South: 2016 = 0.379, 2017 = 0.315; Inst_North: 2016 = 0.366, 2017 = 0.375). While average item nonresponse in 2015 was significantly higher (p < .01) for Inst_South compared to Inst_North, this indicator did not differ significantly between the agencies in 2016 and 2017 (p > .05). Table 1. 
Indicators of Survey Satisficing Survey 2015 Survey 2016 Survey 2017 Panel 1: Item nonresponse—Average adjusted predicted number of items left blank Inst_South 5.148 0.379 0.315 Inst_North 4.887 0.366 0.375 Difference (Inst_South—Inst_North) 0.260 0.013 0.060 p-value .001 .997 .781 Degrees of freedom 2011 2005 2002 Test power (α = .05) .605 .085 .749 Panel 2: Acquiescence—Average adjusted predicted number of items that respondents agreed with Inst_South 12.726 14.928 22.839 Inst_North 14.162 17.215 24.560 Difference (Inst_South—Inst_North) −1.436 −2.287 −1.721 p-value .000 .000 .000 Degrees of freedom 2011 2005 2002 Test power (α = .05) 1.000 1.000 1.000 Panel 3: Nondifferentiation—Average number of item batteries whose items were answered by choosing the same answer category Inst_South 0.196 0.634 1.149 Inst_North 0.014 0.252 0.478 Difference (Inst_South—Inst_North) 0.182 0.382 0.671 p-value .000 .000 .000 Degrees of freedom 2011 2005 2002 Test power (α = .05) 1.000 1.000 1.000 Survey 2015 Survey 2016 Survey 2017 Panel 1: Item nonresponse—Average adjusted predicted number of items left blank Inst_South 5.148 0.379 0.315 Inst_North 4.887 0.366 0.375 Difference (Inst_South—Inst_North) 0.260 0.013 0.060 p-value .001 .997 .781 Degrees of freedom 2011 2005 2002 Test power (α = .05) .605 .085 .749 Panel 2: Acquiescence—Average adjusted predicted number of items that respondents agreed with Inst_South 12.726 14.928 22.839 Inst_North 14.162 17.215 24.560 Difference (Inst_South—Inst_North) −1.436 −2.287 −1.721 p-value .000 .000 .000 Degrees of freedom 2011 2005 2002 Test power (α = .05) 1.000 1.000 1.000 Panel 3: Nondifferentiation—Average number of item batteries whose items were answered by choosing the same answer category Inst_South 0.196 0.634 1.149 Inst_North 0.014 0.252 0.478 Difference (Inst_South—Inst_North) 0.182 0.382 0.671 p-value .000 .000 .000 Degrees of freedom 2011 2005 2002 Test power (α = .05) 1.000 1.000 1.000 Note. p-values based on two-sided hypotheses, adjusted for each indicator for three comparisons (Sidak). Open in new tab Table 1. 
Indicators of Survey Satisficing Survey 2015 Survey 2016 Survey 2017 Panel 1: Item nonresponse—Average adjusted predicted number of items left blank Inst_South 5.148 0.379 0.315 Inst_North 4.887 0.366 0.375 Difference (Inst_South—Inst_North) 0.260 0.013 0.060 p-value .001 .997 .781 Degrees of freedom 2011 2005 2002 Test power (α = .05) .605 .085 .749 Panel 2: Acquiescence—Average adjusted predicted number of items that respondents agreed with Inst_South 12.726 14.928 22.839 Inst_North 14.162 17.215 24.560 Difference (Inst_South—Inst_North) −1.436 −2.287 −1.721 p-value .000 .000 .000 Degrees of freedom 2011 2005 2002 Test power (α = .05) 1.000 1.000 1.000 Panel 3: Nondifferentiation—Average number of item batteries whose items were answered by choosing the same answer category Inst_South 0.196 0.634 1.149 Inst_North 0.014 0.252 0.478 Difference (Inst_South—Inst_North) 0.182 0.382 0.671 p-value .000 .000 .000 Degrees of freedom 2011 2005 2002 Test power (α = .05) 1.000 1.000 1.000 Survey 2015 Survey 2016 Survey 2017 Panel 1: Item nonresponse—Average adjusted predicted number of items left blank Inst_South 5.148 0.379 0.315 Inst_North 4.887 0.366 0.375 Difference (Inst_South—Inst_North) 0.260 0.013 0.060 p-value .001 .997 .781 Degrees of freedom 2011 2005 2002 Test power (α = .05) .605 .085 .749 Panel 2: Acquiescence—Average adjusted predicted number of items that respondents agreed with Inst_South 12.726 14.928 22.839 Inst_North 14.162 17.215 24.560 Difference (Inst_South—Inst_North) −1.436 −2.287 −1.721 p-value .000 .000 .000 Degrees of freedom 2011 2005 2002 Test power (α = .05) 1.000 1.000 1.000 Panel 3: Nondifferentiation—Average number of item batteries whose items were answered by choosing the same answer category Inst_South 0.196 0.634 1.149 Inst_North 0.014 0.252 0.478 Difference (Inst_South—Inst_North) 0.182 0.382 0.671 p-value .000 .000 .000 Degrees of freedom 2011 2005 2002 Test power (α = .05) 1.000 1.000 1.000 Note. p-values based on two-sided hypotheses, adjusted for each indicator for three comparisons (Sidak). Open in new tab Acquiescence Respondents in the Inst_North survey (2015 = 14.162, 2016 = 17.215, 2017 = 24.560) agreed with significantly more items (p < .001) on average than respondents in the Inst_South survey (2015 = 12.726, 2016 = 14.928, and 2017 = 22.839) in all 3 years (cf., Table 1, Panel 2). Thus, our results reveal that on average Inst_North respondents applied this response style significantly more often than Inst_South respondents in each year. Nondifferentiation The average number of item batteries in which an identical answer option was given for each item in the battery varied between 0.014 (Inst_North, 2015) and 1.149 (Inst_South 2017), meaning that nondifferentiation rarely occurred in either survey agency (cf. Table 1, Panel 3). However, our results showed that in each year, respondents applied this response style significantly more often (p < .001) on average in the surveys conducted by Inst_South (2015 = 0.196, 2016 = 0.634, 2017 = 1.149) than in those conducted by Inst_North (2015 = 0.014, 2016 = 0.252, 2017 = 0.478). Task simplification (NUR) On the basis of all substantial variables of the surveys over the 3 years, we did not find any non-unique records for either survey agency. Comparison of point estimates Figures 1 and 2 depict differences in the mean values of the various energy-related variables and the percent values, respectively, in questions asked by both agencies in every survey year (core questions). 
Figure 1 shows the differences in mean values for the survey items that measured respondents’ attitudes toward the use of different energy sources. Respondents’ answers to these survey items were registered on a 7-point rating scale ranging from 1 (=absolutely opposed) to 7 (=absolutely in favor). In 20 out of 24 cases, the difference was positive, meaning that Inst_South reported a higher mean value than Inst_North; the largest deviation of mean values amounted to about 1-scale point and related to attitudes towards nuclear energy (cf., Figure 1nuclear energy in 2017). In 10 out of 24 t-tests on the difference between the mean values of the different variables, the t-test, as corrected for multiple comparisons, was significant (cf., Supplementary Table 8). Of the 10 significant differences, five occurred in 2017. This pattern points to a systematic rather than unsystematic difference in the survey results across both survey agencies and results in all three survey years in higher correlations between the attitudes regarding renewable energy sources for Inst_South compared to Inst_North (cf., Supplementary Tables 12–17). Figure 1. Open in new tabDownload slide Seven-point rating scales—differences in mean values of the results from the two survey organizations. Figure 1. Open in new tabDownload slide Seven-point rating scales—differences in mean values of the results from the two survey organizations. Figure 2. Open in new tabDownload slide Dichotomous variables—percentage point differences in “yes” answers. Figure 2. Open in new tabDownload slide Dichotomous variables—percentage point differences in “yes” answers. In five survey items with a dichotomous answer scale, respondents were asked to state whether or not they associate certain aspects (cf., Supplementary Table 9) with the term Energiewende (1 meant “yes” and 0 meant “no”). Figure 2 depicts the differences in the results between the two survey agencies regarding the proportion of respondents who associate a particular aspect with the term Energiewende. In 13 out of 15 cases, Inst_South reported a lower proportion of respondents who associated a particular aspect with the Energiewende compared to Inst_North. The largest difference amounts to nearly 19 percentage points with respect to the proportion of “yes” answers in the agencies (cf., Figure 2, reducing energy consumption in 2016). In 10 out of 15 comparisons, the difference in the proportion of “yes” answers is significantly different between the two agencies (cf., Supplementary Table 9). This pattern also points to a systematic rather than unsystematic difference in the survey results across both survey agencies over the three survey years. Subsequent to stating their understanding of different aspects of the Energiewende, respondents ranked five aspects of the Energiewende according to the importance they attach to each one (1 meant “This is the most important aspect of the Energiewende for me personally” and 5 meant “This is the least important aspect of the Energiewende for me personally”). Supplementary Figure 3 shows the proportion of respondents who ranked a particular aspect as most, second, third, fourth, and least important, respectively. For each of the five aspects, we applied a chi-square test of homogeneity for each year, taking multiple comparisons into account. Supplementary Table 10 shows the results of the 15 tests. 
Subsequent to stating their understanding of different aspects of the Energiewende, respondents ranked five aspects of the Energiewende according to the importance they attached to each one (1 meant "This is the most important aspect of the Energiewende for me personally" and 5 meant "This is the least important aspect of the Energiewende for me personally"). Supplementary Figure 3 shows the proportion of respondents who ranked a particular aspect as most, second, third, fourth, and least important, respectively. For each of the five aspects, we applied a chi-square test of homogeneity for each year, taking multiple comparisons into account. Supplementary Table 10 shows the results of the 15 tests. The null hypothesis, according to which each sample has the same proportion of observations regarding a particular aspect, was rejected for three aspects in 2015 (phasing out nuclear power, affordability of electricity, and increasing use of renewable energy sources), for four aspects in 2016 (phasing out nuclear power, reducing energy consumption, affordability of electricity, and increasing use of renewable energy sources), and for all five aspects in 2017. Hence, the differences in the ranking variables between the two agencies in the three survey years appear to be systematic rather than random.

Finally, we examined agency-specific differences in responses to seven survey items that asked about respondents' awareness of seven energy topics. Respondents' answers were registered on a trichotomous scale (1 = "No, never heard of it," 2 = "Yes, heard of it but know nothing or hardly anything about it," and 3 = "Yes, heard of it and know quite a bit or a lot about it"). Supplementary Figures 4a and 4b depict the distribution of responses over the three categories for each of the seven survey items by survey year and agency. We performed a chi-square test of homogeneity for each of the trichotomous variables in each year, taking multiple comparisons into account. Supplementary Table 11 shows that in 16 out of 21 cases the chi-square test of homogeneity rejected the null hypothesis according to which the distribution of the particular trichotomous item was identical across the samples of both agencies. Interestingly, in 2015 the test was significant for only two of the seven trichotomous variables (carbon capture and storage and vehicle-to-grid), while in the other 2 years it was significant for all trichotomous variables. This points to systematic differences between the two survey agencies in 2016 and 2017.

Comparison of relationships

While the above-mentioned analyses examined potential differences in point estimates between the two agencies, in this section we turn to potential differences in the relationships between variables, as examined by Smith (1982). Supplementary Tables 12–17 report the correlations among all 19 survey items that were measured on a 7-point rating scale. In each of the 3 years, the average absolute deviation between the two agencies across the resulting 190 pairwise correlations amounted to 0.11. The null hypothesis that the correlation matrices of the two survey agencies are equal was rejected in each year, as the resulting Jennrich chi-square test statistic was significant (p < .001, adjusted for three comparisons). The year-specific correlation matrices for Inst_North exhibited very low correlations among all variables: for this agency, the largest positive correlation among the 20 variables amounted to r = .14 in 2016 (cf., Supplementary Table 15), while the largest negative correlation amounted to .13 in absolute terms in 2015 (cf., Supplementary Table 13). With regard to the correlations among the renewable energy sources, however, the comparison of point estimates showed that support for renewable energy sources was systematically higher among Inst_South respondents than among Inst_North respondents, and the correlations among these variables were accordingly systematically higher for Inst_South than for Inst_North.
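The two tests used in this part of the analysis can be sketched as follows: a chi-square test of homogeneity on an agency-by-answer-category contingency table, and the Jennrich (1970) chi-square test of the equality of two correlation matrices. The Jennrich function below follows the published statistic as we read it and as it is commonly implemented; the input names and surrounding code are illustrative assumptions, and Sidak adjustment of the resulting p-values (as in the other analyses) would be applied afterwards.

```python
# Illustrative sketch (not the study's code): chi-square test of homogeneity
# and the Jennrich (1970) test for the equality of two correlation matrices.
import numpy as np
from scipy import stats

def homogeneity_test(counts_south, counts_north):
    """Chi-square test of homogeneity on a 2 x k table of answer-category counts."""
    table = np.vstack([counts_south, counts_north])
    chi2, p, dof, _ = stats.chi2_contingency(table)
    return chi2, p, dof

def jennrich_test(R1, R2, n1, n2):
    """Jennrich (1970) test of H0: the two population correlation matrices are equal."""
    p_vars = R1.shape[0]
    c = n1 * n2 / (n1 + n2)
    R = (n1 * R1 + n2 * R2) / (n1 + n2)      # pooled correlation matrix
    R_inv = np.linalg.inv(R)
    Z = np.sqrt(c) * R_inv @ (R1 - R2)
    S = np.eye(p_vars) + R * R_inv           # elementwise (Hadamard) product
    chi2 = 0.5 * np.trace(Z @ Z) - np.diag(Z) @ np.linalg.solve(S, np.diag(Z))
    df = p_vars * (p_vars - 1) / 2
    return chi2, stats.chi2.sf(chi2, df), df
```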
Discussion

The results from our study revealed house effects with regard to indicators of data quality as well as to point estimates and correlations. In line with the findings of Smith (1978, 1982), the data from the two survey agencies differed in 2015 with respect to item nonresponse. Moreover, in all survey years we detected house-specific differences in acquiescence and nondifferentiation. These house-related differences raise the question of whether interviewer training differs between the two survey agencies to such an extent that it results in different styles of interviewing, thus encouraging respondents to a different degree to answer "don't know," to respond in the affirmative, or to give very similar answers to the questions. This might be the case even though both survey agencies are members of the business association of private-sector market and social research agencies in Germany (Arbeitskreis Deutscher Markt- und Sozialforschungsinstitute [ADM]) and should therefore follow the ADM guidelines, which were developed, among other things, for telephone surveys.8 Since the guidelines for telephone surveys do not include any regulations on how interviewer training should be conducted, it is in principle possible that such training differs perceptibly between ADM member agencies and leads to agency-specific interview styles.

The comparison of point estimates revealed systematic house effects with respect to the 7-point rating scales on attitudes regarding the use of energy sources, the dichotomous answer scales on associations with the Energiewende, and the mean ranking scores of the five aspects of the Energiewende. This means that the data from the two houses convey different pictures of the understanding and assessment of the Energiewende among German citizens. Furthermore, in most cases the data from the two agencies showed statistically different distributions of the trichotomous scales in every survey year and different developments of these variables over time. The surveys thus report different levels of awareness of energy topics among the German public as well as different developments of that awareness over time. Finally, the correlation matrices of Inst_South and Inst_North provided different information about the relationships between respondents' age and their attitudes toward the use of energy sources, interest in politics, satisfaction with political decisions, attitudes toward technology, and attitudes regarding the vulnerability of nature.

The fact that the surveys from Inst_South and Inst_North differ in data quality and also convey different information about support for energy sources, the understanding and assessment of the Energiewende, awareness of energy topics, and the relationships between attitudes toward energy sources, political interest, general values, and age has two implications. First, it might (further) undermine confidence in survey agencies that conduct public surveys. Second, depending on which survey results were used, political decision makers might implement the Energiewende differently, with perceptibly different consequences.9

One potential limitation of our study is that our findings on house effects cannot be generalized to all survey agencies in Germany that are members of the ADM, because we selected the two survey agencies purposively, according to rules that are probably very close to those applied by survey designers.
For a generalization of our results, it would have been necessary to consider an appropriate number of survey agencies from the list of all ADM members, selected according to a probability-based sampling technique. This, however, was not financially feasible given the extensive questionnaires. Despite this limitation regarding the generalizability of our results, the strength of our study is that it addresses the importance of house effects in a longitudinal design using identical questionnaires and identical interviewer instructions.

Conclusions

The results of our comparative study showed that house effects can systematically affect the results of repeated public opinion surveys. This leads to the assumption that changing survey agencies for repeated surveys would considerably influence the total survey quality: even if the survey data were accurate, they would become unfit for use because they would not be comparable over time. Furthermore, if survey results differ from one survey agency to the next—even if the agencies use identical questionnaires with identical instructions for interviewers and fulfill the same basic parameters and conditions for the administration of the surveys—this will further reduce confidence in agencies conducting public surveys. However, further studies investigating the importance of house effects in repeated surveys are necessary to complement our results and to assess the generalizability of our findings.

Diana Schumann is a social scientist and senior researcher at Forschungszentrum Juelich GmbH, Institute of Energy and Climate Research, Systems Analysis and Technology Evaluation (IEK-STE). Her main field of research is the public perception of energy systems' transformation.

Hawal Shamon is a social scientist working as a senior researcher at Forschungszentrum Juelich GmbH, Institute of Energy and Climate Research, Systems Analysis and Technology Evaluation (IEK-STE). His research focuses on the social acceptance of energy systems' transformation and on survey methodology.

Jürgen-Friedrich Hake received his diploma in mathematics/physics and heads the Institute of Energy and Climate Research, Systems Analysis and Technology Evaluation (IEK-STE). His research and development activities focus on the assessment of energy systems and energy technologies in the context of societal guiding principles, e.g., sustainable development, as well as on the transformation of energy systems.

Footnotes

1 Smith (1978) compared results from repeated public opinion surveys (e.g., the National Opinion Research Center's General Social Survey) conducted by four survey agencies at several points in time. Wright et al. (2014) investigated polling accuracy for multiparty elections using data from final pre-election polls in New Zealand published by major pollsters for three different election years.

2 The selection of the survey organizations, the mode of data collection, the predefined parameters for data collection and conditions for administering the survey, as well as information about the questionnaires, numbers of interviewers, and costs of the surveys are included in the Supplementary Online File.

3 When planning our research design, we followed the suggestion of Tom W. Smith in his 1978 study: "(…) in planning replication studies, close attention should be given to minimizing such possible (procedural) effects by duplicating as far as possible, not just question wording but interviewer specifications, question placement, coding rules, and other features" (Smith, 1978, pp. 458–459).
4 The sampling procedures applied by the survey agencies, the results of our validation of the demographic representativeness of the survey samples, as well as the values of AAPOR Cooperation Rate 1, Cooperation Rate 3, Refusal Rate 3, and Contact Rate 3 for both survey agencies are included in the Supplementary Online File.

5 For 2017, item nonresponse was calculated on the basis of 114 instead of 115 items, because one item was an open-text field provided for those respondents whose housing situation did not fit the answer categories of this item (residual category).

6 Since we only observed 3 years, year was treated as a variable with three categories (k = 3). When a categorical variable is defined as a factor variable in Stata (14.2), the statistical program automatically creates dummy variables for (k − 1) categories. Furthermore, defining categorical variables as factor variables makes it possible to specify interactions between different categorical variables with parsimonious program code (StataCorp, 2015a). We also estimated models that took into account respondents' sociodemographic characteristics (i.e., age, education, and gender); however, the results of these models did not differ substantially from the results reported in this study.

7 In the context of the particular model that we estimated, the described analytical strategy is identical to performing, for each year, a t-test on the mean difference in item nonresponse between the two survey agencies and adjusting for multiple comparisons. By following the described analytical strategy, however, the alpha adjustment is performed by the statistical program rather than by hand.

8 Cf., ADM website (https://www.adm-ev.de/en/, accessed: 2019-04-02).

9 For example, if political decision makers were to assume, on the basis of the results from Inst_North, that the German population assesses the affordability of electricity as not very important for them personally, they might underestimate the role that rising electricity prices can play in citizens' everyday lives, discussed as "energy poverty" (cf., e.g., Bouzarovski, 2014). Conversely, if political decision makers were aware, on the basis of the results from Inst_South, that German citizens assess the affordability of electricity as very important for themselves, politicians might realize that the successful implementation of the Energiewende depends, among other aspects, on ensuring affordable electricity for all citizens in Germany.

References

Abdi, H. (2007). The Bonferonni and Šidák corrections for multiple comparisons. In N. J. Salkind (Ed.), Encyclopedia of measurement and statistics (pp. 103–107). Thousand Oaks, CA: Sage.

Biemer, P. P. (2010). Total survey error: Design, implementation, and evaluation. Public Opinion Quarterly, 74(5), 817–848.

Blasius, J., & Thiessen, V. (2015). Should we trust survey data? Assessing response simplification and data fabrication. Social Science Research, 52, 479–493.

Bouzarovski, S. (2014). Energy poverty in the European Union: Landscapes of vulnerability. Wiley Interdisciplinary Reviews: Energy and Environment, 3(3), 276–289.
Converse, P. E., & Traugott, M. W. (1986). Assessing the accuracy of polls and surveys. Science, 234(4780), 1094–1098.

Erikson, R. S., & Wlezien, C. (2001). Presidential polls as a time series: The case of 1996. Public Opinion Quarterly, 63(2), 163–177.

Flores, A. R. (2015). Examining variation in surveying attitudes on same-sex marriage: A meta-analysis. Public Opinion Quarterly, 79(2), 580–593.

Jennrich, R. I. (1970). An asymptotic χ2 test for the equality of two correlation matrices. Journal of the American Statistical Association, 65(330), 904–912.

Krosnick, J. A. (1991). Response strategies for coping with the cognitive demands of attitude measures in surveys. Applied Cognitive Psychology, 5(3), 213–236.

Krosnick, J. A. (1999). Survey research. Annual Review of Psychology, 50, 537–567.

Krosnick, J. A. (2000). The threat of satisficing in surveys: The shortcuts respondents take in answering questions. Survey Methods Newsletter, 20(1), 4–8.

Lau, R. R. (1994). An analysis of the accuracy of "trial heat" polls during the 1992 presidential election. Public Opinion Quarterly, 58(1), 2–20.

Mardia, K. V., Kent, J. T., & Bibby, J. M. (1979). Multivariate analysis. London: Academic Press.

Medway, R. L., & Tourangeau, R. (2015). Response quality in telephone surveys: Do prepaid cash incentives make a difference? Public Opinion Quarterly, 79(2), 524–543.

Slomczynski, K. M., Powalko, P., & Krauze, T. (2017). Non-unique records in international survey projects: The need for extending data quality control. Survey Research Methods, 11(1), 16.

Smith, T. W. (1978). In search of house effects: A comparison of responses to various questions by different survey organizations. Public Opinion Quarterly, 42(4), 443–463.

Smith, T. W. (1982). House effects and the reproducibility of survey measurements: A comparison of the 1980 GSS and the 1980 American National Election Study. Public Opinion Quarterly, 46(1), 54–68.

Smith, T. W. (2011). Refining the total survey error perspective. International Journal of Public Opinion Research, 23(4), 464–484.

StataCorp. (2015a). Stata 14. Base reference manual. 11.4.3 Factor variables. College Station, TX: Stata Press.

StataCorp. (2015b). Stata 14. Base reference manual. Margins postestimation – postestimation tools for margins. College Station, TX: Stata Press.

Traugott, M. W. (2005). The accuracy of the national preelection polls in the 2004 presidential election. Public Opinion Quarterly, 69(5), 642–654.

Weisberg, H. F. (2005). The total survey error approach. Chicago: The University of Chicago Press.

Wlezien, C., & Erikson, R. S. (2007). The horse race: What polls reveal as the election campaign unfolds. International Journal of Public Opinion Research, 19(1), 74–88.
Wright, M. J., Farrar, D. P., & Russell, D. F. (2014). Polling accuracy in a multiparty election. International Journal of Public Opinion Research, 26(1), 113–124.

© The Author(s) 2019. Published by Oxford University Press on behalf of The World Association for Public Opinion Research. All rights reserved. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model).
