Data Fusion for Correcting Measurement Errors

Data Fusion for Correcting Measurement Errors Abstract Often in surveys, key items are subject to measurement errors. Given just the data, it can be difficult to determine the extent and distribution of this error process and, hence, to obtain accurate inferences that involve the error-prone variables. In some settings, however, analysts have access to a data source on different individuals with high-quality measurements of the error-prone survey items. We present a data fusion framework for leveraging this information to improve inferences in the error-prone survey. The basic idea is to posit models about the rates at which individuals make errors, coupled with models for the values reported when errors are made. This can avoid the unrealistic assumption of conditional independence typically used in data fusion. We apply the approach on the reported values of educational attainments in the American Community Survey, using the National Survey of College Graduates as the high-quality data source. In doing so, we account for the sampling design used to select the National Survey of College Graduates. We also present a process for assessing the sensitivity of various analyses to different choices for the measurement error models. Supplemental material is available online. 1. INTRODUCTION Survey data often contains items that are subject to measurement errors. For example, some respondents might misunderstand a question or accidentally select the wrong response, thereby providing values unequal to their factual values. Left uncorrected, these measurement errors can result in degraded inferences (Kim, Cox, Karr, Reiter, and Wang 2015). Unfortunately, the distribution of the measurement errors typically is not estimable from the survey data alone. Typically, one needs to make strong assumptions about the measurement error process, as done, for example, in integrative data analysis (Curran, Hussong, Cai, Huang, Chassin, et al. 2008; Bauer and Hussong 2009; Curran and Hussong 2009), or to leverage information from other sources. We develop methodology for the latter approach here. One natural source of information is a validation sample, (i.e., a dataset with both the reported, possibly erroneous values and the true values measured on the same individuals). These individuals could be a subset of the original survey (Pepe 1992; Yucel and Zaslavsky 2005) or a completely distinct set (Raghunathan 2006; Schenker and Raghunathan 2007; Schenker, Raghunathan, and Bondarenko 2010; Carrig, Manrique-Vallier, Ranby, Reiter, and Hoyle 2015). With validation data, one can model the relationship between the error-prone and true values and use the model to replace the error-prone items with multiply-imputed, plausible true values (Yucel and Zaslavsky 2005; Reiter 2008; Siddique et al. 2015). In many settings, however, it is not possible to obtain validation samples (e.g., because it is too expensive or because someone other than the analyst collected the data). In such cases, another potential source of information is a separate, “gold standard” dataset that includes true (or at least very high quality) measurements of the items subject to error, but not the error-prone measurements. Unlike validation samples, the gold standard dataset alone does not provide enough information to estimate the relationship between the error-prone and true values; it only provides information about the distribution of the true values. Thus, analysts are faced with a special case of data fusion i.e., integrating information from two databases with disjoint sets of individuals and distinct variables (e.g., Rubin 1986; Moriarity and Scheuren 2001; Rassler 2002; D’Orazio, Di Zio, and M. Scanu 2006; Gilula, McCulloch, Rossi 2006; Reiter 2012; Fosdick, DeYoreo, and Reiter 2016). One default approach, common in other data fusion contexts, is to assume that the error-prone and true values are conditionally independent given some set of variables X common to both the survey and gold standard data. Effectively, this involves using the gold standard data to estimate a predictive model for the true values from X and applying the estimated model to impute replacements for all values of the error-prone items in the survey. However, this conditional independence assumption completely disregards the information in the error-prone values, which sacrifices potentially useful information. For example, consider national surveys that ask people to report their educational attainment. We might expect most people to report values accurately and only a modest fraction to make errors. It does not make sense to alter every individual’s reported values in the survey, as would be done using a conditional independence approach. In this article, we develop a framework for leveraging information from gold standard data to improve inferences in surveys subject to measurement errors. The basic idea is to encode plausible assumptions about the error process (e.g., most people do not make errors when reporting educational attainments) and the reporting process (e.g., when people make errors, they are more likely to report higher attainments than actual attainments) into statistical models. We couple those models with distributions for the underlying true data values and estimate posterior distributions of the true values and model parameters. This process allows us to avoid unrealistic conditional independence assumptions in lieu of more scientifically defensible models. The estimation algorithm provides draws of plausible corrections to the error-prone survey values, which can be treated as completed datasets in multiple imputation inference (Rubin 1987). The remainder of this article is organized as follows. In section 2, we review an example of misreporting of educational attainment in data collected by the census bureau, so as to motivate the methodological developments. In section 3, we introduce the general framework for specifying measurement error models to leverage the information in gold standard data. In section 4, we apply the framework to handle potential measurement error in educational attainment in the 2010 American Community Survey (ACS), using the 2010 National Survey of College Graduates (NSCG) as a gold standard file. In doing so, we deal with a key complication in the data integration: accounting for the sampling design used to sample the NSCG. We also demonstrate how the framework facilitates analysis of the sensitivity of conclusions to different measurement error model specifications. In section 5, we provide a brief summary. 2. MISREPORTING IN EDUCATIONAL ATTAINMENT To illustrate the potential for reporting errors in educational attainment that can arise in surveys, we examine data from the 1993 NSCG. The 1993 NSCG surveyed individuals who indicated on the 1990 census long form that they had at least a college degree (Fesco, Frase, and Kannankutty 2012). The questionnaire asked about educational attainment, including detailed questions about educational histories. These questions greatly reduce the possibility of respondent error, so that the educational attainment values in the NSCG are treated as a gold standard (Black, Sanders, and Taylor 2003). The census long form, in contrast, did not include detailed follow-up questions, so that reported educational attainment is prone to measurement error. The US Census Bureau linked each individual in the NSCG to their corresponding record in the long-form data. The linked file is available for download from the Inter-university Consortium for Political and Social Research (National Science Foundation 1993). Because of the linkages, we can characterize the actual measurement error mechanism for educational attainment in the 1990 long-form data. In the NSCG, we treat the highest degree of the three most recent degrees reported (coded as “ed6c1,” “ed6c2,” and “ed6c3” in the file) as the true education level. We disregard any degrees earned in the years 1990–1993, as these occur in the three-year gap between collection of the long form and NSCG data. This ensures consistent time frames for the NSCG and long form reported values. We cross-tabulate these degrees with the degrees reported in the long-form data (coded “yearsch” in the file). Table 1 displays the cross-tabulation. A similar analysis was done by Black et al. (2003). Table 1. Unweighted Cross-tabulation of Reported Education in the NSCG and Census Long Form from the Linked Dataset. BA stands for bachelor’s degree; MA stands for master’s degree; Prof stands for professional degree; and, PhD stands for PhD degree. The 14,319 individuals in the group labeled No Degree did not have a college degree, despite reporting otherwise. The 51,396 individuals in the group labeled Other did not have one of (BA, MA, Prof, PhD) and are discarded from subsequent analyses. Census-reported education ︷ BA MA Prof PhD Total NSCG-reported education { BA 89,580 4,109 1,241 249 95,179 MA 1,218 33,928 655 526 36,327 Prof 382 359 8,648 563 9,952 PhD 99 193 452 6,726 7,470 Total 91,279 38,589 10,996 8,064 14,8928 No Degree 10,150 1,792 2,040 337 14,319 Other 33,368 10,912 4,710 2,406 51,396 Census-reported education ︷ BA MA Prof PhD Total NSCG-reported education { BA 89,580 4,109 1,241 249 95,179 MA 1,218 33,928 655 526 36,327 Prof 382 359 8,648 563 9,952 PhD 99 193 452 6,726 7,470 Total 91,279 38,589 10,996 8,064 14,8928 No Degree 10,150 1,792 2,040 337 14,319 Other 33,368 10,912 4,710 2,406 51,396 Table 1. Unweighted Cross-tabulation of Reported Education in the NSCG and Census Long Form from the Linked Dataset. BA stands for bachelor’s degree; MA stands for master’s degree; Prof stands for professional degree; and, PhD stands for PhD degree. The 14,319 individuals in the group labeled No Degree did not have a college degree, despite reporting otherwise. The 51,396 individuals in the group labeled Other did not have one of (BA, MA, Prof, PhD) and are discarded from subsequent analyses. Census-reported education ︷ BA MA Prof PhD Total NSCG-reported education { BA 89,580 4,109 1,241 249 95,179 MA 1,218 33,928 655 526 36,327 Prof 382 359 8,648 563 9,952 PhD 99 193 452 6,726 7,470 Total 91,279 38,589 10,996 8,064 14,8928 No Degree 10,150 1,792 2,040 337 14,319 Other 33,368 10,912 4,710 2,406 51,396 Census-reported education ︷ BA MA Prof PhD Total NSCG-reported education { BA 89,580 4,109 1,241 249 95,179 MA 1,218 33,928 655 526 36,327 Prof 382 359 8,648 563 9,952 PhD 99 193 452 6,726 7,470 Total 91,279 38,589 10,996 8,064 14,8928 No Degree 10,150 1,792 2,040 337 14,319 Other 33,368 10,912 4,710 2,406 51,396 As evident in table 1, reported education levels on the long form are often higher than those on the NSCG, particularly for individuals with only a bachelor’s degree. Of the 163,247 individuals in scope in the NSCG, over 14,000 were determined not to have at least a bachelor’s degree when asked in the NSCG, despite reporting otherwise in the long form. A whopping 33 percent of individuals who reported being professionals in the long form actually are not professionals according to the NSCG (1 – 8648 / [10,996 + 2040] ≈ 0.33). One possible explanation for this error is confusion over the definition of professionals. The US Census Bureau intended the category to capture graduate degrees from universities (e.g., JD, MBA, MD), whereas Black et al. (2003) found that individuals in professions such as cosmetology, nursing, and health services, which require certifications but not graduate degrees, selected the category. Despite the nontrivial reporting error, the overwhelming majority of individuals’ reported education levels are consistent in the long form and in the NSCG. Of the individuals in the NSCG who had at least a college degree at the time of the 1990 census, about 93.3 percent of them have the same contemporaneous education levels in both files ([89,850 + 33,928 + 8649 + 6726] / 148,928 ≈ 0.93. This suggests that most people report correctly, an observation we want to leverage when constructing measurement error models for education in the 2010 ACS. In many situations, we do not have the good fortune of observing individuals’ error-prone and true values simultaneously. Instead, we are in the setting represented by figure 1. This is also the case in our analysis of educational attainments in the 2010 ACS, described in section 4. The sampling frame for the 2010 NSCG is constructed from reported education levels in the ACS, which replaced the long form (after the 2000 census). However, unlike in 1993, linked data is not available as public use files. Therefore, we treat the 2010 NSCG as gold standard data as justified by Black et al. (2003) and posit measurement models that connect the information from the two data sources, using the framework that we now describe. Figure 1. View largeDownload slide Graphical Representation of Data Fusion Set-up. In the survey data DE, we only observe the error-prone measurement Z but not the true value Y. In the gold standard data DG, we only observe Y but not Z. We observe variables X in both samples. In the application, DE is the ACS and DG is the NSCG. Figure 1. View largeDownload slide Graphical Representation of Data Fusion Set-up. In the survey data DE, we only observe the error-prone measurement Z but not the true value Y. In the gold standard data DG, we only observe Y but not Z. We observe variables X in both samples. In the application, DE is the ACS and DG is the NSCG. 3. MEASUREMENT ERROR MODELING VIA DATA FUSION As in figure 1, let DG and DE be two data sources comprising distinct individuals, with sample sizes nG and nE, respectively. For each individual i in DG or DE, let Xi=(Xi1,…,Xip) be variables common to both surveys and assumed to be free of error, such as demographic variables. We assume these variables have been harmonized (D’Orazio et al. 2006) or placed on the same measurement scale, across DG and DE. Let Y represent the error-free values of some variable of interest, and let Z be an error-prone version of Y. We observe Z but not Y for the nE individuals in DE. We observe Y but not Z for the nG individuals in DG. For simplicity of notation, we assume no missing values in any variable, although the multiple imputation framework easily handles missing values. Additionally, DE can include variables for which there is no corresponding variable in DG. These variables do not play a role in the measurement error modeling, although they can be used in multiple imputation inferences. We seek to estimate Pr(Y,Z|X), and use it to create multiple imputations for the missing values in Y for the individuals in DE. This yields multiple complete data files that can be released to analysts who wish to make inferences involving Y and X in DE. We do so for the common setting where (X,Y,Z) are all categorical variables; we discuss approaches for continuous data types in section 3.4. For j=1,…,p, let each Xj have dj levels. Let Z have dZ levels and Y have dYlevels. Typically dZ = dY, but this need not be the case generally. For example in the NSCG/ACS application, Z is the educational attainment among those who report a college degree in the ACS, which has dZ = 4 levels (bachelor’s degree, master’s degree, professional degree, or PhD degree), and Y is the educational attainment in the NSCG, which has dY= 5 levels. An additional level is needed because some individuals in the NSCG truly do not have a college degree. For all i∈DE, let Ei be an (unobserved) indicator of a reporting error, that is, Ei = 1 when Yi≠Zi and Ei = 0 otherwise. Using E enables us to write Pr(Y,Z|X) as a product of three sub-models. For individual i, the full data likelihood (omitting parameters for simplicity) can be factored as, Pr(Yi=k,Zi=l|Xi)=∑e=01Pr(Yi=k,Ei=e,Zi=l|Xi)=Pr(Yi=k|Xi)∑e=01Pr(Ei=e|Yi=k,Xi)Pr(Zi=l|Ei=e,Yi=k,Xi). (1) This separates the true data generation process and the measurement error generation process, which facilitates model specification. In particular, we can use DG to estimate the true data distribution Pr(Y|X). We then can posit different models for the rates of errors, Pr(Ei=e|Yi=k,Xi) and for the reported values when errors are made, Pr(Zi=l|Ei=1,Yi=k,Xi). Intuitively, the error model locates the records for which Yi≠Zi, and the reporting model captures the patterns of misreported Zi. Of course, when Ei = 0, Pr(Zi=Yi)=1. A similar factorization is used by Yucel and Zaslavsky (2005), He, Landrum, and Zaslavsky (2014), Kim et al. (2015), and Manrique-Vallier and Reiter (2018), among others. The error and reporting models in (1) are used to generate the conditional distributions Pr(Y|Z,X) and Pr(Z|Y,X), which cannot be estimated nonparametrically from the data alone, since (Y i, Zi) are never observed jointly. Put another way, if we tried to use a fully saturated log-linear model for (Y,Z|X), we would not be able to identify all the parameters using DG and DE alone. To see this, assume for the moment that all dX=Πj=1pdj possible combinations of X are present in DG and DE. The fully saturated log-linear model for the distribution of (Y,Z|X) has dYdZdX−dX=(dY−1)dX+(dZ−1)dYdX parameters, where each subtraction of one derives from the requirement that probabilities sum to one. For example, if Y and Z have dY=dZ=2 levels and a scalar X has dX = 3 levels, a fully saturated model for (Y,Z|X) has three parameters (accounting for the sum to one constraint) for each level of X, resulting in nine parameters as given in the formulas. However, DG and DE together provide only (dY−1)dX+(dZ−1)dX independent constraints on the probabilities in the joint distribution. In our simple example, DG and DE provide three constraints from the three conditional probabilities of (Y|X) and three conditional probabilities of (Z|X). We need to impose three more constraints to estimate the joint distribution. The goal of specifying measurement error models is to provide such constraints, as we discuss in the remainder of this section. We note that related identification issues arise in the context of refreshment sampling to adjust for nonignorable attrition in longitudinal studies (Hirano, Imbens, Ridder, and Rubin 2001; Schifeling, Cheng, Reiter, and Hillygus 2015; Si, Reiter, and Hillygus 2015). 3.1 True Data Model One can use any model for (Y|X) that adequately describes the conditional distribution, such as a (multinomial) logistic regression. In the NSCG/ACS application, we use a fully saturated multinomial model, accounting for the sampling design in DG using the approach described in section 4.1. One also could use a joint distribution for (Y,X), such as a log-linear model or a mixture of multinomials model (Dunson and Xing 2009; Si and Reiter 2013). 3.2 Error Model In cases where dY= dZ, a generic form for the error model is Pr(Ei=1|Xi,Yi=k)=g(Xi,Yi,β), (2) where g(Xi,Yi,β) is some function of its arguments and β is some set of unknown parameters. A convenient class of functions that we use here is the logistic regression of Ei on some design vector Mi derived from (Xi,Yi), with corresponding coefficients β. The analyst can encode different versions of Mi to represent assumptions about the error process. The simplest specification is to set each Mi equal to a vector of ones, which implies that there is a common probability of error for all individuals. This error model makes sense when the analyst believes the errors in Z occur completely at random; for example, when errors arise simply because respondents accidentally and randomly select the wrong response in the survey, or when all respondents are equally likely to misunderstand the survey question. A more realistic possibility is to allow the probability of error to depend on some variables in Xi but not on Yi (e.g., men misreport education at different rates than women). This could be encoded by including an intercept for one of the sexes in Mi. Finally, one can allow the probability of error to depend on Yi itself—for example, people who truly do not have at least a college degree are more likely to misreport—by including some function of it in Mi. In the case where dZ≠dY, as in the NSCG/ACS application, we automatically set Ei = 1 for any individual with Yi∉{1:dZ}, where {1:dZ}={1,…,dZ}. For example, we set Ei = 1 for all individuals who are determined in the NSCG not to have a college degree but report so in the ACS. The stochastic part of the error model only applies to individuals who truly have at least a bachelor’s degree. 3.3 Reporting Model When there is no reporting error for individual i (i.e., Ei = 0), we know that Zi = Y i. When there is a reporting error, we must model the reported value Zi. As with (2), one can posit a variety of distributions for the reporting error, which is some function h(Xi,Yi,α) with parameters α. We now describe a few reporting error models for illustration. One could use more complicated models as well (e.g., based on multinomial logistic regression). A simple model assumes that values of Zi are equally likely, as in Manrique-Vallier and Reiter (2018). We have Pr(Zi=l|Xi,Yi=k,Ei=1)={1/(dZ−1) if l≠k,k∈{1:dz}1/dZ if k∉{1:dZ}0 otherwise . (3) Such a reporting model could be reasonable when reporting errors due to clerical errors. We note that this model does not accurately characterize the reporting errors in the 1993 linked NSCG data, per table 1. Alternatively, one can allow the probabilities to depend on Yi, so that, (Zi|Xi,Yi=k,Ei=1)∼Categorical (pk(1),…,pk(dZ)), (4) where each pk(l) is the probability of reporting Z = l given that Y = k, and pk(k)=0. One can further parameterize the reporting model so that the reporting probabilities vary with X. For example, to make the probabilities vary with sex and true education values, we can use (Zi|Xi,Yi=k,Ei=1)∼{Categorical (pM,k(1),…,pM,k(dZ)) if Xi,sex=MCategorical (pF,k(1),…,pF,k(dZ)) if Xi,sex=F. (5) 3.4 Modeling Considerations As apparent in sections 3.2 and 3.3, the error and reporting models can take on many specifications. Without linked data, analysts cannot use exploratory data analysis to inform the model choice. Similarly, analysts cannot use holdout samples to evaluate model quality, since no unit has Y and Z measured simultaneously. Instead, all that analysts can do is posit scientifically defensible measurement error models and make post hoc checks of the sensibility of analyses from those models. We demonstrate this approach in section 4. For example, analysts can check whether or not the predicted probabilities of errors implied by the model seem plausible. As another diagnostic, analysts can compare the distribution of the imputed values of (Y|X) in DE to the empirical distribution of (Y|X) in DG. This is akin to diagnostics in multiple imputation for missing data that compare imputed and observed values (Abayomi, Gelman, and Levy 2008). When these distributions differ substantially, it suggests the measurement error model specification (or possibly the true data model) is inadequate. Such diagnostic checks only can reveal problems with the model specification; they do not indicate that a particular specification is correct (see Fosdick et al. 2016, for illustrations of related diagnostics). When specifying the models, analysts should include in X variables that plausibly predict Y and are available in DG and DE. Doing so helps ensure the relationships between X and the imputed values of Y in DE accord with those from DG. It is also beneficial to include any variables in X that potentially explain the likelihood of making errors. To find such variables, since Y and Z are not measured simultaneously, analysts must rely on knowledge external to the data, such as validation studies or domain knowledge from prior experience in other contexts. When important variables are excluded from X, either because they are not available in both files or because of modeling choices, the imputations of Y in DE may distort estimates involving Y and those excluded X variables. However, when the fraction of respondents who make errors is modest (as in table 1), Z provides substantial information about the missing Y. In this case, distortions of relationships between any excluded X variables and the imputed Y variables tend not to be severe. The specification of the error and reporting models constrains the probabilities in the joint distribution of (Y,Z|X), thereby enabling estimation of that distribution. To illustrate, consider again the example with binary Y and Z and three levels in a scalar X. If we make Pr(Ei=0|Yi,Xi)=π not depend on Xi or Yi, and we use (3) with dZ=dY=2, we obtain the constraints Pr(Yi=k|Zi=k,Xi)=Pr(Yi=k|Xi)∑e=01Pr(Zi=k|Ei=e,Yi=k,Xi)Pr(Ei=e)Pr(Zi=k|Xi)=πPr(Yi=k|Xi)Pr(Zi=k|Xi) (6) for k = 0 and k = 1. The result in (6) stems from this particular reporting error model specification, for which Zi = Yi whenever Ei = 0 and Zi=1−Yi whenever Ei = 1. Naturally, we also can find Pr(Yi=k|Zi=1−k,Xi). Since Pr(Yi=k|Xi) can be estimated from DG and Pr(Zi=k|Xi) can be estimated from DE, this measurement error model adds three constraints (one for each level of Xi) that allow for identification of all parameters in the model. When X is observed in both datasets, we can do the computations in (6) within each category of X, thereby arriving at similar identification results for error models where Ei depends on Xi but not Yi. The roles of the constraints are more subtle when the error model depends on Y. In this case, we replace π in (6) with Pr(Ei=0|Yi=k)=πYi. We also can find Pr(Yi=k|Zi=1−k)usingPr(Ei=1|Yi=k)=1−πYi. This still adds constraints that define a joint distribution when averaging over Ei; however, we cannot identify each πYi. A simple illustration shows this to be the case. Suppose Zi = 1 for 60 percent of the values in a very large DE, and Yi = 1 for 50 percent of the values in a very large DG. The procedure seeks to impute the missing Yi values in DE so that 50 percent are ones and 50 percent are zeroes. However, it can do so in many ways. For example, it can set all Yi = 0 for all records where Zi = 0 (i.e., π0=0) and change enough values of Zi = 1 into imputed Yi = 0 to get to 50 percent ones; or, it can change some other fractions of values with Zi = 0 to ones and values of Zi = 1 to zeros to get to 50 percent. The data provides no information to choose among feasible options. The lack of identification when measurement errors depend on Y can translate to increased variance in estimates of πYi. This may have only a modest impact on inferences for the marginal distribution of Y or the conditional distributions of Y given X. This is because the measurement error model permutes imputed values of Y within levels of X with frequency depending on πYi. However, the lack of identification can degrade estimates of associations involving Y and variables not in X. This suggests an important guidance for specifying measurement error models that depend on Y: it is helpful to use informative prior distributions that favor certain parameter values over others. This is borne out in the simulations of section 4. For additional illustrations of the effects of different specifications of measurement error models, see the simulation studies in Shifeling (2016, pp. 86 – 94). To estimate the models, it is convenient to use a two-stage strategy. When imputing missing Y in DE, all of the information needed from DG is represented by the parameters of the true data model, θ. Hence, we first can construct a (possibly approximate) posterior distribution of θ using only DG. Ideally, the conditional distribution of Y given X is the same in both DG and DE; indeed, the methodology is derived assuming that this is the case. When DG is collected using a complex design, it may be prudent to account for the design when estimating θ. For example, Kim and Yang (2017) suggest using survey-weighted, pseudo-maximum likelihood models to estimate θ. This approach also enables analysts to account for important design features in DG, such as cluster or stratum indicators that are not available in DE and not able to be part of X. We then sample many draws of θ from this distribution. We plug in these draws in the Gibbs sampling steps for a Bayesian predictive distribution for (Yi|Zi,Xi,θ) for the cases in DE, thereby generating the multiple imputations. We describe the Gibbs sampler for this step for the NSCG/ACS application in the supplementary material. In the NSCG/ACS application, and more generally, we do not account for design features of DE in specification or estimation of the measurement error models. We do not believe this is necessary since, conceptually, the goal of the measurement error models in the data fusion approach is to bridge from observed Zi to plausible Yi. However, complex designs in DE can and should be accounted for after creation of the plausible datasets; for example, analysts can use survey-weighted estimation with the completed versions of DE. Although we present models in the context of categorical data, analysts can adapt the strategy to handle continuous variables in (Yi, Zi) or Xi. For (Y|X), analysts can use DG to specify appropriate regression models. For (Ei|Yi,Xi), analysts can use binary regressions with continuous-valued predictors. For (Zi|Yi,Xi,Ei=1), analysts have to specify a measurement error model connecting Y and Z. For example, a common measurement error model assumes (Zi|Ei=1,Xi=xi)∼N(xi,σ2) where σ2 is fixed at some value based on previous experience with the measurement (Fuller 1987). Alternatively, to represent ignorance about the reporting error process, one can assume (Zi|Ei=1,Xi=xi)∼Unif(a,b) where a and b are fixed limits large enough to capture all the reported values. This model was used by Kim et al. (2015) for edit-imputation of continuous data from one error-prone source. 4. ADJUSTING FOR REPORTING ERRORS IN EDUCATION IN THE 2010 ACS We now use the framework to adjust inferences for potential reporting error in educational attainment in the 2010 ACS, using the public use microdata for the 2010 NSCG as the gold standard file DG. We consider two main analyses that could be affected by reporting error in education. First, we estimate from the ACS the number of science and engineering degrees awarded to women, after correcting for measurement error. We base the estimate on an indicator in the ACS for whether or not each individual has such a degree. Second, we examine average incomes across degrees. This focus is motivated in part by the findings of Black, Haviland, Sanders, and Taylor (2006, 2008), who found that apparent wage gaps in the 1990 census long-form data could be explained by reporting errors in education. As DE, we use the subset of ACS microdata that includes only individuals who reported a bachelor’s degree or higher and are under age 76. The resulting sample size is nE=600,150. In X, we include gender, age group (24 and younger, 25–39, 40–54, and 55 and older), and an indicator for whether the individual’s race is black or something else. We use these variables because the 1993 linked data (Black et al. 2003) allow us to specify informative prior distributions for them alone; we do not have information on the measurement error in education broken out by other demographic groups. Were such information available, either from a linked sample or previous knowledge, we would strongly consider making X richer. In the NSCG, we discarded thirty-eight records with race suppressed, leaving a sample size of nG=77,150. We consider two sets of measurement error model specifications. The first set uses specifications like those in section 3, with flat prior distributions for all parameters. We use this set to illustrate model diagnostics and sensitivity analysis absent informative prior beliefs about the measurement error process. The second set uses a common error and reporting model with different, informative prior distributions on its parameters. We construct these informative prior distributions based on the analysis of the 1993 linked file. For all specifications considered, we create M = 50 multiple imputations of the plausible true education values in the 2010 ACS, which we then analyze using the methods of Rubin (1987). We use M = 50 because it is sufficient to generate stable interval estimates. One may be able to get shorter intervals using the multiple imputation methods akin to those in Reiter (2008), which are developed for situations in which the data used for estimating the imputation model are not the same as the data used for analysis. We leave evaluation of the performance of this and other multiple imputation variance estimators for future research. For all specifications, the true data model is a saturated multinomial distribution for the five values of Y for each combination of X. We begin by describing how we estimate the parameters of the true data distribution, accounting for the sampling design of the NSCG. 4.1 Accounting for Sampling Design of NSCG The 2010 NSCG uses reported education in the 2010 ACS as a stratification variable (Fesco et al. 2012). Its unweighted percentages can overrepresent or underrepresent degree types in the population; this is most obviously the case for individuals without a college degree (Y i = 5). We need to account for this informative sampling when estimating parameters of the true data model. We do so with a two-stage approach. First, we use survey-weighted inferences to estimate population totals of (Y|X) from the 2010 NSCG. Second, we turn these estimates into an approximate Bayesian posterior distribution for input to fitting the measurement error models used to impute plausible values of Yi for individuals in the ACS. We now describe this process, which can be used generally when DG is collected via a complex survey design. Suppose for the moment that dY= dZ. This is not the case when DE is the ACS (where dZ = 4) and DG is the NSCG (where dY=5); however, we start here to fix ideas. For all possible combinations x, let θxk=Pr(Y=k|X=x), and let θx=(θx1,…,θxdY). We seek to use DG to specify f(θ|X,Y). To do so, we first parameterize θxk=Txk/∑j=1dYTxj, where Txk is the population count of individuals with (Xi=x,Yi=k). We estimate Tx=(Tx1,…,TxdY) and the associated covariance matrix of the estimator using standard survey-weighted estimation. Let wi be the sample weight for all i∈DG. We compute the estimated total and associated variance for each x and k as T^xk=∑i=1nGwiI(Xi=x,Yi=k) (7) Var̂(T^xk)=nGnG−1∑i=1nG(wiI(Xi=x,Yi=k)−T^xknG)2. (8) For each k and l, with l≠k, we also compute the estimated covariance, Cov̂(T^xk,T^xl)=nGnG−1∑i=1nG[(wiI(Xi=x,Yi=k)−T^xknG)×(wiI(Xi=x,Yi=l)−T^xlnG)]. (9) The variance and covariance estimators are the design-based estimators for probability proportional to size sampling with replacement, as is typical of multistage complex surveys (Lohr 2010). Switching now to a Bayesian modeling perspective, we assume that Tx∼ Log-Normal( μx,τx), so as to ensure a distribution with positive values for all true totals. We select (μx,τx) so that each E(Txk)=T^xk and Var(Tx)=Σ^(T^x), the estimated covariance matrix with elements defined by (8) and (9). These are derived from moment matching (Tarmast 2001). We have μxj= log ⁡(T^xj)−τx[j,j]/2 (10) τx[j,j]= log ⁡(1+Σ^x[j,j]/(T^xj2)) (11) τx[j,i]= log ⁡(1+Σ^x[j,i]/(T^xj·T^xi)), (12) where the notation [j,i] denotes an element in row j and column i of the matrix. We draw Tx* from this log-normal distribution and transform to draws θx*. Since the 2010 NSCG does not include individuals who claim in the ACS to have less than a bachelor’s degree, we cannot use DG directly to estimate Tx5. Instead, we estimate Tx+=Tx1+Tx2+Tx3+Tx4+Tx5 using the ACS data and estimate (Tx1,Tx2,Tx3,Tx4) from the NSCG using the method described previously; this leads to an estimate for Tx5. More precisely, let the ACS design-based estimator for Tx+ be T^x+, with design-based variance estimate σ^2(T^x+). We sample a value Tx+*∼Normal(T^x+,σ^2(T^x+)). Using an independent sample of values of (Tx1*,…,Tx4*) from the NSCG, we compute Tx5*=Tx+*−∑j=14Txj*, and set Tx*=(Tx1*,…,Tx5*). We repeat these steps 10,000 times. We then compute the mean and covariance matrix of the 10,000 draws, which we again plug into (10) – (12). The resulting log-normal distribution is the approximate posterior distribution of θx. We include an example of this entire procedure in the supplementary material. 4.2 Measurement Error Models The two sets of measurement error models include four that use flat prior distributions and three that use informative prior distributions based on the 1993 linked data. For all error models, we use a logistic regression of Ei on various main effects and interactions of Yi and Xi. For all reporting models, we use categorical distributions with probabilities that depend on Yi and possibly Xi. The four models with flat prior distributions are summarized in table 2. In model one, the error and reporting models depend only on Yi. Models 2 and 3 keep the reporting model as in (4) but expand the error model. In model two, the probability of a reporting error can vary with Yi and sex ( Xi,sex). In model three, error probabilities can vary with Yi and the indicator for black race ( Xi,black). In model four, the error and reporting models both depend on Y and sex. Table 2. Summary of the First Four Measurement Error Model Specifications for 2010 NSCG/ACS Analysis. These models use flat prior distributions on all parameters. Error model Reporting model Expression for MiTβ Pr(Zi|Xi,Yi=k,Ei=1) Model 1 β1+∑k=24βkI(Yi=k) Categorical(pk(1),…,pk(4)) Model 2 β1+∑k=24βk(M)I(Yi=k,Xi,sex=M) Categorical(pk(1),…,pk(4)) +∑k=14βk(F)I(Yi=k,Xi,sex=F) Model 3 β1+∑k=24βk(no)I(Yi=k,Xi,black=no) Categorical(pk(1),…,pk(4)) +∑k=14βk(yes)I(Yi=k,Xi,black=yes) Model 4 β1+∑k=24βk(M)I(Yi=k,Xi,sex=M) Categorical(pM,k(1),…,pM,k(4)) if Xi,sex= M +∑k=14βk(F)I(Yi=k,Xi,sex=F) Categorical(pF,k(1),…,pF,k(4)) if Xi,sex=F Error model Reporting model Expression for MiTβ Pr(Zi|Xi,Yi=k,Ei=1) Model 1 β1+∑k=24βkI(Yi=k) Categorical(pk(1),…,pk(4)) Model 2 β1+∑k=24βk(M)I(Yi=k,Xi,sex=M) Categorical(pk(1),…,pk(4)) +∑k=14βk(F)I(Yi=k,Xi,sex=F) Model 3 β1+∑k=24βk(no)I(Yi=k,Xi,black=no) Categorical(pk(1),…,pk(4)) +∑k=14βk(yes)I(Yi=k,Xi,black=yes) Model 4 β1+∑k=24βk(M)I(Yi=k,Xi,sex=M) Categorical(pM,k(1),…,pM,k(4)) if Xi,sex= M +∑k=14βk(F)I(Yi=k,Xi,sex=F) Categorical(pF,k(1),…,pF,k(4)) if Xi,sex=F Table 2. Summary of the First Four Measurement Error Model Specifications for 2010 NSCG/ACS Analysis. These models use flat prior distributions on all parameters. Error model Reporting model Expression for MiTβ Pr(Zi|Xi,Yi=k,Ei=1) Model 1 β1+∑k=24βkI(Yi=k) Categorical(pk(1),…,pk(4)) Model 2 β1+∑k=24βk(M)I(Yi=k,Xi,sex=M) Categorical(pk(1),…,pk(4)) +∑k=14βk(F)I(Yi=k,Xi,sex=F) Model 3 β1+∑k=24βk(no)I(Yi=k,Xi,black=no) Categorical(pk(1),…,pk(4)) +∑k=14βk(yes)I(Yi=k,Xi,black=yes) Model 4 β1+∑k=24βk(M)I(Yi=k,Xi,sex=M) Categorical(pM,k(1),…,pM,k(4)) if Xi,sex= M +∑k=14βk(F)I(Yi=k,Xi,sex=F) Categorical(pF,k(1),…,pF,k(4)) if Xi,sex=F Error model Reporting model Expression for MiTβ Pr(Zi|Xi,Yi=k,Ei=1) Model 1 β1+∑k=24βkI(Yi=k) Categorical(pk(1),…,pk(4)) Model 2 β1+∑k=24βk(M)I(Yi=k,Xi,sex=M) Categorical(pk(1),…,pk(4)) +∑k=14βk(F)I(Yi=k,Xi,sex=F) Model 3 β1+∑k=24βk(no)I(Yi=k,Xi,black=no) Categorical(pk(1),…,pk(4)) +∑k=14βk(yes)I(Yi=k,Xi,black=yes) Model 4 β1+∑k=24βk(M)I(Yi=k,Xi,sex=M) Categorical(pM,k(1),…,pM,k(4)) if Xi,sex= M +∑k=14βk(F)I(Yi=k,Xi,sex=F) Categorical(pF,k(1),…,pF,k(4)) if Xi,sex=F For models 5–7, we use the specification in model four and incorporate prior information about the measurement errors from the 1993 linked data. In constructing the priors, we first remove records that have been flagged as having missing education that has been imputed because these imputations might not closely reflect the actual education values (Black et al. 2003). Table 3 displays the prior distributions for males with bachelor’s degrees. Details on how we arrive at these and other groups’ prior specifications are in the supplementary material; here, we summarize briefly. Table 3. Summary of Informative Prior Specifications for 2010 NSCG/ACS Analysis for Males with Bachelor’s Degrees Error rate Reporting probabilities (pM,1(2),pM,1(3),pM,1(4)) Model 4 Beta (1, 1) Dirichlet (1, 1, 1) Model 5 Beta (0.76, 14.24) Dirichlet (3.54, 1.27, 0.19) Model 6 Beta (2,724.2, 50,862) Dirichlet (2,235.3, 799.7, 123.1) Model 7 Beta (500, 99,500) Dirichlet (1, 1, 1) Error rate Reporting probabilities (pM,1(2),pM,1(3),pM,1(4)) Model 4 Beta (1, 1) Dirichlet (1, 1, 1) Model 5 Beta (0.76, 14.24) Dirichlet (3.54, 1.27, 0.19) Model 6 Beta (2,724.2, 50,862) Dirichlet (2,235.3, 799.7, 123.1) Model 7 Beta (500, 99,500) Dirichlet (1, 1, 1) Table 3. Summary of Informative Prior Specifications for 2010 NSCG/ACS Analysis for Males with Bachelor’s Degrees Error rate Reporting probabilities (pM,1(2),pM,1(3),pM,1(4)) Model 4 Beta (1, 1) Dirichlet (1, 1, 1) Model 5 Beta (0.76, 14.24) Dirichlet (3.54, 1.27, 0.19) Model 6 Beta (2,724.2, 50,862) Dirichlet (2,235.3, 799.7, 123.1) Model 7 Beta (500, 99,500) Dirichlet (1, 1, 1) Error rate Reporting probabilities (pM,1(2),pM,1(3),pM,1(4)) Model 4 Beta (1, 1) Dirichlet (1, 1, 1) Model 5 Beta (0.76, 14.24) Dirichlet (3.54, 1.27, 0.19) Model 6 Beta (2,724.2, 50,862) Dirichlet (2,235.3, 799.7, 123.1) Model 7 Beta (500, 99,500) Dirichlet (1, 1, 1) For model five, we set the prior distributions for each βk(x) so that the error rates are centered at the estimate from the 1993 linked data. We also require the central 95 percent probability interval of the prior distribution on each error rate to be close to (0.005, 0.20), allowing for a wide but not unrealistic range of possible error rates. For the reporting probabilities pM,k(z) and pF,k(z), we center most of the prior distributions at the corresponding estimates from the 1993 linked data. We require the central 95 percent probability interval of each prior distribution to have support on values of p·,k(z) within ±.10 of the 1993 point estimate, truncating at zero or one as needed. One exception is the reporting probabilities for those with “no college degree” who report “professional” degree, which we center at half the 1993 estimate. The Census Bureau has improved the clarity of the definition of “professional” in the 20 years since the 1990 long form, as discussed in the prior specification section of the supplementary material. For model six, we use the same prior means as in model five for both error and reporting models. However, we substantially tighten the prior distributions to make the prior variance accord with the uncertainty in the point estimates from the 1993 linked data. We do so by using prior sample sizes that match those from the 1993 NSCG. For example, the 1993 NSCG included 53,586 males with bachelor’s degrees (excluding those records who had their census education imputed). We therefore use Beta(2724.2, 50,862) as the prior distribution for the error rate for this x. We similarly increase the prior sample sizes for the reporting probabilities to match the 1993 NSCG sample sizes. Model seven departs from the 1993 linked data estimates and encodes a strong prior belief that almost no one misreports their education except for haphazard mistakes. Here, we set the prior mean for the probability of misreporting education to 0.005 for all demographic groups. We use a prior sample size of 100,000, making the prior distribution concentrate strongly around 0.005. For the reporting probabilities, we use a noninformative prior distribution for convenience, since the estimates of the reporting probabilities are strongly influenced by the concentrated prior distributions on the error rates. Finally, for comparison purposes, we also fit the model based on a conditional independence assumption (CIA), which is the default assumption in data fusion. To impute Yi for individuals in the ACS under the CIA, we sample θ* and then impute (Y*|θ*,X) from the true data model. Here, we do not use the reported value of Zi in the imputations. 4.3 Empirical Results We first examine what each model suggests about the extent and nature of the measurement errors in the 2010 ACS. We then use the models to assess sensitivity of results about the substantive questions related to number of degrees and income. We use survey-weighted estimates in the multiple imputation inferences. 4.3.1 Distributions of errors in reported ACS education values. Table 4 displays the multiple imputation point estimates and 95 percent confidence intervals for the proportions of errors by gender and NSCG education, obtained from the M = 50 draws of Ei for all individuals in DE. We begin by comparing results for the set of models with flat prior distributions (models 1–4) and the CIA model, then move to the set of models with informative prior distributions (models 5–7). Table 4. Error Rate Estimates from Different Model Specifications. Models 1–7 are run for 100,000 MCMC iterations. We save M = 50 completed datasets under each model. For each dataset, we compute the estimated overall error rate, estimated error rate by gender and imputed Y, and associated variances using ratio estimators that incorporate the ACS final survey weights. Estimate by group Estimate overall Y= BA Y= MA Y= Prof. Y= PhD CIA model  Male 0.37 0.76 0.91 0.94 0.57 (0.36, 0.37) (0.75, 0.76) (0.91, 0.92) (0.93, 0.95) (0.55, 0.58)  Female 0.35 0.72 0.95 0.97 (0.35, 0.36) (0.71, 0.72) (0.94, 0.95) (0.96, 0.97) Model 1  Male 0.05 0.10 0.18 0.27 0.17 (0.04, 0.06) (0.08, 0.11) (0.15, 0.21) (0.23, 0.31) (0.16, 0.19)  Female 0.05 0.09 0.18 0.28 (0.05, 0.06) (0.08, 0.10) (0.15, 0.21) (0.24, 0.32) Model 2  Male 0.05 0.18 0.27 0.36 0.20 (0.04, 0.06) (0.16, 0.21) (0.18, 0.37) (0.30, 0.42) (0.18, 0.21)  Female 0.05 0.12 0.26 0.41 (0.05, 0.06) (0.10, 0.14) (0.20, 0.33) (0.29, 0.53) Model 3  Male 0.05 0.09 0.17 0.25 0.17 (0.04, 0.06) (0.08, 0.11) (0.14, 0.20) (0.21, 0.30) (0.16, 0.19)  Female 0.05 0.09 0.17 0.26 (0.05, 0.06) (0.08, 0.10) (0.14, 0.20) (0.21, 0.31) Model 4  Male 0.05 0.19 0.36 0.36 0.22 (0.04, 0.06) (0.16, 0.23) (0.26, 0.46) (0.27, 0.45) (0.20, 0.24)  Female 0.09 0.14 0.52 0.55 (0.08, 0.10) (0.11, 0.17) (0.44, 0.59) (0.40, 0.70) Model 5  Male 0.07 0.19 0.23 0.34 0.22 (0.06, 0.08) (0.16, 0.22) (0.14, 0.32) (0.27, 0.41) (0.20, 0.24)  Female 0.09 0.12 0.50 0.31 (0.08, 0.10) (0.09, 0.15) (0.43, 0.57) (0.17, 0.46) Model 6  Male 0.05 0.09 0.10 0.10 0.16 (0.05, 0.05) (0.08, 0.10) (0.09, 0.11) (0.09, 0.11) (0.14, 0.17)  Female 0.05 0.06 0.16 0.07 (0.04, 0.05) (0.05, 0.07) (0.14, 0.18) (0.06, 0.09) Model 7  Male 0.01 0.01 0.00 0.01 0.11 (0.01, 0.01) (0.00, 0.01) (0.00, 0.01) (0.00, 0.01) (0.09, 0.13)  Female 0.01 0.01 0.01 0.01 (0.01, 0.01) (0.01, 0.01) (0.00, 0.01) (0.00, 0.01) Estimate by group Estimate overall Y= BA Y= MA Y= Prof. Y= PhD CIA model  Male 0.37 0.76 0.91 0.94 0.57 (0.36, 0.37) (0.75, 0.76) (0.91, 0.92) (0.93, 0.95) (0.55, 0.58)  Female 0.35 0.72 0.95 0.97 (0.35, 0.36) (0.71, 0.72) (0.94, 0.95) (0.96, 0.97) Model 1  Male 0.05 0.10 0.18 0.27 0.17 (0.04, 0.06) (0.08, 0.11) (0.15, 0.21) (0.23, 0.31) (0.16, 0.19)  Female 0.05 0.09 0.18 0.28 (0.05, 0.06) (0.08, 0.10) (0.15, 0.21) (0.24, 0.32) Model 2  Male 0.05 0.18 0.27 0.36 0.20 (0.04, 0.06) (0.16, 0.21) (0.18, 0.37) (0.30, 0.42) (0.18, 0.21)  Female 0.05 0.12 0.26 0.41 (0.05, 0.06) (0.10, 0.14) (0.20, 0.33) (0.29, 0.53) Model 3  Male 0.05 0.09 0.17 0.25 0.17 (0.04, 0.06) (0.08, 0.11) (0.14, 0.20) (0.21, 0.30) (0.16, 0.19)  Female 0.05 0.09 0.17 0.26 (0.05, 0.06) (0.08, 0.10) (0.14, 0.20) (0.21, 0.31) Model 4  Male 0.05 0.19 0.36 0.36 0.22 (0.04, 0.06) (0.16, 0.23) (0.26, 0.46) (0.27, 0.45) (0.20, 0.24)  Female 0.09 0.14 0.52 0.55 (0.08, 0.10) (0.11, 0.17) (0.44, 0.59) (0.40, 0.70) Model 5  Male 0.07 0.19 0.23 0.34 0.22 (0.06, 0.08) (0.16, 0.22) (0.14, 0.32) (0.27, 0.41) (0.20, 0.24)  Female 0.09 0.12 0.50 0.31 (0.08, 0.10) (0.09, 0.15) (0.43, 0.57) (0.17, 0.46) Model 6  Male 0.05 0.09 0.10 0.10 0.16 (0.05, 0.05) (0.08, 0.10) (0.09, 0.11) (0.09, 0.11) (0.14, 0.17)  Female 0.05 0.06 0.16 0.07 (0.04, 0.05) (0.05, 0.07) (0.14, 0.18) (0.06, 0.09) Model 7  Male 0.01 0.01 0.00 0.01 0.11 (0.01, 0.01) (0.00, 0.01) (0.00, 0.01) (0.00, 0.01) (0.09, 0.13)  Female 0.01 0.01 0.01 0.01 (0.01, 0.01) (0.01, 0.01) (0.00, 0.01) (0.00, 0.01) Table 4. Error Rate Estimates from Different Model Specifications. Models 1–7 are run for 100,000 MCMC iterations. We save M = 50 completed datasets under each model. For each dataset, we compute the estimated overall error rate, estimated error rate by gender and imputed Y, and associated variances using ratio estimators that incorporate the ACS final survey weights. Estimate by group Estimate overall Y= BA Y= MA Y= Prof. Y= PhD CIA model  Male 0.37 0.76 0.91 0.94 0.57 (0.36, 0.37) (0.75, 0.76) (0.91, 0.92) (0.93, 0.95) (0.55, 0.58)  Female 0.35 0.72 0.95 0.97 (0.35, 0.36) (0.71, 0.72) (0.94, 0.95) (0.96, 0.97) Model 1  Male 0.05 0.10 0.18 0.27 0.17 (0.04, 0.06) (0.08, 0.11) (0.15, 0.21) (0.23, 0.31) (0.16, 0.19)  Female 0.05 0.09 0.18 0.28 (0.05, 0.06) (0.08, 0.10) (0.15, 0.21) (0.24, 0.32) Model 2  Male 0.05 0.18 0.27 0.36 0.20 (0.04, 0.06) (0.16, 0.21) (0.18, 0.37) (0.30, 0.42) (0.18, 0.21)  Female 0.05 0.12 0.26 0.41 (0.05, 0.06) (0.10, 0.14) (0.20, 0.33) (0.29, 0.53) Model 3  Male 0.05 0.09 0.17 0.25 0.17 (0.04, 0.06) (0.08, 0.11) (0.14, 0.20) (0.21, 0.30) (0.16, 0.19)  Female 0.05 0.09 0.17 0.26 (0.05, 0.06) (0.08, 0.10) (0.14, 0.20) (0.21, 0.31) Model 4  Male 0.05 0.19 0.36 0.36 0.22 (0.04, 0.06) (0.16, 0.23) (0.26, 0.46) (0.27, 0.45) (0.20, 0.24)  Female 0.09 0.14 0.52 0.55 (0.08, 0.10) (0.11, 0.17) (0.44, 0.59) (0.40, 0.70) Model 5  Male 0.07 0.19 0.23 0.34 0.22 (0.06, 0.08) (0.16, 0.22) (0.14, 0.32) (0.27, 0.41) (0.20, 0.24)  Female 0.09 0.12 0.50 0.31 (0.08, 0.10) (0.09, 0.15) (0.43, 0.57) (0.17, 0.46) Model 6  Male 0.05 0.09 0.10 0.10 0.16 (0.05, 0.05) (0.08, 0.10) (0.09, 0.11) (0.09, 0.11) (0.14, 0.17)  Female 0.05 0.06 0.16 0.07 (0.04, 0.05) (0.05, 0.07) (0.14, 0.18) (0.06, 0.09) Model 7  Male 0.01 0.01 0.00 0.01 0.11 (0.01, 0.01) (0.00, 0.01) (0.00, 0.01) (0.00, 0.01) (0.09, 0.13)  Female 0.01 0.01 0.01 0.01 (0.01, 0.01) (0.01, 0.01) (0.00, 0.01) (0.00, 0.01) Estimate by group Estimate overall Y= BA Y= MA Y= Prof. Y= PhD CIA model  Male 0.37 0.76 0.91 0.94 0.57 (0.36, 0.37) (0.75, 0.76) (0.91, 0.92) (0.93, 0.95) (0.55, 0.58)  Female 0.35 0.72 0.95 0.97 (0.35, 0.36) (0.71, 0.72) (0.94, 0.95) (0.96, 0.97) Model 1  Male 0.05 0.10 0.18 0.27 0.17 (0.04, 0.06) (0.08, 0.11) (0.15, 0.21) (0.23, 0.31) (0.16, 0.19)  Female 0.05 0.09 0.18 0.28 (0.05, 0.06) (0.08, 0.10) (0.15, 0.21) (0.24, 0.32) Model 2  Male 0.05 0.18 0.27 0.36 0.20 (0.04, 0.06) (0.16, 0.21) (0.18, 0.37) (0.30, 0.42) (0.18, 0.21)  Female 0.05 0.12 0.26 0.41 (0.05, 0.06) (0.10, 0.14) (0.20, 0.33) (0.29, 0.53) Model 3  Male 0.05 0.09 0.17 0.25 0.17 (0.04, 0.06) (0.08, 0.11) (0.14, 0.20) (0.21, 0.30) (0.16, 0.19)  Female 0.05 0.09 0.17 0.26 (0.05, 0.06) (0.08, 0.10) (0.14, 0.20) (0.21, 0.31) Model 4  Male 0.05 0.19 0.36 0.36 0.22 (0.04, 0.06) (0.16, 0.23) (0.26, 0.46) (0.27, 0.45) (0.20, 0.24)  Female 0.09 0.14 0.52 0.55 (0.08, 0.10) (0.11, 0.17) (0.44, 0.59) (0.40, 0.70) Model 5  Male 0.07 0.19 0.23 0.34 0.22 (0.06, 0.08) (0.16, 0.22) (0.14, 0.32) (0.27, 0.41) (0.20, 0.24)  Female 0.09 0.12 0.50 0.31 (0.08, 0.10) (0.09, 0.15) (0.43, 0.57) (0.17, 0.46) Model 6  Male 0.05 0.09 0.10 0.10 0.16 (0.05, 0.05) (0.08, 0.10) (0.09, 0.11) (0.09, 0.11) (0.14, 0.17)  Female 0.05 0.06 0.16 0.07 (0.04, 0.05) (0.05, 0.07) (0.14, 0.18) (0.06, 0.09) Model 7  Male 0.01 0.01 0.00 0.01 0.11 (0.01, 0.01) (0.00, 0.01) (0.00, 0.01) (0.00, 0.01) (0.09, 0.13)  Female 0.01 0.01 0.01 0.01 (0.01, 0.01) (0.01, 0.01) (0.00, 0.01) (0.00, 0.01) The CIA model suggests extremely large error percentages, especially for the highest education levels. These rates seem unlikely to be reality, leading us to reject the CIA model. The overall error rates for models 1–4 are similar and more realistic than those from the CIA model. The differences in error estimates between model two and model one suggest that the probability of error depends on sex. Comparing results for model three and model one, however, we see little evidence of important race effects on the propensity to make errors. Model four generalizes model two by allowing the reporting probabilities to vary by sex. If these probabilities were similar across sex in reality, we would expect the two models to produce similar results. However, the estimated error rates are fairly different; for example, the estimated proportion of errors for female professionals from model four is about double that from model two. To determine where the models differ most, we examine the estimated reporting probabilities, displayed in table 5. Model four estimates some significant differences in reporting probabilities by gender. For example, males with bachelor’s degrees who make a reporting error are estimated to report a master’s degree with probability 0.96, whereas females with bachelor’s degrees who make a reporting error are estimated to report a master’s degree with probability 0.67 and a professional degree with probability 0.30. Other large differences exist for professional degree holders. Females with professional degrees who make a reporting error are most likely to report a bachelor’s degree, whereas men with professional degrees who make a reporting error are most likely to report a master’s degree or PhD. We note that some of the estimates for model four are based on small sample sizes, which explains the wide standard errors. Table 5. Estimated Mean and 95 Percent Confidence Interval of Reporting Probabilities under Model Two and Reporting Probabilities by Gender under Model Four Z= BA Z= MA Z= Prof. Z= PhD Y=BA  Model 2 0.95 0.04 0.01 (0.87, 1.00) (0.00,0.11) (0.00,0.03)  Model 4: Male 0.96 0.02 0.02 (0.90, 1.00) (0.00,0.07) (0.00,0.05)  Model 4: Female 0.67 0.30 0.03 (0.58,0.76) (0.22,0.38) (0.00,0.07) Y=MA  Model 2 0.02 0.51 0.47 (0.00,0.06) (0.43,0.59) (0.39,0.55)  Model 4: Male 0.04 0.57 0.39 (0.00,0.11) (0.48,0.66) (0.31,0.47)  Model 4: Female 0.11 0.39 0.50 (0.00,0.25) (0.26,0.52) (0.40,0.61) Y=Prof.  Model 2 0.05 0.69 0.26 (0.00,0.16) (0.54,0.83) (0.14,0.38)  Model 4: Male 0.02 0.69 0.29 (0.00,0.06) (0.44,0.94) (0.04,0.54)  Model 4: Female 0.91 0.06 0.04 (0.79, 1.00) (0.00,0.16) (0.00,0.10) Y= PhD  Model 2 0.01 0.39 0.60 (0.00,0.04) (0.15,0.63) (0.36,0.83)  Model 4: Male 0.01 0.21 0.78 (0.00,0.05) (0.02,0.39) (0.60,0.96)  Model 4: Female 0.10 0.77 0.13 (0.00,0.30) (0.50, 1.00) (0.00,0.34) Y=None  Model 2 0.95 0.03 0.01 0.00 (0.95,0.96) (0.03,0.04) (0.01,0.01) (0.00,0.00)  Model 4: Male 0.97 0.03 0.01 0.00 (0.96,0.97) (0.02,0.03) (0.00,0.01) (0.00,0.00)  Model 4: Female 0.96 0.04 0.00 0.00 (0.95,0.97) (0.03,0.05) (0.00,0.00) (0.00,0.00) Z= BA Z= MA Z= Prof. Z= PhD Y=BA  Model 2 0.95 0.04 0.01 (0.87, 1.00) (0.00,0.11) (0.00,0.03)  Model 4: Male 0.96 0.02 0.02 (0.90, 1.00) (0.00,0.07) (0.00,0.05)  Model 4: Female 0.67 0.30 0.03 (0.58,0.76) (0.22,0.38) (0.00,0.07) Y=MA  Model 2 0.02 0.51 0.47 (0.00,0.06) (0.43,0.59) (0.39,0.55)  Model 4: Male 0.04 0.57 0.39 (0.00,0.11) (0.48,0.66) (0.31,0.47)  Model 4: Female 0.11 0.39 0.50 (0.00,0.25) (0.26,0.52) (0.40,0.61) Y=Prof.  Model 2 0.05 0.69 0.26 (0.00,0.16) (0.54,0.83) (0.14,0.38)  Model 4: Male 0.02 0.69 0.29 (0.00,0.06) (0.44,0.94) (0.04,0.54)  Model 4: Female 0.91 0.06 0.04 (0.79, 1.00) (0.00,0.16) (0.00,0.10) Y= PhD  Model 2 0.01 0.39 0.60 (0.00,0.04) (0.15,0.63) (0.36,0.83)  Model 4: Male 0.01 0.21 0.78 (0.00,0.05) (0.02,0.39) (0.60,0.96)  Model 4: Female 0.10 0.77 0.13 (0.00,0.30) (0.50, 1.00) (0.00,0.34) Y=None  Model 2 0.95 0.03 0.01 0.00 (0.95,0.96) (0.03,0.04) (0.01,0.01) (0.00,0.00)  Model 4: Male 0.97 0.03 0.01 0.00 (0.96,0.97) (0.02,0.03) (0.00,0.01) (0.00,0.00)  Model 4: Female 0.96 0.04 0.00 0.00 (0.95,0.97) (0.03,0.05) (0.00,0.00) (0.00,0.00) Table 5. Estimated Mean and 95 Percent Confidence Interval of Reporting Probabilities under Model Two and Reporting Probabilities by Gender under Model Four Z= BA Z= MA Z= Prof. Z= PhD Y=BA  Model 2 0.95 0.04 0.01 (0.87, 1.00) (0.00,0.11) (0.00,0.03)  Model 4: Male 0.96 0.02 0.02 (0.90, 1.00) (0.00,0.07) (0.00,0.05)  Model 4: Female 0.67 0.30 0.03 (0.58,0.76) (0.22,0.38) (0.00,0.07) Y=MA  Model 2 0.02 0.51 0.47 (0.00,0.06) (0.43,0.59) (0.39,0.55)  Model 4: Male 0.04 0.57 0.39 (0.00,0.11) (0.48,0.66) (0.31,0.47)  Model 4: Female 0.11 0.39 0.50 (0.00,0.25) (0.26,0.52) (0.40,0.61) Y=Prof.  Model 2 0.05 0.69 0.26 (0.00,0.16) (0.54,0.83) (0.14,0.38)  Model 4: Male 0.02 0.69 0.29 (0.00,0.06) (0.44,0.94) (0.04,0.54)  Model 4: Female 0.91 0.06 0.04 (0.79, 1.00) (0.00,0.16) (0.00,0.10) Y= PhD  Model 2 0.01 0.39 0.60 (0.00,0.04) (0.15,0.63) (0.36,0.83)  Model 4: Male 0.01 0.21 0.78 (0.00,0.05) (0.02,0.39) (0.60,0.96)  Model 4: Female 0.10 0.77 0.13 (0.00,0.30) (0.50, 1.00) (0.00,0.34) Y=None  Model 2 0.95 0.03 0.01 0.00 (0.95,0.96) (0.03,0.04) (0.01,0.01) (0.00,0.00)  Model 4: Male 0.97 0.03 0.01 0.00 (0.96,0.97) (0.02,0.03) (0.00,0.01) (0.00,0.00)  Model 4: Female 0.96 0.04 0.00 0.00 (0.95,0.97) (0.03,0.05) (0.00,0.00) (0.00,0.00) Z= BA Z= MA Z= Prof. Z= PhD Y=BA  Model 2 0.95 0.04 0.01 (0.87, 1.00) (0.00,0.11) (0.00,0.03)  Model 4: Male 0.96 0.02 0.02 (0.90, 1.00) (0.00,0.07) (0.00,0.05)  Model 4: Female 0.67 0.30 0.03 (0.58,0.76) (0.22,0.38) (0.00,0.07) Y=MA  Model 2 0.02 0.51 0.47 (0.00,0.06) (0.43,0.59) (0.39,0.55)  Model 4: Male 0.04 0.57 0.39 (0.00,0.11) (0.48,0.66) (0.31,0.47)  Model 4: Female 0.11 0.39 0.50 (0.00,0.25) (0.26,0.52) (0.40,0.61) Y=Prof.  Model 2 0.05 0.69 0.26 (0.00,0.16) (0.54,0.83) (0.14,0.38)  Model 4: Male 0.02 0.69 0.29 (0.00,0.06) (0.44,0.94) (0.04,0.54)  Model 4: Female 0.91 0.06 0.04 (0.79, 1.00) (0.00,0.16) (0.00,0.10) Y= PhD  Model 2 0.01 0.39 0.60 (0.00,0.04) (0.15,0.63) (0.36,0.83)  Model 4: Male 0.01 0.21 0.78 (0.00,0.05) (0.02,0.39) (0.60,0.96)  Model 4: Female 0.10 0.77 0.13 (0.00,0.30) (0.50, 1.00) (0.00,0.34) Y=None  Model 2 0.95 0.03 0.01 0.00 (0.95,0.96) (0.03,0.04) (0.01,0.01) (0.00,0.00)  Model 4: Male 0.97 0.03 0.01 0.00 (0.96,0.97) (0.02,0.03) (0.00,0.01) (0.00,0.00)  Model 4: Female 0.96 0.04 0.00 0.00 (0.95,0.97) (0.03,0.05) (0.00,0.00) (0.00,0.00) Turning to models 5–7, we can see the impact of the informative prior distributions by comparing results in table 4 under these models to those for model four. Moving from model four to model five, the most noticeable differences are for women with a PhD and men with a master’s degree, for whom model five suggests lower error rates. These groups have smaller sample sizes, so that the data do not swamp the effects of the prior distribution. When making the prior sample sizes very large, as in models six and seven, the information in the prior distribution tends to overwhelm the information in the data. We provide more thorough investigation of the impact of the prior specifications in the supplementary material. Of course, we cannot be certain which model most closely reflects the true measurement error mechanism. The best we can do is perform diagnostic tests to see which models, if any, should be discounted as not adequately describing the observed data. For each ACS imputed dataset DE(m) under each model, we compute the sample proportions, π^xk(m), and corresponding multiple imputation 95 percent confidence intervals for all 165˙ unique values of (X,Y). We determine how many of the eighty estimated population percentages of Y|X computed from the 2010 NSCG (using the estimated T^x+ from the ACS to back into an estimate of T^x5) fall within the multiple imputation 95 percent confidence intervals. Models that yield low rates do not describe the data accurately. For model one, seventy-three of eighty NSCG population share estimates are contained in the ACS multiple imputation intervals. Corresponding counts are seventy-five for model two, seventy-one for model three, and seventy-six for model four. These results suggest that model one and model three may be inferior to model two and model four. For the models with informative prior distributions, the counts are seventy-four for model five, sixty-seven for model six, and fifty-four for model seven. Although the prior beliefs in models six and seven seem plausible at first glance, the diagnostic suggests that they do not describe the 2010 data distributions as well as models four and five. Considering the results and the diagnostic check, if we had to choose one model, we would select model five. It seems plausible that the probability of misreporting education, as well as the reported value itself when errors are made, depend on both sex and true education level. Additionally, the prior distribution from the 1993 linked data pulls estimates in groups with little sample size to measurement error distributions that seem more plausible on face value. However, one need not use the data fusion framework for measurement error to select a single model; rather, one can use the framework to examine sensitivity of analyses to the different specifications. 4.3.2 Sensitivity analyses. Figure 2 displays the multiply-imputed, survey-weighted inferences for the total number of women with science and engineering degrees, computed using the ACS-specific indicator variable. We show results for models 4–7 and the CIA model—based on the ACS data without any adjustment for misreporting education. The confidence intervals for model four and model five overlap substantially, suggesting not much practical difference in choosing among these models. However, both are noticeably different from the other models, especially for the PhD and professional degrees. As the prior distributions on the error rates get stronger, the estimated counts increase towards the estimate using the ACS-reported education. We note that using the ACS-reported education without adjustments results in substantially higher estimated totals at the professional and PhD levels than any of the models that account for measurement error. We also note that the CIA model yields considerably lower counts for all but bachelor’s degrees. Across the measurement error methods, differences in interval lengths are modest compared with differences in point estimates. The interval lengths for the measurement error methods are somewhat larger than those based on the reported ACS data, as one would expect due to incorporating the additional uncertainty from the measurement errors. Figure 2. View largeDownload slide The Estimated Total Number of Science and Engineering Degrees Awarded to Women under Each Model. We plot the mean and 95 percent confidence intervals. Note the difference in scale for each degree category. Figure 2. View largeDownload slide The Estimated Total Number of Science and Engineering Degrees Awarded to Women under Each Model. We plot the mean and 95 percent confidence intervals. Note the difference in scale for each degree category. Figure 3 displays inferences for the average income for different degrees. For most degrees, the point estimates for models 4–7 are reasonably close, with models four and five again giving similar results. The estimated average income for professionals differs noticeably across models, with model four and model five suggesting lower averages than the unadjusted ACS estimates, or than models six and seven. The interval lengths for model four and model five tend to exceed those for models six and seven, most noticeably for master’s and PhD degrees, since the former models use weaker prior distributions than the latter models. The interval lengths for all measurement error methods tend to exceed the unadjusted ACS intervals, as one would expect. We note that the CIA model estimates are clearly implausible. Figure 3. View largeDownload slide Multiple Imputation Point and 95 Percent Confidence Interval Estimates for the Average Income within Each Education Level. The ACS estimate is the survey-weighted estimate based on the reported education level in the 2010 ACS. Figure 3. View largeDownload slide Multiple Imputation Point and 95 Percent Confidence Interval Estimates for the Average Income within Each Education Level. The ACS estimate is the survey-weighted estimate based on the reported education level in the 2010 ACS. As discussed in section 3.4, in fusion settings without linked data, analysts typically can check the plausibility of different model specifications in limited ways. However, for this particular inference (average income for different degrees), we can use the estimated average earnings in the 2010 Current Population Survey as an independent check on these estimates. They are $83,720 for professionals, $80,600 for PhD degrees, $66,144 for master’s degrees, and $53,976 for bachelor’s degrees (http://www.collegequest.com/bls-research-education-pays-2010.aspx). These values are closely aligned with the models’ estimates (except for the CIA model) and line up more closely with the estimates from model five than any other model, especially for the professional degree category, where the estimates most differ. Figure 4 displays inferences for the average income for men and women. All models support the conclusion that men make more than women; apparently, misreporting in education does not account for that gap, at least for the models considered here. We note that model four suggests potentially larger income gaps between male and female PhD recipients than the other models. Figure 4. View largeDownload slide Multiple Imputation Point and 95 Percent Confidence Interval Estimates for the Average Income for Men and Women within Each Education Level. The ACS estimate is the survey-weighted estimate based on the reported education level in the 2010 ACS. Figure 4. View largeDownload slide Multiple Imputation Point and 95 Percent Confidence Interval Estimates for the Average Income for Men and Women within Each Education Level. The ACS estimate is the survey-weighted estimate based on the reported education level in the 2010 ACS. 5. CONCLUDING REMARKS The framework presented in this article offers analysts tools for using the information in a high-quality, separate data source to adjust for measurement errors in the database of interest. Key to the framework is to replace conditional independence assumptions typically used in data fusion with carefully considered measurement error models. This avoids sacrificing information and facilitates analysis of the sensitivity of conclusions to alternative measurement error specifications. Analysts can use diagnostic tests like those described in section 3.4 and section 4.3 to rule out some measurement error models and perform sensibility tests on others to identify reasonable candidates. For example, if one measurement error specification suggests that 90 percent of values for a particular variable are in error and such high error rates are unrealistic for that variable, analysts can rule out that measurement error model as a reasonable description of the data. Besides survey sampling contexts like the one considered here involving the ACS and NSCG, the framework offers potential approaches for dealing with possible measurement errors in organic data, (i.e., data generated from administrative or other systems that are not designed surveys, https://www.census.gov/newsroom/blogs/director/2011/05/designed-data-and-organic-data.html). This is increasingly important, as data stewards and analysts consider replacing or supplementing high-quality but expensive surveys with inexpensive and large-sample organic data such as transaction or administrative databases. Often, scant attention is paid to the potential impact of measurement errors on inferences from that data. The framework could be used with high-quality, validated surveys as the gold standard data, allowing for adjustments to the error-prone organic data. While the assumptions in a measurement error model can be more reasonable than conditional independence, one cannot escape the fundamental need in data fusion to make unverifiable assumptions about the relationship between Y and Z. To help analysts understand the nature of these assumptions and, hence, understand the quality of data released after fusion, agencies can include both the reported Zi values and the imputed Yi values in released files. Further, they can publish results of diagnostic checks (e.g., measures of the similarity of the distributions of Y given X from the imputed versions of DG and from DE) and inferences about the parameters of the measurement error models. In general, though, finding ways to allow analysts to assess the quality of a data set created from fusion methodology is a complex problem worthy of future research. Supplementary Materials Supplementary materials are available online at academic.oup.com/jssam. The authors wish to thank Seth Sanders for his input on informative prior specifications and Mauricio Sadinle for discussion that improved the strategy for accounting for the sample design. This research was supported by The National Science Foundation under award SES-11–31897. REFERENCES Abayomi K. , Gelman A. , Levy M. ( 2008 ), “Diagnostics for Multivariate Imputations,” Journal of the Royal Statistical Society: Series C (Applied Statistics) , 57 , 273 – 291 . Google Scholar CrossRef Search ADS Bauer D. J. , Hussong A. M. ( 2009 ), “Psychometric Approaches for Developing Commensurate Measures across Independent Studies: Traditional and New Models,” Psychological Methods , 14 , 101 – 123 . Google Scholar CrossRef Search ADS PubMed Black D. , Haviland A. , Sanders S. , Taylor L. ( 2006 ), “Why Do Minority Men Earn Less? A Study of Wage Differentials among the Highly Educated,” The Review of Economics and Statistics , 88 , 300 – 313 . Google Scholar CrossRef Search ADS Black D. , Sanders S. , Taylor L. ( 2003 ), “Measurement of Higher Education in the Census and Current Population Survey,” Journal of the American Statistical Association , 98 , 545 – 554 . Google Scholar CrossRef Search ADS Black D. A. , Haviland A. M. , Sanders S. G. , Taylor L. J. ( 2008 ), “Gender Wage Disparities among the Highly Educated,” Journal of Human Resources , 43 , 630 – 659 . Google Scholar CrossRef Search ADS PubMed Carrig M. , Manrique-Vallier D. , Ranby K. , Reiter J. P. , Hoyle R. ( 2015 ), “A Multiple Imputation-Based Method for the Retrospective Harmonization of Data Sets,” Multivariate Behavioral Research , 50 , 383 – 397 . Google Scholar CrossRef Search ADS PubMed Curran P. J. , Hussong A. M. ( 2009 ), “Integrative Data Analysis: The Simultaneous Analysis of Multiple Data Sets,” Psychological Methods , 14 , 81 – 100 . Google Scholar CrossRef Search ADS PubMed Curran P. J. , Hussong A. M. , Cai L. , Huang W. , Chassin L. , Sher K. J. , Zucker R. A. ( 2008 ), “Pooling Data from Multiple Prospective Studies: The Role of Item Response Theory in Integrative Analysis,” Developmental Psychology , 44 , 365 – 80 . Google Scholar CrossRef Search ADS PubMed D’Orazio M. , Di Zio M. , Scanu M. ( 2006 ), Statistical Matching: Theory and Practice , Hoboken, NJ : Wiley . Google Scholar CrossRef Search ADS Dunson D. B. , Xing C. ( 2009 ), “Nonparametric Bayes Modeling of Multivariate Categorical Data,” Journal of the American Statistical Association , 104 , 1042 – 1051 . Google Scholar CrossRef Search ADS Fesco R. S. , Frase M. J. , Kannankutty N. ( 2012 ), “Using the American Community Survey as the sampling frame for the National Survey of College Graduates,” Working Paper NCSES 12-201, National Science Foundation, National Center for Science and Engineering Statistics, Arlington, VA. Fosdick B. K. , DeYoreo M. , Reiter J. P. ( 2016 ), “Categorical Data Fusion Using Auxiliary Information,” Annals of Applied Statistics , 10 , 1907 – 1929 . Google Scholar CrossRef Search ADS Fuller W. ( 1987 ), Measurement Error Models , New York : John Wiley & Sons . Google Scholar CrossRef Search ADS Gilula Z. , McCulloch R. , Rossi P. ( 2006 ), “A Direct Approach to Data Fusion,” Journal of Marketing Research , 43 , 73 – 83 . Google Scholar CrossRef Search ADS He Y. , Landrum M. B. , Zaslavsky A. M. ( 2014 ), “Combining Information from Two Data Sources with Misreporting and Incompleteness to Assess Hospice-Use among Cancer Patients: A Multiple Imputation Appraoch,” Statistics in Medicine , 33 , 3710 – 3724 . Google Scholar CrossRef Search ADS PubMed Hirano K. , Imbens G. , Ridder G. , Rubin D. ( 2001 ), “Combining Panel Data Sets with Attrition and Refreshment Samples,” Econometrica , 69 , 1645 – 1659 . Google Scholar CrossRef Search ADS Kim H. J. , Cox L. H. , Karr A. F. , Reiter J. P. , Wang Q. ( 2015 ), “Simultaneous Edit-Imputation for Continuous Microdata,” Journal of the American Statistical Association , 110 , 987 – 999 . Google Scholar CrossRef Search ADS Kim J. K. , Yang S. ( 2017 ), “A Note on Multiple Imputation under Complex Sampling,” Biometrika , 104 , 221 – 228 . Lohr S. L. ( 2010 ), Sampling: Design and Analysis ( 2nd ed. ), Boston : Brooks/Cole . Manrique-Vallier D. , Reiter J. P. ( 2018 ), “Bayesian Simultaneous Edit and Imputation for Multivariate Categorical Data,” Journal of the American Statistical Association , 112 , 1708 – 1719 . Google Scholar CrossRef Search ADS Moriarity C. , Scheuren F. ( 2001 ), “Statistical Matching: A Paradigm for Assessing the Uncertainty in the Procedure,” Journal of Official Statistics , 17 , 407 – 422 . National Science Foundation ( 1993 ), “National Survey of College Graduates, 1993,” Ann Arbor, MI: Inter-University Consortium for Political and Social Research [distributor], available at http://doi.org/10.3886/ICPSR06880.v1, iCPSR06880-v1. Pepe M. S. ( 1992 ), “Inference Using Surrogate Outcome Data and a Validation Sample,” Biometrika , 79 , 355 – 365 . Google Scholar CrossRef Search ADS Raghunathan T. E. ( 2006 ), “Combining Information from Multiple Surveys for Assessing Health Disparities,” Allgemeines Statistisches Archiv , 90 , 515 – 526 . Google Scholar CrossRef Search ADS Rassler S. ( 2002 ), Statistical Matching , New York : Springer . Google Scholar CrossRef Search ADS Reiter J. ( 2008 ), “Multiple Imputation When Records Used for Imputation Are Not Used or Disseminated for Analysis,” Biometrika , 95 , 933 – 946 . Google Scholar CrossRef Search ADS Reiter J. P. ( 2012 ), “Bayesian Finite Population Imputation for Data Fusion,” Statistica Sinica , 22 , 795 – 811 . Google Scholar CrossRef Search ADS Rubin D. B. ( 1986 ), “Statistical Matching Using File Concatenation with Adjusted Weights and Multiple Imputations,” Journal of Business & Economic Statistics , 4 , 87 – 94 . Rubin D. B. ( 1987 ), Multiple Imputation for Nonresponse in Surveys , New York : John Wiley & Sons . Google Scholar CrossRef Search ADS Schenker N. , Raghunathan T. E. ( 2007 ), “Combining Information from Multiple Surveys to Enhance Estimation of Measures of Health,” Statistics in Medicine , 26 , 1802 – 1811 . Google Scholar CrossRef Search ADS PubMed Schenker N. , Raghunathan T. E. , Bondarenko I. ( 2010 ), “Improving on Analyses of Self-Reported Data in a Large-Scale Health Survey by Using Information from an Examination-Based Survey,” Statistics in Medicine , 29 , 533 – 545 . Google Scholar PubMed Schifeling T. A. , Cheng C. , Reiter J. P. , Hillygus D. S. ( 2015 ), “Accounting for Nonignorable Unit Nonresponse and Attrition in Panel Studies with Refreshment Samples,” Journal of Survey Statistics and Methodology , 3 , 265 – 295 . Google Scholar CrossRef Search ADS Shifeling T. ( 2016 ), “Combining Information from Multiple Sources in Bayesian Modeling,” PhD thesis, Duke University, available at https://dukespace.lib.duke.edu/dspace/bitstream/handle/10161/12840/Schifeling_duke_0066D_13606.pdf? sequence=1. Si Y. , Reiter J. ( 2013 ), “Nonparametric Bayesian Multiple Imputation for Incomplete Categorical Variables in Large-Scale Assessment Surveys,” Journal of Educational and Behavioral Statistics , 38 , 499 – 521 . Google Scholar CrossRef Search ADS Si Y. , Reiter J. P. , Hillygus D. S. ( 2015 ), “Semi-Parametric Selection Models for Potentially Non-Ignorable Attrition in Panel Studies with Refreshment Samples,” Political Analysis , 23 , 92 – 112 . Google Scholar CrossRef Search ADS Siddique J. , Reiter J. P. , Brincks A. , Gibbons R. D. , Crespi C. M. , Brown C. H. ( 2015 ), “Multiple Imputation for Harmonizing Longitudinal Non-Commensurate Measures in Individual Participant Data Meta-Analysis,” Statistics in Medicine , 34 , 3399 – 3414 . Google Scholar CrossRef Search ADS PubMed Tarmast G. ( 2001 ), “Multivariate Log-normal Distribution,” International Statistical Institute: Seoul 53rd Session. Yucel R. M. , Zaslavsky A. M. ( 2005 ), “Imputation of Binary Treatment Variables with Measurement Error in Administrative Data,” Journal of the American Statistical Association , 100 , 1123 – 1132 . Google Scholar CrossRef Search ADS © The Author(s) 2018. Published by Oxford University Press on behalf of the American Association for Public Opinion Research. All rights reserved. For permissions, please email: journals.permissions@oup.com This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices) http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Journal of Survey Statistics and Methodology Oxford University Press

Data Fusion for Correcting Measurement Errors

Loading next page...
 
/lp/ou_press/data-fusion-for-correcting-measurement-errors-D8TlSXl6xp
Publisher
Oxford University Press
Copyright
© The Author(s) 2018. Published by Oxford University Press on behalf of the American Association for Public Opinion Research. All rights reserved. For permissions, please email: journals.permissions@oup.com
ISSN
2325-0984
eISSN
2325-0992
D.O.I.
10.1093/jssam/smy010
Publisher site
See Article on Publisher Site

Abstract

Abstract Often in surveys, key items are subject to measurement errors. Given just the data, it can be difficult to determine the extent and distribution of this error process and, hence, to obtain accurate inferences that involve the error-prone variables. In some settings, however, analysts have access to a data source on different individuals with high-quality measurements of the error-prone survey items. We present a data fusion framework for leveraging this information to improve inferences in the error-prone survey. The basic idea is to posit models about the rates at which individuals make errors, coupled with models for the values reported when errors are made. This can avoid the unrealistic assumption of conditional independence typically used in data fusion. We apply the approach on the reported values of educational attainments in the American Community Survey, using the National Survey of College Graduates as the high-quality data source. In doing so, we account for the sampling design used to select the National Survey of College Graduates. We also present a process for assessing the sensitivity of various analyses to different choices for the measurement error models. Supplemental material is available online. 1. INTRODUCTION Survey data often contains items that are subject to measurement errors. For example, some respondents might misunderstand a question or accidentally select the wrong response, thereby providing values unequal to their factual values. Left uncorrected, these measurement errors can result in degraded inferences (Kim, Cox, Karr, Reiter, and Wang 2015). Unfortunately, the distribution of the measurement errors typically is not estimable from the survey data alone. Typically, one needs to make strong assumptions about the measurement error process, as done, for example, in integrative data analysis (Curran, Hussong, Cai, Huang, Chassin, et al. 2008; Bauer and Hussong 2009; Curran and Hussong 2009), or to leverage information from other sources. We develop methodology for the latter approach here. One natural source of information is a validation sample, (i.e., a dataset with both the reported, possibly erroneous values and the true values measured on the same individuals). These individuals could be a subset of the original survey (Pepe 1992; Yucel and Zaslavsky 2005) or a completely distinct set (Raghunathan 2006; Schenker and Raghunathan 2007; Schenker, Raghunathan, and Bondarenko 2010; Carrig, Manrique-Vallier, Ranby, Reiter, and Hoyle 2015). With validation data, one can model the relationship between the error-prone and true values and use the model to replace the error-prone items with multiply-imputed, plausible true values (Yucel and Zaslavsky 2005; Reiter 2008; Siddique et al. 2015). In many settings, however, it is not possible to obtain validation samples (e.g., because it is too expensive or because someone other than the analyst collected the data). In such cases, another potential source of information is a separate, “gold standard” dataset that includes true (or at least very high quality) measurements of the items subject to error, but not the error-prone measurements. Unlike validation samples, the gold standard dataset alone does not provide enough information to estimate the relationship between the error-prone and true values; it only provides information about the distribution of the true values. Thus, analysts are faced with a special case of data fusion i.e., integrating information from two databases with disjoint sets of individuals and distinct variables (e.g., Rubin 1986; Moriarity and Scheuren 2001; Rassler 2002; D’Orazio, Di Zio, and M. Scanu 2006; Gilula, McCulloch, Rossi 2006; Reiter 2012; Fosdick, DeYoreo, and Reiter 2016). One default approach, common in other data fusion contexts, is to assume that the error-prone and true values are conditionally independent given some set of variables X common to both the survey and gold standard data. Effectively, this involves using the gold standard data to estimate a predictive model for the true values from X and applying the estimated model to impute replacements for all values of the error-prone items in the survey. However, this conditional independence assumption completely disregards the information in the error-prone values, which sacrifices potentially useful information. For example, consider national surveys that ask people to report their educational attainment. We might expect most people to report values accurately and only a modest fraction to make errors. It does not make sense to alter every individual’s reported values in the survey, as would be done using a conditional independence approach. In this article, we develop a framework for leveraging information from gold standard data to improve inferences in surveys subject to measurement errors. The basic idea is to encode plausible assumptions about the error process (e.g., most people do not make errors when reporting educational attainments) and the reporting process (e.g., when people make errors, they are more likely to report higher attainments than actual attainments) into statistical models. We couple those models with distributions for the underlying true data values and estimate posterior distributions of the true values and model parameters. This process allows us to avoid unrealistic conditional independence assumptions in lieu of more scientifically defensible models. The estimation algorithm provides draws of plausible corrections to the error-prone survey values, which can be treated as completed datasets in multiple imputation inference (Rubin 1987). The remainder of this article is organized as follows. In section 2, we review an example of misreporting of educational attainment in data collected by the census bureau, so as to motivate the methodological developments. In section 3, we introduce the general framework for specifying measurement error models to leverage the information in gold standard data. In section 4, we apply the framework to handle potential measurement error in educational attainment in the 2010 American Community Survey (ACS), using the 2010 National Survey of College Graduates (NSCG) as a gold standard file. In doing so, we deal with a key complication in the data integration: accounting for the sampling design used to sample the NSCG. We also demonstrate how the framework facilitates analysis of the sensitivity of conclusions to different measurement error model specifications. In section 5, we provide a brief summary. 2. MISREPORTING IN EDUCATIONAL ATTAINMENT To illustrate the potential for reporting errors in educational attainment that can arise in surveys, we examine data from the 1993 NSCG. The 1993 NSCG surveyed individuals who indicated on the 1990 census long form that they had at least a college degree (Fesco, Frase, and Kannankutty 2012). The questionnaire asked about educational attainment, including detailed questions about educational histories. These questions greatly reduce the possibility of respondent error, so that the educational attainment values in the NSCG are treated as a gold standard (Black, Sanders, and Taylor 2003). The census long form, in contrast, did not include detailed follow-up questions, so that reported educational attainment is prone to measurement error. The US Census Bureau linked each individual in the NSCG to their corresponding record in the long-form data. The linked file is available for download from the Inter-university Consortium for Political and Social Research (National Science Foundation 1993). Because of the linkages, we can characterize the actual measurement error mechanism for educational attainment in the 1990 long-form data. In the NSCG, we treat the highest degree of the three most recent degrees reported (coded as “ed6c1,” “ed6c2,” and “ed6c3” in the file) as the true education level. We disregard any degrees earned in the years 1990–1993, as these occur in the three-year gap between collection of the long form and NSCG data. This ensures consistent time frames for the NSCG and long form reported values. We cross-tabulate these degrees with the degrees reported in the long-form data (coded “yearsch” in the file). Table 1 displays the cross-tabulation. A similar analysis was done by Black et al. (2003). Table 1. Unweighted Cross-tabulation of Reported Education in the NSCG and Census Long Form from the Linked Dataset. BA stands for bachelor’s degree; MA stands for master’s degree; Prof stands for professional degree; and, PhD stands for PhD degree. The 14,319 individuals in the group labeled No Degree did not have a college degree, despite reporting otherwise. The 51,396 individuals in the group labeled Other did not have one of (BA, MA, Prof, PhD) and are discarded from subsequent analyses. Census-reported education ︷ BA MA Prof PhD Total NSCG-reported education { BA 89,580 4,109 1,241 249 95,179 MA 1,218 33,928 655 526 36,327 Prof 382 359 8,648 563 9,952 PhD 99 193 452 6,726 7,470 Total 91,279 38,589 10,996 8,064 14,8928 No Degree 10,150 1,792 2,040 337 14,319 Other 33,368 10,912 4,710 2,406 51,396 Census-reported education ︷ BA MA Prof PhD Total NSCG-reported education { BA 89,580 4,109 1,241 249 95,179 MA 1,218 33,928 655 526 36,327 Prof 382 359 8,648 563 9,952 PhD 99 193 452 6,726 7,470 Total 91,279 38,589 10,996 8,064 14,8928 No Degree 10,150 1,792 2,040 337 14,319 Other 33,368 10,912 4,710 2,406 51,396 Table 1. Unweighted Cross-tabulation of Reported Education in the NSCG and Census Long Form from the Linked Dataset. BA stands for bachelor’s degree; MA stands for master’s degree; Prof stands for professional degree; and, PhD stands for PhD degree. The 14,319 individuals in the group labeled No Degree did not have a college degree, despite reporting otherwise. The 51,396 individuals in the group labeled Other did not have one of (BA, MA, Prof, PhD) and are discarded from subsequent analyses. Census-reported education ︷ BA MA Prof PhD Total NSCG-reported education { BA 89,580 4,109 1,241 249 95,179 MA 1,218 33,928 655 526 36,327 Prof 382 359 8,648 563 9,952 PhD 99 193 452 6,726 7,470 Total 91,279 38,589 10,996 8,064 14,8928 No Degree 10,150 1,792 2,040 337 14,319 Other 33,368 10,912 4,710 2,406 51,396 Census-reported education ︷ BA MA Prof PhD Total NSCG-reported education { BA 89,580 4,109 1,241 249 95,179 MA 1,218 33,928 655 526 36,327 Prof 382 359 8,648 563 9,952 PhD 99 193 452 6,726 7,470 Total 91,279 38,589 10,996 8,064 14,8928 No Degree 10,150 1,792 2,040 337 14,319 Other 33,368 10,912 4,710 2,406 51,396 As evident in table 1, reported education levels on the long form are often higher than those on the NSCG, particularly for individuals with only a bachelor’s degree. Of the 163,247 individuals in scope in the NSCG, over 14,000 were determined not to have at least a bachelor’s degree when asked in the NSCG, despite reporting otherwise in the long form. A whopping 33 percent of individuals who reported being professionals in the long form actually are not professionals according to the NSCG (1 – 8648 / [10,996 + 2040] ≈ 0.33). One possible explanation for this error is confusion over the definition of professionals. The US Census Bureau intended the category to capture graduate degrees from universities (e.g., JD, MBA, MD), whereas Black et al. (2003) found that individuals in professions such as cosmetology, nursing, and health services, which require certifications but not graduate degrees, selected the category. Despite the nontrivial reporting error, the overwhelming majority of individuals’ reported education levels are consistent in the long form and in the NSCG. Of the individuals in the NSCG who had at least a college degree at the time of the 1990 census, about 93.3 percent of them have the same contemporaneous education levels in both files ([89,850 + 33,928 + 8649 + 6726] / 148,928 ≈ 0.93. This suggests that most people report correctly, an observation we want to leverage when constructing measurement error models for education in the 2010 ACS. In many situations, we do not have the good fortune of observing individuals’ error-prone and true values simultaneously. Instead, we are in the setting represented by figure 1. This is also the case in our analysis of educational attainments in the 2010 ACS, described in section 4. The sampling frame for the 2010 NSCG is constructed from reported education levels in the ACS, which replaced the long form (after the 2000 census). However, unlike in 1993, linked data is not available as public use files. Therefore, we treat the 2010 NSCG as gold standard data as justified by Black et al. (2003) and posit measurement models that connect the information from the two data sources, using the framework that we now describe. Figure 1. View largeDownload slide Graphical Representation of Data Fusion Set-up. In the survey data DE, we only observe the error-prone measurement Z but not the true value Y. In the gold standard data DG, we only observe Y but not Z. We observe variables X in both samples. In the application, DE is the ACS and DG is the NSCG. Figure 1. View largeDownload slide Graphical Representation of Data Fusion Set-up. In the survey data DE, we only observe the error-prone measurement Z but not the true value Y. In the gold standard data DG, we only observe Y but not Z. We observe variables X in both samples. In the application, DE is the ACS and DG is the NSCG. 3. MEASUREMENT ERROR MODELING VIA DATA FUSION As in figure 1, let DG and DE be two data sources comprising distinct individuals, with sample sizes nG and nE, respectively. For each individual i in DG or DE, let Xi=(Xi1,…,Xip) be variables common to both surveys and assumed to be free of error, such as demographic variables. We assume these variables have been harmonized (D’Orazio et al. 2006) or placed on the same measurement scale, across DG and DE. Let Y represent the error-free values of some variable of interest, and let Z be an error-prone version of Y. We observe Z but not Y for the nE individuals in DE. We observe Y but not Z for the nG individuals in DG. For simplicity of notation, we assume no missing values in any variable, although the multiple imputation framework easily handles missing values. Additionally, DE can include variables for which there is no corresponding variable in DG. These variables do not play a role in the measurement error modeling, although they can be used in multiple imputation inferences. We seek to estimate Pr(Y,Z|X), and use it to create multiple imputations for the missing values in Y for the individuals in DE. This yields multiple complete data files that can be released to analysts who wish to make inferences involving Y and X in DE. We do so for the common setting where (X,Y,Z) are all categorical variables; we discuss approaches for continuous data types in section 3.4. For j=1,…,p, let each Xj have dj levels. Let Z have dZ levels and Y have dYlevels. Typically dZ = dY, but this need not be the case generally. For example in the NSCG/ACS application, Z is the educational attainment among those who report a college degree in the ACS, which has dZ = 4 levels (bachelor’s degree, master’s degree, professional degree, or PhD degree), and Y is the educational attainment in the NSCG, which has dY= 5 levels. An additional level is needed because some individuals in the NSCG truly do not have a college degree. For all i∈DE, let Ei be an (unobserved) indicator of a reporting error, that is, Ei = 1 when Yi≠Zi and Ei = 0 otherwise. Using E enables us to write Pr(Y,Z|X) as a product of three sub-models. For individual i, the full data likelihood (omitting parameters for simplicity) can be factored as, Pr(Yi=k,Zi=l|Xi)=∑e=01Pr(Yi=k,Ei=e,Zi=l|Xi)=Pr(Yi=k|Xi)∑e=01Pr(Ei=e|Yi=k,Xi)Pr(Zi=l|Ei=e,Yi=k,Xi). (1) This separates the true data generation process and the measurement error generation process, which facilitates model specification. In particular, we can use DG to estimate the true data distribution Pr(Y|X). We then can posit different models for the rates of errors, Pr(Ei=e|Yi=k,Xi) and for the reported values when errors are made, Pr(Zi=l|Ei=1,Yi=k,Xi). Intuitively, the error model locates the records for which Yi≠Zi, and the reporting model captures the patterns of misreported Zi. Of course, when Ei = 0, Pr(Zi=Yi)=1. A similar factorization is used by Yucel and Zaslavsky (2005), He, Landrum, and Zaslavsky (2014), Kim et al. (2015), and Manrique-Vallier and Reiter (2018), among others. The error and reporting models in (1) are used to generate the conditional distributions Pr(Y|Z,X) and Pr(Z|Y,X), which cannot be estimated nonparametrically from the data alone, since (Y i, Zi) are never observed jointly. Put another way, if we tried to use a fully saturated log-linear model for (Y,Z|X), we would not be able to identify all the parameters using DG and DE alone. To see this, assume for the moment that all dX=Πj=1pdj possible combinations of X are present in DG and DE. The fully saturated log-linear model for the distribution of (Y,Z|X) has dYdZdX−dX=(dY−1)dX+(dZ−1)dYdX parameters, where each subtraction of one derives from the requirement that probabilities sum to one. For example, if Y and Z have dY=dZ=2 levels and a scalar X has dX = 3 levels, a fully saturated model for (Y,Z|X) has three parameters (accounting for the sum to one constraint) for each level of X, resulting in nine parameters as given in the formulas. However, DG and DE together provide only (dY−1)dX+(dZ−1)dX independent constraints on the probabilities in the joint distribution. In our simple example, DG and DE provide three constraints from the three conditional probabilities of (Y|X) and three conditional probabilities of (Z|X). We need to impose three more constraints to estimate the joint distribution. The goal of specifying measurement error models is to provide such constraints, as we discuss in the remainder of this section. We note that related identification issues arise in the context of refreshment sampling to adjust for nonignorable attrition in longitudinal studies (Hirano, Imbens, Ridder, and Rubin 2001; Schifeling, Cheng, Reiter, and Hillygus 2015; Si, Reiter, and Hillygus 2015). 3.1 True Data Model One can use any model for (Y|X) that adequately describes the conditional distribution, such as a (multinomial) logistic regression. In the NSCG/ACS application, we use a fully saturated multinomial model, accounting for the sampling design in DG using the approach described in section 4.1. One also could use a joint distribution for (Y,X), such as a log-linear model or a mixture of multinomials model (Dunson and Xing 2009; Si and Reiter 2013). 3.2 Error Model In cases where dY= dZ, a generic form for the error model is Pr(Ei=1|Xi,Yi=k)=g(Xi,Yi,β), (2) where g(Xi,Yi,β) is some function of its arguments and β is some set of unknown parameters. A convenient class of functions that we use here is the logistic regression of Ei on some design vector Mi derived from (Xi,Yi), with corresponding coefficients β. The analyst can encode different versions of Mi to represent assumptions about the error process. The simplest specification is to set each Mi equal to a vector of ones, which implies that there is a common probability of error for all individuals. This error model makes sense when the analyst believes the errors in Z occur completely at random; for example, when errors arise simply because respondents accidentally and randomly select the wrong response in the survey, or when all respondents are equally likely to misunderstand the survey question. A more realistic possibility is to allow the probability of error to depend on some variables in Xi but not on Yi (e.g., men misreport education at different rates than women). This could be encoded by including an intercept for one of the sexes in Mi. Finally, one can allow the probability of error to depend on Yi itself—for example, people who truly do not have at least a college degree are more likely to misreport—by including some function of it in Mi. In the case where dZ≠dY, as in the NSCG/ACS application, we automatically set Ei = 1 for any individual with Yi∉{1:dZ}, where {1:dZ}={1,…,dZ}. For example, we set Ei = 1 for all individuals who are determined in the NSCG not to have a college degree but report so in the ACS. The stochastic part of the error model only applies to individuals who truly have at least a bachelor’s degree. 3.3 Reporting Model When there is no reporting error for individual i (i.e., Ei = 0), we know that Zi = Y i. When there is a reporting error, we must model the reported value Zi. As with (2), one can posit a variety of distributions for the reporting error, which is some function h(Xi,Yi,α) with parameters α. We now describe a few reporting error models for illustration. One could use more complicated models as well (e.g., based on multinomial logistic regression). A simple model assumes that values of Zi are equally likely, as in Manrique-Vallier and Reiter (2018). We have Pr(Zi=l|Xi,Yi=k,Ei=1)={1/(dZ−1) if l≠k,k∈{1:dz}1/dZ if k∉{1:dZ}0 otherwise . (3) Such a reporting model could be reasonable when reporting errors due to clerical errors. We note that this model does not accurately characterize the reporting errors in the 1993 linked NSCG data, per table 1. Alternatively, one can allow the probabilities to depend on Yi, so that, (Zi|Xi,Yi=k,Ei=1)∼Categorical (pk(1),…,pk(dZ)), (4) where each pk(l) is the probability of reporting Z = l given that Y = k, and pk(k)=0. One can further parameterize the reporting model so that the reporting probabilities vary with X. For example, to make the probabilities vary with sex and true education values, we can use (Zi|Xi,Yi=k,Ei=1)∼{Categorical (pM,k(1),…,pM,k(dZ)) if Xi,sex=MCategorical (pF,k(1),…,pF,k(dZ)) if Xi,sex=F. (5) 3.4 Modeling Considerations As apparent in sections 3.2 and 3.3, the error and reporting models can take on many specifications. Without linked data, analysts cannot use exploratory data analysis to inform the model choice. Similarly, analysts cannot use holdout samples to evaluate model quality, since no unit has Y and Z measured simultaneously. Instead, all that analysts can do is posit scientifically defensible measurement error models and make post hoc checks of the sensibility of analyses from those models. We demonstrate this approach in section 4. For example, analysts can check whether or not the predicted probabilities of errors implied by the model seem plausible. As another diagnostic, analysts can compare the distribution of the imputed values of (Y|X) in DE to the empirical distribution of (Y|X) in DG. This is akin to diagnostics in multiple imputation for missing data that compare imputed and observed values (Abayomi, Gelman, and Levy 2008). When these distributions differ substantially, it suggests the measurement error model specification (or possibly the true data model) is inadequate. Such diagnostic checks only can reveal problems with the model specification; they do not indicate that a particular specification is correct (see Fosdick et al. 2016, for illustrations of related diagnostics). When specifying the models, analysts should include in X variables that plausibly predict Y and are available in DG and DE. Doing so helps ensure the relationships between X and the imputed values of Y in DE accord with those from DG. It is also beneficial to include any variables in X that potentially explain the likelihood of making errors. To find such variables, since Y and Z are not measured simultaneously, analysts must rely on knowledge external to the data, such as validation studies or domain knowledge from prior experience in other contexts. When important variables are excluded from X, either because they are not available in both files or because of modeling choices, the imputations of Y in DE may distort estimates involving Y and those excluded X variables. However, when the fraction of respondents who make errors is modest (as in table 1), Z provides substantial information about the missing Y. In this case, distortions of relationships between any excluded X variables and the imputed Y variables tend not to be severe. The specification of the error and reporting models constrains the probabilities in the joint distribution of (Y,Z|X), thereby enabling estimation of that distribution. To illustrate, consider again the example with binary Y and Z and three levels in a scalar X. If we make Pr(Ei=0|Yi,Xi)=π not depend on Xi or Yi, and we use (3) with dZ=dY=2, we obtain the constraints Pr(Yi=k|Zi=k,Xi)=Pr(Yi=k|Xi)∑e=01Pr(Zi=k|Ei=e,Yi=k,Xi)Pr(Ei=e)Pr(Zi=k|Xi)=πPr(Yi=k|Xi)Pr(Zi=k|Xi) (6) for k = 0 and k = 1. The result in (6) stems from this particular reporting error model specification, for which Zi = Yi whenever Ei = 0 and Zi=1−Yi whenever Ei = 1. Naturally, we also can find Pr(Yi=k|Zi=1−k,Xi). Since Pr(Yi=k|Xi) can be estimated from DG and Pr(Zi=k|Xi) can be estimated from DE, this measurement error model adds three constraints (one for each level of Xi) that allow for identification of all parameters in the model. When X is observed in both datasets, we can do the computations in (6) within each category of X, thereby arriving at similar identification results for error models where Ei depends on Xi but not Yi. The roles of the constraints are more subtle when the error model depends on Y. In this case, we replace π in (6) with Pr(Ei=0|Yi=k)=πYi. We also can find Pr(Yi=k|Zi=1−k)usingPr(Ei=1|Yi=k)=1−πYi. This still adds constraints that define a joint distribution when averaging over Ei; however, we cannot identify each πYi. A simple illustration shows this to be the case. Suppose Zi = 1 for 60 percent of the values in a very large DE, and Yi = 1 for 50 percent of the values in a very large DG. The procedure seeks to impute the missing Yi values in DE so that 50 percent are ones and 50 percent are zeroes. However, it can do so in many ways. For example, it can set all Yi = 0 for all records where Zi = 0 (i.e., π0=0) and change enough values of Zi = 1 into imputed Yi = 0 to get to 50 percent ones; or, it can change some other fractions of values with Zi = 0 to ones and values of Zi = 1 to zeros to get to 50 percent. The data provides no information to choose among feasible options. The lack of identification when measurement errors depend on Y can translate to increased variance in estimates of πYi. This may have only a modest impact on inferences for the marginal distribution of Y or the conditional distributions of Y given X. This is because the measurement error model permutes imputed values of Y within levels of X with frequency depending on πYi. However, the lack of identification can degrade estimates of associations involving Y and variables not in X. This suggests an important guidance for specifying measurement error models that depend on Y: it is helpful to use informative prior distributions that favor certain parameter values over others. This is borne out in the simulations of section 4. For additional illustrations of the effects of different specifications of measurement error models, see the simulation studies in Shifeling (2016, pp. 86 – 94). To estimate the models, it is convenient to use a two-stage strategy. When imputing missing Y in DE, all of the information needed from DG is represented by the parameters of the true data model, θ. Hence, we first can construct a (possibly approximate) posterior distribution of θ using only DG. Ideally, the conditional distribution of Y given X is the same in both DG and DE; indeed, the methodology is derived assuming that this is the case. When DG is collected using a complex design, it may be prudent to account for the design when estimating θ. For example, Kim and Yang (2017) suggest using survey-weighted, pseudo-maximum likelihood models to estimate θ. This approach also enables analysts to account for important design features in DG, such as cluster or stratum indicators that are not available in DE and not able to be part of X. We then sample many draws of θ from this distribution. We plug in these draws in the Gibbs sampling steps for a Bayesian predictive distribution for (Yi|Zi,Xi,θ) for the cases in DE, thereby generating the multiple imputations. We describe the Gibbs sampler for this step for the NSCG/ACS application in the supplementary material. In the NSCG/ACS application, and more generally, we do not account for design features of DE in specification or estimation of the measurement error models. We do not believe this is necessary since, conceptually, the goal of the measurement error models in the data fusion approach is to bridge from observed Zi to plausible Yi. However, complex designs in DE can and should be accounted for after creation of the plausible datasets; for example, analysts can use survey-weighted estimation with the completed versions of DE. Although we present models in the context of categorical data, analysts can adapt the strategy to handle continuous variables in (Yi, Zi) or Xi. For (Y|X), analysts can use DG to specify appropriate regression models. For (Ei|Yi,Xi), analysts can use binary regressions with continuous-valued predictors. For (Zi|Yi,Xi,Ei=1), analysts have to specify a measurement error model connecting Y and Z. For example, a common measurement error model assumes (Zi|Ei=1,Xi=xi)∼N(xi,σ2) where σ2 is fixed at some value based on previous experience with the measurement (Fuller 1987). Alternatively, to represent ignorance about the reporting error process, one can assume (Zi|Ei=1,Xi=xi)∼Unif(a,b) where a and b are fixed limits large enough to capture all the reported values. This model was used by Kim et al. (2015) for edit-imputation of continuous data from one error-prone source. 4. ADJUSTING FOR REPORTING ERRORS IN EDUCATION IN THE 2010 ACS We now use the framework to adjust inferences for potential reporting error in educational attainment in the 2010 ACS, using the public use microdata for the 2010 NSCG as the gold standard file DG. We consider two main analyses that could be affected by reporting error in education. First, we estimate from the ACS the number of science and engineering degrees awarded to women, after correcting for measurement error. We base the estimate on an indicator in the ACS for whether or not each individual has such a degree. Second, we examine average incomes across degrees. This focus is motivated in part by the findings of Black, Haviland, Sanders, and Taylor (2006, 2008), who found that apparent wage gaps in the 1990 census long-form data could be explained by reporting errors in education. As DE, we use the subset of ACS microdata that includes only individuals who reported a bachelor’s degree or higher and are under age 76. The resulting sample size is nE=600,150. In X, we include gender, age group (24 and younger, 25–39, 40–54, and 55 and older), and an indicator for whether the individual’s race is black or something else. We use these variables because the 1993 linked data (Black et al. 2003) allow us to specify informative prior distributions for them alone; we do not have information on the measurement error in education broken out by other demographic groups. Were such information available, either from a linked sample or previous knowledge, we would strongly consider making X richer. In the NSCG, we discarded thirty-eight records with race suppressed, leaving a sample size of nG=77,150. We consider two sets of measurement error model specifications. The first set uses specifications like those in section 3, with flat prior distributions for all parameters. We use this set to illustrate model diagnostics and sensitivity analysis absent informative prior beliefs about the measurement error process. The second set uses a common error and reporting model with different, informative prior distributions on its parameters. We construct these informative prior distributions based on the analysis of the 1993 linked file. For all specifications considered, we create M = 50 multiple imputations of the plausible true education values in the 2010 ACS, which we then analyze using the methods of Rubin (1987). We use M = 50 because it is sufficient to generate stable interval estimates. One may be able to get shorter intervals using the multiple imputation methods akin to those in Reiter (2008), which are developed for situations in which the data used for estimating the imputation model are not the same as the data used for analysis. We leave evaluation of the performance of this and other multiple imputation variance estimators for future research. For all specifications, the true data model is a saturated multinomial distribution for the five values of Y for each combination of X. We begin by describing how we estimate the parameters of the true data distribution, accounting for the sampling design of the NSCG. 4.1 Accounting for Sampling Design of NSCG The 2010 NSCG uses reported education in the 2010 ACS as a stratification variable (Fesco et al. 2012). Its unweighted percentages can overrepresent or underrepresent degree types in the population; this is most obviously the case for individuals without a college degree (Y i = 5). We need to account for this informative sampling when estimating parameters of the true data model. We do so with a two-stage approach. First, we use survey-weighted inferences to estimate population totals of (Y|X) from the 2010 NSCG. Second, we turn these estimates into an approximate Bayesian posterior distribution for input to fitting the measurement error models used to impute plausible values of Yi for individuals in the ACS. We now describe this process, which can be used generally when DG is collected via a complex survey design. Suppose for the moment that dY= dZ. This is not the case when DE is the ACS (where dZ = 4) and DG is the NSCG (where dY=5); however, we start here to fix ideas. For all possible combinations x, let θxk=Pr(Y=k|X=x), and let θx=(θx1,…,θxdY). We seek to use DG to specify f(θ|X,Y). To do so, we first parameterize θxk=Txk/∑j=1dYTxj, where Txk is the population count of individuals with (Xi=x,Yi=k). We estimate Tx=(Tx1,…,TxdY) and the associated covariance matrix of the estimator using standard survey-weighted estimation. Let wi be the sample weight for all i∈DG. We compute the estimated total and associated variance for each x and k as T^xk=∑i=1nGwiI(Xi=x,Yi=k) (7) Var̂(T^xk)=nGnG−1∑i=1nG(wiI(Xi=x,Yi=k)−T^xknG)2. (8) For each k and l, with l≠k, we also compute the estimated covariance, Cov̂(T^xk,T^xl)=nGnG−1∑i=1nG[(wiI(Xi=x,Yi=k)−T^xknG)×(wiI(Xi=x,Yi=l)−T^xlnG)]. (9) The variance and covariance estimators are the design-based estimators for probability proportional to size sampling with replacement, as is typical of multistage complex surveys (Lohr 2010). Switching now to a Bayesian modeling perspective, we assume that Tx∼ Log-Normal( μx,τx), so as to ensure a distribution with positive values for all true totals. We select (μx,τx) so that each E(Txk)=T^xk and Var(Tx)=Σ^(T^x), the estimated covariance matrix with elements defined by (8) and (9). These are derived from moment matching (Tarmast 2001). We have μxj= log ⁡(T^xj)−τx[j,j]/2 (10) τx[j,j]= log ⁡(1+Σ^x[j,j]/(T^xj2)) (11) τx[j,i]= log ⁡(1+Σ^x[j,i]/(T^xj·T^xi)), (12) where the notation [j,i] denotes an element in row j and column i of the matrix. We draw Tx* from this log-normal distribution and transform to draws θx*. Since the 2010 NSCG does not include individuals who claim in the ACS to have less than a bachelor’s degree, we cannot use DG directly to estimate Tx5. Instead, we estimate Tx+=Tx1+Tx2+Tx3+Tx4+Tx5 using the ACS data and estimate (Tx1,Tx2,Tx3,Tx4) from the NSCG using the method described previously; this leads to an estimate for Tx5. More precisely, let the ACS design-based estimator for Tx+ be T^x+, with design-based variance estimate σ^2(T^x+). We sample a value Tx+*∼Normal(T^x+,σ^2(T^x+)). Using an independent sample of values of (Tx1*,…,Tx4*) from the NSCG, we compute Tx5*=Tx+*−∑j=14Txj*, and set Tx*=(Tx1*,…,Tx5*). We repeat these steps 10,000 times. We then compute the mean and covariance matrix of the 10,000 draws, which we again plug into (10) – (12). The resulting log-normal distribution is the approximate posterior distribution of θx. We include an example of this entire procedure in the supplementary material. 4.2 Measurement Error Models The two sets of measurement error models include four that use flat prior distributions and three that use informative prior distributions based on the 1993 linked data. For all error models, we use a logistic regression of Ei on various main effects and interactions of Yi and Xi. For all reporting models, we use categorical distributions with probabilities that depend on Yi and possibly Xi. The four models with flat prior distributions are summarized in table 2. In model one, the error and reporting models depend only on Yi. Models 2 and 3 keep the reporting model as in (4) but expand the error model. In model two, the probability of a reporting error can vary with Yi and sex ( Xi,sex). In model three, error probabilities can vary with Yi and the indicator for black race ( Xi,black). In model four, the error and reporting models both depend on Y and sex. Table 2. Summary of the First Four Measurement Error Model Specifications for 2010 NSCG/ACS Analysis. These models use flat prior distributions on all parameters. Error model Reporting model Expression for MiTβ Pr(Zi|Xi,Yi=k,Ei=1) Model 1 β1+∑k=24βkI(Yi=k) Categorical(pk(1),…,pk(4)) Model 2 β1+∑k=24βk(M)I(Yi=k,Xi,sex=M) Categorical(pk(1),…,pk(4)) +∑k=14βk(F)I(Yi=k,Xi,sex=F) Model 3 β1+∑k=24βk(no)I(Yi=k,Xi,black=no) Categorical(pk(1),…,pk(4)) +∑k=14βk(yes)I(Yi=k,Xi,black=yes) Model 4 β1+∑k=24βk(M)I(Yi=k,Xi,sex=M) Categorical(pM,k(1),…,pM,k(4)) if Xi,sex= M +∑k=14βk(F)I(Yi=k,Xi,sex=F) Categorical(pF,k(1),…,pF,k(4)) if Xi,sex=F Error model Reporting model Expression for MiTβ Pr(Zi|Xi,Yi=k,Ei=1) Model 1 β1+∑k=24βkI(Yi=k) Categorical(pk(1),…,pk(4)) Model 2 β1+∑k=24βk(M)I(Yi=k,Xi,sex=M) Categorical(pk(1),…,pk(4)) +∑k=14βk(F)I(Yi=k,Xi,sex=F) Model 3 β1+∑k=24βk(no)I(Yi=k,Xi,black=no) Categorical(pk(1),…,pk(4)) +∑k=14βk(yes)I(Yi=k,Xi,black=yes) Model 4 β1+∑k=24βk(M)I(Yi=k,Xi,sex=M) Categorical(pM,k(1),…,pM,k(4)) if Xi,sex= M +∑k=14βk(F)I(Yi=k,Xi,sex=F) Categorical(pF,k(1),…,pF,k(4)) if Xi,sex=F Table 2. Summary of the First Four Measurement Error Model Specifications for 2010 NSCG/ACS Analysis. These models use flat prior distributions on all parameters. Error model Reporting model Expression for MiTβ Pr(Zi|Xi,Yi=k,Ei=1) Model 1 β1+∑k=24βkI(Yi=k) Categorical(pk(1),…,pk(4)) Model 2 β1+∑k=24βk(M)I(Yi=k,Xi,sex=M) Categorical(pk(1),…,pk(4)) +∑k=14βk(F)I(Yi=k,Xi,sex=F) Model 3 β1+∑k=24βk(no)I(Yi=k,Xi,black=no) Categorical(pk(1),…,pk(4)) +∑k=14βk(yes)I(Yi=k,Xi,black=yes) Model 4 β1+∑k=24βk(M)I(Yi=k,Xi,sex=M) Categorical(pM,k(1),…,pM,k(4)) if Xi,sex= M +∑k=14βk(F)I(Yi=k,Xi,sex=F) Categorical(pF,k(1),…,pF,k(4)) if Xi,sex=F Error model Reporting model Expression for MiTβ Pr(Zi|Xi,Yi=k,Ei=1) Model 1 β1+∑k=24βkI(Yi=k) Categorical(pk(1),…,pk(4)) Model 2 β1+∑k=24βk(M)I(Yi=k,Xi,sex=M) Categorical(pk(1),…,pk(4)) +∑k=14βk(F)I(Yi=k,Xi,sex=F) Model 3 β1+∑k=24βk(no)I(Yi=k,Xi,black=no) Categorical(pk(1),…,pk(4)) +∑k=14βk(yes)I(Yi=k,Xi,black=yes) Model 4 β1+∑k=24βk(M)I(Yi=k,Xi,sex=M) Categorical(pM,k(1),…,pM,k(4)) if Xi,sex= M +∑k=14βk(F)I(Yi=k,Xi,sex=F) Categorical(pF,k(1),…,pF,k(4)) if Xi,sex=F For models 5–7, we use the specification in model four and incorporate prior information about the measurement errors from the 1993 linked data. In constructing the priors, we first remove records that have been flagged as having missing education that has been imputed because these imputations might not closely reflect the actual education values (Black et al. 2003). Table 3 displays the prior distributions for males with bachelor’s degrees. Details on how we arrive at these and other groups’ prior specifications are in the supplementary material; here, we summarize briefly. Table 3. Summary of Informative Prior Specifications for 2010 NSCG/ACS Analysis for Males with Bachelor’s Degrees Error rate Reporting probabilities (pM,1(2),pM,1(3),pM,1(4)) Model 4 Beta (1, 1) Dirichlet (1, 1, 1) Model 5 Beta (0.76, 14.24) Dirichlet (3.54, 1.27, 0.19) Model 6 Beta (2,724.2, 50,862) Dirichlet (2,235.3, 799.7, 123.1) Model 7 Beta (500, 99,500) Dirichlet (1, 1, 1) Error rate Reporting probabilities (pM,1(2),pM,1(3),pM,1(4)) Model 4 Beta (1, 1) Dirichlet (1, 1, 1) Model 5 Beta (0.76, 14.24) Dirichlet (3.54, 1.27, 0.19) Model 6 Beta (2,724.2, 50,862) Dirichlet (2,235.3, 799.7, 123.1) Model 7 Beta (500, 99,500) Dirichlet (1, 1, 1) Table 3. Summary of Informative Prior Specifications for 2010 NSCG/ACS Analysis for Males with Bachelor’s Degrees Error rate Reporting probabilities (pM,1(2),pM,1(3),pM,1(4)) Model 4 Beta (1, 1) Dirichlet (1, 1, 1) Model 5 Beta (0.76, 14.24) Dirichlet (3.54, 1.27, 0.19) Model 6 Beta (2,724.2, 50,862) Dirichlet (2,235.3, 799.7, 123.1) Model 7 Beta (500, 99,500) Dirichlet (1, 1, 1) Error rate Reporting probabilities (pM,1(2),pM,1(3),pM,1(4)) Model 4 Beta (1, 1) Dirichlet (1, 1, 1) Model 5 Beta (0.76, 14.24) Dirichlet (3.54, 1.27, 0.19) Model 6 Beta (2,724.2, 50,862) Dirichlet (2,235.3, 799.7, 123.1) Model 7 Beta (500, 99,500) Dirichlet (1, 1, 1) For model five, we set the prior distributions for each βk(x) so that the error rates are centered at the estimate from the 1993 linked data. We also require the central 95 percent probability interval of the prior distribution on each error rate to be close to (0.005, 0.20), allowing for a wide but not unrealistic range of possible error rates. For the reporting probabilities pM,k(z) and pF,k(z), we center most of the prior distributions at the corresponding estimates from the 1993 linked data. We require the central 95 percent probability interval of each prior distribution to have support on values of p·,k(z) within ±.10 of the 1993 point estimate, truncating at zero or one as needed. One exception is the reporting probabilities for those with “no college degree” who report “professional” degree, which we center at half the 1993 estimate. The Census Bureau has improved the clarity of the definition of “professional” in the 20 years since the 1990 long form, as discussed in the prior specification section of the supplementary material. For model six, we use the same prior means as in model five for both error and reporting models. However, we substantially tighten the prior distributions to make the prior variance accord with the uncertainty in the point estimates from the 1993 linked data. We do so by using prior sample sizes that match those from the 1993 NSCG. For example, the 1993 NSCG included 53,586 males with bachelor’s degrees (excluding those records who had their census education imputed). We therefore use Beta(2724.2, 50,862) as the prior distribution for the error rate for this x. We similarly increase the prior sample sizes for the reporting probabilities to match the 1993 NSCG sample sizes. Model seven departs from the 1993 linked data estimates and encodes a strong prior belief that almost no one misreports their education except for haphazard mistakes. Here, we set the prior mean for the probability of misreporting education to 0.005 for all demographic groups. We use a prior sample size of 100,000, making the prior distribution concentrate strongly around 0.005. For the reporting probabilities, we use a noninformative prior distribution for convenience, since the estimates of the reporting probabilities are strongly influenced by the concentrated prior distributions on the error rates. Finally, for comparison purposes, we also fit the model based on a conditional independence assumption (CIA), which is the default assumption in data fusion. To impute Yi for individuals in the ACS under the CIA, we sample θ* and then impute (Y*|θ*,X) from the true data model. Here, we do not use the reported value of Zi in the imputations. 4.3 Empirical Results We first examine what each model suggests about the extent and nature of the measurement errors in the 2010 ACS. We then use the models to assess sensitivity of results about the substantive questions related to number of degrees and income. We use survey-weighted estimates in the multiple imputation inferences. 4.3.1 Distributions of errors in reported ACS education values. Table 4 displays the multiple imputation point estimates and 95 percent confidence intervals for the proportions of errors by gender and NSCG education, obtained from the M = 50 draws of Ei for all individuals in DE. We begin by comparing results for the set of models with flat prior distributions (models 1–4) and the CIA model, then move to the set of models with informative prior distributions (models 5–7). Table 4. Error Rate Estimates from Different Model Specifications. Models 1–7 are run for 100,000 MCMC iterations. We save M = 50 completed datasets under each model. For each dataset, we compute the estimated overall error rate, estimated error rate by gender and imputed Y, and associated variances using ratio estimators that incorporate the ACS final survey weights. Estimate by group Estimate overall Y= BA Y= MA Y= Prof. Y= PhD CIA model  Male 0.37 0.76 0.91 0.94 0.57 (0.36, 0.37) (0.75, 0.76) (0.91, 0.92) (0.93, 0.95) (0.55, 0.58)  Female 0.35 0.72 0.95 0.97 (0.35, 0.36) (0.71, 0.72) (0.94, 0.95) (0.96, 0.97) Model 1  Male 0.05 0.10 0.18 0.27 0.17 (0.04, 0.06) (0.08, 0.11) (0.15, 0.21) (0.23, 0.31) (0.16, 0.19)  Female 0.05 0.09 0.18 0.28 (0.05, 0.06) (0.08, 0.10) (0.15, 0.21) (0.24, 0.32) Model 2  Male 0.05 0.18 0.27 0.36 0.20 (0.04, 0.06) (0.16, 0.21) (0.18, 0.37) (0.30, 0.42) (0.18, 0.21)  Female 0.05 0.12 0.26 0.41 (0.05, 0.06) (0.10, 0.14) (0.20, 0.33) (0.29, 0.53) Model 3  Male 0.05 0.09 0.17 0.25 0.17 (0.04, 0.06) (0.08, 0.11) (0.14, 0.20) (0.21, 0.30) (0.16, 0.19)  Female 0.05 0.09 0.17 0.26 (0.05, 0.06) (0.08, 0.10) (0.14, 0.20) (0.21, 0.31) Model 4  Male 0.05 0.19 0.36 0.36 0.22 (0.04, 0.06) (0.16, 0.23) (0.26, 0.46) (0.27, 0.45) (0.20, 0.24)  Female 0.09 0.14 0.52 0.55 (0.08, 0.10) (0.11, 0.17) (0.44, 0.59) (0.40, 0.70) Model 5  Male 0.07 0.19 0.23 0.34 0.22 (0.06, 0.08) (0.16, 0.22) (0.14, 0.32) (0.27, 0.41) (0.20, 0.24)  Female 0.09 0.12 0.50 0.31 (0.08, 0.10) (0.09, 0.15) (0.43, 0.57) (0.17, 0.46) Model 6  Male 0.05 0.09 0.10 0.10 0.16 (0.05, 0.05) (0.08, 0.10) (0.09, 0.11) (0.09, 0.11) (0.14, 0.17)  Female 0.05 0.06 0.16 0.07 (0.04, 0.05) (0.05, 0.07) (0.14, 0.18) (0.06, 0.09) Model 7  Male 0.01 0.01 0.00 0.01 0.11 (0.01, 0.01) (0.00, 0.01) (0.00, 0.01) (0.00, 0.01) (0.09, 0.13)  Female 0.01 0.01 0.01 0.01 (0.01, 0.01) (0.01, 0.01) (0.00, 0.01) (0.00, 0.01) Estimate by group Estimate overall Y= BA Y= MA Y= Prof. Y= PhD CIA model  Male 0.37 0.76 0.91 0.94 0.57 (0.36, 0.37) (0.75, 0.76) (0.91, 0.92) (0.93, 0.95) (0.55, 0.58)  Female 0.35 0.72 0.95 0.97 (0.35, 0.36) (0.71, 0.72) (0.94, 0.95) (0.96, 0.97) Model 1  Male 0.05 0.10 0.18 0.27 0.17 (0.04, 0.06) (0.08, 0.11) (0.15, 0.21) (0.23, 0.31) (0.16, 0.19)  Female 0.05 0.09 0.18 0.28 (0.05, 0.06) (0.08, 0.10) (0.15, 0.21) (0.24, 0.32) Model 2  Male 0.05 0.18 0.27 0.36 0.20 (0.04, 0.06) (0.16, 0.21) (0.18, 0.37) (0.30, 0.42) (0.18, 0.21)  Female 0.05 0.12 0.26 0.41 (0.05, 0.06) (0.10, 0.14) (0.20, 0.33) (0.29, 0.53) Model 3  Male 0.05 0.09 0.17 0.25 0.17 (0.04, 0.06) (0.08, 0.11) (0.14, 0.20) (0.21, 0.30) (0.16, 0.19)  Female 0.05 0.09 0.17 0.26 (0.05, 0.06) (0.08, 0.10) (0.14, 0.20) (0.21, 0.31) Model 4  Male 0.05 0.19 0.36 0.36 0.22 (0.04, 0.06) (0.16, 0.23) (0.26, 0.46) (0.27, 0.45) (0.20, 0.24)  Female 0.09 0.14 0.52 0.55 (0.08, 0.10) (0.11, 0.17) (0.44, 0.59) (0.40, 0.70) Model 5  Male 0.07 0.19 0.23 0.34 0.22 (0.06, 0.08) (0.16, 0.22) (0.14, 0.32) (0.27, 0.41) (0.20, 0.24)  Female 0.09 0.12 0.50 0.31 (0.08, 0.10) (0.09, 0.15) (0.43, 0.57) (0.17, 0.46) Model 6  Male 0.05 0.09 0.10 0.10 0.16 (0.05, 0.05) (0.08, 0.10) (0.09, 0.11) (0.09, 0.11) (0.14, 0.17)  Female 0.05 0.06 0.16 0.07 (0.04, 0.05) (0.05, 0.07) (0.14, 0.18) (0.06, 0.09) Model 7  Male 0.01 0.01 0.00 0.01 0.11 (0.01, 0.01) (0.00, 0.01) (0.00, 0.01) (0.00, 0.01) (0.09, 0.13)  Female 0.01 0.01 0.01 0.01 (0.01, 0.01) (0.01, 0.01) (0.00, 0.01) (0.00, 0.01) Table 4. Error Rate Estimates from Different Model Specifications. Models 1–7 are run for 100,000 MCMC iterations. We save M = 50 completed datasets under each model. For each dataset, we compute the estimated overall error rate, estimated error rate by gender and imputed Y, and associated variances using ratio estimators that incorporate the ACS final survey weights. Estimate by group Estimate overall Y= BA Y= MA Y= Prof. Y= PhD CIA model  Male 0.37 0.76 0.91 0.94 0.57 (0.36, 0.37) (0.75, 0.76) (0.91, 0.92) (0.93, 0.95) (0.55, 0.58)  Female 0.35 0.72 0.95 0.97 (0.35, 0.36) (0.71, 0.72) (0.94, 0.95) (0.96, 0.97) Model 1  Male 0.05 0.10 0.18 0.27 0.17 (0.04, 0.06) (0.08, 0.11) (0.15, 0.21) (0.23, 0.31) (0.16, 0.19)  Female 0.05 0.09 0.18 0.28 (0.05, 0.06) (0.08, 0.10) (0.15, 0.21) (0.24, 0.32) Model 2  Male 0.05 0.18 0.27 0.36 0.20 (0.04, 0.06) (0.16, 0.21) (0.18, 0.37) (0.30, 0.42) (0.18, 0.21)  Female 0.05 0.12 0.26 0.41 (0.05, 0.06) (0.10, 0.14) (0.20, 0.33) (0.29, 0.53) Model 3  Male 0.05 0.09 0.17 0.25 0.17 (0.04, 0.06) (0.08, 0.11) (0.14, 0.20) (0.21, 0.30) (0.16, 0.19)  Female 0.05 0.09 0.17 0.26 (0.05, 0.06) (0.08, 0.10) (0.14, 0.20) (0.21, 0.31) Model 4  Male 0.05 0.19 0.36 0.36 0.22 (0.04, 0.06) (0.16, 0.23) (0.26, 0.46) (0.27, 0.45) (0.20, 0.24)  Female 0.09 0.14 0.52 0.55 (0.08, 0.10) (0.11, 0.17) (0.44, 0.59) (0.40, 0.70) Model 5  Male 0.07 0.19 0.23 0.34 0.22 (0.06, 0.08) (0.16, 0.22) (0.14, 0.32) (0.27, 0.41) (0.20, 0.24)  Female 0.09 0.12 0.50 0.31 (0.08, 0.10) (0.09, 0.15) (0.43, 0.57) (0.17, 0.46) Model 6  Male 0.05 0.09 0.10 0.10 0.16 (0.05, 0.05) (0.08, 0.10) (0.09, 0.11) (0.09, 0.11) (0.14, 0.17)  Female 0.05 0.06 0.16 0.07 (0.04, 0.05) (0.05, 0.07) (0.14, 0.18) (0.06, 0.09) Model 7  Male 0.01 0.01 0.00 0.01 0.11 (0.01, 0.01) (0.00, 0.01) (0.00, 0.01) (0.00, 0.01) (0.09, 0.13)  Female 0.01 0.01 0.01 0.01 (0.01, 0.01) (0.01, 0.01) (0.00, 0.01) (0.00, 0.01) Estimate by group Estimate overall Y= BA Y= MA Y= Prof. Y= PhD CIA model  Male 0.37 0.76 0.91 0.94 0.57 (0.36, 0.37) (0.75, 0.76) (0.91, 0.92) (0.93, 0.95) (0.55, 0.58)  Female 0.35 0.72 0.95 0.97 (0.35, 0.36) (0.71, 0.72) (0.94, 0.95) (0.96, 0.97) Model 1  Male 0.05 0.10 0.18 0.27 0.17 (0.04, 0.06) (0.08, 0.11) (0.15, 0.21) (0.23, 0.31) (0.16, 0.19)  Female 0.05 0.09 0.18 0.28 (0.05, 0.06) (0.08, 0.10) (0.15, 0.21) (0.24, 0.32) Model 2  Male 0.05 0.18 0.27 0.36 0.20 (0.04, 0.06) (0.16, 0.21) (0.18, 0.37) (0.30, 0.42) (0.18, 0.21)  Female 0.05 0.12 0.26 0.41 (0.05, 0.06) (0.10, 0.14) (0.20, 0.33) (0.29, 0.53) Model 3  Male 0.05 0.09 0.17 0.25 0.17 (0.04, 0.06) (0.08, 0.11) (0.14, 0.20) (0.21, 0.30) (0.16, 0.19)  Female 0.05 0.09 0.17 0.26 (0.05, 0.06) (0.08, 0.10) (0.14, 0.20) (0.21, 0.31) Model 4  Male 0.05 0.19 0.36 0.36 0.22 (0.04, 0.06) (0.16, 0.23) (0.26, 0.46) (0.27, 0.45) (0.20, 0.24)  Female 0.09 0.14 0.52 0.55 (0.08, 0.10) (0.11, 0.17) (0.44, 0.59) (0.40, 0.70) Model 5  Male 0.07 0.19 0.23 0.34 0.22 (0.06, 0.08) (0.16, 0.22) (0.14, 0.32) (0.27, 0.41) (0.20, 0.24)  Female 0.09 0.12 0.50 0.31 (0.08, 0.10) (0.09, 0.15) (0.43, 0.57) (0.17, 0.46) Model 6  Male 0.05 0.09 0.10 0.10 0.16 (0.05, 0.05) (0.08, 0.10) (0.09, 0.11) (0.09, 0.11) (0.14, 0.17)  Female 0.05 0.06 0.16 0.07 (0.04, 0.05) (0.05, 0.07) (0.14, 0.18) (0.06, 0.09) Model 7  Male 0.01 0.01 0.00 0.01 0.11 (0.01, 0.01) (0.00, 0.01) (0.00, 0.01) (0.00, 0.01) (0.09, 0.13)  Female 0.01 0.01 0.01 0.01 (0.01, 0.01) (0.01, 0.01) (0.00, 0.01) (0.00, 0.01) The CIA model suggests extremely large error percentages, especially for the highest education levels. These rates seem unlikely to be reality, leading us to reject the CIA model. The overall error rates for models 1–4 are similar and more realistic than those from the CIA model. The differences in error estimates between model two and model one suggest that the probability of error depends on sex. Comparing results for model three and model one, however, we see little evidence of important race effects on the propensity to make errors. Model four generalizes model two by allowing the reporting probabilities to vary by sex. If these probabilities were similar across sex in reality, we would expect the two models to produce similar results. However, the estimated error rates are fairly different; for example, the estimated proportion of errors for female professionals from model four is about double that from model two. To determine where the models differ most, we examine the estimated reporting probabilities, displayed in table 5. Model four estimates some significant differences in reporting probabilities by gender. For example, males with bachelor’s degrees who make a reporting error are estimated to report a master’s degree with probability 0.96, whereas females with bachelor’s degrees who make a reporting error are estimated to report a master’s degree with probability 0.67 and a professional degree with probability 0.30. Other large differences exist for professional degree holders. Females with professional degrees who make a reporting error are most likely to report a bachelor’s degree, whereas men with professional degrees who make a reporting error are most likely to report a master’s degree or PhD. We note that some of the estimates for model four are based on small sample sizes, which explains the wide standard errors. Table 5. Estimated Mean and 95 Percent Confidence Interval of Reporting Probabilities under Model Two and Reporting Probabilities by Gender under Model Four Z= BA Z= MA Z= Prof. Z= PhD Y=BA  Model 2 0.95 0.04 0.01 (0.87, 1.00) (0.00,0.11) (0.00,0.03)  Model 4: Male 0.96 0.02 0.02 (0.90, 1.00) (0.00,0.07) (0.00,0.05)  Model 4: Female 0.67 0.30 0.03 (0.58,0.76) (0.22,0.38) (0.00,0.07) Y=MA  Model 2 0.02 0.51 0.47 (0.00,0.06) (0.43,0.59) (0.39,0.55)  Model 4: Male 0.04 0.57 0.39 (0.00,0.11) (0.48,0.66) (0.31,0.47)  Model 4: Female 0.11 0.39 0.50 (0.00,0.25) (0.26,0.52) (0.40,0.61) Y=Prof.  Model 2 0.05 0.69 0.26 (0.00,0.16) (0.54,0.83) (0.14,0.38)  Model 4: Male 0.02 0.69 0.29 (0.00,0.06) (0.44,0.94) (0.04,0.54)  Model 4: Female 0.91 0.06 0.04 (0.79, 1.00) (0.00,0.16) (0.00,0.10) Y= PhD  Model 2 0.01 0.39 0.60 (0.00,0.04) (0.15,0.63) (0.36,0.83)  Model 4: Male 0.01 0.21 0.78 (0.00,0.05) (0.02,0.39) (0.60,0.96)  Model 4: Female 0.10 0.77 0.13 (0.00,0.30) (0.50, 1.00) (0.00,0.34) Y=None  Model 2 0.95 0.03 0.01 0.00 (0.95,0.96) (0.03,0.04) (0.01,0.01) (0.00,0.00)  Model 4: Male 0.97 0.03 0.01 0.00 (0.96,0.97) (0.02,0.03) (0.00,0.01) (0.00,0.00)  Model 4: Female 0.96 0.04 0.00 0.00 (0.95,0.97) (0.03,0.05) (0.00,0.00) (0.00,0.00) Z= BA Z= MA Z= Prof. Z= PhD Y=BA  Model 2 0.95 0.04 0.01 (0.87, 1.00) (0.00,0.11) (0.00,0.03)  Model 4: Male 0.96 0.02 0.02 (0.90, 1.00) (0.00,0.07) (0.00,0.05)  Model 4: Female 0.67 0.30 0.03 (0.58,0.76) (0.22,0.38) (0.00,0.07) Y=MA  Model 2 0.02 0.51 0.47 (0.00,0.06) (0.43,0.59) (0.39,0.55)  Model 4: Male 0.04 0.57 0.39 (0.00,0.11) (0.48,0.66) (0.31,0.47)  Model 4: Female 0.11 0.39 0.50 (0.00,0.25) (0.26,0.52) (0.40,0.61) Y=Prof.  Model 2 0.05 0.69 0.26 (0.00,0.16) (0.54,0.83) (0.14,0.38)  Model 4: Male 0.02 0.69 0.29 (0.00,0.06) (0.44,0.94) (0.04,0.54)  Model 4: Female 0.91 0.06 0.04 (0.79, 1.00) (0.00,0.16) (0.00,0.10) Y= PhD  Model 2 0.01 0.39 0.60 (0.00,0.04) (0.15,0.63) (0.36,0.83)  Model 4: Male 0.01 0.21 0.78 (0.00,0.05) (0.02,0.39) (0.60,0.96)  Model 4: Female 0.10 0.77 0.13 (0.00,0.30) (0.50, 1.00) (0.00,0.34) Y=None  Model 2 0.95 0.03 0.01 0.00 (0.95,0.96) (0.03,0.04) (0.01,0.01) (0.00,0.00)  Model 4: Male 0.97 0.03 0.01 0.00 (0.96,0.97) (0.02,0.03) (0.00,0.01) (0.00,0.00)  Model 4: Female 0.96 0.04 0.00 0.00 (0.95,0.97) (0.03,0.05) (0.00,0.00) (0.00,0.00) Table 5. Estimated Mean and 95 Percent Confidence Interval of Reporting Probabilities under Model Two and Reporting Probabilities by Gender under Model Four Z= BA Z= MA Z= Prof. Z= PhD Y=BA  Model 2 0.95 0.04 0.01 (0.87, 1.00) (0.00,0.11) (0.00,0.03)  Model 4: Male 0.96 0.02 0.02 (0.90, 1.00) (0.00,0.07) (0.00,0.05)  Model 4: Female 0.67 0.30 0.03 (0.58,0.76) (0.22,0.38) (0.00,0.07) Y=MA  Model 2 0.02 0.51 0.47 (0.00,0.06) (0.43,0.59) (0.39,0.55)  Model 4: Male 0.04 0.57 0.39 (0.00,0.11) (0.48,0.66) (0.31,0.47)  Model 4: Female 0.11 0.39 0.50 (0.00,0.25) (0.26,0.52) (0.40,0.61) Y=Prof.  Model 2 0.05 0.69 0.26 (0.00,0.16) (0.54,0.83) (0.14,0.38)  Model 4: Male 0.02 0.69 0.29 (0.00,0.06) (0.44,0.94) (0.04,0.54)  Model 4: Female 0.91 0.06 0.04 (0.79, 1.00) (0.00,0.16) (0.00,0.10) Y= PhD  Model 2 0.01 0.39 0.60 (0.00,0.04) (0.15,0.63) (0.36,0.83)  Model 4: Male 0.01 0.21 0.78 (0.00,0.05) (0.02,0.39) (0.60,0.96)  Model 4: Female 0.10 0.77 0.13 (0.00,0.30) (0.50, 1.00) (0.00,0.34) Y=None  Model 2 0.95 0.03 0.01 0.00 (0.95,0.96) (0.03,0.04) (0.01,0.01) (0.00,0.00)  Model 4: Male 0.97 0.03 0.01 0.00 (0.96,0.97) (0.02,0.03) (0.00,0.01) (0.00,0.00)  Model 4: Female 0.96 0.04 0.00 0.00 (0.95,0.97) (0.03,0.05) (0.00,0.00) (0.00,0.00) Z= BA Z= MA Z= Prof. Z= PhD Y=BA  Model 2 0.95 0.04 0.01 (0.87, 1.00) (0.00,0.11) (0.00,0.03)  Model 4: Male 0.96 0.02 0.02 (0.90, 1.00) (0.00,0.07) (0.00,0.05)  Model 4: Female 0.67 0.30 0.03 (0.58,0.76) (0.22,0.38) (0.00,0.07) Y=MA  Model 2 0.02 0.51 0.47 (0.00,0.06) (0.43,0.59) (0.39,0.55)  Model 4: Male 0.04 0.57 0.39 (0.00,0.11) (0.48,0.66) (0.31,0.47)  Model 4: Female 0.11 0.39 0.50 (0.00,0.25) (0.26,0.52) (0.40,0.61) Y=Prof.  Model 2 0.05 0.69 0.26 (0.00,0.16) (0.54,0.83) (0.14,0.38)  Model 4: Male 0.02 0.69 0.29 (0.00,0.06) (0.44,0.94) (0.04,0.54)  Model 4: Female 0.91 0.06 0.04 (0.79, 1.00) (0.00,0.16) (0.00,0.10) Y= PhD  Model 2 0.01 0.39 0.60 (0.00,0.04) (0.15,0.63) (0.36,0.83)  Model 4: Male 0.01 0.21 0.78 (0.00,0.05) (0.02,0.39) (0.60,0.96)  Model 4: Female 0.10 0.77 0.13 (0.00,0.30) (0.50, 1.00) (0.00,0.34) Y=None  Model 2 0.95 0.03 0.01 0.00 (0.95,0.96) (0.03,0.04) (0.01,0.01) (0.00,0.00)  Model 4: Male 0.97 0.03 0.01 0.00 (0.96,0.97) (0.02,0.03) (0.00,0.01) (0.00,0.00)  Model 4: Female 0.96 0.04 0.00 0.00 (0.95,0.97) (0.03,0.05) (0.00,0.00) (0.00,0.00) Turning to models 5–7, we can see the impact of the informative prior distributions by comparing results in table 4 under these models to those for model four. Moving from model four to model five, the most noticeable differences are for women with a PhD and men with a master’s degree, for whom model five suggests lower error rates. These groups have smaller sample sizes, so that the data do not swamp the effects of the prior distribution. When making the prior sample sizes very large, as in models six and seven, the information in the prior distribution tends to overwhelm the information in the data. We provide more thorough investigation of the impact of the prior specifications in the supplementary material. Of course, we cannot be certain which model most closely reflects the true measurement error mechanism. The best we can do is perform diagnostic tests to see which models, if any, should be discounted as not adequately describing the observed data. For each ACS imputed dataset DE(m) under each model, we compute the sample proportions, π^xk(m), and corresponding multiple imputation 95 percent confidence intervals for all 165˙ unique values of (X,Y). We determine how many of the eighty estimated population percentages of Y|X computed from the 2010 NSCG (using the estimated T^x+ from the ACS to back into an estimate of T^x5) fall within the multiple imputation 95 percent confidence intervals. Models that yield low rates do not describe the data accurately. For model one, seventy-three of eighty NSCG population share estimates are contained in the ACS multiple imputation intervals. Corresponding counts are seventy-five for model two, seventy-one for model three, and seventy-six for model four. These results suggest that model one and model three may be inferior to model two and model four. For the models with informative prior distributions, the counts are seventy-four for model five, sixty-seven for model six, and fifty-four for model seven. Although the prior beliefs in models six and seven seem plausible at first glance, the diagnostic suggests that they do not describe the 2010 data distributions as well as models four and five. Considering the results and the diagnostic check, if we had to choose one model, we would select model five. It seems plausible that the probability of misreporting education, as well as the reported value itself when errors are made, depend on both sex and true education level. Additionally, the prior distribution from the 1993 linked data pulls estimates in groups with little sample size to measurement error distributions that seem more plausible on face value. However, one need not use the data fusion framework for measurement error to select a single model; rather, one can use the framework to examine sensitivity of analyses to the different specifications. 4.3.2 Sensitivity analyses. Figure 2 displays the multiply-imputed, survey-weighted inferences for the total number of women with science and engineering degrees, computed using the ACS-specific indicator variable. We show results for models 4–7 and the CIA model—based on the ACS data without any adjustment for misreporting education. The confidence intervals for model four and model five overlap substantially, suggesting not much practical difference in choosing among these models. However, both are noticeably different from the other models, especially for the PhD and professional degrees. As the prior distributions on the error rates get stronger, the estimated counts increase towards the estimate using the ACS-reported education. We note that using the ACS-reported education without adjustments results in substantially higher estimated totals at the professional and PhD levels than any of the models that account for measurement error. We also note that the CIA model yields considerably lower counts for all but bachelor’s degrees. Across the measurement error methods, differences in interval lengths are modest compared with differences in point estimates. The interval lengths for the measurement error methods are somewhat larger than those based on the reported ACS data, as one would expect due to incorporating the additional uncertainty from the measurement errors. Figure 2. View largeDownload slide The Estimated Total Number of Science and Engineering Degrees Awarded to Women under Each Model. We plot the mean and 95 percent confidence intervals. Note the difference in scale for each degree category. Figure 2. View largeDownload slide The Estimated Total Number of Science and Engineering Degrees Awarded to Women under Each Model. We plot the mean and 95 percent confidence intervals. Note the difference in scale for each degree category. Figure 3 displays inferences for the average income for different degrees. For most degrees, the point estimates for models 4–7 are reasonably close, with models four and five again giving similar results. The estimated average income for professionals differs noticeably across models, with model four and model five suggesting lower averages than the unadjusted ACS estimates, or than models six and seven. The interval lengths for model four and model five tend to exceed those for models six and seven, most noticeably for master’s and PhD degrees, since the former models use weaker prior distributions than the latter models. The interval lengths for all measurement error methods tend to exceed the unadjusted ACS intervals, as one would expect. We note that the CIA model estimates are clearly implausible. Figure 3. View largeDownload slide Multiple Imputation Point and 95 Percent Confidence Interval Estimates for the Average Income within Each Education Level. The ACS estimate is the survey-weighted estimate based on the reported education level in the 2010 ACS. Figure 3. View largeDownload slide Multiple Imputation Point and 95 Percent Confidence Interval Estimates for the Average Income within Each Education Level. The ACS estimate is the survey-weighted estimate based on the reported education level in the 2010 ACS. As discussed in section 3.4, in fusion settings without linked data, analysts typically can check the plausibility of different model specifications in limited ways. However, for this particular inference (average income for different degrees), we can use the estimated average earnings in the 2010 Current Population Survey as an independent check on these estimates. They are $83,720 for professionals, $80,600 for PhD degrees, $66,144 for master’s degrees, and $53,976 for bachelor’s degrees (http://www.collegequest.com/bls-research-education-pays-2010.aspx). These values are closely aligned with the models’ estimates (except for the CIA model) and line up more closely with the estimates from model five than any other model, especially for the professional degree category, where the estimates most differ. Figure 4 displays inferences for the average income for men and women. All models support the conclusion that men make more than women; apparently, misreporting in education does not account for that gap, at least for the models considered here. We note that model four suggests potentially larger income gaps between male and female PhD recipients than the other models. Figure 4. View largeDownload slide Multiple Imputation Point and 95 Percent Confidence Interval Estimates for the Average Income for Men and Women within Each Education Level. The ACS estimate is the survey-weighted estimate based on the reported education level in the 2010 ACS. Figure 4. View largeDownload slide Multiple Imputation Point and 95 Percent Confidence Interval Estimates for the Average Income for Men and Women within Each Education Level. The ACS estimate is the survey-weighted estimate based on the reported education level in the 2010 ACS. 5. CONCLUDING REMARKS The framework presented in this article offers analysts tools for using the information in a high-quality, separate data source to adjust for measurement errors in the database of interest. Key to the framework is to replace conditional independence assumptions typically used in data fusion with carefully considered measurement error models. This avoids sacrificing information and facilitates analysis of the sensitivity of conclusions to alternative measurement error specifications. Analysts can use diagnostic tests like those described in section 3.4 and section 4.3 to rule out some measurement error models and perform sensibility tests on others to identify reasonable candidates. For example, if one measurement error specification suggests that 90 percent of values for a particular variable are in error and such high error rates are unrealistic for that variable, analysts can rule out that measurement error model as a reasonable description of the data. Besides survey sampling contexts like the one considered here involving the ACS and NSCG, the framework offers potential approaches for dealing with possible measurement errors in organic data, (i.e., data generated from administrative or other systems that are not designed surveys, https://www.census.gov/newsroom/blogs/director/2011/05/designed-data-and-organic-data.html). This is increasingly important, as data stewards and analysts consider replacing or supplementing high-quality but expensive surveys with inexpensive and large-sample organic data such as transaction or administrative databases. Often, scant attention is paid to the potential impact of measurement errors on inferences from that data. The framework could be used with high-quality, validated surveys as the gold standard data, allowing for adjustments to the error-prone organic data. While the assumptions in a measurement error model can be more reasonable than conditional independence, one cannot escape the fundamental need in data fusion to make unverifiable assumptions about the relationship between Y and Z. To help analysts understand the nature of these assumptions and, hence, understand the quality of data released after fusion, agencies can include both the reported Zi values and the imputed Yi values in released files. Further, they can publish results of diagnostic checks (e.g., measures of the similarity of the distributions of Y given X from the imputed versions of DG and from DE) and inferences about the parameters of the measurement error models. In general, though, finding ways to allow analysts to assess the quality of a data set created from fusion methodology is a complex problem worthy of future research. Supplementary Materials Supplementary materials are available online at academic.oup.com/jssam. The authors wish to thank Seth Sanders for his input on informative prior specifications and Mauricio Sadinle for discussion that improved the strategy for accounting for the sample design. This research was supported by The National Science Foundation under award SES-11–31897. REFERENCES Abayomi K. , Gelman A. , Levy M. ( 2008 ), “Diagnostics for Multivariate Imputations,” Journal of the Royal Statistical Society: Series C (Applied Statistics) , 57 , 273 – 291 . Google Scholar CrossRef Search ADS Bauer D. J. , Hussong A. M. ( 2009 ), “Psychometric Approaches for Developing Commensurate Measures across Independent Studies: Traditional and New Models,” Psychological Methods , 14 , 101 – 123 . Google Scholar CrossRef Search ADS PubMed Black D. , Haviland A. , Sanders S. , Taylor L. ( 2006 ), “Why Do Minority Men Earn Less? A Study of Wage Differentials among the Highly Educated,” The Review of Economics and Statistics , 88 , 300 – 313 . Google Scholar CrossRef Search ADS Black D. , Sanders S. , Taylor L. ( 2003 ), “Measurement of Higher Education in the Census and Current Population Survey,” Journal of the American Statistical Association , 98 , 545 – 554 . Google Scholar CrossRef Search ADS Black D. A. , Haviland A. M. , Sanders S. G. , Taylor L. J. ( 2008 ), “Gender Wage Disparities among the Highly Educated,” Journal of Human Resources , 43 , 630 – 659 . Google Scholar CrossRef Search ADS PubMed Carrig M. , Manrique-Vallier D. , Ranby K. , Reiter J. P. , Hoyle R. ( 2015 ), “A Multiple Imputation-Based Method for the Retrospective Harmonization of Data Sets,” Multivariate Behavioral Research , 50 , 383 – 397 . Google Scholar CrossRef Search ADS PubMed Curran P. J. , Hussong A. M. ( 2009 ), “Integrative Data Analysis: The Simultaneous Analysis of Multiple Data Sets,” Psychological Methods , 14 , 81 – 100 . Google Scholar CrossRef Search ADS PubMed Curran P. J. , Hussong A. M. , Cai L. , Huang W. , Chassin L. , Sher K. J. , Zucker R. A. ( 2008 ), “Pooling Data from Multiple Prospective Studies: The Role of Item Response Theory in Integrative Analysis,” Developmental Psychology , 44 , 365 – 80 . Google Scholar CrossRef Search ADS PubMed D’Orazio M. , Di Zio M. , Scanu M. ( 2006 ), Statistical Matching: Theory and Practice , Hoboken, NJ : Wiley . Google Scholar CrossRef Search ADS Dunson D. B. , Xing C. ( 2009 ), “Nonparametric Bayes Modeling of Multivariate Categorical Data,” Journal of the American Statistical Association , 104 , 1042 – 1051 . Google Scholar CrossRef Search ADS Fesco R. S. , Frase M. J. , Kannankutty N. ( 2012 ), “Using the American Community Survey as the sampling frame for the National Survey of College Graduates,” Working Paper NCSES 12-201, National Science Foundation, National Center for Science and Engineering Statistics, Arlington, VA. Fosdick B. K. , DeYoreo M. , Reiter J. P. ( 2016 ), “Categorical Data Fusion Using Auxiliary Information,” Annals of Applied Statistics , 10 , 1907 – 1929 . Google Scholar CrossRef Search ADS Fuller W. ( 1987 ), Measurement Error Models , New York : John Wiley & Sons . Google Scholar CrossRef Search ADS Gilula Z. , McCulloch R. , Rossi P. ( 2006 ), “A Direct Approach to Data Fusion,” Journal of Marketing Research , 43 , 73 – 83 . Google Scholar CrossRef Search ADS He Y. , Landrum M. B. , Zaslavsky A. M. ( 2014 ), “Combining Information from Two Data Sources with Misreporting and Incompleteness to Assess Hospice-Use among Cancer Patients: A Multiple Imputation Appraoch,” Statistics in Medicine , 33 , 3710 – 3724 . Google Scholar CrossRef Search ADS PubMed Hirano K. , Imbens G. , Ridder G. , Rubin D. ( 2001 ), “Combining Panel Data Sets with Attrition and Refreshment Samples,” Econometrica , 69 , 1645 – 1659 . Google Scholar CrossRef Search ADS Kim H. J. , Cox L. H. , Karr A. F. , Reiter J. P. , Wang Q. ( 2015 ), “Simultaneous Edit-Imputation for Continuous Microdata,” Journal of the American Statistical Association , 110 , 987 – 999 . Google Scholar CrossRef Search ADS Kim J. K. , Yang S. ( 2017 ), “A Note on Multiple Imputation under Complex Sampling,” Biometrika , 104 , 221 – 228 . Lohr S. L. ( 2010 ), Sampling: Design and Analysis ( 2nd ed. ), Boston : Brooks/Cole . Manrique-Vallier D. , Reiter J. P. ( 2018 ), “Bayesian Simultaneous Edit and Imputation for Multivariate Categorical Data,” Journal of the American Statistical Association , 112 , 1708 – 1719 . Google Scholar CrossRef Search ADS Moriarity C. , Scheuren F. ( 2001 ), “Statistical Matching: A Paradigm for Assessing the Uncertainty in the Procedure,” Journal of Official Statistics , 17 , 407 – 422 . National Science Foundation ( 1993 ), “National Survey of College Graduates, 1993,” Ann Arbor, MI: Inter-University Consortium for Political and Social Research [distributor], available at http://doi.org/10.3886/ICPSR06880.v1, iCPSR06880-v1. Pepe M. S. ( 1992 ), “Inference Using Surrogate Outcome Data and a Validation Sample,” Biometrika , 79 , 355 – 365 . Google Scholar CrossRef Search ADS Raghunathan T. E. ( 2006 ), “Combining Information from Multiple Surveys for Assessing Health Disparities,” Allgemeines Statistisches Archiv , 90 , 515 – 526 . Google Scholar CrossRef Search ADS Rassler S. ( 2002 ), Statistical Matching , New York : Springer . Google Scholar CrossRef Search ADS Reiter J. ( 2008 ), “Multiple Imputation When Records Used for Imputation Are Not Used or Disseminated for Analysis,” Biometrika , 95 , 933 – 946 . Google Scholar CrossRef Search ADS Reiter J. P. ( 2012 ), “Bayesian Finite Population Imputation for Data Fusion,” Statistica Sinica , 22 , 795 – 811 . Google Scholar CrossRef Search ADS Rubin D. B. ( 1986 ), “Statistical Matching Using File Concatenation with Adjusted Weights and Multiple Imputations,” Journal of Business & Economic Statistics , 4 , 87 – 94 . Rubin D. B. ( 1987 ), Multiple Imputation for Nonresponse in Surveys , New York : John Wiley & Sons . Google Scholar CrossRef Search ADS Schenker N. , Raghunathan T. E. ( 2007 ), “Combining Information from Multiple Surveys to Enhance Estimation of Measures of Health,” Statistics in Medicine , 26 , 1802 – 1811 . Google Scholar CrossRef Search ADS PubMed Schenker N. , Raghunathan T. E. , Bondarenko I. ( 2010 ), “Improving on Analyses of Self-Reported Data in a Large-Scale Health Survey by Using Information from an Examination-Based Survey,” Statistics in Medicine , 29 , 533 – 545 . Google Scholar PubMed Schifeling T. A. , Cheng C. , Reiter J. P. , Hillygus D. S. ( 2015 ), “Accounting for Nonignorable Unit Nonresponse and Attrition in Panel Studies with Refreshment Samples,” Journal of Survey Statistics and Methodology , 3 , 265 – 295 . Google Scholar CrossRef Search ADS Shifeling T. ( 2016 ), “Combining Information from Multiple Sources in Bayesian Modeling,” PhD thesis, Duke University, available at https://dukespace.lib.duke.edu/dspace/bitstream/handle/10161/12840/Schifeling_duke_0066D_13606.pdf? sequence=1. Si Y. , Reiter J. ( 2013 ), “Nonparametric Bayesian Multiple Imputation for Incomplete Categorical Variables in Large-Scale Assessment Surveys,” Journal of Educational and Behavioral Statistics , 38 , 499 – 521 . Google Scholar CrossRef Search ADS Si Y. , Reiter J. P. , Hillygus D. S. ( 2015 ), “Semi-Parametric Selection Models for Potentially Non-Ignorable Attrition in Panel Studies with Refreshment Samples,” Political Analysis , 23 , 92 – 112 . Google Scholar CrossRef Search ADS Siddique J. , Reiter J. P. , Brincks A. , Gibbons R. D. , Crespi C. M. , Brown C. H. ( 2015 ), “Multiple Imputation for Harmonizing Longitudinal Non-Commensurate Measures in Individual Participant Data Meta-Analysis,” Statistics in Medicine , 34 , 3399 – 3414 . Google Scholar CrossRef Search ADS PubMed Tarmast G. ( 2001 ), “Multivariate Log-normal Distribution,” International Statistical Institute: Seoul 53rd Session. Yucel R. M. , Zaslavsky A. M. ( 2005 ), “Imputation of Binary Treatment Variables with Measurement Error in Administrative Data,” Journal of the American Statistical Association , 100 , 1123 – 1132 . Google Scholar CrossRef Search ADS © The Author(s) 2018. Published by Oxford University Press on behalf of the American Association for Public Opinion Research. All rights reserved. For permissions, please email: journals.permissions@oup.com This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)

Journal

Journal of Survey Statistics and MethodologyOxford University Press

Published: May 25, 2018

There are no references for this article.

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Search

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Organize

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Access

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

DeepDyve

Freelancer

DeepDyve

Pro

Price

FREE

$49/month
$360/year

Save searches from
Google Scholar,
PubMed

Create lists to
organize your research

Export lists, citations

Read DeepDyve articles

Abstract access only

Unlimited access to over
18 million full-text articles

Print

20 pages / month

PDF Discount

20% off